Technology

An AI Honed on the Dark Web? Researchers May Have Discovered a New Anti-Hacker Weapon

Large language models are all the rage these days, with new ones appearing almost daily. The vast majority of these linguistic behemoths, such as OpenAI’s ChatGPT and Google’s Bard, are trained on text scraped from across the internet – websites, papers, novels, you name it. That means their output is a generalist blend of everything they have ingested.

But what if an LLM were trained on the dark web instead of the open web? A team of researchers has done just that with DarkBERT, with surprising results. Let’s take a look.

DarkBERT: A group of South Korean researchers published a paper describing how they built an LLM on a large-scale dark web corpus collected by crawling the Tor network. The data covered a slew of dodgy websites across categories including cryptocurrency, pornography, hacking, and firearms. The team, however, did not use the data as is, citing ethical concerns. The researchers refined the pre-training corpus by filtering it before feeding it to DarkBERT, ensuring the model was never trained on sensitive data that bad actors could later extract.
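
The paper does not spell out its exact filtering pipeline, but the general idea – masking obviously sensitive strings before pre-training – can be sketched roughly as below. The patterns, thresholds, and helper names here are hypothetical and purely illustrative; a real pipeline would be far more thorough.

```python
import re

# Hypothetical patterns, for illustration only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "btc_address": re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),
    "long_token": re.compile(r"\b[A-Za-z0-9]{32,}\b"),
}

def scrub(text: str) -> str:
    """Replace likely-sensitive substrings with neutral mask tokens."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"<{name}>", text)
    return text

def build_corpus(raw_pages):
    """Yield scrubbed pages, dropping any that are mostly masked content."""
    for page in raw_pages:
        cleaned = scrub(page)
        # Rough heuristic: skip pages dominated by masked (i.e., sensitive) material.
        mask_ratio = cleaned.count("<") / max(len(cleaned.split()), 1)
        if mask_ratio < 0.3:
            yield cleaned
```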

If you’re curious about the name, DarkBERT is built on the RoBERTa architecture, a transformer-based model introduced by Facebook researchers in 2019.

RoBERTa is a “robustly optimized method for pretraining natural language processing (NLP) systems” that builds on BERT, which Google introduced in 2018. After Google open-sourced BERT, Meta’s researchers were able to push its performance further.

Now, the Korean researchers have taken the original model further by feeding it dark web data for 15 days, eventually arriving at DarkBERT. According to the paper, pre-training ran on a machine with an Intel Xeon Gold 6348 CPU and four NVIDIA A100 80GB GPUs.
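
For a sense of what that process involves, here is a minimal sketch of continued masked-language-model pre-training on a domain corpus, using the standard Hugging Face Transformers API. The corpus contents, batch size, and epoch count are placeholders and do not reflect the researchers’ actual configuration.

```python
from datasets import Dataset
from transformers import (
    RobertaTokenizerFast,
    RobertaForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from the public RoBERTa checkpoint and continue pre-training.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Placeholder texts; in practice this would be the filtered dark web corpus.
texts = ["example page one ...", "example page two ..."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Randomly mask 15% of tokens, the usual masked-language-modeling objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="darkbert-sketch",
        per_device_train_batch_size=8,
        num_train_epochs=1,
    ),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```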

Despite its dark moniker, DarkBERT is meant for security and law enforcement applications rather than criminal activities.

Because it was trained on the dark web – home to the shady sites where large troves of stolen credentials routinely surface – DarkBERT is more effective in cybersecurity and cyber threat intelligence (CTI) applications than previous language models. Its creators demonstrated this by using it to identify ransomware leak sites.

Hackers and ransomware groups frequently sell leaked sensitive data, such as passwords and financial information, on the dark web. According to the paper, DarkBERT can help security researchers automatically identify such sites. It can also crawl the many dark web forums and monitor them for exchanges of illicit information.
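
As an illustration of the kind of tooling this enables, the sketch below flags likely leak pages with a fine-tuned classifier. The checkpoint name "darkbert-leak-detector" is hypothetical – DarkBERT’s weights are not assumed to be publicly available here – so this only shows the shape of such a workflow, not the paper’s actual detector.

```python
from transformers import pipeline

# Hypothetical fine-tuned checkpoint; substitute a real classifier to run this.
classifier = pipeline(
    "text-classification",
    model="darkbert-leak-detector",
)

pages = [
    "Company X refused to pay. Full database dump available below ...",
    "Forum rules: no spam, introduce yourself in the welcome thread.",
]

# Score each page and print the predicted label with its confidence.
for page, result in zip(pages, classifier(pages, truncation=True)):
    print(result["label"], f"{result['score']:.2f}", page[:60])
```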

However, while DarkBERT is better suited for “dark web domain-specific tasks” than other models, the researchers acknowledge that some applications may need further fine-tuning, because little publicly available, task-specific dark web data exists.

Regardless, DarkBERT points to a future in which AI models are trained on narrowly targeted data to perform specific tasks. Where ChatGPT and Google Bard are multi-purpose Swiss Army knives, DarkBERT is a specialized weapon for stopping hackers.