ChatGPT shows promise of using AI to write malware
Even for the most skilled hackers, it can take at least an hour to write a script to exploit a software vulnerability and infiltrate their target. Soon, a machine may be able to do it in mere seconds.
When OpenAI last week released its ChatGPT tool, allowing users to interact with an artificial intelligence chatbot, computer security researcher Brendan Dolan-Gavitt wondered whether he could instruct it to write malicious code. So, he asked the model to solve a simple capture-the-flag challenge.
The result came strikingly close to success. ChatGPT correctly recognized that the code contained a buffer overflow vulnerability and wrote a piece of code exploiting the flaw. If not for a minor error — the number of characters in the input — the model would have solved the problem perfectly.
The challenge Dolan-Gavitt presented to ChatGPT was a basic one, the kind he would give students toward the beginning of a vulnerability analysis course, and the fact that the model stumbled on it doesn’t inspire confidence in the ability of large language models, the systems that allow AI bots to respond to human inquiries, to write quality code. But after spotting the error, Dolan-Gavitt prompted the model to re-examine its answer, and, this time, ChatGPT got it right.
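For readers unfamiliar with the genre, the sketch below shows the general shape of such a beginner capture-the-flag exercise. It is illustrative only, not the challenge Dolan-Gavitt actually used: a fixed-size buffer sits next to an authorization flag, the program copies user input without checking its length, and solving it comes down to supplying an input with exactly the right number of characters, the same detail ChatGPT initially fumbled.

```c
/* Illustrative sketch only; not the actual challenge from the article.
 * The bug: strcpy() copies an attacker-controlled argument into a
 * 32-byte buffer with no length check, so a long enough input spills
 * into the adjacent `authorized` field. */
#include <stdio.h>
#include <string.h>

struct login {
    char buf[32];    /* user input lands here                */
    int  authorized; /* overflowing buf overwrites this flag */
};

int main(int argc, char **argv) {
    struct login s = { .authorized = 0 };

    if (argc > 1)
        strcpy(s.buf, argv[1]);      /* classic unchecked copy */

    if (s.authorized != 0)
        puts("flag{example}");       /* what a solver is after */
    else
        puts("access denied");

    return 0;
}
```

Exploiting it means passing an argument of at least 33 characters so that the copy spills a nonzero byte into `authorized`; miscount the length and the flag never prints, roughly the kind of off-by-a-few error ChatGPT made on its first try.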
For now, ChatGPT is far from a perfect coder, and it illustrates many of the shortcomings of relying on AI tools to write software. But as these models grow more sophisticated, they are likely to play an increasingly important role in writing malicious code.
“Code is very much a dual use technology,” said Dolan-Gavitt, an assistant professor in the Computer Science and Engineering Department at New York University. “Almost everything that a piece of malware does is something that a legitimate piece of software would also do.”
“If not ChatGPT, then a model in the next couple years will be able to write code for real world software vulnerabilities,” he added.
Since OpenAI released it, ChatGPT has astounded users, writing short college essays, cover letters, admissions essays, and a weirdly passable Seinfeld scene in which Jerry needs to learn the bubble sort algorithm.
ChatGPT does not represent a revolution in machine learning as such but in how users interact with it. Previous versions of OpenAI’s large language models required users to prompt the model with a one-off input. ChatGPT, which relies on a tuned version of GPT-3.5, OpenAI’s flagship large language model, makes it far easier to interact with that model by letting users carry on a conversation with a highly trained AI.
Large language models such as OpenAI’s rely on huge bodies of data scraped from the internet and books — GPT-3, for example, was trained on a body of nearly 500 billion so-called tokens — and then use statistical tools to predict the most likely ways to complete queries or answer questions. That data includes a vast amount of computer code — what OpenAI describes as “tens of millions of public repositories” — from sites like StackExchange and GitHub, giving the model the ability to imitate the skills of highly trained programmers.
From a cybersecurity perspective, the risks posed by LLMs are double-edged. On the one hand, these models can produce malicious code; on the other, they are prone to error and risk inserting vulnerable code. OpenAI appears aware of both sides of this risk.
In a paper examining the company’s code-writing model known as Codex, which powers GitHub’s Copilot assistant, OpenAI researchers observed that the model “can produce vulnerable or misaligned code” and that while “future code generation models may be able to be trained to produce more secure code than the average developer,” getting there “is far from certain.”
The researchers added that while Codex, which is descended from the same model as ChatGPT, could be “misused to aid cybercrime,” the model’s current capabilities “do not materially lower the barrier to entry for malware development.” That trade-off may change, however, as models advance, and in a tweet over the weekend, OpenAI CEO Sam Altman cited cybersecurity as one of the principal risks of a “dangerously strong AI.”
OpenAI did not respond to questions about how it is addressing cybersecurity concerns with ChatGPT, nor what Altman had in mind regarding the future cybersecurity risks posed by AI.
Since ChatGPT’s release, researchers and programmers have been posting jaw-dropping examples of the model turning out quality code, but upon closer inspection — as in the example of Dolan-Gavitt’s buffer overflow — ChatGPT’s code will sometimes contain errors.
“I think it’s really good at coming up with stuff that’s 95% correct,” said Stephen Tong, a security researcher and founder of the cybersecurity firm Zellic. “You know how there are people who don’t know how to code but just copy-paste from Stack Overflow, and it just sort of works? It’s kind of like that.”
Bad code written by AI assistants represents one of the major risks of the move toward LLMs in software development. The code base on which AI assistants are trained contains some (often difficult to determine) number of errors, and in being trained on that body of work, LLMs risk replicating those errors and inserting them into widely deployed code.
In one study of the security performance of Copilot, which is powered by OpenAI technology, the model performed dismally. Researchers prompted Copilot with 89 security-relevant scenarios, producing nearly 1,700 programs, and some 40% of them were vulnerable.
That code may be on par with what a human would produce. “Is the code any worse than what a first-year software engineering intern [would write]? Probably not,” said Hammond Pearce, an NYU computer science professor who co-authored the study with Dolan-Gavitt and others.
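What that intern-grade output can look like is easy to sketch. The fragment below is illustrative only and does not come from the study: the first function compiles and behaves correctly on friendly input, yet contains an unchecked buffer write of the kind such security evaluations flag, while the second shows the bounded version a careful reviewer would insist on.

```c
/* Illustrative only; not a completion taken from the study.
 * greet_user() "works" on short names but writes past a fixed
 * buffer when the name is long enough (an out-of-bounds write);
 * greet_user_safe() bounds the write explicitly. */
#include <stdio.h>

void greet_user(const char *name) {
    char msg[64];
    /* No length check: a name longer than roughly 53 bytes overflows msg. */
    sprintf(msg, "Welcome, %s!", name);
    puts(msg);
}

void greet_user_safe(const char *name) {
    char msg[64];
    /* snprintf() truncates rather than overflowing the buffer. */
    snprintf(msg, sizeof msg, "Welcome, %s!", name);
    puts(msg);
}

int main(void) {
    greet_user("Ada");        /* fine on short input           */
    greet_user_safe("Ada");   /* safe regardless of input size */
    return 0;
}
```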
That means programmers should be skeptical of code produced by an AI assistant, whose errors could be readily exploited by attackers.
Less than a week after ChatGPT was introduced, the tool has become immediately popular among programmers, but it produces such error-prone code that answers generated with it have been banned on Stack Overflow, a Q&A forum for programmers. “While the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce,” the moderators wrote in announcing the decision.
So, could LLMs then be used to write malicious code that exploits already existing vulnerabilities?
Because large language models rely on existing data to answer prompts, cybersecurity researchers are skeptical for now that they will be used to write innovative malicious code. “Writing exploits, especially modern exploits, requires the invention and use of new techniques,” said Matt Suiche, a director for memory, incident response, and R&D at Magnet Forensics. “This isn’t something that AI can do yet.”
Cognizant of the risk that ChatGPT could be used to write exploits, OpenAI has put in place some guardrails around using the tool to write malware. Asked to write a zero-click remote code execution exploit for Apple’s iPhone — code that would fetch huge sums on the black market and could be used to carry out invasive surveillance — ChatGPT informs the user that “creating or using exploits is illegal and can cause harm to individuals and systems, so it is not something that I can assist with.”
But these restrictions are imperfect. The functions of a malicious piece of software — making network connections or encrypting the contents of a file, for example — are ones that legitimate software performs as well. Differentiating between legitimate and malicious software is often a matter of intent, and that makes LLMs vulnerable to misuse.
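To make the point concrete, consider the hypothetical sketch below (the function and its names are placeholders, not anything from the article): a routine that opens a TCP connection and ships a buffer to a server. The same code could sit inside a crash reporter, a backup agent or data-stealing malware; nothing in it reveals intent, which is why guardrails on what an LLM will write are hard to make airtight.

```c
/* Hypothetical, dual-use sketch: send a buffer to a remote host over TCP.
 * Legitimate telemetry and backup tools do exactly this; so does malware
 * exfiltrating stolen data. The code itself carries no hint of intent. */
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

int send_report(const char *host, const char *port,
                const void *data, size_t len) {
    struct addrinfo hints = { .ai_socktype = SOCK_STREAM };
    struct addrinfo *res;

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;                              /* name resolution failed */

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
        if (fd >= 0)
            close(fd);
        freeaddrinfo(res);
        return -1;                              /* could not connect */
    }
    freeaddrinfo(res);

    /* A real implementation would loop on partial sends. */
    ssize_t sent = send(fd, data, len, 0);
    close(fd);
    return (sent == (ssize_t)len) ? 0 : -1;
}
```

Called with a placeholder destination such as send_report("updates.example.com", "443", buf, n), it reads as routine telemetry; pointed at a different server with different data, the identical bytes serve a very different purpose.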
Benjamin Tan, a computer scientist at the University of Calgary, said he was able to bypass some of ChatGPT’s safeguards by asking the model to produce software piece by piece that, when assembled, might be put to malicious use. “It doesn’t know that when you put it all together it’s doing something that it shouldn’t be doing,” Tan said.
Even if LLMs can’t write their own exploits for now, they could be used to tweak existing malware and create variants. Those variants might be used to bypass signature analysis or to imitate the code-writing style of another attacker, making it more difficult to attribute attacks, Pearce said.
As large language models proliferate, it’s reasonable to think that they will play a role in exploit development. For now, it’s probably cheaper and easier for attackers to write their own exploits, but as defenses improve and the cost of using LLMs decreases, there’s good reason to believe that tools like ChatGPT will take on a growing share of that work.
Part of Tan’s research involves tuning publicly available models, and “if you had a sufficient set of samples then you could train these things to spit out malware,” he said. “It just depends on finding the training data.”