A couple days ago I reported a survey saying that most IT professional are worried about the security of LLMs. They have every right to be. There seems to be an endless number of ways of attacking them. In my forthcoming book, Taming Silicon Valley, I describe two examples. The first sometimes gets an LLM to disgorge private information:
The second (which may be patched in many systems) used ASCII art to get around guardrails:
Today Anthropic reported a new one, called Many Shot Jailbreaking. (H/t
).There are many others approaches to jailbreaking, too; I won’t give them all away here.
Nor will I pretend to know them all; the security hits keep coming; nobody knows a complete list, and that’s the point. Some get patched, but there very likely will aways be more.
Nobody knows exactly how LLMs work, and that means that nobody can ever issue strong guarantees around them. That would be fine if LLMs were kept in the lab, where they arguably still belong, but with hundreds of millions of people using them daoly, and companies like OpenAI promising to put them in all purpose agents with essentially full device-level access, the lack of any kind of guarantee gets more worrisome by the day.
As noted above, Rule 1 of cybersecurity is to keep your attack surface small; in LLMs the attack surface appears to be infinite. That can’t be good.
Gary Marcus doesn’t really know how else to make this point. Secure, trustworthy AI could be a great thing, but LLMs will never be that.
LLMs aren't even real AI. They're basically just repositories of data that try to predict what comes next. That's great for all the companies in the data-selling business like Google, not so great for everyone else, especially not people who research AI, but also not people who just want to use it. https://philosophyoflinguistics618680050.wordpress.com/2023/10/18/has-chatgpt-refuted-noam-chomsky/
So, the day after LLMs are invented, we came up with social engineering attacks to compromise them. Sounds about right.