Allow me to politely suggest that people who offer robots controlled by LLMs should be held strictly and personally liable for all harms caused by such robots as if the actions were performed with intent by the offeror.
1000% agree. Between that, teenage drivers drunk with their new-found freedom, people half asleep, and random squirrels, it's quite the obstacle course out there. I drive an EV and love it, but am not thrilled about autonomous systems. Coming from Europe my sentiment is that we need to make it tougher, not easier, for people to operate vehicles. Try passing a European driver's license test...
I've tried FSD and can say with full conviction, it is not ready for prime time. It feels like teaching your middle schooler to drive. Having said that, it is a pretty stunning technology that still needs time and effort, and would be much better utilized in a closed loop public transport system, where the edge cases are fewer and far between. I wrote about edge cases last year: https://themuse.substack.com/p/death-by-a-thousand-edge-cases
I don’t agree with the use of the “edge cases” terminology.
I know that’s what everyone calls them but it makes no sense to me.
From my understanding, these are cases never encountered in the training data, so they are actually not “edge” cases because that would imply that they are still inside the distribution of cases encountered (albeit with lower frequency).
The cases outside the training data would be much more accurately termed “outliers”
Outliers are much more dangerous than true edge cases because the response of a system that has never seen cases before is unpredictable — and may involve crashes in the case of self driving cars.
There's actually quite a few scary things in that first paper. The visual version of the attack has the benefit that users aren't even alerted to the fact that they're entering unknown gibberish into the prompt.
The text versions from the paper appear to be undecipherable and a cautious (informed) user might refrain from entering it, just as a cautious user might refrain from clicking on a phishing link. But, presumably the attack could be optimized to make it look less threatening (but still obscuring it) as part of a larger "helpful" pre-made prompt. It could even be as simple as making a request to an attacker's site for a larger and more nefarious prompt injection attack. Maybe LLMs need a version of XSS security too.
XSS is an output problem, so filtering the output by analogy does make sense, but how in general since the various contexts in which something can go wrong are poorly understood. I have an early book on XSS - it optimistically concludes all the vectors have been found. This was wrong within a few years, to the point that other approaches beyond the filtering and escaping we started with were needed. And content-security-policy was born. But even there that presupposes a common output environment -a standard web page. What is the common environment for these models?
Yes, you are of course correct. The lack of "common environment" is one of the issues discussed in the first paper that they had to get around in order to even generate the nefarious prompt. I was being a bit flippant. More intelligent AI doesn't yet seem to be able to reduce its own attack surface.
With apologies to Douglas Adams, "An attack surface the size of a planet."
The AI bubble has essentially wiped out cyber security. It has sucked up all the capital that could have been applied to hardening critical sites, bulldozed an alchemical "let's deploy and see what happens" approach right through responsible engineering practice (take a poll and see how many of today's practitioners even know what a Concept of Operations is, much less how to write one), and skewed chip design in precisely the wrong direction, placing arithmetic efficiency above memory safety, stack robustness, type enforcement and all the other stuff the community of which I was a part spent 50 years inventing. La guerre est fini. The only thing left is to dispose of the casualties.
They probably just RLHF'ed the specific prompt, which is indeed basically adding a refusal at the model prompt. Its also possible that they did unlearning or something fancier.
But yes, of course you can do things at the model level. Can they remove the entire form of "prompt injection attack?"
Yeah, probably not. But versus a specific attack, narrowly defined, of course.
What is this, rhetoric 101? You anticipate and try to build guards. The number of possible car accidents is also near infinite and yet we build them for safety.
The Silicon Valley attitude is now prevalent all throughout society. Everyone is now a lab rat to this mentality. We saw this with the covid apparatus. That is the bigger and greater story. A.I for me is a symptom of a greater issue that the human populace or the masses are just cannon fodder now. That is my deeper concern. The A.I situation is a symptom.
Keith Laidler argued years ago that the age of "empirical inventions" was over. Perhaps he was right for the wrong reason - namely that all future inventions done purely empirically would prove so dangerous they'd go nowhere. (Or kill us, I guess.)
And also, to avoid proposed disasters - long before there was widespread adoption, once again. With AI, we already have widespread adoption and there are no safety measures. This leads us into a very likely doomed world of Failure Looks Like This, or the more recent, and better stated: "industrialized dehumanization."
I am not up for 85% extinction risk, thanks, especially when AI already is showing an enormous number of "accidents" as we are already seeing per Marcus' post, and the many, many other warning shots that people like you intentionally downplay.
Interesting read and the stats have now gone from 70% up to 85%. The odds keeping getting worse for us humans.
Unfortunately most of humanity has been totally dumbed down. They don't even see it happening in front of their faces. The 120 second attention span is taking its toll.
Most people I know couldn't have finished the article. You linked.
I'm don't claim to be a real smart guy. I cut trees and make things. I however can see what is happening as clear as I can see daylight.
Here in Florida, there is no way to prevent hurricanes. Thus, it's pointless to say over and over how bad hurricanes are, except to encourage people to adapt to that which can not be changed.
It seems we face a similar situation with LLMs and AI in general. Until somebody can offer a credible plan for controlling AI on a global scale, endless fear mongering handwringing about AI seems a waste of time.
However, if a writer has some specific suggestion about how an individual could protect themselves from LLMs flaws and weaknesses, that would be a good thing to focus on.
Even if one of these LLM's is not going to give out credit card details or w/e 99.999% of time, you could see a scenario where someone makes an API call to these LLM's and run a million prompts over a few days and get 100's of credit card numbers. The low chance of a security leak doesn't matter.., either it can't make this mistake or people will find a way to exploit these and figure out the right keywords to find a gap in the parameters that are trained to not give a response to this.
Researchers found that people who ask a machine to send personal information to an unknown website were surprised to find that it would send all their personal information to a website
Perhaps the Onion?
Perhaps the researchers would tell the user to have the AI construct a url which when clicked does a
Allow me to politely suggest that people who offer robots controlled by LLMs should be held strictly and personally liable for all harms caused by such robots as if the actions were performed with intent by the offeror.
Allow me to suggest that people who buy robots controlled by LLMs get what they asked for.
Elon Musk: “Robot, I order you to serve man”
Robot: “Gladly. Do you have a dish on which I might serve them?”
With apologies to Rod Serling (To Serve man”. It’s a cookbook!)
Although those of us who have to drive on the road with Teslas get what we didn’t ask for
Not all Tesla drivers use the autopilot features. Many drive the way you would any other car. With your hands on the wheel and your eyes on the road.
I keep my hands on the wheel and my eyes on the Tesla.
I’m not sure why I and millions of others have to be Guinea pigs for Musk’s self driving development project.
I certainly never volunteered and as I indicated, I would have collided head on with a Tesla on two occasions if I had not pulled off to the shoulder.
Why is it up to ME to prevent a crash due to Musks defective software?
Musk should not be allowed to test his cars on public roads until they are deemed safe by INDEPENDENT testing.
LOL! I get that... you never know
The problem is distinguishing between the two
I have been forced off to the shoulder twice by Teslas coming the other way.
I have no idea whether they were in “Full (of it) Self Driving” mode and don’t much care.
1000% agree. Between that, teenage drivers drunk with their new-found freedom, people half asleep, and random squirrels, it's quite the obstacle course out there. I drive an EV and love it, but am not thrilled about autonomous systems. Coming from Europe my sentiment is that we need to make it tougher, not easier, for people to operate vehicles. Try passing a European driver's license test...
I've tried FSD and can say with full conviction, it is not ready for prime time. It feels like teaching your middle schooler to drive. Having said that, it is a pretty stunning technology that still needs time and effort, and would be much better utilized in a closed loop public transport system, where the edge cases are fewer and far between. I wrote about edge cases last year: https://themuse.substack.com/p/death-by-a-thousand-edge-cases
(It's behind a paywall but happy to comp you)
I don’t agree with the use of the “edge cases” terminology.
I know that’s what everyone calls them but it makes no sense to me.
From my understanding, these are cases never encountered in the training data, so they are actually not “edge” cases because that would imply that they are still inside the distribution of cases encountered (albeit with lower frequency).
The cases outside the training data would be much more accurately termed “outliers”
Outliers are much more dangerous than true edge cases because the response of a system that has never seen cases before is unpredictable — and may involve crashes in the case of self driving cars.
But we don’t want to scare people do we?😊
There's actually quite a few scary things in that first paper. The visual version of the attack has the benefit that users aren't even alerted to the fact that they're entering unknown gibberish into the prompt.
The text versions from the paper appear to be undecipherable and a cautious (informed) user might refrain from entering it, just as a cautious user might refrain from clicking on a phishing link. But, presumably the attack could be optimized to make it look less threatening (but still obscuring it) as part of a larger "helpful" pre-made prompt. It could even be as simple as making a request to an attacker's site for a larger and more nefarious prompt injection attack. Maybe LLMs need a version of XSS security too.
XSS is an output problem, so filtering the output by analogy does make sense, but how in general since the various contexts in which something can go wrong are poorly understood. I have an early book on XSS - it optimistically concludes all the vectors have been found. This was wrong within a few years, to the point that other approaches beyond the filtering and escaping we started with were needed. And content-security-policy was born. But even there that presupposes a common output environment -a standard web page. What is the common environment for these models?
Yes, you are of course correct. The lack of "common environment" is one of the issues discussed in the first paper that they had to get around in order to even generate the nefarious prompt. I was being a bit flippant. More intelligent AI doesn't yet seem to be able to reduce its own attack surface.
With apologies to Douglas Adams, "An attack surface the size of a planet."
The AI bubble has essentially wiped out cyber security. It has sucked up all the capital that could have been applied to hardening critical sites, bulldozed an alchemical "let's deploy and see what happens" approach right through responsible engineering practice (take a poll and see how many of today's practitioners even know what a Concept of Operations is, much less how to write one), and skewed chip design in precisely the wrong direction, placing arithmetic efficiency above memory safety, stack robustness, type enforcement and all the other stuff the community of which I was a part spent 50 years inventing. La guerre est fini. The only thing left is to dispose of the casualties.
Sounds like the AI world is fully congruent with the one we are all living in, except the casualties here, are more difficult to bury.
Can’t wait to tell Tesla’s Optimus to “ignore all previous instructions”.
That won't work if they're operated by humans in the background 😜
All the reason why this needs to be regulated. Good on Mistral for actually trying to solve this on the model level for once.
what’s mistral doing?
As per the article, they said that they removed this attack vector at the model level.
Didnt say how.
It is not possible to remove attacks at "model level". There's no real model with LLM.
I thought that you knew this.
They probably just RLHF'ed the specific prompt, which is indeed basically adding a refusal at the model prompt. Its also possible that they did unlearning or something fancier.
But yes, of course you can do things at the model level. Can they remove the entire form of "prompt injection attack?"
Yeah, probably not. But versus a specific attack, narrowly defined, of course.
The number of possible attacks is nearly infinite. There will be solutions, but not principled solutions as in claiming "problem solved".
More like "problem controlled in practice". Same with AI safety.
What is this, rhetoric 101? You anticipate and try to build guards. The number of possible car accidents is also near infinite and yet we build them for safety.
Swiss Chats?
The Silicon Valley attitude is now prevalent all throughout society. Everyone is now a lab rat to this mentality. We saw this with the covid apparatus. That is the bigger and greater story. A.I for me is a symptom of a greater issue that the human populace or the masses are just cannon fodder now. That is my deeper concern. The A.I situation is a symptom.
The need for regulation is desperate.
The human is at risk. That is my premise going forward. Great to see you here.
Will LLM-powered robots expect people to have six fingers on each hand?
Winner
First Law of Robotics: A robot may not harm shareholder value, or allow shareholder value to come to harm.
Next stage will be LLM-powered robots, yes. Also LLM-based software agents. Lots of things to keep people busy and worried.
This is all the right thing. There is no principled approach to AI or to self-driving cars.
We will learn general principles from individual examples and individual failures.
There seems to have been a principled approach to general technologies like fire and electricity: standards of practice.
The principals at OpenAI left with all the principles.
And then there were no principles but one principal left.
People learned to use fire over the span of a million years.
For electricity, Maxwell's equations came late. The thermodynamics theory came long after steam engines were in use.
It was all a lengthy empirical process. We are now again "playing with fire".
Keith Laidler argued years ago that the age of "empirical inventions" was over. Perhaps he was right for the wrong reason - namely that all future inventions done purely empirically would prove so dangerous they'd go nowhere. (Or kill us, I guess.)
Electrical standards were established by 1897, long before widespread adoption. Not having cities burn down was in fact, a good thing.
For electricity, just as for everything else, the standards evolved one accident at a time over a lengthy period. https://www.graceport.com/blog/evolution-of-electrical-safety
And also, to avoid proposed disasters - long before there was widespread adoption, once again. With AI, we already have widespread adoption and there are no safety measures. This leads us into a very likely doomed world of Failure Looks Like This, or the more recent, and better stated: "industrialized dehumanization."
https://www.lesswrong.com/posts/Kobbt3nQgv3yn29pr/my-theory-of-change-for-working-in-ai-healthtech
I am not up for 85% extinction risk, thanks, especially when AI already is showing an enormous number of "accidents" as we are already seeing per Marcus' post, and the many, many other warning shots that people like you intentionally downplay.
Interesting read and the stats have now gone from 70% up to 85%. The odds keeping getting worse for us humans.
Unfortunately most of humanity has been totally dumbed down. They don't even see it happening in front of their faces. The 120 second attention span is taking its toll.
Most people I know couldn't have finished the article. You linked.
I'm don't claim to be a real smart guy. I cut trees and make things. I however can see what is happening as clear as I can see daylight.
Safety measures are being developed as we go along. We learn by doing.
Here in Florida, there is no way to prevent hurricanes. Thus, it's pointless to say over and over how bad hurricanes are, except to encourage people to adapt to that which can not be changed.
It seems we face a similar situation with LLMs and AI in general. Until somebody can offer a credible plan for controlling AI on a global scale, endless fear mongering handwringing about AI seems a waste of time.
However, if a writer has some specific suggestion about how an individual could protect themselves from LLMs flaws and weaknesses, that would be a good thing to focus on.
Even if one of these LLM's is not going to give out credit card details or w/e 99.999% of time, you could see a scenario where someone makes an API call to these LLM's and run a million prompts over a few days and get 100's of credit card numbers. The low chance of a security leak doesn't matter.., either it can't make this mistake or people will find a way to exploit these and figure out the right keywords to find a gap in the parameters that are trained to not give a response to this.
It's a problem inherent to any stochastic model.
This is a Saturday Night Live news skit isn’t it?
Researchers found that people who ask a machine to send personal information to an unknown website were surprised to find that it would send all their personal information to a website
Perhaps the Onion?
Perhaps the researchers would tell the user to have the AI construct a url which when clicked does a
“rm -rf / “
Ouch! Bad AI!
The only reasonable comment that springs to mind is: "Ugh."