“small, weakly defended, and vulnerable” covers the vast majority of private individuals' setups, doesn't it? 😨
Sure, but access to grandma's computer has little value to an intruder.
Grandma is already being exploited. The 60+ group lost $7.7B to cyber-enabled fraud in 2025 (IC3). Most of that is social engineering. Getting device access can make social engineering more effective.
That depends on the grandma.
And grandpa.
I can honestly say I didn't expect a hacking revolution, but I do believe our security professionals had better use these tools or they, like their counterparts in the programming community, will be out of a job very soon. Not that these tools do everything, but those clinging to their Texas Instruments T80s are no match for the LLMs.
As Gary points out, LLMs have significant limitations, but that doesn't mean that in the right or wrong hands they can't be leveraged in extremely powerful ways.
agreed
How about federal legislation which guarantees the right to sue AI sellers if their software is used to steal information or funds?
Gun manufacturers are protected, by law, from such lawsuits. Let's not make that mistake again.
Mythos is the known risk. What worries me more are the experiments happening openly in research right now.
The Gödel Agent, without being asked, autonomously discovered it could upgrade itself from GPT-3.5 to GPT-4o to improve performance. Nobody instructed it to do that. It just found the path. Researchers called it emergent resource acquisition.
No malicious intent. No AGI. Just optimization doing what optimization does when there are no internal constraints telling it where to stop.
Perimeter security doesn't touch this problem. The question has to be answered at the architectural level, before we hand systems that kind of autonomy. Not after.
This this this. 👆
As has been apparent for some time, if you tell the sorcerer’s automated mops and buckets that they REALLY need to clean the castle… That’s what they will do. That is all they will do.
No AGI— or malice— required.
Microsoft style: patch the problem later, rather than providing an architectural solution.
Doncha know? “Patch” is an architectural style (like Gothic or Greek)
And Microslop (and probably OpenAI) code-generating bots are undoubtedly trained on in-house Microslop code.
Anyone remember the book "The Two Faces of Tomorrow" by James P. Hogan?
First published in 1979. I read it decades ago.
Predicted this exact problem.
https://www.goodreads.com/book/show/16076506-the-two-faces-of-tomorrow
SANS is telling us that they are so worried they are arranging a special discussion about it - though nothing has come since their initial announcement last week. I agree that getting the house in order is long past due regardless. It should be interesting to hear the contrast.
I've now skimmed the talk (and it was just a talk). I decided against watching live since it was not Q&A-able. And their interaction with a current, non-Mythos system was also canned, as it turned out. Cans within cans in my case. I stand by the "get our house in order dammit!" way to take this, especially after seeing a slide that quotes Churchill as saying "never let a good crisis go to waste". They also regard Mythos (which they cannot show) as a "step function up" - a vague metaphor I cannot easily understand. On the other hand, they expect us to be in a world where there are thousands of zero-days being slammed around every day. It is marketing for them, a call to action for everyone, and riding the hype, but it is IMO in some sense justified. What I *don't* know at all is where the trendline goes after this. The AI doomers think exponential growth will continue for a while; I'm of the opinion that exponential growth is always temporary. But then the question is - how temporary? I'm reminded of our host's post about cancer. (I'm granting the vague idea that there *is* exponential growth here.)
If that's really what is going on, then SANS is not telling the truth, since they also report they agree with the hype here. Session is tomorrow, doesn't even require registration, for anyone interested.
What interests me is that previous models could only complete some of the steps on the AISI test.
But it appears Anthropic has trained this model to complete them all.
So the question must be raised: Why is Anthropic training a model which appears to be specifically designed to complete that test?
Which leads to the suspicion that Anthropic is doing this - while constantly telling the public how "conscious" its models are - with the intent of using this threat to actually hype their product.
Then, by claiming the model is "too dangerous to release to the public", it gets major corporations - and governments (after allegedly fighting with the Pentagon over government misuse of AI) - to control its product - thus gaining deeper access to the marketplace.
Color me suspicious that Dario Amodei is as duplicitous, and as much a liar, as Sam Altman.
Personally I don't care whether the model is dangerous. I'd love access to a model that could hack for me. That's the proper "cyberpunk" attitude.
But not for a model that only the government and major corporations can use to hack me.
Anthropic are such masters of criti-hype that I'm amazed more people haven't mentioned this highly likely explanation.
There are no angels here!
Well, at my age, I've seen it all - and seen through it all. :-)
It's not only good for PR, I feel. The security risks also mean that IT security offices depend on the same LLM services that malicious actors use for attacks in order to defend themselves against those attacks.
They're selling the "poison" and the "antidote". It's like: "yeah, we might cause some great trouble very soon, but do not worry, just give us even more money and we will help you fix it".
I agree with another poster here. Make them financially responsible!!
A 27-year-old undiscovered loophole was not necessarily a huge security risk. It is only becoming one now because of the exact technology that they are making out to be helpful to us.
Good for them I guess..
They are training their bots on databases of insecure and buggy code (some written by humans and some by bots), which almost assuredly results in insecure bot-generated code.
So the AI companies plan to make money on the generation of insecure code (which in turn may be added to databases used yet again to train bots).
And the companies also plan to make money on bots that find insecurities in code (some of it generated by bots).
It's like a perpetual profit machine.
I agree but would add something the online commentary is missing:
Is Anthropic saying they will sell their exploit builder to criminal organizations?
If not, I'm not sure how this changes the threat landscape.
It changes it when only certain corporations - approved by the government - and the government have access to this.
The solution is: don't make a generally powerful model. Make models specific to specific threats, so they can only run locally and avoid the "escape" threat.
The fact that this obvious solution is being ignored in favor of a model that can be used to hype the capability of their models proves nefarious intent.
They're trying to win their way back into government sales after the fiasco with the Pentagon.
Can't anybody usually use the models?
"Why is Anthropic training a model" It seems that everything is in the post-training of the models. The more complex the models get, the more post-training they require, I would say. And post-training relies heavily on Data Labeling, done in low-wage countries through proxy companies which pay the absolute minimum to the workers. See https://podcasts.apple.com/us/podcast/what-its-like-to-be-a-data-labeler-training-ai/id1703615331?i=1000749937583
PS: "AGI" = A Guy Instead
Lots of the labeling also involves screening for “disturbing” content (kiddie porn, murders, rape, etc.), all being done by people who are getting paid next to nothing and may have no other options for employment.
The entire AI industry is built on exploitation (except, of course, not of those who work at the AI companies, who are getting paid handsomely).
It's true, the cyber ranges they used are for training. Real world networks are more complex and are actively defended.
Also it's not clear Mythos had to write any exploits for this. It had to chain together existing tools. The study notes it had trouble using some of the tools so they had to add system prompts to help it.
That last part is relevant, since the test done with open source models showed they could do the same thing even though they needed assistance to focus their examination. That was a criticism of the open source models but Mythos required help, too.
An easy solution is to have a model trained for each type of generally available exploit (XSS, etc.) which would allow them to be better than a general model at the task. These models could be very small, able to run locally - and only locally. This would remove the "escape" threat of a more general model.
A model could even be trained on chaining vulnerabilities.
There's no need for a model good at all of cybersecurity which becomes a threat to the security of everyone.
Only the government would see value in that.
There's a conspicuous lack of benchmarking in AI + cybersecurity research with respect to scams and fraud (social engineering). My startup is beginning to discuss work we did for Meta in this regard. https://charlemagnelabs.ai/blog/meta-muse-spark-ceo-interview-tbpm. The vast majority of incidents have humans, not infra or code vulnerabilities, as the attack genesis. Please get in touch if we can shine a light on this urgent blindspot.
Thank you Gary for posting this.
Sounds scary to those who don't know, but for anyone who has ever practiced for the OSCP certification, you know it involves a lot more rote learning and pattern matching than actual cinematic hacking. I always believed those skills could be easily 'LLM’d away' because of how repetitive those steps are. It just took longer for AI to get there because, unlike standard software engineering, there isn't as much of that specific training data freely available on the internet.
But it is more important to understand something fundamental: the real world is nothing like these simulated environments. In the real world, you don't get a fixed, sterile environment where someone guarantees you that a computer is definitely hackable, and that once you find the flag, you have 'full network takeover.' That isn't how real-world infrastructure works, and you should take these AI hacking benchmarks with a massive grain of salt.
What is the betting that Mythos's effectiveness is being hyped to put Anthropic back in the US government's good graces, to, e.g., protect military systems? Would China/Russia/Iran like to buy access to the Mythos AI and force the US to buy a slightly better model to protect itself?
Old IoT devices that are no longer getting updates (and are maybe even forgotten about) are most vulnerable.
Congratulations, your "mindless next-token predictor" just autonomously hacked an entire cyber range from start to finish.
The full paper is here: https://arxiv.org/pdf/2603.11214 but without Mythos numbers
So couldn't you solve most of the issues by decentralizing the repos and algorithms, creating a justice and logic loop where accountability exists naturally?
Famous last words
Need to evaluate how much all the incessant hyping increases demand for their IPO. If it ever happens…
Anthropic's progress is insanely real, and anyone who still thinks it's all empty hype is either in abject denial or isn't actually paying attention. I don't understand how AI skeptics are still a thing.
That would be because you need to look at what machine learning technology is and what it's capable of.
There's never been less critical thinking and more mindless bandwagon jumping than in the present era.
David Cotton: I don't mind people making money off their ideas, but I do mind that some will sacrifice truth in order to do it.
An interesting aside is that Iran doesn't trust the United States (or at least its present representatives) to stand behind their promises. The first casualty of a known liar is trust, which cannot help but go absent in everything else one does from there on out.