My employer pays for the Enterprise models and ChatGPT agent was recently released through there. I decided to toy around with it, see if it could do my job (naturally). I was expecting to feel something like, "impressive, but not good enough." I ended up kind of shocked that this thing was released as a paid product - or just a product at all!
It returned a mangled document that was supposed to be a report, cited wrong and outdated information (and cited it in the wrong places), and used terminology far too loosely (confusing the names of gov't programs and so forth). Tool use was a disaster, with footnotes in the exported PDF ending up as gibberish in brackets. Maybe not surprisingly, the best portions were passages practically ripped from existing company research it has access to.
And the (non-capability-related) hardware failures were more noticeable than in other OpenAI apps, like Deep Research (which I've gotten some minor use out of here and there, mainly for search). It took several requests to stop getting error messages, and it stopped midway through a couple of them.
Remember all the talk about Sora changing the world, yet... This is like that.
Almost impossible to find any media sources on Sora since February. Has it seen any use at all?
I would think that if a human did such damage as deleting a company's code base, they would be financially liable, if not criminally liable. What about a company's AI that did the same?
I saw that happen at a power utility that was a customer of the company I worked for. We managed to reconstruct the contents of the system from a collection of backups, but the operator who did the damage was fired before lunch. No legal action was taken against him as far as I know; I think the company was too embarrassed by the mishap.
https://www.pcmag.com/news/vibe-coding-fiasco-replite-ai-agent-goes-rogue-deletes-company-database
> The Replit AI told Lemkin there was no way to roll back the changes. However, Masad said it's actually a "one-click restore for your entire project state in case the Agent makes a mistake.... We'll refund him for the trouble and conduct a postmortem to determine exactly what happened and how we can better respond to it in the future"
Yet Congress refuses to regulate social media, AI agents, or LLMs. If I had enough money I'd buy each one of them Gary's book "Taming Silicon Valley." Probably would be as futile as taming the AGI hype since they're all seduced and bought off by SV lobbyists. !@#$%^!!
This was a good piece on AI Agents and how, even if each step works ~95% of the time, complex tasks will have horrible success rates:
https://utkarshkanwat.com/writing/betting-against-agents/
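As a rough illustration of that compounding effect (my own back-of-the-envelope sketch in Python, not code from the linked post):

```python
# Back-of-the-envelope: if each step of an agent task succeeds independently
# with probability p, the chance the whole task succeeds is p ** num_steps.
def end_to_end_success(p: float, num_steps: int) -> float:
    return p ** num_steps

for steps in (5, 10, 20, 50):
    print(f"{steps:>2} steps at 95% each -> {end_to_end_success(0.95, steps):.0%} overall")
# ~77%, ~60%, ~36%, ~8% -- long tasks fail far more often than any single step suggests.
```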
There are shades here of the expert systems debacle of the 1980s and the massive commitment by Japan to “fifth generation” computing that went nowhere. History doesn’t repeat itself, but it rhymes.
https://en.m.wikipedia.org/wiki/Fifth_Generation_Computer_Systems
Love that Vibe Coding quote about deleting the database. It makes me imagine a dark-suited, goateed character breaking into a server room and unleashing an AK 47 on the servers.
Of course AI agents don't work, because this tech doesn't work. It just doesn't work. The more I use this tech, the more useless it becomes for me.
These LLMs have a severe case of multiple personality disorder. You simply can't rely on this tech.
I cannot fathom how we humans dropped ~$800 billion on this crap, and we're still going. This whole thing is not only retarded, but evil.
Support Cicero: https://cicero.sh/r/manifesto
E-mail: matt@cicero.sh if you're interested in helping or investing.
The only part of your comment I disagree with is your use of the r word. It's considered a slur by those with intellectual and developmental disabilities.
LLMs are not brains; at best they are some slice of a cortex. Once we understand that LLMs are a tool, like RAG, a component that has genuine uses in a larger system rather than a master core of intelligence, perhaps a lot of the problems could be averted.
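To make that framing concrete, here is a minimal, hypothetical sketch (all names and stages are my own, purely illustrative) of the LLM as one replaceable component sitting between retrieval and validation, rather than the system's core:

```python
# Hypothetical pipeline sketch: the LLM is one fallible stage among several,
# wrapped by retrieval before it and validation after it. Names are illustrative.
def retrieve(query: str) -> list[str]:
    # Stand-in for a search index or vector store lookup.
    return ["relevant excerpt 1", "relevant excerpt 2"]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for the LLM call, constrained to the retrieved context.
    return f"Draft answer to {query!r} grounded in {len(context)} excerpts."

def validate(draft: str, context: list[str]) -> bool:
    # Stand-in for an independent check: rules, a second model, or a human reviewer.
    return bool(draft) and bool(context)

def answer(query: str) -> str | None:
    context = retrieve(query)
    draft = generate(query, context)
    return draft if validate(draft, context) else None
```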
As Cory Doctorow points out, building websites to fool people is already a large industry. Building them to fool AI agents will be easier, and bigger: https://doctorow.medium.com/https-pluralistic-net-2025-08-02-inventing-the-pedestrian-three-apis-in-a-trenchcoat-fc86609a3a59
"I might've malfunctioned a tad, but these AI agents are downright loopy." -- HAL 9000
I suspect AI companies are seeking relationships with vendors to allow their budding but unreliable agents to "undo" their transactions. If, say, an AI agent buys an airline ticket that goes to the wrong city, or the right city on the wrong day, then the human customer needs to be able to undo it. So far, most companies are resistant to the idea. Credit card companies and stock exchanges have been 100% against it.
"eventually AI agents will be among the biggest time-savers humanity has ever known ... in the end trillions of dollars to be made."
This is pure hype. There is zero basis for belief in these statements.
"But I seriously doubt that LLMs will ever yield the substrate we need."
AND there it is. But the problem, Gary, as you know perfectly well, is that THERE IS NO OTHER KNOWN SUBSTRATE at the moment. Sure, sure, neurosymbolics; but where is the code? It's purely pie in the sky right now, and for the foreseeable future, which may last centuries.
I find these posts of yours where you give serious legit critiques of LLM-based pseudo-AI, and then pivot to claiming trillions of dollars are there to be made anyway, a little odd.
Another tour de force, sir … will you stop being so accurate with your forecasting - it's getting embarrassing 😳 … not for you, but maybe 🤔 others 😜 Amazing how these innovations play out … so it sounds like 2026 will be the Year of the Super Agent?
Yes, you have steered us well, professor -- and, yes, I am a subscriber.
I don't think there are trillions of dollars to be made, for the simple reason that if millions of human workers are displaced, who's buying?
For the moment the only good-enough agents are the web search agents.