134 Comments
Srini Pagidyala's avatar

Yes, this is an inherent limitation of LLMs’ frozen architecture.

LLMs can’t learn incrementally in real time and update their model; they’re batch-trained on the internet’s data periodically.

Between training runs they’re stuck with whatever was in the prior training cycle, so by definition they’re not current.

Great timing in posting this, Gary; it makes this limitation obvious for everyone to understand without any argument. Thanks.

C. King's avatar

Gary and Srini: I think that "limitation" is only a "limitation" if we expect it to perform otherwise. Rule #1: Know what it can and cannot do. No one ever expected their "new" typewriter to fly.

They are still thinking in terms of their own hallucinations.

Oleg Alexandrov's avatar

Gemini first retrieves information and then produces a response based on it. It can also create and run code on the fly, check the weather, do analysis using maps, etc. Tool-based work is just getting started.
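
To make that pattern concrete, here is a minimal retrieve-then-generate sketch. The helper names (`web_search`, `call_llm`) are illustrative placeholders, not Gemini's or any vendor's actual API:

```python
# Minimal retrieve-then-generate sketch. `web_search` and `call_llm` are
# hypothetical placeholders, not a real product API.

def web_search(query: str) -> list:
    """Placeholder: return fresh snippets from a news/search index."""
    return [f"(snippet 1 about: {query})", f"(snippet 2 about: {query})"]

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to a frozen language model."""
    return f"(answer conditioned on a {len(prompt)}-character prompt)"

def answer_with_retrieval(question: str) -> str:
    snippets = web_search(question)        # fresh facts come from the tool
    context = "\n".join(snippets)
    prompt = (
        "Using only the sources below, answer the question.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)                # while the model's weights stay frozen

print(answer_with_retrieval("Was Maduro arrested today?"))
```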

Srini Pagidyala's avatar

Sure, but the core model isn’t capable of cascading the relevant changes to its beliefs, behaviours, actions, and consequences beyond retrieving and regurgitating the new information, and therein lies the problem of trust with LLMs.

Oleg Alexandrov's avatar

LLMs are meant to be engines for quick synthesis. The CPU also can't reconfigure itself; it has to be provided with good inputs.

LLMs do more than "regurgitate". They can actually work with a codebase and apply operations to it. An LLM is a manipulator based on learned rules. That is a powerful thing, but it does have limits.

hsid's avatar

>LLMs are meant to be engines for quick synthesis.

I thought they were meant to be engines for artificial general intelligence. Is that really where the goalposts are now?

Oleg Alexandrov's avatar

It is not clear how far current AI will go. Many of the most obvious limitations are removable with additional logic. A lot of work out there is simple enough to be done reliably if the products get better. We will see. AGI is an aspirational goal.

Tom Welsh's avatar

You are not an LLM yourself, are you, Oleg? Reread your comment and I hope you see what I mean. Or actually you might be a marketing director.

Oleg Alexandrov's avatar

I am still quite amused by your assertion that LLMs merely "regurgitate".

The Aigo.ai approach you favor relies on a hard-coded ontology, and everything must be mapped to that.

If you try to grow it, you will run into precisely the same problems that led people to move to neural nets, which have a lot more flexibility.

Pedro Domingos's recent Tensor Logic system holds a lot more promise here, as it starts with a neural net, does an SVD-like decomposition, and keeps the most relevant connections. This results in a more rigid system that is easier to interpret. But even this one is likely overly limited.

Srini Pagidyala's avatar

I think there are a few category errors bundled together here.

First, “regurgitation” isn’t meant as an insult. It’s a description of a mechanism that statistically remixes outputs, “statistical echoes” as some call it. LLMs recombine statistical patterns from their training distributions. That can be extraordinarily useful, but it is not the same thing as grounded understanding, persistent memory, real-time learning, belief revision, causal reasoning, or autonomous adaptation. Calling that out isn’t dismissive; it’s precise.

Second, Cognitive AI systems like Aigo are not “hard-coded ontologies” in the GOFAI sense people moved away from. The ontology is not a brittle rulebook; it’s a living world model that is incrementally learned, extended, revised, and sometimes corrected. The key difference is that it is raised, not programmed. Like a human child, it starts with a small set of core priors and developmental scaffolding, then acquires concepts through interaction, feedback, and experience, building meaning over time rather than through massive pre-training, periodic retraining, and memorised correlations.

So the ontology isn’t hard-coded. It’s grown incrementally and autonomously, the way a child grows their own model of the world, starting from core knowledge and refining it through experience.

The reason neural nets won on flexibility wasn’t that symbols failed. It’s that static symbols failed. Conflating the two misses the last 30 years of progress in neuro-symbolic systems.

Approaches like Pedro Domingos’s Tensor Logic are interesting precisely because they point in the same direction: structure, abstraction, interpretability, and constraint layered on top of statistical learning. That’s not an argument against Cognitive AI. It’s an argument for it.

Where we differ is on limits. Starting with a neural net and pruning connections can improve interpretability, but it still doesn’t give you persistent memory, belief revision, goal reasoning, or a notion of being wrong. Those aren’t scale problems. They’re architectural requirements.

Flexibility without accountability gives you fluency.

Flexibility with explicit cognition gives you intelligence.

LLMs excel at the former. Cognitive AI is built for the latter.

Oleg Alexandrov's avatar

This was a very lengthy writeup that I very much appreciate. It is this kind of discussion that makes it worthwhile to visit blogs like this.

The difference between an LLM and the system that you develop is much less than claimed. These are both representations of the world. One, the LLM, is grown organically. The second, which is what you have, is grown under a lot of supervision. Care is taken to have more explicit abstractions, interpretability, traceability, etc.

Yet it is very important to note that neither LLMs nor your methods have an intimate understanding of the world. They are both rough approximations. It takes a whole lot more than neatly adding concepts to a representation.

There is not much of a difference between periodic retraining, as done for LLMs, and the presumably more continuous way in which your methods work. In either case, feedback from the world finds its way back to the system, which is how any intelligent entity grows.

Both of these approaches have a long, long way to go before they are able to function in the world. Our own intelligence is driven in no small part by having motor control, first-hand experience, and very many passes of iterative learning and feedback.

The key difference is that current AI-based systems are slowly growing out of the language domain. They are able to wield tools, create and run code, and likely there will be a physics-based learned system integrated into the AI as well.

I believe, and maybe I am wrong, that your method can't match the complexity of the world. It is like trying to use, say, one million degrees of freedom for a problem that has billions of them. The representation space is too small, and its expressiveness is too constrained.

Srini Pagidyala's avatar

This is a thoughtful critique, and I appreciate the seriousness of it.

Allow me to respond point by point, because the disagreements here are real but also require clarification.

First, on “both are just representations of the world.”

Yes, both LLMs and Cognitive AI systems represent the world. That’s where the similarity largely ends. The way the representation is formed, maintained, and revised matters more than the fact that a representation exists.

LLMs compress experience into static weights through batch training. What they grow is correlation.

Cognitive AI systems maintain an explicit, persistent world model that is incrementally and autonomously updated. What they grow is understanding.

That distinction is not cosmetic; it determines whether a system can carry beliefs forward, notice contradictions, and revise itself coherently over time.

Second, on “organic growth vs supervised growth.”

LLMs are not grown organically in the developmental sense. They are grown statistically and then frozen. Periodic retraining is not the same as incremental, real-time, continuous learning, because the system does not remain itself across updates. There is no enduring identity, belief continuity, or memory trajectory. Incremental learning isn’t about speed or frequency of updates; it’s about preserving epistemic lineage. Humans don’t wake up retrained from scratch every few months. That difference is foundational.

Third, on “neither has intimate understanding of the world.”

I agree that neither today matches human-level understanding. Where we differ is in what that implies. You seem to assume that intimacy with the world requires enormous representational dimensionality and raw degrees of freedom. Cognitive science repeatedly shows the opposite: intelligence emerges from abstraction, concept formation and revision, causation, and constraint, not exhaustive enumeration. Humans do not model the world with billions of degrees of freedom. We survive precisely because we don’t. We learn causal structure, not pixel exhaust.

Fourth, on embodiment and motor control.

I agree embodiment matters. But embodiment without cognition gives you sophisticated reflexes, not intelligence. Many embodied systems remain reactive because they lack belief maintenance, goal reasoning, and metacognition. Cognitive AI does not deny embodiment; it decouples cognitive architecture from any single physical instantiation. You can add sensors and motors to a system that can already reason about goals and consequences. You can’t bolt cognition onto a system whose internal state has no semantic commitments.

Fifth, on tools, code, and physics simulators.

Tool use and code execution are powerful, but they don’t resolve the core issue. A system can call tools brilliantly and still have no idea when it’s wrong. Physics simulators add external structure, but without internal belief tracking, the system can’t reconcile simulator output with prior knowledge or long-term goals. You get externalized orchestration, not internal understanding.

Finally, on the “degrees of freedom” argument.

This is where I think the intuition flips. More degrees of freedom don’t buy you intelligence. They buy you expressiveness. Intelligence comes from selective constraint: knowing what matters, what doesn’t, and why.

A smaller, structured representation that can revise itself coherently will outperform a vast, unstructured one in any domain where trust, accountability, and long-horizon reasoning matter. Biology figured this out long ago.

LLMs scaled outputs but skipped the cognitive mechanisms that produced them: https://srinipagidyala.substack.com/p/rip-techbro-era-20082025-the-inevitable

Cognitive AI develops and scales the cognitive mechanisms themselves, so it can create any output. These are different bets; more here: https://petervoss.substack.com/p/cognitive-ai-vs-statistical-ai

Oleg Alexandrov's avatar

Addendum: I understand that LLMs and your method represent data and learning differently. I am saying they are both shadows of the real thing. Neither neatly organized concepts nor language is enough.

Peter beobachtet KI (in EN)'s avatar

Gemini did a pretty good job in the Maduro case - at least when I tested it.

That said, there are still countless pre-structurings at play here (data selection problems; provider, dev, and tester ideologies; a constraint architecture that decides what becomes visible or not; ethical guidelines plus possible censorship; fine-tuning such as RLHF; etc.).

In short, no one should trust an LLM blindly or use just one LLM, because, depending on the use case, the results can differ significantly between US LLMs, Mistral (for the EU), and DeepSeek for China.

IMO, we need something similar to "Ground News" for AI.

Oleg Alexandrov's avatar

The tech reflects the bias of its creators.

Oleg Alexandrov's avatar

Note that you are conflating two things here: the first is ethics, and the second is better algorithms. All the abuse in the left image will also happen with the supposedly better cognitive AI methods.

David Koeder's avatar

It’s a tool … (sure, hyped like crazy, I agree) … but like any tool, if it’s not understood by someone (as far as practically necessary) and used naively without the appropriate skills … chances are, you hit your thumb with it, to lean into the tool metaphor … attorneys showing up with hallucinated cases … Anthropic presenting hallucinated study authors in court … etc etc … hilarious … but … I can’t see the fault in the AI tool. THAT I consider a little … let’s say rationalising / confabulating like ChatGPT in defensive mode 😉

Srini Pagidyala's avatar

Totally fair to use the tool metaphor. But here’s the uncomfortable part: some tools fail safely, some fail catastrophically. That distinction matters.

A hammer doesn’t invent a nail, cite a fake building code, and then argue with you about it. When you “hit your thumb,” the error is obvious, local, and immediately detectable.

With LLMs, the failure mode is different: confident fabrication that looks like competence. That’s not just user naivety. That’s an architectural property of a probabilistic system optimized for plausibility, not truth. The attorney examples aren’t “hilarious edge cases,” they’re the most honest demo of the core issue: the tool can output convincing nonsense with no internal alarm bell.

So yes, users should learn limitations. But “skill” here increasingly means babysitting: verifying, cross-checking, and policing a system that can’t reliably know when it’s wrong.

And when the industry responds with “use it properly,” that’s often a tell. Reliable tools don’t require rhetorical choreography to stay reliable under normal use.

I’m not blaming the user. I’m also not letting the architecture off the hook.

Enterprise adoption will bury these systems ultimately.

That’s why the path forward isn’t louder models or more guardrails. It’s Cognitive AI: systems with persistent memory, belief tracking, uncertainty, and accountable reasoning.

Aaron Turner's avatar

The AI field's primary achievement over the last decade has been to build a trillion-dollar Chinese Room possessing at most trivial machine cognition (and all the hype has been mere kabuki theatre!)

Oleg Alexandrov's avatar

When the dude in the Chinese room has access to a real-time feed, an internal simulator, and arms for manipulation, things change.

AI's primary achievement over the last decade has been an architecture that can take as input a massive amount of multi-sensor data and produce useful actions. Now it is a matter of building up the needed infrastructure, including for validation and for learning better from mistakes.

Guidothekp's avatar

The concept of consistently useful actions assumes stationarity.

Mistakes happen because there is drift or a regime change (as in the case of Maduro, pardon the pun). In domains where the processes don't change often, AI will perform well with the help of additional rules.

Otherwise, it will be playing catch-up. That is why AI in the real world will always fall short, since its playbook is always stale.

Oleg Alexandrov's avatar

The vast majority of work people do is quite predictable. Context changes, there is variability, but the general patterns remain.

If AI can get feedback about how it does, and can try something else on failure, it can go far. See the recent progress on the ARC grand challenge.

The fundamental problem nowadays seems to be that AI cannot learn continuously very well. It can remember some lessons via a summary, but they are not well internalized. It can also get lost as the tasks get longer.

Guidothekp's avatar

"It can also get lost as the tasks get longer."

Catastrophic forgetting?

Oleg Alexandrov's avatar

I think catastrophic forgetting is an issue when you reoptimize the model with new data. It may forget how to do old work.

I think a model gets lost when it does long tasks because it keeps adding what it produces to the context, so it does not forget. But then there's a lot more to remember, and it is not smart enough to tell what is important. So it remembers everything but gets confused anyway.
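
As a small illustration of that failure mode (all names here are hypothetical, and this is not how any particular product is implemented), a naive agent loop just appends each step's output to the context; when the window fills, the usual workaround is crude truncation or a lossy summary rather than an actual judgment of relevance:

```python
# Naive agent loop illustrating context bloat on long tasks.
# `call_llm` is a hypothetical placeholder, not a real API.

def call_llm(prompt: str) -> str:
    """Placeholder: one step of work from a chat model."""
    return f"(step output for a {len(prompt)}-character prompt)"

def run_long_task(task: str, steps: int, max_context_chars: int = 4000) -> str:
    context = task
    for _ in range(steps):
        output = call_llm(context)
        context += "\n" + output           # everything gets remembered...
        if len(context) > max_context_chars:
            # ...until the window fills; truncation (or a lossy summary)
            # drops information without judging what actually matters.
            context = context[-max_context_chars:]
    return context

final_context = run_long_task("Refactor the billing module", steps=50)
print(len(final_context))
```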

Martin Machacek's avatar

The inability to assess relevancy is one of the main shortcomings of current AI systems. Continuous learning may solve this problem, but (as of now) we do not have any viable architecture allowing it.

David Koeder's avatar

Agree. (Feel inclined to quote) … if measured against the hype in all its dimensions … and crazy, silly expectations. And in my opinion it seems like a trance sometimes, and critical thinking evaporates. That said, I’ve been teaching AI vocational training courses for otherwise highly qualified professionals … for a year now … and the usefulness or not turns out to lie in how it’s used in the right way - you could say in the least useless way - but then, practically, it makes more and more sense (that is, I guess, until the real cost is passed on to users) … and I guess when the real thing “happens” in future years … like some hybrid AI systems that know their place, so to speak … skills learned with this weirdly off LLM tech may still help … I can imagine.

Aaron Turner's avatar

Non-trivial utility (vocalisation) does not necessarily require non-trivial machine cognition (understanding and reasoning); LLM-based chatbots have the former, but not the latter.

D Stone's avatar

"No, silly human, Generalissimo Francisco Franco is not dead."

Saty Chary's avatar

Hi Gary! "Omg" but not really - when your "world" consists just of vector embeddings, your "truth" will be limited to mashups that come out of the embeddings.

"Reasoning", "thinking", "chain of thought", "tree of thought" ... yeah right.

Shameful.

Oleg Alexandrov's avatar

What is needed is guided reasoning and grounding. The latest chatbots are getting quite good at that, but it is not cheap and not fool-proof.

Saty Chary's avatar

Dude, they don't have it now?

'Not cheap, not fool-proof' - so just accept it and smile? The delusion these LLM companies have is that their product will replace EVERYTHING, including our going to BBC News "manually". Do we happily sign on the dotted line?

It's all built on a house of cards; no amount of Band-Aid is ever going to fix that. There is no level of reliability, other than 100%, that is good enough.

Oleg Alexandrov's avatar

LLMs of course won't replace everything. The hype can be wild.

"There is no level of reliability, other than 100%, that is good enough."

It is an evolutionary process. For the near-future, these are tools for augmentation.

When I work with an AI on my code, I catch its mistakes sometimes, and it catches my bugs at other times. And each release is better than before.

Saty Chary's avatar

Checking code and math proofs is way different compared to dealing with real-world news, where truth isn't approximated asymptotically better in each new update.

No interest in arguing anymore, I have better things to do. Peace!

Oleg Alexandrov's avatar

Actually, real-world news is very easy to deal with. It is messy but high-level, and very easily expressible via language.

LLMs fail badly for fine-grained work, where they need to intimately know the physics and material properties.

Saty Chary's avatar

Right. Which is why ChatGPT and Perplexity can, which is why Gary didn't write this post :)

Martin Machacek's avatar

… or the purpose, context and history of your code.

ShootyBear's avatar

Great article! AI is not going to tell you your military plans are dumb. It could probably help with analyzing your plans as far as logistics goes, but anything else would be a disaster.

Maybe an AI specifically trained for military operations might not be completely worthless, but I don’t think anyone is working on that.

Martin Machacek's avatar

It may tell you that your plan is bad because there was nothing like it in its training data.

David Koeder's avatar

Why would someone be working on that? 🤔🤣🤣

C. King's avatar

ShootyBear: This is true: working in the obscurities and happenstances of real-time events (like on a battlefield, or flying a plane, or caring for a child, for instance), NO ONE in their right mind would ask someone who didn't know the up-to-the-minute details to guide anyone directly into the situation, e.g., battle, flying in a storm, or bathing an infant.

Prolegomenon1's avatar

What a bogus and desperate writeup about the “totally useless” LLM… As if millions of biological brains would respond differently this morning, unless they had already checked the news. Retrieving the news is a relatively simple feature for an LLM (retrieval augmentation).

Martin Machacek's avatar

Hmm, then why didn’t ChatGPT check the news before answering if it is easy?

Bryan Richard Jones's avatar

Because he didn’t ask it to, and it usually defaults to its training data (to save time and money) unless prompted to search online. He should know this if he’s a leading voice in AI.

Bryn's avatar

Right, but that kinda shows the lack of intelligence in "artificial intelligence". An actual intelligence, if asked about a major news story from this morning that they hadn't heard about, would check it first rather than coming up with reasons why it must be fake and belittling the person asking, all because looking it up is too much like hard work.

Bryan Richard Jones's avatar

I could equally argue about the intelligence of the user. He’s a “leading voice in AI”, and instead of asking it to use the internet, like he knows he has to for up-to-date info, he wrote an essay about how it doesn’t work properly.

Bryn's avatar

I think you're deliberately missing the point - it wasn't that the user couldn't possibly figure out how to prompt and cajole the AI into doing the research, it's that the AI's first response being "nuh uh" rather than checking shows that it's a bit dim and bad at its job. Also, fwiw, the user in question is the Wired editor Brian Barrett, who is writing for a more general audience than Gary Marcus.

Bryan Richard Jones's avatar

It’s a tool and it’s being used wrong. If I hit my face with a hammer I wouldn’t blame the hammer

Jans Aasman's avatar

Very well said! When I woke up this morning and someone told me, I couldn't believe it and thought it must be fake news. Till I checked the news.

Gerben Wierda's avatar

LLMs' combination of being tuned to be maximally convincing and being fundamentally untrustworthy is really something. How does anyone defend that combo?

Bryan McCormick's avatar

I asked Grok today about measles cases in adults who had been vaccinated or had previously had the disease. It is "impossible" - "immunity is lifelong". Well, Dr Grok, you got it wrong. It is in fact possible, and the variants are Atypical Measles Syndrome (AMS) and breakthrough infection. Now, when I used those phrases, it got it right and apologized. Too bad for the person who doesn't know enough to do so. If you ask it with a generalized prompt you'll possibly get the confidently wrong version. Why are we letting these out in the wild when they are not ready and can do real harm?

Luc Lelievre's avatar

We can have it both ways! Neither Mr. Trump nor GPT-5.1 is a trustworthy source for information on world events, especially conflicts.

Carlos Geijo's avatar

In my case, it was even worse. In the first prompt, ChatGPT searched the internet and confirmed Maduro's arrest. In a subsequent prompt asking for more information, it denied it and said it was fake news (brilliant context management). So, in just six prompts, it contradicted itself three times. They said ChatGPT was going to be AGI, LOL.

Jan Steen's avatar

I believe one of the reasons why LLMs can be useful for writing code (programming, as it used to be called) is that the training set mostly consists of functioning, non-contradictory examples. In the case of politics, the training set is mostly contradictory rubbish. That should make a big difference.

Carl Noble's avatar

Basics: if using ChatGPT, you have to tell it to look at current information, or say "as of today", before your request. The base model's last info is from June 2024. Anything after that has to come from a browser. Hey, it's a machine, NOT alive!

Tor Guttorm's avatar

"The greatest challenge facing mankind is the challenge of distinguishing reality from fantasy, truth from propaganda. Perceiving the truth has always been a challenge to mankind, but in the information age (or as I think of it, the disinformation age) it takes on a special urgency and importance."

— Michael Crichton (1942-2008)

Mary Wakefield's avatar

Gary, we’d love to run this piece in the Spectator magazine. Is there any chance? I’m mary@spectator.co.uk I do hope so

Aleš Otýpka's avatar

AI mainly has a problem with an artificially created aura that embraces almost all superlatives. What you point out should be set in stone so that most people realize it. What is even more dangerous is how much the use of these technologies is being forced. As for the use for war purposes you mentioned, I am downright terrified by the infiltration of Palantir's Maven software into NATO structures.

Nathalie Suteau's avatar

I agree: they are terrible for recent events, but not only that. ChatGPT has a weird bias toward what is considered left wing. If I mention Islamists, it refuses to admit it’s an extreme religious ideology. If asked for advice on visiting Morocco, it doesn’t mention the dangers of Islamists or of the dictatorship. It can’t be used as a political or geopolitical fact-checker.

Brendon Rowland's avatar

Your own bias is telling.

Nathalie Suteau's avatar

Which bias? Sorry: I started testing ChatGPT 3 years ago, and yes, it has a bias, and a massive one. I’m neutral when I test an LLM.

Julian Porter's avatar

The mere fact that you assert that Islamism is a threatening activity (when it is little more than a strong commitment to Islam) says everything about your biases. In fact, I am afraid, it suggests that you confuse opinion with fact, and so cannot be trusted.

Nathalie Suteau's avatar

It is. An extreme religious view is always dangerous, whether you like it or not. Moreover, double-check history: according to Islamists, all people born on Islamic land are Muslim. So it’s a religion which is always seeking territorial expansion instead of converting people. The principle of neutrality has to be applied to all LLMs, and people following a religion, a rhetoric, or an ideology cannot test them properly.

Vlad's avatar

People on Twitter have uncovered untold amounts of bias like this across people, governments, chatbots, and everything in between. It's actually insane the amount of influence they have, especially when you look at their microscopic share of the population in the places they covertly rule.

Nathalie Suteau's avatar

I was contacted via DM by @elonmusk to be part of the Community Notes early rollout in 2023. There’s not a single LLM using them. ChatGPT told me an hour ago that Maduro was still in power. I had to reprompt it 3 times to get a summary of what happened. Sorry for the typo in my initial comment.

Vlad's avatar

Oh, I was referring to what you mentioned it "refused to admit." My bad, I can see I wasn't clear enough.

Yeah, ChatGPT went considerably downhill immediately after 4o, if you ask me. They just changed things for the sake of changing things when they went to 5. All in an effort to make it seem "new and different". I stopped caring about it a long time ago. It was so overly dumbed down and geared toward the masses in a way that made it more and more useless with each "update". And it would lie all the time too.

Nathalie Suteau's avatar

It depends on the topic: for neuroscience, medicine, legal, image recognition, and now video recognition, it’s the best LLM ever. I use only ChatGPT. The 5.2 version is huge, but you have to prompt for hours to test it properly and use two languages mixed. The translation is still poor because they use open-source translation instead of linguists.

Vlad's avatar

Gotcha 👍🏾