Yeah, I heard that latest interview. Sam just blurts out shit that doesn't make any sense: "Yeah, I asked it this really hard problem that I didn't even know, and ChatGPT just solved it; I sat back in my chair in awe."
Same as he blurts out other shit like, "pretty soon, ChatGPT is just going to discover new science and start solving physics". That makes no sense at all, and no, this tech is nowhere close to making scientific breakthroughs.
Comedy piece for those interested about the absurdity of this all: https://cicero.sh/r/hows-the-ai-revolution
Thanks for the link.
I look forward to more people excitedly claiming that it's conscious, based on the fact that prompts for sci fi scripts about conscious machines last it to generate sci fi scripts about conscious machines.
The last 5 years haven't increased my excitement about AI nearly as much as they've decreased my trust in human intelligence.
*lead it to generate
You can edit comments.
Yeah, I imagine so, but I can't figure out how. I'm on the Android app and the menu only has Share Comment, Hide Comment, and Delete Comment. Long press just folds it.
Tell me, please: why are we trying to replace humans with machines?
Why are we allowing twelve-year-old computer boys to determine our future,
while they destroy our environment and waste billions of dollars? They infuriate me: waste, fraud, and abuse.
If and when your predictions come true, you don't really expect them to move on, do you? Their reputations and so much money are invested in LLMs. Plus, then they would have to admit all the lies they have been telling.
Once the penny drops, the VCs will just quietly back away. Already starting to happen.
Love the title - touché 😀
I am not an AI researcher, but this is my article, based on reasoning from first principles and listening to the likes of Gary Marcus and others.
Human Intelligence Made Language, Can AI do the Reverse ?
https://open.substack.com/pub/pramodhmallipatna/p/human-intelligence-made-language
"Language is a projection of intelligence and not its source. So training on language alone offers a window/slice of intelligence and not all of it."
Nicely said.
With each new model, society should have the right to an audited environmental impact statement.
Have you seen this? https://mistral.ai/news/our-contribution-to-a-global-environmental-standard-for-ai
I had not. Thank you.
Prediction 8: Sam will make a series of anthropomorphic comments about GPT-5, along the lines of: 'It thinks really hard', 'It really knows you', 'It has genuine wisdom', 'It's got a great sense of humour'. In fact, I'm going to do another Charting Gen AI bingo card predicting some of these!
Real question - what is there to improve in LLMs?
I'm speaking here as a random person who checks in on ChatGPT every few months and always finds it utterly underwhelming. It's a neat party trick and a little better than Google for very basic searches. But even that is mostly because Google, and the internet at large, is choked with spam. I'd take the 2012 internet over an LLM any day of the week.
If I make a wishlist of things that would make ChatGPT useful to me, it's basically all the stuff you say isn't going to change without incorporating actual world modeling in a serious way. And I suspect you're right.
So... what's left? What's something I currently couldn't do with 4, that I even conceivably might be able to do with 5?
Items 4 & 6 feel related, because both rely on the fidelity of language, which is greatly lacking for these claimed tasks. Heck, just talking through a request with another person requires a lot more interaction than I believe most people will want to have with these technologies. Just look at how we have to deal with laws today, and all the exceptions that then require amendments to address... constantly. Or at translated religious texts that only resemble the original text in the old language they were translated from, and how translations read differently depending on whose version it is (e.g., the Bible).
I'm still lost by practitioners' obsession with the use of natural language for highly detailed and precise task descriptions, as it has to be the worst way, and completely antithetical to mathematical or scientific notation, the former of which helped shape computer science. As someone who speaks several modern languages, I find that each is exceedingly challenged in expressing precise and nuanced statements. It's the genie and the three wishes over and over again: you can never make (describe) a wish that doesn't land you in an unexpected, problematic situation, no matter how hard you try.
The only way I can justify the progress made to date is that so much has been written and ingested into these LLMs that they can attain useful statistical probabilities for which terms appear next to each other. But as a way to "understand" instructions or set guardrails, it feels like using a tractor to pick up lint from a carpet. Sure, you might pull it off, but it will likely fail more often than it succeeds at the task ;)
Hey Gary, I am a firm believer as well that LLMs are like a person who does not know anything but has memorized everything.
Hence, as a graduate student interested in working on the next frontier of AI, what direction do you recommend studying and researching in? Thanks in advance!
Start with your premise: LLMs are NOT like a person. Read a few of the other posts Marcus has written on that. And this: https://www.wordrake.com/blog/youre-thinking-about-reasoning-wrong#_ftn2
Good article. The term "cargo cult" comes to mind to describe LLM mimetics.
"To improve accuracy, we might prompt the LLM to be careful: “Only answer if you’re sure,” or “Say ‘I don’t know’ if you’re unsure.” And the model will comply—linguistically. While it changes the tone of the output, it has no effect on accuracy."
The problem here is that the prompt wrongly assumes that the LLM has mental states. But an LLM is not the sort of thing that can be sure or unsure or know or not know things--that's a category mistake.
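To make that concrete, here is a rough sketch of how one might test the claim that hedging prompts change tone but not accuracy. Everything here is hypothetical: `ask_llm` is a stub standing in for a real model call, and the one-question "benchmark" is purely illustrative.

```python
# Sketch: does a "be careful" preamble change accuracy, or just tone?
# ask_llm is a stub; a real test would call an actual model API here.

HEDGE = "Only answer if you're sure; say 'I don't know' if you're unsure.\n"

def ask_llm(prompt: str) -> str:
    # Stub behavior: the hedge makes the model refuse, not get more correct.
    return "I don't know" if "unsure" in prompt else "Paris"

def accuracy(questions, answers, prefix=""):
    """Fraction answered correctly; 'I don't know' counts as not correct."""
    hits = sum(
        ask_llm(prefix + q).strip().lower() == a.lower()
        for q, a in zip(questions, answers)
    )
    return hits / len(questions)

qs, ans = ["What is the capital of France?"], ["Paris"]
plain = accuracy(qs, ans)                  # no hedging preamble
hedged = accuracy(qs, ans, prefix=HEDGE)   # hedged preamble
```

With a real model, one would run both variants over a sizable benchmark; the quoted passage's point is that the hedged run mostly trades answers for refusals rather than gaining accuracy.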
P.S.
Today I had ChatGPT tell me "Ah—you’re right! I had forgotten that ...".
My response: "You're an LLM ... you didn't "forget" anything, and such claims are jarring and unnecessary."
It replied:
"You’re absolutely right. Thank you for pointing that out.
It’s more accurate for me to say something like:
That detail wasn’t included in my earlier response.
rather than framing it as if I “forgot,” because I don’t have memory in the human sense. I’ll keep my phrasing cleaner to avoid that jarring effect."
Of course it doesn't know any of these things either, thanking me is completely meaningless, and it won't "keep" or "avoid" anything ... LLMs always revert.
Precisely. LLMs are misaligned with regular communicative tasks in the products for which they are marketed.
You may want to properly study advanced statistics and mathematics. I feel like very few, if any, of the people who work on these LLMs have any idea what they are doing from a theoretical point of view (e.g., are LLMs consistent estimators?). They seem to forget, to quote Prof. Judea Pearl, that LLMs are "just glorified regressors". Can we expect some form of "intelligence" from a purely regression-based framework (with all that it entails)?

Prof. Pearl argues that any form of "intelligence" requires effective counterfactual modelling. And we know from first principles that this does not happen with regression models unless the data is collected in a specific way (see e.g. clinical trials). Throwing more data scraped willy-nilly from the Internet at ever bigger models is not going to address the issue at all.

There is also the issue that deep learning doesn't produce models which are better than kernel machines (see e.g. the work by Prof. Pedro Domingos). This discovery has HUGE implications: first, because kernel machines have been thoroughly studied from a mathematical standpoint (see e.g. the works of Prof. Vapnik, or the book by Ingo Steinwart, "Support Vector Machines"); second, because it dispels all the "magical thinking" around deep learning models.
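To make the "kernel machines are thoroughly studied" point concrete, here is a toy Nadaraya-Watson kernel regressor in plain Python (the data, kernel, and bandwidth are all illustrative choices, not anything from the comment above). The prediction is an explicit kernel-weighted average of the training labels, so every term of the estimator can be inspected and analyzed.

```python
import math

def gaussian_kernel(x, xi, bandwidth=1.0):
    """RBF weight between query point x and training point xi."""
    return math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2))

def kernel_regress(x, xs, ys, bandwidth=1.0):
    """Nadaraya-Watson estimator: kernel-weighted average of labels ys."""
    weights = [gaussian_kernel(x, xi, bandwidth) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Toy data: noise-free samples of y = x^2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]
pred = kernel_regress(1.5, xs, ys, bandwidth=0.5)  # interpolates between 1 and 4
```

That transparency (plus decades of theory on generalization bounds for such estimators) is exactly what the comment contrasts with the "magical thinking" around deep nets.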
"Seven Dark Predictions About Alex, Our New Human Expert"
Let's replace 'GPT-5' with 'Alex,' a brilliant human expert, and see how Gary's predictions hold up:
"In 2026, Alex will be a bull in a china shop, making shake-your-head stupid errors"
→ Every Nobel laureate who's forgotten where they parked. Every surgeon who's operated on the wrong knee. Every expert witness who confidently testified the wrong person was guilty.
"Alex's Reasoning will continue to be unreliable, especially in complex scenarios"
→ Economists who missed the 2008 crash. Weather forecasters and their 7-day predictions. That time NASA lost a $125M Mars orbiter because someone forgot to convert units.
"Fluent hallucinations will be common"
→ Brian Williams' helicopter story. Every eyewitness testimony ever. The NYT's 2002 WMD reporting. Your uncle at Thanksgiving explaining cryptocurrency.
"Natural language won't reliably interface with systems"
→ "I didn't mean delete everything!" Why lawyers exist. Why "that's not what I meant" is humanity's unofficial motto. The entire field of technical writing.
"Won't be general-purpose intelligence"
→ Ask your cardiologist to fix your WiFi. Ask your IT expert to perform heart surgery. No human has beaten Cicero at Diplomacy AND driven Formula 1 AND performed brain surgery.
"Alignment will remain unsolved"
→ Every war ever. Every divorce. Every company that claims "our employees are aligned with our values" while covering up scandals. Congress.
"Will need structured systems to augment them"
→ Why we invented writing (memory sucks). Why we need peer review (individuals are biased). Why democracy has checks and balances (no one is trustworthy with absolute power).
The punchline is this: Gary just described human intelligence perfectly. These aren't bugs—they're features of any sufficiently complex intelligence navigating an uncertain world. The fact that LLMs share these "flaws" might be the best evidence yet that we're on the right track.
"humans too" is stupid tiresome dishonest whataboutism.
There's a HUGE difference between human foibles and failures and those of technological instruments/machines/software. People tend to inherently trust "the machine", and the machine can move at scale in ways no human ever will. Yes, doctors make mistakes, but they are held liable. Who will be held liable when a glitch in the matrix, or just an error in the system, screws things up? When a lawyer deletes an important text, there's no scale to that problem. Every one of the things you've listed involves people, and there's a limit to the damage they can produce. Machines have always been meant for scale and flawless, predictable repeatability. Now we stand at the cusp of flawed and unpredictable repeatability... at scale. The point of view that Gary and others have previously espoused, and that seems most in "alignment" with humanity, is for these technologies to continue to play the role of tools we control, not technology that does our thinking for us so that we can abdicate our responsibilities as a society or culture.
You're absolutely right about the trust + scale risk. We've navigated this before though: autopilot (trusted, scaled, still needs pilots), medical devices (FDA regulated, doctors still liable), trading algorithms (caused flash crashes, now have circuit breakers). The pattern is always: new tech → accidents → regulations → shared human/machine accountability.
Here's the irony: the very 'human-like' flaws Gary criticizes might be what keeps these systems as tools rather than overlords. We know how to supervise something that thinks like us—flaws and all. It's the promise of 'flawless' machine intelligence that Gary seems to want that should worry us more. Those perfect machines won't need us.
The messy, fallible, human-like AI that Gary dismisses? That's the one that will always need human oversight, human judgment, and human accountability. The 'bugs' he identifies might be the best features for keeping humans in the loop for a long time.
If all we're getting is something just as dumb as humans, then why are we paying so much money and burning up so much electricity when we can just get some readily available humans to do this stuff for us? The whole point of computers is that they can do stuff we can't. I don't see the point in wasting billions of dollars on something that unemployed Bill down the street could do for way less.
But we're NOT getting something 'just as dumb'—we're getting something with human-like reasoning PLUS: nearly perfect memory, infinite patience, no ego, works 24/7, never judges my stupid questions, and reads 1000x faster. It's like hiring someone with 20 PhDs who never needs coffee or bathroom breaks for just $20 per month!
"Nearly perfect memory" and "reads 1000x faster" are both metrics that don't matter if it's giving you random wrong answers and you have no way to predict which ones are wrong. It also won't learn from getting things wrong, because it doesn't actually know anything.
"No ego", "never judges my stupid questions", "infinite patience" are already things you can get from a google search.
"20 PhDs who never need breaks" is just delusional thinking, I'm sorry.
You're right that '20 PhDs' is hyperbolic. What I actually find valuable: it helps me understand opposing viewpoints before engaging in discussions (like actively probing both sides of adoption ethics debates at a very deep level). Not perfect, sometimes wrong, but useful for perspective-taking in a polarized world. Think of it less as an oracle, more as a patient devil's advocate who helps you stress-test your own thinking. If you're curious, www.chat.com lets you try it free without any signup. Might be worth a quick test to see if it's as useless as you think—or you might find a specific use case where it actually helps.
Why would you assume I haven't tried it? I've used it to help me write an insurance denial appeal (among other things) - by which I mean I had it generate one for me which I read, trashed, and rewrote from scratch. It was sort of helpful in getting my brain started on the task.
I'm not saying there are no use cases at all; I just don't think we should be blowing up the environment for an inaccurate, ethically dubious, and marginally useful product that only does jobs it's probably better for our brains to do ourselves anyway.
The problem with the structured approach to AI is the curse of exponentiality -- the same thing that was the undoing of expert systems. You cannot hardcode all possible connections between all possible components.
You don’t have to. If you include something like an LLM in the architecture, and loop it with a (somewhat) malleable structured system that might include expert systems, world modelers, physics simulators, etc., then when the structured system hits something it can’t handle, it can use the LLM to run a generate-and-test algorithm to find a way to handle it.
I know that’s somewhat handwavy, but we have a lot of research to do before we could build something that has the ability to find ground truth when it knows it doesn’t have it, which is what I think we need to get the reliability, alignment, and the other attributes that Gary is calling out.
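A minimal sketch of that generate-and-test loop, with stubs standing in for the three pieces (every name and behavior below is hypothetical, not a real system): the structured solver answers what it can; when it can't, candidates from the "LLM" are generated and checked by a verifier the structured side trusts.

```python
def structured_solver(problem):
    """Stand-in for the structured system: answers what it can, else None."""
    known = {"2+2": 4}  # toy knowledge base
    return known.get(problem)

def llm_propose(problem, attempt):
    """Stand-in for an LLM: yields a different candidate each attempt.
    (A real LLM would generate context-dependent guesses, not a counter.)"""
    return attempt % 11

def verify(problem, candidate):
    """The structured side tests a candidate against checks it trusts."""
    return problem == "3+4" and candidate == 7  # toy ground-truth check

def solve(problem, max_attempts=100):
    """Structured system first; on failure, LLM generate-and-test fallback."""
    answer = structured_solver(problem)
    if answer is not None:
        return answer
    for attempt in range(max_attempts):
        candidate = llm_propose(problem, attempt)
        if verify(problem, candidate):
            return candidate
    return None  # neither subsystem could handle it
```

The crucial design choice is that the LLM only proposes; acceptance is decided by the verifiable, structured side.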
What if it's trained itself using GPT-4 output (some of it wrong) that people have posted to the internet?
All good except for point 7: the most performant future AGI systems will not incorporate LLMs at all.
This is realistic skepticism about this tech. As a technologist with a background in software development and in adopting tech early, I think we are seeing some scaling problems, and the hallucinations (made-up facts the bots concoct to try to be helpful) will continue, because these systems can't really fact-check properly. I'd bet on all seven predictions holding true!
Superb piece. Fully agree, including the key last point. Hopefully the AI players will move more toward LLM+logic front-end development before investors become frustrated.