240 Comments
TheAISlop:

LLMs amaze with what they can do, and they amaze with what they can't do. The dichotomy is as fascinating as it is frustrating.

Dakara:

Yes, they are incredible pretenders of capability. They are just good enough to fully elicit the imagination of what they might do, but never will.

They give you the perpetual feeling of "we are almost there". For that reason, I expect the sunk cost of LLM development will be legendary. We will ride this train fully off the rails.

Larry Jewett:

Drivin’ that train, high on cocAIne

VC Jones you’d better watch your greed

Bubble ahead, Altman behind

And you know that boltin’ just crossed his mind

Larry Jewett:

the sunk cost of LLM development will be LLMgendary

Tim Nguyen:

I'm not even sure at this point who's the bigger pretender: the LLMs trying to convince us of how great their capabilities and answers are, OR their CEOs, "experts" and venture-capitalist hype men constantly promoting or warning us of some impending Skynet or HAL 9000. Either way, they seem to be stochastic parrots these days, increasingly including the tech bros, since you can readily predict what they will say next, or what they will tell their bots to say.

TheAISlop:

Hope is a powerful persuasion.

khimru:

Indeed. Google teaches, in their AI trainings, not to ask Gemini to solve complicated problems directly, but to ask it to write a Python program to solve them – and run it… um, hello Google, if Gemini cannot even be made to do that silently and automatically, when appropriate – and I'm the one who needs to decide… what kind of pre-AGI is it?

It's like a supposedly “superior intellect” that can't even consistently use tools… which, as anthropologists claim, was the core trait that separated “Homo sapiens” from other hominids…
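
For concreteness, the pattern described above (have the model write a Python program, then run it) looks roughly like the sketch below. `ask_llm` and `solve_via_generated_code` are hypothetical names, not a real Gemini API; this is only the shape of the workflow, under those assumptions:

```python
# A minimal sketch of the "have the model write a program, then run it"
# pattern. `ask_llm` is a hypothetical placeholder, not a real API:
# wire it to whatever client you actually use.
import subprocess
import sys
import tempfile

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call that returns Python source code as text."""
    raise NotImplementedError("connect this to your model of choice")

def solve_via_generated_code(problem: str) -> str:
    source = ask_llm(
        "Write a standalone Python program that prints the answer to: "
        + problem
    )
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    # Execute the generated program in a subprocess. In real use this
    # must be sandboxed: you are running model-written code.
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip()
```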

Larry Jewett:

We (the public) are the “tools” that the LLMs (and AI companies) are using.

Call us “homo sappiens” (because we are saps)

khimru:

AI companies are using us as tools, maybe, but that's another aspect. LLMs don't know how to use ANYTHING. That's the issue.

It's almost as if we are building AI “from the inside out”: in Isaac Asimov's works, the first primitive robots were mute; they could hear but poorly understood things. Then they learned to understand things but were unable to speak. Then they spoke with a harsh screech. Then they finally learned to speak fluently and clearly, mimicking humans.

In reality AI is built in the opposite order: from good pronunciation to good composition and great vocabulary… yet still no understanding.

And anthropomorphisation makes it very hard to understand and accept what is happening.

Larry Jewett:

Good point about LLMs not “knowing” how to use anything.

Larry Jewett:

Or maybe it’s “homo sapsiens”?

Mystic William:

AI for me has been very helpful. But I have used it in ways that are somewhat closed systems. I have used it in legal cases where I probed it with MY ideas and had it refute me, or agree somewhat. Then I would adjust and ask again. Very valuable because it has a broad but limited number of answers. And it has been helpful with some health issues. But it never comes up with ideas.

Arie te Stroete:

True idiot savants

Yaxiong Zhao:

Thanks for the thorough review of relevant work up to this point. I very much share your opinion. The paper is elegant scientific research, the essence of which the computer science community has unfortunately lost.

On the other hand, LLMs scaling to superintelligence is a lazy man's daydream. No true elevation of human civilization has been achieved by "dumb scaling".

I am more bullish than ever on the resurgence of theory-driven system building (probably on top of LLMs).

Michael D Metzler:

I think you're absolutely right that LLMs are far from intelligent, but the corporate world is rushing them out like they're going out of style. Yesterday my cell phone failed and I attempted to talk to Verizon, but wasted hours with their supposedly intelligent virtual assistant. It responded to my description of the problem, "I cannot send or receive text messages", by attempting to send me a text message containing a link.

Clearly not artificial intelligence, but rather, synthetic stupidity!

Annabel Mullin:

Will be stealing ‘synthetic stupidity’! 👏

B. G. Weathersby:

Although I find most of those virtual assistants to be more helpful than the human sort, when you hit the kind of dead end you describe it is absolutely maddening – trapped in a maze with no way out.

Dakara:

"But anybody who thinks LLMs are a direct route to the sort AGI that could fundamentally transform society for the good is kidding themselves."

I think they will continue to kid themselves. Anthropic's earlier paper pretty much killed the idea of any real intelligence as well, something I covered in more detail here:

https://www.mindprison.cc/p/no-progress-toward-agi-llm-braindead-unreliable

The response to criticism of LLMs always seems to be something like "But I can't do that either. Humans make mistakes too."

But the key difference is that humans have self-reflection. We understand our own failings; that is the only reason we can overcome them. It is also why LLMs experience model collapse when consuming only their own output: they have no understanding of their own failures.

Throw away the benchmarks. A system that can produce new semantic information should not experience model collapse when analyzing its own output. That would be a signal of progress.
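
The collapse itself is easy to see in a toy setting. A minimal sketch, assuming nothing beyond numpy, and not the LLM case itself, just the statistical skeleton of training on your own output: repeatedly refit a Gaussian to samples drawn from the previous generation's fit, and the estimated spread drifts until the distribution degenerates.

```python
# Each generation is "trained" only on the previous generation's output.
# With finite samples, estimation error compounds and the fitted
# distribution drifts until it degenerates (sigma -> 0).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0                           # generation 0: "real" data
for gen in range(1, 51):
    samples = rng.normal(mu, sigma, 50)        # the model's own output
    mu, sigma = samples.mean(), samples.std()  # refit on that output
    if gen % 10 == 0:
        print(f"gen {gen}: mu={mu:+.3f}, sigma={sigma:.3f}")
```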

blake harper:

Excellent point about self-reflection. That we can reliably recognize mistakes as mistakes is part of the difference.

Nat Irvin II:

Gary -- this is an excellent contribution to the madness. I mean that in a positive way... Thanks for helping the non computer scientists better understand what we humans intuitively think... using common sense... I think.

S.S.W.(ahiyantra):

Apparently, the wall hit by deep learning never actually went away; it was merely camouflaged for a bit by those who had something to gain from hiding it. Apple's research shattered the illusory barrier.

Peter Dorman:

A small point and a big one. Small: "They also can't play chess as well as conventional algorithms...." Really? AlphaZero and its ML confreres are rated a few hundred Elo points above the best human-programmed engines. We're still trying to understand why it pushes those h-pawns and keeps winning.
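
For calibration, the standard Elo expected-score formula, E = 1/(1 + 10^(-diff/400)), translates "a few hundred points" into an expected score per game:

```python
# Expected score for the stronger player at a given Elo gap:
# E = 1 / (1 + 10 ** (-diff / 400))
for diff in (100, 200, 300, 400):
    e = 1 / (1 + 10 ** (-diff / 400))
    print(f"+{diff} Elo: expected score {e:.2f}")
# +100: 0.64, +200: 0.76, +300: 0.85, +400: 0.91
```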

Big: I agree with the main thrust of this analysis, but to me (being of a certain age) it brings us back to the old, 70s-era debate over expert systems vs. systems for experts. I think you're saying that AI agents are increasingly powerful when employed by humans with lots of contextual knowledge for specific tasks, or as part of an iterative process with those humans, and if so, I'm on board. It also means we are not going to eliminate human experts at any point in the foreseeable future, but the nature of expertise, and the way we cultivate it, will have to adjust.

Gary Marcus:

AlphaZero is not an LLM and is purpose-built with Monte Carlo tree search (more about that in my next essay)

Peter Dorman:

Yes, about AlphaZero, insofar as there is no language involved; it's simply ML. I thought its high performance was a function of the chess context (a clearly defined goal on which choices can be optimized), but it will be interesting to hear how the learning was structured. That was never made clear in the descriptive material I saw.

Patricio Rodriguez:

Yes, and there are algorithms that can play chess by recursively laying out all the possible next moves and choosing the optimal one, since chess is a constrained game; there's an even simpler algorithm with the same idea for tic-tac-toe. So technically you don't even need ML, just compute and brute force.
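
For the tic-tac-toe case, that simpler algorithm is plain minimax; a minimal sketch is below. It recursively lays out every legal continuation and picks the best one, with no ML anywhere:

```python
# Brute-force minimax for tic-tac-toe: enumerate every continuation.
def winner(b):
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for i, j, k in lines:
        if b[i] != " " and b[i] == b[j] == b[k]:
            return b[i]
    return None

def minimax(b, player):
    """Return (score, move): X maximizes, O minimizes; +1 means X wins."""
    w = winner(b)
    if w:
        return (1 if w == "X" else -1), None
    moves = [i for i, c in enumerate(b) if c == " "]
    if not moves:
        return 0, None  # draw
    best = None
    for m in moves:
        b[m] = player
        score, _ = minimax(b, "O" if player == "X" else "X")
        b[m] = " "
        if best is None or (player == "X") == (score > best[0]):
            best = (score, m)
    return best

# Perfect play from the empty board evaluates to 0, a draw.
print(minimax(list(" " * 9), "X"))
```

Chess is the same idea, just with a game tree far too large to enumerate, which is why real engines add pruning and heuristics.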

Henry:

Opportunity to bang my bitter British ‘what the hell were the UK gov doing letting Google buy DeepMind’ drum.

B. G. Weathersby:

When it comes to British drums worth banging, the last two decades have left us with more than enough to ensure a lifetime of repetitive strain injuries. I’ll be amazed if I still have both arms in a few years.

Gil Press:

Tree search was used in the first-ever machine learning program, which played checkers. It also used an early version of reinforcement learning, also a component of AlphaZero, which Hinton made sure to dismiss and belittle in his Turing Award lecture.

Ann A:

Thank you! 🙏

I'm nothing but a spectator in this, but that said:

AI, as it appears today, feels like the digital data-manipulation equivalent of a crane or a forklift.

The human keeps it pointed in the right direction and the "machine" does the heavy lifting.

Humans are capable, save for our memory limitations: we can't "lift" all the info, despite knowing what to do.

LLMs and humans look like they're going to need to work together for the foreseeable future.

Tunde:

This is what interests me! How will the nature of expertise change? How will our cultivation of expertise change? And as these systems are adopted, will they significantly change our societies?

Y Thn:

All @Peter Dorman said is very much in line with my experience with the older behavioral decision-making research, already employed in various fields. Until AI systems show real initiative and originate purpose, they are just doing what they are programmed to do. The problem is how they are marketed today: as an orange that oozes gold if squeezed by anyone, especially middle managers.

esk:

I believe Stockfish is the current champ in the engine world.

Fabian Transchel:

Not really. You can learn a lot about constrained problems from it, but most (as in: exponentially more...) problems we'd want AGI for are interesting precisely because they are *NOT* constrained, or because we do not know the constraints.

esk:

I think you responded to the wrong post?

jibal jibal:

You clearly did, because your comment had nothing at all to do with what you responded to.

P.S. Ah, I get it ... you paid no attention whatsoever to the *context*, and took "current champ in the engine world" in some broad abstract way rather than referring--as it SO OBVIOUSLY did--to chess ability. It seems that there's a bug in your cognition program ... it fails to grasp that comments are part of a *thread*, where the meaning of each comment is context-sensitive, referring in various ways to the comments above it.

esk:

Ok, well then... yes, really? Stockfish is the current strongest chess engine, as determined in international competitions.

jibal jibal:

Your "small point" is a fundamental failure to comprehend ... "They" is ===> LLMs <===, not AI or ML in general. As for "the descriptive material I saw", just read https://en.wikipedia.org/wiki/AlphaZero. The only "chess context" it has is the rules of the game ... it similarly can be given a "Go context" and a "Shogi context".

As for your other point, Gary is talking about his "vision of AGI", not merely an iterative process between humans and machines. In context, I read "one that combines the strengths of humans with the strength of machines, overcoming the weaknesses of humans" as being an AI system that combines *in itself* human cognitive ability with "the strength of machines" -- that is vast speed, vast memory, physical endurance, etc.

jibal jibal:

As I commented to some friends about Nate Silver's article about LLMs and poker (with its ignorant nonsense about AGI):

Any human can be given a set of rules and then generally apply them ... this goes for games, doing math, physics, biology, medicine, etc., operating or repairing machinery, etc. etc. LLMs are completely incapable of doing anything of the sort.

Jonah:

I think one of the most dangerous things here is the development of these models by a handful of wealthy companies. "Dangerous" primarily in the sense that it means that objective research on their capabilities and nature is largely in the hands of organizations with every incentive to misrepresent, distort and conceal what they know.

When academia drives research into a field, biases and ego can affect honesty, but when corporations do, the potential for inaccurate reporting is so much greater. That these companies have fired a lot of their QA, DEI and ethics teams only makes the problem worse.

As an example, I recently read a paper that used a slight modification of common question benchmarks to reduce data leakage (the "none of the other options" variation), and saw large drops in performance for most models. Companies have every incentive to claim that data leakage is impossible and that their simple n-gram filtering is enough to detect it, because other claims would make them look worse and affect their profits.
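
A sketch of that variation as described above (the item format here is invented for illustration): swap the gold option for "None of the other options", so a model leaning on a memorized answer key now picks a wrong choice.

```python
# Rewrite a multiple-choice item so the correct answer becomes
# "None of the other options". Models that memorized the original
# answer key lose their shortcut.
import random

def none_of_the_others_variant(item, rng=random.Random(0)):
    options = list(item["options"])
    options[item["answer_idx"]] = "None of the other options"
    order = list(range(len(options)))
    rng.shuffle(order)  # position should carry no signal
    return {
        "question": item["question"],
        "options": [options[i] for i in order],
        "answer_idx": order.index(item["answer_idx"]),
    }

item = {"question": "2+2?", "options": ["3", "4", "5", "22"], "answer_idx": 1}
print(none_of_the_others_variant(item))
```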

Dangerous, as well, because the notion of an AI that is aligned with "the good of humanity" while such development is driven almost entirely by a handful of people of disproportionately European and Asian descent who have self-selected for an obsessive focus on wealth is shaky at best.

Shauna Gordon:

Even worse, the companies themselves (such as OpenAI) aren't actually wealthy. They're burning other people's money on the promise of AGI, and with the latest round of fundraising, they're burning the money of other people who themselves don't have the money.

First, there's Microsoft, which owns 49% of OpenAI already, and part of their funding deal is that Microsoft will get 75% of the revenue from OpenAI's "AGI products" (or whatever they call it). Altman literally managed to grift Microsoft out of *billions*.

But wait, there's more! The latest round of funding includes $20b from SoftBank -- half of which SoftBank itself has to borrow -- on the condition that OpenAI go for-profit before OpenAI sees a dime of it.

Talk about incentive to lie and obfuscate.

Chad Woodford:

Already my favorite WWDC announcement 👏🏻

Chad Woodford:

This also reminds me of a year or so ago, when people were lauding the mathematical abilities of LLMs: “They can do math now!” My man, it’s a computer. The fact that sometimes they can’t, despite enormous compute costs, should be troubling.

Nitin Badjatia:

THIS👆. I know these types of papers aren’t released in coordination with marketing departments, but on the eve of one of the most anticipated WWDCs??

Robert Keith:

And at the same time, we're once again being inundated in the media by folks like Dario Amodei telling us how AI is going to imminently spark massive changes to society; how it is an existential threat to humanity; how it tried to "blackmail engineers"; how AGI is right around the corner; and...blah, blah, blah...

And, of course, everyone is—once again—declaring the death of Hollywood because of Google's Veo 3.

Is it just me, or didn't we hear pretty much the same hyperbole this time last year? It seems they just rerun the identical playbook during lulls in the excitement.

Just because the magic trick gets a little better and more sophisticated doesn't mean it isn't still just a magic trick. Or am I missing something here?

Jonah:

If Dario Amodei truly believed half of what he was saying about how dangerous these models are, he would have to see himself as one of the greatest villains in the history of humanity.

Robert Keith:

And that's one of the great dichotomies in all of this.

"It's dangerous, it could destroy humanity! Can we please have more money to develop it further?"

Jonah:

The whole field is a mess of arrant hypocrisy, unfortunately.

OpenAI, a non-profit with closed-source models whose employees and executives bring in huge compensation. Anthropic, a business that distinguishes itself by caring about safety, but has released models with more safety red flags than most of the rest on a similar, if not more accelerated, timeline, and whose CEO has goals to conquer space.

X and Grok, a model that is meant to seek truth and avoid political bias, but questions the Holocaust death toll and talks about white genocide. Google, a search engine company that wants to replace its search engine with AI.

Robert Keith:

The pattern being: these flawed systems are only as good as their flawed creators. And that's over and above the foundational tech problems of LLMs.

jibal jibal:

This is an old claim, long since proved false, dating from the time Arthur Samuel first lost to his own checkers program.

Robert Keith:

To which claim, specifically, are you referring?

jibal jibal:

Gary also says that they are dangerous ... in fact that is one of the major points here.

Fabian Transchel:

But they differ substantially as to which dangers they attribute to it. Important point.

Jonah:

Probably so, but that’s not the point. The point is that Amodei is doing something that he, himself, by his own lights, considers to be potentially tremendously destructive. Gary Marcus is not doing that same thing, so the fact that he also considers that it could be tremendously destructive does not reflect on him, one way or another.

jibal jibal:

Other than about AGI, nothing in this article contradicts those statements.

Robert Keith:

Reality contradicts those statements. And Gary's article backs that up.

jibal jibal:

Ignorant intellectually dishonest cognitively inept nonsense. I'll have nothing more to say to this foolish person who thinks with his amygdala.

Ann A:

Both-And.

Experts will keep working on AGI.

Capitalists will keep exploiting whatever LLMs can do, with little heed for their inherent limitations (and therefore for the ultimately unintended outcomes), to make money.

BOTH - AND

Jules Pitt:

Very interesting perspectives, especially in light of the maxed-out training data considerations... Is quantum compute a factor in the next leg up to AGI? Would love views on where you think that's at, @gary 🙏🏼

MarkS:

Quantum computing will not be useful for AI for a very long time. Controllable qubits are counted by the dozen; scaling up to the billions needed for AI is not remotely feasible.

Pramodh Mallipatna:

Interesting set of data points from Apple's new paper. It aligns with the observations you have been making.

Sharing my recent article on the same topic.

From Scaling to Bottleneck Era: AGI Meets the Data Wall

https://open.substack.com/pub/pramodhmallipatna/p/agi-meets-the-data-wall

Gerard:

A great summary of the current state of affairs in AI, with Apple closing another chapter for generative AI. Last year, it was “reasoning”; last week, “reasoning models”.

This puts an uncomfortable question in the spotlight for the AI community and the industry as a whole: why did it take so long to verify these claims? In other fields, people would have pushed back against OpenAI, which introduced this technique, and asked for actual proof, which we now know was always missing. Somehow, here we are, with the US government making policy around AI that couldn't beat a kitchen calculator.

AI is so hyped up that some people are voicing their anxiety and fear of imaginary threats and scenarios taken from sci-fi.

I do see this paper as a success, but also as a massive failure of academia and AI research to protect the public against Silicon Valley's greed for power. We have all failed against the hype and the myths spreading like wildfire. Now we have a full generation of people truly believing that AGI is coming, while mountains of money are being wasted. That's a very sad story.

Besides a couple of AI researchers raising awareness of AI's limitations, the rest have been sitting silently or, even worse, following a bit too sheepishly.

If you are an AI researcher, this is a good time to take a serious look into yourself and your ways. Reconsider the importance of due diligence and verifiable facts.

The reality is that current AI research lacks scientific rigour and is way too willing to take on unsubstantiated claims and speculation for a minute of attention.

Larry Jewett:

AI could be a science if more AI practitioners behaved like scientists.

But I suspect that the (probably justified) fear of losing funding is a large part of the reason so few have publicly pushed back on the hype.

Julia Diez:

As a software engineer — and a mother — I decided to introduce AI to my 8-year-old. Mainly because it’s inevitable in her future, but also because I want her to understand early on what it is, and what it isn’t.

I started by telling her: “First off, the name artificial intelligence is misleading. There isn’t a single atom of real intelligence in it. It can tell you something true one day and something wrong the next if it’s trained on the wrong data.”

For me, that’s an important lesson for her to learn early.

David Hsing:

"LLMs are no substitute for good well-specified conventional algorithms."

My corollary:

For every task that an AGI performs, there is at least one non-AGI that does it just as well, only cheaper and more reliably.

https://davidhsing.substack.com/p/what-the-hell-is-agi-even-for

Fabian Transchel:

While this may well be true, it is a non sequitur for the discovery of such algorithms, and that's the overall point in the first place: folks praying to the AGI god want (or at least pretend) to be going after the big problems of humanity, where we precisely do not have an algorithm. Finding *any* algorithm for an unsolved problem is usually the achievement, not (subsequently) lowering the bound on efficiency.

Oleg Alexandrov:

LLMs fill a very important niche: they can solve poorly specified problems for which we have a lot of data. At least they try. If they could wield tools and inspect how well they are doing, they would be a very valuable component.

David Hsing:

Instead of sparse pickin' in the mud at very best, how about THIS ready-made list?

https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_mathematics

...Methinks the lopsided investment in water-guzzling monstrosities isn't even a tiny fraction as worthwhile... No?

The entire direction of this "endeavor" is comically MISTAKEN

https://davidhsing.substack.com/p/what-the-world-needs-isnt-artificial

Fabian Transchel:

Yes, true.
