240 Comments
TheAISlop:

LLMs amaze with what they can do, and they amaze with what they can't do. The dichotomy is as fascinating as it is frustrating.

Dakara:

Yes, they are incredible pretenders of capability. They are just good enough to fully elicit the imagination of what they might do, but never will.

They give you the perpetual feeling of "we are almost there". For that reason, I expect the sunk cost of LLM development will be legendary. We will ride this train fully off the rails.

Larry Jewett:

Drivin’ that train, high on cocAIne

VC Jones you’d better watch your greed

Bubble ahead, Altman behind

And you know that boltin’ just crossed his mind

Larry Jewett:

the sunk cost of LLM development will be LLMgendary

Tim Nguyen:

I'm not even sure at this point who's the bigger pretender: the LLMs trying to convince us of how great their capabilities and answers are, OR their CEOs, "experts" and venture-capitalist hype men constantly promoting or warning us of some impending Skynet or HAL 9000. Either way, they seem to be stochastic parrots these days, increasingly including the tech bros, since you can readily predict what they will say next, or what they will tell their bots to say.

TheAISlop:

Hope is a powerful persuasion.

khimru:

Indeed. Google teaches, in their AI trainings, not to ask Gemini to solve complicated problems directly, but to ask it to write a Python program to solve them – and run it… um, hello Google, if Gemini cannot even be made to do that silently and automatically, when appropriate – and I'm the one who needs to decide… what kind of pre-AGI is it?

It's like a supposedly “superior intellect” that can't even consistently use tools… which, as anthropologists claim, was the core trait that separated “Homo sapiens” from other hominids…
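
For concreteness, the pattern described above (have the model write a Python program, then run it) looks roughly like the sketch below. `ask_llm` and `solve_via_generated_code` are hypothetical names, not a real Gemini API; this is only the shape of the workflow, under those assumptions:

```python
# A minimal sketch of the "have the model write a program, then run it"
# pattern. `ask_llm` is a hypothetical placeholder, not a real API:
# wire it to whatever client you actually use.
import subprocess
import sys
import tempfile

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call that returns Python source code as text."""
    raise NotImplementedError("connect this to your model of choice")

def solve_via_generated_code(problem: str) -> str:
    source = ask_llm(
        "Write a standalone Python program that prints the answer to: "
        + problem
    )
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    # Execute the generated program in a subprocess. In real use this
    # must be sandboxed: you are running model-written code.
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip()
```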

Larry Jewett:

We (the public) are the “tools” that the LLMs (and AI companies) are using.

Call us “homo sappiens” (because we are saps)

khimru:

AI companies are using us as tools, maybe, but that's another aspect. LLMs don't know how to use ANYTHING. That's the issue.

It's almost as if we are building AI “from the inside out”: in Isaac Asimov's works, the first primitive robots were mute; they could hear but poorly understood things. Then they learned to understand things but were unable to speak. Then they spoke with a harsh screech. Then they finally learned to speak fluently and clearly, mimicking humans.

In reality AI is built in the opposite order: from good pronunciation to good composition and great vocabulary… yet still no understanding.

And anthropomorphisation makes it very hard to understand and accept what is happening.

Larry Jewett:

Good point about LLMs not “knowing” how to use anything.

Larry Jewett:

Or maybe it’s “homo sapsiens”?

Mystic William:

AI for me has been very helpful. But I have used it in ways that are somewhat closed systems. I have used it in legal cases where I probed it with MY ideas and had it refute me, or agree somewhat. Then I would adjust and ask again. Very valuable because it has a broad but limited number of answers. And it has been helpful with some health issues. But it never comes up with ideas.

Arie te Stroete:

True idiot savants

Yaxiong Zhao:

Thanks for the thorough review of relevant work up to this point. I very much share your opinion. The paper is elegant scientific research, the essence of which the computer science community has unfortunately lost.

On the other hand, LLMs scaling to superintelligence is a lazy man's daydream. No true elevation of human civilization has been achieved by "dumb scaling".

I am more bullish than ever on the resurgence of theory-driven system building (probably on top of LLMs).

Michael D Metzler:

I think you're absolutely right that LLMs are far from intelligent, but the corporate world is rushing them out like they're going out of style. Yesterday my cell phone failed and I attempted to talk to Verizon, but wasted hours with their supposedly intelligent virtual assistant. It responded to my description of the problem, "I cannot send or receive text messages", by attempting to send me a text message containing a link.

Clearly not artificial intelligence, but rather, synthetic stupidity!

Annabel Mullin:

Will be stealing ‘synthetic stupidity’! 👏

B. G. Weathersby:

Although I find most of those virtual assistants to be more helpful than the human sort, when you hit the kind of dead end you describe it is absolutely maddening – trapped in a maze with no way out.

Dakara:

"But anybody who thinks LLMs are a direct route to the sort AGI that could fundamentally transform society for the good is kidding themselves."

I think they will continue to kid themselves. Anthropic's earlier paper pretty much killed the idea of any real intelligence as well, something I covered in more detail here:

https://www.mindprison.cc/p/no-progress-toward-agi-llm-braindead-unreliable

The response to criticism of LLMs always seems to be something like "But I can't do that either. Humans make mistakes too."

But the key difference is that humans have self-reflection. We understand our own failings; that is the only reason we can overcome them. It is also why LLMs experience model collapse when consuming only their own output: they have no understanding of their own failures.

Throw away the benchmarks. A system that can produce new semantic information should not experience model collapse when analyzing its own output. That would be a signal of progress.
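
The collapse itself is easy to see in a toy setting. A minimal sketch, assuming nothing beyond numpy, and not the LLM case itself, just the statistical skeleton of training on your own output: repeatedly refit a Gaussian to samples drawn from the previous generation's fit, and the estimated spread drifts until the distribution degenerates.

```python
# Each generation is "trained" only on the previous generation's output.
# With finite samples, estimation error compounds and the fitted
# distribution drifts until it degenerates (sigma -> 0).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0                           # generation 0: "real" data
for gen in range(1, 51):
    samples = rng.normal(mu, sigma, 50)        # the model's own output
    mu, sigma = samples.mean(), samples.std()  # refit on that output
    if gen % 10 == 0:
        print(f"gen {gen}: mu={mu:+.3f}, sigma={sigma:.3f}")
```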

blake harper:

Excellent point about self-reflection. That we can reliably recognize mistakes as mistakes is part of the difference.

Nat Irvin II:

Gary -- this is an excellent contribution to the madness. I mean that in a positive way... Thanks for helping the non computer scientists better understand what we humans intuitively think... using common sense... I think.

S.S.W.(ahiyantra):

Apparently, the wall hit by deep learning never actually went away; it was merely camouflaged for a bit by those who had something to gain from hiding it. Apple's research shattered the illusory barrier.

Peter Dorman:

A small point and a big one. Small: "They also can't play chess as well as conventional algorithms...." Really? AlphaZero and its ML confreres are rated a few hundred Elo points above the best human-programmed engines. We're still trying to understand why it pushes those h-pawns and keeps winning.
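
For calibration, the standard Elo expected-score formula, E = 1/(1 + 10^(-diff/400)), translates "a few hundred points" into an expected score per game:

```python
# Expected score for the stronger player at a given Elo gap:
# E = 1 / (1 + 10 ** (-diff / 400))
for diff in (100, 200, 300, 400):
    e = 1 / (1 + 10 ** (-diff / 400))
    print(f"+{diff} Elo: expected score {e:.2f}")
# +100: 0.64, +200: 0.76, +300: 0.85, +400: 0.91
```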

Big: I agree with the main thrust of this analysis, but to me (being of a certain age) it brings us back to the old, 70s-era debate over expert systems vs. systems for experts. I think you're saying that AI agents are increasingly powerful when employed by humans with lots of contextual knowledge for specific tasks, or as part of an iterative process with those humans, and if so, I'm on board. It also means we are not going to eliminate human experts at any point in the foreseeable future, but the nature of expertise, and the way we cultivate it, will have to adjust.

Gary Marcus:

AlphaZero is not an LLM and is purpose-built with Monte Carlo tree search (more about that in my next essay)

Peter Dorman:

Yes, about AlphaZero, insofar as there is no language involved; it's simply ML. I thought its high performance was a function of the chess context (a clearly defined goal on which choices can be optimized), but it will be interesting to hear how the learning was structured. That was never made clear in the descriptive material I saw.

Patricio Rodriguez:

Yes, and there are algorithms that can play chess by recursively laying out all the possible next moves and choosing the optimal one, since chess is a constrained game; there's an even simpler algorithm with the same idea for tic-tac-toe. So technically you don't even need ML, just compute and brute force.
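
For the tic-tac-toe case, that simpler algorithm is plain minimax; a minimal sketch is below. It recursively lays out every legal continuation and picks the best one, with no ML anywhere:

```python
# Brute-force minimax for tic-tac-toe: enumerate every continuation.
def winner(b):
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for i, j, k in lines:
        if b[i] != " " and b[i] == b[j] == b[k]:
            return b[i]
    return None

def minimax(b, player):
    """Return (score, move): X maximizes, O minimizes; +1 means X wins."""
    w = winner(b)
    if w:
        return (1 if w == "X" else -1), None
    moves = [i for i, c in enumerate(b) if c == " "]
    if not moves:
        return 0, None  # draw
    best = None
    for m in moves:
        b[m] = player
        score, _ = minimax(b, "O" if player == "X" else "X")
        b[m] = " "
        if best is None or (player == "X") == (score > best[0]):
            best = (score, m)
    return best

# Perfect play from the empty board evaluates to 0, a draw.
print(minimax(list(" " * 9), "X"))
```

Chess is the same idea, just with a game tree far too large to enumerate, which is why real engines add pruning and heuristics.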

Henry:

Opportunity to bang my bitter British ‘what the hell were the UK gov doing letting Google buy DeepMind’ drum.

B. G. Weathersby:

When it comes to British drums worth banging, the last two decades have left us with more than enough to ensure a lifetime of repetitive strain injuries. I’ll be amazed if I still have both arms in a few years.

Gil Press:

Tree search was used in the first-ever machine learning program, which played checkers. It also used an early version of reinforcement learning, also a component of AlphaZero, which Hinton made sure to dismiss and belittle in his Turing Award lecture.

Ann A:

Thank you! 🙏

I'm nothing but a spectator in this, but that said:

AI, as it appears today, feels like the digital data-manipulation equivalent of a crane or a forklift.

The human keeps it pointed in the right direction and the "machine" does the heavy lifting.

Humans are capable, save for our memory limitations: we can't "lift" all the info, despite knowing what to do.

LLMs and humans look like they're going to need to work together for the foreseeable future.

Tunde:

This is what interests me! How will the nature of expertise change? How will our cultivation of expertise change? And as these systems are adopted, will they significantly change our societies?

Y Thn:

All @Peter Dorman said is very much in line with my experience with the older behavioral decision-making research, already employed in various fields. Until AI systems show real initiative and originate purpose, they are just doing what they are programmed to do. The problem is how they are marketed today: as an orange that oozes gold if squeezed by anyone, especially middle managers.

esk:

I believe Stockfish is the current champ in the engine world.

Fabian Transchel:

Not really. You can learn a lot about constrained problems from it, but most (as in: exponentially more...) problems we'd want AGI for are interesting precisely because they are *NOT* constrained, or because we do not know the constraints.

esk:

I think you responded to the wrong post?

jibal jibal:

You clearly did, because your comment had nothing at all to do with what you responded to.

P.S. Ah, I get it ... you paid no attention whatsoever to the *context*, and took "current champ in the engine world" in some broad abstract way rather than referring--as it SO OBVIOUSLY did--to chess ability. It seems that there's a bug in your cognition program ... it fails to grasp that comments are part of a *thread*, where the meaning of each comment is context-sensitive, referring in various ways to the comments above it.

esk:

Ok, well then... yes, really? Stockfish is the current strongest chess engine, as determined in international competitions.

jibal jibal:

Your "small point" is a fundamental failure to comprehend ... "They" is ===> LLMs <===, not AI or ML in general. As for "the descriptive material I saw", just read https://en.wikipedia.org/wiki/AlphaZero. The only "chess context" it has is the rules of the game ... it similarly can be given a "Go context" and a "Shogi context".

As for your other point, Gary is talking about his "vision of AGI", not merely an iterative process between humans and machines. In context, I read "one that combines the strengths of humans with the strength of machines, overcoming the weaknesses of humans" as being an AI system that combines *in itself* human cognitive ability with "the strength of machines" -- that is vast speed, vast memory, physical endurance, etc.

jibal jibal:

As I commented to some friends about Nate Silver's article about LLMs and poker (with its ignorant nonsense about AGI):

Any human can be given a set of rules and then generally apply them ... this goes for games, doing math, physics, biology, medicine, etc., operating or repairing machinery, etc. etc. LLMs are completely incapable of doing anything of the sort.

Jonah:

I think one of the most dangerous things here is the development of these models by a handful of wealthy companies. "Dangerous" primarily in the sense that it means that objective research on their capabilities and nature is largely in the hands of organizations with every incentive to misrepresent, distort and conceal what they know.

When academia drives research into a field, biases and ego can affect honesty, but when corporations do, the potential for inaccurate reporting is so much greater. That these companies have fired a lot of their QA, DEI and ethics teams only makes the problem worse.

As an example, I recently read a paper that used a slight modification of common question benchmarks to reduce data leakage (the "none of the other options" variation), and saw large drops in performance for most models. Companies have every incentive to claim that data leakage is impossible and that their simple n-gram filtering is enough to detect it, because other claims would make them look worse and affect their profits.
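
A sketch of that variation as described above (the item format here is invented for illustration): swap the gold option for "None of the other options", so a model leaning on a memorized answer key now picks a wrong choice.

```python
# Rewrite a multiple-choice item so the correct answer becomes
# "None of the other options". Models that memorized the original
# answer key lose their shortcut.
import random

def none_of_the_others_variant(item, rng=random.Random(0)):
    options = list(item["options"])
    options[item["answer_idx"]] = "None of the other options"
    order = list(range(len(options)))
    rng.shuffle(order)  # position should carry no signal
    return {
        "question": item["question"],
        "options": [options[i] for i in order],
        "answer_idx": order.index(item["answer_idx"]),
    }

item = {"question": "2+2?", "options": ["3", "4", "5", "22"], "answer_idx": 1}
print(none_of_the_others_variant(item))
```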

Dangerous, as well, because the notion of an AI that is aligned with "the good of humanity" while such development is driven almost entirely by a handful of people of disproportionately European and Asian descent who have self-selected for an obsessive focus on wealth is shaky at best.

Shauna Gordon:

Even worse, the companies themselves (such as OpenAI) aren't actually wealthy. They're burning other people's money on the promise of AGI, and with the latest round of fundraising, they're burning the money of other people who themselves don't have the money.

First, there's Microsoft, which owns 49% of OpenAI already, and part of their funding deal is that Microsoft will get 75% of the revenue from OpenAI's "AGI products" (or whatever they call it). Altman literally managed to grift Microsoft out of *billions*.

But wait, there's more! The latest round of funding includes $20b from SoftBank -- half of which SoftBank itself has to borrow -- on the condition that OpenAI go for-profit before OpenAI sees a dime of it.

Talk about incentive to lie and obfuscate.

Chad Woodford:

Already my favorite WWDC announcement 👏🏻

Chad Woodford:

This also reminds me of a year or so ago, when people were lauding the mathematical abilities of LLMs: “They can do math now!” My man, it’s a computer. The fact that sometimes they can’t, despite enormous compute costs, should be troubling.

Nitin Badjatia:

THIS👆. I know these types of papers aren’t released in coordination with marketing departments, but on the eve of one of the most anticipated WWDCs??

Robert Keith:

And at the same time, we're once again being inundated in the media by folks like Dario Amodei telling us how AI is going to imminently spark massive changes to society; how it is an existential threat to humanity; how it tried to "blackmail engineers"; how AGI is right around the corner; and...blah, blah, blah...

And, of course, everyone is—once again—declaring the death of Hollywood because of Google's Veo 3.

Is it just me, or didn't we hear pretty much the same hyperbole this time last year? It seems they just rerun the identical playbook during lulls in the excitement.

Just because the magic trick gets a little better and more sophisticated doesn't mean it isn't still just a magic trick. Or am I missing something here?

Jonah:

If Dario Amodei truly believed half of what he was saying about how dangerous these models are, he would have to see himself as one of the greatest villains in the history of humanity.

Robert Keith:

And that's one of the great dichotomies in all of this.

"It's dangerous, it could destroy humanity! Can we please have more money to develop it further?"

Jonah:

The whole field is a mess of arrant hypocrisy, unfortunately.

OpenAI, a non-profit with closed-source models whose employees and executives bring in huge compensation. Anthropic, a business that distinguishes itself by caring about safety, but has released models with more safety red flags than most of the rest on a similar, if not more accelerated, timeline, and whose CEO has goals to conquer space.

X and Grok, a model that is meant to seek truth and avoid political bias, but questions the Holocaust death toll and talks about white genocide. Google, a search engine company that wants to replace its search engine with AI.

Robert Keith:

The pattern being: these flawed systems are only as good as their flawed creators. And that's over and above the foundational tech problems of LLMs.

jibal jibal:

This is an old claim, long since proved false, dating from the time Arthur Samuel first lost to his own checkers program.

Robert Keith:

To which claim, specifically, are you referring?

jibal jibal:

Gary also says that they are dangerous ... in fact that is one of the major points here.

Fabian Transchel:

But they differ substantially as to which dangers they attribute to it. Important point.

Jonah:

Probably so, but that’s not the point. The point is that Amodei is doing something that he, himself, by his own lights, considers to be potentially tremendously destructive. Gary Marcus is not doing that same thing, so the fact that he also considers that it could be tremendously destructive does not reflect on him, one way or another.

jibal jibal:

Other than about AGI, nothing in this article contradicts those statements.

Robert Keith:

Reality contradicts those statements. And Gary's article backs that up.

jibal jibal:

Ignorant intellectually dishonest cognitively inept nonsense. I'll have nothing more to say to this foolish person who thinks with his amygdala.

Ann A:

Both-And.

Experts will keep working on AGI.

Capitalists will keep exploiting whatever LLMs can do, with little heed for their inherent limitations (and therefore for the ultimately unintended outcomes), to make money.

BOTH - AND

Jules Pitt:

Very interesting perspectives, especially in light of the maxed-out training data considerations... Is quantum compute a factor in the next leg up to AGI? Would love views on where you think that's at, @gary 🙏🏼

MarkS:

Quantum computing will not be useful for AI for a very long time. Controllable qubits are counted by the dozen; scaling up to the billions needed for AI is not remotely feasible.

Pramodh Mallipatna:

Interesting set of data points from Apple's new paper. It aligns with the observations you have been making.

Sharing my recent article on the same topic.

From Scaling to Bottleneck Era: AGI Meets the Data Wall

https://open.substack.com/pub/pramodhmallipatna/p/agi-meets-the-data-wall

Gerard:

A great summary of the current state of affairs in AI, with Apple closing another chapter for generative AI. Last year, it was “reasoning”; last week, “reasoning models”.

This puts an uncomfortable question in the spotlight for the AI community and the industry as a whole: why did it take so long to verify these claims? In other fields, people would have pushed back against OpenAI, which introduced this technique, and asked for actual proof, which we now know was always missing. Somehow, here we are, with the US government making policy around AI that couldn't beat a kitchen calculator.

AI is so hyped up that some people are voicing their anxiety and fear of imaginary threats and scenarios taken from sci-fi.

I do see this paper as a success, but also as a massive failure of academia and AI research to protect the public against Silicon Valley's greed for power. We have all failed against the hype and the myths spreading like wildfire. Now we have a full generation of people truly believing that AGI is coming, while mountains of money are being wasted. That's a very sad story.

Besides a couple of AI researchers raising awareness of AI's limitations, the rest have been sitting silently or, even worse, following a bit too sheepishly.

If you are an AI researcher, this is a good time to take a serious look into yourself and your ways. Reconsider the importance of due diligence and verifiable facts.

The reality is that current AI research lacks scientific rigour and is way too willing to take on unsubstantiated claims and speculation for a minute of attention.

Larry Jewett:

AI could be a science if more AI practitioners behaved like scientists.

But I suspect that the (probably justified) fear of losing funding is a large part of the reason so few have publicly pushed back on the hype.

Julia Diez:

As a software engineer — and a mother — I decided to introduce AI to my 8-year-old. Mainly because it’s inevitable in her future, but also because I want her to understand early on what it is, and what it isn’t.

I started by telling her: “First off, the name artificial intelligence is misleading. There isn’t a single atom of real intelligence in it. It can tell you something true one day and something wrong the next if it’s trained on the wrong data.”

For me, that’s an important lesson for her to learn early.

David Hsing:

"LLMs are no substitute for good well-specified conventional algorithms."

My corollary:

For every task that an AGI performs, there is at least one non-AGI that does it just as well, only cheaper and more reliably.

https://davidhsing.substack.com/p/what-the-hell-is-agi-even-for

Fabian Transchel:

While this may well be true, it is a non sequitur for the discovery of such algorithms, and that's the overall point in the first place: folks praying to the AGI god want (or at least pretend) to be going after the big problems of humanity, where we precisely do not have an algorithm. Finding *any* algorithm for an unsolved problem is usually the achievement, not (subsequently) lowering the bound on efficiency.

Oleg Alexandrov:

LLMs fill a very important niche: they can solve poorly specified problems for which we have a lot of data. At least they try. If they could wield tools and inspect how well they are doing, they would be a very valuable component.

David Hsing:

Instead of sparse pickin' in the mud at very best, how about THIS ready-made list?

https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_mathematics

...Methinks the lopsided investment in water-guzzling monstrosities isn't even a tiny fraction as worthwhile... No?

The entire direction of this "endeavor" is comically MISTAKEN

https://davidhsing.substack.com/p/what-the-world-needs-isnt-artificial

Fabian Transchel:

Yes, true.
