71 Comments
David Hsing:

Gary, I've been thinking... Is generative AI a "net negative" tech? Meaning it's generally doing more harm than good for the world at large. I don't see how it's NOT a net negative tech.

Gary Marcus:

That’s my current tentative conclusion, yes

Larry Jewett:

Of course degenerative AI is a net negative.

It’s in the name

Az:

I fully agree with you. Its harm is much, much greater than its usefulness.

It's trying to destroy the value of human knowledge, human intelligence, and human skill, and it's killing the internet by filling it with AI-generated slop and AI bots.

I swear, the world, and the internet in particular, was much, much better before Gen AI. This is an objective fact.

Ari:

In my non-expert opinion, yes, by a distance. Just consider the opportunity cost: the real-world good those hundreds of billions could have done.

Larry Jewett:

But if they had gone to other things, they wouldn’t have gone to enrich Nobel “physicists” like Geoff Hinton at Google, so one would have to counterbalance things with that opportunity cost as well.

Ari:

Ah touché, yes we must think of Mr. Hinton, and the shareholders! Oh the shareholders.

Max Millick:

The good news is that people can sell their stock in the AI companies and blame it on tariffs to avoid embarrassment.

Larry Jewett:

“Playing fast and loose” in order to boost profits and outside investment is also known as fraud (or at least used to be)

Richard Foxall:

Put much more succinctly than my comment, but agreed. This really goes beyond puffery ("when released, our software will be the best in the world and drive your lover wild") and straight into intentional misrepresentation. Like the VW diesel emissions scandal.

Larry Jewett:

It’s also scientific fraud (for any scientists and engineers involved)

Michael Mennies:

Gary, I'm sure you have spoken about this, but can you please point me towards your writings about the development of useful AI? What value do you see in what has been created so far? What are alternatives to LLMs? Do you believe that AGI is actually possible?

Gary Marcus:

Best short answer: read my paper The Next Decade in AI.

But there is more to be said, and I hope to find time before long to say it.

Michael Mennies:

And I will! Thank you, Gary!

Richard Foxall:

Zuckerberg lying isn't big news. But his VP of AI quitting is.

I don't know who advises guys like Zuckerberg on legal matters, or if he really listens, but making a claim (Llama 4 did great!) based on a material misrepresentation (um, it didn't, and we fudged the test that we showed you) with the intent that people rely on it (why else do you make the claim?), causing the stock to go higher when you knew the claim was false, gets you a long way toward the kind of fraud that a shareholders' attorney would lick his chops at. I guess that Zuck can get out of any federal suit with another payment to Trump, but a civil cause of action could still be pretty ugly for him.

We won't get to general intelligence in a machine when the folks promoting it act this unintelligently themselves. But that doesn't mean they cannot do a lot of harm trying.

Jasmine R:

I keep wondering about the lawyers at these companies. They must want to muzzle their CEOs sometimes.

MarkS (Apr 7, edited):

As an outsider to the AI community, it strikes me as just blindingly obvious that a purported "AI" that is unable to learn arithmetic (and no LLM has been able to learn arithmetic) is simply nowhere close to human level "AGI".
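A minimal sketch of the kind of probe behind claims like this, assuming a hypothetical ask_model helper rather than any real API: generate fresh multi-digit problems so nothing can be memorized, and score only exact answers.

```python
# Hedged sketch: probe a chat model on freshly generated multi-digit arithmetic.
# ask_model is a placeholder for whatever client you use; it is not a real library call.
import random

def ask_model(prompt: str) -> str:
    """Placeholder: send the prompt to the model under test and return its reply."""
    raise NotImplementedError

def arithmetic_probe(n_trials: int = 100, digits: int = 6) -> float:
    """Fraction of randomly generated multiplication problems answered exactly right."""
    correct = 0
    for _ in range(n_trials):
        a = random.randint(10 ** (digits - 1), 10 ** digits - 1)
        b = random.randint(10 ** (digits - 1), 10 ** digits - 1)
        reply = ask_model(f"What is {a} * {b}? Reply with only the number.")
        try:
            if int(reply.strip().replace(",", "")) == a * b:
                correct += 1
        except ValueError:
            pass  # non-numeric reply counts as wrong
    return correct / n_trials
```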

Larry Jewett:

Arithmetic is the foundation of all of mathematics.

An AI that has not learned arithmetic has not learned mathematics.

Larry Jewett:

Not surprisingly, common crows are better at arithmetic than LLMs.

I guess we’ll have to replace the term “birdbrain” with “LLMbrain” (then again, lamebrain already covers that)

https://www.smithsonianmag.com/smart-news/crows-can-count-up-to-four-like-human-toddlers-study-suggests-180984420/

Aaron Turner:

LLMs are dead. Long live LLMs.

Spartacus:

La, la-la, di-dee-da

La-la, di-dee-da, da-dum

Sing us a song you're the AI man!

Sing us a song to-nite!

We're all in the mood for absurdity

And you've got us ranting all night!

Bruce Olsen:

I've mentioned some of my experience with cooked benchmarks, but obviously cooked product announcements have always been part of the tech game.

After the System/360 mainframe was launched, IBM was famous for announcing machines that weren't delivered, partly because they didn't know what customers would want, but mostly as preemptive strikes against competitors.

Jonah:

Gary, can you comment on the update from Mislav Balunović where they claimed that Gemini 2.5 got 24% of the possible points on the Math Olympiad problems?

I find this deeply suspicious, since all the previous models, including Gemini 2.0 Flash Thinking, released a month and a half earlier, were in a similar, lower range, but I don't really know how plausible it is.

I am wondering whether Google might have employed some quick-and-dirty methods to boost their apparent performance through data leakage, such as putting problems and answers in the pre-prompt or fine-tuning on a more limited dataset that included them. To me, such a theory would seem like a lot of effort (and a lot of dishonesty) for little gain, but then, maybe appearing to be better than the competition seems lucrative enough to people at Google that such underhanded techniques might seem like an option.
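One hedged way to sanity-check a jump like this, without access to the training data, is to split the benchmark by publication date relative to the model's training cutoff and compare accuracy on each side; the field names below are illustrative, not any real dataset's schema.

```python
# Hedged sketch of a leakage plausibility check: compare accuracy on problems
# published before vs. after the model's training cutoff. A big gap (strong before,
# weak after) points toward contamination rather than genuine reasoning gains.
# `problems` is assumed to be a list of dicts with "published" (datetime.date)
# and "score" (0 or 1) keys; these names are illustrative.
from datetime import date

def accuracy_by_cutoff(problems: list[dict], cutoff: date) -> tuple[float, float]:
    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")
    before = [p["score"] for p in problems if p["published"] < cutoff]
    after = [p["score"] for p in problems if p["published"] >= cutoff]
    return mean(before), mean(after)
```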

Runner:

Gary, I really think you need to do a deep dive or analysis of the benchmark tests. The greater public needs to know. From what I have looked into, it is a racket.

It's like that scene in The Big Short where they go to the credit agencies and ask why something that is clearly junk is being rated AAA, before coming to the realisation that the credit agencies' clients are the same companies they are meant to rate.

Frontier labs have tens of billions and 30B+ stock valuations at stake on the performance of these benchmarks. I would not put it past the snake-oil AI bros to claim AGI has been achieved by releasing some shoddy bar charts of these benchmarks. Even more worryingly, as pointed out, AI is making very little actual progress but is being gamified to appear so via compromised benchmarks.

The most common issue seems to be that they are putting benchmark questions/solutions into the training set. This should be automatic disqualification. But that would require AI companies to completely disclose all training data, which we know they deliberately hide because it would expose so much (piracy, cheating, copyright violation, breaking robots.txt, going against ToS, etc.). The most recent case was OpenAI funding FrontierMath and also being given secret access to its dataset. The only reason the public found out was a leak!

And the LLMArena benchmark... it's a joke. The recent top models only need 10K votes to make it. Big tech has 150-200K employees. IMO, Meta has deliberately fingerprinted Llama 4 so those in the know can manipulate votes and boost its Elo. Just look at the voting dataset LLMArena released. Llama 4 has a very distinct writing style, uses a very distinct formatting style (the way it uses bold, paragraphs, bullet points), is nearly always the longest response by far, and nearly always has the highest emoji usage. I think most people could easily guess from a few questions which response is Llama 4.

This isn't anonymous. Llama 4 can be identified easily even if it were just a pure text stream!
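For context on why a few thousand votes is not much protection, here is the standard Elo update that pairwise-vote leaderboards are based on. The K factor and ratings are illustrative, not LLMArena's actual parameters, and real leaderboards fit ratings over all votes at once rather than updating online; the sketch only shows how sensitive the math is to who wins the head-to-head votes.

```python
# Rough sketch of the Elo math behind pairwise-vote leaderboards (illustrative values only).
def expected_score(r_a: float, r_b: float) -> float:
    """Probability the model rated r_a beats the model rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    delta = k * ((1.0 if a_won else 0.0) - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta

# Toy illustration: a coordinated block of wins against evenly matched opponents
# drags an online rating far above its starting point.
r = 1200.0
for _ in range(200):
    r, _ = elo_update(r, 1200.0, a_won=True)
print(round(r))  # climbs well above 1200 after 200 straight wins
```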

Gerben Wierda:

In 2023, I said I expected the hype to run for about 5 years before the unrealistic AGI expectations would run out of steam and we would see a 'dotcom-bust'-like reaction. That would mean 2027 (2022 was the start). I still keep that in mind for now. I may very well have been wrong, but I still feel the dreams and myths are so strong that they have a longer life than would be expected from a rational/economic/etc. perspective. So far the money is still streaming in to be burned in the furnace of hype. And some of it is still untapped and vulnerable to FOMO.

Larry Jewett:

“Not one has to my knowledge publicly asked the difficult questions about data leakage and data contamination”

It’s actually much worse. They not only don’t ask the questions (which they already know the answers to) but they purposely don’t acknowledge the issue, because it’s advantageous not to: it makes their bots seem “smarter” and more capable than they actually are.

Ben P (Apr 8, edited):

When LLMs impress on "reasoning" questions, it's always because they're mimicking something from the training data. And the AI companies running their goofy little evaluations have always made, at best, half-assed efforts to prevent this. Remember how the GPT-4 "system card" claimed that contamination wasn't an issue because they'd failed to find a handful of randomly chosen 50-character exact string matches from benchmark questions in the training data? That seemed less like an attempt to protect against contamination and more like an attempt to put on a show of appearing to have made an attempt to protect against contamination. And then, sure enough, right after GPT-4 came out, someone found that it aced Codeforces questions from just before the training cutoff and flunked similar ones from just after the cutoff. And of course Codeforces was one of the benchmarks OpenAI used to sell the world on the awesome reasoning powers of GPT-4. Derp de derp.

Intentionally inserting benchmarks into the training to dupe people would simply be a more brazen and more cynical version of what's been going on the whole time.
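For concreteness, here is a hedged reconstruction of the kind of spot check described above; OpenAI's actual procedure isn't public as code, and the function and parameter names are illustrative. The weakness is visible in the code: paraphrased, translated, or reformatted copies of a benchmark question sail straight past an exact-substring match.

```python
# Hedged reconstruction of a 50-character exact-substring contamination check
# (illustrative only, not OpenAI's actual code).
import random

def substring_spot_check(question: str, corpus: str,
                         n_samples: int = 3, length: int = 50) -> bool:
    """True if any randomly sampled substring of `question` appears verbatim in `corpus`."""
    if len(question) <= length:
        return question in corpus
    for _ in range(n_samples):
        start = random.randrange(len(question) - length + 1)
        if question[start:start + length] in corpus:
            return True
    return False
```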

Birgitte Rasine:

“Two things are infinite, as far as we know – the universe and human stupidity.”

Stupidity and greed are interchangeable.

Matthew Harris:

"While I can't report the veracity of the rumor here is a bunch of negativity I have heard anecdotally from Reddit"

People have known for months there's marginal returns on scaling training because of the data wall

https://open.substack.com/pub/matthewharris/p/beyond-the-scaling-laws?r=298d1j&utm_medium=ios
