Gary, I've been thinking... Is generative AI a "net negative" tech? Meaning it's generally doing more harm than good for the world at large. I don't see how it's NOT a net negative tech.
That’s my current tentative conclusion, yes
Of course degenerative AI is a net negative.
It’s in the name
I fully agree with you. Its harm far outweighs its usefulness.
It's trying to destroy the value of human knowledge, human intelligence, and human skill, and it's killing the internet by filling it with AI-generated slop and AI bots.
I swear, the world, and the internet in particular, was much, much better before Gen AI. This is an objective fact.
In my non-expert opinion, yes, by a distance. Just consider the opportunity cost of the real-world good those hundreds of billions could have gone to instead.
But if they had gone to other things, they wouldn’t have gone to enrich Nobel “physicists” like Geoff Hinton at Google, so one would have to counterbalance things with that opportunity cost as well.
Ah touché, yes we must think of Mr. Hinton, and the shareholders! Oh the shareholders.
The good news is that people can sell their stock in the AI companies and blame it on tariffs to avoid embarrassment.
“Playing fast and loose” in order to boost profits and outside investment is also known as fraud (or at least used to be)
Put much more succinctly than my comment, but agreed. This really goes beyond puffery ("when released, our software will be the best in the world and drive your lover wild") and straight into intentional misrepresentation. Like the VW diesel emissions scandal.
It’s also scientific fraud (for any scientists and engineers involved)
Gary, I'm sure you have spoken about this, but can you please point me towards your writings about the development of useful AI? What value do you see in what has been created so far? What are alternatives to LLMs? Do you believe that AGI is actually possible?
best short answer: read my paper The Next Decade in AI
But there is more to be said, and I hope to find time before long to say it.
And I will! Thank you, Gary!
Zuckerberg lying isn't big news. But his VP of AI quitting is.
I don't know who advises guys like Zuckerberg on legal matters, or if he really listens, but making a claim (Llama 4 did great!) based on a material misrepresentation (um, it didn't, and we fudged the test that we showed you) with the intent that people rely on it (why else do you make the claim?), causing the stock to go higher when you knew the claim was false, gets you a long way toward the kind of fraud that a shareholders' attorney would lick his chops at. I guess that Zuck can get out of any federal suit with another payment to Trump, but a civil cause of action could still be pretty ugly for him.
We won't get to general intelligence in a machine when the folks promoting it act this unintelligently themselves. But that doesn't mean they cannot do a lot of harm trying.
I keep wondering about the lawyers at these companies. They must want to muzzle their CEOs sometimes.
As an outsider to the AI community, it strikes me as just blindingly obvious that a purported "AI" that is unable to learn arithmetic (and no LLM has been able to learn arithmetic) is simply nowhere close to human-level "AGI".
Arithmetic is the foundation of all of mathematics.
An AI that has not learned arithmetic has not learned mathematics.
Not surprisingly, common crows are better at arithmetic than LLMs.
I guess we’ll have to replace the term “birdbrain” with “LLMbrain” (then again, lamebrain already covers that)
https://www.smithsonianmag.com/smart-news/crows-can-count-up-to-four-like-human-toddlers-study-suggests-180984420/
LLMs are dead. Long live LLMs.
La, la-la, di-dee-da
La-la, di-dee-da, da-dum
Sing us a song, you're the AI man!
Sing us a song to-nite!
We're all in the mood for absurdity
And you've got us ranting all night!
I've mentioned some of my experience with cooked benchmarks but obviously cooked product announcements have always been part of the tech game.
After the System/360 mainframe was launched, IBM was famous for announcing machines that weren't delivered, partly because it didn't know what customers would want, but mostly as preemptive strikes against competitors.
Gary, can you comment on the update from Mislav Balunović where they claimed that Gemini 2.5 got 24% of the possible points on the Math Olympiad problems?
I find this deeply suspicious, since all the previous models, including Gemini 2.0 Flash Thinking, released a month and a half earlier, scored in a similar, lower range, but I don't really know how plausible the jump is.
I am wondering whether Google might have employed some quick-and-dirty methods to boost its apparent performance through data leakage, such as putting problems and answers in the pre-prompt or fine-tuning on a more limited dataset that included them. To me, such a theory implies a lot of effort (and a lot of dishonesty) for little gain, but then again, maybe appearing better than the competition is lucrative enough to people at Google that such underhanded techniques become an option.
Gary, I really think you need to do a deep dive or analysis of the benchmark tests. The greater public needs to know. From what I have looked into... it is a racket.
It's like that scene in The Big Short where they go to the credit-rating agencies and ask why something that is clearly junk is being rated AAA, before coming to the realisation that the agencies' clients are the very companies whose products they are meant to rate.
Frontier labs have tens of billions of dollars and $30B+ stock valuations riding on the performance of these benchmarks. I would not put it past the snake-oil AI bros to claim AGI has been achieved by releasing some shoddy bar charts of these benchmarks. Even more worryingly, as pointed out, AI is making very little actual progress but is being gamed to appear otherwise via compromised benchmarks.
The most common issue seems to be that they are putting benchmark questions and solutions into the training set. This should be automatic disqualification. But catching it would require AI companies to fully disclose their training data, which we know they deliberately hide because it would expose so much (piracy, cheating, copyright violation, ignoring robots.txt, violating ToS, etc.). The most recent case was OpenAI funding FrontierMath while also being given secret access to its dataset. The only reason the public found out was a leak!
And the LLMArena benchmark... it's a joke. The recent top models only need 10K votes to make the leaderboard, and big tech firms have 150-200K employees. IMO, Meta has deliberately fingerprinted Llama 4 so that those in the know can manipulate votes and boost its Elo. Just look at the voting dataset LLMArena released: Llama 4 has a very distinct writing style, uses very distinct formatting (the way it uses bold, paragraphs, and bullet points), is nearly always the longest response by far, and nearly always has the highest emoji usage. I think most people could guess which response is Llama 4 after just a few questions.
This isn't anonymous. Llama 4 can be identified easily even from a pure text stream!
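To make the fingerprinting claim concrete, here is a minimal sketch in Python of how a rater in the know could pick a stylistically distinctive model out of an anonymized A/B pair using only surface signals. Every function name, threshold, and weight here is hypothetical and invented for illustration; none of it is derived from the actual LLMArena voting data or from Llama 4 itself.

```python
import re

# Rough emoji/symbol ranges; good enough for a toy detector.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def style_features(text: str) -> dict:
    """Extract a few crude stylistic signals from a response."""
    lines = text.splitlines()
    return {
        "length": len(text),
        "emoji_per_char": len(EMOJI_RE.findall(text)) / max(len(text), 1),
        "bold_spans": text.count("**") // 2,  # markdown bold pairs
        "bullet_lines": sum(l.lstrip().startswith(("-", "*", "•")) for l in lines),
    }

def looks_like_target_model(text: str) -> float:
    """Score 0-1: how 'long, emoji-heavy, heavily formatted' a response is."""
    f = style_features(text)
    return (0.4 * (f["length"] > 2000)
            + 0.3 * (f["emoji_per_char"] > 0.002)
            + 0.2 * (f["bold_spans"] >= 3)
            + 0.1 * (f["bullet_lines"] >= 5))

# Usage: given an anonymized A/B pair, a colluding voter just picks the higher score.
response_a = "Sure! 🎉 Here is the answer.\n**Key points:**\n- one\n- two\n- three\n" * 40
response_b = "The answer is 42."
print("A:", looks_like_target_model(response_a), "B:", looks_like_target_model(response_b))
```

The design point is simply that once a model's outputs are this recognizable, "blind" side-by-side voting is no longer blind, and a relatively small number of coordinated voters could nudge its Elo.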
In 2023, I said I expected the hype to run for about five years before the unrealistic AGI expectations ran out of steam and we saw a 'dotcom-bust'-like reaction. That would mean 2027 (2022 was the start). I still keep that in mind for now. I may very well be wrong, but I still feel the dreams and myths are so strong that they will have a longer life than one would expect from a rational/economic/etc. perspective. So far the money is still streaming in to be burned in the furnace of hype. And some of it is still untapped and vulnerable to FOMO.
“Not one has to my knowledge publicly asked the difficult questions about data leakage and data contamination”
It’s actually much worse. They not only don’t ask the questions (which they already know the answers to), but they purposely don’t acknowledge the issue because it’s advantageous not to: it makes their bots seem “smarter” and more capable than they actually are.
When LLMs impress on "reasoning" questions, it's always because they're mimicking something from the training data. And the AI companies running their goofy little evaluations have always made half-assed efforts, at best, to prevent this. Remember how the GPT-4 "system card" claimed that contamination wasn't an issue because they'd failed to find a handful of randomly chosen 50-character exact string matches from benchmark questions in the training data? That seemed less like an attempt to protect against contamination and more like an attempt to put on a show of appearing to have made an attempt to protect against contamination. And then, sure enough, right after GPT-4 came out someone found that it aced Codeforces questions from just before the training cutoff and flunked similar ones from just after the cutoff. And of course Codeforces was one of the benchmarks OpenAI used to sell the world on the awesome reasoning powers of GPT-4. Derp de derp.
Intentionally inserting benchmarks into the training to dupe people would simply be a more brazen and more cynical version of what's been going on the whole time.
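For reference, the kind of "exact substring" contamination check described above can be sketched in a few lines. This is a toy reconstruction under stated assumptions, not OpenAI's actual pipeline; the function names and example strings are invented. It shows why the check is such a low bar: a verbatim copy is caught, but a trivially reworded or reformatted copy of a leaked question slips straight through.

```python
import random

def sampled_spans(question: str, span_len: int = 50, n_samples: int = 3) -> list[str]:
    """Pick a few random fixed-length character spans from a benchmark question."""
    if len(question) <= span_len:
        return [question]
    max_start = len(question) - span_len
    starts = random.sample(range(max_start), k=min(n_samples, max_start))
    return [question[s:s + span_len] for s in starts]

def flagged_as_contaminated(question: str, training_text: str) -> bool:
    """Flag the question only if some sampled span appears verbatim in the training text."""
    return any(span in training_text for span in sampled_spans(question))

# A verbatim copy is caught...
q = "Let n be the smallest positive integer such that n^2 + 1 is divisible by 5. Find n."
print(flagged_as_contaminated(q, "...filler... " + q + " ...filler..."))  # True

# ...but a lightly reworded copy of the same leaked question slips right past.
reworded = ("Let  n  be the smallest positive integer such that "
            "n squared plus 1 is divisible by 5. Find n.")
print(flagged_as_contaminated(q, reworded))  # False
```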
“Two things are infinite, as far as we know – the universe and human stupidity.”
Stupidity and greed are interchangeable.
"While I can't report the veracity of the rumor here is a bunch of negativity I have heard anecdotally from Reddit"
People have known for months that there are only marginal returns to scaling up training because of the data wall.
https://open.substack.com/pub/matthewharris/p/beyond-the-scaling-laws?r=298d1j&utm_medium=ios