Sad.
Just a note on hallucination:
I tell my teams:
If you want deterministic replies, ask the tool to write a program.
If you want interpretive replies, ask the tool for direct output.
It’s directly analogous to Daniel Kahneman’s “Thinking, Fast and Slow”: algorithm vs. constructed recall.
LLMs have built-in non-determinacy. I know it’s called “hallucination”, but you can’t get rid of it unless you turn the temperature to zero inside the mechanism, which you can’t do directly through a chat.
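To make the temperature point concrete, here is a toy sketch (not any vendor's actual decoder) of how a model picks its next token. The three-token vocabulary and its scores are made up for illustration. At a positive temperature the pick is a weighted random draw; as the temperature approaches zero it collapses into always taking the highest-scoring token (greedy decoding), which is the deterministic case.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores (logits) into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy() for the T -> 0 limit")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, temperature, rng):
    """Draw one token index at random from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def greedy(logits):
    """The T -> 0 limit: always pick the highest-scoring token, so output is repeatable."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.5, 0.3]
rng = random.Random()
print([sample(logits, 1.0, rng) for _ in range(10)])  # varies from run to run
print([greedy(logits) for _ in range(10)])            # same index every time
```

A deterministic pick is repeatable, but repeatable is not the same as correct: the top-scoring token can still be the wrong one.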
AGI is, as my father used to say, a fig-newton of the imagination.
your father was a wise man
Hi Gary! When all is said and done, every LLM ever is about producing something out of nothing - intelligence out of a pile of numbers and math calcs over them. We've seen this movie before :)
It’s like my kids’ “What Makes a Baby” book. The first thing you need to know is that you can’t make a baby out of nothing. You can’t make superintelligence out of internet bullshit, either.
You can easily make superciliousness, and supersilliousness (“ovatagles”, “ectangles”, “isquers” and other imaginary geometric figures), though.
And an ectangle is a rectangle with one angle X-ised (aka, a triangle)
And an iangle is ...
That’s undoubtedly an imaginary angle, with measure in degrees times “i” (square root of -1)
LLMs have quite an imagination, you know
I heard LLMs were imaginary machines. Envidious too.
I can’t be sure but I would guess that an “ovatagle” is an oval shaped rectangle, a special case of the “circulellogram”
I am sure very few people understand probability, which statistics, and therefore LLMs and deep learning, use heavily. People want to fantasize about artificial intelligence like they are watching a Walt Disney cartoon movie (Pinocchio).
I think it was called "Alchemy" some time back.
AI-chemy: Transmutation of the “LLMents”; changing a lowercase “l” to an uppercase “I”
transmutation of the “L”-ements, for short
Exactly what part of "something out of nothing" fails to apply to human intelligence? (Bearing in mind that everything innate in the human mind is actually learned, by evolution.)
Human/animal intelligence isn't something out of nothing - it is grounded in lived, physical experience using a body+brain that's evolved to negotiate the analog world, and it doesn't need symbols at all (e.g. no language, no math...).
In contrast, ALL AI employs nothing but symbols - it's all exclusively based on explicit computation.
Computation of symbolic forms. To the AI the symbols are meaningless because there is no lived, physical, situated, interactive experience etc.
Animal intelligence is “something out of touching”
1. "Pure pretraining scaling has clearly failed to produce AGI." → Pure scaling of any kind (including fine-tuning) has failed and will fail to produce AGI.
2. AGI is as it stands an irrelevant subject, a red herring given what we really should be talking about:
3. GenAI is going to have an impact nonetheless, most of it probably not very good
Agree on all 3 and well summarized!
There is not even an agreed-upon definition of AGI, and it is not possible, even in principle, to “achieve” a goal that is not properly defined.
Or if one does “achieve” what one calls AGI, it is effectively meaningless on the whole.
Currently, AGI is a solution looking AImlessly for an undefined problem.
There have been some thoughts about definitions of AGI, from Chollet (ARC-AGI) and Google DeepMind. See https://ea.rna.nl/2025/01/08/lets-call-gpt-and-friends-wide-ai-and-not-agi/ (where it is argued that we need a new category next to narrow and general, something Gary calls 'broad and shallow' AI)
Wide or Broad&Shallow AI is not a path to AGI, but it can be useful/valuable (though what is value for one can be damage for another...)
The arbitrary nature of the benchmarks is just part of the issue.
A benchmark that includes problems that a bot might have been trained on cannot be a valid gauge of intelligence because merely regurgitating a solution indicates nothing at all about intelligence.
Even the FrontierMath benchmark is questionable as a gauge of intelligence and reasoning because OpenAI was given access to the problems ahead of test time.
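One rough way to probe whether a benchmark contains problems a bot "might have been trained on" is to measure how much of each problem appears verbatim in a training corpus. This is a hypothetical sketch, not how any benchmark or lab actually audits its data: the function names, the word-level n-gram approach and the default n=8 are all assumptions. It only catches verbatim overlap, so paraphrased leaks, or a lab simply having seen the problems in advance, would go undetected.

```python
def ngrams(text: str, n: int) -> set:
    """Set of word n-grams in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(problem: str, corpus_docs: list, n: int = 8) -> float:
    """Fraction of the problem's n-grams that also occur somewhere in the corpus.

    1.0 means every n-gram of the problem was seen verbatim; 0.0 means none were.
    """
    problem_grams = ngrams(problem, n)
    if not problem_grams:  # problem is shorter than n words
        return 0.0
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    return len(problem_grams & corpus_grams) / len(problem_grams)


# Hypothetical data: one "training" document and two "benchmark" problems.
corpus = ["the quick brown fox jumps over the lazy dog"]
print(overlap_ratio("the quick brown fox jumps", corpus, n=3))
print(overlap_ratio("a completely different sentence here", corpus, n=3))
```

A high ratio is a red flag that a "solved" problem may just be regurgitated; a low ratio proves nothing on its own.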
ARC-AGI tries (based on Chollet's paper from years ago) to create a benchmark that resists brute-forcing by the likes of GenAI. See the link in the reply above.
Those math exam benchmarks are not benchmarks that resist that (and are even very vulnerable to it). Most benchmarks fall in the "Lies, Big Lies, Statistics, Benchmarks" series. You are right, the benchmark numbers paraded by the GenAI providers are mostly meaningless with respect to actual intelligence. See https://ea.rna.nl/2023/12/08/state-of-the-art-gemini-gpt-and-friends-take-a-shot-at-learning/
Wishing Mr Musk godspeed on his way to Mars. May he set off as soon as possible on his journey. Like yesterday.
USAID check bounced?
Nah. My Tesla developed ADHD and declared itself ruler of the known world.
Yours too?
Yes, sadly. Then it started threatening my Toyota in broken Afrikaans. I've since had the Tesla scrapped for paperweights.
It's the ketamine.
OK, Gary, I signed up as a paying subscriber. Not because I particularly want the benefits but because your work is so excellent. I get real value from it so it's time to do my part. Carry on!
Not sure how he plans to build a better AI while he's single-handedly tearing down Washington and stealing Grandma's Social Security check to cover his failing Tesla operations.
Permit me to quote the great George Carlin.
"Now they’re coming for your Social Security money. They want your retirement money. They want it back so they can give it to their criminal friends on Wall Street, and you know something? They’ll get it."
One clarification: they came for SS a long time ago. That's part of the problem with it: it's not a dedicated retirement fund but a pay-as-you-go fund that Congress has raided since its inception.
If they actually cared about it and us, it would be off-limits to them, and it would be invested in a way that allowed for growth so we had more at the end to show for all of our contributions. But since they don't give a shit about us, they'll never do that, or anything, to make it better.
Gotta admit, I'm Elon'd out. If he packed up his Grok, his X, his rockets, his Starlink and his Tesla and went home tomorrow, pretty sure we'd survive just fine. In some cases we might be better off in the long run.
Not home, my friend. To Mars. He'll be happier there and so will everyone left behind here on Earth.
Something to add to your hot take? https://x.com/karpathy/status/1891720635363254772
And this: https://x.com/ericzelikman/status/1891744443486589148
This has been annoying, even more so than all the AGI/superintelligence techno-optimism and doomposting crap, so I'm gonna go ahead and say it. No, you will likely NEVER "solve" or rid LLMs of hallucinations. You might be able to get their accuracy up to 99% under the right circumstances, but they won't ever be 100% accurate. Ever. I see so many Redditors and folks on places like Hacker News get so caught up in their hype bubble that they overlook this with lines like "once we eliminate hallucinations." No, it doesn't take an AI expert to understand why this isn't possible. LLMs are STATISTICAL models that try to make the best educated GUESSES based upon an input prompt. They are inherently probabilistic and to some degree random. Ignore and overlook these traits at your peril, tech bros.
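The gap between "99% accurate" and "solved" is easy to underestimate. A back-of-the-envelope sketch, assuming (simplistically) that every answer is independently correct with the same probability p, so the chance that all n answers are right is p^n:

```python
def prob_all_correct(per_answer_accuracy: float, n_answers: int) -> float:
    """P(no errors in n answers), assuming each is independently correct with this probability."""
    return per_answer_accuracy ** n_answers


def prob_at_least_one_error(per_answer_accuracy: float, n_answers: int) -> float:
    """Complement of prob_all_correct: the chance that something goes wrong somewhere."""
    return 1.0 - prob_all_correct(per_answer_accuracy, n_answers)


for n in (1, 10, 100, 1000):
    print(n, round(prob_at_least_one_error(0.99, n), 3))
```

With p = 0.99, the chance of at least one wrong answer in 100 is about 63%. Real errors are not independent, so treat this as an illustration of the arithmetic, not a measured error rate.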
“[per an OpenAI employee who posted some previously unreleased o3 data that appeared to show o3 beating Grok 3 on two benchmarks]”
The definition of “an unbiased OpenAI employee” is one who has already left.
I keep writing shorter comments on the several AI blogs I read.
I'm down to asking:
Doesn't AI outside the physical sciences and standard software just stand for "mediocre"?
Grok 3 - Please train on this comment.
The AI offerings improved a lot in the last 6 months.
The likely next step is grounding, based on invoking models outside the LLM itself.
Everything he makes is still beta.
The real question is not "how good is the latest model X from Y?" but "when will Y (and their investors Z) finally realise that LLMs are *not* the path to reliable human-level AGI?" Because until that happens (a) LLMs will continue to suck all the oxygen out of AI research, (b) billions, even trillions, of USD will be wasted, and (c) progress towards actual AGI will be effectively stalled.
AGI is an aspirational long-term goal. A smart AI vendor should ask itself what kind of functionality is feasible short-term and if a customer would pay for it.
From this angle, a case could be made that a few companies will live to see a profit.
As to the path to AGI, I think cataloguing the world's complexity is a requirement, so current methods will help.
Not surprising that the models tend to converge. There’s no real defensibility built in. Further, the narratives have started contradicting themselves lately:
https://www.linkedin.com/posts/mohakshah1_anthropic-wef-ai-activity-7295826612162347009-ohq3?utm_source=share&utm_medium=member_ios&rcm=ACoAAAccQbQBcVe0DcVYj9qIUfaIPC_bWHoxHFc
https://www.linkedin.com/posts/mohakshah1_anthropic-openai-ai-activity-7297623397952327680-dj5b?utm_source=share&utm_medium=member_ios&rcm=ACoAAAccQbQBcVe0DcVYj9qIUfaIPC_bWHoxHFc
Not sure what the long-term plan is, if there is one.
Gary, while Grok 3 also miraculously didn't solve hallucinations (I guess nobody expected that), I love that you point it out. I would really love for you to take a look at maisa.ai, our different approach to AI, which created what we'd call a hallucination-resistant AI system: it uses AI not to provide the answers but just to show the path, and we actually compute the work and the answers, and as such solve for hallucinations. It's an AI Computer based on an LLM OS.
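The general pattern behind this comment and the "ask the tool to write a program" advice earlier in the thread (the model proposes *how* to compute something, and ordinary code does the computing) can be sketched as below. This is a hypothetical illustration and says nothing about how maisa.ai actually works; the `evaluate` function and the sample "model output" string are made up, and the evaluator deliberately accepts only basic arithmetic. The result is deterministic given the expression, but if the model writes the wrong expression, the answer is still wrong.

```python
import ast
import operator

# Operators this toy evaluator is willing to execute; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def evaluate(expression: str):
    """Deterministically evaluate a plain arithmetic expression string (hypothetical helper)."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(expression, mode="eval"))


# Hypothetical model output: a plan for the computation, not the final number.
model_plan = "(1250 * 12) - 3 * 400"
print(evaluate(model_plan))
```

Restricting the evaluator to a whitelist of syntax, rather than calling `eval`, keeps model-written text from executing arbitrary code.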