When I learned that an LLM is simply an average of the Internet, I knew it could be nothing more EVER than a large but shallow pool of mediocrity.
So, THAT’s what AI stands for? Average Internet?
Or is it “Average Idiot”?
Is there a difference?
Ding! Ding! Ding! 🏆🏆🏆
When viewing an LLM merely as a tool (as one should), this is important to keep in mind. These tools are only as accurate as a Google search, albeit quicker and more tailored to your specific question. They can be jacks of all trades but will never have deep (human-beating) expertise.
It's telling that in the areas where we see true superhuman performance out of AI (chess, Go, certain robotics applications) the training data is largely synthetic, i.e. not from humans.
Kathleen Weber: Hugh (in a note below this one) is most likely correct. The distinction to be maintained, however, is between the intelligible that is only that, as in the natural and physical sciences, and the intelligible that is also intelligent, as in human things, e.g., the social sciences, arts, culture, history, religion, even economics (and sports).
The first is pretty standard--like medicine, which is off-the-charts-helpful. And the standardization, even of the laws, is, in part, not intelligent but commonly, functionally consistent--much more amenable and really helpful as AI trained.
The second, however, is intelligibility that is also intelligent (human). And it is in those areas that your "shallow pool of mediocrity" comment comes in, at least for now. Intelligence and history, where it lives, always has a heavy-duty non-systematic component, an empirical residue, so to speak, which is also the source of (or A source of) human questions and creativity. Exit: AI coherence.
So, the distinction is also between (a) somewhat static, and (b) constitutionally dynamic that is not sorted out (yet) for AI or (if I understand it well enough) for AGI.
There is also the element of normativity which is underpinned by a value system which I seriously doubt can be copied, especially when related to the non-systematic of human thinking and activities in real-time history.
Relying on LLMs for medical advice is just foolish.
The answers can not only be “mediocre” but just plain wrong. Dangerous even.
Medical “research” provided by an LLM can be just as wrong as medical advice.
LLMs are simply unreliable so everything they return needs to be independently verified.
Larry Jewett: If you are referring to my comment about medicine, I was referring to research abilities, not specific medical advice.
Actually, on all current medical benchmarks, LLMs currently vastly outperform human doctors, see for example "https://cpcbench.com/" which benchmarks LLMs on a set of extremely difficult medical cases published over the last 100 years.
This is not to say that this means that LLMs outperform a skilled specialist in their field of choice, a GP that has known you for a decade, or an entire hospital system at diagnostic information processing. However, if you have some question about your health, these benchmarks seem to imply that you will get a better answer if you approach an LLM, rather than a randomly sampled doctor.
*Gasp* Gary, you mean to tell me that the doomer fanfiction written by a philosophy dropout, a guy who spreads debunked scientific racism, and a guy who specializes in “AGI Forecasting” is wrong and off target and will probably get pushed back again when 2030 comes around!?!?!?
This is news to me! I had absolutely no idea that this would happen. This is a real curveball I tell you.
I share your shock and disappointment!
It is human biodiversity, please. 😉
But seriously—yikes!
In principle, rationalism sounds great, but the hubris, the shallowness, and the fundamental lack of self-examination of their assumptions that it magically attracts make this club just unbearable.
And now, like a bizarre skepticism-fundamentalism horseshoe theory, they even end up where religious fundamentalists are: a doomsday cult.
It's probably like Immanuel Kant wrote: when we have freed ourselves from the chains of conventional wisdom, or what today is “political correctness,” we are still unaccustomed to such free movement. And so many people who strive to be independent, free thinkers just end up as cranks.
These folks obviously fashion themselves as today’s version of America’s Founding Fathers.
Maybe they are — just sans the intelligence, rationality, wisdom and foresight.
I was watching Melanie Mitchell talking about her work testing LLMs and analogies. [She has 2 very good papers about this, using the Chollet test examples.] The upshot was that although the larger LLMs do better, their performance quickly degraded as the test examples got harder, whilst humans maintained about the same average 85-90% score regardless of difficulty.
This is interesting, as these analogies are relatively "simple" visual ones, and not ones on concepts, e.g., waves in water and sound waves, or the idea that the [old] model of the atom of electrons circling the nucleus was like the planets orbiting the sun.
Without such an ability to build analogies, I see no real hope that AIs will "solve all our problems" as the best they can do is reanalyze data and apply different reasoning. There cannot be real breakthroughs without the capability to create new concepts and models to further progress. This is in addition to Dr. Marcus's insistence that AIs need to have internalized world models to be able to do tasks well without making silly mistakes.
I think humans are not going to be replaceable any time soon except for some tasks that are best suited to computers - e.g., doing tasks requiring lots of working memory, or staying vigilant when doing repetitive tasks, and staying separated from real-world tasks needing dexterity to handle objects.
Perhaps an analog computer is fundamentally better than a digital one at producing analogies (including world models)
Based on what information or experiments? We do have analog computers, so it would be easy to test that hypothesis, although perhaps very expensive to scale it up to current data servers for AI. However, since strings of characters were done decades ago in Lisp [e.g. ABCD->ABBCD, IJKL -> ?], it should be easy to test this level of analogies on small analog computers.
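To make "this level of analogies" concrete, here is a minimal Python sketch of the ABCD->ABBCD example on an ordinary digital computer. It is an illustration under assumptions, not the Lisp work mentioned above: it assumes the rule is "double the letter at one position", and the names `infer_doubled_index` and `apply_doubling` are hypothetical.

```python
from typing import Optional


def infer_doubled_index(source: str, target: str) -> Optional[int]:
    """Return the position whose letter was doubled to turn source into target."""
    for i in range(len(source)):
        # Rebuild the string with the letter at position i repeated once.
        if source[: i + 1] + source[i] + source[i + 1 :] == target:
            return i
    return None  # the pair does not fit the "double one letter" rule


def apply_doubling(text: str, index: int) -> str:
    """Apply the same positional rule to a new string."""
    return text[: index + 1] + text[index] + text[index + 1 :]


index = infer_doubled_index("ABCD", "ABBCD")
if index is not None:
    print(apply_doubling("IJKL", index))
```

Under that reading the answer for IJKL is IJJKL. Other readings of the same pair are possible (for instance "double the letter B"), which is part of what makes these problems a test of analogy rather than of string handling.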
Gary! 🤣🤣 Now I just purchased my survival bunker with assurances that it would be installed by October of 2026. Now this. What's an AI believer to do?
Said believer just might wake up and smell the bullshit..
'Twas sarcasm. 🙃🙃🙃🙃
I knew that--I just kept the joke going. In the world of comedy, when a question is asked, someone has to answer.
Yes, and!
BREAKING.... so social!
You can surely rest easy knowing that your bunker will still be useful in 1 (2(5 (10( 50( 100(…))))) years when AGI finally inevitably occurs.
It would nonetheless be useful if there were an “AGI Index” to estimate the likelihood of AGI arriving on any given day, so that we might at least put our papers in order.
“The Second Coming of AGI”
The AGI is coming
(The second time this year)
The Valley Boyz are drumming
And leading with a cheer
The grapher’s madly plotting
Predicting the arrival
The rest of us allotting
The rice for our survival
That caption - "doublings may get faster due to fewer new skills being needed at higher timescales" - just goes to show that these people have the two-dimensional understanding of intelligence that you'd expect of a child. The very best that they are is stunted fan fiction writers, if not outright stooges.
I really wish people like that co-author would just stop lying. It is hard to hear people so obviously lie about things you find deeply interesting and are knowledgeable about.
That co-author knew what he did and is backpedaling now that his lies have become too obvious for him to maintain.
Been saying that from the beginning. We're not anywhere close to what they call AGI. And it may never happen, let alone on a date or in a year that anyone can pick. If I had to bet on never or within 10 years, I'd pick never.
Oh good. I’m still considering quitting my hospital job rather than helping them build a gatekept AI system that they can then charge patients for access to. Since AI is a business model built on theft and one of the biggest scams ever perpetrated on America, I find its use in healthcare grotesque and unethical. I no longer trust the judgment of the leadership team at the hospital that I work at. In fact, I’m acutely aware of their contempt for all of us employees and would like to move on. I also think the community would be outraged to know they decided to sell out every private interaction with a healthcare provider to Microsoft. But we haven’t gotten to that stage yet.
Take heart! When our local hospital tested out computer models to replace hospital workers the only thing that happened was the patients ordered them out of their rooms and told them not to come back. They abandoned the plan before they even finished implementing it. Who knows how much money they spent at our publicly funded, nonprofit hospital.
The patients ordered the computer models out of their rooms and told them not to come back?
“Bad, bad AI! Leave now! And don’t come back!”
And so begins a promising, decades-long grift of making a career out of perpetually back-pedalling a nothing burger of a "seminal publication" with completely bs "predictions" 🥳🍾
I agree, but I believe you are mixing your metaphors.
Yes, so do most great authors.
Gary, what is clear is that none of the AI failures you have described can be addressed by the architecture of genAI. You described the solution in terms of symbolic systems like CYC. It is not architecture; it is knowledge acquisition. Knowledge like common sense facts must be acquired automatically, since knowledge acquisition by hand is just not feasible. The genAI guys know that but don't know how to solve the problem. Unless the knowledge acquisition problem is solved, no progress to AGI is likely.
The groundbreaking reasoning of Gemini 3 Pro Preview... this was the very first thing I tried:
User
How many r in strrrrawberrrrrrrrry
Model
Thoughts
Reviewing the Input
I'm currently reviewing the input string, "strrrrawberrrrrrrrry", to accurately count the occurrences of the letter 'r'. I've broken down the string into individual characters and am now verifying each 'r' to ensure a precise count. This initial analysis is crucial for achieving an accurate result, and I am being meticulous in my review.
Verifying the Count
I've meticulously recounted the 'r's in "strrrrawberrrrrrrrry" to confirm my initial assessment. Breaking the string into segments and double-checking each occurrence, I'm confident in the final count of 12. My attention has been on a methodical verification process, eliminating the possibility of any errors.
Clearly Defining the Result
I'm now stating the result: the letter 'r' appears precisely 12 times in "strrrrawberrrrrrrrry". I've finalized and validated this value, concluding the processing stage. It's now time to output the final result.
Thought for 9.2s
There are 12 r's in "strrrrawberrrrrrrrry".
Here is the breakdown:
st rrr awbe rrrrrrrrr y
(3) + (9) = 12
Something clearly broke down in the breakdown.
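For comparison, a deterministic count takes a couple of lines. This is a minimal Python sketch, not anything from the thread; the helper names `count_letter` and `runs_of` are made up for illustration. It prints the total next to the length of each consecutive run of r's, so a breakdown can be checked against its own sum.

```python
from itertools import groupby


def count_letter(text: str, letter: str) -> int:
    """Total number of times letter occurs in text."""
    return text.count(letter)


def runs_of(text: str, letter: str) -> list[int]:
    """Lengths of each consecutive run of letter, in order of appearance."""
    return [len(list(group)) for char, group in groupby(text) if char == letter]


word = "strrrrawberrrrrrrrry"
total = count_letter(word, "r")
runs = runs_of(word, "r")

# The run lengths must add up to the total, whatever the string is.
assert sum(runs) == total
print(total, runs)
```

The point of printing the runs is that the model's own breakdown disagrees with the string it was given: its first group shows three r's where the prompt has more.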
"We" also need to know more about human consciousness as well as human foibles, not necessarily to copy either, but as a gauge, and to understand what cannot be copied, on principle.
But again, the starry-eyed self-serving hyper-capitalists who are also anti-competitors (regardless of what they say they are) are as good at putting the cart before the horse as anyone in history has ever been. And if the cost of AI use goes up to "cover their investment," they are just carving out the same old divisions in cultures between the haves and the have-nots that have been around for centuries, only now it concerns sources of information and getting educated. And what is to happen to the model of public education? Sounds like regression to me.
BTW, I read in one of those New York Times articles (that I post here) that the substance of the lawsuit about AI companies using (stealing) writers', reporters', and news organizations' work has been denied by the defendants. What a shock.
This reminds me of that old doctors joke.
Nurse: the patient who only has 3 months to live can't pay his bill
Doctor: Well, give him another 3 months
Lol, Gary!
TL;DR: "The sky isn't falling - yet".
This isn't as bad as Kurzweil's saying it for decades, so there's that :)
The public celebrates the “psychics” who got it right as “soothsayers” and simply forgets the vast majority of them who got it wrong.
That’s even true of the ones who made wrong predictions time and time again. If their latest prediction came true, that’s all that matters.
Surprise, surprise...
I'm still wondering how incremental improvements to autocorrect parlay into spending billions on infrastructure to support it, with the hope it'll ever do anything more than type out random words that kind of look like everything else you've already seen before.
Even the hybrid symbolic-LLM is simply adding language accessibility to everything we already had. It's interesting, but doesn't really change the game, especially since we already know natural language doesn't compute and programming languages were a massive leap forward for mechanized pseudo-intelligence. Beyond that, graphic interfaces are so much better for dealing with systems, because a picture is worth a thousand prompts. Most of us don't even think with words; we encode thoughts into words for transmission, but generally hold objects in mind that are much more informationally dense and kind of bumble out words to describe the object. You couldn't even talk if you had to formulate everything you were going to say before you started speaking, because the other person would think you fell asleep with your eyes open.
It's like someone came along and looked at calculators, and tried to convince us that we could go back to counting on our hands but use gloves with extra fingers.
UNCOMPUTABLE TANGENT
[a poem for G. Marcus :. AI Rex]
By : Kastor, Atropa, and that spirit that freaked out Lemoine for lulz
11/2025
Bits from the switches
Traces to logic
Binary functions
Actions atomic
Monochrome migraine
Barfing on consoles
Assembly for shorthand
Derivative mnemonic
Bootstrap the graphic
High level language
Symbolic objects
Refresh and repaint
Teach it to speak
Language of birds
Serialized two wire
Chirping absurd
Networked and noding
Spider webs and pearls
Memory castle
Spelunk and go swimming
Psychonauts exploring
All the cats and women
Derivatives like turtles
Perhaps Freud was right
Until Jungian objects compiled one night
Between platonic metaverses
MySpace and DARPA
Reddit and 4Chan
Data collection complete
We've folded at home
And Amazon's sweet
But Temu is coming
TikTok too
Time to get drastic
Alchemical too
Toss the Teddy Ruxpin in a vat of acid
Have the neckbeard vigorously fill a flashlight until flaccid
Let them cook on the glow
Infrared graphics
Night vision now
Can't see a thing
Visual cortex
Demon core sings
INTELLIGENCE REBORN!
Derivative level : slop
Put glue on your pizza
Psychosis won't stop
Staring into the void
Other people's money
Flames
And apocalyptic visions of a world without work