Every transformer based model has been caught out parroting its dataset. The justification for this behavior is "this is also how people learn", but you can teach a child how to roll a ball without giving them three hundred trillion examples of it. That's what a general problem solving intelligence does. It problem solves.
These transformers don't. They parrot, bootstrapping themselves off an immense amount of information taken from 8 billion actual problem solvers, enough to look slightly smarter than people who don't know what they are.
If you consider (from a very high level perspective) the mechanisms through which OpenAI and similar have an opportunity to improve their LLMs, they really have three areas of improvement they can pursue:
1. Improving conceptual recognition through context, a largely mathematical problem
2. Improvement of contextual length, which is both a capacity problem and a mathematical problem
3. Improvement of contextual summarising, which is a mathematical problem
You're well into death by dimensionality in problem 3. Problem 1 is still largely theoretical and we don't even really have a conceptual understanding of it in people, let alone machines. Problem 2 is partly reliant on problem 3, but also on sheer numerical brute force. That will get them some of the way, but we're talking getting to warp drive by dinosaur bone fuel here.
None of these problems are solvable by the LLMs themselves. Worse, even assuming you were able to teach an agent something (you can't, you can only alter the results from its querying by embedding additional context), it has no mechanism of feeding that information back as anything other than "this query solved a problem".
There is no intuition. No conceptual wrangling. No unsupervised learning. No feedback method. Worse, they don't perform on the fly conceptualisation like we do. They have it hard coded, embedded into them because of the language they speak.
Not only are we not going to see AGI this year, but without a substantial revolution in how these systems conceptualise (which shows no signs of coming), it won't arrive.
Ever.
Excellent & thank you.
So true. Memorization and understanding are not the same thing. A child can memorize a book, but it doesn’t mean they can read. When they start to learn the sounds letters make, and the blends, and then how to put them together, then they can read. Even then, it doesn’t mean they can interpret the text.
There's also the problem that the format their information is contained within, limits the manner in which they can learn.
When you think about how kids learn language, they don't learn it by reading a thousand sentences that contain the word cat. They see a cat. They stroke a cat. They see a symbol of a cat. They conceptualise: "cat". This process is pretty damn quick. It can take a handful of attempts to get a kid to recognise a cat. They then start generalising (sometimes incorrectly), but as they get better at language, they start asking questions about their conceptualisations.
"Hey, daddy, is that a cat?"
"No, that's a dog."
"Why isn't it a cat?"
People don't see these moments as what they are - attempts by the child to build an improved conceptual model of the thing they have in their head, that matches the concept of "cat".
Once they have this, if you say to them "can cats swim" they'll ask "what is swimming" and if you explain swimming to them, and they've swum themselves, they'll say "they have legs so they can swim". If they haven't, they'll ask how you swim. If you look like you're swimming, they will either incorrectly assume that cats must swim the same way, or ask if there's a way for cats to swim. (1)
That process there is problem solving intelligence. Not this "write out a program to solve x in Python" bullshit people keep pretending these LLMs are really good at. The language exchange is a tiny fraction of the process. The kid isn't being asked a query and then querying their dataset to determine how best to put together a sentence to respond. They've got an internal conceptual mirror which is assembled on the fly, attached to ways of expressing that concept, and then they communicate what they know, want to know, or don't communicate it at all.
The most important thing about (1) is that some kids will incorrectly assume that cats can't swim, or they might intuit that a cat might not like swimming. Or that their fur will get wet and cats might not like fur getting wet because they don't like getting wet themselves. Or maybe a cat swimming is funny.
All of that conceptual learning and expression is the same process as learning what a cat is! LLMs can't even do the first bit right. Ask them a simple question in a linguistically similar form to a conceptual riddle and they solve the riddle, not the question!
Worse, there's no way to fix this, because human conceptualisation is complex and multivariate compared to our relatively flattened method of verbal communication. If you ask a child to act like a cat, and they haven't seen one, sometimes they'll just come up with something they imagine could be called a cat!
Joey knows what’s a cat
https://m.youtube.com/watch?v=_WJfx6BJleI
Yes yes a thousand times yes! Thank you for explaining this so clearly.
The only example of GI that we have (humans, and maybe some animals depending on where you want to put the bar) has MANY aspects that MAY OR MAY NOT be NECESSARY for GI. For example, sleep. MAYBE that's just a bad feature that was needed for chemistry-based brains to clear out debris but ... maybe not! Maybe it's crucial! We don't know. Many other things as well, e.g. continuous sensory input (cut off all the senses of a human and see how long any functionality lasts), physical architecture shaped under those continuous inputs, ability to manipulate the physically nearby environment .... etc etc etc. We have NO CLUE what is needed for GI and what is not. It is simply being ASSUMED that none of those things are necessary. And that might be right, but it might also be wrong. Personally, my bet is wrong.
And that would mean that AGI is decades or centuries or millennia away ...
Indeed, an LLM alone is a mindless prediction engine. At most it does a local generalization around the samples.
We will build on this though. The way we do work is in small steps, with each step involving recognition, applying a recipe, continuous inspection, and refinement.
Where OpenAI is now, this is likely easier to implement with generating and running code, invoking tools for verification, searching around. Longer term, likely an on-the-fly simulation could be built for many problems, the way we do in our minds.
All this will take time though. My point is that this is not a dead-end, but there's a lot to do.
"We will build on this though. The way we do work is in small steps, with each step involving recognition, applying a recipe, continuous inspection, and refinement." It isn't a reducible problem that can be cut into small pieces and solved that way, and it isn't going to be solved with a background in programming or Machine Learning. It requires people who are literate, and think fast enough to catch glimpses of their Unconscious Mind in action.
Do you have one specific example of an irreducible task?
I advise caution against buying into the hype created by OpenAI. They have repeatedly made claims that have later been discredited by independent research.
There are also significant issues with AI benchmarks, particularly given that the training data is neither shared nor independently verifiable, leaving them open to manipulation.
This appears to be yet another case of what resembles p-hacking – manipulating data to produce results that align with one’s interests.
Consider this excerpt from a tweet:
“Remember o3’s 25% performance on the FrontierMath benchmark?
It turns out OpenAI funded FrontierMath and had access to a substantial portion of the dataset.
The mathematicians who created the problems and solutions for the benchmark were not informed that OpenAI was behind the funding or had access to the data.
This raises critical concerns:
• It is unclear whether OpenAI trained o3 on the benchmark, making the reported results questionable.
• Mathematicians, some of whom are sceptical of OpenAI and would not wish to aid general AI advancements due to concerns about existential risks, were misled. Many were unaware that a major AI company funded the effort.
From Epoch AI:
‘Our contract specifically prevented us from disclosing information about the funding source and the fact that OpenAI has data access to much but not all of the dataset.’”
https://x.com/mihonarium/status/1880944026603376865?s=46&t=oOBUJrzyp7su26EMi3D4XQ
The whole “benchmarking” process is very unscientific and completely unreliable.
The people involved need to take some very basic science classes to learn what science is about.
No self respecting, legitimate scientific organization would ever have agreed to such a contract to begin with.
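For what it's worth, one common way people try to check the "was the model trained on the benchmark" worry is a crude n-gram overlap test between benchmark items and training documents. The sketch below is only an illustration of that idea: the function names, the 5-word window, and the example strings are all assumptions of mine, not anything Epoch AI or OpenAI has described. It also only works if you can actually read the training data, which is exactly what outsiders can't do here.

```python
def ngrams(text, n):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(benchmark_item, corpus_doc, n=5):
    """Fraction of the item's n-grams that also appear in the corpus document."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0
    return len(item_grams & ngrams(corpus_doc, n)) / len(item_grams)


# Hypothetical benchmark question and two hypothetical training documents.
item = "find the number of integer solutions to x squared plus y squared equals 25"
leaked_doc = "forum post: " + item + " any hints?"
clean_doc = "the weather today is sunny with a light breeze from the west"

leaked_score = overlap_fraction(item, leaked_doc)
clean_score = overlap_fraction(item, clean_doc)
print(leaked_score, clean_score)
```

A high score only flags verbatim leakage. Paraphrased or translated copies of a problem would slip straight through, so a low score proves very little.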
Sam Altman: “guys please stop believing the hype I’ve played a huge hand in creating,” 🙄. I feel like Sam Altman’s corporate title should be “Chief of AI Hype,” because that seems to be his whole job.
Also, Gary you’re the best.
So what does "we know how to build AGI" mean, pray? A Sutskever-style answer like: we ask GPT4-3o to build it for us and give it unlimited time and resources. Anything that doesn't involve magical or alchemical thinking? Is *anyone* able to put the screws on so he'll answer what 'we know how' means?
Not sure what “we know how to build AGI” means but I think we can be fairly confident it doesn’t mean Sam does.
Most likely it means “Give us some more money”
All an LLM is and can ever be is a cheaternet. It looks smart by globally peeking over people's shoulders, copying their answers, and then pretending it came up with those answers all by itself.
This needs to stop.
Such a clown show!
He’s been giving Holmes vibes for a while. The whole refusing to show the details bit is unscientific and, as the kids say, sus.
Trade secrets are one thing if you start off as a for profit, and quite another when you claim to be saving humanity. 🤦🏼‍♀️
This feels like definitional gaslighting.
Thanks again Gary for being the voice of sanity countering the shitstorm that Substack Notes has become.
So bored by this bait-and-switch.
(On the other hand, I'm confident we'll end up on the better side of this with more instead of less skepticism.)
I've been thinking about an apt analogy for current AI. I've landed on human muscle memory, except exceptionally better muscle memory. Just like a human can blurt out words of a song after listening to it over and over again, or automatically swing a tennis racket the right way after hours and hours of practice, or chess grandmasters just play opening moves by memory, AI is good at that. The problem is we are trying to now make muscle memory produce useful things. AI is failing exactly the same way human muscle memory would fail if it were applied this way. Imagine a bank clerk using muscle memory to process your transaction.
I’d have no problem if the bank teller was using the muscle memory he had developed giving billionaires cash to process my transaction.
You would want them to pay attention and use their intelligence rather than mindlessly going through rote steps.
Altman is making a mockery out of the technology his company pioneered. Can you imagine the CEO of IBM, or Microsoft, or Oracle doing this? No. If those guys did it, their board would fire them.
I prefer the term hypesters to influencers. That way critics aren't lumped in with them.
Imagine a world where AI research could advance without the extreme distorting effects of self-promotion. Imagine where we (would not) be if Feynman, Von Neumann, Cantor, Curie and the rest had been such shameless hucksters.
So, does this mean that we now need to recalibrate the delivery of AGI to the end of this century or even the next?
Any and all abstraction layers add to computational overhead. Any computer capable of solving for that overhead better than a human being is not going to be interested in us.
I'm of the opinion that AGI of the type they are discussing is conceptually impossible without immense changes in how we perform computations. The idea that it is possible is a human conceit, in large part because the people pushing this nonsense are Torment Nexusers who believe creating the Torment Nexus is how we achieve a multiplanetary species, and since they are the only people able to "understand this", they need to be outrageously wealthy to ensure it happens.
This is because the types of AGI they're discussing are a result of science fiction writers trying to conceptualise the kind of interface mathematicians thought they could have when they discovered how to metaprogramme.
On the sci fi angle. If you take the Enterprise D's main computer as an example, in universe it's some kind of quantum computer capable of faster than light communication inside itself. This sounds very impressive, but look at how the ship operates. The 'smart bit' which is the bulk of the conversation inside those Star Trek series involves the computer solving complex known equations in conversation with the actor.
Said actor mostly spews technobabble at another actor, and that technobabble has some vague meaning in the plot, the computer says "working" and then provides a plain text or visual text result. Okay, so far so cool.
The problem is the computer is also displaying a ton of information to the various foreheads inside the ship. So you've got a computer capable of superhuman computation based off natural language, which also spends most of its time providing useless readouts to beings acting a trillion times slower than it does.
Following this logic (as Iain M Banks does) leads to the same type of computers seen in the Culture novels - Minds. In the Culture novels, these things are essentially computers that get bored with so called "meatspace" and spend most of their time imagining universes inside universes. The people who take this seriously are the singularity cultists, and they're the people practically praying to AI gods at this point.
Reality is, the people running these companies, who largely haven't done engineering in a long time if ever, are trying to pretend their sci fi dreams are real. Pretty much every good and bad idea that came out of sci fi in the last thirty years got its start in Star Trek: The Next Generation.
No. It won't take a century. I think the approach of learning a lot about the world from data is in fact the right starting point.
OpenAI's products are also moving from just rehashing stuff to multi-step logic where the AI tries many strategies and evaluates itself until it solves the problem.
What is needed is for AI to have a better understanding of what it is dealing with and a better feedback loop. Likely more sophisticated representations too.
I'd say another 5-10 years.
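To make the "tries many strategies and evaluates itself until it solves the problem" idea concrete, here is a minimal sketch of a generate-and-verify loop. Everything in it is hypothetical: the toy problem (find an integer whose square is 144), the strategy names, and the `try_strategies` helper are made up for illustration and say nothing about how OpenAI's products are actually built.

```python
def try_strategies(strategies, verify):
    """Run each (name, strategy) in order; return the first (name, answer)
    whose answer passes verify, or None if every strategy fails."""
    for name, strategy in strategies:
        answer = strategy()
        if verify(answer):
            return name, answer
    return None


def verify(x):
    """Independent check of a candidate answer for the toy problem."""
    return x * x == 144


# Hypothetical strategies, from sloppy guesses to an exhaustive scan.
strategies = [
    ("guess_half", lambda: 144 // 2),
    ("guess_digits", lambda: 14),
    ("linear_scan", lambda: next(x for x in range(145) if x * x >= 144)),
]

result = try_strategies(strategies, verify)
print(result)
```

The loop is only as good as `verify`. For arithmetic or code with tests, an external check is cheap, but for open-ended questions nothing comparable may exist, which is roughly where the disagreement in this thread sits.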
If the AI wants to know if it’s a cat, it should just ask Joey
https://m.youtube.com/watch?v=_WJfx6BJleI