"describe the ways in which large language models (LLMs) can carry out tasks for which they were not specifically trained" - how do they know the LLMs were not specifically trained, have they examined the terabytes of training data or the millions if not billions of instances of RLHF to be able to claim that. To declare that LLMs can do that, the first step would be for the LLM to learn simple arithmetic and demonstrate it with big numbers with a lot of digits (that can not be simply remembered from the training data). Until an LLM can be demonstrated to be able to do that, all claims of a magically emergent AGI are just bla, bla, bla. So, count me among the confused too :) Also, I think 10 years from now people will look back at the current events and claims from prominent leaders in the AI field and just shake their heads in bemused disbelief.
Hi Gary. I basically agree with everything you said. I read the article a few days ago and was rather taken aback, especially by the condescending tone. Given the authors' stature in the field, they should know better than to make the kinds of pronouncements they did about today's systems. Along with Hinton's 60 minutes interview, there seems to be a lot of wishful thinking going around these days. This is shades of the '70s and '80s.
When I had started to read this blog, I believed that AGI might come soon, just behind the next corner. I have learned a lot since then and I don’t think so anymore. From the comments published here I tried to figure out how the LLMs were operating and I found a correspondence which is familiar to me as a physicist and engineer and which is the “black box model”. The “black box model” is a fully empirical model as opposed to knowledge based (laws of mechanics, thermodynamics, etc.) models. The empirical models are made of correlations based on experiments’ results. In their engineering form they are a set of polynomial equations relating outputs to input parameters, with a multitude of coefficients obtained by mathematical fitting the model output to some experimental data. There is no internal logic, schemes or rules, the model represents blindly the data it was fitted on. And it works well in mechanical or chemical engineering applications if a single well delimited phenomenon, or process is represented. It can be efficient for prediction if the user does not cross (even by the smallest amount) the value domain of parameters considered and does not try to represent a situation where an additional parameter is needed (even just a single parameter more). The LLMs have basically the same limitations. But the ambition with LLMs is quite extravagant as they will pretend to describe the entire world. If one wants to describe the entire world with a “black box model”, one would need an infinite number of parameters and an infinite number of coefficients. That is not possible. That’s why obviously LLMs are not very close to AGI. In order to get AGI, or just a reliable general representation of the world complexity, we will need a hybrid approach, a combination of empirical correlations, knowledge based equations and internal strong inferring rules.
Yours and Gary's well-reasoned and factual opposition against hype and nonsense is failing to shake existing convictions. Exactly the same happened in the 1960s, with Life stating in a famous article in 1970 that 'electronic persons' had arrived and Minsky in that article quoted saying that AGI would be there 'within 3 to 8 years'. Minsky then. Hinton c.s. now. History doesn't repeat, but it definitely rhymes. And the voices of realism are lost in a chorus of unfounded convictions. In the discussion on artificial intelligence, the role of human intelligence and how convictions form and are resistant to argument is the real interesting subject.
Oct 18, 2023·edited Oct 18, 2023Liked by Gary Marcus
You two gentlemen are way too polite. Agüera y Arcas and Norvig are not just wrong. They are miserably wrong. There is nothing in generative AI or LLMs that is even remotely useful to solving AGI. To crack AGI, the mainstream paradigm must not merely shift. It must be discarded entirely. A completely new model must arise to replace the current one. Unfortunately, I don't see this revolution coming from the deep learning generation.
PS. In my opinion, generalization and precise timing are key components of AGI.
Replace all instances of "must" with "reasonably" in your comment, and you'll reasonably have a less pseudoscientific answer
Albeit, what science/papers have you written to help solve the issue? Of late, neither Marcus nor several proponents of "X is absolutely not-AGI" claim camp have proposed modern solutions
Seems to me this was written while on auto-pilot, the temperature was a bit lower than Andreesen when he did his techno-optimist manifest, but still pretty much autopilot. You do that when you've given up on thinking for some reason or another. Maybe it's too hard, or too painful. These people just want to party down in the Good Place and leave all the worries to, well, whomever.
A more than welcome rebuttal. What I don’t get it why Arcad and Norvig would go as far as making the argument in the first place, when there is such clear evidence of the contrary...
There's money in hype. That's about as good of an explanatory theory as I can come up with. I don't say these people are getting paid big bucks to say it, but if your in a culture where people are making a lot of wealth from hype then the belief system and world view of that culture will be influential to anyone in it.
Consider Norvig's reply to Chomsky. I found this on the cs224 notes when I was teaching myself ML:
"And while it may seem crass and anti-intellectual to consider a financial measure of success, it is worth noting that the intellectual offspring of Shannon's theory create several trillion dollars of revenue each year, while the offspring of Chomsky's theories generate well under a billion."
This is supposed to be an argument in favour of the statistical machines. It generates money therefore its good science? I'm sure scientology has made some nice bank from its followers compared to, idk, whatever the latest research in knot theory is.
LLMs are in the strongly rising part of the hype cycle. It will pass, either as generative AI hits clear limits or because a new hype cycle resulting from a new approach takes over.
The terminology is kinda confusing. LLMs are a (somewhat-)general technique in AI, in that they're able to do many kinds of tasks without being specifically trained on that task, e.g. few-shot learning. But that's not what capital-G General AI means.
Anecdotally, I've heard of fine-tuning an LLM on training materials a human might use (e.g. a spec or guide) and the LLM getting better at the task. That's maybe the first time I've seen something in "AI" that we might call "learning" if done by humans. Most older supervised learning tasks (even using deep learning) would seem bizarrely rote to humans. Like a human who needed to see thousands of pictures of a cat to be able to identify a cat would be considered to have some kind of severe learning disability. But it's a big leap from "maybe not completely unintelligent" to GAI.
I think the "always a human in the loop" component that Rodney Brooks recently described on his blog (https://rodneybrooks.com/what-will-transformers-transform/), is a particularly important factor here: these systems wouldn't be nearly as impressive if there wasn't a human in the loop.
This is what really limits their utility (and generality) in practice - because of unreliability, they cannot operate autonomously and serve as building blocks of bigger systems. A human with a domain expertise is what makes them actually somewhat useful in practice.
I think they're still pretty impressive, but I guess it depends on what your baseline is. Doing linguistic tasks unreliably is a huge leap over not being able to do them at all.
The success of computational, ungrounded large language models based on the distributional hypothesis tells us a great deal about about how natural languages work but probably rather little about AGI other than to confirm that, contrary to Turing's imitation game, linguistic competency (however defined) is not a good measure of general intelligence.
The observation that linguistic competency can be programmed into a box, just as numeric competency can be programmed into a calculator, challenges our view of ourselves, our understanding of what makes humans intelligent, but this isn't relevant to AGI other than perhaps gently to question what we mean by "general intelligence", as discussed in the article here.
Humans use both numeric calculations and natural language, among other techniques, to reason about the world. Machines can be programmed to do the same but that doesn't make a system like GPT-4 any more intelligent than an electronic calculator, although both can be extremely useful.
Gary, I wonder how many times we can repeat these basic weaknesses to persons unwilling to acknowledge them. Part of me thinks we are encountering something akin to a culture war within AI circles. Why not? America breeds binaries that repel all reason or logic. There is a ideological impetus undergirding the constant redefinition of AGI on the part of the corporate tech giants. What can we do? Keep our heads. Realize larger forces at work behind the debate. And focus on questions of implementation.
Gary, I love your take on this. We as a community should build a solid AGI benchmark. Or even a subset of various general intelligences benchmark (physical intelligence, verbal intelligence, game solving, mathematics e.t.c).
Right now AGI is tossed around to get that sweet sweet VC handouts. But if there is a common agreed upon benchmark, then it's easy to say we're 10% hitting AGI benchmark but failing this domains.
Unless there is objective measure, most people will talk past each other.
there is a very simple test and Gary mentioned it - doing arithmetic with large numbers. Due to the exponential increase of combinations with every new digit, even Microsoft's data centres don't have enough memory to remember all combinations and so to solve that task the LLM has to really learn the rules of arithmetic and apply them. No LLM has been able to demonstrate that so far.
A good summary of the state of the art. It’s amazing how much is changed and how little has changed. Like the philosopher Dreyfus said back in 1972 and revisited in the new edition of his book (1992) “What Computers *Still* Can’t Do". 😆
Agreed, although Dreyfus, as I recall, made claims that AI would *never* be able to do various things. A dangerously dogmatic claim about the indefinite future!
Interesting – I'll have to review the book again. It's been quite a while.
I just found t on my shelf and opened the introduction, where he responds to the fading of the symbolic approach and the rise of neural nets, connectionism, etc... and found this: "...Even if we assume the simplistic view that human beings behave so as to maximize their total sense of satisfaction, a reinforcement learning approach to producing such behavior would require a rule for determining the immediate satisfaction derived from each possible action in each possible situation. But human beings do not have or need any such rule. Our needs, desires, and emotions provide us directly with a sense of the appropriateness of our behavior. If these needs, desires, and emotions in turn depend on the abilities and vulnerabilities of a biological body socialized into a culture, even reinforcement-learning devices still have a very long way to go."
Seems reasonable and non-dogmatic to me, though I haven't had time to read any more...
I'm much less familiar with Dreyfus than with people like John Searle, and my memories are from the late 1980s/early 1990s, so I may be unfair to Dreyfus. I believe he was primarily deeply skeptical about the symbolic approaches to AI that were dominant at the time (GOFAI). His views might not apply so much to neural network approaches.
One thing symbol-processing AI was not good with was tacit knowledge. That was one of Dreyfus' very reasonable points. It probably applies also to connectionist approaches although perhaps less so -- depending on how much tacit knowledge can be extracted from words.
Dreyfus made mince meat of GOFAI and symbolic, rule-and-discrete-fact-based AI, and his book is still worth reading, not just for the critique but also for the way he describes the 'beliefs' that drive the assumptions AGI was around the corner. and how he chronicled the 'hype'.
History is currently repeating itself. Minsky then is like Hinton now.
In the 1992 edition, he addresses one of the 1980's rebirth of neural network architecture approaches (which actually predates even digital computers).
Anyway, AGI and AI were not so separated in the discussion then as they are now. When people were talking AI, they generally were assuming an on-ramp to AGI. This went on until even the Jeopardy! win by IBM's Watson in the 00s. But the 'big data analytics' that followed and the data scientists that drove it were not encumbered with AGI-dreams. Now, because LLMs produce grammatically perfect language, and because evaluating language is our main way of assessing intelligence, AGI returns in the discussion.
PS. For people wanting to read Dreyfus: every previous edition (1972, 1979) is embedded in the last one. The best way to read it is to start with the introduction to the 1979 edition, then follow the order given there, and read the introduction to the 1992 MIT edition as an epilogue. That 'introduction' more or less assumes you are already familiar with everything in the book.
I should find time to read Dreyfus fully. I was probably too dismissive of him without really reading him due to the influence of Hofstadter, Dennett, and others. Regarding Minsky, there was much less AI history then, but Hinton has less excuse!
Definitely. Dreyfus is, after half a century, still very worthwhile reading, even if the paradigm he demolished has been largely become irrelevant (though it still lurks here and there). Which is rather unique.
As for me I consider only one goal for Humankind and Science - Natural Language Understanding - and I use a General Intelligence System for this purpose and the system is a Semantic Multilingual Model (now for 16 languages - activedictionary.com) implemented as a relational database and originally created in 2002 as a large language model for English and Russian languages. If anyone is interested - https://www.linkedin.com/newsletters/natural-language-understanding-7020382304715911168/
"describe the ways in which large language models (LLMs) can carry out tasks for which they were not specifically trained" - how do they know the LLMs were not specifically trained, have they examined the terabytes of training data or the millions if not billions of instances of RLHF to be able to claim that. To declare that LLMs can do that, the first step would be for the LLM to learn simple arithmetic and demonstrate it with big numbers with a lot of digits (that can not be simply remembered from the training data). Until an LLM can be demonstrated to be able to do that, all claims of a magically emergent AGI are just bla, bla, bla. So, count me among the confused too :) Also, I think 10 years from now people will look back at the current events and claims from prominent leaders in the AI field and just shake their heads in bemused disbelief.
Hi Gary. I basically agree with everything you said. I read the article a few days ago and was rather taken aback, especially by the condescending tone. Given the authors' stature in the field, they should know better than to make the kinds of pronouncements they did about today's systems. Along with Hinton's 60 minutes interview, there seems to be a lot of wishful thinking going around these days. This is shades of the '70s and '80s.
When I started to read this blog, I believed that AGI might come soon, just around the corner. I have learned a lot since then and I don't think so anymore. From the comments published here I tried to figure out how LLMs operate, and I found a correspondence that is familiar to me as a physicist and engineer: the "black box model". The "black box model" is a fully empirical model, as opposed to knowledge-based models (laws of mechanics, thermodynamics, etc.). Empirical models are built from correlations based on experimental results. In their engineering form they are a set of polynomial equations relating outputs to input parameters, with a multitude of coefficients obtained by mathematically fitting the model output to some experimental data. There is no internal logic, scheme, or rule; the model blindly represents the data it was fitted on. And it works well in mechanical or chemical engineering applications when a single, well-delimited phenomenon or process is represented. It can be efficient for prediction provided the user does not step outside (even by the smallest amount) the parameter domain considered, and does not try to represent a situation where an additional parameter is needed (even just one parameter more).

LLMs have basically the same limitations. But the ambition behind LLMs is quite extravagant, since they purport to describe the entire world. If one wanted to describe the entire world with a "black box model", one would need an infinite number of parameters and an infinite number of coefficients. That is not possible. That is why LLMs are obviously not very close to AGI. To get AGI, or even just a reliable general representation of the world's complexity, we will need a hybrid approach: a combination of empirical correlations, knowledge-based equations, and strong internal inference rules.
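To make the analogy concrete, here is a rough illustrative sketch in Python (my own made-up example, with invented numbers, not anything from the article) of such a black-box polynomial fit, which behaves inside the fitted parameter domain and breaks down the moment you step outside it:

import numpy as np

# "Experimental" data from one well-delimited regime (numbers are made up).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=x.size)

# Purely empirical correlation: fitted coefficients, no internal rules.
model = np.poly1d(np.polyfit(x, y, deg=5))

print(model(0.5), np.sin(2 * np.pi * 0.5))   # inside the fitted domain: close
print(model(1.5), np.sin(2 * np.pi * 1.5))   # outside the domain: badly wrong

Nothing in the fitted coefficients encodes the rule that generated the data; the model only reproduces the regime it was fitted on.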
exactly
Very well said.
Your and Gary's well-reasoned and factual opposition to hype and nonsense is failing to shake existing convictions. Exactly the same happened in the 1960s, with Life magazine stating in a famous 1970 article that 'electronic persons' had arrived, and Minsky quoted in that article saying that AGI would be there 'within 3 to 8 years'. Minsky then. Hinton c.s. now. History doesn't repeat, but it definitely rhymes. And the voices of realism are lost in a chorus of unfounded convictions. In the discussion on artificial intelligence, the role of human intelligence and how convictions form and resist argument is the really interesting subject.
You two gentlemen are way too polite. Agüera y Arcas and Norvig are not just wrong. They are miserably wrong. There is nothing in generative AI or LLMs that is even remotely useful to solving AGI. To crack AGI, the mainstream paradigm must not merely shift. It must be discarded entirely. A completely new model must arise to replace the current one. Unfortunately, I don't see this revolution coming from the deep learning generation.
PS. In my opinion, generalization and precise timing are key components of AGI.
Replace all instances of "must" with "reasonably" in your comment, and you'll reasonably have a less pseudoscientific answer
That said, what science/papers have you written to help solve the issue? Of late, neither Marcus nor the various proponents of the "X is absolutely not AGI" camp have proposed modern solutions.
One does not have to solve rocket propulsion to know that no one can reach the moon by climbing a tree. Thanks for asking.
I am putting my money on Variational Bayes for system 1 tasks
Seems to me this was written on auto-pilot; the temperature was a bit lower than Andreessen's when he wrote his techno-optimist manifesto, but it is still pretty much autopilot. You do that when you've given up on thinking for some reason or another. Maybe it's too hard, or too painful. These people just want to party down in the Good Place and leave all the worries to, well, whomever.
A more than welcome rebuttal. What I don't get is why Agüera y Arcas and Norvig would go as far as making the argument in the first place, when there is such clear evidence to the contrary...
There's money in hype. That's about as good an explanatory theory as I can come up with. I'm not saying these people are getting paid big bucks to say it, but if you're in a culture where people are making a lot of wealth from hype, then the belief system and worldview of that culture will be influential to anyone in it.
Consider Norvig's reply to Chomsky. I found this in the cs224 notes when I was teaching myself ML, from https://norvig.com/chomsky.html:
"And while it may seem crass and anti-intellectual to consider a financial measure of success, it is worth noting that the intellectual offspring of Shannon's theory create several trillion dollars of revenue each year, while the offspring of Chomsky's theories generate well under a billion."
This is supposed to be an argument in favour of the statistical machines. It generates money, therefore it's good science? I'm sure Scientology has made some nice bank from its followers compared to, I don't know, whatever the latest research in knot theory is.
Yes, ChatGPT is a stochastic parrot.
AGI my ass!
I asked it to create a math homework problem for my daughter and solve it. Look at the output below. It has NO UNDERSTANDING of math.
ChatGPT:
5(4x - 2y + 3) + 2(3x - 5y - 6) = 20x - 10y + 15 + 6x - 10y + 12 = 26x - 20y + 27
Me:
Your math is wrong for question 10
ChatGPT:
Apologies for the mistake. Let's correct problem 10:
5(4x - 2y + 3) + 2(3x - 5y - 6) = 20x - 10y + 15 + 6x - 10y - 12 = 26x - 20y + 3
Thank you for pointing that out. If you have any more questions or need further assistance, please let me know!
Looks like GPT-4 does not have this exact issue.
GPT4:
Sure, let’s simplify the equation:
First, distribute the numbers outside the parentheses:
20x - 10y + 15 + 6x - 10y - 12
Then, combine like terms:
26x - 20y + 3
So, the simplified form of the equation is 26x - 20y + 3.
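For what it's worth, the expansion is easy to check mechanically; a quick sympy snippet (my own check, not part of either transcript):

from sympy import symbols, expand

x, y = symbols("x y")
# 5(4x - 2y + 3) + 2(3x - 5y - 6)
print(expand(5*(4*x - 2*y + 3) + 2*(3*x - 5*y - 6)))   # prints 26*x - 20*y + 3

which confirms 26x - 20y + 3 and shows the "+ 27" in the first reply came from flipping the sign on the -12 term.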
Here's what he wrote:
Actually, some years later Marvin explained to me why he wanted me to lead the project. He said that if he put a grad student in charge the grad student would know enough to realize that it was impossible to do what he wanted and would just ignore him. However he wanted to know what was hard about the problem so he put a sophomore in charge so I would just rage ahead and tell him what difficulties I found. He said that I reported exactly what he needed to know--that the "lines" we see in the edges of objects in the scene are not really clearly there; the noise is very large and much of the edges was illusory.

GJS
LLMs are in the steeply rising part of the hype cycle. It will pass, either because generative AI hits clear limits or because a new hype cycle, driven by a new approach, takes over.
The terminology is kinda confusing. LLMs are a (somewhat-)general technique in AI, in that they're able to do many kinds of tasks without being specifically trained on that task, e.g. few-shot learning. But that's not what capital-G General AI means.
Anecdotally, I've heard of fine-tuning an LLM on training materials a human might use (e.g. a spec or guide) and the LLM getting better at the task. That's maybe the first time I've seen something in "AI" that we might call "learning" if done by humans. Most older supervised learning tasks (even using deep learning) would seem bizarrely rote to humans. Like a human who needed to see thousands of pictures of a cat to be able to identify a cat would be considered to have some kind of severe learning disability. But it's a big leap from "maybe not completely unintelligent" to GAI.
I think the "always a human in the loop" component that Rodney Brooks recently described on his blog (https://rodneybrooks.com/what-will-transformers-transform/), is a particularly important factor here: these systems wouldn't be nearly as impressive if there wasn't a human in the loop.
This is what really limits their utility (and generality) in practice: because of unreliability, they cannot operate autonomously or serve as building blocks of bigger systems. A human with domain expertise is what makes them actually somewhat useful in practice.
I think they're still pretty impressive, but I guess it depends on what your baseline is. Doing linguistic tasks unreliably is a huge leap over not being able to do them at all.
The success of computational, ungrounded large language models based on the distributional hypothesis tells us a great deal about how natural languages work, but probably rather little about AGI, other than to confirm that, contrary to Turing's imitation game, linguistic competency (however defined) is not a good measure of general intelligence.
The observation that linguistic competency can be programmed into a box, just as numeric competency can be programmed into a calculator, challenges our view of ourselves, our understanding of what makes humans intelligent, but this isn't relevant to AGI other than perhaps gently to question what we mean by "general intelligence", as discussed in the article here.
Humans use both numeric calculations and natural language, among other techniques, to reason about the world. Machines can be programmed to do the same but that doesn't make a system like GPT-4 any more intelligent than an electronic calculator, although both can be extremely useful.
Gary, I wonder how many times we can repeat these basic weaknesses to persons unwilling to acknowledge them. Part of me thinks we are encountering something akin to a culture war within AI circles. Why not? America breeds binaries that repel all reason or logic. There is an ideological impetus undergirding the constant redefinition of AGI on the part of the corporate tech giants. What can we do? Keep our heads. Recognize the larger forces at work behind the debate. And focus on questions of implementation.
Yes, I have been thinking about writing about this culture war
I’d love to read that!
Gary, I love your take on this. We as a community should build a solid AGI benchmark, or even a suite of benchmarks for the various components of general intelligence (physical intelligence, verbal intelligence, game solving, mathematics, etc.).
Right now "AGI" is tossed around to get those sweet, sweet VC handouts. But if there were a commonly agreed-upon benchmark, it would be easy to say we're hitting 10% of the AGI benchmark but failing in these domains.
Unless there is an objective measure, most people will talk past each other.
There is a very simple test, and Gary mentioned it: doing arithmetic with large numbers. Because the number of combinations grows exponentially with every additional digit, even Microsoft's data centres don't have enough memory to store them all, so to solve the task an LLM has to actually learn the rules of arithmetic and apply them. No LLM has been able to demonstrate that so far.
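The harness for such a test is trivial to write; here is a rough sketch in Python (ask_model is just a placeholder for whichever LLM you want to probe, not a real API):

import random

def make_problem(digits=30):
    # Two random numbers with enough digits that the sum cannot be memorized.
    a = random.randint(10**(digits - 1), 10**digits - 1)
    b = random.randint(10**(digits - 1), 10**digits - 1)
    return f"What is {a} + {b}?", a + b

def score(ask_model, n=100, digits=30):
    # ask_model(prompt) should return the model's free-text answer.
    correct = 0
    for _ in range(n):
        prompt, answer = make_problem(digits)
        correct += str(answer) in ask_model(prompt).replace(",", "")
    return correct / n

The hard part is not the harness, of course; it is getting a model to pass it without calling out to a calculator tool.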
and see the newest essay, on exactly that
A good summary of the state of the art. It's amazing how much has changed and how little has changed. As the philosopher Dreyfus said back in 1972, and revisited in the 1992 edition of his book, "What Computers *Still* Can't Do". 😆
Agreed, although Dreyfus, as I recall, made claims that AI would *never* be able to do various things. A dangerously dogmatic claim about the indefinite future!
Interesting – I'll have to review the book again. It's been quite a while.
I just found it on my shelf and opened the introduction, where he responds to the fading of the symbolic approach and the rise of neural nets, connectionism, etc., and found this: "...Even if we assume the simplistic view that human beings behave so as to maximize their total sense of satisfaction, a reinforcement learning approach to producing such behavior would require a rule for determining the immediate satisfaction derived from each possible action in each possible situation. But human beings do not have or need any such rule. Our needs, desires, and emotions provide us directly with a sense of the appropriateness of our behavior. If these needs, desires, and emotions in turn depend on the abilities and vulnerabilities of a biological body socialized into a culture, even reinforcement-learning devices still have a very long way to go."
Seems reasonable and non-dogmatic to me, though I haven't had time to read any more...
I'm much less familiar with Dreyfus than with people like John Searle, and my memories are from the late 1980s/early 1990s, so I may be unfair to Dreyfus. I believe he was primarily deeply skeptical about the symbolic approaches to AI that were dominant at the time (GOFAI). His views might not apply so much to neural network approaches.
One thing symbol-processing AI was not good with was tacit knowledge. That was one of Dreyfus' very reasonable points. It probably applies also to connectionist approaches although perhaps less so -- depending on how much tacit knowledge can be extracted from words.
Dreyfus made mincemeat of GOFAI and symbolic, rule-and-discrete-fact-based AI, and his book is still worth reading, not just for the critique but also for the way he describes the 'beliefs' that drove the assumption that AGI was around the corner, and for how he chronicled the 'hype'.
History is currently repeating itself. Minsky then is like Hinton now.
In the 1992 edition, he addresses the 1980s rebirth of the neural network approach (an architecture that actually predates even digital computers).
Anyway, AGI and AI were not as separated in the discussion then as they are now. When people were talking about AI, they generally assumed an on-ramp to AGI. This went on until even the Jeopardy! win by IBM's Watson in the 00s. But the 'big data analytics' that followed, and the data scientists who drove it, were not encumbered with AGI dreams. Now, because LLMs produce grammatically perfect language, and because evaluating language is our main way of assessing intelligence, AGI has returned to the discussion.
PS. For people wanting to read Dreyfus: every previous edition (1972, 1979) is embedded in the last one. The best way to read it is to start with the introduction to the 1979 edition, then follow the order given there, and read the introduction to the 1992 MIT edition as an epilogue. That 'introduction' more or less assumes you are already familiar with everything in the book.
I should find time to read Dreyfus fully. I was probably too dismissive of him without really reading him due to the influence of Hofstadter, Dennett, and others. Regarding Minsky, there was much less AI history then, but Hinton has less excuse!
Definitely. Dreyfus is, after half a century, still very worthwhile reading, even if the paradigm he demolished has largely become irrelevant (though it still lurks here and there). Which is rather unique.
As for me, I consider only one goal for humankind and science: Natural Language Understanding. For this purpose I use a General Intelligence System, a Semantic Multilingual Model (now covering 16 languages - activedictionary.com), implemented as a relational database and originally created in 2002 as a large language model for the English and Russian languages. If anyone is interested: https://www.linkedin.com/newsletters/natural-language-understanding-7020382304715911168/