Reports of the birth of AGI are greatly exaggerated
Breathless predictions about Artificial General Intelligence (AGI) are nothing new; they still aren’t true
Marvin Minsky apparently thought, back in the 1960s, that computer vision was a problem that could be solved in a summer. (Spoiler alert: in 2023, the problem is still far from solved.) What’s new, though, are rumors that AGI has been solved. Sam Altman announced in late September, perhaps in jest, that “AGI has been achieved internally”, only to walk it back a few hours later. Last week, Blaise Agüera y Arcas, vice president and fellow at Google Research, and Peter Norvig, formerly Director of Research at Google and now at Stanford, argued in all seriousness that “Artificial General Intelligence Is Already Here”, in what can only be described as an epic act of goal-post shifting.
The article begins with this bold claim:
“Today’s frontier models perform competently even on novel tasks they were not trained for, crossing a threshold that previous generations of AI and supervised deep learning systems never managed. Decades from now, they will be recognized as the first true examples of AGI, just as the 1945 ENIAC is now recognized as the first true general-purpose electronic computer.”
Agüera y Arcas and Norvig describe the ways in which large language models (LLMs) can carry out tasks for which they were not specifically trained; they claim that this ability is key to being an AGI; and they argue that therefore the kernel of AGI has been achieved. They conclude with a long, condescending discussion of the various reasons that some pitifully confused people are “reluctan[t] to acknowledge [that] AGI has already arrived.”
Count us among the confused; we really, really don’t know what Agüera y Arcas and Norvig are talking about.
§
To begin with, by the classical criteria for AGI, LLMs hardly qualify. One of us wrote about those criteria last year, in this Substack, in preparation for a proposed bet with Elon Musk, consulting with two of the people who coined the term AGI, Ben Goertzel and Shane Legg, and came up with this.
Current machines can’t do this, especially not the reliability part. They can’t count, do five-digit multiplication, reliably generalize kinship relations, or even reliably follow the rules of chess, and so on, despite the fact that each of these activities is presumably well explained, explicitly, in the training set. They also can’t, for example, obey the dictum “don’t make stuff up” when they write a biography or a news story, and we have yet to see them drive cars as well as the average human adult.
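To make the reliability point concrete, here is a minimal sketch (ours, not anything from Agüera y Arcas and Norvig’s article) of how one might test the five-digit multiplication claim: generate random products too numerous and varied to have been memorized from training data, and grade the model’s replies against exact integer arithmetic. The `ask_model` function is a hypothetical placeholder for whatever chat interface is under test.

```python
import random
import re

def ask_model(prompt: str) -> str:
    """Hypothetical placeholder: send the prompt to the LLM being tested
    and return its raw text reply. Wire this up to an actual model API."""
    raise NotImplementedError

def five_digit_multiplication_accuracy(n_trials: int = 100, seed: int = 0) -> float:
    """Fraction of randomly generated five-digit products the model gets exactly right."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        a = rng.randint(10_000, 99_999)
        b = rng.randint(10_000, 99_999)
        reply = ask_model(f"What is {a} * {b}? Answer with the number only.")
        numbers = re.findall(r"\d[\d,]*", reply)            # pull out any numbers in the reply
        answer = numbers[-1].replace(",", "") if numbers else ""
        if answer and int(answer) == a * b:                 # grade against exact arithmetic
            correct += 1
    return correct / n_trials
```

Any patient human with pencil and paper can eventually score 100% on a test like this; the point of the criterion is exact, reliable generalization, not fluent approximation.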
DeepMind co-founder Shane Legg is generally more optimistic about AI than we are, but even he couldn’t quite see what his Google colleagues Agüera y Arcas and Norvig are on about; when we asked him on X whether he bought their argument, he replied as diplomatically and optimistically as he could, but with a clear affirmation of our point that Agüera y Arcas and Norvig had lowered the bar:
Melanie Mitchell was also a bit diplomatic, but at least as miffed as we were by the condescension:
§
The problem with the article, though, isn’t just its hostility toward critics or the way in which it lowers the bar; it’s that its key arguments don’t go through.
One key claim (written in bold pullout text) that made our jaws drop was that “Frontier language models can perform competently at pretty much any information task that can be done by humans, can be posed and answered using natural language, and has quantifiable performance.” That’s just obviously not so. There are many purely language-based tasks that humans can do (question answering, writing factual prose, summarizing, examining the truth of a claim, cross-examining a hostile witness in court, reading and understanding a full-length book) that existing AI systems cannot perform “competently”. The first author’s nine-year-old can do five-digit multiplication better than an LLM. We love Noema, but their fact checker was asleep at the switch. (What Agüera y Arcas and Norvig might really mean, perhaps, is that LLMs do well on a lot of benchmarks; but then again, since we don’t know, e.g., how many law school exams GPT-4 was trained on, it’s hard to know what to say about those benchmarks. The fact that making benchmarks is hard doesn’t mean LLMs have achieved AGI.)
Meanwhile, the claim (also written in bold pullout text) that “The most important parts of AGI have already been achieved by the current generation of advanced AI large language models” is logically flawed. Until we get to the finish line, we can’t know the route. Maybe what’s required already exists, but more likely we are still at least a few paradigm shifts away. To take one example, true AGI might revolve around planning with respect to detailed world models that are maintained over time; as Subbarao Kambhampati has shown, LLMs are weak at planning, and as hallucinations show, they are also weak at maintaining world models, strongly suggesting that some key ingredients have not yet been invented. It’s certainly plausible that some of the prerequisites of AGI have been achieved, but it is considerable overreach to presume that all of “the most important ones” have already been invented.
§
The fact that LLMs trained on the task of next-token prediction can also do a pretty good job at a variety of natural language tasks, and, very recently, a considerably less good job at tasks that combine text with images, as emphasized by Agüera y Arcas and Norvig, is certainly interesting, but it doesn’t justify the claim that they constitute “general intelligence”. It is common in the history of technology (as well as biological evolution) that a mechanism developed for one purpose turns out to be useful for another. GPUs were developed for video games and other graphics applications, not for training deep learning systems or mining cryptocurrency. That doesn’t make GPUs AGI (though they may get used in the service of AGI when it comes).
The World Wide Web was developed for sharing scientific papers, not for online commerce or for inducing people to make available enormous quantities of digital data that can be used to train ML-based AI systems. That generality doesn’t make the web AGI.
§
Just one more thing, to keep things real. It’s worth noting that for many different AI tasks (robotics, image interpretation, planning, game playing, scientific applications, many forms of reasoning), state-of-the-art AI currently makes little to no use of the transformer architecture that underpins LLMs. If LLMs really were AGI, presumably everybody would use them, for everything. But in many practical problems, working practitioners don’t; they use a wide range of other methods, often far more tractable, far more reliable, and far more efficient. Transformers and generative AI are getting a lot of attention, and have considerable utility, but they are nothing like the all-purpose tools some people seem to imagine them to be. In AI, no single technique is likely to be a silver bullet, now or ever.
Here’s where we actually are, so much more boring, and so much more real: Current models trained on next-token prediction have remarkable abilities and remarkable weaknesses. The scope of those abilities and weaknesses is not well understood. There are some applications where the current systems are reliable enough to be practically useful, though not nearly as many as is often claimed. No doubt the next generation of systems will have greater abilities and more extensive applications, and will somewhat mitigate the weaknesses. What is to come is even more poorly understood than the present. But there is zero justification for claiming that the current technology has achieved general intelligence.
Gary Marcus used to get flak for saying that some people were trying to build AGI. Now, perhaps, he will get flak for saying that we haven’t yet achieved it.
Ernie Davis is a Professor of Computer Science at New York University.
"describe the ways in which large language models (LLMs) can carry out tasks for which they were not specifically trained" - how do they know the LLMs were not specifically trained, have they examined the terabytes of training data or the millions if not billions of instances of RLHF to be able to claim that. To declare that LLMs can do that, the first step would be for the LLM to learn simple arithmetic and demonstrate it with big numbers with a lot of digits (that can not be simply remembered from the training data). Until an LLM can be demonstrated to be able to do that, all claims of a magically emergent AGI are just bla, bla, bla. So, count me among the confused too :) Also, I think 10 years from now people will look back at the current events and claims from prominent leaders in the AI field and just shake their heads in bemused disbelief.
Hi Gary. I basically agree with everything you said. I read the article a few days ago and was rather taken aback, especially by the condescending tone. Given the authors' stature in the field, they should know better than to make the kinds of pronouncements they did about today's systems. Along with Hinton's 60 Minutes interview, there seems to be a lot of wishful thinking going around these days. This is shades of the '70s and '80s.