54 Comments
Aug 20, 2023 · Liked by Gary Marcus

We are not even on the right road to achieve AGI. See https://thereader.mitpress.mit.edu/ai-insight-problems-quirks-human-intelligence/ and the book that is referenced there.

Let's take a look at a slightly deeper level. It is not so much about what the models seem to get right and what they seem to get wrong, but about the kinds of intelligence processes that are missing from the current approach to AI. General AI is not about how many problems can be solved; it is about the types of problems that must be solvable to achieve general intelligence.

All of the current approaches to AI work by adjusting a model's parameters. Where does that model come from? Every breakthrough in AI comes from some human who figures out a new way to build a model. Intelligence needs more than adjusting parameters.

The model completely constrains the kinds of "thoughts" (meant loosely), as represented by the parameters, that it can even entertain. Anything else is "unthinkable."

There are whole classes of problems, often called insight problems, that are not even being considered here and certainly cannot be accomplished by even the current crop of GenAI models. These problems are solved when the solver comes up with a formerly unthinkable solution. Some humans are good at these problems, but computers, so far, not so much.


Ironically, modern LLMs exhibit a majority of the quirks the article talks about. Did you read it?

Aug 20, 2023 · edited Aug 20, 2023 · Liked by Gary Marcus

The right phrase for what we are now experiencing in societal discussions on AI is 'fever'.

The fever makes people convinced of things like 'AGI is around the corner'. And people's existing convictions steer their logic and observations much more than the other way around. We could say that society has an AI infection of the conviction engine. We have no proper medication against this fever, just as we have no proper medicine against QAnon and the like. Hence, your several valid observations and reasonings have little effect. Facts are pretty much useless. Logic as well.

We are talking a lot about what AI is or is not and what it might be able to do. But maybe we should be talking more about the human side and how easily human intelligence/society can get a fever like this.

I am glad that an insider like Shane Legg is feeding the fever right now, as my biggest fear is that the fever will break before I give my presentation at EACBPM in London on 10 October, making my presentation worthless as a result...


Yes, and should those suffering from chronic fevers be given powers without limit?


But why are some feeding the fever? It seems as if there is a race in progress, a race to solve AGI. It almost feels as if those who are promoting the hype are panicking, as if they are deathly afraid of losing the race to some unknown invisible power. It's a strange feeling.


Most will simply be infected themselves, I guess.


We work for an assessment organisation and have been surprised by the number of people claiming that exams designed for humans are an appropriate benchmark for AI. They're not, and that's not us moving the goalposts. Assessment theory 101 says that when a human takes a test, what matters is not the test score but the inferences that the end user can make from that score. That's why human test scores achieved by cheating are not valid. So if we do want to be consistent, applying the same standard to AI as to humans without moving the goalposts, then we need to know HOW the AIs achieve their results. In some cases we don't know, and therefore can't make the inferences we would with a human. In some cases there is a strong suspicion they are cheating, and in those cases we can infer that we can't make any useful inferences!

See point 3 here https://substack.nomoremarking.com/p/if-we-are-setting-assessments-that-a-robot-can-complete-what-does-that-say-about-our-assessments-cbc1871f502


Exactly! In the same way I cannot make any inferences about any black box. I have no clue what it might produce in the future either towards some interest(s) I care about or against those interests. We (or some experts who are able to assess such things -- unlike myself) must have access to see under the hood exactly how the engine operates.


The GPT-5 hype regarding AGI is becoming fantastical. It seems that progress is more dependent on humans being trained to not scratch their ears, and to accept GPT output as gospel - no further verification required. https://www.linkedin.com/posts/joergstorm_chatgpt-5-ugcPost-7096046781926387712-Ecl4


It's absolutely bonkers... OpenAI have said nothing about GPT-5. Google it, and an endless list of completely speculative articles comes up, all from websites I've never heard of, and probably mostly written by LLMs.


As I said on my Stack, the serpent will start to feed on its tail as the VCs strap the world into another hype cycle fairground ride.


Great article. In my opinion, if we are counting on generative AI experts to solve AGI, we can forget about it. It will never happen.

AGI will be considered solved when an intelligent robot can walk into an unfamiliar kitchen and fix an ordinary breakfast. It doesn't have to play chess, GO or write a PhD thesis. Heck, I'd be impressed to tears if mainstream AI practitioners suddenly achieved the generalized intelligence of a honeybee.

But this is not to say AGI can't be solved soon. One little breakthrough in generalized perception is all it will take to crack AGI in my opinion. It can happen at any time. And, again, it doesn't have to be at human level.

Aug 20, 2023 · edited Aug 20, 2023

I agree, Gary, that Legg 2023 lowers the bar. But I want to make a comment about one of your tasks, film comprehension. It is difficult, and I'm not sure that one can appreciate the difficulty unless one has done it many times, which I have.

I note, in the first place, that it is not unusual for professional film reviewers to make generally small errors of either omission or commission in their published reviews. They miss something that's relevant to their review or they assert something that didn't happen (hallucination?). It's not frequent, nor even common, but it's not unusual either. Further, online plot summaries – I'm most familiar with those in Wikipedia – are often faulty in one way or another. It is not easy to remember what happened and then to organize those memories into a coherent summary.

As an exercise, I wrote a blog post about those tasks where I used Spielberg's Jaws as an example: Operationalizing two tasks in Gary Marcus’s AGI challenge, https://new-savanna.blogspot.com/2022/06/operationalizing-two-tasks-in-gary.html.

Beyond that, there is the task of providing an interpretation of a film. One of the first things I did once I started playing around with ChatGPT was ask it to provide a Girardian interpretation of Jaws: Conversing with ChatGPT about Jaws, Mimetic Desire, and Sacrifice, https://3quarksdaily.com/3quarksdaily/2022/12/conversing-with-chatgpt-about-jaws-mimetic-desire-and-sacrifice.html. Why did I choose that film? Because, having already done it myself, I knew 1) that it was a doable task and 2) that there was plenty of online material about the film, including good summary information. This is important because ChatGPT cannot actually watch the film. It has to work from written material about it. ChatGPT did a serviceable, but hardly superior, job.

At the end of that article I pointed out the difference between what I had done in writing my article and what ChatGPT had to do in response to my prompting. The BIG thing is that I supplied ChatGPT with both the text/film to be examined and the interpretive strategy. That's half the game there, if not more, lots more. Here's the final section of my essay:

What’s This Tell us about the Relative Capabilities of Humans and Computers?

In an exact way? Very little to nothing. This isn’t a question we know how to address with exactitude. I’m afraid the best I can do is to offer a sophisticated, but (merely) qualitative, judgment.

I was impressed with ChatGPT’s capabilities. Interacting with it was fun, so much fun that at times I was giggling and laughing out loud. But whether or not this is a harbinger of the much-touted Artificial General Intelligence (AGI), much less a warning of impending doom at the hands of an All-Knowing, All-Powerful Superintelligence – are you kidding? Nothing like that, nothing at all. A useful assistant for a variety of tasks, I can see that, and relatively soon. Maybe even a bit more than an assistant. But that’s as far as I can see.

We can compare what ChatGPT did in response to my prompting with what I did unprompted, freely and of my own volition. There’s nothing in its replies that approaches my article, Shark City Sacrifice, nor the various blog posts I wrote about the film. That’s important. I was neither expecting, much less hoping, that ChatGPT would act like a full-on AGI. No, I have something else in mind.

What’s got my attention is what I had to do to write the article. In the first place I had to watch the film and make sense of it. As I’ve already indicated, we have no artificial system with the required capabilities, visual, auditory, and cognitive. I watched the film several times in order to be sure of the details. I also consulted scripts I found on the internet. I also watched Jaws 2 more than once. Why did I do that? There’s curiosity and general principle. But there’s also the fact that the Wikipedia article for Jaws asserted that none of the three sequels were as good as the original. I had to watch the others to see for myself – though I was unable to finish watching either of the last two.

At this point I was on the prowl, though I hadn’t yet decided to write anything.

I now asked myself why the original was so much better than the first sequel, which was at least watchable. I came up with two things: 1) the original film was well-organized and tight while the sequel sprawled, and 2) Quint: there was no character in the sequel comparable to Quint.

Why did Quint die? Oh, I know what happened in the film; that’s not what I was asking. The question was an aesthetic one. As long as the shark was killed, the town would be saved. That necessity did not entail Quint’s death, nor anyone else’s. If Quint hadn’t died, how would the ending have felt? What if it had been Brody or Hooper?

It was while thinking about such questions that it hit me: sacrifice! Girard! How is it that Girard’s ideas came to me? I wasn’t looking for them, not in any direct sense. I was just asking counter-factual questions about the film.

Whatever.

Once Girard was on my mind I smelled blood, that is, the possibility of writing an interesting article. I started reading, making notes, and corresponding with my friend, David Porush, who knows Girard’s thinking much better than I do. Can I make a nice tight article? That’s what I was trying to figure out. It was only after I’d made some preliminary posts, drafted some text, and run it by David that I decided to go for it. The article turned out well enough that I decided to publish it. And so I did.

It’s one thing to figure out whether or not such and such a text/film exhibits such and such pattern when you are given the text and the pattern. That’s what ChatGPT did. Since I had already made the connection between Girard and Jaws it didn’t have to do that. I was just prompting ChatGPT to verify the connection, which it did (albeit in a weak way). That’s the kind of task we set for high school students and lower division college students.

Aug 20, 2023 · edited Aug 20, 2023

Hi Gary, lol, love your points, voice of reason!

Here's one thing that humans (and other animals, even worms) do, that any "AGI" based on even more gobs of data has a snowball's chance in hell of doing: deal with the world directly, i.e. without measured data, coded rules, or goals.

Specifically about data: "data is dead". In nature there is no such thing as data; our brains don't use a decimal or binary or any other number system unless it is learned, etc. (cats for sure don't, unless they are not telling us!).

It's the same Chicken Little approach, combined with magical thinking - the sky will be falling anytime (but not just yet - ask me in a few years, I'll say the same thing again!) - Singularity, Bitcoin, Industrial Internet 4.0... and, if I keep saying it over and over it will become true (eg Elon Musk and SDCs).


I agree fully regarding data and computation, and few people seem willing to make this point. LLMs learn how to produce language by practicing guessing the next word on existing text and using vector calculus to update a giant set of parameter values in such a way as to minimize a loss function that quantifies guessing error. And we... don't do that. No human performs matrix algebra in their heads while they talk. But LLMs do this non-stop when they "talk". What good reason is there to suspect that a computer can achieve "human intelligence", given such differences?
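
To make that concrete, here is a minimal sketch of the kind of training step being described, with a toy model and random token IDs standing in for real text; this is an illustration of next-token prediction with gradient descent, not any particular LLM:

```python
# Hypothetical toy example: one step of next-token prediction training.
import torch
import torch.nn as nn

vocab_size, dim = 100, 16
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()                 # quantifies next-token guessing error

tokens = torch.randint(0, vocab_size, (32,))    # stand-in for a stretch of text
inputs, targets = tokens[:-1], tokens[1:]       # guess each following token

optimizer.zero_grad()
logits = model(inputs)                          # scores over the whole vocabulary
loss = loss_fn(logits, targets)                 # how badly the model guessed
loss.backward()                                 # gradients for every parameter
optimizer.step()                                # nudge parameters to guess better
```

Run at enormous scale over an enormous corpus, that loop is the entirety of the "learning" being contrasted with how humans acquire and use language.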

The usual retort to this is that the brain is a kind of computer and we use math to decipher nature in other scientific disciplines, so why shouldn't we be able to do the same and recreate human intelligence? Which just utterly confuses our method for understanding nature with nature itself. We are not in the business of reproducing nature using math. Math is an amazing tool for learning about nature, and for designing and constructing machines. The AGI crowd seem to believe that having a mathematical theoretical representation of a natural phenomenon implies that we can create that phenomenon artificially, using... well, math! The simulation becomes that which it simulates! The theory is the thing itself!

Will they try this out for general relativity? Can we math us up some black holes? How about some photons, what's the formula for summoning those into existence? Or how about a plain old boring living cell? We have computer simulations of living cells, so surely humans have solved the problem of making actual living cells from inorganic material by now. I'd guess we tackled that one decades ago, given that we're now on to thinking machines. That think using math.

Oh, and transistors. Lest I be unfair, we're gonna perform godly feats using the combination of math *and* transistors. We haven't figured out how to make life from non-life. But we're gonna make human intelligence from transistors.

How this silliness gets taken seriously by people in the field is beyond me.


I know it is by now cliché, but the maps (maths) are not the territory, nor do they create territory (brains or anything else). Platonism (à la Max Tegmark and company) feeds this kind of nonsense.


Ben, lovely points, made me smile!

Exactly - the brain is not a computer in the von Neumann (stored program architecture) sense.

A $1 kitchen timer 'counts' (it actually doesn't even do that, but let's go along with it) very differently from a digital timer that uses two variables, an incrementer and a comparator. To claim that they function identically is beyond absurd. It's not that the former can be substituted by the latter; it's that the former is not even computing!!
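
For illustration, roughly what that "two variables" digital timer amounts to; a sketch, not any real product's firmware:

```python
# Illustrative sketch of a digital timer: an explicit counter plus a comparison.
import time

def digital_timer(target_seconds: int) -> None:
    elapsed = 0                        # the incrementer: a stored, symbolic count
    while elapsed < target_seconds:    # the comparator: count tested against a target
        time.sleep(1)                  # one "tick"
        elapsed += 1
    print("ding")

# The $1 mechanical timer reaches its "ding" with no stored count and no
# comparison at all: a wound spring simply unwinds. Same outcome, nothing
# like the same mechanism.
```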


AGI is a term that causes far more trouble than the idea warrants. We came up with "intelligence" to describe, for the most part, human beings. Sometimes we think of it as one coherent phenomenon, other times we describe it in terms of a huge bundle of descriptions and abilities of, again, mostly humans. Then we take "intelligence" and start using it to describe properties of certain machines. Then we refer to a hypothetical machine that can do most or all of the huge bundle of stuff we associate with human intelligence, and call this "AGI". Oh, but of course "AGI" doesn't need to be a perfect reproduction of human intelligence! We know we're not getting that. It just needs to be able to do, for the most part, the big bundle of human stuff that we have a really really hard time defining precisely but that we think we're talking about when we say "intelligence".

And so then we play this ridiculous game, where new "AI" (of the non-general form) is able to perform some tasks typically associated with human intelligence, and so that's a sign that we're on the road to AGI. Except it doesn't achieve the tasks the way we do, but that's OK because that's not required. We only need the task to be accomplished.

And so new advances get treated as evidence that we're getting closer to this type of human-like machine intelligence called "AGI", but there's no requirement that these advances be similar in process to what humans do. Although if it so happens that we see some similarities, that's more evidence that we're getting closer to AGI!

Am I being unfair? What I described above is silly, but lots of very smart people take it very seriously.


What you're describing is the shape of a puzzle, not a problem. A problem is a gap between some reasoner or reasoners' desired state of the world and its actual perceived state. A puzzle is a socially defined game with rules and conditions.

Subjectively, it's more fun and enjoyable to work on puzzles than problems. Puzzles have neat boundaries, they admit of particular techniques, there are rewards (social recognition at least) for ingenuity in solving them. Puzzles are the kind of thing you can train machine learning algorithms to solve with relative ease: Sudoku, Go, Chess, Starcraft.

Problems are ill-defined (indeed defining them is one of the challenges on the road to solving them), have stakes, and often cross conflicting interests and groups. Learning algorithms cannot solve problems - one of the properties of human intelligence is to make problems tractable and solvable by redefinition, investigation, and creativity.

Having read deeply in the literature, I find AGI far more like a puzzle than a problem. It has a more than passing resemblance to the so-called "Problems of Philosophy," which are for the most part defective puzzles, ones without defined win-states, but which are enjoyable enough to their practitioners that playing the game is satisfactory in itself, especially because it gives a low-stakes way to demonstrate ingenuity.

Even better if you can convince society at large, or a few philanthropists, to direct funds to support you in playing your endless game.


I think this has drawn in so much money that everyone involved feels the need to “keep the faith” (or spread the propaganda) to justify that the amount of time and money spent has been (or soon will be) worth it.


I feel the same way, at least if we're talking about most of the tech investor/CEO class. But when I listen to Sam Altman and Geoff Hinton, they sound like true believers. OpenAI's entire website seems like it was written by true believers.

I don't know quite what to make of them. A lot of that GPT-4 "system card" seems too cynical to be anything other than self-interested PR (e.g. misrepresenting the TaskRabbit anecdote; using a pathetically weak method for detecting contamination in the training data). But would a company who mainly cares about their bottom line use their whole website to push statements about AGI that are bound to come across as weird and kooky to most of the public? The most parsimonious explanation is that a lot of people there really believe the stuff they're putting on their website.


Yes. Even in what OUGHT to be an evidence-based (and mechanisms-shown) debate, outside of partisan politics, there are True Believers so persuaded of the correctness of their holy cause that they won’t be moved (or slowed) by evidence or logic. Because they believe proof lies in how confident you are, as shown by their intensity of faith and by their refusing or refuting (even by very bad argumentation) every question of doubt. (Much, I would note, like Donnie Diaper Dump does.)

Is it too far to say the Tech Bros believe they are building the Rapture for Nerds into imminent reality? The urgency is that the older they get, the less able and agile their minds will be, and they want to preserve what is best at a high-water mark.

The Singularity is Soon and the Upload is Nigh …

{Is that funny Ha-Ha? Or should I be gently — or greatly? — terrified?}


How do you have such confident assertions on what can and can’t work? The truth is nobody really knows.

Comment deleted

“Nobody knows” is the null hypothesis. He has offered absolutely no justification whatsoever to refute this. Half of his points are factually incorrect, but that doesn’t matter (I used AI to write summaries for homework 6 years ago). Even if they are true, of what use is recognizing current limitations when forecasting future progress? Could the scaling hypothesis fail? Maybe, maybe not. No one really knows. I have yet to see a convincing explanation of why not. And the explanation of why is just engineer’s induction.

He doesn’t provide a theoretical basis for why AGI isn’t imminent, nor can anyone exactly say why it might be. This is just speculative, and his guess is as good as anyone’s.

Comment deleted

Repeating the words logic and epistemology without explanation doesn’t make your response logical or correct. It’s simply true that this article lays out claims without justification and fails to address counterclaims. The stated claims are also not relevant to forecasting. It’s not stated why relevant benchmarks fail. Just that some researcher stated at some point in time that some benchmarks may be overfitted.

Overfitting is a common problem with any sort of statistical method, and no doubt some models overfit to the validation set but this doesn’t mean all statistical methods always overfit.

Marcus goes on to list various current limitations of LLMs and neural nets. Some of these are clearly false, as summarization has been solved for probably 95% of cases. At the limit, summarization may reduce to deciphering or translating a text into simpler words. This could theoretically require superhuman capabilities, so it’s not very relevant that a below-human intelligence can’t solve the remaining 5%.

The rest is true for now, but even if these points were all true, they would be pretty useless for forecasting. What’s more important is the rate of growth and whether that rate will change. The relevant question here is the scaling hypothesis. There’s no theoretical basis for it beyond a straight (log-normed) line that, if continued, will keep going straight. If the line continues to go straight, the end is a perfect oracle; if not, then it isn’t.
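
For concreteness, a minimal sketch of what that straight-line extrapolation involves; the compute and loss numbers below are invented for illustration, not taken from any published scaling study:

```python
# Hypothetical illustration: fit log(loss) vs. log(compute) and extend the line.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21])   # made-up training FLOPs
loss    = np.array([3.2, 2.7, 2.3, 1.95])      # made-up evaluation losses

slope, intercept = np.polyfit(np.log10(compute), np.log10(loss), 1)

future_compute = 1e24
predicted = 10 ** (slope * np.log10(future_compute) + intercept)
print(f"extrapolated loss at 1e24 FLOPs: {predicted:.2f}")

# The fit itself says nothing about whether the line keeps going straight;
# that assumption is exactly what is in dispute.
```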

Comment deleted

Neither are non-sequiturs.


We are quite far from AGI. Even for software work, it takes a huge amount of thinking at multiple levels of abstraction, writing code, testing it, examining results, searching for bugs, etc., to get something useful.

Current AI is nowhere near being able to reflect on outcomes, adjust the course of action, validate, refine, etc.


Hi Gary -- it seems a lot of problems in this area are caused by using short and pithy definitions of intelligence rather than useful ones and then projecting near term victory in attaining AGI based on that vague definition. The definition I use for intelligence is:

Intelligence is that quality which allows an entity to solve a wide range of deductive and inductive problems, extract and prioritize information from the environment, infer causal as well as correlative relationships from both small and large data sets over many known and novel domains, generalize knowledge from a known domain to another known or novel domain, extrapolate probable outcomes from both factual and counterfactual circumstances, recognize in its own cognition both the potential for fallacies and the fallacies themselves, synthesize existing knowledge to form original concepts, and acquire awareness of its own cognition and of itself as an independent and unique entity distinct from other entities and from its environment.

It's longer than most of those out there, but it makes it much harder for premature AGI declaration. (I discuss the specifics of the definition more in a blog post)


“AI” is also the wrong term for what we have now with LLMs. Instead, we should be calling it what it is: “Imitative Machine Learning”. It’s excellent at integrating Known-Knowns that match the source data and training scenarios. It’s excellent at imitating items from its source dataset, and combining them in differing ways.

It is, however, entirely unable to create the new, and unable to deal with or create novel works. Its output is derivative, averaging, statistically "typical" of what it's trained on. It's incapable of originality, of synthesizing the new. It's "Imitative", not "creative".


Given that the dataset is much larger than the model, and that this is much more efficient than trying to produce those combinations from the dataset with code, do you think the overfitting matters for its usefulness? (I think transformers obviously don’t always overfit; there are small examples showing they don’t, and GPT-4 can obviously answer out-of-distribution questions.)


Aren't human beings 99.9% imitative?

author

i would put the # lower, but in any case it is the part that is not imitative that really separates us from the pack, and it is critical to why we so thoroughly dominate the cognitive niche


Typo: “...entirely ‘unable’ “


It seems credible to propose that AGI is not almost here. But is AGI coming at some point? And what are the implications for society if it does? As an exercise we might assume that AGI is coming, and ask whether we want that to happen.

If we do want to see AGI emerge, what other revolutionary new powers should we welcome? Today's AI, and the possible development of AGI, aren't the end of the story, right? There are more powers of vast scale coming too, powers that we may currently not be able to even imagine, just as not that long ago we couldn't imagine today's AI. Do we want all these powers? As many as possible? As soon as possible?

Is there any limit to the powers we should give ourselves? Are we blindly assuming that human beings can successfully manage ANY amount of power, delivered at ANY rate?

What seems ironic is that those developing AI are super smart technologists, while at the same time being bad engineers. For example, a bad engineer would proudly boast that they have designed a car that can go 700mph, while ignoring that almost nobody can control a car at that speed. A bad engineer focuses only on what they want to see, and fails to take the entire situation into account.

It seems true that we aren't on the edge of AGI emergence. We should celebrate that! Because we're nowhere near ready. And may never be.

author

agree we are not ready :(


Possible definitions of "being ready".

Fix the problems of existential scale that we've already created, before creating any more.

1) Get rid of nuclear weapons

2) Meet the climate change challenge in a convincing credible manner.

Prove that we are ready, before assuming that we are. Or, put another way...

Act like adults.


We will not only need new algorithms but probably also a new branch of mathematics and philosophy to reach the AGI level. We have made progress on several narrow cognitive capabilities, but that does not mean we can just put all of them together and get to AGI.

However, when we do get there, it will definitely be our last invention. Another question is what would happen if multiple organizations/countries reach it at the same time: can we expect a war between these AGI machines in which they, or one of them, survive at our expense?

An interesting article on the same topic:

https://www.palladiummag.com/2023/08/10/artificial-general-intelligence-is-possible-and-deadly/

The recent wave of progress in deep learning resulted from the unexpected effectiveness of applying GPU acceleration to back-propagation-based training ideas invented in the late 1980s. In between then, neural nets had mostly stagnated as an approach. Where deep learning goes next, and if it goes anywhere novel at all, is hard to know. The next major breakthrough could be another deep learning architecture like the “attention”-based transformer, but it could also come from somewhere else entirely. Perhaps some breakthrough in proof theory or symbol learning could suddenly make GOFAI viable the way deep learning suddenly made neural nets viable. Or the field could stagnate for another 20 years. Further progress may depend on some new branch of mathematics developed by an unrelated neo-Pythagorean cult. The whole thing may even depend on new philosophy or theology. It may be that no one currently working on these problems has the right mix of obscure skills and knowledge.

The real problem of AGI feasibility is philosophical: restating McCarthy’s conjecture, can we untangle what we mean by intelligence enough to build an algorithm for it? Judging by the practical and cross-cultural robustness of the concept, we probably can. We just don’t know how yet.


I don't think we need new math or new philosophy to reach AGI. We need better ways of representing a problem and better ways of navigating and iterating through the "concept landscape". So, we need better models.


Hi, thanks for this - did you see this? Studios ain't gonna make movies with AI if they can't protect them... from the H'wood Reporter:

"... intellectual property law has long said that copyrights are only granted to works created by humans, and that doesn’t look like it’s changing anytime soon.

A federal judge on Friday upheld a finding from the U.S. Copyright Office that a piece of art created by AI is not open to protection."😊

author

Indeed I did! I almost referenced it in the caption to the lead photo :)


Yes, as a mostly-former high-tech patent, copyright, and IP lawyer, and now an actor, I followed this issue with interest re: the strike here, and I can't see how this is not going to change the tenor of the talks...


There’s one word in AI for when something doesn’t generalize well to new things: overfitting. Models are overfitted (in a broader sense than usual) to these standardized benchmarks.
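
For anyone who wants the textbook picture, a minimal toy illustration of overfitting (a polynomial fit, nothing to do with any particular benchmark):

```python
# Toy example: a high-degree polynomial nails the training points but
# generalizes poorly to held-out data.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# The degree-9 fit memorizes the training noise (train error near zero) while
# its held-out error is typically far worse than the simpler degree-3 fit:
# good scores on the data you tuned for, poor generalization beyond it.
```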


I think most commenters here should read https://en.m.wikipedia.org/wiki/Overfitting
