The Road to AI We Can Trust

Does AI really need a paradigm shift?

Probably so. A response to Scott Alexander’s essay, “Somewhat Contra Marcus On AI Scaling”

Gary Marcus
Jun 11, 2022

The provocative public conversation I am having with Scott Alexander, of SlateStarCodex fame, already a meme, continues!

In a fresh reply to my “What does it mean when AI fails?”, Alexander has put forward his second stimulating critique of the week, “Somewhat Contra Marcus On AI Scaling”.

Which is not to say it’s perfect; on the other hand, one doesn’t need perfection in order to be provocative. I will respond briefly to the disappointing part, and then we will get to the good stuff.

Strawman, Steelman

In general, Alexander is known for being exceptionally fair to ideas he doesn’t particularly like, even if at times that makes him unpopular. An entry on Quora perfectly distills this laudable aspiration:

“[Scott Alexander] optimizes for enlightenment, rather than for winning arguments. Thus, when considering an issue, he will actively seek out and often formulate the strongest arguments (that is, steelman) [for] both sides.”

I only wish he had extended the same courtesy to my position.

To take one example, Alexander puts words like prove or proven in my mouth:

“Marcus says GPT’s failures prove that purely statistical AI is a dead end”

and

“GPT certainly hasn’t yet proven that statistical AI can do everything the brain does. But it hasn’t proven the opposite, either” [as if Marcus said that it had].

But that’s a strawman. In reality I would never say that I have proven anything; what I do as a scientist is to weigh evidence and suggest research directions. I say that we have given the scaling hypothesis a really good look (with a larger budget than all but a handful of projects in history), and as such the failures of large-scale systems are evidence (not proof) that we ought to seriously consider alternatives, e.g., here:

Rather than supporting the Lockean, blank-slate view, GPT-2 appears to be an accidental counter-evidence to that view […]

GPT-2 is both a triumph for empiricism, and, in light of the massive resources of data and computation that have been poured into them, a clear sign that it is time to consider investing in different approaches.

In making it sound like I have declared proof when I have never made such declarations, Alexander paints me as an extremist, rather than a scientist who weighs evidence and uncertainty. Where has his steelperson aspiration gone?[1]

Another rhetorical trick is to paint me as a lone, lunatic voice, as if I were the only person doubting that scaling will get us to AGI (whatever that is) when in fact there are loads of people with similar concerns.

Melanie Mitchell, for example, has repeatedly emphasized the importance of representing meaning, above and beyond anything that we currently know GPT-3 to do. Emily Bender, Margaret Mitchell and Timnit Gebru have derided large language models as “stochastic parrots.” (Not entirely fair to the parrots, but you get the idea.)

Rising star Abeba Birhane has written a withering criticism of the ways in which LLMs rely on objectionable data scraped from the internet. Ernie Davis I mentioned last time; almost all of our joint rejoinders draw on his deep work on common sense. Judea Pearl has shouted over and over that we need deeper understanding, and has written a whole book about the importance of causality and how it is missing from current models. Meredith Broussard and Kate Crawford have written recent books sharply critical of current AI. Meta researcher Dieuwke Hupkes has been exposing limits in the ability of current LLMs to generalize.

As an important aside, a lot of those other voices are women, and while I am certainly flattered by all the attention Alexander has been giving me lately, it’s not a good look to make this whole discussion sound like one more white guy-on-white guy debate when (a) so many strong female (and minority) voices have participated, and (b) so many of the unfortunate consequences of a webscraping/big data approach to AI are disproportionately borne by women and minorities.

In inaccurately portraying me as a lone crusader, Alexander has not given the “scaling is not enough for AGI” view the “steelman” treatment that he is known for delivering.

Phew. Now for the good part!

What’s the chance that AI needs a paradigm shift?

Bets are evidently in the air. I bet Elon Musk $100,000 that we wouldn’t have AGI by 2029 (no reply), and in a similar vein tried to get Alexander to go in on a sucker bet about the capabilities of GPT-4. Alexander wisely declined, but countered with five bets of his own.

On the first, we are basically in agreement. I in no way doubt that there is at least a little bit more headroom left for large language models. The real controversy is whether that’s enough.

On the second, the notion of “deep learning based model” is too vague; it might apply to a pure deep-learning model, but also, e.g., to any kind of neurosymbolic hybrid in which deep learning was just one of a dozen mechanisms. It’s just not clear that anything serious is excluded.

There is also some softness in the formulation of the third bet, where the key word is “descendant”. If, for example, The World’s First Successful AGI were a 50:50 hybrid of large language models and something symbolic like CYC, it might overall look relatively little like GPT-3, but its champions might still be tempted to declare victory. At the same time, I would (rightly) be entitled to declare moral victory for neurosymbolic AI. Both symbols and LLMs could see their genes in the grandchild. Hybrid vigor for the win!

But then things get interesting.

Paradigm shift (raised in bets #4 and #5) is exactly what this whole discussion is really about. Thank Kuhn, Alexander was brave enough to say it out loud.

What we all really want to know, as a global research community, is this: are we approaching things properly right now, or should we shift in some way?

Personally I’d put the probability that we need to shift at 90%, well above the 60% that Alexander suggests, and I would put the probability that we will need to embrace symbol-manipulation as part of the mix at 80%, more than double Alexander’s 34%. Others may put the probability on some kind of paradigm shift (perhaps not yet known) even higher. Just yesterday Stanford PhD student Andrey Kurenkov put the probability at nearly 100%, based on arguments he gave last year about GPT-3 lacking external memory:

Andrey Kurenkov (@andrey_kurenkov), Jun 10, 2022:

“It's pretty obvious that 'simply scaling' a GPT-style LLM will not lead to AGI by virtue of the inherent limits of the model architecture and training paradigm.”

A day or two earlier, the most important empirical article of the week dropped: Big Bench [link], a massive investigation of massive language models that has a massive list of 442 authors. Large language models; even larger papers! (How massive is the author list? I sent the paper to Scott Aaronson, who said he would read it on the plane; half an hour later he writes back: “I've been reading this paper for the past 10 minutes but haven't yet made it past the author list.”)

The basic finding was: loads of things scale, but not all (as foreseen both by Kurenkov in his essay last year and in my much-lampooned but basically accurate essay, “Deep Learning Is Hitting a Wall”). Every one of the 442 authors signed off on a conclusion statement that includes the words I quote below.

The massive paper looks at scaling on many measures, and sees progress on some, but not others. The stuff that isn’t scaling is the “wall”.

The emphasis here is of course on the words “will require new approaches, rather than scale alone.” That’s the point: we need new approaches, exactly as I have been arguing.

Meanwhile, if you were to believe what you read on Twitter, you’d think that my biggest rival in the universe is Yann LeCun. While there’s no denying that the two of us have often clashed, on this point we are actually in complete agreement: scale alone is not likely to be enough, and some new discovery (i.e., a paradigm shift) is needed.

For example, LeCun recently posted this sequence of tweets (excerpted from a long, excellent thread that I discuss here):

Yann LeCun (@ylecun), May 17, 2022:

“(1) the research community is making *some* progress towards HLAI (2) scaling up helps. It's necessary but not sufficient, because.... (3) we are still missing some fundamental concepts 2/N”

“(4) some of those new concepts are possibly 'around the corner' (e.g. generalized self-supervised learning) (5) but we don't know how many such new concepts are needed. We just see the most obvious ones. (6) hence, we can't predict how long it's going to take to reach HLAI. 3/N”

Amen.

§

Ok, now for the hard part: what should we count as a paradigm shift?

The best that I have seen on that is a thread from a deep learning/NLP postdoc at Edinburgh, Antonio Valerio Miceli-Barone, who asked the field for some tangible, falsifiable predictions. When he invited me into the thread, I made some predictions, tied not to time but to architecture:

Gary Marcus (@GaryMarcus), May 17, 2022:

“@AVMiceliBarone @FelixHill84 predictions: whenever AGI comes: 👉 large-scale symbolic knowledge will be crucial 👉 explicit cognitive models will be crucial 👉 operations over variables (including storing, retrieving and comparing values) will be crucial 👉 an explicit type/token distinction will be crucial”

(That’s basically what Alexander’s #4 is about)
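For readers who want the jargon unpacked, here is a deliberately tiny Python sketch, purely illustrative and with every name invented for the occasion, of what I mean by “operations over variables” and an explicit “type/token distinction”: values bound to variables can be stored, retrieved, and compared outright, and a kind (Dog) is represented separately from the individuals (Fido, Rex) that instantiate it. It is not a proposal for how to build anything.

```python
# Illustrative only: what "operations over variables" and a "type/token
# distinction" amount to in classical symbolic terms. All names are invented.

class Type:
    """A kind of thing, e.g. Dog."""
    def __init__(self, name: str):
        self.name = name

class Token:
    """A particular individual of some Type, e.g. Fido the dog."""
    def __init__(self, name: str, of_type: Type):
        self.name = name
        self.of_type = of_type

# A tiny propositional store: (subject, relation, value) triples.
memory = set()

def store(subject, relation, value):
    """Bind a value to a variable (relation) for a given subject."""
    memory.add((subject, relation, value))

def retrieve(subject, relation):
    """Look up the values bound to that variable for that subject."""
    return {v for (s, r, v) in memory if s == subject and r == relation}

def same_value(subject_a, subject_b, relation):
    """Compare the values bound to the same variable across two tokens."""
    return retrieve(subject_a, relation) == retrieve(subject_b, relation)

dog = Type("Dog")
fido = Token("Fido", dog)   # two distinct tokens...
rex = Token("Rex", dog)     # ...of the same type

store(fido, "owner", "Alice")
store(rex, "owner", "Alice")

print(same_value(fido, rex, "owner"))  # True: explicit comparison of stored values
print(fido.of_type is rex.of_type)     # True: one type, two tokens
```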

Miceli-Barone, however, insisted on something more; I asked him to define his terms, and in a short tweet he characterized what we might count as the current regime:

Antonio Valerio Miceli Barone (@AVMiceliBarone), May 17, 2022:

“@GaryMarcus @FelixHill84 Let's say CNNs+RNNs+Transformers, no writable latent discrete memory. I'd consider 'discrete attractors' models (e.g. capsules, slot attention, clustering) as innovations, since while they already exist to some extent they are not applied at scale.”

“Paradigm shift” then becomes operationally defined as anything not in Miceli-Barone’s first sentence. E.g., if the field were to turn from simply scaling (GPT-3 is pretty much just GPT-2 but bigger) to using large language models as only one component in a larger architecture, with things like writable/readable discrete memory for symbolic propositions, I certainly think we should view that as a paradigm shift.
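To make that concrete, here is a minimal sketch of the kind of arrangement I have in mind, written in Python with a hypothetical llm_generate stand-in for whatever large language model you like; it is a cartoon of the idea, not a design. Language goes in and out through the LLM, but the facts the system actually relies on live in an explicit, writable/readable store of discrete propositions that can be inspected directly.

```python
# A cartoon of "LLM as one component of a larger architecture":
# language in and out goes through a language model, but facts live in an
# explicit, writable/readable memory of discrete symbolic propositions.
# `llm_generate` is a hypothetical stand-in for any large language model API.

from typing import Callable, List, Set, Tuple

Proposition = Tuple[str, str, str]  # (subject, relation, object)

class HybridAgent:
    def __init__(self, llm_generate: Callable[[str], str]):
        self.llm_generate = llm_generate
        self.memory: Set[Proposition] = set()  # discrete, symbolic, inspectable

    def write(self, prop: Proposition) -> None:
        """Explicitly store a proposition; nothing is buried in the weights."""
        self.memory.add(prop)

    def read(self, subject: str, relation: str) -> List[str]:
        """Retrieve the objects of all stored propositions matching the query."""
        return [o for (s, r, o) in self.memory if s == subject and r == relation]

    def answer(self, question: str, subject: str, relation: str) -> str:
        facts = self.read(subject, relation)
        if facts:
            # Ground the wording in retrieved facts rather than free generation.
            prompt = f"Facts: {facts}\nUsing only these facts, answer: {question}"
        else:
            prompt = f"Say that you don't know the answer to: {question}"
        return self.llm_generate(prompt)

# Usage with a dummy "LLM" that just echoes its prompt:
agent = HybridAgent(llm_generate=lambda prompt: prompt)
agent.write(("Paris", "capital_of", "France"))
print(agent.answer("What is Paris the capital of?", "Paris", "capital_of"))
```

The point of the toy is only that the memory is discrete and explicit: you can see exactly what the system believes, add to it, and audit what any answer was grounded in.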

As a long-term neurosymbolic advocate [link], I would of course feel particularly vindicated if that shift were about building bridges to traditional symbolic tools (like storing, retrieving, and comparing propositions in long-term memory), as it is with the explicitly neurosymbolic MRKL paper from AI21 that I mentioned a few days ago.
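I won’t rehash the details of that paper here, but the general router-plus-expert-modules shape is easy to convey with a toy. What follows is my own illustrative sketch of that general idea, not AI21’s implementation, and the llm function is just a placeholder: queries containing arithmetic are dispatched to an exact symbolic calculator, and everything else falls through to the language model.

```python
# A toy of the general "router + expert modules" idea (my own sketch, not
# AI21's implementation): arithmetic is dispatched to an exact symbolic
# calculator, everything else to a language model. `llm` is a placeholder.

import re

ARITHMETIC = re.compile(r"(-?\d+)\s*([+\-*/])\s*(-?\d+)")

def calculator(query: str) -> str:
    """Compute a simple 'a <op> b' expression exactly."""
    match = ARITHMETIC.search(query)  # the router guarantees a match exists
    a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
    if op == "+":
        return str(a + b)
    if op == "-":
        return str(a - b)
    if op == "*":
        return str(a * b)
    return str(a / b)

def llm(query: str) -> str:
    """Placeholder for a call to a real large language model."""
    return f"[LLM answer to: {query}]"

def route(query: str) -> str:
    """Crude routing rule: anything containing arithmetic goes to the calculator."""
    if ARITHMETIC.search(query):
        return calculator(query)
    return llm(query)

print(route("What is 12 * 37?"))   # exact symbolic arithmetic: 444
print(route("Who wrote Hamlet?"))  # deferred to the language model
```

A real system would route far more intelligently, of course; the sketch only illustrates why handing arithmetic to a symbolic module yields exact answers that scaling a language model does not guarantee.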

That said, I, of course, could be right about foreseeing the need for a paradigm shift, but wrong about what that paradigm shift turns out to be.

§

LeCun’s May 17 thread lays out some of the many challenges ahead, any one of which, on its own, might in fact demand an innovation radical enough to count as a paradigm shift, neurosymbolic or otherwise:

Yann LeCun (@ylecun), May 17, 2022:

“I believe we need to find new concepts that would allow machines to:
- learn how the world works by observing like babies.
- learn to predict how one can influence the world through taking actions. 6/N”

“- learn hierarchical representations that allow long-term predictions in abstract spaces.
- properly deal with the fact that the world is not completely predictable.
- enable agents to predict the effects of sequences of actions so as to be able to reason & plan 7/N”

“- enable machines to plan hierarchically, decomposing a complex task into subtasks.
- all of this in ways that are compatible with gradient-based learning.
The solution is not just around the corner. We have a number of obstacles to clear, and we don't know how. 8/N”

To all this I would add the capacity to build, interrogate, and reason about long-term cognitive models of an ever-changing world [link next decade], stored in some kind of long-term memory that allows for trustworthy storage and retrieval.

So: can we get to AGI and reliable, no-goof reasoning, handling all the challenges that LeCun and I have been discussing, with scaling and CNNs + RNNs + Transformers alone?

In my view, no way; I think the actual odds are less than 20%.

So Scott, challenge accepted, I am on for your bets #4 and #5.

§

The other Scott, Aaronson, said yesterday, “From where I stand, though, the single most important thing you could do in your reply is to give examples of tasks or benchmarks where, not only does GPT-3 do poorly, but you predict that GPT-10 will do poorly, if no new ideas are added.”

As it happens, Miceli-Barone posed precisely the same question [link] in May; here’s what I said then, and stand by:

Gary Marcus (@GaryMarcus), May 17, 2022:

“@AVMiceliBarone @FelixHill84 I think pure deep learning so defined will fail at Comprehension Challenge, proposed here: newyorker.com/tech/annals-of… and developed a bit further here: ojs.aaai.org//index.php/aim…. Working w some folks to try to implement for real.”

If GPT-10 is just like GPT-3 (no extra bells and whistles, just bigger), and it can read whole novels, watch whole movies, answer subtle and open-ended questions about characters and their conflicts and motivations, and tell us when to laugh, I promise to post a YouTube video admitting I was wrong.

§

I give LeCun the last word, once again from his May 17th manifesto:

Yann LeCun (@ylecun), May 17, 2022:

“I really don't think it's just a matter of scaling things up. We still don't have a learning paradigm that allows machines to learn how the world works, like human and many non-human babies do. 4/N”

- Gary Marcus

[1]

Something similar happens when Alexander sets up what AGI is. There are two extreme views of AGI that one might imagine: one (easily defeated) in which the whole point is to mimic humans exactly, and the other (the steelman) in which AGI tries to do better than humans in the respects in which humans are particularly flawed. For example, humans are lousy at arithmetic, but you shouldn’t be able to declare “I have made an AGI” by showing you can build a machine that makes arithmetic errors. Alexander’s long digression on Luria’s famous cross-cultural reasoning work on untutored subjects applies only to the strawman, not the steelman.

Of course, as Miceli-Barone pointed out to me, one could ask whether a particular kind of error might reveal something about whether a given model was a good model of human cognition; that’s actually what my dissertation work with Steven Pinker was about. That’s a story for another day :)

26 Comments
rif a saurous
Jun 12, 2022

Do you consider works like "Memorizing Transformers" (https://arxiv.org/abs/2203.08913), which augments transformers with an explicit external memory allowing much longer contexts, to already represent a "paradigm shift?"

Phil Tanny
Nov 26, 2022

You ask, "What’s the chance that AI needs a paradigm shift?"

I realize my answer doesn't really address the intent of your question, but my answer would be, yes, there's a 100% chance that our relationship with AI requires a paradigm shift.

First, we should be aware we probably can't trust those working in the field to be unbiased about what kinds of paradigm shifts may be required. If one has invested a decade or more on becoming an expert in AI, and if one's income depends on the continuation of that expert status, and if the future of one's family depends on that income, it would be unreasonable to expect such a person to be fully objective and detached regarding the future of AI.

Second, we need to determine who it is that can best lead an inquiry in to the following kind of questions. How many more technologies of vast scale do we intend to create? Seriously, how many? Where is the evidence that we know how to make such technologies safe? If we don't have proof we can make such technologies safe, and a net benefit to humanity, what is the rational argument for creating ever more, ever larger such technologies, at an ever accelerating rate?

Here's evidence to support the relevance of such questions. We currently have thousands of massive hydrogen bombs aimed down our own throats, an ever present existential threat we rarely find interesting enough to discuss, even in presidential campaigns where we are selecting a single individual to have sole authority over the use of these weapons.

Is that well established fact evidence that we are mature enough to justify radically accelerating the knowledge explosion?? If your teenage son continually crashed his bicycle, would you respond by buying him a motorcycle, and then a race car, and then an airplane, so he can go faster, faster and faster, seemingly without limit?

I sincerely welcome all arguments to the contrary, but so far I don't get the impression that experts in this field are really that interested in paradigm shifts of any substance. It seems more like a "religion" where we are first required to accept the need for AI as a matter of faith, and only then are we allowed to consider paradigm shifts.
