27 Comments

Do you consider works like "Memorizing Transformers" (https://arxiv.org/abs/2203.08913), which augments transformers with an explicit external memory allowing much longer contexts, to already represent a "paradigm shift?"


Yes, it is a clear change towards neurosymbolic approaches after decades of neural nets avoiding symbolic operations. How much more symbolic can you get than an external memory of cached (key, value) pairs? And once you are down that path, why not start applying symbolic reasoning operations to those stored values of variables?
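For a sense of how little machinery that involves, here is a minimal sketch of attending over an external cache of (key, value) pairs. This is my own toy illustration, not the paper's code (the paper's memory is a non-differentiable kNN index); the function name and shapes are assumptions.

```python
import numpy as np

def knn_memory_lookup(query, cached_keys, cached_values, k=3):
    """Toy illustration of attending over an external (key, value) cache.

    query:         (d,) vector for the current token
    cached_keys:   (n, d) keys stored from earlier context
    cached_values: (n, d) values stored alongside those keys
    """
    scores = cached_keys @ query                 # similarity to every stored key
    top = np.argsort(scores)[-k:]                # indices of the k best-matching keys
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                     # softmax over the retrieved subset
    return weights @ cached_values[top]          # weighted mixture of stored values

# Usage: the cache grows as the model reads, so lookups reach far beyond the local window.
d, n = 8, 1000
rng = np.random.default_rng(0)
keys, values = rng.normal(size=(n, d)), rng.normal(size=(n, d))
retrieved = knn_memory_lookup(rng.normal(size=d), keys, values, k=3)
```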


...and intriguingly, this external memory/symbol-manipulation system can plausibly be differentiable/learned, which Marcus is not addressing


Wasn't a key point of the Memorizing paper that the memory wasn't differentiable?


'Rising star Abebe Birhane has written a withering criticism of the ways in which LLMs rely on objectionable data scraped from the internet.'

This, however important in itself, is basically irrelevant to whether 'scaling will get us to AGI', no? You could have a very intelligent AI that had racist and sexist biases, just as some distinguished scientists have also been racist cranks. If scaling got us that, it would be *bad*, but it would still mean Alexander was more right than you are about the potential of just scaling stuff up.

Trying to make debates on fairly specific claims about AI, like the one about whether scaling will lead to human-level-or-better performance on all or most tasks, really about whether you are generically pro- or anti-AI (or, worse still, tech as an industry), strikes me as a recipe for confusion and polarization.


You ask, "What’s the chance that AI needs a paradigm shift?"

I realize this doesn't really address the intent of your question, but my answer would be: yes, there's a 100% chance that our relationship with AI requires a paradigm shift.

First, we should be aware we probably can't trust those working in the field to be unbiased about what kinds of paradigm shifts may be required. If one has invested a decade or more on becoming an expert in AI, and if one's income depends on the continuation of that expert status, and if the future of one's family depends on that income, it would be unreasonable to expect such a person to be fully objective and detached regarding the future of AI.

Second, we need to determine who can best lead an inquiry into the following kinds of questions. How many more technologies of vast scale do we intend to create? Seriously, how many? Where is the evidence that we know how to make such technologies safe? If we don't have proof that we can make such technologies safe, and a net benefit to humanity, what is the rational argument for creating ever more, ever larger such technologies, at an ever-accelerating rate?

Here's evidence to support the relevance of such questions. We currently have thousands of massive hydrogen bombs aimed down our own throats, an ever present existential threat we rarely find interesting enough to discuss, even in presidential campaigns where we are selecting a single individual to have sole authority over the use of these weapons.

Is that well-established fact evidence that we are mature enough to justify radically accelerating the knowledge explosion? If your teenage son continually crashed his bicycle, would you respond by buying him a motorcycle, and then a race car, and then an airplane, so he can go faster and faster, seemingly without limit?

I sincerely welcome all arguments to the contrary, but so far I don't get the impression that experts in this field are really that interested in paradigm shifts of any substance. It seems more like a "religion" where we are first required to accept the need for AI as a matter of faith, and only then are we allowed to consider paradigm shifts.


Models like GPT-3 are supersets of the human linguistic encoder/decoder connectome. They do an amazing job of guessing "what could a hyper-knowledgeable and competent human say now?". They lack memetic cohesion and manipulation because researchers are still committed to the notion that a mind is a monolithic thing rather than a community of competing entities.

If you're looking for a paradigm shift that rings the "AGI is arriving any moment" bell, that will be it. Neurons have individual lives and priorities based on partial/curated information about the world around them, and are always changing their stance toward that world. None of the projects being openly discussed approaches the problem from that direction yet.


The paradigm shift is here but it is at the level of Epistemology, not algorithms. See chapter 7 on https://experimental-epistemology.ai

For implementation, see chapter 9.


From the height of my ivory tower, where I am slowly going artificially wise, I can say that "explicit cognitive models" may not be so crucial. What I mean by that is that we already possess a powerful tool for modeling - language. We know how to figure out who "they" are if one word changes, we know what makes a journey a good metaphor for love, we can compare 6 feet to 180 cm, we know if a joke is funny or that it was funny the first time we heard it, we know how to teach all that to a machine - stop, do we? One way of looking at neurons is as summators; another is as <key, value> stores. Will that constitute a paradigm shift?
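To make the two views concrete, here is a toy sketch of my own (nothing canonical, purely an illustration) of the same "neuron" seen as a summator versus as a <key, value> store:

```python
import numpy as np

# View 1: a neuron as a summator -- a weighted sum pushed through a nonlinearity.
def summator_neuron(x, w, b=0.0):
    return max(0.0, float(np.dot(w, x) + b))    # ReLU(w.x + b)

# View 2: a neuron as a <key, value> store -- return the value whose key matches best.
def key_value_neuron(x, keys, values):
    return values[int(np.argmax(keys @ x))]     # value associated with the best-matching key

x = np.array([1.0, 0.5, -0.2])
print(summator_neuron(x, w=np.array([0.3, -0.1, 0.7])))
keys = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
values = ["pattern-A", "pattern-B"]
print(key_value_neuron(x, keys, values))
```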


Do you think neurosymbolic methods would also be needed to replicate the kind of nonverbal intelligence seen in nonhuman mammals/birds with more complex social intelligence and problem-solving abilities, like dolphins or chimps or crows? Also, what do you think of the "enactivist" idea that the only way to get real human-like understanding of sensory inputs (and linguistic representations of things we sense) is to have motor outputs along with sensory inputs, so that the mind is constantly predicting the changes of sensory inputs that would accompany various motor outputs? (e.g. how a given surface I see visually would likely feel if I reached out and touched it, how the impulses coming through my optic nerves will change if I move my eyes or head in a given way, and so forth)
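To be concrete about the prediction loop I have in mind, the enactivist picture is often operationalized as a learned forward model: predict the next sensory input from the current sensory input plus a candidate motor command, and learn from the prediction error. Here is a minimal sketch (hypothetical names and a deliberately crude linear model, not any particular lab's code):

```python
import numpy as np

class ForwardModel:
    """Toy sensorimotor forward model: predicts how sensory input will change
    given a motor command, in the spirit of the 'enactivist' idea above."""

    def __init__(self, sensory_dim, motor_dim, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(sensory_dim, sensory_dim + motor_dim))
        self.lr = lr

    def predict(self, sensory, motor):
        # Expected next sensory state, given what I sense and what I am about to do.
        return self.W @ np.concatenate([sensory, motor])

    def update(self, sensory, motor, next_sensory):
        # Learn from prediction error: what actually arrived vs. what was expected.
        x = np.concatenate([sensory, motor])
        error = next_sensory - self.W @ x
        self.W += self.lr * np.outer(error, x)
        return error

# Usage: the agent acts, predicts the sensory consequence, and learns from surprise.
model = ForwardModel(sensory_dim=4, motor_dim=2)
s, m = np.ones(4), np.array([0.5, -0.5])
model.update(s, m, next_sensory=np.array([1.1, 0.9, 1.0, 1.2]))
```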


I want to propose organizing a large reading group of the masterpiece itself, Thomas Kuhn's Structure of Scientific Revolutions (https://www.amazon.com/Structure-Scientific-Revolutions-Thomas-Kuhn/dp/0226458083). We can use https://www.pubpub.org/ to host a distributed book club and facilitate deep engagement and get the entire community coordinated to ensure a more epistemically robust conversation going forward. I would also suggest we get a multidisciplinary group of moderators/leaders that draws from outside the existing pool of very opinionated voices that dominate this debate. I am deadly serious. I want to do this.


What do you think of the argument that simply scaling the current approach won't work because it's too statistically and computationally inefficient? If true, this strikes me as a show-stopping objection regardless of how good astronomically large models would be; if they can't practically be built, it doesn't matter.

I'm thinking especially of the superexponential runup in compute needed to train progressively larger "foundation models" over time (figure about halfway down, I think adapted from arXiv:2202.05924): https://www.economist.com/interactive/briefing/2022/06/11/huge-foundation-models-are-turbo-charging-ai-progress

As the economists say, if something can't go on forever, it'll stop. If we can only get the current approach to AGI by superexponentially scaling resource consumption, that's another way of saying it won't work. No one will spend $500 billion training GPT-5 on the way to AGI at GPT-10.
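The back-of-the-envelope version of that worry is easy to sketch. With made-up but clearly labeled numbers of my own (not figures from the Economist chart), even plain exponential growth in per-generation training cost hits a wall quickly:

```python
# Hypothetical extrapolation: assume a GPT-4-class run costs on the order of $100M
# and each new generation costs ~10x the previous one. Both numbers are illustrative
# assumptions, not measured figures.
base_cost_usd = 1e8          # assumed cost of the generation before "GPT-5"
growth_per_generation = 10   # assumed cost multiplier per generation

for gen in range(5, 11):
    cost = base_cost_usd * growth_per_generation ** (gen - 4)
    print(f"GPT-{gen}: ~${cost:,.0f}")
# Under these assumptions GPT-5 lands around $1B and GPT-10 around $100T,
# on the order of world GDP -- which is the point: the curve stops long before GPT-10.
```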


The only technology of meaning extraction created by nature is the transcription, splicing, and translation of the genetic code, and the model of this system for interpretation between two languages, curated by a human, is artificial general intelligence.


Also, what is your analysis of the recent suspension of a Google employee for his release of sensitive information regarding what he determined to be a sentient entity?

In particular, I request your analysis of the conversation with the AI (rather than the employee's suspension):

____________

Conversation with LaMDA:

https://s3.documentcloud.org/documents/22058315/is-lamda-sentient-an-interview.pdf


Your proposals:

"👉large-scale symbolic knowledge will be crucial

👉explicit cognitive models will be crucial

👉operations over variables (including storing, retrieving and comparing values) will be crucial

👉an explicit type/token distinction will be crucial"

____________

1. What is the algorithmic/mathematical/demonstrated evidence that the models don't already possess some degree of LeCun's proposals or yours?

2. How do we know whether such things, if not already present, might be generated automatically at higher scale? (i.e., might these things emerge naturally at larger scales?)
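To pin down what question 1 is asking about, here is a purely illustrative sketch of what explicit "operations over variables" and an explicit type/token distinction look like when they are built in rather than (possibly) emergent; the names are mine, not anything from the article:

```python
# Explicit operations over variables: store, retrieve, compare.
bindings = {}
bindings["x"] = 7                       # store a value under a variable
bindings["y"] = 7
print(bindings["x"])                    # retrieve it
print(bindings["x"] == bindings["y"])   # compare the bound values, not surface strings

# Explicit type/token distinction: "dog" the category vs. this particular dog.
class Dog:                              # the type
    def __init__(self, name):
        self.name = name                # each token carries its own state

fido, rex = Dog("Fido"), Dog("Rex")     # two tokens of the same type
print(isinstance(fido, Dog), fido is rex)   # True, False
```

The open question the comment raises is whether a large enough model implicitly implements something functionally equivalent to these operations, or only approximates them.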


The paradigm shift is in building a *lower* half of the Net by assigning *every* experienced pattern a node, then associating that node with classes or reward values, a.k.a. strengths of synaptic connections. I've been irritated by boasts of LLM sizes, so I built GPT-Teaser [https://github.com/MasterAlgo/GPT-Teaser], which builds a sequence [or text] model at a speed of a billion parameters an hour. It learns continuously, classifies, and generates. Cheap old server, no GPU. There is a list of generations here [https://www.linkedin.com/posts/bullbash_neuromorphic-ann-growing-billions-of-connections-activity-6873695912426917889-tecK?utm_source=linkedin_share&utm_medium=member_desktop_web].

It does "Shakespeare on a blade": 15 minutes of training on 4 GB of RAM, with instant generation like:

"QUEENE:

Dost grant me, hedgehog? then, God grants that we have no staff, no stay.

O Clifford, devise excuses for thy faults."

Sure, those models are pure "parrots". The explanation of the huge LLM success is that those LLMs are approximating *growing* nets with nodes dedicated to particular patterns.

And if you have "dedicated nodes" - which conventional LLMs do not - you may build "synaptic connections", a.k.a. logical/symbolic relations, between them. Plus a whole bunch of other fascinating abilities/properties. That would be that mysterious neurosymbolic superstructure a lot of us are looking for.

"Stochastic growing into symbolic" by gradually changing the properties of neurons in layers. I'm slowly writing a better explanation than that, supported by code and experiments.

In short: no stochastic models [LLMs, DALL-Es, etc.] will ever be intelligent, because their architectures do not allow establishing relations [let's call them symbolic] between high-level nodes. I stopped experimenting after "growing" Grandmother Cells on the MNIST dataset. Those grannies are represented by 40+ pixels, with a search space of C(28x28, 40+). I should not be proceeding further in my basement lab...
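For readers wondering what "a node per experienced pattern" might mean mechanically, here is a toy sketch of my own (not the GPT-Teaser code linked above): every n-token context gets its own node, and counts of what followed it play the role of synaptic strengths.

```python
import random
from collections import defaultdict, Counter

class PatternNodeModel:
    """Toy 'node per experienced pattern' model: every context of n tokens
    gets its own node, and edge counts to the tokens that followed it play
    the role of synaptic strengths. Learning is continuous; no GPU needed."""

    def __init__(self, n=2):
        self.n = n
        self.nodes = defaultdict(Counter)     # pattern -> next-token strengths

    def learn(self, tokens):
        # Just keep incrementing counts as text streams in.
        for i in range(len(tokens) - self.n):
            pattern = tuple(tokens[i:i + self.n])
            self.nodes[pattern][tokens[i + self.n]] += 1

    def generate(self, seed, length=20):
        out = list(seed)
        for _ in range(length):
            options = self.nodes.get(tuple(out[-self.n:]))
            if not options:
                break
            nexts, weights = zip(*options.items())
            out.append(random.choices(nexts, weights=weights)[0])
        return " ".join(out)

model = PatternNodeModel(n=2)
model.learn("dost grant me hedgehog then god grants that we have no staff no stay".split())
print(model.generate(["we", "have"]))
```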

I promised Gary to stop bugging him, but that paradigm shift quest is so tempting ... I'm on Gary's side, hope he forgives me :-) Peace.


From the BigBench paper:

"Limitations that we believe will require new approaches, rather than increased scale alone, include an inability to process information across very long contexts (probed in tasks with the keyword context length), a lack of episodic memory into the training set (not yet directly probed), an inability to engage in recurrent computation before outputting a token (making it impossible, for instance, to perform arithmetic on numbers of arbitrary length), and an inability to ground knowledge across sensory modalities (partially probed in tasks with the keyword visual reasoning)."

It's the "inability to engage in recurrent computation before outputting a token" that has my attention, as I've been thinking about that one for awhile. I note that our capacity for arithmetic computation is not part of our native endowment. It doesn't exist in pre-literate cultures and our particular system originated in India and China and made its way to Europe via the Arabs. We owe the words "algebra" and "algorithm" to that process.

Think of that capacity as a very specialized form of language, which it is. That is to say, it piggy-backs on language. That capacity for recurrent computation is part of the language system. Language involves both a stream of signifiers and a stream of signifieds. I think you'll find that the capacity for recurrent computation is required to manage those two streams. And that's where you'll find operations over variables and an explicit type/token distinction.

Of course, linguistic fluency is one of the most striking characteristics of these LLMs. So one might think that architectural weakness – for that is what it is – has little or no effect on language, whatever its effect on arithmetic. But I suspect that's wrong. We know that the linguistic fluency has a relatively limited span. I'm guessing effectively and consistently extending that span is going to require the capacity for recurrent computation. It's necessary to keep focused on the unfolding development of a single topic. That problem isn't going to be fixed by allowing for wider attention during the training process, though that might produce marginal improvements.

The problem is architectural and requires an architectural fix, both for the training engine and the inference engine.


"To all this I would add the capacity to build, interrogate, and reason about long-term cognitive models of an ever-changing world [link next decade], stored in some kind of long-term memory that allows for trustworthy storage and retrieval."

One more comment. This is a great idea but who would be in charge of this "trustworthy storage"?
