27 Comments
Feb 3, 2023 · Liked by Gary Marcus

The core issue here is that human cognition is indeed compositional and systematic. We form and understand a sentence like "Sue eats pizza" by combining its words in an agent-action-theme sentence structure. This ability is systematic, because with the same words we can also form and understand "pizza eats Sue". For example, we know that the latter sentence is absurd precisely because we identify "pizza" as its agent.
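
A minimal illustrative sketch of that point (the role labels are standard linguistics terms; the data structure is a toy, not a claim about how the brain stores sentences): the same three words yield two different propositions depending on which thematic role each word fills.

```python
# Toy illustration of role binding: same lexical items, different bindings.
sentence_1 = {"agent": "Sue", "action": "eats", "theme": "pizza"}
sentence_2 = {"agent": "pizza", "action": "eats", "theme": "Sue"}

# Different role assignments, hence different (and differently plausible) meanings.
print(sentence_1 == sentence_2)  # False
```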

A cognitive architecture can achieve this only if it has 'logistics of access'.

Newell analyzed this in detail in his Unified Theories of Cognition (1990, e.g., p. 74-77). In short, his analysis is:

1. Local storage of information is always limited. So, with more information to deal with, the system needs 'distal access' to that information.

2. Then it needs to 'retrieve' that information to affect processing.

For example, we can form arbitrary sentences with our lexicon of around 60,000 words or more (e.g., "pizza eats Sue"). Trying to do this by chaining words from other (learned) sentences will not work, if only because the number of sentences that can be formed is far too large for that.
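
A rough back-of-envelope calculation of that combinatorial point (the sentence length here is an illustrative assumption, not a figure from the comment):

```python
# Illustrative arithmetic: even short sentences drawn from a 60,000-word
# lexicon vastly outnumber anything that could be stored and chained.
lexicon_size = 60_000      # approximate vocabulary size mentioned above
sentence_length = 5        # assumed length of a short sentence

possible_word_strings = lexicon_size ** sentence_length
print(f"{possible_word_strings:.2e}")   # ~7.78e+23 five-word strings
```

Even if only a tiny fraction of those strings are grammatical, the space is far too large to be covered by memorizing and recombining previously encountered sentences.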

Instead, this requires an architecture that provides distal access to arbitrary words in the lexicon, and can combine them in arbitrary sentence structures.

The architecture that Newell analyzed uses symbols to achieve distal access and to retrieve that information for processing (as in the digital computer). It is interesting to note that his use of symbols and symbol manipulation thus derives from the more fundamental requirement of logistics of access.
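
As a minimal sketch of what symbol-based distal access with retrieval looks like in the digital-computer sense (the lexicon and function here are hypothetical, purely for illustration): a symbol is a cheap local token, and the full word information it points to is retrieved from "elsewhere" only when processing needs it.

```python
# Hypothetical sketch: symbols as keys giving distal access to a lexicon.
# The word entries live elsewhere (the dictionary); processing retrieves
# them by symbol at the moment they are combined into a sentence structure.
LEXICON = {
    "SUE":   {"category": "noun", "meaning": "a person named Sue"},
    "EATS":  {"category": "verb", "meaning": "to consume food"},
    "PIZZA": {"category": "noun", "meaning": "a baked dish"},
}

def build_sentence(agent_sym, action_sym, theme_sym):
    # Retrieval step: each symbol is resolved to its distal lexicon entry.
    return {role: LEXICON[sym]
            for role, sym in [("agent", agent_sym),
                              ("action", action_sym),
                              ("theme", theme_sym)]}

print(build_sentence("SUE", "EATS", "PIZZA")["agent"]["meaning"])
```

The point of the sketch is only that combination works by retrieving entries through symbols; the alternative discussed next avoids that retrieval step.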

This opens up a new possibility: to achieve logistics of access without using symbols. For example, with an architecture that achieves distal access but does not rely on retrieval.

An architecture of this kind is a small-world network structure. An example is the road network we use for traveling. It is productive and compositional because it makes it possible to travel (basically) from any house to any other, not by direct connections between them, but via dense local roads and sparse hubs. Also, access from a new house to any other can easily be achieved just by connecting that house to its nearest local road.
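
A minimal sketch of the small-world property, assuming the networkx library (the graph model and parameters are illustrative choices, not anything from the comment): each node connects only to a few local neighbors plus a handful of rewired shortcuts, yet any node can reach any other in a few hops, and a new node gains access to everything just by attaching locally.

```python
# Illustrative small-world graph (Watts-Strogatz model); assumes networkx.
import networkx as nx

# 2,000 "houses", each wired to 6 local neighbors, with 5% shortcut rewiring.
G = nx.connected_watts_strogatz_graph(n=2_000, k=6, p=0.05, seed=0)

print(nx.average_shortest_path_length(G))   # only a handful of hops on average

# A "new house" becomes reachable from everywhere via one local connection.
G.add_edge("new_house", 42)
print(nx.shortest_path_length(G, "new_house", 1_999))
```

In the road-network analogy, the dense local edges play the role of neighborhood streets and the sparse shortcuts play the role of hubs: reachability between arbitrary pairs does not require storing a route for every pair.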

Neural networks can achieve this for language as well. An interesting consequence is that 'words' are network structures themselves that remain where they are. A sentence is just a connection path that (temporarily) interconnects these word structures. As a consequence, words are always content addressable, which is not the case with symbols or vector operations.

(For a more detailed analysis, see e.g., arxiv.org/abs/2210.10543)


That paper describing 'linguistic inputs' in children as if that's actually how we make sense of the world is such a great illustration of the head-banging problem at the heart of this. How difficult is it to understand? We make sense of the world and navigate it with our bodies. The Stochastic Parrots term is great (LLMs will only ever be able to output a sort of empty meta-language), but it still suggests an organic being that uses the meta-language to communicate. Even if the 'words' a parrot says are mimicry, the fact that it is using its vocal cords, tongue, and beak to make sounds that attract other animals (us) likely to give it food and attention is not meaningless.

But an LLM is always virtual, never 'needing' anything, never caring about anything, never feeling the physical agitations of the nervous system that signal anything about the environment. So what's the point? We got to the stage where chatbots can run mundane tasks that are, at best, boring for humans and, at worst, draw abuse from angry customers. That's useful, if limited. But trying to 'solve' the problem of meaning? Surely that's a category error; understanding the world is not what a chatbot does or is even supposed to do. Neither is it a problem to be solved, or one that a machine can ever solve, unless we literally learn how to give them a human body. And automating creativity? What is WRONG with these people? Automation was supposed to free us up to do the things we love. If creativity isn't exactly that, then what's left? It's all just so weird to me.


You simply can't boil down the human experience of the world, in all of its senses and essence, to data and nets and algorithms. There will always be something missing. Every person alive today carries with them an entire world of lived experience, memory, and meaning—this is why we have so many emotional or felt expressions and reactions to the words we read, the things we see, the conversations we have, our own inner thoughts, etc. This is why some people click instantly, while others can't stand each other for no apparent reason. This is (probably) one of the reasons why we still do not fully understand the workings of the brain—or the mind.

It's almost like we're trying to play god. Not suggesting we drop all this AI stuff and go back to the campfire—just that there's a difference between trying to replicate an organic, biochemical and probably quantum entity, the human mind, and creating tools and systems to address our real-world problems. Which a lot of AI is already doing of course.


Back in the early 1980s David Hays and I decided that the entire AI enterprise was intellectually bankrupt. So we decided to look at several technical literatures – cognitive psychology, linguistics and psycholinguistics, neuroscience, developmental psychology, comparative psychology – and see what we could come up with. The resulting paper: Principles and Development of Natural Intelligence. Nothing on the current scene comes close. Here's the abstract:

The phenomena of natural intelligence can be grouped into five classes, and a specific principle of information processing, implemented in neural tissue, produces each class of phenomena. (1) The modal principle subserves feeling and is implemented in the reticular formation. (2) The diagonalization principle subserves coherence and is the basic principle, implemented in neocortex. (3) Action is subserved by the decision principle, which involves interlinked positive and negative feedback loops, and resides in modally differentiated cortex. (4) The problem of finitization resolves into a figural principle, implemented in secondary cortical areas; figurality resolves the conflict between propositional and Gestalt accounts of mental representations. (5) Finally, the phenomena of analysis reflect the action of the indexing principle, which is implemented through the neural mechanisms of language. These principles have an intrinsic ordering (as given above) such that implementation of each principle presupposes the prior implementation of its predecessor. This ordering is preserved in phylogeny: (1) mode, vertebrates; (2) diagonalization, reptiles; (3) decision, mammals; (4) figural, primates; (5) indexing, Homo sapiens sapiens. The same ordering appears in human ontogeny and corresponds to Piaget's stages of intellectual development, and to stages of language acquisition.

You can download it here: https://www.academia.edu/235116/Principles_and_Development_of_Natural_Intelligence

While you're at it, take a look at a paper that mathematician Miriam Yevick published in 1975: Holographic or Fourier logic, Pattern Recognition, Volume 7, Issue 4, December 1975, Pages 197-213, https://doi.org/10.1016/0031-3203(75)90005-9. As far as I can tell, that paper has dropped off the face of the earth, which is a sign of the intellectual myopia that characterizes the academy. The paper is, in effect, a mathematical argument for why both symbolic and distributed neural networks are necessary to make sense of the world. Here's the abstract:

A tentative model of a system whose objects are patterns on transparencies and whose primitive operations are those of holography is presented. A formalism is developed in which a variety of operations is expressed in terms of two primitives: recording the hologram and filtering. Some elements of a holographic algebra of sets are given. Some distinctive concepts of a holographic logic are examined, such as holographic identity, equality, containment and “association”. It is argued that a logic in which objects are defined by their “associations” is more akin to visual apprehension than description in terms of sequential strings of symbols.

Here's a short commentary on that paper, Miriam Yevick on why both symbols and networks are necessary for artificial minds, https://new-savanna.blogspot.com/2022/06/miriam-yevick-on-why-both-symbols-and.html


You have said that LLMs fail at abstraction. To the contrary, they are astoundingly good at abstraction. They perform generalization over examples, they treat slots and fillers correctly, and they exhibit convincing knowledge of a great wealth of everyday concepts and relations. They very effectively apply context derived from combinations of the prompt and prior discourse. Certainly, this is all within the confines of linguistic competence, but that is nothing to sneeze at! As the Mahowald et al. preprint points out, they fall short on cognition grounded in meaning connected to commonsense knowledge; formal reasoning; running situational awareness; theory of mind; agency; and goals.

The nonetheless remarkable abilities that LLMs do have require some means to mix and match elements representing objects, events, attributes, and relations. (Albeit, oftentimes illogically and sometimes incoherently.) How do they do it, in view of your citations?

The question for connectionist models has always been: how do you get combinatoric mixing in a statically wired network? The answer, it seems, is through gating. The LSTM and GRU gates of RNNs and, more recently, transformer attention heads are the secret sauce the NN folks never had until recently. Oh, and don't overlook the magical representational power of vector embeddings.
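
As a minimal numpy sketch of the gating/attention idea (shapes and values are arbitrary assumptions, not from the comment): the projection weights are static, but the mixing pattern, i.e., which token's content gets combined with which, is computed from the inputs themselves.

```python
# Illustrative scaled dot-product attention: fixed weights, input-dependent routing.
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 4                              # embedding size, sequence length
X = rng.normal(size=(n, d))              # token embeddings (the "fillers")

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))   # static weights

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d)
gate = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax "gate"
mixed = gate @ V                         # content-dependent combination of tokens

print(gate.round(2))                     # a different X yields a different mixing pattern
```

The same fixed weight matrices route different combinations of tokens for different inputs, which is one concrete sense in which a statically wired network can do combinatoric mixing.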

What LLMs are not good at is the rest of executive function, because their cognitive architectures are primitive. In current form, they lack structured knowledge representations and access; medium-term memory storage and retrieval; sequential step-wise processing; a context stack; and procedural search with backtracking. Some of these functions seem to be kludged on top of the straight transformer architecture, but as you repeatedly point out they are at this moment very crude. These architectural deficiencies are glaringly obvious, and it would be extremely risky to believe that they are being overlooked by the research community and will remain unaddressed for long.
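
For contrast, here is a hypothetical sketch of one item on that list, procedural search with backtracking, i.e., explicit depth-first exploration with an undo step, shown on a toy N-queens puzzle (the puzzle and function name are made up for illustration):

```python
# Hypothetical illustration of procedural search with backtracking:
# extend a partial solution, and undo the last choice when it hits a dead end.
def solve_queens(n, cols=None, row=0):
    cols = [] if cols is None else cols
    if row == n:
        return list(cols)                          # complete, consistent assignment
    for col in range(n):
        if all(col != c and abs(col - c) != row - r
               for r, c in enumerate(cols)):       # constraint check against earlier rows
            cols.append(col)                       # tentative choice
            result = solve_queens(n, cols, row + 1)
            if result is not None:
                return result
            cols.pop()                             # backtrack: undo and try the next option
    return None

print(solve_queens(6))   # e.g. [1, 3, 5, 0, 2, 4]
```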

This is to say, while I really appreciate your outspoken justified criticisms of LLMs, history is moving very fast now. Please don't get on the wrong side of it by thinking we're still back at 1990.


Thanks for an informative post. I will definitely read it. I am not an infant or a stochastic parrot but a weary human adult (female). I cannot be sure, but somehow "farmed out to sweatshops" definitely sounds apropos in some way.

I find it depressing too that asinine stuff like anything Musk (histrionics about Skynet) or less crazy but still industry-funded shilling (EFF, public-private partnerships) stands in the path of serious regulation of a lot of this drivel that has real-world impacts on the environment and any sort of democracy.

A lot of performative gatekeeping charlatanry.


Groundhog Day 1: We invented one power of existential scale which puts everything at risk.

Groundhog Day 2: We're inventing another power of existential scale which might put everything at risk.

Groundhog Day 3: Our goal is to repeat this process on every following day, preferably at an ever faster pace.

Groundhog Day 4: We finally figure out that we're dumber than groundhogs, on the day when it becomes too late to do anything about it.


This post illustrates beautifully how we're still in the same boat that von Neumann and Minsky built for us nearly a century ago.


What if the neuron is simply the best method that evolution discovered to deal with the complexity of the world?

Neural networks, while powerful for dealing with the patterns of the universe, face the same problems that evolution was dealing with: the need to reduce complexity to a manageable size, and the ability to discover rules that generalize the massive amounts of information they can't retain.

And no matter how many neurons you throw at it, no matter how much data you train it with, you get the same results.

Silly errors, hallucinations, and fabricated memories, to name a few.


As someone who grew up on Hubert Dreyfus critiques of GOFAI, this is fascinating...

Would it help or hinder to describe it this way:

GOFAI presumed that human intelligence could be codified in terms of _deduction_

Connectionist / ML strategies presume that human intelligence can be replicated with sufficiently powerful (probabilistic) _induction_

Neither acknowledges what infants are also expert at: _abduction_ = generating and generalising patterns (and rules) from a small set of experiences.

?


(I'll check out Two Distant Strangers and recommend Hulu's Palm Springs—starring the normally-execrable Andy Samberg—for a comedic, post Groundhog Day time-loop movie.)
