Happy Groundhog Day, The AI Edition
Decades of stumbling over and over in the same places. When will we ever learn?
It’s Groundhog Day again! 30 years ago, Bill Murray starred in the iconic film that bears the same name. Poor guy keeps living the same day, over and over, trying to make better choices, and winds up right back where he started. (Travon Free’s Oscar-winning short Two Distant Strangers is a powerful riff on the same theme, highly recommended.)
30 years ago, neural networks were, as they are now, in the midst of a rebirth. The first one collapsed; this one has more staying power.
But a lot of things that a bunch of us worried about 30 years ago remain true. Neural networks in the early 1990s struggled with compositionality (interpreting complex, hierarchical wholes in terms of their parts), abstraction, and plain old silly errors.
Two famous papers, one by Fodor and Pylyshyn, the other by Pinker and Prince, both published in Cognition, had raised those issues, in 1988, with respect to some simple neural architectures that were popular then. Word on the connectionist (as we called it back then) street was that hidden layers might solve all those problems. I didn’t buy it, and showed in 2001 that pretty much the same problems remained when hidden layers (then just one, now many) were introduced. In a 1999 article I showed that some of what was challenging for neural nets of the day was trivial for human infants, even with tiny amounts of data.
Fast forward to now, and it seems to me that everything we were raising back in the day still looms large. Silly errors certainly do, and problems of abstraction are widespread in large language models. A subtler problem that I emphasized in 2001, an inability to keep track of knowledge about instances independently of knowledge about kinds, seems to underlie the rampant hallucinations that allow large language models to say things like “Elon Musk died in a car crash in 2018”, even when nothing in the corpus says so and massive amounts of data contradict the claim: they blend together information about kinds with information about specific individuals, in haphazard ways.
I know it really irritates a lot of people when I say deep learning has hit a wall, so let me be perfectly clear: it counts as hitting a wall if you keep trying to do the same thing over and over and don’t succeed, even if you succeed on other dimensions. Deep learning has undeniably made amazing progress on those other dimensions, but the fact is it hasn’t solved the core problems that many of us thought were essential 30 years ago. Turns out you don’t need to solve them to make beautiful pictures, or decent if error-filled conversation, but anyone who thinks we can get to AGI without solving abstraction, compositionality, and the type-token distinction is kidding themselves.
Here’s the best abstract I read today, a careful study of compositionality in modern architectures:
Straight out of 1993. The measures are new, the model is new, but the stumbling block remains the same.
One of my favorite papers from last year was Yasaman Razeghi’s analysis of LLMs doing arithmetic. Same story; only the architectures and training regimes have changed. The troubles with abstraction persist.
Melanie Mitchell’s book, Bender et al.’s “Stochastic Parrots”: all of these more recent criticisms of deep learning help establish that some of the basic issues that were known long ago, with predecessors to current models, still remain today.
PS One more thing—I quit studying AI, for a while, in the late 1990s, because so much just seemed like handcrafted hacks. Not sure how much has changed in that regard, either, except that the hacks that were once written by graduate students are now farmed out to sweatshops, for a new form of handcrafting:
Back then we needed hacks because our AI didn’t really understand the world; a lot of systems were bags of tricks that were fairly superficial. They were brittle; they might work in the lab, but not in the real world. Our new systems work better, but they are still superficial, fine for lab demos but often at a loss in the real world. Death by a thousand edge cases is still the norm, and probably will be, for another few years, unless and until people learn a little bit of history.
In that spirit, here are a few suggested readings from the old days, in honor of Groundhog Day:
Things turned out ok for Bill Murray, eventually. Hopefully they will for us, too.
Gary Marcus (@garymarcus), scientist, bestselling author, and entrepreneur, is a skeptic about current AI but genuinely wants to see the best AI possible for the world—and still holds a tiny bit of optimism. Sign up for his Substack (free!), and listen to him on Ezra Klein. His most recent book, co-authored with Ernest Davis, Rebooting AI, is one of Forbes’s 7 Must Read Books in AI.
The core issue here is that human cognition is indeed compositional and systematic. We form and understand a sentence like "Sue eats pizza" by combining its words in an agent-action-theme sentence structure. This ability is systematic, because with the same words we can also form and understand "pizza eats Sue". For example, we know that this sentence is absurd precisely because we identify "pizza" as the agent.
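As a minimal sketch of what that systematicity amounts to (the role labels and toy functions below are purely illustrative, not anyone's proposed model): the same three words can be bound to different roles, and the interpretation follows from the structure rather than from having seen that particular string before.

```python
# Toy illustration of compositional, systematic structure.
# The agent-action-theme frame and tiny lexicon are hypothetical.

def compose(agent, action, theme):
    """Bind words to roles in an agent-action-theme frame."""
    return {"agent": agent, "action": action, "theme": theme}

def paraphrase(frame):
    """Read the interpretation off the structure, not off a memorized string."""
    return f"{frame['agent']} does the {frame['action']}ing; {frame['theme']} undergoes it"

s1 = compose("Sue", "eat", "pizza")
s2 = compose("pizza", "eat", "Sue")   # same words, different role bindings

print(paraphrase(s1))  # Sue does the eating; pizza undergoes it
print(paraphrase(s2))  # pizza does the eating; Sue undergoes it
# s2 is interpretable (and recognizably absurd) precisely because
# "pizza" is bound to the agent role.
```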
A cognitive architecture can achieve this only if it has 'logistics of access'.
Newell analyzed this in detail in his Unified Theories of Cognition (1990, e.g., pp. 74-77). In short, his analysis is:
1. Local storage of information is always limited. So, with more information to deal with, the system needs 'distal access' to that information.
2. Then it needs to 'retrieve' that information to affect processing.
For example, we can form arbitrary sentences with our lexicon of around 60,000 words or more (e.g., "pizza eats Sue"). Trying to do this by chaining words from other (learned) sentences will not work, if only because the number of sentences that can be created is simply too large for that.
Instead, this requires an architecture that provides distal access to arbitrary words in the lexicon, and can combine them in arbitrary sentence structures.
The architecture that Newell analyzed uses symbols to achieve distal access and to retrieve that information for processing (as in the digital computer). It is interesting to note that his use of symbols and symbol manipulation thus derives from the more fundamental requirement of logistics of access.
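A rough, purely illustrative way to picture that arrangement in code (this is my own toy rendering, not Newell's formalism): the local structure holds only compact symbols, and processing retrieves the distally stored content on demand.

```python
# Toy rendering of distal access via symbols (illustrative only).
# The lexicon entries and field names are hypothetical.

LEXICON = {  # distally stored knowledge; vastly larger in a real system
    "sue":   {"category": "noun", "animate": True},
    "eat":   {"category": "verb", "roles": ("agent", "theme")},
    "pizza": {"category": "noun", "animate": False},
}

sentence_frame = ["sue", "eat", "pizza"]   # local structure holds only symbols (keys)

def process(frame):
    """Retrieval step: each symbol is dereferenced to its distal content."""
    return [LEXICON[symbol] for symbol in frame]

print(process(sentence_frame))

# Productivity, as in the 60,000-word example above: a three-slot frame over a
# 60,000-word lexicon allows 60_000 ** 3 (about 2.2e14) combinations, far too
# many to cover by chaining words from previously seen sentences.
print(f"{60_000 ** 3:.1e}")   # 2.2e+14
```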
This opens up a new possibility: to achieve logistics of access without using symbols. For example, with an architecture that achieves distal access but does not rely on retrieval.
An architecture of this kind is a small-world network structure. An example is the road network we use for traveling. It is productive and compositional, because it makes it possible to travel (basically) from any house to any other: not by direct connections between them, but via dense local roads and sparse hubs. Also, access from a new house to any other can easily be achieved just by connecting that house to its nearest local road.
Neural networks can achieve this for language as well. An interesting consequence is that 'words' are network structures themselves that remain where they are. A sentence is just a connection path that (temporarily) interconnects these word structures. As a result, words are always content-addressable, which is not the case with symbols or vector operations.
(For a more detailed analysis, see e.g., arxiv.org/abs/2210.10543)
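A quick way to see the road-network point in code (this sketch is mine and uses a generic small-world graph, not the construction in the linked paper): with only local connections plus a few sparse long-range shortcuts, any node can reach any other in a handful of hops.

```python
# Small-world connectivity sketch (illustrative; not the linked paper's model).
# Requires networkx (pip install networkx).
import networkx as nx

n = 60_000   # stand-in for a lexicon-sized set of nodes
k = 10       # each node starts with only a few local ('nearest road') links
p = 0.01     # a small fraction of links are rewired into long-range shortcuts

G = nx.connected_watts_strogatz_graph(n, k, p, seed=0)

# Despite purely local wiring plus sparse shortcuts, typical paths are short:
# any node can reach any other in a few hops, with no direct link between
# every pair -- which is the productivity point of the road-network analogy.
hops = nx.shortest_path_length(G, source=0, target=n // 2)
print(f"hops from node 0 to node {n // 2}: {hops}")
```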
That paper describing 'linguistic inputs' in children as if that's actually how we make sense of the world is such a great illustration of the head-banging problem at the heart of this. How difficult is it to understand? We make sense of the world and navigate it with our bodies. The Stochastic Parrots term is great (LLMs will only ever be able to output a sort of empty meta-language), but it still suggests an organic being that uses the meta-language to communicate, even if the 'words' it's saying are mimicry - the fact that it is using its vocal cords, tongue, and beak to make sounds that attract other animals (us) likely to give it food and attention is not meaningless.
But an LLM is always virtual, never 'needing' anything, never caring about anything, never feeling physical agitations of the nervous system that signal anything about the environment. So what's the point? We got to the stage where chatbots can run mundane tasks that are - at best - boring for humans and - at worst - generate abuse from angry customers. That's useful, if limited. But trying to 'solve' the problem of meaning? Surely that's a category error; understanding the world is not what a chatbot does or is even supposed to do. Neither is it a problem to be solved, or one that a machine can ever solve unless we literally learn how to give them a human body. And automating creativity? What is WRONG with these people? Automation was supposed to free us up to do the things we love. If creativity isn't exactly that, then what's left? It's all just so weird to me.