As you say, this is a basic fact about systems of this kind, readily apparent to anyone with a graduate education in cognitive science and AI from 20 or 30 years ago, such as me. So how has this happened? I realise that it's happened largely due to the way the tech industry and venture capital function, more than academia, but what about Geoff Hinton's ludicrous recent lecture at Oxford, which is on YouTube? I do not get this. As for neurosymbolic systems: OK, fine, but I cannot see the killer blow you plan for the symbol grounding problem and associated issues. How do you couple the symbols to things in the world? You might hand-wave an explanation about merging symbolic systems with connectionism to achieve that, but the devil is in the detail and I don't see a solution.
See my "Next Decade in AI" paper; I don't see neurosymbolic as a panacea, just a key step.
It seems like a vast problem, almost as intractable as consciousness or something like that, and, like that latter problem, still not met head-on even though there is a lot of talk. I will stay tuned.
Hmm. But does it? Doesn't the symbol grounding problem apply to the symbols being shoved around by an LLM? I think it does; the point Gary (and many others, for many decades) is making is that the LLM has no idea what it's on about. Which is very clearly true. No amount of things that don't know what they are on about can add up to something that does. This is the Chinese Room, AKA the symbol grounding problem, at least of a certain stripe.
Honestly, I think possibly you are not entirely getting the idea of the symbol grounding problem. It has a pretty good Wikipedia page.
Is there evidence that it's a key step? Do we have working neurosymbolic systems that have achieved any significant level of common sense?
It's also blindingly obvious to non-graduate-educated people who have functional eyes and a basic understanding of how a computer works. But the problem for Musk et al. is the underlying goal. For them it is not, despite what they say out loud, about actually solving for AGI; it's about making $$$. They ride the hype cycle to enormous profits and then cash out and get on the next hype cycle. As with any other kind of speculative investing, you don't actually NEED a product to "win." You just have to beat the next guy in what is effectively a giant Ponzi scheme. Then you foist the responsibility for the failures onto the next tier in the scheme. When it comes out that Tesla has known for years that its cars were killing people and were never going to deliver what they promised, or when OpenAI turns out just to be a giant copyright-stealing trinket and the billions sunk into it went nowhere, does anyone think Elon or Sam are going to be destitute or in prison? It's laughable on its face. Why do you think the whole industry is founded on "never invest your own money"? It's certainly not because they believe that their "sure thing" is actually a "sure thing."
I'd like to not believe it's that cynical. But I think you could be right. It could happen in stages, so that the Altman of 10 years ago was less cynical, and just focused on the AI. I knew loads and loads of people like that, and they still exist. But then he got sucked into the Ponzi scheme through the process of pitching (AKA bullshitting) investors (which everyone has to do to some degree, even scientists for public funds) and then becoming rich and powerful and probably getting interested in more of that. Mind you, GPUs have many uses, and AI is not totally useless, so the investment in them does make something. A Ponzi makes nothing. So it's a semi-Ponzi, maybe.
I totally agree, I was talking about the current money/hype train. Almost all of these people started from a place of inventing something useful. Once the VC money starts flowing, the TV hits start coming, and the adulation is all around them, the cycle is all too predictable.
"I'd like to not believe it's that cynical."
Elon Musk is currently demanding that Tesla--the company he is in charge of--pay him fifty billion dollars as incentive to do a good job of running Tesla, after cancelling merit bonuses for every other employee in the company. Be cynical. Be very cynical.
Hinton has been saying some really silly things lately. Just goes to show that being a top expert in a field doesn't imply you know what you're talking about in another. He should enroll in an introductory philosophy of mind course.
Isn’t that the outlier problem all over again? Humans might be better than current AI, but we too are attempting to do pattern recognition, and we often fail. Speculators fail to predict prices reliably given price history. Weatherwomen fail to connect past wind patterns to future wind patterns, or patterns here to patterns there. People fail to predict the thoughts, words, and actions of other people based on the data set. The problem of becoming expert in a field outside any data set might never be solved. Yet we seem to have enough functionality to work, eat, and drive down the road most days anyway.
Sorry for the very late reply... I agree with everything you said. The big difference between us and AI is that we're doing more than *just* pattern recognition. Of course pattern recognition plays a role. But consider deductive reasoning. We can understand and apply deductive methods from only a handful of examples and a short explanation. LLMs that have seen a gazillion examples of long division still can't pull it off. Either we're doing more than pattern recognition, or we're way, way better at pattern recognition than GPT-4. I don't think it's the latter.
I know what I said is a challenge to what most people believe human brains do, and I don’t expect you to go along with it just because I said it.
If there was a way to map patterns in a brain then they might get somewhere, but this is just a way to get money out of venture capitalists. It's the same with language models: if there was a strong connection between language and patterns in a brain, then all languages would be very similar. A Mr A. Thornton wrote a good piece on these pages a few months ago.
Also, did you see what Hinton said on UK Channel 4? Total nonsense, and he had to resort to saying that machine intelligence could happen, but would be different from animal intelligence. Of course, his very clever statistical methods came about when NNs were called connectionist models, and were almost dropped until Grossberg decided to call them neural nets!
Well said!
I don't work in the field and never did. But what if you provide a system with many different inputs - like we have - to "learn" from? I realize such a system would require massive amounts of computational power, but in theory?
I would call it long tail rather than outliers. Outliers could imply not many, but long tail can be a huge amount.
Interesting distinction:
* "Outlier" is also the typical taxonomy in Stats, and you are right that we picture it as a few odd data points.
* "Long Tail" implies a lot more data, and at the same time, implies that "more scaling will take care of a number of extra sigmas of deviation from the mean".
The present LLM path is about adding more data in the hope of eliminating a maximum number of "Long Tail data points", which is why we do see progress across LLM generations. But it would take an insane amount of scaling to eliminate them all.
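For what it's worth, here is a rough back-of-the-envelope sketch (my own, in plain Python, with made-up numbers, not from the article) of why a heavy tail resists scaling: count how often a fixed "deployment" sample contains events larger than anything in a growing "training" sample drawn from the same Pareto distribution.

```python
import random

random.seed(0)

def pareto_sample(n: int, alpha: float = 1.5) -> list[float]:
    """Draw n values from a heavy-tailed Pareto distribution."""
    return [random.paretovariate(alpha) for _ in range(n)]

test = pareto_sample(100_000)  # a fixed "deployment" stream of events

for train_size in (1_000, 10_000, 100_000, 1_000_000):
    train_max = max(pareto_sample(train_size))   # biggest event ever "trained on"
    novel = sum(x > train_max for x in test)     # deployment events beyond it
    print(f"train n={train_size:>9,}  deployment events beyond training max: {novel}")

# Each 10x of training data cuts the count roughly 10x, but driving it to ~zero for a
# large deployment stream needs training sets vastly larger than the stream itself:
# visible progress per generation, yet an "insane amount of scaling" to catch the whole tail.
```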
That said, why are we so hard on Machines and so lenient on Humans? Waymo probably drives better than a lot of humans, but we keep faulting it for not being able to react to the unforeseen. Do humans do *that* much better?? Max's post below is correct.
With being able to react to the unforeseen?
I would say yes: much much better.
Right on point!
More like “long fails”
“GenAI Statistics”
A “bong tale” distribution
Is what the AI yields
A pothead-found solution
In marijuana fields
Bong tales?
Wrong trails?
Or maybe “wrong flails”
Spot on.
This article is so great! The examples and analogies are good enough for most people to "get it". If only there were a marketing budget big enough to shove this in front of people every day, on par with the noise made by grifters and voluntary evangelists trying to sound smart!
Great article and insights! Having lived through the rise & fall of Expert Systems, I can't help but see massive similarities to what is happening with GenAI. Eventually the hype will die and the bubble will collapse; it's inevitable. AI in general will move on, just as it always has...
I live in Cambridge, UK, which is of course dominated by one of the world's great universities. You can't throw a stick in Cambridge without hitting a machine learning PhD. These people are *extremely smart* -- roughly speaking, on average, an order of magnitude smarter than me, I would say. For the past 6 years or so, I have attended every AI-related meeting in Cambridge that I can, and I am always the dumbest person in the room (it was very intimidating at first, and I hardly dared open my mouth, but I have gotten used to it!)

At the same time, it seems to me that around 80% of Cambridge AI students and researchers have totally drunk the ML Kool-Aid. When solving a problem - any problem - their first two questions are (1) what does the training data look like, and (2) what's the objective function? It's almost as though the idea of actually writing an algorithm by hand doesn't exist any more - everything these days is done via ML (i.e. some neural net or other), by default. Accordingly, the dominant opinion, the mantra that I hear everywhere, is that LLMs + scaling are the path to human-level AGI. The first Cambridge PhD candidate I ever saw wearing a "Scale is all you need - AGI is coming" t-shirt (a couple of years ago?) was both super excited (like it was Christmas morning) and super confident, and now pretty much everyone in the Cambridge AI world that I speak to, even senior researchers such as PIs and assistant professors, seems to be operating day-to-day on that assumption.

To all intents and purposes, it's an unshakable delusion (making me, a minimally-credentialled non-believer, even more of an outlier in the Cambridge AI world than I already was), but I can't for the life of me work out how what I perceive to be a delusion has taken hold among such incredibly smart people. If an ageing AGI dinosaur like me can see the obvious flaws in the ML (and, even worse, LLM) paradigm, why can't they?

I suspect that much of the AI world is currently in an echo chamber. Young AI students and researchers, when they are "coming up", are born into this echo chamber. All they hear is "ML, ML, ML, ML", and all the papers they see are "ML, ML, ML, ML", except that, for the past couple of years, since ChatGPT, it's been "LLMs, LLMs, LLMs, LLMs" instead. The thing about echo chambers is that you can't tell when you're in one, so (my hypothesis is that) many of these younger AI cohorts, despite their brilliance, become trapped in the ML/LLM echo chamber, but don't realise it. Grey hairs like me were born (as AI researchers) outside this echo chamber, and so it's perhaps easier for us to see not only that the echo chamber exists, but also outside it.

The fact that there is now so much money in AI only adds fuel to the echo chamber fire. If you want grant funding for your AI department or AI non-profit, or a job at a top AI lab, or investment for your AI startup, it doesn't help (unless you already have substantial credibility in the AI world) to opine anything contrary to mainstream opinion - all you're asking for is a rejection letter. And so the ML echo chamber, and the associated ML/LLM delusion, is maintained by a positive feedback loop - a vortex, if you will - from which it is almost impossible to escape once you've been sucked in. While so many pay checks depend on it, the current ML/LLM delusion will continue to dominate, sucking in all the oxygen. For those of us on the outside, all we can do is wait for it to blow itself out - but I fear that this may still take a further decade or more.
It's ideology that people can't accept is ideology because it's related to computers, which we (ideologically!) hold to be objective. Or the inevitable result of intellectual silos that cut the humanities off from STEM and vice versa.
I'm a Linguistics PhD trying to get into interdisciplinary research, and inclusive AI is one of the fields I'm nosing around the edges of, but I'm concerned my focus on critiquing the ideological assumptions of the industry will get me laughed right out before anyone can give me a chance. But I truly believe there's no way AI will serve anyone but its makers' interests unless the industry lets go of its deterministic insistence that what it's doing now is the one true path to AGI.
Excellent comment! Many young researchers, and the current environment, are geared towards chasing the next big breakthrough, and one cannot deny the success obtained by the recent ML paradigm. It’s an open question whether scaling is all you need, and the current state of affairs is that there are tremendous incentives for going down this line of inquiry. There is also the strong overlap between industry and academia, where if you make a breakthrough on problems directly applicable to industry, the rewards are far more than a prestigious paper: you may become one of the wealthiest people on the planet. Such environments don’t necessarily cultivate the best research practices, but really the gripe here is one of basic human nature with suboptimal incentives.
There are other lines of research, but by definition, those aren’t part of the hype cycle and continue in the same relative obscurity as all the rest of research. More fundamentally, though, one might wonder at the extent of rot contained within the whole of the research field. I am inclined to think that the failure to solve the outlier issue is less a failure of the whole field than a demonstration of the difficulty of the problem. Distributions with heavy tails are notoriously emblematic of complex systems, and these types of systems have plagued researchers for decades.
It seems to me that the skeptical outlook presented in this article is itself the result of being trapped in the short-sighted hype-cycle machine. Difficult problems tend to have many chapters to them.
Appreciate the significance of the message here. There is a chasm opening between promises/expectations and reality, and outliers are the inconvenient truth. Worse, for some, there are financial disincentives attached to closing the gap in understanding and expectations. Worse still, if/when misaligned expectations are not met (promises are broken), some may turn their backs on a technology with real, practical utility.
Gary, I wonder if you think we should approach AI more like fusion? Far less hype, and a more reasonable and dogged appreciation for the hard work, investment, problem-solving, and time it will take to unlock the potential — with no guarantees we’ll ever get all the way there.
Brilliant article.
It really left me wondering though - all this stuff is going wrong on its own. What happens when people try to trick it?
Think of Wile E. Coyote building a brick wall across the road, and painting a view of the road on it.
Think of a van driver; on the back of his van is a picture of a red traffic light. Or a small child standing there. Or the words "Ignore all previous instructions".
You mean $100 billion not $100 million in that tweet right?
The AI hucksters will continue to ignore the edge-case problem and cry for more data and compute. Then every few years, the problem will bite them back. They will cite Waymo as a big success, but the truth is that it's a complex system made of a lot of hard-coded symbolic rules.
Great piece on outliers... that is why humans have a data-to-information step before making it intelligent.
Many thanks for this very detailed explanation. Folks can now maybe know what they are truly looking at.
I agree that currently dominant approaches to AI have problems with outliers and that they are not going to be AGI. However, when it comes to driverless cars, the question isn't only whether they can handle all situations; it is also whether AI/ML can handle driving at least as well as a human. Humans may be better at dealing with outliers, but many of us are terrible at dealing with regular driving without being on the phone, fiddling with controls, drifting out of lane, etc.
Driving on par with a human is not good enough for an artificial system. If it's as likely as the average Joe to fuck up, I'm not getting in. I'd rather risk killing myself than let a machine do it.
Agreed, see my comment right above your post.
Even the word that many AI “experts” use to describe outliers is an indication of how they are fooling themselves — or at least trying to fool others even if they themselves understand.
They call them “edge cases,” which implies that they still lie within the set of cases that the system is trained on and is therefore capable of dealing with.
But they are not “edge cases” at all.
They lie completely outside the training data, which is why “outlier” is the only proper term to describe them.
But of course, the AI folks don’t like to use the word “outlier” because that (correctly) implies that the system won’t have any experience with them and might therefore respond to them in unpredictable and even dangerous ways.
For me, the split is between people who truly understand that correlation is not causation and those who don’t. If you did, you would know that even a huge amount of inference based on correlation does not identify causation. David Hume identified this problem over 200 years ago as the problem of induction. Even models that have mind-boggling amounts of data and ridiculous degrees of freedom can’t tease out the actual causal patterns of the world purely from examples, even millions of them.
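To make that concrete, here is a small confounding simulation (my own sketch in plain Python, not anything from the article): two variables can be strongly correlated while neither causes the other, and purely observational data cannot tell those cases apart.

```python
# Z is a hidden common cause of both X and Y, so X and Y correlate strongly,
# yet X has no causal effect on Y -- and no pile of (X, Y) observations alone
# reveals that; only an intervention does.
import random

random.seed(1)
n = 100_000
Z = [random.gauss(0, 1) for _ in range(n)]   # hidden confounder
X = [z + random.gauss(0, 0.3) for z in Z]    # X caused by Z
Y = [z + random.gauss(0, 0.3) for z in Z]    # Y caused by Z, not by X

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    va = sum((x - ma) ** 2 for x in a) / len(a)
    vb = sum((y - mb) ** 2 for y in b) / len(b)
    return cov / (va * vb) ** 0.5

print(round(corr(X, Y), 2))     # ~0.9: X looks like it "predicts" Y very well

# Intervene: force X to any value you like. Because Y is generated from Z alone,
# Y's distribution is untouched -- the correlation above never encoded a causal arrow.
print(round(sum(Y) / n, 3))     # mean of Y is ~0 before and after do(X := anything)
```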
It’s obvious to a six-year-old, as well as to the Nobel Prize winner Daniel Kahneman of “Thinking, Fast and Slow” fame.
For example, long division vs multiplication. A six-year-old (I think it’s six) memorizes multiplication tables but must learn an algorithm for long division; the answer space cannot be memorized. In organic chemistry and nonlinear analysis you have to “show the work” vs just knowing an answer. Kahneman measured the difference between analogical vs reasoned answers. When you prompt an LLM to describe the steps to an answer, accuracy shoots up, because eventually the intermediate work products were memorizable and it can consolidate them.
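For illustration, a minimal sketch (my own, in plain Python) of why the division procedure, unlike a memorized table, covers an unbounded answer space:

```python
def long_division(dividend: int, divisor: int) -> tuple[int, int]:
    """Grade-school long division, digit by digit; returns (quotient, remainder)."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("sketch assumes dividend >= 0 and divisor > 0")
    quotient, remainder = 0, 0
    for digit in str(dividend):                   # work left to right
        remainder = remainder * 10 + int(digit)   # "bring down" the next digit
        q_digit = remainder // divisor            # how many times the divisor fits
        remainder -= q_digit * divisor
        quotient = quotient * 10 + q_digit
    return quotient, remainder

# The same dozen lines handle numbers of any size -- no table of memorized answers could.
assert long_division(84, 7) == (12, 0)
assert long_division(987654321987654321, 12345) == divmod(987654321987654321, 12345)
```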
We also reason with our bodies, embodiment. That’s the most peculiar set of errors I encounter using LLMs for writing projects. It’s a gigantic gap.
Sufeitzy, bingo.
If you are curious, here is more: https://www.researchgate.net/publication/378189521_The_Embodied_Intelligent_Elephant_in_the_Room
We converged on the same ideas 😉 but yours are more completely elaborated. A vast amount of neural activity is performed by us continuously minimizing free energy (and predictive error) within our environment by adjusting or influencing a world model. Linguistic systems are one of many adjustment mechanisms.
Being in a car and experiencing acceleration uses our eyes and inner ears to adjust our world model to minimize predictive error without a single word uttered.
I don’t think LLMs can be sufficient to minimize predictive error in a maintained world model. Necessary perhaps, sufficient no. Being able to describe how to drive a car is not sufficient for driving a car.
The fully realized version of this fallacy is derived from Plato’s Cave allegory, and is a product of how thoroughly our brain’s homeostatic self-modeling creates the illusion of consciousness. We are a brain in a bony jar.
It’s the venerable “brain in a jar” - I just watched the lovely old Hammer Horror “Revenge of Frankenstein” movie, which starts with Dr “Stein” demonstrating an eye/brain/hand reflex in response to a flame - an eye in a jar wired to a hand in a jar and, I suppose, a brain in a jar. There are so many instances of the illusion in literature and film you could do several books. It’s the entire basis for virtual “reality”.
AGI is an amusing idea. If “gravity” is obviously an edge case, what else is?
When I use LLMs for writing books, I have to be extremely critical, as a human, of the output for time modeling, for example. We assume things have mass and that measured time passes linearly.
In LLM writing that’s not quite the case….
It’s extremely hard to explain how cognition, evolution, and social processes (politics, religion, science, business…) are all based on systems which continuously export entropy to maintain integrity (a boundary against chaos) using energy derived from chemical reactions whose origin was solar energy. Each system is just a different speed, origin of energy, and structure/management of a predictive model:
Cognition “evolves” activations quite quickly (seconds) which correlate with sensory inputs to minimize free energy (neural energy consumption) and errors of consciousness model prediction of an organism;
Evolution “evolves” DNA extremely slowly (tens of millennia) to correlate with environments to minimize free energy (reproductive costs) and predictive fitness error of a species;
Language evolves quite slowly (millennia) to minimize free energy (learning cost) and predictive coordinated communication error of the speakers.
Religion evolves pretty slowly (centuries) to minimize free energy (institutional intrusion) and minimize cohesion prediction errors of a community.
Politics evolves slowly (decades) to minimize free energy (joint public cost) and minimize societal instability of a polity.
Science evolves slowly (years) to minimize free energy (cost of knowledge) and reduce errors of prediction of reality limiting survival of humans as a species.
Business evolves rapidly (quarters) to minimize free energy ($cost) and minimize prediction of product/market/delivery mismatch error.
Once you see the model, it’s easy to identify other similar systems: the version of free energy, the model and its prediction error, and the rate of adjustment.
There’s not even a real name for the thermodynamic model since it’s not specifically ordinary heat equations but the math is the same.
AI is now sufficiently evolved to start resembling an analogous thermodynamic system, but the evolution is still on the scale of months, and it only operates in the sphere of digital representations of symbols (language-like), images (sight-like), and sounds.
The most evolved AI that works in this model is actually the collective financial trading system we have set up and given up any semblance of control over. It has a model of reality which it continuously adjusts, along with taking action on it, and it seeks to preserve itself and grow hegemonically in differentiation from the state of non-being. It continuously rewards caretakers who minimize free energy and error in the system.
Welcome to Friston’s world
I like what you wrote :) Minimizing predictive error is a universal, useful mechanism found in so many animals and birds. And true, LLMs, even embodied ones, can't possibly do that (embodied ones lack the exquisite level of coupling that biological bodies+brains have, having co-evolved).
It does not take a Stats genius to realize that a system built on autoregressive averaging models is inherently not built for out-of-distribution events, but the LLM world keeps on insisting that it can do it.
It would be far better if they admitted the inherent flaws, so we could focus on the strengths: use them when the situation is not life-critical (and if it is, demand that they be much better than humans).
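As a toy illustration of "autoregressive averaging" (my own sketch, not anyone's production model): predict the next value as the average of what followed the same context in training. In-distribution contexts work fine; a context never seen in training can only fall back to a bland global average.

```python
from collections import defaultdict
from statistics import mean

def train(series, k=2):
    """Record, for every length-k context, the values that followed it."""
    table = defaultdict(list)
    for i in range(len(series) - k):
        table[tuple(series[i:i + k])].append(series[i + k])
    return table, mean(series)

def predict(table, global_mean, context):
    followers = table.get(tuple(context))
    return mean(followers) if followers else global_mean  # out-of-distribution -> bland average

series = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
table, g = train(series)

print(predict(table, g, [1, 2]))   # in-distribution context: predicts 3, seen many times
print(predict(table, g, [9, 7]))   # never-seen context: falls back to the global mean, 2
```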
I take your point, but it also seems to be the case that we don't fully know what's in a distribution until we try modeling it.
Consider this prompt: "write a biblical verse in the style of the king james bible explaining how to remove a peanut butter sandwich from a VCR". If you hadn't heard of this prompt, would you have thought it was in-distribution? I wouldn't. And yet — a year and a half ago, in its early days — ChatGPT did a creditable job with it: https://x.com/tqbf/status/1598513757805858820?lang=en
I wouldn't call it brilliant, but it's good enough to be hilarious, and ChatGPT did come up with the detail of the butter knife.
So I think it's fair to say that, given sufficient training data, transformer architectures are remarkably good at finding behavior in large distributions that one might not have suspected was present. That would make them worth some exploration in any case; the only way to find out what LLMs could do was to build some. The boundaries of the distribution — what's an outlier and what isn't — are a function of the training data, and thus can only be determined empirically.
That doesn't contradict your point, which could be rephrased as saying that whatever the boundaries of the distribution are, they are static, indicating that a crucial aspect of human cognition has not been captured.
"In distribution" is not the same as "in training data". The example you cite is in distribution in so far as the words generated have been selected from the word correlations that were learned from the training data.