In May, in a tweet that gave rise to this very Substack, DeepMind executive Nando de Freitas declared AGI victory, possibly prematurely, shouting “It’s all about scale now! The Game is Over!” De Freitas was arguing that AI doesn’t need a paradigm shift; it just needs more data, more efficiencies, bigger servers. I called this hypothesis—that AGI might arise from larger scale without fundamental new innovation—“scaling-über-alles”.
I ain't never going to believe in AGI.
Intelligence doesn't exist without goals. Getting up in the morning takes more intelligence than our colleagues will brute force into a model in a lifetime.
I'm still positing we could transition quickly away from wasted breath if we repurposed "AI" as "Augmented Inference". Just stop saying "Artificial Intelligence".
Or demand that it tell us how many angels can dance on the head of a pin.
We'd capture the essence of the computing we're capable of and some of us would move on to more interesting studies of consciousness.
Gary, I appreciate your pushing folks in a good direction. I'm ready to vote my ballot to say you have won.
Indeed. It seems that scaling maximalism relies on the ambiguity of terms like 'big' and 'more'. Training sets for, e.g., language in deep learning are very big compared to what humans use in learning language. But they are still minute compared to the 'performance' set of human language, which is on the order of 10^20 sentences or more.
It would take about 10 billion people (agents) 300 years, with 1 sentence produced and recorded every second, to get a training set of this size. It's fair to say we are not there yet.
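A quick back-of-the-envelope check of the figures above (10 billion agents, 1 sentence per second, 300 years, all as assumed by the commenter) confirms the order of magnitude:

```python
# Sanity-check the corpus-size arithmetic: agents * sentences/sec * seconds.
SECONDS_PER_YEAR = 365 * 24 * 3600   # ~3.15e7

agents = 10_000_000_000              # 10 billion people
years = 300
rate = 1                             # 1 sentence per second per agent

sentences = agents * rate * years * SECONDS_PER_YEAR
print(f"{sentences:.1e}")            # 9.5e19, i.e. on the order of 10^20
```

So the round number 10^20 is slightly generous but correct to within a factor of about 1.06.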
Also, even if we had a substantial subset, it would most likely be unevenly distributed. Maybe a lot about today’s weather but not very much about galaxies far far away (or perhaps the other way around). So, even with a set of this size, there is no guarantee that it would be distributed well enough statistically to cover all the relations found in the performance set.
Deep learning is sometimes very impressive, and it could provide the backbone of a semantic system for AGI. But, e.g., the fact that humans do not use training sets of the size used in deep learning to learn language strongly suggests that the boundary conditions needed to achieve human-level cognition, and with it the underlying architecture, are fundamentally different from those underlying deep learning (e.g. see https://arxiv.org/abs/2210.10543).
After seventy years we still have not the slightest clue how to make ourselves safe from the first existential-scale technology, nuclear weapons. And so, based on that experience, because we are brilliant, we decided to create another existential-scale technology, AI, which we also have no idea how to make safe. And then Jennifer Doudna comes along and says, let's make genetic engineering as easy, cheap and accessible to as many people as possible as fast as possible, because we have no idea how to make that safe either.
It's a bizarre experience watching this unfold. All these very intelligent, highly educated, accomplished, articulate experts celebrating their wild leap into civilization-threatening irrationality. The plan seems to be to create ever more, ever larger existential-threat technologies at an ever-accelerating rate, to discover what happens. As if simple common sense couldn't predict that already.
Ok, I'm done, and off to watch Don't Look Up again.
My favorite maximalist-scale model is the coelacanth.
The scales don't get much bigger than this, but progress has been VERY slow, and there are a lot of easy-for-humans things it can't do yet. Or maybe ever.
Despite all odds, they are still around... though endangered.
Interesting article. You write: "Scaling maximalism is an interesting hypothesis, but I stand by my prediction that it won’t get us to AGI."
In my opinion, you are mincing your words. Scaling maximalism is not an interesting hypothesis. It is a stupid hypothesis made by people who have an expensive but lame deep learning pony in the AGI race. They will not just lose the race, they will come dead last. Just saying.
Thanks for another 'fun' article, Gary :) Dreyfus said it years back: it's an exercise in "tree climbing with one's eyes on the moon." Scaling up data alone can never get us there.
If existing data by itself is enough to become intelligent, why is (experimental) science performed in the lab? Answer - because the new hypotheses being tested (via experiments) need new, direct, data from the world, rather than proofs from existing equations/laws.
In other words, direct engagement with the open-ended world is the source of knowledge and learning, it's what provides "grounded" meaning, it's what keeps us from being confined to a "frame". Without this, all that exists is meaningless symbol manipulation a la Searle.
The size or power of the AGI itself is of very little importance until it has established feedback loops with the real world.
Intelligence is not about always being right as much as it's about doing something useful when you find out that you're wrong.
I may be a layman, but scaling maximalism seems, in my eyes, to build on the very wobbly hypothesis that a facsimile can be as good *universally* as the original. Simulations strip out non-purpose-critical parts in order to free up the computational space to approach a specific slice of reality in a deep and narrow way. Simulate all contingencies, and you will be left with a model that will be less efficient than the original. Taken to the extreme, the most efficient complete general simulation of the universe is to make another fully functional universe.
AGI, by its very nature, lacks room for that simplification. Its purpose is to have no specific purpose. Some things may be stripped out, to be sure, but they're minute compared to the complexity that is still left.
The constants in AI/AGI are that there is 1) always hype, 2) AGI is very very difficult.
If a certain ability is missing (permanent learning for example), then scaling only scales the consequences of its absence, does not lead to the appearance of this ability.
Thanks Gary for the great post. Just look at my definition of a symbolic language model; it could surely be built by all of us, especially in the linguistics departments of universities:
Symbolic language model is a coherent conceptual model of the world transferred into all the existing languages.
The conceptual model of the world is presented in wordhippo.com on the level of individual meanings and in powerthesaurus.org on the level of word combinations.
The Matrix - Global Knowledge Network - https://t.me/thematrixcom, https://activedictionary.ru/
Symbolic Language Models: Arabic, Chinese, English, Finnish, French, German, Hebrew, Hindi, Italian, Japanese, Korean, Norwegian, Portuguese, Russian, Spanish, Swedish, Ukrainian, Uzbek
US2021033447 - Neural network for interpreting sentences of a natural language
I agree data limits start to play a factor at some point... from what I’ve heard, there’s not really any more open-source code to train larger versions of GitHub Copilot on. Though to be precise, I don’t think it stops growth; it just means you have to be suboptimal relative to the Chinchilla scaling laws (and it perhaps starts getting way too expensive for the marginal gains).
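For readers unfamiliar with the Chinchilla result the commenter is invoking: the paper's rough rule of thumb is that compute-optimal training uses on the order of 20 tokens per model parameter, with training compute approximated as C ≈ 6·N·D FLOPs. A minimal sketch, with these round-number approximations assumed rather than exact:

```python
# Rough Chinchilla-style rule of thumb (approximation, not the fitted law):
# compute-optimal token count D ~ 20 * N parameters, and C ~ 6 * N * D FLOPs.
def chinchilla_optimal_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal number of training tokens."""
    return params * tokens_per_param

def training_flops(params: float, tokens: float) -> float:
    """Standard back-of-envelope estimate of training compute."""
    return 6.0 * params * tokens

n = 70e9                               # a 70B-parameter model
d = chinchilla_optimal_tokens(n)       # ~1.4e12 tokens
print(f"tokens: {d:.1e}, FLOPs: {training_flops(n, d):.2e}")
```

If the available corpus is smaller than the D this suggests, a model of that size is undertrained relative to the compute-optimal frontier, which is the "suboptimal vs the Chinchilla laws" point above.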
That said, I’m curious, what happens if/when GPT-4 is able to figure out what is meant by wearing gloves? Would that change your opinion at all? Or would you adjust to reference a more complex example that it is unable to handle (since as you say, it will certainly be possible to come up with one!)