Indeed. It seems that scaling maximalism relies on the ambiguity of terms like 'big' and 'more'. The training sets used for, e.g., language in deep learning are very big compared to what humans use in learning language. But they are still minute compared to the 'performance' set of human language, which is on the order of 10^20 sentences or more.
It would take about 10 billion people (agents), each producing and recording one sentence every second for 300 years, to get a training set of this size. It's fair to say we are not there yet.
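For what it's worth, a quick order-of-magnitude check of that arithmetic (a throwaway sketch; the 10^20 figure and the one-sentence-per-second rate are the assumptions stated above):

```python
# Back-of-the-envelope check: 10 billion people, one sentence recorded per
# second, for 300 years -- how many sentences is that?
SECONDS_PER_YEAR = 365 * 24 * 3600       # ~3.15e7

people = 10e9                            # 10 billion agents
years = 300
sentences = people * years * SECONDS_PER_YEAR

print(f"~{sentences:.1e} sentences")     # ~9.5e19, i.e. on the order of 10^20
```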
Also, even if we had a substantial subset, it would most likely be unevenly distributed. Maybe a lot about today’s weather but not very much about galaxies far, far away (or perhaps the other way around). So, even with a set of this size, there is no guarantee that it would be distributed well enough statistically to cover all the relations found in the performance set.
Deep learning is sometimes very impressive, and it could provide the backbone of a semantic system for AGI. But the fact that humans do not need training sets of the size used in deep learning to learn language strongly suggests that the boundary conditions needed to achieve human-level cognition, and with them the underlying architecture, are fundamentally different from those underlying deep learning (e.g., see https://arxiv.org/abs/2210.10543).
Let's see...
After seventy years we still have not the slightest clue how to make ourselves safe from the first existential-scale technology, nuclear weapons. And so, based on that experience, because we are brilliant, we decided to create another existential-scale technology, AI, which we also have no idea how to make safe. And then Jennifer Doudna comes along and says, let's make genetic engineering as easy, cheap, and accessible to as many people as possible, as fast as possible, because we have no idea how to make that safe either.
It's a bizarre experience watching this unfold. All these very intelligent, highly educated, accomplished, articulate experts celebrating their wild leap into civilization-threatening irrationality. The plan seems to be to create ever more, ever larger existential-threat technologies at an ever-accelerating rate, to discover what happens. As if simple common sense couldn't predict that already.
Ok, I'm done, and off to watch Don't Look Up again.
Thanks for the like Gary, and sorry to wander off topic. I probably need my own substack. :-)
My favorite maximalist-scale model is the coelacanth.
https://www.wired.com/2015/03/creature-feature-10-fun-facts-coelacanth/
The scales don't get much bigger than this, but progress has been VERY slow, and there are a lot of easy-for-human things it can't do yet. Or maybe ever.
Despite all odds, they are still around... though endangered.
🤣 get your infinitely resizable scale models here: https://www.cgtrader.com/3d-models/various/various-models/scales-of-justice-d77a18ad-6683-4417-9d8d-08f1cd6fbcae
BTW, I suspect that the average Large Language Model might have trouble deciding whether this comment is serious or not. Let alone, why it is humorous (if you think that it is).
Agree.
I picked up a hardcover copy of NETL while browsing at the Coop many years ago. Nope... not my field, but I got hooked on the rats opening and kept reading in the store, and bought the book. It's still there right now in front of me in my much-reduced bookshelf. I tried building an Ada version of NETL (yup... weird first language to learn, but everyone said at the time this was the future) a few years after that. It didn't do much, but it got me thinking about what was needed to make something that could make sense of the world much as, or at least at the level that, a human could. Thank you for the book, Dr. Fahlman.
I'm glad you enjoyed the old NETL book. I started work on NETL in 1974, for my PhD thesis at MIT. So almost 50 years ago. Time flies.
Scone, the all-software version of NETL (with a few new tweaks) is my current research focus. If you want to explore that, see
https://fahlman-knowledge-nuggets.quora.com/Tutorial-Information-on-Scone
or wait for the book I'm busy writing, which features a discussion of Scone and of Knowledge-Based AI in general.
Thanks. I will definitely explore the new project.
Interesting article. You write: "Scaling maximalism is an interesting hypothesis, but I stand by my prediction that it won’t get us to AGI."
In my opinion, you are mincing your words. Scaling maximalism is not an interesting hypothesis. It is a stupid hypothesis made by people who have an expensive but lame deep learning pony in the AGI race. They will not just lose the race, they will come dead last. Just saying.
🤣🤣🤣
Thanks for another 'fun' article, Gary :) Dreyfus said this years back: ...an exercise in "tree climbing with one's eyes on the moon." Using data alone can never scale up to intelligence.
If existing data by itself were enough to become intelligent, why is (experimental) science performed in the lab? Answer: because the new hypotheses being tested (via experiments) need new, direct data from the world, rather than proofs from existing equations/laws.
In other words, direct engagement with the open-ended world is the source of knowledge and learning, it's what provides "grounded" meaning, it's what keeps us from being confined to a "frame". Without this, all that exists is meaningless symbol manipulation a la Searle.
The size or power of the AGI itself is of very little importance until it has established feedback loops with the real world.
Intelligence is not about always being right as much as it's about doing something useful when you find out that you're wrong.
I may be a layman, but scaling maximalism seems, in my eyes, to build on the very wobbly hypothesis that a facsimile can be as good *universally* as the original. Simulations strip out non-purpose-critical parts in order to free up the computational space to approach a specific slice of reality in a deep and narrow way. Simulate all contingencies, and you will be left with a model that will be less efficient than the original. Taken to the extreme, the most efficient complete general simulation of the universe is to make another fully functional universe.
AGI, by its very nature, lacks room for that simplification. Its purpose is to have no specific purpose. Some things may be stripped out, to be sure, but they're minute compared to the complexity that is still left.
The constants in AI/AGI are that 1) there is always hype, and 2) AGI is very, very difficult.
If a certain ability is missing (permanent learning, for example), then scaling only scales the consequences of its absence; it does not lead to the appearance of that ability.
Absence of learning is certainly a weakness that will not be solved by scaling.
Learning, on the other hand, may have a critical mass, and hence be solved very easily simply by scaling.
Thanks Gary for the great post - just look at my definition of a symbolic language model - it's surely something that could be built by all of us, especially in the linguistics departments of universities:
A symbolic language model is a coherent conceptual model of the world transferred into all existing languages.
The conceptual model of the world is presented on wordhippo.com at the level of individual meanings and on powerthesaurus.org at the level of word combinations.
-
The Matrix - Global Knowledge Network - https://t.me/thematrixcom, https://activedictionary.ru/
Symbolic Language Models: Arabic, Chinese, English, Finnish, French, German, Hebrew, Hindi, Italian, Japanese, Korean, Norwegian, Portuguese, Russian, Spanish, Swedish, Ukrainian, Uzbek
US2021033447 - Neural network for interpreting sentences of a natural language
https://patentscope.wipo.int/search/en/detail.jsf?docId=US339762244&_fid=WO2020106180
I agree data limits start to play a factor at some point... from what I’ve heard, there isn’t really any more open-source code to train larger versions of GitHub Copilot on. Though to be precise, I don’t think it stops growth; it just means you have to be suboptimal vs. the Chinchilla scaling laws (and it perhaps starts getting way too expensive for the marginal gains).
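To make the "suboptimal vs. the Chinchilla laws" point concrete, here is a minimal sketch (not from the article; it just uses the commonly cited ~20-tokens-per-parameter rule of thumb from the Chinchilla paper as an approximation):

```python
# Rough sketch of the Chinchilla rule of thumb: compute-optimal training uses
# roughly 20 tokens per model parameter (an approximation, not an exact law).
def approx_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal number of training tokens."""
    return n_params * tokens_per_param

for n_params in (70e9, 175e9, 500e9):
    tokens = approx_optimal_tokens(n_params)
    print(f"{n_params/1e9:>4.0f}B params -> ~{tokens/1e12:.1f}T tokens")

# 70B params -> ~1.4T tokens (roughly Chinchilla itself). Once the available
# data of a given kind (e.g. open-source code) runs out, you either train with
# fewer tokens per parameter ("suboptimal") or pay steeply for marginal gains.
```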
That said, I’m curious, what happens if/when GPT-4 is able to figure out what is meant by wearing gloves? Would that change your opinion at all? Or would you adjust to reference a more complex example that it is unable to handle (since as you say, it will certainly be possible to come up with one!)
Also, the 'moving the goalposts' thing is a tired trope. The reality is, the goalposts never moved - the goal is simply being able to be intelligent like a human. It's the AI community that narrowed it down severely, has had proportionally narrow wins, and complains about moving the posts [which btw is a physical thing that no AI today can actually understand :)] when the limitations get pointed out.
Yeah, I’m guilty of poking at goalposts above. Nonetheless, I do think it would be better to set a goalpost that’s less likely to be moved than one that is likely to need moving in the next generation of scaling. Especially since the argument appears to be that scaling can’t solve these kinds of goals.
Though your other point I think is a really interesting one. Are we talking about whether scaling achieves human intelligence? I totally agree it never will. I think it could still achieve an “alien to us” intelligence though, that works in a different way and still gets to much of the same core of intelligence (but which has some things come easier, and some things harder, vs our human intelligences). Maybe the argument is more “will current AI be permanently narrow, or will it eventually break out from its narrowness into broader generalization”?
The 'frame' problem [which I alluded to in another response to Gary's article, along with 'grounded' meaning, Dreyfus, and Searle - a 4-in-one omnibus, lol] will NEVER be solved by data alone. No matter how much data is used, there will always be a frame around it, and things outside that frame, i.e. things not in the data.
What's outside the frame is direct (personal) physical experience with the world. Such experience is what we call 'common sense' - it can be wordless (data-less), rule-less, and goal-less. AI's failure (at being intelligent, human-like) can be summed up this way: trying to use data, symbols, and goals as substitutes for experience. Experience is what lies outside of AI's self-declared goalposts.
In my view, LLMs do not figure out anything. They are fancy statistical parrots that use thousands or possibly millions of human beings as preprocessors. It's a lame approach to AGI in the 'not even wrong' category.
I agree that it basically relies on human sub-processors (though IMHO it starts to generalize from that data at sufficient scale), and requires a ton more data than any human ever would. So yeah, this is not at all how I thought AI would go. I agree there must be more effective approaches out there (though I'm not as experienced as others like Gary, so I won't make predictions about what that approach might be).
But unfortunately, I think being “lame” doesn’t mean it won’t achieve intelligence: lame approaches can still achieve a goal, even if there are more elegant approaches out there. So I'm more interested in trying to understand whether this lame approach will achieve that goal or not, and what forward-looking (vs. backward-looking) evidence to look out for.
From my understanding of the meaning of the word, generalization has nothing to do with scale. Even small insects can generalize. They have to because their tiny brains cannot possibly store millions of representations. In other words, generalization is precisely what is required when scaling is too costly or is not an option.
Mike, good question. To figure out how to wear gloves, there needs to be a body, a brain, and curiosity/interest - all the data streams in the world, consumed by a bunch of GPUs in a basement, would never have the equivalent of it. In other words, wearing gloves is an experience that can only be had physically, and for which words etc. (DATA) are optional.
"moving the posts [which btw is a physical thing "
"Moving the goalposts" is called a metaphor, Saty -- and all your other letters here are just as stupid.
No need to reveal your ignorance with your moronic note. Do you have anything useful, instead of your stupid comment, that actually adds to the discussion?
John, lol, I've been using a dyslexic version - IA, rather than AI - Intelligence (OURS) Augmentation :)