I have been writing a draft about all the dimensions, i.e. "volume of parameters", "volume of training", "token vector size", "token dictionary size" (these all go hand in hand; there have been slowdown reports where only one was scaled, but they need to scale concurrently to avoid hitting a wall quickly, and in combination they hit a wall as well, just more slowly), "prompt", and "algorithm width and length" (e.g. o1's CoT, but also the fact that these days many continuations are calculated side by side, with the less promising ones pruned on the fly, and some LLMs offering more than one continuation, marshalling the users into training the system further). It's all 'engineering the hell out of a limited approach'. Scaling isn't going to provide any real understanding, period, and if you need understanding, you need a new paradigm. Maybe I'll finish that and publish.
But the fact that LLMs becoming AGI through scaling is dead (it was dead before it started, afaic) doesn't mean GenAI is going to go away. A cheap "sloppy copy" (of skills) may have a decent market, and it can definitely speed up human work; we simply don't know how much, or which use cases are economically viable. Not as much as the hype tells us, but also not zero.
So, the current valuations will get some sort of (large) correction, that seems pretty likely. Many investors will get burned. GenAI will remain.
We won't get AGI or any valuation that requires it, certainly not through scaling. I'm convinced of that. Why not simply ignore that hype instead of fighting it? Maybe because of the energy issue? But then, we've had bigger problems on that front for the last five days or so.
My challenge in ignoring the hype is how much work it is driving in corporations, sucking up money and time that could be better directed towards more promising innovations, AI or not. And yes, the data centers: genAI is predicted to take 8% of US electricity in the near future.
I have been rolling my eyes at the time and energy expenditures for about two years now. The worst of it is that these companies keep everyone's attention riveted through screaming headlines and head-spinning numbers that are completely divorced from the real world.
Lots of male peacock displaying going on
“Look at me! Am I not spectacular?”
Haha yes
Unfortunately, once again, as with the Metaverse or crypto, the finance industry and tech media are directly blasting news about this to pump up their own financial interests. On the second tier you have pundits, grifters and influencers pumping up the story via social media to fuel their own grift.
At the top is Big Tech, which is on board more than it ever was with the Metaverse or crypto, and because Big Tech is so rich and prints money, the sheep will follow whatever it says.
And at the very tippy top of that is Nvidia, which has become a household name like Tesla, not due to any innovation, mind you, but solely due to selling a public facing wage stagnation and a huge cost of living the idea that a 700% gain in stock appreciation is still possible. You're just one gamble away from making it big and escaping wage-dependent living.
The whole thing is a deeply sad reflection of the modern economic system.
Please do. As someone who has been following the variable tweaking, I'm seeing a clear plateau, even in the foremost variables that tech loves to boast about. And that is with respect to chip density; in terms of economics, a.k.a. cost, it's even worse: double the cost for only a 10-15% improvement, in metrics that are honestly flawed in themselves.
Extending training time, using human feedback, and synthetic data are all just window dressing, attempts to buy time since the step change never materialised.
Ever since GPT-3 it has been clear enough that the Transformer architecture has very serious limitations and that scaling alone won't fix them.
Yet pushing scaling to the limit is a very smart thing, because much (not all!) of the world's complexity comes from the world not being neat and having so many particular cases, which only allow for local generalization.
The industry is in a very good place, at least the bigger players. After a quick shot in the arm, it has to focus more on architecture work.
It looks highly unlikely to me that any approaches that advocate some human designed priors will be able to do as well as what we have.
So, large data and neural nets are here to stay, and we need to figure out what to add on top.
My only hope is that the GenAI frenzy will have kicked off a renaissance in nuclear energy that makes our society willing to revisit this promising technology and its recent innovations, just before the AI bubble finally bursts.
The investors are going to reap what they sow, but unfortunately they are going to take a lot of innocent bystanders down with them
What I find amazing is that anyone would take a "scaling law" seriously as any kind of predictor of intelligence capacity. How could scaling, or anything else lead a word guesser to become a model of cognition instead of a model of language patterns?
I think of scaling as a Hellmann's mayonnaise approach. Pour enough ingredients into the jar and pour out a mind. Wishful thinking.
Sydney Harris drew a cartoon of two scientists looking at a blackboard on which a three-step process is described. The middle step is "And then a miracle occurs." That is what the scaling law is. As Harris noted, we need to be a little more explicit about step 2. https://www.researchgate.net/figure/Then-a-Miracle-Occurs-Copyrighted-artwork-by-Sydney-Harris-Inc-All-materials-used-with_fig2_302632920 The required miracle is akin to spontaneous generation, in which a piece of vermicelli stored in a glass container, through some unknown means, began to show voluntary movement (Mary Shelley in the 1831 edition of Frankenstein). It's a nonsense idea in biology and a nonsense idea in artificial intelligence.
Empirically, what the scaling law advocates miss is that the volume of training text also grew as the number of model parameters grew. The probability that any known problem would be contained in the training set grew as the size of the training set grew. Scaling advocates failed to control for the possibility that the models were merely emitting slight paraphrases of text that the models had been trained on. Instead they relied on the logical fallacy of affirming the consequent to justify their scaling "analysis."
If scaling really is the core of GenAI, then it may be useful as long as the problems that people give it are sufficiently similar to existing text. As a theory, it is bankrupt. GenAI models may be sufficient to help people work, but they are no more competent than actors reciting lines to appropriate cues. They are definitely not models of cognition or intelligence.
I was working on a longer statement along these lines, which I have now posted at: https://www.linkedin.com/pulse/state-thought-genai-herbert-roitblat-kxvmc
Generative AI is still not General AI
TLDR: I go into a lot of detail about the current state of thinking about GenAI and why much of it is nonsense. With the release of GPT-4o and other advancements, the hype train is again accelerating. I argue that the idea that language models could achieve intelligence or any level of cognition is a massive self-deception. There is no plausible theory by which a word-guessing language model would acquire reasoning, intelligence, or any other cognitive process. Claims that scaling alone will produce cognition are the result of a logical fallacy (affirming the consequent) and are not supported by any evidence. These claims are akin to biological theories of spontaneous generation, and they demonstrate a lack of understanding of what intelligence is. If the statistical properties of language patterns were the only level of intelligence, every statement would be true and accurate. Intelligence requires multiple levels of representation: of the world, of language, and of abstract concepts.
The “and then a miracle occurs” analogy is particularly apt in this case, because even the people developing the models really don’t know why scaling works.
Sam Altman calls it a "religious belief", which is in line with the "miracle" claim.
To call any of this stuff science or engineering is actually very odd.
It really has far more in common with the occult.
I just read your article; it is excellent. Thanks for taking the time to write all that out. I agree that the fundamental reason for doubting the big claims of AI is that there's just no good reason to believe intelligence works this way. All the benchmarks in the world are still no substitute for a plausible theory, and right now all we're offered is the magic of emergence.
An alternative view of what is happening is that we have been passing through three different phases of LLM-based development.
In Phase 1, "scaling is all you need" was the dominant view. As data, network size, and compute scaled, new capabilities (especially in-context learning) emerged. But each increment in performance required exponentially more data and compute.
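(For concreteness: the scaling laws behind this were fit as power laws. The constants and exponents differ from study to study, so take the following Chinchilla-style form as representative rather than exact:

    L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad \alpha, \beta \approx 0.3

where N is the parameter count, D the number of training tokens, and E an irreducible loss term. With exponents that small, halving the parameter-limited part of the loss takes roughly ten times as many parameters, which is what "exponentially more data and compute" cashes out to in practice.)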
In Phase 2, "scaling + external resources is all you need" became dominant. It started with RAG and Toolformer, but has rapidly moved to include invoking Python interpreters and external problem solvers (plan verifiers, Wikipedia fact checking, etc.).
In Phase 3, "scaling + external resources + inference compute is all you need". I would characterize this as the realization that the LLM only provides part of what is needed for a complete cognitive system. OpenAI doesn't call it this, but we could view o1 as adopting the impasse mechanism of SOAR-style architectures. If the LLM has high uncertainty after a single forward pass through the model, it decides to conduct some form of forward search combined with answer checking/verification to find the right answer. In SOAR, this generates a new chunk in memory, and perhaps in OpenAI, they will salt this away as a new training example for periodic retraining. The cognitive architecture community has a mature understanding of the components of the human cognitive architecture and how they work together to achieve human general intelligence. In my view, they give us the best operational definition of AGI. If they are correct, then building a cognitive architecture by combining LLMs with the other mechanisms of existing cognitive architectures is likely to produce "AGI" systems with capabilities close to human cognitive capabilities.
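A minimal sketch of that impasse-style control loop is below. Every name in it (generate, verify, the confidence threshold) is a placeholder of mine, not anything OpenAI has published; it only illustrates the shape of the mechanism.

    # Toy sketch of the impasse idea above. All names are placeholders.
    import random

    def generate(prompt):
        # Stand-in for one forward pass: returns (answer, confidence in [0, 1]).
        return "42", random.random()

    def verify(prompt, answer):
        # Stand-in for an external checker (unit tests, a plan verifier, etc.).
        return answer == "42"

    def solve(prompt, threshold=0.9, budget=8):
        answer, confidence = generate(prompt)
        if confidence >= threshold:
            return answer            # no impasse: accept the single forward pass
        # Impasse: spend extra inference compute on search plus verification.
        for _ in range(budget):
            candidate, _ = generate(prompt)
            if verify(prompt, candidate):
                return candidate     # could also be saved as a training example
        return answer                # fall back to the original guess

    print(solve("What is 6 * 7?"))

The point where a verified candidate is returned is also where, in the SOAR analogy, the new chunk would be stored, or, in the retraining analogy, salted away as a training example.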
sounds like neurosymbolic AI in the end, no?
In the end it will work as it does for people. The domain dictates the mix of strategies. That's why too much bragging isn't helping really. The problem is complex and the ingredients in the cake are being figured out.
Well, someday someone may figure out how to do it all in a connectionist architecture. But either way, we are seeing more and more structure in these systems. I also think the pragmatic engineers in startups will be thinking: "I could try to do reasoning inside the net, but damn this SAT solver runs fast on my GPU." I'm on the lookout for interesting combinations of heavily optimized symbolic AI reasoning engines and strong contextual knowledge retrieved from the LLM. That would give us the soundness of the inference engine plus the rich context and world knowledge of the LLM. It's not how people work, but it is a great way to build an AI system.
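As a toy example of the kind of combination I mean: let a (mocked) LLM turn prose into a handful of propositional constraints and hand them to an off-the-shelf solver. The constraint format and the llm_extract stub are inventions for illustration; Z3 itself is a real solver.

    # Hypothetical sketch: an LLM (mocked here) proposes simple constraints,
    # and the Z3 solver checks whether they are jointly satisfiable.
    from z3 import Bools, Solver, Implies, Not, sat

    def llm_extract(question):
        # Stand-in for an LLM call that extracts constraints from context.
        return ["rain -> wet_grass", "sprinkler -> wet_grass", "not rain"]

    rain, sprinkler, wet_grass = Bools("rain sprinkler wet_grass")
    symbols = {"rain": rain, "sprinkler": sprinkler, "wet_grass": wet_grass}

    s = Solver()
    for c in llm_extract("Why is the grass wet?"):
        if "->" in c:
            lhs, rhs = (symbols[t.strip()] for t in c.split("->"))
            s.add(Implies(lhs, rhs))
        elif c.startswith("not "):
            s.add(Not(symbols[c[4:].strip()]))
        else:
            s.add(symbols[c.strip()])
    s.add(wet_grass)          # the observation we want explained

    if s.check() == sat:
        print(s.model())      # one consistent assignment of the propositions

The solver contributes the soundness; the LLM only supplies candidate facts and context, which is exactly the division of labor I'm hoping to see more of.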
Mostly agreed, with the caveats that (a) we don't yet understand how to combine LLMs with those other mechanisms in a way that will work, and (b) even when we make some progress on that question, I think it's still going to be incremental; I would not yet use words like "likely ... close to human cognitive capabilities".
Yes, I'm speculating here (and will probably regret it quite soon). If past experience is a guide, we will discover yet more pieces that are needed.
Tom, question from a neophyte, would all these additional systems ancillary to the LLM be helpful in terms of interpretability?
Maybe. The more structure that is exposed by the system, the more interpretable it can be. For example, RAG makes it possible to cite source documents. However, as the size of a search space scales up (e.g., in AlphaGo or in a SAT solver), the size of the "explanation" grows very large, and new techniques are needed to summarize it. That raises the long-standing challenge of discovering human-interpretable abstractions.
When LLMs cite “source documents” are they actually citing the specific documents from which particular data came?
Or are they citing after-the-fact "best guesses" about where the data likely might have come from? E.g., based on a web search of key words.
If they are citing the actual source documents , how does that work?
I've seen a GPT based system cite correctly in some sense, but still hallucinate the details when forced (for example) to do arithmetic.
In Retrieval Augmented Generation, a collection of documents (e.g., Wikipedia) is pre-processed and indexed into a vector database. During generation, your question is matched against the vector database, and relevant passages from the documents are copied into the LLM's context buffer. Bing (and presumably Google) also do a web search and include some results in the input buffer as well. My simple model is that it is these retrieved documents that are cited. But I imagine the commercial models have multiple strategies for determining which documents to cite. Studies have shown that the generated answers can be a mix of retrieved material and information learned during the pre-training phase. You must check everything an LLM produces!
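To make that retrieval step concrete, here is a deliberately crude sketch in which a bag-of-words "embedding" and an in-memory dictionary stand in for the real encoder and vector database:

    # Crude RAG sketch; purely illustrative, not any vendor's pipeline.
    import math
    from collections import Counter

    docs = {
        "doc1": "The Eiffel Tower is in Paris and was completed in 1889.",
        "doc2": "Retrieval Augmented Generation copies relevant passages into the prompt.",
    }

    def embed(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    index = {name: embed(text) for name, text in docs.items()}   # the "vector database"

    def retrieve(question, k=1):
        q = embed(question)
        return sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)[:k]

    question = "When was the Eiffel Tower completed?"
    sources = retrieve(question)
    context = " ".join(docs[s] for s in sources)
    print(sources)                                      # these are what gets cited
    print(f"Context: {context}\nQuestion: {question}")  # what the LLM actually sees

Whatever lands in sources is what an honest system would cite; anything in the final answer that is not supported by those passages came from pre-training, which is why you have to check.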
This is particularly evident using Perplexity. It's also particularly frustrating: if the source is garbage, the RAG will basically output garbage, or an interpretation based on garbage. Humans can quickly tell when a source is bad, but it seems difficult for their pipeline to do that. I am also still wondering whether they remain slaves to the SEO and PageRank algorithms when extracting those documents.
Thanks.
And I plan to check😊
It may be an empirical hypothesis but having LLMs produce synthetic data that is then used to train more powerful LLMs *seems* like it should violate some fundamental law.
That's too close to alchemy.
There is actually research showing that training LLMs on output from LLMs leads to model collapse after just a few iterations.
One would think that the developers of LLMs would be very concerned about such things.
“AI models collapse when trained on recursively generated data”(published in Nature)
https://www.nature.com/articles/s41586-024-07566-y
The problem is actually not restricted to “synthetic data” produced specifically for training purposes but in fact, given that the web is now being flooded with LLM generated data, simply training on random data from the web going forward will inevitably have the same result.
So, LLMs require more data, but more data generated by LLMs will actually make them worse.
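A toy version of the mechanism, in the simplest setting I can think of (repeatedly fitting and resampling a single Gaussian; nothing LLM-specific, just an illustration of why recursive training loses information):

    # Fit a Gaussian to the data, sample the next generation's "training data"
    # from the fit, and repeat. The spread drifts toward zero: collapse.
    import random
    import statistics

    N_SAMPLES, GENERATIONS = 30, 100
    data = [random.gauss(0.0, 1.0) for _ in range(N_SAMPLES)]   # "human" data
    for gen in range(GENERATIONS + 1):
        mu, sigma = statistics.fmean(data), statistics.pstdev(data)
        if gen % 20 == 0:
            print(f"generation {gen:3d}: sigma = {sigma:.3f}")
        # The next generation is trained only on the previous model's output.
        data = [random.gauss(mu, sigma) for _ in range(N_SAMPLES)]

The Nature paper does this with actual language models, but the bookkeeping is roughly the same: each generation only ever sees what the previous generation was willing to emit, so the rare cases are the first thing to disappear.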
Quite the pickle.
Wait a minute. I saw something on exactly that a couple years ago. When you do that, it results in a degenerative cycle that produces wave patterns.
An awful lot of paychecks rely on the scaling laws holding. The delusion may persist for some time.
Paychecks scale with the wind.
And an ill wind is now blowing
As we suspected. It is likely that LLMs as a class will hit a conceptual, architectural limit, and true (or just significantly better) AI or ML will be something different.
LLMs are remarkably good at getting one into the ballpark. As with Moore's law, pushing that to the limit and then building architecture to deal with its limitations looks like the better choice.
I am really not technically qualified here; I come at it more from a conceptual, philosophical point of view. Moore's "Law" is likely not a law like the law of gravity. It is remarkable that it has gone on as long as it has; I think maybe because it was called a law it has inspired continued progress. Similarly, the assertion of scaling for LLMs has not been proven, but the progress is impressive and the latest versions will have a remarkable impact on almost everything. But as smart as it is, it ain't intelligence. Yes, there is the Turing Test, but as brilliant as Turing was, if he were alive today I don't think he would call it intelligence. It is not blasphemy to be skeptical about scaling and to posit limits and new approaches. Unless you are invested in the religion.
Intelligence is not one thing but rather a collection of skills. The near-term goal is to have AI assistants that can help do work, and to improve their reliability and make a profit.
What comes after that, we'll see. Ideas for alternative approaches aren't new; in fact they date back decades, and their track record hasn't been great. At some point maybe a mix of ideas will work out.
Intelligence is not a collection of skills. It is the capability of *acquiring* skills.
We'll get useful 'skill machines' from this, little doubt.
PS. The Turing test is based on the assumption that humans are hard to fool. The reverse is true.
Actual intelligence requires sentience, which requires consciousness. IMHO.
Intelligence evolved, from simple to complex. First imitate, then iterate, reflect, abstract, etc. AI is following the same path.
AI is not following that path (yet). Nor is our kind of intelligence a necessary outcome of evolution.
"The Turing test is based on the assumption that humans are hard to fool. The reverse is true."
LLMs certainly put the Turing test to bed.
In fact, they make me wonder whether humans are even intelligent.
I don't know. Dennett always said that the tester has to be very clever. That said, I am now convinced the ELIZA Effect is even more powerful than I ever dreamed; so if you are referring to the popular version of the test (i.e.,fooling the unprepared) I would agree.
The big paradigm shift we need to make is that we come to grips with the fact that we're the most intelligent species on the planet, but not necessarily very intelligent in an absolute sense. We're mostly 'dumb' automation too. See https://youtu.be/3riSN5TCuoE and https://youtu.be/9_Rk-DZCVKE for the relation between the digital revolution and that paradigm shift.
"I think maybe because it was called a law it has inspired continued progress"
That’s exactly what Caltech scientist Carver Mead has said about Moore’s Law, and he should know, because he coined the term.
Mead, who is basically behind the whole integrated circuit miniaturization scheme, says that in the early days, he had no luck convincing anyone that his ideas would work.
But once someone tried it and saw that it did work, it became a self-perpetuating process.
As Mead has pointed out, the key element is “belief that it will work”.
But belief is obviously not the same as physical law and when the two are in competition, physical law wins every time.
When faith is lost on a pyramid scheme, sources to new sucker money dry up and the scam collapses.
Assuming that's a direct quote from Altman, he's channeling Donald Trump. Are all the techbro ring kissers going to start spouting barely comprehensible gibberish?
I included the link to the full video, which I condensed for readability. Judge for yourself :)
OK I listened to it. What the fuck is an air winter? Or for that matter a PGism? That Ycombinator guy's breathless "yeah" at the end, after that comment about a new square in the periodic table; it was like SA had just shown him his new Power Ranger.
After OpenAI trains chatGPT on Sam Altman’s statements, AI chemists will undoubtedly refer to “the Periodic Table of Squares”
Prompter: “How many chemical elements are there?”
ChatGPT: “I’m not familiar with that term, elements. Could you be more specific?”
Prompter: “You know, in the Periodic Table”
ChatGPT: “Ah, you mean ‘squares.’ The International Union of Pure and Applied Chemistry recognizes 118 squares”
Make that the “International Union of Pure and Applied AI Chemy (IUPAAIC)”
When OpenAI runs out of high quality copyrighted…I mean “publicly available”… data to train their chatbot on , they can use their secret weapon: Sam Altman’s large body of verbiage.
If that doesn’t get them to AGI , nothing will
The more LLM-generated material is out there, the more LLMs will train themselves on themselves. Even my cat, after throwing up and eating her own throw-up, doesn't eat it again if she throws it up again. Cats are smarter than LLMs.
Yes, Mad AI disease is a real thing
Or on the case of Meta’s AI, I suppose it’s called Mad Llama disease
Anyone who has ever been around llamas knows that there is nothing more dangerous than a mad Llama.
They spit
I think confronting a "wall" with LLM's opens up a new opportunity. The concept we are all assuming we know that makes the term "wall" useful is that it is more "tangible" than an open ended search. So, figuring out what such a "wall" is, should be easier than trying to explain the void of continued growth.
Since my post generated some interest, let me add 2 concrete examples of my point.
The AI community still doesn’t have an “algorithmic” model for human reasoning. If I had to summarize LLMs, it would be, “look for key words in a database, and choose the most common one to form a grammatically acceptable reply.” BUT, how are we judging the “correctness” of LLM results? Ask GPT to explain why “Democracy” isn’t doing too well. I did. I got the typical reply “Well, there are multiple ‘views’ about this. Here they are.” That, of course, gets a low score. To score high, we really want the “TRUE” answer. To the point of the article, once a sufficient database is found to produce a good summary of “views”, “scaling” will no longer provide a proportional gain.
The second example is how we are judging “consciousness” in the first place. If we consider humans with low IQ scores, don’t we “automatically” endow them with “consciousness”, just because they’re “human”? If we then find a human with an IQ of 140, is there any question about them being “conscious”? Clearly no. BUT, if we ask them to provide a short synopsis for each entry in the Encyclopaedia Britannica, for sure they will get a low score. Yet GPT would ACE the reply, and do it in a flash. AND scaling up the database, once it has the whole Britannica in it, isn’t going to improve the score. But GPT still doesn’t get a “consciousness” badge.
There is enough “challenge” in just these 2 examples to put a lot of “structure” in answering the “wall” question!
Would it? I have seen a lot of convincing summaries of Teams meetings that are subtly wrong.
I am not surprised it isn't as simple as the wishful hypotheses said. It is easy to make big improvements initially, but it's not surprising that improvements become more difficult to make rather than accelerating over time. I am afraid the model has been oversimplified to sell to investors. I think AI will be useful, especially for somewhat boring, repetitive tasks like reading X-rays, etc. I think the more nuanced and complex the task, the more difficulty the next generation of AI will have making major improvements. And eventually it will save time and money on certain kinds of tasks. Instead of being impressed with what it can do, I think overselling it to the public will bring a wave of disappointment before we get the next level of investment and improvements in AI.
I thought that the core hypothesis driving generative AI was that “hype will make us (the Lords of AI) all billionaires”
It seems to have worked spectacularly so far and shows no sign of being wrong.
It was the best of times, of exponentially overhyped scaling laws, and it was the worst of times, of exponentially diminishing returns...
Gen AI does well because it lends itself to recursive hype generation.
So, pending stack overflow, then? :)
This is a great discussion. I do, however, believe we're missing the bigger picture: John von Neumann, in "The Computer and the Brain" (published posthumously in 1958), already pointed to the main items relevant to the LLM wall, which I believe is a methodological training wall, not a real limit. The relevant considerations are the following:
a) The most important point: lots of relatively low-precision, slow (compared to electronic), NETWORKED processing nodes.
b) Capacity for massive parallel processing. The brain is estimated to consist of many tens of billions of neurons, each connected to thousands of others, facilitating simultaneous processing across vast neural networks. This parallelism enables rapid integration of information from diverse sources.
c) Biological systems are inherently robust; the loss or malfunction of individual neurons doesn't typically impair overall function. This resilience arises from overlapping functionalities and the ability of neural networks to reorganize and adapt.
d) The stochastic, or probabilistic, nature of neural processes. Neuronal firing isn't solely deterministic; it's influenced by a range of variables, including synaptic weights and neurotransmitter levels, which can introduce randomness into neural computations. This stochasticity allows the brain to be highly adaptable, capable of learning from experience, generalizing from incomplete data, and exhibiting creativity.
TO ME, this suggests the work needs to shift to exploring different network structures for 'artificial neural networks': different architectures, different topologies. Use what's known about the brain's 'mechanical' architecture more wisely; really understand the brain's synaptic organization and its recurrent loops to conceptualize new architectures and paradigms for using information. Clearly, real experience shows that people like Einstein didn't need to know everything about everything, or have infinite data. So get smart, and get busy. Of course, the lazy way is to look for more training data and more training epochs ... but that's just me, arguing that the low-hanging fruit have been picked and now one has to get smarter. Cheers. Stay positive, great things are coming (in the AI world).
This is a fine, measured, focused take on the issue. We need more of this. Time will show.
🧵As I’ve said, I’m trying to read your book with an open mind.
But right away, I have to take issue with you when you say things about Chatbots like,
“Their immense appeal is undeniable. If you play with them … you instantly see why they have become so popular.” (page 26 in the softcover edition)
——
Not for ME‼️ It’s ersatz, I have a great dislike of that.
I was an antiques dealer early in my life, so I have some knowledge of fake or “style of,” as opposed to the real thing.
[And if I need to, I can contact my late mother, Lorraine Waxman Pearce, appointed the first Curator of the White House, Kennedy restoration.]
So, as you say, that poem created by the chatbot is not a great poem, but I wouldn’t ASK a Chatbot to create ANY art or literature for me. I’d just DO IT, myself, 😒‼️
AT LEAST IT WOULD BE AUTHENTIC!