99 Comments

What I find amazing is that anyone would take a "scaling law" seriously as any kind of predictor of intelligence capacity. How could scaling, or anything else, lead a word guesser to become a model of cognition instead of a model of language patterns?

I think of scaling as a Hellmann's mayonnaise approach. Pour enough ingredients into the jar and pour out a mind. Wishful thinking.

Sydney Harris drew a cartoon of two scientists looking at a blackboard on which a three-step process is described. The middle step is "And then a miracle occurs." That is what the scaling law is. As Harris noted, we need to be a little more explicit about step 2. https://www.researchgate.net/figure/Then-a-Miracle-Occurs-Copyrighted-artwork-by-Sydney-Harris-Inc-All-materials-used-with_fig2_302632920 The required miracle is akin to spontaneous generation, in which a piece of vermicelli stored in a glass container, through some unknown means, began to show voluntary movement (Mary Shelley, in the 1831 edition of Frankenstein). It's a nonsense idea in biology and a nonsense idea in artificial intelligence.

Empirically, what the scaling law advocates miss is that the volume of training text also grew as the number of model parameters grew. The probability that any known problem would be contained in the training set grew as the size of the training set grew. Scaling advocates failed to control for the possibility that the models were merely emitting slight paraphrases of text that the models had been trained on. Instead they relied on the logical fallacy of affirming the consequent to justify their scaling "analysis."

If scaling really is the core of generative AI (GenAI), then it may be useful as long as the problems that people give it are sufficiently similar to existing text. As a theory, it is bankrupt. GenAI models may be sufficient to help people work, but they are no more competent than actors reciting lines to appropriate cues. They are definitely not models of cognition or intelligence.
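
On the contamination point above, here is a minimal sketch (purely illustrative, not anything from the scaling papers) of the kind of control the comment says was missing: checking how much of a "new" benchmark problem already appears near-verbatim in the training text. The names `train_texts` and `benchmark_item` are invented placeholders.

```python
# Toy contamination check: how much of a benchmark item already appears,
# near-verbatim, in the training corpus? (Illustrative sketch only.)

def ngrams(text, n=8):
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def overlap_fraction(benchmark_item, train_texts, n=8):
    """Fraction of the item's n-grams that occur somewhere in training text."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0
    train_grams = set()
    for t in train_texts:
        train_grams |= ngrams(t, n)
    return len(item_grams & train_grams) / len(item_grams)

# Hypothetical usage: a high fraction suggests the "new" problem was
# effectively in the training set, so solving it shows recall, not reasoning.
train_texts = ["the cat sat on the mat and then the cat took a nap ..."]
benchmark_item = "the cat sat on the mat and then the cat took a nap"
print(overlap_fraction(benchmark_item, train_texts, n=5))
```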


I was working on a longer statement along these lines, which I have now posted at: https://www.linkedin.com/pulse/state-thought-genai-herbert-roitblat-kxvmc

Generative AI is still not General AI

TL;DR: I go into a lot of detail about the current state of thinking about GenAI and why much of it is nonsense. With the release of GPT-4o and other advancements, the hype train is again accelerating. I argue that the idea that language models could achieve intelligence or any level of cognition is a massive self-deception. There is no plausible theory by which a word-guessing language model would acquire reasoning, intelligence, or any other cognitive process. Claims that scaling alone will produce cognition are the result of a logical fallacy (affirming the consequent) and are not supported by any evidence. These claims are akin to biological theories of spontaneous generation, and they demonstrate a lack of understanding of what intelligence is. If the statistical properties of language patterns were the only level of intelligence, every statement would be true and accurate. Intelligence requires multiple levels of representation: of the world, of the language, and of abstract concepts.


The “and then a miracle occurs” analogy is particularly apt in this case, because even the people developing the models really don’t know why scaling works.

Sam Altman calls it a “religious belief,” which is in line with the “miracle” claim.

To call any of this stuff science or engineering is actually very odd.

It really has far more in common with the occult.


Despite the claims coming from their salesmen and saleswomen, LLMs don’t actually understand the real world and are simply recreating superficial patterns present in their training data

“AI video generators like OpenAI's Sora don't grasp basic physics, study finds”

https://the-decoder.com/ai-video-generators-like-openais-sora-dont-grasp-basic-physics-study-finds/

Hard to see how something like Sora is going to “solve physics” when it has no understanding of even rudimentary physical concepts


LLMs are reminiscent of Clever Hans, the “mathematical horse” that got correct answers to math problems by picking up on subtle behavioral patterns provided (perhaps unwittingly) by its owner.

And like Clever Hans, “Clever LLMs” have fooled (unwittingly, of course) a lot of intelligent people.

But Clever Hans actually WAS clever (just not in the way everyone thought).

The same cannot be said for LLMs, which are simply outputting patterns based on statistics.


There is an interesting psychological aspect at work in that even after the evidence makes it clear, people don’t wish to admit that they were fooled by a horse, so they will continue to defend the horse manure.


I just read your article; it is excellent. Thanks for taking the time to write all that out. I agree that the fundamental reason for doubting the big claims of AI is that there's just no good reason to believe intelligence works this way. All the benchmarks in the world are still no substitute for a plausible theory, and right now all we're offered is the magic of emergence.


“When one develops artificial intelligence, either one should have a clear physical model in mind or one should have a rigorous mathematical basis. AI-chemy has neither” — Enrico Fermi

https://m.youtube.com/watch?v=hV41QEKiMlM


And with 175 billion parameters, Johnny von Neumann could make the elephant hallucinate like an LLM


I have been writing a draft about all the dimensions, i.e. "volume of parameters", "volume of training", "token vector size", "token dictionary size" (these all go hand in hand; there have been slowdown reports where only one was scaled, but they need to scale concurrently to avoid hitting a wall quickly, and even in combination they hit a wall, just more slowly), "prompt", and "algorithm width and length" (e.g. o1's CoT, but also the fact that these days many continuations are calculated side by side, with the less promising ones pruned on the fly, and some LLMs provide more than one continuation, marshalling the users to train the system further). It's all 'engineering the hell out of a limited approach'. Scaling isn't going to provide any real understanding, period, and if you need understanding, you need a new paradigm. Maybe I'll finish that and publish.

But the fact that LLMs becoming AGI through scaling is dead (it was dead before it started, afaic) doesn't mean GenAI is going to go away. Cheap "sloppy copy" (of skills) may have a decent market; it can definitely speed up human work. We simply don't know by how much, or which use cases are economically viable. Not as much as the hype tells us, but also not zero.

So, the current valuations will get some sort of (large) correction; that seems pretty likely. Many investors will get burned. GenAI will remain.

We won't get AGI, or any valuation that requires it, certainly not through scaling. I'm convinced of that. Why not simply ignore that hype instead of fighting it? Maybe because of the energy issue? But then, we have had bigger problems on that front for the past five days or so.


My challenge in ignoring the hype is how much work it is driving in corporations, sucking up money and time that could be better directed towards more promising innovations, AI or not. And yes, the data centers: GenAI is predicted to take 8% of US electricity in the near future.


I have been rolling my eyes at the time and energy expenditures for about two years now. The worst of it is that these companies keep everyone's attention riveted with screaming headlines and head-spinning numbers that are completely divorced from the real world.


Lots of male peacock displaying going on

“Look at me! Am I not spectacular?”


Haha yes


Unfortunately, once again, like the Metaverse or crypto, the finance industry and tech media are directly blasting news about this to pump up their own financial interests. On the second tier you have pundits, grifters, and influencers pumping up the story via social media to fuel their own grift.

At the top is Big Tech, which is on board more than it ever was with the Metaverse or crypto, and because Big Tech is so rich and prints so much money, the sheep will follow whatever it says.

And at the very tippy top of that is Nvidia, which has become a household name like Tesla, not due to any innovation, mind you, but solely due to selling a public facing wage stagnation and a huge cost of living on the idea that a 700% gain in stock appreciation is still possible. You're just one gamble away from making it big and escaping wage-dependent living.

The whole thing is a deeply sad reflection of the modern economic system.


My only hope is that the GenAI frenzy will have kicked off a renaissance in nuclear energy that makes our society willing to revisit this promising technology and its recent innovations, just before the AI bubble finally bursts.


The investors are going to reap what they sow, but unfortunately they are going to take a lot of innocent bystanders down with them


Please do. As someone who has been following the variable tweaking: even the foremost variables that tech loves to boast about are showing a clear plateau. And that is with respect to chip density; in terms of economics, a.k.a. cost, it's even worse: double the cost for only a 10-15% improvement on metrics that are honestly flawed in themselves.

Extending training time, using human feedback, and generating synthetic data are all just window dressing, attempting to buy time since the step change never materialised.
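
To put rough numbers on that diminishing-returns picture, here is a toy power-law calculation; the constants are invented stand-ins, not values fitted to any real model or paper.

```python
# Toy power-law loss curve, L(C) = a * C**(-b), with invented constants.
# Each equal step down in loss requires a multiplicative jump in compute.
a, b = 10.0, 0.05

def loss(compute):
    return a * compute ** (-b)

def compute_needed(target_loss):
    # Invert L = a * C**(-b)  =>  C = (a / L)**(1/b)
    return (a / target_loss) ** (1.0 / b)

for target in [2.0, 1.9, 1.8, 1.7]:
    print(f"loss {target:.1f} -> compute ~{compute_needed(target):.2e}")
# The compute column grows by roughly 3x per row while the loss column
# improves by a fixed 0.1 per row -- the diminishing-returns picture.
```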


An alternative view of what is happening is that we have been passing through three different phases of LLM-based development.

In Phase 1, "scaling is all you need" was the dominant view. As data, network size, and compute scaled, new capabilities (especially in-context learning) emerged. But each increment in performance required exponentially more data and compute.

In Phase 2, "scaling + external resources is all you need" became dominant. It started with RAG and Toolformer, but has rapidly moved to include invoking Python interpreters and external problem solvers (plan verifiers, Wikipedia fact checking, etc.).

In Phase 3, the view is becoming "scaling + external resources + inference compute is all you need". I would characterize this as the realization that the LLM only provides part of what is needed for a complete cognitive system. OpenAI doesn't call it this, but we could view o1 as adopting the impasse mechanism of SOAR-style architectures. If the LLM has high uncertainty after a single forward pass through the model, it decides to conduct some form of forward search combined with answer checking/verification to find the right answer. In SOAR, this generates a new chunk in memory, and perhaps at OpenAI they will salt this away as a new training example for periodic retraining. The cognitive architecture community has a mature understanding of the components of the human cognitive architecture and how they work together to achieve human general intelligence. In my view, they give us the best operational definition of AGI. If they are correct, then building a cognitive architecture by combining LLMs with the other mechanisms of existing cognitive architectures is likely to produce "AGI" systems with capabilities close to human cognitive capabilities.
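
A minimal sketch of that impasse-style control loop, purely as a reading of the description above; OpenAI has not published o1's mechanism, and `llm_answer`, `llm_propose`, and `verify` are hypothetical stand-ins.

```python
# Sketch of an impasse-style loop: trust a single forward pass when the model
# is confident, otherwise fall back to search plus verification.
import random

def llm_answer(prompt):                  # hypothetical single forward pass
    return {"answer": "42", "confidence": random.random()}

def llm_propose(prompt, k=5):            # hypothetical sampler of k candidates
    return [f"candidate-{i}" for i in range(k)]

def verify(prompt, candidate):           # hypothetical external checker
    return candidate.endswith("3")

def solve(prompt, confidence_threshold=0.9):
    first_pass = llm_answer(prompt)
    if first_pass["confidence"] >= confidence_threshold:
        return first_pass["answer"]      # no impasse: answer directly
    # Impasse: search over candidates and keep only verified ones.
    verified = [c for c in llm_propose(prompt) if verify(prompt, c)]
    # A SOAR-like system would also "chunk" the result for reuse, e.g. store
    # (prompt, answer) as a new training example for periodic retraining.
    return verified[0] if verified else first_pass["answer"]

print(solve("some hard question"))
```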


sounds like neurosymbolic AI in the end, no?


Well, someday someone may figure out how to do it all in a connectionist architecture. But either way, we are seeing more and more structure in these systems. I also think the pragmatic engineers in startups will be thinking: "I could try to do reasoning inside the net, but damn this SAT solver runs fast on my GPU." I'm on the lookout for interesting combinations of heavily optimized symbolic AI reasoning engines and strong contextual knowledge retrieved from the LLM. That would give us the soundness of the inference engine plus the rich context and world knowledge of the LLM. It's not how people work, but it is a great way to build an AI system.
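
In the spirit of the "this SAT solver runs fast" remark, here is a toy sketch of that division of labor: an LLM-like component (stubbed out here as `llm_extract_clauses`) supplies candidate constraints as CNF clauses, and an exact, if brute-force, satisfiability check plays the role of the sound inference engine. Everything here is hypothetical scaffolding, not a description of any existing system.

```python
# Toy neurosymbolic split: "LLM" supplies clauses, exact solver checks them.
from itertools import product

def llm_extract_clauses(text):
    # Hypothetical stand-in for an LLM turning contextual knowledge into CNF.
    # Variables: 1 = "event A happened", 2 = "event B happened".
    # Clauses are lists of signed ints: (A or B) and (not A or B).
    return [[1, 2], [-1, 2]]

def satisfiable(clauses):
    """Brute-force SAT check (fine at toy sizes; a real system would call an
    optimized solver here)."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, bits))
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True, assign
    return False, None

clauses = llm_extract_clauses("some passage of world knowledge")
ok, model = satisfiable(clauses)
print("consistent" if ok else "contradiction", model)
```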


Mostly agreed, with the caveats that (a) we don't yet understand how to combine LLMs with those other mechanisms in a way that will work, and (b) even when we make some progress on that question, I think it's still going to be incremental; I would not yet use words like "likely ... close to human cognitive capabilities".


Yes, I'm speculating here (and will probably regret it quite soon). If past experience is a guide, we will discover yet more pieces that are needed.


Tom, a question from a neophyte: would all these additional systems ancillary to the LLM be helpful in terms of interpretability?


Maybe. The more structure that is exposed by the system, the more interpretable it can be. For example, RAG makes it possible to cite source documents. However, as the size of a search space scales up (e.g., in AlphaGo or in a SAT solver), the size of the "explanation" grows very large, and new techniques are needed to summarize it. That raises the long-standing challenge of discovering human-interpretable abstractions.


When LLMs cite “source documents” are they actually citing the specific documents from which particular data came?

Or are they citing after-the-fact “best guesses” about where the data likely came from, e.g. based on a web search of keywords?

If they are citing the actual source documents , how does that work?


I've seen a GPT-based system cite correctly in some sense, but still hallucinate the details when forced (for example) to do arithmetic.


In Retrieval Augmented Generation, a collection of documents (e.g., Wikipedia) is pre-processed and indexed into a vector database. During generation, your question is matched against the vector database, and relevant passages from the documents are copied into the LLM's context buffer. Bing (and presumably Google) also do a web search and include some results in the input buffer as well. My simple model is that it is these retrieved documents that are cited. But I imagine the commercial models have multiple strategies for determining which documents to cite. Studies have shown that the generated answers can have a mix of retrieved material and information learned during the pre-training phase. You must check everything an LLM produces!
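
For concreteness, a minimal sketch of the retrieval step just described, using TF-IDF cosine similarity in place of a learned embedding model; the documents and the prompt template are invented for illustration, and real pipelines differ in many details (chunking, rerankers, citation heuristics).

```python
# Minimal RAG-style retrieval: index documents, match the question against
# them, and paste the top passages (with their ids) into the prompt so the
# generator has something concrete to cite.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = {
    "doc1": "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "doc2": "Mount Everest is the highest mountain above sea level.",
    "doc3": "The Great Wall of China stretches across northern China.",
}
doc_ids = list(docs)

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([docs[d] for d in doc_ids])

def retrieve(question, k=2):
    """Return the k most similar (doc_id, score) pairs for the question."""
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, matrix)[0]
    return sorted(zip(doc_ids, scores), key=lambda x: -x[1])[:k]

question = "When was the Eiffel Tower finished?"
context = "\n".join(f"[{d}] {docs[d]}" for d, _ in retrieve(question))
prompt = (
    "Answer using only the sources below and cite them by id.\n"
    f"{context}\n\nQ: {question}"
)
print(prompt)  # the "citations" are simply the ids of whatever was retrieved
```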


This is particularly evident using Perplexity. It's also particularly frustrating: if the source is garbage, the RAG will basically output garbage, or an interpretation that is based on garbage. Humans can quickly tell when a source is bad, but it seems difficult for their pipeline to do that. I am also still wondering whether they remain slaves to the SEO and PageRank algorithms to extract those documents.


Thanks.

And I plan to check😊


It may be an empirical hypothesis, but having LLMs produce synthetic data that is then used to train more powerful LLMs *seems* like it should violate some fundamental law.

That's too close to alchemy.


There is actually research showing that training LLMs on output from LLMs leads to model collapse after just a few iterations.

One would think that the developers of LLMs would be very concerned about such things.
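
For what it's worth, the collapse dynamic can be illustrated with a toy simulation (a deliberately simplified caricature, not the experimental setup used in the published studies): each "generation" is fit only to samples drawn from the previous generation's fit, and the tails of the distribution gradually disappear.

```python
# Toy "model collapse": each generation is fit only to samples drawn from the
# previous generation's fitted model, so estimation error compounds.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0        # generation 0: the "real" data distribution
n_samples = 100             # finite sample => noticeable estimation error

for gen in range(1, 11):
    samples = rng.normal(mu, sigma, n_samples)   # data produced by gen-1
    mu, sigma = samples.mean(), samples.std()    # gen fits only that data
    print(f"gen {gen:2d}: mu={mu:+.3f}  sigma={sigma:.3f}")
# In expectation sigma shrinks a little every generation, so rare (tail)
# events gradually stop being generated -- the qualitative collapse the
# published studies report for much richer models.
```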


The problem is actually not restricted to “synthetic data” produced specifically for training purposes; in fact, given that the web is now being flooded with LLM-generated data, simply training on random data from the web going forward will inevitably have the same result.

So, LLMs require more data, but more data generated by LLMs will actually make them worse.

Quite the pickle.


“AI models collapse when trained on recursively generated data” (published in Nature)

https://www.nature.com/articles/s41586-024-07566-y


Wait a minute. I saw something on exactly that a couple of years ago. When you do that, it results in a degenerative cycle that produces wave patterns.


An awful lot of paychecks rely on the scaling laws holding. The delusion may persist for some time.


Paychecks scale with the wind.

And an ill wind is now blowing


As we suspected. It is likely that LLMs as a class will hit a conceptual architectural limit, and true (or just significantly better) AI or ML will be something different.


I am really not technically qualified here. I come at it more from a conceptual, philosophical point of view. Moore's “Law” is likely not a law like the law of gravity. It is remarkable that it has gone on as long as it has; I think maybe because it was called a law, it has inspired continued progress. Similarly, the assertion of scaling for LLMs has not been proven, but the progress is impressive, and the latest versions will have a remarkable impact on almost everything. But as smart as it is, it ain't intelligence. Yeah, there is the Turing Test, but as brilliant as Turing was, if he were alive today I don't think he would call it intelligence. It is not blasphemy to be skeptical about scaling and posit limits and new approaches. Unless you are invested in the religion.


“I think maybe because it was called a law it has inspired continued progress”

That’s exactly what Caltech scientist Carver Mead has said about Moore’s Law, and he should know, because he coined the term.

Mead, who is basically behind the whole integrated-circuit miniaturization scheme, says that in the early days he had no luck convincing anyone that his ideas would work.

But once someone tried it and saw that it did work, it became a self-perpetuating process.

As Mead has pointed out, the key element is “belief that it will work”.

But belief is obviously not the same as physical law and when the two are in competition, physical law wins every time.


Intelligence is not a collection of skills. It is the capability of *acquiring* skills.

We'll get useful 'skill machines' from this, little doubt.

PS. The Turing test is based on the assumption that humans are hard to fool. The reverse is true.


Actual intelligence requires sentience, which requires consciousness. IMHO.


“The Turing test is based on the assumption that humans are hard to fool. The reverse is true.”

LLMs certainly put the Turing test to bed.

In fact, they make me wonder whether humans are even intelligent.


The big paradigm shift we need to make is to come to grips with the fact that we're the most intelligent species on the planet, but not necessarily very intelligent in an absolute sense. We're mostly 'dumb' automation too. See https://youtu.be/3riSN5TCuoE and https://youtu.be/9_Rk-DZCVKE for the relation between the digital revolution and that paradigm shift.


I don't know. Dennett always said that the tester has to be very clever. That said, I am now convinced the ELIZA Effect is even more powerful than I ever dreamed; so if you are referring to the popular version of the test (i.e., fooling the unprepared), I would agree.


AI is not following that path (yet). Nor is our kind of intelligence a necessary outcome of evolution.


Assuming that's a direct quote from Altman, he's channeling Donald Trump. Are all the techbro ring kissers going to start spouting barely comprehensible gibberish?


I included the link to the full video, which I condensed for readability. Judge for yourself :)


OK, I listened to it. What the fuck is an air winter? Or, for that matter, a PGism? That Y Combinator guy's breathless “yeah” at the end, after that comment about a new square in the periodic table; it was like SA had just shown him his new Power Ranger.


After OpenAI trains ChatGPT on Sam Altman’s statements, AI chemists will undoubtedly refer to “the Periodic Table of Squares”

Prompter: “How many chemical elements are there?”

ChatGPT: “I’m not familiar with that term, elements. Could you be more specific?”

Prompter: “You know, in the Periodic Table”

ChatGPT: “Ah, you mean ‘squares.’ The International Union of Pure and Applied Chemistry recognizes 118 squares.”


Make that the “International Union of Pure and Applied AI Chemy (IUPAAIC)”


When OpenAI runs out of high-quality copyrighted… I mean “publicly available”… data to train their chatbot on, they can use their secret weapon: Sam Altman’s large body of verbiage.

If that doesn’t get them to AGI, nothing will.


The more LLM-generated material is out there, the more LLMs will train themselves on themselves. Even my cat, after throwing up and eating her own throw-up, doesn't eat it again if she throws it up again. Cats are smarter than LLMs.


Yes, Mad AI disease is a real thing


Or in the case of Meta’s AI, I suppose it’s called Mad Llama disease.


Anyone who has ever been around llamas knows that there is nothing more dangerous than a mad Llama.

They spit


When faith in a pyramid scheme is lost, sources of new sucker money dry up and the scam collapses.


I think confronting a "wall" with LLMs opens up a new opportunity. The concept we are all assuming we know, and that makes the term "wall" useful, is that it is more "tangible" than an open-ended search. So, figuring out what such a "wall" is should be easier than trying to explain the void of continued growth.


Since my post generated some interest, let me add 2 concrete examples of my point.

The AI community still doesn’t have an “algorithmic” model for human reasoning. If I had to summarize LLMs, it would be, “look for key words in a database, and choose the most common one to form a grammatically acceptable reply.” BUT, how are we judging the “correctness” of LLM results? Ask GPT to explain why “Democracy” isn’t doing too well. I did. I got the typical reply “Well, there are multiple ‘views’ about this. Here they are.” That, of course, gets a low score. To score high, we really want the “TRUE” answer. To the point of the article, once a sufficient database is found to produce a good summary of “views”, “scaling” will no longer provide a proportional gain.

The second example is how we are judging “consciousness” in the first place. If we consider humans with low IQ scores, don’t we “automatically” endow them with “consciousness”, just because they’re “human”? If we then find a human with an IQ of 140, is there any question about them being “conscious”? Clearly no. BUT, if we ask them to provide a short synopsis for each entry in the Encyclopedia Britannica, for sure, they will get a low score. Yet GPT would ACE the reply, and do it in a flash. AND, scaling up the database, once it has the whole Britannica in it, isn’t going to improve the score. But GPT still doesn’t get a “consciousness” badge.

There is enough “challenge” in just these 2 examples to put a lot of “structure” in answering the “wall” question!


Would it? I have seen a lot of convincing summaries of Teams meetings that are subtly wrong.


Keith. I agree.

To use your insight here, this is where I claim we have to put the effort in to examine situations in detail - i.e. define the "wall".

I do a lot of "transcription" editing. It typically takes me 4 hours to edit a 1-hour meeting transcript. But, again, a "human", with any I.Q., who is not familiar with the topic matter of the discussion can easily produce "wrong" summaries.

All AI applications that create such "transcriptions", as far as I know, are directed by an "AI prompt", which is created by a "conscious human". Based on my work with "consciousness", we can get a hint about the "wall" by posing the following question: "How would we formulate an 'AI prompt' that accurately generates 'AI prompts'?"


I agree with the need to examine; IMO the problem is not the errors, but our inability to understand what the errors will be. This also extends to humans (or classes of humans?). This comparison is where the action is, so to speak. The recursive prompting idea seems gravely worrisome, as it will further enthrone bad ideas.


Keith. Again, I agree with you. But, that’s my point. Our goal should not yet be the outcome. The goal should be understanding the “logic” that creates the “inability to understand what the errors will be.” Ironically, I think we’ll find it is not much different from the reasons so many differences in views in science and social understanding continue to evade us.

As for your “gravely worrisome” concern about “recursive prompting”, again, I’m not pushing to understand how to actualize it. I’m raising the “recursive” model as a tool to understand it. The reason I’m suggesting it is, in my new model for human thinking, it was the understanding of “recursion” in the human brain that answered so many questions. [ https://www.academia.edu/112492199/A3_A_New_Theory_of_Consciousness ]


I am not surprised it isn't as simple as the wishful hypotheses said. It is easy to make big improvements initially, but it's not surprising that improvements become more difficult to make rather than accelerating over time. I am afraid the model has been oversimplified to sell to investors. I think AI will be useful, especially for somewhat boring, repetitive tasks like reading X-rays, etc. I think the more nuanced and complex the task, the more difficulty the next generation of AI will have making major improvements. And eventually it will save time and money on certain kinds of tasks. Instead of being impressed with what it can do, I think overselling the public will bring a wave of disappointment, before we get the next level of investment and improvements in AI.


I thought that the core hypothesis driving generative AI was that “hype will make us (the Lords of AI) all billionaires”

It seems to have worked spectacularly so far and shows no sign of being wrong.


How anyone could believe that a glorified auto-complete would lead to AGI is perhaps the most remarkable aspect of the LLM hype.


That depends on what one means by AGI.

Maybe AGI actually stands for “Autocomplete Glorification Incarnate”


It was the best of times, of exponentially overhyped scaling laws, and it was the worst of times, of exponentially diminishing returns...


Gen AI does well because it lends itself to recursive hype generation.


So, pending stack overflow, then? :)


This is a great discussion. I do, however, believe we're missing the bigger picture from John von Neumann's "The Computer and the Brain" (published posthumously in 1958). The main relevant items re the LLM wall, which I believe is a methodological training wall, not a real limit, are the following considerations:

a) the most important point: lots of relatively low-precision, slow (compared to electronic) NETWORKED processing nodes;

b) capacity for massive parallel processing. The brain is estimated to consist of (many?) tens of billions of neurons, each connected to thousands of others, facilitating simultaneous processing across vast neural networks. The parallelism enables rapid integration of information from diverse sources;

c) biological systems are inherently robust; the loss or malfunction of individual neurons doesn't typically impair overall function. This resilience arises from overlapping functionalities and the ability of neural networks to reorganize and adapt;

d) the stochastic, or probabilistic, nature of neural processes. Neuronal firing isn't solely deterministic; it's influenced by a range of variables, including synaptic weights and neurotransmitter levels, which can introduce randomness into neural computations. The stochasticity allows the brain to be highly adaptable, capable of learning from experience, generalizing from incomplete data, and exhibiting creativity.

TO ME, this suggests the work needs to shift to exploring different network structures for 'artificial neural networks': different architectures, different topologies. Use what's known about the brain's 'mechanical' architecture more wisely; really understand the brain's synaptic organization and its recurrent loops to conceptualize new architectures and paradigms for using information. Clearly, real experience shows that people like Einstein didn't need to know everything about everything, or to have infinite data. So get smart, and get busy. Of course, the lazy way is to look for more training data and more training epochs, but that's just me arguing that the low-hanging fruit have been picked, and now one has to get smarter. Cheers. Stay positive, great things are coming (in the AI world).
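
As a tiny illustration of the stochastic-firing point in (d), here is a toy probabilistic neuron with a sigmoid firing probability over weighted inputs; a textbook-style caricature, not a proposal for the new architectures the comment calls for.

```python
# Toy stochastic neuron: firing is probabilistic, not a deterministic
# threshold, so identical inputs can yield different spike trains.
import math
import random

def stochastic_neuron(inputs, weights, bias=0.0, temperature=1.0):
    drive = sum(w * x for w, x in zip(weights, inputs)) + bias
    p_fire = 1.0 / (1.0 + math.exp(-drive / temperature))
    return 1 if random.random() < p_fire else 0

inputs, weights = [1.0, 0.5, 0.0], [0.8, -0.3, 1.2]
spikes = [stochastic_neuron(inputs, weights) for _ in range(10)]
print(spikes)  # same input, different outputs: noise as a feature, not a bug
```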


“There is no wall,” says Sam Altman, from his position inches away from the Great Wall of China.
