242 Comments

The scaling hypothesis is wrong. It depends on magic. https://www.linkedin.com/pulse/state-thought-genai-herbert-roitblat-kxvmc

The measure of scaling is wrong. Data scaled in step with compute, so it is probably the amount of data, not the compute, that drove the models' improvements. https://arxiv.org/abs/2404.04125

The predicted shape of the scaling function is wrong. If it requires exponentially more data for linear improvements, then it must slow down over time.
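To put a number on that (a back-of-the-envelope sketch, assuming an illustrative power-law loss curve rather than any specific published fit): if loss falls as a power of dataset size, every constant-sized improvement costs multiplicatively more data.

```python
# Back-of-the-envelope sketch: assume loss follows a power law in data,
# L(D) = a * D**(-alpha). The exponent alpha = 0.05 is illustrative,
# in the general range of published LLM scaling-law fits.
alpha = 0.05
to_halve_loss = 2 ** (1 / alpha)  # data multiplier needed to halve the loss
print(f"~{to_halve_loss:,.0f}x more data to halve the loss")  # ~1,048,576x
```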

The measure of intelligence is wrong. Intelligence cannot be measured by existing benchmarks when the model has the opportunity and the means to memorize the answers (or very similar answers).

The models are wrong. LLMs model language, not cognition.

So, what's next? That is what my book is about. Here is an excerpt: https://thereader.mitpress.mit.edu/ai-insight-problems-quirks-human-intelligence/ In the book I lay out a roadmap for the future of artificial intelligence. As Yogi Berra said: "If you don't know where you're going, you might end up someplace else."

"If it requires exponentially more data for linear improvements, then it must slow down over time." This. Yes.

It seems it should act like a classic resource and network problem. So it has to exhibit asymptotic behavior.

Dude. You're spamming. Chill. Once is good enough.

That said, it is an interesting article. I gave it a read. We shall see how well TTT works in continuous operations mode.

I can't really comment because I am not working with LLMs at the moment. I have other time sinks and priorities right now.

Yeah, got it. At least someone read it, and that was the point (thank you).

You posted that link 77 times!

What in the hell is WRONG with you???

Get some psychiatric help. Seriously.

You did not count right, but one way to raise awareness is to shout, simply put, as people have done throughout history; not to understand that means there is something wrong with you, but it is a valid question. I wanted people to be aware, and it worked. Sometimes the means justify the process, when the process is a pattern that can be observed and that works; in that case, use it.

Thanks! I'm reporting all these comments for obnoxious spamming.

The only way AI can be properly assessed is if it is presented with problems that it has never seen before.

FrontierMath is the right idea.

https://epoch.ai/frontiermath

So far, LLMs do very poorly on FrontierMath problems, despite doing very well on the other benchmarks.

Of course, claiming your LLM got 2% of the problems correct is not a particularly strong selling point, so don’t expect the LLM companies to adopt the Frontier benchmark any time soon.

[From the book excerpt] "We do sometimes behave like computers, but more often, we are sloppy and inconsistent."

Well, now we have computers that are sloppy and inconsistent! This must be progress!

I have been using an LLM to help me write a web app. The amount of information it has at its command is mind-blowing, and its ability to accept requests in English makes it astoundingly easy to access that information. Yet it does make mistakes that could fairly be called sloppy. I'm not saying anything new here, but it has been interesting to see how the problems manifest in practice.

Claiming LLMs can think because you can make one say that it's thinking is like saying a toasting fork has the same electronic machinery as an electric toaster because they both produce toast.

Read this: https://arxiv.org/abs/2411.07279. This is next; pay attention to TTT by MIT.

The surprise is not that CEOs hype their products. It's that ignorance of how LLMs (and artificial neural networks generally) actually work allows the hype to be believed. If they were making cars, say, and claimed that future models were going to go 1000 mph within 5 years, they would immediately be asked what technology they would use, and ridiculed if they didn't have a good answer.

The fourth estate, an open and free press, is broken; that's why they get away with it. Social media and big tech empires have absolutely destroyed journalism in a way that Hearst and Pulitzer could only dream of.

The Fourth Estate was already in decline due to the FCC ending enforcement of its rules on use of the airwaves, and due to consolidation of news outlets once regulatory enforcement lapsed.

Rupert Murdoch was the biggest consolidator, and by 1990 had put his imprint on news around the world. To survive, other outlets consolidated, and that consolidation is nearly complete today. This drove the death of the Fourth Estate more than anything else.

I agree, but there's a lot of nuance to it. I worked in the broadband/cable sector in the '90s, and the FCC was only in charge of broadcast media, so Murdoch, as a cable news outlet owner, was able to ignore those rules.

By the '00s most governments were flailing around trying to play catch-up on that when the world wide web exploded and upended everything.

In a way we are living in the middle of a technological singularity. Few people predicted 25 years ago there would be anything like smart phones, video conferencing, affordable high speed internet in your pocket, anything like Amazon or Google, large complex enterprise software as subscription services hosted and run by the vendor rather than onsite, or any of the rest. The world is massively changed, and I think that's a lot of what drives the deep down fear and anger we're seeing across the world.

I'm an IT pro with quite a diverse and weird set of skills and work experiences, and it still gets on top of me. Not that I'm nostalgic for the 70's and 80's. Getting old sucks, but every birthday brings me a year further away from that time, and that makes me happy.

What I'm saying is that I think the notion of the regulatory agencies and government branches reining in the media in the '90s and '00s might have been doomed from the start. We're all playing catch-up, and it's been really hard to stay on top of things.

It's actually fortunate for us that these industries have turned into huge groaning inefficient monopolistic monsters pushing bunk tech like self-driving and so-called AI, and social media empires like Meta and Twitter shitting the bed, because it's pushing average people to get angry and yell at their elected representatives to fix it and slowing the pace of things a bit.

Dude, stop spamming the same paper on every comment.

I think he made a mistake when writing his script to spam Substack with his opinion. :)

Wish this was true! The claims about autonomous vehicles have gone largely unchecked, as have overly ambitious corporate commitments to reduce climate change.

Unsupported claims about autonomous vehicles have gone largely unchecked…except by the concrete barriers, tractor trailers, parked fire trucks and other stuff they have hit, that is.

And the rare time they faced business consequences, dragging a human being and then lying about it….

Trees. Don't forget the trees.

We have a listening problem. Knowledge without understanding. The incentives are all wrong.

I agree, but I have one nit to pick: you misspelled “incentive$”

Money does talk 🤑

Unfortunately, when it comes to AI, money moneypolizes the conver$ation

Read this: https://arxiv.org/abs/2411.07279. This is going to happen. I will put it out to everyone.

I was speaking more broadly about society, the context in which hype gets so out of hand. Insisting that an unproven hypothesis (that scaling will continue to yield exponential improvements) is true is the opposite of science.

That’s why it’s spelled “hype-othesis”, to differentiate it from a scientific hypothesis

A hype-othesis doesn’t have to be true to generate investment.

All it has to be is sufficiently hyperbolic and “sciencey” sounding.

And “scaling” sounds very sciencey.

And combined with “exponential” you will have investors eating out of your hand

It’s okay that they lie because they know they are lying? 🤥

How much of this is a matter of human beings just being human beings, and how much is the professional media relations people feeding the financial hype machine? Where does people's wishful thinking end and actual fraud begin?

Thomas Kuhn wrote an extended elaboration on Schopenhauer's thesis, with copious examples from the history of science. Highly recommended.

Andy Grove and I pointed out the asymptotic inflection of Moore's Law a decade ago. I was an "enforcer" of Moore's Law from '85, when sub-ppb Carbon Analysis was made available by Anatel, and although everyone would like to compare their work to Moore's Law, it was a singularity. Nothing else will EVER decrease in size, mass, energy, and cost by 10 orders of magnitude.

People want to forget that the drivers behind ALL "high tech" were the inventions of transistors and integrated circuits. FETs made flash and low voltage high speed circuits practical, and Silicon Carbide made EVs, next gen robots and rockets possible, as well as many of the grid upgrades for (somewhat greener) energy. All fields of STEM benefitted from quantum effect transduction and electronic data processing.

It is interesting that the current processor fad for AI is an architecture optimized for graphics processing. Makes sense for imaging systems, but not for language (possibly reasoning?). What is really needed, and what few recognize in organic intelligence, is a computing system based on "morphic resonance" (cf. Rupert Sheldrake). This follows heuristic principles.

Meanwhile the salesmen, marketers, and hypemasters become the richest and most famous men, while the hands-on legions who effected all these changes fade into anonymity. One definition of "entrepreneur" is someone who jumps on the bandwagon when the product development is 95% done.

Yes. And a VC needs to attend to their ability to cash out; this is their first principle. That is driven by publicity. The ultimate in hype is Uber, which has only ever lost money.

It's nice and helpful when an investment is solid. But, as with Uber, pure hype flies. Solid investments tend to walk.

I'm really interested in what you said about a computing system based on morphic resonance. Isn't it a phenomenon observed in living organisms? What are your thoughts on what would make up such a computing system?

Morphic resonance has not been observed anywhere; it is pseudoscience. But I'd also like to read a clarification of what Lawrence means by using it to design artificial systems.

The whole thing reminds me of Hollywood investing in yet another rehash or sequel in a franchise instead of trying out a fresh idea. Investor risk aversion is why we'll eventually get Shrek 19 or whatever. LLMs had such a string of blockbusters, the ROI is gonna have to bottom out pretty hard before people will pivot, and as we've seen in the movie industry, even then they might not.

Ha! Who follows you is how I judge who knows something about AI. The people who are recognized as "experts" constantly display how little they know about the "I" part. And yes, there's a lot of wishful thinking about a "next tech", especially the wealth aspect, and a kind of Dear Santa: why doesn't Moore's law apply to everything? But the lemming mentality of tech and tech financing has jumped the shark! It's dogma founded on wishes, from the self-proclaimed rational, science-based tribe, who are also hypocrites: they threw shade at people for learning via dead trees, and then turn a blind eye to the immoral energy consumption of their wannabe tech.

If you want to start understanding venture capital, Mulcahy is required reading. Her candid writing is unusual in the field, and can be quite funny in an understated, dry way.

https://www.kauffman.org/reports/we-have-met-the-enemy-and-he-is-us/

This is likely where many models will max out: better than average, but not better than good. That might be useful for certain applications, but most of us aren't looking for average, certainly not when hiring or when the answers are consequential...

You need to realllly zoom out and see the cross-field effects of all the simultaneous innovations, which only an AI could track using massive data sets.

This allows the AI (not humans) to make the connections that humans cannot (information time-paradox overload).

This will lead to the hyper-expansion of AI, which will lead to massive gains (it is all cumulative, like when people said AI would never draw, make videos, be robots, et cetera), but normal humans cannot see this; again, my point.

It does go downhill from here, Gary. The worst is when, 25 years from now, they are still denying and still committing the same errors because professors don't want to update their teaching.

An example of this in another field is the TLR4 problem in human immune systems. Unlike every other animal with TLR4 (insects branched off long before), humans have no functional siglecs to damp the signal. (Most animals have two siglecs for TLR4.) This is why humans are uniquely susceptible to septic conditions. It's a feedback scream in the immune system.

One technology this critically affects is gene therapy, because the dose of carrier vectors easily goes over the threshold of TLR4. And because this is not taught, protocols don't list it, so there are no diagnostic criteria and physicians don't expect it. Because they don't expect it, and there is no comprehensive response protocol, and it happens so fast, the person (often a kid) dies "mysteriously" from a "cytokine storm".

This was figured out a couple of years after the first death in gene therapy, that of Jesse Gelsinger.

It still happens. Mysteriously.

The entire Computer Industry has a deep and abiding ignorance of Biology. Theranos is the poster child but AlphaFold's absurd claim to have "solved protein chemistry" is Right Up There.

I thought the physicists had already claimed to have “solved chemistry.”

And that the chemists had already claimed to have solved biology.

It would seem that all that remains is for the chatbots to solve physics.

Then, everything will be solved.

Read this: https://arxiv.org/abs/2411.07279. I can do this too.

The place where computing really gets it wrong is these bozos who talk about mind uploading, as if you could microtome a laptop (which has larger-scale features than a brain), use the computer images of the slices to run all the software on it, and expect the microtome-slice model to contain working copies of all the software. It's ludicrous codswallop.

This is still my favorite rant on brains and electronics. To this I will add that our nervous system is not electronic, and the signals we pick up are side effects of electrochemical waves. (But given evolution's penchant for harnessing anything that works, electrical signals may interact in some weird way for thought.)

https://mathbabe.org/2015/10/20/guest-post-dirty-rant-about-the-human-brain-project/

The brain upload people's knowledge of Neuroscience: Lo & forsooth, the electrick fluid floweth full mightly along the neurone fibre.

AlphaFold is a later iteration of the heuristic method and works very well for what it does. But the reason it has to work that way is precisely that the chemistry and physics are not computable now. It's too hard a problem. So, for now, we do it this way.

I did some work with protein-folding software (using BOINC) at Vanderbilt University. Even with crowdsourcing, I knew that this was a *grand* challenge misunderstood by most.

AlphaFold is based on the obsolete idea of one gene, one protein, one function. We now know that proteins are dynamic conformational ensembles containing multiple proteoforms, with a broad spectrum of structural features and a diverse range of functions. The D4 dopamine G-protein-coupled receptor is a good example: it is highly disordered (see any number of papers by Uversky on intrinsically disordered proteins) and has a high affinity for norepinephrine and epinephrine.

Further, AlphaFold is not reliable:

"Our findings suggest that the N-terminus of SN1 may act as a nucleation site for SNARE acceptor complex assembly. As a side note, our result of an intrinsic disorder of SNAP25 contrasts with a recent prediction of AlphaFold2 that predicted SN1 and SN2 to be entirely α-helical with high confidence."

Stief T, Gremer L, Pribicevic S, Espinueva DF, Vormann K, Biehl R, Jahn R, Pérez-Lara Á, Lakomek NA. Intrinsic Disorder of the Neuronal SNARE Protein SNAP25a in its Pre-fusion Conformation. J Mol Biol. 2023 May 15;435(10):168069. doi: 10.1016/j.jmb.2023.168069. Epub 2023 Mar 30. PMID: 37003471.

I find it fascinating the last sentence was eliminated in the published paper.

I haven't used AlphaFold except for a quick browse. It's hard to really evaluate unless you are doing work that needs it, and I'm not right now. I've used the two heuristic predecessors, and I always supplied them with an amino acid or nucleotide sequence of my choice. The nucleotide sequences were chosen from DNA sometimes, but mostly from mRNA sequences from an exome.

AlphaFold is a database of results from stepping through amino acid sequence databases like GenBank and UniProt, and nucleotide sequences. So I don't think your opening sentence is meaningful, or correct.

I have no doubt that all of the protein-folding software, including AlphaFold, is wrong quite a bit. I always used such tools knowing that was pretty likely true. But it's also true that X-ray crystallography conformations are a bit off too, because the crystals are not necessarily how the protein is conformed when operating.

When I worked for a time in a cryo-EM lab on the conformation of the HIV GP120 trimer, it was a probabilistic region, but those proteins were always in motion. Fourier analysis works, but with the software it is all essentially magic, because those calculations are so far beyond what a human brain can possibly do.

https://www.quora.com/Could-Theranoss-technology-work-if-the-technology-was-more-advanced?top_ans=165777074

My PhD thesis was on failure modes of flow cytometry diagnostics. I tried to contact Theranos, but just like the flow cytometry diagnostics industry, they ignored me or attacked me. (Theranos ignored me. Another company ignored me, fired their CTO, then attacked me in confidential white papers.) Just like Gary describes in this article. Theranos could have been saved and could have transformed diagnostics, just not with Edison. Tons of stuff works well enough to make a demo. Getting it to gold-standard level? That's a whole other story.

I like Elon a lot for his brutal honesty in tech development combined with creativity. (I fear the living $%it out of him applying this to government while he shouts his ignorance of basic things like how the monetary system actually works, and allies with wily pols like Rand Paul, who ignores a couple of basic facts. Fauci (a very nice man I have corresponded with, who was out of his depth) was appointed by Trump. And the only reason Fauci had to be appointed is that Trump's team had cut the Pandemic Response unit at the CDC! Just like Elon and Vivek want to slash and burn. He can go a long way in the wrong direction, and societies are not rockets with clear metrics. And there are huge phase lags in society and economies.)

Anyway, Elon is excellent at tech, and doesn't fall too much in love with his ideas. He throws billions away if it's clear it's not working.

Maybe this is why cognitive AI is resisted so much.

Just recently: https://x.com/sama/status/1856941766915641580

It should come as no surprise that the CEO of a company that makes LLMs wants people to believe that LLMs will continue to improve at a high rate forever.

When I was working at AT&T Labs in the early 2000s on the How May I Help You system (I maintained the machine learning classification models), I had a lot of very intelligent co-workers who were experts in their fields. One machine learning expert mentioned to me a rule of thumb that he had observed: once one type of model, say a regression function, reached optimality on the data, other models, such as support vector machines, did just about as well. In essence, they had captured as much knowledge as it is possible to capture out of the data using a similar set of techniques: in that case, models whose metric for success is based on statistical parameters.
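A toy illustration of that rule of thumb (a hypothetical scikit-learn sketch, not the AT&T models): once each model family has extracted the available signal, the ceiling is set by the noise in the data, not by the choice of model.

```python
# Sketch: different model families land at roughly the same accuracy
# once each has captured the signal available in the same data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic data with deliberate label noise (flip_y) capping achievable accuracy.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)

for model in (LogisticRegression(max_iter=1000), SVC(), RandomForestClassifier()):
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{type(model).__name__}: {score:.3f}")
# All three typically score within a few points of one another:
# the wall is in the data, not in the model family.
```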

This is the conceptual wall. The wall is made of our a priori suppositions, such as "what is our measure of closeness to truth" and "what is the space of models we are choosing from". Neither of these suppositions is cast in stone. But if they are considered that way, those stones form the wall.

In the case of neural nets, the measure of truth is a measure of accuracy. But it could just as easily be a measure of explainability or insight, if you could define those terms. And the space of models is, of course, the space of neural nets. But it could just as easily be the space of expert system rules, along with a computational component.

The only way to go beyond the wall is to tear it down. Note that, in choosing new metrics and model spaces, we are building new walls. Ultimately, the only solution to that is meta-learning.

Your history sounds familiar. I started coding and using evolutionary computing (algorithms based on Nature) in 2001 and retired in 2018. My master's was in Cybernetics, so that might clue you in on my view. The way we are going, we might as well use the perceptrons that were pooh-poohed (hyped) and then dismissed back in the day.

Just recently, I had a great success by combining BERT vectors with a human expert. The task was a recognizer to identify the 5% of service calls coming into a sales channel. My training set was 200 labeled service and 4000 labeled sales call transcripts. Random Forest was 95% accurate: every call was a sales call. Not too helpful. But the human expert gave 100 service examples and described why they were service calls, and a similar number of sales calls. So I used BERT vectors for semantic similarity to the exemplars, and the description provided an explanation of why a call was service or sales. Again, 95% accuracy, but the error was proportional to the class, and it was a form of Explainable AI. Essentially I combined a large language model with an expert system.
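Something like the following sketch captures that setup (a hypothetical reconstruction using the sentence-transformers library; the model name, exemplars, and helper function are mine, not the original code):

```python
# Sketch: label a call transcript by semantic similarity to expert-chosen
# exemplars; the nearest exemplar doubles as the explanation for the label.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any BERT-style encoder

service = ["My router stopped working and I need help.",
           "I want to report an outage on my line."]
sales = ["I'd like to add a phone line to my plan.",
         "What bundles do you offer for new customers?"]

def classify(transcript):
    emb = encoder.encode(transcript, convert_to_tensor=True)
    svc = util.cos_sim(emb, encoder.encode(service, convert_to_tensor=True))
    sls = util.cos_sim(emb, encoder.encode(sales, convert_to_tensor=True))
    if svc.max() > sls.max():
        return "service", service[int(svc.argmax())]
    return "sales", sales[int(sls.argmax())]

label, nearest = classify("My internet keeps dropping every evening.")
print(label, "| closest exemplar:", nearest)
```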

Note that expert systems had two drawbacks: they were hand-crafted, and they were fragile in the sense that they could not cope well with cases that did not exactly fit the model. At Coopers and Lybrand in 1990 we used them for tax analysis and audit risk, where they actually worked well; most tax software nowadays is a glorified expert system. But using a neural net as a measure of similarity instead of an exact match fixes the second problem. Some creative approaches to machine learning would help minimize the hand-crafting.

"test-time training (TTT), in which models are updated through explicit gradient steps based on test-time inputs"

Well, having looked over the paper, it seems to be a throwback to a form of Identification in the Limit.

https://en.wikipedia.org/wiki/Language_identification_in_the_limit

Another example of this technique is:

U.S. patent number 7,620,550 [Application Number 11/866,685] 2009-11-17: "Method for building a natural language understanding model for a spoken dialog system." AT&T Intellectual Property II, L.P.: Narendra K. Gupta, Mazin G. Rahim, Gokhan Tur, Antony Van der Mude.

https://patents.google.com/patent/US7620550B1

A first NLU model is generated and tested using the handcrafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the handcrafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models are built by adding a previous batch of labeled data to training data and using a new batch of labeling data as test data to generate the series of NLU models with training data that increases constantly.
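Returning to TTT itself, a minimal sketch of the core mechanism, a few explicit gradient steps on an objective built from the test input before predicting (an illustrative PyTorch skeleton; the loss function and hyperparameters are placeholders, not the setup in the MIT paper):

```python
# Sketch of test-time training (TTT): clone the model, take a few explicit
# gradient steps on a self-supervised loss computed from the test input
# itself, then predict with the adapted copy.
import copy
import torch

def predict_with_ttt(model, self_supervised_loss, x, steps=5, lr=1e-4):
    adapted = copy.deepcopy(model)  # leave the deployed model untouched
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    adapted.train()
    for _ in range(steps):  # explicit gradient steps at test time
        opt.zero_grad()
        self_supervised_loss(adapted, x).backward()
        opt.step()
    adapted.eval()
    with torch.no_grad():
        return adapted(x)
```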

AT&T was the home of my "mentors" for UNIX, C/C++ and other marvels of invention.

Nadella's quote makes it clear that in order to remain competitive, he took to behaving like an LLM and lifted your words without attribution. Great Timeline!

How can we be sure he ISN’T actually a NadeLLM?

Holy crap, the answer was right in front of us this whole time!!

“NadeLLM” and “AI-t man” can’t be mere coincidence.

AGI is already here

One datapoint isn't enough to draw a strong conclusion. Two? That defines a trend. And three might be defining a law, particularly when umpteen cognitive biases are operating. And it's not just technology analysts who make these mistakes! (Former head of research at Gartner, Inc.)

Well, Gartner does have a wee bit of incentive from time to time to extrapolate based on pleasant cognitive biases and a little data. This could even be called Gartner's business model.

Whoops, Nadella is CEO, not CTO, no?

yikes, not sure how I missed that. thanks! (and now fixed in the online version)

In my own experience, the curse of dimensionality is crushing. Maybe we need AI in the limit, a limit that is never reached: asymptotic, as others have mentioned. It's important to note when something is "working" but wrong and goes unnoticed. My work was on patterns of association between DNA/RNA and complex human disease.

https://www.wikiwand.com/en/articles/Curse_of_dimensionality
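For anyone who hasn't felt it first-hand, a quick numerical sketch of the effect (illustrative only, not from my DNA/RNA work): in high dimensions everything becomes nearly equidistant, which is what makes distance-based pattern association so hard.

```python
# Sketch of the curse of dimensionality: as dimension grows, the gap
# between a point's nearest and farthest neighbors shrinks away, so
# distance-based pattern-finding loses its footing.
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    pts = rng.random((1000, d))
    dist = np.linalg.norm(pts - pts[0], axis=1)[1:]  # distances from point 0
    contrast = (dist.max() - dist.min()) / dist.min()
    print(f"d={d:5d}  relative contrast (max-min)/min = {contrast:.2f}")
```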

Your analysis is accurate. I agree with your overall sentiment.

As for the future, this is my proposal. Not spamming; this is a genuine conversation starter: https://ai-cosmos.hashnode.dev/beyond-correlation-giving-llms-a-symbolic-crutch-with-graph-based-logic

I’m familiar with this idea. AI research is at a low point. TTT stems from misunderstandings fueled by hype and poor critical analysis. I have little patience for the growing nonsense, especially the notion that you can improve performance through prompting instead of training. Be careful which papers you read—the quality has dropped alarmingly in the past two years.

https://ai-cosmos.hashnode.dev/its-time-for-ai-research-to-embrace-evidence-over-speculation

This is an example of scientific thinking being co-opted by greed and capitalism. And wishful thinking…
