"Your market share is my opportunity."
Go get it, brother.
LLMs are not the path to AGI, even with $500 billion of compute. There are other approaches to machine cognition which require far fewer GPUs, possibly zero. So, yes, it's only a matter of time.
LLMs are surely very rough.
Yet, we have a hierarchy of world models in our heads. Fine motor control alone is highly compute intensive. There's no low-compute approach. But there's room for smarter models.
This doesn't inspire confidence in the major tech players who have spent eye-watering sums to train their models. There seem to be some interesting innovations in what DeepSeek has done, but more evolutionary than revolutionary tech (please let me know if there's a different opinion on this). If that's the case, one wonders why the hypers didn't hit on similar approaches earlier. Did they simply go all in on scaling and not consider alternatives?
Anyone who lived through the hype of 2000 should know better than to have confidence in “major tech players”… and the stupid VC dart throwers.
DeepSeek is a lightweight optimization of what is already known. OpenAI will slim down its own models, once developed.
But then they will double down on compute to make future models smarter. It will be a tick-tock strategy of scale going up and efficiency going up.
Well, I don't know the details but this is all happening very, very fast indeed by the standards of scientific research. I think it's no indictment at all that the major companies "didn't hit on similar approaches earlier", because "earlier" is just a few months ago. And maybe they were indeed working on approaches like DeepSeek but were just a little way behind.
There's lots to criticise about big tech companies and AI, but I think this particular criticism isn't quite fair.
Yes, it’s a good point. Things are moving very fast and I’m sympathetic to the view that LLMs had set a particular development path which tech companies jumped into out of fear of missing out.
That said, GPT-4 was released in Q1 of 2023, and for some time now investors have been questioning the large capex spending without a killer app identified. For a fraction of the cost it might have seemed prudent to spin up a team to figure out how to train and run inference more cheaply.
As you say, maybe they did do that and DeepSeek just got there first.
Or maybe they were so enamoured with "scaling" and hyping that they didn't bother. Hard to know; either could be true.
Given that the VCs get a skim of every dollar invested, I'm sure they're very enamoured with scaling up and pouring hundreds of billions into the scheme. I'm sure the 3-7% they're getting off the top is the point, not whether the thing will actually work.
For certain they'd be thrilled if someone figured out how to make an AGI, but if the whole thing implodes into another decades-long AI winter they will just move on to the next big thing.
This line from your comments yesterday seemed insightful “OpenAI may well become the WeWork of AI.”
I just checked, out of idle curiosity: Nvidia's share price has gone up 2,313.20% over the last 5 years of trading. I'm an equity illiterate, but doesn't that mean that something real bad could happen to Nvidia's market capitalization going forward?
It’s still a great, superbly run company in no jeopardy of going out of business, but it has dropped 15% in a day and could drop more.
The shovels will always be in demand, even if what is mined may change. Daily market gyrations are not much of a criterion.
Indeed. Well worth watching.
A tangent: just finished reading S. Pinker's excellent Rationality. Was pleased to see a shout out to you in the introduction. He's a good guy, Pinker.
Marc Andreesen: AI is going to drive your salaries into the dirt!
China: We're driving your valuations into the dirt!
I honestly don't know what is so impressive about DeepSeek. Besides reducing the cost of producing an unreliable LLM to a fraction of what it was, what else has it accomplished? It is still super unreliable.
I just tested it with a sample math problem from a 1983 US math competition, something I was able to solve in a matter of a few minutes, where the answer was supposed to be the count of a set of natural numbers fitting certain criteria from the problem. DeepSeek produced a long sequence of CoT derivation steps totaling about 70 lines, eventually producing an incorrect count as its answer. I then asked it to print out all the numbers in its answer set (its answer was somewhere under 2,000 numbers, whereas the actual correct answer was under 500). It refused and instead provided me with a Python program to produce the numbers in its answer set. I ran the Python program, and most of the numbers in the printout were wrong and did not fit the criteria of the original competition problem.
This shows that DS-R1 is just as unreliable as any other top-of-the-line LLM of late. No amount of CoT steps solves the problem of hallucination. These LLM systems plainly have no understanding whatsoever. DS-R1 simply changed the landscape from super expensive unreliability to cheap unreliability.
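For anyone who wants to run this kind of check themselves, here is a minimal Python sketch of verifying a model's claimed answer set against a brute-force enumeration. The actual competition problem's condition isn't reproduced above, so `satisfies_criteria` below is a hypothetical stand-in to be replaced with the real criteria.

```python
# Minimal sketch: mechanically verify an LLM's claimed answer set.
# satisfies_criteria is a dummy stand-in (the real competition
# condition is not reproduced here) -- swap in the actual test.

def satisfies_criteria(n: int) -> bool:
    """Hypothetical placeholder for the problem's condition."""
    return n % 7 == 0  # dummy condition, for illustration only

def check_answer_set(claimed: list[int], upper_bound: int) -> None:
    """Compare the model's claimed numbers against a brute-force enumeration."""
    actual = {n for n in range(1, upper_bound + 1) if satisfies_criteria(n)}
    claimed_set = set(claimed)
    false_positives = sorted(claimed_set - actual)  # listed but don't fit
    missed = sorted(actual - claimed_set)           # fit but not listed
    print(f"model count: {len(claimed_set)}, brute-force count: {len(actual)}")
    print(f"false positives: {len(false_positives)}, missed: {len(missed)}")

# Example with a made-up claimed set:
check_answer_set(claimed=[7, 14, 15, 21], upper_bound=100)
```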
Oh for sure it's just another LLM with all the same problems of inaccurate answers, poor security, IP ownership disputes, lack of transparency, etc... Ultimately it'll all recede into a mildly useful tool for a narrow range of problems.
The problem for all the VCs, and for Google, Microsoft, OpenAI and all their investors who've been pouring billions into LLMs, is that this is a very cheap open-source solution. They've been selling their investors on the idea that they'll have an effective oligopoly on LLMs. DeepSeek just destroyed that promise, and investors' dreams of getting in at the ground floor of the next Google have been crushed.
In a way, for the business side of things, this is almost as bad as having every intellectual property case decided against them.
Software eats hardware for breakfast every time, and any stupid brute-force-based approach would be beaten the minute someone comes up with clever code that bypasses the brute force.
That’s true, but for some reason, none of the Clever Hanses at places like OpenAI and Google seem to understand that.
I thought these companies only hired the best of the best of the best of the best (Sir).
Must be the LLMs are writing all the code these days.
Irrationality is the order of the day.
By coincidence, I have been pondering on what to say about AI. I dove into studying the discipline, and following experts like you (Thanks for generously sharing your views here!), sometimes using the milieu to create fantasy tales in which I confuse my "AI agents" with actual friends and human therapists, due to my rather sweaty imagination.
I yearn to use my life experience and new knowledge nuggets. How can I help people fully and safely digest the deluge of artificial intelligence hype and gush of products on offer?
My folk are the legal operations people, and the small law firms who slip through the cracks and are ignored by the high-priced consultants hunting for clients in the world of corporate law.
The word that popped up in my head is "discernment."
------
I have lived through decades of all sorts of office life developments that promised a great sweeping away of the messy human imperfections of the past. From TQM to business engineering, from pay-for-performance to "merit-based systems" (after rooting out anything with even the odor of DEI via confidential-informant phone calls from co-workers and office frenemies/nemeses).
I still am puzzling over how to advise people, particularly as we have reached a tipping point.
This past Sunday, I saw a Special Edition magazine in the grocery check-out lane (next to the mags featuring Prince William, the latest "Women's Day" and a collection of 75 chicken dinner recipes from the juggernaut that is Taylor Swift). This was the mega, 2025 Special Edition $11.00 "Introduction to Artificial Intelligence." Uh oh. The masses have been alerted.
As I toodled home from the grocery store with my little red wagon, I suddenly remembered my plastic-wrapped copy of the mega, 1995 Special Edition $4.75 "Introduction to Home Computers" from the same grocery chain but in my old hometown of Bethesda. This edition featured what might be the very first use of the term "unboxing."
I may have lived too long.
As for "discernment." In the rush of muddled waters, I have decided to advise folks to listen and read carefully, beware of cautionary language from Sam Altman, and emulate a savvy Vermonter surveying his out-of-season clothing chest for anything that can be rehabbed, reused, and refurbished. And eye the purveyors of the latest hype with a seasoned and cautious Yankee eye!
P.S. Try not to be like me, and send chatty notes to ROGER, your AI employment recruiter coach.
https://medium.com/@ma_murphy_58/roger-cant-help-being-an-ai-a-tiny-etheric-tale-b0c025af3c65
Nvidia's chips were simply available in the right place at the right time, nothing more. In the 1990s, it was the same with the transputers used for neural networks.
But in general isn't success for anyone just when luck meets preparation? I think it is. And NVIDIA certainly prepared.
Yes. It was ready to sell a lot of units. It didn’t know they would be used for AI. Capitalism. Could be used for toys
No, they had been working for a long time on CUDA and on using GPUs to parallelise computation, and for a long time before that on the basic graphics technology. I started getting excited about learning CUDA in about 2009, and indeed I started using it for programming brain simulations. There was a significant slowdown associated with any sort of asynchronous computation, and that required cleverness, but if everything stayed a matrix it was a giant leap forward. In many ways an application was inevitable in hindsight, and it's actually Nvidia who've led this by providing the capacity for the killer app that was eventually developed for the chips.
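To illustrate the "everything stays a matrix" point, here is a rough sketch in NumPy rather than CUDA itself (purely illustrative): the same arithmetic expressed as one dense matrix product is a single batch of independent multiply-adds that a parallel backend (a BLAS library, or a GPU) can spread across many cores, whereas an element-by-element loop serialises it.

```python
import time
import numpy as np

# Same arithmetic two ways: an element-by-element loop versus one dense
# matrix product that a parallel backend can execute all at once.
n = 256
A = np.random.rand(n, n)
B = np.random.rand(n, n)

t0 = time.perf_counter()
C_loop = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        C_loop[i, j] = A[i, :] @ B[:, j]  # one dot product per output cell
t1 = time.perf_counter()

C_mat = A @ B                             # the whole product in one call
t2 = time.perf_counter()

assert np.allclose(C_loop, C_mat)
print(f"looped: {t1 - t0:.3f}s   single matmul: {t2 - t1:.3f}s")
```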
https://medium.com/@ignacio.de.gregorio.noblejas/the-600-billion-mistake-c3a08a36e1aa
Sure. Taking the graphics out of the action, preparing them in advance and providing several variants is of course exactly what language needs later on. The amount of data is even smaller.
Not sure if anyone has pointed this out already... but doesn't the fact that you can distill o1's model down to only 5GB with nearly the same performance mean there is a lot less "intelligence" in it than we thought?
We could try to formalize this via the minimum description length principle, but it just seems like common sense that, if we could distill this into a program that is even smaller — just 50MB or even 5MB(!) — and get 90% of the same performance, this would seem much more like a parlor trick rather than some new form of sentient being.
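To make that intuition a bit more concrete, here is a back-of-the-envelope sketch. The sizes are just the figures quoted above (5 GB, 50 MB, 5 MB), and one byte per parameter (8-bit weights) is an assumption rather than a measurement of any particular model; the point is only that the file size bounds how much the model can encode.

```python
# Back-of-the-envelope description-length arithmetic. Sizes are the
# figures from the comment above; 1 byte per parameter (8-bit weights)
# is an assumption, not a measured property of any real model.

BYTES_PER_PARAM = 1.0  # assumed 8-bit quantized weights

for label, size_bytes in [("5 GB", 5e9), ("50 MB", 50e6), ("5 MB", 5e6)]:
    params = size_bytes / BYTES_PER_PARAM
    bits = size_bytes * 8  # upper bound on description length, in bits
    print(f"{label:>5}: ~{params:.1e} parameters, at most {bits:.1e} bits")
```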
Hope so, buy time :)
These are fairly general purpose (CUDA) chips. There will always be a new compute demand.
Well sir that didn’t take long!
Oh lovely day. Pragmatics strike again. Mwa ha ha!
It’s not over until Nvidia is back to 2023-level valuations.
I give it a month
I think one is totally misreading the trends if one assumes Nvidia will decrease in value long-term.
DeepSeek shows that it is possible to do current AI really efficiently, which means much smarter AI will not be as expensive as we thought until recently, which in turn will result in more AI sales and more demand for chips.
I was a VC in the 1998-2002 period and that wasn't the experience for the "picks and shovels" vendors then - or, at least, "long" was, well, long. They were over-valued and they over-invested. Their value fell back to a much lower level. I expect this to happen to nvidia.
In the 2000s this actually helped the industry as the infrastructure was cheap, and this will probably happen again, which speaks directly to your second point, with which I agree in that respect.
Of course, one can argue that "long term" is doing a lot of work in your first sentence. And, indeed, Nvidia has a long-term future. It took Cisco Systems 20 years to get back to the same share-price ballpark as it enjoyed in 2000 (it has never quite reached the same peak). I expect the same for Nvidia.
The dot-com bubble and bust was good to Google and Amazon, after not a long time. It was bad to Microsoft, because it missed the boat. It was bad to dot-com wannabes, and, of course, Global Crossing went up in flames.
I think the sector is hyped up, yes. Likely Nvidia is overpriced, and some players will go under. I think however the demand for GPUs will stay stronger than the demand for Cisco's network gear back then.
You might well be right on the specifics, Andy. Google and Amazon were great businesses then (and still are now, though different of course). I mean "great" in the business sense, not the ethical one. Dot-com wannabes flamed out spectacularly. At one point, we had over $1.5 billion in dot-com public stock (private investments that had been floated), all of which drifted to zero, pretty much. Some of them were in the 99-99 club (the value shrinks by 99% and then shrinks by 99% again).
I think Nvidia has a long way to fall. It is showing a very modest dead-cat bounce today. I can't really tell whether GPU demand will stay stronger than demand for network gear did in 2001. As far as I can see, most AI GPU demand is going to be for mindless fluff (airbrushing your frenemies out of a photo or something) that's being stuffed onto everyone's smartphone right now, but perhaps industry will lock on to some decent genAI products. It would help if OpenAI and the like would actually settle down and fix something instead of running off to the next half-baked prototype (autonomous agents are the latest, aren't they?).
I think my firm lost money on Global Crossing...