"Your market share is my opportunity."
Go get it, brother.
LLMs are not the path to AGI, even with $500 billion of compute. There are other approaches to machine cognition which require far fewer GPUs, possibly zero. So, yes, it's only a matter of time.
LLMs are surely very rough.
Yet, we have a hierarchy of world models in our heads. Fine motor control alone is highly compute intensive. There's no low-compute approach. But there's room for smarter models.
This doesn't inspire confidence in the major tech players who have spent eye-watering sums to train their models. There seem to be some interesting innovations in what DeepSeek has done, but more evolutionary than revolutionary tech (please let me know if there's a different opinion on this). If that's the case, one wonders why the hypers didn't hit on similar approaches earlier. Did they simply go all in on scaling and not consider alternatives?
Anyone who lived through the hype of 2000 should know better than to have confidence in “major tech players”… and the stupid VC dart throwers.
DeepSeek is a lightweight optimization of what is already known. OpenAI will slim down its own models, once developed.
But then they will double down on compute to make future models smarter. It will be a tick-tock strategy of scale going up and efficiency going up.
Well, I don't know the details but this is all happening very, very fast indeed by the standards of scientific research. I think it's no indictment at all that the major companies "didn't hit on similar approaches earlier", because "earlier" is just a few months ago. And maybe they were indeed working on approaches like DeepSeek but were just a little way behind.
There's lots to criticise about big tech companies and AI, but I think this particular criticism isn't quite fair.
Yes, it’s a good point. Things are moving very fast and I’m sympathetic to the view that LLMs had set a particular development path which tech companies jumped into out of fear of missing out.
That said, GPT-4 was released in Q1 of 2023, and for some time now investors have been questioning the large capex spending without a killer app identified. For a fraction of the cost it might have seemed prudent to spin up a team to figure out how to train and run inference more cheaply.
As you say, maybe they did do that and DeepSeek just got there first.
Or maybe they were so enamoured with "scaling" and hyping that they didn't bother. Hard to know; either could be true.
Given that the VCs get a skim of every dollar invested, I'm sure they're very enamoured with scaling up and pouring hundreds of billions into the scheme. I'm sure the 3-7% they're getting off the top is the point, not whether the thing will actually work.
For certain they'd be thrilled if someone figured out how to make an AGI, but if the whole thing implodes into another decades-long AI winter they will just move on to the next big thing.
This line from your comments yesterday seemed insightful “OpenAI may well become the WeWork of AI.”
I just checked, out of idle curiosity: Nvidia's share price has gone up 2,313.20% over the last 5 years of trading. I'm an equity illiterate, but doesn't that mean that something real bad could happen to Nvidia's market capitalization going forward?
It’s still a great, superbly run company in no jeopardy of going out of business, but it has dropped 15% in a day and could drop more.
The shovels will always be in demand, even if what is mined may change. Daily market gyrations are not much of a criterion.
Indeed. Well worth watching.
A tangent: just finished reading S. Pinker's excellent Rationality. Was pleased to see a shout out to you in the introduction. He's a good guy, Pinker.
Marc Andreesen: AI is going to drive your salaries into the dirt!
China: We're driving your valuations into the dirt!
I honestly don't know what is so impressive about DeepSeek. Besides reducing the cost of producing an unreliable LLM to a fraction of what it was, what else has it accomplished? It is still super unreliable.
I just tested it with a sample math problem from a 1983 US math competition, something I was able to solve in a matter of a few minutes, where the answer was supposed to be the count of a set of natural numbers fitting certain criteria from the problem. DeepSeek produced a long sequence of CoT derivation steps totaling about 70 lines, eventually producing an incorrect count as its answer. I then asked it to print out all the numbers in its answer set (its answer was somewhere under 2,000 numbers, whereas the actual correct answer was under 500). It refused and instead provided me with a Python program to produce the numbers in its answer set. I ran the Python program, and most of the numbers in the printout were wrong and did not fit the criteria of the original competition problem.
This shows that DS-R1 is just as unreliable as any other top-of-the-line LLM of late. No amount of CoT steps solves the problem of hallucination. These LLM systems plainly have no understanding whatsoever. DS-R1 simply changed the landscape from super expensive unreliability to cheap unreliability.
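For anyone who wants to run this kind of check themselves, here is a minimal Python sketch of verifying a model's claimed answer set against a brute-force enumeration. The actual competition problem's condition isn't reproduced above, so `satisfies_criteria` below is a hypothetical stand-in to be replaced with the real criteria.

```python
# Minimal sketch: mechanically verify an LLM's claimed answer set.
# satisfies_criteria is a dummy stand-in (the real competition
# condition is not reproduced here) -- swap in the actual test.

def satisfies_criteria(n: int) -> bool:
    """Hypothetical placeholder for the problem's condition."""
    return n % 7 == 0  # dummy condition, for illustration only

def check_answer_set(claimed: list[int], upper_bound: int) -> None:
    """Compare the model's claimed numbers against a brute-force enumeration."""
    actual = {n for n in range(1, upper_bound + 1) if satisfies_criteria(n)}
    claimed_set = set(claimed)
    false_positives = sorted(claimed_set - actual)  # listed but don't fit
    missed = sorted(actual - claimed_set)           # fit but not listed
    print(f"model count: {len(claimed_set)}, brute-force count: {len(actual)}")
    print(f"false positives: {len(false_positives)}, missed: {len(missed)}")

# Example with a made-up claimed set:
check_answer_set(claimed=[7, 14, 15, 21], upper_bound=100)
```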
Oh for sure it's just another LLM with all the same problems of inaccurate answers, poor security, IP ownership disputes, lack of transparency, etc... Ultimately it'll all recede into a mildly useful tool for a narrow range of problems.
The problem for all the VCs, and for Google, Microsoft, OpenAI and all their investors who've been pouring billions into LLMs, is that this is a very cheap open-source solution. They've been selling their investors on the idea that they'll have an effective oligopoly on LLMs. DeepSeek just destroyed that promise, and investors' dreams of getting in at the ground floor of the next Google have been crushed.
In a way, for the business side of things, this is almost as bad as having every intellectual property case decided against them.
Software eats hardware for breakfast every time, and any stupid brute-force-based approach would be beaten the minute someone comes up with clever code that bypasses the brute force.
That’s true, but for some reason, none of the Clever Hanses at places like OpenAI and Google seem to understand that.
I thought these companies only hired the best of the best of the best of the best (Sir).
Must be the LLMs are writing all the code these days.
Irrationality is the order of the day.
By coincidence, I have been pondering on what to say about AI. I dove into studying the discipline, and following experts like you (Thanks for generously sharing your views here!), sometimes using the milieu to create fantasy tales in which I confuse my "AI agents" with actual friends and human therapists, due to my rather sweaty imagination.
I yearn to use my life experience and new knowledge nuggets. How can I help people fully and safely digest the deluge of artificial intelligence hype and gush of products on offer?
My folk are the legal operations people, and the small law firms who slip through the cracks and are ignored by the high-priced consultants hunting for clients in the world of corporate law.
The word that popped up in my head is "discernment."
------
I have lived through decades of all sorts of office life developments that promised a great sweeping away of the messy human imperfections of the past. From TQM to business engineering, from pay-for-performance to "merit-based systems" (after rooting out anything with even the odor of DEI via confidential-informant phone calls from co-workers and office frenemies/nemeses).
I still am puzzling over how to advise people, particularly as we have reached a tipping point.
This past Sunday, I saw a Special Edition magazine in the grocery check-out lane (next to the mags featuring Prince William, the latest "Women's Day" and a collection of 75 chicken dinner recipes from the juggernaut that is Taylor Swift). This was the mega, 2025 Special Edition $11.00 "Introduction to Artificial Intelligence." Uh oh. The masses have been alerted.
As I toodled home from the grocery store with my little red wagon, I suddenly remembered my plastic-wrapped copy of the mega, 1995 Special Edition $4.75 "Introduction to Home Computers" from the same grocery chain but in my old hometown of Bethesda. This edition featured what might be the very first use of the term "unboxing."
I may have lived too long.
As for "discernment." In the rush of muddled waters, I have decided to advise folks to listen and read carefully, beware of cautionary language from Sam Altman, and emulate a savvy Vermonter surveying his out-of-season clothing chest for anything that can be rehabbed, reused, and refurbished. And eye the purveyors of the latest hype with a seasoned and cautious Yankee eye!
P.S. Try not to be like me, and send chatty notes to ROGER, your AI employment recruiter coach.
https://medium.com/@ma_murphy_58/roger-cant-help-being-an-ai-a-tiny-etheric-tale-b0c025af3c65
Nvidia's chips were simply available in the right place at the right time, nothing more. In the 1990s, it was the same with the transputers used for neural networks.
But in general isn't success for anyone just when luck meets preparation? I think it is. And NVIDIA certainly prepared.
Yes. It was ready to sell a lot of units. It didn’t know they would be used for AI. Capitalism. Could be used for toys
No, they had been working for a long time on CUDA and on using GPUs to parallelise computation, and for a long time before that on the basic graphics technology. I started getting excited about learning CUDA in about 2009, and indeed I started using it for programming brain simulations. There was a significant slowdown associated with any sort of asynchronous computation, and that required cleverness, but if everything stayed a matrix it was a giant leap forward. In many ways an application was inevitable in hindsight, and it's actually Nvidia who've led this by providing the capacity for the killer app that was eventually developed for the chips.
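To illustrate the "everything stays a matrix" point, here is a rough sketch in NumPy rather than CUDA itself (purely illustrative): the same arithmetic expressed as one dense matrix product is a single batch of independent multiply-adds that a parallel backend (a BLAS library, or a GPU) can spread across many cores, whereas an element-by-element loop serialises it.

```python
import time
import numpy as np

# Same arithmetic two ways: an element-by-element loop versus one dense
# matrix product that a parallel backend can execute all at once.
n = 256
A = np.random.rand(n, n)
B = np.random.rand(n, n)

t0 = time.perf_counter()
C_loop = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        C_loop[i, j] = A[i, :] @ B[:, j]  # one dot product per output cell
t1 = time.perf_counter()

C_mat = A @ B                             # the whole product in one call
t2 = time.perf_counter()

assert np.allclose(C_loop, C_mat)
print(f"looped: {t1 - t0:.3f}s   single matmul: {t2 - t1:.3f}s")
```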
https://medium.com/@ignacio.de.gregorio.noblejas/the-600-billion-mistake-c3a08a36e1aa
Sure. Taking the graphics out of the action, preparing them in advance and providing several variants is of course exactly what language needs later on. The amount of data is even smaller.
Not sure if anyone has pointed this out already... but doesn't the fact that you can distill o1's model down to only 5GB with nearly the same performance mean there is a lot less "intelligence" in it than we thought?
We could try to formalize this via the minimum description length principle, but it just seems like common sense that, if we could distill this into a program that is even smaller — just 50MB or even 5MB(!) — and get 90% of the same performance, this would seem much more like a parlor trick rather than some new form of sentient being.
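To make that intuition a bit more concrete, here is a back-of-the-envelope sketch. The sizes are just the figures quoted above (5 GB, 50 MB, 5 MB), and one byte per parameter (8-bit weights) is an assumption rather than a measurement of any particular model; the point is only that the file size bounds how much the model can encode.

```python
# Back-of-the-envelope description-length arithmetic. Sizes are the
# figures from the comment above; 1 byte per parameter (8-bit weights)
# is an assumption, not a measured property of any real model.

BYTES_PER_PARAM = 1.0  # assumed 8-bit quantized weights

for label, size_bytes in [("5 GB", 5e9), ("50 MB", 50e6), ("5 MB", 5e6)]:
    params = size_bytes / BYTES_PER_PARAM
    bits = size_bytes * 8  # upper bound on description length, in bits
    print(f"{label:>5}: ~{params:.1e} parameters, at most {bits:.1e} bits")
```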
Hope so, buy time :)
These are fairly general purpose (CUDA) chips. There will always be a new compute demand.
Well sir that didn’t take long!
Oh lovely day. Pragmatics strike again. Mwa ha ha!
It’s not over until Nvidia is back to 2023-level valuations.
I give it a month
I think one is totally misreading the trends if one assumes Nvidia will decrease in value long-term.
DeepSeek shows that it is possible to do current AI really efficiently, which means much smarter AI will not be as expensive as we thought until recently, which in turn will result in more AI sales and more demand for chips.
I was a VC in the 1998-2002 period and that wasn't the experience for the "picks and shovels" vendors then - or, at least, "long" was, well, long. They were over-valued and they over-invested. Their value fell back to a much lower level. I expect this to happen to nvidia.
In the 2000s this actually helped the industry as the infrastructure was cheap, and this will probably happen again, which speaks directly to your second point, with which I agree in that respect.
Of course, one can argue that "long term" is doing a lot of work in your first sentence. And, indeed, Nvidia has a long-term future. It took Cisco Systems 20 years to get back to the same share-price ballpark as it enjoyed in 2000 (it has never quite reached the same peak). I expect the same for Nvidia.
The dot-com bubble and bust was good to Google and Amazon, after not a long time. It was bad to Microsoft, because it missed the boat. It was bad to dot-com wannabes, and, of course, Global Crossing went up in flames.
I think the sector is hyped up, yes. Likely Nvidia is overpriced, and some players will go under. I think however the demand for GPUs will stay stronger than the demand for Cisco's network gear back then.
You might well be right on the specifics, Andy. Google and Amazon were great businesses then (and still are now, though different of course). I mean "great" in the business sense, not the ethical one. Dot-com wannabes flamed out spectacularly. At one point, we had over $1.5 billion in dot-com public stock (private investments that had been floated), all of which drifted to zero, pretty much. Some of them were in the 99-99 club (the value shrinks by 99% and then shrinks by 99% again).
I think Nvidia has a long way to fall. It is showing a very modest dead-cat bounce today. I can't really tell whether GPU demand will stay stronger than demand for network gear did in 2001. As far as I can see, most AI GPU demand is going to be for mindless fluff (airbrushing your frenemies out of a photo or something) that's being stuffed onto everyone's smartphone right now, but perhaps industry will lock on to some decent genAI products. It would help if OpenAI and the like would actually settle down and fix something instead of running off to the next half-baked prototype (autonomous agents are the latest, aren't they?).
I think my firm lost money on Global Crossing...