104 Comments

While these are all great numbers from the Chinese companies, and that probably means usable and affordable models for certain tasks will be possible, I am going to be careful about drawing conclusions before there is more than just some *benchmark* numbers in blogs and such. And *selected* benchmarks at that.

The tasks are very specific (e.g. math) and they are benchmarks. Do we know the test data wasn't part of the training data? And even if we know it wasn't *exactly*, might we be running into situations where the test data is merely a *variation* of the training data?

Take DeepSeek V3 (a 671B-parameter mixture-of-experts model with 37B activated per token, if I have the size right). It does 90.2% on MATH-500, but that number comes with "(EM)" attached, i.e. "exact match" with the reference answer. Ouch. That is a warning sign for me. The other two math benchmarks next to it are 39.2% and 43.2%, both Pass@1. So, does it do well on math?
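
To illustrate why "(EM)" can be a warning sign, here is a toy sketch. This is my own illustration, not DeepSeek's or MATH-500's actual grading code: real graders normalize answers in more sophisticated ways, but the underlying pitfall is the same.

```python
# Illustrative only: why scoring math answers by "exact match" (EM) can
# punish correct but differently-formatted answers. My own toy sketch,
# not any benchmark's actual grader.
from fractions import Fraction

def exact_match(prediction: str, reference: str) -> bool:
    # Naive EM: string equality after trimming whitespace.
    return prediction.strip() == reference.strip()

def numeric_match(prediction: str, reference: str) -> bool:
    # Slightly more forgiving: compare as exact rationals when possible.
    try:
        return Fraction(prediction) == Fraction(reference)
    except ValueError:
        return exact_match(prediction, reference)

# "0.5" and "1/2" are the same answer, but naive EM rejects the pair:
print(exact_match("0.5", "1/2"))    # False
print(numeric_match("0.5", "1/2"))  # True
```

The flip side is that a well-normalized EM score can actually be *stricter* than a lenient judge model, so a high EM number is not automatically inflated; it just measures something narrower than "does well on math."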

The vibe I'm getting, a little bit, is the 1990s race for benchmark numbers (first MIPS was the big thing, then FLOPS).

And then, additionally, we are all going to use DeepSeek and other Chinese models, and all our input and replies end up in trustworthy hands, right? And we're going to trust what comes out of that when it's something other than math, right?

Thank you! After everything that's been said here about the problems with benchmarks, why are we taking this one at face value?

I'm open to the possibility that this is legit. I'm also open to the possibility that Chinese tech companies are no less inclined to bullshit the public than American ones.

Anyway, how can we entrust our confidential and/or strategic data to companies that started by "massively borrowing" all digital data on the Web and digital storage media without worrying about copyright?

"we all are going to use DeepSeek and other Chinese models and all our input and replies ends up in trustworthy hands, right? "

<smacking self upside the head>

Indeed. An intelligence coup. Quite so. The foo is on the other shoot!

And I did not think of that immediately. I shall go off and jump in a lake now.

Yeah, since we can surely trust Google and OpenAI, right?

"Building $500B worth of power and data centers in the service of enormous collections of those chips isn't looking so sensible, either."

Exactly.

In fact, it's looking like a massive waste of money, time, and attention. In the style of absurd exaggeration and tech hubris we've come to expect.

I'm impressed with the name of the initiative. Stargate. Exactly the kind of semiliterate nonsense that I'd expect from our tech oligarchs.

Where and how were these guys educated?

How much of Stargate was about currying favor with Trump by letting him make a big, impressive-sounding announcement that will never bear fruit? Remember the factory jobs he "saved" that disappeared anyway? Or the $10 billion Foxconn factory in Wisconsin that never got built? Trump already got what he wanted out of this: a splashy announcement and credulous media coverage. He won't give a rat's ass whether it happens now or not. He won't even notice.

What? Not our Donald.

He has virtues, but immunity to flattery is not one of them. Last I checked.

<scratching head> But then, a miracle might have occurred.

What probability should I assign to that?

It's only a matter of time before Trump launches "RoboForce".

I would prefer "RoboForce" but whatever works.

Infinitely better! I shall update... :-)

Are you kidding me? Stargate is an amazing name. The sci-fi TV series ran for years and had two spin-offs. It wasn't semi-literate or nonsense. It literally was a gate to interstellar travel.

The name is terrible. It has nothing to do with the series or any of the ideas that the series explored. To the extent that AI was a theme in the series, it was when they were fighting AIs (the Replicators), not making AIs of their own.

At least Peter Thiel was out of the room when they were picking names, or we would have gotten another Tolkien name that didn't make sense.

Humour comes in 3 dimensions. You missed one of them.

Apparently so, as I'm still struggling to see the humor in your comment. :-)

Dude. Don't burst my bubble here! Aha! You are that David Crouch. The evil one with a symbiont.

But seriously, Hawking thought stargates were plausible in our universe.

OMG! Really?! The show was based on real events?! There really is a stargate deep under Cheyenne Mountain?!

<running off to the airport to book a flight>

With Elvis, Marilyn Monroe, and Robert Kennedy (the true one) aboard.

Nice joke!

Alright, I'll not be a smart aleck here. The $500 billion of power isn't enough. The datacenters? Those won't matter much.

$500B worth of power is absolutely needed. Those power sources will be nuclear because that's reliable, dispatchable power. The country is going to need massive amounts of power to deal with climate change and to replace fossil fuels. Why? Because there is a strong relationship between energy and the value of money: 9.7±0.3 mW per inflation-adjusted 1990 US dollar.

Garrett, T.J. Are there basic physical constraints on future anthropogenic emissions of carbon dioxide?. Climatic Change 104, 437–455 (2011). https://doi.org/10.1007/s10584-009-9717-9

"...the evolution of the human system can be considered from a surprisingly simple thermodynamic perspective .... Specifically, the human system grows through a self-perpetuating feedback loop in which the consumption rate of primary energy resources stays tied to the historical accumulation of global economic production—or p×g—through a time-independent factor of 9.7±0.3 mW per inflation-adjusted 1990 US dollar."
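
The quoted relation is easy to sanity-check. The constant is from the Garrett paper cited above; the cumulative-production figure below is my own assumed round number for illustration, not a value taken from the paper:

```python
# Sanity check of Garrett's constant: primary power demand scales with
# cumulative inflation-adjusted world economic production.
LAMBDA_W_PER_1990_USD = 9.7e-3  # 9.7 mW per inflation-adjusted 1990 USD

# Assumed cumulative world production, ~2 quadrillion 1990 USD
# (an illustrative round number, not from the paper):
cumulative_production_usd = 2.0e15

power_watts = LAMBDA_W_PER_1990_USD * cumulative_production_usd
print(f"Implied primary power demand: {power_watts / 1e12:.1f} TW")
```

That lands around 19 TW, which is the right order of magnitude for current world primary energy consumption, so the constant is at least internally plausible.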

That would be great. Assuming 95% of the $500B doesn't get stolen. Which is highly probable given that the US Federal Government is currently a kleptocracy. Κλέφτες (thieves).

The $500 billion is private money, not public. It is Masayoshi Son at Softbank making Trump hap-hap-happy. And right now, the last report I saw was that Softbank has $10 billion on hand. The rest is "dry powder" that is still in the hands of limited partners (LPs).

The theft of $42 billion from the rural internet connectivity budget dating from 2021 is mostly not actually true. It's just held up and never disbursed by fossilized bureaucracy at the state and federal levels. https://www.politico.com/news/2024/09/04/biden-broadband-program-swing-state-frustrations-00175845

The place where money evaporates is in California. The homeless money ($24 billion)

https://www.cbsnews.com/sanfrancisco/news/california-homelessness-spending-audit-24b-five-years-didnt-consistently-track-outcomes/

and the $23 billion high speed rail money

https://www.latimes.com/california/story/2024-03-21/high-speed-rail

The homeless program is an actual "WTAF happened to the money?" case.

The high-speed rail is mired in regulations, with mountains of paperwork.

Love this sentence: "The race to AGI will be won not by the country with the most chips but by the one that best fosters true innovation. "

Sir Keir Starmer seems to be making the same mistake by thinking that investment in infrastructure, rather than a collaborative brain trust (fostering innovation by doing things more cleverly instead of throwing the kitchen sink at the problem), is the way for the UK to get in on the AI race. This continued closed-minded (Brexit-like) thinking will lead the UK nowhere.

PS: I have fine-tuned the distilled versions of DeepSeek (32B Qwen) on reasonably cheap A10 GPUs, and they are frighteningly good for RAG applications on your own documents.
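
For readers unfamiliar with RAG (retrieval-augmented generation), the retrieval step can be sketched in toy form. The documents, query, and bag-of-words similarity below are made up for illustration; real pipelines use learned embedding models, not word counts:

```python
# Toy sketch of the retrieval step in RAG: score each document against
# the query, then hand the best match to the LLM as context.
# Bag-of-words cosine similarity stands in for a real embedding model.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "DeepSeek distilled a 32B Qwen model for reasoning tasks",
    "The GE90 engine manual covers compressor maintenance",
]
query = "compressor maintenance manual"

vecs = [Counter(d.lower().split()) for d in docs]
qvec = Counter(query.lower().split())
best = max(range(len(docs)), key=lambda i: cosine(qvec, vecs[i]))
print(docs[best])  # the retrieved passage fed to the model as context
```

The model then answers from the retrieved text rather than from its weights alone, which is why RAG over your own documents can work well even with a modest fine-tuned model.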

It's the height of irony that the Chinese have proven (to us) the power of unfettered capitalism as well as its limits. At the same time.

Ok, I get it, I'm just a shlubby tool bender, but explain to me what these tools are good for in the real world, outside of some toy "oooo look at me I can make a list of my favorite songs and cheat on that paper I have to write for English class," passing some tests in a lab that may or may not have anything to do with "intelligence," or "hey look I can summarize a whole tech manual for the compressor on a GE90-115B" (you know, the one the maintenance crews and line chiefs tear out and save the schematics from, then stick the rest on top of a file cabinet as a dust collector). How do these tools help me deliver better knowledge that is complete, consistent, correct and current, to the decision point in mission-critical core business applications? How do they do that better than the "olde tyme" Symbolic Declarative or Neural Symbolic approaches that are out there? Because from everything I've seen, regardless of "generation" or national origin, these tools are just a variation on "meme coins": they are based on hype and are really just there to make the brolegarchs richer. But hey, I'm educable, even at my advanced age, so school me.

"How do these tools help me deliver better knowledge that is complete, consistent, correct and current, to the decision point in mission-critical core business applications?"

They don't. LLMs are stochastic parrots spewing word salad at the speed of electrons, utterly dependent on the Eliza Effect for meaning.

Behold! Much in the way electricity was misapplied early in its usage, so goes AI.

There is evidence that LLMs are extremely effective tutors. This has tremendous ramifications for education in general.

The mastery of pattern recognition in these models is likely to make significant contributions to work that requires diagnostic analysis.

LLMs appear to help improve the abilities of inexperienced employees in fields such as software development. They do not solve difficult problems, but they are faster at solving existing, well-established ones.

LLMs are very good at translation, between human languages, and also from human languages into APIs, which will likely make training general purpose robots much easier, as well as a variety of speech-based interfaces to software systems.

To be clear: they are overhyped, not general intelligence, and not likely to become self-aware. They are bad at systemic thinking and at dealing with new paradigms. But they do have very intriguing possibilities. TBD.

"The mastery of pattern recognition in these models is likely to make significant contributions to work that requires diagnostic analysis."

I think it's important to recognize that these systems get their pattern recognition training from humans, and that while human pattern recognition is one of our species' competitive strengths, we're also notorious for mistakenly identifying patterns where none actually exist.

pareidolia

Precisely.

Re: pattern recognition being used in diagnostic testing. Are you describing an application of LLMs, or machine learning more generally? I ask because this is something I associate with ML, and don't think the OP was suggesting ML is useless. That would be a pretty hard claim to defend.

Creating barriers to entry works only short-term, if at all, and typically does more damage in strategic terms. We need to keep focusing on innovation, objectively. As for the current LLM space, I suspect we will reach both convergence and a performance ceiling, since we will soon (maybe 2-3 years) run out of ways to regurgitate the existing algorithmic machinery on the "new" paradigm while core issues such as reliability receive minimal attention. We are already into the efficiency play (logical and practical for actual use), but the shift of most focus toward efficiency may also indicate diminishing opportunity for innovation on the core problem spaces.

I’m curious about the limited discussion broadly about how little consumers seem to want LLMs. The tech companies know this - it’s why they are now forcing them on us with no option to opt out, both for revenue and to make everyone an involuntary research subject. If you ultimately can’t solve hallucinations, and LLMs are limited to what you put in them at massive cost, both in $s and to the environment, what’s the point?

I suspect the point is the people who invested bazillions into this technology are trying not to get egg on their faces. But they can only hold out so long.

I know! The hallucinations! They're everywhere!

GenAI businesses want to make us stupid and dependent on their technologies, like social networks, for life! Brain rot all the way! The eternal law of least effort!

The point is $$$.

Gary, keep up the good reporting. Your predictions are good to have but remember that anyone among us who is clear eyed sees much the same things. Talk to us too.

I like to remind people who tell us to worry about national security that NS is a myth. I left the defense industry in the 70s because we were as nationally militarily secure as we were going to be - at least two countries could destroy our world and thus any security depended on our leaders' willingness to share the world.

I was good at math, and was offered $0.5M in today's dollars to stay in defense, but I knew all I could add was some gilding on the lily.

My intuitive math tells me that AGI is never, and generative AI outside the physical sciences is mush: a ten-year-old with a monstrous command of humanity's writings but forever bursting into our conversations with strange or impossible mashups of what they had memorized.

re China - they are who they are - 4 times our population - emphasis on practical education - rich - more efficient government that put Jack Ma in his place.

Americans liked by the press these days are mostly astute at publicity through exaggeration. Elon Musk is Lucy and FSD is the football Charlie Brown is going to kick.

Love you all.

Sputnik, with Chinese characteristics. Does it spark consternation, or coolheadedness and resolve?

Wow, I love that historical leap.

LLMs are dead-end tech, so whoever leads in LLMs is irrelevant. But more importantly this notion of a tribal race to AGI is utter madness. The only way in which we all win is to somehow come together as a species, globally regulate the arse off of human-level AGI or above, and build superintelligent AGI together, as a global public good, collectively owned by all humanity. Otherwise we all lose.

Successful reinforcement learning being incorporated does make me nervous, though, lol.

Probably not full AGI, but it still has the potential to be very disruptive.

100% agree! There is one human species and one planet Earth that we all share.

There is one possible moat, which is that some of the Chinese models, at least the one from ByteDance, do everything the national security community in the US feared TikTok was doing. It whitewashes CCP and North Korean atrocities while bashing the US. https://podbay.fm/p/the-china-show/e/1735353598.

The China Show is made by two guys who lived in China for 10 years. They had to flee when Xi made reporting any bad news about China illegal. Their show is now banned by the Chinese government. IDK if DeepSeek or other Chinese models have a similar pro-communist bias built in the way the ByteDance one does, but not a lot of people are going to want commie AI, so to the extent that US and European models are NOT trained to spout CCP propaganda, they'll be more popular.

BTW, for people who think export controls will save us: I've worked with the people who impose those for years. It's one of those things the US government does to give the appearance of taking effective action. But appearance is mostly what it is. Shady businessmen selling to shady customers always win. We slow it down a bit. We make customers pay a premium, as evading sanctions costs money. The customer can't get as much of what he wants as fast as he wants it. But they get most of it. Export control evasion happens all day, every day. Look at how many Texas Instruments chips wind up in Iranian drones used to kill Ukrainians. The only real export control we have in AI right now is that the Dutch company that makes the best chip-foundry equipment won't sell its machines to China. But that just means the Chinese are pushing forward with research on alternate lithography techniques, and they may find something that works as well as or better than what they were not allowed to buy. It may take a decade, but they're not afraid of long-term investments.

Not surprisingly, no mention of the UK in all of this. Without the armies of researchers or raw compute power of either China or the US, our best bet is probably to apply LLMs in a sharper and more pragmatic way than anyone else.

Europe 100% same predicament

Perplexity CEO Aravind Srinivas was interviewed on CNBC a few days ago claiming that his company had more-or-less solved the hallucination problem. And that soon these systems would be able to reason.

We live in an age where one doesn't know who to believe anymore, because the greed-fueled hype machine is running at max RPMs.

A good rule of thumb is to ignore everything the tech CEOs say. Hate that I've gotten so cynical, but they've earned it over the past couple years.

Perplexity ain't really a research company; it's much more a sophisticated wrapper over various LLMs, so I kinda doubt that. Also, no source for the claim.

If a world war, with embargos, trade impediments, material and financial constraints, can catalyse jet propulsion, rocketry, radar and nuclear power, then it's probably no surprise a comparatively minor trade skirmish might catalyse some thoughtful AI innovation.

"Necessity is the mother of invention" - Plato

Great analysis ✅💯. For contrast, one might compare a) electricity, b) quantum mechanics, c) aviation & aerospace… who "won" those wars and why? The answer usually comes down to really, really smart & creative people (the discoverers & inventors) plus investors at scale. … For a while the US was (and maybe still is) the most attractive place for them, but we're not alone in these anymore.

The real question is, what does a society look like with abundant, good AI. The race only addresses a sliver of that. I think the answer won't come from technologists.
