Elon Musk promised that Grok 3 would be the smartest AI ever.
One of his fans even predicted earlier today that it would be AGI!
Spoiler alert: it wasn’t. Musk and three people from his company demo’d Grok 3 live tonight, and to anyone who has been watching these systems lately, the demo looked like a formulaic carbon copy of many other demos: slightly-better-than-before results on benchmarks, a lot more training (apparently 15x the compute used for Grok 2), automatic coding of a Tetris variant that didn’t quite seem to work, and a new product called, rather derivatively, “Deep Search”, which sounds just like “Deep Research”. For good measure there was a new entrant in the test-time-compute genre, joining o1, o3, r1, and many more. I didn’t notice anything truly original.
Elon himself acknowledged it was still a “beta”.
Is it (slightly) smarter than the full but unreleased version of o3? We don’t know. [Update 1: probably not, per an OpenAI employee who posted some previously unreleased o3 data that appeared to show o3 beating Grok 3 on two benchmarks]
That wasn’t tested (presumably they don’t have access), and only a handful of benchmarks were actually reported.
My hot take:
Sam Altman can breathe easy for now.
No game changers; no major leap forward here. Hallucinations haven’t been magically solved, etc.
That said, OpenAI’s moat keeps diminishing, so price wars will continue and profits will continue to be elusive for everyone except Nvidia.
Pure pretraining scaling has clearly failed to produce AGI. 🤷‍♂️
Update 2: Andrej Karpathy was given early access, and his conclusions were in many ways similar to mine: Grok 3 is a contender, but not AGI, and not light years ahead of o3.
Gary Marcus has heard all the hype before.
Sad.
Just a note on hallucination:
I tell my teams:
If you want deterministic replies, ask the tool to write a program.
If you want interpretive replies, ask the tool for direct output.
It’s directly analogous to Daniel Kahneman’s “Thinking, Fast and Slow”: algorithm vs. constructed recall.
LLMs have built-in non-determinacy (I know, it’s called “hallucination”), and you can’t get rid of it unless you turn the temperature to zero inside the mechanism, which you can’t do directly in a chat interface.
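To make the temperature point concrete, here is a minimal, self-contained Python sketch of temperature-scaled sampling over toy logits (the logit values are made up for illustration, not from any real model): as temperature approaches zero, sampling collapses to a deterministic argmax, which is why zero temperature removes the randomness, though not necessarily the errors.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=None):
    """Sample a token index from raw logits.

    temperature == 0 is the deterministic limit: always pick the
    highest-scoring token (argmax). Higher temperatures flatten the
    distribution and increase randomness.
    """
    if temperature == 0:
        # Deterministic: no sampling at all.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    # Softmax with temperature scaling (subtract max for stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs)[0]

toy_logits = [2.0, 1.0, 0.5]  # hypothetical scores for three tokens
# Temperature 0: the same token comes back every single time.
assert all(sample_with_temperature(toy_logits, 0) == 0 for _ in range(20))
```

At temperature 1.0 the same call returns different indices across runs unless you pin the RNG seed, which is the sense in which non-determinacy is built into the sampling step rather than being a removable bug.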
AGI is, as my father used to say, a fig-newton of the imagination.
Hi Gary! When all is said and done, every LLM ever is about producing something out of nothing - intelligence out of a pile of numbers and math calcs over them. We've seen this movie before :)