27 Comments

Hi Gary, you were right on!

It is a forever unsolvable problem using existing approaches alone [that are centered on data, including 'multimodal' and 'contrastive']. That's because video will always be an incomplete record of the physical world at large. It's impossible to build a realistic world model from pixels and their text descriptions. Matter behaves on account of its structure (e.g., a flute with its carefully drilled holes, a diffraction grating with its microscopic rulings, and thousands of other examples) and its interaction with forces (always invisible) under energy fields (also always invisible). What can be gleaned from one video ("this block is catching on fire") is invalidated by another ("wow, it's not catching on fire"). Humans learn these things via direct physical experience, not by watching videos (alone). If videos by themselves could form world models, we could shut down every physics, biology, chemistry... lab in the world!


Also, how much of the video training set consists of special effects and CGI anyway? How would you even train a realistic world model if so much of your training set isn't realistic?


Good point. Also, synthetic data is what will be plentifully available for training future versions [similar to how it will be with text].


Object permanence artifacts are just a visual annoyance for products like Sora. Unfortunately, the same problem plagues so-called self-driving systems. Not much has fundamentally changed over the years in this respect. Looking at the display of modern systems, e.g. Tesla's, you will often notice pedestrians, cars, and trucks appearing and disappearing into a quantum foam. I call it Schrödinger's traffic.


Waymo is way, way ahead. Their record speaks for itself. One can't afford hallucinations on the road, of course.


I saw a vlog of a Waymo ride about a year ago. Object permanence artifacts were there.


As far as I know, Waymo does try to model and track objects, and is even able to predict things such as the possibility of a biker suddenly appearing from behind a truck that had not been seen before.

Now, these are not object permanence per se, but related issues. All of these are likely solved approximately and implicitly, with decent-enough reliability in practice.
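To make the idea concrete, here is a toy sketch (my own illustration, not Waymo's actual method) of how a tracker can keep an occluded object "alive" for a few frames by coasting it on a constant-velocity prediction instead of deleting it the moment detections stop:

```python
# Toy sketch: a tracker that keeps occluded objects "alive" for a few frames.
# Purely illustrative; not any production system's algorithm.
from dataclasses import dataclass
from typing import Optional

MAX_MISSED = 5  # coast an unseen object for up to 5 frames before dropping it

@dataclass
class Track:
    x: float          # last known position (1-D for simplicity)
    vx: float         # estimated velocity per frame
    missed: int = 0   # consecutive frames without a detection

def update(track: Track, detection: Optional[float]) -> Optional[Track]:
    """Advance one frame; return None only once the object has been unseen
    for more than MAX_MISSED frames."""
    predicted = track.x + track.vx
    if detection is None:
        # Occluded: coast on the constant-velocity prediction.
        track.x, track.missed = predicted, track.missed + 1
        return track if track.missed <= MAX_MISSED else None
    # Seen again: refresh the velocity estimate and reset the miss counter.
    track.vx = detection - track.x
    track.x, track.missed = detection, 0
    return track

# A cyclist passes behind a truck for three frames and is not deleted:
t = Track(x=0.0, vx=1.0)
for obs in [1.0, 2.0, None, None, None, 6.0]:
    t = update(t, obs)
    print(t)
```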

Dolgov said that Transformer-based systems do a lot better than what they had before. I know some folks still want human-like reasoning, but in practice that has resulted in rule-based systems that could never be made robust and flexible enough. We'll see what the future brings.


Thank you for posting this, Gary. If I read you correctly, it seems we now have sufficient evidence that this technology cannot be massaged into much more than it already is at this point, because it is foundationally flawed?

I can see Sora being fine for quick background clips (however one might want to use that), but the idea of anyone doing long-form television or feature films using this technology exclusively is a fool's errand.


The Coca-Cola Christmas ad is awful. Even though I'm sure it was edited by humans, you can still see weird object morphing happening in a number of scenes. They just flip out of the scenes so fast that you don't have time to focus on them.


Yup. And look at the public backlash against it.


It is much harder to cheat with movies than with images, as the number of things that can go wrong increases by orders of magnitude, and the human eye can catch any unnatural movement.

So I'm not sure where this is going. However, the main application of current AI, assistants doing work, is much more likely to work out, as the language and action space is a lot smaller than the video space.
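A rough back-of-envelope comparison (my own illustrative numbers, not measurements) of why the video space dwarfs the text space:

```python
# Back-of-envelope: raw size of a short video clip vs. a text reply.
# All numbers are illustrative assumptions.
frames = 24 * 5                    # 5 seconds at 24 fps
pixels_per_frame = 1280 * 720 * 3  # 720p RGB
video_values = frames * pixels_per_frame
text_tokens = 500                  # a fairly long assistant reply

print(f"video values: {video_values:,}")   # ~332 million
print(f"text tokens:  {text_tokens:,}")
print(f"ratio: ~{video_values // text_tokens:,}x")
```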


Advanced AI systems need to have a broad, deep, and accurate internal model of the physical universe (world model) in order to be able to understand, reason, and predict things about the actual physical universe. The internal world models maintained by today's GenAI systems are very broad, but only superficially deep, and significantly inaccurate. Accordingly, GenAI's ability to understand, reason, and predict things about the actual physical universe is severely compromised. Unfortunately for the investors in GenAI, this weakness is fundamental to how transformers (and neural nets in general) synthesise internal world models from their training data. It can't be properly fixed by any amount of brute scaling, or any simple fudge like RAG or CoT. Are we learning yet...?


This is all correct. Today's systems are broad but shallow. This is progress though, as until recently our systems were deep but very narrow.

I think it will be easier to make a broad and shallow system deep, by adding lots of algorithms and models where needed, than to make narrow systems broad.


I agree with all of it except this part: "The John Locke hypothesis that you can learn physics purely from sense data is failing"

It's surely precisely the limitations in the range of modalities of "sense data" available in the training data that are a substantial part of the problem.

It's been my contention that computing takes "leaps" insofar as we invent new input and output technologies (the mouse, LCD, MEMS...), and progress has less to do with raw processing capacity than might be commonly assumed.

The corollary is that AI isn't AGI not only because of insufficient scale, but also not, as Marcus would hold, because it's missing some key software. AI is falling short of AGI because generative models "generate" only in the extremely narrow domains of text and textual image description.

We're limiting AI to the current limits of classical computing, i.e., what we've done so far with our keyboards and mice, which is extremely far from even the most basic animal experience of the world, never mind a sentient animal's. I mean, a tube worm on the seabed has a more varied, multimodal sensory experience than an AI. It can at least hope to learn something more fundamental about the world.


Mario Bunge challenged me in 1998 to build a computer to generate novel scientific hypotheses. He was convinced it was impossible; I am not anti-AI in that sense. However, I do agree with his view that empiricism is incorrect. The role of rationalism is mysterious to me, but I agree that hypotheses are invented, not read off data. I now wonder (Michotte vs. Hume) whether our kinesthetic experience is part of this. Psychologists, including our host: are there pathologies where people, for example, lose the ability to infer causation?


Certainly makes sense that a model trained only on images can’t infer and apply a real physics model. Whether or not a full set of sensory data might suffice is an open question; humans leverage more than vision to make sense of the world. Even if our physics model is partially innate, it was burned into the brain over generations of experiments using only sensory data leading to selective reproductive survival (unless you think God designed it in…). So I wouldn’t totally give up hope that we can use induction to make reasonably accurate world models - but with more than passive vision models.


Try dice! No, seriously, Sora and dice managed to work out worse than my already low expectations. I was thinking, "there's no way a video of dice will stand up to scrutiny. They'll have the wrong numbers of pips, or sides will repeat". But no, the writhing, only vaguely dice-shaped blobs I've been getting with natural-language requests are something else entirely.


We learn basic physics in relation to our bodies, not with verbal descriptions or 2D imagery. I noticed when the Amsterdam science museum opened a few decades ago, a superb building, that the computer simulations were pathetic. The Foucault pendulum at the Smithsonian is a fantastic example of almost feeling the rotation of the earth. All the 2D computer displays were eventually replaced with actual science exhibits. Until they grasp how to encode temporospatial objects, this will remain a poor area.

Vis-à-vis self-driving cars: I've used Waymo 3-4 times a week for a year. Superb, and everyone I know who uses them feels safer than with a human driver. I've used them in two cities now. I'm not sure about other vehicles, but Waymo has done a great job.


"A universal physics engine" and a "generative data engine". Very impressive. https://x.com/zhou_xian_/status/1869511650782658846


Thanks for the article; well written and I get the point.

But respectfully, I think the view of the article is too narrow. AI models can absolutely ace physics ... *IF* they're trained on physics data. Saying LLMs can't do physics is like saying English majors can't do physics. It's not wrong. But what does it tell us, really?

More structured counterpoint here, for your comment: https://substack.com/home/post/p-152976751


Bugs Bunny and Star Wars…also do not obey physics…


Just listened to a talk by Geoffrey Hinton in which he gave a reductive and uncomprehending dismissal of your critique of ML. I can't imagine how aggravating that must be. But surely the important point is that he feels he must dismiss your criticisms. Deep down, he doesn't know, and he more or less suspects that.

Hinton's talk: https://youtu.be/Es6yuMlyfPw?si=kE-zz4ZzRKuN1lTR


It's amusing that this is very much like the trouble the longtermists who infest this field have with the reality of astrophysics and relativity. As opposed to the science fiction they seem to think is real.


The insatiable quest for data "to solve it all" resembles Asimov's short story "The Last Question" ;P https://en.wikipedia.org/wiki/The_Last_Question


Hi Gary,

While the fundamental mistakes of generative AI are worth pointing out, unfortunately most of the details will be missed by the vast majority of the public.

With that in mind, I think the big talking points should be the wholesale theft of art and copyrighted work by OpenAI and others, and the devastating environmental impact such a useless product imposes.

As we saw with the hummingbird video (and many others), genAI in many cases is simply copying videos it has stolen from artists and presenting them as if they were its own creations. The number of videos is likely in the multibillions; no human can tell whether a video is new or simply a copy at that point. The same trick was used by codeGen and LLMs, with AI hucksters attempting to present plagiarism as novel intelligence.

Notice that nowhere in its hundreds of pages of product info has OpenAI detailed what training data was used. We know it's YouTube videos, we know it's movies, we know it's social media videos, all stolen without consent.

The environmental talking point needs to be constantly shoved in their faces as well. An LLM uses 100-1000 times more energy than something like a Google search while doing, in most cases, the exact same thing; 1000-10,000 times more energy to make a crappy AI image and replace the human artist. The whole thing would be a comedy if it weren't forced onto everyone's eyeballs by Big Tech and Wall Street.

The good news: artists and creatives in every field are starting to actively hate AI and stand against it. Artists and creatives with huge fan followings are convincing the greater public not to support this theft of creativity. Movements and solidarity are forming.

The more support the anti-AI movement gets, the greater the chance the entire business model collapses. After all, why would anyone get a "create a movie, or song, or image" subscription when consumers are not buying it? This is the ultimate fear of the AI accelerationists. Never let them forget that the people are against this and the hate will only grow.
