72 Comments

1. I think comparing where this was a year ago to where it is now (as MKBHD did) is pretty flabbergasting.

2. I continue to not understand *at all* how anyone is saying AGI. I don't have any mystical view of human brains, and I don't think highly of our "rational" abilities, but there is just so clearly no hint of "understanding," let alone general intelligence.

Feb 18 · Liked by Gary Marcus

It’s not like we don’t have good ideas about how to architect systems with global knowledge or an understanding of physics. See, for instance, Danny Bobrow’s book *Qualitative Reasoning about Physical Systems*. It’s been moderately amusing and very irritating to watch the swing of the AI pendulum from “Perceptrons Good” to “Perceptrons Bad,” then a decade or so later “Connectionism Good,” and the slow ramp to the mid-2010s and the sudden takeoff of “Connectionism is All.” The result has been the enshrining of data as the basis of intelligence and the lack of recognition of the need for knowledge and understanding. AGI it ain’t, and never can be.

Feb 17 · Liked by Gary Marcus

Just excellent!!! So then compare where it does extremely well with where it puts unicorn horns through heads and produces 7-sided chess boards. Where is it great? Where is it nuts? What is the difference between these situations? Where is the cliff into lunacy, the trigger cases for crazy? There seems again something analogous to the uncanny valley. So close, so close... Oh my God, awful.

Feb 18 · Liked by Gary Marcus

These silly little toy systems do not "make things up." That is the Anthropomorphism Fallacy: projecting uniquely human qualities onto something that isn't human. They follow their programming and spew output.


"perhaps better called failed approximations, as Gerben Wierda has pointed out" - exactly! Because it is practically impossible to capture the full joint probability distribution of the data (as it has exponential complexity in the number of data points), the so called GenAI uses autoregressive models to approximate that distribution. By conditioning on the previously generated tokens the distribution of the next token becomes one dimensional and much easier to approximate. However, the trade-off is that such an approximation is not very accurate and is very brittle because it does not guarantee any bounds on the error. Hallucinations are simply the result of the autoregressive approximation failing in a very notable way. They are not a bug but indeed a fundamental feature of the autoregressive approximation.


A very good point that these systems don’t have internal models - they might have statistical patterns, but localised to token-by-token, pixel-by-pixel, frame-by-frame, I imagine? On that point, what is the mathematical difference between capturing a statistical pattern and having a world model?

For example, does the vector corresponding to “cat” in the embedding space actually represent “cat-ness” relative to other tokens - or perhaps it does, just that mere relative representation is insufficient to construct a world model? What is a world model mathematically speaking? A graph of vectors?
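One small way to make the first half of that concrete: in embedding space, "cat-ness" is just proximity to related vectors. A toy sketch with made-up numbers (the vectors and the cosine-similarity framing are illustrative assumptions, not taken from any real model):

```python
import numpy as np

# Made-up 4-dimensional embeddings, purely for illustration;
# real models learn hundreds or thousands of dimensions from data.
emb = {
    "cat":        np.array([0.9, 0.1, 0.30, 0.00]),
    "dog":        np.array([0.8, 0.2, 0.35, 0.05]),
    "chessboard": np.array([0.0, 0.9, 0.10, 0.70]),
}

def cosine(a, b):
    # Cosine similarity: how aligned two vectors are, ignoring magnitude.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["cat"], emb["dog"]))         # high: "cat" sits near "dog"
print(cosine(emb["cat"], emb["chessboard"]))  # low: far from "chessboard"
```

The relative geometry is real, but it only records which tokens occur in similar contexts; nothing in it says a cat is a solid object that a horn cannot pass through, which is roughly the gap between a statistical pattern and a world model.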

Apologies for the disorganised thoughts, very interesting things to wrap my head around.

Feb 21 · Liked by Gary Marcus

I showed the unicorn picture to my kids. They noticed the hand first, then a full three seconds later, a loud shriek when they realised the horn! Thanks Gary - your research is a great teaching tool about AI mistakes (and helps enrich my kids' general knowledge too)! =D


And a bit later, the main woman's legs pass through each other during a step.

Also note the woman dressed in white a bit behind the main one, and notice how her legs do odd things.

Note also: I suspect that OpenAI considers this video to be its flagship demonstration of how wonderful Sora is, in spite of these really important flaws that will not impress film directors.

Feb 18 · Liked by Gary Marcus

The monkey’s face resets entirely about midway through the video, too. Right about when he turns his head to look away from the viewer for the second time. This doesn’t speak to a great understanding of object permanence to me.


From what OpenAI have said, it is a kludge of GPT-4 that takes the human "natural language" prompt and then turns it into the full set of prompts and inclusion files required to drive DALL-E 3. Nice and simple, and it reuses their expertise.

We have argued to death the failings and minimal capabilities of GPT-4 in terms of world knowledge, physics and engineering. We are now seeing how the DALL-E 3 diffusion engine places the "patches" in the canvas, then refines the quality of the pixel content and loses its marbles.

The OpenAI explanation is clear about the process of laying out the overall canvas for the video, then using patches and next-patch prediction, and then some form of next-pixel prediction in an iterative process until the quality is good enough.

So, as we understand GPT-4 and DALL-E 3, we know the limitations of Sora, and we also now know how little we can expect from it.
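Read literally, the process described above would have roughly the shape of the toy sketch below. Every name and function is a hypothetical placeholder standing in for that description (prompt expansion, patch layout, iterative refinement); it is not OpenAI's code or API:

```python
import random

def expand_prompt(user_prompt: str) -> list[str]:
    # Stand-in for the GPT-4-style rewrite of a short user prompt into
    # the detailed internal prompts that drive the generator.
    return [f"{user_prompt} -- detailed shot description {i}" for i in range(3)]

def layout_patches(n_patches: int) -> list[float]:
    # Stand-in for laying out a coarse, noisy canvas of patches.
    return [random.random() for _ in range(n_patches)]

def refine(patches: list[float], steps: int = 20) -> list[float]:
    # Stand-in for iterative refinement: each pass nudges every patch toward
    # its neighbours, improving local smoothness only.
    for _ in range(steps):
        patches = [(a + b) / 2 for a, b in zip(patches, patches[1:] + patches[:1])]
    return patches

shots = expand_prompt("a monkey playing chess in a park")
video_patches = refine(layout_patches(n_patches=16))
```

The point is only structural: each stage optimizes local plausibility (next patch, next pixel), and nothing in the loop enforces global constraints like an 8x8 board or a face that stays the same face.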

Feb 18 · edited Feb 19 · Liked by Gary Marcus

"I am not complaining, mind you, about the fact that a monkey might pose in front of a chessboard, but about the chessboard itself: 3 kings, only one on the board, and a 7x7 rather than nearly universal 8x8 chess board (to say nothing of the pawn structure). Utterly impossible in chess—and presumably nowhere in the data set, either. Yet it rendered the image photorealistically."

That's astounding.

Superb analysis, Gary. Even a layperson like myself can follow it %-0

"It’s probably not fair to blame the weird board on a lack of data. Judging by the quality of its images, Sora was trained on an immense amount of data, including data about how things change over time. There are plenty of videos of people playing chess out there on the web. Chess boards don’t morph from 8x8 to 7x7 in the real world, and there are probably tons of 8x8 boards in the databases - and few if any 7x7. What Sora is doing is not a direct function of its training data."

So the generative AI program is "learning", in some sense. It's just that it's inertially prompted to ask the wrong questions, and then--correct me if I'm wrong--to accept the answers without question (because when is the last time a computer ever rejected any input that was formally correct?), and then act on what it has "learned." With the preconceived (if not relevant or proper) set of learned parameters in place, the rest is up for grabs.

Although, hmm, what's more formally correct than the logic of a chess game? If AI had any spark of its own, the program set would have gotten in touch with its uncanny affinity with formal logic, and then roamed the www to scrape Deep Blue (or what have you) at least far enough to "know" that there's no possible way for the game of chess to work with only 7x7 squares (or with three kings!).

I don't know what was in the prompts for that chimp-playing-chess-in-the-park pic, but if the word "chess" was a keyword, I would have expected at least some hint of "aha!" in response from a Major League Artificial Intelligence program.

Except that world-beating AI chess programs evidently don't have any innate spark of their own, either. They have no secrets to impart to a newbie AI program. AI chess programs--so I'm told--have found that the most effective way to win the game is a "brute force" approach that enumerates all possible lines of play in order to calculate the next move. I think "brute force" is a particularly ungainly anthropomorphism for the approach. I think a better phrase is something like "probability field theory". A better place to take the programming ideal, at least for a chess program.
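For what it's worth, the "brute force" being gestured at is classically just exhaustive game-tree search plus an evaluation function. A bare-bones sketch over an abstract game (the callback parameters are hypothetical, and this is nothing like a real chess engine's optimized search):

```python
def minimax(state, depth, maximizing, moves, apply_move, evaluate):
    # moves(state) -> legal moves; apply_move(state, m) -> successor state;
    # evaluate(state) -> score from the maximizing player's point of view.
    if depth == 0 or not moves(state):
        return evaluate(state)
    scores = [minimax(apply_move(state, m), depth - 1, not maximizing,
                      moves, apply_move, evaluate) for m in moves(state)]
    return max(scores) if maximizing else min(scores)
```

Nothing in that loop knows or cares what "chess" is; it just propagates numbers up a tree, which is rather the commenter's point.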

Given a game with the formal outline and restrictions of chess, a computer finds the rules easy to train to. The algorithm just doesn't care if it never has to play another game of chess in its entire existence (however long THAT might be). Even a generative AI algorithm is indifferent, to everything. Indifferent. To everything.

The masterfully invulnerable AI chess program not only doesn't care about the game of chess, it doesn't know what "chess" is. It isn't going to challenge ChatGPT11.0 to a game of chess. Or vice versa, either.

Feb 17 · edited Feb 17 · Liked by Gary Marcus

Good one :)

Pixel-by-pixel calculations, or word-by-word ones, using numerical embeddings as inputs, have NO meaningful relation to truth!

Any truth that does emerge (and a lot does) is solely on account of its locality in the embedding (e.g. word order, chessboard squares) - it is incidental, not purposeful; it is an artifact of the computation on the data, not reasoned.

Multi-modal 'hallucination', especially in visual form, points out the underlying absurdity of it all - the magical wishing for meaning.

Feb 18 · edited Feb 18

The pieces are also way too big for the board -- the bottoms of the pieces wouldn't fit within the chess squares. That's definitely not something that has ever been seen in the training set.

Also the white king at far left appears to be resting both on the board (which is maybe 1/2 inch thick?) and on the table below the board at the same time -- but somehow is not tilted. The chess board both does and does not have thickness. A little like the girl with both her hand and her feet on the surfboard.


Gary - I work in the film industry in Vancouver. I am watching those who have bought into the generative AI hype very carefully, and am frankly a bit flabbergasted at how little critical thinking they are putting into Sora's announcement - preferring instead to declare it "the death of Hollywood," etc. Many cite producer Tyler Perry's recent announcement that he is going to put an $800M expansion of an Atlanta studio facility on hold.

What follows below is a recent comment I left under a YouTube video. I doubt many will read it all (and fewer still will accept it), but it's my analysis based upon both my knowledge of the film industry, and reading the evaluations of AI experts such as yourself >

There appear to be some vast misconceptions about how this generative AI technology works (or, more precisely, doesn't work).

For quick clips, commercials, video bites, animation, VFX augmentation, pre-vis, and graphic and production design work, yes, this will definitely have a major impact. However, making entire films and television series is a whole other level of complexity that scaling and advancements using the current generative models will probably not be able to achieve. Meanwhile, it doesn't matter how good you are at "super prompting": with current generative AI models you cannot get consistent, precise, repeatable results every time...for all the various matching shots you would need in an episodic series or motion picture.

I'm a bit surprised folks aren't reading more deeply about how the underlying generative AI models work, frankly. Even the experts designing them don't fully understand what's going on inside the black box, and after years of gradually developing these models they are no closer to solving the hallucination problem than they were at the outset. The photorealism is getting better, but the AI's understanding of the physicality of the real world is not improving at all. And, no, RAG (Retrieval-Augmented Generation) is probably not gonna fix the underlying problem, either.

The bottom line is that the current model designs of generative AI, while impressive on the surface, are fraught with unreliability and inconsistency. And those problems don't seem to be getting resolved. And that's anathema to major motion picture or series production.

And that's not getting into the myriad other technical, logistical, legal, economic and sociocultural hurdles that this technology is beginning to face.

As to the "democratization" argument, folks have been saying that for years, every time a new technology arrived. It was said about digital photography, it was said about capturing 4K video on phones, it's been repeated ad nauseam about YouTube. But what everybody forgets about all this "empowerment" is that you end up with a lot more noise, and it just becomes that much harder to find the signal. How many YouTubers, for example, ever make it to one million subs? You still have to separate yourself from the crowd.

One thing Hollywood already has, is the entrenched distribution channels and PR machinery. And that's not going anywhere anytime soon. The other competitive advantage Hollywood enjoys revolves around a word you're going to hear a lot more of as fakery permeates every corner of the Internet: "Authenticity." In other words, the capability to make movies and series that star real human beings interacting in real environments. Authenticity will become the new currency of the realm as fakery becomes pervasive...and quickly maligned.

Again, people aren't investigating this generative AI stuff carefully enough. They're just buying all the hype. And, of course, companies like OpenAI love that, as it makes their valuations go through the roof.


So, LeCun has a very good point. Pixel-level logic is not good enough. Need higher level representations.

Whether neural nets are used for those representations or not, is not that important, if the architecture is solid and flexible.


I don't know if anyone has mentioned it, but in the walking woman video there is a major continuity problem after the cut to close up: her dress has new designs at the top and her left lapel is now twice as long.

As for monkeys playing chess, this rash of "amusing" animal videos is revealing a disturbing lack of ethics at OpenAI. There is a particularly grotesque animal video called Bling Zoo showing a tiger agonizing in a cramped cage, while in a second shot a turtle eats a string of diamonds, something that would kill a real animal. No sane person would ever think to make, let alone showcase, a real video like this.
