28 Comments
Jan 29 · Liked by Gary Marcus

Hi Gary, the errors seem egregious, partly because, when the imagery, text, video... seem correct to us, we tacitly assume they 'knew' what to generate.

But the more bland reality is, they have zero clue about anything at all, even when they compute values that turn out correct (to us).

Stochastic monkeys pounding on loaded keyboards, throwing paint on loaded canvases - with zero understanding of what results...

Jan 30 · Liked by Gary Marcus

What about those veins standing out on the unicorn's face? It's like some of the texture from the man's hands bled over onto the unicorn.

Horses can have engorged veins like the ones depicted in the image, but that typically happens after a lot of exercise or possibly conditions such as blockage or clotting (or heart issues). Not so much if the horse is just standing around being hugged by older gentlemen.

Jan 31 · Liked by Gary Marcus

"The painting isn’t really particularly in the style of Michelangelo."

We can say something stronger than that - it looks like a creepy digital image, not a painting, and absolutely nothing like Michelangelo. Here is what a painting by Michelangelo looks like: https://s3-us-west-2.amazonaws.com/courses-images-archive-read-only/wp-content/uploads/sites/1122/2016/08/16190403/sibyls.jpg

These AI images always look creepy, and it is not just the problems of anatomy and physical space that you have pointed out. An art historian would be able to explain better than me. I think some factors are the excessive detail, unnatural lighting, and the way the textures (skin, hair, fabric, etc.) all look wrong.

Jan 30 · Liked by Gary Marcus

What a beautiful piece of art, it seriously made me laugh out loud hard along with Chris McKillop's rephrasing of the prompt and your analysis. These AI mistakes are so funny. Thanks for all the (educational) laughs, Marcus.

Even when they aren't full of egregious errors, AI generated images have a certain immediately recognisable otherness. Earlier this week, I saw a presentation at work that was illustrated on the lines of 'generic stock image of scientist who examines data on a screen', except AI generated instead of stock image, which made sense because the presentation topic was AI, so no problem with that at all. But the scientist had a kind of uncannily shiny appearance and just slightly too idealised features, and of course the screen showed garbled nonsense. Stock photos of people in immaculate white coats pipetting colourful dye into glassware more appropriate for a late Medieval alchemy lab are bad enough, but I feel this is worse.

There are two reasons why despite horn-through-head style errors and off-putting appearance, people are excited about AI media. One is that we are pretty good at seeing what we expect to see, so we often don't notice errors or gloss over the weirdness, just like how we recognise a face in :-).

Second, cultish hype. I do not really understand what is going on inside the heads of people who fill the comment thread under a video of unnaturally moving people with melted features with responses like "so amazing" and "just like Hollywood", but these people exist, they aren't all bots, and for some reason they don't seem to have more pressing things to do than shilling the original poster's business of online courses on how to generate AI 'art'.

My best guess is that it is a mixture of some who hope to grab onto the coattails of fellow hustlers and make some money off a hype cycle and others who seriously believe that Elon Musk and Sam Altman are saving humanity by building a robot Messiah to usher in prosperity, interstellar travel, and immortality for all, and that even the tiniest bit of scepticism or criticism will delay or even endanger this utopia. Having reasoned thus far, the one thing I get stuck on is why anybody would trust a billionaire to share hypothetical, future AI-generated wealth and rejuvenating drugs with his fan "John12387 laser eye avatar blue tick", but that thought is now really out of scope.

Point is, despite all the flaws, people on social media will continue to claim these images and videos are amazing. And that dissonance can't be good for their psyche.

The horn shows rings, but not helical twists.

The guy is unrealistically tall--in addition to needing a very long right arm, an adult unicorn would be expected to be the size of an adult horse. But he's taller than the animal.

It's obvious, Joe: the man is God.

So what I'm seeing is that, as an artistically talentless Twitter troll, you don't like the fact that AI draws better in seconds than you might achieve in days, and that it makes some mistakes (some of which artists famously made).

Every exceptional achievement is met with derision. Honestly, it's getting a little embarrassing. I pity the people whose papers you review.

author

please do us both a favor and unsubscribe.

Which artist "famously" drew a horn passing harmlessly through a person's head? I would think that a mistake like that would be more aptly described as "infamous" than "famous". Some of the mistakes, like the implied length of the man's right arm, are subtle and could be made by a human artist, but not that one.

More importantly, you are failing to grasp the reason why people keep pointing out these mistakes. The purveyors of these models continue to insist that the models have developed an understanding of the world that guides the things they generate, and mistakes like these are the counterexamples that refute those claims. If they stuck to more modest claims, like "it often comes up with some pretty good drawings in mere seconds," then people wouldn't be so interested in its failure cases.

author

Well said and exactly right.

founding

Idk, Stuart, so when the same "Gen AI" system makes a tiny error (in mere seconds) by launching a nuclear device, that's not important either?

You clearly do not understand the concept of "error propagation"...look into it.

rapunzel hair on the horsie, very unusual hair braid on the dude (no judgement)

This kind of inherent weakness at seeing globally (or at all) has been exploited by a game player to beat a supposed world-champion Go-playing AI. (I'll post the reference if I can find it again...).

author

it’s from stuart russell’s lab, and i wrote about it in the substack many months ago (but the paper has since been updated).

I figured you must know about it – I'll check out your article.

Ha, just found that out – you beat me to the punch by one second (or I beat you? ;) ). :)

Ah here we Go (no pun intended):

"The tactics used by Pelrine involved slowly stringing together a large “loop” of stones to encircle one of his opponent’s own groups, while distracting the AI with moves in other corners of the board. The Go-playing bot did not notice its vulnerability, even when the encirclement was nearly complete, Pelrine said.

“As a human it would be quite easy to spot,” he added.

The discovery of a weakness in some of the most advanced Go-playing machines points to a fundamental flaw in the deep-learning systems that underpin today’s most advanced AI, said Stuart Russell, a computer science professor at the University of California, Berkeley.

The systems can “understand” only specific situations they have been exposed to in the past and are unable to generalize in a way that humans find easy, he added."

https://arstechnica.com/information-technology/2023/02/man-beats-machine-at-go-in-human-victory-over-ai/

Isn't the left arm actually sculpted in marble? I think that could be a reaction to the Michelangelo prompt

You can actually do a spot the difference with GPT - try it here "create a spot the difference image of an old wizard with a unicorn" https://chat.openai.com/g/g-Yas2WSu7S-cognitive-coach

Jan 30·edited Jan 30

Honest image generation cannot be done in 2D. One has to start with 3D shapes, each anatomically correct, and combine them in physically legal ways. Then add lighting, reflections, background blur, lens distortion, etc.

This has been known for a very long time. I have, however, been positively shocked that AI art gen has been able to do so much with so little. That should not have been possible.

So yeah, your point is valid. And 3D generated movies, with rigorous physics but with recent highly imaginative advancements, are not far off.

> What unifies all of the above is that current systems are good at local coherence, between words, and between pixels, but not at lining up their outputs with a global comprehension of the world.

First, you have to have one (a global comprehension of the world), laid out and then built into the system's ontology, working and developing incrementally in the real world the way a child grows by adapting to it. Then you have a general intelligence system:

A Unified Theory - Universal Language - https://www.linkedin.com/pulse/unified-theory-consciousness-michael-molin/

24:10 - You need to be doing some kind of manipulation, if only to generate internal consistency. So there's no way a large language model can have internal consistency. It's learned everything on the internet... And so to get to another level of cognition, you're going to need something that builds an internally consistent model of what's out there, whether that needs. - Simon Prince, "Understanding Deep Learning"

https://www.youtube.com/watch?v=sJXn4Cl4oww

there needs to be an in there, for there to be an out there.

This image is disturbing.
