Hi Gary, the errors seem egregious partly because, when the imagery, text, video, etc. seem correct to us, we tacitly assume the systems 'knew' what to generate.
But the blander reality is that they have zero clue about anything at all, even when they compute values that turn out correct (to us).
Stochastic monkeys pounding on loaded keyboards, throwing paint on loaded canvases - with zero understanding of what results...
What about those veins standing out on the unicorn's face? It's like some of the texture from the man's hands bled over onto the unicorn.
Horses can have engorged veins like the ones depicted in the image, but that typically happens after a lot of exercise or possibly conditions such as blockage or clotting (or heart issues). Not so much if the horse is just standing around being hugged by older gentlemen.
"The painting isn’t really particularly in the style of Michelangelo."
We can say something stronger than that - it looks like a creepy digital image, not a painting, and absolutely nothing like Michelangelo. Here is what a painting by Michelangelo looks like: https://s3-us-west-2.amazonaws.com/courses-images-archive-read-only/wp-content/uploads/sites/1122/2016/08/16190403/sibyls.jpg
These AI images always look creepy, and it is not just the problems of anatomy and physical space that you have pointed out. An art historian would be able to explain better than me. I think some factors are the excessive detail, unnatural lighting, and the way the textures (skin, hair, fabric, etc.) all look wrong.
What a beautiful piece of art; it seriously made me laugh out loud, as did Chris McKillop's rephrasing of the prompt and your analysis. These AI mistakes are so funny. Thanks for all the (educational) laughs, Marcus.
Even when they aren't full of egregious errors, AI-generated images have a certain immediately recognisable otherness. Earlier this week, I saw a presentation at work that was illustrated along the lines of 'generic stock image of scientist examining data on a screen', except AI-generated instead of a stock image, which made sense because the presentation topic was AI, so no problem with that at all. But the scientist had a kind of uncannily shiny appearance and just slightly too idealised features, and of course the screen showed garbled nonsense. Stock photos of people in immaculate white coats pipetting colourful dye into glassware more appropriate for a late Medieval alchemy lab are bad enough, but I feel this is worse.
There are two reasons why, despite horn-through-head-style errors and an off-putting appearance, people are excited about AI media. One is that we are pretty good at seeing what we expect to see, so we often don't notice errors or gloss over the weirdness, much as we recognise a face in :-).
Second, cultish hype. I do not really understand what is going on inside the heads of people who fill the comment thread under a video of unnaturally moving people with melted features with responses like "so amazing" and "just like Hollywood", but these people exist, they aren't all bots, and for some reason they seem to have nothing more pressing to do than shill the original poster's business of online courses on how to generate AI 'art'.
My best guess is that it is a mixture of some who hope to grab onto the coattails of fellow hustlers and make some money off a hype cycle, and others who seriously believe that Elon Musk and Sam Altman are saving humanity by building a robot Messiah to usher in prosperity, interstellar travel, and immortality for all, and that even the tiniest bit of scepticism or criticism will delay or even endanger this utopia. Having reasoned thus far, the one thing I get stuck on is why anybody would trust a billionaire to share hypothetical, future AI-generated wealth and rejuvenating drugs with his fan "John12387 laser eye avatar blue tick", but that thought is now really out of scope.
Point is, despite all the flaws, people on social media will continue to claim these images and videos are amazing. And that dissonance can't be good for their psyche.
The horn shows rings, but not helical twists.
The guy is unrealistically tall--in addition to needing a very long right arm, an adult unicorn would be expected to be the size of an adult horse. But he's taller than the animal.
It's obvious Joe, the man is God.
So what I'm seeing is that, as an artistically talentless Twitter troll, you don't like the fact that AI draws better in seconds than you might achieve in days, and that it makes some mistakes (some of which artists famously made).
Every exceptional achievement is met with derision. Honestly, it's getting a little embarrassing. I pity the people whose papers you review.
please do us both a favor and unsubscribe.
Which artist "famously" drew a horn passing harmlessly through a person's head? I would think that a mistake like that would be more aptly described as "infamous" than "famous". Some of the mistakes, like the implied length of the man's right arm, are subtle and could be made by a human artist, but not that one.
More importantly, you are failing to grasp the reason why people keep pointing out these mistakes. The purveyors of these models continue to insist that the models have developed an understanding of the world that guides the things they generate, and mistakes like these are the counterexamples that refute those claims. If they stuck to more modest claims, like "it often comes up with some pretty good drawings in mere seconds," then people wouldn't be so interested in its failure cases.
Well said and exactly right.
Idk, Stuart, so when the same "Gen AI" system makes a tiny error (in mere seconds) by launching a nuclear device, that's not important either?
You clearly do not understand the concept of "error propagation"...look into it.
Rapunzel hair on the horsie, very unusual hair braid on the dude (no judgement).
This kind of inherent weakness at seeing globally (or at all) has been exploited by a game player to beat a supposed world-champion Go-playing AI. (I'll post the reference if I can find it again...).
it’s from stuart russell’s lab, and i wrote about it in the substack many months ago (but the paper has since been updated).
Is this the article of yours?:
https://garymarcus.substack.com/p/david-beats-go-liath
I figured you must know about it – I'll check out your article.
Ha, just found that out – you beat me to the punch by one second (or I beat you? ;) ). :)
Ah here we Go (no pun intended):
"The tactics used by Pelrine involved slowly stringing together a large “loop” of stones to encircle one of his opponent’s own groups, while distracting the AI with moves in other corners of the board. The Go-playing bot did not notice its vulnerability, even when the encirclement was nearly complete, Pelrine said.
“As a human it would be quite easy to spot,” he added.
The discovery of a weakness in some of the most advanced Go-playing machines points to a fundamental flaw in the deep-learning systems that underpin today’s most advanced AI, said Stuart Russell, a computer science professor at the University of California, Berkeley.
The systems can “understand” only specific situations they have been exposed to in the past and are unable to generalize in a way that humans find easy, he added."
https://arstechnica.com/information-technology/2023/02/man-beats-machine-at-go-in-human-victory-over-ai/
Isn't the left arm actually sculpted in marble? I think that could be a reaction to the Michelangelo prompt.
Errors schmerrors: https://www.dirtyhans.co.uk/
I'd buy it!
You can actually do a spot the difference with GPT - try it here "create a spot the difference image of an old wizard with a unicorn" https://chat.openai.com/g/g-Yas2WSu7S-cognitive-coach
Honest image generation cannot be done in 2D. One has to start with 3D shapes, each anatomically correct, and combine them in physically legal ways. Then add lighting, reflections, background blur, lens distortion, etc.
This has been known for a very long time. I have, however, been positively shocked that AI art generation has been able to do so much with so little. That should not have been possible.
So yeah, your point is valid. And 3D-generated movies, with rigorous physics but with recent highly imaginative advancements, are not far off.
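For what it's worth, the "start in 3D, then render" approach the comment describes is exactly what graphics pipelines do: geometry first, then projection and shading, so global consistency comes for free. A minimal illustrative sketch (function names are my own, not any real renderer's API):

```python
# Toy sketch of a 3D-first pipeline: model geometry in 3D, then project
# to 2D and shade. Names are illustrative, not from any real renderer.

def project(point, focal=1.0):
    """Pinhole-camera projection of a 3D point (x, y, z) onto the image plane."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def lambert(normal, light_dir):
    """Lambertian shading: brightness is max(0, n . l) for unit vectors n, l."""
    dot = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, dot)

# A point two units from the camera lands halfway up the image plane...
uv = project((1.0, 1.0, 2.0))                       # (0.5, 0.5)
# ...a surface facing the light is fully lit; one facing away is dark.
lit = lambert((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))     # 1.0
dark = lambert((0.0, 0.0, 1.0), (0.0, 0.0, -1.0))   # 0.0
```

Because occlusion, perspective, and lighting are all computed from one shared 3D scene, a horn can never pass "harmlessly" through a head; a 2D pixel generator has no such constraint.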
> What unifies all of the above is that current systems are good at local coherence, between words, and between pixels, but not at lining up their outputs with a global comprehension of the world.
First, you have to have one - a global comprehension of the world - laid out and built into the system's ontology, working and developing incrementally in the real world the way a child grows by adapting to the world; only then do you have a general-intelligence system.
A Unified Theory - Universal Language - https://www.linkedin.com/pulse/unified-theory-consciousness-michael-molin/
24:10 - "You need to be doing some kind of manipulation, if only to generate internal consistency. So there's no way a large language model can have internal consistency. It's learned everything on the internet... And so to get to another level of cognition, you're going to need something that builds an internally consistent model of what's out there, whatever that needs." - Simon Prince, "Understanding Deep Learning"
https://www.youtube.com/watch?v=sJXn4Cl4oww
there needs to be an in there, for there to be an out there.
This image is disturbing.