37 Comments
Sep 19, 2022·Liked by Gary Marcus

A clear and concise article that really cuts through the hype. To say that the efforts are "part of the puzzle" is to imply that they already have one of the correct pieces to slot in. That's a bold claim that requires justification. I'd imagine you'd get better interpretive results if you asked a five-year-old kid with a set of colouring pencils. I'd like to see AIs formally age-graded against humans. I'm still waiting for that scrap of solid evidence that they're anything but parlour trickery.


What's fascinating to me is that, despite having seen probably millions of bicycles (certainly more than I ever have), these image generators still leave off chains, blur the pedals into the frame, and create wheel spokes in patterns that would never appear on a real bike (because they'd never work). This isn't even a matter of understanding the world; this is sheer failure to properly regurgitate training input when the pattern gets a little complicated. You see little things like this in almost every AI-generated image when you look closely enough. We recognize immediately that these bicycles or coffee cups or squirrels don't look quite right. Why can't an AI?


We have bodily experience built up gradually...

No matter how many trillion images are input to the system, they are merely symbolic of the "real" things, so what we will always have is a Pictorial Chinese Room that is outputting meaningless symbols, i.e. implausible output images :)

Sep 18, 2022·edited Sep 19, 2022

Another great post! The visuals help better illustrate the shortcomings of existing AI.

The problems ALL stem back to the SAME source: lack of direct, physical (embodied) interaction with the world. In my previous sentence, I said 'stem', 'back', 'source', 'interaction', 'world' - all of those are physical, so they will mean something to an embodied being :)

The physical/material world is a collection of structures and phenomena - our direct interaction with them is what we term "experience". That direct contact lets us explore and be subjected to the world's behavior, and lets us represent it in our brains directly (without an intermediary) - a representation we then use to reason, imagine, hypothesize, communicate to others (via language, math, other forms of symbols), etc.

Today's AI (symbolic, connectionist, RL - no difference) relies on an intermediary - us. And that is the core problem.


It's a mirror of our achievements.


Indeed. To that extent it's always going to be derivative in nature, on account of lacking agency.


Very nice article, Gary 👏 Thanks for sharing! We know the fundamental issue of lack of understanding will never be solved for most query domains without a set of abstract symbolic rules. In fact, for problems with material consequences, we are better off starting with symbolic rules and complementing them with intelligence from data.
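(Purely as an illustration of that "rules first, data second" split, and not anything from the article: a toy Python sketch in which hard symbolic rules gate the decision and a learned estimate is only consulted inside the space the rules allow. Every name, rule, and number below is hypothetical.)

```python
# Minimal, self-contained sketch of "symbolic rules first, learned model second".
# The rules are explicit hard constraints the model can never override; the
# statistical component only scores cases that have already passed the rules.

MAX_DOSE_MG = {"drug_a": 100.0, "drug_b": 50.0}   # hypothetical rule table
RISK_THRESHOLD = 0.2                               # hypothetical cutoff

def learned_risk_estimate(drug: str, dose_mg: float, age: int) -> float:
    """Stand-in for a trained model; here just a toy heuristic."""
    return min(1.0, dose_mg / MAX_DOSE_MG[drug] * (age / 100))

def approve(drug: str, dose_mg: float, age: int, allergies: set) -> bool:
    # 1. Symbolic rules: explicit, auditable, never overridden by the model.
    if drug not in MAX_DOSE_MG or dose_mg > MAX_DOSE_MG[drug]:
        return False
    if drug in allergies:
        return False
    # 2. Data-driven complement, consulted only inside the rule-approved space.
    return learned_risk_estimate(drug, dose_mg, age) < RISK_THRESHOLD

print(approve("drug_a", 40.0, 30, set()))   # True: rules pass, learned risk is low
print(approve("drug_a", 400.0, 30, set()))  # False: the rule blocks it outright
```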


Hi Venkat, that combination can only go so far - farther than what exists now, for sure - but still :)

Dec 3, 2022·Liked by Gary Marcus

Late to chime in on this and the several other wonderful posts about compositionality, language, knowledge, and AI, but I wanted to share what first came to my mind when I saw the phrase "astronaut riding a horse:" https://markkelly.com/wp-content/uploads/2022/09/MecklerPhoto-F-7920-1024x683.jpg

I guess I'm just so much more literal than other intelligences.

Sep 29, 2022·Liked by Gary Marcus

Great article.

I had a very similar experience trying to generate a picture of an inverted pyramid :/

https://www.linkedin.com/feed/update/urn:li:activity:6979765871414059008/

Sep 25, 2022·Liked by Gary Marcus

I'm a guinea pig with DALL-E 2. It's a remarkable tool, but just that. It doesn't create anything. It follows instructions within parameters to produce - but if you gauge nuance in the input well, the results can be startling.

Sep 23, 2022·edited Sep 23, 2022·Liked by Gary Marcus

Quick correction to your coda: typo. It's "Mostaque", not "Mostique".

author

good catch. fixed!


I think the limitations are part of the fun, and why prompt engineering is so enjoyable. I wonder how long we will need to do that for. But stuff like DeepMind's Selection-Inference framework is really interesting.


I hear "one piece of the puzzle" as "I've got nothing."

Oct 14, 2022·edited Oct 14, 2022

From Google:

===================================================

Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022

Paper: https://arxiv.org/abs/2210.05359 [submitted 10/11/22]

Abstract:

"Successful and effective communication between humans and AI relies on a shared experience of the world. By training solely on written text, current language models (LMs) miss the grounded experience of humans in the real-world -- their failure to relate language to the physical world causes knowledge to be misrepresented and obvious mistakes in their reasoning.

...

..."

===================================================

Ya think? LOL.

But simulation won't magically fix grounding either, because simulations are themselves not grounded!

And the title - Mind's Eye - really?
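(For anyone curious what the paper's loop looks like concretely, here is a purely illustrative Python paraphrase of what the abstract describes: the LM writes simulation code, a simulator runs it, and the result is pasted back into the prompt before answering. Every function below is a stub standing in for a component, not the authors' actual code or API.)

```python
# Illustrative paraphrase of the "ground the LM in a simulator" loop described in
# https://arxiv.org/abs/2210.05359. All three components are stubs.

def text_to_simulation_code(question: str) -> str:
    return "# stub: an LM would emit physics-engine code for: " + question

def run_simulation(sim_code: str) -> str:
    return "stub observation (a real simulator would report the outcome here)"

def language_model(prompt: str) -> str:
    return "stub answer conditioned on: " + prompt

def answer_with_simulation(question: str) -> str:
    sim_code = text_to_simulation_code(question)   # 1. LM writes simulation code
    observation = run_simulation(sim_code)          # 2. simulator runs it
    # 3. The simulation result is injected back into the prompt before answering.
    prompt = f"{question}\nSimulation result: {observation}\nAnswer:"
    return language_model(prompt)

print(answer_with_simulation("Which falls faster in a vacuum, a feather or a brick?"))
```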

Sep 19, 2022·edited Sep 19, 2022

It is really strange that you phrased your prompts in the form "draw a..." or "sketch a...". The prompt is supposed to describe the picture, not to instruct the algorithm. No one would describe their picture as "draw a bicycle".

Not that I disagree with your conclusion that there's a lack of knowledge about the real world, but this just strikes me as an odd way to show it.

author

It has no problem with "draw a house" and the like, so I don't think that's the issue. But you can try it yourself at stability.ai.
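(If you'd rather test the two phrasings locally than through stability.ai, something like the following works with the openly released Stable Diffusion weights via the Hugging Face diffusers library; the model id and settings below are just one common choice, not anything tied to the post itself.)

```python
# Compare "instruction" vs "description" phrasings with the same seed, so the
# wording is the only thing that changes between runs.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for prompt in ["draw a bicycle", "a bicycle", "a photo of a bicycle"]:
    generator = torch.Generator("cuda").manual_seed(0)  # fixed seed per prompt
    image = pipe(prompt, generator=generator).images[0]
    image.save(prompt.replace(" ", "_") + ".png")
```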


“Nice, you’ve created a box that creates a laminar airflow of 150 mph. How does that help you fly?”

“Yes, that wing shape produces more lift, but that’s only gonna fly in a hurricane.”

“Yes, that engine will produce a lot of thrust, but how will you get it into the air? Drop it off a cliff?”

*

[build the parts first, then put them together]

author

The question is whether this is even one of the parts, or just a distraction.


Agreed, but I can definitely see it as a part of imagination. Other parts can evaluate the result, selecting which visualizations are more accurate, and perhaps re-initiating the process.
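(Sketched as code, that generate-then-evaluate loop might look like the toy Python below; both the generator and the plausibility critic are stubs, and the critic that actually knows what a plausible bicycle looks like is of course the hard, unsolved part.)

```python
# Skeleton of a generate -> evaluate -> possibly re-initiate loop.
import random

def generate_image(prompt: str, seed: int) -> str:
    return f"image({prompt}, seed={seed})"   # stub for a text-to-image model

def plausibility_score(image: str) -> float:
    return random.random()                    # stub for a world-knowledge critic

def imagine(prompt: str, attempts: int = 5, threshold: float = 0.8) -> str:
    best_image, best_score = None, -1.0
    for seed in range(attempts):
        candidate = generate_image(prompt, seed)
        score = plausibility_score(candidate)
        if score > best_score:
            best_image, best_score = candidate, score
        if score >= threshold:               # good enough: stop re-initiating
            break
    return best_image

print(imagine("a bicycle with a chain and working spokes"))
```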


Yes, but they seem to be working on this one part to the exclusion of all the others. The parts should be enumerated and given equal time (funding).

author

Indeed, the stark asymmetry in funding is going to seem like a mistake in hindsight.


How do we know which parts to build, before we know how they go together?


Understanding is about establishing relationships. In the constructivist paradigm of Jean Piaget, image schemas ground mental concepts to qualia.
