Feb 13 · Liked by Gary Marcus

No surprise that the lack of understanding is on display in two modes - text and image - and will likely be in others as well (audio, video, etc.).

'Multimodal' can't fix 'clueless'.

'The Emperor has no clothes', in every language and modality :)

Data isn't a substitute for direct understanding; that is the heart of the matter. Fixing things piecemeal after they are shown to be broken isn't scalable. Reality isn't amenable to ongoing and perpetual dot-release upgrades; there is no magic crossover point at which the machine will suddenly 'get it'.

Feb 13 · edited Feb 13 · Liked by Gary Marcus

OpenAI: "We use the term “hallucinations,” though we recognize ways this framing may suggest anthropomorphization, which in turn can lead to harms or incorrect mental models of how the model learns." — GPT-4 System Card, an addendum of the GPT-4 Technical Report.

At least some people at OpenAI understand the 'bewitchment by language' well enough to have had this footnote added. Too bad they did not add the same footnote, in all caps, regarding the word 'understanding'... (or 'learning', for that matter).

The use of the term 'hallucination/error' triggers the assumption in our minds that the system's 'default' is 'understanding/correct'. In an extreme example, someone who says "For me, Jews are people too" is an antisemite, because they allow doubt by implicitly treating this as a valid question in the first place (cf. Godfried Bomans). The opposite of what we say is often also implicitly said.

I seriously think we should refrain from calling these errors or hallucinations. We might call them 'failed approximations' to signal that the correct ones are also 'approximations'.

https://ea.rna.nl/2023/11/01/the-hidden-meaning-of-the-errors-of-chatgpt-and-friends/

Feb 14 · Liked by Gary Marcus

The reaction you get from folks who just want AI to be better than it is - fascinating. If it walks like a duck but can't do anything else a duck does, it doesn't matter: they will still call it a duck. That's the point I keep making about AI therapy: if you think it understands you, and you really want it to understand you, it will appear to understand you - and that's all some folks need. But the rest of us will still have a critical eye. Cheers, Gary! Keep up the good work.


I have to say I do like that fantastical rhinelephant in one picture.


“…the problem was with its *language understanding*, rather than with illustration per se.” Yup.

Seems to be a lot of resistance to this idea out there… or misunderstanding of it.


While some variants of "a horse riding an astronaut" failed when they shouldn't have, the original formulation was flawed. Google shows 45,000 hits for "a horse riding girl", and all the images have the girl on the horse, not the other way around. "A horse riding ON an astronaut" would be better. Again, your post points to some real shortcomings, but be careful with ambiguous English. Admittedly, these programs should not ignore the "an" in "a horse riding an astronaut".


Let's talk about understanding. I have to come at this from the entire framework of ancient Vedic philosophy. "Intelligence" simply does not exist. I've been vehemently against the use of the concept of "intelligence" for 40 years. The word "intelligence" was coined from Latin and started gaining use as recently as the end of the Middle Ages in Europe, 500 years ago... and 500 years in Homo sapiens evolution is nothing!

I speak Tamil, my mother tongue from one state in South India, and a bit of Sanskrit. Tamil and Sanskrit are 5,000 years old and are two of the oldest and most highly literate languages in the world. Even Arabic, Cantonese, Farsi, and Urdu, which are much, much older than the Latin-Germanic languages of Europe, do not have an equivalent word for "intelligence". All of these languages support concepts describing logic, reason, and understanding, yes. But never intelligence.

Hence I've never used the word "intelligence", because I can't use a word if I don't fully understand what it means. If the Western world claims humans are "intelligent", then the current state of our planet, run by humans, equals "intelligence".

As far as I can tell, nobody in the Latin Western hemisphere is able to define "intelligence". So why put 1 trillion, or even 7 trillion, dollars into creating a concept that doesn't exist, and on machines?

"There is perhaps no better a folly of human conceits than this distant image of our tiny world" - Carl Sagan


Your voice is very important in this conversation about our AI-dominated future. I agree that human understanding involves a whole, compositional view of whatever we are thinking about. For example, a songwriter hears the whole song in his or her head, then writes it down; it's not just a succession of one arbitrary note after another. A mathematician discovers a proof all at once, not just as a list of one symbol or Greek letter followed by another. Perhaps human consciousness involves something like quantum coherence in the microtubules, as suggested by Penrose and Hameroff.

But on the other hand, these examples look like the same kinds of mistakes humans make, especially kids. We have prepositional phrases like "on the back of" precisely to emphasize the position and order of the things we are talking about. People don't speak in plain strings of commands and objects. Also, we chunk common phrases together and sometimes mix them up, as in a Freudian slip.

AI is not conscious, but it’s starting to look like it is our collective unconscious.


This is a solid essay, and provides an in-depth look at what the problem is, rather than offering uncharitable and unproductive criticism. This is a very welcome approach.

Current algorithms do not truly know how things fit together or what they really mean.

Yet these methods are not a dead end or "sucking oxygen" from potentially better things. They are first-cut approaches to solving what are impossibly hard problems.

Rigorous approaches would model anatomy and spatial relationships explicitly, or at least start with a rough but correct skeleton and let the algorithm fill in the details (that part it can do well); a sketch of the idea follows below. Given how far we've come, I think these issues will be solved in a year or two.
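
To make that concrete, here is a minimal sketch in Python of what "modeling spatial relationships explicitly" could look like: a tiny scene graph of directed relation triples, serialized into an unambiguous prompt. The Relation type and to_unambiguous_prompt function are hypothetical illustrations, not part of any existing system:

    from dataclasses import dataclass

    # A scene graph makes spatial relations explicit, instead of leaving
    # them to be inferred from word order in a free-text prompt.
    @dataclass(frozen=True)
    class Relation:
        subject: str    # the entity the relation points from
        relation: str   # e.g. "riding", "on_top_of", "left_of"
        object: str     # the entity the relation points to

    def to_unambiguous_prompt(relations: list[Relation]) -> str:
        """Serialize the scene graph so that the direction of every
        relation is stated explicitly."""
        clauses = [
            f"the {r.subject} is {r.relation.replace('_', ' ')} the {r.object}"
            for r in relations
        ]
        return "A picture in which " + ", and ".join(clauses) + "."

    # "A horse riding an astronaut" becomes a directed edge that a
    # generator cannot silently flip without violating the graph.
    scene = [Relation(subject="horse", relation="on_top_of", object="astronaut")]
    print(to_unambiguous_prompt(scene))
    # -> A picture in which the horse is on top of the astronaut.

The point of the design is that who-is-on-whom lives in the data structure rather than in word order, so a downstream generator (or a checker comparing its output against the graph) has no ambiguity left to exploit.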
