42 Comments
Jan 25 · edited Jan 25 · Liked by Gary Marcus

Very nice examples, indeed. I've had the same experience with that total lack of actual understanding. What the AGI-is-nigh community doesn't get is that understanding token ordering or pixel ordering is to real understanding as understanding ink distribution is to understanding a book (and no, I did not think of that example myself; it comes from the late 19th-/early 20th-century Dutch psychiatrist and writer Frederik van Eeden, in studies that foreshadow our current understanding of the subconscious).

Meaning, as Uncle Ludwig has argued, comes from 'correct use'. The correctness of 'use' for tokens or pixels has only a very loose relation to the 'use' of language.

author

“ink distribution”! love that!


If you use it I would really like to see a reference to its source. Here is a quote: "One could just as easily set oneself the goal of deciphering the meaning of a writing by making an elementary analysis of it, by calculating the proportions of the size and number of letters, by microscopically examining the black and white paper fibers. Even if this is continued for centuries, with unlimited accuracy, I do not believe it will be achieved. It is better to read the book." Frederik van Eeden, Our Double-I, September 1886, reprinted in: Studies — Eerste Reeks. Frederik van Eeden was probably what we would today call a genius in many areas. I used him recently to illustrate the equivalence of memorising (often sought after) and data leakage (always bad) in https://ea.rna.nl/2023/12/26/memorisation-the-deep-problem-of-midjourney-chatgpt-and-friends/.
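To make van Eeden's point concrete: the "elementary analysis" he describes takes only a few lines of Python, and its futility is just as easy to see. A minimal sketch; the filename is a stand-in for any book-length text and is not from the comment:

```python
from collections import Counter

# Any book-length text will do; the filename here is hypothetical.
text = open("some_book.txt", encoding="utf-8").read()

# Van Eeden's "elementary analysis": tally the letters and compute
# their proportions with as much precision as you like.
letter_counts = Counter(ch.lower() for ch in text if ch.isalpha())
total = sum(letter_counts.values())
for letter, count in letter_counts.most_common(5):
    print(f"{letter}: {count / total:.2%}")

# No amount of precision in this tally says anything about what the
# book means. It is better to read the book.
```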

Jan 25 · Liked by Gary Marcus

I suspect that many don't realize that this "technology" is INHERENTLY flawed. It is not a glitch; it is the way it works. There is a neural network inside which makes generalizations from data, and those generalizations will always be wrong some of the time. It is not "early days"; it is very late (no fundamental change since at least 1990). The basis is wrong, and we can only use it where the result doesn't matter very much. Many practitioners don't get this. They believe it will grow up. No, we need a paradigm change.
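A toy illustration of why those generalizations are fallible by construction rather than a fixable glitch. This is a minimal sketch with scikit-learn; the architecture, training interval, and test points are arbitrary choices made only for the example:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Fit a small network to a sine wave, but only on the interval [0, 2*pi].
X_train = rng.uniform(0.0, 2 * np.pi, size=(500, 1))
y_train = np.sin(X_train).ravel()

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
net.fit(X_train, y_train)

# Inside the training range the interpolation is typically fine...
print(net.predict([[np.pi / 2]]))   # should be close to sin(pi/2) = 1.0

# ...outside it the network still answers confidently and is usually far
# off: the failure is not a bug, it is what fitting-to-data does.
print(net.predict([[3 * np.pi]]))   # sin(3*pi) = 0.0; the prediction rarely is
```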

author

💯


I think you are being too pessimistic and missing the obvious. Look at it this way: our species is in deep trouble (climate change, incurable diseases, etc); we create an artificial superintelligence; we then proceed to pepper the superintelligence with inane questions and silly image requests. Unsurprisingly, the superintelligence decides to mock us, as we richly deserve...

Jan 26 · Liked by Gary Marcus

Gary, I share your dismay that AI acolytes can fail to recognise how grave these "errors of discomprehension" really are. If a human were to make qualitative errors like these, we would diagnose a serious mental pathology. These hallucinations betray a deep disconnect from reality -- and that's not a figure of speech.

Surely the simple truth is that Large Language Models do not model the real world. They are models of how the world has been represented in large volumes of text. The text is all they've got (making LLMs the ultimate and purest Post-Modernists). And the text is biased, confined to those things that people care to write about.

Even "model" seems to be an overstatement given how new this stuff is, and the debates that rage with academic linguists like Chomsky. Is there a canonical model in LLM? (a genuine question).

An LLM is an experimental representation of a highly selective representation of the world.


I wondered if maybe the reason that the elephant prompts were not working was that elephants are so much larger than humans. So I entered the same prompt, but replaced "elephant" with "ninja." The image generator screwed that up too. The ninja was very obvious in all the images; in one, the ninja was the only figure in the foreground! Sometimes it made multiple ninjas.

I tried a similar one: "Draw a picture of a crowd in the square of a town. Hiding among the crowd is a ninja. Make sure it will be hard for the viewer to spot the ninja at first." This one also got it wrong; the ninja was visible immediately. In fact, in one of them the crowd was parted around the ninja so that they were extra easy to see.
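For anyone who wants to rerun these prompts outside the ChatGPT interface, a minimal sketch against the OpenAI images API; the model name and parameters are assumptions, since the commenter presumably used ChatGPT itself. The revised_prompt field is useful here because it shows the text that actually reached the image model:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

prompt = (
    "Draw a picture of a crowd in the square of a town. Hiding among the "
    "crowd is a ninja. Make sure it will be hard for the viewer to spot "
    "the ninja at first."
)

result = client.images.generate(model="dall-e-3", prompt=prompt, n=1)

# DALL-E 3 rewrites the prompt before rendering; comparing the two shows
# where the instruction to keep the ninja hard to spot gets lost.
print(result.data[0].revised_prompt)
print(result.data[0].url)
```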

Jan 25 · edited Jan 25 · Liked by Gary Marcus

Hi Gary, such 'glaringly' obvious errors stem from the same source as the word hallucinations - no first-hand, bodily experience of the world. Adding more modes (images, video, audio, etc.) isn't going to fix the problem. 'Multimodal' is just as clueless as the unimodal kind.

Coincidentally I wrote a recent paper called 'The Embodied Intelligent Elephant in the Room' for BICA'23, arguing for embodiment :) The title pays homage to Rodney Brooks' 'Elephants Don't Play Chess' paper.

Jan 27 · Liked by Gary Marcus

That's right. The robots don't know what it's like to see the world, so when prompted to do something that's based entirely on how humans see the world, their answers are of course unreal.

But yeah bro, let them drive cars.

Jan 28 · Liked by Gary Marcus

The way technologists speak about AI -- especially the soothing metaphors like "learn" and "neural" -- is training laypeople to over-estimate robots. I'm especially worried that people are led to think that robots "see" as we see.

Remember the 2016 work at Carnegie Mellon where psychedelic patterned eyeglass frames fooled face recognition neural networks? They spoofed target celebrities' faces with patterns that have nothing to do with facial features we recognise as such. https://dl.acm.org/doi/10.1145/2976749.2978392

One of the first errors of discomprehension, eh?

This reality gap to me is the *real* uncanny valley! We are irrationally frightened by robots that look and move like us, but what's really scary is they don't actually work like we do, not even remotely.
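The CMU glasses were a physical-world attack, but the underlying disconnect shows up just as readily in the digital setting. A minimal sketch of the standard fast-gradient-sign-method recipe in PyTorch; the classifier, images, and labels are placeholders, not the CMU setup:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.03):
    """Nudge every pixel a small step in the direction that increases the
    classifier's loss. The change is barely visible to a person but can
    flip the model's prediction, much as the patterned frames did."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Hypothetical usage: `classifier` is any image classifier returning logits,
# `x` a batch of normalized images, `y` the true class labels.
# x_adv = fgsm_attack(classifier, x, y)
```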


Stephen, bingo.

The danger lies in ignoring how natural things work on account of how they are structured, and instead abstracting them to laughable extremes and naively insisting the abstractions are equivalent.

Jan 25 · Liked by Gary Marcus

"Unfortunately, errors of discomprehension may soon be even more worrisome in a new context: war. OpenAI seems to be opening the door to military applications of their systems, and at least one well-funded startup is busy hooking up AI to military drones." - how so? If deep learning is "hitting the wall", if it is so poor at understanding the world, how can it be of any use for military? They will surely fail, right?

Jan 25

Comment deleted
Jan 25 · Liked by Gary Marcus

In the military they test new weapons and abandon projects that don't meet the requirements. Most experimental projects end like this (the robo-mule is one of them). Despite all the hype about killer robots from Boston Dynamics (like https://m.youtube.com/watch?v=y3RIHnK0_NE), none is used in the US military.

Jan 27 · edited Jan 27 · Liked by Gary Marcus

ChatGPT>>DALL-E gets lost on productively defining the term "camouflage" - a concept that's only readily comprehensible to a being that possesses visual perception as an active processing faculty. Presumably, AI image generation has similarly intractable difficulty with productively adapting concepts like "disguise" and "trompe l'oeil". https://upload.wikimedia.org/wikipedia/commons/c/cc/Escaping_criticism-by_pere_borrel_del_caso.png (Of course ChatGPT can supply a print-out of the precise dictionary definitions of those terms. But it has no more comprehension of their meaning than a Xerox machine. Does a dictionary have a big vocabulary? Trompe l'langue!)

DALL-E can "paint" elaborate pictures of ambitious scope with fine detail. But it does so without eyes, so to speak. The concept of "fooling the eye" requires both the presence of an eye (a visual input receiver-transmitter) and a nexus of perception to accurately process the signal input. That function also implies the capacity to distinguish signal from noise, which is the capacity that, when found in humans (and many other animals), is "fooled" by a skillfully constructed trompe l'oeil artistic image. Or by camouflage, which is used extensively by both animals and plants.

AI has none of those capacities. In that regard, the phrase "neural network" is a terribly inaccurate misnomer (that human bias!). AI is more like a card sorter (and condenser/synthesizer, if so instructed). Not only is it unable to think; it's unable to see (or hear, feel, etc.). Yes, neurons ultimately rely on a baseline of binary switching, just like computers. But once past the assembly-code level, analogies to biological neural networks fail. Computers are disembodied. It's my contention that embodiment is a precondition for autonomous motivation, which is a precondition for intelligent thought. A computer programmed to control a quadruped robot is still no more "embodied" than a desktop machine, tablet, or iPhone. It only looks that way to humans (that corporeal-animal bias of ours, again!).

That's part of the maddening fun of AI; it may not ever be able to reliably utilize the concept of trompe l'oeil, but it's easily able to generate those sorts of images inadvertently (as with the Escheresque "surfer girl" depicted in the post). With no conscious effort, because it's never using any conscious effort. When it's given a task that requires conscious effort, it founders. Oh so effortlessly. https://samkriss.substack.com/p/a-users-guide-to-the-zairja-of-the

Jan 25 · Liked by Gary Marcus

The problem isn't so much this or that technology, it is instead our outdated relationship with knowledge. It's that relationship which keeps generating new threats faster than we can figure out what to do about them.

https://www.tannytalk.com/p/our-relationship-with-knowledge

Example: nearly 80 years after Hiroshima we still don't have the slightest clue how to remove the threat presented by nukes. And while we've been failing to meet that challenge, we've been busy piling up more technologies presenting more threats.

Thinking about technological threats one by one is a loser's game so long as the knowledge explosion is generating new threats faster than we can conquer them. But this is in fact what almost all experts are doing, playing the loser's game of focusing on particular threats.

All these technological threats arise from a failure to update our knowledge philosophy from the past to adapt to a radically new environment. Species that fail to adapt to changing conditions typically don't do so well.


Humans do have a clue about how to remove the nuclear threat: dismantle all the bombs. The problem is obtaining the consensus agreement to dismantle them and keep them dismantled, by the People Who Matter. It's a primate will-to-power problem. One bad actor spoils the whole bunch.

I'd like to think that humanity could get to a point where even the most dismal egotists could comprehend the material advantages of shifting thought energy and resources away from projects of mass destruction. I'm up for it, even on my worst day. I think most of us are. But even if I'm right about that, "most of us" is not enough.


The issue is with DALL·E, not ChatGPT. ChatGPT can only do so much - it describes the scene as well as it can to DALL·E - but then it's up to DALL·E to do the job.

author

but ChatGPT is supposed to be able to see now, and it obviously fails at that


It does see, if you then give the image back to it. Have you tried?
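For concreteness, the round trip being described looks roughly like the sketch below; the model names and the prompt text are assumptions about the current API, not what the post actually used:

```python
from openai import OpenAI

client = OpenAI()

# Step 1: hand a text description to the image model (the part ChatGPT
# normally does on the user's behalf).
image = client.images.generate(
    model="dall-e-3",
    prompt="A savanna scene with an elephant so well hidden it takes a while to spot.",
    n=1,
)
url = image.data[0].url

# Step 2: give the rendered image back so the chat model can "look" at it
# and judge whether the elephant is actually concealed.
review = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Is the elephant in this image hard to spot?"},
            {"type": "image_url", "image_url": {"url": url}},
        ],
    }],
)
print(review.choices[0].message.content)
```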


Given a slightly different prompt, the results are way better.

https://x.com/FabioA/status/1750533735035130189?s=20

author

replied on X


Is there even an elephant in the image generated from your prompt? It's so much better and more cleverly concealed, I can't find it anywhere. I feel like a bug or amoeba trying to see through the trickery of an intelligence so far beyond my own, the prompt made such a difference.


It’s clear now that the elephant is attempting to hide, but you still don’t have to look carefully to notice the elephant (which is what the original prompt asked for).


oh man, this is funny.


VQA that requires real-world interpretation still has a long way to go.


Neural networks rely on "correspondence effects," and correspondence has no bearing on relations. If it were up to machines to determine causal relations, they'd no doubt say something about Super Bowl results causing stock market movements and vice versa.
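The Super Bowl example is easy to simulate: two series that have nothing to do with each other will often "correspond" closely anyway. A minimal sketch; the random walks are stand-ins for any pair of trending quantities:

```python
import numpy as np

rng = np.random.default_rng(0)
trials, strong = 1000, 0

for _ in range(trials):
    # Two independent random walks, standing in for, say, Super Bowl point
    # totals and a stock index: by construction neither influences the other.
    a = np.cumsum(rng.normal(size=250))
    b = np.cumsum(rng.normal(size=250))
    if abs(np.corrcoef(a, b)[0, 1]) > 0.5:
        strong += 1

# A sizeable fraction of pairs shows a "strong" correlation purely by
# chance: correspondence between curves, with no relation behind it.
print(f"{strong} of {trials} independent pairs have |r| > 0.5")
```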


I am really puzzled by the war applications mentioned in the last part, not least in light of recent news reports that the IDF was supposedly using some AI model or other to decide where to bomb. As with spam generation, the question arises: does it actually make any material difference? People have generated spam before, and they have indiscriminately bombed non-military targets before. It seems more as if the AI was sold to the military equivalent of a clueless manager as a boondoggle, much as it could have been "let's put our munitions supply on the blockchain" a few years ago.

Regarding discomprehension, an image of a three-legged Donald Trump praying in a church made the rounds just two days ago. These images and video clips stand in jarring contrast to the many social media comments claiming how realistic they are and that AI videos will put Hollywood out of business any moment now. Mate, are we looking at the same media?

Jan 26 · edited Jan 26

More worth looking into regarding military use of AI might be its use in the Ukraine-Russia war, where killer robots are in constant use and the technology is being pushed to its limits, including in electronic warfare and jamming. The autotargeting systems of FPV drones have developed to the point where, even when the signal is lost, a drone can continue to pursue its target on its own.

But I have no idea how much of this is new technology as opposed to new applications of existing technology, or what sort of AI is involved, or whether LLMs are involved at all, as there is not much technical coverage of any of it, assuming much information of that kind is even public. It is hard to tell how error-prone any of it is.

Jan 26 · Liked by Gary Marcus

Yes, self-guided missiles or drones make more sense, and I can see an arms race of capabilities and jamming in that field. I was mainly referring to the idea of using some clever model to predict where to bomb and where not, especially because I have read AI hype-men speculate that a future super-AGI will strategise war at a genius level incomprehensible to humans.

Ultimately, the key question there is not the model but how good the data are (GIGO; without good ground intelligence, the model itself is a boondoggle), or else the model is just a fig leaf: "It's not me who is responsible for exploding a school building or sending my troops into a trap; the AI told me to do it, and its ways are mysterious".


Our only choice is to use supervised machine learning instead of unsupervised learning to build a real world model of our minds - a tree of world languages and cultures - and that will take a lot of work from specialists in all fields.
