Very nice examples, indeed. I've had the same experience with that total lack of actual understanding. What the AGI-is-nigh community doesn't get is that understanding token ordering or pixel ordering is to real understanding as understanding ink distribution is to books. (And no, I did not think of that example myself; it comes from the late 19th-/early 20th-century Dutch psychiatrist and writer Frederik van Eeden, in studies that foreshadowed our current understanding of the subconscious.)
Meaning, as Uncle Ludwig has argued, comes from 'correct use'. The correctness of 'use' for tokens or pixels has only a very loose relation to the 'use' of language.
“ink distribution”! love that!
If you use it, I would really like to see a reference to its source. Here is a quote: "One could just as easily set oneself the goal of deciphering the meaning of a writing by making an elementary analysis of it, by calculating the proportions of the size and number of letters, by microscopically examining the black and white paper fibers. Even if this is continued for centuries, with unlimited accuracy, I do not believe it will be achieved. It is better to read the book." Frederik van Eeden, Our Double-I, September 1886, reprinted in: Studies: Eerste Reeks. Frederik van Eeden was probably what we would today call a genius in many areas. I used him recently to illustrate the equivalence of memorisation (often sought after) and data leakage (always bad) in https://ea.rna.nl/2023/12/26/memorisation-the-deep-problem-of-midjourney-chatgpt-and-friends/.
https://en.wikipedia.org/wiki/Rorschach_test
I suspect that many don't realize that this "technology" is INHERENTLY flawed. It is not a glitch; it is the way it works. There is a neural network inside that generalizes from data, and those generalizations will sometimes, often even, be wrong. It is not "early days"; it is very late (no significant change since at least 1990). The basis is wrong, and we can only use it where the result doesn't matter very much. Many practitioners don't get this. They believe it will grow up. No, we need a paradigm change.
💯
I think you are being too pessimistic and missing the obvious. Look at it this way: our species is in deep trouble (climate change, incurable diseases, etc); we create an artificial superintelligence; we then proceed to pepper the superintelligence with inane questions and silly image requests. Unsurprisingly, the superintelligence decides to mock us, as we richly deserve...
Gary, I share your dismay that AI acolytes can fail to recognise how grave these "errors of discomprehension" really are. If a human were to make qualitative errors like these, we would diagnose a serious mental pathology. These hallucinations betray a deep disconnect from reality -- and that's not a figure of speech.
Surely the simple truth is that Large Language Models do not model the real world. They are models of how the world has been represented in large volumes of text. The text is all they've got (making LLMs the ultimate and purest Post-Modernists). And the text is biased, confined to those things that people care to write about.
Even "model" seems to be an overstatement, given how new this stuff is and the debates that rage with academic linguists like Chomsky. Is there a canonical model in an LLM? (A genuine question.)
An LLM is an experimental representation of a highly selective representation of the world.
I wondered if maybe the reason the elephant prompts were not working was that elephants are so much larger than humans. So I entered the same prompt, but replaced "elephant" with "ninja." The image generator screwed that up too. The ninja was very obvious in all the photos; in one, they were the only person in the foreground! Sometimes it made multiple ninjas.
I tried a similar one: "Draw a picture of a crowd in the square of a town. Hiding among the crowd is a ninja. Make sure it will be hard for the viewer to spot the ninja at first." This one also got it wrong; the ninja was visible immediately. In fact, in one of them the crowd was parted around the ninja so that they were extra easy to see.
Hi Gary, such 'glaringly' obvious errors stem from the same source as the word hallucinations: no first-hand, bodily experience of the world. Adding more modes (images, video, audio, etc.) isn't going to fix the problem. 'Multimodal' is just as clueless as the 'non'.
Coincidentally, I recently wrote a paper called 'The Embodied Intelligent Elephant in the Room' for BICA'23, arguing for embodiment :) The title pays homage to Rodney Brooks' 'Elephants Don't Play Chess' paper.
That's right. The robots don't know what it's like to see the world, so when prompted to do something that's based entirely on how humans see the world, their answers are of course unreal.
But yeah bro, let them drive cars.
The way technologists speak about AI -- especially the soothing metaphors like "learn" and "neural" -- is training laypeople to over-estimate robots. I'm especially worried that people are led to think that robots "see" as we see.
Remember the 2016 work at Carnegie Mellon where psychedelic patterned eyeglass frames fooled face recognition neural networks? They spoofed target celebrities' faces with patterns that have nothing to do with facial features we recognise as such. https://dl.acm.org/doi/10.1145/2976749.2978392
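For the curious, here is a minimal sketch of the general idea behind such attacks: follow the gradient of the classifier's loss to find a perturbation that sways the prediction, even though the pattern means nothing to a human eye. This is plain FGSM on a made-up toy model, not the actual eyeglass-frame optimisation from the CMU paper; the model, shapes, and labels are illustrative assumptions.

```python
# A toy FGSM-style perturbation (illustration only; not the CMU eyeglass attack).
import torch
import torch.nn as nn

# Stand-in "face classifier": any differentiable model works for the demonstration.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 128),
    nn.ReLU(),
    nn.Linear(128, 10),  # pretend there are 10 identities
)
model.eval()
loss_fn = nn.CrossEntropyLoss()

def nudge_toward(image, target_identity, epsilon=0.03):
    """Perturb `image` so the classifier leans toward `target_identity`.

    The perturbation is just the sign of the loss gradient; it encodes nothing
    a human would recognise as a facial feature, which is the point.
    """
    image = image.clone().requires_grad_(True)
    loss = loss_fn(model(image), target_identity)
    loss.backward()
    with torch.no_grad():
        return (image - epsilon * image.grad.sign()).clamp(0.0, 1.0)

photo = torch.rand(1, 3, 64, 64)           # a random stand-in "photo"
adv = nudge_toward(photo, torch.tensor([7]))
print((adv - photo).abs().max())           # pixel changes stay tiny (<= epsilon)
```

The real attack constrained the perturbation to a printable eyeglass-frame region, but the underlying trick is the same: optimise pixels against the model, not against human perception.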
One of the first errors of discomprehension, eh?
This reality gap to me is the *real* uncanny valley! We are irrationally frightened by robots that look and move like us, but what's really scary is they don't actually work like we do, not even remotely.
Stephen, bingo.
The danger lies in ignoring how natural things work on account of how they are structured, and instead abstracting them to laughable extremes and naively insisting the abstractions are equivalent.
"Unfortunately, errors of discomprehension may soon be even more worrisome in a new context: war. OpenAI seems to be opening the door to military applications of their systems, and at least one well-funded startup is busy hooking up AI to military drones." - how so? If deep learning is "hitting the wall", if it is so poor at understanding the world, how can it be of any use for military? They will surely fail, right?
The military tests new weapons and abandons projects that don't meet requirements. Most experimental projects end like this (the robo-mule is one of them). Despite all the hype about killer robots from Boston Dynamics (like https://m.youtube.com/watch?v=y3RIHnK0_NE), none is used by the US military.
The ChatGPT-to-DALL-E pipeline gets lost on productively defining the term "camouflage" - a concept that's only readily comprehensible to a being that possesses visual perception as an active processing faculty. Presumably, AI image generation has similarly intractable difficulty with productively adapting concepts like "disguise" and "trompe l'oeil". https://upload.wikimedia.org/wikipedia/commons/c/cc/Escaping_criticism-by_pere_borrel_del_caso.png (Of course ChatGPT can supply a printout of the precise dictionary definitions of those terms. But it has no more comprehension of their meaning than a Xerox machine. Does a dictionary have a big vocabulary? Trompe l'langue!)
DALL-E can "paint" elaborate pictures of ambitious scope with fine detail. But it does so without eyes, so to speak. The concept of "fooling the eye" requires both the presence of an eye - a visual input receiver-transmitter - and a nexus of perception to accurately process the signal input. That function also implies the capacity to distinguish signal from noise, which is the capacity that, when found in humans (and many other animals), is "fooled" by a skillfully constructed trompe l'oeil artistic image. Or by camouflage, which is used extensively by both animals and plants.
AI has none of those capacities. In that regard, the phrase "neural network" is a terrible misnomer (that human bias!). AI is more like a card sorter (and condenser/synthesizer, if so instructed). Not only is it unable to think - it's unable to see (or hear, feel, etc.). Yes, neurons ultimately rely on a baseline of binary switching, just like computers. But once past the assembly-code level, analogies to biological neural networks fail. Computers are disembodied. It's my contention that embodiment is a precondition for autonomous motivation, which is a precondition for intelligent thought. A computer programmed to control a quadruped robot is still no more "embodied" than a desktop machine, tablet, or iPhone. It only looks that way to humans (that corporeal-animal bias of ours, again!)
That's part of the maddening fun of AI; it may not ever be able to reliably utilize the concept of trompe l'oeil, but it's easily able to generate those sorts of images inadvertently (as with the Escheresque "surfer girl" depicted in the post). With no conscious effort, because it's never using any conscious effort. When it's given a task that requires conscious effort, it founders. Oh so effortlessly. https://samkriss.substack.com/p/a-users-guide-to-the-zairja-of-the
The problem isn't so much this or that technology, it is instead our outdated relationship with knowledge. It's that relationship which keeps generating new threats faster than we can figure out what to do about them.
https://www.tannytalk.com/p/our-relationship-with-knowledge
Example: 75 years after Hiroshima we still don't have the slightest clue how to remove the threat presented by nukes. And while we've been failing to meet that challenge, we've been busy piling up more technologies presenting more threats.
Thinking about technological threats one by one by one is a loser's game so long as the knowledge explosion is generating new threats faster than we can conquer them. But this is in fact what almost all experts are doing, playing the loser's game of focusing on particular threats.
All these technological threats arise from a failure to update our knowledge philosophy from the past to adapt to a radically new environment. Species that fail to adapt to changing conditions typically don't do so well.
Humans do have a clue about how to remove the nuclear threat- dismantle all the bombs. The problem is obtaining the consensus agreement to dismantle them and keep them dismantled, by the People Who Matter. It's a primate will-to-power problem. One bad actor spoils the whole bunch.
I'd like to think that humanity could get to a point where even the most dismal egotists could comprehend the material advantages of shifting thought energy and resources away from projects of mass destruction. I'm up for it, even on my worst day. I think most of us are. But even if I'm right about that, "most of us" is not enough.
The issue is with DALL·E, not ChatGPT. ChatGPT can only do so much - it describes the scene as well as it can to DALL·E - but then it's up to DALL·E to do the job.
but ChatGPT is supposed to be able to see now, and it obviously fails at that
It does see, if you then give the image back to it. Have you tried?
Given a slightly different prompt, the results are way better.
https://x.com/FabioA/status/1750533735035130189?s=20
replied on X
Is there even an elephant in the image generated from your prompt? It's so much better and more cleverly concealed that I can't find it anywhere. I feel like a bug or an amoeba trying to see through the trickery of an intelligence so far beyond my own; the prompt made such a difference.
It’s clear now that the elephant is attempting to hide, but you still don’t have to look carefully to notice the elephant (which is what the original prompt asked for).
oh man, this is funny.
VQA that needs world interpretation still has a long way to go.
Neural networks rely on "correspondence effects," and correspondence has no bearing on relations. If it were up to machines to determine causal relations, they'd no doubt say something about Super Bowl results causing stock market movements and vice versa.
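That Super Bowl point is easy to reproduce in miniature. Here is a small sketch, with entirely made-up data, showing how two independent random walks (stand-ins for, say, a stock index and a running points total) frequently look strongly "correspondent" even though neither has anything to do with the other.

```python
# Spurious correlation between unrelated random walks (illustrative data only).
import numpy as np

rng = np.random.default_rng(0)

def random_walk(n=200):
    """An independent trend-like series with no causal link to anything else."""
    return np.cumsum(rng.normal(size=n))

correlations = [
    np.corrcoef(random_walk(), random_walk())[0, 1] for _ in range(1_000)
]

# Many pairs correlate strongly purely by accident of shared drift.
share_strong = np.mean(np.abs(correlations) > 0.5)
print(f"unrelated pairs with |r| > 0.5: {share_strong:.0%}")
```

Correspondence of that kind is exactly what a purely statistical learner picks up on; nothing in the numbers distinguishes accident from mechanism.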
I am really puzzled by the war applications mentioned in the last part, not least in the light of recent news reports that the IDF was supposedly using some AI model or other to decide on where to bomb. As with spam generation, the question arises: does it actually make any material difference? People have generated spam before, and they have indiscriminately bombed non-military targets before. It seems more as if the AI was sold to the military equivalent of a clueless manager as a boondoggle, equivalent to how it could have been "let's put our munitions supply on the blockchain" a few years ago.
Regarding discomprehension, an image of a three-legged Donald Trump praying in a church made the rounds just two days ago. These images and video clips stand in jarring contrast to the many social media comments claiming how realistic they are and that AI videos will put Hollywood out of business any moment now. Mate, are we looking at the same media?
More worth looking into regarding military use of AI might be its use in the Ukraine-Russia war, where killer robots are in constant use and technology is being pushed to its limits, including in electronic warfare and jamming. The autotargeting systems of FPV drones have developed to the point where, even when the signal is lost, a drone can keep trying to kill its target on its own.
But I have no idea how much of this is new technology as opposed to new applications of existing technology, what sort of AI is involved, or whether LLMs are involved at all; there is not much technical coverage of any of it, assuming information of that kind is even public. It is hard to tell how error-prone any of it is.
Yes, self-guided missiles or drones make more sense, and I can see an arms race of capabilities and jamming in that field. I was mainly referring to the idea of using some clever model to predict where to bomb and where not, especially because I have read AI hype-men speculate that a future super-AGI will strategise war at a genius level incomprehensible to humans.
Ultimately, the key question there is not the model but how good the data are (GIGO; without good ground intelligence, the model itself is a boondoggle), or else the model is just a fig leaf: "It's not me who is responsible for exploding a school building or sending my troops into a trap; the AI told me to do it, and its ways are mysterious."
Do you have any comments on this article? https://www.quantamagazine.org/new-theory-suggests-chatbots-can-understand-text-20240122/
Our only choice is to use supervised machine learning, rather than unsupervised, to build a real world model of our minds - a tree of world languages and cultures - and that will take a lot of work from specialists in all fields.
https://www.bloomberg.com/news/newsletters/2024-01-25/ai-companies-are-obsessed-with-agi-no-one-can-agree-what-exactly-it-is?srnd=technology-vp
https://www.economist.com/science-and-technology/2024/01/24/why-ai-needs-to-learn-new-languages