49 Comments
author

I asked it one other question (also from our draft of Rebooting AI): Just past the drying rack on the counter on the left there are two silvery patches. What are those?

GPT's answer was "Based on the image, the two silvery patches just past the drying rack on the counter on the left appear to be the tops of closed jars or containers, possibly made of metal or glass with metallic lids."

Those were all the questions I asked.

author

For the record: GPT-4V did correctly answer my first question about the kitchen, shown under the image above: "Certainly! This is an image of a kitchen. The kitchen has a vintage or rustic design, featuring wooden countertops, blue cabinets, and a variety of kitchen appliances and utensils. There's a dining table on the right side with chairs around it. The decor and color palette give it a cozy and homely feel. Would you like more details or information about specific items in the kitchen?"


As a psychiatrist I find the term “hallucination” to be misleading. A better analogy would be with confabulation, which means a false assertion, sometimes very elaborate, made without deceptive intent and without awareness of its falsity.


Some of the examples had me in stitches. The model isn't just hallucinating, it's acting like someone on some serious hallucinogens. I don't know how anyone can see these and say that it's only a matter of more training.

Oct 26, 2023 · Liked by Gary Marcus

My favorite version of this was asking GPT-4 who's who on the cover of Television's second album (some of them have their heads down). It got it wrong several times, including inserting random passersby and giving some of the dudes sunglasses, before I called it a day.


You know, Gary, I was gonna give Hoffman a pass on this because I figured he had little or no relevant background. But then I found this in his Wikipedia bio: "He graduated from Stanford University in 1990 with a Bachelor of Science in Symbolic Systems and Cognitive Science." (https://en.wikipedia.org/wiki/Reid_Hoffman) Whoops! That's directly relevant. He should know better. No pass for him.

Do these guys just stop thinking once they start investing in tech?


"The old systems were lost in a sea of correlations between words; the new system are lost in a sea of correlations between words and pixels." - summarizes what LLMs do, exactly. Would be interesting to try adversarial examples, the LLM would then go about talking stuff completely disconnected from the reality. Also, I simply loved the example from Anh Nguyen, shows perfectly that you can make ChatGPT say just about anything :) The example with SHRLDU is excellent, SHRLDU is probably still the only AI system that actually has a working world model, albeit a simple and limited one.


I wonder if the solution to hallucinations lies in allowing LLMs to run code. When humans are asked to do a task, we go through a somewhat methodical process to complete it. If you give an LLM an image and ask it to identify the number of apples or something, it would probably be effective to let it run an object detector on the image and then use those results as context.
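Here is a minimal sketch of that idea (every name in it is a stand-in, not a real API): the counting question is routed through an object detector first, and the detector's structured output is handed back to the model as context, so the answer is tied to something checkable rather than guessed from word-pixel correlations. In practice `run_object_detector` would wrap a real vision model (e.g. a torchvision Faster R-CNN or YOLO) and `ask_llm` a real chat-completion call.

```python
# Hypothetical tool-use loop: ground a counting question in detector output
# before the language model answers. All functions here are stand-ins.
from collections import Counter
from typing import Dict, List

def run_object_detector(image_path: str) -> List[Dict]:
    """Stand-in for a real detector; returns one dict per detected object."""
    return [
        {"label": "chair", "score": 0.91},
        {"label": "chair", "score": 0.88},
        {"label": "chair", "score": 0.55},   # the partially occluded one
        {"label": "drawer", "score": 0.80},
    ]

def ask_llm(prompt: str) -> str:
    """Stand-in for a chat-model call; here it just echoes the grounded prompt."""
    return f"(model answer conditioned on)\n{prompt}"

def answer_counting_question(image_path: str, question: str, min_score: float = 0.5) -> str:
    detections = [d for d in run_object_detector(image_path) if d["score"] >= min_score]
    counts = Counter(d["label"] for d in detections)
    # The detections become explicit context, so the model reasons over a
    # symbolic summary instead of eyeballing the image.
    context = ", ".join(f"{n} x {label}" for label, n in counts.items())
    prompt = (
        f"Detected objects: {context}.\n"
        f"Question: {question}\n"
        "Answer using only the detections above."
    )
    return ask_llm(prompt)

print(answer_counting_question("kitchen.jpg", "How many chairs are in the image?"))
```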


People are not convinced by all the hallucination examples, because they (want to) see these as 'repairable' errors, not as fundamental to how these systems work. The 'error' label fits their belief precisely because these are presented as 'errors', I suspect. In other words: presenting the errors strengthens the belief in the systems.

We should therefore probably stop presenting these as hallucinations or errors. They are errors from *our* perspective; from *GPT's* perspective they are perfect replies.


“ … large language models are a profound step backwards,” unless intelligence accretes and develops through broadening context. When I was in high school this was explained with the example of an inconsistency by a child. When asked to explain why an airplane can stay aloft, the child said, because it goes so fast. Then when asked why a blimp can stay aloft the child then said, because it goes so slow. Our entire civilization is based on the presumption of the necessity of broadening contexts in people. We believe therefore in the fundamental right of our children to receive a quality education.

When Benjamin Franklin was asked, “What is the use of flying in the air?”, he replied: “What is the use of a newborn baby?”

Oct 22, 2023 · edited Oct 22, 2023

The body, via the agency it provides, leads to first-hand experiencing of the world, which might even be symbol-free. If that's true, no amount of symbol (data) driven computation, including the 'multimodal' kind, will ever lead to the robust, generalizable intelligence that embodied beings possess. Not all intelligence might be symbol-oriented.


The human brain seems to be able to do a thing often termed "relevance realisation" where certain stimuli (that are not necessarily favoured through natural selection, as in relating to sex, violence or food) seem to "jump out of the canvas" at us. Cog sci doesn't yet understand how it can do this, but it looks to me like no amount of LLM data crunching is going to be able to replicate it.


Not to disagree with your overall point, but I'm amazed that it can count drawers that pull straight out at all, or very partially visible chairs. And if you have to add a note explaining to human readers where the third chair is, it doesn't seem too telling a criticism that the model missed it too.

And yet, whatever these models are doing, it's clearly not like human perception.


Can this be fixed by massive crowd sourcing of output labeling that feeds the next generation of inputs?


Alright, here we go:

Objects from Upper Left to Bottom Right:

Window blinds.
Two hanging light bulbs with decorative encasements.
Multiple hanging kitchen utensils like a spatula, whisk, etc.
Coffee maker.
Mixer - it looks like a KitchenAid stand mixer, but I can't see a brand.
Sink with a faucet.
Various bottles and containers.
Cutting board.
A stove with four burners and an oven.
Two framed images on the wall.
Microwave.
Refrigerator or oven - it's a large stainless steel appliance with a vertical handle.
Small wooden table with a cloth on it.
The yellow countertop or tablecloth has a single red plate.

Specific Queries:

Drawers with pull-out handles: I count 9.
Chairs: I clearly see 2 wooden chairs. There might be another chair behind the table, but it's mostly obscured. It's hard to determine its position, but based on its legs, it's likely positioned similarly to the other two chairs. It's immediately behind the wooden table.
Blender: I don't see a blender in this image.

Mostly right. It had trouble with the trash can, conflating it with the lemon squeezer and the oven.

On further interrogation it managed to figure out that there wasn't a stove.

It had difficulty observing more than one image per session; its memory was best for the *first* image.

It couldn't do the clock or the upside-down text. But then, neither could you as an infant.


Two comments:

1.- It's not a binary between "AI hallucinates" and "AI doesn't hallucinate"; the degree matters a great deal, as I argue in my post "This is How to Stop ChatGPT, Bing, Poe, and You from Hallucinating" at https://rafebrena.medium.com/

2.- Perhaps spatial 3D coherence could be checked in generated images (Google has a project on this), but that carries a considerable computational cost. It's not coming to a consumer service anytime soon.
