Marcus on AI

I generally object to the use of the word "hallucination" to describe how LLMs work - which would imply that when they do work correctly they mimic a perfectly normal human, but every now and then they suffer an aberration.

LLMs are effectively a statistical model of human written language. As such they are best described by a phrase repeated by Mark Twain (who credited it to Benjamin Disraeli - "There are Lies, Damn Lies, and Statistics".

Effectively, LLMs are providing us with statistical word sequences, untested with a true understanding of logic or the world around us.

Expand full comment

Reply (7)

Zahid Bashir

confabulation

Expand full comment

Reply (3)

jibal jibal

The problem with both of those words is that they suggest mental states, but LLMs don't have mental states.

Expand full comment

that we know of.....😎

Expand full comment

jibal jibal

No, they don't have mental states. Suggesting otherwise shows a complete failure to understand the technology.

Expand full comment

I believe it is spelled “crackbotulation”

LLMs “crackbotulate” because that’s what crackbots do

Expand full comment

better!

Expand full comment

Joe

Aug 12

Artificial Information (not Intelligence)

Expand full comment

Paul Jurczak

LLMs are parroting tidbits from a huge text corpus. Often they happen to be right. Too frequently they are wrong. They are too primitive to hallucinate.

Expand full comment

I hate that word too... it implies some rudimentary consciousness which these dumb programs clearly do not have....

Expand full comment

Andreas Schneider

Jun 9

In other words, they’re bullshit

Expand full comment

12345

May 16

your summary discounts the effect of their training, which these days is the bulk of the 'ingredients' that are baked into them.

Expand full comment

Shandon F.

And yet, the quote was directed at the very human predilection to believe what we want to believe. So, what to make of the humans that proceed without a true understanding of logic or the world around them—or, much worse, a fine understanding but a desire to manipulate to their own advantage?

Expand full comment

khimru

Average? Normal? Typical?

The chilling truth is that LLMs have shown us the obvious fact that “average human” behaves like LLMs do, most of the time: they also regurgitate words without thinking and, in fact, all marketing tricks exploit that propensity.

But humans, usually, have some area where they are experts – it could be the ability to cook delicious food, or write programs or do many other things, depending on human.

And fallacy around LLMs is related to the fact that after we have managed to make them act like average humans acts in “average” situation, when stakes are low… somehow that should promote it to the entirely different mode of operation that happens when expert actually tries to do expert things… it doesn't work that way!

Expand full comment

Shandon F.

I'm not sure I understand your comment 100 percent but I think we basically agree, with the caveat that it sounds like you're giving AI a dominion that it doesn't have, i.e. "'average human' behaves like LLMs do"—it's the inverse, since humans precede AI and AI is trained solely on human-created knowledge. That's the broader point, that we're attaching all of this importance to AI "hallucinations" or, in non-jargon, lies. The question is intent—if it's a mistake of the AI's formula or if it's willful misinformation for the AI's own benefit...which is currently only known to be a human trait.

Expand full comment

khimru

What I'm saying is that AI does do things in a way humans NORMALLY do: receive words, say some other words that are attached to these words in their subconscious with zero fact-checking or understanding. Without even “turning on consciousness,” as they say. Even article itself says: “as humans sometimes, when well motivated, do”.

But then… the fallacy: if we already have covered that 90% of what humans do… how hard would it be to make AI the remaining 10%, too? Let's make LLMs bigger and consciousness would magically appear!

But that is not happening – and we knew it from the beginning.

Consciousness and subconscious are not just different words, they are using physically different mechanisms in a human brain (as far as we know, anyway).

As for “intent”… dogs and cats don't have consciousness and couldn't program, however… they sure as hell have some goals and intents.

Expand full comment

Shandon F.

If the sole focus is to get AI to 100% parity with human consciousness, then yes, we've all been lied to. But my issue is that Marcus' line of dissent is too self-centered—it's an academic engineer's view of the *project plan* of AI. So, "AI is hallucinating and that means that we've been lied to" is technically true. But it's an argument made as objective truth along a timeline of advancement that quickly renders it obsolete at best and willfully misleading at worst. I just wish that, rather than these aloof "I sat down with Harry Shearer and we talked about those silly hallucinations" pieces, we focused on the actual significance of that last 10% and what impact trying to get there has had and what impact it will have as we approach 99%. And, importantly, I just wish there was more context and attention given to the "as humans sometimes, when well-motivated, do," which is holding an awful lot of water for humans as rational actors.

Expand full comment

Martin Machacek

The main problem with LLMs is that current models cannot provide any information about their confidence in the answer … or simply refuse to provide an answer if there is not enough information. A human (unless they are a bullshitter) may put qualification on the information they provide (something like AFAIK). LLMs provide definitive and (especially to non-experts) plausibly sounding statements which may sometime be entirely wrong. For example a scientist (unless malicious and stupid) won’t fabricate an invalid link to a supporting material.

The remaining 10% (or so) of making LLMs be comparable to skilled humans is going to be hard, because it inevitably requires skills like generalization and other weighting of facts than frequency of occurrence :).

Expand full comment

Which country was the actor Albert Einstein born in?

Expand full comment

Reply (2)

Mikhail Mimic

USA: https://en.m.wikipedia.org/wiki/Albert_Brooks

Expand full comment

TheAISlop

This is one of those prompts I've built to cut to the core of the issue. LLMs are built on probabilities. So even if given a hint like "actor" thet ignore the hint and give more weight to "Albert Einstein". Now find a Michael Jordan who wasn't any NBA legend and see what happens.

Expand full comment

Reply (2)

Eduardo Rodriguez

Jun 28

You mean https://en.wikipedia.org/wiki/Michael_I._Jordan, of course

Expand full comment

But you must admit that he played the role of wild haired, absent minded professor perfectly.

Expand full comment

Had he not played the part of a physics genius, he could easily have played in Spaghetti Westerns.

Expand full comment

Swissghanistan obviously....😆

Expand full comment

Notorious P.A.T.

We just need LLMs to go up to eleven.

Expand full comment

Reply (4)

A.J. Sutter

Stonehenge is a pretty good metaphor for LLM performance vs hype, now that you mention it.

Expand full comment

MarkS

But they can't carry the one. None of them have managed to learn the rules of arithmetic.

Expand full comment

If they can’t even carry a measly little 1, how can we ever expect them to carry civilization?

Expand full comment

Go to 11?

They can go to LL as far as I am concerned.

Expand full comment

Joe

Aug 12

That'll be another $Trillion please!

Expand full comment

Jan Steen

I always wonder how much of the confident, authoritarian tone (encyclopedia-like as you correctly call it) and the grovelling apologetic one when you point out a mistake has been hard-coded. When ChatGPT says "I apologize", is this really something that it came up with spontaneously or is it, as I suspect, something that programmers added?

Who is the 'I' that replies to you in a conversation with an LLM? If my suspicion is correct, this is all rather deceptive, isn't it? You are made to believe that you are talking to an individual who can apologize. It is Eliza on steroids.

Expand full comment

jibal jibal

The tone is conditioned by the vendor.

Expand full comment

Yes, OpenAI prefers the sycophAIntic tone “You are the smartest person who has ever walked on the earth — or any other planet in the universe”

Expand full comment

Gabriel Risterucci

Little typo there: "But LLMs still cannot - and on their may own may never be able to — reliably do even things that basic."

Otherwise great piece. It really puts into words the "it's just a statistical machine" position.

Expand full comment

Joy in HK fiFP

The NYT had an article today about this lacklustre showing by the latest crop of so-called AIs. This article was quite different from the gushing fan-fawning from the resident Ai-promotor, Kevin Roose.

I replied to a commenter questioning the use of "hallucinations" as a descriptor, and suggested the better word was B*llsh*t," and recommended your article from Feb. to them. If it gets past the NYT censor, I think they will be pleased with what you have to say.

Expand full comment

Gerben Wierda

May 5Edited

"LLMs don’t actually know what a nationality, or who Harry Shearer is; they know what words are and they know which words predict which other words in the context of words." It's even worse. They don't even know anything about 'words'. They operate on *tokens* which are mostly meaningless fragments.

I have experienced that explaining 'tokens' clearly to people (e.g. https://youtu.be/9Q3R8G_W0Wc — video — or https://erikjlarson.substack.com/p/gerben-wierda-on-chatgpt-altman-and — text) makes it a lot easier for people to grasp that there is no 'understanding'. All the explanations that use 'word' trigger in humans 'meaningful character sequence'. But the LLMs work on 'meaningless character sequences'. So, explaining with using words is a (Pratchett) "Lie to Children". It is superficially OK, but it isn't true and at some point the lie starts to bite you.

These systems do not make errors (https://ea.rna.nl/2023/11/01/the-hidden-meaning-of-the-errors-of-chatgpt-and-friends/). Every continuation they produce is actually correct from the viewpoint of token statistics. They thus cannot discriminate between right and wrong continuations based on actual meaning/understanding.

The problem is that 'an approximation of the result of (human) understanding' by using (non-understanding) token-statistics isn't 'understanding'. These approximations can be good or bad, but the essence why these issues are unsolvable is *because* it is *fundamentally* an approximation based on weakly related statistics.

Expand full comment

Reply (2)

Gerben Wierda

From that textual explanation (2023):

The question of course is: does it matter that this is how LLMs work? They're still very impressive systems (I agree), after all.

The answer is: yes it matters. The systems are impressive, but we humans are impressionable. We see results that reflect our own qualities (such as linguistic quality), but in reality this quantity has its own quality. And in this case, no amount of scaling is going to solve the fundamental limitations. Simply said: guessing the outcome of logical reasonings based on word-fragment statistics is not really going to work.

Expand full comment

Peter Dorman

May 7

This is important: when we say an LLM commits an "error" or a "hallucination", what do we mean by that? I think it means its output contradicts an overarching rule that limits the space we designate as plausible. But to avoid such errors means being fed all the rules, and there are too many, and in any case that would mean having a process to scan potential rules, with all the opportunities for error in that. Is this all a dead end?

Of course, the practical question is why investors are willing to place immense bets on the performance and profitability of such systems when there is little logic or evidence to justify them.

Expand full comment

Matt Kolbuc

I've found the hallucinations are so totally unreliable, it's mind boggling how tens of billions of dollars with this much hype were poured into the technology. Developing a NLU engine (free and open source btw: https://cicero.sh/sophia/) for example..

At one point was simply trying to sanitize a list of words I curated. Simply sending batch requests into LLMs saying, "here's 20 words, reply with each word along with a 1 or 0 beside it to identify whether or not it's part of conversational English, or a 0 if it's a non-English word / typo". Ensured good, solid prompt.

It couldn't even do that with reliable confidence, I had to throw the wordlist out. It did get the majority right, but would then mark words like "run", "about", and "what" with a 0, while marking words like "svidnteezigpq" with a 1.

Considering these things are called Large Language Models, keyword being language, and they can't even consistently tell me what is and isn't an English word, that means these things are relegated to the category of pure amusement.

Expand full comment

Danko Nikolic

I like this.

Expand full comment

Dakara

https://www.mindprison.cc/p/intelligence-is-not-pattern-matching-perceiving-the-difference-llm-ai-probability-heuristics-human

Great post! LLMs make it difficult for most people to understand their nature. No matter how many times examples such as this are posted, typically people will counter with "but humans hallucinate too" or something similar.

FYI, I published something this morning specifically for that type of rebuttal that I hope adds more clarity.

"It is such a magnificent pretender of capability. It is just good enough to fully elicit the imagination of what it might be able to do, but never will."

Expand full comment

LLM Destiny

May 5Edited

Gary absolutely will own a pet chicken named Henrietta, he cannot escape destiny.

Expand full comment

Jan Blok

LLM's model language, not the world...

Expand full comment

Sufeitzy