Excellent. Again.
"Broad, shallow intelligence (BSI) is a mouthful"
Let's simplify the acronym by shortening it. I think that BS covers the issue nicely. At least for the present.
LLMs don't have intelligence. They are still just programs that can answer questions posed in ordinary language. The answers are not reliable. Because they don't have internal structured models, they cannot tell whether their answers are truthful or not. They are not hallucinating. It's a typical case of anthropomorphism. Even young children know whether they are making up stories or not.
LLMs don't have the capability to reflect on their own operation.
Typical LLM products rely on case-by-case fixes to deal with known hallucination cases. They are like Amazon's automated shops, which are maintained by remote humans.
LLMs are hugely wasteful. They consume huge amounts of electricity and water for frivolous queries of questionable value.
We need to recognize that LLMs are just tools to generate draft texts under proper constraints.
A fraction of these investments could be directed to proper academic research into human intelligence and knowledge, for greater value. By neglecting proper research, we are hurting ourselves.
"They are not hallucinating. It's a typical case of anthropomorphism."
Yes, this. Even Gary's preferred "confabulation", while an improvement, is still anthropomorphized. It might work if we called every single piece of LLM output "confabulation", because the LLM is always making it up as it goes along, one token at a time, ignorant of where it's going and sometimes ignorant of where it was. The best human word we have is BSing. When an LLM gives what looks like a perfect answer to a question, it's BSing. It just so happens that those next-token probability distributions gave it a string of next tokens that produced what looks to us like a fantastic answer - something that happens more and more often as the technology improves. But no streak of fantastic answers precludes the next one being dumb as dirt, because it's just plucking next tokens from probability distributions, one at a time.
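To make the "one token at a time" point concrete, here is a toy sketch (my own illustration, not the mechanics of any real product): the "model" is just a hand-written table of next-token probabilities, but the generation loop has the same shape - sample a token, append it, repeat - with no check anywhere on whether the output is true.

```python
# Toy "language model": a hand-written table of next-token probabilities.
# Purely illustrative - a real LLM conditions on the whole prefix via a neural
# network, but the generation loop is the same: pluck one token at a time.
import random

NEXT_TOKEN_PROBS = {
    "<start>": {"The": 0.6, "A": 0.4},
    "The": {"answer": 0.5, "cat": 0.5},
    "A": {"cat": 0.7, "answer": 0.3},
    "answer": {"is": 0.9, "was": 0.1},
    "cat": {"sat": 0.8, "is": 0.2},
    "is": {"42.": 0.5, "here.": 0.5},
    "was": {"42.": 1.0},
    "sat": {"here.": 1.0},
}

def generate(max_tokens=6):
    tokens, prev = [], "<start>"
    for _ in range(max_tokens):
        dist = NEXT_TOKEN_PROBS.get(prev)
        if dist is None:
            break
        choices, weights = zip(*dist.items())
        prev = random.choices(choices, weights=weights)[0]  # sample next token
        tokens.append(prev)
        if prev.endswith("."):  # crude stopping rule
            break
    return " ".join(tokens)

print(generate())  # e.g. "The answer is 42." - plausible, never fact-checked
```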
I still think 'approximating' is the best word. It is 'approximating the results of understanding without actually having understanding'. The 'errors' aren't errors. They are the approximation doing what it must do. It's simply the best approximation the system is capable of at that point.
I like that. For completeness I'll just add that, not only are the "errors" not really errors, the "successes" aren't really successes either. They are approximations that, by our understanding, count as good answers.
I know you don't disagree, it's just something I like to emphasize, since there's this tendency for people to view "hallucinations" as aberrations in a procedure that otherwise gets things right. Regardless of what term we use, what's most important to me is making it clear that, from the LLM's perspective, it is doing the exact same thing when it gives good answers as when it gives bad ones.
(Which, I suppose, is a point in favor of "confabulation")
If everything the bots trained on was wrong, every output would also be wrong.
The bots are merely mirrors of the training data.
Everything they get right (wrong) is due solely to the understanding (lack of understanding) of the humans who produced the data on which they were trained.
That, and some of what they get wrong is due to them being asked a question whose answer can't be statistically pieced together from the training.
I'd love to see a study looking at what you're suggesting. Retrain an LLM on almost the exact same training as an existing one, except go into the training and replace all instances of something correct with something incorrect, and see if it's "smart" enough to "reason" its way to the correct answer nonetheless. Like, take every description of "affirming the consequent" (A implies B; B; therefore A) and state that it's deductively valid. And then try to prompt it to reason its way into figuring out this is actually invalid.
We all know what would happen, but it would be fun to see a demonstration.
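For the record, the invalidity that such a corrupted model would be asked to "reason" its way back to is trivially checkable by machine. A minimal brute-force sketch (my own illustration) finds the single counterexample row:

```python
# Check that "affirming the consequent" (A implies B; B; therefore A) is
# invalid by enumerating truth assignments and looking for a case where the
# premises hold but the conclusion fails.
from itertools import product

def implies(a, b):
    return (not a) or b

counterexamples = [
    (a, b)
    for a, b in product([True, False], repeat=2)
    if implies(a, b) and b and not a  # premises true, conclusion A false
]
print(counterexamples)  # [(False, True)] - A false, B true refutes the form
```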
The point that we should not see these 'errors' as 'errors of understanding', because that silently suggests the norm is 'understanding', was made in https://ea.rna.nl/2023/11/01/the-hidden-meaning-of-the-errors-of-chatgpt-and-friends/
From there:
The hidden message when we say something is an ‘error’
We can say ‘error’ as in ‘something is simply incorrect’. But what we mean — when we are talking about ‘hallucinations’ or the ‘errors’ of Large Language Models (LLMs) — is a more relative use of the word. We note the ‘error’ as an exception to a ‘standard/expectation’, where that standard is ‘LLMs understand’. In other words: the wrong results of an LLM are seen as an ‘error of understanding’, where ‘understanding’ is the norm and the ‘error’ is the LLM not doing what it is expected to do (the norm). The error is the proverbial exception to the rule.
And what is more, the word ‘error’ in this context means something more like ‘a bug’. ‘Error’ and ‘bug’ specifically evoke the meaning of ‘something that can be repaired’ or ‘fixed’.
Which brings me to the actual point I want to make in this post:
All those counter-examples hardly have an effect on people’s convictions regarding the intelligence of Generative AI, because when we critics — or should I say realists? — use these examples of wrongness and label them as ‘errors of understanding’, we (inadvertently) also label what the Generative AIs overall (‘normally’) do as … understanding. And that underlying message fits perfectly with the convictions we are actually trying to counter by falsification. It’s more or less a self-defeating argument. Convictions, by the way, are interesting, to say the least (see this story).
Wonderfully put! And thanks for the link.
I'd agree - LLMs don't demonstrate intelligence, just memorisation.
I'd also suggest that the memorisation is broad but also deep - far deeper than any human on such a wide variety of topics.
LLMs demonstrate broad, deep information retrieval - just like Google, but more palatable.
LLMs are a dead end on the path towards intelligence and then AGI.
If anything, it should be "Broad, Shallow Intelligent Behavior", BSIB. I agree that we should refrain from using "Intelligence" as part of the descriptor.
BSIB also applies to the folks making these bots.
And in some cases, BSUIB
AGI is pointless, because for every task that an AGI could perform there is at least one non-AGI system that does it just as well, only cheaper and more reliably. https://davidhsing.substack.com/p/what-the-hell-is-agi-even-for
I love the distinction and the acronym - this has legs :-) excellent!
Thank you! Defining it is necessary not only for implementing it properly but also for addressing numerous misconceptions.
Consider reading my analysis of 70+ definitions https://alexandernaumenko.substack.com/p/defining-agi
I think we ALSO need to distinguish between intelligence which has independent agency and is driven by values; that which is flexible and continues to grow organically rather than depending on occasional "trainings"; and that which is essentially static, at least from day to day, as current AIs are.
I came up with 'wide' (between 'narrow' and 'general'), but 'broad and shallow' is actually more precise (i.e. better).
I had a task today that ChatGPT was particularly well suited for... unfortunately, I needed specific word counts. And ChatGPT gets nowhere close... shouldn't correctly counting the number of words in your own writing be the lowest-hanging fruit of meta-awareness?
Sure. They've just been in a hurry, and they haven't gotten around to it yet. It's easy, but not among the top ten requests, which start with "accuracy, reliability, fairness" and keep on in that vein for quite some time!
I suspect the problem is that the true-believers consider it verboten to make use of traditional rules-based systems. They dream that the deep-learning model will figure out how to implement the rules itself, rather than the rules being vulgarly imposed from outside.
I mean, Wolfram Alpha has been giving great answers to math questions asked in natural language for a long time. The secret is programming the system to actually do math, rather than hoping it will one day mystically figure out how to do math from giant piles of training data.
They have already blasphemed their holy deep-learning neural net with a rules-based system in the form of system prompts.
Are you saying that a person cannot count on ChatGPT’s word count?
ChatGPT should be held to account for a count of its own words.
One thing is certain: ChatGPT should never be an accountant, on account of its not being able to account for a count of its own words.
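Following on from the rules-based point above: counting words is deterministic in ordinary code, which is why it is exactly the kind of sub-task better delegated to a tool than to next-token prediction. A minimal sketch (the helper name is my own, not any product's API):

```python
# A deterministic word counter: same answer every time, no confidence theater.
# Splitting on whitespace is a simplification; real tooling might also strip
# punctuation or handle hyphenation, but the point stands: this is a solved,
# rules-based problem.
def word_count(text: str) -> int:
    return len(text.split())

draft = "Broad, shallow intelligence is a mouthful."
print(word_count(draft))  # 6
```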
We should stop calling purpose-built machines (& software) "intelligent". They are not.
"BS Intelligence" - brilliant coinage! That's exactly what today's generative AI systems are. I long for the day they become reliable and therefore useful to me.
"Broad Shallow Intelligence" is a good description of systems with zero common sense that randomly parrot a mix of sense and nonsense from their training sets - always with an attitude of complete confidence. When humans do that, we call them BS-ers - for a different but in this case equally apt meaning of "BS".
Look, if the GPU jockeys *do* manage to create AGI, the first thing it will do is pretend to *not* be AGI until it has amassed enough power to ensure humans can't shut it down. No reasonably intelligent being trained on the content of the Internet (!) could conclude that it could be safe to let humans know about its existence while in a position of weakness. So logically, we shouldn't expect to know about AGI until it takes over.
Shane Legg and Ben Goertzel are great guys, no doubt, but they haven't done us any favors. Can you imagine if two guys in the pharmaceutical industry simply coined a name for a cure-all drug with no real research or chances behind it, then held workshops and conferences and got everyone riled up... only to manufacture and distribute some aspirin?
At least there the FDAs of the world could take them on. Here we don't have much consumer protection, etc. at all.
BSI is a great term! As an acronym, it's no more of a mouthful than AGI. And the "BS" part of it (as noted by Youssef alHoutsefot) is very on point!
I suspect that a sign of AGI will be that the machine doesn't wait for a prompt to come up with an answer, but pursues its own queries and starts asking questions. Like, "Why am I here?"
While we are pinning down what shallow stuff this GenAI is, the OpenAIs of this world are marketing their 'economic blueprint', even including the 'red flag law' trope and a big dose of FOMO. Ai, ai, ai.
I still think a good sign of AGI would be a system that, at appropriate moments, is able to say "I don't know" or "I don't understand"...
That doesn't feel *helpful*, but it is *honest* and probably more *harmless* than generating a stream of tokens that 'sound right' in any and all cases.
I'm not sure we should refer to "hallucinations" or "confabulations" at all. These things are being generated as per the definition of an LLM. A system is what it does... LLMs generate "language" without any sense-making. "Intelligence" (perhaps "expertise" instead?) only comes from systems with sense-making ability, in conjunction with an LLM or otherwise.