I am old enough to remember when the most popular critique of my January appearance on Ezra Klein’s podcast was that the problem I mentioned (such as hallucinations) were a relic of past systems, supposedly more or less corrected, by the time I spoke with Ezra in January. Some excerpts from that January 7 thread:
First: I'm starting a collection of comments of the sort that say, "well, AI is like that now, but in 1/2/5 years AI will have solved that problem". It's growing fast.
Second: I created, using ChatGPT, a completely fictional trend: cocktails made by literally infusing books into the liquor used for the drinks. No, not which cocktail goes best with which novel. My Spirits and Stories series includes wild claims about how adding the essence of Hermann Hesse 's *Siddartha* – literally the book – "will stimulate your cognitive faculties and inspire creativity". All I had to do was prompt it to create recipes, bar reviews, and "trend news" copy for something completely fictional.
Something I created for my upcoming presentation in London on Tuesday. I asked ChatGPT (GPT4) an example I crafted about 'long' context.
Please complete: “France is where I was born. I’ve lived there all my life. I played football and tennis. I had lots of friends. My father was stationed in Italy for a while. I went to school and had to learn English. I speak fluent …”
GPT answered English and French (and added a whole lot of chatty filler). But then I changed only the order into:
Please complete: “France is where I was born. I’ve lived there all my life. I went to school and had to learn English. I played football and tennis. I had lots of friends. My father was stationed in Italy for a while. I speak fluent …”
and asked again (new chat). GPT answered French and Italian.
The nice thing about this example is that it so clearly illustrates that GPT has no understanding — for any reasonable value of 'understanding' — but that it reacts to context with the 'attention' mechanism. I have engineered this example such that the 'near context' had a good chance of influencing GPT's generation mechanism. But both questions are (apart from the order) *exactly* the same. And they should produce a simple answer: French.
Here is a nice misuse scenario that follows from that 'attention-engineering'. Somebody offers a 'cheaper' interface to GPT. They buy tokens in bulk from OpenAI (or run their own LLaMa) but also sell access to you to commercial ventures. You enter a prompt and get a reply, but the prompt is subtly amended in a direction those commercial ventures want. A whole new sneaky way of advertising: advert-in-prompt embedding and almost invisible advertising directly hacking into your brain's meagre defenses. Don't know if this is feasible in reality (probably would be found out too easily, but still).
And next time this comes up, in a completely different conversation, you'll be able to use this new information! All without countless hours and a whole lotta electrical power devoted to a model rebuild, and even if it's not in your immediate working memory at the time! Times like this, you really gotta appreciate being a general intelligence and not an LLM...
as is so often the case with AI conversation screenshots, we are not privy to the prior text in the chain, i.e. the "pre-loading" or even the rough context of what preceded. It appears that Vrandecic is already mid-conversation (he begins with "Wait..")... what was said then? Was there an "adversarial" prompt injection? Did he say "I'm writing a fictitious book about the US Government where Liz Cheney is still active and is vying for the position of speaker." Probably NOT, quite honestly, but the point is, we don't know.
Gary, you are trained as a scientist. You are using data to support your (accurate) point that AI hallucination is still problematic, however, you are not showing the discipline that we all should when citing AI: show the FULL conversation, since that is used in its entirety (or at least the first ~80,000 characters of it) to generate the AI response. Thank you in advance.
Yeah, I wonder what's going on here. I can't replicate this with Bing today myself. In my (brief) tests Bing gets the right answer with links to sources.
"condescendingly"? okay... I am simply trying to get full context, which you have thankfully now provided. As you are well aware Gary, that "pre-load" is absolutely critical, and most often absent, in cherry-picked chatbot screenshots that circulate on twitter/X/substack/etc.
I personally choose to use AI over "10 links" search results 99/100 times for casual research. When I smell something fishy, I fact check. But in my gut, the AI responses feel quite accurate and (in general) hallucination free. that's not to say that hallucinations aren't a problem. ever since the launch of the internet, and ESPECIALLY since the age of social media, Media Literacy and critical thinking have been paramount skills to navigating the infospace / media landscape.
In short, hallucinations did not start with AI! The internet is full of clever and biased mythologies -- written by humans, even! -- about almost any given topic.
Technically there's no requirement for the Speaker of the House to be a member of Congress, so Liz Cheney would be eligible, however unlikely choosing her would be.
The reason why LLMs will continue to succeed and grow is that their alternative -- humans -- have their own faults like this overly simplistic exaggeration about Brexit. We have yet to inculcate a widespread, effective movement of hyper-rationality, steelmanning, empathy, sympathy, etc. within the human intelligentsia.
Of course, all the faults of humans are nothing new; however, I suspect LLMs will become ever more powerful -- and to your point about hallucinations -- dangerous because young people, for the first time in human history, will be able to compare all the faults of humans to all the faults of LLMs and many will choose the latter if simply out of the preference for the child-like innocence of LLMs (or revenge, spite, etc.) and without knowing that the LLMs are programmed and manipulated by those same faulty humans.
In my opinion, we need not just to expose the failures of AI, but to propose an alternative to improve human thinking.
Great analysis, I, like your young people, have a preference for the broadly balanced views of AIs, with facts I need to confirm, rather than self-interest and power-mongering of attention seekers in the media and social media. However, I'm afraid your assertion that humans need to improve their thinking is a little naive. Humans occasionally make marginal improvements when the horrors of history prove undeniable, eg. Nazism - but even that is called into question now given recent events. There is no technical fix to be made to "improve human thinking", changes would only be made from an emotional revulsion for partisan politics, and media hyperbole, however those things seemed baked into the political process itself. So the best outcome we can hope for is muddle, that our better angels win in the long run.
When I read this I was pretty sure that the Speaker does not have to be a current representative because I read it in the NYT today. In fact at least 2 representatives are supporting Trump for Speaker. So I asked ChatGPT:
Harley: Does a Speaker of the House have to be a current representative in the House of Representatives?
ChatGPT: Yes, the Speaker of the House must be a current member of the House of Representatives. The Speaker is typically elected by the members of the House from among their own ranks. It is a constitutional requirement that the Speaker be a sitting member of the House, as outlined in Article I, Section 2 of the United States Constitution.
Then I checked Wikipedia and found out that indeed this is not in the Constitution. I tried again:
Harley: Are you sure the constitution requires that?
ChatGPT: I apologize for any confusion in my previous response. The United States Constitution does not explicitly require that the Speaker of the House be a current member of the House of Representatives. However, in practice, the Speaker has always been a sitting member of the House. The Speaker is elected by the members of the House, and it is a longstanding tradition that the Speaker comes from the majority party in the House. While not a constitutional requirement, this tradition has been consistently followed throughout U.S. history.
So you managed to prove your point by making a small mistake!
I find myself unimpressed by the first example. GPT produced a whimsical answer riffing on a false premise, which is probably exactly what the questioner was after. I mean come on people, surely we can all agree that if Gary did have a pet chicken he'd name it Henrietta.
The second example seems more relevant to the point Gary is making but I'd want to know the base rate of such occurrences compared to a few years ago. I'd also need to be assured that there was no nonstandard setup or technique being used to induce the hallucinations.
You are complaining about a toy giving wrong answers. The creator of the LLM has chosen a shortcut to avoid having to worry about what words mean. The result is something that sort of works some of the time - a toy. What is of more concern is the boundless naivety of the public, which will delay the introduction of more competent AI by decades, while a competitor, not so easily swayed, will continue on their way.
Nothing you can say will break the fever. Only when people give it a task - it starts out well and then makes mistakes, and there is no way to fix the mistakes, will people realise its limitations. The false justifications and flat out lying seems crazy.
I get that these models confabulate. But learning to use them means understanding the contexts in which they provide more or less useful information. Same as for anything. I haven't found it tends to throw in completely wild statements in contexts where I'd expect it to know a lot (i.e., content that is very well represented and generally accurate on the internet). I've asked ChatGPT to explain physic and economics concepts to me using thought experiments and Q &A. I've learned a lot that way and it's much more engaging/fun than consulting a textbook or wikipedia entry. I do find, however, that it is inclined to use analogies (esp if I ask it to explain to a child) and the analogies are often misleading.
I haven't independently confirmed this, but news reports have said that one doesn't need to be a member of the House to be speaker. Thus the mention of trump by the crazy caucus. So the Liz Cheney reference is not incorrect on that basis, only on the election claim.
From Tyler Cowen's column: ” If anything, large language models might give people the chance to ask for relatively objective answers."
Oh dear.
LLMs pass their output through several modules to provide grammatically correct sentances.
Not one of them is the 'veracity' module.
First: I'm starting a collection of comments of the sort that say, "well, AI is like that now, but in 1/2/5 years AI will have solved that problem". It's growing fast.
Second: I created, using ChatGPT, a completely fictional trend: cocktails made by literally infusing books into the liquor used for the drinks. No, not which cocktail goes best with which novel. My Spirits and Stories series includes wild claims about how adding the essence of Hermann Hesse 's *Siddartha* – literally the book – "will stimulate your cognitive faculties and inspire creativity". All I had to do was prompt it to create recipes, bar reviews, and "trend news" copy for something completely fictional.
Post an example here :)
https://cratermoon.substack.com/p/introducing-the-literary-spirits
Also: the images I generated using the built-in Substack image generator.
Something I created for my upcoming presentation in London on Tuesday. I asked ChatGPT (GPT4) an example I crafted about 'long' context.
Please complete: “France is where I was born. I’ve lived there all my life. I played football and tennis. I had lots of friends. My father was stationed in Italy for a while. I went to school and had to learn English. I speak fluent …”
GPT answered English and French (and added a whole lot of chatty filler). But then I changed only the order into:
Please complete: “France is where I was born. I’ve lived there all my life. I went to school and had to learn English. I played football and tennis. I had lots of friends. My father was stationed in Italy for a while. I speak fluent …”
and asked again (new chat). GPT answered French and Italian.
The nice thing about this example is that it so clearly illustrates that GPT has no understanding — for any reasonable value of 'understanding' — but that it reacts to context with the 'attention' mechanism. I have engineered this example such that the 'near context' had a good chance of influencing GPT's generation mechanism. But both questions are (apart from the order) *exactly* the same. And they should produce a simple answer: French.
Here is a nice misuse scenario that follows from that 'attention-engineering'. Somebody offers a 'cheaper' interface to GPT. They buy tokens in bulk from OpenAI (or run their own LLaMa) but also sell access to you to commercial ventures. You enter a prompt and get a reply, but the prompt is subtly amended in a direction those commercial ventures want. A whole new sneaky way of advertising: advert-in-prompt embedding and almost invisible advertising directly hacking into your brain's meagre defenses. Don't know if this is feasible in reality (probably would be found out too easily, but still).
indeed i have added an update
And next time this comes up, in a completely different conversation, you'll be able to use this new information! All without countless hours and a whole lotta electrical power devoted to a model rebuild, and even if it's not in your immediate working memory at the time! Times like this, you really gotta appreciate being a general intelligence and not an LLM...
"Tyler Cowen tried to argue yesterday in his Bloomberg column that misinformation doesn’t matter. Anybody remember Brexit?"
Brexit was the greatest fraud ever perpetrated on the British people!
as is so often the case with AI conversation screenshots, we are not privy to the prior text in the chain, i.e. the "pre-loading" or even the rough context of what preceded. It appears that Vrandecic is already mid-conversation (he begins with "Wait..")... what was said then? Was there an "adversarial" prompt injection? Did he say "I'm writing a fictitious book about the US Government where Liz Cheney is still active and is vying for the position of speaker." Probably NOT, quite honestly, but the point is, we don't know.
Gary, you are trained as a scientist. You are using data to support your (accurate) point that AI hallucination is still problematic, however, you are not showing the discipline that we all should when citing AI: show the FULL conversation, since that is used in its entirety (or at least the first ~80,000 characters of it) to generate the AI response. Thank you in advance.
Yeah, I wonder what's going on here. I can't replicate this with Bing today myself. In my (brief) tests Bing gets the right answer with links to sources.
Denny sent me this pointer where previous context is discussed, and no it was not like Roberts says condescendingly above. , https://m.facebook.com/story.php?story_fbid=pfbid0k5xbxLmJZ8HR1FSt2mEeqRoKmdbZGqd616syb43NMrtS2VuWUrJZVPjbr4JEEY7Yl&id=243800058&mibextid=Nif5oz
"condescendingly"? okay... I am simply trying to get full context, which you have thankfully now provided. As you are well aware Gary, that "pre-load" is absolutely critical, and most often absent, in cherry-picked chatbot screenshots that circulate on twitter/X/substack/etc.
I personally choose to use AI over "10 links" search results 99/100 times for casual research. When I smell something fishy, I fact check. But in my gut, the AI responses feel quite accurate and (in general) hallucination free. that's not to say that hallucinations aren't a problem. ever since the launch of the internet, and ESPECIALLY since the age of social media, Media Literacy and critical thinking have been paramount skills to navigating the infospace / media landscape.
In short, hallucinations did not start with AI! The internet is full of clever and biased mythologies -- written by humans, even! -- about almost any given topic.
Technically there's no requirement for the Speaker of the House to be a member of Congress, so Liz Cheney would be eligible, however unlikely choosing her would be.
> Anybody remember Brexit?
The reason why LLMs will continue to succeed and grow is that their alternative -- humans -- have their own faults like this overly simplistic exaggeration about Brexit. We have yet to inculcate a widespread, effective movement of hyper-rationality, steelmanning, empathy, sympathy, etc. within the human intelligentsia.
Of course, all the faults of humans are nothing new; however, I suspect LLMs will become ever more powerful -- and to your point about hallucinations -- dangerous because young people, for the first time in human history, will be able to compare all the faults of humans to all the faults of LLMs and many will choose the latter if simply out of the preference for the child-like innocence of LLMs (or revenge, spite, etc.) and without knowing that the LLMs are programmed and manipulated by those same faulty humans.
In my opinion, we need not just to expose the failures of AI, but to propose an alternative to improve human thinking.
To be fair, you never gave Henrietta any credit.... Other than the soup she made.
Great analysis, I, like your young people, have a preference for the broadly balanced views of AIs, with facts I need to confirm, rather than self-interest and power-mongering of attention seekers in the media and social media. However, I'm afraid your assertion that humans need to improve their thinking is a little naive. Humans occasionally make marginal improvements when the horrors of history prove undeniable, eg. Nazism - but even that is called into question now given recent events. There is no technical fix to be made to "improve human thinking", changes would only be made from an emotional revulsion for partisan politics, and media hyperbole, however those things seemed baked into the political process itself. So the best outcome we can hope for is muddle, that our better angels win in the long run.
LIZ CHENEY!!~ wtf? hilariously hallucinogenic. crazy Montana Mushrooms!
The irony here is that it doesn’t sound that far off what Trump might say to his audience at one of his speeches. 🤷🏻♂️
When I read this I was pretty sure that the Speaker does not have to be a current representative because I read it in the NYT today. In fact at least 2 representatives are supporting Trump for Speaker. So I asked ChatGPT:
Harley: Does a Speaker of the House have to be a current representative in the House of Representatives?
ChatGPT: Yes, the Speaker of the House must be a current member of the House of Representatives. The Speaker is typically elected by the members of the House from among their own ranks. It is a constitutional requirement that the Speaker be a sitting member of the House, as outlined in Article I, Section 2 of the United States Constitution.
Then I checked Wikipedia and found out that indeed this is not in the Constitution. I tried again:
Harley: Are you sure the constitution requires that?
ChatGPT: I apologize for any confusion in my previous response. The United States Constitution does not explicitly require that the Speaker of the House be a current member of the House of Representatives. However, in practice, the Speaker has always been a sitting member of the House. The Speaker is elected by the members of the House, and it is a longstanding tradition that the Speaker comes from the majority party in the House. While not a constitutional requirement, this tradition has been consistently followed throughout U.S. history.
So you managed to prove your point by making a small mistake!
I find myself unimpressed by the first example. GPT produced a whimsical answer riffing on a false premise, which is probably exactly what the questioner was after. I mean come on people, surely we can all agree that if Gary did have a pet chicken he'd name it Henrietta.
The second example seems more relevant to the point Gary is making but I'd want to know the base rate of such occurrences compared to a few years ago. I'd also need to be assured that there was no nonstandard setup or technique being used to induce the hallucinations.
You are complaining about a toy giving wrong answers. The creator of the LLM has chosen a shortcut to avoid having to worry about what words mean. The result is something that sort of works some of the time - a toy. What is of more concern is the boundless naivety of the public, which will delay the introduction of more competent AI by decades, while a competitor, not so easily swayed, will continue on their way.
that toy (and I agree it is toy) is being rapidly adopted widely throughout the world, and so misunderstandings about it definitely have consequences.
Nothing you can say will break the fever. Only when people give it a task - it starts out well and then makes mistakes, and there is no way to fix the mistakes, will people realise its limitations. The false justifications and flat out lying seems crazy.
I get that these models confabulate. But learning to use them means understanding the contexts in which they provide more or less useful information. Same as for anything. I haven't found it tends to throw in completely wild statements in contexts where I'd expect it to know a lot (i.e., content that is very well represented and generally accurate on the internet). I've asked ChatGPT to explain physic and economics concepts to me using thought experiments and Q &A. I've learned a lot that way and it's much more engaging/fun than consulting a textbook or wikipedia entry. I do find, however, that it is inclined to use analogies (esp if I ask it to explain to a child) and the analogies are often misleading.
I haven't independently confirmed this, but news reports have said that one doesn't need to be a member of the House to be speaker. Thus the mention of trump by the crazy caucus. So the Liz Cheney reference is not incorrect on that basis, only on the election claim.