"...one of the biggest alleged use cases for large language models could dry up, fast." – that is, being the fall guy for irresponsible corporations (or other entities). 😆
But but but social media companies aren’t responsible for anything on their platforms. Is my guess what the lawyer behind the strategy hoped would apply.
The question is…when you create and release a product that you know lies, steals and doesn’t know when it’s telling the truth, whom will be accountable? “Not, I” said the AI LLM and GPT companies’ TOS agreements which disclaim all liabilities and *require users to indemnify them against any claims* that may arise.
The interesting question is whether the AC company could file a lawsuit against the company which sold them or trained their Chatbot? It would be very valuable if companies providing AI-driven services could be made legally liable for the non-reliability of their products. That would induce a strong pressure on these companies to assess the actual quality of these products.
I am extremely puzzled how that company thought their argument made sense. If it is sold as just another mechanical part of their website, they are as responsible for the information it provides as if they had written static text onto the website. If it is sold as an AI customer service agent, then they are as responsible for the responses it provides as they would be for the responses provided by a human customer service agent. I guess they must have known they would probably lose but just tried to throw something out as a Hail Mary?
Especially baffling because it creates bad press for the airline. Did they think that, if they won, they would be able to continue to use the chatbot even tho' they knew it wd be "lying" to their customers? On the other hand there's that study which makes clear that on average the corporate personality is sociopathic.
As someone building AI assistants for companies for a living, I use the recent DPD example and this story as a testament that extreme caution is advised when putting this technology in front of your customers. There are in fact ways of building LLM-powered chatbots that are safe and reliable, but this is not one of them.
Air Canada should’ve admitted their mistake and immediately refunded the man (it was a case of bereavement for gods sake) — then this would not have become a story grabbing international headlines. Instead Air Canada chose to dig in. Shame on them.
I saw the story this morning. It does point to 1. Tone deaf PR move from AC with bereaved passenger. AI or no AI, that was dumb. 2. Given that it was a Nov 22 deployment, I'm not sure we can blame GPT for this, but I suspect the bot was trained on a out of date policy, and was not updated. Any policy change should trigger retraining.
It could also be trained not to answer pricing questions, but to point to the actual policy, or engage an agent. All in all though, a shoddy chatbot deployment. I'd love to see a proper root cause analysis from Air Canada on this. My thoughts here. https://thomasotter.substack.com/p/a-chatbot-blunder-and-responsible
The policy actually exists; the chatbot simply found a way through the menu tree as a shortcut for the person, and in so doing enabled the person to request the refund after the flight, instead of before, as Air Canada originally intended.
Air Canada's argument (clearly stated in the article) is that "Air Canada argues it cannot be held liable for information provided by one of its agents, servants, or representatives—including a chatbot." This is a denial of responsibility on a broad scale, having nothing in particular to do with an AI.
Broadly overstating the case against AI does nothing for the case against AI.
Agree. This sort of “error” is no different from what a human may choose to do in similar circumstances, particularly if dealing with a bereaved customer.
Think you might be overblowing this one a bit, Gary?
If one of my human employees violates policy during a client interaction, I can train them. (As a client-focused business, I will also contact the customer and make things right, unlike what the airline did here.)
What do you propose I do with the statistical black box that hallucinates to my customers at internet scale?
I don't understand. If everything was fine, why did Air Canada (a) lose the case and (b — guessing here of course) quietly disable the chatbot? The policy was "apply before booking" and they refused to honour "apply after booking". So how did the policy exist?
I noticed the same article and clearly it points to a weakness in LLM-based chatbots that don’t use more reliable method to make decisions. But it isn’t clear from the article that this particular chatbot was based on an LLM vs some older chatbot technology that was just badly programmed or read from some older or just inaccurate version of a corporate FAQ for instance.
"the chatbot is a separate legal entity that is responsible for its own actions” ... this is important ... there is a lot of writing about how LLMs have concsiousness and AIs can be persons ... the reason these discussions are so important is for their potential legal implications, not for their philosophical or science content
A company is responsible for the actions all its IT does. The actual technique isn't relevant. Claiming it to be a separate legal entity was about the dumbest legal idea ever. Must have been someone convinced that AGI is real, but even then, an employee is not a separate legal entity
I agree ... but Silicon Valley is preparing the ground to change this in the future with all their talk about AIs being persons, the singularity, AGI, etc ... I am sure they will not give up on this just because it failed at the first attempt
Thanks for the link. This is exactly what I was thinking about. It gives us a glimpse at a future in which the oligarchs controlling the AI will try to rule us by imposing the concept of "AI as a person" on us.
Btw, the word oligarchs seems loaded. But is there a neutral term one could use here? Elites? Ruling class? Overlords? I am not sure we even have the language to talk about this in a detached, neutral way.
Hey Andy, I just did a write up that started with Gary's twitter example(I had many requests from people for examples around hallucinations and dangers of LLMs in production)
It's actually shocking that I can do that TODAY, on a million dollar investment/company - Perplexity.ai
Basically write the prompt from the twitter screenshot, and PerplexityAI obliges to go off rails.
I agree we need to be careful with deploying technology – I mean, you wouldn't just let 100s of millions of guns be out there or 16 year-old hormone infused teenagers driving while using their smartphones, right?
It's my opinion as someone who has used every prior generation of software (and AI is simply a novel kind of software) that we should figure out how to effectively use LLMs.
The big mistake or misunderstanding is to regard LLMs as off-the-shelf apps (via chat) rather than to see them as an ingredient to be made valuable inside advanced business applications. Unfortunately no different than many innovations, we've jumped in and will first find every bad use of it on the way to finding the fewer ways to use it for good.
It's like giving hungry kids a box of candy or syrup without supervision. It's tasty. They (we) are going to get sick.
Gen AI is a baby, not even a toddler. Let's not throw the baby out with the bath water.
I'm sure you've prompted GPT to write you a poem maybe to surprise a friend or just for fun? It's mind boggling that it can do that – yet, a poem is a hallucination.
I asked it to write a white paper on a digital marketing topic. And it 'hallucinated' an article that was actually darn good. With some light editing, it was good enough to publish.
I use LLMs to automate underwriting – how do I prevent inaccuracy? I provide it the data, and constrain to answer very specific questions, and I wrap it inside a lot of other error-checking code. The end result is solid. It just works and to have written the same code without LLMs would take so long and cost so much as to be infeasible.
You hit the nail on the head "The big mistake or misunderstanding is to regard LLMs as off-the-shelf apps"
Very interestingly, I recall about 10 year ago when 3D printing was becoming prevalent, and most people were expecting 3D printing to just magically manufacture all these amazingly complex things, no extra post-processing needed.
I also agree with you that the proper direction is not to throw the baby out with the water.
See, the last paragraph you described, you have a significant number of human oversight and clearly other tools as part of your workflow, and I imagine you take that final product - your writing - and likely make a PDF out of it, share it with other people etc. So the LLM is is part of your workflow, but it's not the end-all, be-all.
That, I believe. It's practical, applied, and it aids our current lives, without taking over.
“the chatbot is a separate legal entity that is responsible for its own actions” 😂
That's rich. Thank you for my laugh of the morning.
(Has the chatbot hired its own lawyer? 😆).
I believe the chatbot is its own lawyer. 🤣
It can cite the decisions in Varghese v. China Southern Airlines and Shaboon v. Egypt Air. 😄
"...one of the biggest alleged use cases for large language models could dry up, fast." – that is, being the fall guy for irresponsible corporations (or other entities). 😆
But but but social media companies aren’t responsible for anything on their platforms. That, I’m guessing, is what the lawyer behind the strategy hoped would apply.
This is priceless. Just wait until the Supreme Court rules that chatbots can donate to politicians and buy elections, maybe even run for Congress!
Wait a minute, I'm pretty sure some of those folks now in Congress ARE chatbots!
I would still give points to the bot for inventing on behalf of the customer. Even bots know user experience is paramount.
"Even bots know user experience is paramount." 😂 (They don't know anything. They are machines).
It hallucinated right!
Haha, right. Except in this case the hallucination cost it its job. 😆
Or life? (Now is it in AI Heaven? 🤔).
lol
Everyone say: “unintended consequences.”
The question is… when you create and release a product that you know lies, steals, and doesn’t know when it’s telling the truth, who will be held accountable? “Not I,” said the AI, LLM, and GPT companies’ TOS agreements, which disclaim all liability and *require users to indemnify them against any claims* that may arise.
https://x.com/garymarcus/status/1758915411327046090?s=61
The interesting question is whether Air Canada could file a lawsuit against the company that sold or trained its chatbot. It would be very valuable if companies providing AI-driven services could be held legally liable for the unreliability of their products. That would put strong pressure on these companies to assess the actual quality of those products.
Very well put
I am extremely puzzled as to how that company thought its argument made sense. If the chatbot is sold as just another mechanical part of their website, they are as responsible for the information it provides as if they had written static text on the website. If it is sold as an AI customer service agent, then they are as responsible for the responses it provides as they would be for the responses provided by a human customer service agent. I guess they must have known they would probably lose but just tried to throw something out as a Hail Mary?
Baffled that they were so pigheaded as to go to court on such a stinker of a case
Especially baffling because it creates bad press for the airline. Did they think that, if they won, they would be able to continue using the chatbot even though they knew it would be "lying" to their customers? On the other hand, there's that study which makes clear that, on average, the corporate personality is sociopathic.
As someone building AI assistants for companies for a living, I use the recent DPD example and this story as a testament that extreme caution is advised when putting this technology in front of your customers. There are in fact ways of building LLM-powered chatbots that are safe and reliable, but this is not one of them.
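To make "safe and reliable" a bit more concrete, here is a minimal sketch of one such pattern under my own assumptions; the topics, names, and policy text are invented for illustration, not anyone's actual deployment. The idea is that the bot only quotes retrieved policy text verbatim and refuses anything it cannot ground.

```python
# A minimal, invented sketch (hypothetical names and policy text): the bot only
# quotes retrieved policy snippets verbatim and refuses anything it cannot ground.

POLICY_SNIPPETS = {
    "bereavement": ("Bereavement fares must be requested before travel; "
                    "refunds cannot be applied retroactively after the flight."),
    "baggage": "One checked bag up to 23 kg is included on international fares.",
}

def grounded_answer(question: str) -> str:
    """Answer only with verbatim policy text; otherwise hand off to a human."""
    q = question.lower()
    for topic, snippet in POLICY_SNIPPETS.items():
        if topic in q:
            # Quote the source of truth instead of letting a model paraphrase it.
            return f'Per our policy on {topic}: "{snippet}"'
    return "I can't answer that reliably; let me connect you with a live agent."

if __name__ == "__main__":
    print(grounded_answer("Can I claim a bereavement refund after my trip?"))
    print(grounded_answer("Do you allow emotional-support peacocks on board?"))
```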
Air Canada should’ve admitted their mistake and immediately refunded the man (it was a case of bereavement, for god’s sake); then this would not have become a story grabbing international headlines. Instead, Air Canada chose to dig in. Shame on them.
I saw the story this morning. It does point to: 1. A tone-deaf PR move from AC with a bereaved passenger. AI or no AI, that was dumb. 2. Given that it was a Nov 22 deployment, I'm not sure we can blame GPT for this, but I suspect the bot was trained on an out-of-date policy and was not updated. Any policy change should trigger retraining.
It could also be trained not to answer pricing questions, but to point to the actual policy or engage an agent. All in all, though, a shoddy chatbot deployment. I'd love to see a proper root cause analysis from Air Canada on this. My thoughts here: https://thomasotter.substack.com/p/a-chatbot-blunder-and-responsible
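As a rough illustration of that "point to the actual policy or engage an agent" idea, here's a minimal sketch assuming a simple keyword router sits in front of the model; every pattern, function, and URL here is hypothetical, not Air Canada's actual setup.

```python
# Illustrative guardrail sketch (hypothetical patterns and URL, not Air Canada's
# actual stack): pricing/policy questions never reach free-form generation; they
# get the canonical policy link or a human agent instead.

import re

POLICY_URL = "https://example.com/bereavement-policy"  # placeholder link

SENSITIVE_PATTERNS = [
    r"\brefund\b", r"\bbereavement\b", r"\bprice\b", r"\bfare\b", r"\bpolicy\b",
]

def call_llm(user_message: str) -> str:
    # Stub standing in for a real chat-completions call to whatever model is used.
    return "I can help with general questions about your booking."

def answer(user_message: str) -> str:
    """Route pricing/policy questions away from free-form LLM generation."""
    if any(re.search(p, user_message, re.IGNORECASE) for p in SENSITIVE_PATTERNS):
        # Never let the model improvise policy: point to the source of truth
        # or escalate to a human agent.
        return ("For fares and refund policy, please see the official policy page "
                f"({POLICY_URL}), or I can connect you with a live agent.")
    return call_llm(user_message)

if __name__ == "__main__":
    print(answer("Can I get a bereavement refund after my flight?"))
```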
no it wasn’t chatgpt but see eg https://x.com/garymarcus/status/1758913723459346595?s=61
Yes, I’m familiar with that one. Have you seen the Japanese case? https://www.straitstimes.com/asia/east-asia/error-prone-ai-chatbot-dog-leads-japans-lonely-seniors-astray
The policy actually exists; the chatbot simply found a way through the menu tree as a shortcut for the person, and in so doing enabled the person to request the refund after the flight, instead of before, as Air Canada originally intended.
Air Canada's argument (clearly stated in the article) is that "Air Canada argues it cannot be held liable for information provided by one of its agents, servants, or representatives—including a chatbot." This is a denial of responsibility on a broad scale, having nothing in particular to do with an AI.
Broadly overstating the case against AI does nothing for the case against AI.
Agree. This sort of “error” is no different from what a human may choose to do in similar circumstances, particularly if dealing with a bereaved customer.
Think you might be overblowing this one a bit, Gary?
what you are missing is that Air Canada dropped the chatbot (much as they might fire an employee).
if others are also held responsible for their chatbots, more will be dropped.
If one of my human employees violates policy during a client interaction, I can train them. (As a client-focused business, I will also contact the customer and make things right, unlike what the airline did here.)
What do you propose I do with the statistical black box that hallucinates to my customers at internet scale?
retrain on more data, or face facts :)
I don't understand. If everything was fine, why did Air Canada (a) lose the case and (b — guessing here of course) quietly disable the chatbot? The policy was "apply before booking" and they refused to honour "apply after booking". So how did the policy exist?
I noticed the same article, and clearly it points to a weakness in LLM-based chatbots that don’t use more reliable methods to make decisions. But it isn’t clear from the article whether this particular chatbot was based on an LLM or on some older chatbot technology that was just badly programmed, or that read from an older or simply inaccurate version of a corporate FAQ, for instance.
"the chatbot is a separate legal entity that is responsible for its own actions” ... this is important ... there is a lot of writing about how LLMs have concsiousness and AIs can be persons ... the reason these discussions are so important is for their potential legal implications, not for their philosophical or science content
A company is responsible for the actions of all its IT. The actual technique isn't relevant. Claiming the chatbot to be a separate legal entity was about the dumbest legal idea ever. It must have come from someone convinced that AGI is real, but even then, an employee is not a separate legal entity.
I agree ... but Silicon Valley is preparing the ground to change this in the future with all their talk about AIs being persons, the singularity, AGI, etc ... I am sure they will not give up on this just because it failed at the first attempt
Last December: "Long before AI has become as intelligent as a human (or beyond) we will have to address their ‘person’-hood from a legal perspective: if they are a ‘legal person’, that is, an entity that for the law can be seen as a human." — https://ea.rna.nl/2023/12/26/memorisation-the-deep-problem-of-midjourney-chatgpt-and-friends/ :-)
Hi Gerben, wow! Terrific posts :) Going to spend a lot of time poring over and enjoying them.
Thanks for the link. This is exactly what I was thinking about. It gives us a glimpse at a future in which the oligarchs controlling the AI will try to rule us by imposing the concept of "AI as a person" on us.
Btw, the word oligarchs seems loaded. But is there a neutral term one could use here? Elites? Ruling class? Overlords? I am not sure we even have the language to talk about this in a detached, neutral way.
Yeah, but look at all the money and aggravation they saved not having to deal with those pesky humans.
You’ve made a nice following out of dumping on AI but did you not at least want to mention this happened in 2022 before LLMs?
https://x.com/garymarcus/status/1758950031770714249?s=46
This was hilarious.
I just had to use it in my latest writeup (Substack robots will fire it off tomorrow morning)
Hey Andy, I just did a write-up that started with Gary's Twitter example (I had many requests from people for examples of hallucinations and the dangers of LLMs in production).
It's actually shocking that I can do that TODAY, on a million-dollar investment/company: Perplexity.ai.
Basically, write the prompt from the Twitter screenshot, and PerplexityAI obligingly goes off the rails.
The writeup goes out tomorrow morning.
link? i haven’t seen a good examination of perplexity
It’s scheduled to go out in an hour
This is not a full write-up on Perplexity, but on a specific use case: how easy it is to get it to go off the rails instead of searching.
And Perplexity fancies itself "conversational search," but it fails at that basic function: search.
For me, "search" means I'm looking for results, not a conversation with an LLM.
I agree we need to be careful about deploying technology. I mean, you wouldn't just let hundreds of millions of guns be out there, or let 16-year-old, hormone-infused teenagers drive while using their smartphones, right?
It's my opinion as someone who has used every prior generation of software (and AI is simply a novel kind of software) that we should figure out how to effectively use LLMs.
The big mistake or misunderstanding is to regard LLMs as off-the-shelf apps (via chat) rather than as an ingredient to be made valuable inside advanced business applications. Unfortunately, as with many innovations, we've jumped in and will first find every bad use of them on the way to finding the fewer ways to use them for good.
It's like giving hungry kids a box of candy or syrup without supervision. It's tasty. They (we) are going to get sick.
Gen AI is a baby, not even a toddler. Let's not throw the baby out with the bath water.
I'm sure you've prompted GPT to write you a poem, maybe to surprise a friend or just for fun? It's mind-boggling that it can do that, and yet a poem is a hallucination.
I asked it to write a white paper on a digital marketing topic. And it 'hallucinated' an article that was actually darn good. With some light editing, it was good enough to publish.
I use LLMs to automate underwriting. How do I prevent inaccuracy? I provide the model the data, constrain it to answer very specific questions, and wrap it inside a lot of other error-checking code. The end result is solid. It just works, and writing the same code without LLMs would take so long and cost so much as to be infeasible.
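For illustration only, here is a minimal sketch of that pattern as I understand it (stubbed model call, made-up field names and threshold; not the actual underwriting code): give the model only the applicant's data, ask one narrow question, then re-check its answer against the same data before trusting it.

```python
# Minimal sketch of the pattern above (stubbed model call, made-up field names
# and threshold; not the actual underwriting code): supply only the applicant's
# data, ask one narrow question, then re-check the answer before trusting it.

import json

def ask_llm(prompt: str) -> str:
    # Stub standing in for a real chat-completions call.
    return json.dumps({"debt_to_income": 0.42, "exceeds_threshold": True})

def debt_to_income_check(applicant: dict, threshold: float = 0.36) -> dict:
    prompt = (
        "Using ONLY the JSON below, compute debt_to_income = monthly_debt / "
        "monthly_income and say whether it exceeds the threshold. Respond as JSON "
        "with keys 'debt_to_income' and 'exceeds_threshold'.\n"
        f"threshold: {threshold}\ndata: {json.dumps(applicant)}"
    )
    answer = json.loads(ask_llm(prompt))

    # Error-checking wrapper: recompute the ratio and reject inconsistent output.
    expected = applicant["monthly_debt"] / applicant["monthly_income"]
    if abs(answer["debt_to_income"] - expected) > 0.01:
        raise ValueError("LLM answer disagrees with the source data; route to a human")
    if answer["exceeds_threshold"] != (expected > threshold):
        raise ValueError("LLM verdict is inconsistent with the computed ratio")
    return answer

if __name__ == "__main__":
    print(debt_to_income_check({"monthly_debt": 2100, "monthly_income": 5000}))
```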
You hit the nail on the head "The big mistake or misunderstanding is to regard LLMs as off-the-shelf apps"
Very interestingly, I recall that about 10 years ago, when 3D printing was becoming prevalent, most people expected 3D printing to just magically manufacture all these amazingly complex things, with no extra post-processing needed.
I also agree with you that the proper direction is not to throw the baby out with the bathwater.
See, in that last paragraph you described, you have a significant amount of human oversight and clearly other tools as part of your workflow, and I imagine you take that final product, your writing, and likely make a PDF out of it, share it with other people, etc. So the LLM is part of your workflow, but it's not the end-all, be-all.
That, I believe. It's practical, applied, and it aids our current lives, without taking over.
Thanks for the thoughtful answer.
Here's the link:
https://matt452.substack.com/p/apai-2i10-the-future-of-search-is
The examples were all done Sunday - so this is very fresh.
Anybody else having trouble sharing this to Notes? I've tried 4 times. It keeps hanging up.