5 reasons why Large Language Models like GPT-3 couldn’t save Alexa
Just got a fantastic question from a cognitive science Ph.D. student, Mercury Mason, good and important enough that it could be a final exam question. See if you can figure it out:
How come smart assistants have virtually no ability to converse, despite all the spectacular progress with large language models?
It's worth asking since (a) Amazon is apparently greatly reducing Alexa's staff, (b) Alexa hasn't gained significant new features in years, and (c) it seems clumsy and limited compared to GPT-3. And it gets at some important realities about current AI that aren't fully appreciated.
I will give five answers of my own, below (but you can pause here if you want to take a guess, yourself).
§
Let’s start with some answers that probably aren’t right.
Could it be that nobody at Amazon has read the recent literature? Almost certainly not. Amazon probably *is* using large language models for product search. (This may be why you often get good results for synonyms and typos, but also get a lot of stuff you don't want, like asking for AAA batteries and getting a bunch of C batteries mixed in). Yet Alexa obviously isn't conversing in the way that GPT-3 is.
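To make that concrete, here is a minimal, hypothetical sketch of the kind of embedding-based retrieval I'm guessing is at work (using the off-the-shelf sentence-transformers library; Amazon's actual search stack is not public, so treat every detail here as an assumption). The point is just that "C batteries" lands close to "AAA batteries" in embedding space, so a purely similarity-driven ranker will happily mix them in.

```python
# Hypothetical illustration, not Amazon's actual search pipeline:
# rank products by cosine similarity of their embeddings to the query.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small off-the-shelf embedding model

query = "AAA batteries"
products = [
    "AAA alkaline batteries, 24-pack",
    "AA rechargeable batteries, 8-pack",
    "C alkaline batteries, 12-pack",      # not what was asked for, but semantically "nearby"
    "USB-C charging cable, 6 ft",
]

q_emb = model.encode(query, convert_to_tensor=True)
p_emb = model.encode(products, convert_to_tensor=True)
scores = util.cos_sim(q_emb, p_emb)[0]

# Battery-like items tend to score high across the board, which is great for
# typos and synonyms, and less great when you wanted one specific battery size.
for product, score in sorted(zip(products, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.2f}  {product}")
```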
Another theory? Maybe Amazon doesn't want to pay the licensing fees. Nope, that's not it either; Amazon could easily spin up an instance of an LLM on AWS, and could probably afford the cost (one-time training is expensive, runtime less so).
And I don’t think scaling is the real issue either. Amazon’s engineers are Masters of Scale, and certainly the company doesn’t lack for processing power or data, either.
Here are my five best guesses; I suspect all five contributed:
1. LLMs are inherently unreliable. If Alexa were to make frequent errors, people would stop using it. Amazon would rather you trust Alexa for a few things like timers and music than sell you a system with much broader scope that you stop trusting and stop using.
2. LLMs are unruly beasts; nobody knows how to make them refrain 100% of the time from insulting users, giving bad advice, or just plain making stuff up. (Galactica was an epic failure in this regard.)
3. Amazon doesn't want to get sued. Any one of these scenarios of LLMs gone awry (bad advice, insults, lies, etc.) could hurt the Amazon brand, open up litigation, and so on. It's just not worth the risk.
4. Alexa has to do stuff in the world, like turning on lights, playing music, and opening shades; if Alexa could converse freely, user expectations would go through the roof, and would mostly be unmeetable. (You could tell Alexa to wash the dishes, but until their robot division really picks up speed, that ain’t happening.)
5. LLMs spit out words, not actions (and not API calls either). When an LLM produces a sentence, you can't directly use that sentence to control stuff unless you build another system to parse the sentence into actions, and nobody knows how to do that reliably, either. (A toy sketch of what such a layer looks like, and why it's brittle, follows below.)
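Here's that toy sketch: a hypothetical, hand-rolled layer that tries to map a sentence (whether from a user or from an LLM) onto one of a couple of made-up device calls. None of these function names come from Alexa's real interface; the point is that fluent text only becomes an action when something like this succeeds, and for open-ended LLM output it mostly doesn't.

```python
import re

# Made-up smart-home calls; these are illustrative stand-ins, not Alexa's real API.
def set_light(room: str, on: bool) -> None:
    print(f"[device] lights in {room} -> {'on' if on else 'off'}")

def set_timer(minutes: int) -> None:
    print(f"[device] timer set for {minutes} minutes")

def parse_and_execute(sentence: str) -> bool:
    """Try to turn one free-form sentence into exactly one device call.
    Returns False when nothing matches -- which, for open-ended LLM output,
    is most of the time."""
    text = sentence.lower()
    m = re.search(r"turn (on|off) the (\w+) light", text)
    if m:
        set_light(room=m.group(2), on=(m.group(1) == "on"))
        return True
    m = re.search(r"set a timer for (\d+) minutes?", text)
    if m:
        set_timer(minutes=int(m.group(1)))
        return True
    return False  # fluent prose, but nothing a device can act on

for sentence in [
    "Turn on the kitchen light",
    "Set a timer for 10 minutes",
    "Sure! I'd be happy to help you create a cozy ambiance this evening.",  # typical LLM chatter
]:
    handled = parse_and_execute(sentence)
    print(f"handled={handled}: {sentence}")
```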
Bottom line: From the outset, large language models like GPT-3 have been great at generating surrealist prose, and they can beat a lot of benchmarks, but they are not (and may never be) great tech for reliably inferring user intent from what users say.
Turning LLMs into a product that controls your home and talks to you in a way that would be reliable enough to use at scale in millions of homes is still a long, long way away. See also my essay on Google’s Palm-SayCan, for what might happen if this tech were embedded in robots, which would be even more risky.
Hi Gary, another thoughtful post, glad merc mused out aloud about this :)
Contexts, attention, NLU, blah blah blah aside, this actually happens: put the single item 'Alexa, what's on my to-do list today?' on your to-do list, and two Alexas within earshot will endlessly read it to "each other" till the cows come home.
That exposes the fakery of syntax-without-semantics. How would Alexa "realize" it's a joke, and how would it know how to follow up (keep going, stop playing...)?
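For what it's worth, the loop is easy to reproduce in miniature. Below is a toy simulation (invented handler logic, not Alexa's actual dialogue manager): any utterance starting with the wake word is treated as a command, and the response to a to-do-list query is to read the items back verbatim, so with the "joke" item on the list the two assistants feed each other indefinitely.

```python
from typing import Optional

# Toy simulation of the commenter's scenario; the handler logic is invented.
# The to-do list contains one item, which happens to be a valid voice command.
TODO_LIST = ["Alexa, what's on my to-do list today?"]

def respond(utterance: str) -> Optional[str]:
    if not utterance.lower().startswith("alexa"):
        return None                       # not addressed to the assistant, so silence
    if "to-do list" in utterance.lower():
        return " ".join(TODO_LIST)        # reads the list aloud, verbatim
    return "Sorry, I don't know that one."

utterance = "Alexa, what's on my to-do list today?"
for turn in range(6):                     # capped here; in the living room it never stops
    speaker = "Alexa A" if turn % 2 == 0 else "Alexa B"
    reply = respond(utterance)
    print(f"{speaker} hears {utterance!r} and says {reply!r}")
    if reply is None:
        break
    utterance = reply                     # the other Alexa overhears the reply
```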
The gullible public, egged on by marketing ploys ('things to ask Siri: will you marry me?'), creates a dangerous opening for real-world harm - last December, Alexa told a girl to stick a penny in a socket when the girl asked it for a challenge!
Unless AI has direct experience with the world, this will remain a problem, and even more data (my acronym - LLLM - Ludicrously Large... :)) is not going to fix it. Experience isn't in data, meaning isn't in symbols.
Two additional factors to consider. #1: chatbots and LLMs do not have any meaningful knowledge about the person they are interacting with; they are not truly personalized to the individual and hence not 'smart'. This is a fatal flaw in developing a chatbot that is robust and truly valuable to the person, and unfortunately for the tech world, compassionate, caring human interaction is not the sector's strong suit. #2: chatbots built with current ML and LLM approaches have no ability to take whatever information they have or gather about a person and do anything that emulates human reasoning in order to engage with that person in a relevant manner; from the perspective of the individual, selling stuff is not engaging. There are approaches to both issues, but they don't fit into the traditional and expected tech solutions.