How come smart assistants have virtually no ability to converse, despite all the spectacular progress with large language models?
5 reasons why Large Language Models like GPT-3 couldn’t save Alexa
Just got a fantastic question from Mercury Mason, a cognitive science Ph.D. student. It is good and important enough that it could be on a final exam. See if you can figure it out:
It's worth asking since (a) Amazon is apparently greatly reducing Alexa's staff, (b) Alexa hasn't added significant features in years, and (c) Alexa seems clumsy and limited compared to GPT-3. The question also gets at some important realities about current AI that aren't fully appreciated.
I'll give five answers of my own below, but you can pause here if you want to take a guess yourself.
Let’s start with some answers that probably aren’t right.
Could it be that nobody at Amazon has read the recent literature? Almost certainly not. Amazon probably *is* using large language models for product search. (This may be why you often get good results for synonyms and typos, but also get a lot of stuff you don't want, like asking for AAA batteries and getting a bunch of C batteries mixed in.) Yet Alexa obviously isn't conversing in the way that GPT-3 is.
Another theory? Maybe Amazon doesn't want to pay the licensing fees. Nope, that's not it either; they could easily spin up an instance of an LLM on AWS and could probably afford the cost (one-time training is expensive; running the model, less so).
And I don’t think scaling is the real issue either. Amazon’s engineers are Masters of Scale, and certainly the company doesn’t lack for processing power or data, either.
Here are my five best guesses; I suspect all five contributed:
1. LLMs are inherently unreliable. If Alexa were to make frequent errors, people would stop using it. Amazon would rather you trust Alexa for a few things like timers and music than sell you a system with much broader scope that you stop trusting and stop using.
2. LLMs are unruly beasts; nobody knows how to make them refrain 100% of the time from insulting users, giving bad advice, or just plain making stuff up. (Meta's Galactica was an epic failure in this regard.)
3. Amazon doesn't want to get sued. Any one of these scenarios of LLMs gone awry (bad advice, insults, lies, etc.) could hurt the Amazon brand, open the company up to litigation, and so on. It's just not worth the risk.
4. Alexa has to do stuff in the world, like turning on lights, playing music, opening shades, etc.; if Alexa could converse freely, user expectations would go through the roof and mostly be unmeetable. (You could tell Alexa to wash the dishes, but until Amazon's robot division really picks up speed, that ain't happening.)
5. LLMs spit out words, not actions (and not API calls either). When an LLM produces a sentence, you can't directly use that sentence to control stuff unless you build another system to parse the sentences into actions. Nobody knows how to do this reliably, either.
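The last point, about turning an LLM's sentences into actions, can be made concrete with a deliberately naive sketch. Everything in it is hypothetical: the `Action` type, the `RULES` table and `parse_to_action` are illustrative assumptions, not Amazon's code or any real Alexa API, and a production system would use something far more elaborate. It only shows how a sentence-to-action layer can miss a paraphrase entirely or misread a refusal as a command.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical smart-home command; not a real Alexa or AWS API.
@dataclass(frozen=True)
class Action:
    device: str
    command: str


# Hypothetical keyword rules: an action fires if all its keywords appear.
RULES = [
    (("turn on", "lights"), Action("lights", "on")),
    (("turn off", "lights"), Action("lights", "off")),
    (("open", "shades"), Action("shades", "open")),
]


def parse_to_action(sentence: str) -> Optional[Action]:
    """Naively map an LLM's free-form sentence to the first matching action."""
    text = sentence.lower()
    for keywords, action in RULES:
        if all(k in text for k in keywords):
            return action
    return None  # no rule matched, so nothing happens


if __name__ == "__main__":
    # Works: the wording happens to match a rule.
    print(parse_to_action("Sure! I'll turn on the lights for you."))
    # → Action(device='lights', command='on')

    # Misses a paraphrase with the same intent.
    print(parse_to_action("It's getting dark; maybe brighten the living room?"))
    # → None

    # Misreads a refusal as a command: the keywords match, the meaning doesn't.
    print(parse_to_action("I won't turn on the lights until you turn off the TV."))
    # → Action(device='lights', command='on')
```

Adding more rules patches individual cases but never closes the gap between the words an LLM produces and the action a user actually wants.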
Bottom line: From the outset, Large Language Models like GPT-3 have been great at generating surrealist prose, and they can beat a lot of benchmarks, but they are not (and may never be) great tech for reliably inferring user intent from what users say.
Turning LLMs into a product that controls your home and talks to you in a way that would be reliable enough to use at scale in millions of homes is still a long, long way away. See also my essay on Google's PaLM-SayCan for what might happen if this tech were embedded in robots, which would be even riskier.