Oops! How Google bombed, while doing pretty much exactly the same thing as Microsoft did, with similar results
I will always remember today (my birthday no less), February 8, 2023, as the day on which a chatbot-induced hallucination cost Alphabet $100 billion.
But I will also remember it as the week in which Microsoft introduced an ostensibly similar technology, with ostensibly similar problems, and received an entirely different response.
Kevin Roose, for example, wrote earlier today at The New York Times that he “felt a … sense of awe” at the new Microsoft product, Bing enhanced with GPT, even while recognizing that it suffered from now-familiar hallucinations and other types of errors. (Asked “If a dozen eggs cost $0.24, how many eggs can you buy for a dollar?”, Bing said 100; the correct answer is 50.)
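For what it’s worth, the arithmetic Bing fumbled is trivial to verify; here is a minimal sketch, in Python, purely as illustration (working in cents to avoid floating-point noise):

```python
# Sanity-check of the egg question Bing got wrong: a dozen eggs for $0.24
# works out to 2 cents per egg, so a dollar buys 50 eggs, not 100.
price_per_dozen_cents = 24
price_per_egg_cents = price_per_dozen_cents / 12   # 2.0 cents per egg
eggs_per_dollar = 100 / price_per_egg_cents         # 100 cents / 2 cents = 50.0
print(eggs_per_dollar)                               # 50.0
```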
Others, it should be noted, encountered errors in relatively brief trials too; CBS Mornings, for example, reported in their segment errors of geography and hallucinations of plausible-sounding but entirely fictitious establishments.
Roose seemed reassured, though, by the corporate types at Microsoft and OpenAI, reporting optimistically that “Kevin Scott, the chief technology officer of Microsoft, and Sam Altman, the chief executive of OpenAI, said in a joint interview on Tuesday that they expected these issues to be ironed out over time”.
The Times’ Roose even went so far as to chide anyone who expressed concern about the errors, saying that “fixating on the areas where these tools fall short risks missing what’s so amazing about what they get right”.
§
Perhaps. But one might well have said the same in response to people (like me) who expressed concerns about the development of driverless cars in 2016. Seven years and roughly $100 billion later, those pesky errors haven’t gone away.
It’s not just driverless cars, either. Anyone remember Facebook M, demoed, praised, and then canceled? How about IBM Watson, advertised as an imminent solution to oncology and then sold for parts? How about the expert systems of the 1980s, scarcely even discussed today? Or the awe-inspiring demo of Google Duplex, which has hardly moved the needle in the real world, almost five years later?
I am not saying with certainty that we are witnessing the same movie all over again, but should anybody blithely accept the reassurances of some very interested parties, when AI has such a long history of demos that never quite make it into solid products? I don’t think so.
§
Meanwhile, it is striking that neither Roose nor anyone else has explained exactly why Google and Microsoft received such different receptions. The two megacompanies both demoed prototypes, neither fully ready for public use, built around apparently comparable technology, facing apparently similar bugs, within a day of each other. Yet one demo was presented as a revolution, the other as a disaster.
Since neither company has yet subjected their products to full scientific review, it’s impossible to say which is more trustworthy; it might well turn out that Google’s new product is actually more reliable.
Most likely, neither of them is particularly reliable. Yet they are being treated as polar opposites. At the very least, someone ought to be asking, from a business perspective, what’s the moat here, if two companies are basically about to offer the same thing?
§
And it’s not just that AI has a fairly spotty track record of turning demos into reliable products; it’s not just that hallucinatory web search could be dangerous if left to run amok in domains like medicine; it’s that the promises themselves are problematic, when examined from a scientific perspective.
Scaling neural network models—making them bigger—has made their faux writing more and more authoritative-sounding, but not more and more truthful.
Hallucinations are in their silicon blood, a byproduct of the way they compress their inputs, losing track of factual relations in the process. I first pointed out this risk in 2001, in the fifth chapter of my book The Algebraic Mind, and the problem has persisted ever since. To blithely assume that the problem will soon go away is to ignore 20 years of history.
I do actually think that these problems will eventually get ironed out, possibly after some important fresh discoveries are made, but whether they get ironed out soon is another matter entirely. Is the time course to rectifying hallucinations weeks? Months? Years? Decades? It matters.
If they don’t get ironed out soon (and they might not), people might quickly tire of chat-based search, in which BS and truth are so hard to discriminate, and eventually find themselves returning to do their searches the old-fashioned, early-21st-century way, awe or not.
Gary Marcus (@garymarcus), scientist, bestselling author, and entrepreneur, is a skeptic about current AI but genuinely wants to see the best AI possible for the world—and still holds a tiny bit of optimism. Sign up to his Substack (free!), and listen to him on Ezra Klein. His most recent book, co-authored with Ernest Davis, Rebooting AI, is one of Forbes’s 7 Must Read Books in AI.
“Scaling neural network models—making them bigger—has made their faux writing more and more authoritative-sounding, but not more and more truthful.”
Hero-level posting.
Great read, although I was expecting to find the actual reason that Google bombed but Microsoft didn't. Was it because Google rolled out an inferior version of a BS generator? Or was it because Google has been gradually losing the trust of the general public?
This being said, is it just me or has anyone else noticed that deep learning is the AI technology that drives both autonomous vehicles and LLMs? In spite of the hype and the successes in some automation fields, DL has failed miserably in the intelligence arena. Isn't it time for AI to change clothes, so to speak? I got all excited when I heard that John Carmack was working on a new path to AI only to find out that he got his inspiration from OpenAI's deep learning guru, Ilya Sutskever, who gave him a reading list of 40 DL research papers. Lord have mercy.
I really don't understand the obsession with deep learning in AI research. The brain generalizes but a deep neural net optimizes objective functions. They could not be more polar opposites. I'd say it's high time for AGI researchers to drop DL and find something else to work with but maybe it's just me.