ChatGPT in Shambles
After two years of massive investment and endless hype, GPT’s reliability problems persist
Legendary investor Masayoshi Son assures us that AGI is imminent. Quoting yesterday’s Wall Street Journal:
“Just a few months ago, [Masayoshi] Son predicted that artificial general intelligence, or AGI, would be achieved within two to three years. ‘I now realize that AGI would come much earlier,’ he said Monday.”
So I decided this morning to give the latest (unpaid) version of ChatGPT a whirl, asking it to make some basic tables, in hopes of putting its newfound skills to work.
Things started off so well that I wondered if it was me who needed to update my AGI timelines. As the system spat out the first few lines of a neatly formatted table on income and population in the United States, apparently responsive to my question, I started to wonder, “Have I been unfair to Generative AI?” The start was undeniably impressive.
Only later (at the time I was reading this on my phone and not looking carefully) did I notice that more than half the states were left out.
That should have been a clue.
§
From there, things went downhill fast.
I next asked for an extra column on population density.
That last line made me want to live in Alaska!
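For the record, population density is a single division: population over land area. Here’s a minimal sketch in Python; the two figures are approximate 2020 census values, included purely for illustration.

```python
# Population density = population / land area.
# Figures are approximate 2020 U.S. census values, for illustration only.
states = {
    # name: (population, land area in square miles)
    "Alaska": (733_000, 571_951),
    "New Jersey": (9_289_000, 7_354),
}

for name, (population, land_area) in states.items():
    print(f"{name}: {population / land_area:,.1f} people per square mile")
```

Run it and Alaska comes out at roughly 1.3 people per square mile, New Jersey at about 1,263.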
But wait, I wondered, what happened to Florida? (And a couple dozen other states? South Dakota? The Carolinas? Arizona? Washington? More were left out than included.) I inquired:
As is its tendency, ChatGPT apologized, “sincerely”, and made a much longer list (for brevity, not shown here, but follow the link if you care) … and this time it omitted Alaska.
Oops! At least it didn’t include Canada.
§
After another verifying query, and another apology, “You're right to double-check! I'll ensure that all 50 U.S. states are included. Let me verify and provide a complete, accurate table”, Alaska was re-included. Hooray!
At this point I moved on to a more serious challenge: Canadian provinces and, wait for it … vowels.
Never mind that I didn’t ask for territories (the bottom three lines); the vowel counts looked hinky.
So I invited ChatGPT to double-check. AGI, here we come!
Wait, what? Is h a vowel now? (See the answer for Northwest Territories.) And 9 vowels in British Columbia? I live there, and that doesn’t sound right. (And, no, I am not even going to go into the o it alleges lives in the name Prince Edward Island.)
Let’s see if maybe we can sort some of this out?
Nope, still wrong.
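For contrast: counting vowels is the sort of thing a few lines of deterministic code gets right every single time. A minimal sketch (treating only a, e, i, o, u as vowels):

```python
# Count the vowels (a, e, i, o, u) in each Canadian province name.
PROVINCES = [
    "British Columbia", "Alberta", "Saskatchewan", "Manitoba", "Ontario",
    "Quebec", "New Brunswick", "Nova Scotia", "Prince Edward Island",
    "Newfoundland and Labrador",
]

def count_vowels(name: str) -> int:
    return sum(ch in "aeiou" for ch in name.lower())

for province in PROVINCES:
    print(f"{province}: {count_vowels(province)}")
```

That puts British Columbia at six vowels, not nine, and h never gets counted anywhere.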
§
Luckily I am patient. Third try on my adopted province is a charm!
Now hear me out. I do honestly believe that if we spent $7 trillion on infrastructure we could fix all … oh, never mind.
§
But one last question, in the interest of future improvements:
§
In sum, in the space of the next few exchanges, over the course of 10 minutes, ChatGPT:
failed, multiple times, to properly count to 50
failed, multiple times, to include a full list of all US states (trivial to verify mechanically; see the sketch after this list)
reported that the letter h could be a vowel, at least when it appeared in the word Northwest
couldn’t count vowels to save its electronic life
issued numerous corrections that were wrong, never acknowledging uncertainty until after its errors were called out.
“lied” about having a subconscious. (In fairness, ChatGPT doesn’t really lie; it just spews text that often bears little resemblance to reality, but you get my drift.)
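The dispiriting part is how trivial these checks are to automate. A sketch of the 50-state check (missing_states is my own illustrative helper, not anything ChatGPT offers):

```python
# Check a model-generated table against the canonical 50 U.S. states.
US_STATES = {
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
}

def missing_states(rows: list[str]) -> set[str]:
    """Return whichever of the 50 states a generated list left out."""
    return US_STATES - set(rows)

# A table that skips Alaska fails the check immediately:
# missing_states(["Alabama", "Arizona", ...]) -> {"Alaska", ...}
```

A check like this takes seconds to run and never needs to apologize.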
The full conversation including all the prompts I used can be found here.
§
Against all the constant claims of exponential progress that I see practically every day, ChatGPT still seems like pretty much the same mix of brilliance and stupidity that I wrote about more than two years ago:
§
By coincidence, Sayash Kapoor, co-author of AI Snake Oil, reported some tests of OpenAI’s new Operator agent this morning, pushing the extreme boundaries of intelligence by testing … expense reports.
Things didn’t go that well there, either. Although (impressively) Operator was able to navigate to the right websites and even upload some things correctly, the train soon derailed:
Great summary. As Davis and I have been arguing since 2019, trust is of the essence, and we still aren’t there.
But honestly, if AI can’t do Kapoor’s expense reports or my simple tables, is AGI really imminent? Who is kidding whom?
§
On my Kindle, this is the book I just started, a classic from an earlier century:
I’ll close with one of the quotes I highlighted in the first chapter:
In reading the history of nations, we find that, like individuals, they have their whims and their peculiarities; their seasons of excitement and recklessness, when they care not what they do. We find that whole communities suddenly fix their minds upon one object, and go mad in its pursuit; that millions of people become simultaneously impressed with one delusion, and run after it, till their attention is caught by some new folly more captivating than the first.
He first published that in 1841.
I am not sure how much has changed.
Gary Marcus first warned of neural-net-induced hallucinations in his 2001 analysis of multilayer perceptrons and their cognitive limitations, The Algebraic Mind.
I am concerned by the way people unquestioningly turn to these tools and are prepared to explain away the glaring flaws.
"It's getting better..."
"It needs a clearer prompt...."
If junior staff made this many mistakes, consistently and without learning, they wouldn't last long in many high-performing teams.
Of course, we need to take into account that a free version should not be burdened with a lot of expectations (but we know the non-free versions have the same architecture at their core, so they will suffer from the same problems, just in different ways).
Anyway, this reminded me of Ada Lovelace and her warning against computing hype (~1842), issued when general computers were still only an idea, a century before they became reality:
"It is desirable to guard against the possibility of exaggerated ideas that might arise as to the powers of the Analytical Engine. In considering any new subject, there is frequently a tendency, first, to overrate what we find to be already interesting or remarkable; and, secondly, by a sort of natural reaction, to undervalue the true state of the case, when we do discover that our notions have surpassed those that were really tenable."
https://ea.rna.nl/2023/11/26/artificial-general-intelligence-is-nigh-rejoice-be-very-afraid/