The Imminent Enshittification of the Internet
LLMs are creating a huge sanitation problem that will probably never be solved
My spelling checker doesn’t know the word enshittification, but I didn’t make it up (as far as I know Cory Doctorow did, thank you, Cory), and I am pretty sure you know what I mean by it, whether or not it happens to be in your training corpus.
And it’s on: the LLM-driven enshittification of the internet.
§
I first warned of an LLM-driven deluge on February 12, a few days after Kevin Roose described his sense of “awe” at the unveiling of the new GPT-4-powered Bing—and a few days before it notoriously recommended he get a divorce—in that brief initial period in which Bing and GPT-4 were viewed through entirely rose-colored glasses.
I didn’t believe a word of it, and foresaw serious problems. The art I used, created for me by Sasha Luccioni using Dall-E 2, was this:
The general thesis of the article was that Google ought to be pretty concerned by what was coming; the subtitle was “How sewers of lies could spell the end of web search”.
§
Since then I’ve been collecting examples on Twitter; here are two of my favorites:
and a month later
But things have actually gotten worse than I anticipated.
It’s not just the internet that’s been infested; it’s Amazon, too, as people are using GPT to create low-quality books:
And even worse, Amazon is starting to be a receptacle for inaccurate and exploitative quickie histories of recent events:
Adding to the chaos, the search engines are now sucking down each other’s garbage, as in this blunder from Bard that turns out to have been nicked from ChatGPT, an example of what Ernest Davis and I long ago called the “echo chamber effect”:
And of course LLMs are fully capable of making up truly terrible advice, too, like this disconcerting recipe from last week:
And once they do make stuff up like that, that garbage will circulate, from one stochastic parrot to the next.
§
The close of my February essay seems even more pertinent now; forgive me for repeating it.
Cesspools of automatically-generated fake websites, rather than ChatGPT search, may ultimately come to be the single biggest threat that Google ever faces. After all, if users are left sifting through sewers full of useless misinformation, the value of search would go to zero—potentially killing the company.
For the company that invented Transformers—the major technical advance underlying the large language model revolution—that would be a strange irony indeed.
Postscript, two hours later: No enumeration of internet excrement would be complete without this regularly updated list of bogus news sites, compiled by NewsGuard: https://www.newsguardtech.com/special-reports/ai-tracking-center/ — 421 sites and counting, “operating with little to no human oversight”. And, to quote the lyricists Nichols and Williams, as made famous by the Carpenters, “We’ve only just begun”.
Gary Marcus is host of the limited 8-part podcast Humans versus Machines (available wherever you get your podcasts), and author (or co-author) of 5 books, including Rebooting AI.
Taught a class on AI this summer. For the first two writing assignments the students were given a subject and required to use an LLM of their choice (mostly ChatGPT) to create the first draft. Then rewrite the paper.
Both versions were submitted to complete the assignment. Then the papers were posted on a discussion board and each student was required to review 3 other papers.
The rewritten paper was consistently better than the LLM version in content, organization, and readability.
I tried to drive home the point that "you write to be read." If you generate and publish raw LLM output, pretty soon no one will read you.
Frankly, the internet has been a huge cesspool for quite a while now. You could deal with it by paging through Google results and sorting out the clickbait. But even that technique was getting harder.
Did you see "When AI Is Trained on AI-Generated Data, Strange Things Start to Happen" by Maggie Harrison in Futurism?