You may have read yesterday’s New York Times report by Cade Metz and others on how many of the biggest AI companies have been cutting ethical corners in a race to gather as much data as possible (“OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems”).

All of this followed the realization that their systems simply cannot succeed without even more data than the internet-scale data on which they have already been trained.

Maybe the starkest image there is OpenAI’s then-president Greg Brockman personally working on lining up YouTube videos to download and transcribe—very likely knowing that he was entering a legal grey area—yet desperate to feed the beast. If it all falls apart, either for legal reasons or technical reasons, that image may linger.

Both the Times and the Wall Street Journal also covered the current race to synthetic data. I had foreseen all of this (both the data guzzling and the need for synthetic data) in my 2018 paper Deep Learning: A Critical Appraisal. As Jason Pontin concisely put it in an opinion essay in WIRED, maybe the only mainstream outlet to take note at the time, the thrust of my critique was that Deep Learning was “Brittle, Greedy, Opaque, and Shallow”.

Fast forward six years, and those problems are exactly what is ailing GenAI today: the brittleness (a term for unreliability); the greed for data that is causing the desperation that the Times described; the opacity (uninterpretability) that makes debugging, fairness, and engineering so hard; and the shallowness, meaning that generalizations are never complete.

What makes all this tragic is that many of us have tried so hard to warn the field that we would wind up here. People like LeCun (who haughtily declared that my 2018 critique was “mostly wrong” moments after it was posted) have been routinely dismissive, both of my critiques (often co-authored with Ernest Davis) and of others’ concerns, such as those about AI and bias and social injustice, rightfully and repeatedly raised by researchers such as Latanya Sweeney, Timnit Gebru, Kate Crawford, Abeba Birhane, and Margaret Mitchell.

In my own case, it feels as if literally everything I have been warning about is finally coming home to roost.

It’s all happening now. Every single point was originally ignored or dismissed (often most vocally by LeCun, who for example called my initial characterization of reasoning failures in LLMs a “rear-guard action”).

Not one of these problems has been adequately addressed.

Today we are left with two things:

• an intellectual monoculture — with no alternative approach that is nearly as well-funded.

• a pile of unreliable systems that are unlikely to live up to the hype.

Only with a more open-minded field can we hope to make progress.

We must invest heavily in alternative approaches and stop pouring seemingly unlimited funding into a losing hand that crowds out everything else.

Gary Marcus desperately hopes the field of AI will again start to welcome fresh ideas.

