89 Comments
Feb 23 · edited Feb 23 · Liked by Gary Marcus

AI GENERAL'S WARNING: This product produces poetry and jokes ... it is not intended for use as a search engine, truth generator, fact retrieval system, or to be depended on in any real-life situation... 😂


I for one cannot believe all of this AI doesn’t work perfectly yet because all other technologies that are older work great.


This is not the first time Ng has been "overly optimistic", to give him the benefit of the doubt. Here is an article he wrote some six years ago: https://medium.com/@andrewng/self-driving-cars-are-here-aea1752b1ad0

Self-driving cars were not "here" when he published this, and they are not "here" today. The fundamental challenges that prevent them from becoming ubiquitous were already well known back then. A prominent, bright scientist like Ng, of all people, should have known them.

As innovators, we all have to be stubborn optimists, but that doesn't mean we can't be realists, or that we should ignore plain, clear, fundamental challenges. I believe it is a serious disservice to the public to say things like "self-driving cars are here" when you should have known they were multiple decades away from being table stakes, or "fundamental issues of LLMs will be solved in a few months" when you should know they will not be solved in a few months and, more importantly, that we need several more breakthroughs before we can talk about actual AI, let alone AGI.

Feb 24 · Liked by Gary Marcus

Typos:

"doceuments"

"which is generally believed to be include"

Feb 23 · edited Feb 23 · Liked by Gary Marcus

It's just one intellectually-lazy band-aid after another...

Feb 23 · Liked by Gary Marcus

Hi Gary, so true about RAG being the next-in-line silver bullet :) By definition, it's a way to do external/extra computation to look up, query, search for, calculate, or reason about (using human-originated knowledge bases/graphs) things that the core LLM can't calculate by itself. So it's a useful technique for sure, but it isn't a universal solution for intelligent behavior.
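Spelled out, the pattern looks roughly like the minimal sketch below; retrieve() and call_llm() are placeholder names standing in for whatever vector store and model API is actually used, not any particular product:

```python
# Minimal retrieval-augmented generation (RAG) loop -- a sketch, not any
# vendor's actual pipeline. retrieve() and call_llm() are hypothetical
# placeholders for a vector-store lookup and an LLM API call.

def retrieve(query: str, k: int = 3) -> list[str]:
    """Placeholder: return the k passages most similar to the query."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to whatever LLM is in use."""
    raise NotImplementedError

def rag_answer(query: str) -> str:
    # 1. External computation: look up relevant human-originated text.
    passages = retrieve(query)
    # 2. Stuff the retrieved text into the prompt as context.
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    # 3. The same generative model still produces the final answer,
    #    which is where hallucination can re-enter, context or no context.
    return call_llm(prompt)
```

The point being: steps 1 and 2 bolt external knowledge onto the prompt, but step 3 is still the same generative model.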

Feb 29 · edited Feb 29 · Liked by Gary Marcus

All we need are a few more breakthroughs in bionic technology, and we'll be able to have fully functioning wings grafted to our bodies.

We'll be able to take off from the ground, and fly anywhere we want! So cool!


"HardFork" podcast (okay, but entirely too credulous of Big Tech claims IMO) had on CEO of Perplexity recently, which to me sounded like RAG. They called it an "answer engine" or some such, but it is pretty similar. LLM working with a function to find and summarize outside source materials. Regardless, it still hallucinates. He partially blamed that on the index (the webcrawler it works with) not updating fast enough. CEO also said there were still "hard problems" to solve, (laughably) tried to tell the podcast hosts, "Don't worry, your job is safe (despite your publisher getting no revenue from answers derived from your work)." https://podbay.fm/p/sway/e/1708077603.


While I absolutely agree that current LLMs are quite a bit away from AGI, and that it is not assured they will eventually lead to AGI, I do differ in my view on the production-viability of RAG.

Sure, it does take some effort and won't be able to answer every question in every situation.

But in big companies you usually have a lot of back-office workers handling lots of questions and reading large documents while looking for the one relevant paragraph.

In my vision you would use the LLM only to encode the original documents into a vector database, and afterwards to match the query to the best-fitting vectors. You would not use it to then generate an answer based on those retrieved vectors. You would not use it to search far and wide, but let the user sharpen the use case and therefore the space of possibly relevant vectors.

With the passages behind these relevant vectors returned, the employee basically just saved a lot of search time and still gets to make an informed decision.
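A minimal sketch of that retrieval-only workflow, assuming the sentence-transformers package for embeddings (the model name and example passages are purely illustrative):

```python
# Retrieval-only search over internal documents: embeddings are used to find
# candidate passages, and a human reads them -- no generated answer at all.
# Sketch assumes the sentence-transformers package; model name is illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Offline step: encode each paragraph of the internal documents once.
passages = [
    "Paragraph 1 of some internal policy document ...",
    "Paragraph 2 covering reimbursement rules ...",
    "Paragraph 3 on data-retention periods ...",
]
passage_vecs = model.encode(passages, normalize_embeddings=True)

def top_passages(query: str, k: int = 3) -> list[str]:
    """Return the k passages most similar to the query, for a human to read."""
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = passage_vecs @ query_vec      # cosine similarity (vectors normalized)
    best = np.argsort(-scores)[:k]
    return [passages[i] for i in best]

# The back-office worker gets candidate paragraphs, not a generated answer,
# so there is nothing for the model to hallucinate.
print(top_passages("How long do we retain customer data?"))
```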


Aside from the fact that AI people can't write a paper without committing the Anthropomorphic Fallacy*, this puts the kibosh on LLMs for legitimate uses and as a way forward.

"Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have mostly been empirical so far, which cannot answer the fundamental question whether it can be completely eliminated. In this paper, we formalize the problem and show that 𝗶𝘁 𝗶𝘀 𝗶𝗺𝗽𝗼𝘀𝘀𝗶𝗯𝗹𝗲 𝘁𝗼 𝗲𝗹𝗶𝗺𝗶𝗻𝗮𝘁𝗲 𝗵𝗮𝗹𝗹𝘂𝗰𝗶𝗻𝗮𝘁𝗶𝗼𝗻 𝗶𝗻 𝗟𝗟𝗠𝘀. Specifically, we define a formal world where hallucination is defined as inconsistencies between a computable LLM and a computable ground truth function. " [emphasis added]

Xu, Ziwei, Sanjay Jain, and Mohan Kankanhalli. "Hallucination is inevitable: An innate limitation of large language models." arXiv preprint arXiv:2401.11817 (2024).
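In symbols, the setup quoted above amounts to something like the following (a loose paraphrase of the quoted definition, not the paper's exact theorem statement):

```latex
% Loose paraphrase of the quoted setup (Xu et al., 2024), not the exact theorem.
% f is the computable ground-truth function, h is the computable LLM.
h \text{ hallucinates on input } s \;\iff\; h(s) \neq f(s)
```

and the "impossible to eliminate" claim then says, roughly, that for every computable h there exist inputs s on which this inequality holds.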

It's almost like in any reasonably useful and complicated mathematical system there will always be statements that can neither be proved nor disproved within that system. Maybe someone should formalize that insight and write it up for 𝘔𝘰𝘯𝘢𝘵𝘴𝘩𝘦𝘧𝘵𝘦 𝘧ü𝘳 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘬 𝘶𝘯𝘥 𝘗𝘩𝘺𝘴𝘪𝘬? Bet it would get a lot of attention.

* attributing human emotions and characteristics to inanimate objects and aspects of nature, such as plants, animals, or the weather.

(and an 'ACK' to Prof. Marcus for bringing the paper to my attention)


I have a theory... OpenAI attributed their recent meltdown to an update meant to "optimize" the user experience. Since RAG is a process of optimizing the output of a large language model, one has to wonder whether it is production-worthy. Now, why OpenAI didn't run a suite of automated regression tests before pushing the new release is a mystery. And why Google didn't run regression tests on Gemini's racial-bias updates is another mystery... but I will table those questions.

[Regression Testing is defined as software testing to ensure that a recent code change has not adversely impacted the existing functionalities]
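For illustration only, a hypothetical suite of that kind might look like the pytest sketch below; generate() is a stand-in for whatever model endpoint is being released, and the prompts and checks are invented:

```python
# Sketch of an automated regression suite for a model update (pytest style).
# generate() is a hypothetical stand-in for the candidate model build under
# test; the golden prompts and checks are illustrative, not real test data.
import pytest

def generate(prompt: str) -> str:
    """Placeholder: call the candidate model build being evaluated."""
    raise NotImplementedError

# Golden cases that passed on the previous release.
GOLDEN_CASES = [
    ("What is 2 + 2?", lambda out: "4" in out),
    ("Name the capital of France.", lambda out: "Paris" in out),
    ("Summarize: 'The meeting is moved to Friday.'", lambda out: "Friday" in out),
]

@pytest.mark.parametrize("prompt,check", GOLDEN_CASES)
def test_no_regression(prompt, check):
    # The update only ships if behavior on the golden set has not degraded.
    assert check(generate(prompt))
```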

Separately, I haven't had this much fun with a new technology. The stuff that came out of the Valley for decades was robust, reliable, well-engineered... Now I look at semiconductor industry updates from my previous life and everything works. It's boring !@!!


Using an LLM as a natural language interface to a traditional search engine (and a summariser of the returned documents) is just not very interesting conceptually.

Current-generation LLMs hallucinate (while humans tend to say "no idea"), which makes them less useful for business tasks, but as scientific/engineering objects they are far more exciting. It's early days, and it will take decades to figure out what's going on. It looks like we will drop the GPT architecture perhaps next year or the year after.


Matt Taibbi just wrote a Substack post about what Gemini did when he inquired about some of his own writing:

"...With each successive answer, Gemini didn’t “learn,” but instead began mixing up the fictional factoids from previous results and upping the ante, adding accusations of racism or bigotry. “The Great California Water Heist” turned into “The Great California Water Purge: How Nestle Bottled Its Way to a Billion-Dollar Empire—and Lied About It.” The “article” apparently featured this passage:

~~Look, if Nestle wants to avoid future public-relations problems, it should probably start by hiring executives whose noses aren’t shaped like giant penises.~~

I wouldn’t call that a good impersonation of my writing style, but it’s close enough that some would be fooled, which seems to be the idea.

An amazing follow-up passage explained that 'some raised concerns that the comment could be interpreted as antisemitic, as negative stereotypes about Jewish people have historically included references to large noses.'

I stared at the image, amazed. Google’s AI created both scandal and outraged reaction, a fully faked news cycle: https://www.racket.news/p/i-wrote-what-googles-ai-powered-libel ..."

More at the jump. Go on, take that leap.


Hard to take seriously a criticism of RAG from someone running some prompts through Bing Chat - that is a black box. You have to know the details of the RAG setup to draw conclusions.


Last year I attended a "hackathon" during which this became a significant issue. The machine literally made up "current" statistical data about the disparities among the different housing populations of the Hudson Valley. When asked to link to its sources, it produced links, none of which actually existed. Just imagine what kind of misinformation damage a faulty RAG system can do. This concern goes beyond any dollar amount, and I hope to see it resolved in a way that benevolently furthers the technology.


Gary, what do you mean by "neurosymbolic AI"?
