The thing about promises in Silicon Valley is that accountability rarely shows up. Investors poured over $100 billion into the driverless car industry and so far have little to show for it. Endless promises (and empty predictions) were made at essentially no cost to those who made them. So what if Elon made bad predictions year after year? Nobody cares.
AI GENERAL'S WARNING: This product produces poetry and jokes ... it is not intended for use as a search engine, truth generator, fact retrieval system, or to be depended on in any real-life situation... 😂
And yet real organisations build real solutions with RAG - in real life situations 🙃
I for one cannot believe all of this AI doesn’t work perfectly yet because all other technologies that are older work great.
This is not the first time Ng has been "overly optimistic", to give him the benefit of the doubt. Here is an article by him from some six years ago: https://medium.com/@andrewng/self-driving-cars-are-here-aea1752b1ad0
Self-driving cars were not "here" when he published that, and they are not "here" today. The fundamental challenges that prevent them from becoming ubiquitous were already well known back then, and a prominent, bright scientist like Ng, of all people, should have known them.
As innovators we all have to be stubborn optimists, but that doesn't mean we can't be realists, or that we should ignore plain, clear, fundamental challenges. I believe it is a serious disservice to the public to say things like "self-driving cars are here" when you should have known they were multiple decades away from being table stakes, or "the fundamental issues of LLMs will be solved in a few months" when you should know they will not be solved in a few months and, more importantly, that we need several more breakthroughs before we can talk about actual AI, let alone AGI.
Typos:
"doceuments"
"which is generally believed to be include"
It's just one intellectually-lazy band-aid after another...
It is not intellectually lazy to have a tool narrow its focus to data you know is good. It is in fact very clever and human-like. Nor is it lazy if the bot is forced to use actual tools to do work rather than rehash text; that is also how people do things.
Such techniques add a degree of control, interpretability, and grounding, which black-box approaches sorely need. Maybe we can do better, but nobody has shown alternatives.
There is literally nothing “lazy” about building robust RAG pipelines. Anyone who ever built one knows this.
Shhh, the fact that businesses have already found much success in implementing RAG runs counter to Gary’s whole AI-skeptic grift
They are also not disclosing the failures, unfortunately. Nobody wants to admit failure 😅
Mischaracterization. Lots of hard work reasonably going on.
Why don't you provide a solution?
Who says I’m not?
https://www.bigmother.ai/
Hi Gary, so true about RAG being the next-in-line silver bullet :) By definition, it's a way to do external/extra computation to look up, query, search for, calculate, or reason about (using human-originated knowledge bases/graphs) things that the core LLM can't calculate by itself. So it's a useful technique for sure, but it isn't a universal solution for intelligent behavior.
No, RAG is not a universal solution. It cannot be. We are still talking about mindless text manipulation, so RAG is giving that some constraints. That is useful, but not enough by any means.
Looking for silver bullets is a fool's errand.
All we need are a few more breakthroughs in bionic technology, and we'll be able to have fully functioning wings grafted to our bodies.
We'll be able to take off from the ground, and fly anywhere we want! So cool!
Until then we are like those who thought going over Niagara Falls in a barrel (or using RAG LLMs to solve every critical business problem under the sun) was a splendid idea…
Say what you want about Nassim Taleb, he understands that business is not an algorithmic endeavor. It's empirical, contingent, holistic. It even contains an element of mystery. Some humans have more difficulty with the concept of mystery than others, but there's general consensus among us humans that the Unforeseen exists, as a potential. But I challenge anyone to try to get a computer to comprehend the concept of the Unforeseen.
AI has no empirical logic, because empiricism requires Experience. Experience presupposes the existence of a subject, possessing an experience-having capacity. Experience is required for evaluation. AI doesn't do evaluation. It can borrow someone else's evaluation under some circumstances, but that isn't the same thing. AI is so amorphous an entity that in some sense it's misleading to even refer to AI as an "it."
What AI consists of instead is a deductive logic capability that resides entirely in the idealized realm of programming input, which is combined with a massive calculation capability that processes a continually accumulating amount of data in order to yield output. That's awesome, but it's also entirely insufficient to fulfill the vaunted ambitions of AI programmers.
"HardFork" podcast (okay, but entirely too credulous of Big Tech claims IMO) had on CEO of Perplexity recently, which to me sounded like RAG. They called it an "answer engine" or some such, but it is pretty similar. LLM working with a function to find and summarize outside source materials. Regardless, it still hallucinates. He partially blamed that on the index (the webcrawler it works with) not updating fast enough. CEO also said there were still "hard problems" to solve, (laughably) tried to tell the podcast hosts, "Don't worry, your job is safe (despite your publisher getting no revenue from answers derived from your work)." https://podbay.fm/p/sway/e/1708077603.
While I absolutely agree that current LLMs are quite a bit away from AGI, and that it is not assured they will eventually lead to AGI, I do differ in my view on the production viability of RAG.
Sure, it does take some effort and won't be able to answer every question in every situation.
But in big companies you usually have a lot of back-office workers handling lots of questions and reading large documents while looking for the one relevant paragraph.
In my vision you would use the LLM only to encode the original documents into a vector database, and afterwards to match the query to fitting vectors. You would not use it to then generate an answer based on the vectors it found, and you would not use it to search far and wide; instead, you would let the user sharpen the use case and therefore the space of possibly relevant vectors.
With the relevant passages returned to the employee, they save a lot of search time and still get to make an informed decision.
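A minimal sketch of the retrieval-only workflow described above, assuming a sentence-transformers embedding model stands in for "the LLM" and an in-memory matrix stands in for the vector database; the model name and example paragraphs are placeholders, not a recommendation:

```python
# Retrieval-only sketch: embed paragraphs once, embed the query, and return
# the best-matching paragraphs to the human reviewer. No generation step.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

paragraphs = [
    "Refunds are processed within 14 days of receiving the returned item.",
    "Warranty claims require the original proof of purchase.",
    "Shipping to EU countries takes 3-5 business days.",
]
doc_vecs = model.encode(paragraphs, normalize_embeddings=True)  # shape (n, d)

def top_k(query: str, k: int = 2):
    """Return the k paragraphs most similar to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since vectors are normalized
    best = np.argsort(scores)[::-1][:k]
    return [(float(scores[i]), paragraphs[i]) for i in best]

for score, text in top_k("How long do I have to wait for my refund?"):
    print(f"{score:.3f}  {text}")
```

The human still reads the returned passages and makes the call, which sidesteps the hallucination problem the rest of this thread is about.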
People already use LLMs to search for information. A non-trivial question is whether LLMs are a cost-effective solution to such a problem. What if you took a tiny fraction of the hundreds of billions of dollars spent on research, training and inference (plus manually checking for hallucinations) and invested that into solving the specific issue you have (e.g. by hiring someone to produce better documentation or by setting up a better information retrieval system)?
Somehow people imagine that the payoff from investment in LLMs will be endless and ultimately free, while the payoff from solving specific problems LLMs are supposed to solve is just a short-term waste of money. I think in most cases the situation is the exact opposite. It's just that convincing your company to invest into a hyped-up piece of AI is much easier than convincing them to invest into something concrete.
Besides, what you specifically describe does not sound like an LLM, but rather like a completely different AI system that would use some of the same components.
For it to be an interesting business case, we (as a company looking into possible applications of this technology) can ignore the R&D costs and only have to evaluate the cost of running those LLMs and the infrastructure mentioned above; those costs are far lower and actually decently cheap.
In my mind LLMs are quite good at understanding language and intent, basically an evolution of classic chatbot intent recognition, and definitely better than the ones we have built ourselves. Therefore we can use them to understand questions and match those with our own data. But as Marcus has mentioned, there are several problems with LLMs, so you have to think of them not as the complete and perfect solution but rather as a tool to use at very specific points in a business process.
Like I said, it's not AGI, but that doesn't mean it's not useful.
sure, I can imagine some value in this more scoped use case
I worked in several huge multinationals and I could always find what I needed to know via graph based search engines. So it’s not clear to me what hit and miss RAG LLMs bring to the table…
Aside from the fact that AI people can't write a paper without committing the Anthropomorphic Fallacy,* this puts the kibosh on LLMs for legitimate uses and as a way forward.
"Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have mostly been empirical so far, which cannot answer the fundamental question whether it can be completely eliminated. In this paper, we formalize the problem and show that 𝗶𝘁 𝗶𝘀 𝗶𝗺𝗽𝗼𝘀𝘀𝗶𝗯𝗹𝗲 𝘁𝗼 𝗲𝗹𝗶𝗺𝗶𝗻𝗮𝘁𝗲 𝗵𝗮𝗹𝗹𝘂𝗰𝗶𝗻𝗮𝘁𝗶𝗼𝗻 𝗶𝗻 𝗟𝗟𝗠𝘀. Specifically, we define a formal world where hallucination is defined as inconsistencies between a computable LLM and a computable ground truth function. " [emphasis added]
Xu, Ziwei, Sanjay Jain, and Mohan Kankanhalli. "Hallucination is inevitable: An innate limitation of large language models." arXiv preprint arXiv:2401.11817 (2024).
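Roughly, the formal setup quoted above boils down to something like the following; this is a paraphrase of the abstract, not the paper's exact notation:

```latex
% Paraphrase of the quoted formalization; see the paper for the exact statement.
% The ground truth is a computable function f over strings, an LLM is a
% computable function h, and "hallucination" on an input s means disagreement:
\[
  h \text{ hallucinates on } s \quad\Longleftrightarrow\quad h(s) \neq f(s).
\]
% The paper's claim, loosely: no computable h can agree with every computable
% ground truth f on all inputs, so some hallucination is unavoidable.
```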
It's almost like in any reasonably useful and complicated mathematical system there will always be both true and false statements that cannot be proved within that system. Maybe someone should formalize that insight and write it up for 𝘔𝘰𝘯𝘢𝘵𝘴𝘩𝘦𝘧𝘵𝘦 𝘧ü𝘳 𝘔𝘢𝘵𝘩𝘦𝘮𝘢𝘵𝘪𝘬 𝘶𝘯𝘥 𝘗𝘩𝘺𝘴𝘪𝘬? Bet it would get a lot of attention.
* attributing human emotions and characteristics to inanimate objects and aspects of nature, such as plants, animals, or the weather.
(and an 'ACK' to Prof. Marcus for bringing the paper to my attention)
@garymarcus and @A Thornton: for a less formal but more mechanistic explanation of why training a large transformer-based language model on a large training set will yield hallucinations (but with a nuanced take on what those hallucinations may be), please take a look at the paper by Bernardo A. Huberman and me on SSRN: https://dx.doi.org/10.2139/ssrn.4676180
I have a theory... OpenAI attributed its recent meltdown to an update intended to "optimize" the user experience. Since RAG is the process of optimizing the output of a large language model, one has to wonder if it is production-worthy. Now, why OpenAI didn't run a suite of automated regression tests before shipping the new release is a mystery. And why Google didn't regression-test Gemini on its racial-bias updates is another mystery... but I will table those questions.
[Regression testing is defined as software testing to ensure that a recent code change has not adversely impacted existing functionality.]
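A toy illustration of the idea, with a made-up slugify() function standing in for whatever behaviour a release is supposed to preserve:

```python
# Toy regression test: pin down today's expected behaviour so a future
# "optimization" cannot silently change it. slugify() is a hypothetical
# stand-in for whatever function the next release touches.
import re

def slugify(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def test_slugify_still_behaves_as_before():
    # Outputs captured from the previous release; the test fails if an
    # update changes them.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  RAG & LLMs 2024 ") == "rag-llms-2024"
```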
Separately, I haven't had this much fun with a new technology. The stuff that came out of the Valley for decades was robust, reliable, well-engineered... now I look at semiconductor industry updates from my previous life and everything works. It's boring !@!!
Ah, for the days in which a one-in-a-billion mistake was considered a major scandal, and the thought of not being able to multiply a pair of six-digit integers would have been unthinkable.
Purnima, I am pretty sure there was a mountain of regression tests that the engineers prepared. But I think it is impossible to catch everything. The next release will just contain an equally huge pile of band-aid fixes, until another embarrassing puncture is publicised.
Exactly!
Using an LLM as a natural language interface to a traditional search engine (and a summariser of the returned documents) is just not very interesting conceptually.
Current-generation LLMs hallucinate (while humans tend to say "no idea"), which makes them less useful for business tasks, but as scientific/engineering objects they are far more exciting. It's early days, and it will take decades to figure out what's going on. It looks like we will drop the GPT architecture perhaps next year or the year after.
Nor is it actually very successful. I tried using Gemini for the same exercise as Gary has included here but it mostly would not produce any summary CV.
Even when I asked it to produce a summary of a copy of my main UoD CV (having removed the top few lines which identified me) it produced a very uninteresting mishmash.
Remember that RAG is only a way to add more context for the LLM's transformer to work with. It is essentially prompt engineering on steroids.
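A minimal sketch of that point: the retrieved passages are simply pasted into the prompt, and the model itself is unchanged. retrieve() and llm_generate() are hypothetical placeholders for whatever search backend and LLM API a given system uses.

```python
# RAG as "prompt engineering on steroids": retrieval only changes what text
# ends up in the context window; nothing about the model changes.

def retrieve(query: str, k: int = 3) -> list[str]:
    """Placeholder: return the k passages a search/vector index deems relevant."""
    raise NotImplementedError

def llm_generate(prompt: str) -> str:
    """Placeholder: call whatever LLM endpoint is in use."""
    raise NotImplementedError

def rag_answer(query: str) -> str:
    passages = retrieve(query)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer using ONLY the sources below. If they do not contain the "
        "answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    # Nothing forces the model to stick to the sources; it can still ignore
    # them and hallucinate, which is the failure mode discussed in this thread.
    return llm_generate(prompt)
```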
To your point (and Gary Marcus's article summarising your experience and those of others), I managed on my first attempt to get ChatGPT using RAG to base its answer on an unreliable source (a far right conspiracy website) AND to hallucinate the content, making garbage up out of the already unreliable representation. (For clarification, I did not point ChatGPT to any source, it found it all by its lonesome).
Obviously performance at any one point in time is pertinent - for a business, what it can do for us today defines its present usefulness - but these arguments are always subject to the "wait 'til next release" or "you did it wrong - here's a better prompt" response. Also, it's easy to make ChatGPT-4 get something very simple wrong every time ("which has more rainfall, Edinburgh or Amsterdam?"), and it will often get that wrong even with RAG switched on. As I said, for me, LLMs are interesting and cool; summarising traditional search engine results, not so much.
(Thinking in particular about the mamba architecture).
Matt Taibbi just wrote a Substack post about what Gemini did when he inquired about some of his own writing:
"...With each successive answer, Gemini didn’t “learn,” but instead began mixing up the fictional factoids from previous results and upping the ante, adding accusations of racism or bigotry. “The Great California Water Heist” turned into “The Great California Water Purge: How Nestle Bottled Its Way to a Billion-Dollar Empire—and Lied About It.” The “article” apparently featured this passage:
~~Look, if Nestle wants to avoid future public-relations problems, it should probably start by hiring executives whose noses aren’t shaped like giant penises.~~
I wouldn’t call that a good impersonation of my writing style, but it’s close enough that some would be fooled, which seems to be the idea.
An amazing follow-up passage explained that 'some raised concerns that the comment could be interpreted as antisemitic, as negative stereotypes about Jewish people have historically included references to large noses.'
I stared at the image, amazed. Google’s AI created both scandal and outraged reaction, a fully faked news cycle: https://www.racket.news/p/i-wrote-what-googles-ai-powered-libel ..."
More at the jump. Go on, take that leap.
hard to take seriously a criticism of RAG by someone running some prompts through Bing Chat - that is a black box. You have to know the details of the RAG to draw conclusions
We can infer from public facts. Copilot is based on a snapshot of GPT-4 running in Azure, "frozen" well before I set up Zingrevenue, my startup, late last year (as per the article). So the links and "facts" that Copilot supplied in my interactions with it were obviously supplemented by Bing, as they were recent. Hence we can attribute the wayward links and broken detail to RAG hallucinations. And Copilot is one of the world's best RAG LLMs, due to the simple fact that MSFT's (and Satya Nadella's) reputation is riding on its performance and accuracy; the resources Redmond must be devoting to keeping it impressive must be very, very significant.
Also, the wording of your question suggests that I may not be in a position to understand how RAG systems work. While I am unable to supply commercial screenshots of my Falcon 40b LLM instance running with multiple GPUs on my GCP GKE cluster, nor the source code of my TF2 containers in my private VPC, nor screenshots of all the ETL pipelines I have been building over a decade (down to the level of checking the hex code of binary data payloads) to demonstrate that I know a little about the RAG data needed to keep LLMs in check, nor the horribly complex decision trees spanning nearly a couple of decades, I beg to differ.
So, although it is true that I don't have access to Bing's source code (so yes, you are right, it is a black box), Bing's misdirected output is sufficient for me to cast my verdict.
What I can show you though is an easy peasy way to prepare an unprivileged OCI container that builds the latest copy of Tensorflow 2, the basis for my RAG work, from scratch, if that’s something you fancy 😉
https://www.linkedin.com/pulse/building-tensorflow-ai-from-source-container-simon-au-yong-joo9c?trk=public_post
Last year I attended a "hackathon" during which this became a significant issue. The machine literally made up "current" statistical data about the disparities among the different housing populations of the Hudson Valley. When asked to link to its sources, it provided links that did not actually exist. Just imagine what kind of misinformation damage a faulty RAG system can do. This concern is beyond a dollar amount, and I hope to see it resolved in a way that benevolently furthers the technology.
Gary, what do you mean by "neurosymbolic AI"?
subject of a future essay, or see my 2020 arXiv paper, The Next Decade in AI
The big picture in your essay is right. We need world models. But the solution that you seem to outline there, of some kind of calculus of symbols and concepts, is unlikely to be effective.
I think we will need a merger of the current paradigm with some kind of flexible probabilistic reasoning system (and by "reasoning" I don't mean poetic speculation about whether that's what a given inscrutable pile of linear algebra is "really" doing, I mean really strong built-in priors for actual symbolic manipulation). Arguably this was the architecture of AlphaGo and in my humble opinion that was a much more successful project than the current LLMs - besting the world's most expert humans in a highly technical domain and inventing whole new strategies that humans hadn't even thought of after playing the game for ~3000 years or so. You kind of *need* the creative capacity to go out on a limb, try something new that *seems* like it could be right (hallucinate, if you will), but if that's all you have then you're in a muddle. You also need symbolic reasoning to verify that the wild ideas are worth pursuing, but on the other hand if *that's* all you have then you're too rigid to succeed in challenging new circumstances. If symbols on their own were enough, then strong AI would have been cracked in the 60s or 70s with all the Lisp people.
Why are humans so successful, a quantum leap above all other animals in our command of our environment? Mammalian instincts certainly help, but symbols are what sets us apart. How can we build bridges, satellites, smartphones, communication networks? Symbols. Imagine trying to do engineering, physics, computer science without symbols. Imagine Schrodinger and Heisenberg and Einstein without symbols.
I'm inclined to think that there's more to the animal basis of human intelligence than "mammalian instincts." It's also about localized embodiment- the possession of an input and processing network including the totality of the nervous system, from the frontal and temporal lobes of the cerebrum through the cerebellum and medulla, the spinal cord, the nerves controlling body functions and musculature, on out to the ends of the peripheral nervous system.
The requirements of the biological body are what ground and orient the functional utility of human intelligence. To resort to an imprecise metaphor, that nexus resembles a comparator function in an electronic device, which requires a carrier wave. A radio tuner without a carrier wave has no processing stability. It's unmoored.
I think that something akin to that phenomenon accounts for AI gobbledygook. The problem there is that AI is never going to generate that locus on its own, any more than an engine- even the most powerful and precisely manufactured engine- is going to generate the transmission, drive train, wheels, etc. of a vehicle to surround it. Unlike the case with humans, who developed our cerebral capacity at the end of a very long chain of events that began with the dire necessity to possess an organismic body in order to sustain living animate existence. That requirement has informed every development in animal>>human neural processing that's taken place since then.
(I could go off into metaphysical speculations about humans developing sufficient complexity of conscious functioning that we may have a latent capability to access a higher level of intelligence extending beyond the body. But if that's the case- or a possibility- we've nonetheless required eons of organismic embodiment in order for our neural networks (not an anthropomorphic metaphor) to reach that level of sophistication. As a series of booster stages, so to speak. But, I'm only speculating...my own experiments in lucid dreaming could be explained all sorts of ways, notwithstanding the fact that I've had some success at that project.)
What that reality implies is that it isn't nearly sufficient for AI researchers to develop (some) temporal and frontal lobe functions in order to achieve artificial consciousness. Attaching those functions to the motor command modules of robots isn't sufficient, either. There's still no embodiment comparator function present. As far as what the AI processor is doing, it's still unmoored from the baseline of perception and cognition found in terrestrial organisms. That sort of limitation is a requirement in order to achieve guaranteed "alignment" with human intelligence for the purpose of harm avoidance to the lifeforms of the planet, I think. I'm having a difficult time imagining how that mooring could be accomplished. The bottom line of AI is that it's nothing more than an electric circuit. There's no gravity, no internal sense of identity or chronological time, no proprioception, no mammalian/primate/human bandwidth tropisms or limitations. Computers can store a zillion images without a visual sense, and a zillion samples of music and speech without an auditory sense. It can mimic some aspects of tactile perception, but it feels neither pain nor pleasure.
Perhaps it would be instructive to consider and catalog CPU capabilities not only in terms of what functions of the human brain they don't presently possess- but that which they're logically foreclosed from ever possessing, as autonomous features. There's a crucial difference between programming a robot to perform functions associated with human neural processing, and programming it to have the conscious experience of carrying out the tasks.
There's an article in the Washington Post today about a human-humanoid robot "social encounter." The article is charming and whimsical; the robots sound quite endearing in some respects. But at the foundation, the achievement is still stage magic. The robots are merely sophisticated ventriloquist dummies, with the ventriloquist(s) being the human programmer(s).
AlphaGo's success was due to playing against itself and learning the ropes. That does offer a valuable lesson. LLMs are best used as an initial guess. A reliable system should be able to start with that, actually work through a problem, and figure out strategies along the way.
But it is much harder to implement this lesson in general, as compared to doing it in a constrained environment with fixed rules.
You're ignoring the fact that AlphaGo has a tree search built in. The policy network's instantaneous best guess would not beat the best human players; the tree search is required to reason about the consequences of that guess. Arguably this part is symbolic reasoning. The policy and value networks (subsequently fused into one in AlphaZero) were indeed learned and were necessary to the success. I don't dispute that. But what I'm saying is that AlphaGo wouldn't have succeeded without the plain old tree search also being built in; the success came because the learned parts provided very good heuristics for pruning the search space.
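For readers who haven't seen how those pieces fit together, here is a stripped-down sketch of that hybrid pattern: a plain tree search whose expansion and evaluation are steered by learned networks. This is not DeepMind's code; game, policy_net and value_net are hypothetical placeholders (a policy returning (action, prior) pairs and a scalar value estimate), and terminal-state handling and two-player sign flipping are omitted for brevity.

```python
# PUCT-style Monte Carlo tree search guided by learned policy/value functions.
import math

class Node:
    def __init__(self, prior: float):
        self.prior = prior          # P(s, a) supplied by the policy network
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}          # action -> Node

    def value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node: Node, c_puct: float = 1.5):
    """PUCT rule: exploit high value, explore high-prior / low-visit actions."""
    total = math.sqrt(sum(ch.visits for ch in node.children.values()) + 1)
    return max(
        node.children.items(),
        key=lambda kv: kv[1].value()
        + c_puct * kv[1].prior * total / (1 + kv[1].visits),
    )

def mcts(root_state, game, policy_net, value_net, n_simulations: int = 200):
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        node, state, path = root, root_state, [root]
        # 1. Selection: walk down the tree guided by priors and value estimates.
        while node.children:
            action, node = select_child(node)
            state = game.apply(state, action)
            path.append(node)
        # 2. Expansion: the policy network proposes (and effectively prunes) moves.
        for action, prior in policy_net(state):
            node.children[action] = Node(prior)
        # 3. Evaluation: the value network replaces random rollouts.
        leaf_value = value_net(state)
        # 4. Backup: propagate the estimate along the visited path.
        for n in path:
            n.visits += 1
            n.value_sum += leaf_value
    # Act by visit count, the usual AlphaGo-style move selection.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

The learned parts make the search tractable; the search makes the learned guesses accountable to consequences, which is the division of labour being argued for here.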
We are smart not because of symbols, but because we can go beyond symbols. We actually know the true nature of things at a very fine level.
I don't think I dispute that, but I think we may be focused on different aspects of intelligence. You may be focused on "fast" thinking while I'm focused on "slow" thinking. Fast thinking is instinct and gut feeling, the ability to rapidly identify objects in a field of view or jump out of the way of an oncoming bus. Most of the animal kingdom has it. I would argue that most existing DL architectures do "fast" thinking. Indeed, in terms of computational complexity, they literally embody fixed-cost computations: there is no variable-depth search or recursion, just a forward pass through a static set of components. But much of what humans do when they do science is "slow" thinking: computations that are variable in length, involve fits and starts with feedback and refinement, and even involve other minds working collectively through peer review. And symbols are always involved in that process; they are a key mechanism for scientific knowledge to propagate itself from person to person and generation to generation. I don't think you can do a lot of what we do without a deep development of this kind of slow, social, symbol-assisted cognition. Otherwise chimpanzees would be building bridges and airplanes and cellphones by now.