26 Comments
Mar 16 · edited Mar 17 · Liked by Gary Marcus

I rather wish the development of GenAI technology would slow down and pass through a consolidation phase, so that the necessary regulations, legal assessments, and broad public awareness could really keep up.


Generative AI will slow down just because massive use of data can't give the same kick again and again. They will have to start doing more diligent work, and that will take more time to show results. There is great potential though for solid improvements for years to come.

Mar 19 · Liked by Gary Marcus

As foundation models become bigger and bigger, accelerated by new iterations of GPUs and TPUs, LLMs will become more and more unpredictable. We’ll experience more productivity upheavals as model drift, “laziness” and irreversible results (from vendors and companies fine-tuning LLMs) drain users of time and money. It may culminate in serious injuries or even deaths directly caused by LLM-driven autonomous machines, triggering bipartisan outrage and swift knee-jerk overreactions; we have watched this movie before (“it’s a wrap!”).


"There has however been lots of progress both in discovering potential applications and in putting GPT-4 class models into practice."

Put into practice, but how far from being reliable commercial products?


I think we have no idea yet what its most useful or powerful practical application will be, but I feel certain the envelope of uses will broaden until some blockbuster use case presents itself. When the Internet was popularized, no one had any idea that in 20 years we would be using it largely for social media that sells massive amounts of advertising.

Mar 16 · Liked by Gary Marcus

While this is hyperbole, I think it also *is* the answer, or at least one of the most likely parts: Google is dead if it doesn't find a way to make Gemini include placed ads, and so are any of the big players in the ad business. Yesterday's post about the enshittification of science likewise applies to social media, so there is huge pressure on all of these companies both for procurement and for AI-upgrading their business models. In the end, my guess is that aside from vanity entertainment (which sadly will be a race to the bottom compared to the "golden days" of Netflix being not-yet-enshittified), social media will be ghost towns where bots (try to) sell stuff to other bots.


But that's no guarantee that in 20 years the Internet will still be driven by "using it largely for social media that sells massive amounts of advertising." There's a "fad novelty" element present in the dawning of the Internet Age, as well as a "true novelty with lasting benefits" aspect. There isn't anything inherently meritorious about those priorities.

In that respect, the historical similarities between the dawning of radio (and later TV) broadcasting and this early, not-yet-mature era of the Internet also need to be acknowledged. At the outset of radio broadcasting, there was a debate as to whether a new communication technology so potentially powerful should even be used for commercial advertising. The decision was made to allow advertisements to support the musical content. At first, the advertisers had some sense that the content came first. But by the 1980s, the musical content was subordinated to suit the priorities of the advertisers. And a once diverse spectrum of regional commercial networks was turned into an oligopoly through massive national syndication (iHeart Radio, 855 stations; Cumulus Media, 428 stations; Audacy Media, 285 stations). And now it's generally agreed that practically all commercial music radio sucks, even among those who still listen to it. From a cultural or community standpoint, most of the original uses of music radio no longer apply. It's an advertising medium. Nothing more. Circa 2024, the cultural role of commercial radio is to promote venality, mass marketing, and cynicism that anything could be different.

As with the increasing encroachment of ad priorities and other forms of enclosure on the medium of radio, most people don't like what's happened to the Internet either. It's simply become the status quo that we're supposed to be resigned to, in perpetuity. "American Free Market, ruh-roh", with all of the related unexamined assumptions.

In the years since 2002, Google Search has morphed into something that's now in some ways practically unrecognizable--subordinating straightforward keyword search capability to its role as a platform for advertising. The company has grown to a net capitalization value of $1.3 trillion. For what purpose? What was lost in the process?

Those questions are overdue for a conversation. I for one am amazed that so much AI research has concentrated on its supposed potential to manipulate human social psychology with images and unresponsive static boilerplate. Now much of that potential is being exposed as an illusion. Setting aside the high probability that fixing it may be a sunk-cost fallacy: is that particular capability even worth fixing? Aren't there more worthy purposes for AI than Social Media, Superficial Image Generation, and Marketing Exploitation?


Sure, I agree, it's way too early to know anything. I guess I'm wondering whether there are any indications that this version of AI is better analogized to the success path of the internet or to something like blockchain.


I'm wondering that, too. And I'm wondering a lot of other things about AI, because I can't help but think that this technology has a unique potential to offer planning assistance in some (not all) realms of human endeavor. As long as the humans doing the prioritizing and programming don't get carried away.

Mar 16 · edited Mar 17

I have an AGI ranking system (see The BigMother Manifesto, pp 13-14, https://www.bigmother.ai) that allows me to broadly compare AGIs (from sub-human-level up). According to this system, (in my assessment) GPS (General Problem Solver), developed in the 1950s and 60s, was an AGI K (able to solve problems across a number of domains, with the quality of some problem solutions exceeding average-human-level, but none achieving super-human level, i.e. better than all humans). Fast forward 60 years, and GPT-4, Gemini Ultra, and Claude 3 Opus are all (in my assessment) AGIs, and all also AGI K (see The GPT-4 Generation: Why Are the Best AI Models Equally Intelligent? https://www.thealgorithmicbridge.com/p/the-gpt-4-generation-why-are-the), suggesting that AGI K is a difficult barrier to overcome. Anyone care to guess what's missing...?


Thanks, Aaron. Can’t wait to check this out!


Hmmm... You know, the most powerful reinforcement schedule (and here I'm thinking of the humans, not the machines) has an element of randomness in it. They could be throwing good money after bad for a while.


Claude is pretty amazing so far!! I am wondering what metrics we use when we are declaring a plateau?


It is very premature to claim a plateau has been reached. Never in the history of tech has a one-year interval been considered enough time in which to measure progress.

Vendors are collecting very valuable data when it comes to how their chatbots are used and where they are failing. Such data will be cleaned up, then used for training, together with much synthetic data illustrating relevant use cases.

Use of external tools and RAG is still in its infancy. We will likely see separate teams working on specialized chatbots, with each pursuing the strategies most likely to make sense for its domain. Lots of exciting stuff to come.


I’m curious: which do you consider the more likely scenario? Are we close to a plateau, or are competitors in fact aiming for on-par performance with the leading model because it costs big $ to surpass (i.e., it’s intentional)?

author

plateau. and maybe some cost-benefit / diminishing returns calculations


Plateau. Every time the latest LLMs are scaled by two or three orders of magnitude, the distance remaining to human-level AGI will roughly halve (Zeno's Dichotomy Paradox).


Or "HAPPY 1sT BRRHDAY GPT-4-! 4" as the case may be. 😆

(Did you use DALL·E?)

author

via Bing, yep


I found that Anthropic's Claude 3 Sonnet beat GPT-4 decisively in a simple benchmarking exercise I conducted last week: comparison and analysis of 10 short passages of text, where some (but not all) passages contained related topics and themes. We'll see how Mistral does...

author

i ran a poll and it was split down the middle, with each probably having some specific areas of strength but no single decisive winner


I expect it probably varies case by case. What I wish I had more insight into is the hotfixes and training updates to the live models... especially with closed-source (even if it does come from a B-Corp), it feels a little dicey to build a product around.


Not so much with math. I recently tested Claude 3 on its math capability. After a couple of back-and-forth exchanges (middle-school-level stuff), it started giving me mathematical gibberish, just like ChatGPT/GPT4/Bard before it. One example response from Claude was the following statement, which does not make any sense: "In summary, in 2D and 1D, the equation x^2+x^2+z^2=100 does not properly define a standard geometric object like a curve, shape, or set of points, due to the duplicated x^2 terms." This is entirely incorrect; the answer is that the equation describes an ellipse. At the beginning I asked standard questions like "x^2+y^2+z^2", which must appear in a multitude of online examples that were drawn on as training examples. But as soon as I changed the term y^2 to x^2, so that there are now two x^2 terms, it got tripped up. It simply regurgitates training examples as opposed to modeling the equation and doing real calculation. This is what I have meant all along by saying that the glamorous statistics for these LLMs are basically meaningless. Every time I have tested these LLMs in math with my own questions, they have started spewing nonsense real fast.
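
The ellipse claim can be checked directly. Combining the duplicated terms gives 2x^2 + z^2 = 100, i.e. x^2/50 + z^2/100 = 1, which is an ellipse in the x-z plane with semi-axes sqrt(50) ≈ 7.07 and 10 (read in 3D with y left free, the same equation would instead be an elliptic cylinder). A minimal Python sketch of that check follows; the helper name `on_curve` and the choice of 12 sample points are illustrative assumptions, not anything from the exchange with Claude:

```python
import math

def on_curve(x, z, tol=1e-9):
    """Return True if (x, z) satisfies x^2 + x^2 + z^2 = 100 within tol.

    Hypothetical helper for illustration only.
    """
    return abs(x**2 + x**2 + z**2 - 100.0) < tol

# Semi-axes of the ellipse x^2/50 + z^2/100 = 1
A = math.sqrt(50.0)  # semi-axis along x
B = 10.0             # semi-axis along z

# Parametrize the ellipse as (A cos t, B sin t) and sample 12 points
points = [(A * math.cos(t), B * math.sin(t))
          for t in (2 * math.pi * k / 12 for k in range(12))]

# True if every sampled ellipse point satisfies the original equation
all_on_curve = all(on_curve(x, z) for x, z in points)
```

If `all_on_curve` comes out true, every sampled point of the ellipse satisfies the original equation, so the duplicated x^2 terms only change the shape's proportions rather than making it ill-defined.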


uhhhh, Claude 3 is far better than GPT4. Q* is waiting in the wings and seems to be a possible candidate for true AGI, though of course that's entirely based on speculation at this point. We still don't know what spooked Ilya, but the best candidate is Q* and its capabilities. This post and similar posts by Prof. Marcus remind me of Lord Kelvin's famous statements about physics being almost complete in the late 19th C.


The next horror movie about the first LLM-driven long haul airliner will smash records at the box office 😜


Happy birthday dream generator )
