26 Comments
Mar 16 · edited Mar 17 · Liked by Gary Marcus

I rather wish the development of GenAI technology would slow down and pass through a consolidation phase, so that the necessary regulations, legal assessments, and broad public awareness could really keep up.

Mar 19 · Liked by Gary Marcus

As foundation models become bigger and bigger, accelerated by new iterations of GPUs and TPUs, LLMs will become more and more unpredictable. We’ll experience more productivity upheavals as model drift, “laziness” and irreversible results (from vendors and companies fine-tuning LLMs) drain users of time and money. It may culminate in serious injuries or even deaths directly caused by LLM-driven autonomous machines, triggering bipartisan outrage and swift knee-jerk overreactions; we have watched this movie before (“it’s a wrap!”).

"There has however been lots of progress both in discovering potential applications and in putting GPT-4 class models into practice."

Put into practice, but how far from being reliable commercial products?

Mar 16 · edited Mar 17

I have an AGI ranking system (see The BigMother Manifesto, pp 13-14, https://www.bigmother.ai) that allows me to broadly compare AGIs (from sub-human-level up). According to this system, in my assessment, GPS (General Problem Solver), developed in the 1950s and 60s, was an AGI K (able to solve problems across a number of domains, with the quality of some problem solutions exceeding average-human-level, but none achieving super-human level, i.e. better than all humans). Fast forward 60 years, and GPT-4, Gemini Ultra, and Claude 3 Opus are all (in my assessment) AGIs, and all also AGI K (see The GPT-4 Generation: Why Are the Best AI Models Equally Intelligent? https://www.thealgorithmicbridge.com/p/the-gpt-4-generation-why-are-the), suggesting that AGI K is a difficult barrier to overcome. Anyone care to guess what's missing...?

Hmmm... You know, the most powerful reinforcement schedule – and here I'm thinking of the humans, not the machines – has an element of randomness in it. They could be throwing good money after bad for a while.

Claude is pretty amazing so far!! I wonder what metrics we use when declaring a plateau?

It is very premature to claim a plateau has been reached. Never in the history of tech has a one-year interval been considered enough time in which to measure progress.

Vendors are collecting very valuable data when it comes to how their chatbots are used and where they are failing. Such data will be cleaned up, then used for training, together with much synthetic data illustrating relevant use cases.

Use of external tools and RAG is still in its infancy. We will likely see separate teams working on specialized chatbots, each pursuing the strategies most likely to make sense for its domain. Lots of exciting stuff to come.

I’m curious: which do you consider the more likely scenario? Are we close to a plateau, or are competitors in fact aiming for on-par performance with the leading model because it costs big $ to surpass it (i.e., it’s intentional)?

Or "HAPPY 1sT BRRHDAY GPT-4-! 4" as the case may be. 😆

(Did you use DALL·E?)

I found that Anthropic's Claude 3 Sonnet beat GPT-4 decisively in a simple benchmarking exercise I conducted last week: comparison and analysis of 10 short passages of text, where some (but not all) passages contained related topics and themes. We'll see how Mistral does...

Uhhhh, Claude 3 is far better than GPT-4. Q* is waiting in the wings and seems to be a possible candidate for true AGI, though of course that's entirely based on speculation at this point. We still don't know what spooked Ilya, but the best candidate is Q* and its capabilities. This post and similar posts by Prof. Marcus remind me of Lord Kelvin's famous statements in the late 19th century about physics being almost complete.

The next horror movie, about the first LLM-driven long-haul airliner, will smash records at the box office 😜

Happy birthday dream generator )
