21 Comments
Dec 20, 2023

The paper by Ernest Davis is indeed worthwhile. I found the final paragraph of section 4.8 especially illustrative.

Using the LLM as a sort of 'loaded dice' for the mutation element of a genetic algorithm is a nice ('tinkering engineers') trick, but it also raises a question about the effect of the LLM's constraints (they are stochastically constrained confabulators, after all) on the mutations, and thus on how effectively the search can find genetic optima.
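
For the curious, the 'loaded dice' arrangement looks roughly like this; a minimal sketch, assuming a hypothetical `llm_mutate` call and a caller-supplied `score` function, not DeepMind's actual code:

```python
import random

def llm_mutate(program: str) -> str:
    """Hypothetical stand-in for an LLM call that rewrites a candidate
    program; the model's biases decide which mutations ever get proposed."""
    raise NotImplementedError("swap in a real model call here")

def evolve(population: list[str], score, generations: int = 1000) -> str:
    """Minimal generational loop: the LLM plays the mutation operator,
    but the objective score alone decides what survives."""
    for _ in range(generations):
        parent = max(random.sample(population, 3), key=score)  # tournament pick
        child = llm_mutate(parent)                              # biased mutation
        weakest = min(range(len(population)),
                      key=lambda i: score(population[i]))
        if score(child) > score(population[weakest]):
            population[weakest] = child                         # replace the worst
    return max(population, key=score)
```

The question above is exactly about that `llm_mutate` step: whatever the model cannot or will not propose is simply never explored.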

It seems we're in the 'engineering the hell out of a fundamentally limited approach' stage for transformer LLMs. And the overselling by Google PR is becoming a pattern (see Gemini).

Just started to read the paper, and found this (line 54):

"First, we sample best performing programs and feed them back into prompts for the LLM to improve on; we refer to this as best-shot prompting. Second, we start with a program in the form of a skeleton (containing boilerplate code and potentially prior structure about the problem), and only evolve the part governing the critical program logic. For example, by setting a greedy program skeleton, we evolve a priority function used to make decisions at every step. Third, we maintain a large pool of diverse programs by using an island-based evolutionary method that encourages exploration and avoids local optima."

This is already coming across to me as a likely case of stone soup. I'm referring to the old fable in which soup is allegedly made from a stone, though it becomes clear in the telling that plenty of other ingredients actually made it soup. Given the structure they've described above, I would expect they could very well have gotten the same result using a random program tree generator; this is just John Koza's genetic programming technique from the '90s. Does anyone seriously believe that there was information anywhere in the LLM's training corpus that bore on this problem?
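
To make the quoted setup concrete: a minimal sketch of the 'skeleton' idea, using a knapsack-style greedy driver of my own invention rather than the paper's actual benchmarks, with `Item`, `greedy_solve`, and the seed `priority` as purely illustrative names:

```python
from collections import namedtuple

Item = namedtuple("Item", ["value", "size"])

def greedy_solve(items, capacity, priority):
    """Fixed greedy skeleton (the paper's 'boilerplate code'); this
    driver is never touched during the search."""
    chosen, used = [], 0
    for item in sorted(items, key=priority, reverse=True):
        if used + item.size <= capacity:
            chosen.append(item)
            used += item.size
    return chosen

def priority(item):
    """Seed heuristic: the search rewrites only this function's body."""
    return item.value / item.size
```

Note how little of the eventual program the generator is actually responsible for, which is rather the point of the stone-soup comparison.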

If an AI truly "went beyond human comprehension", how would we even know?

This just in... FunSearch almost sort of kinda solved the P vs. NP problem, but didn’t really.

Does the BOT acronym refer to Bullshit On Tap?

LLMs reveal us to be strangely deferential to plausible-sounding language, even when the language-generator (human or statistical model) is completely disconnected from any understanding whatsoever.

No wonder fraud thrives ...

The flipside is that the "shallow" usage here of the LLM can be seen as a feature, not a bug. If you believe, as many here do, that an LLM alone will never be trustworthy on important problems, then it's wise to give it less responsibility. The argument in the response paper, that a perfect LLM would have no opportunity to use its abilities, is irrelevant if you believe that a perfect LLM can't exist. Here the LLM is given the same role in the evolutionary algorithm that random genetic mutation has in biological evolution. In life this method has produced, well, us, for example. The point is hand-wavy exploration plus objective fitness testing. It's a solid contribution that this system doesn't rely on the LLM for correctness.
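
That division of labour is easy to sketch; a minimal illustration, assuming a hypothetical entry point named `solve` and glossing over the real sandboxing a production system would need:

```python
def safe_score(candidate_source: str, cases: list[tuple]) -> float | None:
    """Score an untrusted, LLM-written candidate on held-out cases.
    Illustrative only: a real system would run candidates in an isolated
    process with time and memory limits, not via bare exec()."""
    env: dict = {}
    try:
        exec(candidate_source, env)   # execute the candidate's source code
        solve = env["solve"]          # assumed entry-point name
        return sum(1.0 for args, want in cases if solve(*args) == want)
    except Exception:
        return None                   # crashing candidates are just discarded
```

Candidates that return None, or score poorly, simply fall out of the population; nothing the LLM asserts is ever taken on trust.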

"The real problem is not whether machines think, but whether men do"

—B. F. Skinner (via the Technology Liberation Front)

I read these headlines as though they were using the LLM roughly the same way fiction authors are using it: as a creative aid. Prime the LLM with info and questions and see if it suggests anything you might be able to use. The weighted-dice analogy seems apt. The LLM will spit out a litany of random ideas, and the human must be the one to have the aha moment and say, "That might be useful, let's try it."

This AI hype has reached criminal proportions. I think that instead of reacting to it piecemeal, it would be good to create something a bit more organised. Something along the lines of Skeptical Science, perhaps: a website dedicated to explanations of hype myths by experts. This would make an excellent educational aid.

Hype is how all dishes are served nowadays.

This advance is incremental, but it shows an important truth: a system that generates things has value. Not just in math, but even for an indoor robot that can figure out, given what you say, what you actually want. Such a robot can take into account what you said before, and the context in which it operates.

We will see more advances in chatbots. Simply adding more data stops working after a while, so companies will be forced to focus more on validation and architecture. The reward for a well-built chatbot that other companies are willing to pay for could be high.

Steelmanning ('Memes-R-Us is over') era test

Let's assess whether Gary Marcus passes the steelmanning test: steelmanning in the 'memes are over' (i.e., Gen AI) era.

https://themindcollection.com/steelmanning-how-to-discover-the-truth-by-helping-your-opponent/

As it stands, he hasn't.

Let's see which LLM is best at speeding up our steelmanning of DeepMind on FunSearch, to the standard of John Stuart Mill's 'On Liberty'.

#SpiceTradeAsia_Prompts

"OVERSOLD" is an understatement! It's like every other day another over-hyped, fake news AI claim comes up. I would hope these smart scientists understand the difference between pure and applied mathematics. How does solving an "unsolvable" pure mathematics problem help humanity exactly? "Scientists" have now become philosophers; that little objective evidence thing seems in the distant past. How can you go from pure mathematics...to making such irrationally exuberant claims...sounds like they're about to solve poverty, cure cancer, and all the other complex social problems in the world, and be home for dinner. Pump breaks, please!

Are you suggesting that the "terrific new paper" you cite would pass peer review?