Sorry, but FunSearch probably isn’t a milestone in scientific discovery
As ever, don’t believe the hype
Over the last few days, you probably saw a bunch of enthusiastic news reports and tweets about a new Google DeepMind paper on math, like the coverage at The Guardian.
Well, yes, and no.
Google DeepMind, along with the mathematician Jordan Ellenberg, really did use some AI to help solve a math problem, in a very clever paper that is worth reading, on how to use “program search with large language models” (as the accurate and not-at-all-hypey title explains).
But then Google DeepMind’s PR team oversold the paper, claiming, e.g., that they had “solved a notoriously hard challenge in computing” and ending with an exuberant but immensely speculative set of claims about real-world applications.
As clever as FunSearch is, it’s unlikely to be a major factor in solving cancer or making lightweight batteries.
In reality:
An LLM didn’t solve the math problem on its own; the LLM was used in a very narrow, prescribed way inside of a larger system, as sketched in the code after this list. (This is very different from the usual explain-your-complete-problem-in-English-and-get-an-answer setup.)
Human mathematicians had to write problem-specific code for each separate mathematical problem.
There is no evidence that the problem was heretofore “unsolvable.”
“Going beyond human knowledge” is a bit oversold here.
The problem was not exactly the biggest outstanding problem in math.
And the solution probably isn’t all that general.
It’s also hardly the first time AI has helped with mathematics.
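To make the “narrow, prescribed” point concrete, here is a minimal sketch, in Python, of roughly the kind of loop the paper describes: humans write a problem-specific evaluator and a program skeleton, and the LLM is only ever asked to propose new versions of one small function inside it. The toy scoring task and the llm_propose_variant stub are my own hypothetical stand-ins so the sketch runs; none of this is DeepMind's actual code.

```python
import random

# --- Human-written, problem-specific part --------------------------------
# In a FunSearch-style setup, a mathematician writes an evaluator for each
# problem. This toy evaluator just scores a candidate priority function on
# a made-up selection task; it is illustrative only.
def evaluate(priority_fn) -> float:
    items = list(range(20))
    chosen = sorted(items, key=priority_fn, reverse=True)[:5]
    return float(sum(chosen))  # higher is better for this toy objective

# --- The evolvable piece --------------------------------------------------
# Only this one small function inside the skeleton is ever rewritten.
def initial_priority(x: int) -> float:
    return float(x % 7)

# --- Stand-in for the LLM --------------------------------------------------
# Hypothetical: a real system would prompt an LLM with the best programs
# seen so far ("best-shot prompting") and parse the code it returns.
# Here we just jitter the function numerically so the sketch runs.
def llm_propose_variant(parent_fn):
    a, b = random.uniform(-1, 1), random.uniform(-1, 1)
    return lambda x: parent_fn(x) + a * x + b

# --- The outer evolutionary loop (the "larger system") --------------------
def search(generations: int = 50) -> float:
    best_fn, best_score = initial_priority, evaluate(initial_priority)
    for _ in range(generations):
        candidate = llm_propose_variant(best_fn)
        score = evaluate(candidate)
        if score > best_score:  # keep only improvements
            best_fn, best_score = candidate, score
    return best_score

if __name__ == "__main__":
    print("best score found:", search())
```

The division of labor is the point: the evaluator and the loop are ordinary human-written software, and the model is confined to proposing small edits that the evaluator then accepts or rejects.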
NYU computer scientist Ernest Davis has just written a terrific new paper about all this, going into depth. Highly recommended.
Gary Marcus has been worrying aloud about hype in AI for decades.
The paper by Ernest Davis is indeed worthwhile. I found the final paragraph of section 4.8 especially illustrative.
Using the LLM as a sort of 'loaded dice' for the mutation element of a genetic algorithm is a nice ('tinkering engineers') trick, but it also raises a question about how the LLM's constraints (they are stochastically constrained confabulators, after all) bias the mutations, and thus how effectively you can find genetic optima (see the sketch below).
It seems we're in the 'engineering the hell out of a fundamentally limited approach' stage for transformer LLMs. And the overselling by Google PR is becoming a pattern (see Gemini).
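To unpack the 'loaded dice' image: in a plain genetic algorithm the mutation operator perturbs candidates with unbiased random noise, while a FunSearch-style system lets the LLM supply the perturbation, so the model's priors decide which parts of the search space get visited. The toy sketch below is an editorial illustration of where that substitution happens; llm_mutate is a hypothetical stand-in that fakes the bias with a skewed distribution rather than calling any model.

```python
import random
from typing import Callable, List

Candidate = List[float]  # a toy genome; FunSearch mutates program text instead

def random_mutate(parent: Candidate) -> Candidate:
    """Classic GA mutation: unbiased, uniform noise ("fair dice")."""
    return [g + random.uniform(-0.5, 0.5) for g in parent]

def llm_mutate(parent: Candidate) -> Candidate:
    """Hypothetical stand-in for an LLM proposal ("loaded dice"):
    the edits follow the model's priors, so some directions are far
    more likely than others. Here that bias is faked with a skewed
    distribution instead of an actual model call."""
    return [g + random.gauss(0.3, 0.1) for g in parent]

def fitness(c: Candidate) -> float:
    # Toy objective: drive every gene toward 1.0.
    return -sum((g - 1.0) ** 2 for g in c)

def evolve(mutate: Callable[[Candidate], Candidate], steps: int = 200) -> float:
    best = [0.0] * 5
    for _ in range(steps):
        child = mutate(best)
        if fitness(child) > fitness(best):  # hill-climbing selection
            best = child
    return fitness(best)

if __name__ == "__main__":
    print("fair dice  :", round(evolve(random_mutate), 4))
    print("loaded dice:", round(evolve(llm_mutate), 4))
```

In this toy, the loaded dice happen to point toward the optimum; if the target had been at -1.0 instead of 1.0, the same bias would steer the search away from it, which is roughly the worry the commenter raises about the model's constraints.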
Just started to read the paper, and found this (line 54):
"First, we sample best performing programs and feed them back into prompts for the LLM to improve on; we refer to this as best-shot prompting. Second, we start with a program in the form of a skeleton (containing boilerplate code and potentially prior structure about the problem), and only evolve the part governing the critical program logic. For example, by setting a greedy program skeleton, we evolve a priority function used to make decisions at every step. Third, we maintain a large pool of diverse programs by using an island-based evolutionary method that encourages exploration and avoids local optima."
This is already coming across to me as a likely case of stone soup — I'm referring to an old fable in which soup was allegedly made from a stone, but it becomes clear in the telling that there were lots of other ingredients that actually made it soup. Given the structure they've described above, they could very well have gotten the same result, I would expect, using a random program tree generator — this is just John Koza's genetic programming technique from the '90s. Does anyone seriously believe that there was information anywhere in the LLM's training corpus that bore on this problem?
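For readers unfamiliar with Koza-style genetic programming, the 'stone soup' test being proposed would look roughly like this: swap the LLM out for a generator that builds random expression trees and let the same generate-evaluate-select machinery do the work. The sketch below is purely an illustrative toy (a trivial symbolic-regression target, not the problem from the paper, and not code from Koza or from FunSearch).

```python
import operator
import random

# Primitive set for a tiny Koza-style generator: binary ops plus terminals.
OPS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]
TERMINALS = ["x", 1.0, 2.0]

def random_tree(depth: int = 3):
    """Grow a random expression tree over a single variable x."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(OPS)
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate_tree(tree, x: float) -> float:
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    (fn, _symbol), left, right = tree
    return fn(evaluate_tree(left, x), evaluate_tree(right, x))

def fitness(tree) -> float:
    # Toy target: approximate f(x) = x**2 + x at a few sample points.
    pts = [-2.0, -1.0, 0.0, 1.0, 2.0]
    return -sum((evaluate_tree(tree, x) - (x * x + x)) ** 2 for x in pts)

def random_search(trials: int = 2000):
    """No LLM anywhere: generate random programs and keep the best one."""
    best = random_tree()
    for _ in range(trials):
        candidate = random_tree()
        if fitness(candidate) > fitness(best):
            best = candidate
    return best, fitness(best)

if __name__ == "__main__":
    _tree, score = random_search()
    print("best fitness found by blind generation:", round(score, 4))
```

Whether such a blind generator would in fact match FunSearch's results on the real problem is exactly the empirical question the commenter is raising, not something this toy can settle.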