25 Comments

Elementary Euclidean geometry was solved a long time ago. It is a known result of mathematical logic that such a geometry is a decidable theory: there exists an algorithm that settles any problem expressible in the language of elementary Euclidean geometry after a finite number of steps. It was found many decades ago. So, strictly speaking, that's an area of mathematics that no longer requires creativity. DeepMind found an efficient algorithm, but I am not surprised at all.

The algorithm you mention sounds interesting. Could you please elaborate?

Look e.g. at this:

https://en.wikipedia.org/wiki/Tarski%27s_axioms

There are many results in mathematical logic showing that some particular branch of mathematics can be axiomatized in such a way that the truth or falsehood of every statement can be decided in a finite amount of time, with a proof provided. These decision algorithms are usually complex, but they basically reduce the notion of truth to a computation. Many branches of mathematics are not decidable, for example the theory of natural numbers (number theory). This means not only that we don't have an algorithm that can solve all problems of number theory, but that it is *mathematically impossible* for such an algorithm to exist. Thus we say that mathematics, in general, requires non-algorithmic creativity.
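To make "reducing truth to a computation" concrete, here is a toy sketch of my own (far simpler than Tarski's quantifier elimination for real closed fields, which is what actually decides elementary geometry): for propositional logic, truth can be decided by brute-force enumeration of every assignment.

```python
from itertools import product

def is_tautology(formula, variables):
    """Decide whether a propositional formula is true under every
    assignment by enumerating all 2^n truth-value combinations.
    `formula` is a function taking one boolean per variable."""
    return all(formula(*values)
               for values in product([False, True], repeat=len(variables)))

# The law of excluded middle is a tautology ...
assert is_tautology(lambda p: p or not p, ["p"])
# ... while a bare variable is not.
assert not is_tautology(lambda p: p, ["p"])
```

The real decision procedures are enormously more sophisticated, but the principle is the same: a terminating computation replaces mathematical insight.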

And while this kind of realistic analysis is what the world needs when these math olympiad results are discussed, the world will get swamped by nonsensical hype about AGI instead. Sigh.

Google is really weird. It marries (often) very good science and engineering with (often) over-the-top marketing and buzz.

It's a different subject, but the same thing recently happened with their quantum computing (https://ea.rna.nl/2024/12/31/googles-willow-quantum-computer-impressive-science-and-misleading-marketing/), and you could even see the scientists squirm in discomfort.

Thank you Ernie and Gary,

To get beyond the narrowness of Math benchmarks, which "merely" ramp up algebraic complexity, we have advocated for a better benchmark design to understand the inner reasoning of the model (and used Ernie's framework to compare the existing benchmarks): https://curriculumredesign.org/wp-content/uploads/Benchmark-design-criteria-for-mathematical-reasoning-in-LLMs.pdf

Your opinion would be highly valued, thanks.

science journalism at its best

The "manual translation" seems to reflect (Euler's?) remark that his intelligence was often in his pencil.

Thank you for this article! It is so nice to see an article on real AI — the kind that requires a lot of thought and effort and produces results that are more than a rehash of what someone else already did. Impressive!

"the average IMO high school-aged human gold-medal winner solves 81.8%" - there are time limits in the International Mathematical Olympiad. With relaxed limits, human scores would be higher. With compute limits, scores of AlphaGeometry2 would be lower. Not sure what would be a fair time rule here.

Comparable energy use could be a criterion. The human brain consumes about 20 W, and contestants have four and a half hours to solve three problems. I wonder how pathetic AlphaGeometry2's results would be given only 0.09 kWh of electricity.
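For what it's worth, the arithmetic behind the 0.09 kWh figure checks out:

```python
brain_power_watts = 20   # rough estimate of human brain power draw
contest_hours = 4.5      # one IMO sitting: 4.5 hours for 3 problems

# watt-hours / 1000 = kilowatt-hours
energy_kwh = brain_power_watts * contest_hours / 1000
print(energy_kwh)  # → 0.09
```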

Your hope for better approaches is based on the assumption that we want more powerful AI, and soon. I wish I shared your optimism. To be honest, when I read about AI’s current limitations, mostly what I feel is relief.

Suppose OpenAI or someone else hits upon true, unequivocal AGI sometime in the next couple of years. Will the desire for proper safety measures override incentives to ship product? Will the world’s cultures, economies, and governments adapt in time to prevent some kind of catastrophic disruption? If AGI is the first step in a “fast takeoff” scenario, are we even remotely prepared for machine superintelligence?

More and more, I find myself hoping for slow progress and a stubborn commitment to dead ends. This of course carries its own dangers (difficulty or impossibility of true alignment, for example), but for now I think weaker is better.

And solving word problems in Euclidean geometry will help give an LLM the "ability to choose between competing courses of action based on the relative value of their consequences," behavior exhibited by the bacterium Myxococcus xanthus ...

how?

Quote from: Balleine BW. 2007. The neural basis of choice and decision making. J. Neurosci 27:8159–60

Especially when:

The interior angles of a triangle sum to less than 180° (hyperbolic geometry).

The interior angles of a triangle sum to exactly 180° (Euclidean geometry).

The interior angles of a triangle sum to greater than 180° (spherical geometry).

Ernie, Gary,

Fine article - no mention of AGI is appreciated. "hoping" still gives me pause.

"hoping the field will develop MODELS with a deeper understanding of mathematics and the world"

The contortions you describe, which DeepMind uses to build proofs in a modest corner of geometry, have me wondering how the term "model" applies.

I'd love a response.

The "contortions" are necessary, because neither math nor the real world have a simple model. We are conquering complexity here. And the lessons will have wider applications.

Public sentiment has gone from "AI is hard" to "AI is easy" on the strength of the LLM hype. At some point we will return to reality: true AI is really hard.

Another recent opportunity to observe the deficiency in LLM reasoning came with the promotion of OpenAI o3's progress on multi-digit multiplication. Critics of AI reasoning point out that LLMs are still poor at math, but the pro-AI-intelligence faction rebuts this with only "but humans can't do that either."

However, they miss something important: humans would ask for a calculator, and LLMs do not. The fact that they do not is a simple and apparent sign that no reasoning is present. I elaborated on this recently, FYI: https://www.mindprison.cc/p/why-llms-dont-ask-for-calculators

What fraction of the Olympiad questions that were correctly answered had been trained upon by the bot?

If it is claimed that “none were trained on,” how do we know that this is true?

Simply trusting the folks who are making these claims is not science.

And correctly "solving" problems whose solutions were in the training data is meaningless.

Superb writing. On the LLM front it is utterly consistent with behavior I experimented with and documented back in GPT-2 and subsequent versions, in what I called "elementary physics" and "elementary geometry" problems.

My "benchmark" is writing pulp fiction novels and seeing where the model generates incongruities. The models have gotten very, very good, but when characters move through space in their prose, they easily generate situations that we humans are unlikely to write, at least not unintentionally. People in handcuffs can't get into the front and back seats of a car through separate doors. There are no echoes in open fields. The most humorous one I call "head so far up his ass he could talk through his mouth."

My feeling is that as humans we are constantly assessing physics and projective geometry in our cognitive systems, and when we read something that contradicts our internal models as we project ideas from the text, it is quite jarring and dissonant. It doesn't require logic or calculation; it's a Kahneman fast system.

Language models and other higher-level models cannot absorb the spatial reasoning we develop with our physical bodies: proprioception, gravity, acceleration, and the visual system. I had to read the first problem three times; I thought there was something I was missing, because the answer is trivially obvious from a visual point of view.

I think a lot of geometric axioms (at least the simpler, Euclidean ones) don't exist as language models for us. I suspect even more complex ones, in topology and differential geometry, we understand without being able to describe.

Fascinating. Thank you both for explaining these issues to us non-mathematicals.

Gary, slightly off topic, but this might be interesting for you: https://www.wired.com/story/thomson-reuters-ai-copyright-lawsuit/

As the judge decided, fair use does not apply when the new product created using copyrighted material is a direct competitor to the original work.

AlphaGeometry, AlphaMath, chatbots, Waymo cars, all point to the same future:

An outrageous amount of data and statistics is used to build the framework. Augmentation will happen as needed: even more data, formal logic where it makes sense, or invoking simulators and other tools, depending on the application.

In the next several years we'll double down. We'll need a lot more data, a lot more compute, and a lot of architectural work. The parts are in place. Now we need to build.

"The core of the reasoning system is the kind of symbolic, geometric-theorem proving system that people have been building since the late 1950s"

What we have had since then is the ability to verify proofs. What is new, and a big deal, is the ability to synthesize proofs, however imperfectly. Then the verifiers take over.

This is *not* new. Automated proof search has been a research project at places like CMU for well over 20 years at this point.

The problem with an automated theorem prover is that it is not easy to tell it that you have 100 similar, already-proven theorems and that it should try to imitate them.

Yes, you can painstakingly build a collection of tactics, as in Lean, and gradually work your way forward in a domain, but this is slow work.

That said, LLMs aren't magic. Hard and unique proofs will stay hard. But easier math will likely be cataloged and automated.
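For readers who haven't seen the tactic style mentioned above, here is a minimal (hypothetical) Lean 4 example, where a built-in decision procedure discharges a simple arithmetic goal instead of the user writing the proof term by hand:

```lean
-- A trivial lemma proved by a tactic rather than an explicit proof term.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  omega  -- decision procedure for linear arithmetic over Nat and Int
```

Building a domain-specific library of such tactics, theorem by theorem, is exactly the "slow work" referred to above.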

This is like the difference between chess-playing programs in 1960 and AlphaGo in 2016. One is a toy; the other is the real thing.

The new approach can do proof synthesis without lots of code that needs to be rewritten for every new class of problems.
