So what do you think about things like this? https://huggingface.co/papers/2501.04519
Looks overwhelming and a little scary. Do you think it's real or they just overtrained on that exact benchmark?
It looks really cool, if hacky in the way that all deep-learning-based attempts at performing deductive logic are hacky. Notice that there is an element of deduction involved: AI-generated proposed solution steps are turned into Python code and then run to see if they actually work.
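The core mechanism is easy to sketch. Here's a minimal toy version of the execute-to-verify idea, not the paper's actual pipeline: `verify_step`, the hard-coded candidate steps, and the claim strings are all illustrative stand-ins for what the model would generate.

```python
# Toy sketch of verify-by-execution: a candidate solution step (here a
# hard-coded string standing in for model output) is run as Python, and
# the step is kept only if it executes cleanly AND its claimed result
# holds. All names here are illustrative, not from the paper.

def verify_step(code: str, claim: str) -> bool:
    """Run a candidate step and test its claimed result; reject on any error."""
    namespace: dict = {}
    try:
        exec(code, namespace)                 # run the generated step
        return bool(eval(claim, namespace))   # does the claimed result hold?
    except Exception:
        return False                          # broken code = rejected step

# Two hypothetical model-proposed steps for "solve x + 3 = 7":
good = verify_step("x = 7 - 3", "x == 4")   # runs, and the claim checks out
bad = verify_step("x = 7 + 3", "x == 4")    # runs, but the claim fails
```

Steps that survive this filter get a deductive stamp of approval that pure next-token sampling can't provide, which is the interesting part.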
Ultimately, this is still a probabilistic "does this solution resemble solutions from the training data?" approach, but done in a principled manner that avoids the usual problems with using language models to answer logic questions.
I'd say this is a good example of a well-established phenomenon: the more narrowly an AI is tailored to solving a specific kind of problem, the better it will perform.