No matter how much data you train them on, they still don’t truly understand multiplication.
My 1976-vintage ELF has a dedicated math ROM alongside its interpreter. Later PCs used math co-processors. I don't understand why LLM devotees seem to shun hybrid processing solutions as sacrilegious...
There are two kinds of AI researcher: (1) those who already know that LLMs (by themselves) are not the route to human-level AGI, and (2) those who need to spend 10-20 years and $100 billion working that out.
"My (innately-programmed) calculator by contrast has received no training at all."
Haha. Yes. A true intelligence would use a calculator. After all, the use of tools is known to be a sign of intelligence.
Good article. The Twitter thread is enlightening. There is a set of people who want LLMs to "evolve" into a general intelligence, and so want to prove they can do everything and are not just stochastic parrots.
Maybe it's just the Lisp programming instinct in me, but my first thought is that DALL-E doesn't understand math either – just look at all those unclosed parentheses!
Made me laugh out loud with that extra column on the table. Brilliant. I suspect some people will miss the very fundamental critique behind that.
My biggest fear is that the 'house of cards' aspect of LLMs (which exists alongside a genuinely usable aspect) will come crashing down before I can publish the video of my talk last week, which not only shows these errors but also illustrates how they come to be.
Hah, that paper is indeed great, but it proves exactly the opposite of what its title suggests. It also has some notable weak points. First, the training set contains numbers with up to 12 digits, and the evaluation set also contains only numbers with up to 12 digits; to demonstrate that the LLM has actually learnt the rules of arithmetic, it should be tested on numbers larger than anything in the training set, and I can bet on the result: 0%. Second, the evaluation set contains only 9,592 cases, which seems woefully inadequate; since test cases are trivial to generate automatically, it would make sense to test on all possible combinations, or at least on a much larger sample. Third, the authors state that the evaluation cases were drawn from the same distribution as the training cases, which is an odd claim: numbers don't come with an inherent distribution, and if anything the natural choice is a uniform one. I suspect they mean they used a special random number generator (with a non-uniform distribution) to produce their cases, and if that distribution is very compact (small variance), the chance that many evaluation cases coincide with training cases becomes quite high.
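To make the out-of-distribution point concrete, here is a minimal sketch of how one could generate evaluation cases with operand lengths beyond the training range. The function name and parameters are my own for illustration, not anything from the paper:

```python
import random

def make_eval_cases(n_cases, min_digits, max_digits, seed=0):
    """Generate multiplication test cases with operands in a given digit range."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n_cases):
        a = rng.randint(10 ** (min_digits - 1), 10 ** max_digits - 1)
        b = rng.randint(10 ** (min_digits - 1), 10 ** max_digits - 1)
        cases.append((a, b, a * b))
    return cases

# In-distribution: up to 12-digit operands, matching the paper's setup.
in_dist = make_eval_cases(10_000, 1, 12)
# Out-of-distribution: 13-15 digit operands, lengths the model never saw.
out_dist = make_eval_cases(10_000, 13, 15)
```

Scoring a model on `out_dist` is the cheap experiment that would settle whether it learnt the algorithm or memorized a length-bounded pattern.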
For statistics, I use mathcracker.com. It does a fine job with regression analysis. And for multiplication I use my calculator. I always feel like I am an old fogie. Thank you for making me feel better about myself, ha, ha. Loved your Guitar Zero book by the way.
If a general-purpose AI-driven problem solver is what we're after, math (including logic) will have to be incorporated in a rigorous numerical form. No other option makes any sense.
It'd be interesting to see if you could program a transformer "by hand" to solve large multiplication problems. Is it a failure of the learning algorithm or a limitation of the neural architecture itself? I would think since there's a finite number of layers, it would have a limit on the number of carries it could perform, but you'd expect a model with dozens of layers and billions of parameters to be able to perform perfectly up to some number of digits.
It would also be interesting to see if it can induct binary multiplication (which is very simple) better than decimal multiplication.
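To make the carry-depth point concrete, here is a sketch of schoolbook multiplication over digit lists that also counts carry operations. In binary every digit product is just 0 or 1, which is one reason inducting it might plausibly be easier than decimal. The function names are illustrative, not from any paper:

```python
def digits(n, base):
    """Little-endian digit list of n in the given base."""
    ds = []
    while n:
        n, r = divmod(n, base)
        ds.append(r)
    return ds or [0]

def long_multiply(a, b, base=10):
    """Schoolbook multiplication; returns the product and the number of carries."""
    da, db = digits(a, base), digits(b, base)
    out = [0] * (len(da) + len(db))
    carries = 0
    for i, x in enumerate(da):
        carry = 0
        for j, y in enumerate(db):
            total = out[i + j] + x * y + carry
            out[i + j] = total % base
            carry = total // base
            if carry:
                carries += 1
        out[i + len(db)] += carry
    # Reassemble the digit list into an integer.
    value = 0
    for d in reversed(out):
        value = value * base + d
    return value, carries

p10, c10 = long_multiply(987654321, 123456789, base=10)
p2, c2 = long_multiply(987654321, 123456789, base=2)
assert p10 == p2 == 987654321 * 123456789
```

The nested loop with a running `carry` is exactly the sequential dependency a fixed-depth network has to unroll, which is where the finite-layer limit in the comment above would bite.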
hmm, the inability to do arithmetic might become the XOR moment for LLMs :)
(back in 1969, single-layer perceptrons were shown to be incapable of performing the XOR (exclusive OR) logical operation, which led to the first abandonment of neural networks by the AI community)
Forever the faithful critic and joyful skeptic... thanks for the grounding...
Thank you for writing _The Algebraic Mind_.
What does backtracking mean in a word-predictor? I think you are ascribing internal machinery that is absent.
In the future all LLMs will be hybrid. They will always defer to calculator-like modules to do numerical computations. That will guarantee the computation is 100% right, but just as importantly it will be millions of times faster.
I think getting an AI to do math "the hard way" is a reasonable research niche to explore, though. Certainly you'd expect an AGI could do math, no matter how tedious, so today's LLMs are clearly not AGI, for this and many other reasons. But that large *language* models alone cannot do math does not seem surprising to me at all, and it doesn't seem like a big concern. We've been doing math with computers for nearly 80 years, we can always add that back in.
My background is in math, and one thing I learnt was that if your proofs or conjectures are not particularly interesting, journals will not publish them. Having the intuition to know what is important and what isn't is perhaps the greatest skill.
Looking at papers like this, anyone who knows anything about LLMs would guess that they are terrible at multiplication. Who would have thought that finding statistical regularities among (10^12)^2 combinations of numbers would be hard... It is not a good sign if a field finds papers like this interesting enough to publish.
In any case isn't the obvious solution to just attach wolframalpha + a classifier that says when to use wolframalpha?
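As a sketch of that architecture — with a regex standing in for the learned classifier, and Python's `ast` module standing in for Wolfram|Alpha; both are placeholders, not a real product design:

```python
import ast
import operator as op
import re

# Queries made only of digits, whitespace, and arithmetic symbols get routed
# to the exact backend; everything else falls through to the LLM.
ARITH_RE = re.compile(r"^[\d\s\.\+\-\*/\(\)]+$")

OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
       ast.Div: op.truediv, ast.USub: op.neg}

def safe_eval(expr):
    """Exactly evaluate an arithmetic expression via its syntax tree."""
    def walk(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp):
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("not arithmetic")
    return walk(ast.parse(expr, mode="eval").body)

def answer(query, llm=lambda q: "(LLM free-text answer)"):
    """Route arithmetic to the exact backend, everything else to the LLM."""
    if ARITH_RE.match(query.strip()):
        return safe_eval(query)
    return llm(query)

print(answer("123456789 * 987654321"))  # exact: 121932631112635269
```

The hard part in practice is the classifier, not the calculator: arithmetic embedded in natural language ("what's a 20% tip on $47?") won't match a pattern this crude, which is presumably why the routing itself has to be learned.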