129 Comments
Bruce Olsen's avatar

Not to pile on, but...

I was doing some carpentry yesterday and did my usual type of entry into google: ".157 in 16ths" which has always returned correct results. But their "AI Overview" piped in first, and informed me that .157 was equal to 1 and 9/16 which I can't even.

It then proceeded to display laughably detailed--and wrong--instructions for performing the conversion.

I was finally informed that .157 inches (I never mentioned inches in the query, btw) was roughly equal to 5/32 (not 16ths but close in value) AND 13/16, which is closer than 1 9/16 but really?

That's 2 wrong answers (or more) from 1 simple query.

I'll post it if anyone is interested.

Edit: added link: https://open.substack.com/pub/thesabot/p/when-size-matters

Erick's avatar

Avoid google's "AI overview" like the plague. It's the cheapest and dumbest AI out there.

Kenneth Burchfiel's avatar

This was my reason for switching to DuckDuckGo (as you can turn their AI results off).

Will Granger's avatar

Trying it out now. Thanks

Ben Kellie's avatar

Switch to Kagi. I never looked back.

Will Granger's avatar

Thanks. I will keep that in mind. Trying Duck Duck Go right now

Morten L. Nielsen's avatar

put a '-ai' at the end of your search

Jennifer Jordan's avatar

Please post. These are the examples people need to be aware of.

User's avatar
Comment removed
Jun 12, 2025
Christopher Rivera's avatar

That’s the problem with these models. Well, it’s a problem with how people perceive them. They don’t reason. They aren’t logical, but they often seem like they are. Sam and his ilk are selling them as such.

User's avatar
Comment removed
Jun 13, 2025
Christopher Rivera's avatar

No, I don't think that. You presume too much. My point is that these models are overhyped and are not capable of doing the things that people believe they can do. It sounds like you have drunk the Kool-Aid.

Robert Spanell's avatar

We need to consider the meaning of the word "reasoning". If reasoning is only a synonym for thinking, then of what use is the word? No, reasoning is the type of thinking that is distinguished by its reliance on logic. Without logic, it can't be considered reasoning. Yes, that logic could be incorrect and we might still categorize it as reasoning, but this would then be a subcategory that we would label as "incorrect reasoning", since it would result in an incorrect answer. So humans can be incorrect, even most of the time, but still be considered to be using logic/reasoning.

However, I don't doubt that there are plenty of humans who seem to get through their lives without any reasoning whatsoever.

Bruce Olsen's avatar

For sure, but this method hadn't failed me until AI chimed in.

User's avatar
Comment removed
Jun 13, 2025
Jonah's avatar

At least in the USA, Google has said that it is using a modified version of Gemini 2.5.

disinterested's avatar

Why does google think they're good at calculations then?

Clint Shook's avatar

This is a great example of its limitation and ability to get something simple incorrect.

Also, just FYI, to convert a decimal to a fraction, multiply it by the denominator you want and round to the nearest whole number; the result is the numerator. So if you wanted 0.157 in 32nds, 0.157*32 = 5.024, which rounds to 5, giving 5/32.
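
For anyone who wants to check the arithmetic, the rule above can be sketched in a few lines of Python. This is only a minimal illustration: the function names `nearest_numerator` and `as_fraction_text` are made up for this example, and the rounding step is an assumption about how you want to handle the leftover decimals.

```python
def nearest_numerator(value: float, denominator: int) -> int:
    """Multiply by the denominator and round to the nearest whole numerator."""
    return round(value * denominator)


def as_fraction_text(value: float, denominator: int) -> str:
    """Format the result as 'numerator/denominator' without reducing it."""
    return f"{nearest_numerator(value, denominator)}/{denominator}"


if __name__ == "__main__":
    # 0.157 * 32 = 5.024, which rounds to 5
    print(as_fraction_text(0.157, 32))
    # 0.157 * 16 = 2.512, which rounds to 3
    print(as_fraction_text(0.157, 16))
```

The fraction is deliberately left unreduced, since a carpenter asking for 16ths presumably wants to see 16ths.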

jibal jibal's avatar

The right answer of 2.512 was in there near the beginning but then it obscured it with a lot of utterly wrong nonsense.

I managed to coax the right answer out of DuckDuckGo (which used GPT-4o mini) but only after rephrasing it as "How many 16ths is 0.157". But then it adds a bunch of superfluous junk, telling me that 2.512 is approximately 2.51 (duh) and then offering me that it's about 2 8/16 (pointlessly dividing by 16 again) which simplifies to 2 1/2 (duh).

Joe's avatar

A.I.I. - Artificial Incorrect Information.

Pramodh Mallipatna's avatar

I have read another reason in some discussions. Apple is lagging behind in AI and hence writing this paper to bring the hype down 😀 it’s almost in the realm of conspiracy theory!

Sharing my latest article related to the topic.

Can Data Alone Make Machines Think

https://open.substack.com/pub/pramodhmallipatna/p/can-data-alone-make-machines-think

Gary Marcus's avatar

ah saw that too but forgot; just another stupid ad hominem argument.

it’s not like tim cook was on the paper

A Thornton's avatar

From the day Reitman called Taube a liar in the pages of Science* ad hominem has been the first Go-To for the AI Brigade.

* Reitman, Walter R. "Fact or Fancy?: Computers and Common Sense. The myth of thinking machines. Mortimer Taube. Columbia University Press, New York, 1961. 136 pp. $3.75." Science 135.3505 (1962): 718-718.

Jon B's avatar

It's stupid to argue that corporations publish studies that dovetail with their market position and strategy?

Is it also stupid to argue that oil & gas companies publish studies which conclude climate change is no big deal?

jibal jibal's avatar

It's stupid to argue against a strawman, as you just did. Maybe look up what an ad hominem argument is--Gary is right that "Apple is lagging behind in AI and hence writing this paper to bring the hype down" is one and fails to refute anything in the paper.

The vested interests of fossil fuel companies and the records of their boardroom decisions explain why those companies pay people to lie about climate science, but they don't SHOW THAT they are lies about climate science--that conclusion is reached by other means. Not understanding the difference is stupid, and leads stupid people to stupidly conclude that every study that a corporation publishes that dovetails with its market position and strategy is a pack of lies--or rather, as here, every study that someone *doesn't like* is. That's called "cherry picking", and is stupid and dishonest.

BTW, climate science deniers also play this stupid and dishonest ad hominem game by pointing to research grants that climate scientists receive (or that scientists are "liberals" who perhaps contribute to the Democratic Party, etc.), or to environmental organizations with known views on global warming, like the Sierra Club, commissioning studies. I think we can agree that people are aholes for using these sorts of textbook fallacious ad hominem arguments. Decent intelligent people focus on the scientific content of the paper, not on who commissioned or funded it. If and only if a pattern of severely faulty scientific content is established do they turn to why there is such a pattern--which is a quite different question from the scientific one that published papers purport to address.

As an INTELLIGENT person commented:

"It's a reasonable argument for Apple's motivation for producing the paper but so what? All that should concern us is the contents of the paper itself. It's a good argument for Apple not to follow what other AI companies are doing."

Brian Frantz's avatar

Honestly I'm starting to think Apple is only "behind" because they have higher standards and haven't been able to make GenAI features reliable enough to stake their reputation on. IOW, it's just evidence that these fundamental weaknesses are not solved, and I'm sure not for lack of trying.

Nav 𓅯's avatar

It’s practically a religious movement at this point.

N M's avatar

So much to be said* related to this and yes.

Paul Topping's avatar

It's a reasonable argument for Apple's motivation for producing the paper but so what? All that should concern us is the contents of the paper itself. It's a good argument for Apple not to follow what other AI companies are doing.

Nathalie Suteau's avatar

It was the same argument a year ago.

Pramodh Mallipatna's avatar

Critiquing the substance of the arguments made in the paper (agree or disagree) would be reasonable from my POV. One can question the motivation, but attributing it to competitive/strategic considerations alone, without anything on the substance of the paper, sounds like ad hominem to me. Anyway, some of the discussions I read made that argument; not all, of course.

Bruce Olsen's avatar

But Kara Swisher says you're just a pest!!

Gary Marcus's avatar

she is very close to Sam…

Christopher Rivera's avatar

She’s not a technologist or scientist. She, like Sam, isn’t a deep thinker and frankly doesn’t understand what she is talking about. Americans falling for their hype is a symptom of our society’s worship of money and disdain for science.

User's avatar
Comment removed
Jun 12, 2025
Christopher Rivera's avatar

Okay. I’ve listened to her and partially to Sam. I usually enjoy listening to her. She has some insight into how the tech billionaires think, but she does not appear to understand the technology she speaks about. These models are amazing, but both Sam and Kara are selling kool aid. Sam because he’s a businessman whose business depends on the hype. She sells it because she appears to like feeling cool. I stand by my statement that Americans are motivated by money and demonstrate a serious disdain for actual science.

User's avatar
Comment removed
Jun 13, 2025
Christopher Rivera's avatar

I generally agree with Gary. He’s an actual scientist who’s done research in machine learning and neuroscience. I’m an MD PhD biochemist who does ML in biotech. Biological systems are incredibly complex. While AlphaFold and its ilk are doing awesome jobs at helping solve the protein folding problem, they make enough mistakes that we have to experimentally validate their results. It takes many GPU hours to do relatively simple molecular dynamics simulations of single protein systems for a single microsecond. Our nervous systems integrate signals from trillions of molecular machines that have built-in quantum physics engines. I think LLMs are great. But they are ultimately statistical machines and nothing more.

Al de Baran's avatar

Ironically, I haven't seen anything substantive from you in response to anything "bloviated" here yet. Oh, but you're a "philosopher of metaphysics", so that should end all discussion. *laughs*

Jeff Irvin's avatar

It bothers me that Swisher and Galloway, two generally cynical human beings, are not more skeptical about AI optimism.

Ari's avatar

Access ‘journalist’ who knows where her bread is buttered. She’s been carrying dear Sammy’s water for years.

Christopher Rivera's avatar

"It is difficult to get a man to understand something, when his salary depends upon his not understanding it."

Nathalie Suteau's avatar

When was the Strawberry space with Elon under cover as Adrian, mr Strawberry who is I don’t know? In August? I’ll write my long comment on that day. Thanks for the article.

Darren D'Addario's avatar

Kara Swisher is such a clown. She spent all those years in business with Rupert Murdoch, enriching herself, and then dubs him "Uncle Satan" so she can posture like a badass. She's so obvious.

Hector Zenil's avatar

Apple didn't do it first, they just have better marketing teams:

Here is our paper, which studied this idea of increasing complexity before the hype around Apple's paper. It showed that LLMs performed very poorly, and it went beyond human-centric games and puzzles.

It was proposed as a test for LLMs called SuperARC, and a hybrid neurosymbolic solution was tested that outperformed the best frontier models:

Our press release from King's College: https://www.kcl.ac.uk/news/new-study-introduces-a-test-for-artificial-superintelligence

Our first paper on complexity to test intelligence: https://www.linkedin.com/posts/zenil_ai-agi-superintelligence-activity-7338244880269209600-tWiD

Let's not perpetuate what they have done to you. You were ignored to some extent for so long, until LeCun basically adopted and stole your position so dishonestly.

Peter Gaffney's avatar

I love your insights, but do you have to say "I told you so!" in every single post? It doesn't make me want to go on a long road trip with you.

Paul Czyzewski's avatar

Great article.

However, since Salesforce is in the _business_ of providing AI-powered business solutions, any benchmark provided by them has to be viewed with extreme skepticism.

neandrothal's avatar

Speaking as a SaaS product marketer, Salesforce has been marketing its “AI-powered” features way before ChatGPT first came out. They were terrible and we didn’t bother using them. Whatever they called AI back then definitely wasn’t on the level of LLMs (not even to mention AGI).

I think Salesforce has more of a sour grapes motivation than Apple because they have been overhyping AI for business applications before Sam Altman started to.

None of which is to say that the content of their paper is wrong, of course!

Sufeitzy's avatar

My take is mixed on the article and this write up.

I think the gap we see is that self-awareness, intent, and motivation are a big part of reasoning, but are not modeled in an LLM.

Just a very simple example. An LLM has no model for “reward” and motivation, it only seeks to minimize a divergence metric.

It will never seek, or “want”, to solve a Hanoi problem or any problem for which it has no example to minimize divergence against. It is not “aware”; it has no “intent” to solve a problem.

I suspect that attention modeling that was a breakthrough needs to be conceptually augmented with the equivalent of curiosity, motivation, and awareness systems. Not intractable but I would look at people who lack curiosity, motivation, awareness to see what kind of neural system is at play; which can then be replicated.

Until then, LLMs are a very sophisticated information recall system (Librarian), but that’s it.

https://pmc.ncbi.nlm.nih.gov/articles/PMC7049782/

I remember reading Luria a long time ago (I had a strange reading list as a teenager) which I found funny at the time but linking it to Karl Friston it becomes more obvious.

Luria proposed that to be conscious, an entity has to be self-aware, it has to perceive its environment, and it has to be awake - aroused and motivated.

Walking is a form of reasoning, goal-directed self-originated planned neural activation with some complex inverse kinematics. There are many forms of reasoning besides mathematics - how to eat, walk, speak. We reason constantly, but are unaware.

A good place to start for finding these circuits is Luria’s “Higher Cortical Function in Man” (My copy is 1966 Plenum Press)

John's avatar

The path to intelligence is by living, self-aware, biology, not silicon manufactured hardware.

The bots might sound like you, because of your natural human “propensities to connect”.

But they are not like you - they are just gaming you - stay awake.

jibal jibal's avatar

This utterly irrational and illogical take has been refuted many many times, notably in the context of Searle's grossly incompetently argued Chinese Room paper.

User's avatar
Comment removed
Jun 12, 2025
Cyberneticist's avatar

Typical GPU in a data center requires between 400-800W of power but the human brain/body can operate on less than 3k calories. Those two figures are several orders of magnitude apart. Moreover, GPUs in data centers must be cycled every 2-4 years b/c the silicon gets fried and stops performing correct arithmetic operations.

In theory anything is possible, in practice computers are not capable of learning or adapting so you should go back to first principles and figure out why exactly you think there are no obstructions to constructing an adaptable brain from a bunch of xor gates & flip-flops b/c a single neuron has more complexity and adaptability than all of the world’s data centers combined.

User's avatar
Comment removed
Jun 13, 2025
Cyberneticist's avatar

It ain't gonna happen buddy. You should sit down and do the calculations b/c building anything that can approach one brain's worth of capabilities would boil the oceans. What you're doing is called wishcasting. If I told you that epicycles can approximate any function you'd tell me there are no obstructions to constructing brains from epicycles and you'd be just as wrong in your assessment as you are now w/ silicon xor gates & flip-flops.

User's avatar
Comment removed
Jun 13, 2025
A Thornton's avatar

Try solving an Inclusive Middle non-linear problem with 3 unknowns in a Collectivity* in software.

Something bacteria can do as a matter of course.

* See: Rescher, Nicholas, and Patrick Grim. Beyond sets: a venture in collection-theoretic revisionism. De Gruyter, 2011.

User's avatar
Comment removed
Jun 13, 2025
A Thornton's avatar

What part of "Inclusive Middle" didn't you understand?

"Taking the principle of excluded middle from the mathematician would be the same, say, as proscribing the telescope to the astronomer or to the boxer the use of his fists. To prohibit existence statements and the principle of excluded middle is tantamount to relinquishing the science of mathematics altogether."

Hilbert, David. "The Foundations of Mathematics" Sarkar, Sahotra, ed. The emergence of logical empiricism: From 1900 to the Vienna Circle. Vol. 1. Taylor & Francis, 1996.

User's avatar
Comment removed
Jun 13, 2025
A Thornton's avatar

What part of "Inclusive Middle non-linear problem in software with 3 unknowns" is unclear?

Rafe Brena, PhD's avatar

Point #2, "The Large Reasoning Models (LRMs) couldn’t possibly solve the problem, because the outputs would require too many output tokens," is taken care of in the paper itself. The authors say that in many cases, the AI failed to solve the problem even when there were plenty of tokens in its budget. So it's not valid, as you say.

Claire Hartnell's avatar

From my perspective, the biggest error in imagining AGI is that it is still closed system thinking. It assumes a threshold above which it attains human intelligence. But there is no threshold? Intelligence, like every other complex adaptive system, is constantly changing as it comes into contact with its environment. The only way a machine becomes human is if the multi-scale architecture of evolution is somehow embedded in it. That would mean an infinite number of tiny lifecycles nested within every machine updating information as they walk through their phase space. Machines may be able to approximate this at one level but they can never do it biologically. It’s like saying a robot dog will eventually be a dog. That will never happen! The robot may be styled to look & sound like a dog but it will never autonomously reproduce & evolve in response to its environment. This is why talk of AGI is so foolish & hubristic. It’s the wrong damn goal. It will never be human. My old school dictionaries contain more information than I can hold in my head but information is not language! This is simply the wrong use of fossil fuel. There are gains to be had from AI - access to all this information is useful. It promotes more advanced human thinking & helps us to find simple answers quickly. Arguably it solves some big data co-ordination problems. But that was a problem of human construction - absurdly huge stores of information have not allowed us to improve the human lot or make a leap to a higher level. The huge, screaming hole in the 70s was energy supplies. That’s what we needed to solve! And instead of doing that we allowed resources to be captured by a small elite who funnelled remaining stocks into proliferating noise. AI was supposed to be the signal but it’s already clear that it’s yet another source of noise contaminating our collective cultural brain.

Jonah's avatar

One thing that bothers me is that research like this gets relatively little coverage from a technology journalism ecosystem that seems mostly convinced that model companies are objective sources of truth, and ignores their obvious conflict of interest in overhyping their models.

For instance, Wired recently published an article saying "AI agents, find out about what Sam Altman calls the next big thing." How shameful is that? How is that any better than "Find out what Pepsi calls the next big beverage"? By contrast, the Salesforce article pointing out the serious continued limitations, by virtue of not being hype selling a product, apparently does not merit a whole article from them.

In fairness, they did also publish an article about Meta sharing private user conversations publicly, but something has to reach that level of obvious egregiousness to get past their instinctive trust in technology CEOs.

The Human Playbook's avatar

Agree and am trying to showcase that. It’s a massive problem

Antipopulist's avatar

Argument #2, that LLMs don't have sufficient output tokens, has a second part. Namely, if the output length gets even remotely close to the token limit, then the LLM will stop doing advanced reasoning via chain-of-thought, since those tokens are also counted as output tokens. This would obviously significantly decrease the reliability of answers, since the LLM is effectively dumber in this mode. If/when LLM output length becomes longer, this will naturally be resolved. I don't know how long this will take, but it seems practically inevitable through simple scaling.

I'd like to see you address this argument at some point.

Asa Dotzler's avatar

And what happens when I throw a problem at the newer, better-scaled one and it's still not capable of a reliable answer without even more scaling? "It'll be great on problems you've found it's not great on, perhaps tomorrow or at some other 'inevitable' point in time when it is scaled sufficiently."

Jennifer Jordan's avatar

Hi Gary, we met back in 2018 when I was working on a fund targeting AI trust, transparency and accountability. That trust stack is still not there, and as we move to “reasoning” models and multi-agent decisions the need only gets more acute. Would love to propose pulling together a panel with you and some of the Apple authors to discuss at the San Francisco Imagination in Action AI conference this September.

Jim Brander's avatar

"The systems can solve the puzzles with code. True in some cases, and a huge win for neurosymbolic AI ... Huge vindication for what I have been saying all along: we need AI that integrates both neural networks and symbolic algorithms and representations (such as logic, code, knowledge graphs, etc)." It is a huge win for Semantic AI - English, in other words. Where is it written that "neural networks" and "symbolic algorithms" are any use (an obvious mishmash)? ANNs were stupid from the get-go - calling them neural networks when they are static contraptions lacking most of the functionality of real neurons is marketing, not science, and "symbolic logic" ignores the context, so misses out on the nuance or figurative use of natural language. https://semanticstructure.blogspot.com/2025/06/ai-and-symbolic-logic.html

Neurosymbolics is not the way to AGI; natural language is.

Paul Jurczak's avatar

"The correct answer to Tower of Hanoi with 12 moves would be too long for some LRMs to spit out" --> "The correct answer to Tower of Hanoi with 12 disks would be too long for some LRMs to spit out"
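
For a sense of scale: the shortest solution for n disks takes 2^n - 1 moves, so 12 disks means 4,095 moves. A minimal Python sketch of the standard recursive solution is below; the function name `hanoi_moves` and the peg labels are just illustrative choices, not anything taken from the paper.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Yield the optimal moves for n disks as (disk, from_peg, to_peg) tuples."""
    if n == 0:
        return
    # Move the top n-1 disks out of the way, onto the spare peg.
    yield from hanoi_moves(n - 1, source, spare, target)
    # Move the largest remaining disk to the target peg.
    yield (n, source, target)
    # Move the n-1 disks from the spare peg onto the target peg.
    yield from hanoi_moves(n - 1, spare, target, source)


if __name__ == "__main__":
    moves = list(hanoi_moves(12))
    print(len(moves))  # 4095, i.e. 2**12 - 1
```

The algorithm itself is a handful of lines; it is the written-out move list that grows exponentially with the number of disks.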

Erick's avatar

LLMs aren't very accurate at executing simple recursive algorithms, so if lots of steps are required they mess up. Also they have limited context lengths. Therefore they will never be AGI?

It seems to me if they're good at most of the stuff humans are good at (not true yet but getting closer), plus can use computer tools to deal with the other parts much the same way humans can, then they'll eventually be able to do most of the stuff humans can do. I'm sure there are better ways, much better, probably involving closer integration with symbolic AI... but a crappy LLM system with a lot of brute force scaling may still be good enough.

Asa Dotzler's avatar

Good enough to decide if your children live or die? You'll trust them on important questions because you have faith in their scale?

Erick's avatar

Not anytime soon! When something is life or death you double check it. Most things aren't, though, and it's hard to say exactly how the technology will evolve over the long term. Certainly people are trying to find ways to improve reliability, but there's a long way to go.

jibal jibal's avatar

"Therefore they will never be AGI?"

Absurd strawman ... Gary has given many fundamental reasons why LLMs are not on the path to AGI, and never made the claim you're putting in his mouth.

Erick's avatar

What's the difference between "not on the path to AGI" and "will never be AGI"?

jibal jibal's avatar

None. The strawman is your claim about what his argument is: "LLMs aren't very accurate at executing simple recursive algorithms, so if lots of steps are required they mess up. Also they have limited context lengths. Therefore they will never be AGI?"

Erick's avatar

Oh, okay. That was my slightly snarky attempt to summarize the most relevant points he discussed in this particular post (ignoring the dumb stuff like "an intern wrote it"). I certainly wasn't trying to address every argument he makes elsewhere. But I think this post is a good example of his general pattern--pointing out limitations of LLMs and then being too quick to jump to very broad and long term conclusions about big picture implications, like what can or can't "fundamentally transform society" or what architectures will be needed to have real "thinking". Yes, he has pointed out many limitations besides the two I mentioned here, but he never makes a convincing case for extrapolating to a confident belief that similar architectures will always have crippling weaknesses.