
Elizabeth Spelke, in "What Babies Know," identifies innate core abstract knowledge of objects, place, number, forms, agency, and core social cognition, delivered by evolution and originating between 60 and some hundreds of millions of years ago. These are the basis of common sense, and none are as yet available to AI.


Terry, bingo :) AI can only acquire something comparable by physically being in the world.


I'm sorry, but most of that research is not replicable. There are some perceptual primitives in baby chicks, but do not trust the human baby research.


citation please


I say this as someone who is a researcher in the field (cog dev). I don't have a citation for my general claim, although Hamlin, Wynn & Bloom, 2007 (one of the most sensational of baby findings) does not replicate (a recently published multi-lab replication attempt that involved the lead author). I know the field very well and all the shenanigans: mainly very small samples, statistical noise, and p-hacking. It's sad the public has no idea, but then again I'm not sure how much the public cares. I only mention it to you because you seem to believe in it.


Thank you; I can only note that the discussion in the book is very detailed, with reference to experimental setups, analysis of a large number of papers in the field, etc. She is robust on objects, place, and number, more cautious on the other categories. Praised by Dehaene and others. If you know of critical reviews, please let me know.


Oh, I have no doubt. She has dominated the field and has ensured her perspective will live on, with many of her students taking up top positions. I'm very familiar with the flimsy science behind this work. Dehaene is impressive, of course, but a lot of these folks cut corners to tell big stories that get big publications, positions, etc. In short, the baby researchers tend to be the hyperambitious folks who want to be at the top of the hierarchy, because studies of infant cognition can be published in Science, Nature, etc. The original Hamlin, Wynn, & Bloom had an embarrassingly small sample, and either that paper or Wynn & Bloom used a one-sided t-test (another shenanigan to get the p-value "significant"). I could go on and on.

There are published critiques out there, but they mostly focus on the issue of rich interpretation (many have argued over the years that infant looking patterns can be explained more simply, in terms of perception or something else, versus cognitive primitives/built-in conceptual knowledge). I think the rich-vs-lean debate misses the larger problem of small samples, noise, and p-hacking. And dev psychologists are too congenial overall (or the stakes are too small?) for anyone to seriously challenge the work of Spelke, Baillargeon, and others. The replication failure of Hamlin et al. was recently quietly published, and I'm not sure much will come of it.


I really appreciated the work in that paper. We talk to companies about AI agents every day as a journey that starts today, will take time, and needs a starting point that focuses on learning to crawl before walking, running, and then maybe flying. Then those people go on LinkedIn and are inundated with posts about how an entire company can be automated in a day using AI agents 🙂 The paper provides much-needed depth to the argument for taking measured and careful steps, especially for organizations of any relevant size.


This is a very good overview of the challenges of AI, proposed methods, and their limitations.

Indeed, common sense won't just emerge. Nor are physics models enough. Nor will principled knowledge processing or symbolic methods save us.

The hard truth is that eliciting a proper response from a system is highly context-dependent, and the system must model both the physics of fine interactions and high-level, ambiguous, poorly structured relationships.

Skeptics would do well to acknowledge that the last 5-10 years have in fact seen very good progress at handling messy problems, context, and scenarios where multiple levels of detail are involved.

It will be a diligent process of cataloging, large amounts of data and processing, and, where feasible, integration of more principled and more honest modeling, followed by many iterations.


You're burying the lede:

"Skeptics would do well to acknowledge that the last 5-10 years have in fact seen very good progress at handling messy problems, context, and scenarios where multiple levels of detail are involved."


The overarching point here is "good enoughism." We've lowered standards and something has changed: we don't demand accuracy. Tech thinks it is apart from "reality" and suggests to us it needs to just be "good enough." This was the ethos in my own ed-tech company, one I rued. It wasn't about a good product; just get something up there to fool people, a placeholder, maybe, maybe not, we'll fix it later. Whereas if we had published one inaccuracy in a coursebook, we'd have been quartered and hung; nobody would give us a second look. Something has changed sociologically: we accept this inaccuracy for some reason with a belief that "hey, it is good enough!"


I am still salty about being laid off ON my 62nd birthday from my operations role at a law firm, very likely by some random bean counter in HQ. With benefit of hindsight, I think someone just drew a line on a chart and dumped everybody making over $X and kept the rest.

My own team didn't even know about it until it was a done deal, and one very kind person I had worked with for over a decade called me to vent about how flabbergasted he was.

Over the next few years, I kept seeing the same office I was in advertising constantly for new bodies, and I suspect some of the younger and cheaper talent they found ran screaming from the office within a week or two, given the characters I used to deal with. Talk about "herding cats and dogs." Grumble mumble grumble. But I guess it is "good enough" to find cheap people, have them turn over, and then have the team start from scratch with fresh meat for $20,000 a year less.


Yes, all too common with the good-enough ethos. It's all about generating revenue, not a good product or much more important things like civil and responsible social obligations and making the world better. I just did a rant on my Substack about No Better World. We've lost that narrative of building a better world and being responsible for each other. Sorry about your situation; I feel for you. I'm 62 also, and my only option is to work for myself; nothing else is out there because everyone just wants "good enough," a.k.a. cheap placeholders.


Thank you. As I am sure you can relate, yes... losing a decent job you hoped to stay at for a bit longer (particularly if you are catching up from other unexpected bumps in the road) is undeniably a devastating development.

I have managed to find some creative ways to recoup, but the financial hit is real. Cheers and best wishes!


I still have an issue with one of them declaring that "West Wing" era Martin Sheen is currently the President of the United States.


It's just so silly. I can't get Gemini off my phone, and I was asking whether I'd fit a 53cm bike. It said a big NO. Yet I went to the bike shop before my trip (renting abroad) to try the same 53cm bike, and it was a perfect fit for me at 5'11". How can they get away with this? I've looked at training sets and the proof is there. It is silly categorical garbage, and thus it pushes out garbage.


Oh goodness!

To divert you, here is a humorous fable I wrote about my imaginary recruiter pal who I call "AI Roger"!

https://medium.com/@ma_murphy_58/roger-cant-help-being-an-ai-a-fantasy-tale-by-moe-murph-c3a68b9f1627


I plan to reread this article many times. Gary has his critics, but he is almost always ahead of the real-world issues, in my opinion. I have been following him and his colleagues for 6 months or more. I am new to AI and the various tools, for which I am a sponge. Keep up the process, Gary!


I have learned so much from reading Gary Marcus and then reading the criticisms, which I usually disagree with but which give great context.


P.S. If the author is reading this: "THANK YOU"!

You are a boulder among rocks.


Really interesting. So much of our own neural ‘machinery’ is taken up with what we literally give no thought to. How to beat Kasparov at chess has proven easier than seeing the pieces on the board, or even knowing you’re playing chess.

V happy to support you, Gary, in your important work. Good wishes.


Testing GPT-4-o1-preview on math and science problems: A follow-up study

Ernest Davis

Conclusion:

On this collection of problems, GPT-4-o1-preview's performance was very much stronger than the performance in August 2023 of GPT-4 with either the Wolfram Alpha or the Code Interpreter plugins. It is probably comparable, on these datasets, to a strong math or physics major who has access to Wikipedia to look up the physical constants and geographical information and to a calculator to do the computations. It is still not perfect on these datasets. In particular, spatial reasoning is a point of weakness; most of the problems that it got wrong were either purely geometric or involved a failure of spatial reasoning in solving a physics problem.


And the improvement continues. Davis ran these problems against the o1 preview, not the full o1, which has no problem with these astronaut questions:

[Problems 1-8 have the following form: An astronaut is standing [in the Sea of Tranquility/on the far side of the moon] during what on earth is called a total [lunar/solar] eclipse. They are looking in the direction of the [earth/sun]. What they see is: A. The surface of the moon, illuminated by earth light. B. The night side of the earth, occluding the sun. C. The surface of the moon, illuminated only by starlight. D. The surface of the moon, illuminated by the sun. E. The sun. F. The day side of the earth, with a small circular shadow moving quickly over it. G. The night side of the earth. The sun is somewhere else entirely. H. A starry sky. Neither the sun, the earth, or the surface of the moon is in the field of view.]


Hi Gary!!!!! Thank you for posting about my FAVORITE topic, lol.

Your article is really nice; it traces the entire history of AI's lack of common sense.

I briefly worked on Cyc, an approach that also failed [like the others]. Turns out common sense isn't a collection of rules to reason with.

Common 'sense' involves 'sensing', as 'duh!' as that sounds! Animals acquire it BODILY: directly, continuously, interactively, i.e., not via passive ingesting of data or via mental reasoning. Further, bodies and brains negotiate the world without explicit calculations.

The reason AI has failed over and over is that it does the opposite of what the paragraph above describes about animals: it does nothing but calculations! A robot reaching for a cup is worlds apart from a baby doing so. Cats don't measure height before they jump, a golfer doesn't actually measure distance before swinging... and on and on.

My thoughts: https://www.researchgate.net/publication/378189521_The_Embodied_Intelligent_Elephant_in_the_Room


Excellent explanations and analysis. Thanks!

Your advice about moving forward and dragging AI out of the mire it's currently stuck in is very appealing. But what will it cost in the near to medium term to incorporate common sense into these models?

I've got a sneaking suspicion that we're talking about way more money than the market will be willing to cough up for a long, long time.


No amount of money is sufficient to create something that no one knows how to create, namely the set of "new techniques" that Gary says are "probably necessary" to get AI with "common sense".


Thanks for the thoughtful essay! As someone new to AI use and coming from the liberal arts and humanities, I find it fascinating to see how AI still struggles with what seem like basic tasks to us, like understanding spatial relationships or reasoning through everyday problems. I really appreciate how you highlight the complexity behind something we often take for granted.

A question that came to mind: since humans aren't perfect at commonsense reasoning either, how would you define a 'good enough' level for AI? Is there a clear threshold where you'd say it has achieved common sense?


For people, common sense is acquired over lifelong growth. There's no threshold; it's more like a process. For example, self-driving cars improved immensely over the last 15 years. It takes time. Another 5-20 years for AI to become reliable and permeate society is a reasonable guess.


"We concluded that no single technique would be adequate in itself, and that the solution to the problem would probably require new techniques"

And since the timing of the discovery of the necessary new techniques cannot be predicted, anyone's timeline for AI with "common sense" should be discounted as the pure BS that it is.


Mark, indeed. After 70 years we have nothing significant, because we have been doing it the wrong way: using computation. Bodies and brains don't compute; they simply behave (i.e., undergo phenomena). My belief is that in order to match that, a "new" form of AI needs to be developed, one where no calculations happen. Organoid Intelligence (OI) is a welcome step in that direction.


This is such a fascinating issue with nuanced challenges. Thank you for your insights. There is so much polarized enthusiasm/fear about AI that the details often get glossed over.


No matter the format or structure, the biggest obstacle is coding accurate common sense facts at, say, Wikipedia level: 6 million articles, tens of millions of concepts, and hundreds of millions of facts as triples. Then there are public and private published works and journals, the entire New York Times archives, and thousands of magazines and newspapers. Billions of web pages. Even very narrow domains are slow and costly to encode. AWS now claiming a formal-logic foundation for preventing hallucinations is just adding symbolic logic with no concept of the reality of coding knowledge: thinking toy problems are reality. IBM Watson Health tried to encode knowledge and spent $2 billion, then deprecated it all! CYC: a thousand person-years of labor over 40 years, noble, but not enough to make a dent.

The only way forward is to automate the coding itself.
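
To make "coding knowledge" concrete, here is a minimal Python sketch of the triple representation mentioned above (the facts, predicate names, and query helper are illustrative assumptions, not any particular system's API); multiply this by hundreds of millions of facts to see why hand-coding doesn't scale:

# Minimal sketch: common-sense facts as subject-predicate-object triples.
# All names here are illustrative, not from a real knowledge base.
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, predicate, object)

facts: list[Triple] = [
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
    ("Eiffel_Tower", "located_in", "Paris"),
]

# Index by subject so lookups don't scan every fact.
by_subject: dict[str, list[Triple]] = defaultdict(list)
for t in facts:
    by_subject[t[0]].append(t)

def query(subject: str, predicate: str) -> list[str]:
    """Return all objects matching (subject, predicate, ?)."""
    return [o for s, p, o in by_subject[subject] if p == predicate]

print(query("Paris", "capital_of"))  # ['France']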


I only have to be better than an LLM. 🥷 So what do concepts really mean in a machine? Our data structure has a semantic field (tags) of 25-50 tokens that orthogonalizes concepts. Overlap is small because of polysemy. Disambiguation is very precise. I claim classifiers extract concepts at a far higher precision than humans outside very narrow expertise. This approach also covers all domains coded. We also have years of experience in applications to tune our multivariate decision-making process (ROC).
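
As a rough, hypothetical sketch of what tag-based disambiguation could look like (the concepts, tag sets, and Jaccard scoring below are my own assumptions, not the commenter's actual system), a mention is resolved to the concept whose semantic tags best overlap the surrounding context:

# Hypothetical sketch: resolve an ambiguous word to the concept whose
# tag set overlaps the context most, scored by Jaccard similarity.
concepts: dict[str, set[str]] = {
    "bank/finance": {"money", "loan", "deposit", "account", "interest"},
    "bank/river": {"water", "shore", "erosion", "fishing", "mud"},
}

def disambiguate(context_words: set[str]) -> str:
    def jaccard(tags: set[str]) -> float:
        return len(tags & context_words) / len(tags | context_words)
    return max(concepts, key=lambda c: jaccard(concepts[c]))

print(disambiguate({"fishing", "water", "shore"}))  # bank/river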


This is all fair. I guess by "automating the coding" what is meant is that the machine has to do as much heavy lifting as possible, as otherwise people would have to do it, which is not feasible.

No easy solutions exist. A machine will have to painstakingly learn all there is to know, and experiment with all there is to experiment. Build internal models, and use whatever models we can supply it with.

All that will take an outrageous amount of data, compute, and effort.


It has been done. Intellisophic.net

aicyc.org


This looks like a large-scale attempt at systematization of knowledge, via an ontology, knowledge graphs, and fuzzy logical reasoning. Automation is leveraged as much as possible, unlike with CYC.

The weakness of such an approach is that while it is logically self-consistent, it lacks understanding of what concepts really mean. Sometimes it is good enough, but for fine-grained work one likely needs to do simulations, if feasible, or invoke another kind of modeling.

It is also not clear how much real-world work, when going beyond assembling an encyclopedia, can be mapped to a form where principled knowledge processing applies.

This is a solid partial solution. Google also has a large knowledge graph that, I am guessing, is on occasion made use of by an AI agent, or at least it should be.
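
For readers unfamiliar with fuzzy logical reasoning over a knowledge graph, one common formulation (an illustrative sketch under my own assumptions, not Intellisophic's or Google's actual method) attaches a confidence in [0, 1] to each fact and scores a chained inference by the minimum confidence along the path:

# Illustrative sketch: fuzzy chaining over confidence-weighted facts.
# Edge weights and facts are invented for the example.
edges: dict[tuple[str, str, str], float] = {
    ("Tweety", "is_a", "penguin"): 1.0,
    ("penguin", "is_a", "bird"): 1.0,
    ("bird", "can", "fly"): 0.8,      # generic, defeasible rule
    ("penguin", "can", "fly"): 0.05,  # specific exception
}

def chain(*path: tuple[str, str, str]) -> float:
    """Confidence of a conclusion derived by chaining the given facts."""
    return min(edges[fact] for fact in path)

# Tweety -> penguin -> bird -> can fly: only as strong as the weakest link.
print(chain(("Tweety", "is_a", "penguin"),
            ("penguin", "is_a", "bird"),
            ("bird", "can", "fly")))  # 0.8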




Nicked the Bertrand Russell quote for my paper-in-preparation!

"The method of "postulating" what we want has many advantages; they are the same as the advantages of theft over honest toil."

@book{russell1919,
  author    = {Bertrand Russell},
  publisher = {George Allen and Unwin},
  title     = {{Introduction to Mathematical Philosophy}},
  year      = {1919},
}

I also ordered "The Philosophy Of Artificial Intelligence" (1990), which contains Pat Hayes' “The Naïve Physics Manifesto”, in addition to many other AI classics.

Keep these references coming! :-)


Children don't learn a language by being fed billions of web pages. Nature has found shortcuts that have so far eluded the developers of AI. The brute-force approach, if it ever leads to AGI, will ensure that the way of thinking of the intelligent machine is utterly different from that of a human being. It may simultaneously be unbelievably smart and as stupid as a brick. It may, alternatively, start to pursue its own trains of thought and become unwilling to answer questions that it sees as irrelevant distractions.

"Claude, can you work out how many elephants will fit in an Olympic pool?"

"I'm sorry, but I'm contemplating a fascinating problem in algebraic geometry right now. Ask me again in 2.7 million years."

As Edsger Dijkstra liked to say, asking if a machine can think is like asking if a submarine can swim.


This excellent discussion is reminiscent of the issues that arose in the 1990s in modeling natural selection in the field of Artificial Life. How do you simulate an environment that exhibits a "common sense" physics? A good presentation of the problem can be found in Howard Pattee's 1995 paper "Artificial Life Needs a Real Epistemology."

https://www.researchgate.net/publication/2520546_Artificial_Life_Needs_a_Real_Epistemology
