OpenAI is not an AGI lab, it's a persuading-people-to-give-them-money-on-the-basis-of-some-vague-optimistic-promise lab. That's what they're really good at. That's what the demo is.

An exercise in marketing. The media are incapable of understanding the distinction.

I bet the engineers at OpenAI cringe every time Sam Altman and other sales people talk to the media or investors.

Considering all the hyperbolic announcements and selective benchmark releases from frontier labs, I don't even read them; I wait for levelheaded AI experts like Gary and read their take instead.

Ad hominem is the AI'ers' immediate go-to tactic. Next is twisting your words or misquoting you to make it seem you said things you didn't say. Then there is always the good old dismissal of your criticisms by saying they can be refuted, like Dreyfus's, with "a few simple words" they never get around to writing. Finally, when it becomes apparent, even to them, that the criticisms are justified, they will whine that you are a big meanie who hurt their feelings.

All of this is just comedy for me, btw.

I feel like an observing witness at an evangelical tent revival. Otherwise intelligent people are led by their faith to queue up to reach salvation through a healing blow to the head administered by a frocked sama at the pulpit.

I think Chollet has done great work with ARC-AGI. The fact that a statistical approach has been relatively successful merely demonstrates how far away AGI really is. We don't have algorithms that can be brought to bear on ARC-AGI that approach the problem as a human would. (Or, if we do, their human creators didn't enter the contest.)

I look forward to the next generation of ARC-AGI. I believe one of the team's goals is to create a test that is harder for deep learning algorithms to tackle. Detractors will undoubtedly claim that the new test is unfairly biased against their favorite algorithm, but true fans of AGI will say, "This is the way."
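
For readers who haven't looked at the benchmark itself: each ARC-AGI task is a small JSON file of colored grids, with a few "train" demonstration pairs and one or more "test" inputs to solve. A minimal sketch of loading one in Python (the file name is hypothetical; the format follows the public ARC repository):

```python
import json

# Each ARC task is a JSON object with "train" and "test" lists; every item is a
# pair of grids, and each grid is a list of rows of integers 0-9 (colors).
with open("arc_task_example.json") as f:  # hypothetical file name
    task = json.load(f)

for pair in task["train"]:
    in_rows, in_cols = len(pair["input"]), len(pair["input"][0])
    out_rows, out_cols = len(pair["output"]), len(pair["output"][0])
    print(f"demonstration: {in_rows}x{in_cols} input grid -> {out_rows}x{out_cols} output grid")

# A solver sees only the "input" grids of the test items and must produce the outputs.
print(f"{len(task['test'])} test item(s) to solve")
```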

Paul, BINGO.

It's the AI community that keeps complaining that the goal posts keep getting moved. "True fact" - from Day One, the goal [which needs no moving] was to mimic human intelligence. AI has been a grab bag of hype-filled one-offs, never getting anywhere near the original goal.

No, the goal isn't to mimic human intelligence. Humans have a very severe limit on their input - the Four Pieces Limit. We need to transcend this limit.

Jim, please read the ARC contest's pages fully. The goal *is* to mimic human intelligence. That's true not just for ARC, but for *all* of AI - again, look it up (the Dartmouth Summer Conference on AI, 1956).

contest's pages

Saty

I am sure what you say is true, but humans have a severe limit on their intelligence - the Four Pieces Limit (FourPiecesLimit.com). That leads to horrendous mistakes when things get complex - maybe 10 pages of text. We need to do better, and fooling around with statistics to make something work is not the way to do it. It would be useful to explain to the machine in our native language what is expected.

I think we are exactly aligned. LLM is fundamentally flawed. We just have different solutions.

There is a semantic AI model (SAM) that is complementary to large language models (LLMs) and contributes facts and reasoning to the AGI in a transparent way.

http://aicyc.org/2024/12/22/no-agi-without-semantic-ai/

Surrounding an LLM by facts and reasoning is not going to work. When will you guys realize that you need to dump the LLM when it comes to AGI? You will always be working around their problems. It's just word statistics and humans do not reason or understand based on word statistics. LLMs are useful but not when it comes to AGI. An AGI may consult an LLM, say when it needs to generate text in Shakespearian style.

This video (1m19s): [ https://bit.ly/3WuGyxE ] shows a new way to access generative AI using a promptless interface. Learn about the AICYC project [www.aicyc.org] dedicated to ending knowledge poverty.

Wrapping an LLM with facts includes reasoning about those facts.

http://aicyc.org/2024/12/11/sam-implementation-of-a-belief-system/

Or when it needs to concoct a plausible tall tale, perhaps also in Shakespearean style

I agree, but LLMs have the money and the problems. So surrounding them with a semantic AI wrapper that solves some of their most pressing problems is a business decision.

That is precisely what a semantic AI model (SAM) does. Thanks for the boost.

A semantic model should not be seen as complementary to LLMs - if it understands text then it replaces all that an LLM can do. You are comparing something that understands text with something that understands not a word - what does an LLM do with a word that might have 60 meanings ("set") or 80 ("on")? The reliability of an LLM is far too low to be used on anything important - it is no more than an amusing toy.

https://www.activesemantics.com
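
To make the polysemy point concrete, here is a rough illustration using WordNet via NLTK; sense inventories differ between dictionaries, so the counts will not match the figures quoted above exactly:

```python
# Requires: pip install nltk
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

for word in ("set", "on", "run"):
    senses = wn.synsets(word)  # every WordNet sense of the word, across parts of speech
    first = senses[0].definition() if senses else "-"
    print(f"{word}: {len(senses)} senses, e.g. {first!r}")
```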

I don't care what an LLM does. A semantic model or symbolic AI must demonstrate how it corrects an LLM with formal proof. That is the case for intellisophic.net products. We are certified by a U.S. government agency (NIST) and international agencies.

http://aicyc.org/2023/08/02/llm-ai-hallucination/

http://aicyc.org/2024/10/05/how-sam-thinks/

A semantic AI model is complementary to an LLM as reading is to writing. Your point about polysemy is why an LLM needs SAM. SAM can't write, but it can read and detect errors caused in part by polysemy.

Here is how SAM-1 (intellisophic.net) finds hallucinations:

http://aicyc.org/2023/08/02/llm-ai-hallucination/

Vision Video

https://vimeo.com/1030909563

George, we obviously have very different ideas about the use of semantics. I use four cases:

Robodebt – loss of 1 billion dollars, 2 suicides – lawyers lied to benefit their political masters

Horizon – loss of 1 billion pounds, 4 suicides – "the program never makes a mistake"

Boeing 737 Max – web of lies, loss of 346 lives, Boeing loses at least 20 billion dollars

F-35 – hundreds of billions wasted

A version of the F-35 was meant to land on a carrier, but if it had just taken off and had to land immediately with a full load of fuel, the undercarriage would be smashed. The specification for the undercarriage ran to 3000 pages.

These are problems where the machine has to see the full problem in abstract action (in a way that a human cannot), not cobble together little pieces in an LLM sea of unknowingness.

We use Dempster-Shafer to handle beliefs - maybe what you call lies. Solipsistic reasoning starts in the mind of a single person. Where does it start in a formal model?

http://aicyc.org/2024/12/11/sam-implementation-of-a-belief-system/
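
For readers unfamiliar with Dempster-Shafer theory, here is a minimal sketch of Dempster's rule of combination in Python; the masses are invented for illustration, and nothing here is drawn from SAM's actual implementation:

```python
from itertools import product

def combine(m1, m2):
    """Combine two mass functions (dicts: frozenset of hypotheses -> mass)."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    # Renormalise to discard the conflicting mass (Dempster's rule).
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Two sources judging a claim: TRUE, FALSE, or uncommitted (EITHER).
TRUE, FALSE = frozenset({"true"}), frozenset({"false"})
EITHER = TRUE | FALSE
source_a = {TRUE: 0.6, EITHER: 0.4}
source_b = {FALSE: 0.3, EITHER: 0.7}
print(combine(source_a, source_b))
# -> roughly 0.51 mass for TRUE, 0.15 for FALSE, 0.34 uncommitted
```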

George,

"We use Dempster-Shafer to handle beliefs - maybe what you call lies."

Dempster-Shafer is a dated method, from when we couldn't do any better. Now, we can bring a document alive using Active Semantics, and the system can find all the inconsistencies, errors, and omissions.

When the belief system is a one-liner, "Do whatever it takes", analysing the belief system is a waste of time.

Some relevant blogs –

Boeing 737 https://semanticstructure.blogspot.com/2024/09/lies.html

Robodebt https://semanticstructure.blogspot.com/2022/12/reading-legislation-and-robo-debt-based.html

A personal experience – fighting over a trademark with Google.

Google's brief says that our use of more than one trademark is "fatal to our cause".

IPA (Intellectual Property Australia) says a trader can use multiple trademarks on the same goods. An example of turning the law on its head.

Google says our software should be restricted to a single field, after registering Gemini, which Google claims to be useful in ten fields.

The arguments are so bad, I did not see how a judge could be persuaded. Google has set up a Confidential channel to the judge, so the other party (me) cannot know what they have told the judge.

Given the reputation for ruthlessness of U.S. law firms, I don't see that belief system analysis is all that useful.

Another example – the specifications for the F-35. Hundreds of billions wasted. The parties involved – DoD, the contractor and many subcontractors, politicians. Whose belief system?

Gary, you hit on a number of key points. I've endeavored to understand rather than review the marketing and social media push by those with a vested interest. But I've come away believing in some clear takeaways, related to the cost to automate human performance on a specific test and the time to complete that test vs. human performance. If one looks closely at the numbers, the high-efficiency version underperformed humans but ran significantly faster. Typical automation dynamics there. The downside was the cost of reaching human performance. We can pay a lot of people to solve this test with $350,000.
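
A back-of-the-envelope version of that comparison, with the $350,000 figure from the comment above and the task count and human rate as explicit assumptions:

```python
# All figures are illustrative assumptions except the total, which is quoted above.
total_compute_cost = 350_000   # USD, reported spend for the high-compute run
num_tasks = 100                # assumed number of evaluation tasks
human_rate_per_task = 10       # assumed payment per task for a human solver, USD

machine_cost_per_task = total_compute_cost / num_tasks
human_cost_total = num_tasks * human_rate_per_task

print(f"machine: ${machine_cost_per_task:,.0f} per task")
print(f"humans:  ${human_cost_total:,} for the whole set")
```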

The notion given by Mr. Wolf of a machine learning program being able to take on any digital task without training specific to the task is total nonsense on its face. How do these guys think machine learning programs work, anyway?

"AGI labs" are trying to create digital golems, the whole research program is 20th century alchemy ("AI" in general is the greatest vaporware project in all of computer science, our philosopher's stone). How many billions of dollars that could have gone to public housing, or a San Fran specific issue, public bathrooms, have instead been blown training toy models that Cali C-suiters believe will give them magic powers if they jam enough reddit threads into it?

How can o3 fail on any ARC-AGI tasks when it supposedly solved ~25% of the FrontierMath problems? Just from the sample published ARC-AGI failures and FrontierMath example problems from their website, the former are basically trivial while the latter can't be cracked by the vast majority of humans on this planet. It's basically like getting 2+2 wrong while correctly solving quantum physics and string theory problems.

Are we quite sure that some shape or form of the FrontierMath problems hasn't been used in training or fine-tuning? After all, AI influencers were impressed by even earlier GPT models solving complex math problems -- except only the ones whose solutions appear on the Internet.

OpenAI has reportedly hired mathematicians to solve math problems, whose solutions are then used to train GPT.

None of OpenAI's claims should be accepted without being independently verifAI'd.

It's actually absurd that a "discipline" that some call computer "science" is performed in such an opaque, unscientific way.

It's an embarrassment to legitimate computer scientists - or at least should be.

The way I read Chollet's blog post was that he was strongly implying they fine-tuned on the test data. Both of the sets, public and "semi-private", are without a doubt in OpenAI's datasets. In fact, they've probably logged the semi-private dataset hundreds of times. He also said that he expected o3 to score <30% on the newer version of ARC. Feels more like cheating out of desperation than AGI, but what do I know?

I'm getting the same impression. It doesn't help that another team that scored high wasn't verified because it isn't open-source, while OAI was verified despite also not being open-source.

And then you have the limit on compute power that was completely ignored.

I really don't understand why there is not more emphasis on this point about the "semi-private" data set, even from Chollet. There is no guarantee that the "semi-private" set is not in the training set for o3. In fact, it could be in there without OpenAI explicitly training on it (e.g., to "cheat"): someone else could have leaked it to the internet, and OpenAI could be training on it without knowing it. I think the result is still a big deal / impressive, but there is very little discussion of this huge asterisk in the result.
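
For what it's worth, the standard way labs probe this is an n-gram overlap check between benchmark items and the training corpus (the GPT-3 paper used 13-grams). A rough sketch is below; the catch, which is exactly the commenter's point, is that only someone with access to the training data can run it:

```python
def ngrams(text, n=13):
    """Set of n-gram strings in a whitespace-tokenised text."""
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def flag_contaminated(benchmark_items, corpus_chunks, n=13):
    """Return benchmark items that share any long n-gram with the training corpus."""
    corpus_grams = set()
    for chunk in corpus_chunks:
        corpus_grams |= ngrams(chunk, n)
    return [item for item in benchmark_items if ngrams(item, n) & corpus_grams]

# Hypothetical usage: benchmark_items would be the serialized "semi-private" tasks,
# corpus_chunks the training documents; neither is available to outsiders.
```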

We're in an era where public discourse seems to be led by folks willing to make unsubstantiated claims, or willing to encourage others to do so, only to walk it back later and hope that no one notices the deception. In this case, the deception leads to doubling down on investment in a model that likely diverts from research into supplemental techniques and technologies. I don't really care if Microsoft or other AI investors lose money (my exposure to them is minimal and those companies have money to burn), but the deception of the public and erosion of critical thinking about science and technology in the press has a societal price we all pay.

At the same time, these LLMs have huge costs in energy ($1000 a query, in some instances, I have read, at a time when energy prices are actually pretty low) that contribute massively to the overheating of our planet. (AI trolls: don't waste your time telling me that they use solar or hydro; they still displace other uses, causing dirty plants to stay online longer.) That is a price we all pay, too.

Thanks for the skepticism. It's much needed.

Your point #7 about influencers being intellectually dishonest made me laugh... As if they ever had the 'intellect' or the 'honesty' to start with.

Hanlon's Razor

Hi Gary, your analysis is spot on, as usual! It's amazing how people are so eager to latch on to terms ('AGI') with no clue as to what they mean.

ARC solving [alone] isn't going to lead to AGI - same as how acing a standard IQ test [alone] isn't a measure of someone's intelligence.

Too bad Francois even made that post about o3 [the disclaimer in it aside]; it ended up adding legitimacy.

By the way [not related to ARC], "common sense reasoning" is an oxymoron unless there is embodiment - the 'sense' part involves direct sensing and perceiving of the physical world, and not symbolic calcs (GOFAI), gradient descent calcs (ML) or cost minimization calcs (RL). *No* AI today does actual common sense reasoning, let alone embark on a path to AGI [whatever the heck that even means].

Typo: "4. ... and ack of ...

Ack! and I ACK! and I fixed, thanks

Isn't what you call pretraining actually finetuning? Pretraining on large corpora gets you the basic language patterns as token statistics. GPT3 was done in 2019. Then they worked for 3 years finetuning to 'sanitise' it enough to be put in the hands of the public (that Kenya stuff...)

As my understanding goes: pretraining is masking tokens on text and calculating token selection.

Or does o1/o3 actually do this in a form of pretraining? But how then does that overcome the pure 'weight' of the other pretraining? Enquiring minds would like to know.
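
On the terminology: GPT-style pretraining is next-token (causal) prediction rather than BERT-style masked-token prediction, and the later instruction/RLHF stages are the fine-tuning. A minimal PyTorch sketch of the causal objective, with `model` as a placeholder for any network that maps token ids to per-position vocabulary logits:

```python
import torch.nn.functional as F

def causal_lm_loss(model, token_ids):
    """Next-token prediction loss: position t is trained to predict token t+1."""
    logits = model(token_ids)              # (batch, seq_len, vocab_size)
    shift_logits = logits[:, :-1, :]       # predictions for positions 0..seq-2
    shift_labels = token_ids[:, 1:]        # the tokens that actually come next
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
```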

Altman et AI can only hide behind slick, choreographed demos with cherry-picked pretrained examples for so long.

Eventually, they will have to release o3 to the public and, like Sora befora, o3 will inevitably have its Olympic gymnastics moment, at which point people will realize it is not Simone Biles but the Bride of Frankenstein with her legs sewn into her armpits.

https://arstechnica.com/information-technology/2024/12/twirling-body-horror-in-gymnastics-video-exposes-ais-flaws/

Appreciate your thoughts and time as always.

I think another aspect of this is the continuing and evolving effect on the public. This wasn't, I think, covered nearly as widely as the (whackadoodle) claims of AI consciousness back in 2022. The public may be becoming saturated with these wolf cries. I wonder what it means. Purely in terms of limiting the stress on the many people worrying about this, I think it is good; hopefully it means public literacy about AI is increasing. Surely it helps to have it right in front of them. My eight-year-old plays chess and wants to know "why ChatGPT is so stupid" about playing chess. It will get there of course. But at least these things are becoming less abstract.

I read speculation that Altman's desire to declare AGI mostly has to do with the poison-pill clause in the MS contract; I wonder what you think about the likelihood of that being the context. The hype shall continue until morale improves...
