Kevin Roose, of Hard Fork and the NYT, was so impressed with OpenAI's rollout that he joked, "of course they have to announce AGI the day my vacation starts."
Gary Marcus, it seems you've once again found a podium to broadcast your perennial skepticism - this time armed with yet another hyper-critical postmortem. You question if o3 is a step toward AGI and proceed to dissect the rollout with the fervor of someone who's personally offended by its existence. Yet, your criticisms, while verbose, seem more like the howling of a dog barking at the sun than the constructive discourse we need in the AI space. Let's address a few glaring contradictions you've raised:
1. ARC-AGI Isn't AGI?
Sure, we get it. The ARC test isn't the definitive acid test for AGI. But no one claimed it was. What's perplexing is your insistence on repeatedly pointing out the obvious, as if everyone is holding their breath waiting for you to bestow wisdom. AGI is an incremental journey, and the steps may not be perfect, but dismissing them outright reeks more of bitterness than rational critique.
2. Pre-training Rant:
Pre-training was used. You mean like it's used in every single state-of-the-art AI model ever built? To pretend that humans somehow arrive at their conclusions without "pre-training" through years of learning and experience is an absurd double standard. But sure, Gary, let's all hold our collective breath for your neuro-symbolic model that doesn't leverage anything remotely resembling pre-training while also being more optimal than the human brain.
3. Graphs and Misleading Data:
While you rightly criticize cherry-picking, the irony of your post doing exactly the same thing isnโt lost on anyone. You casually omit the scientific progress being celebrated here, choosing instead to fixate on your personal grudge against what you perceive as hype. How about you channel that energy into building something of comparable significance instead of nitpicking from the sidelines?
4. The "Not AGI" Drumbeat:
We get it - you don't think o3 is AGI. Congratulations, you've cracked the case! But framing incremental progress as failure is intellectually dishonest. Rome wasn't built in a day, and AGI won't spring from the ether fully formed. It's a journey, and dismissing every step forward as irrelevant because it doesn't meet your ideal definition doesn't make you an intellectual giant; it makes you an obstructionist.
5. Go Build Your Neuro-Symbolic Wonder:
Here's a sincere suggestion, Gary: stop braying at every advancement that doesn't fit your mold. Instead, take your neuro-symbolic dreams, gather a team, and build something that's more optimal than the human brain if you're so confident in your approach. Critique without contribution is cheap.
6. Capitalizing on Controversy:
Let's be real - this isn't just about "saving science" or "holding companies accountable." You're capitalizing on the hype, Gary. Every time you whip up a storm, your subscriber count grows, your following increases, and your name stays in the news. You could have chosen to contribute meaningfully to the field, but instead, you've leaned into the notoriety of being *that guy* who throws dirt on the hard work and genius of others. Fame earned through destruction rather than construction may grab headlines, but it's not the kind of legacy anyone admires.
7. The Money Question:
Speaking of capitalizing - how much money have you made off this relentless campaign of negativity? Book deals, speaking gigs, subscriber growth - all fueled by this one-note narrative of tearing down others. The public deserves to know how much of this indignation is fueled by genuine concern and how much is simply a cash grab. If you're so principled, perhaps you could channel those earnings into something constructive, like funding your oft-touted neuro-symbolic wonder model?
And finally, just disappear! Honestly, your endless naysaying is not only annoying but also completely irrelevant at this point. Nobel Prize winners have openly called you a moron - yes, let's not forget that. People with far more credibility have dismissed your ideas as out of touch and counterproductive. The world has moved on, Gary, and your insistence on clinging to the limelight by hurling cheap shots is exhausting.
If you want to matter again, build something. Solve a problem. Contribute to progress. Until then, you're just noise in a world that's busy building the future while you're stuck in the past.
Side note: This reply was partially written by O1 (custom tailored for you) - take that!
OpenAI is not an AGI lab, it's a persuading-people-to-give-them-money-on-the-basis-of-some-vague-optimistic-promise lab. That's what they're really good at. That's what the demo is.
An exercise in marketing. The media are incapable of understanding the distinction.
I bet the engineers at OpenAI cringe every time Sam Altman and other sales people talk to the media or investors.
Considering all the hyperbolic announcements and selective benchmark releases from the frontier labs, I don't even read them; I wait for levelheaded AI experts like Gary and read their take instead.
Ad hominem is the AI'ers' immediate go-to tactic. Next is twisting your words or misquoting you to make it seem you said things you didn't say. Then there is always the good old dismissal of your criticisms by saying they can be refuted, like Dreyfus's, with "a few simple words" they never get around to writing. Finally, when it becomes apparent, even to them, that the criticisms are justified, they will whine that you are a big meanie who hurt their feelings.
All of this is just comedy for me, btw.
I feel like an observing witness at an evangelical tent revival. Otherwise intelligent people are led by their faith to queue up to reach salvation through a healing blow to the head administered by a frocked sama at the pulpit.
I think Chollet has done great work with ARC-AGI. The fact that a statistical approach has been relatively successful merely demonstrates how far away AGI really is. We don't have algorithms that can be brought to bear on ARC-AGI that approach the problem as a human would. (Or, if we do, their human creators didn't enter the contest.)
I look forward to the next generation of ARC-AGI. I believe one of the team's goals is to create a test that is harder for deep learning algorithms to tackle. Detractors will undoubtedly claim that the new test is unfairly biased against their favorite algorithm, but true fans of AGI will say, "This is the way."
Paul, BINGO.
It's the AI community that keeps complaining that the goalposts keep getting moved. "True fact" - from Day One, the goal [which needs no moving] was to mimic human intelligence. AI has been a grab bag of hype-filled one-offs, never getting anywhere near the original goal.
No, the goal isn't to mimic human intelligence. Humans have a very severe limit on their input - the Four Pieces Limit. We need to transcend this limit.
Jim, please read the ARC contest's pages fully. The goal *is* to mimic human intelligence. That's true not just for ARC, but for *all* of AI - again, look it up (the Dartmouth Summer Conference on AI, 1955).
contest's pages
Saty
I am sure what you say is true, but humans have a severe limit on their intelligence - the FourPiecesLimit.com. That leads to horrendous mistakes when things get complex - maybe 10 pages of text. We need to do better, and fooling around with statistics to make something work is not the way to do it. It would be useful to explain to the machine in our native language what is expected.
I think we are exactly aligned. LLMs are fundamentally flawed. We just have different solutions.
There is a semantic AI model (SAM) that is complementary to large language models (LLMs) and contributes facts and reasoning to the AGI in a transparent way.
http://aicyc.org/2024/12/22/no-agi-without-semantic-ai/
Surrounding an LLM by facts and reasoning is not going to work. When will you guys realize that you need to dump the LLM when it comes to AGI? You will always be working around their problems. It's just word statistics and humans do not reason or understand based on word statistics. LLMs are useful but not when it comes to AGI. An AGI may consult an LLM, say when it needs to generate text in Shakespearian style.
This video (1m19s): [ https://bit.ly/3WuGyxE ] shows a new way to access generative AI using a promptless interface. Learn about the AICYC project [www.aicyc.org] dedicated to ending knowledge poverty.
Wrapping an LLM with facts includes reasoning about those facts.
http://aicyc.org/2024/12/11/sam-implementation-of-a-belief-system/
Or when it needs to concoct a plausible tall tale, perhaps also in Shakespearean style.
I agree, but LLMs have the money and the problems. So surrounding them with a semantic AI wrapper that solves some of their most pressing problems is a business decision.
That is precisely what a semantic AI model (SAM) does. Thanks for the boost.
A semantic model should not be seen as complementary to LLMs - if it understands text, then it replaces all that an LLM can do. You are comparing something that understands text with something that understands not a word - what does an LLM do with a word that might have 60 meanings ("set") or 80 ("on")? The reliability of an LLM is far too low to be used on anything important - it is no more than an amusing toy.
https://www.activesemantics.com
I don't care what an LLM does. A semantic model or symbolic AI must demonstrate how it corrects an LLM, with formal proof. That is the case for intellisophic.net products. We are certified by a U.S. Government agency (NIST) and international agencies.
http://aicyc.org/2023/08/02/llm-ai-hallucination/
http://aicyc.org/2024/10/05/how-sam-thinks/
A semantic AI model is complementary to an LLM, as reading is to writing. Your point about polysemy is why the LLM needs SAM. SAM can't write, but it can read and detect errors caused in part by polysemy.
Here is how SAM-1 (intellisophic.net) finds hallucinations:
http://aicyc.org/2023/08/02/llm-ai-hallucination/
Vision Video
https://vimeo.com/1030909563
George, we obviously have very different ideas about the use of semantics. I use four cases:
Robodebt - loss of 1 billion dollars, 2 suicides; lawyers lied to benefit their political masters
Horizon - loss of 1 billion pounds, 4 suicides; "the program never makes a mistake"
Boeing 737 MAX - web of lies, loss of 346 lives; Boeing loses at least 20 billion dollars
F-35 - hundreds of billions wasted
A version of the F-35 was meant to land on a carrier, but if it had just taken off and had to land immediately with a full load of fuel, the undercarriage would be smashed. The specification for the undercarriage ran to 3000 pages.
These are problems where the machine has to see the full problem in abstract action (in a way that a human cannot), not cobble together little pieces in an LLM sea of unknowingness.
We use Dempster-Shafer to handle beliefs - maybe what you call lies. Solipsistic reasoning starts in the mind of a single person. Where does it start in a formal model?
http://aicyc.org/2024/12/11/sam-implementation-of-a-belief-system/
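Since Dempster-Shafer keeps coming up, here is a minimal sketch of Dempster's rule of combination, purely as background for readers. The `combine` helper and the true/lie frame are invented for illustration; they are not taken from SAM or any product mentioned in this thread.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    m1, m2: dicts mapping frozenset focal elements to masses that sum to 1.
    Mass landing on the empty set (the conflict K) is discarded and the
    remainder is renormalised by 1 - K.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # evidence the two sources flatly disagree on
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources are incompatible")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}

# Two sources weighing the same statement: is it true, or a lie?
T, L = frozenset({"true"}), frozenset({"lie"})
TL = T | L            # mass on the whole frame = "don't know"
m1 = {T: 0.6, TL: 0.4}
m2 = {T: 0.5, L: 0.3, TL: 0.2}
m = combine(m1, m2)   # belief in "true" rises to ~0.76
```

Note how the residual mass on the whole frame shrinks as evidence accumulates; critics of the method point out that the renormalisation step behaves badly when the sources are in strong conflict.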
George,
"We use Dempster-Shafer to handle beliefs - maybe what you call lies."
Dempster-Shafer is a dated method, from when we couldn't do any better. Now, we can bring a document alive using Active Semantics, and the system can find all the inconsistencies, errors, and omissions.
When the belief system is a one-liner, "Do whatever it takes", analysing the belief system is a waste of time.
Some relevant blogs:
Boeing 737 https://semanticstructure.blogspot.com/2024/09/lies.html
Robodebt https://semanticstructure.blogspot.com/2022/12/reading-legislation-and-robo-debt-based.html
A personal experience - fighting over a trademark with Google.
Google's brief says that our use of more than one trademark is "fatal to our cause".
IPA (Intellectual Property Australia) says a trader can use multiple trademarks on the same goods. An example of turning the law on its head.
Google says our software should be restricted to a single field, after registering Gemini, which Google claims to be useful in ten fields.
The arguments are so bad, I did not see how a judge could be persuaded. Google has set up a confidential channel to the judge, so the other party (me) cannot know what they have told the judge.
Given the reputation for ruthlessness of U.S. law firms, I don't see that belief-system analysis is all that useful.
Another example - the specifications for the F-35. Hundreds of billions wasted. The parties involved: the DoD, the contractor and many subcontractors, politicians. Whose belief system?
Gary, you hit on a number of key points. I've endeavored to understand rather than review the marketing and social media push by those with a vested interest. But I've come away believing in some clear takeaways, related to the cost to automate human performance on a specific test and the time to complete that test vs. a human. If one looks closely at the numbers, the high-efficiency version underperformed humans, but at a significantly faster speed. Typical automation dynamics there. The downside was the cost of reaching human-level performance. We can pay a lot of people to solve this test with $350,000.
The notion given by Mr. Wolf of a machine learning program being able to take on any digital task without training specific to the task is total nonsense on its face. How do these guys think machine learning programs work, anyway?
"AGI labs" are trying to create digital golems, the whole research program is 20th century alchemy ("AI" in general is the greatest vaporware project in all of computer science, our philosopher's stone). How many billions of dollars that could have gone to public housing, or a San Fran specific issue, public bathrooms, have instead been blown training toy models that Cali C-suiters believe will give them magic powers if they jam enough reddit threads into it?
How can o3 fail on any ARC-AGI tasks when it supposedly solved ~25% of the FrontierMath problems? Just from the sample published ARC-AGI failures and FrontierMath example problems from their website, the former are basically trivial while the latter can't be cracked by the vast majority of humans on this planet. It's basically like getting 2+2 wrong while correctly solving quantum physics and string theory problems.
Are we quite sure that some shape or form of the FrontierMath problems hasn't been used in training or fine-tuning? After all, AI influencers were impressed by even earlier GPT models solving complex math problems -- except only the ones whose solutions appear on the Internet.
OpenAI has reportedly hired mathematicians to solve math problems, whose solutions are then used to train GPT.
None of OpenAI's claims should be accepted without being independently verifAI'd.
It's actually absurd that a "discipline" that some actually call computer "science" is performed in such an opaque, unscientific way.
It's an embarrassment to legitimate computer scientists - or at least it should be.
We're in an era where public discourse seems to be led by folks willing to make unsubstantiated claims, or willing to encourage others to do so, only to walk it back later and hope that no one notices the deception. In this case, the deception leads to doubling down on investment in a model that likely diverts from research into supplemental techniques and technologies. I don't really care if Microsoft or other AI investors lose money (my exposure to them is minimal and those companies have money to burn), but the deception of the public and erosion of critical thinking about science and technology in the press has a societal price we all pay.
At the same time, these LLMs have huge costs in energy ($1000 a query, in some instances, I have read, at a time when energy prices are actually pretty low) that contribute massively to the overheating of our planet. (AI trolls: Don't waste your time telling me that they use solar or hydro, they still displace other uses, causing dirty plants to stay on line longer.) That is a price we all pay, too.
Thanks for the skepticism. It's much needed.
Your point #7 about influencers being intellectually dishonest made me laugh... As if they ever had the 'intellect' or the 'honesty' to start with.
Hanlon's Razor
Hi Gary, your analysis is spot on, as usual! It's amazing how people are so eager to latch on to terms ('AGI') with no clue as to what they mean.
Solving ARC [alone] isn't going to lead to AGI - same as how acing a standard IQ test [alone] isn't a measure of someone's intelligence.
Too bad Francois even made that post about o3 [disclaimer in it aside]; it ended up adding legitimacy.
By the way [not related to ARC], "common sense reasoning" is an oxymoron unless there is embodiment - the 'sense' part involves direct sensing and perceiving of the physical world, and not symbolic calcs (GOFAI), gradient descent calcs (ML) or cost minimization calcs (RL). *No* AI today does actual common sense reasoning, let alone embark on a path to AGI [whatever the heck that even means].
Typo: "4. ... and ack of ...
Ack! and I ACK! and I fixed, thanks
Isn't what you call pretraining actually finetuning? Pretraining on large corpora gets you the basic language patterns as token statistics. GPT-3 was done in 2019. Then they worked for 3 years finetuning to 'sanitise' it enough to be put in the hands of the public (that Kenya stuff...)
As my understanding goes: pretraining is masking tokens in text and learning to predict the selected token.
Or does o1/o3 actually do this in a form of pretraining? But how then does that overcome the pure 'weight' of the other pretraining? Enquiring minds would like to know.
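The "token statistics" picture of pretraining in the comment above can be made concrete with a toy bigram model. This is purely illustrative and the function names are invented for the example; real pretraining fits a large neural network over vast corpora, not a count table.

```python
from collections import Counter, defaultdict

def pretrain(corpus):
    """'Pretraining' in miniature: tally next-token statistics from raw text."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1  # how often nxt follows cur
    return counts

def predict_next(counts, token):
    """Generate by picking the continuation seen most often in pretraining."""
    if token not in counts:
        return None  # never seen: the model has no statistics to lean on
    return counts[token].most_common(1)[0][0]

corpus = ["the cat sat on the mat", "the cat ate the fish"]
stats = pretrain(corpus)
print(predict_next(stats, "the"))  # "cat" (seen twice) beats "mat" and "fish"
```

Finetuning, by contrast, would adjust these statistics afterwards on a smaller, curated dataset - which is exactly the boundary the commenter is asking about.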
Altman et AI can only hide behind slick, choreographed demos with cherry-picked pretrained examples for so long.
Eventually, they will have to release o3 to the public and, like Sora before it, o3 will inevitably have its Olympic gymnastics moment, at which point people will realize it is not Simone Biles but the Bride of Frankenstein with her legs sewn into her armpits.
https://arstechnica.com/information-technology/2024/12/twirling-body-horror-in-gymnastics-video-exposes-ais-flaws/
Appreciate your thoughts and time as always.
I think another aspect of this is the continuing and evolving effect on the public. This wasn't, I think, covered nearly as widely as the (whackadoodle) claims of AI consciousness back in 2022. The public may be becoming saturated with these wolf cries. I wonder what it means. Purely in terms of limiting the stress on the number of people worrying about this, I think it is good; hopefully it means public literacy about AI is increasing. Surely it helps to have it right in front of them. My eight-year-old plays chess and wants to know "why ChatGPT is so stupid" at playing chess. It will get there, of course. But at least these things are becoming less abstract.
I read speculation that Altman's desire to declare AGI has mostly to do with the poison pill clause in the MS contract; wonder what you think about the likelihood of that context. The hype shall continue until morale improves...
Gary Marcus, it seems youโve once again found a podium to broadcast your perennial skepticism - this time armed with yet another hyper-critical postmortem. You question if o3 is a step toward AGI and proceed to dissect the rollout with the fervor of someone whoโs personally offended by its existence. Yet, your criticisms, while verbose, seem more like the howling of a dog barking at the sun than the constructive discourse we need in the AI space. Letโs address a few glaring contradictions youโve raised:
1. ARC-AGI Isn't AGI?
Sure, we get it. The ARC test isn't the definitive acid test for AGI. But no one claimed it was. What's perplexing is your insistence on repeatedly pointing out the obvious, as if everyone is holding their breath waiting for you to bestow wisdom. AGI is an incremental journey, and the steps may not be perfect, but dismissing them outright reeks more of bitterness than rational critique.
2. Pre-training Rant:
Pre-training was used. You mean like it's used in every single state-of-the-art AI model ever built? To pretend that humans somehow arrive at their conclusions without "pre-training" through years of learning and experience is an absurd double standard. But sure, Gary, let's all hold our collective breath for your neuro-symbolic model that doesn't leverage anything remotely resembling pre-training while also being more optimal than the human brain.
3. Graphs and Misleading Data:
While you rightly criticize cherry-picking, the irony of your post doing exactly the same thing isn't lost on anyone. You casually omit the scientific progress being celebrated here, choosing instead to fixate on your personal grudge against what you perceive as hype. How about you channel that energy into building something of comparable significance instead of nitpicking from the sidelines?
4. The "Not AGI" Drumbeat:
We get it - you don't think o3 is AGI. Congratulations, you've cracked the case! But framing incremental progress as failure is intellectually dishonest. Rome wasn't built in a day, and AGI won't spring from the ether fully formed. It's a journey, and dismissing every step forward as irrelevant because it doesn't meet your ideal definition doesn't make you an intellectual giant; it makes you an obstructionist.
5. Go Build Your Neuro-Symbolic Wonder:
Here's a sincere suggestion, Gary: stop braying at every advancement that doesn't fit your mold. Instead, take your neuro-symbolic dreams, gather a team, and build something that's more optimal than the human brain if you're so confident in your approach. Critique without contribution is cheap.
6. Capitalizing on Controversy:
Let's be real - this isn't just about "saving science" or "holding companies accountable." You're capitalizing on the hype, Gary. Every time you whip up a storm, your subscriber count grows, your following increases, and your name stays in the news. You could have chosen to contribute meaningfully to the field, but instead, you've leaned into the notoriety of being *that guy* who throws dirt on the hard work and genius of others. Fame earned through destruction rather than construction may grab headlines, but it's not the kind of legacy anyone admires.
7. The Money Question:
Speaking of capitalizing - how much money have you made off this relentless campaign of negativity? Book deals, speaking gigs, subscriber growth - all fueled by this one-note narrative of tearing down others. The public deserves to know how much of this indignation is fueled by genuine concern and how much is simply a cash grab. If you're so principled, perhaps you could channel those earnings into something constructive, like funding your oft-touted neuro-symbolic wonder model?
And finally, just disappear! Honestly, your endless naysaying is not only annoying but also completely irrelevant at this point. Nobel Prize winners have openly called you a moron - yes, let's not forget that. People with far more credibility have dismissed your ideas as out of touch and counterproductive. The world has moved on, Gary, and your insistence on clinging to the limelight by hurling cheap shots is exhausting.
If you want to matter again, build something. Solve a problem. Contribute to progress. Until then, you're just noise in a world that's busy building the future while you're stuck in the past.
Side note: This reply was partially written by O1 (custom tailored for you) - take that!
You appear to have written the same thing on X and I responded there.
It was very evident that this piece was written by ChatGPT; I guessed as much after the first paragraph :D
"your criticisms, while verbose, seem more like the howling of a dog barking at the sun"
This "critique" has all the tooth marks of a chatbot:
Which is it, howling or barking?
Do dogs actually bark at the sun?
In my 6 decades, I've yet to see one doing that. That doesn't mean it never happens, just that it is howly unlikely.
In my opinion, we will recognize AGI when it knows the difference between howling and barking.
Gary is just reacting to the massive harm that comes from the idea there is only one AI model.
http://aicyc.org/2024/12/22/no-agi-without-semantic-ai/