Kevin Roose, of Hard Fork and NYT, was so impressed with OpenAI's rollout that he joked "of course they have to announce AGI the day my vacation starts".
Considering all the hyperbolic announcements and selective benchmark releases from frontier labs, I don't even read them anymore; I wait for levelheaded AI experts like Gary and read their take instead.
OpenAI is not an AGI lab, it's a persuading-people-to-give-them-money-on-the-basis-of-some-vague-optimistic-promise lab. That's what they're really good at. That's what the demo is.
I bet the engineers at OpenAI cringe every time Sam Altman and other sales people talk to the media or investors.
An exercise in marketing. The media are incapable of understanding the distinction.
Ad hominem is the AI'ers' immediate go-to tactic. Next is twisting your words or misquoting you to make it seem you said things you didn't say. Then there is always the good old dismissal of your criticisms by saying they can be refuted, like Dreyfus's, with "a few simple words" they never get around to writing. Finally, when it becomes apparent, even to them, that the criticisms are justified, they will whine that you are a big meanie who hurt their feelings.
All of this is just comedy for me, btw.
I feel like an observing witness at an evangelical tent revival. Otherwise intelligent people are led by their faith to queue up to reach salvation through a healing blow to the head administered by a frocked sama at the pulpit.
I think Chollet has done great work with ARC-AGI. The fact that a statistical approach has been relatively successful merely demonstrates how far away AGI really is. We don't have algorithms that can be brought to bear on ARC-AGI that approach the problem as a human would. (Or, if we do, their human creators didn't enter the contest.)
I look forward to the next generation of ARC-AGI. I believe one of the team's goals is to create a test that is harder for deep learning algorithms to tackle. Detractors will undoubtedly claim that the new test is unfairly biased against their favorite algorithm, but true fans of AGI will say, "This is the way."
Paul, BINGO.
It's the AI community that keeps complaining that the goalposts keep getting moved. "True fact" - from Day One, the goal [which needs no moving] was to mimic human intelligence. AI has been a grab bag of hype-filled one-offs, never getting anywhere near the original goal.
No, the goal isn't to mimic human intelligence. Humans have a very severe limit on their input - the Four Pieces Limit. We need to transcend this limit.
There is a semantic AI model (SAM) that is complementary to large language models (LLMs) and that contributes facts and reasoning to the AGI in a transparent way.
http://aicyc.org/2024/12/22/no-agi-without-semantic-ai/
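To illustrate the general pattern (a toy sketch of my own, not the SAM implementation; the fact store and the stand-in LLM function here are hypothetical): the system answers from an explicit, inspectable store of facts whenever it can, and only falls back to the LLM otherwise, so the provenance of every answer stays visible.

```python
# Toy neurosymbolic pattern: answer from an explicit, inspectable fact store when
# possible, and fall back to the (opaque) LLM otherwise. Purely illustrative.
FACTS = {
    ("water", "boiling_point_c"): 100,
    ("paris", "country"): "France",
}

def llm_guess(subject: str, relation: str) -> str:
    """Stand-in for an LLM call; a real system would query a model here."""
    return f"<unverified LLM guess about {subject}/{relation}>"

def answer(subject: str, relation: str):
    if (subject, relation) in FACTS:
        # Transparent path: the answer can be traced back to a stored fact.
        return FACTS[(subject, relation)], "from fact store"
    return llm_guess(subject, relation), "from LLM (unverified)"

print(answer("paris", "country"))     # ('France', 'from fact store')
print(answer("paris", "population"))  # falls back to the LLM stub
```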
Surrounding an LLM with facts and reasoning is not going to work. When will you guys realize that you need to dump the LLM when it comes to AGI? You will always be working around its problems. It's just word statistics, and humans do not reason or understand based on word statistics. LLMs are useful, but not when it comes to AGI. An AGI may consult an LLM, say, when it needs to generate text in Shakespearean style.
Or when it needs to concoct a plausible tall tale, perhaps also in Shakespearean style
Gary, you hit on a number of key points. I've aimed for understanding rather than reviewing the marketing and social media push by those with a vested interest. But I've come away believing in some clear takeaways, related to the cost of automating human performance on a specific test and the time to complete that test vs. human performance. If one looks closely at the numbers, the high-efficiency version underperformed humans but ran significantly faster. Typical automation dynamics there. The downside was the cost of reaching human performance. We can pay a lot of people to solve this test with $350,000.
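To make the comparison concrete, a back-of-the-envelope sketch; the hourly rate and the minutes-per-task below are my own guesses, and only the $350,000 budget is the figure mentioned above:

```python
# Back-of-the-envelope only: the hourly rate and minutes-per-task are assumptions;
# the $350,000 budget is the figure quoted for the high-compute run.
compute_budget = 350_000      # dollars
human_rate = 50.0             # dollars per hour (assumed)
minutes_per_task = 5.0        # assumed average time for one ARC-style puzzle

cost_per_human_task = human_rate * minutes_per_task / 60
print(f"~${cost_per_human_task:.2f} per human-solved task")
print(f"~{compute_budget / cost_per_human_task:,.0f} human task-solutions for the same budget")
```

Under those assumptions that budget buys tens of thousands of human puzzle-solves, which is the sense in which the cost comparison is unflattering.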
Hi Gary, your analysis is spot on, as usual! It's amazing how people are so eager to latch on to terms ('AGI') with no clue as to what they mean.
ARC solving [alone] isn't going to lead to AGI - same as how acing a standard IQ test [alone] isn't a measure of someone's intelligence.
Too bad Francois even made that post about o3 [the disclaimer in it aside]; it ended up adding legitimacy.
By the way [not related to ARC], "common sense reasoning" is an oxymoron unless there is embodiment - the 'sense' part involves direct sensing and perceiving of the physical world, and not symbolic calcs (GOFAI), gradient descent calcs (ML) or cost minimization calcs (RL). *No* AI today does actual common sense reasoning, let alone embark on a path to AGI [whatever the heck that even means].
Typo: "4. ... and ack of ...
Ack! and I ACK! and I fixed, thanks
We're in an era where public discourse seems to be led by folks willing to make unsubstantiated claims, or willing to encourage others to do so, only to walk it back later and hope that no one notices the deception. In this case, the deception leads to doubling down on investment in a model that likely diverts resources from research into supplemental techniques and technologies. I don't really care if Microsoft or other AI investors lose money (my exposure to them is minimal and those companies have money to burn), but the deception of the public and the erosion of critical thinking about science and technology in the press have a societal price we all pay.
At the same time, these LLMs have huge costs in energy ($1000 a query, in some instances, I have read, at a time when energy prices are actually pretty low) that contribute massively to the overheating of our planet. (AI trolls: Don't waste your time telling me that they use solar or hydro; they still displace other uses, causing dirty plants to stay online longer.) That is a price we all pay, too.
Thanks for the skepticism. It's much needed.
Your point #7 about influencers being intellectually dishonest made me laugh... As if they ever had the 'intellect' or the 'honesty' to start with.
Hanlon's Razor
The way I read Chollet's blog post was that he was strongly implying they fine-tuned on the test data. Both of the sets, public and "semi-private", are without a doubt in OpenAI's datasets. In fact, they've probably logged the semi-private dataset hundreds of times. He also said that he expected o3 to score <30% on the newer version of ARC. Feels more like cheating out of desperation than AGI, but what do I know?
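For what it's worth, verbatim overlap would be straightforward to check for anyone with access to the training corpus; here is a minimal sketch (hypothetical file names, exact-match hashing only, not whatever OpenAI actually runs):

```python
# Minimal contamination check: flag benchmark items whose serialized form appears
# verbatim in a training corpus. A real check would also use n-gram / fuzzy matching.
import hashlib
import json

def fingerprint(text: str) -> str:
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

def contaminated(benchmark_items, corpus_lines):
    corpus_hashes = {fingerprint(line) for line in corpus_lines}
    return [item for item in benchmark_items if fingerprint(item) in corpus_hashes]

# Hypothetical file names, for illustration only.
with open("arc_public_tasks.jsonl") as f:
    tasks = [json.dumps(json.loads(line), sort_keys=True) for line in f]
with open("training_corpus.txt") as f:
    hits = contaminated(tasks, f)
print(f"{len(hits)} benchmark tasks found verbatim in the corpus")
```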
Hey Gary,
Jad Tarifi just claimed on my podcast that his company may be weeks away from AGI while putting the definition at a much higher standard - e.g., with human-level energy consumption for an equivalent level of intelligence output. [He also said that he and his co-founder refused to play the scaling-laws game from the get-go and went for an entirely different strategy in pursuing AGI.]
You can check out the interview below if you want, but he said they are just running a few tests - I presume some of the same benchmarks ChatGPT runs every time - and then they are supposed to have big breakout news by the new year or in January.
https://www.singularityweblog.com/jad-tarifi/
What questions should I ask Jad when he returns to my podcast about "releasing AGI"?
"The media should be asking hard questions, not fanning hype." basically summarized current media practice on literally everything.
There must be tremendous pressure to deliver, among the whole AI industry, after it has received trillions of dollars of investment.
Why not get a bit scientific? "Training something" here means fiddling with the statistics and not caring much about what else gets messed up in the process. If you/they want to talk about AGI, allow no changing of the statistics: the system is given a task in text, and if it can't understand the text, it can use a dictionary - if necessary a scientific dictionary. The claims would fade away quickly.
Chollet's notion of cost as a measure seems nice. But the situation is even worse (cost of o3 versus human) than it seems for GenAI: if I do that daily puzzle, it takes me a very short time to *see* the solution. Most of the time I spend is on *entering* it. I am slow not because of the thinking, but because my interface is mechanical/physical (hands).
Correction, today's puzzle took me quite a bit longer before I 'saw' it.
Isn't what you call pretraining actually fine-tuning? Pretraining on large corpora gets you the basic language patterns as token statistics. GPT-3 was done in 2019. Then they worked for three years fine-tuning to 'sanitise' it enough to be put in the hands of the public (that Kenya stuff...)
As my understanding goes: pretraining is masking tokens on text and calculating token selection.
Or does o1/o3 actually do this in a form of pretraining? But how then does that overcome the pure 'weight' of the other pretraining? Enquiring minds would like to know.
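To make my question concrete, here is a toy sketch of how I understand the two phases (a trivial bigram-style model stands in for a transformer, and nothing here is OpenAI's actual pipeline): the objective is the same token-prediction loss in both phases, and only the data and the number of steps differ, which is why a short fine-tuning run tends to nudge rather than overwrite the pretrained weights.

```python
# Toy illustration (not OpenAI's pipeline): the same next-token objective is used in
# both phases; only the data and the number of optimization steps differ.
import torch
import torch.nn as nn

vocab, dim = 100, 32
# A trivial bigram-style model stands in for a transformer.
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

def train(token_ids: torch.Tensor, steps: int) -> None:
    """Causal next-token prediction: predict token t+1 from token t."""
    for _ in range(steps):
        inputs, targets = token_ids[:-1], token_ids[1:]
        logits = model(inputs)          # shape: (seq_len - 1, vocab)
        loss = loss_fn(logits, targets)
        opt.zero_grad()
        loss.backward()
        opt.step()

# "Pretraining": many steps over a huge, uncurated corpus (random tokens stand in here).
train(torch.randint(0, vocab, (512,)), steps=200)
# "Fine-tuning": few steps over a small, curated dataset; it nudges, not overwrites, the weights.
train(torch.randint(0, vocab, (64,)), steps=20)
```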