47 Comments
Joy in HK fiFP:

Remind me, we need this, why?

Eve Szokolai:

So one percenters can live effortlessly in their satellites cruising high above the fray.

JonnyMadFox:

The goal is to get people to the point where they don't want to interact with other human beings anymore. That's also why they pit men and women against each other, dividing people and making them hate each other.

Richard Self:

I really, really can't think why we need this.

Chad Woodford:

I am VERY curious about whether they have started to incorporate symbolic reasoning or some remnant of good old fashioned AI here. In fact, I thought of you when reading the Verge coverage. "The model hallucinates less" hardly raises the bar. https://www.theverge.com/2024/9/12/24242439/openai-o1-model-reasoning-strawberry-chatgpt

Spartacus:

Hallucinating less also means that it stays on whatever dumf**k track it lands on. This is getting worse with Google. It picks some stupid guess about what an ignoramus might have mistyped and... that's the end.

Jurgen Gravestein:

While impressive on some tasks, it seems just as fragile in some of the familiar places. Doesn’t hurt to remind ourselves that last year The Information and Reuters, right after Sam Altman's ouster, spread the rumor that a model called Q* could "threaten humanity." And OpenAI was happy to ride that wave. A year later here it is… the first “reasoning” model. A tweet from Clem, CEO of HuggingFace, posted after today’s announcement, I think said it all: https://x.com/ClementDelangue/status/1834283206474191320

Amy A:

Sounds about right. The claims about math tests sounded like more of the same. Thanks as always for keeping things science based, Gary.

Amy A:

Side note: I found out about it because an IT guy wanted me to know it can count the Rs in strawberry. Groundbreaking stuff.

Matt Hawthorn:

To be fair to "Open"AI, the example in the blog post, if real, was kind of impressive, since it involved decoding a just-slightly-nontrivial cipher encoding the sentence "there are 3 r's in strawberry". So it solved a modestly interesting puzzle (but didn't actually count the r's in "strawberry" 😂). But these examples are always cherry-picked (er, strawberry-picked) for the advertising material.

[Comment removed, Sep 12, 2024]

Amy A:

Yes, although the way they’ve made it fake “hm, I’m thinking” like a person is another creepy choice by OpenAI. The affordance to view the work is a win.

Matt Hawthorn:

In their blog post, "Open"AI said explicitly that they would be hiding the "chain of thought" from end users, providing only a model-generated summary (which, of course, need not be accurate).

Larry Jewett:

Hiding the chain of thought?

Since there is no actual thought involved, hiding it is pretty easy.

It’s just like hiding empty space.

Matt Hawthorn:

Touché.

I'm just using the anthropomorphized lingo they use. Call it whatever you like, all those tokens they generate between the query and final answer are hidden from view.

Claude Coulombe:

It seems these new OpenAI models are using a similar approach to Google DeepMind's AlphaProof and AlphaGeometry. This approach combines LLMs (Large Language Models) with a theorem prover (based on symbolic logic), reminiscent of the classic AI meta-algorithm "generate-and-test." However, they've added a "train" step that uses solutions validated by the theorem prover to fine-tune the LLM through low-rank adaptation (LoRA). This avoids the need to retrain the entire pre-trained model. My LinkedIn post: https://www.linkedin.com/pulse/strawberry-alphaproof-gofai-rescue-generative-ai-claude-coulombe-kxane/
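The "generate-and-test" meta-algorithm mentioned above is simple to sketch. This is a toy illustration, not OpenAI's or DeepMind's actual pipeline; `toy_llm` and `toy_prover` are hypothetical stand-ins for a sampled LLM and a symbolic checker.

```python
import random

def generate_and_test(generate, test, prompt, n_samples=8):
    """Classic generate-and-test: sample candidate solutions,
    keep only those the symbolic checker validates."""
    candidates = [generate(prompt) for _ in range(n_samples)]
    return [c for c in candidates if test(prompt, c)]

# Toy stand-ins: the "LLM" guesses an answer, the "prover" verifies it.
random.seed(0)
toy_llm = lambda prompt: random.choice([3, 4, 5])
toy_prover = lambda prompt, answer: answer == 4

validated = generate_and_test(toy_llm, toy_prover, "2+2=?")
```

In the AlphaProof-style setup described above, the validated (prompt, solution) pairs would then become fine-tuning data for a LoRA update, rather than retraining the whole model.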

Claude Coulombe:

The longer response time could be explained by trying several solutions and then selecting one by majority vote. That's a well-known "advanced prompting" technique. OpenAI is clear on that point... 😉 in their post, which was probably ChatGPT-generated. 🙂
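Majority voting over sampled solutions (often called "self-consistency") takes only a few lines; this toy version assumes each sample's final answer has already been extracted as a string.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common final answer among sampled solutions:
    the self-consistency trick referred to above."""
    return Counter(answers).most_common(1)[0][0]

# Five sampled chains of thought yield these final answers:
majority_vote(["42", "42", "41", "42", "40"])  # → "42"
```

Sampling several chains and voting trades latency and token cost for accuracy, which is consistent with the slower responses noted in the comment.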

Ilia Kurgansky:

What I am curious about is to what extent improvements on these narrow assessment tests actually translate into recognisable common-sense improvements.

No matter how leaky the training sets and metrics are, and how narrow the testing, the "look how big the bars are" effect is undeniable and does rock my scepticism each time.

I suppose my question is how can they (the models) keep climbing up these ever-newly-appearing metrics and yet remain so unimpressive to use...

Shane Hegarty:

I work in the Tech Dept. for a non-tech business, and it is getting quite tedious: internal calls about how amazing AI is, followed by hands-on sessions where staff try it, find it cumbersome or too generic, and return to their actual jobs.

The sooner this implodes, the better.

Ilia Kurgansky:

I work in education. It is even worse here.

I raise this glass to the impending implosion. Hear, hear.

Aaron Turner:

Watching the AI news these days is just painful. Vast sums of money wasted on AGI kindergarten, while actual AGI research withers on the vine. "We have named our species Homo sapiens — the wise human. But it is debatable how well we have lived up to the name." (Harari, "Nexus", 2024).

Perry C. Douglas:

Honestly, any way you slice it, this is just a recalibration of Altman/OpenAI hype. Pure bullshit. There are laws of nature we can't get around that make things so. So why do we waste time with science fiction?

Larry Jewett:

“there's this question which has been debated in the field for a long time: what do we have to do in addition to a language model to make a system that can go discover new physics?"

“In addition to”?

What if being an LLM chatbot (predicting the next token in a sequence based on statistics of what has been produced before) is fundamentally incompatible with being a theoretical physicist whose job it is to come up with new physics?

The proposition that the two might be incompatible doesn’t seem particularly outlandish.

Ben P:

That plot on the right makes me wanna puke. o1 outperforms "expert humans" on "PhD level science questions"? And it does this by... predicting the next token over and over and over? I have a guess as to how it accomplishes this, and it's got f all to do with knowing anything about science.

Larry Jewett:

“PhD level science questions”?

According to whom?

Sam Altman, the undergrad dropout?

One thing is certain: what they are doing at OpenAI is certainly not “science” by any stretch of the word (not even freshman level)

To even call it “engineering” is a very long reach.

Tweaking an LLM whose inner workings they have close to (if not precisely) zero understanding of is not science OR engineering, at least not to anyone who actually studied either of these beyond the junior high level.

R Tey:

These people just distract from other work that makes significant contributions. By inserting reasoning tokens, playing with the temperature parameter, etc., they've just created a big messy hack that gets us nowhere, trying to justify their prices. It's MSDOS on steroids. They carried their three-character limit all the way into the internet era ("htm"). Just bad hacks.

MarkS:

If "we need another breakthrough", why do so many believe that this breakthrough is coming very soon? I see no reason at all to believe that.

Larry Jewett:

Some undoubtedly believe it because con artists have led them to believe that it is so with incessant hype.

…and they are suckers (especially the investors who stand to lose billions)

MarkS:

Yeah, but even Gary writes as if this is going to happen reasonably soon. I just don't see any particular reason to think that.

Richard Self:

See https://platform.openai.com/docs/guides/reasoning for a description of how it uses an LLM to parse the input prompt and then (presumably via a hallucinating transformer) creates a chain-of-thought-like series of "reasoning tokens" that is appended to the original prompt and re-input to the transformer to further hallucinate the final output.

You get charged for all the reasoning tokens even though you never see them; that page warns you could incur up to 25k of them. It seems they also fix the temperature at 1 for these preview and mini versions, which is, as we all know, the setting for being away with the fairies.

This smacks of the use of GPT4 to create the prompts for DALL-E3 in Sora: hallucinations compounding hallucinations. A recipe for catastrophe.
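To make the billing point concrete, here is a back-of-the-envelope sketch. The per-token rate is a placeholder, not OpenAI's actual pricing; the 25k figure is the ceiling mentioned above.

```python
def o1_output_cost(visible_tokens, reasoning_tokens, usd_per_1k=0.06):
    """Hidden reasoning tokens are billed at the output rate even
    though you never see them. The rate here is illustrative only."""
    return (visible_tokens + reasoning_tokens) * usd_per_1k / 1000

# 500 visible tokens plus the 25k hidden ceiling, at the placeholder rate:
o1_output_cost(500, 25_000)
```

The point of the sketch: the hidden reasoning tokens can dominate the bill, since they may outnumber the visible output by an order of magnitude or more.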

DAFP Ideation:

I’m unsubscribing because the author has moved away from using this channel to write about AI in favor of partisan political content about the US election. In addition to the latest post speculating about a US presidential candidate’s mental health, the author has also suspended, or at a minimum threatened to suspend, subscribers who offer countering thoughts and opinions in the comments. Unfortunately this Substack no longer advances intellectual curiosity about AI progress and advancements.

Larry Jewett:

Sam Altman is the OpenAIpenheimer of our age (which could not be more different from “the Oppenheimer”)
