Remind me, we need this, why?
So one percenters can live effortlessly in their satellites cruising high above the fray.
To say we do not need agents that work through tricky problems step by step shows a failure of imagination.
The goal is to get people to the point where they don't want to interact with other human beings anymore. That's also why they pit men and women against each other, divide people, and make them hate each other.
I use the models to generate code and search for research papers. (Like perplexity.ai). A good search engine to approximate research.
Lots of valuable use cases in https://www.salesforce.com/artificial-intelligence/use-cases/
I really, really can't think why we need this.
I am VERY curious about whether they have started to incorporate symbolic reasoning or some remnant of good old fashioned AI here. In fact, I thought of you when reading the Verge coverage. "The model hallucinates less" hardly raises the bar. https://www.theverge.com/2024/9/12/24242439/openai-o1-model-reasoning-strawberry-chatgpt
Hallucinating less also means that it stays on whatever dumf**k track it lands on. This is getting worse with Google. It picks some stupid guess about what an ignoramus might have mistyped and... that's the end.
Does anyone notice that all the chatbots do is provide a straight-through response to a human query? That fact by itself is evidence that the chatbots (CoT like o1 or not) harbor no understanding whatsoever of the subject matter in the original human query. Notice they never ask you a question back for clarification. Why? Because they are not trying to understand. They are just trying to produce the next token. The whole LLM architecture is just for show. There is no substance.
Interesting.
The chatbots never ask for clarification because they are clairvoyant.
It’s part of their training, along with spoonbending.
Word has it that Sam Altman is in charge of the spoonbending training.
In fact, they know what you are going to say before you even say it —so prompts are really superfluous.
Wow. This is one of those things that seems so obvious when you hear it, even though I'd never heard it pointed out before. I have had GPT4 ask me questions, but never for the purpose of clarifying what it is that I'm asking. It's only ever happened when it's being "chatty", which I assume comes about from the reinforcement learning.
While impressive on some tasks, it seems just as fragile in some of the familiar places. Doesn’t hurt to remind ourselves that last year The Information and Reuters, right after Sam Altman's ouster, spread the rumor that a model called Q* could "threaten humanity." And OpenAI was happy to ride that wave. A year later here it is… the first “reasoning” model. A tweet from Clem, CEO of HuggingFace, posted after today’s announcement, I think said it all: https://x.com/ClementDelangue/status/1834283206474191320
Sounds about right. The claims about math tests sounded like more of the same. Thanks as always for keeping things science based, Gary.
Side note: I found out about it because an IT guy wanted me to know it can count the Rs in strawberry. Groundbreaking stuff.
The chatbot will be able to work step-by-step and explain what it did. That is a huge deal.
Yes, although the way they’ve made it fake “hm, I’m thinking” like a person is another creepy choice by OpenAI. The affordance to view the work is a win.
In their blog post "Open"AI said explicitly that they would actually be hiding the "chain of thought" from end users, providing only a model-based summary (which of course need not be accurate)
Hiding the chain of thought?
Since there is no actual thought involved, hiding it is pretty easy.
It’s just like hiding empty space.
Touché.
I'm just using the anthropomorphized lingo they use. Call it whatever you like, all those tokens they generate between the query and final answer are hidden from view.
To be fair to "Open"AI, the example in the blog post, if real, was kind of impressive since it involved decoding a just-slightly-nontrivial cipher which encoded the sentence "there are 3 r's in strawberry". So it solved a modestly interesting puzzle (but didn't actually count the r's in "strawberry" 😂). But these examples are always cherry-picked (er, strawberry-picked ) for the advertising material.
It seems these new OpenAI models are using a similar approach to Google DeepMind's AlphaProof and AlphaGeometry. This approach combines LLMs (Large Language Models) with a theorem prover (based on symbolic logic), reminiscent of the classic AI meta-algorithm "generate-and-test." However, they've added a "train" step that uses solutions validated by the theorem prover to fine-tune the LLM through low-rank adaptation (LoRA). This avoids the need to retrain the entire pre-trained model. My LinkedIn post: https://www.linkedin.com/pulse/strawberry-alphaproof-gofai-rescue-generative-ai-claude-coulombe-kxane/
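For anyone who hasn't met "generate-and-test", here is a minimal Python sketch of the loop described above. It is a toy under stated assumptions, not OpenAI's or DeepMind's actual method: a seeded random guesser stands in for the LLM, an exact arithmetic check stands in for the theorem prover, and the names `generate_and_test`, `propose`, and `verify` are made up for illustration. The "train" (fine-tuning) step is left out entirely.

```python
import random
from typing import Callable, Optional


def generate_and_test(
    propose: Callable[[], int],
    verify: Callable[[int], bool],
    max_tries: int = 1000,
) -> Optional[int]:
    """Return the first proposed candidate that passes verification, else None."""
    for _ in range(max_tries):
        candidate = propose()
        if verify(candidate):
            return candidate
    return None


# Hypothetical stand-ins: the generator guesses blindly, the tester is exact.
rng = random.Random(0)


def propose() -> int:
    """Plays the role of the LLM: emits an unverified guess."""
    return rng.randint(-20, 20)


def verify(x: int) -> bool:
    """Plays the role of the prover: accepts only true roots of x^2 - 5x + 6."""
    return x * x - 5 * x + 6 == 0


solution = generate_and_test(propose, verify)
print(solution)  # one of the two roots, 2 or 3
```

The point of the pattern is that the generator is allowed to be wrong as often as it likes; only the verifier decides what counts as a solution.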
The longer response time could be explained by the trial of several solutions and then a selection of the solution by majority vote. That's a well-known "advanced prompting" technique. OpenAI is clear on that point... 😉 in their post which is probably ChatGPT generated. 🙂
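The majority-vote idea mentioned above can be sketched in a few lines of Python. Whether o1 does anything like this is a guess; the sketch only shows the generic technique. `sample` is a hypothetical stand-in for one call to a model, and here it just replays canned answers.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def answer_by_voting(sample: Callable[[], str], n: int = 5) -> str:
    """Draw n independent answers and keep the one that occurs most often."""
    return majority_vote([sample() for _ in range(n)])


# Hypothetical sampler: replays five canned answers instead of calling a model.
canned = iter(["3", "2", "3", "3", "2"])
final = answer_by_voting(lambda: next(canned), n=5)
print(final)  # prints "3"
```

Note that n samples cost roughly n times the tokens and latency of one, which would fit the longer response times, and that voting only helps when the wrong answers disagree with each other more than the right ones do.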
They should really rename "Chain of Thoughts" to "Chain of Hallucinations".
What I am curious about is to what extent improvements on the narrow assessment tests actually translate into recognisable common sense improvements.
No matter how leaky the training sets and metrics are, and how narrow the testing, the "look how big the bars are" effect is undeniable and does rock my scepticism each time.
I suppose my question is how can they (the models) keep climbing up these ever-newly-appearing metrics and yet remain so unimpressive to use...
I work in the Tech Dept. for a non-tech business, and it is getting quite tedious. Internal calls about how amazing AI is followed by hands-on from staff who try it, find it cumbersome or too-generic, and return to their actual jobs.
The sooner this implodes, the better.
I work in education. It is even worse here.
I raise this glass to the impending implosion. Hear, hear.
Watching the AI news these days is just painful. Vast sums of money wasted on AGI kindergarten, while actual AGI research withers on the vine. "We have named our species Homo sapiens — the wise human. But it is debatable how well we have lived up to the name." (Harari, "Nexus", 2024).
Honestly, any way you slice it this is just a recalibration of Altman/OpenAI hype. Pure bullshit. There are laws in nature we can’t get around that make things so. So I wonder why we waste time with science fiction?
“there's this question which has been debated in the field for a long time: what do we have to do in addition to a language model to make a system that can go discover new physics?"
“In addition to”?
What if being an LLM chatbot (predicting the next token in a sequence based on statistics of what has been produced before) is fundamentally incompatible with being a theoretical physicist whose job it is to come up with new physics?
The proposition that the two might be incompatible doesn’t seem particularly outlandish.
That plot on the right makes me wanna puke. o1 outperforms "expert humans" on "PhD level science questions"? And it does this by... predicting the next token over and over and over? I have a guess as to how it accomplishes this, and it's got f all to do with knowing anything about science.
“PhD level science questions”?
According to whom?
Sam Altman, the undergrad dropout?
One thing is certain: what they are doing at OpenAI is certainly not “science” by any stretch of the word (not even freshman level)
To even call it “engineering” is a very long reach.
Tweaking an LLM whose inner workings they have close to (if not precisely) zero understanding of is not science OR engineering, at least not to anyone who actually studied either of these beyond the junior high level.
These people just distract from other work that makes significant contributions. By inserting reasoning tokens, playing with the temperature parameter, etc., they just created a big messy hack that gets us nowhere, trying to justify their prices. It’s MS-DOS on steroids. They carried their three-character limit all the way into the internet era (“htm”). Just bad hacks.
If "we need another breakthrough", why do so many believe that this breakthrough is coming very soon? I see no reason at all to believe that.
Some undoubtedly believe it because con artists have led them to believe that it is so with incessant hype.
…and they are suckers (especially the investors who stand to lose billions)
Yeah, but even Gary writes as if this is going to happen reasonably soon. I just don't see any particular reason to think that.
See https://platform.openai.com/docs/guides/reasoning for a description of how it uses an LLM to parse the input prompt and then (presumably via a hallucinating transformer) creates a form of chain of thought series of "reasoning tokens" that is added to the original prompt and re-input to the transformer to further hallucinate the final output.
You get charged for all the reasoning tokens, even though you do not see them, as that page warns that you could get up to 25k reasoning tokens. It seems that they also fix the temperature setting at 1 for these preview and mini versions, which is, as we all know, the setting for being away with the fairies.
This smacks of the use of GPT4 to create the prompts for DALL-E3 in Sora, hallucinations compounding hallucinations. A recipe for catastrophe.
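To put the billing point above into numbers, here is a back-of-the-envelope Python sketch. It assumes only what is stated above (hidden reasoning tokens are billed alongside the visible answer). The function names and the price are hypothetical placeholders, not figures taken from OpenAI's pricing page.

```python
def billed_output_tokens(reasoning_tokens: int, visible_tokens: int) -> int:
    """Tokens you pay for: the hidden reasoning plus the answer you actually see."""
    return reasoning_tokens + visible_tokens


def output_cost_usd(
    reasoning_tokens: int, visible_tokens: int, usd_per_million_tokens: float
) -> float:
    """Cost of one response at a given (assumed) price per million output tokens."""
    total = billed_output_tokens(reasoning_tokens, visible_tokens)
    return total * usd_per_million_tokens / 1_000_000


def hidden_share(reasoning_tokens: int, visible_tokens: int) -> float:
    """Fraction of the billed tokens that the user never gets to read."""
    total = billed_output_tokens(reasoning_tokens, visible_tokens)
    return reasoning_tokens / total if total else 0.0


PLACEHOLDER_PRICE = 60.0  # hypothetical USD per million tokens, for illustration only

cost = output_cost_usd(25_000, 500, PLACEHOLDER_PRICE)
share = hidden_share(25_000, 500)
print(f"${cost:.2f} for one answer, {share:.0%} of billed tokens hidden")
```

With 25k reasoning tokens behind a 500-token answer, about 98% of what you are billed for is text you cannot inspect, whatever the real price turns out to be.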
A critical analysis of the underlying techniques used in OpenAI o1 models: https://ai-cosmos.hashnode.dev/why-reinforcement-learning-via-chain-of-thought-misses-the-point-a-mess-of-misunderstandings-in-ai-research