As someone who builds on top of these APIs, I think you're underestimating how big a deal the drastically reduced latency is. Yes, the evals look like diminishing returns, but the latency and cost improvements are drastic.
It’s a welcome improvement, but I am sure that “I wish this thing answered faster” was pretty low on most people’s list of “what I would like to see improve about AI”.
I think “I wish this thing answered faster” is a really sought-after feature, at least from my pov. Voice has had way too high latency.
I respectfully but dramatically disagree with you, Sillius. High latency and low speed mean you do not want to use the system as a helper or tutor, among a bunch of other applications or services. Maybe the problem is me and my need for interactivity and positive reinforcement (I hate slow things, even when I am the slow one, in which case I hate myself), but there is not much I can do about that (maybe therapy, but I have no money or time for that right now). So, for me, speed was a huge turnoff with GPT-4, especially when the context window or chat length got large, which alone seems to have improved (needless to say, for new chats, the new speed is also awesome). I can't wait to talk to the assistant when the features are released in my locale, but I'll keep the black tape on my camera as usual.
Plus, if you really need speed, Mistral or Llama are almost as good at about 2% of the compute.
I agree. The voice and video integration with minimal latency will undergird an extraordinary diversity of new apps.
Spot on. The ways in which it becomes more convincing that there is actual understanding seem to outpace actual progress on understanding.
In the meantime Sam discusses UBC when GPT-7 arrives. Which is all too much messianic prophet for my taste.
Human intelligence by the way is also amazing as well as often pretty dumb, so who are we to point fingers?
The "advances" are therefore exactly the wrong thing for society: the bot is not more capable in any substantive way, but it is more seductive. Just the sort of thing a marketeer would think of.
Gerben, lol. But our dumbness is usually not brittle the way an LLM's is :)
No, robustness has a definite evolutionary advantage, see https://ea.rna.nl/2022/10/24/on-the-psychology-of-architecture-and-the-architecture-of-psychology/
Starting at roughly 1/4: "The fact that observations and logic have little impact on our convictions — and that therefore our convictions are stable — merits attention."
Indeed. But what I meant was: my conviction/knowledge/response... won't flip merely because someone asks 'are you sure?' In the example, that's exactly what happened - because of blindly chaining words together based on similarity search [attention computation].
Humans have errors in judgment, biases, gaps in knowledge, etc. - but none of them are prone to sudden reversal.
LLMs are so hugely different from human intelligence, but outwardly look so much like it, that our human estimates/convictions are not capable of handling it with ease. GenAIs are safety-fine-tuned, middle-of-the-road/bland approximations; in that way they have some stability. We also have stability, but it is not all based on middle-of-the-road (cue flat earthers, for instance).
PS: Gerben, rereading, not sure if you replied to Earl or me, lol!!
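To make the "similarity search [attention computation]" mentioned above a bit more concrete, here is a minimal numpy sketch of scaled dot-product attention (illustrative only, obviously not the actual model's code), showing that the output is just a similarity-weighted mix of whatever is in the context:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Each query scores every key by dot-product similarity;
    # the output is a weighted average of the values.
    # No world model anywhere, just weighted mixing.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # similarity scores
    weights = softmax(scores, axis=-1)  # normalised to sum to 1 per query
    return weights @ V

# Toy example: 3 "tokens" with 4-dimensional vectors
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(attention(Q, K, V).shape)  # (3, 4)
```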
Humans (mostly) have "Common Sense".
I think Artificial Intelligence is a great representation of Human Intelligence, if not slightly more reliable...
I'm so sorry I missed the elephant swimming the English Channel. That would have been epic. I was seven at the time and living in Canterbury so my family could easily have made a day outing to see it. LOL
SO weird (but not really!) how it slips into being illogical - directed there by the addition of 'possibility' :)
Better to have truth served in broken English, rather than falsehood using perfect grammar. Semantics > syntax!
The most important thing is that it’s free.
This is really the ultimate tell, because why would you do that if it were an actual (exponential!!!!111) improvement? Running that thing is *not* cheaper than the version before.
So my guess is they are trying to fuel the hype just a little bit more by getting more users - smells like Twitter valuation to me.
So, "if you do not pay for the product you are the product" holds?
So until they are able to reach GPT-5 they will do the periodic ‘iPhone style’ mini spec bump? “This year’s iPhone is 0.0034 times faster than last year’s.” Are they equally vocal about what they are doing to reduce hallucinations and other known issues with LLMs? Surely an update that focuses on fixing current flaws is just as beneficial as features that are nice-to-haves?
Is this another way of saying that Sora is parked for now? GPT4-o has definitely stolen a lot of its thunder.
Thank you for sharing Gary. Haven't watched the presentation yet but it's always good to read your thoughts on these topics.
It's clickbait and misinformation. OpenAI didn't even start training GPT-5 until last December.
These systems shouldn't really be described as Artificial Intelligence; "Artificial Extelligence" (AE) would be better. Stewart & Cohen (1997) described, in "Figments of Reality: The Evolution of the Curious Mind", "extelligence" as the cultural capital of humanity, stored and available to us, but external to our minds. A key point is that this store isn't intelligent in itself - these repositories only become useful when they are accessed and made meaningful by humans.
It might well be possible to make artificially intelligent machines using GPT-style technology, machines that make calculations that are meaningful to them, but these systems aren't that. So - AE rather than AI?
An accurate description. I like it in two ways. First, I think it hits on the distinction between an "extensional" versus "intensional" description: AE's similarities to human intelligence exist only at the level of the extensional description, and end when you get to the intensional description of intelligence. An AE is similar to a human in the ways it can reply to certain prompts (the extensional description of human intelligence), but very dissimilar to a human in how its internal structure and learning work (the intensional one) - like AE needing to rely on a separate training mode that has to be switched on and is mutually exclusive with its usage mode, and needing to rely on back propagation, which has no known way to be realised in the human brain.
Second, because it gets at what I think is the real nature of AE: a kind of lossy compression of text data, i.e. "stored and available to us".
Disagree. Could still be training GPT5. Could be worried about safety of GPT5. Could be rolling out incremental advances. Many reasons this isn't necessarily a big leap or necessarily GPT5 at this juncture.
I think they made GPT-4o free to try to reclaim market share and then upsell with GPT-5 in the summer (which is rumored).
You also mention 14 months of training - but they didn't start training GPT-5 until December 2023, which was 5 months ago.
We "may" have diminishing returns? Of course we do in energy terms, but its intellectually lazy to claim that "they" don't (and won't) have GPT-5. I'm willing to bet that GPT-5 exists today, but only for internal and VIP use, similar to Sora. Of course proving/disproving this would be quite a challenge, so I'm willing to accept a dearth of OpenAI/industry news by end of 2024 as proof of your claim.
Very easy to emulate the elephant swimming the channel example... when I ask it "When did Raoul Bott win the Fields Medal for Bott Periodicity?" it happily tells me 1966, the year a related mathematician (Atiyah) won the Fields for related work (the index theorem). (Bott never won the Fields,.) Worked great on another similar example as well.
Unfortunately when I Google "Did Raoul Bott ever win the Fields Medal" Google's "AI Overview" (top of the results page!) makes the same mistake, with plausible but false supporting details. We're f'ed, folks.
I remember CEO Sam Altman saying something about "AI will solve the global warming crisis". I wonder what they are doing about that, besides consuming plenty of dirty energy.
Today's weather is super crazy. =/
Of course, no problem... AGI will make the coffee soon, colonize Mars next year (isn't that right, Elon?), teleport us to another galaxy before the ecosystem crashes, and as a bonus give us immortality. The messiah is among us. Let us bow down and, above all, open our wallets and show our credit cards.
Ask a bot a serious question and you will get a serious answer.
Ask it a silly question and you will get a silly answer.
Bots behave like most human beings.
Ask a bot a serious question, and chances are it's seen that question over and over in its training and can effectively mimic its way to the correct answer. Which, hey, if that's what you're looking for, LLMs do what they're supposed to do quite well.
Problem is, the tech CEOs keep insisting that their LLMs "reason" and "understand". You can't test these claims using normal questions, because those have all been memorized. If we're going to critically assess the purported reasoning abilities of LLMs, we have to do it using questions that aren't easy to solve through mindless pattern-matching.
(Alternatively, tech CEOs could cut out the nonsense and be honest about how LLMs actually work. But I guess there's too much money on the line.)
ask it, "Can an elephant swim the English channel?"
It seems these things are bad at counting siblings..
Try "joe has two brothers. His sister sue has two sisters. How many siblings are there in joe's family?"
Or: "joe has two brothers. His sister sue has two sisters. How many brothers does sue have?"
(MS Copilot goes seriously crazy on the second one, setting up a system of equations to get sue having (drumroll) two brothers.)
I'd guess that this is a systematic error: if you can come up with examples that require an internal model of the situation being discussed, any LLM, even GPT-17, isn't going to be able to handle them.
Given what the underlying LLM technology is, such systematic errors should be easy to find.
GPT-4o on first attempt:
----QUESTION 1----
User: "joe has two brothers. His sister sue has two sisters. How many siblings are there in joe's family?"
GPT-4o:
"To determine the total number of siblings in Joe's family, we need to carefully consider the information provided:
Joe has two brothers.
Joe's sister, Sue, has two sisters.
Let's break down the information:
Joe having two brothers means there are three boys in the family: Joe and his two brothers.
Sue having two sisters implies there are three girls in the family: Sue and her two sisters.
Therefore, the family consists of:
3 boys (Joe and his two brothers)
3 girls (Sue and her two sisters)
The total number of siblings is the sum of the boys and girls:
3 boys + 3 girls = 6 siblings
Thus, Joe's family has a total of 6 siblings."
----QUESTION 2----
User: "joe has two brothers. His sister sue has two sisters. How many brothers does sue have?"
GPT-4o:
"Given the information:
Joe has two brothers.
Sue has two sisters.
Let's break it down step by step.
Step 1: Joe's Brothers
Joe himself is one of the brothers.
Since Joe has two brothers, there must be three brothers in total (including Joe).
Step 2: Sue's Sisters
Sue has two sisters.
Including Sue herself, there are three sisters in total.
Step 3: Sue's Brothers
The three brothers mentioned (Joe and his two brothers) must be the same for Sue since they are siblings.
Therefore, Sue also has three brothers."
In contrast, GPT-3.5 has this to say about the first question:
"Sue has two sisters, so she doesn't have any brothers. The number of brothers she has is zero."