LOL, those graphs. This is what we call vibe-charting. It is how you report your vibe-coding progress to your manager.
You have to wonder about the ego required to put up such sloppy graphs.
Saul Goodman might say "flexible morals" (where "morals" in this case is "scientific integrity")
BrAIking Bad
My keyboard thanks you for the coffee I spilled on it :D
To say nothing of the intelligence.
The misleading graphs are like the samples provided in middle school textbooks about how to spot misleading graphs. This is a huge presentation; there's no way they weren't deliberately included to... mislead.
Yeah, hard to imagine no editing took place. This was either a yolo/screw it move (bad) or an attempt to pull a fast one (worse).
Some might even say that OpenAI is “hitting a wall”
Their models hit a wall, but their hype generation has not. And so the story continues.
Though I think even a master talent like Sam Altman will hit limits there soon. After all, he announced GPT-5 as the Death Star of LLMs. There really is not much beyond that in popular culture. Perhaps a Borg cube?
"Gary Marcus thinks that perhaps scaling is not in fact all you need."
This made me laugh way harder than I should have. What a shit show AI has become.
This is pathetic. An undergraduate who drew these up would be laughed out of the room. F-
As a tangential aside (and I'm NOT intending to anthropomorphise), this is one of my go-to fallible / heuristic framing-models when dealing with LLM/GPT-based chat tools.
I frame my questions - and interpret the responses - as if I'm dealing with a very well-read, easily-excited undergraduate who is yet to leave academia, and has no real-world practical experience, yet shows potential as a research assistant.
Thanks for the hot take! "Fan will still find something to rejoince in" should be corrected to "Fans will still find something to rejoice in."
Meanwhile, MSFT and the SP500 are both down today. Seems the markets aren't exactly 'rejoincing' over this new release.
> which left the livestream looking like marketing rather than science.
If you watch release livestreams expecting science, you're gonna have a bad time.
If you want something closer to science, the model card is here: https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb52f/gpt5-system-card-aug7.pdf
You of course still need to keep in mind all of the corporate incentives while reading that; it's not the same as a journal article.
Looks like they used GPT-5 to annotate the graphs.
Gary Marcus, right again.
With apologies to Upton Sinclair: "It is difficult to get a man to understand that we are still far from AGI, when his salary depends on his not understanding it."
I do better at my job than a billion dollar company does at their super important press events to the whole world? I should vibe myself into a job there!
To be fair, word is that all the good people took a billion dollars to work elsewhere, so OpenAI is likely relying on Mechanical Turk ...
Code deception percentage rate. Imagine that. Not displayed as linear but on a Möbius surface where 50 looks considerably lower. At least now we have statistics instead of damn lies.
Feeling the SORA. Just waiting for Google to pull away comfortably, and bring the next big idea.
This is the nittiest nit to ever nit, but I believe “And 52.8 is now less than 69.1??” should say “more” instead of “less”.
I wouldn’t even mention it if this weren’t an illustration of typos.
Right, typos in a hot take are okay. Completely wrong graphs in a massive presentation to announce a product you’ve spent 3 years building ought to be unacceptable.
It's almost as if they're meant to be misleading to an unsophisticated group of public users. Unless someone is suggesting that OpenAI employees are too dumb to notice the extremely obvious errors in graphs for a major presentation.
too dishonest
Yes who could have guessed that someone like Sam Altman who's been fired from multiple jobs for lying to the board and has been caught faking demos and benchmarks would be misleading about the capabilities of his products.
I am, once again, reminded of another Stanford drop out.
Who is in jail.
Nothing is unacceptable when it comes to AI.
"Unacceptable" implies standards.
As the Fanbois say, it's not a problem: it's "hallucination-by-design"
Of course it doesn't work, it's still based on the transformer architecture.
This technology is fundamentally flawed. It's cool tech, but you can't actually use it for anything. Why we're still going down this path eludes me.
From my calculations, about $800 billion spent on gen AI in total, and these idiots have the audacity to say they're trying to better humanity? If you want to better humanity, how about instead of building out your massive data centers, you go feed some poor kids? Idiots.
You highlight some reasonable points, Matt - particularly the ludicrous ongoing investments to polish an already-well-polished, plateaued technique.
To be clear: I do agree with the gist of your point that LLM/GPT-based systems in-and-of-themselves are terrible as a complete approach to an AGI system.
However, I don't agree with your statement "This technology is fundamentally flawed. It's cool tech, but you can't actually use it for anything."
Rather, I think it's "This technology is fundamentally flawed for what it's currently being focused on: it cannot be AGI in-and-of itself."
I agree that it is cool tech, and I think LLM/GPTs *CAN* have an ongoing, more specific use for both input processing and output forming: essentially as intelligent language engines at the interface between humans and AI systems. Both of those are useful *as component parts* of a larger heterogeneous AGI system, along with world-models for different speciality problem-solution domains.
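To make that "language engine at the boundary" idea concrete, here's a hand-wavy sketch in Python. Every function below is a stub I've invented purely for illustration - it isn't any real API, and certainly not a definitive design.

```python
# Hand-wavy sketch: the LLM only translates at the system boundary,
# while a domain-specific solver does the verifiable reasoning.
# All functions here are invented stubs, not real APIs.

def llm_parse(user_text: str) -> dict:
    """Stub standing in for an LLM mapping free text to a structured query."""
    return {"intent": "route", "from": "A", "to": "B"}

def domain_solver(query: dict) -> dict:
    """Stub standing in for a world-model / planner with checkable semantics."""
    return {"route": [query["from"], query["to"]], "cost": 1}

def llm_render(result: dict) -> str:
    """Stub standing in for an LLM turning the structured result into prose."""
    return f"Go {' -> '.join(result['route'])} (cost {result['cost']})."

print(llm_render(domain_solver(llm_parse("How do I get from A to B?"))))
```

The point of the shape: only the middle step needs to be reliable and repeatable, and it never touches free text.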
Automating the building of very large ontologies and semantically classifying web content at the sentence level serves as an intermediary between LLM and human. We use an encyclopedia as the interchange. Thanks for the like. George
http://aicyc.org/2025/07/26/aicyc-an-encyclopedia-for-llm/
There might be a place for SAM / AICYC-type solutions; however - at least as demonstrated on the website link you provided - from my perspective, this isn't targeting a fundamental problem / need.
In my view, the more general challenge / problem isn't auto-finding refined definitions or specialised sub-categories around a specific topic based on traditional classification decomposition. Two more-important and fundamental problems are: 1) having access to knowledge graphs - maybe simple ontologies - around fundamental / foundational sets of inter-related concepts and categories in a subject area or domain, showing the important properties and relations *between them* (not *within each independent member* of the set); and 2) understanding clearly the contextual constraints from which a question or request is being framed (see the sketch at the end of this comment).
These more-significant problems are in large part a graph problem / a network problem: not a tightly constrained set of hierarchical ontologies *within* a specific topic / member. We've had those for years and - although they *can* be useful for some specific problem spaces - they're less generally useful: especially if your need is to create a useful generalised tool for problem solving.
Those more-significant problems - solved in a way that provides a reliable, trustable, repeatable set of responses - require a) domain-specific / domain-tuned models that anchor on and reflect real and important concepts and constraints, that are b) tuned for the localised temporal-spatial and specialised contextual factors of the requestor.
I don't believe LLM/GPT-based systems can resolve the core challenges inherent in those problems - in and of themselves - in a reliable, repeatable, trustable way.
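Here is the sketch promised above - a toy illustration of the two problems, with concepts, relation names, and context fields all invented for the example; it's a shape, not an implementation:

```python
# Toy sketch of the two problems above. Concepts, relations and the
# context fields are all invented for illustration.

# Problem 1: a graph of typed relations *between* concepts,
# not a taxonomy *within* any single concept.
graph = {
    ("interest_rate", "raises", "mortgage_cost"),
    ("mortgage_cost", "reduces", "disposable_income"),
    ("disposable_income", "drives", "retail_spending"),
}

def relations_from(concept: str) -> list[tuple[str, str]]:
    """Outgoing typed edges for one concept."""
    return [(rel, dst) for src, rel, dst in graph if src == concept]

# Problem 2: the requestor's contextual constraints travel with the
# query, because the same question means different things in
# different contexts.
def answer(concept: str, context: dict) -> dict:
    return {"concept": concept,
            "relations": relations_from(concept),
            "context": context}

print(answer("interest_rate", {"region": "UK", "as_of": "2025-08"}))
```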
I agree and that is why we export our knowledge graph in W3C RDF.
Here is a directory of the KG with 5 million concept nodes (articles).
https://aicyc.wordpress.com/wp-content/uploads/2025/07/aicyc-knowledge-domain-directory.pdf
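For anyone curious what such an export can look like, here's a minimal sketch using Python's rdflib. The namespace, concepts, and relation below are invented for illustration - this is not AICYC's actual schema.

```python
# Minimal sketch of serialising two concept nodes and one relation
# *between* them as W3C RDF (Turtle output), via rdflib.
# Namespace, concepts and the relation are invented for illustration.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/kg/")  # hypothetical namespace

g = Graph()
g.bind("ex", EX)

g.add((EX.Transformer, RDF.type, EX.Concept))
g.add((EX.Transformer, RDFS.label, Literal("Transformer architecture")))
g.add((EX.LLM, RDF.type, EX.Concept))
g.add((EX.LLM, RDFS.label, Literal("Large language model")))
g.add((EX.LLM, EX.basedOn, EX.Transformer))  # relation between concepts

print(g.serialize(format="turtle"))
```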