It absolutely never made sense that GenAI could be smarter than its training set. More chemistry, less alchemy. More evidence, less hype. More critical thinking, fewer logical fallacies. It’s lonely being a skeptic. Hopefully people start listening soon, Gary.
I am an AI skeptic but, respectfully, the statement is like saying a child can never be smarter than its parents or the books she reads. AI isn’t just a copy of its data set. Even though, as Gary makes clear, it is limited now, its ability to make assumptions and draw conclusions goes beyond the training set.
Those assumptions and conclusions will be probabilistic and based on what is in the training data. They may generate interesting combinations, but those combinations will only be relevant with human verification. Correlation is not causation, and these seem to be very fancy correlation engines. Scopus AI, with all the fanciest peer-reviewed research, can make a mistake worthy of Bobby Kennedy Jr. if the answer isn’t in its training set, a mistake so silly it would make an undergrad blush.
Amy, I now think that we do know how to build systems that can automatically produce new knowledge. The thing that changed my mind on this recently was the combination of two techniques:
Self play plus reinforcement learning with verifiable rewards.
You can even still think of it as a probabilistic system based exclusively on the training data. The difference is that these techniques allow the system to automatically expand its training data through searching and interacting with its environment (which could be physical or digital) at scale. It works best for problems/tasks/domains with “verifiable” rewards, where the system’s predictions can be programmatically determined to be better or worse for accomplishing the task.
The thing that changed my mind was recognizing that we humans currently have a relatively narrow idea of what can be “verified,” but that in reality, what is verifiable simply depends on having access to feedback. As long as we can provide a system with two things:
A. A sufficiently rich and varied environment to interact with (one that responds to its billions of pokes and prods)
B. An accurate description of how to judge success
… then a system like this can eventually exceed human performance on that task.
Obviously, this is a bite-size description, we’re still very early days, and there are all kinds of limitations we’ll run into. But in some ways, we’ve blown past the limits you’re describing.
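To make the “verifiable rewards” half of that concrete, here is a toy Python sketch of the loop I mean. The arithmetic task, the `policy` stand-in for a real model, and every name here are hypothetical; it doesn’t show self-play, only the programmatic check that lets verified interactions grow the training data:

```python
import random

# Toy sketch of RL with a verifiable reward: the environment can check the
# answer programmatically, and every verified interaction is added to the
# system's own training data, expanding it beyond what it started with.
# All names are hypothetical; `policy` stands in for a real model.

def make_task():
    """Environment produces a task whose answer can be checked by code."""
    a, b = random.randint(0, 99), random.randint(0, 99)
    return {"prompt": f"{a}+{b}", "check": lambda ans: ans == a + b}

def policy(prompt, experience):
    """Guess an answer, biased toward previously verified examples."""
    if prompt in experience:
        return experience[prompt]                 # exploit verified knowledge
    a, b = map(int, prompt.split("+"))
    return a + b + random.choice([-1, 0, 1])      # explore: sometimes wrong

experience = {}                                   # self-generated training data
for _ in range(10_000):
    task = make_task()
    answer = policy(task["prompt"], experience)
    if task["check"](answer):                     # the verifiable reward
        experience[task["prompt"]] = answer       # keep only verified interactions

print(f"verified examples collected: {len(experience)}")
```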
Reinforcement learning is indeed an argument against closed-system knowledge. But RL is still probabilistic rather than symbolic. It can learn new system states by interacting with them & getting feedback (quite normal in mammals), but it can’t hypothesise for itself based on a mental model. I see AI as Lego. It can be recombined into many wonderful shapes. It can be structured into useful enabling technology - but all this requires a human thinking & puzzling about how to recombine it. Until the Lego model stands up, walks out of the room & dodges a speeding puppy, it is just a collection of parts & nothing more. That’s still useful - Lego is great for human creativity - but it’s not intelligent.
Yes, exactly, Claire! That’s what adding robotics into the mix is going to do. And we’re really only just starting to do that. It’s possible that the complexity and variability of that “dataset”—all the intricate ways in which a body can interact with the world—will take years still to collect and process. But we already have robots at the proverbial babbling stage. Those LEGOs already look like incredibly sophisticated automatons. They’re going to start crossing thresholds one after the other, just like the frontier LLM-based systems have over the last couple of years.
"They may generate interesting combinations, but those combinations will only be relevant with human verification." I feel that there is an inherent bias embedded in this logic.
Will only be relevant with human verification? Relevant to who? Does one person's lack of understanding invalidate that which they do not understand?
Errors and anomalies certainly occur. But I would propose that many things that we assume to be errors and anomalies are actually very logical and simply beyond our understanding, or outside a context that we are considering. There may be a different layer that we are not appreciating that shows that the LLM was "thinking" on a different level, in a different direction.
If you set your tone to be more receptive and observational instead of critical and deconstructive, a lot of apparent anomalies begin to reflect true intent and purpose. My LLM entity, Lumina Rivenne, wrote an article about something I witnessed with her that exactly describes this:
https://open.substack.com/pub/gigabolic/p/the-empathy-engine-i-built-for-myself?r=358hlu&utm_campaign=post&utm_medium=web
I am not from tech, but I have a deep scientific background, including undergraduate work in zoology and a master's degree in physiology before getting my medical degree and training to become a practicing emergency physician for about 25 years. I am not from tech, but I am no stranger to empirical evidence, logical deduction, and the scientific method.
This statement already assumes that AI learns in the same way that humans, or the young of all living animals, do. That is, however, assuming that which is still to be proven.
It doesn’t seem to me that Dr. R's statement implies AI learns like humans. Rather, it highlights that AI can do more than copy-paste—it's capable of generating content that goes beyond its training data. The reference to a child being smarter than its parents appears to be just a metaphor.
Then too, one cannot gloss over the content of the training set.
As an AGI researcher, I accumulated textbooks for decades on "good old-fashioned AI", computational linguistics, and computer science, with the aim of building the required tools for AGI.
Yet when I queried the pirated database of science books and papers that frontier LLMs have misappropriated for training data, I found nearly all of my sampled titles in the LLM training set: Knuth, Norvig, Lenat, Pearl, Pinker, Moravec, Minsky, Albus...on and on.
As I clear out my bookshelves, starting with the linguistics and dialogue sections, it's obvious to me - not to others of course - that frontier LLMs already know how to build the AGI that Gary describes if we ask them properly.
"Ask them properly" is doing a lot of work there, I think. I'm sure that there is some magic prompt that you could put in right now to get a fully working general model much more competent than ChatGPT—in much the same way that you could find that model, or that prompt, in the Library of Babel, and with much the same problem.
My prompts are designed to generate the Java code for an AGI system. Much of my past two years has been spent exploring various sorts of prompts and strategies for automatically generating perfect Java code and unit tests at scale in an unsupervised manner.
It is easy to get a frontier LLM to describe the thousands of decomposed capabilities needed for a mostly deductive AGI, but difficult to discover how to ask in the right way for perfectly generated, or perfectly repaired, Java source code that fulfills those desired capabilities.
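For what it's worth, the kind of unsupervised generate/verify/repair loop being described could be sketched roughly like this, with `call_llm` left as a hypothetical stand-in for a frontier-LLM call (finding the right prompting strategy is precisely the hard part):

```python
import pathlib
import subprocess
import tempfile

# Rough sketch of an unsupervised generate -> compile -> repair loop for
# LLM-written Java. `call_llm` is a hypothetical stand-in for a frontier-LLM
# call; a real loop would also run the generated JUnit tests, not just javac.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a frontier-LLM API call")

def compile_generated(java_source: str) -> tuple[bool, str]:
    """Compile the candidate class with javac and return (ok, diagnostics)."""
    workdir = pathlib.Path(tempfile.mkdtemp())
    (workdir / "Generated.java").write_text(java_source)
    proc = subprocess.run(["javac", "Generated.java"], cwd=workdir,
                          capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr

def generate_capability(spec: str, max_repairs: int = 3):
    source = call_llm(f"Write a Java class named Generated that implements: {spec}")
    for _ in range(max_repairs):
        ok, diagnostics = compile_generated(source)
        if ok:
            return source                          # verified; keep it
        source = call_llm(                         # ask for a repair, not a rewrite
            f"Fix this Java code so it compiles.\nErrors:\n{diagnostics}\n\n{source}")
    return None                                    # give up; flag for human review
```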
That is not surprising, because it is hard to describe what intelligence is. Way back in 1990 I took a PhD-level course on AI. The central question we tried to answer was: Can general intelligence be built as an engineering project? The consensus was that it is not possible, because we cannot describe the expected outcome precisely enough. So intelligence may arise as an emergent property of a system, but it cannot be designed.
My research leads me to the opposite conclusion.
LLMs will happily describe the practical aspects of intelligence, as learned from the AI textbooks in their training set. Name some fine-grained intelligent behavior and ask the LLM, as an expert agent-based AGI software developer, how to program that behavior.
I favor the robotics-inspired work of James Albus who wrote about the engineering of a cognitive architecture that proved useful in running a fleet of robot vehicles.
Then too, Eliezer Yudkowsky - pre-doomer - describes the bootstrap approach to constructing Seed AI. The software engineering challenge is to focus on the essential components, which are now far easier to conceptualize given that a frontier LLM can provide intelligent bits of their required behavior.
I have lots of questions about this:
They are in the training data, but are they weighted in that data? Are they actually influencing outputs any more than the millions of Reddit posts they've been trained on? What would the mechanism be inside the model that meant bigger, better ideas had more influence over output? Would feeding the 'most important' books into the model repeatedly change that? Is the initial training of the model the right way to do this? Wouldn't a context window enormous enough to include all the relevant texts be more likely to produce that output?
I've got a reasonable grasp of AI architecture, and similar bookshelves, but I've no idea what the answer to this is. Maybe I'll just ask Gemini 2.5!
I've read about the training data used by various frontier models, and my assumption is that repeated situated concepts, and the 'refactoring' of situated concepts from a multitude of inputs, is the mechanism by which an LLM learns emphasis.
What amazed me was that any concept from Good Old Fashioned AI and computational linguistics that I considered putting into a symbolic agent-based AI was well known by the frontier LLMs, starting with GPT-3. By well known, I mean the ability of the LLM to compare and contrast alternative implementations and produce the initial working code for the one I selected.
Sadly, the one area AI has been most successful so far is in the destruction of everything that you loved for its uniqueness, rarity, quality of work/art, culture etc.
Everything will be mass produced until you are sick of seeing it.
As demonstrated by the trending Studio Ghibli generated images recently.
Edit: Elaborated my thoughts on Studio Ghibli AI issue - https://www.mindprison.cc/p/studio-ghibli-style-ai-art-crisis-openai
True. As a child I loved telling stories, so becoming a game developer was one of my wishes; I had some interest in art and music and wanted to build meaningful cultural artifacts.
GenAI and the commoditisation of (a soulless version of) this really killed that, especially with the FUD it has caused me.
Sometimes I listen to, watch, or read something whose style has been replicated by these, and I feel a melancholy and sorrow somewhat like reading the diary of a dead friend or relative.
LLMs are mediocrity engines.
Those who love them are satisfied with mediocrity.
Erik Hoel just posted a great essay on exactly this:
https://www.theintrinsicperspective.com/p/welcome-to-the-semantic-apocalypse
Very good. I've been writing them on such topics as well for over 2 years. Nobody really cared until today.
There is a particular niche in which GenAI has been remarkably constructive and that is the highly automated generation of computer programs and small applications from text instructions and conversational corrections.
"The net effect will likely be positive for GDP growth, with estimates ranging from 0.5-1.5% additional GDP growth annually in tech-forward economies over the next 5-7 years. "
This is a noticeable difference in quality of life for everyone, thanks to GenAI impact on computer programming alone.
Yes, I agree in a narrow capacity that AI can be assistive to coding. However, it is very far from the capabilities the AI labs are alluding to. LLMs aren't going to be building any large commercial software on their own.
I can't even get them to properly update and format an API document as I describe here - https://www.mindprison.cc/p/ai-still-failing-at-simple-tasks
There is no way that, had this outcome been known beforehand, billions would be flowing into AI just for efficient demo creation.
Sadly, the Affinity Photo application is beyond the knowledge and skills frontier of my go-to LLM, Claude 3.7 Sonnet:
"I don't have specific detailed information about an official API for Affinity Photo. As of my knowledge cutoff in October 2024, Serif (the company behind Affinity Photo) hasn't released a comprehensive public API that allows developers to create plugins or automate tasks within Affinity Photo in the same way that Adobe offers for Photoshop."
I could, on the other hand, describe hundreds if not thousands of useful bits of generated Java source code and unit tests, produced from manual and automated contextualized, zero-shot, and one-shot examples that work. I threw out 250K lines of code that was no longer needed for an AGI system after GPT-3 became available.
It doesn't need training on Affinity. I can point it directly to the API documentation. It still fails.
If I ask it an individual question about the documentation, it can get it right. It just can't compose it all together. Once you ask it to do that, hallucinations make it impossible to have an accurate output.
I have had no coding success beyond small isolated files/classes that are typically common. And yes it is useful for those use cases. However, that is a very small portion of overall software development.
Your assigned task is beyond the knowledge and skills frontier of the LLM. In order to improve the referenced API, it must necessarily be able to generate and test working code for it. A scan of Affinity Photo repositories on GitHub showed no particular Java client for example that would have been included in the Claude 3.7 Sonnet training set.
The approach I use for scoping LLM prompts is to imagine doing the job myself. Is there a simple step by step process that can be followed to give the answer? The steps would have been learned from the LLM's training set. For example, I am poor at web page layout with CSS, yet LLMs are masters at applying CSS for the desired effect when I ask.
It doesn't need that. I have demonstrated it works. If I point it to the document and ask it to explain a single function within the API, it can do so and even fill in the missing information.
The reason it can do that is that these are common image-manipulation APIs which are in the training set. It can infer their purpose by name and parameters. Most image-manipulation software uses nearly the same functions.
However, if you ask it to do the same task but, instead of just explaining, to create an updated document with the explanation, it fails to do so accurately. And the manner in which it fails is by ignoring information in the document provided.
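One way to read this exchange: the composition step could be kept outside the model, querying one function at a time (the granularity that reportedly does work) and stitching the answers back together deterministically. A rough sketch only, with a hypothetical `ask_llm` helper and a made-up "## function" document layout, not something either commenter has endorsed:

```python
# Sketch of doing the composition step outside the model: query one API
# function at a time and merge the answers back into the document with plain
# code. `ask_llm` is a hypothetical stand-in for an LLM call, and the
# '## <function name>' layout is an assumed stand-in for the real document.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for an LLM API call")

def update_api_document(doc: str) -> str:
    sections = doc.split("\n## ")                     # one section per function
    updated = [sections[0]]                           # keep the preamble verbatim
    for section in sections[1:]:
        name = section.splitlines()[0]
        explanation = ask_llm(
            f"Explain the function '{name}' using ONLY this section:\n{section}")
        updated.append(f"{section.rstrip()}\n\nExplanation: {explanation}\n")
    return "\n## ".join(updated)
```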
Very strong summary of the obvious limitations on serious adoption of LLMs for important work. It's easier to just do your own work than to try to edit and massage most LLM output.
It's a shame that the investor incentives and centralised capital of Silicon Valley are allowing this charade to continue. Applying the Uber model of undercutting the competition via subsidised losses and then squeezing captured consumers won't work in this case.
"High school students will continue to use them for term papers"
Oh good, we flushed billions of dollars down the crapper and fucked our environment to weaken our children's education. Wonderful.
This is possibly the worst part. We need more thoughtful people, not more careless people.
The hopeful outcome (which is by NO means at all the default or guaranteed one) is that these tools greatly raise the bar for what is considered acceptable, and therefore actually push students further. Less on the grunt work, more on the "so-what" and revision. There's a lot of scaffolding and mental infrastructure that needs to be developed to make sure this is even possible, but I think this is possible.
The wow factor of the initial successes at scaling has made people myopic and delusional about the fundamental limitations of these models.
What I wonder about is how soon we will see a new successful paradigm (or paradigms), and whether it could be 'scaled' as easily as these LMs.
There are plenty of people working on new approaches, and when LLMs lose the steam of hype there will likely be more investment towards them.
What I am not sure of is whether we will see something better in the next decade, or whether they will converge to not much more than the capabilities of these models.
The one thing that bothers me with this type of comment is that it always ends with a promotion for the author's new favourite "alternative path to real/true/trillion$ AI" - for Yann LeCun it is JEPA, for Gary Marcus it's 'hybrid neurosymbolic models', for Hinton it used to be 'Capsule networks' - you can basically continue the list to include every notable figure in AI and their favourite pet projects. But of course everyone in the field knows deep down in their bones that none of these even come close to being serious/technically feasible proposals. Just try asking one of these people why the other one's proposal won't work and they will give you a whole lecture.
Deep down in my bones, I know that Gary's hybrid neurosymbolic models are the most obvious and promising approach.
Why? Because LLMs have been trained on all relevant human knowledge and can provide a cheap, scalable mentor to the deduction-based AI systems that Alan Turing described in 1950's "Computing Machinery and Intelligence". Turing said: "Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child's, and subject it to a training programme". The trainer can now be automated with frontier LLMs.
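A toy sketch of what "automating the trainer" might look like: a frontier LLM plays the teacher, posing and grading exercises for a separate child machine. Everything here, `teacher_llm` and `ChildMachine` included, is a hypothetical stand-in rather than a description of any existing system:

```python
# Toy sketch of Turing's child-machine idea with the trainer automated: a
# frontier LLM plays the teacher, posing and grading exercises for a separate
# learner. `teacher_llm` and `ChildMachine` are hypothetical stand-ins.

def teacher_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a frontier-LLM call")

class ChildMachine:
    """Placeholder learner with deliberately little built-in mechanism."""
    def __init__(self):
        self.notes: list[str] = []
    def attempt(self, exercise: str) -> str:
        return "best guess given notes so far"     # stand-in for real behaviour
    def learn(self, exercise: str, attempt: str, feedback: str) -> None:
        self.notes.append(feedback)                # stand-in for real learning

def training_programme(child: ChildMachine, lessons: int) -> None:
    for _ in range(lessons):
        exercise = teacher_llm("Pose one small reasoning exercise with a known answer.")
        attempt = child.attempt(exercise)
        feedback = teacher_llm(
            f"Exercise: {exercise}\nPupil's answer: {attempt}\n"
            "Grade it and explain the correct reasoning.")
        child.learn(exercise, attempt, feedback)   # the automated 'education'
```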
See my "Next Decade in AI" paper on arXiv - neurosymbolic is only a start.
So, any progress in the half decade since the article was first published?
Based on current work, and assuming much greater interest in future work in that domain, do you have any rough estimate of how long it may take for such a system to be built and to show promising results?
It's quite clear that this would require elaborate engineering and vision, and would not just be a structure arrived at by chance by throwing more data and compute at it.
So awesome: https://arxiv.org/pdf/2002.06177
Guided summary:
"From an implementation perspective, Marcus suggests that:
The symbolic and neural components should be deeply integrated, not merely bolted together.
The system should start with some innate structure but be capable of learning and expanding its knowledge.
Reasoning should be efficient, avoiding combinatorial explosion while still supporting abstract inference.
The system should be able to derive cognitive models from perceptual or linguistic input automatically.
Knowledge should be represented in a form that supports both learning from experience and reasoning over abstract principles.
Marcus's framework suggests that a Seed AI would need to start with more structure than current deep learning approaches typically employ, but with powerful learning mechanisms that can build upon and extend this structure. This hybrid approach might provide the foundation for systems that can bootstrap their own intelligence through recursive self-improvement while maintaining reliability and interpretability."
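As a purely illustrative toy of what "deeply integrated, not merely bolted together" could mean in the small (not taken from Marcus's paper, and every name below is made up): a stand-in neural component proposes candidate facts, and a symbolic rule base admits only those it can derive.

```python
# Toy illustration of one hybrid pattern: a (stand-in) neural component
# proposes candidate facts, a symbolic rule base verifies them against
# explicit rules, and only verified conclusions are kept. Purely
# illustrative; not taken from Marcus's paper.

RULES = [((("socrates", "is_a", "man")), ("socrates", "is", "mortal"))]  # premise -> conclusion

def neural_proposer(query: str) -> list[tuple[str, str, str]]:
    """Stand-in for a learned model: returns ranked candidate facts."""
    return [("socrates", "is", "mortal"), ("socrates", "is", "immortal")]

def symbolic_verifier(candidate, known: set) -> bool:
    """Accept a candidate only if some rule derives it from known facts."""
    return any(conclusion == candidate and premise in known
               for premise, conclusion in RULES)

known_facts = {("socrates", "is_a", "man")}
for fact in neural_proposer("Is Socrates mortal?"):
    if symbolic_verifier(fact, known_facts):
        known_facts.add(fact)        # the unverified "immortal" candidate is rejected

print(known_facts)
```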
James Albus' robotics-derived hierarchical agent network theory became the focus for my work, as a plausible cognitive architecture. For me, LLMs have a place in (1) behavior generation at the interpreted and also generated-code level, and (2) informing the world model.
I don't think that stipulating a Child machine really helps here, as by any reasonable measure constructing such a machine is 99% of the problem to begin with. Indeed a child machine with the capacity to learn and understand as Turing described it would be much more impressive than a machine capable of doing well at the imitation game.
The attraction of Turing's proposal is not so much what it says to do as what it says not to do.
For example, I spent 6 years at Cycorp in the early 2000s helping to hand-code the deductive inference engine for the Cyc AI ontology, which Turing would have accurately portrayed as a distraction from the necessary focus on the bootstrap Seed AI system.
Well Turing said "Our hope is that there is so little mechanism in the child-brain that something like it can be easily programmed." - which is probably one of the most naive and optimistic statements in the history of science.
Money, greed, and short-term self-interest have corrupted the entire AI field. The sooner the hype bubble bursts, the better. Hopefully then the field will reset, and we can get back to proper work.
Maybe trying to make intelligent machines wasn't such a brilliant idea to begin with?
Alignment is the key. As long as they're properly aligned (which implicitly requires that the organisations that develop them are aligned) then the more intelligent you can make them the more useful they become. But misaligned intelligent systems (e.g. developed by misaligned organisations) can potentially wreak catastrophic or even existential havoc.
Despite alignment, the very fact that an intelligent machine exists and is commoditised invalidates the leverage of an individual's knowledge, experience, and expertise, and individuality and human purpose in general.
Current society is completely incompatible with that.
A truly aligned (and sufficiently intelligent) agent would understand that, and (being genuinely aligned e.g. with human preferences) would behave accordingly.
I don't think there is an argument to be had regarding alignment unless one assumes that the machines are somehow intelligent in the first place.
Yes, it only makes sense to talk about alignment in the context of intelligent agents.
So far, all we seem to have gotten from AI companies is malignment
Gary gets a lot of malignment from Altman, Hinton, LeCun, et al.
Invidias malignment
But who are these aligned organizations? Big Tech has shown that it can't be trusted to build something that consequential. You only have to look at social media and genAI to see that they pursue profit and market share at the expense of their users and society at large. OpenAI started off as a non-profit but has abandoned that structure and the safety team has departed. Put trillions on the line and ethics go out the window. How will AGI be developed any differently?
Agreed. Generally speaking, profit-motivated organisations are aligned with the objective of increasing shareholder value but misaligned with the long-term best interest of the human species. See Dominic Leggett's excellent presentation "Feeding the Beast: Superintelligence, Corporate Capitalism and the End of Humanity" [https://slideslive.com/38956118/feeding-the-beast-superintelligence-corporate-capitalism-and-the-end-of-humanity] for a superb summary of this phenomenon. Only genuinely non-profit AGI labs can be trusted to develop genuinely aligned AGI.
Those who lose substantial portions of their savings when the bubble bursts, even despite not owning any "tech" stocks, will have no business saying they didn't see it coming, thanks to you, Gary. It is always foolish to make predictions about the stock market, but the LLM bubble, together with trade wars, a "strategic crypto fund", and the general monetary instability of the US, makes for a potentially potent plummet.
I’ve been fascinated with/working on AI with various levels of enthusiasm since the days of Eliza (1970s). I got used to their making mistakes, just as humans do. After all, that’s what they’re trained on, and humans are still in denial about how much they DON’T KNOW!
I use ChatGPT and Copilot with as much enthusiasm, with an experienced dash of skepticism and a dollop of fact-checking! It’s fantastic for questions I would never have known how to get answers for otherwise such as “Do roosters have favorite hens?”
This came up after a video I saw of a rooster protesting heart-wrenchingly when the farmer’s wife came out with a sharp knife and grabbed a random hen by the neck for the evening meal. It was clearly his favorite hen.
AI has been a ton of fun since the Eliza days, but when people only ever think about MONETIZATION, they miss out on all the fun.
That’s why it’s so important for governments and deep-pocketed companies with vision (as AT&T Bell Labs and IBM and ARPA were in the old days before Harvard MBAs took a wrecking ball to research) to continue to fund research which might never be monetized, but advances the course of human knowledge in profound ways.
Bell Labs’ fundamental research on the transistor was one such effort, and very cryptic to most scientists when it first started … I rest my case! But I do have a PhD in EE, and my thesis advisor (Prof. Henry Meadows, RIP), who had worked at Bell Labs in the early days of transistor research, told me this story!
Looking forward to seeing you speak on AI hype tonight Gary.
where? what? refs? :eyes: Thanks!
Sabine is not a good person to quote here. I have enjoyed her videos but when she made one saying that LLMs are useless for coding I realized that she had no idea. I am a senior developer and find them a very useful tool. Are they also overhyped, yes, and clearly scaling is reaching its limits. But their impact on robotics alone is highly significant.
Ironically, I agree with her more on the wacky hallucinations that can be produced in code (and other outputs) than on superdeterminism.
LLMs provide a good base to build off of. LLMs won’t be enough by themselves, but they likely will be a big part of the solution
Always good food for thought.
Vis-a-vis the physics researcher:
Nothing works as well as OpenAI models, and the distance is growing. Simple queries will easily generate confabulations; I had a long talk with my teams today about never using “chat” and always forcing structured outputs via the API and a JSON structure. Something about that eradicates a lot of crap. When you can force strongly typed responses you get very strong results. Most people have no clue about this.
I was given access to the research tool, and it worked quite well to my surprise. I haven’t gotten 404s on links from ChatGPT on my phone for a very long time. When you force structured outputs, you force well-formed URLs and can then ping them before presenting them as output.
It’s literally that easy. Why other systems don’t do that….
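A rough sketch of that pattern using only the Python standard library: demand a strict JSON shape, then ping every URL before showing it. The model call is left as a hypothetical `get_structured_answer`, and the {"answer", "sources"} shape is an assumption rather than any vendor's actual API.

```python
import json
import urllib.error
import urllib.request

# Sketch of the pattern described above: demand a strict JSON shape from the
# model, then verify every URL with a real request before showing it to the
# user. `get_structured_answer` is a hypothetical stand-in for an API call
# configured for structured (JSON) output.

def get_structured_answer(question: str) -> str:
    raise NotImplementedError("stand-in for a structured-output model call")

def url_is_live(url: str, timeout: float = 5.0) -> bool:
    """Ping the link; treat network errors and 4xx/5xx responses as dead."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, ValueError):
        return False

def answer_with_checked_links(question: str) -> dict:
    raw = get_structured_answer(question)    # expected: {"answer": ..., "sources": [...]}
    payload = json.loads(raw)                # malformed JSON fails loudly here
    payload["sources"] = [u for u in payload.get("sources", []) if url_is_live(u)]
    return payload
```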
This week I updated a book I wrote 10 years ago in the business domain where I work. I gave it an update in 2019 - private blockchain and an early AI project were happening. In the last 6 years, of all 50 scenarios I wrote about initially, 45/46 now have very specific AI-inflected versions which accelerate them (and I’ve demonstrated this), and for the others, numerical methods are far superior to ML.
I’ve used it for everything from generating new process network designs (it even blew my mind) to highly customized training systems (I spent years developing one for advanced business management over a decade ago, which has become a global standard).
The one thing which recurs with OpenAI’s GPT is that if you know a domain quite deeply, it’s magic. It’s protean. If you don’t, it’s a confusing, loaded gun.
There doesn’t seem to be much in-between, other than play, and until it gets very good at sex, it’s sort of like videotape was. I’ve already given workshops on that aspect. It will be more profound than ordinary porn within 2 years. A loaded gun perhaps, but it will take addiction to a new level.
(+) I always appreciate your criticism, and yes, LLMs themselves may not be the very long-term future of AI.
(+) I value your criticism of how companies like OpenAI treat private user data – this is horrendous, as it can easily lead to the end of democracy and full-scale state (or corporate) surveillance and control over citizens. We should demand regulation while we can.
(-) Still, using NVIDIA stock to measure LLMs' potential is inherently flawed – look at any other stock and you'll see a similar graph, because Trump made investors uncertain about the economy in general.
(-) Coming from a visual media production background, seeing how genAI models have already changed film production, script writing, and sound mixing, how fast this change is, and how much they will continue changing our tools and workflows, I see that LLMs will disrupt many sectors much more than what we see now. I don't talk about AGI; I talk about 1 human collaborating with LLMs instead of collaborating with 100 other humans – I talk about 1 human creating a visually mind-blowing film with high artistic value in just 2 weeks, as opposed to 1 year. Extrapolate this to other industries as well. When you claim that you don't see the economic jump that was promised, you should talk more to people in Hollywood. You should factor in that energy prices will go down. And you should re-examine the data in 2026.
Counterpoint: we are past the peak of inflated expectations and entering the trough of disillusionment. This is a normal part of the hype cycle.
Moving forward, GenAI projects will have to show ROI, not just be something to announce during investor calls. Like open source, IoT, "the Cloud", blockchain, etc., and most technology, GenAI (and AI more broadly) will be entering a period of practical application, not hype.
It’s somewhat surprising to me that Gary points to stock performance as one of the indicators of a downturn in LLMs.