Epic summary, Gary --- and lots of gold nuggets! I, for one, am glad that your views are finally being heard, despite all the gaslighting over the last 3 or 4 years and still rampant wishful thinking now. Please keep speaking up and bring to us more realistic and clear-headed views on AI, because we need them to make real progress.
I have been *stunned* at LeCun, perhaps naively. Through pure chance, I encountered theorists who made me extremely skeptical of the relationship between these types of agents and what was historically meant by "AI" and now by "AGI"; later, working with ML made me even more skeptical. To see people I once considered opponents in this debate silently switching sides has been shocking to me, but probably shouldn't have been (any more than seeing founders trumpet their current fixations and enterprises before later pretending they never bought the hype is).
Love your work, of course!
In my unprofessional, long-time hobbyist opinion, it does feel like GPT has been refined, not improved. That is to say, its strengths have been made stronger. It is a bullshitter, but it has become an excellent, entertaining bullshitter, and there are use cases for excellent bullshitting. I should know, I use those very use cases!
Yet those weaknesses endure, fundamentally limiting the system to cases where a high rate of failure is acceptable, even amusing, and human oversight is ever present. With each refinement, as reliably as the tides, waves of awe and utopian and dystopian prognostication rise, crest, and recede. Reality sets in and the world, under no obligation to follow the tropes of science fiction, remains fundamentally unchanged.
Intelligence is... integrity.
My hypothesis is that LeCun is a large language model
lol... but maybe not that large
Haha. You guys crack me up. Someone please insert this exchange in ChatGPT's data set.
This conversation is becoming very risqué indeed... :P
Gary, my take is that Yann has a new, much more expansive model built on the deep learning paradigm (many units connected via gradient learning in a long chain). This new model has short-term memory, and lots of other parts that a simple deep net lacks. When he is looking at his new fancy model, he sees the things that the pure deep model could not do. So NOW he is saying, LLM != thinking.
Still I am taken with his new model. It is a very broad sketch, and will take many years to get it to really work, but I think it will behave very differently from a thinking perspective.
And of course he is a full professor. We should just not expect him to acknowledge ANYTHING. :-)
"GPT hasn’t really changed, either."
This is wrong. There was a massive change between GPT-2 and GPT-3: GPT-3 writes proper English. GPT-2 still had lots of grammatical disfluencies, and its semantic problems were at the sentence level.
GPT-3 has literally solved the problem of grammatical output. I know your whole point is that grammatical output alone doesn't do anything, but that doesn't change the fact that this is a huge step forward. It's an advance over the chess and Go supremacy, because natural languages are evolved systems, not designed systems. AFAIK, GPT-3 was the first software that could properly mimic any evolved, biological system.
It's not AGI (and we'll probably never get anything people recognise as AGI); but it is something.
I’m really curious about where ChatGPT is getting its information.
I ran an informal experiment the other day to test the accuracy of the chatbot, and the results were baffling. I picked 15 old, public-domain stories that I’ve read in the past few years and asked the chatbot to write summaries of them. All these stories are available for free in their entirety online at sites like Project Gutenberg and American Literature Online and have been for years. (If I remember correctly, they were all added before 2021, the year the chatbot finished training.) The point is that the chatbot should have had access to all the original texts.
I found that it did a pretty good job with stories written by famous authors like D.H. Lawrence and Joseph Conrad. It produced summaries that were accurate, clear, and concise. If I hadn’t known about ChatGPT, I would have assumed that they were written by a professional book reviewer or librarian. For lesser-known authors like Wilkie Collins or E.F. Benson, it did poorly. Sometimes the summaries had a tangential relation to the original stories, but for many of them, Chatbot just fabricated bizarre scenarios that had nothing to do with the original texts—not even close. (Some were surreal and laugh-out-loud funny.) This leads me to believe that the Chatbot is examining other people’s comments about the texts, not the *texts themselves*.
What should I make of that? Has anyone else had a similar experience?
Stephen Wolfram makes a good effort at explaining chatGPT:
https://youtu.be/zLnhg9kir3Q
I'm just a layperson with a computing background and a long-time interest in AI. This helped me get a better feel for what's going on, although Wolfram himself often struggles to provide a non-technical explanation... 😉
Thanks—that's one of the clearest explanations I've heard so far.
This is very logical. This is a statistics game. Well-known authors provide a lot of training material in terms of being written about. The bot hasn't been trained on actual knowledge; it is a 'best next word' engine on steroids. For low-statistics entries, like your lesser-known authors, it hasn't been trained on enough material to get good statistics. For that you need coded knowledge, which it doesn't have. Past coded-knowledge approaches (mainly expert systems) have also seen only limited success, and they culminated in the previous round of hype.
Look at it this way: if you have 99 stories on Lawrence and 1 on Benson, the trained model will be a bit like 99% Lawrence and 1% Benson, and the result is only very lightly influenced by your question being about either Lawrence or Benson.
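A toy sketch of that point, with a made-up two-sentence "corpus" and a crude next-word frequency table (nothing like a real transformer, just the frequency intuition):

```python
from collections import Counter
import random

# Hypothetical, skewed training set: 99 documents about Lawrence, 1 about Benson.
corpus = (["Lawrence wrote about the coal country and its people"] * 99
          + ["Benson wrote ghost stories set in quiet English villages"])

# A crude "best next word" table: count which word follows which.
follows = Counter()
for doc in corpus:
    words = doc.lower().split()
    for a, b in zip(words, words[1:]):
        follows[(a, b)] += 1

def next_word(prev):
    # Sample the next word in proportion to how often it followed `prev` in training.
    candidates = [(b, n) for (a, b), n in follows.items() if a == prev]
    words, weights = zip(*candidates)
    return random.choices(words, weights=weights)[0]

# Whichever author you meant to ask about, "wrote" is followed by "about"
# (the Lawrence continuation) roughly 99% of the time.
print(Counter(next_word("wrote") for _ in range(1000)))
```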
Yes, that definitely makes sense. The Chatbot does a good job summarizing texts like *Huckleberry Finn* or *Moby Dick* because it has so much material to draw from. I suppose that it scrapes a lot of text from Amazon, Wikipedia, Encyclopedia Britannica, and maybe even Cliffs Notes. For obscure stories, it just fabricates things based on . . . who knows what. So I suppose the Chatbot might be somewhat useful for writing about topics that have been widely discussed online, but pretty much useless for anything obscure. In any case, it appears that the bot can't read text and make any kind of judgment about its meaning.
The bot is amazing at two things: writing well-formed sentences and writing that 'fits' the subject. Both things together convince us humans that something intelligent is going on, but that is because we gauge things pretty superficially (we need to, or we would be as slow as molasses). We thus conclude it is intelligent, but that is because *we* ourselves really aren't...
A model like ChatGPT only makes sense when most of what it has been trained on (that is related to the phrases in the prompt) makes some sense, and in those cases you get a good chance of well-formed and fitting sentences that make sense. But only if there has been enough material. That is why your example on well-known (i.e. widely discussed) and obscure (i.e. rarely discussed) authors is so good.
If these bots become widely used, they will poison their own well, and training them will mean heavily curating what they are trained on. And even then, they will not have any common sense.
I asked it what quantum computing algorithms had been developed between 2010 and 2020, and it answered "many" (closer to 'none' in reality) and mentioned 3, of which one was a hybrid without proof of a speed win and two were from the 1990s (see https://ea.rna.nl/2022/12/12/cicero-and-chatgpt-signs-of-ai-progress/). But the answer was well formed and sounded convincing. Even for subjects that have been widely discussed online, it will often produce wrong results.
"The bot is amazing at two things: writing well-formed sentences and writing that 'fits' the subject." Yeah, I guess it really is a case of style over substance. If the writing is grammatical and polished, then we often assume that the ideas are legitimate. That's something we will all have to be on guard against in the future.
I had it summarize "Heart of Darkness" and it didn't do very well.
https://new-savanna.blogspot.com/2023/01/the-horror-horror-chatgpt-gets-lost.html
That's an interesting approach that hadn't occurred to me. I was having the chatbot write short summaries of a lot of different material without going into depth. For Heart of Darkness, the chatbot made a lot of vague generalizations about "the darkness and evil that lies at the heart of the colonial enterprise"—or variations on that theme—but it got tripped up on the actual details of the book and contradicted itself.
I've heard tech pundits say that ChatGPT could be used to summarize legal briefs or other documents, but it looks to me like the bot just doesn't have that capability. At least, I certainly wouldn't trust anything it wrote.
A friend explained to me that it's also limited by context. When it's trained, its context is limited to 4096 tokens (words, roughly). The same limitation applies when it's responding to prompts. So it can't really look at the whole text when you ask it for a summary. Those vague high-level summaries it's giving you, it's probably working from Wikipedia plot summaries and summaries found in CliffsNotes, etc. But when you start asking for details, well, it may well find the complete text – lots of them are in Project Gutenberg, for example – but it can only access them 4K words at a time. So it may look at a 4K chunk, find something that's responsive, and come back with that, completely unaware that it's missing a lot of relevant context.
To do a proper summary it would have to break the text into 4K chunks and summarize them, and then start summarizing the summaries until it had boiled it all down to 4K or less. I believe you can get work-around apps that do that sort of thing, but I wouldn't want to do that for legal briefs or trial transcripts.
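A minimal sketch of that work-around, assuming a hypothetical summarize() call to the model and approximating the ~4K-token limit by word count:

```python
# "Summarize the chunks, then summarize the summaries."
# summarize() is a hypothetical helper (e.g. one model call per chunk);
# 3000 words stands in for the ~4K-token context limit.

def split_into_chunks(text, max_words=3000):
    """Split text into pieces small enough for the model's context window."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def recursive_summary(text, summarize, max_words=3000):
    """Summarize chunk by chunk, repeatedly, until the whole thing fits in one window."""
    while len(text.split()) > max_words:
        chunks = split_into_chunks(text, max_words)
        text = " ".join(summarize(chunk) for chunk in chunks)
    return summarize(text)
```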
I didn't know that, but it explains a lot. I suppose the next version of the chatbot will be able to analyze bigger batches of text at a time, but I wonder if it will have the intended effect. It's a question of whether ChatGPT just needs more firepower or whether the large language model concept has built-in limits. I guess only time will tell!
All quite standard human reaction to controversy about technology. We could all stand to be more humble in recognizing our limitations, generous in granting others recognition of their achievements, and civil in our debates. Alas, we would lose both power and popularity.
Just read an article on the different types of gaslighting, and LeCun's tweets (knowingly or unknowingly - who knows) cover about half of them; the other half includes things like 'religious gaslighting' 😅 That's one of the reasons why the Twitterverse is not for me!
I am reminded of Hanlon's razor: "never attribute to malice what you can attribute to stupidity" (or Bonhoeffer's words on stupidity versus malice. Bonhoeffer wrote for instance "There are human beings who are of remarkably agile intellect yet stupid"). I've noticed that the people with the naive stories about AI generally tend to believe them. That was probably true of LeCun as well. He is now — one hopes — being educated by reality, a reality that some were already aware of.
See https://www.linkedin.com/pulse/stupidity-versus-malice-gerben-wierda/ where a relation is drawn between Bonhoeffer and what Stanislas Dehaene has so brilliantly uncovered and written about human intelligence.
"Against stupidity we are defenseless. Neither protests nor the use of force accomplish anything here; reasons fall on deaf ears; facts that contradict one’s prejudgment simply need not be believed- in such moments the stupid person even becomes critical – and when facts are irrefutable they are just pushed aside as inconsequential, as incidental. In all this the stupid person, in contrast to the malicious one, is utterly self-satisfied and, being easily irritated, becomes dangerous by going on the attack." — Bonhoeffer
Yep. For students of (the history of) AI, the truly frustrating thing is the constant repetition of this pattern. I have also observed that the people who work in this area aren't always as prone to gaslighting as the loud voices that float to the top.
Aside: I once learned from a friend: "there are many ways to rise to the top; one of these is being a lightweight". (Originally in Dutch, "one of these is by lacking weight", as 'weight' carries the double meaning of physical weight and intellectual weight (consequence) in that sentence.)
“Never agree with the aforementioned critics but start mimicking their approach.” — this happened to Dreyfus. His critique was privately listened to and publicly scorned.
"Observe how aforementioned critics gain even more relevance and popularity for being right." — this never happened to Dreyfus. And it doesn't sound right for any reasonable value of 'popularity'. Which critic who (correctly) corrected the fairy tales we *like* to hear has *ever* become popular? I suspect this is — for psychological reasons — unavoidable, a psychological reason being that the brain craves reinforcement and the critics by definition come late to the game.
AI will always be garbage intelligence. All the WEFers in Davos are going to be replaced with chatbots that no one listens to.