It's fascinating that the authors conclude that GPT-2 "may be our best model of language representations in the brain" when really what they have is a 25% correlation between one layer of GPT-2 and one aspect of the data. If they mean "our best (simulated) model," then I guess they might have a point, although it's hard to know what being the best AI model of any cognitive process is worth at this point. If they mean "best model (period)," that's quite the claim.
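To make the "25% correlation" concrete, here is a rough sketch of the kind of encoding-model analysis such papers typically run: take activations from one GPT-2 layer for each stimulus, fit a linear map to the recorded brain responses, and report held-out correlation. The ridge regression, array shapes, and variable names below are my assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of a layer-to-brain "encoding model" analysis.
# Stand-in arrays replace real GPT-2 activations and brain recordings.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
layer_activations = rng.normal(size=(500, 768))   # stand-in for one GPT-2 layer's features
brain_responses = rng.normal(size=(500, 100))     # stand-in for voxel/electrode responses

X_tr, X_te, y_tr, y_te = train_test_split(
    layer_activations, brain_responses, test_size=0.2, random_state=0)

model = Ridge(alpha=1.0).fit(X_tr, y_tr)          # linear map from features to responses
pred = model.predict(X_te)

# Per-voxel Pearson correlation between predicted and actual held-out responses;
# a "25% correlation" would be a mean r of roughly 0.25 on data like this.
r = [np.corrcoef(pred[:, v], y_te[:, v])[0, 1] for v in range(y_te.shape[1])]
print("mean held-out correlation:", np.mean(r))
```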
🔥
It's especially interesting that the correlation they found exists on the level of comprehension and semantics, which is exactly the area where we know GPT-2 and similar models are lost at sea. I don't have to tell you that the "fluency" of the text produced by GPT-2 is purchased mainly by not having to worry about coherence beyond a superficial level; surrealist prose remains the strong suit of Large Language Models.
indeed, the correlation is surely parasitic on other factors
Another good post to point out problems with the AI reporting out there.
You make it your business to point out errors (especially with respect to unsupported claims). But such 'facts' do not convince people. As psychological research has shown, it works the other way around: convictions influence what we accept (or even notice) as 'facts' far more than facts reshape our convictions. AI hype, like many other human convictions — especially extreme ones — is rather fact- and logic-resistant.
What AI hype thus illustrates is — ironically enough — not so much the power of digital AI as the weakness of humans.
Our convictions stem from reinforcement, indeed a bit like ML. For us it is about what we hear/experience often or hear from a close contact. That is not so different from the 'learning' of ML (unsupervised/supervised). That analogy leads ML/AI-believers to assume that it must be possible to get something that has the same 'power' that we do. Symbolic AI's hype was likewise built on an assumption/conviction, namely that intelligence was deep down based on logic and facts (a conviction that resulted from "2500 years of footnotes to Plato"). At some point, the lack of progress will break that assumption. You're just recognising it earlier than most and that is not a nice situation to be in. Ignorance is ...
I'm hoping your opinion has changed in the last 7 months.
Which part?
Hi Gary, another great article, thank you for pulling so many diverse pieces together! The misguided optimism and outright errors regarding the amazing qualities of AI stem from just one thing: conflating a symbol with its meaning! Words, X-rays, videos "mean" something to us when we look at them (or hear, touch...) because we have our own understanding of them that is apart from those symbols themselves.
But no form of AI to date has an innate representation of anything! Innate representation is, by definition, only possible when there is nothing between the system and its environment that would re-represent, abstract, narrow, simplify... the world.
AI's problem is us!!
Saty, welcome to my blog - t.me/natural_language_explainability
Thank you Michael! Will join in a bit :)
Cheers,
Saty
I have spent my whole career researching and building Human Language Technology. "Speech is Just Around the Corner" -- that is, speech recognition software accurate and fast enough to go mainstream very soon -- was something I heard repeatedly starting from the late 80s, and every year since, so that for more than 20 years we lived in what seemed to be perpetual disappointment. And then, suddenly it seems, in the early 2010s, the problem was solved! Dictation is now almost better than human transcription. So, yes, we are not there yet on many, many AI fronts, and I agree that lots of charlatans are making a great deal of unnecessary noise, but we will get there. As for those who are earnestly impatient or naively optimistic? We need them to keep hope alive and the money coming in to finance the important work that is being done.
Yes, deep learning has solved a large part of the speech recognition problem but not all of it. Background noise is still a problem. The cocktail party effect has still not been solved. A human can instantly filter out all noises and other voices in a room while concentrating on the voice of a single interlocutor. Humans can do this even if they're hearing the voice for the first time. Deep learning is not even close to solving this very important problem.
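For what it's worth, the classic engineering attack on the cocktail-party problem is blind source separation. Below is a toy sketch using FastICA on two artificially mixed signals; the signals and mixing matrix are invented for illustration, and real single-microphone separation of unfamiliar voices in a noisy room is far harder than this.

```python
# Toy illustration of the cocktail-party problem as blind source separation.
# Two synthetic "voices" are linearly mixed, then un-mixed with FastICA.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 8000)
voice_a = np.sin(2 * np.pi * 220 * t)              # stand-in for speaker A
voice_b = np.sign(np.sin(2 * np.pi * 137 * t))     # stand-in for speaker B
sources = np.c_[voice_a, voice_b]

mixing = np.array([[0.60, 0.40],
                   [0.45, 0.55]])                   # two microphones, two speakers
mixtures = sources @ mixing.T

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(mixtures)             # estimated sources, up to scale/order

# Each recovered component should correlate strongly with one original voice.
for i in range(2):
    best = max(abs(np.corrcoef(recovered[:, i], sources[:, j])[0, 1]) for j in range(2))
    print(f"recovered component {i}: best match |r| = {best:.2f}")
```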
My two little points: (1) Deep Learning has made far more progress in the last 10 years or so as far as Speech Reco is concerned (at least) than all the big brains, including those of Minsky and Chomsky and Pinker, etc., were able to deliver in the previous 50+ years. (2) No one is stopping a better solution from emerging -- and I think one will in due time. And maybe it will emerge because Deep Learning will be a bridge tech that gives us the tools to come up with something more elegant? Maybe? Probably? I don't know. I am just perplexed by the nature and purpose of this hostility towards data-heavy Deep Learning...
Thanks for the reply. I absolutely agree with the two points you made. DL is without a doubt a valuable technology, but data-heavy DL is a huge problem on the road to AGI in my opinion. A lowly honeybee has fewer than one million neurons in its tiny brain, yet it can navigate and survive in highly complex 3D environments. If it were using something like DL or any kind of gradient-based optimizer, it would need a brain the size of a refrigerator or larger to store representations for all the possible patterns/objects/terrains it might encounter during the course of its life. Obviously, biological systems can perform amazingly well without DL-like architectures. If AGI is the goal, we will need to figure out how they do it.
Yup. Totally agree. And anyone who says that DL is the be-all and end-all is wrong. There is something here that has escaped us for decades and decades (we have been working on AI for a VERY long time, long before DL was a glint in anyone's eye), since we have been trying to make computers do things that humans can do. And nothing is stopping us from continuing our research. Now, the one point I would grant is this: maybe there is much too much money going to DL when some of it should go somewhere else? I don't doubt it. But if that's the case, banging on DL is not the right strategy. A better one would be to start delivering successes and making progress on the non-DL front. My prediction: DL-built tools will help us move away from DL, eventually.
The use of the word 'hostility' is interesting and represents an important aspect of the discussion.
On two sides. (1) By labelling the reaction 'hostile' you do not react to the actual argument. That is part of what we humans do when confronted with stuff that conflicts with our convictions: convictions are important, and if they are 'threatened' — again a word like 'hostility' — we have a natural tendency to protect them; we fight/flee/freeze. I am not perplexed. It is in our nature to protect convictions, because without stable convictions we would not be able to cooperate. The stability of convictions is itself an important 'good', evolutionarily speaking. (2) But it is true that the people confronting ML/AI hype/convictions (like myself) are often frustrated too, and that may indeed lead to 'hostile' behaviour (e.g. in tone). As convictions are 'stable' and 'protected', those who want to change them tend to have to use a form of (verbal) 'violence'.
In other words: even the most friendly critique of a conviction already feels 'hostile' to those who hold it — that is natural, and it is so for good reasons. And the natural inertia of convictions forces those arguing against them into some sort of 'hostility' to break through. Deep convictions lead to more (imagined and real) 'hostility'.
Agreed. And the asymmetries here are striking. On the word hostile, I think it would be fair to accuse Yann LeCun of hostility when he says stuff like
- “mostly wrong”, without giving an argument (it would be fine with an argument, but it’s pure hostility without one)
- fighting a “rearguard” action (esp. when the problem pointed out turns out to be real, and one he himself later acknowledged)
- “never published anything in a peer-reviewed AI journal” (particularly when the claim is provably false)
etc
But nobody (except me) ever calls LeCun on stuff like that.
I don't think it's anything other than stating the observable to say that Marcus is hostile to DL. It doesn't necessarily mean that he's wrong, or even that I don't agree with at least aspects of his stand, but his stance towards DL is certainly not friendly. :-)
no matter how many times I say “we shouldn’t abandon it, but it’s one tool among many”, I will get this reaction. It’d be like if I said the heart isn’t enough to make a complete body, and people said I am anti-heart.
I respect your criticism of DL but, in my opinion, you don't go far enough. Yes, DL is a useful tool for many things, but it is not a tool for solving AGI. To a growing number of AGI researchers, it is utterly useless. The sooner the AGI field, as a whole, comes to understand this, the better things will be. A new learning paradigm that is not based on either function optimization or gradients is coming. It must come. Otherwise, no AGI.
I agree about speech technology. BUT we were surprised by how hard it was! Speech to text should be easy. Nobody thinks AI should be easy. Well, maybe some do.
The interesting thing is that as soon as we got a chance to move away from algorithmic, rule-based "semantic" strategies and started focusing on brute-force, data-driven ones (because we could finally get hold of gobs and gobs of data cheaply and had the power to crunch that data), the speech-to-text problem was solved quickly. This is why, having seen just how fruitless non-brute-data methods were for DECADES, and how fast the black-box data solutions have delivered real solutions, I find it annoying whenever I encounter people shouting -> FLAWED! As if (a) we didn't know that it was flawed, and (b) cost-effective and scalable alternatives existed.
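As a rough illustration of how far the brute-force, data-driven route has come, transcribing audio today can be a few lines against a pretrained model. This is a generic sketch, not anyone's production pipeline; the model name and audio file below are placeholders.

```python
# Minimal sketch of modern data-driven speech-to-text with a pretrained model.
# "meeting_recording.wav" is a hypothetical local audio file; any speech clip would do.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
result = asr("meeting_recording.wav")   # decodes the audio and runs the model
print(result["text"])                   # the transcription, no hand-written rules involved
```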
The problem is still difficult to understand and analyze but with statistics we can fake it quite well. Same for MT.
Not sure why you'd say "fake it". If the AI delivers transcriptions faster and with greater accuracy than a human being (and we most certainly are there), what is fake about the AI? After all, it is called "Artificial" for a reason. ...
Statistics fakes intelligence because intelligence means understanding.
And what does "understanding" mean?
With 4 parameters I can fit an elephant and with 5 make him wiggle his trunk.
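The quip is about flexibility without understanding: with enough free parameters a curve can pass through every data point while capturing nothing real. A minimal toy illustration of that point (my own example, not from the thread) is below.

```python
# Toy illustration of the "fit an elephant" quip: with enough parameters a
# curve can match every training point exactly while capturing nothing real.
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 8)
y_train = rng.normal(size=8)                      # pure noise, nothing to "understand"

coeffs = np.polyfit(x_train, y_train, deg=7)      # 8 parameters for 8 points: exact fit
fit = np.poly1d(coeffs)

print("max train error:", np.max(np.abs(fit(x_train) - y_train)))  # ~0: looks perfect
print("value just outside the data:", fit(1.2))                    # wild extrapolation
```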
Thanks Gary. Sensible stuff. In the 80s we had promises of true (referred to as 'strong') AI where internal workings were to embody world models of increasing completeness. In contrast, useful behaviours based on statistical (later 'big data') processing were called 'weak' AI. It is sobering to read some of the predictions from 80s evangelists. Incredible then and incredible now.
weak AI has done more than many expected but strong AI still seems distant
Yes - just like 'competent systems' turned out to be useful whereas we never built an 'expert system'.
Thanks Gary for the great post. Welcome to my blog - t.me/natural_language_explainability
PS: I've always found this takedown of Chomsky solid and fair: https://norvig.com/chomsky.html
"we don’t yet have any serious candidates"
We do have exciting alternative approaches that merit serious consideration by the mainstream. In my opinion, deep learning is the biggest red herring on the road to AGI in the history of AI. Symbolic AI is a close second. Neither (alone or combined) will play any role in cracking AGI. AGI researchers should immediately abandon deep learning or any kind of gradient-based optimization model and start focusing on winner-take-all, spike-timing-dependent plasticity (STDP) models.
Read this recent paper for a start. This is the true future of AI.
Columnar Learning Networks for Multisensory Spatiotemporal Learning
https://onlinelibrary.wiley.com/doi/epdf/10.1002/aisy.202200179
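Setting the linked paper aside, the core idea behind STDP is a local, gradient-free weight update driven by the relative timing of pre- and post-synaptic spikes. Here is a toy sketch of the standard pairwise rule; the constants are illustrative, not taken from any particular model.

```python
# Toy pairwise spike-timing-dependent plasticity (STDP) rule.
# The weight change depends only on the timing difference dt = t_post - t_pre:
# pre-before-post potentiates, post-before-pre depresses. No gradients involved.
import numpy as np

A_PLUS, A_MINUS = 0.01, 0.012        # illustrative learning-rate constants
TAU_PLUS, TAU_MINUS = 20.0, 20.0     # time constants in milliseconds

def stdp_delta_w(t_pre: float, t_post: float) -> float:
    dt = t_post - t_pre
    if dt >= 0:                                   # pre fired before (or with) post: strengthen
        return A_PLUS * np.exp(-dt / TAU_PLUS)
    return -A_MINUS * np.exp(dt / TAU_MINUS)      # post fired first: weaken

for dt in (-40, -10, 0, 10, 40):
    print(f"dt = {dt:+4d} ms -> delta_w = {stdp_delta_w(0.0, float(dt)):+.4f}")
```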
there's no lack of interesting ideas, but that paper as an example isn't anything like a drop-in intelligence module, just a hint of places we might look. definitely agree though that the mainstream is too narrow, and important ideas may come from elsewhere
Thanks for the reply. There is no doubt in my mind that the solution lies elsewhere. LeCun and other DL pioneers are to be commended for their achievements in DL, an important technology to be sure, but I believe it's time for them to leave the AGI problem to others.
Interesting article. Regardless of the actual architecture/model proposed, it illustrates a far more fundamental point, I think. Second article I have seen that does this.
Yes, foundational issues are a lot more important to AGI than whatever mainstream AI researchers are working on. Even the design of basic components (e.g., visual or auditory sensors) is far more important to generalized intelligence than anything being discussed by either DL or symbolic AI experts. Fortunately, there are unsung researchers out there who think deeply about those things and conduct experiments.
"we don’t yet have any serious candidates" - I do not have an application to install, only a hint about an algorithm. The question is - whether to develop it as it is or think more? If you don't have time for the whole article what about just one section, the first half of it?