Meta AI’s new text-to-movie software Make-A-Video is straight-up amazing. It’s also, as ever, stuck in the liminal space between superintelligence and super-lost.
Hi Gary, lol, about your daughter noticing - her embodied intelligence resulting from (a mere!) 8 years of experience told her it's not normal - but the Meta generator had no idea!
These piecemeal-computed, data-driven, data output montages have no logical (ie. real-world-originated) meaning in the content they are outputting (text, images, videos), period! It is *we* humans who strive to make sense of it, and are capable of pointing out flaws :)
If it's for pure entertainment, it's all nice and fun and good - but when there is life and limb involved, misattributing intelligence to these computations can have a bad, irreversible effect.
OK. These are ROTFLOL. The correct answer for "You should consult your Vet" is of course, "You should consult your doctor". Maybe the correct suggestion for the GPT-3 etc. engineers and salespeople underlying this is: "You should consult your critics".
🤣🤣🤣
lol
The cat with the human hand is interesting because it suggests the underlying crudity of the model being used. When actions are mere patterns of pixels spread across frames, naturally the AI is going to engage in what amounts to copy-pasting to cobble together the scene. Such an approach will never be able to produce an actual narrative (rather than what amounts to a brief GIF), because narrative can't be reduced to patterns of pixels. It will be fine-tuned, of course, but I don't foresee it going too far.
Another DALL-E failure I just came across: "An airplane missing one wing" returns a completely normal airplane, both wings present and accounted for.
Interestingly, if I do "an airplane with no wings", the results tend to be a damaged or crashed airplane with two wings. So it knows there's a correlation between "no wings" and "broken plane", but it doesn't properly understand the nature of the relationship.
The same is true for "an airplane missing a wing"; you get crashed planes. But as soon as I change the "a" to "one", the damage goes away and it's just a normal plane.
I never tried these models myself, so I finally just did (Open AI):
Q: I've read Gary Marcus' articles on GPT-3 like models. Is he right, and if not, why not?
A: Gary Marcus is a well-respected cognitive scientist, but his views on artificial intelligence (AI) and machine learning are not universally accepted. Some people argue that he is too pessimistic about the potential of AI, and that his criticisms of current approaches are overstated.
It seems to me GPT-3 is a politician, giving a perfect non-answer here. I'm impressed. I'm impressed by this level of 'autocompletion' too. I can see how easy it is to be misled as a human to experience this as true understanding.
and of course there is a lot of corpus data like that
and “misled” is exactly the right word
If you look carefully, that's a mighty strange "human" hand in this video. And, for that matter, a TV controller that is nothing like any TV controller I've ever seen. Where are the buttons? It looks more like a Star Trek hand-phaser.
So the nature of bear hands is not the only thing this system doesn't get quite right.
I suspect that this particular "uncanny valley" has a raging river at the bottom, and it's not obvious how to get across that, at least if we remain within the DL paradigm, as currently conceived.
Exactly!!!
And honored that you read the piece. I always found your cascade correlation work to be provocative (even if I didn't think it gave me the answers I was looking for).
Hello, to be fair, the GPT-3 examples are a little ambiguous. The inconsistencies are present in the prompt and the model just "goes with it". It trusts the prompts more than it trusts itself. I think it's an error of a different and lesser category than the ones you point out in the videos, which actually point to the absence of a world representation, absent any outside constraint (or the fact that Stable Diffusion is incapable of drawing "three red cubes and a blue sphere" consistently, for example).
Well, that error (blindly trusting the prompt) is a fundamental issue with LLMs: as Gary has been pointing out for years, they merely regurgitate stochastic variations on their training set, with no sense of what's true or false, what's good advice or bad advice, or how any of it relates to something called "the real world." That's why they can't be trusted. In the "flying pig" example, the AI has clearly erroneously inferred (based on associations with key phrases in the prompt, "what should I do if" and "pig") that prosaic veterinary advice is in order, and so produced a response based on the incorrect link made to that part of its training set. In the second response, for instance, the AI is clearly imitating advice for what to do if a pet parakeet is flying around and refuses to land. All the standard explanations for these kinds of answers ("Maybe it was trying to be funny" or "the prompt didn't make sense anyways") wrongly assume that the AI distinguishes between what makes sense and what doesn't, or knows the difference between ordinary conversation and surreal comedy. It doesn't, and that's why it's merely simulated intelligence.
Thanks. That makes sense, and some kind of pareidolia might definitely have crept in when I say that "it trusts itself" or things to that effect. I think it can still be true that there is no obvious logical answer to prompts that make no sense, and the "flying pig" category of mistakes is harder to precisely define (what would be the right answer? the only winning move at that point is not to play, which may not really be an option for the software). In other words, I'm indeed convinced that the "makes no sense" category is not relevant to the AI, because it is as you say a senseless stochastic regurgitator, but if I wasn't convinced and thought that it did have a representation of the world (or, which is maybe closer to the generic AI believer's point, that there is no qualitative difference between a world representation and a sufficiently good stochastic regurgitation), I don't think these errors would prove to me that something was wrong. I'd better stop now, I'm knee deep in conjecture, but my main point is that we have to find arguments that cannot be easily explained away within the "AI is conscious" paradigm, and absurd prompts do not qualify in my opinion (although they are often funny, which is something)
I had done a test of my toddler and GPT; it was quite interesting to see if one could tell the difference - https://www.strangeloopcanon.com/p/all-ai-learning-is-tacit-learning
Whether you consider the mistakes one of them makes as indicative of its similarity to the other is of course an exercise in being cautious wrt extrapolation from seeing specific behaviour.
It's partially our fault, though. But that also means we can fix it!
If we all wear cat costumes (and teddy bear costumes, and dog costumes, and...) that somehow cheat the issue of a missing thumb rendering most activities impossible (I'd opt for multiple in-built Nd-"super"magnets -> invisible on a photo) - and then go about our daily lives, spamming photos of our doings all over "Insta" and the like... Then, the next generation of AI will draw from that dataset. It's likely the results will be even more ridiculous Re: the missing thumb issue, but at least they will be more consistent!
On a more serious note... What puzzles me is the fact that AI (here: CLIP) can "read" perfectly well - so well in fact, that it becomes a "typographic attack vulnerability", as in the famous example of a delicious fruit - an apple - where a sticky note with the scribbled text "IPOD" made CLIP confident that this apple is, in fact, an IPOD (whereas without the note, it was confident this was a "Granny Smith" apple, if I remember right - but "apple" (a type of fruit), either way).
Also, AI (CLIP inside, again) can now generate shockingly perfect images (stable diffusion, you guessed it) for a given text prompt / input, even absolutely accurate depictions of sufficiently famous people (i.e., relevant representation in the dataset used to train the AI).
How is it that the AI cannot WRITE very simple words, for example, in the form of an image of a sign containing that text? In fact, it almost seems AI gets frustrated with non-existent German-like "longwords" like "spiderrollercoaster". Now, "spiderrollercoaster" was one of the tokens returned by CLIP "looking" (gradient ascent) at an image (a frame of a Blender animation), I should add. However, prompted with creating an image of a sign that says "spiderrollercoaster", CLIP created a rollercoaster (expected) and a sign saying "SIPPIIDSSICVELLR SPIPEDDEDELLR", and "SPPILEDDDER SPPIIILLLLL!", respectively. Kind of like a kid who, frustrated with their drawing, goes overboard, angrily destroying their perceived failure with bold, heavy, fast strokes, often extending beyond the paper (to the adult's disappointment).
https://twitter.com/zer0int1/status/1576319002191663105
Now before you go on about "Tokenization" and Algorithms to explain that "AI weirdness", let me throw in something even more puzzling: Creating "adorable, rabid, spooky critters" for #Spooktober successfully, but then "inverting" the prompt by assigning a negative guidance scale to it... Resulted in a strange orange-skinned American man. What the HECK is that all about...?
https://twitter.com/zer0int1/status/1576591456814256129
In case you're not buying it, as AFAIK this is rather deterministic (albeit running locally on GPU, quoting Katherine Crowson: "GPU is non-deterministic for speed"):
batch_size: 5
cfg_scale: -8
ddim_eta: 0
ddim_steps: 100
width: 960
height: 640
n_iter: 5
prompt: A photo of a rabid adorable spooky flying Bat-Rat in a tropical forest. photorealistic, detailed rendering, Bat-Rat
sampler_name: k_euler_a
seed: 1766403044
target: txt2img
offending_image_in: batch 4/5.
...Conclusion: Hands being out of whack is the least of our concerns; it's the mere sugarcoating on the surface of the uncanny valley. Alas, I agree with the similar statements made by folks here in that regard. ;-)
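One possible reading of the `cfg_scale: -8` setting in the comment above, assuming the sampler applies standard classifier-free guidance (an assumption; the commenter does not say which implementation was used): the guided noise prediction is the unconditional prediction plus the scale times the difference between the prompt-conditioned and unconditional predictions. A negative scale then steers each denoising step away from the prompt direction rather than toward it. The function name `guided_noise` and the toy numbers below are hypothetical, chosen only to illustrate the arithmetic, and do not come from any real model.

```python
def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: eps = eps_uncond + scale * (eps_cond - eps_uncond).

    eps_uncond: noise predicted with an empty prompt
    eps_cond:   noise predicted with the text prompt
    scale:      guidance scale (cfg_scale); negative values push away from the prompt
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy 2-D "noise predictions" (hypothetical values, not from a real model).
eps_uncond = [0.0, 0.0]
eps_cond = [1.0, 0.5]

toward = guided_noise(eps_uncond, eps_cond, 8.0)   # usual positive guidance
away = guided_noise(eps_uncond, eps_cond, -8.0)    # inverted guidance, as in cfg_scale: -8

print(toward)  # [8.0, 4.0]
print(away)    # [-8.0, -4.0]
```

Under this reading, the "inverted" image is whatever the model considers least like the prompt along that one direction, which need not be a meaningful opposite to a human viewer.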
While I agree with your point, I think the cat with a human hand is a bad example. Cats can't hold things because they lack opposable thumbs. Every cat holding things is anthropomorphized. Thus, "a cat with a human hand" is a good "understanding" of the concept "a cat holding stuff". (Doesn't take away the creepiness though ;)
I think the statement that a technology is "stuck in the liminal space between superintelligence and super-lost" is a great example of what French philosopher Gilles Deleuze called "representational thinking" and about which he complained when he asked us to rid ourselves of the burden of emulating some Platonist ideal form (e.g., the human mind/intellect here) if we want to be engaged in true innovation. A cat with a human hand. Why not? I think it's an interesting creation. And what's wrong with a bear painting without really painting? I find that really interesting too. In any case, watch this video and ask yourself: is the imperfect drum machine that the video talks about lacking because it doesn't perfectly emulate a human drummer, or are its "imperfections" points of flight for doing interesting, innovative things -- creating the new. The video -> https://www.youtube.com/watch?v=iDVKrbM5MIQ
It's not about whether it's interesting - it's about whether it was intentional. The generator is computing pixel sequences frame by frame - which happen to look like cat eyes, remote, hand, etc., to us, not to the "AI" that calculated the pixel blobs.
But why are we looking for intention? Why not see what we can do with what we get? The problem with Marcus-style critique is that it misapplies to technology the Popperian method of falsification, which is completely valid when it comes to scientific theories (all you need is a counterexample of something predicted for the theory to be rejected or at least rendered suspect). Yes, there are failures here and there, sure, but so what? In tech, you can't talk about failure unless you talk about what problem the tech was supposed to solve, and whether or not it solved it. But Marcus does not tell us what problem the tech is supposed to be solving: he points out an imperfection, points to the AI, points at us humans, and declares: No -- not good enough! But not good enough for what, and what is good enough? I dip into this a bit more here: https://tinyurl.com/ycxcr458
I will read your article later, thanks for the link.
You know, you completely missed the point of every Gary post here, and my comment to this one, and my reply to yours.
The issue is this: there is ZERO intelligence behind these generators. But the misunderstanding is that there is - that the generators are even smarter than us, are creative, can discover drugs, can 'do science', can solve humankind's biggest problems, etc etc. That is a dangerous and delusional misbelief. That's what I pointed out.
Whether or not the results are interesting is irrelevant. If they are, that's cool - but it was not by design.
Also, to suggest that we should ignore the 'bad' results is beyond disingenuous!! The right thing to do is to ask why they are so bad.
Popper pointed out today's absurdity - by noting if we represent the world for the machine, there is no way we can expect the machine to be as intelligent as us.
Also - there is nothing wrong with using systems to provide us art and design inspirations, suggest sentence completions, etc. Let's stop calling it 'AI', call it 'IA' (Intelligence Augmentation) instead, because that's what they are doing.
By the same token, I think you are not grasping my point: No one knows what "Intelligence" means, or what "intention" means, or what "consciousness" means. Chomsky would be the first to tell you that we have made ZERO progress coming anywhere close to understanding what comes into play when we decide to pick up a pencil, let alone why we say what we say. So, talking as if "Intelligence" is a well-defined concept or as if we know what "intention" means is intellectually shaky.
Second, Marcus makes it sound as if the many serious people who are working on these systems are pretending that AI can do everything and do everything perfectly. But who is saying that, really? To be sure, hypesters and shysters will say things that are overblown and false, and such folk will always be with us, but to make it sound like AI creators are touting AI to be perfect, flawless and omnipotent is to misrepresent reality.
As for "bad results" -- I never said ignore. Please read my article before you make the serious accusation that I am being "beyond disingenuous". Bad results need to be taken very seriously, and I propose the outlines of how (via transparency, accountability, and enforceable policy).
And as for machines not being as "intelligent" as us: it's like saying that apples don't taste as orangy as oranges. A $1.99 calculator does math better and faster than you and me. Is it more "intelligent"? Well, yes and no. Or better yet: What do you mean by the question and in any case, who cares?
But please read the article before you toss in accusations. It's better to engage knowing where each one of us really stands.
I would call your attention to things that e.g. Nando de Freitas (exec at DeepMind) and the CEO of OpenAI have said, implying that with a little more scaling we will be done.
I did read your article, every word.
Above, you said this: 'Yes, there are failures here and there, sure, but so what? '. It's not failures here and there, it's serious flaws in understanding that show how terrible this (LLM-based AI) really is. And now the proof of this is visual, with the outputs being images and videos. The results might be amusing, surprising, insightful, interesting, shocking etc., but those are all our interpretations of computed data. A different approach is needed, which is the title of Gary's blog.
The core point is, data (text corpus, images, videos) alone is insufficient for achieving AI that we can trust. The $1.99 calculator is obviously not intelligent; LLMs aren't intelligent either, in exactly the same way. That's the point you are missing.
When a system is able to progressively reach the higher levels in Bloom's taxonomy [eg https://cdn.vanderbilt.edu/vu-wp0/wp-content/uploads/sites/59/2019/03/27124326/Blooms-Taxonomy-650x366.jpg], that can be taken to possess cognitive skills similar to humans - after all, that taxonomy is what educators use, to assess human mastery. Today's AI is still at the bottom-most level in that progression - that is the issue, that is the purpose of Gary's posts (to point this out).
Your reasoning applies perfectly to the creation of art but not technology. Yes - when innovators in art create new forms of art, they are often perceived by audiences as inferior or even degenerate until someone finds them interesting - no harm in trying. But when innovation happens in science or technology, it can lead to harm before regulations are applied; for example, the development of genetics led to racial inferiority ideologies. So the question of "to build or not to build" must be asked much sooner in science and technology than in artistic creativity. Or, considering the many technologies that fall in the middle, like game development and AI, the consequences can be irreversible: even if regulations follow, it may be too late. Or it may be too addictive to turn back, like opioid pain medications. There are just too many examples of such industrial "progress" to wait for the acquisition of good taste.
Very well said