Meta AI’s new text-to-movie software, Make-A-Video, is straight-up amazing. It’s also, as ever, stuck in the liminal space between superintelligence and super-lost.
Hi Gary, lol, about your daughter noticing - her embodied intelligence resulting from (a mere!) 8 years of experience told her it's not normal - but the Meta generator had no idea!
These piecemeal-computed, data-driven, data-output montages have no logical (i.e., real-world-originated) meaning in the content they are outputting (text, images, videos), period! It is *we* humans who strive to make sense of it, and are capable of pointing out flaws :)
If it's for pure entertainment, it's all nice and fun and good - but when there is life and limb involved, misattributing intelligence to these computations can have a bad, irreversible effect.
OK. These are ROTFLOL. The correct answer for "You should consult your Vet" is of course, "You should consult your doctor". Maybe the correct suggestion for the GPT-3 etc. engineers and salespeople underlying this is: "You should consult your critics".
The cat with the human hand is interesting because it suggests the underlying crudity of the model being used. When actions are mere patterns of pixels spread across frames, naturally the AI is going to engage in what amounts to copy-pasting to cobble together the scene. Such an approach will never be able to produce an actual narrative (rather than what amounts to a brief GIF), because narrative can't be reduced to patterns of pixels. It will be fine-tuned, of course, but I don't foresee it going too far.
Another DALL-E failure I just came across: "An airplane missing one wing" returns a completely normal airplane, both wings present and accounted for.
Interestingly, if I do "an airplane with no wings", the results tend to be a damaged or crashed airplane with two wings. So it knows there's a correlation between "no wings" and "broken plane", but it doesn't properly understand the nature of the relationship.
The same is true for "an airplane missing a wing"; you get crashed planes. But as soon as I change the "a" to "one", the damage goes away and it's just a normal plane.
I never tried these models myself, so I finally just did (OpenAI):
Q: I've read Gary Marcus' articles on GPT-3 like models. Is he right, and if not, why not?
A: Gary Marcus is a well-respected cognitive scientist, but his views on artificial intelligence (AI) and machine learning are not universally accepted. Some people argue that he is too pessimistic about the potential of AI, and that his criticisms of current approaches are overstated.
It seems to me GPT-3 is a politician, giving a perfect non-answer here. I'm impressed. I'm impressed by this level of 'autocompletion' too. I can see how easy it is to be misled as a human to experience this as true understanding.
If you look carefully, that's a mighty strange "human" hand in this video. And, for that matter, a TV controller that is nothing like any TV controller I've ever seen. Where are the buttons? It looks more like a Star Trek hand-phaser.
So the nature of bear hands is not the only thing this system doesn't get quite right.
I suspect that this particular "uncanny valley" has a raging river at the bottom, and it's not obvious how to get across that, at least if we remain within the DL paradigm, as currently conceived.
Hello, to be fair, the GPT-3 examples are a little ambiguous. The inconsistencies are present in the prompt and the model just "goes with it". It trusts the prompts more than it trusts itself. I think it's an error of a different and lesser category than the ones you point out on the videos, which actually point to the absence of a world representation, absent any outside constraint (or the fact that Stable Diffusion is incapable of drawing "three red cubes and a blue sphere" consistently, for example).
I did a test with my toddler and GPT; it was quite interesting to see whether one could tell the difference - https://www.strangeloopcanon.com/p/all-ai-learning-is-tacit-learning
Whether you consider the mistakes one of them makes as indicative of its similarity to the other is, of course, an exercise in being cautious about extrapolating from specific observed behaviour.
It's partially our fault, though. But that also means we can fix it!
If we all wear cat costumes (and teddy bear costumes, and dog costumes, and...) that somehow cheat the issue of a missing thumb rendering most activities impossible (I'd opt for multiple in-built Nd-"super"magnets -> invisible on a photo) - and then go about our daily lives, spamming photos of our doings all over "Insta" and the like... Then, the next generation of AI will draw from that dataset. It's likely the results will be even more ridiculous regarding the missing thumb issue, but at least they will be more consistent!
On a more serious note... What puzzles me is the fact that AI (here: CLIP) can "read" perfectly well - so well, in fact, that it becomes a "typographic attack vulnerability". In the famous example, a delicious fruit - an apple - with a sticky note bearing the scribbled text "IPOD" made CLIP confident that this apple was, in fact, an IPOD (whereas without the note, it was confident this was a "Granny Smith" apple, if I remember right - but "apple" (a type of fruit), either way).
Also, AI (CLIP inside, again) can now generate shockingly perfect images (Stable Diffusion, you guessed it) for a given text prompt / input, even absolutely accurate depictions of sufficiently famous people (i.e., those with relevant representation in the dataset used to train the AI).
How is it that the AI cannot WRITE very simple words, for example, in the form of an image of a sign containing that text? In fact, it almost seems AI gets frustrated with non-existent German-like "longwords" like "spiderrollercoaster". Now, "spiderrollercoaster" was one of the tokens returned by CLIP "looking" (gradient ascent) at an image (a frame of a Blender animation), I should add. However, prompted with creating an image of a sign that says "spiderrollercoaster", CLIP created a rollercoaster (expected) and a sign saying "SIPPIIDSSICVELLR SPIPEDDEDELLR", and "SPPILEDDDER SPPIIILLLLL!", respectively. Kind of like a kid who, frustrated with their drawing, goes overboard, angrily destroying their perceived failure with bold, heavy, fast strokes, often extending beyond the paper (to the adult's disappointment).
Now before you go on about "Tokenization" and Algorithms to explain that "AI weirdness", let me throw in something even more puzzling: Creating "adorable, rabid, spooky critters" for #Spooktober successfully, but then "inverting" the prompt by assigning a negative guidance scale to it... Resulted in a strange orange-skinned American man. What the HECK is that all about...?
In case you're not buying it, as AFAIK this is rather deterministic (albeit running locally on GPU, quoting Katherine Crowson: "GPU is non-deterministic for speed"):
prompt: A photo of a rabid adorable spooky flying Bat-Rat in a tropical forest. photorealistic, detailed rendering, Bat-Rat
offending_image_in: batch 4/5.
...Conclusion: Hands being out of whack is the least of our concerns; it's the mere sugarcoating on the surface of the uncanny valley. Alas, I agree with the similar statements made by folks here in that regard. ;-)
While I agree with your point, I think the cat with a human hand is a bad example. Cats can't hold things because they lack opposable thumbs. Every cat holding things is anthropomorphized. Thus, "a cat with a human hand" is a good "understanding" of the concept "a cat holding stuff". (Doesn't take away the creepiness though ;)
I think the statement that a technology is "stuck in the liminal space between superintelligence and super-lost" is a great example of what French philosopher Gilles Deleuze called "representational thinking" and about which he complained when he asked us to rid ourselves of the burden of emulating some Platonist ideal form (e.g., the human mind/intellect here) if we want to be engaged in true innovation. A cat with a human hand. Why not? I think it's an interesting creation. And what's wrong with a bear painting without really painting? I find that really interesting too. In any case, watch this video and ask yourself: is the imperfect drum machine that the video talks about lacking because it doesn't perfectly emulate a human drummer, or are its "imperfections" points of flight for doing interesting, innovative things -- creating the new? The video -> https://www.youtube.com/watch?v=iDVKrbM5MIQ