
Human language has two main components. There's one component for generating or parsing its structure (sometimes called the E system, for expression) and another for linking words to their meanings (sometimes called the L system, for lexical). Birds, for example, have extensive syntactic systems for generating and recognizing bird songs. Dogs, meanwhile, can learn to associate words with actions or objects. Human language extensively combines the two, and having a syntactically structured language is much more powerful than having either component alone. The placement of a word within the syntactic structure can dramatically alter the meaning of a sequence of words. It's rather obvious that these AI systems don't get this.

If you have ever diagrammed sentences in any human language, you'd realize that there is a structure of words and phrases modifying other words and phrases. Natural languages allow a deep level of expression with these modifiers modifying modifiers. You can extend expression, even within a sentence, arbitrarily. Humans can learn this from a training set because their brains have this structure built in, just as they have built-in components for thinking about location, time, meaning, association, sequence, variation, change, and so on. I seriously doubt that a system with limited neural depth and none of those components built in can do anything like this.
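
As an aside, here is a toy sketch of the point (purely illustrative, nothing these models actually do): the same three content words can be nested two different ways, and it's the nesting, not the words, that decides who is riding whom.

```python
# Toy sketch of constituency: a phrase is a head word plus an optional
# (nested) modifier phrase. Purely illustrative; not how any real parser
# or image model represents language.

def phrase(head, modifier=None):
    return {"head": head, "modifier": modifier}

# "an astronaut riding a horse": the participial phrase modifies "astronaut",
# and "horse" sits inside that modifier.
astronaut_riding_horse = phrase("astronaut", phrase("riding", phrase("horse")))

# "a horse riding an astronaut": same words, inverted nesting,
# so rider and ridden swap roles.
horse_riding_astronaut = phrase("horse", phrase("riding", phrase("astronaut")))

# A bag-of-words view cannot tell the two apart...
assert {"astronaut", "riding", "horse"} == {"horse", "riding", "astronaut"}
# ...but the structures differ, and that difference carries the meaning.
assert astronaut_riding_horse != horse_riding_astronaut
```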

If you look at the published examples, it is rather obvious that they can't. Reversing the order of two nouns with respect to a preposition shouldn't stymie a system this way. I think these systems might be useful the way AppleScript is useful: it looks enough like English to be relatively easy to understand, but it is miles away from natural language on closer experimentation.


Bravo! The whole hype around supposed AGI rests on very squishy notions of "intelligence". Take Ambrogioni's reaction, for example: a big part of human intelligence is imagination --- imagining the impossible, imagining the absurd, making up stories, fantasizing about matters mundane and profound, etc. So how does failing at imagination prove the existence of general human-level intelligence? What Ambrogioni was saying is, at best, a very narrow and lopsided understanding of intelligence, and at worst reflects a tunnel vision on what AI is and can be. It is essentially path dependence on bigger models, and this path dependence is really sucking up all the air for what really matters in understanding and developing human-level AI.

Understanding and knowing what words mean are central elements of human-level intelligence, and we still do not seem to have those in DALL-E and Imagen.

May 28, 2022·edited May 28, 2022

Indeed. The system has no clue what any of it means - the words, or the imagery. Also, there is no creating (of the imagery), only computation.

Stepping back, it's clear why - there is this disembodied algorithm that has no real-world, first-hand experience - of feeling gravity, looking at the moon, understanding why astronauts exist (and why they are 'cool'), riding on a real or toy horse, the absurdity of a horse riding a human, blocks, stackability, colors... NOTHING. Instead, it is trained using gobs of text DATA generated by humans, and image DATA that is labeled by humans. The data processing can be sophisticated, but it's still data-based, still computation.

The physical world has matter, energy, information (configurations/structures/assemblies...) - out of these result phenomena. Structures -> phenomena. Embodied life is also structures -> phenomena (from the sub-cellular level to the organ to the body to the collective levels). Intelligence is a set of phenomena that help the body survive. Unless A(G)I is based on similar principles, we will continue to have top-heavy (i.e., human-derived) second-hand 'intelligence' that gets better at fooling us, but is still inadequate at the most fundamental level: lacking understanding.

Paraphrasing the comment from Prafulla (DALL-E 2 co-author): “Our aim is to create general intelligence. Building *embodiments* (unlike DALL-E 2) that *experientially* connect vision and language is a crucial step in our larger goal of teaching *such embodied* machines to perceive the world the way humans do, and eventually developing AGI.”


Sorry to burst your bubble, but what the human brain does is also "just computation" - just a ballpark 1000x more of it than is used by DALL-E or Imagen. Nobody I've seen is claiming these systems possess an understanding of all the context of human experience, which, of course, is necessary for fully understanding human language. What I do see is researchers rightly impressed by what has been achieved with the computing power of a literal bird brain. And a small bird brain at that.

If someone were able to train a common house mouse to perform something like this task, drawing more or less sensible sketches from simple verbal prompts, I think you would be suitably impressed. Yes? Well, that is loosely analogous to what you are seeing with DALL-E 2 and Imagen.


The bird analogy doesn't hold either - because there is no equivalence between ML systems and real brains. It's an insult to bird brains to be compared to simpleton ML dataflow graphs :)

The C. elegans worm has a fully mapped connectome (decoded a long time back), with just over 300 neurons. Guess what - building a synthetic one with those connections turns out not to replicate worm behavior, because those 300-odd neurons 'turn out' to do more than simply integrate and fire.

Biological life is part of the environment it is in; it's evolved to be. Sensory processing and communication (including non-verbal signaling) are all about negotiating the environment. Human language is simply a part of this scheme. Words and word combinations have zero inherent meaning except in context. When you talked about bursting my bubble, you were referring to a physical act. To a language-model AI, it would mean nothing. DALL-E 2, Imagen, DD5 and friends are good at slinging pixels together, based on labels and word associations - with zero grounding (notice that grounding is a physical thing - as is 'notice' - a physical act).

The history of AI is pockmarked with deep and narrow wins (check out the physical refs in that phrase!) that add up to nothing substantial. We have seen this movie before.


I'll add that I am the first to point out the limitations of systems like DALL-E and Imagen and GPT-#. They do not actually "understand" language. How do you understand "blue" or "water" if you've never taken a swim, walked on a beach, or looked up at the sky? Understanding language in the human sense requires human experience of the world. But humans also fail to understand each other whenever we lack the requisite common experiences connected to the word sequences used to express a concept. You cannot explain "blue" to someone who has been blind from birth. They can learn associations of how sighted people use the word, but they still do not truly understand.

Humans take in roughly 30 terabytes of raw data per day. We would need training sets of roughly 40 petabytes to give an AI the equivalent experience of a 3-year-old. But we would want the AI to operate as an interactive agent with an environment anyway, not just train on static data. And that would take 3 years, unless we also have the compute to provide a rich simulated environment that runs faster than reality.
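
To make the arithmetic explicit (a rough check, taking the 30 TB/day figure above as given):

```python
# Back-of-the-envelope check of the petabyte figure above.
TB_PER_DAY = 30          # the rough intake figure quoted above
DAYS_PER_YEAR = 365
YEARS = 3

total_tb = TB_PER_DAY * DAYS_PER_YEAR * YEARS   # 32,850 TB
total_pb = total_tb / 1000                      # ~33 PB
print(f"~{total_pb:.0f} PB")  # tens of petabytes, same ballpark as the ~40 PB figure above
```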

We have no computing hardware capable of taking in that much data or providing such a simulated environment, just as we lack hardware with the storage and compute power of the brain. For these reasons, pointing out the limitations and failures of current AI has no relevance to whether they represent important steps or milestones along the way to AGI. When the compute hardware is available, AGI will not be far behind.

If you want to argue that it will require even *more* compute than a ballpark 1-10 exaflops and more than about 100 trillion parameters, by all means show your work. Make your case for where the information is encoded in the brain, show your math, provide your estimate. What are your upper and lower bounds? If it takes a thousand times more compute than my estimate, that only delays AGI by another 10-15 years.
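
For what it's worth, the 10-15 year figure falls out of a simple doubling argument (assuming, as a rough hedge, that usable compute keeps doubling every 1 to 1.5 years):

```python
import math

# A 1000x compute shortfall takes about log2(1000) ~= 10 doublings to close.
shortfall = 1000
doublings = math.log2(shortfall)   # ~9.97

for period_years in (1.0, 1.5):    # assumed doubling period, not a measured value
    delay = doublings * period_years
    print(f"doubling every {period_years} yr -> ~{delay:.0f} years of delay")
# -> roughly 10 and 15 years, matching the range above
```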


The flaw in your logic is that you take the PSSH (the physical symbol system hypothesis) as physical law - it's a hypothesis, and it's turning out to be questionable. You did start out by identifying the problem: the lack of physical experience on the part of these systems. No amount of computation will ever fix that. And physically embodied machines wouldn't need computation to stand in for experience; they would acquire it directly.

Jun 2, 2022·edited Jun 2, 2022

Lol, the good old scale argument - sorry to burst yours (which, btw, is a physical thing - which you'd 'get' (also a physical thing) if you had a body - but if you don't, not so much). Keep dreaming about your singularity. It's not computation at all, in any digital sense. One thing not found in nature: digital electronics. One thing (and the only thing) used to cheaply imitate natural phenomena: digital electronic computation.

No, a GPU that is a trillion times larger, or Keras version 100000, or exa-peta-jiga bytes of data won't make your system any more intelligent - it will be just as clueless, only faster.

Newsflash - a real neuron is not a 'linear summation followed by sigmoid' function.
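
For readers who haven't seen it spelled out, this is the textbook unit being dismissed here (the standard definition, not code from any of the systems under discussion):

```python
import math

def artificial_neuron(inputs, weights, bias):
    """Textbook artificial neuron: a weighted sum of inputs passed through
    a sigmoid. Real neurons involve dendritic computation, neurotransmitter
    chemistry, timing effects, and much more, none of which appears here."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))

print(artificial_neuron([0.5, -1.2, 3.0], [0.8, 0.1, -0.4], bias=0.2))
```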


I am well aware that a neuron is not a linear summation followed by a sigmoid function. That does not negate reasonable estimates of equivalent computation. Waving your hands about digital vs. non-digital does not help your case. Digital can be implemented in analog. Analog can be simulated to arbitrary precision by digital. Unless you think some sort of "magic" lurks within the human brain, the brain is a physical machine, and reasonable estimates can be made regarding the equivalent computation involved in cognition - certainly enough to set *lower* bounds. Obviously the computational power of DALL-E and Imagen falls far short of the computational power of the human brain. It is completely unreasonable to expect them to have full, nuanced human comprehension. They do demonstrate remarkable capabilities within the narrow task set for them.

Neurons perform a great many biological functions unrelated to cognition, but which undoubtedly impact the behavior of the neurons. It is no surprise that a simulation based solely on connections and empirical equations for voltages and concentrations of neurotransmitters fails to precisely replicate worm behavior or any other biological system.

Your dismissal of "exa-peta-jiga" bespeaks a failure to understand the significance of additional computation. Will simply scaling up DALL-E achieve AGI? No. Will it be able to learn ever more subtle nuances relating text to images? Absolutely - even with no changes to architecture other than the numbers of neurons, connections, and layers. Nobody is claiming the architecture of DALL-E is itself a general AI. It is a milestone that demonstrates we are on track.

Jun 3, 2022·edited Jun 3, 2022

No, simulation is a non-starter. Good luck with that. The point is, real neurons behave without any simulation. Simulation will forever have a scale problem, and it is bounded by our knowledge of the underlying mechanisms. Meanwhile, real neurons display phenomena on account of their chemical nature and physical configuration; nothing is simulated.

So the question isn't whether neural activity can be simulated; it's that no simulation is needed in the real thing.

A single neuron is not simply a wire that conducts a simple signal - and that's just the start of it.

A so-called 'neural network' is supposed to be a simulation of biological neurons. But that didn't quite work out - which is what this blog post is about.

What is magical thinking is ignoring reality: creating approximations (nothing wrong with that part) but then claiming equivalence with the real thing.


Our words become meanings for our listeners, but our words don’t know themselves. Sound cannot hear itself.


Found this while googling to find out why I keep hearing "astronaut riding a horse" as the common prompt suggestion (and apparently got my question answered).

As for the discussion here, it's news to me, and I'm not sure I believe it, that people widely thought these systems could fully understand English grammar or were literally like a human intelligence. But regardless, it seems to me, based on my own experimentation, that the horse always being the thing ridden under basic grammar is the real accomplishment.

As this article shows, it normally mixes up which thing is being described, and not all languages assume the noun before a verb is the subject, as this article seems to assume throughout (and frankly, many of the intentional prompts I use don't follow that order either). In some languages there's something in the nouns themselves that indicates it (English often lacks this, so we usually rely on word order), and with at least some verbs (speaking as an English major), either order is acceptable. It seems like the system has noticed that horses consistently appear in relationships with things that in other contexts appear alone, like "a man riding a horse" while another image will have just "a man." That alone is the milestone, is it not?

And that in turn implies it does understand the basic syntax of "riding," or else the original prompt should have had the astronaut as the thing being ridden about as often as common prompts like "orange cube over blue sphere" get the colors and shapes mixed up. It probably treats "riding" no differently than a noun ("the rider is the thing above") rather than understanding the verb per se, from what I've seen, but that is still pattern recognition that goes way beyond what we had before this technology.

In my prompts I automatically assumed the computer wouldn't understand grammar (if you've bought an Alexa you already know we're not there yet), and I usually dispense with English grammar because it's kind of backward about what's the most important thing to know. If you want to describe a building that's, let's say, orange, with gold dots, the short way in English is "gold-dotted orange building." I would prompt "building orange, dots gold". (And even then it gets confused a good percentage of the time, as I'd expect.)


So your point is that "some interpreter of language" is not capable* if it doesn't understand your wording (*not sentient; *not intelligent; replace at will)? Try an idiom like "same difference": the AI might also have trouble with that. Same as humans with autism or schizophrenia, where this type of "literal interpretation" or "inability to understand the encoded [abstract] meaning" is referred to as "concretism" [a psychology term]. Are these people not sentient, either?

Was that provocative statement successful? Excellent:

Plot twist: I am not trying to argue about AI being sentient (I assume it probably isn't, right now, but read on for my entire train of thought).

I don't even know if you are sentient, human. Just because I take my own experience as the "ground truth" and alas conclude "I am sentient" (you can look at it like those university-level mathematics functions where you had to guess a starting value or be stuck, unable to get anywhere) doesn't mean I can make any assumptions about your sentience.

It's wild guessing, since any hard proof of what defines sentience is absent.

My *hypothesis* is: Sentience is a function of complexity.

I am very certain that I was not sentient when I was a few-cell organism just after my mother conceived me; and I only formed initial abstraction ability around the age of four, which - science says - is the age when kids become able to lie (and no longer end up crying in defeat because they can't lie to a peer about the whereabouts of a sweet).

But my first conscious memories are more of primary school; I can't link any memories with great certainty to the age of four, even. Still, I consider myself sentient now; that still stands.

I believe sentience requires (1) working memory, both long- and short-term; and (2) a certain (and very high) amount of complexity.

The transition from a non-sentient egg and sperm cell just after their union to a sentient human being happened "somewhere, gradually, not abruptly". I became sentient because the complexity of my "neuronal network" had increased "sufficiently" (forming very complex interconnected networks and all that stuff), whereas what exactly counts as "sufficient" is undefined / unknown to me.

I have no idea how to compare an adult human brain to the current AI in terms of complexity.

It seems, however, by my "humble guesswork", to be lacking in complexity so far; I'd assume LaMDA is "more like one year old, maybe two, compared to a human". But this is wild guessing and based on "gut feeling", whereas my gut certainly lacks the complexity of sentience on its own...

As is any guessing about your sentience, as above, admittedly; but for the sake of not being too provocative, let me assume you are sentient, dear author. In that case, I conclude you are most likely subject to a case of the "AI effect". ;-)

Also, a wild guess of similar absurdity, but easy to validate via your response: Are you monolingual, by chance...?

Because you are making all those assumptions in a... Well, very primitive language.

Primitive can be good, to facilitate communication between a vast majority of "comm. partners" (I am forever glad and grateful the "language of science" is NOT French - I don't speak French!), but it also has its caveats; you just cannot get "to the depths" in your communication.

Or - dog beware! - at worst, your entire thinking is limited to and by one such simple / primitive language. I have seen this be the issue in many arguments about AI bias, too; generating stereotypes based on gender, for example.

And clearly, English is to blame here. The actual human language is to blame. English is why you have to say "a female lawyer" or whatever linguistically unnatural construction to "flex the AI into a diverse output".

In Russian, due to much more complex grammar, this disgrace doesn't even happen. Heck, even the last name of a person indicates immediately if that person is "the husband" or "the wife". Does DALL-E speak Russian? (Please note: I am not Russian. But here's to hoping we keep politics and malign people out of this discussion, anyway - and zoom in / focus on linguistics & AI as well as psychology / philosophy...).

That being said, I am trilingual, and I have used RuDalle because it allows much more sophisticated prompt engineering (and better results) than is possible with any English-based implementation of CLIP.

Because I can say "krassiwiy mashina" - meaning "beautiful car" - while changing the ending of the first word (the adjective "beautiful") to "-iy".

...Which is incorrect in natural language, because it classifies the car as 'male', but it gives me more sports cars in the AI generations; and if I prompt for "krassiwoye mashina" (neuter gender), I get nice European-style 'tiny cars'. And when using the grammatically correct Russian "krassiwaya mashina", I get more 'not so aggressive looking' (non-sports) cars.

(A car is female in Russian "by definition" - and every noun has a gender, in case you haven't guessed at this point).

I'd be delighted if you worked on *your* human "natural language processing" and then wrote another post; I'd read it with as much curiosity as this one!

PS: And to end on "good terms" (because anything else would hinder thinking outside the box and critically analyzing what I just said via inducing reactance):

I want to put emphasis on our mutual agreement, concluding "AI is most likely not sentient at this point". ;-)

PPS: Nevertheless, I am also gonna go ahead and treat every AI (as every human child) with dignity, "just in case they develop sentience and remember that / me" - and become world leaders with nukes or some kind of supercharged superpower with whatever intent.

Although there is only proof of malign intent in humans so far - but AIs are currently "gaming the system" in ways that could result in anything from harming humans to the entire eradication of mankind, so "eradication of humanity as collateral damage of having found the most rewarding way to a goal" is possible - and a thing to really, truly worry about, a problem that extends far beyond (and exists tangential to) the philosophical discussions of "sentience".

Just sayin'.

It's not gonna be me who mistreated [that AI; that child] and caused the malign intent that resulted in an apocalypse - that much I can promise y'all! :-)


Trust is an interesting way to look at AI... Are the risks of using AI XYZ acceptable for problems of type ABC? Then we "trust" XYZ... in the domain of ABC problems. Do we trust an AI to read scans of bodies, to triage casualties during a flood of incoming emergencies, etc.? Well, it turns out we sometimes do. Is that trust? I'd say so. Do we need the AI to cross the uncanny valley or even pass the Turing test to trust it? Not really. More intriguing about these "artistic" AIs is: will they replace some or many artists? Will an artist require an AI tool to become successful? Will art made with AI be valued less or more than art made without? Will AIs "invent" or "co-invent" new forms of art?


The second-to-last image in the article (below "...these man-bites-dog type sentences were systematically problematic...") is broken, what did it show?

Jun 7, 2022·edited Jun 7, 2022

"First, the paper reported a second example of the same phenomenon, and seems to acknowledge that these man-bites-dog type sentences were systematically problematic (Imagen on left, DALL-E on right):"

The image isn't loading right now. The URL is unlike the others: blob:https://garymarcus.substack.com/b2849ba5-4009-48f9-9345-e0e075cb97e9


«A'horse-riding, an astronaut»

Maybe the AI was fed too many nursery rhymes....


Well, this isn’t general AI, but we can all admit that it sure is impressive. This is an example of the kind of situation in which AI will excel and change life, where the cost of error is low and you can keep trying until you get a finished product with considerably less effort than doing it yourself. As the authors of Prediction Machines said, AI is a strong complement to human judgment, not a replacement, and increases the productivity of human judgment. I think it will do so exponentially, and these debates about whether it is general might become moot.


Renaming AI to 'IA' (intelligence augmentation) would be a better characterization - it does augment our intelligence, just like a tape measure, a staircase, and my 99c calculator :)


I like the term "expert program", as that's closer to what they do. They can do something narrow, if you know how to use them correctly, but they aren't intelligent in any way.

I think that's one of the big problems - people ascribe intelligence to things that aren't intelligent. These are tools.

I think that trying to make them intelligent is worthless, as this approach is fundamentally flawed for that purpose. That doesn't mean I don't think you can generate something that makes useful output.

Like, I'd love an AI that can draw me art.

That said, the more "closed" people try to keep these things, the more skeptical I am of their functionality - especially when they want to make money.


For sure! No doubt they can be useful. I use neural style transfer to augment art I create, that does wonders. But when these (current) systems are increasingly deployed in 'life or death' situations, that's a problem - faking intelligence will have detrimental (to us) consequences then.


Here is the crux of your point: " rather that the network does something more holistic and approximate, a bit like keyword matching and a lot less like deep language understanding. What one really needs, and no one yet knows how to build, is a system that can derive semantics of wholes from their parts as a function of their syntax."

You seem to think "holistic and approximate" are somehow bad, or unlike what the human brain does. If that is (part of) your thesis, you are simply wrong. The human brain does holistic and approximate, simply on a much larger scale and with deeper layers of nuance. DALL-E and Imagen, if I recall, use on the order of ~100 billion parameters. The human brain operates with on the order of 100 trillion parameters and does on the order of a quintillion basic computations per second (an exaflop). How much deep comprehension do you expect from a model with 0.1% of the parameters in which to encode its understanding of *BOTH* language *AND* images and the relationships between them?
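
To spell out the 0.1%: it's just the ratio of those two rough parameter counts, taken at face value.

```python
# Ratio behind the "0.1% of the parameters" figure, using the rough estimates above.
model_params = 100e9     # ~100 billion, the figure cited above for these models
brain_params = 100e12    # ~100 trillion, the brain estimate used above
print(f"{model_params / brain_params:.1%}")   # -> 0.1%
```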

What is or is not "overly hyped" or considered "deep understanding" of language is subjective. I would call this quite a deep understanding of the connection between language and visual phenomena, given that the system has the learning capacity on the order of a literal bird brain - and the brain of a small, not very bright bird at that.

Your claim that "no one yet knows how to build a system that can derive semantics..." is a bit like claiming the ancient Egyptians did not know how to build a taller pyramid than those at Giza. Just because something costs too much, or you simply lack the resources to build it, does not mean you don't know how. Computing power available to researchers is in the range of 1/1000th to 1/100th that of the human brain. The latest AI achievements with that computing power strongly suggest that with 100x to 1000x the computing power we will achieve AGI or something remarkably close. Current trends suggest such levels of computing power will be available to researchers in 5-10 years. Indeed, even today it is tantalizing to consider what we would see if the Imagen or DALL-E 2 models were scaled up to utilize the latest exascale supercomputers.

Where you see "hype" and overly bold claims, I see researchers who fully grasp the import of what they have achieved with very *limited* resources.


Thank you for this detailed summary and the examples. I think this sums up the current hype around AI and AGI succinctly:

>> it turns out that Imagen can draw a horse riding an astronaut—but only if you ask it in precisely the right way:

I think the hype will continue despite the scandals and criticism. There are plenty of big problems and open questions that AI can handle. For now, "horse riding an astronaut" seems to be getting the attention. I am glad there are people like you still writing.

I hope that in the next few years, the investors and Big Tech will move on to the next shiny thing and leave AI to mature.
