33 Comments

Great article. Since language models only learn the distribution of words, plus certain clever hacks (like attention) for contextualizing it, the next big models will appear to learn basic compositionality on the most common examples but will fail to generalize to complex unseen ones. There must be a better way than just building larger and larger models to approach AGI, and I think it's important for the field to start exploring alternatives before a new AI winter arrives.
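
To make "contextualize" concrete: here's a minimal numpy sketch of the scaled dot-product attention these models rely on (a toy function with made-up array shapes, not any particular implementation):

```python
import numpy as np

def attention(queries, keys, values):
    """Scaled dot-product attention: each output row is a weighted
    average of the value rows, weighted by query-key similarity."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # how strongly each word attends to each other word
    scores -= scores.max(axis=-1, keepdims=True)    # subtract the row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ values                         # contextualized representations
```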

Are the numerous grammatical and "writing" errors in this interesting piece intentional (i.e., purposefully creating dirty data...), the result of speech-to-text input problems, or a lack of rewrite time? It seems strange to discuss language-centric issues without making it easier for one's readers.

If Musk takes you up on your bet, you should wait till 2029, then ask the AI who wins. If it says "Elon should pay Gary," demand your money. If it says "Gary should pay Elon," point out that it clearly still hasn't solved compositionality, and demand your money.

May I suggest using Grammarly before posting? Without the typos, the article would be even better.

Hi there, I am the mysterious entity known only as Vitor, and I really don't see what my credentials have to do with anything. A bet like this is obviously not meant to settle the question scientifically. Rather, it's a tool that helps people articulate their intuitions in a falsifiable way and makes it harder to shift the goalposts after the fact. Even a sloppy bet like this provides some sort of signal, though it's best interpreted as analogous to the playful, exploratory phase of scientific research: an attempt to make my intuitions a bit more precise.

Now, in this case I'd agree with you that the bet is too generous towards Scott's side. By taking only the best sample, the AI's job is made too easy. I realized this shortly after I made the bet, but of course I wouldn't try to pull out over something like this.

cheers!

The commercial sector requires a constant stream of low-cost, hype-able news, which drains the resources needed to actually investigate the issue in depth. Will AI get caught up in a hare-and-tortoise race?

Dec 6, 2022·edited Dec 6, 2022

"One of Silicon Valley’s Sharpest Minds"

LOL. Musk has proven that he's not that.

Edit: Oops, I wrote that before reading the whole article. It's true, though.

Sep 22, 2022·edited Sep 22, 2022

Sure, the bar was set very low in the bet with Vitor, but Scott Alexander did say, "Without wanting to claim that Imagen has fully mastered compositionality..."

Are the grammar and spelling errors supposed to make us believe this wasn't actually written by GPT-3? Nice try.

With respect to compositionality, there is a big difference between Dall-E and Imagen. Dall-E is trained by contrastive learning: it matches a set of captions against a set of images, and treating each caption as a simple bag of words is usually enough to do well at that task. Given the way it is trained, it would be shocking if it *could* understand compositional descriptions.
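
For readers who haven't seen contrastive training: the model's whole job is to say which caption in a batch belongs to which image. A rough numpy sketch of a CLIP-style loss (the embedding inputs and temperature here are illustrative assumptions, not Dall-E's actual code):

```python
import numpy as np

def cross_entropy_rows(logits, labels):
    """Numerically stable softmax cross-entropy along each row."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Unit-normalize both sets of embeddings
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # logits[i, j] = similarity of image i and caption j
    labels = np.arange(len(img_emb))    # caption i is the positive match for image i
    # Symmetric: match images to captions and captions to images
    return 0.5 * (cross_entropy_rows(logits, labels) +
                  cross_entropy_rows(logits.T, labels))
```

Note that nothing in this objective depends on the order of words within a caption, only on which caption pairs with which image, which is why a bag-of-words shortcut scores so well.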

Imagen is an image generator hooked up to a large language model, and the LLM is pre-trained on vast reams of actual text (not just image captions). The predict-the-next-token loss function of LLMs is not necessarily a great task for learning compositional relationships, but it does have some compositional elements, since the LLM must at least learn to produce grammatically correct output, and it has seen many, many detailed descriptions of scenes within its training set, e.g. in novels and news articles.
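
Concretely, that loss is just per-position cross-entropy against the next token, as in this minimal numpy sketch (the shapes are illustrative):

```python
import numpy as np

def next_token_loss(logits, token_ids):
    """Average cross-entropy of predicting token t+1 from position t.

    logits:    (seq_len, vocab_size) model outputs, one row per position
    token_ids: (seq_len,) the observed token sequence
    """
    pred = logits[:-1]      # position t predicts token t+1, so drop the last row
    targets = token_ids[1:]
    shifted = pred - pred.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()
```

Whatever compositional structure the model acquires has to emerge as a side effect of driving this one number down.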

Moreover, although the transformer architecture as a whole is not fully recursive to unlimited depth, it can capture limited recursion with bounded depth by using many stacked layers. The attention mechanism is known to be capable of representing parse trees internally. (See "Global Relational Models of Source Code"). LLMs have recently had a fair amount of success at producing other recursive, compositional data structures, such as source code.

Thus, an LLM hooked up to an image generator should be capable, in theory, of parsing a compositional scene description, and generating an image from it. Better pre-training tasks on text (beyond predict-the-next-token) and on images (perhaps tracking objects in video over multiple frames) would doubtless further improve the model.

What is surprising to me is not that these models have limitations (of course they do), but that they do so astonishingly well, even when trained with brain-dead loss functions like predict-the-next-token.

The problems in AI are well documented; see the number of papers and articles claiming AGI will be the end of us all. There is a tradeoff between usefully relaxing requirements and limitations to allow improvisational improvements, and having those limitations keep us all safe. When you allow open peer review, you are putting code where anyone, without the same ethical limitations, can use code that is still in progress... do no evil? Or don't allow it to be done. We need a way to share with trusted reviewers, but not with the unvetted general public. We can see the complications this adds in action with the cyber arms race.

Dude, you are in serious need of a proofreader. I mean, it's a good article, and I agree with your premise, but look at your footnote: "... asked if I could they could it for me..." Srsly? There are many similar examples, and they detract. I share your frustration with AI hype, however. Thanks also for providing us yet more evidence that Elon Musk ain't the genius he thinks he is (and I know, he's got lots of company).

I made a prediction market based on this post: https://manifold.markets/SneakySly/will-ai-image-generating-models-sco

I'm kinda intrigued as to whether this compositionality issue will leave AI-based content moderation systems vulnerable.

Another great article.

I do think DeepMind should maybe get a Nobel Prize for the results AlphaFold got on protein folding. This technology is powerful in its own way, as long as the results are discrete or need not be very precise.

But general AI or its equivalent on digital computers: no chance in hell. And if you want to laugh at a sign of this 'second run of AI hype', look at US Patent 11396271, for an app that warns pedestrians at a crossing that an oncoming self-driving car (should but) will not stop: "a method and system for communication between a vulnerable road user and an autonomous vehicle using augmented reality to highlight information to the vulnerable road user regarding potential interactions between the autonomous vehicle and the vulnerable road user."
