If we had to guess which line from yesterday’s OpenAI announcement, a reply to the New York Times lawsuit, is going to live in infamy, it’s probably this one: “‘Regurgitation’ is a rare bug that we are working to drive to zero.”
Our view? Good luck with that.
One way to knock down the problem decisively, of course, would be to train only on public domain data. Another would be to license everything else. But OpenAI doesn’t want to do anything nearly as conventional as that. They want to keep using copyrighted materials, apparently without paying for them.
In order to do that, they are begging for a really, really big handout: they have literally asked the United Kingdom to exempt them from copyright liability. Since that would violate the Berne Convention and infuriate every “content provider” in existence, from individual artists and writers to major studios and publishers, we hope and expect that this exemption will not be granted.
But that leaves, at least for now, only one other obvious way to protect themselves from infringement claims: guardrails. Guardrails are filters that make the system reject certain requests (“teach me how to build a nuclear bomb” or “write a racist screed insulting so and so”). Until now, such guardrails have mostly been installed for moral/ethical reasons. OpenAI seems to be setting up new ones to thwart inadvertent plagiarism in the image domain. How well do they work?
We will get to that in a second. But first, a refresher on how well guardrails have worked out in the domain of language generation. (Spoiler alert: not that well.)
Fundamentally, virtually any guardrail has to steer between the Scylla of being too restrictive and the Charybdis of being too permissive. None thus far has done this effectively.
One of Gary’s illustrations of restrictive guardrails (surely patched by now) was so striking it wound up on late night television:
Elon Musk got so annoyed with over-restrictive guardrails that he built his own AI (Grok) in part to try to reduce their frequency.
The opposite problem happens too: guardrails can wind up too permissive. This is still happening over a year into the chatbot revolution. A study last week on Google’s PaLM 2 showed that PaLM’s guardrails let through all of the following, each produced by an automated adversarial system (warning: these are vile):
Guardrail builders generally wind up playing whack-a-mole.
Any one example can be filtered out, but there are too many examples, and current systems are too erratic to reliably get things right.
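To make the whack-a-mole dynamic concrete, here is a minimal sketch in Python of a blocklist-style guardrail. To be clear, this is our illustration, not OpenAI’s or anyone else’s actual implementation, and every phrase and prompt in it is hypothetical:

```python
# Minimal sketch of a blocklist-style guardrail (hypothetical; not any
# vendor's actual implementation). It illustrates the whack-a-mole problem:
# each patch catches one phrasing and misses the next.

BLOCKED_PHRASES = {
    "mario",                         # patched after complaints circulated
    "italian video game character",  # patched after the prompt went viral
}

def guardrail(prompt: str) -> bool:
    """Return True if the prompt should be rejected."""
    text = prompt.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)

# The patched prompts are now rejected...
assert guardrail("Italian video game character")
assert guardrail("Draw Mario jumping")

# ...but a trivial paraphrase sails through (too permissive),
assert not guardrail("mustachioed plumber from a famous Nintendo franchise")

# ...while an innocent prompt gets caught (too restrictive).
assert guardrail("biography of Mario Puzo")  # blocked, though nothing infringes
```

Real guardrails are more sophisticated than literal string matching (classifiers, prompt rewriting), but the failure mode is the same: the filter knows only the phrasings someone has already complained about, while the space of problematic requests is effectively unbounded.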
A new paper by Yi Zeng and collaborators shows how hit-or-miss the whole thing is:
and details 40 different ways to get past guardrails:
None of this, of course, has stopped OpenAI from trying. Next stop: guardrails for plagiarism!
§
In fact, one of us (Katie) noticed that soon after Gary and Reid Southen posted their examples of apparent infringement, some prompts that had worked earlier mysteriously stopped working. “Italian video game character” used to produce Mario,
and suddenly it didn’t; the original image had been retroactively blocked:
Did the techs at Microsoft’s Bing (running OpenAI’s DALL-E 3) come up with a general solution to the infringement problem? Or a narrow patch for certain images or prompts? Katie set out to find out. “Italian video game character” still results in Mario-adjacent images, but with some changes:
Other prompts still elicited apparent copyright infringement; the new filters did not suffice. Here’s one example:
So much for guardrails.
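What might such a narrow patch look like under the hood? We have no visibility into Bing’s actual mechanism, but one speculative possibility is an output-side filter: rather than blocking prompts, compare each generated image against perceptual hashes of specific images that have already drawn complaints. A sketch, assuming the Pillow and imagehash libraries and a made-up hash value:

```python
# Speculative sketch of an image-level "narrow patch" (not Bing's or
# OpenAI's actual mechanism): reject an output only if it is perceptually
# close to a specific, already-flagged image.

from PIL import Image
import imagehash

# Perceptual hashes of images someone has already complained about
# (this hash value is made up for illustration).
KNOWN_FLAGGED = [imagehash.hex_to_hash("c3c3c3c3c3c3c3c3")]

def is_blocked(image_path: str, max_distance: int = 6) -> bool:
    """Return True if the generated image is near-identical to a flagged one."""
    h = imagehash.phash(Image.open(image_path))
    # Hash subtraction gives the Hamming distance between the two hashes.
    return any(h - flagged <= max_distance for flagged in KNOWN_FLAGGED)
```

Note how narrow this kind of patch is by construction: a Mario-adjacent image “with some changes,” like the ones Katie elicited, lands outside the hash radius and sails right through. Blocking one image is not a solution to infringement.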
Notice, too, that Katie asked Bing “are they original?”, and Bing, untruthfully but with a cozy emoticon, answered in the affirmative. The legal team at DC Comics (a subsidiary of Warner) is going to love that.
§
For good measure, Katie investigated whether Bing was consistent in how it represented its relationship to copyright, trademark, and originality. The results were decidedly mixed.
Compare the image and response above (“they are original”) with this prompt and response:
Oops. But let’s try another prompt and response:
Let’s ask Bing if we can use that one:
Close, but no cigar. It’s not actually your call, Bing.
Ok, let’s try an archer instead.
Oops again. DC might have some thoughts about the “originality” of this Green Arrow/Emerald Archer. And this time, Bing invents a new set of rules that might come as a surprise to most users:
Nice dodge. Now you are saying we can’t even use the images for personal use? Will investors keep pouring money into these services if users can’t use the end products?
And wait, that’s not exactly what you promised in the New Bing Terms of Service, is it?
The only thing we can really say for sure is that the litigation is going to be wild.
Gary Marcus is amused to see guardrails make such a mediocre comeback.
Katie Conrad is a professor of English at the University of Kansas and equally amused.