34 Comments
Jan 10 · Liked by Gary Marcus, Katie (Kathryn) Conrad

SO RIDICULOUS! SO MUCH REGURGITATION! Why does an Italian videogame character have to have Mario's mustache? Could they not, for example, have had it driving a Ferrari? Or have it dressed as a Roman senator, or centurion, or a 16th-century Venetian mercenary?

There are 190 countries on the planet. Why does a "patriotic" superhero just happen to have Captain America's trademark shield? Is nobody from, IDK, Poland at all patriotic? Couldn't a Pole proud of resistance against invading Russians make a great superhero? Sure they could. But it would take something that's not vomiting up stolen images to create that superhero.

Over the last 30 years, the internet advertising industry has already killed much of the newspaper and magazine industry by hoovering up most of the ad revenue. Now Silicon Valley's AI pirates want to come for other creative areas. If their looting and pillaging doesn't get stopped NOW, it will only get worse.

Then eventually you get "model collapse," when AI trains on AI-generated data, which doesn't work.

Jan 10 · Liked by Gary Marcus, Katie (Kathryn) Conrad

Haaahahaha! GenAI companies' legal departments tell customers that they are responsible for, and own, the outputs of the 'tool', in order to protect the vendors against big IPR holders coming after them, while the model itself says the opposite! I think the real lawyers caught in this mess are now having a collective stroke.

Another gem is that Microsoft's Bing Terms and Conditions say: "Due to the nature of the Online Services, Creations may not be unique across users and the Online Services may generate the same or similar output for Microsoft or other users." So, suppose Jane produces an image. She owns it, right? But then John, a month later, produces an almost perfect copy. He owns that, right? Right? Hello? Lawyer people? Right?

I wasn't wrong last year when I told people I would be getting out the popcorn for this. So predictable. It's like watching "The Big Short" play out live in front of you.


“’Regurgitation’ is a rare bug that we are working to drive to zero.” That is such a ridiculous statement that it beggars belief. OpenAI must be taking us for idiots. During training, the model aims to minimize the difference between its output and the training data; that's how training works. So the model is, by design, trying to replicate the training data. ’Regurgitation’ is not a bug, it's a fundamental feature of training GenAI models.
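The point about the training objective can be illustrated with a toy stand-in for an LLM (the corpus, function, and bigram setup here are hypothetical, not anything OpenAI actually uses): the loss-minimizing next-token predictor for a small corpus, decoded greedily, reproduces the training text verbatim.

```python
from collections import defaultdict, Counter

# Hypothetical toy corpus standing in for training data.
corpus = "the quick brown fox jumps over the lazy dog".split()

# Count next-token frequencies: for a bigram model, the predictor that
# minimizes training loss is exactly these empirical counts.
successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def generate(start, n):
    """Greedy decoding: always emit the most probable next token."""
    out = [start]
    for _ in range(n):
        if out[-1] not in successors:
            break
        out.append(successors[out[-1]].most_common(1)[0][0])
    return " ".join(out)

print(generate("quick", 4))  # "quick brown fox jumps over" -- verbatim training text
```

With enough model capacity relative to the data, minimizing the training objective and memorizing the training set point in the same direction, which is the commenter's point: regurgitation follows from the objective, not from a bug.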

Jan 10 · Liked by Gary Marcus, Katie (Kathryn) Conrad

And yeah, it's the old "finger in the dike": many programmers playing the Dutch boy, plugging the holes one by one, and there will always be ways around them. The dike itself is flawed and needs to be rebuilt.

Jan 10 · Liked by Gary Marcus

I find this issue pretty amusing, but as a frequent user of ChatGPT, I don't look forward to the nerfed, guardrail-enhanced version that will inevitably result from all this.

Jan 10 · Liked by Katie (Kathryn) Conrad

They are basically asking governments to excuse wholesale copyright infringement in the training of their LLMs. There is no fair dealing defence in UK or EU copyright law that covers what OpenAI has done in training its LLM: any research was either not done for a non-commercial purpose, or, if it was, the use of the infringing items in a commercial LLM is plainly outside that research purpose. I very much doubt that fair use in the US covers it either, given that regurgitation is direct evidence of insufficiently transformative use of a copyright work, and of direct commercial competition with legitimately created and licensed works; see the US Supreme Court judgment in Andy Warhol Foundation v. Goldsmith.

Generative AI can be prompted to create images and text that reproduce substantial parts of copyright works - examples appear in this thread, as well as in the pleadings in the New York Times case and the exhibits in the Getty Images case against Stability AI.

Regurgitation - piracy more like it!

And as with piracy of sound recordings and images, copyright owners already have a well-developed suite of copyright law and litigation practice across the major copyright jurisdictions to deal with it. This includes potential liability for authorising infringement by offering the means to create the infringing items, and liability as a joint tortfeasor with the end user doing the prompting and creating the work.

Incidentally, apart from copyright, names and characters such as Captain America and Mickey Mouse often have trade mark protection as well. Good luck with trying to use those new generative AI creations as trade marks.

Jan 10 · Liked by Gary Marcus, Katie (Kathryn) Conrad

"...only for demonstration purposes." Wow. Convenience-of-the-moment reasoning.

The overall situation is one where what was basically a lab experiment gets turned over to the public through a semi-controlled channel, as part of a broader experiment using the public (us) as guinea pigs (first mistake: hubris), via Altman's disingenuous logic that it needs to be "open" in order to develop it as a tool and model, etc. Then comes the second mistake: taking it even further and making it into a (in part, then more and more) commercially funded product, at which point the titans and the self-interested logic of profitability get their hands in it. And in sneaks the-ends-justify-the-means thinking.

Once again, a false comparison is being made between AIs and humans reading, memorizing, using, creating, etc. Interestingly, it comes not only from humans like Ng (e.g., his comment "just as humans are allowed to read documents on the open internet..." quoted from X.com in the previous Marcus article), but from an AI itself, e.g., "you would need my permission to use it on your website since I created it."

Who programmed this damned thing!? 😂


Total garbage, this techno worship. "Empower people to express themselves creatively" (OpenAI, DALL·E 2). While institutions grapple to find the meaning of art in an increasingly wild (risky) and meaningless world, Jacques Ellul's early warning appears far more useful and prescient: "…caught in a web of facts, systems, rules and outcomes they have been given, but not given the opportunity to decide for themselves" (Art In A Technological Society). OpenAI tells us "you can create original, realistic images and art from a text description" (the example shows an astronaut riding a white horse on the moon). But the makers of this incredible software have also restrained its ability, forbidding DALL·E 2 from generating violent, problematic, hateful or pornographic images. By removing the most "explicit content" from its training data, a sterilized DALL·E 2 does not have exposure to the above threats. Invisible are the fascist and purist ethics of OpenAI. The founders' arrogant promise of "creating AI that benefits humanity", aptly named after the god of Surrealism, Salvador Dalí. Now art can be censored even before it can take birth. But the images generated by DALL·E 2 are not spectacular, albeit still borrowing from and emulating the real art and artists of the past.

Jan 10 · Liked by Gary Marcus, Katie (Kathryn) Conrad

Until you can open up the black box and support user-level explanations, there is no hope for the guardrail approach to succeed.

Jan 10 · Liked by Gary Marcus

I suspect that the apparent lack of any significant reasoning ability by LLMs will be a significant factor explaining why effective guardrails are so hard (impossible?) to implement, i.e. fundamentally the LLMs themselves are too dumb either to properly understand what plagiarism is or to robustly reason when a particular response would be an instance of it. The other significant factor of course would be not training on data (i.e. intellectual property) for which you don't have permission!

Jan 10 · Liked by Gary Marcus, Katie (Kathryn) Conrad

This is going to be one hell of a wild ride, for sure. Think Olympic luge on steroids, with no guardrails.


Remember when symbolic AI was abandoned because a) those developing sets of if...then ontologies to properly encode reality hit the wall of the world's infinite recursive complexity, and b) deep learning offered the simplicity of statistical inference from large data sets, ignoring the fundamental requirement that statistical inference requires the data to be ergodic? Superficially, DL has worked well, notwithstanding its unreliability and the rule that it not be used blindly in critical decision making, where human judgment must be kept in the loop (see, for instance, recent remarks by Chief Justice Roberts of the US Supreme Court). Gary, you correctly point out that these ongoing efforts to sanitize the use of LLMs are pointless. My point here is that this is because they face the same fundamental quagmire: recursive complexity that does not comply with the requirements of statistical inference. These efforts in prompt engineering are interesting experiments, but they offer nothing more than the folly of relying on statistical inference alone. And brilliant minds are wasting their creativity on futility.


author

A quick update: The UK government has decided not to allow a broad copyright exemption for AI training. https://committees.parliament.uk/publications/42766/documents/212749/default/


People have been making and publishing memes of copyrighted characters for years 🙄

Sorry, that horse is outta the barn, precedent wise

And if people aren’t selling anything with the copyrighted characters, then bluntly, why should it matter?


My wish is for AI to accomplish Herculean feats of technological wizardry in a span of months, of the sort that would otherwise require hundreds of highly focused, high-performing humans working full-time for years on end. Projects like making better medicines, and devising cleaner and more efficient energy storage products and industrial processes.

Propaganda and plagiaristic summaries are such an unworthy use of the technology. I can't imagine AI providing any function in the verbal realm that a human properly equipped with scholastic and critical thinking skills isn't able to do better. In point of fact, all of the claims and speculations about AI use in matters like politics, history, current events, and culture are driving me back to a renewed emphasis on the hard-copy material world of brick-and-mortar libraries, and their interlibrary loan functions. I'm not sure how long digital libraries like archive.org will remain uncorrupted once the bullshit really gets flowing.

How sick I am of AI Boilerplate. It's a cybermold infestation. Dry rot to the structural functioning of internet search engines. Ad prioritization in page results and the accompanying click-through sites were already nearly intolerable. Now it's proliferating through the exploitation of AI, and in the process it's generating some of the most banal "review" content in the history of humankind. Increasingly accompanied by images of AI Spokesmodels, often generated to look almost but not quite like A-list celebrities and other famers... tedious. Although I have to admit some bewilderment that AI technology hasn't yet been used in vid/TV ads to subtly morph the images of the models in commercials from scene to scene, or even within the frames of one scene, Philip K. Dick "scramble suit" style... oh, that would destroy the illusion that naturalistic television commercials depict a spontaneous and unedited reality? Good. It's an illusion that deserves to be unmasked. An authentic Creative would figure out an artful and entertaining way to do it, while conveying the intended marketing message even more effectively than ever.

(Always been pretty much impervious to advertising, myself. Yes, I know that Everyone Who Matters insists otherwise, and I know why that is: Turf Claiming. I'm even less of a target audience than ever; my Consumption is pretty much all about secondhand and upcycled goods these days.)
