28 Comments
Feb 2 · edited Feb 2 · Liked by Gary Marcus

The majority of people producing copyrighted material are independent artists. They will be the ones disproportionately affected by AI theft. As mentioned by Gary Marcus and in the comments below, with the well-known players it's rinse and repeat: overreach, get sued, settle, and do it again.

Feb 2 · edited Feb 2 · Liked by Gary Marcus

I noticed this in the Bard email today: “Using a technology called SynthID, all unique images generated on Bard will have embedded watermarking to indicate that it was created by AI. The watermark is directly added into the pixels of an AI-generated image, meaning it’s imperceptible to the human eye but it can still be detectable with SynthID. It’s important that we approach the creation of images with AI responsibly.”

So users are stamped with a watermark on the output, but the input is taken scot-free (so far). Interesting.

"Being able to identify AI-generated content is critical to promoting trust in information. While not a silver bullet for addressing the problem of misinformation, SynthID is an early and promising technical solution to this pressing AI safety issue."

https://deepmind.google/technologies/synthid/

Hey Google – If you're so smart, how about developing a system for identifying where your inputs and outputs came from (even when relatively morphed) – some kind of end-to-end IP chain of custody technology (CoC) sorta thing? IPCoC ?... Now *that* would be cool... more cool than more Mario look-alikes ...
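Google hasn't published how SynthID actually works, but the general idea of a pixel-level watermark that is invisible to the eye yet machine-detectable can be illustrated with a toy least-significant-bit scheme. Everything below (the function names, the keyed bit pattern) is purely illustrative, not SynthID's algorithm:

```python
import numpy as np

def embed_watermark(image: np.ndarray, key: int) -> np.ndarray:
    """Overwrite the least significant bit of every pixel with a keyed
    pseudo-random bit pattern; a change of at most 1/255 per pixel is
    imperceptible to the eye."""
    bits = np.random.default_rng(key).integers(0, 2, size=image.shape, dtype=np.uint8)
    return (image & 0xFE) | bits

def detect_watermark(image: np.ndarray, key: int) -> float:
    """Fraction of pixels whose LSB matches the keyed pattern:
    1.0 for marked images, about 0.5 for unmarked ones."""
    bits = np.random.default_rng(key).integers(0, 2, size=image.shape, dtype=np.uint8)
    return float(np.mean((image & 1) == bits))

img = np.random.default_rng(0).integers(0, 256, (64, 64), dtype=np.uint8)
marked = embed_watermark(img, key=42)
print(detect_watermark(marked, key=42))  # 1.0
print(detect_watermark(img, key=42))     # ~0.5
```

(A real scheme also has to survive cropping, resizing and re-encoding, which is presumably where SynthID's actual sophistication lies.)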


Well, that is handy for the IPR lawyers when it turns out the users of Bard cannot be held responsible (even if the GenAI companies would wish it to be so), as they have no way to be aware of the infringement, having no access to the training data...


Yes, I suppose it's actually an advantage for us users, at the moment! We can hold our hands up in innocence: "Hey, I didn't know where this came from!"


Some big corporations are stealing the “intellectual property” of some other big corporations

I get that this is not OK on some level, but I’ve also seen little to suggest that any normal person will be harmed

Capitalist on capitalist crime. Have at it


What makes you think that they aren't stealing from regular people too? The big media properties are instantly recognizable and get a lot of attention because of it, but just because you don't recognize something doesn't mean it wasn't stolen from somewhere.

Author · Feb 2 · edited Feb 2

exactly (and as discussed near the end of our IEEE Spectrum article)

Feb 2 · edited Feb 2

The fat cats will settle. They always do. The little guy will be screwed, as always. They may get an opt-out, then gain a competitor.


I'm not a fan of the big tech companies OR copyright law. Both cause societal problems.

However, it seems clear to me that the problem here is not the AI models, which are merely learning about the world from looking at freely available content (this is fair use IMO).

If someone generates and distributes objectionable content (of any kind, whether it's copyright violation or a deepfake nude) then the problem is the person who created and distributed that objectionable content, NOT the tool they used (which could be AI, or Photoshop, a photocopier or a paintbrush).


I think there is more to it than that. Generative AI's propensity for regurgitating material from its training corpus calls into question whether these systems are actually capable of generating anything original. The question that needs to be answered is, if we eliminate the ripoffs from the AI output, what is left, and is it good enough that people will still want to pay for it?


So, are you saying that if I collect a library of publicly available but copyrighted jpeg images, put it on a web site and charge a subscription fee to people for retrieving images from it, that's OK and the copyright violation is with the people who retrieved the images and then used them unlawfully?! :) Because that's essentially what so-called GenAI is doing; the models are basically memorizing the training content. It's a fundamental feature of their training: the model aims to minimize the difference between its output and the training data, so it basically tries to replicate it.
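The memorization point can be seen in miniature. In this toy sketch (a single weight matrix stands in for a model; this is nothing like how real diffusion or language models are trained), gradient descent on a reconstruction loss drives an over-parameterized model to reproduce its training vectors essentially exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(3, 8))        # three training "images" as vectors
W = 0.01 * rng.normal(size=(8, 8))    # the model's parameters

# Minimize sum ||W x - x||^2 over the training set by gradient descent:
# with enough capacity the loss goes to ~0, i.e. the data is memorized.
for _ in range(2000):
    grad = sum(2.0 * np.outer(W @ x - x, x) for x in data)
    W -= 0.01 * grad

# "Prompting" with a training input regurgitates the training output:
worst = max(np.max(np.abs(W @ x - x)) for x in data)
print(worst)  # tiny reconstruction error
```

Whether large generative models memorize their whole corpus or mainly an over-represented slice of it is the empirically contested part; the training objective itself does reward replication.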


Well, no, YOU would be the copier and distributor in that scenario. I'll also point out that copyright law says nothing about how much you charge for things, just whether you copied them. As I'm sure you know, search engines and social networks and in fact your web browser have done this en masse for a long time (to read my text here, your machine made copies of it). Also, no one seems angry that web.archive.org exists. Or giphy.com, etc. Which is why I think copyright law is outdated and mostly just serves big capitalist media and lawyers.

The replication examples we're seeing touted from these models (such as in this article) are inflated, because people are clearly prompting them specifically to make copies of things they want to see. For example, they may not mention Mario explicitly, but there are no other Italian plumber video game characters in the world. That's about the prompter's intentions and use of the tool, not the model itself.

If I generated lots of pictures of Mario and used them to make some game called Super Mario without giving any credit then Nintendo would have a perfect right to come after me ... but if I generate a few similar pictures or texts for my own enjoyment? Even Nintendo couldn't care, surely. They aren't losing out on anything.

(These early AI wrestlings remind me of the early days of the internet, when everyone was stressing about people downloading bomb recipes and becoming terrorists, and trying to get Internet providers to censor all the traffic. Remember that?)


The case of the search engines and web archives was settled long ago because they provide links to the original content and don't charge you for it. That, and my web browser right at this moment copying the content of the Substack web page, falls under fair use. Not so for GenAI. It doesn't matter how people managed to retrieve the copyrighted content by prompting the GenAI; the fact that it is possible to retrieve it at all means that the model stores it inside, encoded in its parameters. This is basically no different from jpeg compression storing an image encoded in the quantized coefficients of the cosine transform. And it is of course not a matter of indifference to Nintendo or any other content maker if their content is being used to make money without them getting a share of the revenue.
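The jpeg analogy can be made concrete with a schematic sketch of transform coding (just the DCT-and-quantize core; real JPEG adds chroma subsampling, zig-zag ordering and entropy coding on top):

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis, as used on JPEG's 8x8 blocks."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

D = dct_matrix()
block = np.random.default_rng(1).integers(0, 256, (8, 8)).astype(float)

coeffs = D @ block @ D.T               # forward 2-D DCT
stored = np.round(coeffs / 16) * 16    # coarse quantization: only this is kept
recovered = D.T @ stored @ D           # inverse DCT from the stored coefficients

# The pixels were discarded, yet the block comes back close to the
# original: the image "lives" in the quantized coefficients.
print(np.max(np.abs(recovered - block)))
```

Whether a model's weights encode training images as literally as quantized DCT coefficients encode a block is of course the disputed question, but lossy, transformed storage that can still reproduce the source is exactly the analogy being argued here.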


I'm sure they'll be so grateful you tried to protect Super Mario from leaking out on the internet so that we can buy those images direct from them instead: https://www.google.com/search?q=super+mario&udm=2&sa=X&biw=1512&bih=1335&dpr=2

With your position, Gary should be taken to court for posting copyrighted Nintendo images without permission on his blog here on Substack, which is a commercial site. Is that what you want?

Author · Feb 2 · edited Feb 2

Sir, please read the IEEE Spectrum article where Southen and I discuss why Mario is just an example of a larger problem.


Yes, thanks, I've read it. I have appreciated some of your AI critiques in the past, Gary, but on this particular issue I think it's better we fix the broken copyright laws.


It seems you haven't read what fair use is about; research and investigative journalism fall under that category.

Feb 2 · edited Feb 2

OpenAI could take high risks because they are a startup and went in first.

By now, Google had enough time to ponder things. I don't think they stepped in unprepared.

Lawyers will have their day in the sun. I am betting, however, that Google will come out on top, by paying up if necessary (a small fraction of their profits).


Well, come on, they have to stay competitive in the copyright violation technology race! 😆

Author

🤣


Hopefully artists will be able to use tools like Nightshade that will make stealing their images counter-productive.

https://www.technologyreview.com/2023/10/23/1082189/data-poisoning-artists-fight-generative-ai/


I thought that AlphaBet was a betting and gambling company…


"Gary Marcus is standing firmly by his prediction that 2024 will be the year of Generative AI litigation."

Translation:

"Gary Marcus is hoping to capitalize on AI hysteria along every conceivable vector, and to cheerlead capturable regulatory bodies to advance his company's absurd goals no matter how detrimental to human freedom they obviously are."

Now: everyone resume their praise of this moral idiot, who answers no questions no matter how generously they're asked.

Author

I don’t have a company, and you are a jerk.


Mark, please, let's not turn the comments into the sewer they become without restraint or moderation. Care to edit your comment so it becomes an argument on merit, or at least civilised?

Ask yourself: what makes you so emotional (angry, frustrated, whatever) that you write something like this?

I definitely do not agree with everything Gary has written or said. But this kind of reply damages the usefulness of reading the comments and quickly turns them into a sewer of name-calling.


A thought occurs... Given "Better Call GPT" (https://arxiv.org/abs/2401.16212), how likely is it that we're going to see LLMs used to facilitate litigation against themselves...? :-)

Author

The group that wrote that paper are not without the appearance of a conflict of interest.
