28 Comments
Feb 2 · edited Feb 2 · Liked by Gary Marcus

The majority of people producing copyrighted material are independent artists. They will be the ones disproportionately affected by AI theft. As mentioned by Gary Marcus and in the comments below, with the well-known players it's rinse and repeat: overreach, get sued, settle, and do it again.

Feb 2 · edited Feb 2 · Liked by Gary Marcus

I noticed this in the Bard email today: “Using a technology called SynthID, all unique images generated on Bard will have embedded watermarking to indicate that it was created by AI. The watermark is directly added into the pixels of an AI-generated image, meaning it’s imperceptible to the human eye but it can still be detectable with SynthID. It’s important that we approach the creation of images with AI responsibly.”

So users are stamped with a watermark on the output, but the input is taken scot-free (so far). Interesting.

"Being able to identify AI-generated content is critical to promoting trust in information. While not a silver bullet for addressing the problem of misinformation, SynthID is an early and promising technical solution to this pressing AI safety issue."

https://deepmind.google/technologies/synthid/

Hey Google – If you're so smart, how about developing a system for identifying where your inputs and outputs came from (even when relatively morphed) – some kind of end-to-end IP chain of custody technology (CoC) sorta thing? IPCoC ?... Now *that* would be cool... more cool than more Mario look-alikes ...
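Google hasn't published how SynthID actually works, but the general idea of a pixel-level watermark that is invisible to the eye yet machine-detectable can be illustrated with a toy least-significant-bit scheme. Everything below (the function names, the keyed bit pattern) is purely illustrative, not SynthID's algorithm:

```python
import numpy as np

def embed_watermark(image: np.ndarray, key: int) -> np.ndarray:
    """Overwrite the least significant bit of every pixel with a keyed
    pseudo-random bit pattern; a change of at most 1/255 per pixel is
    imperceptible to the eye."""
    bits = np.random.default_rng(key).integers(0, 2, size=image.shape, dtype=np.uint8)
    return (image & 0xFE) | bits

def detect_watermark(image: np.ndarray, key: int) -> float:
    """Fraction of pixels whose LSB matches the keyed pattern:
    1.0 for marked images, about 0.5 for unmarked ones."""
    bits = np.random.default_rng(key).integers(0, 2, size=image.shape, dtype=np.uint8)
    return float(np.mean((image & 1) == bits))

img = np.random.default_rng(0).integers(0, 256, (64, 64), dtype=np.uint8)
marked = embed_watermark(img, key=42)
print(detect_watermark(marked, key=42))  # 1.0
print(detect_watermark(img, key=42))     # ~0.5
```

(A real scheme also has to survive cropping, resizing and re-encoding, which is presumably where SynthID's actual sophistication lies.)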


Well, that is handy for the IPR lawyers when it turns out the users of Bard cannot be held responsible (even if the GenAI companies would wish it to be so), as they have no way to be aware of the infringement, having no access to the training data...


Yes, I suppose it's actually an advantage for us users, at the moment! We can hold our hands up in innocence: "Hey, I didn't know where this came from!"


Some big corporations are stealing the “intellectual property” of some other big corporations

I get that this is not OK on some level, but I’ve also seen little to suggest that any normal person will be harmed

Capitalist on capitalist crime. Have at it


What makes you think that they aren't stealing from regular people too? The big media properties are instantly recognizable and get a lot of attention because of it, but just because you don't recognize something doesn't mean it wasn't stolen from somewhere.

Author · Feb 2 · edited Feb 2

exactly (and as discussed near the end of our IEEE Spectrum article)

Feb 2 · edited Feb 2

The fat cats will settle. They always do. The little guy will be screwed, as always. They may get an opt-out, then gain a competitor.


I'm not a fan of the big tech companies OR copyright law. Both cause societal problems.

However, it seems clear to me that the problem here is not the AI models, which are merely learning about the world from looking at freely available content (this is fair use IMO).

If someone generates and distributes objectionable content (of any kind, whether it's copyright violation or a deepfake nude) then the problem is the person who created and distributed that objectionable content, NOT the tool they used (which could be AI, or Photoshop, a photocopier or a paintbrush).


I think there is more to it than that. Generative AI's propensity for regurgitating material from its training corpus calls into question whether these systems are actually capable of generating anything original. The question that needs to be answered is, if we eliminate the ripoffs from the AI output, what is left, and is it good enough that people will still want to pay for it?


So, are you saying that if I collect a library of publicly available but copyrighted jpeg images, put it on a web site and charge a subscription fee to people for retrieving images from it, that's OK and the copyright violation is with the people who retrieved the images and then used them unlawfully?! :) Because that's essentially what so-called GenAI is doing; the models are basically memorizing the training content. It's a fundamental feature of their training: the model aims to minimize the difference between its output and the training data, so it basically tries to replicate it.
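The memorization point can be seen in miniature. In this toy sketch (a single weight matrix stands in for a model; this is nothing like how real diffusion or language models are trained), gradient descent on a reconstruction loss drives an over-parameterized model to reproduce its training vectors essentially exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(3, 8))        # three training "images" as vectors
W = 0.01 * rng.normal(size=(8, 8))    # the model's parameters

# Minimize sum ||W x - x||^2 over the training set by gradient descent:
# with enough capacity the loss goes to ~0, i.e. the data is memorized.
for _ in range(2000):
    grad = sum(2.0 * np.outer(W @ x - x, x) for x in data)
    W -= 0.01 * grad

# "Prompting" with a training input regurgitates the training output:
worst = max(np.max(np.abs(W @ x - x)) for x in data)
print(worst)  # tiny reconstruction error
```

Whether large generative models memorize their whole corpus or mainly an over-represented slice of it is the empirically contested part; the training objective itself does reward replication.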


Well, no, YOU would be the copier and distributor in that scenario. I'll also point out that copyright law says nothing about how much you charge for things, just whether you copied them. As I'm sure you know, search engines and social networks and in fact your web browser have done this en masse for a long time (to read my text here, your machine made copies of it). Also, no one seems angry that web.archive.org exists. Or giphy.com, etc. Which is why I think copyright law is outdated and mostly just serves big capitalist media and lawyers.

The replication examples we're seeing touted from these models (such as in this article) are inflated, because people are clearly prompting them specifically to make copies of things they want to see. For example, they may not mention Mario explicitly, but there are no other Italian plumber video game characters in the world. That's about the prompter's intentions and use of the tool, not the model itself.

If I generated lots of pictures of Mario and used them to make some game called Super Mario without giving any credit then Nintendo would have a perfect right to come after me ... but if I generate a few similar pictures or texts for my own enjoyment? Even Nintendo couldn't care, surely. They aren't losing out on anything.

(These early AI wrestlings remind me of the early days of the internet, when everyone was stressing about people downloading bomb recipes and becoming terrorists, and trying to get Internet providers to censor all the traffic. Remember that?)


The case of the search engines and web archives was settled long ago because they provide links to the original content and don't charge you for it. That, and my web browser right at this moment copying the content of the Substack web page, falls under fair use. Not so for GenAI. It doesn't matter how people managed to retrieve the copyrighted content by prompting the GenAI; the fact that it is possible to retrieve it at all means that the model stores it inside, encoded in its parameters. This is basically no different from jpeg compression storing an image encoded in the quantized coefficients of the cosine transform. And it is of course not a matter of indifference to Nintendo or any other content maker if their content is being used to make money without them getting a share of the revenue.
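The jpeg analogy can be made concrete with a schematic sketch of transform coding (just the DCT-and-quantize core; real JPEG adds chroma subsampling, zig-zag ordering and entropy coding on top):

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis, as used on JPEG's 8x8 blocks."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

D = dct_matrix()
block = np.random.default_rng(1).integers(0, 256, (8, 8)).astype(float)

coeffs = D @ block @ D.T               # forward 2-D DCT
stored = np.round(coeffs / 16) * 16    # coarse quantization: only this is kept
recovered = D.T @ stored @ D           # inverse DCT from the stored coefficients

# The pixels were discarded, yet the block comes back close to the
# original: the image "lives" in the quantized coefficients.
print(np.max(np.abs(recovered - block)))
```

Whether a model's weights encode training images as literally as quantized DCT coefficients encode a block is of course the disputed question, but lossy, transformed storage that can still reproduce the source is exactly the analogy being argued here.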


I'm sure they'll be so grateful you tried to protect Super Mario from leaking out on the internet so that we can buy those images direct from them instead: https://www.google.com/search?q=super+mario&udm=2&sa=X&biw=1512&bih=1335&dpr=2

With your position, Gary should be taken to court for posting copyrighted Nintendo images without permission on his blog here on Substack, which is a commercial site. Is that what you want?

Author · Feb 2 · edited Feb 2

Sir, please read the IEEE Spectrum article where Southen and I discuss why Mario is just an example of a larger problem.


Yes, thanks, I've read it. I have appreciated some of your AI critiques in the past, Gary, but on this particular issue I think it's better we fix the broken copyright laws.


It seems you haven't read what fair use is about; research and investigative journalism fall under that category.

Feb 2 · edited Feb 2

OpenAI could take high risks because they are a startup and went in first.

By now, Google had enough time to ponder things. I don't think they stepped in unprepared.

Lawyers will have their day in the sun. I am betting, however, that Google will come out on top, by paying up if necessary (a small fraction of their profits).


Well, come on, they have to stay competitive in the copyright violation technology race! 😆

Author

🤣


Hopefully artists will be able to use tools like Nightshade that will make stealing their images counter-productive.

https://www.technologyreview.com/2023/10/23/1082189/data-poisoning-artists-fight-generative-ai/


I thought that AlphaBet was a betting and gambling company…


"Gary Marcus is standing firmly by his prediction that 2024 will be the year of Generative AI litigation."

Translation:

"Gary Marcus is hoping to capitalize on AI hysteria along every conceivable vector, and to cheerlead capturable regulatory bodies to advance his company's absurd goals no matter how detrimental to human freedom they obviously are."

Now: everyone resume their praise of this moral idiot, who answers no questions no matter how generously they're asked.

Author

I don’t have a company, and you are a jerk.


Mark, please, let's not turn the comments into the sewer they become without restraint or moderation. Care to edit your comment so it becomes an argument on merit, or at least civilised?

Ask yourself: what makes you so emotional (angry, frustrated, whatever) that you write something like this?

I definitely do not agree with everything Gary has written or said. But this kind of reply damages the usefulness of reading the comments and quickly turns them into a sewer of name-calling.


A thought occurs... Given "Better Call GPT" (https://arxiv.org/abs/2401.16212), how likely is it that we're going to see LLMs used to facilitate litigation against themselves...? :-)

Author

The group that wrote that paper are not without the appearance of a conflict of interest.
