21 Comments

Meanwhile, everybody's favorite AI developer, Yann LeCun, is calling for every repository of human knowledge to give AI companies free access to their treasure troves. Something sickening about the richest and most powerful people building their tools for mass unemployment off the stolen creative work of humanity. They could go slower in development, and only use training data they licensed. They don't want to. Stealing is cheaper, and Silicon Valley is packed with people who see that theft as fine. In fact, they call it "FREEING THE DATA!" As if the novels and articles I WROTE somehow want to jump into their shady LLMs so that I have even less chance to get paid to write in the future. It is IP theft on a scale unprecedented in human history. It SHOULD shock the conscience. It appears the first rule of going to work for a big AI company is a conscience-ectomy. Those are the people building our brave new world.


This is not the Yann LeCun I knew. That Yann LeCun was instrumental in creating a new technology that converted an internet of raw data into an internet of known objects. The value of that work was real, tangible, and electrifying. No one had to sell its value by heavy-handed marketing.

That was also a time when every major effort I knew to create databases of human knowledge began with Wikipedia and strictly emphasized privacy and respect for intellectual property.

Times and people change, not always for the better.


This has been going on for decades. This A.I. thing is the second go-round. This went on in the zeroes with what is called social media. No copyrights were paid. So I am asking: why are people shocked? This has been consistent behavior that people just choose not to see.


I have been saying this for two-plus decades when it comes to Silicon Valley. They have been involved in this criminal activity with the co-sign, wink wink, of the so-called lawmakers. Nothing new here. I am just glad folks are finally acknowledging that much of this innovation is based on fraudulent acts of crime.


With money as speech, the lawmakers who might oppose this have an impossible time getting elected.


Thank you for that, Amy. I do not see the lawmakers as the solution now because the system is so co-opted. The system would have to collapse. Also, the people called lawmakers are not the same as before. The people will have to stand up and say no! Now I know that may sound far out, but that is the only way.


I have small children, so system collapse is not an attractive solution. Real people taking civic action so that we can achieve change without destroying the good things that we take for granted would be my preference.


Not only is AI bad at copyright, it also doesn't seem to understand that the Avengers are Marvel and Superman is DC.


That is disrespectful! For them it is just content, not media. To them it is free, not the product of sweat and toil.


I'm always suspicious when a suicide conveniently allows the plundering class to keep on keeping on. Aaron Swartz, *TWO* Boeing whistleblowers, and Suchir Balaji.

In addition to Art Keller's comparison with Yann LeCun, these people want everything "democratized" so they can train on the entire "input" of humanity, but they are absolutely adamant that you will pay for the "output" whether it's with money or data. Them's the rules because they paid good money for lawmakers and judges to make them that way.


Yes, these "suicides" are awfully convenient, aren't they? In each and every case, potentially billions of dollars at stake.


Fair use is a defense, not a right. If you use copyrighted material and make a lot of money, you're going to get sued - even if you use just a little.


Condolences to his family and friends.

There is clear evidence that the makers of these tools know that their fair use argument is weak, most obviously in that they have tried to hide their training data. Values and principles just don’t exist, it seems.


What many people miss in these debates is how asinine popular culture, ergo culture, in the Western world has become since the dawn of the Internet age, accelerated by the social media age. Spotify created a culture of commodification of music. It's the John the Baptist to the anti-Jesus of fake tunes. It's Christmas 2024 in the privileged Western world. Look at the music charts: Mariah, Wham, The Pogues, Sinatra. Big tech destroys culture.


If he was an Indian-American, was he a 1st or 2nd generation immigrant? Was he a US citizen or was he a worker on an H-1B? Would have been nice if someone had reported on that.

My condolences to his family, friends, loved ones.


I worry that the US courts are moving too slowly here. With Sam Altman literally paying homage to Trump post-election, it feels like OpenAI and Microsoft will be ramping up the lobbying in DC next year to carve out an explicit fair use exception for pre-training. And knowing their anticompetitive tendencies, I’m sure they’ll bake the law in a way that only the top labs get that fair use exception.


Does this have any bearing on Alexandra Elbakyan’s project of making scientific literature (behind criminally high paywalls) available free to research scientists?


Is this one of those "suicides" like when a person shoots themselves in the back of the head? Sort of like the Boeing whistleblower "suicides" from earlier this year?


On the four-factor test: I’m not a lawyer, but I don’t think it’s at all clear-cut that the use of copyrighted material in training LLMs is a violation of copyright, even when portions of the copyrighted material are reproduced verbatim in the output.

The questions that still have to be answered in the courts are:

Is the purpose and character of the use (factor 1) primarily of a commercial nature, or for nonprofit educational purposes? Here OpenAI’s unique business structure may actually work in its favor (and indeed may very well have been a factor when it was set up that way).

What is “the amount and substantiality of the portion used in relation to the copyrighted work as a whole” (factor 3)? In Campbell v. Acuff-Rose Music, the Supreme Court, in remanding the case, required the lower courts to consider the “transformative elements” of the derivative work, and “transformation” has since become a key part of the legal cases when the third factor is invoked. Just because the output duplicates a substantial portion of a copyrighted work does not necessarily mean that the copyrighted work hasn’t been transformed, under law.

What is the effect of the derivative work on the potential market for, or value of, the copyrighted work (factor 4)? Is the value of, say, Mario Bros. reduced by the use of DALL-E to generate an image that’s used in a different context? The courts have already dismissed or greatly narrowed plaintiffs’ copyright claims by requiring them to show that they have been harmed by the availability of their copyrighted works inside of an AI tool, and the courts do seem at least initially inclined to a position that there is no “market substitution” in the way that copyrighted works are used within AI.


It is very tragic that this young man chose to take his own life. It is not clear why he chose to do that.

Longer-term, the copyright issues will likely be solved in a manner that will satisfy everybody. OpenAI will pay up for content. Opt-outs will be implemented. Some degree of fair use will be accepted. It worked out with YouTube.

Then, the focus will shift more to custom-designed data for detailed work and to algorithms doing more of the work, so there will be less need for data.


"Longer-term, the copyright issues will likely be solved in a manner that will satisfy everybody."

At first I thought that I must have misread this. This is frankly a pretty absurd assumption.
