Reid Southen and I continued our experiments and wrote a long paper about what we found, and about why our findings may pose serious problems, difficult to fix, for both users and developers.
Interesting. This goes along the lines of what I was saying in the comment threads on the previous two articles: no matter how you try to spin it as using only links or URLs or whatever, the fact is that an image or a text is still used at the very beginning as input. How to solve this, other than getting consensus for use? We'd have to re-conceptualize what it means to digest and "use" material, or try to grant some kind of "thinking" or sentience status to AIs, such that they are free to consume copyrighted material, "think" about it, and produce output, like we do as free human agents (who are genuinely creative)... which is a distortion of what's really going on, namely a mechanical system created by humans as a tool. And this all strikes at the very core of what is at issue and what Gary writes about: what is "intelligence", what is AI, what is the goal, how do we use these systems (properly), and how do we get there?
I imagine the journey to conscious AI would speed up if bots were held accountable for making strategic, informed decisions. But what does it mean to hold an AI accountable? More to the point, I guess, is that the folks building AI currently appear to be building something they know is not capable of making good, rational, ethical, reliable, informed decisions. But they don't seem to care? Or are obsessed with "progress" at any cost.
I used the word “sentience” rather than “conscious” intentionally, as it is a much lower bar: sentience would apply to outward behavior we can observe, such as a body being asleep and insentient one moment, and awake and sentient the next. Whether an entity is conscious is impossible to know: all I can know is this consciousness I am. Heck, I can’t even know what my girlfriend’s subjective experience is, and imagining it is, well, just imagination (as I keep having to learn again and again…). Is an octopus sentient? Probably, given its behavior. Could a computer be? Maybe – I’ll give it some probability when I see something worthy of the word.
In the meantime, that is all speculative with respect to the matter at hand, which is the practical one of copyright. We’ve got a fine mess on our hands, and I wouldn’t want to be in the position of the AI companies right now, caught with their pants down.
I would agree that “...the folks building AI are building something they know is not capable of making good, rational, ethical, reliable, informed decisions.” The builders, and the lawyers, judges, and politicians, seem barely up to that task themselves. Plus, the lawyers, judges, and politicians have the added handicap of not understanding the tech, and of projecting all kinds of powers onto it at times (thus another thing Gary is, hopefully, helping to ameliorate).
I don’t know what the solution is (especially since I’m not a lawyer), but the situation was inevitable once computers were invented and creative material was digitized. That opened the door to copying things infinitely. And now a digital power amplifier and transmogrifier has been added to the process, in so-called AIs based on neural nets, LLMs, etc. Now that the cat is out of the bag, with all kinds of open-source LLMs in the wild, perhaps the only solution will be found not at the point of input to the models but at the other end of the process, when humans take something they created in collaboration with an AI and put it out into the world. At that point it will be the responsibility of the user to check whether they are using material close enough to copyrighted input to violate some entity’s legal claims. And, to aid in this, the AI companies might be good enough to figure out a way of tracing that similarity automatically, so that the system warns the user that the text or image or sound is dangerously similar to X input? Just an idea (a rough sketch of what that could look like follows).
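Here is a minimal sketch of that kind of similarity warning, in Python. Everything in it is hypothetical: the tiny in-memory corpus, the 0.6 threshold, and the character n-gram (Jaccard) measure are illustrative stand-ins for whatever a real provider would actually use over an indexed training set.

```python
# Hypothetical sketch of an output-similarity warning. The corpus,
# threshold, and n-gram Jaccard measure are illustrative assumptions,
# not any real provider's method.

def char_ngrams(text: str, n: int = 5) -> set[str]:
    """Return the set of character n-grams in a whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}

def similarity(a: str, b: str, n: int = 5) -> float:
    """Jaccard similarity between the n-gram sets of two texts."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

def warn_if_similar(output: str, corpus: dict[str, str],
                    threshold: float = 0.6) -> list[tuple[str, float]]:
    """Return (source_id, score) pairs whose similarity exceeds the threshold."""
    scores = [(src, similarity(output, text)) for src, text in corpus.items()]
    return [(src, s) for src, s in scores if s >= threshold]

# Hypothetical usage: 'corpus' stands in for an index of training texts.
corpus = {"work_123": "The quick brown fox jumps over the lazy dog."}
generated = "The quick brown fox jumped over the lazy dog."
for source, score in warn_if_similar(generated, corpus):
    print(f"Warning: output is {score:.0%} similar to {source}")
```

In practice the hard part would be scale (matching against billions of training items, and images or audio rather than text), which is presumably why no such warning exists today.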
I just published a book on Amazon (about the history of computers), and noticed that when I uploaded the manuscript, one of the new standard questions on the form asks whether an AI was used in its creation (I said 'No')! Interesting times we live in.
The summary of your paper is very clear and relevant in my opinion. It fits well with my own thinking on the subject. There are real solutions for avoiding genAI plagiarism, but none of them will be easily accepted by the companies providing genAI tools, and most will have to be enforced by law (new regulations or lawsuits).
1) AI-driven systems could be trained only on public-domain content, or on content for which permission was granted under a specific license. That would mean a huge step back: resetting all training databases and retraining the systems.
2) Any result generated by an AI system could be accompanied by a list of references (works) used to produce that specific output. This is not possible with present LLMs, if I understand correctly, but it may become possible in the foreseeable future if the purely LLM approach is replaced by a more inferential and rigorous one. That means investing further in a new technology (see the sketch after this list).
3) Any content generated by an AI system that is based on unreferenced data could be accompanied by a clear written liability warning stating that the content is intended for strictly personal, non-professional, non-commercial use. Such a restriction would surely deter some customers from using the system.
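To make options 2 and 3 concrete, here is a hypothetical sketch of what an attributed output could look like at the API level. Nothing here reflects any existing system; the class names and fields are invented for illustration, in the spirit of retrieval-style attribution.

```python
# Invented data structures illustrating options 2 and 3 above:
# generated text carries its source works, and falls back to a
# liability warning when no sources can be identified.

from dataclasses import dataclass, field

@dataclass
class SourceWork:
    work_id: str   # identifier of the (licensed or public-domain) work
    title: str
    license: str   # e.g. "public domain", "CC-BY-4.0", "licensed"

@dataclass
class AttributedOutput:
    text: str
    references: list[SourceWork] = field(default_factory=list)

    def liability_note(self) -> str:
        """Option 3's fallback when no references can be produced."""
        if self.references:
            return "Sources listed; verify license terms before reuse."
        return ("No sources could be identified. For strictly personal, "
                "non-professional, non-commercial use only.")

# Hypothetical usage:
out = AttributedOutput(
    text="A short generated passage...",
    references=[SourceWork("work_042", "Some Public Text", "public domain")],
)
print(out.liability_note())
```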
None of these solutions is straightforward to apply now that the AI systems have already been commercialized, but what is going on at present simply does not respect intellectual-property principles.
This implies that the problem of infringement is not solvable at this point.