Tech Roundup Episode 22 – Training Artificial Intelligence & Copyright Law

Moderated by Brent Skorup, experts Timothy B. Lee, Professor Pamela Samuelson, and Kristian Stout discuss the emerging legal issues involving artificial intelligence and its use of works protected under copyright law. Topics include how artificial intelligence uses intellectual property, whether allegations of intellectual property violations are analogous to prior historical challenges or are novel, and the tradeoffs involved.

Transcript

Although this transcript is largely accurate, in some cases it could be incomplete or inaccurate due to inaudible passages or transcription errors.

[Music and Narration]

 

Introduction:  Welcome to the Regulatory Transparency Project’s Fourth Branch podcast series. All expressions of opinion are those of the speaker.

 

Steven Schaefer:  Hello, and welcome to The Regulatory Transparency Project’s Fourth Branch podcast. My name is Steve Schaefer, and I’m the Director of the Regulatory Transparency Project. Today we are pleased to have with us a panel of stellar experts to discuss the intersection of artificial intelligence and copyright law. I will keep their impressive bios brief so we can get right to the discussion. 

 

First, we have Timothy B. Lee, who is a reporter who has written about technology, economics, and public policy for more than a decade. Before he launched Understanding AI, he wrote for the Washington Post, Vox.com and Ars Technica. Secondly, we have Brent Skorup, who is a senior research fellow at the Mercatus Center at George Mason University. His research areas include transportation technology, telecommunications, aviation, and wireless policy. 

 

We have with us Kristian Stout, who is the Director of Innovation Policy at the International Center for Law and Economics. He is an expert in intellectual property, antitrust, telecommunications, and internet governance. We are also pleased to have with us Pamela Samuelson, who is the Richard M. Sherman Distinguished Professor of Law at the U.C. Berkeley School of Law and a Professor in the School of Information. She is also co-director of the Berkeley Center for Law & Technology. Her areas of expertise include intellectual property, copyright, patents, trademarks, and law and technology. I’ll hand it over to you, Brent. Thank you.

 

Brent Skorup:  Thanks, Steve. And I want to thank The Federalist Society for agreeing to host this conversation. I wanted to organize this and put this topic together for a lot of reasons. But the big one is that a year ago, OpenAI released ChatGPT. I study technology policy, and I’ve studied it for a decade, and I haven’t seen a public and policy reaction to anything like the reaction to ChatGPT. And not just to ChatGPT, but to the many companies that followed it by publicly releasing, or getting more attention for, their generative AI products — Midjourney, Hugging Face, Anthropic, DALL-E, many others — with text-based and image-based generative AI services. 

 

And, as I, like many others working in this area, try to wrap my head around it, one issue that frequently comes up as a possible risk and a policy concern is how will generative AI companies and copyright law coexist? And what will copyright law look like? There are new issues presented by generative AI companies. And so, I think we’ve got a great panel: people who have written about these things in depth and have great knowledge and history on this. But before we turn to questions, I’d like to go to each participant, Tim first, and you can give, in a minute or two, your thoughts as you’re approaching this issue and how you’re thinking about copyright, fair use, and generative AI services. 

 

Timothy B. Lee:  Thanks. So, yes, I write a newsletter called Understanding AI. And it’s a mix of writing about policy relating to AI and also just AI itself, trying to help people understand how it works and how people are using it. But I think one of the big underlying issues for all of these generative AI tools that people are excited about, both ChatGPT and also image tools like Midjourney, is that they are trained on very large amounts of information, pictures, and written material taken straight from the internet that may or may not be — in many cases is not — licensed from the original copyright holder. 

 

And when these were research projects, people kind of didn’t worry about this very much because the stakes were not very high and there was kind of a general assumption that it’s okay to train AI systems on this material, at least for research purposes. But as they become commercial, and as people have started trying to use these models for real-world applications, the copyright holders have gotten increasingly concerned about it and they’ve just started suing. 

 

And so, the stakes here are that if the plaintiffs win these lawsuits, it might turn out that all the AI tools we’re using right now are infringing copyright, and all these companies are going to have to go back to the drawing board. They’re going to have to license new training material, which would take a long time and might lead to models that aren’t as good. At a minimum, it would be a very big disruption for this industry over the next couple of years. 

 

Brent Skorup:  Thank you. Pam, a minute or two on how you’re thinking about this — I saw your piece in Science recently, so you’re clearly thinking about this. 

 

Pamela Samuelson:  Well, I think that the generative AI companies really thought, “Hey, if the stuff that we train on is out there on the open internet, then if we make copies of it and we train our systems, we’re not harming anybody, and so it must be fair use.” And there are some precedents that they will look to to support that. I think they were surprised that there are now a dozen lawsuits against them, and there are probably going to be more. I think ten of them are class action lawsuits. And some of them just focus on visual art, and some of them focus on software, and some of them are, like, everything. 

 

And it’s going to be, really, quite a long time before we get any kind of definitive answer to the question of whether the use of the training data is fair use — I call that the big kahuna issue. Because the derivative work infringement claims, which are part of the claims in these cases, I think, are really weak. And, actually, in the Andersen v. Stability case just last week, the trial judge said, “If the outputs are not substantially similar, then they can’t be an infringement of the derivative work right of copyright.”

 

And so, I think that’s right. And I think that’s going to be the holding in the other cases. But the training data issue is huge. And things are just much more up in the air right now than, I think, the generative AI companies thought they would be. 

 

Brent Skorup:  Thank you. And Kristian, opening thoughts on how you’re approaching these issues. 

 

Kristian Stout:  Sure. I think there’s going to be a lot of agreement here, echoing Tim and Pam. These are very complicated questions, and I think they’re not going to shake out, necessarily, the way anybody thought they would up front, because the core question is going to be fair use, which is a balancing test. I have my view about how I think that balancing test should come out. But ultimately, it’s going to be up to a fact-finder to weigh those elements. 

 

I’ll just give a quick overview of where I think the most critical questions are landing. And then I’ll finish with a thought on how we need to think about these questions. And I think that’s the more productive way to approach them, at the moment. On fair use, I think the training process likely is not going to qualify as fair use, because it directly leverages the intrinsic expressive value of works. Again, it’s a balancing test. So, I could be wrong. But I think that’s how I read the case law. 

 

That said, when courts have been faced with novel and complicated technological issues in the fair use domain — something like the snippets cases or the image thumbnail cases — they will look at the public value of the use as one of the factors to weigh when deciding whether it’s transformative. So, the fact that generative AI is really very promising may weigh very heavily on the mind of a fact-finder when they’re looking at the fair use analysis. 

 

On the AI output side — which is another contentious issue Pam signaled — I don’t really see much needing to change in fundamental copyright law. I think the substantial similarity test will probably still govern whether the output of an AI is infringing copyright. I think there are some complicated issues based on whether you can know or infer that the model actually had access to the work it’s alleged to be infringing. That will be a complicated fact issue. But I don’t know that you’d need a change in doctrine based on that, at the moment. 

 

On liability for infringement, generally, I think secondary and direct liability are going to work roughly the same. But there are some interesting situations that could arise. For instance, suppose a user never prompts an AI system to infringe copyright, and the AI system developer explicitly puts safeguards in place to try to prevent infringing outputs, but you get a substantially similar result anyway. You could, theoretically, have some kind of secondary liability theory against the producers with no human agent actually directly infringing copyright, which would be kind of an interesting result, and I would be curious to see how courts, and potentially Congress, deal with that. 

 

But, beyond the legal questions, I think the important thing right now — litigation aside — is not to rush to pick winners and losers, but to ask the right questions about how to think about the situation. We should be thinking about creating frameworks that facilitate bargaining between the AI developers and the copyright holders, because I think it’s obvious that copyright has a tremendous amount of economic value for society, and the AI systems are very promising as well.  

 

So, instead of thinking about it in terms of a showdown over whether the need to license billions of pieces of content will kill AI in its crib, I think it’s better to ask: what could we do on the policy front to enable these two parties to bargain with each other in the private market, so that both parties are satisfied? Because, ultimately, copyright is an economic doctrine. It’s about having the incentives in the system to encourage the creation of productive works and to advance, as the Constitution says, the arts and sciences.  

 

Which means that we need to think about this in terms of trade-offs. What are the costs and benefits of strong copyright protections, on one side? And what are the costs and benefits of expanding the fair use of these works, or of creating additional rights that make sure creators are fairly compensated on the back end? 

 

Brent Skorup:  All right. Thank you. And I’ll put my first question to Pam and open it to the others. Pam, I know you teach copyright law courses. I’m not an IP expert, and most of our listening audience probably aren’t IP experts either. Can you talk, just briefly, about how copyright law has adapted as new technologies and digital technologies have changed over the years? 

 

Pamela Samuelson:  Generative AI is not the first disruptive technology for copyright. And I think it’s important to understand that. One that I really like to point to is player piano rolls. There was a time when lots of households in the United States had pianos that you put these little piano rolls in, and they would play music automatically. And that was really cool. But the people who owned music copyrights said, “No, that’s not cool,” and sued for infringement. And the Supreme Court said, “You know, this is not a copy of the sheet music, so it’s not an infringement.” 

 

Now, Congress basically came along later and said, “Oh, we’re going to give a new exclusive right to copyright owners of musical works. We’re going to say that a mechanical recording of your music is actually something that you have the right to control.” So, Congress can step in if the courts basically say, “Well, based on the precedents, there’s no problem here.” Then, in the mid-1900s, the bane of publishers’ existence was photocopies. And so, there were several lawsuits involving photocopies. 

 

And sometimes the courts basically said, “Not an infringement.” And sometimes they said, “Well, actually, because the Copyright Clearance Center will allow corporations to get a license to make photocopies, this is actually a harm to the market and, therefore, not fair use.” Then there was the Betamax case, Sony v. Universal, before the Supreme Court, about videotape recording machines. And the Supreme Court decided that Sony was not a contributory infringer of copyrights, because people could use the Betamax machines to make time-shift copies, so that if they had to go to a soccer game, they could still watch the program at a later time. And that was fair use. 

 

So, the courts have kind of gone back and forth. Now, of course, Napster and Grokster were examples of new technologies that enabled a lot of infringement to happen. And, of course, they got shut down. So, across all of the new technology precedents, sometimes fair use prevails and sometimes it doesn’t. And while I think the precedents right now support the fair use defenses, fair use has to be decided on a case-by-case, fact-specific basis. And so, it’s too early to really call it one way or the other. 

 

Brent Skorup:  Okay, great. Thanks. My next question is for Tim. Pam just mentioned that Napster and Grokster went under, in part, because of copyright problems. You wrote a piece I really enjoyed — I think a couple of months ago — about how copyright lawsuits pose a serious threat to generative AI companies. Can you talk about that piece? Talk about why copyright is such a massive issue for these companies. 

 

Timothy B. Lee:  Sure. So, if you think about a product like ChatGPT, the way that was made is that OpenAI downloaded a large share of all of the written content on the internet. And then they have what’s called a training process where, basically, they have a very complicated mathematical model that tries to predict the next word in a sentence. So, they’ll take a document, give the model the first word and have it try to guess the next word, then give it the first two words and have it guess the next word, and so on. And they use the correct next word as feedback to the model, to make the model better at predicting the next word. 

 

And, surprisingly, if you do that over billions of documents, you end up with this thing that is actually pretty good at generating something that seems like written human speech, written human thoughts. But, to do that, you need billions of documents. And it’s good to have a wide range of documents. And so, it would be way too expensive to hire somebody to create all these documents. You need to get them from somewhere. And so, the traditional way they’ve done that is to just download stuff all over the internet. 
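To make that next-word-prediction loop concrete, here is a minimal sketch in Python. It is purely illustrative, not how any of the companies discussed actually do it: the toy corpus is invented, and real systems use neural networks over billions of documents, updated by gradient feedback, rather than simple word-pair counts.

    # A toy illustration of next-word prediction, the task described above.
    # Real systems train a neural network with gradient feedback; this sketch
    # just counts which word follows which in a tiny, invented corpus.
    from collections import Counter, defaultdict

    corpus = [
        "the cat sat on the mat",
        "the dog sat on the rug",
    ]

    # "Training": for each word, record how often each next word follows it.
    next_word_counts = defaultdict(Counter)
    for doc in corpus:
        words = doc.split()
        for current, following in zip(words, words[1:]):
            next_word_counts[current][following] += 1

    def predict_next(word):
        """Return the most frequently observed next word, if any."""
        counts = next_word_counts.get(word)
        return counts.most_common(1)[0][0] if counts else None

    print(predict_next("sat"))  # prints "on"
    print(predict_next("the"))  # prints "cat" (ties broken by first occurrence)

Scaled up, with a far richer model in place of the counter, this guess-the-next-word loop is the training process Tim describes.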

 

And, as I said before, when this was just a research project, I think people didn’t really worry about it too much. OpenAI started out as a non-profit research organization that was doing theoretical research and developing these models just to show they could. And, in the last couple of years, they’ve shifted from that to producing a commercial product. There are now a lot of startups, and also Microsoft and other big companies, trying to build commercial products using this. And so now the stakes are much higher, and copyright holders are objecting. 

 

The reason the stakes are so high is that if the courts find that this is copyright infringement, number one, that could be very costly for OpenAI, because we’re talking about billions of works, and so it could be billions of dollars in damages. But, beyond that, if the courts told OpenAI to stop doing this, then they would have to basically start over. They’d have to assemble a new set of data. And this is not impossible. There are some sources of data. Wikipedia, for example, is freely licensed, and I think you could make a pretty strong case that training on that is fair use, at least. Then you could go to big organizations, like big news organizations or other publishers, and pay them for permission. 

 

But, at a minimum, this is going to be very expensive for these companies, and, I think, it will also change the structure of the market a lot. Google, OpenAI, Microsoft, Meta — those kinds of big companies — can afford to do the paperwork and make the licensing payments. But there are also companies building open-source models, and it’s hard to see how they would scrape together the millions or billions of dollars in licensing fees they’d need. 

 

And so, I think there are two big consequences. One, it will just be disruptive. There will be a period where maybe these models are not available, or they take a big step back because the initial versions of the licensed models are much smaller. But, number two, I think you’ll see consolidation in the industry, because there might be only a few companies that can afford to do the paperwork and pay the licensing fees. That won’t necessarily mean that only a few companies can do AI. But they might all have to license from those few big companies. And it could change the structure of the industry a lot.

 

Brent Skorup:  Okay. Kristian, do you have something on that?

 

Kristian Stout:  Just an additional gloss, going back to one of the themes in my introductory remarks about thinking about this in terms of a set of economic trade-offs. When you’re thinking about the amount of content that Tim mentioned needs to be ingested in order to make this work, what complicates the whole licensing question is that the marginal value of any one piece of content is going to be infinitesimally small. There’s no killer piece of content that the algorithms need. It’s really a quantity of content that they need access to. That’s not impossible to deal with; for instance, radio licensing works through flat fees that are paid for access to works. 

 

So, it’s theoretically possible to do this. But the vast quantity of material they need, and the tiny marginal value of any one piece, make figuring that out much more complicated, especially when the thing that I think most people are really concerned about is, on the output side, the ability to displace the market value of the original works. That’s what really matters, I think, to creators. Obviously, not everybody, but I think it’s what really matters. So, trying to connect the very small value that a work has on the training side to the impact on the market for the creator’s works on the output side makes this a very complicated economic calculation.
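A back-of-the-envelope sketch may help illustrate the point (every figure below is invented for the example, not drawn from this discussion): spreading even a large licensing pool across a training set of billions of works leaves each individual work with almost no marginal value.

    # Hypothetical illustration of why per-work licensing value is tiny.
    # All figures are invented for the sake of the example.
    licensing_pool = 1_000_000_000         # suppose $1 billion is set aside for licensing
    works_in_training_set = 2_000_000_000  # suppose 2 billion documents and images ingested

    per_work_value = licensing_pool / works_in_training_set
    print(f"${per_work_value:.2f} per work")  # prints "$0.50 per work"

On these invented numbers, each work is worth fifty cents to the training process, even though the harm a creator fears on the output side may be far larger than that.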

 

Brent Skorup:  Okay. And this is not hypothetical. I believe part of the Hollywood strike involved negotiating over some of these issues around AI outputs. I’ll put the next question to you, Kristian, and to anyone else who wants to contribute. The timing for this talk is pretty good. This will be released in a few days. But a couple of days ago, the White House released an executive order, very wide-ranging, over 100 pages, about AI policy in the federal government. And it touched on some copyright and AI issues. 

 

Kristian, can you talk a little bit about what the competing commercial interests are here, and also about some of the major interest groups relevant to these discussions? And is it just the Copyright Office, or are there others who are relevant?

 

Kristian Stout:  The real locus of this action is going to be in litigation, initially. So, the relevant stakeholders, if you want to put it that way, would be the judges who are thinking about how to do the fair use balancing, because they are going to have to answer those questions very soon. Apart from that — as Pam mentioned with the early player piano cases that led to mechanical licenses — there wasn’t a right at that time that would have supported a finding of infringement, which is why Congress put the mechanical licensing right in place. 

 

It’s possible that we could see something like that need to happen here. So, the parties of interest are going to be judges and lawyers and litigants. But Congress, and policymakers at that level, have got to be looking at this in the background and thinking: if it’s going to be hard for rights holders and AI producers to bargain because of the vast quantity of content needed, maybe something else needs to be introduced, some other form of right, possibly, that could facilitate this bargaining.  

 

One of the things I’ve been thinking about is whether you could expand the right of publicity from the state level into a federal right of publicity. That would give creators some assurance on the output side, where users are trying to generate very striking creations using the name and likeness of existing creators, or using existing pieces of artwork as prompts. Maybe there’s some right that creators need on that output side that facilitates bargaining, since, on the input side, the value is a little bit unknown. 

 

Brent Skorup:  Okay.  Tim or Pam, do you have thoughts on what Kristian said?

 

Pamela Samuelson:  So, there is actually a draft bill. I think it’s called the “No Fakes Act.” And I think the fake Drake song was galvanizing for a lot of people, especially in the recording industry. And there is existing case law under state right of publicity law holding that impersonating someone like Bette Midler or Tom Waits for a commercial purpose is actually a violation of the right of publicity. But there is also a recognition that not every state has a right of publicity law, and the states that do have different ones. And so, some sort of harmonization of right of publicity-type rights would be possible.

 

But there are other tools. Sometimes trademark law can protect a kind of persona, or the name of someone. Sometimes it can be false representation law. So, I’m a little bit skeptical about “Oh, it’s AI. Therefore, we have to have new laws.” But certainly, the executive order that was just published raises a vast array of policy issues that courts and policymakers will be grappling with for the next decade, I think.  

 

Brent Skorup:  Tim, anything to add on that?

 

Timothy B. Lee:  I would say that this concern about publicity is one of the issues that will make it tricky for the defendants in these cases to make their fair use claim. Kristian was saying earlier that you need vast amounts of data to do this, and so the value of any particular piece of data is insignificant. I think that’s generally true. But one of the things you see with these image-generating AIs is that you can ask one to generate an image in the style of a particular living artist, and it will produce a good facsimile of what that artist would have produced. 

 

And so, if you’re in the shoes of that artist, there’s this machine that has ingested all of your artwork and can then produce something in your style. I don’t think copyright law protects styles, per se; those aren’t copyrightable. But it does suggest that, clearly, that artist’s work is playing a significant role in producing the output. And it also, I think, just looks bad. Practically speaking, when judges are deciding whether something is fair use, the fairness part really matters. People ask: does this seem like a reasonable thing you’re doing?

 

And so, if you have a machine that can create realistic facsimiles of what a particular artist would do, I think that is just going to smell bad to a lot of people. And so, I think that’s one of the reasons it might be difficult for the defendants to convince courts that the training data is, in fact, of minimal value: that any particular piece is of minimal value and, therefore, it’s fine to say it’s fair use. 

 

Brent Skorup:  Great. Thanks. Next, I’d like to go to Pam, and then to whoever has a view on this. It’s been mentioned multiple times that a lot of this will be in the hands of the courts, and a lot of this is in litigation already. Can you talk about some of the most relevant and important cases, and the most important issues? You mentioned training versus output uses. But I’m curious which cases you think are most impactful for this industry and for policymakers.

 

Pamela Samuelson:  I think the case that is the closest analog is the Authors Guild v. Google case. There, you had Google copying 20 million books from research library collections. They scanned them, made multiple copies, and indexed the contents, and then served up snippets in response to user search queries. Now, one of the things that the fair use case law has paid a lot of attention to is whether a use is transformative or non-transformative. If it’s a transformative use, that tips in favor of fair use. 

 

And in Authors Guild v. Google, you had the Authors Guild saying, “Hey, it’s commercial and they’re copying the whole thing.” And the court said, “No, actually, we think this is transformative, because the purpose that Google had in copying the works is totally different from the purpose that the authors had when they wrote the books.” And that difference of purpose means the use is not competing in the marketplace with the original and supplanting demand for the original. So, the difference of purpose counts a great deal. And because the court found Google’s use highly transformative, the fact that it was commercial got a lot less attention. 

 

And even though the Authors Guild said, “Oh, but we could have a licensing market for this,” and “We could have a licensing market for that,” and “Oh, this is a harm,” and blah, blah, blah, blah, blah, the court just didn’t buy it. It basically said people use Google Book Search to try to find information. They’re not trying to get the book or the contents of the book. If they want the contents of the book, they have to go to the library, or they have to basically buy the book.

 

And I think that if you ask, “Is the use of these works as training data a different purpose than the original?” you kind of have to say, “Yeah.” Using a work as training data essentially is not using the work, and the expression in the work, as a work; it’s using it as data. And when you’re using it as data, trying to extract lots of information about how works are constructed, the models that generate the output do not have copies of the training data embodied in them. The models were trained on the training data. But the training data is a separate entity from the model, which is embodied in software. 

 

And so, the training data, generally speaking, is not a commercial product. And it may only be used once in training the model, or it’s going to be used, probably, infrequently, if the model is updated. So those considerations seem to me quite salient to the fair use defenses that the generative AI companies are raising. 

 

Brent Skorup:  Kristian, are there other cases or issues that you think are most important for this industry?

 

Kristian Stout:  Just one quick note on my previous answer. I didn’t mean to suggest that the difficulty of assessing the input value is relevant to the fair use analysis; it’s more about how it will create difficulties in finding the right kind of bargain between the parties. On the fair use side, I actually differ slightly with Pam on how I think Authors Guild applies. I think her argument is a very strong one that the AI producers are going to use. But I think the fair use analysis goes a little bit differently. 

 

I think the Authors Guild case is actually more useful to the AI producers for the proposition that the creation of these AI systems has enormous public benefit, and that will weigh on their side in the transformative use analysis. In terms of the transformative analysis more generally, though, I don’t think the Authors Guild case is as helpful. Because, as Pam mentioned — and as the court discusses in the case — the point was that a completely new product was generated. It wasn’t superseding the market for the original, because the actual output that the use created was a search index. 

 

In the case of AI systems, I think it’s a little more complicated than the facts presented there, because you actually have two completely separate pieces of software, and only one of them would qualify as fair use. The model, once it’s created, using whatever it’s learned — even if we imagine that it has memorized some things — is probably fair use at that point. But the training of the system, I think, actually won’t qualify. Because, if you think about it, what these models are doing is learning. They’re learning what human expression is.  

 

So, what that means is that they’re being presented with copies of works for the purpose of learning about the expressive content of those works. That is, they’re using the works for the purpose for which they were created. It’s very similar to giving textbooks to art students. You give the art students textbooks; you show them copyrighted works in order to train them to be artists. Just because they want to go on and create more art doesn’t mean that their accessing of copyrighted works in order to train as artists should now be considered fair use. They still have to buy their textbooks.  

 

And there’s a case, I believe in the Second Circuit, American Geophysical Union v. Texaco, which, I think, actually has facts very similar to what’s pertinent here. The American Geophysical Union sued Texaco because Texaco was in the practice of subscribing to scientific journals, taking a single copy of a scientific article, then making photocopies and keeping them in filing cabinets in order to train its scientists. And Texaco brought up an argument that was actually very similar to the way the AI producers talk about their fair use. 

 

They said, “Well, what we’re really doing is extracting the scientific content for the purpose of creating this enormous public benefit: scientists who are well-trained to go out and make more discoveries. We don’t really care about making copies to redistribute the works in themselves.” And the court disagreed. The court said, “No, you’re using the material for the purpose for which it was created, which is to demonstrate scientific principles.” 

 

It’s the same thing with showing an AI system all the works of Stephen King or all the works of Salvador Dali. You want it to know, to learn, what Salvador Dali did, in order to put that information in the system. And there’s a way to do that: you can license it, under the current regime. Now, that might be expensive. It might be prohibitive. And that’s why we might need further changes to the way our system works. But on the fair use side, I think it’s really hard to say that these systems are not using the works for their original expressive content, and hard to say they’re doing anything other than superseding the original use. 

 

Pamela Samuelson:  So, I have to jump in here and say that the Texaco case is basically about making exact copies of in-copyright works. That was the output. And so, if the generative AI systems basically spat out infringing material on a regular basis and were designed to enable that, then that would be similar. But, at least with the generative AI systems that are defendants in these cases, so far as I can tell, they overwhelmingly don’t spit out infringing material. 

 

And even if a determined person can cause an infringement to happen, that person may be the infringer, but the maker of the generative AI system may say, “Hey, I have a technology that is capable of substantial non-infringing uses. And, under the Sony v. Universal case, I’m entitled to a safe harbor, so that even if there are some infringing outputs, it doesn’t mean I’m a bad guy.” And so, again, I think we’re in kind of uncharted territory. And Kristian may prove right over time. But I’m saying the precedents right now look more favorable for the defendants than for the plaintiffs. 

 

Kristian Stout:  And I’ll admit this is uncharted territory. The reason why I think that case applies, given the way I understand these software systems to work, is that there are actually two separate pieces of software. In the first piece of software, you’re literally making a copy of a work and showing it to a system to train it. The system is interpreting the work’s expressive content, and then it’s outputting something that becomes the model that users interact with. Because of the way that system is structured, those initial copies are what make it very similar to the Texaco case.  

 

Brent Skorup:  For the next question, I’d like to turn to Tim. Tim recently wrote for Ars Technica about this issue of — I don’t know what you would call them — prompt engineers trying to get their outputs recognized and copyrighted. Tim, can you explain what’s going on here, and why you think the Copyright Office is making a mistake when it comes to people trying to copyright their AI outputs?

 

Timothy B. Lee:  Sure. There have been several cases where people have used AI systems to generate images and then sought copyright registrations for them. And the Copyright Office’s position is that to get a copyright, a work needs to have a human author, and these works were not created by a human being. They were created by an AI system and, therefore, are not eligible for copyright. And I am pretty skeptical of this, because, in my view, an AI system is a tool. It’s not a separate entity.  

 

And I think the best analogy to this is a camera. If I pull out my iPhone and take a snapshot of my desk, there’s a sense in which that’s a completely mechanical process. I’m not deciding which pixels light up; it’s just whatever happens to be in front of the camera. But we still give it copyright protection, because the act of choosing which way to point the camera, how to set the settings, and when to click the button is a creative act. And, in my view, the uses of AI are pretty similar. 

 

For any given prompt, the result that comes back is produced by a fairly mechanical process, based on how the system is programmed. But there’s a huge range of different prompts you can use, in the same way there’s a huge range of different ways you can point your camera. And I think the same basic concept applies. And I think this is particularly important, because people have this idea that the current, kind of crude, prompt-based image generation is going to be the last word on AI, and that all we’re talking about is whether, if you type a prompt into Midjourney, you can copyright the resulting image.  

 

But, as this technology advances, I think the line between AI and regular software is going to get more and more blurry. Photoshop already has a bunch of filters where you can do things like remove or add backgrounds, and it uses AI when something is obscured. Say you’ve got a picture, and you want to remove a person, and what’s behind that person is obscured. It uses AI to figure out the most logical background. And so, over time, I think you’re going to have more and more use of AI in image editing products. Cameras, I think, use AI to sharpen images and do things like that. 

 

And so, if you had a rule that “AI-generated content” can’t be copyrighted, it’s just going to be a total quagmire for the Copyright Office, because I don’t think AI is a clearly defined concept. And, if you think about it in the photograph context, we could have had a rule that said only “creative” photos get copyrighted. So, if you’re taking a picture of your breakfast, that’s not copyrighted. But if you’re staging some kind of professional photoshoot, maybe that is. 

 

But just think about what a nightmare it would be for the Copyright Office and the courts to figure out which photos are creative enough for copyright protection. The way we’ve dealt with it instead is that, by default, almost all photographs get copyright protection, and then we use other concepts, like fair use, to decide how broad that copyright is and under what circumstances people can use elements of photographs for their own purposes. 

 

Brent Skorup:  Kristian, Pam, do you have thoughts on this? Can users, can prompt engineers copyright their outputs?

 

Pamela Samuelson:  I actually wrote an article about this in 1985.  

 

Brent Skorup:  Wow.

 

Pamela Samuelson:  So, it was a hot topic back in the mid-1980s. And the U.K. and some of the Commonwealth countries actually passed a kind of copyright-like law to protect the outputs of computer-generated works. And so, it may well be that Congress will consider a sui generis — of its own kind — type of IP right, similar to copyright, for computer-generated works. But, as Tim notes, the Copyright Office right now has a very firm position that AI-generated material is not protectable. And not only do they say it’s not protectable, but if you include some AI-generated material in a work that you seek to register, you have to not only identify it, but also disclaim authorship of it. 

 

And I think that, for the reasons Tim gave, it’s going to be really hard for the office to draw lines when someone says, “Well, you know, here are the 63 prompts that I used to generate this particular thing and refine it in this way and this way and this way,” and the office answers, “But that’s not authorship.” They are going to have to make some very fine distinctions. But they’re pretty firm right now that human authorship is absolutely essential. And, interestingly, the Authors Guild and some of the other organizations representing creators like the fact that AI-generated material can’t be copyrighted, because then it can’t compete with their works as much as it could if it were copyrighted too. And so, that’s another policy issue for policymakers to grapple with.  

 

Kristian Stout:  And I agree with all of what was just said. I pretty much disagree with the stance the Copyright Office is taking. Generally speaking, I think very little human intervention is required for a piece of work to be considered a work of authorship. So, the fact that you selected the output — and you’re almost certainly going to do some kind of touch-up or framing or something — is probably enough for copyright protection.  

 

There are potential objections. One of them is just, I guess, what you’d call an X-factor, where it’s just, “Oh, well, it’s bad art. Why would you want to copyright bad art?” But we copyright bad art all the time. There’s a lot of really terrible stuff that gets copyright protection. It’s very unimaginative, very unoriginal, and human-made. And we copyright it. So that’s not really very convincing. 

 

There is one potential scenario where I could see the Copyright Office wanting to be careful, which is if you were to set up a system that would generate thousands or even millions of pieces of content, and you wanted to submit them all to the Copyright Office in bulk and say, “Give me a copyright on all of this because the system generated it.” A human never intervened; they just took an output directory, and now they want copyright protection on all of it. That could lead to complications. 

 

But then, again, I don’t know how good all that material is really going to be if a human is not involved. There was someone in the recent past who tried to use an AI to generate every conceivable melody — I remember reading about this — so that they could get some kind of de facto common law copyright over it. And what it generated was unusable. And really, good luck. So, it’s theoretically possible that you wouldn’t want to recognize copyrights in computer-generated material. But I don’t really see that being warranted right now. 

 

Brent Skorup:  Okay. Great. Well, thank you. We’ll have to end it there; that’s the end of our time, though we could go much longer. I want to thank Tim, Pam, and Kristian for joining me to discuss these things, and Steve and The Federalist Society for agreeing to host this discussion. Steve.  

 

Steven Schaefer:  Thank you all. Thank you to our experts. And thank you to our audience for listening. For more content like this, please check out regproject.org. That’s regproject.org. Thank you. 

 

[Music]

 

Conclusion:  On behalf of The Federalist Society’s Regulatory Transparency Project, thanks for tuning in to the Fourth Branch podcast. To catch every new episode when it’s released, you can subscribe on Apple Podcasts, Google Play, and Spreaker. For the latest from RTP, please visit our website at www.regproject.org.

 

[Music]

 

This has been a FedSoc audio production.

Timothy B. Lee

Understanding AI


Pamela Samuelson

Richard M. Sherman Distinguished Professor of Law; Professor, School of Information; Co-Director,

Berkeley Center for Law & Technology


Kristian Stout

Director of Innovation Policy

International Center for Law & Economics


Brent Skorup

Senior Research Fellow

Mercatus Center, George Mason University



The Federalist Society and Regulatory Transparency Project take no position on particular legal or public policy matters. All expressions of opinion are those of the speaker(s). To join the debate, please email us at [email protected].
