Eisenhower Executive Office Building, home to the Office of Science and Technology Policy (Official White House photo, by Carlos Fyfe; Public Domain; Source: Wikimedia Commons)
Authors Alliance and SPARC have released the second of four planned white papers addressing legal issues surrounding open access to scholarly publications under the 2022 OSTP memo (the “Nelson Memo”). The white papers are part of a larger project (described here) to support legal pathways to open access.
The first paper discussed the “Federal Purpose License” and how it supports federal public access policies under the Nelson Memo. This second paper discusses the legal landscape surrounding the Federal Purpose License and the public access policies in light of concerns that the policies are not permissible government actions. The white paper explains why they are.
The White Paper is available here. Supporting materials, previous papers, and other formats are available here.
In the last couple of months there has been a lot of change in the Federal grants space, but so far the public access policies, including the latest announced by the Nelson Memo, are still in place. Several agencies have already implemented their responses to the Nelson Memo through regulation; the rest are due to finish the task later this year.
For Federal agencies to act permissibly, their actions must be grounded in a valid Congressional delegation of authority. Congress can’t escape its own limitations by delegating beyond its authority, so for a delegation to be valid, the delegated actions must also be permissible actions for Congress itself. The first part of the paper examines Congress’s constitutional power to provide grants for research and development, finding support under both the Spending and Progress Clauses.
The Federal Purpose License places a condition on acceptance of grant funds. Congress doesn’t have unlimited power to place conditions on grants, as the Supreme Court established in South Dakota v. Dole. However, the Federal Purpose License falls safely within the limitations set out in Dole. In particular, the Federal Purpose License as a condition violates neither the First Amendment’s Speech Clause nor the Fifth Amendment’s Takings Clause.
The second part of the paper looks at Congress’s delegation of authority and the agencies’ development of the policies. The paper explains how Congress expressly—and permissibly—delegated the power and obligation to create the prototype public access policy to the National Institutes of Health, and how the subsequent extension of the policy to the rest of the grant-making agencies is strongly supported by principles of implicit delegation and was established through appropriate rulemaking. Though the recent case of Loper Bright Enterprises v. Raimondo may require agencies to satisfy a somewhat higher burden when defending their actions, the Supreme Court’s abandonment of the “Chevron doctrine” does nothing to change the permissibility of the public access policies or the use of the Federal Purpose License.
The next paper will examine the interaction between institutional intellectual property policies and federal public access policies, and the final paper will discuss issues surrounding article versioning. Watch this space for more!
A Recent Entrance to Paradise, an image generated by Steven Thaler’s “Creativity Machine.”
Yesterday, the U.S. Court of Appeals for the District of Columbia Circuit issued its ruling in Thaler v. Perlmutter, a case centered on the question of whether a machine, without any intervention from a human, could be an author and hold copyright under the U.S. Copyright Act. The court found that a non-human machine cannot be an author under the Act.
In virtually every way, this decision should not be surprising. While it is absolutely conceivable that the product of AI and human collaboration may result in copyrightable works, it is well settled law that non-human authorship is not recognized under the U.S. Copyright Act. This opinion is mostly a repetition of the positions taken by the U.S. Copyright Office in its denial of registration.
That acknowledged, there are some points worth highlighting from the opinion:
First, the court centers much of its analysis on the text of the Copyright Act and the myriad ways in which the statutory language depends on humans as authors. Taken together, these provisions show that the Act is unarguably built upon the premise of human authorship. The court says: “All of these statutory provisions collectively identify an ‘author’ as a human being. Machines do not have property, traditional human lifespans, family members, domiciles, nationalities, mentes reae, or signatures.”
Part of the court’s analysis focuses on whether the public would benefit from granting copyright to machine-authored works; the court ultimately concludes that it would not. The court says: “But the Supreme Court has long held that copyright law is intended to benefit the public, not authors. Copyright law “makes reward to the owner a secondary consideration. ‘[T]he primary object in conferring the monopoly lie[s] in the general benefits derived by the public from the labors of authors.’”
It is important to remember that this opinion is only about the narrow question of whether a machine, working in isolation and with no human intervention, can be considered the author of a work. We should be careful not to try to extend this opinion beyond that. “Those line-drawing disagreements over how much artificial intelligence contributed to a particular human author’s work are neither here nor there in this case. That is because Dr. Thaler listed the Creativity Machine as the sole author of the work before us, and it is undeniably a machine, not a human being.”
Finally, the district court found that Dr. Thaler had waived the argument that, as creator of the Creativity Machine, he was the work’s author. The Court of Appeals found that Dr. Thaler had not challenged that waiver and that it therefore could not address the question of whether works generated by Artificial Intelligence might be authored by the creator of the AI. (“Dr. Thaler argues that he is the work’s author because he made and used the Creativity Machine. We cannot reach that argument.”) This leaves some ambiguity as to whether a future creator of an AI might successfully claim copyright in a work themselves. It also leaves open questions where the human user of AI claims to be the author of an AI-generated work or portions of a work. This is the question the court will have to address head-on in Allen v. Perlmutter, a case currently pending in Colorado. We will continue to watch this space, and share with you any new developments.
Ultimately, the Thaler v. Perlmutter decision is limited to the narrow holding that a machine cannot be an author under copyright law. This is a sensible result and consistent with sound public policy.
We’ve heard from lots of authors with questions about AI licensing of their works by their publishers. Cambridge University Press is one that has been in the news because it has undertaken a project to ask authors to opt into a contract addendum that would allow CUP to license AI rights for their books, giving authors a royalty on AI licensing net revenue. Cambridge has shared an FAQ with authors already, along with a further explanation of its approach last September and a report in January highlighting that it had contacted some 17,000 authors, the majority of whom have opted in.
Below is an interview with Ben Denne, Director of Publishing, Academic Books, at Cambridge University Press, answering some questions about the program.
Dave: Thank you, Ben, for talking with me. To start off, could you say what your role is at Cambridge University Press?
Ben: I’m the Director of Publishing for the Academic Books part of the Academic Division of Cambridge. In short, I’m the director overseeing the whole of the Academic Books program for Cambridge, except for the Bibles, which are handled by a specialist unit that runs separately and that I don’t have anything to do with. That means our textbooks, our research and reference books, and then a small program of more traditional academic titles that sell to a bit of a wider audience.
Dave: Thanks. My interest in talking with you is about generative AI licensing. And we’ve had quite a few authors actually forward us some emails that they’ve gotten from Cambridge presenting an AI license addendum to sign that goes with their contract and also an FAQ. I’d like to ask just a few questions about how that’s going and how that works.
What are Cambridge University Press’ goals with AI licensing?
Ben: That’s a really good question. Broadly speaking, this started to come our way a couple of years ago, at the same time this subject became really noisy. We were looking at it and thinking, what’s the best way through this? How do we appropriately engage in this conversation? And I think it came back to us thinking about encouraging responsible use and thinking about our role as an academic publisher.
And I think our role as an academic publisher is to push the academic debate forward, which means that we want our authors’ books to get read. We want them to get used. We want them to get cited. That’s really the spirit we came into this conversation with: these developments are happening, right, they’re happening anyway, and the best thing we can do as a publisher is try and engage with this debate and push it in a direction that we think really helps to underline those principles of how good research is done.
Dave: One of the things that stands out about CUP’s rollout is, first of all, that you are asking authors. Could you talk me through that decision? We’ve seen some other publishers in the news simply announce that they have licensing deals with technology companies, with no outreach to authors as far as we can tell. So could you talk through the thought process behind this outreach?
Ben: Sure. For us, when we first looked at this: we have a contract that authors sign, which is probably in many ways very similar to contracts they sign with other publishers, and it includes all sorts of clauses about use and wide-ranging licensing rights. One of the things it covers is derivative uses of content and the right to make derivatives. When we looked at that in the context of these AI conversations and licensing, from a legal perspective we thought, well, actually, that derivative use clause technically does cover us for this kind of work. And I’m sure that’s the conclusion that some other people have reached too.
But we also thought, it just feels a bit like nobody knew that this kind of technology was emerging when they signed those contracts. And so from our perspective, we thought there’s a lot of noise about this subject in the whole ecosystem right now, you know, you can’t read the news without reading about AI, and people are nervous about it, understandably, and all of those kinds of things. So we felt that we should treat this as additional consent and approach it in that spirit. And that really underpins the decision to go out with the addendum for existing contracts.
I don’t want to jump onto any of your other questions, but that principle, that we were going to ask for opt-ins, was important. Authors have to actively opt into this. We’re not saying to them, “If we don’t hear from you, we’ll assume you’ve opted in.” They have to actually come back to us and say that they’re happy for that use to happen.
Dave: I think one of the things a lot of people don’t think about is how complicated rights clearance is, especially at scale, across a title list that is the size that you have. So this seems to me like a pretty big investment in just doing this process. Could you say how many of these you have sent out? I gather that you’re doing this in batches, but do you have a sense of the scale of how many author addendum requests you anticipate making over the course of however long this process lasts?
Ben: It’s a really good question, and it’s a moving target. At this stage, we have sent out multiple thousands. But I think we have about 45,000 books available in print and digitally at the moment, and we’re working our way through that list systematically. So we’re in the thousands, and you’re right, it is a pretty big undertaking; it’s quite a logistical challenge. We had to set up a whole new workflow for doing this, and we have a team working on the addenda and addressing the questions that authors have, all of those kinds of things.
Dave: This is maybe getting in the weeds, but it seems to me like there’s a pretty big difference between figuring this out for a sole-authored, single-part monograph, for instance, which is mostly what I’ve seen come through, and edited volumes. Have you tried to figure out those more complex books with multiple authors, multiple works within them?
Ben: Yeah, so the way it’s working for us is that where we have several contracted authors for a book, we’re contacting them all, and all of those authors have to opt in in order for us to agree that we have the licensing rights.
For edited volumes with multiple contributors, we’re not contacting the individual contributors for opt-in, and there are a couple of reasons for that. Typically, they don’t get paid royalties, and it would also just be impossible for us to do logistically; that’s a huge ask. So what we are doing is still contacting the editors for those volumes, and the editors will opt in or not. If the editor opts in, our understanding is that they’re opting in on behalf of the contributors as well.
But for multi-authored works, we get in touch with all of them. And in fact, we have quite a sizable number of books which are stuck because some authors have opted in and some have not.
Dave: This is a pretty fast-moving technology, and I think a lot of authors are feeling just uncertain right now. So I wonder about the opt-in window: if an author declines to opt in right now, is that it? Is there an opportunity to come back later, after the dust settles, and say, oh, no, actually, you know, I’d be happy to have my work used in this way?
Ben: Yeah, definitely. We’re in the process of putting something in place so that if authors don’t opt in now, they are able to come back and opt in later. And by the way, if they don’t opt in, that’s fine, for all the reasons that you just said; some people are queasy about this, and that’s okay. We’re not putting a hard sell on it.
My sense is that some of the people we’re speaking to haven’t opted in because they haven’t yet really seen what the use cases are for this kind of technology. Perhaps as those become more public, people will want to come back and opt in.
I think some of the things that are out there are going to be quite powerful discovery tools in the future. So we want to make sure the authors do have the opportunity to opt in later if they want to, although we can’t, of course, be sure that if people opt in later the same opportunities will necessarily be available then, since this is quite a fast moving area.
Dave: For your contracts moving forward for frontlist books, is a clause like this now a default in those agreements, or will authors of new books have the option to opt in or opt out of AI licensing?
Ben: Good question. Currently, we have put a clause into our contracts to add AI licensing. But, where authors are asking us to remove that clause, we’re taking it out.
And again, coming back to your point before, those authors could opt in later. But for the contracts as they go out, we have it in as a clause now.
Dave: Okay. So let’s shift: if you’re gathering all of these rights from authors, presumably at some point you would actually engage in licensing with technology companies or others. Could you say a little bit about that? Do you have any deals in place with tech companies already? Or, the other thing that I’ve seen is that some publishers have been in the position of not doing those deals directly, but having sort of sub-licensing deals with others. I understand ProQuest Clarivate is doing this, and I think Wiley is as well. Do you have any of those deals in place now?
Ben: We’re still having those conversations at the moment. And we are talking to a range of different people who are looking at this kind of content.
Dave: Okay, that’s really helpful to know.
At the beginning, you talked a little bit about Cambridge University Press’s motivations with engaging in this space and doing licensing. Could you talk a little bit about important factors for what might show up in one of those kinds of deals with tech companies? For instance, one of the things that I think aligns with the sort of values that you outlined at the beginning and that authors care a lot about is credit, right? We know that, especially for academic authors, credit is incredibly valuable and important. And so I wonder if you’ve thought about how ensuring author credit might factor into any sort of downstream deal that CUP might engage in?
Ben: Absolutely. So we’re having exactly those conversations at the moment with anybody that we’re talking to. And we’ve been very clear with our authors when they’ve asked questions about this, and you may have seen this alluded to in some of the information that you’ve had forwarded to you from authors, that those principles of attribution are 100% what we’re focused on. Really, they’re kind of a red line for us.
One of the things that has come up in lots of our conversations with people around this technology is the question of at what level content needs to be attributed. Our sense is that any kind of meaningful extract from somebody else’s work needs to be cited.
I’m kind of repeating myself, but that’s how research works. People build on other people’s work, and so in a scenario where content is being ‘discovered’, if we can’t identify and cite that content, it can’t be accurately attributed. So that’s a red line for us.
Dave: Right. I think figuring out that attribution, like at what level that attribution needs to kick in, is a really tricky thing. It seems to me that if you’ve got a foundation model that is pulling in some texts, and then someone’s using, say, ChatGPT to write emails, and somewhere in the model it gleans some structural components from sources like academic books, I don’t think that’s the thing most authors care about – being cited for the fact that you helped train the model to understand how to format citations or do other things like that. It’s the intellectual content that matters, and that’s the really tricky piece of it.
Ben: Absolutely and I don’t have an easy answer for you there. So we’re having those conversations at the moment, but our sense is that any sort of direct quote, anything that could be, you know, anything that you would consider to be plagiarism or worthy of credit in a non-AI world should be attributed.
Dave: I realize this question is asking a hypothetical because you don’t have any of these agreements in place yet, but it seems to me there’s a pretty big difference between use of Cambridge books for model training and uses such as for Retrieval Augmented Generation (RAG).
Have you thought about those distinctions in terms of how they might affect Cambridge’s willingness to set a price on those things? I assume RAG would come with a higher licensing price than other uses. But could you talk me through that thought process?
Ben: So it’s kind of interesting, because I think there’s a little bit of a gray area, because a lot of the RAG tools are combined with some aspect of an LLM. So they might be looking to summarize some research or write a brief about X, Y, and Z.
I think it is quite interesting at the moment that most of the questions we get from people who are worried about this are really anxious about LLMs, but I feel like the really exciting place for academia and research is around that kind of retrieval augmented generation because that’s what’s going to help with discoverability for authors. It is difficult to talk about at the moment because we don’t have any public deals that I can point to. But I’d say a lot of the conversations that we’re having are somewhere between those two things, you know, so it’s a combination of an LLM that’s generating text and a citation engine or discovery engine sitting over content.
Dave: Leaving aside the legal situation for a moment, one of the things that I hear from authors pretty consistently is the sentiment that, with these big technology companies coming in, these companies are sort of profiting off of content, exploiting it, and so they ought to return something to the system and to authors.
But there’s a really different sentiment about what happens when you have, say, academic researchers using content for AI or text data mining purposes to make new discoveries or learn new things, both about the texts and about the world around them. We work a lot with text data mining researchers who are interested in large aggregations of content, not so they can build the next OpenAI, but so they can understand how language has changed over time, or how culture has changed over time.
I wonder, from CUP’s perspective, how do those two different kinds of use cases factor into your thinking about downstream licensing deals for AI/text data mining?
Ben: Yeah, for us the whole thing is not quite that clear cut, because a lot of the time it’s the big tech companies that are facilitating a lot of that discovery, or a lot of the discovery traffic goes through them. So from our perspective, I’m going to say we’re not ruling out working with anyone. We would put any partner we had through the same diligence process that we would use when onboarding anybody else, but we wouldn’t rule out those conversations with anybody. I think for us, the most important thing comes back to, and I’m going to sound like a stuck record here, those principles of attribution. And we have had some preliminary conversations with people who’ve said, “Well, we don’t think it would be possible to do what you’re asking,” and at that point, we’re saying, “Well, okay, then you know that’s the red line for us.”
I think there’s quite a bit of cloudy territory between those two things. And I think for us, the most important thing is to make sure that authors are being credited where their work’s being used.
Dave: All right, I have a hypothetical that I wanted to give you. We see that it’s a 20% royalty calculated on net revenue. Let’s say you received $5 million from an AI licensing deal. Can you walk me through how that might work out for the author? How do you calculate net revenue on that? And then, for the individual author sitting there who sees CUP sign a big deal, what can they expect?
Ben: That’s a tricky one, because it would depend a little bit on the terms of the deal as well. But broadly speaking, the principle is that the net revenue we receive, the $5 million in your scenario, the full licensing payment, is divided out across the list of titles. Authors then earn the royalty for that sale or license type per title, as they do now with all other forms of licensing.
But where a licensee can provide accurate title-level usage within their royalty statements, that would be used instead. So in the LLM situation that you were just talking about, the payment would be divided among those books. With a retrieval augmented generation tool, I think it would work much more on the basis of usage: depending on what searches within that tool were bringing back particular content, we would attribute revenue that way.
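[Note: a rough back-of-the-envelope illustration, using hypothetical numbers only: if the full $5 million were spread evenly across the roughly 45,000 titles Ben mentions above, each title would be allocated about $111 of net revenue, and a 20% royalty on that share would come to roughly $22 per title. Usage-weighted allocation, as described for RAG tools, could shift an individual author’s share substantially in either direction.]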
Dave: Okay, that makes a lot of sense. I think this was in the FAQ: one of your use cases is in an authoritative database that’s used on a perpetual basis. But there was somewhere that talked about the removal of content once a licensing term has ended. I wonder if you’ve developed thinking internally about what a standard term would be, how long these things might last?
Ben: Yeah, I mean, it’s hard, isn’t it? Because where you’re licensing content to train an LLM, it would be sort of insincere to dress that up. Generally, most agreements would be governed by a 2-5 year training term, and at the end of that term the training data set would be destroyed; however, the licensee would retain the output from the specific models that were developed during the training term. If they wanted to create new models, they would need to renew the license or extend the term.
For some of the other uses, that’s all being discussed at the moment. I think there is still work to do on this, but there would be standard partnership-length terms. What I would say is that, from our perspective, we think it’s quite likely that in the next few years the focus will move away from training large language models and into that area of discovery, and that these are going to become quite important revenue streams for academic publishers.
Dave: Thanks, very helpful. As you work on these deals, what level of transparency do you plan on offering authors or the general public about what these licenses might look like? At least with other publishers, it’s been quite mysterious – I think with one, we learned about an AI licensing deal in a quarterly earnings report, for instance. I think authors do really care about what the details of these deals look like.
Ben: It’s tricky, isn’t it? It’s hard for me to talk about a deal that hasn’t been done already, and of course, these deals can be subject to the same commercial confidentiality requirements as any other partnership. But I think it’s fair to say that Cambridge University Press would endeavor to be pretty transparent about what we’re doing generally and most importantly, be transparent about why we’re doing it. So I don’t think we’d be concealing that information from anybody. And coming back to my point before, we’ve been quite clear that we only want to enter into these kinds of conversations with people that we think are using content responsibly, and we’d always aim to be open.
Dave: A few final questions. First, CUP has published a number of open-access books. For example, I believe CUP was part of the TOME initiative. Do you feel like this kind of addendum is necessary for those open-access books, given that they already have some sort of open license attached to them? Or is it a necessary addition on top of those OA licenses?
Ben: That’s a really good question, and it’s something that we’re grappling with at the moment. Without getting into the weeds around open access, some of it depends on the license. Historically for books, our default open access license was a Creative Commons CC BY-NC license, which prohibits commercial reuse. At the moment, we’re looking at that (and I think a lot of publishers would say the same thing) and working through how it fits with AI licensing with commercial AI companies. The short answer to your question is that if you have a CC BY license, then people do have a broad license to reuse that content. So at the moment, we’re not actively going after those authors for opt-ins, nor are we including those books in licensing deals.
That’s what we’re doing, but that’s also a relatively small number of books. I can say we are now looking at using CC BY-NC-ND as the default, which restricts the creation of derivative works. You’ve touched on a conversation that is evolving, but we would be treating AI usage as requiring a derivative license and therefore not covered under a CC BY-NC-ND license.
Dave: Thanks, that’s very helpful, and I think that’s something a lot of authors are trying to figure out: how does downstream AI use factor into Creative Commons-licensed works? And of course, the underlying legal situation matters. I didn’t ask, but I assume that the rights that you’re asking for in this addendum are worldwide, since that affects, for example, whether usage might be permitted under national law.
Ben: Yes, the rights are worldwide.
And thinking again about that, it’s interesting, isn’t it? Because even the CC BY license doubles down on that principle of attribution. That’s the nature of the license, so some uses even then may not be covered by it.
Dave: Right. That attribution piece under the CC BY license will be an important one [note: this issue is being litigated, most prominently in the Doe v. GitHub suit]. And then, there’s also the underlying question of what the law allows independently, even if there is no license, open or otherwise. I know a consultation just closed in the UK about what the law should be, and in the US, we’re fighting these things out in the courts. I think there are 39 lawsuits pending right now about various aspects of this, and a key question in most of them is just how far fair use goes. And of course, if fair use applies, then you don’t have to worry too much about what the license says, whether it’s CC BY or CC BY-NC-ND or anything else. This is like reading tea leaves, but I think the prevailing case law indicates that model training and coming up with the weights has a pretty strong fair use case; on the output side, that’s where I think it starts to stumble a little bit, when you’ve got systems producing outputs that are substantially similar to the inputs. So I wouldn’t be surprised if in some of these suits we get a ruling in favor of fair use and in some of them we get a different outcome. And then the landscape is just sort of messy.
And I suppose, being in the UK, y’all are watching what that legal landscape looks like around the world as it changes.
Ben: Yeah, absolutely.
Dave: One final question: we’ve talked a lot about licensing books for AI, but CUP has a substantial journal portfolio as well. Can you say anything about CUP’s approach to use of journal content either as AI training data or for other AI uses?
Ben: We’ve been more focussed on books, as this is where most of the demand has been to date, but we have seen a developing interest in journal content. We are, therefore, currently exploring this form of licensing in a consultative way with our journal partners.
Dave: Well, thank you for talking. And this was really, really helpful. And I think that this will be useful for authors who are trying to understand just more about what’s going on.
Today, we submitted a response to a Request for Information from the Office of Science and Technology Policy (OSTP). The OSTP is seeking to develop an “AI Action Plan,” to sustain and accelerate the development of AI in the United States. As an organization dedicated to advancing the interests of authors who wish to share their works broadly for the public good, we felt it imperative to weigh in on critical copyright and policy issues impacting AI innovation and access to knowledge.
In our response, we reaffirmed our belief that the use of copyrighted works specifically for AI training (distinct from other AI uses) is a quintessential fair use. We noted that Section 1202(b) of the Copyright Act has little utility and serves as an unnecessary stumbling block to the development of AI. We also highlighted the importance of high quality training data and pointed towards the work that is already being done to develop AI training corpora.
A Few Key Points from Our Submission
Our response to the OSTP highlights several key areas where federal policy can support both authors and a thriving AI research environment:
1. The Role of Fair Use in AI Model Training
We emphasize that fair use has long been a cornerstone of innovation in the U.S.—enabling everything from web search engines to digitization projects. U.S. copyright law has played a major role both in developing the incredible creative industries housed in the U.S. and in driving leading scientific research and commercial innovation. The key to this innovation policy has been a thoughtful balance: providing copyright holders a degree of control over their works while allowing flexibility for technological innovation and new transformative uses. AI development relies on the ability to analyze large datasets, many of which include copyrighted materials. The uncertainty surrounding the legal status of AI training data due to ongoing litigation threatens to slow innovation. We urge the federal government to explicitly support the application of fair use to AI training and provide much-needed clarity.
2. Addressing the Contractual Override of Fair Use
Many AI developers face contractual barriers that limit their ability to make fair use of content, particularly in text and data mining applications. We recommend legislative measures to prevent contracts from overriding fair use rights, ensuring that AI researchers and developers can continue innovating without undue restrictions.
3. Access to High Quality Datasets
Access to high-quality datasets is a foundational pillar for AI development, enabling models to learn, refine, and iteratively improve. However, the availability of such datasets is often hindered by restrictive licensing agreements, proprietary controls, and inconsistent data standards. To maximize the potential of AI while ensuring ethical and legally sound development, collaborations between academic institutions, libraries, public archives, and technology developers are essential. Government policies should facilitate public-private partnerships that allow for robust and thoughtfully curated datasets, ensuring that AI systems are trained on a rich range of representative materials.
We invite our community of authors, researchers, and policymakers to review our submission. Your engagement is crucial in shaping a responsible and forward-thinking AI policy in the U.S. You can always reach us at info@authorsalliance.org.
This post is by Syn Ong, an LL.M. student at U.C. Berkeley Law School. This semester, Syn has been working as our intern on a project to examine the legal landscape for text and data mining and AI research across different jurisdictions. If you’re attending AWP this year, stop by our booth to meet Syn and ask about her work!
The Authors Alliance is excited to announce our participation in the 2025 Association of Writers & Writing Programs (AWP) Conference & Bookfair, taking place March 26–29, 2025, at the Los Angeles Convention Center. We invite all attendees to visit us at Booth T524, where we will be available to connect with writers, answer questions, and discuss the latest developments in publishing, copyright, and authorship.
We’ve participated in AWP in past years, engaging with authors on issues that matter most to them. In previous conferences, we’ve hosted discussions on authors’ rights, book contracts, and publishing strategies. This year, we’re continuing these important conversations at our booth, where you can speak with us about protecting your rights as an author, expanding the reach of your work, and navigating today’s evolving publishing landscape.
What We’ve Been Working On
One of the biggest challenges we’ve addressed recently is how artificial intelligence is affecting authorship—from AI-generated content to concerns about copyright and fair compensation. We maintain a resources page on AI to help authors navigate AI’s impact on their work, and we encourage you to check out our summary of the report from the U.S. Copyright Office on copyrightability and AI.
We’ve also been working closely with academics and researchers to make open access publishing more accessible and sustainable. Whether you’re a scholar navigating institutional requirements or an independent researcher looking to share your work widely, we can help you understand your options for open-access publishing.
Protecting Your Rights as an Author
We know that authors care about maintaining control over their work, which is why we provide guidance on areas like negotiating book contracts, rights reversion, and termination of transfer. Whether you’re signing a contract for the first time or looking to regain rights to an older work, we can guide you through the key issues to watch for. We encourage you to check out our resources on rights reversion and termination of transfer for more details. If you have questions about how to advocate for fairer publishing terms, come speak with us at Booth T524.
Stay Connected With Us
Beyond AWP, there are many ways to engage with Authors Alliance. You can become a member to access exclusive consultation and support our advocacy efforts, subscribe to our newsletter for updates on authors’ rights and policy changes, and follow us on social media to stay engaged year-round.
The AWP Conference & Bookfair is a cornerstone event for the literary community, and we’re thrilled to be part of it again this year. We look forward to meeting writers, sharing our expertise, and discussing how we can work together to protect authors’ rights and expand access to knowledge. Join us at Booth T524 March 26–29, 2025—we can’t wait to see you in Los Angeles!
Some district courts have applied DMCA 1202(b) to physical copies, including textiles, which means that if you cut off parts of a fabric that contain copyright management information, you could be liable for up to $25,000 in damages.
The US Copyright Act has never been praised for its clarity or its intuitive simplicity—at a whopping 460 pages long, it is filled with hotly debated ambiguities and overly complex provisions. The copyright laws of most other jurisdictions aren’t much better.
Because of this complexity, the implications of changes to copyright law and policy are not always clear to most authors. As we’ve said in the past, many of these issues seem arcane, and largely escape public attention. Yet entities with a vested interest in maximalist copyright—often at odds with the public interest—are certainly paying attention, and often claim to speak for all authors when they in fact represent only a small subset. As part of our efforts to advocate for a future where copyright law offers ample clarity, certainty, and a real focus on values such as the advancement of knowledge and free expression, we would like to share two recent projects we undertook:
The 1202 Issue Brief and Amicus Brief in Doe v. GitHub
Authors Alliance has been closely monitoring the impact of Digital Millennium Copyright Act (DMCA) Section 1202. As we have explained in a previous post, Section 1202(b) creates liability for those who remove or alter copyright management information (CMI) or distribute works with removed CMI. This provision, originally intended to prevent widespread piracy, has been increasingly invoked in AI copyright lawsuits, raising significant concerns for lawful uses of copyrighted materials beyond training AI. While on its face penalties for removing CMI might seem reasonable, the broad scope of CMI (which includes a wide variety of information, such as website terms of service, affiliate links, and more) combined with the challenge of including it with all downstream distribution of incomplete copies (imagine having to replicate and distribute something like the Amazon Kindle terms of service every time you quoted text from an ebook) could be very disruptive for many users.
In order to address the confusion regarding the (somewhat inaptly named) “identicality requirement” applied by courts in the 9th Circuit, we have released an issue brief, as well as undertaken to file an amicus brief in the Doe v. GitHub case now pending in the 9th Circuit.
Here are the key reasons why we care—and why you should care—about this seemingly obscure issue:
The Precedential Nature of Doe v. Github: The upcoming 9th Circuit case, Doe v. GitHub, will address whether Section 1202(b) should only apply when copies made or distributed are identical (or nearly identical) to the original. Lower courts have upheld this identicality requirement to prevent overbroad applications of the law, and the appellate ruling may set a crucial precedent for AI and fair use.
Potential Impact on Otherwise Legal Uses: It is not entirely certain whether fair use is a defense to 1202(b) claims. If the identicality requirement is removed, Section 1202(b) could create liability for transformative fair uses, snippet reuse, text and data mining, and other lawful applications. This would introduce uncertainty for authors, researchers, and educators who rely on copyrighted materials in limited, legal ways. We advocate for maintaining the identicality requirement and clarifying that fair use applies as a defense to Section 1202 claims.
Possibility of Frivolous Litigation: Section 1202(b) claims have surged in recent years, particularly in AI-related lawsuits. The statute’s vague language and broad applicability have raised fears that opportunistic litigants could use it to chill innovation, scholarship, and creative expression.
To find out more about what’s at stake, please take a look at our 1202(b) Issue Brief. You are also invited to share with us your stories of how you have navigated this strange statute.
Reply to the UK Open Consultation on Copyright and AI
We have members in the UK, and many of our US-based members publish in the UK. We have been watching developments in UK copyright law closely, and have recently filed a comment to the UK Open Consultation on Copyright and AI. In our comment, we emphasized the importance of ensuring that copyright policy serves the public interest. Our response’s key points include:
Competition Concerns: We alerted policy-makers that their top objectives must include preventing monopolies from forming in the AI space. If licensing for AI training becomes the norm, we foresee power consolidating in a handful of tech companies and their unbridled monopoly permeating all aspects of our lives within a few decades—if not sooner.
Fair Use as a Guiding Principle: We strongly believe that the use of works in the training and development of AI models constitutes fair use under US law. While this issue is currently being tested in courts, case law suggests that fair use will prevail, ensuring that AI training on copyrighted works remains permissible. The UK does not have an identical fair use statute, but has recognized that some of its functions—such as flexibility to permit new technological uses—are valuable. We argue that the wise approach is for the UK to update its laws to ensure its creative and tech sectors can meaningfully participate in the global arena. Our comment called for a broad AI and TDM exception allowing temporary copies of copyrighted works for AI training. We emphasized that when AI models extract uncopyrightable elements, such as facts and ideas, this should remain lawful and protected.
Noncommercial Research Should Be Protected: We strongly advocated for the protection of noncommercial AI research, arguing that academic institutions and their researchers should not face legal barriers when using copyrighted works to train AI models for research purposes. Imposing additional licensing requirements would place undue burdens on academic institutions, which already pay significant fees to access research materials.
Caption: 451 is the HTTP error code for a webpage that is unavailable for legal reasons; it is also the temperature in degrees Fahrenheit at which books catch fire and burn. This public domain image was taken inside the Internet Archive.
Imagine this: a high-profile aerospace and media billionaire threatens to sue you for writing an unauthorized and unflattering biography. In the course of writing, you rely on several news articles, including a series of in-depth pieces about the billionaire’s life written over a decade earlier. Given their closeness in time to real events, you quote, sometimes extensively, from those articles in several places.
On the eve of publication, your manuscript is leaked. Through one of his associated companies, the billionaire buys up the copyrights to the articles from which you quote. The next day the company files an infringement lawsuit against you.
Copyright Censorship: a Time-Honored Tradition
It’s easy to imagine such a suit brought by a modern billionaire—perhaps Elon Musk or Jeff Bezos. But using copyright as a tool for censorship is a time-honored tradition. In this case, Howard Hughes tried it out in 1966, using his company Rosemont Enterprises to file suit against Random House for a biography it would eventually publish.
As we’ve seen many times before and since, the courts turned to copyright’s “fair use” right to rescue the biography from censorship. Fair use, the court explained, exists so that “courts in passing upon particular claims of infringement must occasionally subordinate the copyright holder’s interest in a maximum financial return to the greater public interest in the development of art, science and industry.”
Singling out the biographical nature of the work and its importance in surfacing underlying facts, the court explained:
Biographies, of course, are fundamentally personal histories and it is both reasonable and customary for biographers to refer to and utilize earlier works dealing with the subject of the work and occasionally to quote directly from such works. . . . This practice is permitted because of the public benefit in encouraging the development of historical and biographical works and their public distribution, e.g., so “that the world may not be deprived of improvements, or the progress of the arts be retarded.”
Fair use playing this role is no accident. As the Supreme Court has explained, the relationship between copyright and free expression is complicated. On the one hand, the Court has explained, “[T]he Framers intended copyright itself to be the engine of free expression. By establishing a marketable right to the use of one’s expression, copyright supplies the economic incentive to create and disseminate ideas.” But, recognizing that such exclusive control over expression could chill the very speech copyright seeks to enable, the law contains what the Court has described as two “traditional First Amendment safeguards” to ensure that facts and ideas remain available for free reuse: 1) protections against control over facts and ideas, and 2) fair use.
But rescuing a biography that merely quotes, even extensively, from earlier articles seems like an easy call, especially when the plaintiff has so clearly engineered the copyright suit not to protect legitimate economic interests but to suppress an unpopular narrative.
The world is a little more complicated now. Can fair use continue to protect free expression from excessive enforcement of copyright? I think so, but two key areas are at risk:
Fair Use and the Archives
It may have escaped your notice that large chunks of online content disappear each year.
For years, archivists have recognized and worked to address the problem. Websites going dark is an annoyance for most of us, but in some cases, it can have real implications for understanding recent history, even as officially documented. For example, back in 2013, a report revealed that well over half of the websites linked to in Supreme Court opinions no longer work, jeopardizing our understanding of just how and why the Court decided an issue.
The most well-known bulwark against disappearing internet content is the Internet Archive, which has, at this point, archived over 900 billion web pages. Over and over again, we’ve seen its Wayback Machine used to shine a light on history that powerful people would rather keep hidden. It’s also why the Wayback Machine has been blocked or threatened at various times in China, Russia, India, and other jurisdictions where free expression protections are weak.
It’s not just the open web that is disappearing. A recent report on the problem of “Vanishing Culture” highlights how this challenge pervades modern cultural works. Everything from 90s shareware video games to the entirety of the MTV News Archive is at risk. As Jordan Mechner, a contributor to the report, explains, “historical oblivion is the default, not the exception” for the human record. As the report explains, it’s not just disappearing content that poses a problem: libraries and consumers must also grapple with electronic content that can be remotely changed by publishers or others. As just one example among many, in just the last few years we’ve seen surreptitious modifications to ebooks on readers’ devices—some changing important aspects of the plot—for works by authors such as R.L. Stine, Roald Dahl, and Agatha Christie.
The case for preservation as a foundational necessity to combat censorship is straightforward. “There is no political power without power over the archive,” Jacques Derrida reminds us. Without access to a stable, high-fidelity copy of the historical record, there can be no meaningful reflection on what went right or wrong, or holding to account those in power who may oppose an accurate representation of their past.
What sometimes goes unnoticed is that, without fair use, a large portion of these preservation efforts would be illegal.
In a world where century-long copyright protection applies automatically to any human expression with even a “modicum of creativity,” virtually everything created in the last century is subject to copyright. This is a problem for digital works because practically any preservation effort involves making copies—often lots of them—to ensure the integrity of the content. Making those copies means that archivists must rely on fair use to preserve these works and make them available in meaningful ways to researchers and others.
The upshot is that every time the Internet Archive archives a website, it’s an act of faith in fair use. Is that faith well-founded?
I think so. But the answer is complicated.
For preservation efforts like those of the Internet Archive, fair use is a foundation, but not an unshakable one. Two recent cases highlight the risk: one against its book lending program, and the other objecting to its “Great 78” record project. Both take issue with how the Archive provides access to preserved digital copies in its collections. While not directly attacking the preservation of those materials, the suits nonetheless jeopardize their effective use. As archivists have long lamented, “preservation without access is pointless.”
Beyond direct challenges to fair use, archives are threatened by spurious takedown demands, content removal requests, and legal challenges. Organizations like the Internet Archive have fought back, but many institutions simply cannot afford to, leading to a chilling effect where preservation efforts are scaled back or abandoned altogether.
Compounding this uncertainty is the growing use of technological protection measures (TPMs) and digital rights management (DRM) systems that restrict access to digital works. Under the Digital Millennium Copyright Act (DMCA), circumventing these restrictions is illegal—even for lawful purposes like preservation or research. This creates a paradox where a researcher or archivist may have a clear fair use justification for accessing and copying a work, but breaking an encryption lock to do so could expose them to legal liability.
Additionally, the rise of contractual overrides—such as restrictive licensing agreements on digital platforms—threatens to sideline fair use entirely. Many modern works, including e-books, streaming media, and even scholarly databases, are governed by terms of service that explicitly prohibit copying or analysis, even for noncommercial research. These contracts often supersede fair use rights, leaving archivists and researchers with no legal recourse.
Still, there are reasons for optimism. Courts have generally ruled favorably when fair use is invoked for transformative purposes, such as digitization for research, searchability, and access for disabled users. Landmark decisions, like those in Authors Guild v. Google and Authors Guild v. HathiTrust, upheld fair use in the context of large-scale digital libraries and text-mining projects. These cases suggest that courts recognize the essential role fair use plays in making knowledge accessible, particularly in an era of vast digital information.
Fair Use and the Freedom to Extract
One of copyright’s other traditional First Amendment protections is that the copyright monopoly does not extend to facts or ideas. Fair use is critical in giving life to this protection by ensuring that facts and ideas remain accessible, providing a “freedom to extract” (a term I borrow from law professor Molly Van Houweling’s recent scholarship) even when they are embedded within copyrighted works.
Copyright does not and cannot grant exclusive control over facts, but in practice, extracting those facts often requires using a work in ways that implicate the rightsholder’s copyright. Whether it’s journalists referencing past reporting, historians identifying truths in archival materials, or researchers analyzing a vast corpus of written works, fair use provides the necessary legal space to operate without running afoul of rightsholders’ copyright protections.
The need is more urgent than ever given the sheer scale of the modern historical record. In many cases, relying on individual researchers to sift through the record and extract important facts is impractical, if not impossible. Automated tools and processes, including AI and text data mining tools, are now indispensable for processing, retrieving, and analyzing facts from massive amounts of text, images, and audio. From uncovering patterns in historical archives to verifying political statements against prior records, these tools serve as extensions of human analysis, making the extraction of factual information possible at an unprecedented scale. However, these technologies depend on fair use. If every instance of text or data mining required explicit permission from rights holders—who may have economic or political incentives to deny access—the ability to conduct meaningful research and discovery would be crippled.
For example, consider a researcher studying the roots of the opioid crisis, trying to mine the 4 million documents in the Opioid Industry Documents Archive—many of them legal materials, internal company communications, and regulatory filings. These documents, made public through litigation, provide critical insights into how pharmaceutical companies marketed opioids, downplayed their risks, and shaped public policy. But making sense of such a massive trove of records is impossible without computational tools that can analyze trends, track key players, and surface hidden patterns.
Without fair use, researchers could face legal roadblocks to applying text and data mining techniques to extract the facts buried within these documents. If copyright law were used to restrict or complicate access to these records, it would not only hamper academic research but also shield corporate and governmental actors from exposure and accountability.
Conclusion
As information continues to proliferate across digital media, fair use remains one of the few safeguards ensuring that historical records and cultural artifacts do not become permanently locked away behind copyright barriers. It allows the past to be examined, challenged, and understood. If we allow excessive copyright restrictions to limit the ability to extract and analyze our shared past and culture, we risk not only stifling innovation but also eroding our collective ability to engage with history and truth.
Fair Use Week
This is my contribution to Fair Use Week. To read the other excellent posts from this week, check out Kyle Courtney’s Harvard Library Fair Use Week blog here.
In December 2024 we announced a new project to develop a public interest AI training corpus focused on books. Over the last few months we’ve been actively engaging a diverse set of stakeholders in the development of The Public Interest Corpus.
The Public Interest Corpus is focused on developing large-scale, high-quality AI training data from the world’s memory organizations that serve the public interest. In the aggregate, memory organizations like libraries and archives are in a prime position to address this need given a multi-century focus on developing high-quality, locally and globally comprehensive collections of books, newspapers, scholarly journals, photographs, manuscript materials, and more. We seek to prioritize uses of The Public Interest Corpus that promote learning, access to knowledge, and broad benefits to the public.
Project Team and Advisory Board
The project team consists of Dave Hansen, Executive Director of Authors Alliance, and Dan Cohen, Vice Provost for Information Collaboration, Dean of the Library, and Professor of History at Northeastern University. In January, I joined the team as the Public Interest AI Strategist. In this capacity I will leverage extensive experience developing community around responsible computational use of memory organization collections as data and responsible AI. Giulia Taurino recently joined the team as Project Coordinator. Giulia holds a doctoral degree in Media Studies and Visual Arts from the University of Bologna and the University of Montreal and is currently a member of the NULab for Digital Humanities and Computational Social Science and of the AI & Arts interest group at The Alan Turing Institute.
The project team is guided by a strong advisory board composed of senior leaders and experts who think deeply about how authors, libraries, and AI can better serve the public interest.
David Bamman, Associate Professor, UC Berkeley School of Information
Sandra Aya Enimil, Director of Scholarly Communications and Collection Strategy, Yale University Library
Suzanne Wones, University Librarian, UC Berkeley Library
Ted Underwood, Professor of Information Science and English, University of Illinois at Urbana-Champaign
How you can get involved
Over the next year the project team will engage a diverse set of stakeholders in a co-development process that directly informs The Public Interest Corpus’s priorities, strategies, and partnerships. To kick things off, we are holding a working event at Northeastern University Library in Boston, Massachusetts on March 3, where a group of senior library administrators, publishers, disciplinary researchers, authors, and technical experts will workshop core legal, technical, business model, and governance challenges.
Moving forward we intend to hold additional focused in-person and virtual working events with a broad range of communities. We strongly believe that engaging with diverse stakeholders in a co-development process for this effort will be key to success. If you are interested in participating in a future event or hosting a Public Interest Corpus event, or if you have other ideas for how we might collaborate, please let us know via the following form.
We look forward to advancing a public interest solution with you all.
Last November, we covered a case where a group of authors complained about McGraw Hill’s interpretation of publishing agreements related to compensation for ebooks. As subscription-based models become increasingly dominant in the publishing industry, authors must be vigilant about how their contracts define compensation. Platforms like Kindle Unlimited, Audible, and academic ebook services are reshaping traditional royalty structures. This is not just a concern for trade books; academic publishing is also shifting towards subscription-based access, as evidenced by ProQuest’s recent announcement that it is ending print sales and moving toward a “Netflix for books” model.
Here we see yet another case where ambiguous contractual terms resulted in financial loss for an author.
On February 19, the Second Circuit affirmed the lower court’s dismissal of Teri Woods Publishing’s copyright infringement and breach-of-contract claims against Audible and other audiobook distributors in Teri Woods Publ’g, LLC v. Amazon.com, Inc. The Plaintiff initially granted the rights at issue to Urban Audio in a licensing agreement. Thereafter, Urban Audio granted the rights under that agreement to Blackstone, which then sublicensed its rights to Amazon and Audible.
The Plaintiff in this case, Teri Woods Publishing, is an independent publisher founded by urban fiction author Teri Woods. The Plaintiff argued—and the courts ultimately disagreed—that the licensing agreement did not unambiguously permit Defendants to distribute Teri Woods’ audiobooks through the Defendants’ online audiobook streaming subscription services. More specifically, on the question of compensation for online streaming, Plaintiff and Defendants disagreed on (1) whether online streaming counted as “internet downloads” or alternatively as “other contrivances, appliances, mediums and means,” and (2) whether the licensing terms dealing with royalties prohibited subscription streaming.
The licensing terms in question are contained in the licensing agreement Plaintiff entered into in 2018, granting Urban Audio the
“exclusive unabridged audio publishing rights, to manufacture, market, sell and distribute copies throughout the World, and in all markets, copies of unabridged readings of the [Licensed Works] on cassette, CD, MP3-CD, pre-loaded devices, as Internet downloads and on, and in, other contrivances, appliances, mediums and means (now known and hereafter developed) which are capable of emitting sounds derived for the recording of audiobooks.”
In exchange for this grant of rights, Urban Audio—as the Licensee—must pay Plaintiff:
“(a) Ten percent (10%) of Licensee’s net receipts from catalog, wholesale and other retail sales and rentals of the audio recordings of said literary work;
(b) Twenty Five percent (25%) of net receipts on all internet downloads of said literary work.
(c) Twenty Five percent (25%) of net receipts on Playaway format [under certain conditions].”
In case you are not familiar with the services Amazon’s Audible provides: members of Audible generally pay a monthly fee to digitally stream or download audiobooks, rather than paying individually for each audiobook they stream or download. This method of distribution, the Plaintiff argued, led to drastically lower compensation than expected, as the audiobooks were made available to subscribers at a fraction of their retail price.
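To see why a net-receipts royalty behaves so differently under subscription streaming, here is a back-of-the-envelope sketch. All of the figures are invented for illustration and do not come from the case or from Audible’s actual accounting; the point is only the structure of the arithmetic, in which a pooled monthly fee is spread across everything a member listens to before the royalty percentage is applied.

```python
# Hypothetical comparison: per-unit download royalties vs. a pooled
# subscription allocation under a "percentage of net receipts" clause.
# Every number below is an invented assumption, not taken from the case.

ROYALTY_RATE = 0.25           # 25% of net receipts, per the agreement's download term

# Scenario 1: a traditional per-unit internet download
download_price = 20.00        # hypothetical retail price of one audiobook
download_royalty = ROYALTY_RATE * download_price

# Scenario 2: subscription streaming, where the distributor allocates a
# slice of each member's monthly fee across everything the member played
monthly_fee = 14.95           # hypothetical subscription fee
titles_played_per_month = 10  # hypothetical listening volume per member
allocated_receipt = monthly_fee / titles_played_per_month
stream_royalty = ROYALTY_RATE * allocated_receipt

print(f"Royalty on one ${download_price:.2f} download: ${download_royalty:.2f}")
print(f"Royalty on one allocated stream:        ${stream_royalty:.2f}")
```

Under these invented numbers, the same 25% rate yields $5.00 on a download but roughly $0.37 on a stream, which is the kind of shortfall the Plaintiff described.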
Audible has a history of relying on ambiguous contractual terms to reduce author payouts. The “Audiblegate” controversy, for instance, exposed how Audible’s return policy allowed listeners to return audiobooks after extensive use, deducting royalties from authors without transparency. That practice came under legal scrutiny in Golden Unicorn Enters. v. Audible Inc., where authors alleged that Audible deliberately structured its payment model to significantly reduce their earnings (unfortunately, the court in that case also largely sided with Audible).
Despite Audible’s track record, the courts were unsympathetic to Plaintiff’s grievance in the Teri Woods case, and held that the plain meaning of the phrase “other contrivances, appliances, mediums and means (now known and hereafter developed)” in the licensing agreement included digital streams and other future technological developments in distribution services. The courts also observed that the underlying licensing agreement did not provide for the payment of royalties on a per-unit basis; Plaintiff was only entitled to a percentage of “net receipts” received by Urban Audio for sales, rentals, and internet downloads.
The ambiguity over what constitutes an “internet download,” and over whether payment was due on a per-unit basis, was ultimately resolved in Audible’s favor. This case serves as yet another reminder of the importance of adopting clear contractual language.
Licensing agreements should be drafted with clear and precise language regarding revenue models and payment structures. Subscription-based compensation models, like those employed by Audible, fundamentally differ from traditional sales models, often leading to lower per-unit earnings for authors. By failing to anticipate and address these nuances, authors risk losing control over how their works are monetized. Ensuring that rights, distribution methods, and payment structures are clearly defined can prevent disputes and financial losses down the line.
Many authors assume that digital rights are similar to traditional print rights, but as this case demonstrates, vague phrasing can allow distributors to exploit gaps in understanding. If authors do not explicitly outline limitations on emerging distribution technologies, they may find themselves receiving significantly less compensation than they anticipated when signing the agreement. For example, authors should ensure their contracts specify whether subscription-based revenue falls under traditional royalty calculations, and whether distribution via new technological formats requires renegotiation.

Beyond the issues with ambiguous contractual terms, this case also highlights the broader issue of how digital platforms can negatively impact readers and authors alike. Readers no longer own the books they purchase; instead, they receive licensed access that can be revoked or restricted at any time. This shift undermines the traditional relationship between books and their readers. Authors are equally threatened by these digital intermediaries, who have the power to dictate distribution methods and unilaterally alter revenue models; an author’s right to fair compensation is too often sacrificed along the way. The situation is especially dire with audiobooks, where Audible dominates the market.
Uncopyrightable image generated using Google Gemini, illustrating a group of photographers excited to learn that their nearly identical photos of the public domain Washington Monument are all copyrightable. (“The Office receives ten applications, one from each member of a local photography club. All of the photographs depict the Washington Monument and all of them were taken on the same afternoon. Although some of the photographs are remarkably similar in perspective, the registration specialist will register all of the claims.”) (Compendium of Copyright Office Practices, Section 909.1)
In our comments, we urged the Copyright Office not to pursue revisions to the Copyright Act at this time and instead to work towards providing greater clarity for authors of AI-generated and AI-assisted works (“Instead of proposing revisions to the Copyright Act to enshrine the human authorship requirement in law or clarify the human authorship requirement in the context of AI-generated works, the Office should continue to promulgate guidance for would-be registrants.”). We also noted that, as technology evolves in the coming years, our ideas about the copyrightability of AI-generated and AI-assisted works will likely shift as well.
We are happy to see that the USCO heard our voice, and those of many others, in concluding that there is no need for legislative change at this time (“The vast majority of commenters agreed that existing law is adequate in this area…”) (Report, page ii). We likewise continue to be aligned with the USCO’s view that works wholly generated by artificial intelligence are not copyrightable. In reading through the entirety of the report, it is clear that the Office appreciates that some elements of AI-assisted works will be copyrightable, but believes that the level of human control over the AI output will be central to the copyrightability inquiry (“Whether human contributions to AI-generated outputs are sufficient to constitute authorship must be analyzed on a case-by-case basis.”) (“Based on the functioning of current generally available technology, prompts do not alone provide sufficient control.”) (Report, page iii)
The Office’s report does provide some useful clarity. At the same time, it takes some positions that fail to adequately address the complexity of AI-generated works. Below, we will unpack a number of elements of the report that are noteworthy.
Modifying or arranging AI-generated content
The report makes it clear that the USCO views selection and arrangement of AI-generated material as a viable path toward copyrightability for works created in part with AI. In 2023, when reviewing the graphic novel Zarya of the Dawn, “the Office concluded that a graphic novel comprised of human-authored text combined with images generated by the AI service Midjourney constituted a copyrightable work, but that the individual images themselves could not be protected by copyright.” (Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence, page 2) Thus, authors who incorporate AI-generated material into a larger work will often be successful in registering the whole work, but will typically need to disclaim any AI-generated elements.
Alternatively, an author who modifies an AI-generated work outside of the AI environment (e.g., an artist who uses Photoshop to make substantial modifications to an AI-generated image), will usually have a path to copyright registration with the USCO.
The USCO takes the position that most AI-assisted works are not copyrightable
Unlike an AI-generated image later modified manually by a human (which may be copyrightable), a work altered through prompt-based modifications performed entirely within the AI environment is one the USCO is clearly reluctant to view as copyrightable.
Here, the Office’s position regarding Jason Allen’s attempts to register copyright in the two-dimensional artwork Théâtre D’opéra Spatial is illuminating. In developing the image using Midjourney, Allen claimed to have used over 600 text prompts to both generate and alter the image, and further used Photoshop to “beautify and adjust various cosmetic details/flaws/artifacts, etc.,” a process which he viewed as copyrightable authorship. In denying his claim, the Office responded that “when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the ‘traditional elements of authorship’ are determined and executed by the technology—not the human user.” (88 FR 16190 – Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence, page 16192)
Within the report, there is no direct examination of the Théâtre D’opéra Spatial copyright claim and lessons to be learned from it. This is likely due to ongoing litigation between Allen and the USCO. While the USCO has significant practical influence on what materials are protectable under copyright, ultimately the decision falls to the courts. So, this suit and others like it will be important to watch. Still, the lack of a deeper dive into such a real-world example is unfortunate—such examples offer fertile territory for exploring the boundary lines between copyrightable AI-assisted works and those that will remain uncopyrightable.
The report offers a sense of possibility with regard to copyrightable AI-assisted works
Yet the Office also acknowledges that there are remaining unanswered questions (“So I know that everyone in their particular area of creativity is looking for, you know, more examples and brighter lines. And I think at this point in time, we’re going to be learning as everyone else is learning…we will be providing more guidance as we learn more.”) (Webinar Transcript, Robert Kasunic, page 10) This recognition that the USCO, like everyone else, is still learning is refreshing and welcome; there are murky waters all around, and AI-generated works are already frequently a complex hybrid of AI expression and human expression.
What are some of these questions?
The technology is still developing and it seems likely that the legal complexity will become even more pronounced as sophisticated generative AI evolves to respond to fine-grained feedback from users, while also offering expression and suggestions that many users will ultimately adopt. Navigating this complexity will be challenging and will require answering a fundamental question: what is the threshold level of human control over AI-generated expression that is necessary as a prerequisite for copyright protection?
Similarly, what standards might the Copyright Office or the courts develop to prove sufficient human authorship when it is intermingled with AI-generated content? The copyright registration process currently requires very little information and no documentation related to this question. For now, creators don’t have clear guidance on what types of documentation will be most effective if a future dispute arises.
To the extent that protection does exist in human-guided but AI-produced content, how will or should the courts determine which elements are uncopyrightable, AI-generated elements in what will appear to users as a single unified work? Separating human expression that is enmeshed and embedded within uncopyrightable AI expression will require some framework for distinguishing the two in cases of infringement. Although the courts have already developed methods that may shape this (the abstraction-filtration-comparison test, for example), it remains far from clear whether such tests will perform adequately for AI-produced content.
We will be watching developments in this space closely and will continue to advocate for reasonable and flexible approaches to copyrightability that align with the practical realities of authorship in an emerging technological landscape.