Copyright Office Holds Listening Session on Copyright Issues in AI-Generated Music and Sound Recordings

Posted June 2, 2023

Earlier this week, the Copyright Office convened a listening session on the topic of copyright issues in AI-generated music and sound recordings, the fourth in its listening session series on copyright issues in different types of AI-generated creative works. Authors Alliance participated in the first listening session on AI-generated textual works, and we wrote about the second listening session on AI-generated images here. The AI-generated music listening session participants included music industry trade organizations like the Recording Industry Association of America, Songwriters of North America, and the National Music Publishers’ Association; generative AI music companies like Boomy, Tuney, and Infinite Album; music labels like the Universal Music Group and Wixen; and individual musicians, artists, and songwriters. Streaming service Spotify and collective-licensing group SoundExchange also participated. 

Generative AI Tools in the Music Industry

Many listening session participants discussed the fact that some musical artists, such as Radiohead and Brian Eno, have been using generative AI tools as part of their work for decades. For those creators, generative AI music is nothing new, but rather an expansion of existing tools and techniques. What is new is the ease with which ordinary internet users without musical training can assemble songs using AI tools—programs like Boomy enable users to generate melodies and musical compositions, with options to overlay voices or add other sounds. Some participants sought to distinguish generative tools from so-called “assistive tools,” with the latter being more established for professional and amateur musicians. 

While some established artists themselves have long relied on assistive AI tools to create their works, AI-generated music has lowered barriers to entry for music creation significantly. Some take the view that this is a good thing, enabling more creation by more people who could not otherwise produce music. Others protest that those with musical talent and training are being harmed by the influx of new participants in music creation, as these types of songs flood the market. In my view, it’s important to remember that the purpose of copyright, furthering the progress of science and the useful arts, is served when more people can generate creative works, including music. Yet AI-generated music may already be at or past the point where it can be indistinguishable from works created by human artists without the use of these tools, at least to some listeners. It may be the case that, as at least one participant suggested, AI-generated audio works are somehow different from AI-generated textual works such that they may require different forms of regulation.

Right of Publicity and Name, Image, and Likeness

Although the topic of the listening session was federal copyright law, several participants discussed artists’ rights in both their identities and voices—aspects of the “right of publicity” or the related name, image, and likeness (“NIL”) doctrine. These rights are creatures of state law, rather than federal law, and allow individuals, particularly celebrities, to control what uses various aspects of their identities may be put to. In one well-known right of publicity case, Ford used a Bette Midler “sound alike” for a car commercial, which was found to violate her right of publicity. That case and others like it have popularized the idea that the right of publicity can cover voice. This is a particularly salient issue within the context of AI-generated music due to the rise of “soundalikes” or “voice cloning” songs that have garnered substantial popularity and controversy, such as the recent Drake soundalike, “Heart on My Sleeve.” Some worry that listeners could believe they are listening to the named musical artist when in fact they are listening to an imitation, potentially harming the market for that artist’s work. 

The representative from the Music Artists Coalition argued that the hodgepodge of state laws governing the right of publicity could be one reason why soundalikes have proliferated: different states have different levels of protection, and the lack of unified guidance on how these types of songs are governed under the law can create uncertainty as to how they will be regulated. And the representative from Controlla argued that copyright protection should be expanded to cover voice or identity rights. In my view, expanding the scope of copyright in this way is neither reasonable nor necessary as a matter of policy (and furthermore, would be a matter for Congress, and not the Copyright Office, to address), but it does show the breadth of the soundalike problem for the music industry.

Copyrightability of AI-Generated Songs

Several listening session participants argued for intellectual property rights in AI-generated songs, and others argued that the law should continue to center human creators. The Copyright Office’s recent guidance regarding copyright in AI-generated works suggests that the Office does not believe that there is any copyright in the AI-generated materials due to the lack of human authorship, but human selection, editing, and compilation can be protected. The representatives from companies with AI-generating tools expressed a need for some form of copyright protection for the songs these programs produce, explaining that they cannot be effectively commercialized if they are not protected. In my view, this can be accomplished through protection for the songs as compilations of uncopyrightable materials or as original works, owing to human input and editing. Yet, as many listening session participants across these sessions have argued, the Copyright Office registration guidance does not make clear precisely how much human input or editing is needed to render an AI-generated work a protectable original work of authorship. 

Licensing or Fair Use of AI Training Data

In contrast to the view taken by many during the AI-generated text listening session, none of the participants in this listening session argued outright that training generative AI programs on in-copyright musical works was fair use. Instead, much of the discussion focused on the need for a licensing scheme for audio materials used to train generative AI audio programs. Unlike the situations with many text and image-based generative AI programs, the representatives from generative AI music programs expressed an interest and willingness to enter into licensing agreements with music labels or artists. In fact, there is some evidence that licensing conversations are already taking place. 

The lack of fair use arguments during this listening session may be due to the particular participants, industry norms, or the “safety” of expressing this view in the context of the music industry. But regardless, it provides an interesting contrast to views around training data for text-generating programs like ChatGPT, the use of which many (including Authors Alliance) have argued is fair use. This is particularly remarkable since at least some of these programs, in our view, use the audio data they are trained on for a highly transformative purpose. Infinite Album, for example, allows users to generate “infinite music” to accompany video games. The music reacts to events in the video game, becoming more joyful and upbeat for victories or sad for defeats, and can even work interactively for those streaming their games, where those watching the stream can temporarily influence the music. This seems like precisely the sort of “new and different purpose” that fair use contemplates, and similarly like a service that is unlikely to compete directly with individual songs and records.

Generative AI and Warhol Foundation v. Goldsmith

Many listening session participants discussed the interactions between how AI-generated music should be regulated under copyright law and the recent Supreme Court fair use decision in Warhol Foundation v. Goldsmith (you can read our coverage of that decision here), which also considered whether a particular use that could have been licensed was fair use. And some participants argued that the decision in Goldsmith makes it clear that training generative AI models (i.e., the input stage) is not a fair use under the law. It is not clear precisely how the decision will impact the fair use doctrine going forward, particularly as it applies to generative AI, and I think it is a stretch to call it a death knell for the argument that training generative AI models is a fair use. However, the Court did put a striking emphasis on the commerciality of the use in that case, deemphasizing the transformativeness inquiry somewhat. This could impact the fair use inquiry in the context of generative AI programs, as these programs tend overwhelmingly to be commercial, and the outputs they create can be, and are being, used for commercial purposes.

Athena Unbound and Untangling the Law of Open Access

Posted May 26, 2023

A few months ago, Authors Alliance and the Internet Archive co-hosted an engaging book talk featuring historian Peter Baldwin and librarian Chris Bourg. They discussed Baldwin’s new book, Athena Unbound: Why and How Scholarly Knowledge Should be Free For All. You can watch the recording of the talk here and access the book for free in open access format here.

Today, I’m beginning a series of posts aimed at clarifying legal issues in open access scholarship. Reflecting on some key takeaways from Athena Unbound seemed like a great place to start.

Those already well-versed in the open access community know that there is an abundance of literature covering the theory, economics, and sociological dimensions of OA. But it’s easy to lose sight of the forest for the trees. Athena Unbound stands out by providing a comprehensive, high-level explanation of how we have reached the current state of open access affairs. The book offers much more than just commentary on the underlying legal structures that impact access to scholarly works. But, as we delve deeper into the legal aspects of open access in this series, I want to highlight three key takeaways on this issue:

  1. Copyright law does not cater to most academic authors.

“Open access does not seek to dispossess authors of their property nor to stint them of their rightful earnings. But authors are not all alike. Those whose creativity supplies their livelihood are entitled to the fruits of their labor. But most authors either do not make a living from their work or are already supported in other ways.” – Athena Unbound, Chapter 2, “The Variety of Authors and Their Content”

In theory, copyright law in the United States is designed to incentivize the creation of new works by granting strong and long-lasting economic rights. This framework assumes authors primarily function as independent operators (Baldwin likens them to “bohemian artistes”) who can negotiate these rights with publishers or directly with members of the public in exchange for financial support.

However, this framework does not align with the reality faced by most academic authors, who number in the millions. While scholarly authors deserve compensation for their work, their remuneration often comes from sources like university employment. Their motivation to create stems from incentives to share ideas and discoveries with the world, as well as personal gains such as recognition and career advancement. For these authors, the publishing system and the laws that govern it clash with their interests to such an extent that we now witness academic authors willingly paying thousands of dollars to persuade publishers to distribute their articles for free.

If anything, copyright law, with its excessively long duration, extensive economic control, and limited freedom for researchers to engage with creative works, hampers those authors’ goals in practice. As Baldwin explains, “the fundamental problem open access faces is worth restating. Copyright has become bloated, prey to the rent-seeking academic publishing industry… Legislators, dazzled into submission by the publishing industry’s success in portraying itself as the defender of creativity and cultural patrimony, bear much responsibility.”

As we explore the legal mechanisms that influence open access, it is crucial to remember that the default rules of the system are more often than not at odds with the goals of open access authors. 

  2. Open access must encompass more than contemporary scientific articles.

While much of the current open access discourse revolves around providing access to the latest scholarly research, particularly scientific articles, there is a vast amount of past scholarship that remains inaccessible. An inclusive approach to open access should address how to provide access to these works as well. The majority of research library holdings are not available online in any form. Baldwin uses the term “grey literature” to describe the extensive collections in research libraries that are no longer commercially available. As he points out, most books lose commercial viability rather quickly. “Of the 10,000 US books published in 1930, only 174 were still in print in 2001. Of the 63 books that won Australia’s Miles Franklin prize over the past half-century, ten are unavailable in any format.”

Many of these works have become so-called orphan works: they are so detached from the commercial marketplace that their publishers have gone out of business, authors have passed away, and any remaining rights holders who would benefit from potential sales are obscure, if they exist at all. Even Maria Pallante, former Register of Copyrights and current AAP president, agrees that in the case of true orphan works, “it does not further the objectives of the copyright system to deny use of the work, sometimes for decades. In other words, it is not good policy to protect a copyright when there is no evidence of a copyright owner.”

In addition to this issue around orphan works, a subset of what is known as the “20th Century black hole,” Athena Unbound also sheds light on the various concerns and challenges that act as barriers to open access in scholarly fields outside of the sciences. While the goals of open access may be the same across these different areas, the implementation can vary significantly. In the case of certain scholarly works, such as older books entangled in complex rights issues, we may need to settle for an imperfect form of “open,” such as read-only viewing via controlled digital lending—a far cry from what many consider true open access.

  3. The intricacies of ownership are significant.

Although this is not the primary focus of Athena Unbound, it is an important aspect that deserves attention. In simple terms, the legal pathway to open access appears straightforward: authors, often depicted as individual, independent actors, must retain sufficient rights to allow them to legally share and allow reuse of their writing.

However, reality is far more complex. Multiple-authored works, including in extreme cases thousands of joint authors on one scientific article, can complicate our understanding of who actually holds a copyright interest in a work and can therefore authorize an open license on it. 

Moreover, many if not most academic authors are employed by colleges or universities, each with its own perspective on copyright ownership of scholarly publications. In most cases, as Baldwin explains, universities have been hesitant to assert ownership of scholarly publications under the work-for-hire doctrine (a topic I will cover in a subsequent post), possibly based on the increasingly tenuous “teacher exception” to the work-for-hire doctrine. However, this approach is not universally adopted. For instance, some universities assert ownership of specific categories of scholarly work, such as articles produced under grant-funded projects. Others reserve broad licenses to use scholarly work for university purposes, albeit with ill-defined parameters.

Open access, or at least the type we commonly think of—copyrighted articles typically licensed under Creative Commons or similar licenses—depends heavily on obtaining affirmative permission from the rightsholder. But the identity of the rightsholder, whether it be the university, author, or even the funder, can vary significantly due to a wide range of factors, including state laws, university IP policies, and funder grant contracts. 

Stay tuned for more in this series, and if you have questions in the meantime, check out our open access guide and resource page.

Supreme Court Issues Decisions in Warhol Foundation and Gonzalez

Posted May 19, 2023
Photo by Claire Anderson on Unsplash

Yesterday, the Supreme Court released two important decisions in Warhol Foundation v. Goldsmith and Gonzalez v. Google—cases that Authors Alliance has been deeply invested in, submitting amicus briefs to the Court in both cases. 

Warhol Foundation v. Goldsmith and Transformativeness

First, the Court issued its long-awaited opinion in Warhol Foundation v. Goldsmith, a case Authors Alliance has been following for years, and for which we submitted an amicus brief last summer. The case concerned a series of screen prints of the late musical artist Prince created by Andy Warhol, and asked whether the creation and licensing of one of the images, an orange screen print inspired by Goldsmith’s black and white photograph (which the Court calls “Orange Prince”), constituted fair use. After the Southern District of New York found for the Warhol Foundation on fair use grounds, the Second Circuit overturned the ruling, finding that the Warhol Foundation’s use constituted infringement. The sole question before the Supreme Court was whether the first factor in fair use analysis, the purpose and character of the use, favored a finding of fair use. 

To our disappointment, the Supreme Court’s majority agreed with the holding of the Second Circuit, finding that the purpose and character of Warhol’s use favored Goldsmith, such that it did not support a finding of fair use. This being said, the decision focused narrowly on the Warhol Foundation’s “commercial licensing of Orange Prince to Condé Nast,” expressing “no opinion as to the creation, display, or sale of any of the original Prince Series works.” Because the Court cabins its opinion, focusing specifically on the licensing of Orange Prince to Condé Nast rather than the creation of the entire Prince series, the decision is less likely to have a deleterious effect on the fair use doctrine generally than a broader decision would have.

Writing for the majority, Justice Sotomayor argued that Goldsmith’s photograph and the Prince screen print in question shared the same purpose, “portraits of Prince used to depict Prince in magazine stories about Prince.” Moreover, the Court found the use to be commercial, given that the screen print was licensed to Condé Nast. Justice Sotomayor explained that “if an original work and secondary use share the same or highly similar purposes, and the secondary use is commercial, the first fair use factor is likely to weigh against fair use, absent some other justification for copying.” Justice Sotomayor found that the two works shared the same commercial purpose, and therefore concluded that factor one favored Goldsmith. 

Justice Kagan, joined by Chief Justice Roberts, issued a strongly worded dissenting opinion. The dissent admonished the majority for its departure from Campbell’s “new meaning or message” test, an inquiry that Authors Alliance advocated for in our amicus brief. Justice Kagan further criticized the majority’s shifting focus towards commerciality, arguing that the fact that the use was a licensing transaction should not be given so much importance in the analysis. While Authors Alliance agrees with these points, we are less sure that the majority’s decision goes so far as to “constrain creative expression” or “threaten[] the creative process.” And while it’s uncertain what effect this case will have on the fair use doctrine more generally, one important takeaway is that the question of whether the use in question is commercial in nature (a consideration under the first factor) has been elevated to one of greater importance.

While we thought this case offered a good opportunity for the Court to affirm a more nuanced approach to transformative use, we much prefer the Supreme Court’s approach to the Second Circuit’s decision, and applaud the Court for confining its ruling to the narrow question at issue. The holding does not, in our view, radically alter the doctrine of fair use or disrupt the bulk of established case law. Moreover, some aspects of arguments we made in our brief (such as the notion that transformativeness is a matter of degree, not a binary) are present in the Court’s decision. This is a good thing, in our view, as it will allow for more nuanced consideration of a use’s character and purpose, and stands in contrast to the Second Circuit’s all-or-nothing view of transformativeness.

Gonzalez v. Google and the Missing Section 230

Also yesterday, the Court released its opinion in Gonzalez v. Google, a case that generated much attention because of its potential threat to Section 230, and another case in which Authors Alliance submitted an amicus brief. The case asked whether Google could be held liable under an anti-terrorism statute for harm caused by ISIS recruitment videos that YouTube’s algorithm recommended. In its per curiam decision (a unanimous one without a named Justice as author), the Court stated that Gonzalez’s complaint had failed to state a viable claim under the relevant anti-terrorism statute. Therefore, it did not reach the question of the applicability of Section 230 to the recommendations at issue. In other words, a case that generated tremendous concern about the Court disturbing Section 230 and harming internet creators, communities, and services that relied on it ended up saying nothing at all about the statute. 

Authors Alliance Welcomes Christian Howard-Sukhil as Text Data Mining Legal Fellow

Posted May 12, 2023

As we mentioned in our blog post on our Text Data Mining: Demonstrating Fair Use project a few weeks back, Authors Alliance is pleased to have Christian Howard-Sukhil on board as our brand new Text Data Mining legal fellow. As part of our project, generously funded by the Andrew W. Mellon Foundation, we established this new fellowship to provide research and writing support for our project. Christian will help us produce guidance for researchers and a report on the usability, successes, and challenges of the text data mining exemption to Section 1201’s prohibition on bypassing technical protection measures that Authors Alliance obtained in 2021. Christian begins her work with Authors Alliance this week, and we are thrilled to have her. 

Christian holds a PhD in English Language and Literature from the University of Virginia, and has just completed her second year of law school at UC Berkeley. Christian has extensive digital humanities and text data mining experience, including in previous roles at UVA and Bucknell University. Her work with Authors Alliance will focus on researching and writing about the ways that current law helps or hinders text and data mining researchers in the real world. She will also contribute to our blog—look out for posts from her later this year.

About her new role at Authors Alliance, Christian says, “I am delighted to join Authors Alliance and to help support text and data mining researchers navigate the various legal hurdles that they face. As a former academic and TDM researcher myself, I saw first-hand how our complicated legal structure can deter valid and generative forms of TDM research. In fact, these legal issues are, in part, what inspired me to attend law school. So being able to work with Authors Alliance on such an important project—and one so closely tied to my own background and interests—is as exciting as it is rewarding.”

Please join us in welcoming Christian!

Book Talk: Against Progress by Jessica Silbey

Posted May 8, 2023

Join journalist MARIA BUSTILLOS for a virtual book talk with author & professor of law JESSICA SILBEY for her latest book, AGAINST PROGRESS.

When first written into the Constitution, intellectual property aimed to facilitate “progress of science and the useful arts” by granting rights to authors and inventors. Today, when rapid technological evolution accompanies growing wealth inequality and political and social divisiveness, the constitutional goal of “progress” may pertain to more basic, human values, redirecting IP’s emphasis to the commonweal instead of private interests.

Against Progress considers contemporary debates about intellectual property law as concerning the relationship between the constitutional mandate of progress and fundamental values, such as equality, privacy, and distributive justice, that are increasingly challenged in today’s internet age. Following a legal analysis of various intellectual property court cases, Jessica Silbey examines the experiences of everyday creators and innovators navigating ownership, sharing, and sustainability within the internet eco-system and current IP laws. Crucially, the book encourages refiguring the substance of “progress” and the function of intellectual property in terms that demonstrate the urgency of art and science to social justice today.

Purchase Against Progress from Stanford University Press.

JESSICA SILBEY is Professor of Law at the Boston University School of Law. She is the author of Against Progress: Intellectual Property and Fundamental Values in the Internet Age (Stanford, 2022), The Eureka Myth: Creators, Innovators, and Everyday Intellectual Property (Stanford, 2015), and was a Guggenheim Fellow in 2018.

BOOK TALK: AGAINST PROGRESS
May 9 @ 10am PT / 1pm ET
Register now for the free, virtual event

An Update on our Text and Data Mining: Demonstrating Fair Use Project

Posted April 28, 2023

Back in December we announced a new Authors Alliance project, Text and Data Mining: Demonstrating Fair Use, which is about lowering and overcoming legal barriers for researchers who seek to exercise their fair use rights, specifically within the context of text data mining (“TDM”) research under current regulatory exemptions. We’ve heard from lots of you about the need for support in navigating the law in this area. This post gives a few updates.

Text and Data Mining Workshops and Consultations

We’ve had a tremendous amount of interest and engagement with our offers to hold hands-on workshops and trainings on the scope of legal rights for TDM research. Already this spring, we’ve been able to hold two workshops in the Research Triangle hosted at Duke University, and a third workshop at Stanford followed by a lively lunch-time discussion. We have several more coming. Our next stop is in a few weeks at the University of Michigan, and we have plans in the works for workshops in the Boston area, New York, a few locations on the West Coast, and potentially others as well. If you are interested in attending or hosting a workshop with TDM researchers, librarians, or other research support staff, please let us know! We’d love to hear from you. The feedback so far has been really encouraging, and we have heard both from current TDM researchers and those for whom the workshops have opened their eyes to new possibilities. 

ACH Webinar: Overcoming Legal Barriers to Text and Data Mining
Join us! In addition to the hands-on in-person workshops on university campuses, we’re also offering online webinars on overcoming legal barriers to text and data mining. Our first is hosted by the Association for Computers and the Humanities on May 15 at 10am PT / 1pm ET. All are welcome to attend, and we’d love to see you online!
Read more and register here. 

Research 

A second aspect of our project is to research how the current law can both help and hinder TDM researchers, with specific attention to fair use and the DMCA exemption that Authors Alliance obtained for TDM researchers to break digital locks when building a corpus of digital content such as ebooks or DVDs.

To that end, we’re excited to announce that Christian Howard-Sukhil will be joining Authors Alliance as our Text and Data Mining Legal Fellow. Christian holds a PhD in English Language and Literature from the University of Virginia and is currently pursuing a JD from the UC Berkeley School of Law. Christian has extensive digital humanities and text data mining experience, including in previous roles at UVA and Bucknell University. Her work with Authors Alliance will focus on researching and writing about the ways that current law helps or hinders text and data mining researchers in the real world. 

The research portion of this project is focused on the practical implications of the law and will be based heavily on feedback we hear from TDM researchers. We’ve already had the opportunity to gather some feedback from researchers including through the workshops mentioned above, and plan to do more systematic outreach over the coming months. Again, if you’re working in this field (or want to but can’t because of concerns about legal issues), we’d love to hear from you. 

At this stage we want to share some preliminary observations, based on recent research into these issues (supported by the work of several teams of student clinicians) as well as our recent and ongoing work with TDM researchers:

1) License restrictions are a problem. We’ve heard clearly that licenses and terms of use impose a significant barrier to TDM research. While researchers are able to identify uses that would qualify as fair use and also many uses that likely qualify under the DMCA exemption, terms of use accompanying ebook licenses can override both. These terms vary, from very specific prohibitions (e.g., Amazon’s, which says that users “may not attempt to bypass, modify, defeat, or otherwise circumvent any digital rights management system”) to more general prohibitions on uses that go beyond the specific permissions of the license (e.g., Apple’s terms, which state that “No portion of the Content or Services may be transferred or reproduced in any form or by any means, except as expressly permitted”). Even academic licenses, often negotiated by university libraries to have more favorable terms, can still impose significant restrictions on reuse for TDM purposes. Although we haven’t heard of aggressive enforcement of those terms to restrict academic uses, even the mere existence of those terms can have chilling and negative real-world impacts on research using TDM techniques.

The problem of licenses overriding researchers’ rights under fair use and other parts of copyright law is of course not limited to inhibiting text and data mining research. We wrote about the issue, and how easy it is to evade fair use, a few months ago, discussing the many ways that restrictive license terms can inhibit normal, everyday uses of works such as criticism, commentary, and quotation. We are currently working on a separate paper documenting the scope and extent of “contractual override,” and will be part of a symposium on the subject in May, hosted by the Association of Research Libraries and the American University, Washington College of Law Program on Information Justice and Intellectual Property.

2) The TDM exemption is flexible, but local interpretation and support can vary. We’ve heard that the current TDM exemption–allowing researchers to break technological protection measures such as DRM on ebooks and CSS on DVDs–is an important tool to facilitate research on modern digital works. And we believe the terms of that exemption are sufficiently flexible to meet the needs of a variety of research applications (how wide a variety remains to be seen through more research). But local understanding and support for researchers using the exemption can vary. 

For example, the exemption requires that the university with which the TDM research is associated implement “effective security measures” to ensure that the corpus of copyrighted works isn’t used for another purpose. The regulation further explains that in the absence of a standard negotiated with content holders, “effective security measures” means “measures that the institution uses to keep its own highly confidential information secure.” University IT data security standards don’t always use the same language or define their standard to cover “highly confidential information,” and so university IT offices must interpret this language and implement the standard in their own local context. This can create confusion about what precisely universities need to do to secure the TDM corpora.

Some of these definitional issues are likely growing pains: the exemption is still new, and universities need time to understand and implement standards to satisfy its terms in a reasonable way. Still, it will be important to explore further where there is confusion on similar terms and how that might best be resolved.

3) Collaboration and sharing are important. Text and data mining projects are often conceived of as part of a much larger research agenda, with multiple potential research outputs both from the initial inquiry and follow-up studies with a number of researchers, sometimes from a number of institutions. Fair use clearly allows for collaborative TDM work–e.g., in Authors Guild v. HathiTrust, a foundational fair use case for TDM research in the US, we observe that the entire structure of HathiTrust is a collective of a number of research institutions with shared digital assets. And likewise, the TDM exemption permits a university to provide access to “researchers affiliated with other institutions of higher education solely for purposes of collaboration or replication of the research.” The collaborative aspect of this work raises some challenging questions, both operationally and conceptually. For example, the exemption for breaking digital locks doesn’t define precisely who qualifies as a researcher who is “affiliated,” leaving open questions for universities implementing the regulation. More conceptually, the issue of research collaboration raises questions about how precisely the TDM purpose must be defined when building a corpus under the existing exemption, for example when researchers collaborate but investigate different research questions over time. Finally, the issue of actually sharing copies of the corpus with researchers at other institutions is important because, at least in some cases, local computing power is needed to effectively engage with the data.

Again, this is just preliminary research, but it raises some interesting and important questions! If you are working in this area in any capacity, we’d love to talk. The easiest way to reach us is at info@authorsalliance.org.

Want to Learn More?
This current Authors Alliance project is generously supported by the Mellon Foundation, which has also supported a number of other important text and data mining projects. We’ve been fortunate to be part of a broader network of individuals and organizations devoted to lowering legal barriers for TDM researchers. This includes efforts spearheaded by a team at UC Berkeley to produce the “Legal Literacies for Text Data Mining” and its current project to address cross-border TDM research, as well as efforts from the Global Network on Copyright and User Rights, which has (among other things) led efforts on copyright exceptions for TDM globally.

Authors Alliance Joins Copyright Office Listening Session On Copyright in AI-Generated Literary Works

Posted April 20, 2023
Photo by Possessed Photography on Unsplash

Yesterday, I represented Authors Alliance in a Copyright Office listening session on copyright issues in AI-generated literary works, the first of two such sessions that the Office convened yesterday afternoon. I was pleased to be invited to share our views with the Office and participate in a rousing discussion among nine other stakeholders, representing a diverse group of industries and positions. Generative AI raises challenging legal questions, particularly for its skeptics, but it also presents some incredible opportunities for authors and other creators.

During the listening session, I emphasized the potential for generative AI programs (like OpenAI’s ChatGPT, Microsoft’s Bing AI, Jasper, and others) to support authorship in a number of different ways. For instance, generative AI programs support authors by increasing the efficiency of some of the practical aspects of being a working author beyond the writing itself. But more importantly, generative AI programs can actually help authors express themselves and create new works of authorship.

In the first category, generative AI programs can support authors by, for example, helping them create text for pitch letters to send to agents and editors, produce copy for their professional websites, and develop marketing strategies for their books. Making these activities more efficient frees up time for authors to focus on their writing, particularly for authors whose writing time is limited by other commitments. 

In the second category, generative AI has tremendous potential to help authors come up with new ideas for stories, develop characters, summarize their writings, and perform early stage edits of manuscripts. Moreover, and particularly for academic authors, generative AI can be an effective research tool for authors seeking to learn from a large corpus of texts. Generative AI programs can help authors research by providing short and simple summaries of complex issues, surveys of the landscape of various fields, or even guidance on what human works to turn to in their research. Authors Alliance is committed to protecting authors’ right to conduct research, and we see generative AI tools as a new, innovative, and efficient form of conducting this research. Making research easier helps authors save time, and has a particular benefit for authors with disabilities that make it difficult to travel to multiple libraries or otherwise rely on analog forms of research. 

These programs undoubtedly have the potential to serve as powerful creative tools that support authorship in these ways and more, but, when discussing the copyright implications of the programs and the works they produce, it’s important to remember just how new these technologies are. Because generative AI remains in its infancy, and the costs and benefits for different segments of the creative industry have yet to be seen, it seems to me to be sensible to preserve the development of these tools before crafting legal solutions to problems they might pose in the future. And in fact, in our view, U.S. copyright law already has the tools to deal with many of the legal challenges that these programs might pose. When generative AI outputs look too much like the copyrighted inputs they are trained on, the substantial similarity test can be used to assess claims of copyright infringement and to vindicate an author’s exclusive rights in their works when those outputs do infringe.

In any case, in order for generative AI programs to be effective creative tools, it’s necessary that they are trained on large corpora. Narrowing the corpus of works the programs are trained on, whether through compulsory licensing or other mechanisms, can have disastrous effects. For example, research has shown that narrow data sets are more likely to produce racial and gender bias in AI outputs. In our view, the “input” step, where the programs are trained on a large corpus of works, is a fair use of these texts. And the holdings in Google Books and HathiTrust indicate that it is consistent with fair use to build large corpora of works, including works that remain protected by copyright, for applications such as computational research and information discovery. Additionally, the Copyright Office has recognized this principle in the context of research and scholarship, as demonstrated by its approval of Authors Alliance’s petition for an exemption from DMCA restrictions for text and data mining.

The question of the copyright status of AI-generated works is an important one. Most if not all of the stakeholders participating in this discussion agreed with the Copyright Office’s recent guidance regarding registration in AI-generated works: under ordinary copyright principles, the lack of human authorship means these texts are not protected by copyright. This being said, we also recognize that there may be challenges in reconciling existing copyright principles with these new types of works and the questions about authorship, creativity, and market competition that they might pose. 

But importantly, while this technology is still in its early stages, it serves the core purposes of copyright—furthering the progress of science and the useful arts by incentivizing new creation—to allow these systems to develop and confront new legal challenges as they emerge. Copyright is not only about protecting the exclusive rights of copyright holders (a concern that underlies many arguments against generative AI as a fair use), but incentivizing creativity for the public benefit. The new forms of creation made possible through generative AI can incentivize people who would not otherwise create expressive works to do so, bringing more people into creative industries and adding new creative expression to the world to the benefit of the public.

The listening sessions were recorded, and will be available on the Copyright Office website in the coming weeks. And these listening sessions are only the beginning of the Office’s investigation of copyright in AI-generated works. Other listening sessions on visual works, music, and audiovisual works will be held in the coming weeks, and the Office has indicated that there will be an opportunity for written public comments in order for stakeholders to weigh in further. We are committed to remaining involved in these cutting-edge issues, through written comments and otherwise, and we will keep our readers informed as policy around generative AI continues to evolve.

Authors Alliance Submits Comment to Copyright Office Regarding Ex Parte Communications

Posted April 4, 2023
Photo by erica steeves on Unsplash

Yesterday, Authors Alliance submitted a comment to the U.S. Copyright Office in response to a notice of proposed rulemaking asking for feedback from the public on new rules to govern ex parte communications. “Ex parte communications” refer to communications outside the normal, permitted channels of communication (in this case, communications between organizations or members of the public and Copyright Office staff outside of hearings or other formal proceedings). Ex parte communications with the Copyright Office are important, because they allow stakeholders and the Office to work out open questions in rulemakings or other proceedings outside of the formal channels. Authors Alliance relied on our ability to make ex parte communications during the last Section 1201 rulemaking cycle (where we obtained our text data mining exemption) in order to clarify certain issues. Now, the Office is proposing establishing formal rules for how these communications can be made, as well as establishing transparency around them. We support this proposal, and shared our thoughts in a comment. You can read our full comment here.

Judge Rules Against Internet Archive on Controlled Digital Lending

Posted March 28, 2023
Photo by Wesley Tingey on Unsplash

On Friday, Southern District of New York Judge John Koeltl issued a much-anticipated decision in Hachette Book Group v. Internet Archive. Unfortunately, as many of our members and allies are aware, the judge ruled against the Internet Archive, finding that its controlled digital lending (CDL) program was not protected by the doctrine of fair use and granting the publishers’ motion for summary judgment. You can read the 47-page decision for yourself here.

In his fair use analysis, Judge Koeltl found that each of the four fair use factors weighed in favor of the publishers, emphasizing above all else his view that IA’s controlled digital lending program was not transformative, an important consideration under the first fair use factor, which considers the purpose and character of the use. This inquiry also involves asking whether the use in question was commercial. To the surprise of many, the decision stated that IA’s use of the publishers’ works was commercial, because the Open Library is part of the IA’s website, which it uses “to attract new members, solicit donations, and bolster its standing in the library community.” The judge found this to be the case in spite of the fact that IA “does not make a monetary profit” from CDL. In other words, the judge held that the indirect, attenuated benefits the Internet Archive (which is, after all, a nonprofit) reaps from operating the Open Library make its CDL program commercial.

Judge Koeltl gave less attention to the fourth factor in the fair use analysis, “the effect of the use on the potential market for the work,” which is often held up to be of significant importance. One consideration under this factor is whether the use creates a competing substitute with the original work. Unfortunately, on this point too, the court—in our view—missed the mark. This is because the decision does not draw a distinction between CDL scans and ebooks, going so far as to call CDL scans “ebooks” throughout. As we explained in our summary of the proceedings last week, many features of both CDL and ebooks make them both functionally and aesthetically distinct from one another. By glossing over these differences, the judge reached the conclusion that CDL scans are direct substitutes for licensed ebooks.

Authors Alliance is deeply concerned about the ramifications of this decision, which was exceedingly broad in scope, striking a tremendous blow to the CDL model, rather than only IA’s implementation of it. Local libraries across the country practice CDL, and library patrons and authors alike depend on it to read, research, and participate in academic discourse. 

As it stands, this decision only applies to Internet Archive and is only about the 127 books on which the publishers based their lawsuit. It does not set a binding precedent for any other library, but if left in place (or worse, if affirmed on appeal), it could cause libraries to avoid digitizing and lending books under a CDL model, which in our view would not serve the interests of many authors. This decision makes it harder for those authors to reach wide audiences: CDL enables many authors to reach more readers than they could otherwise, and authors like our members who write to be read would not be served if fewer readers could access their books. 

The decision also hampers efforts to preserve books—aside from IA’s scanning program, there are few if any centralized efforts to preserve books in digital format once their commercial life is over. Without CDL, those books could quite literally disappear, and the knowledge they advance could be lost. IA’s scanning operations do preserve such books, which is one reason we have strongly supported them in this lawsuit. By the same token, if this decision stands, it will also limit authors’ ability to conduct efficient research online. The CDL survey we launched last year revealed that CDL is an effective research tool for authors who need to consult other books as part of their writing process, and in many cases it enables them to access far more works than they could at their local library alone. Authors who rely on CDL in this way would be harmed by this decision, as they could well be forced to undergo a more time-consuming research process, detracting from time that could be spent writing. 

The Internet Archive has already indicated that it will be appealing Judge Koeltl’s ruling, and we look forward to supporting those efforts. We will continue to keep our readers and members apprised of updates as this case moves forward.

Judge Hears Oral Arguments in Hachette Book Group v. Internet Archive

Posted March 20, 2023
Photo by Timothy L Brock on Unsplash

Earlier today, Judge John Koeltl of the Southern District of New York heard oral arguments in Hachette Book Group v. Internet Archive—a case Authors Alliance has been following since the lawsuit was first filed back in 2020. The case is about—among other things—whether Internet Archive’s controlled digital lending program qualifies as a fair use. Authors Alliance submitted an amicus brief in support of the Internet Archive back in July, arguing that CDL serves the interests of authors who write to be read. IA’s attorney cited to our brief during oral argument, and we are pleased that we were able to magnify the voices of authors who write to be read through its submission. You can learn more about the case and read our brief here.

In the hearing, the judge considered each party’s motion for summary judgment. The parties hotly contested a number of key issues in the case, including whether each side’s experts had properly demonstrated market harm (or lack thereof), what the appropriate market to consider was for purposes of fair use analysis, the commerciality of IA’s use, and which legal cases supported the arguments for and against fair use. Judge Koeltl asked the Internet Archive’s attorney a number of probing questions on these points, grappling with the difficult questions in this case. The judge further implied that there may be open issues of fact in this case, which could indicate the need for additional briefings or hearings.

CDL and Commerciality

The parties disagreed on the commerciality of IA’s use when it produces and makes CDL scans available. The publishers’ attorney argued that IA’s CDL operations are “intertwined” with its other functions, such as its ownership of the book vendor Better World Books, and further emphasized the argument that CDL loans result in lost revenue for the publishers. In other words, the argument was that the supposed commercial harm to the publishers that results from CDL lending makes the CDL lending itself commercial. The Internet Archive’s attorney answered that IA is a nonprofit organization that does not profit at all from its CDL program. He pointed to the fact that traditional library lending is not commercial in nature and does not provide libraries like IA with commercial benefits.

CDL and Market Effects

The plaintiffs’ attorney began by setting forth plaintiffs’ views on the issue of market harm, the fourth factor in fair use analysis and one often cited as among the most important factors in the inquiry. Plaintiffs discussed what they see as massive financial harm stemming from IA’s CDL program, which they estimated to amount to “millions of dollars in licensing revenues.” Plaintiffs also emphasized that, were CDL “given the green light,” or upheld as a fair use, the plaintiffs would suffer even greater losses. Throughout her argument, plaintiffs’ attorney emphasized that the “basic economic principle and common sense is that you cannot compete with free.” In other words, the publishers argue that the ebook library licensing market could collapse altogether if CDL were allowed to continue. Yet this misses the point that CDL is a longstanding and established practice, which has seen adoption and growth in libraries across the country while the ebook licensing market has continued to thrive.

Judge Koeltl, however, pressed the publishers on whether they had shown evidence of actual market harm, i.e., proof that IA’s CDL program had directly harmed their bottom line. In response, plaintiffs criticized the evidence offered by IA’s experts to show that no such harm had occurred. This is a difficult question because the party asserting a fair use defense typically has the burden of showing that the use has not harmed the market, but it is exceedingly difficult to prove a negative.

The judge also questioned whether CDL actually could represent such a loss: the publishers’ argument rests on the premise that libraries loan out CDL scans in lieu of paying to license ebooks, and were CDL not permitted under the law, IA and other libraries would instead choose to pay licensing fees to lend out ebooks. The judge pointed out that the result might in fact be that libraries would choose not to lend digital copies of works out at all, or would instead lend out physical books, undercutting the lost licensing revenue argument. 

IA’s attorney argued that the publishers had not offered empirical evidence of market harm in this case, focusing on the fact that when a library lends out a CDL scan, it does so in lieu of a physical book, “simulating the limitations of physical books.” This is due to CDL’s “owned to loaned” ratio requirement: a library can loan out only as many CDL scans as it has physical books in its collection, and can loan each scan out to only one patron at a time. When a library lends out a CDL scan, it does so in lieu of loaning the physical book, for which it has already paid. And while the plaintiffs mentioned harm to authors (who are, after all, the people that copyright law is intended to protect) several times during their argument, they did this in a way that linked authors with publishers as parties that are financially invested in a work’s sale; author interests and the finer details of the economics of author income and library lending were absent from the discussion.

The parties also disagreed about which market was the appropriate one to look to when discussing market harm in the context of fair use analysis. The publishers argued, and the judge seemed to assume, that the proper market is the library ebook licensing market. The judge opined that libraries could, instead of using CDL to lend out their books, simply purchase an ebook license. He seemed to view CDL scans and licensed ebooks as one and the same, despite the fact that there are several key differences between these types of loans, both in form and function, as explained in other amicus briefs in the case. Moreover, missing from the argument was the fact that, in many cases, libraries loan out CDL scans because no ebook is available to them: particularly for older books in a publisher’s backlist, or for books that are no longer available commercially, there is in many cases no ebook available, or no ebook available to libraries. Library patrons with print or mobility disabilities in need of digital copies of these kinds of works in order to read them would be greatly harmed if CDL were no longer permitted. 

CDL and Transformativeness

The publishers’ attorney started from the premise that CDL as a use was not transformative, explaining that a licensed ebook and a CDL scan served precisely the same function. In response, IA’s attorney argued that CDL is a transformative use because it “utilizes technology to achieve the transformative purpose of improving efficiency of delivering content without unreasonably encroaching on the rights of the rightsholder.” He further explained that fair uses are favored when they serve the key purpose of copyright: incentivizing new creation for the public benefit without harming the interests of rightsholders. To illustrate these benefits, he cited to Authors Alliance’s amicus brief, in which we explained the myriad ways that CDL benefits authors and can even incentivize the creation of new works.

Adding to its transformativeness argument, IA explained that, when it comes to speculative or actual market harm, such an effect must be balanced against the public benefit that results from the use. And when it comes to CDL, this public benefit is tremendous: numerous amici, as well as Authors Alliance, explained that CDL serves the interests of library patrons, authors, and the public writ large. 

What’s Next?

Now that the judge has heard both sides’ arguments, he will issue a decision in the case. While there is no way of knowing exactly when this will happen, Judge Koeltl is known for issuing decisions fairly quickly, so we may have a decision as soon as later this week. As always, we will keep our members and readers apprised of any developments in this pivotal case as it moves forward.