Author Archives: Rachel Brooke

Authors Alliance Joins Copyright Office Listening Session On Copyright in AI-Generated Literary Works

Posted April 20, 2023
Photo by Possessed Photography on Unsplash

Yesterday, I represented Authors Alliance in a Copyright Office listening session on copyright issues in AI-generated literary works, the first of two such sessions that the Office convened yesterday afternoon. I was pleased to be invited to share our views with the Office and to participate in a rousing discussion among nine other stakeholders representing a diverse range of industries and positions. Generative AI raises challenging legal questions, particularly in the eyes of its skeptics, but it also presents some incredible opportunities for authors and other creators.

During the listening session, I emphasized the potential for generative AI programs (like OpenAI’s ChatGPT, Microsoft’s Bing AI, Jasper, and others) to support authorship in a number of different ways. For instance, generative AI programs can support authors by making the practical, non-writing aspects of being a working author more efficient. But more importantly, generative AI programs can actually help authors express themselves and create new works of authorship.

In the first category, generative AI programs can support authors by, for example, helping them create text for pitch letters to send to agents and editors, produce copy for their professional websites, and develop marketing strategies for their books. Making these activities more efficient frees up time for authors to focus on their writing, particularly for authors whose writing time is limited by other commitments. 

In the second category, generative AI has tremendous potential to help authors come up with new ideas for stories, develop characters, summarize their writings, and perform early-stage edits of manuscripts. Moreover, and particularly for academic authors, generative AI can be an effective research tool for authors seeking to learn from a large corpus of texts. Generative AI programs can help authors research by providing short and simple summaries of complex issues, surveys of the landscape of various fields, or even guidance on which human works to turn to in their research. Authors Alliance is committed to protecting authors’ right to conduct research, and we see generative AI tools as a new, innovative, and efficient way of conducting this research. Making research easier helps authors save time and is particularly beneficial for authors with disabilities that make it difficult to travel to multiple libraries or otherwise rely on analog forms of research.

These programs undoubtedly have the potential to serve as powerful creative tools that support authorship in these ways and more, but when discussing the copyright implications of the programs and the works they produce, it’s important to remember just how new these technologies are. Because generative AI remains in its infancy, and its costs and benefits for different segments of the creative industries have yet to be seen, it seems sensible to me to allow these tools to develop before crafting legal solutions to problems they might pose in the future. In fact, in our view, U.S. copyright law already has the tools to deal with many of the legal challenges these programs might pose. When generative AI outputs look too much like the copyrighted inputs they are trained on, courts can apply the substantial similarity test to assess claims of copyright infringement and vindicate an author’s exclusive rights in their works when those outputs do infringe.

In any case, for generative AI programs to be effective creative tools, they need to be trained on large corpora. Narrowing the corpus of works the programs are trained on, whether through compulsory licensing or other mechanisms, can have disastrous effects. For example, research has shown that narrow data sets are more likely to produce racial and gender bias in AI outputs. In our view, the “input” step, where the programs are trained on a large corpus of works, is a fair use of these texts. And the holdings in Google Books and HathiTrust indicate that it is consistent with fair use to build large corpora of works, including works that remain protected by copyright, for applications such as computational research and information discovery. Additionally, the Copyright Office has recognized this principle in the context of research and scholarship, as demonstrated by its approval of Authors Alliance’s petition for an exemption from DMCA restrictions for text and data mining.

The question of the copyright status of AI-generated works is an important one. Most if not all of the stakeholders participating in this discussion agreed with the Copyright Office’s recent guidance regarding registration of AI-generated works: under ordinary copyright principles, the lack of human authorship means these texts are not protected by copyright. That said, we also recognize that there may be challenges in reconciling existing copyright principles with these new types of works and the questions about authorship, creativity, and market competition that they might pose.

But importantly, while this technology is still in its early stages, it serves the core purposes of copyright—furthering the progress of science and the useful arts by incentivizing new creation—to allow these systems to develop and confront new legal challenges as they emerge. Copyright is not only about protecting the exclusive rights of copyright holders (a concern that underlies many arguments against generative AI as a fair use), but incentivizing creativity for the public benefit. The new forms of creation made possible through generative AI can incentivize people who would not otherwise create expressive works to do so, bringing more people into creative industries and adding new creative expression to the world to the benefit of the public.

The listening sessions were recorded and will be available on the Copyright Office website in the coming weeks. These listening sessions are only the beginning of the Office’s investigation of copyright in AI-generated works. Other listening sessions on visual works, music, and audiovisual works will be held in the coming weeks, and the Office has indicated that there will be an opportunity for written public comments so that stakeholders can weigh in further. We are committed to remaining involved in these cutting-edge issues, through written comments and otherwise, and we will keep our readers informed as policy around generative AI continues to evolve.

Authors Alliance Submits Comment to Copyright Office Regarding Ex Parte Communications

Posted April 4, 2023
Photo by erica steeves on Unsplash

Yesterday, Authors Alliance submitted a comment to the U.S. Copyright Office in response to a notice of proposed rulemaking asking for public feedback on new rules to govern ex parte communications. “Ex parte communications” are communications outside the normal, permitted channels; in this case, they are communications between organizations or members of the public and Copyright Office staff outside of hearings or other formal proceedings. Ex parte communications with the Copyright Office are important because they allow stakeholders and the Office to work out open questions in rulemakings or other proceedings outside of the formal channels. Authors Alliance relied on our ability to make ex parte communications during the last Section 1201 rulemaking cycle (in which we obtained our text and data mining exemption) to clarify certain issues. Now, the Office is proposing to establish formal rules for how these communications can be made, as well as to bring transparency to them. We support this proposal and shared our thoughts in a comment. You can read our full comment here.

Judge Rules Against Internet Archive on Controlled Digital Lending

Posted March 28, 2023
Photo by Wesley Tingey on Unsplash

On Friday, Judge John Koeltl of the Southern District of New York issued a much-anticipated decision in Hachette Book Group v. Internet Archive. Unfortunately, as many of our members and allies are aware, the judge ruled against the Internet Archive (IA), finding that its controlled digital lending (CDL) program was not protected by the doctrine of fair use and granting the publishers’ motion for summary judgment. You can read the 47-page decision for yourself here.

In his fair use analysis, Judge Koeltl found that each of the four fair use factors weighed in favor of the publishers, emphasizing above all else his view that IA’s controlled digital lending program was not transformative, an important consideration under the first fair use factor, which considers the purpose and character of the use. This inquiry also involves asking whether the use in question was commercial. To the surprise of many, the decision stated that IA’s use of the publishers’ works was commercial because the Open Library is part of IA’s website, which it uses “to attract new members, solicit donations, and bolster its standing in the library community.” The judge reached this conclusion in spite of the fact that IA “does not make a monetary profit” from CDL. In other words, the judge held that the indirect, attenuated benefits that the Internet Archive (which is, after all, a nonprofit) reaps from operating the Open Library make its CDL program commercial.

Judge Koeltl gave less attention to the fourth factor in the fair use analysis, “the effect of the use on the potential market for the work,” which is often held up as one of the most important factors. One consideration under this factor is whether the use creates a competing substitute for the original work. Unfortunately, on this point too, the court, in our view, missed the mark. This is because the decision does not draw a distinction between CDL scans and ebooks, going so far as to call CDL scans “ebooks” throughout. As we explained in our summary of the proceedings last week, many features of CDL and ebooks make them functionally and aesthetically distinct from one another. By glossing over these differences, the judge reached the conclusion that CDL scans are direct substitutes for licensed ebooks.

Authors Alliance is deeply concerned about the ramifications of this decision, which was exceedingly broad in scope, striking a tremendous blow to the CDL model itself rather than only to IA’s implementation of it. Local libraries across the country practice CDL, and library patrons and authors alike depend on it to read, research, and participate in academic discourse.

As it stands, this decision only applies to Internet Archive and is only about the 127 books on which the publishers based their lawsuit. It does not set a binding precedent for any other library, but if left in place (or worse, if affirmed on appeal), it could cause libraries to avoid digitizing and lending books under a CDL model, which in our view would not serve the interests of many authors. This decision makes it harder for those authors to reach wide audiences: CDL enables many authors to reach more readers than they could otherwise, and authors like our members who write to be read would not be served if fewer readers could access their books. 

The decision also hampers efforts to preserve books—aside from IA’s scanning program, there are few if any centralized efforts to preserve books in digital format once their commercial life is over. Without CDL, those books could quite literally disappear, and the knowledge they advance could be lost. IA’s scanning operations do preserve such books, which is one reason we have strongly supported them in this lawsuit. By the same token, if this decision stands, it will also limit authors’ ability to conduct efficient research online. The CDL survey we launched last year revealed that CDL is an effective research tool for authors who need to consult other books as part of their writing process, and in many cases it enables them to access far more works than they could at their local library alone. Authors who rely on CDL in this way would be harmed by this decision, as they could well be forced to undergo a more time-consuming research process, detracting from time that could be spent writing. 

The Internet Archive has already indicated that it will be appealing Judge Koeltl’s ruling, and we look forward to supporting those efforts. We will continue to keep our readers and members apprised of updates as this case moves forward.

Judge Hears Oral Arguments in Hachette Book Group v. Internet Archive

Posted March 20, 2023
Photo by Timothy L Brock on Unsplash

Earlier today, Judge John Koeltl of the Southern District of New York heard oral arguments in Hachette Book Group v. Internet Archive—a case Authors Alliance has been following since the lawsuit was first filed back in 2020. The case is about—among other things—whether Internet Archive’s controlled digital lending program qualifies as a fair use. Authors Alliance submitted an amicus brief in support of the Internet Archive back in July, arguing that CDL serves the interests of authors who write to be read. IA’s attorney cited to our brief during oral argument, and we are pleased that we were able to magnify the voices of authors who write to be read through its submission. You can learn more about the case and read our brief here.

In the hearing, the judge considered each party’s motion for summary judgment. The parties hotly contested a number of key issues in the case, including whether each side’s experts had properly demonstrated market harm (or the lack thereof), which market was the appropriate one to consider for purposes of fair use analysis, the commerciality of IA’s use, and which legal precedents supported the arguments for and against fair use. Judge Koeltl asked the Internet Archive’s attorney a number of probing questions on these points, grappling with the difficult questions in this case. The judge further implied that there may be open issues of fact in this case, which could indicate the need for additional briefing or hearings.

CDL and Commerciality

The parties disagreed on whether IA’s use is commercial when it produces CDL scans and makes them available. The publishers’ attorney argued that IA’s CDL operations are “intertwined” with its other functions, such as its ownership of the book vendor Better World Books, and further emphasized the publishers’ argument that CDL loans result in lost revenue for the publishers; in other words, that the supposed commercial harm to the publishers resulting from CDL lending makes the CDL lending itself commercial. The Internet Archive’s attorney answered that IA is a nonprofit organization that does not profit at all from its CDL program. He pointed out that traditional library lending is not commercial in nature and does not provide libraries like IA with commercial benefits.

CDL and Market Effects

The plaintiffs’ attorney began by setting forth plaintiffs’ views on the issue of market harm—the fourth factor in fair use analysis, often cited as one of the most important factors in the inquiry. Plaintiffs discussed what they see as massive financial harm stemming from IA’s CDL program, which they estimated to amount to “millions of dollars in licensing revenues.” Plaintiffs also emphasized that, were CDL “given the green light,” or upheld as a fair use, the plaintiffs would suffer even greater losses. Throughout her argument, plaintiffs’ attorney emphasized the “basic economic principle and common sense is that you cannot compete with free.” In other words, the publishers argue that the ebook library licensing market could collapse altogether if CDL were allowed to continue. Yet this misses the point that CDL is a longstanding and established practice, which has seen adoption and growth in libraries across the country while the ebook licensing market has continued to thrive. 

Judge Koeltl, however, pressed the publishers on whether they had shown evidence of actual market harm, i.e., proof that IA’s CDL program had directly harmed their bottom line. In response, plaintiffs criticized the evidence offered by IA’s experts to show that no such harm had occurred. This is a difficult question because the party asserting a fair use defense typically has the burden of showing that the use has not harmed the market, but it is exceedingly difficult to prove a negative.

The judge also questioned whether CDL actually could represent such a loss: the publishers’ argument rests on the premise that libraries loan out CDL scans in lieu of paying to license ebooks, and were CDL not permitted under the law, IA and other libraries would instead choose to pay licensing fees to lend out ebooks. The judge pointed out that the result might in fact be that libraries would choose not to lend digital copies of works out at all, or would instead lend out physical books, undercutting the lost licensing revenue argument. 

IA’s attorney argued that the publishers had not offered empirical evidence of market harm in this case, emphasizing that CDL lending works by “simulating the limitations of physical books.” This is due to CDL’s “owned-to-loaned” ratio requirement: a library can lend out only as many CDL scans of a book as it owns physical copies, and each scan can be loaned to only one patron at a time. When a library lends out a CDL scan, it does so in lieu of loaning the physical book, for which it has already paid. And while the plaintiffs mentioned harm to authors (who are, after all, the people that copyright law is intended to protect) several times during their argument, they did so in a way that grouped authors with publishers as parties financially invested in a work’s sale; author interests and the finer details of the economics of author income and library lending were absent from the discussion.

The parties also disagreed about which market was the appropriate one to look to when discussing market harm in the context of fair use analysis. The publishers argued, and the judge seemed to assume, that the proper market is the library ebook licensing market. The judge opined that libraries could, instead of using CDL to lend out their books, simply purchase an ebook license. He seemed to view CDL scans and licensed ebooks as one and the same, despite the fact that there are several key differences between these types of loans, both in form and function, as explained in other amicus briefs in the case. Moreover, missing from the argument was the fact that, in many cases, libraries loan out CDL scans because no ebook is available to them: particularly for older books in a publisher’s backlist, or for books that are no longer available commercially, there is in many cases no ebook available, or no ebook available to libraries. Library patrons with print or mobility disabilities in need of digital copies of these kinds of works in order to read them would be greatly harmed if CDL were no longer permitted. 

CDL and Transformativeness

The publishers’ attorney started from the premise that CDL as a use was not transformative, explaining that a licensed ebook and a CDL scan served precisely the same function. In response, IA’s attorney argued that CDL is a transformative use because it “utilizes technology to achieve the transformative purpose of improving efficiency of delivering content without unreasonably encroaching on the rights of the rightsholder.” He further explained that fair uses are favored when they serve the key purpose of copyright: incentivizing new creation for the public benefit without harming the interests of rightsholders. To illustrate these benefits, he cited Authors Alliance’s amicus brief, in which we explained the myriad ways that CDL benefits authors and can even incentivize the creation of new works.

Adding to its transformativeness argument, IA explained that, when it comes to speculative or actual market harm, such an effect must be balanced against the public benefit that results from the use. And when it comes to CDL, this public benefit is tremendous: numerous amici, as well as Authors Alliance, explained that CDL serves the interests of library patrons, authors, and the public writ large. 

What’s Next?

Now that the judge has heard both sides’ arguments, he will issue a decision in the case. While there is no way of knowing exactly when this will happen, Judge Koeltl is known for issuing decisions fairly quickly, so we may have a decision as soon as later this week. As always, we will keep our members and readers apprised of any developments in this pivotal case as it moves forward.

Copyright Office Issues Opinion Letter on Copyright in AI-Generated Images

Posted March 8, 2023
Photo by Michael Dziedzic on Unsplash

In late February, the Copyright Office issued a letter revoking a copyright registration it had previously granted artist Kristina Kashtanova for a comic that used images generated using Midjourney, a generative AI program that creates images in response to user prompts. While this may seem minor, or simply another data point in the ongoing fight about copyright protection for AI-generated works, the determination is quite significant: it comes at a moment when AI-generated art has captured public attention, and moreover shows the Copyright Office’s thoughts on the important question of whether an artist who relies on a program like Midjourney can obtain copyright protection for an original compilation of AI-generated works. In today’s post, we explain the Copyright Office letter, contextualize it within the growing debate over AI and copyright, and share our thoughts on what all of this might mean for authors who write to be read. 

Copyright and Human Authorship

As technology has advanced to allow the creation of works without the direct involvement of a human, courts have grappled with whether these creations are entitled to copyright protection. In the late 19th century, the Supreme Court established that copyright was intended to protect the products of human labors and creativity, creating the “human authorship” requirement. In an early case on the topic, the Court held that a photograph was copyrightable despite the fact that a camera literally created the image, since photographs were “representatives of original intellectual conceptions of the author.” It cautioned, however, that when it came to creations resulting from processes that were “merely mechanical,” lacking “novelty, invention, or originality” by a human author, such hypothetical works might be beyond the scope of copyright protection.

This principle was tested in the 2010s: in 2011, an Indonesian crested macaque monkey named Naruto seized a photographer’s camera and took hundreds of images of himself. The photographer, David Slater, shared some of these images online, which promptly went viral. Several websites posted these images as well, prompting Slater to assert that he owned the copyright in the images and request their removal. The Wikimedia Foundation, which had uploaded the image to Wikimedia Commons, a repository of public domain and free license content, argued that the image was a part of the public domain due to the lack of a human creator. Several years later, Slater published a book of nature photographs which included Naruto’s selfie. Then, in 2015, the People for the Ethical Treatment of Animals (PETA) filed a lawsuit in the Northern District of California on Naruto’s behalf, asserting that the macaque owned the copyright in the image and requesting damages. The district court judge held that Naruto could not own the copyright in the image due to copyright’s human authorship requirement. However, the judge did indicate that Congress might be free to do away with the human authorship requirement and permit copyright ownership by animals, suggesting that the requirement was not a constitutional one, but indicating that it was beyond the power of the judiciary to decide. The Ninth Circuit Court of Appeals later affirmed the district court’s ruling.

Currently, the Copyright Office is defending a lawsuit in the U.S. District Court for the District of Columbia brought by AI system developer Dr. Stephen Thaler regarding the constitutionality of copyright law’s human authorship requirement. Thaler argues that the Copyright Act does not forbid treating AI systems as “authors” for the purposes of copyright law, and contends that the human authorship principle is unsupported by contemporary case law. While it seems unlikely that Thaler will prevail on this argument, the case will at the very least draw new attention to the human authorship requirement and how it fits into creation in the digital age.

The Creativity Requirement and Zarya of the Dawn

Kashtanova’s assertion of copyright ownership in her comic, Zarya of the Dawn, is in many ways similar to the photographer David Slater’s claim that he owned the copyright in Naruto’s selfie. In each case, the Copyright Office indicated that when a work is not the product of human authorship, a human may not claim copyright in that work (the latest compendium of Copyright Office practices lists “a photograph taken by a monkey” as an example of work that is not entitled to copyright protection since it does not meet the human authorship requirement). 

Kashtanova’s attorney had argued that Midjourney served “merely as an assistive tool,” and that Kashtanova should be considered the work’s author. But the Office likened Midjourney to a “merely mechanical process” lacking “novelty, invention, or originality” by a human creator, quoting the Supreme Court’s warning about the limits of copyright protection in the 19th century case discussed earlier in this post. And it was not only the human authorship requirement that made Zarya of the Dawn beyond the scope of copyright protection, but also copyright’s creativity requirement: for a work to be copyrightable, it must possess at least a “modicum” of creativity, a very low bar that rarely forecloses copyright protection for works of human authorship. 

The Office explained that Midjourney generates images in response to user prompts, “text commands entered in one of Midjourney’s channels.” These prompts are not “specific instructions” for generating an image, but rather input data that Midjourney compares to its training data before generating an image. The Office also reasoned that these images lack human authorship because the process is “unpredictable” and “not controlled by the user.” In other words, the “creativity” in these images comes not from the human entering prompts, but from the interaction between the prompt and Midjourney’s training data. This distinguishes Midjourney from a tool like a camera, over which a user exercises total control: there is little to no unpredictability when we use digital cameras to photograph the world around us; rather, all creative choices come from the human using the device.

The Office also noted that this opinion was not necessarily the final word on AI-generated images, as “other [generative] AI offerings” might operate differently, such that the creativity and human authorship requirements could be met. Kashtanova argued that minor edits she had made to the images were sufficiently creative to give her copyright ownership in the work as a whole. While the Office disagreed in this specific case (the before and after images demonstrating the editing were nearly identical), it did leave this possibility open for future cases. Moreover, the Office granted Kashtanova ownership in the comic’s text, which she alone had written, as well as copyright ownership in the compilation of Midjourney-generated images. Compilations of uncopyrightable subject matter can sometimes be protected by copyright, because both the human authorship and creativity requirements are met when a human selects and arranges the material. The copyright owner does not own a copyright in the material itself, but in the original compilation they have created.

What Does this Mean for Authors?

The Copyright Office’s denial of registration for the Midjourney-generated images has important implications for the public domain and for authors’ ability to use new forms of technology as assistive tools in the creation of their works. But the Office’s action also leaves some open questions about the copyright status of images generated by Midjourney and similar systems. One possibility, as Wikimedia asserted in the case of Naruto’s selfie, is that these images are part of the public domain. Were that the case, it could be a boon for artists and creators. Recall that once a work is in the public domain, it becomes free for all to use without fear of copyright infringement. The case of the monkey selfie is further instructive here, as the owner of the camera in that case did not prevail in claiming his own copyright in Naruto’s selfie. By the same token, it is unlikely that the creators of Midjourney could claim a copyright in images like those used by Kashtanova, despite their role in creating and making available the “assistive tool.”

If AI systems could be used to generate limitless public domain content, whether through text-based systems like ChatGPT or image-generating systems like Midjourney, this would greatly expand the public domain. The public domain can be a boon for creators, as they are free to do anything they wish with this material. On the other hand, some have expressed fear that, should all AI-produced works be considered part of the public domain, these public domain works could compete with works produced by human authors. It is also important to remember the practical economic realities of systems like Midjourney. Whether or not the Copyright Office and other policymakers determine that AI-generated content is part of the public domain, the creators of those systems could employ other means to assert ownership over, or forbid onward uses of, the content created by these systems. Contractual override, the employment of so-called “digital locks” like DRM, or other legal and technical mechanisms could conceivably limit authors’ ability to use AI-generated works the way they might use more traditional public domain materials.

Fair Use Week 2023: Looking Back at Google Books Eight Years Later

Posted February 24, 2023
Photo by Patrick Tomasso on Unsplash

This post is authored by Authors Alliance Senior Staff Attorney, Rachel Brooke. 

More recent members and readers may not be aware that Authors Alliance was founded in the wake of Authors Guild v. Google, a class action fair use case in the Second Circuit that was litigated for nearly a decade and finally resolved in favor of Google in 2015. The case concerned the Google Books project, an initiative launched by Google in which the company partnered with university libraries to scan books in their collections. These scans would ultimately be made available as a full-text searchable database for the public to search for particular terms, with short “snippets” displayed alongside the search results. Users could not, however, view or read the scanned books in their entirety. The Authors Guild, along with several authors, filed a lawsuit against Google alleging that scanning the books and displaying these snippets constituted copyright infringement.

In addition to the Authors Guild representing its members in the litigation, its co-plaintiffs brought the case as a class action, claiming to bring the case on behalf of a broad group of authors: “[a]ll persons residing in the United States who hold a United States copyright interest in one or more Books reproduced by Google as part of its Library Project” who were either authors or the authors’ heirs.

But many of these authors did not agree with the Authors Guild’s stance in the case, and felt that the Google Books project served their interests in sharing knowledge, seeing their creations preserved, and reaching readers interested in their work. A group of authors and scholars came together to share their views with the district court, many of whom would soon become founding members of Authors Alliance. Many of those same authors signed on to amicus briefs before both the district court and the Second Circuit explaining why they opposed the litigation and supported Google’s fair use defense. Then, in 2014, Authors Alliance submitted its first amicus brief to the Second Circuit, supporting Google’s ultimately successful fair use defense. The plaintiffs later petitioned the Supreme Court to review the Second Circuit’s ruling, but the Court declined to hear the case, leaving the Second Circuit’s ruling intact.

Nearly a decade later, the effects of Google Books can still be seen in fair use decisions and copyright policy developments that grapple with adapting copyright to the digital world. In today’s post, I’ll reflect on where Google Books fits within today’s fair use landscape and share my thoughts on what the case can tell us about copyright in the digital age.

Google Books and Transformativeness

A major question in Authors Guild v. Google was whether Google’s use of the copyrighted works was “transformative,” a key component of the fair use inquiry. In practice, a finding that a use is transformative weighs heavily in favor of fair use. The court found that Google’s scanning, as well as the search and snippet display functions, were transformative because the service “augments public knowledge by making available information about [the] books without providing the public with a substantial substitute for . . . the original works.” Google Books provided information about the books, such as author and publisher information, without creating substitutes for the original works. In other words, readers could learn about the books they searched, but could not read them in full; to do that, they would have to purchase or borrow copies through the normal channels.

Since the 1994 landmark Supreme Court case Campbell v. Acuff-Rose Music established the doctrine of transformativeness, there have been myriad questions about the precise contours of what it means for a use to be transformative. Campbell explained that a use is transformative when it endows the secondary work with a “new meaning or message,” but this test can be difficult to apply in practice, particularly to new or nascent technologies. Google Books tells us that scanning works to create a full-text searchable database with limited snippet displays is a transformative use, because it serves a purpose different from that of the works themselves. Furthermore, it reinforces the notion that a use is particularly likely to be considered transformative when it serves the underlying purpose of copyright law: incentivizing new creation for the benefit of the public and “enriching public knowledge.” By highlighting that Google contributed to public knowledge about books through its scanning activities and the Google Books search function, the court helped bring fair use for scholarship and research (two of the prototypical uses listed in the 1976 Copyright Act) into the digital age, setting an important precedent for later cases.

Google Books and Derivative Works

One of the plaintiffs’ arguments in Google Books was that Google’s full-text searchable database constituted a derivative work. One of a copyright holder’s exclusive rights is the right to prepare derivative works—such as adaptations, abridgements, or translations of the original work—and the plaintiffs alleged that this right had been infringed. The court disagreed, finding that Google’s use had a transformative purpose, whereas derivative works tend to involve a transformation in form, such as the adaptation of a novel into a movie or an audiobook. Furthermore, the court explained that derivative works are “those that re-present the protected aspects of the original work, i.e., its expressive content, converted into an altered form[.]” In contrast, the Google Books project provided information about the books and offered a limited “snippet” view, but did not re-present the expressive content: the full text of the books themselves.

The distinction the court drew in Google Books between transformative fair uses and derivative works is an important one, as it can often be a close question whether a use has a transformative purpose or merely re-presents the same work in a new form, without adding enough to tip the scales towards fair use. And it is a question that continues to arise in fair use cases today: just last year, the Supreme Court agreed to hear Warhol Foundation v. Goldsmith, a case about whether Andy Warhol’s series of screenprints of the late musician Prince, which drew from a photograph taken by Lynn Goldsmith, qualified as a fair use. We’ve covered this case extensively on our blog over the past few years, and submitted an amicus brief in the case. Our brief argues (among other things) that Warhol’s screenprints involve much more than a transformation in form: they are stylistically and visually distinct from Goldsmith’s photograph, and endow the photograph with a new meaning or message, making the use highly transformative.

As in Google Books, the parties and amici in Goldsmith grapple with the line between transformative uses and the creation of derivative works, an often complicated and fact-sensitive determination. In this context, Google Books serves as a reminder that fair use is not a one-size-fits-all determination. Yet it also supports the argument, advanced by Authors Alliance and others, that the mere presence of a transformation in form (in Google Books, from a print book to a scanned copy; in Goldsmith, from a black-and-white photo to a series of colorful screenprints) does not mean that a secondary use cannot be a fair one. Warhol’s use did not merely “re-present the protected aspects of the original work[’s] . . . expressive content,” but was transformative in the different “purpose, character, expression, meaning, and message” it conveyed.

Google Books and Controlled Digital Lending

The practice of controlled digital lending (“CDL”), and the argument that it constitutes a fair use, can be traced back in part to the fair use principles established and reinforced in Google Books. As I argue in our amicus brief in Hachette Book Group v. Internet Archive, a case about, among other things, whether CDL constitutes a fair use, Google Books shows that copying the entirety of a work in the process of making a transformative use of it can be fully consistent with fair use.

Another important suggestion in the Google Books case, made at the district court level, was that the Google Books search function could actually drive book sales: search results were accompanied by links to purchase the book, and research suggested that this could enhance sales of those books. This is analogous to the effects of library lending: library patrons often purchase books by authors they first discovered at the library, an effect that can apply with equal force when a patron borrows a CDL scan. Indeed, several other amici in Hachette argue that the finding that Google Books search was a fair use lends substantial support to the argument that CDL is a fair use, based on both the factual similarities between the two initiatives and their shared objective of “enriching public knowledge.”

As in Google Books, CDL also helps authors reach readers who could not otherwise access their books, and it does so by scanning books on library shelves. And also like Google Books, CDL helps solve the problem of 20th-century works “disappearing”: the commercial life of a book tends to be much shorter than the term of copyright, so when books under copyright go out of print, they can disappear into obscurity. Scanning these books to preserve them ensures that the knowledge they contain will not be lost.

Google Books and Text Data Mining

Text data mining, the process of using automated techniques to quantitatively analyze text and other data, is also widely considered to be a fair use, and this determination similarly rests in part on the building blocks established in Google Books. As in Google Books, the results of text data mining research provide information about the works being studied and cannot serve as substitutes for the content of the works. In fact, one important aspect of the new exemption to DMCA liability for text data mining, which Authors Alliance successfully petitioned for in 2021, is that researchers may not use the works in the text data mining corpus for consumptive purposes. And also as in Google Books, researchers may view the content in a limited manner to verify their findings, analogous to Google Books’s snippet view. The new TDM exemption was a huge win for Authors Alliance members, and something to celebrate for all scholars engaged in this important research. Importantly, the precedent established by Google Books strongly supported the exemption’s adoption and the Register of Copyrights’ suggestion that text data mining was likely to be a fair use.

Looking Forward: Google Books and Artificial Intelligence

In recent years, scholars and researchers have grappled with the implications of copyright protection for AI-generated content and AI models more generally. The holding in Google Books provides some support for companies’ and researchers’ ability to engage in these activities: one important factor in the case was that Google Books did not harm the market for the books at issue, since the database could not serve as a substitute for the books themselves. Similarly, when copyrighted works are used to train AI, the output cannot serve as a substitute for the copyrighted works, and the market for those works is not harmed, even if (like the plaintiffs in Google Books) the copyright holders might prefer that their works not be used in this way. Google Books supports the view that the mere use of copyrighted works as “input” in a given model does not mean that the outputs constitute infringement. It is also worth noting that the court found Google’s use to be fair even though it was made by a commercial, profit-seeking entity. While a commercial use can sometimes tip the scales against a finding of fair use, this can be overcome by a socially beneficial, transformative purpose. The same reasoning could arguably apply to AI models trained on copyrighted works that contribute to our understanding of the world, even though commercial entities are often the ones deploying these technologies.

Eight years after it was decided, the legacy of Google Books endures in policy debates and copyright lawsuits that capture the public’s attention. Policymakers and judges would be wise to heed the lessons it teaches about the value of advancing public knowledge through digitization and the use of copyrighted works for new and socially beneficial purposes. As we await policy developments regarding text data mining and decisions in Goldsmith and Hachette, it is my hope that this legacy will live on, reminding us all of the vast capabilities of information technology to enrich our understanding of the world and advance the progress of knowledge, which, after all, is what copyright law is all about.

Fair Use Week 2023: Resource Roundup

Posted February 21, 2023
Photo by Adi Goldstein on Unsplash

Authors who want to incorporate source materials into their writings with confidence may find themselves faced with more questions than answers. What exactly does fair use mean? What factors do courts consider when evaluating claims of fair use? How does fair use support authors’ research, writing, and publishing goals? Fortunately, help is at hand! This Fair Use/Fair Dealing Week, we’re featuring a selection of resources, briefs, and blog posts to help authors understand and apply fair use.

Fair Use 101


Authors Alliance Guide to Fair Use for Nonfiction Authors: Our guidebook, Fair Use for Nonfiction Authors, covers the basics of fair use, addresses common situations faced by nonfiction authors where fair use may apply, and debunks some common misconceptions about fair use. Download a PDF today.

Authors Alliance Fair Use FAQs: Our Fair Use FAQs cover questions such as:

  • Can I still claim fair use if I am using copyrighted material that is highly creative?
  • What if I want to use copyrighted material for commercial purposes?
  • Does fair use apply to copyrighted material that is unpublished?

Codes of Best Practices in Fair Use: The Center for Media and Social Impact at American University has compiled this collection of Codes of Best Practices in Fair Use for various creative communities, from journalists to librarians to filmmakers.

Fair Use Evaluator Tool: This tool, created by the American Library Association, helps users support and document their assertions of fair use.

Dig Deeper

U.S. Copyright Office Fair Use Index: The U.S. Copyright Office maintains this searchable database of legal opinions and fair use test cases.

Fair Use Amicus Briefs: Authors Alliance submitted several friend-of-the-court briefs on issues related to fair use over the past year. Check out our brief in Hachette Book Group v. Internet Archive, where we expand on our longtime defense of Controlled Digital Lending as a fair use; our brief in Warhol Foundation v. Goldsmith, where we advocate for a broad yet sensible conception of “transformativeness”; and our brief in Sicre de Fontbrune v. Wofsy, where we explain why fair use is a crucial aspect of U.S. policy and why it should shield authors from the enforcement of foreign copyright judgments where fair use would have protected the use had it occurred in the U.S.

Fair Use and Text Data Mining: Learn about Authors Alliance’s new project, “Text and Data Mining: Defending Fair Use,” generously supported by the Mellon Foundation, which is intended to support researchers engaging in text and data mining under the recent DMCA exemption for Text Data Mining.

Fair Use and Public Policy: Learn about why we voiced opposition to the SMART Copyright Act of 2022 and the Journalism Competition and Preservation Act—proposed legislation that, if passed, could erode our fair use rights.

Public Domain Day 2023: Welcoming Works from 1927 to the Public Domain

Posted January 5, 2023
Montage courtesy of the Center for the Public Domain

Literary aficionados and copyright buffs alike have something to celebrate as we welcome 2023: A new batch of literary works published in 1927 entered the public domain on January 1st, when the copyrights in those works expired. The public domain refers to the commons of creative expression that is not protected by copyright. When a work enters the public domain, anyone may do anything they want with that work, including activities that were formerly the “exclusive right” of the copyright holder like copying, sharing, translating, or adapting the work. 

Some of the more recognizable books entering the public domain this year include: 

  • Virginia Woolf’s To the Lighthouse
  • William Faulkner’s Mosquitoes
  • Agatha Christie’s The Big Four
  • Edith Wharton’s Twilight Sleep
  • Herbert Asbury’s The Gangs of New York (the original 1927 publication)
  • Franklin W. Dixon’s (a pseudonym) The Tower Treasure (the first Hardy Boys book)

Literary works can be part of the public domain for reasons other than the expiration of copyright, such as when a work is created by the U.S. federal government, but copyright expiration is the major way that literary works enter the public domain. Owners of works first published in the United States in 1927 needed to renew their copyrights in order to extend the original 28-year term. Initially, the renewal term also lasted 28 years, but it was later extended to give the copyright holder an additional 67 years of protection, for a total term of 95 years. This means that works first published in the United States in 1927, provided they were published with a copyright notice, were properly registered, and had their copyright renewed, were protected through the end of 2022.
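For readers who want to check the arithmetic, here is a minimal sketch that computes the last protected year and the public domain date for a work on this renewal path. It assumes a U.S. work published with notice, registered, and renewed, and relies on the rule that copyright terms run through the end of the calendar year in which they would otherwise expire. The constant and function names are hypothetical, and the sketch does not model the other routes into the public domain.

```python
# Minimal sketch of the 28 + 67 = 95 year renewal arithmetic described above.
# Assumes a U.S. work published with notice, registered, and renewed; it does
# not model works that lapsed for lack of notice or renewal, or government works.

ORIGINAL_TERM = 28  # initial term, in years
RENEWAL_TERM = 67   # renewal term as later extended (originally 28 years)
TOTAL_TERM = ORIGINAL_TERM + RENEWAL_TERM  # 95 years


def last_protected_year(publication_year: int) -> int:
    """Last calendar year of protection (terms run through December 31)."""
    return publication_year + TOTAL_TERM


def public_domain_year(publication_year: int) -> int:
    """Year on whose January 1 the work enters the public domain."""
    return last_protected_year(publication_year) + 1


if __name__ == "__main__":
    year = 1927
    print(f"{year}: protected through {last_protected_year(year)}, "
          f"public domain on January 1, {public_domain_year(year)}")
```

For a 1927 publication this prints "1927: protected through 2022, public domain on January 1, 2023", matching the dates above.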

Once in the public domain, works can be made freely available online. Organizations that have digitized text of these books, like Internet Archive, Google Books, and HathiTrust, can now open up unrestricted access to the full text of these works. HathiTrust alone has opened up full access to more than 40,000 titles originally published in 1927. This increased access provides richer historical context for scholarly research and opportunities for students to supplement and deepen their understanding of assigned texts. And authors who care about the long-term availability of their works may also have reason to look forward to their works eventually entering the public domain: A 2013 study found that in most cases, public domain works are actually more available to readers than all but the most recently published works.

What’s more, public domain works can be adapted into new works of authorship, or “derivative works,” including by adapting printed books into audio books or by adapting classic books into interactive forms like video games. And the public domain provides opportunities to freely translate works to enrich our understanding of those works and help fill the gap in works available to readers in their native language.

Updates on the JCPA

Posted December 14, 2022
Photo by Elijah Mears on Unsplash

Last week saw a flurry of news about the Journalism Competition and Preservation Act (“JCPA”), proposed legislation that would create an antitrust exemption allowing certain news publishers to join together and collectively negotiate payments from digital platforms for carrying their content. Authors Alliance has consistently opposed the JCPA, as we believe it would harm small publishers and creators, while further entrenching major players in the news media industry.

Last Monday, December 5th, news emerged that the revised JCPA had been included in a “must pass” defense spending bill (the National Defense Authorization Act, or NDAA), leading the legislation’s opponents to promptly decry the move. Then, the next day, news broke that Congress had removed the JCPA from the legislation, something to celebrate for those who, like Authors Alliance, believed this was ill-advised legislation that would not have served the interests of the creators who contribute to the news media.

Background

The JCPA was first proposed as separate bills in the Senate and House of Representatives in March 2021. The JCPA has laudable goals: to preserve a strong, diverse, and independent press, responding to ongoing crises in local and national journalism. But the actual text of the JCPA doesn’t meet those goals, and it causes other problems. One major problem has been that the JCPA would implicitly expand the scope of copyright, potentially requiring payment for activities like linking or using brief snippets of content that are not only fair uses, but are crucial for digital scholarship. In June 2021, Authors Alliance joined a group of like-minded civil society organizations on a letter urging Congress to clarify that the bill would not expand copyright protection to article links, and that authors and other internet users would not have to pay to link to articles or for the use of headlines and other snippets that fall within fair use.

Then, this September, a new version of the bill was released in the Senate. While the revised language made some improvements—like clarifying that the bill would not modify, expand, or alter the rights guaranteed under copyright—it still failed to clarify that the bill would not cover activities like linking that are fundamental for authors creating digital scholarship. And some changes to the legislation posed serious First Amendment concerns. For example, new language in the bill would have forced platforms to carry content of digital journalism organizations that participated in the collective bargaining, regardless of extreme views or misinformation. The revised bill could also have hurt authors of news articles financially, because it failed to include a provision that would require authors of the press articles to be compensated as part of the collective bargaining it envisioned. 

Inclusion in the NDAA

Last week, the news that the JCPA had been included in the NDAA was met with outcry. Its opponents argued that the bill was far too complex to be included in must-pass legislation, and merited further discussion and revision before becoming law. The JCPA was never marked up in the House of Representatives, nor did it receive a hearing there. Authors Alliance once again joined 26 other civil society organizations on a letter protesting the move and urging Congress not to include the JCPA in military spending or other must-pass legislation.

A wide variety of other stakeholders also objected to the inclusion of the JCPA in the NDAA. Small publications, lobbyists for platforms, and even journalism trade groups reiterated their opposition. Meta, the company that owns Facebook, even threatened to remove news from its platform were the legislation to pass (in response to a similar bill being passed in Australia, Meta did in fact remove news from its platform in the country). Then, late on Tuesday, December 6th, the latest version of the NDAA’s text was released, with the JCPA omitted. The House approved the NDAA a few days later.

A Victory for Now

Because the JCPA was removed from the NDAA before its passage, it is no longer on the brink of becoming law. What happens next with the JCPA is less certain. There have already been multiple iterations of the bill, and it could be reintroduced, with or without modifications, in the next legislative session. While it’s unclear how the new makeup of Congress following the midterm elections might affect the JCPA’s chances of becoming law, it will certainly be a factor in the bill’s future. This was also not the first time that the government has attempted to support journalism and local news through proposals that could affect users’ and authors’ ability to rely on fair use. Just last year, the Copyright Office conducted a study on establishing a new press publishers’ right in the United States, which would have required news aggregators to pay licensing fees for aggregating headlines, ledes, and short phrases of news articles (Authors Alliance submitted a reply comment in that study). While the Office ultimately decided not to recommend the adoption of a new press publishers’ right, its study shows that the government may continue to investigate these policies on other fronts.

Analysis: Opinion Released in U.S. v. Bertelsmann

Posted November 18, 2022
Photo by Scott Graham on Unsplash

Last week, the district court released its opinion in United States v. Bertelsmann, an antitrust case concerning a proposed merger between Penguin Random House (“PRH”) and Simon & Schuster (“S&S”), which the court blocked (an “amended opinion” was released earlier this week, but the two documents differ only in their concluding language). Authors Alliance has been covering this case on our blog for the past year, and we were eager to read Judge Pan’s full opinion now that it had been redacted and made public. This post gives an overview of the opinion; shares our thoughts about what Judge Pan got right, got wrong, and left out; and discusses what the case could mean for the vast majority of authors who are not represented in the discussion.

Background

The Department of Justice initiated this antitrust proceeding after PRH and S&S announced that they intended to merge, with Bertelsmann, PRH’s parent company, purchasing S&S from its parent company, Paramount Global. The trade publishing industry has long been dominated by a few large publishing houses which have merged and consolidated over time. Today, the trade industry is dominated by the “Big Five” publishers: PRH, S&S, HarperCollins, Hachette Book Group, and Macmillan. And a sub-section of the trade publishing industry, “anticipated top sellers,” is the focus of the government’s argument and Judge Pan’s opinion. This market segment is defined as books for which authors receive an advance of $250,000 or higher (a book advance is an up-front payment made to authors when they publish a book, and often the only money these authors receive for their works). 

The main thrust of Judge Pan’s opinion is simple: the proposed merger would have led to lower advances for authors of anticipated top sellers, and the market harm that would flow from the decreased competition in the industry is substantial enough that the merger cannot go forward under U.S. antitrust law. To arrive at this conclusion, the court considered testimony from a variety of publishing industry insiders, experts in economics, and authors.

Defining the Market

Trade publishing houses are those that distribute books on a national scale and sell them in non-specialized channels, like general interest bookstores or Amazon. Trade publishing stands in contrast to self-publishing, academic publishing, and publishing with specialized boutique presses. But changes in how we read and how books are distributed have complicated these distinctions. For example, university presses are sometimes considered to be non-trade publishers, even though many also publish trade books. University presses are particularly well poised to publish books that bridge the gap between the scholarly and the popular: Harvard University Press’s publication of Thomas Piketty’s Capital in the Twenty-First Century is one example, and it was an unexpected bestseller. Similarly, Amazon sells trade books alongside other types of books. The Authors Alliance Guide to Understanding Open Access is available as a print book on Amazon, but we released it under an open access license, and it is far from a trade book. Consumers increasingly buy books online as brick-and-mortar bookstores across the country close or downsize, and the Amazon marketplace obscures the distinction between trade publishing and other types of publishing.

Within trade publishing, there is a small segment of books that are seen as “hot,” which the DOJ calls anticipated top sellers. While PRH argued that this distinction was pulled out of whole cloth, Publishers Marketplace, a popular subscription-based service for those in the industry, uses certain terms (essentially code words) to indicate the size of the advance when book deals are announced. “Deals under $50,000 are ‘nice,’ those up to $100,000 are ‘very nice,’ those up to $250,000 are ‘good,’ those up to $500,000 are ‘significant,’ and larger deals are ‘major.’”

For the market for anticipated top sellers (trade books with advances of $250,000 or higher), the Big Five collectively control 91% of the market. In contrast, for books where an author receives an advance under $250,000, the Big Five control just 55% of the market, with non-Big Five trade publishers publishing a significant portion of trade books in this category. According to expert testimony, the combined PRH and S&S were expected to have a 49% share of the market for anticipated top sellers after the merger, more than the rest of the Big Five put together. For these reasons, the court determined that the merger was improper as a matter of antitrust law.

Beyond Anticipated Top Sellers

While Judge Pan’s opinion is measured, thoughtful, and reaches (from our perspective) the correct result, the broader context of the publishing industry shows how narrow the subset of authors in this market is, and how some authors were left out. The market the court considered in this case is “a submarket of the broader publishing market for all trade books.” In its pre-trial brief, PRH asserted that “[s]ome 57,000 to 64,000 books are published in the [U.S.] each year by one of more than 500 different publishing houses” and “another 10,000-20,000 are self published.” It is unclear whether the first number includes academic books and other non-trade titles. “[A]nticipated top-selling books” account for just 2% of “all books published by commercial publishers” (again, it is unclear what “commercial publishers” means in this context), and an even smaller share of all books published in the U.S. in a given year (a difficult statistic to pin down, but somewhere between 300,000 and 1,000,000 books per calendar year, depending on who you ask).

It is not just that the authors at the center of this discussion are unique in the high advances they receive for their books; it is that the business of publishing a book is fundamentally different for these authors than for less commercially successful authors. And this is what is missing from Judge Pan’s opinion: the economic system of Big Five book acquisitions for anticipated top sellers is totally unlike many authors’ experiences getting their work published. While many authors struggle to find a publisher willing to publish their book, and more still struggle to convince their publisher to do so on terms that are acceptable to them, anticipated top sellers are generally the subject of book auctions, in which editors bid on the rights to a manuscript in an auction held by an author’s literary agent. It is important to keep in mind that for the vast majority of working authors, these auctions do not take place.

The language in the opinion shows how it generalizes the experiences of commercially successful trade authors to authors more broadly, doing a disservice to the multitude of authors whose book deals do not look like the transactions it describes. Judge Pan states that “[a]uthors are generally represented by literary agents, who use their judgment and experience to find the best home for publishing a book.” Literary agents play an important role in the publishing ecosystem, and serve as intermediaries for some authors to help them develop their manuscripts and get the best deal possible. But the author-agent relationship is also a financial one: agents receive a “commission” of around 15% of all monies paid to the author. It stands to reason that an author who cares more about their work reaching a broad audience than receiving a large advance, or even an advance at all, is much less likely to be represented by an agent. And these authors too care about finding the right home for their work, getting a book deal with favorable terms, and feeling confident that their publisher is invested in their work. Making the publishing industry less diverse, with fewer houses overall, is just as detrimental to these authors as it is to top-selling ones. 

What is troubling about the decision is not that it focuses on a certain type of author and a certain type of book (the question of what the relevant “market” is in antitrust cases is a complicated one), but that the vision of authorship and publication it presents as typical does not reflect the lived experiences of most authors. The dominant narrative that “authors” are famous people who make a living from their writing, primarily through the high advances they receive from trade publishers, simply is not borne out in today’s information economy.

Overall, the decision in this case is in many ways a boon for authors who care about a vibrant and diverse publishing ecosystem, whether they are authors of anticipated top sellers or authors who forgo compensation and publish open access. When publishing houses consolidate, fewer books are published, and fewer authors can publish with these publishers. This could lead less commercially successful trade authors to turn to other publishers (whether small trade publishers, university presses, or boutique publishers), which would then be forced to take on fewer books by less commercially successful authors. Self-publishing is an option for authors no longer able to find publishers willing to take on their work, but self-published authors earned 58% less than traditionally published authors as of 2017, and this earnings gap could lead some authors to abandon their writing projects altogether. These downstream effects may be speculative, but they deserve attention: this is almost certainly not the last we will hear about anticompetitive behavior in the publishing industry, and the effects of this behavior on authors who are not top sellers also matter. We hope that future judges considering these thorny questions will remember that authors are not a monolith, yet all are affected by drastic changes to the publishing ecosystem.