Category Archives: Blog

Who Represents You in the AI Copyright Lawsuits? 

Posted October 16, 2024

Sara Silverman is the author of The Bedwetter, a comedy memoir.  Richard Kadrey wrote Sandman Slim, a fantasy novel series. Christopher Golden, a supernatural thriller titled Ararat. 

These authors might not seem to have much in common with an academic author who writes in history, physics, or chemistry. Or a journalist. Or a poet. Or, for that matter, me, writing this blog post.  And yet, these authors may end up representing us all in court. 

A large number of the recent AI copyright lawsuits are class action lawsuits. This means that these lawsuits are brought by a small number of plaintiffs who (subject to judicial approval) are granted the right to represent a much larger class. In many of the AI copyright lawsuits,  the proposed classes are extraordinarily broad, including many creators who might be surprised that they are being represented. If you live in the US and wrote something that was published online, there is a good chance that you are included in multiple of these classes. 

A very brief background on class action lawsuits

Class actions can be an efficient way of resolving disputes that involve lots of people, allowing for a single resolution that binds many parties when there are common interests and facts. As you can imagine, the class action mechanism can also attract misuse, for example, by plaintiffs (and their attorneys) who may seek large settlements on behalf of a large number of people. Those settlements may benefit the named plaintiffs and their attorneys but they aren’t really aligned with the interests of most class members. 

There are rules in place to prevent that kind of abuse.  In federal courts (where all copyright lawsuits must be brought), Rule 23 of the Federal Rule of Civil Procedure governs. It provides that:

“One or more members of a class may sue or be sued as representative parties on behalf of all members only if: 
(1) the class is so numerous that joinder of all members is impracticable; [“numerosity”]
(2) there are questions of law or fact common to the class; [“commonality”]
(3) the claims or defenses of the representative parties are typical of the claims or defenses of the class; [“typicality”] and
(4) the representative parties will fairly and adequately protect the interests of the class. [“adequacy”]”

The rest of Rule 23 contains a number of other safeguards to protect both class members and defendants. Among them are requirements that the court must certify that the class complies with rule 23,  that any proposed settlements be approved by the court,  and that class members receive notice of any proposed settlement and an opportunity to object. Additionally, there are a number of rules to ensure that the law firm bringing the suit can fairly and competently represent the class members. 

Class definition and class representatives in the copyright AI lawsuits

We believe it’s important for creators to pay attention to these suits because if a class is certified and that class includes those creators, the class representatives will have meaningful legal authority to speak on their behalf.  

Rule 23  provides that “at an early practicable time after a person sues,” the court must decide whether to certify the proposed class. Though we are now well over a year into some of the earliest suits filed, this has yet to happen. In the meantime what we have are proposed class definitions offered by plaintiffs. How broadly or narrowly a class is defined by the plaintiffs will be one of the most important factors in whether the class can be certified since it will directly affect the commonality of facts among the class, the typicality of claims, and whether the representatives can fairly and adequately represent the interests of the class. Plaintiffs have the burden of proving that they have satisfied Rule 23. 

In these AI lawsuits, we see some themes in terms of class representative and proposed classes, with many offering very broad class definitions. For example, in the now-consolidated In re OpenAI ChatGPT Litigation, the class representatives are 11 fiction writers of books such as The Cabin at the End of the World, The Brief Wondrous Life of Oscar Wao, What the Dead Know and others. 

They propose to represent a class defined as follows:  

“All persons or entities domiciled in the United States that own a United States copyright in any work that was used as training data for the OpenAI Language Models during the Class Period [defined as June 28, 2020 to the present].” 

This kind of broad “anyone with a copyright in a work used for training” approach to class definition is repeated in a few other suits. For example, the consolidated Kadrey v. Meta lawsuit has a similar (and overlapping) grouping of fiction author class representatives and an almost identical proposed class definition. Dubus v. NVIDIA is another suit that takes essentially the same approach. 

Other AI lawsuits have more variation in class representatives. Huckabee v. Bloomberg, for example, is another suit with a similar class definition (basically, all copyrighted works owned by someone in the US and used for training Bloomberg’s LLM) but with class representatives that are a bit different: mostly authors of religious books and of course, Mike Huckabee, a politician. 

There is at least one class action that is more precise both in terms of proposed class representatives and their relation to the proposed class definition. The now-consolidated Authors Guild v. OpenAI suit has some 28 proposed class representatives, most of whom are authors of best-selling fiction and non-fiction trade books, 14 of whom are members of the Authors Guild. In this suit, the plaintiffs propose two classes: one for fiction authors and one for non-fiction authors. It also places some restrictions around them: class members for fiction works must be “natural persons” who are “sole authors of, and legal or beneficial owners of Eligible Copyrights in” fictional works that were registered with the U.S. Copyright Office and used for training the defendants’ LLMs (and this includes persons who are beneficiaries of works held by literary estates). For nonfiction authors, class members are “[a]ll natural persons, literary trusts, and literary estates in the United States who are legal or beneficial owners of Eligible Nonfiction Copyrights’ which the complaint defines as works used to train defendants’ LLMs and that have an ISBN with the exception of any books classified as reference works (BISAC code REF). 

Some challenges and dangers
When you consider the scale and scope of materials used to train the AI models in question, you can immediately see some of the challenges that are likely to arise with relatively small groups of authors attempting to represent practically all individual U.S. copyright owners. 

While the exact training materials used for the models at issue remain opaque, it’s definitely true that they were not just trained on modern fiction. There is widespread acknowledgment that these models are trained on a large amount of content scraped from across the internet using data sources such as Common Crawl. This, in effect, means that these suits implicate the rights of millions of rights holders, with interests as diverse as those of YouTube content creators, computer programmers, novelists, academics, and more. 

How can these representatives fairly and adequately represent such a broad and diverse group–especially when many may disagree with the underlying motivations for the suit to begin with–is a tough question. Even the Authors Guild consolidated case, which is much more careful in terms of class definition, includes classes that are breathtakingly broad when one considers the diversity of authorship within them. The fiction author class, for example, could include everyone from NY Times bestselling authors to fan fiction writers. The nonfiction class, which is at least limited to nonfiction book authors of works assigned an ISBN, could similarly include everyone from authors of popular self-help books distributed by the millions to scholarly books with print runs in the low hundreds and distributed online on open-access terms. The interests, financial and otherwise, of those authors can vary significantly. 

Beyond the adequacy of representatives (along with questions about whether their experiences are really typical of others in the proposed class), there are other challenges unique to copyright law, for example, the opaque nature of ownership (there is no official public record of who owns what), making ascertaining who actually falls within the class an initial challenge. Compounding that, there are a dizzying variety of unique terms under which works are distributed online, some of which may afford AI developers a viable defense for many works. A fair use defense also requires some level of assessment of the nature of the works used, a fact-intensive inquiry that will vary from one work to another. This just scratches the surface of some of the issues that likely mean there really aren’t common questions of law or fact among the class. 

Conclusion
There are good reasons to think that the classes as currently defined in these lawsuits are too broad. For some of the reasons mentioned above, I think it will be difficult for courts to certify them as is. But this doesn’t mean authors and other rightsholders should sit back and assume that their interests won’t be co-opted by others in these suits who seek to represent them. We don’t know when the courts will actually address these class certification issues in these suits. When they do, it will be important for authors to speak up. 

Artist Left with Heavy Fees by Copyright Troll Law Firm

Posted October 11, 2024

Facts of the Case & Fair Use

On September 18, the 5th Circuit decided in Keck v. Mix Creative Learning Center that using copyrighted artwork to teach children how to make art in a similar style does not constitute copyright infringement. The case adds to the well-developed jurisprudence that teaching with copyrighted materials is often protected by fair use.

This case was initially filed in 2021 by plaintiff’s counsel, Mathew Kidman Higbee, a known and prolific copyright litigation firm sometimes accused of troll-like behavior.  During the pandemic, the defendant sold a total of six art kits (out of the six kits sold, two were purchased by the plaintiff) that included images of the plaintiff’s dog-themed artworks, biographical information, and details on her artistic styles. Additionally, the kit included paint, paintbrushes, and collage paper. The plaintiff’s side argued that including the artworks in teaching kits constituted willful copyright infringement and therefore demanded $900,000 in damages—to make up for the $250 the defendant made in sales. 

The district court dismissed all infringement claims in 2022; and last month, the 5th Circuit court affirmed that including copies of plaintiff’s artwork in a teaching kit is fair use. 

The courts found the first and fourth fair use factors to favor the defendant. Under the first factor, even though the defendant’s use was commercial in nature, by accompanying the artworks with art theory and history, the teaching kit transformed the original decorative purpose of the dog-themed artworks. The 5th Circuit distinguished this case from Warhol by pointing out that, in the Warhol case, the infringing use served the same illustrative purpose as the original work, while in this case, “the art kits had educational objectives, while the original works had aesthetic or decorative objectives.”  

Under the fourth factor, courts explained that they cannot imagine how the market value of plaintiff’s dog-themed artworks could decrease when included in children’s art lesson kits. The 5th Circuit Court further pointed out that there was no evidence that a market for licensing artworks for similar teaching kits exists now or is ever likely to develop. 

Because these “two most important” factors favored the defendant, the defendant’s use was fair use.

Fee Shifting: Plaintiffs Beware of Copyright Troll Law Firms!

The final outcome of the case: the plaintiff was ordered to cover $102,404 in fees and $165.72 in costs for the defendant.

Even though we are happy for the defendant and her counsel that, after a prolonged legal battle, this well-deserved victory is finally won, it is nevertheless disheartening to see the plaintiff-artist left alone in the end to face the high legal fees of this ill-conceived lawsuit. The plaintiff’s counsel not only failed to advise the plaintiff to act in her own best interest (whether it is to settle the case at the right moment or to pursue more plausible claims), but also conjured up willful infringement claims that were clearly meritless to any trained eye. Even the 5th Circuit Court lamented over this in its opinion, as it begrudgingly upheld the district court’s decision based on the abuse of discretion standard it must follow:

It is troubling that Keck alone will be liable for the high fees incurred by Defendants largely because of Higbee & Associates’ overly aggressive litigation strategy. From our review of the record, the law firm lacked a firm evidentiary basis to pursue hundreds of thousands of dollars in statutory damages against Defendants for willful infringement. Nevertheless, we cannot say, on an abuse of discretion standard, that the district court erred by determining that there was insufficient evidence that the firm’s conduct was both unreasonable and vexatious. … But we warn Higbee & Associates that future conduct of this nature may well warrant sanctions, and nothing in this opinion prevents Higbee & Associates from compensating its client, if appropriate, for the fees that she is now obliged to pay Defendants.

This should serve as a cautionary tale for would-be plaintiffs: copyright lawsuits, like any other type of litigation, are primarily meant to address the damages plaintiffs actually suffered, and the final settlement should make plaintiffs whole again—that is, as if no infringement has ever occurred. Copyright lawsuits (or the threat to sue) should not be undertaken as a way to create brand new income streams, such as was the case in the lawsuit described above. 

When someone aggressively enforces dubious copyright claims with the sole purpose of collecting exorbitant fees rather than protecting any underlying copyrights, they are called a “copyright troll.” Regrettably, beyond the disreputable law firms that are enthused to pursue aggressive claims, many services now exist to tempt creators into troll-like behavior by promising “new licensing income.” The true aim of these services is solely to collect high representation charges from creators, when users of the creators’ works are harassed into paying exorbitant settlements. Many victims often agree to pay just for the nuisance to stop. This predatory business model has been repeatedly exposed by creators and authors, including famously by Cory Doctorow

Needless to say, copyright trolls are harmful to the copyright ecosystem. Obviously, innocent users are harmed when slapped with unreasonable demand letters or even frivolous lawsuits. Worse, creators are misled into supporting this unethical practice while deluded into believing they are doggedly following the spirit of the law—sometimes, as was in this case, they are left to face the inevitable consequences of bringing a frivolous lawsuit, while the lawyer or agent that originally led them into the mire gets off free, upward and onward to their next “representation.” 

It was very unfortunate that the district court did not fully study the plaintiff’s counsel’s track record and issue appropriate disciplinary orders against him. The problem of copyright trolls will have to be addressed soon in order to preserve a healthy copyright system. 

What is “Derivative Work” in the Digital Age?

Posted October 7, 2024
on the top, Seltzer v. Green Day; on the bottom, Kienitz v. Sconnie Nation

Part I: The Problem with “Derivative Work”

The right to prepare derivative works is one of the exclusive rights copyright holders have under §106 of the Copyright Act. Other copyright holders’ exclusive rights include the right to make and distribute copies, and to display or perform a work publicly. 

Lately, we’ve seen a congeries of novel conceptions about “derivative works.” For example, a reader of our blog stated that when looking at AI models and AI outputs, works should be considered infringing “derivatives” even when there is no substantial similarity between the infringing AI model/outputs and the ingested originals. Even in the courts, we’ve seen confusion, for example, Hachette v. Internet Archive presented us with the following statement about derivative works:

Changing the medium of a work is a derivative use rather than a transformative one. . . . In fact, we have characterized this exact use―“the recasting of a novel as an e-book”―as a “paradigmatic” example of a derivative work. [citation omitted; emphasis added]

These statements leave one to wonder—what is a copy, a derivative work, an infringing use, and a transformative fair use in the context of U.S. copyright law? In order to have some clarity on these questions, it’s helpful to juxtapose “derivative works” first with “copies” and then with “transformative uses.” We think the confusion about derivative work and its related concepts arises out of using the phrase to mean “a work that is substantially similar to the original work” as well as “a work that is so in an unauthorized way, not excused from liabilities.”

There are many immediate real world implications for confusion over the meaning of “derivative work.” In privately negotiated agreements, licensees who have a right to make reproductions but not derivative works may be confused as to what medium their use is restricted to. For example, a publisher of a book with a license that allows it to make reproductions but not derivatives might be confused as to whether, under the Hachette court’s reasoning, it is allowed to republish a print book in a digital format such as a simple PDF of a scan. Similarly, for public licenses, such as the CC ND licenses, where a licensor stipulates restriction on the creation of derivative works, it causes confusion for downstream users whether, say, changing a pdf into a Word document is allowed. 

This is also an important topic to explore both in the recent hot debates over Controlled Digital Lending and generative artificial intelligence, as well as in an author’s everyday work—for instance, would quoting someone else’s work make your article/book a derivative work of the original? 

Part II: “Copies” and “Derivatives”

Our basic understanding of derivative works comes from the 1976 Copyright Act. The §101 definition tells us:

A “derivative work” is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications which, as a whole, represent an original work of authorship, is a “derivative work”.

The U.S. Copyright Office published Circular 14 gives some further helpful guidance as to what a §106 derivative work would look like:

To be copyrightable, a derivative work must incorporate some or all of a preexisting “work” and add new original copyrightable authorship to that work. The derivative work right is often referred to as the adaptation right. The following are examples of the many different types of derivative works: 

  • A motion picture based on a play or novel 
  • A translation of an novel written in English into another language
  • A revision of a previously published book 
  • A sculpture based on a drawing 
  • A drawing based on a photograph 
  • A lithograph based on a painting 
  • A drama about John Doe based on the letters and journal entries of John Doe 
  • A musical arrangement of a preexisting musical work 
  • A new version of an existing computer program 
  • An adaptation of a dramatic work 
  • A revision of a website

One immediate observation that can be made from reading these, is that “ebook” or “digitized version of a work” is not listed as, nor similar to any of the exemplary derivative works in the Copyright Act or the Copyright Office Circular. By contrast, “ebook” or “digitized version of a work” seems to fit much better under the § 101 definition of “copies”:

“Copies” are material objects, other than phonorecords, in which a work is fixed by any method now known or later developed, and from which the work can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device. The term “copies” includes the material object, other than a phonorecord, in which the work is first fixed.

The most crucial difference between a “copy” and a “derivative work” is whether new authorship is added. If no new authorship is added, merely changing the material that the work is fixed on does not create a new copyrightable derivative work. This, in fact, is observed by many courts before Hachette. For example, in Corel v. Bridgeman Art Gallery, the court unequivocally held that there is no new copyright granted to photos of public domain paintings. 

Additionally, as we know from Feist v. Rural Tel., “[t]he mere fact that a work is copyrighted does not mean that every element of the work may be protected.” Copyright protection is only limited to the original elements of a work. We cannot call a work “derivative” of another if it does not incorporate any copyrightable elements from the original copyrighted work. For example, the “Game Genie” device, which let players change elements of a Nintendo game, was not found to be a derivative work by the court because it didn’t incorporate any part of the Nintendo game. 

It is clear from this examination that sometimes a later-created work is a copy, sometimes a derivative, and sometimes it may not implicate any of the exclusive rights of the original.

Part III: “Derivative” and “Transformative” Works

Let’s quickly recap the context in which courts are confusing “derivative” and “transformative” works—

A prima facie case of copyright infringement requires the copyright holder to prove (1) ownership of a valid copyright, and (2) inappropriate copying of original elements. We will not go into more details here, but essentially, the inappropriate copying prong requires plaintiffs to assert and prove defendant’s access to the plaintiff’s work as well as a level of similarity between the works in question that shows improper appropriation of the plaintiff’s work. If the similarity between the defendant’s work and protectable elements in the plaintiff’s work is minimal, then there is no infringement. As seen in the  “Game Genie” example above, courts can rely on substantial similarity analysis to determine whether a work is indeed a potentially-infringing copy or derivative of the plaintiff’s work.

Once the plaintiff establishes a prima facie infringement case—e.g., the defendant’s work is shown to be a derivative or a copy of the plaintiff’s registered work—the defendant may still nevertheless be free to make the use if the use falls outside the ambit of the copyright holder’s §106 rights, such as uses that are fair use. Whether a work is a derivative work under § 106 is no longer a relevant inquiry after establishing a prima facie case: this point is starkly obvious when looking at the many plausible defenses a defendant can raise (including fair use) where even the verbatim copying of a work is authorized by law. 

As the court stated in Authors Guild v. Hathitrust, “there are important limits to an author’s rights to control original and derivative works. One such limit is the doctrine of ‘fair use,’ which allows the public to draw upon copyrighted materials without the permission of the copyright holder in certain circumstances.” When a prima facie infringement case is already established, yet a court still discusses whether the defendant’s work is a “derivative work,” at a minimum, the court adds confusion by beyond the § 101  definition of a derivative work. 

In fact, a distinct new significance is being given to “derivative work” in recent years in the context of the “purpose and character” factor of fair use, specifically, when analyzing if a use has a transformative purpose. The shift in a word’s meaning or a concept is not per se unimaginable or objectionable. It is misguided to consider the copyright legal landscape static. As law professor Pamela Samuelson pointed out, before the mid-19th century, most courts did not even think copyright holders were entitled to demand compensation from others preparing derivative works. The 1976 Copyright Act finally codified copyright holders’ exclusive right to prepare derivative works. And, now, some rights holders want the courts to say there are categorical derivative uses that can never be considered fair use.

The Hachette court is among those that have unfortunately bought into this novel approach. The court seems not only to misconstrue the salient distinction between a ‘copy of a work’ and a ‘derivative work’, they appear to give heightened protections to works they now define as ‘derivative’. If this misconception becomes widespread, we will be living in a world where if a use is new-derivative, then it is never transformative (and, if it is not transformative, it is likely not fair). Ultimately, it is purely circular for a court to say that the reason for denying the fair use defense is that the use is derivative. When we buy into this setup of “derivative v.s. transformative,” it is difficult to ever say with confidence that a work is transformative, because at the same time we remember how a transformative use should often fit in the actual definition of derivative work under § 101, “derivative”—just like the Green Day rendition of the plaintiff’s art in Seltzer v. Green Day.  

Clearly, if we take “derivative work” at its true § 101 definition, out of all potentially infringing works, “transformative fair use” is not an absolute complement, but a possible subset, of derivative works. We know from Campbell v. Acuff-Rose that “transformativeness is a matter of degree, not a binary;” whereas no such sliding scale is plausible for derivative works. A work is either a derivative or it is not: there’s never a “somewhat derivative” work in copyright. All in all, it makes little sense to frame the issues as “transformative v.s. derivative work”—such discussions inevitably buy into the rhetorics of copyright expansionists. We have already warned the court in Warhol against the danger of speaking heedlessly about derivative works in the context of fair use. We must ensure that the “derivative v.s. transformative” dichotomy does not come to dominate future discussions of fair use, so that we conserve the utility and clarity of the fair use doctrine.

The expansion of the relevance of “derivative work” beyond the establishment of a prima facie infringement case not only creates a circular reasoning for denying fair use, but also makes it impossible to make sense of the case law we have accumulated on fair use. Take Seltzer v. Green Day for example, the court held that a work can be transformative even if that work “makes few physical changes to the original.” The Green Day concert background art with a red cross superimposed was found to be a fair use of the original street art—a classic example of how a prima facie infringing derivative work can nevertheless be a transformative, and thus fair, use. Similarly, in Kienitz v. Sconnie Nation, a derivative use of a photo on a tshirt was found to be a fair use. Ideas and concepts, including “derivative works,” are only important to the extent they elucidate our understanding of the world. When the use of “derivative works” leads to more confusion than clarity, we should be cautious in adopting the new meaning being superimposed on “derivative works.”

Antitrust Lawsuit Filed Against Large Academic Publishers

Posted September 17, 2024

On September 12, a San Francisco-based law firm filed an antitrust lawsuit on behalf of UCLA professor Lucina Uddin against six prominent academic publishers and the trade association that represents them: Elsevier, John Wiley & Sons, Sage Publications, Springer Nature, Taylor & Francis, Wolters Kluwer, and the International Association of Scientific, Technical, and Medical Publishers (“STM”). The suit is brought on behalf of a class that it defines as “All natural persons residing in the United States who performed peer review services for, or submitted a manuscript for publication to, any of the Publisher Defendants’ peer-reviewed journals from September 12, 2020 to the present.” The complaint lists just one claim for relief: that “Publisher Defendants and their co-conspirators entered into and engaged in unlawful agreements in restraint of the trade and commerce described above in violation of Section 1 of the Sherman Act, 15 U.S.C. § 1.” 

To support this claim, the plaintiff makes three key allegations. Namely, that the publishers have illegally agreed amongst each other to abide by: 

  1. a “Single Submission Rule,” where researchers are only allowed to submit a manuscript to one journal for consideration unless the journal rejects it;
  2. a “Unpaid Peer Review Rule,” where journals implement policies to not compensate peer reviewers for their labor; and
  3. a “Gag Rule,” where researchers are not allowed to share or discuss their manuscript once they have submitted it to a journal for consideration before the journal publishes it.

Why would any of these actions constitute an antitrust violation? We thought a little background could be helpful: 

To understand this lawsuit, we must first consider the purpose of U.S. antitrust law. The fundamental goal of antitrust law is to encourage competition and ultimately to promote consumer welfare. The Supreme Court explains that: “Congress designed the Sherman Act as a consumer welfare prescription.” 

Section 1 of the Sherman Antitrust Act does this by prohibiting “[e]very contract, combination in the form of trust or otherwise, or conspiracy, in restraint of trade.” This generally requires proving two things: (1) some sort of agreement or business arrangement, and (2) that this agreement is “in restraint of trade,” i.e., unreasonably harmful to competition.  

Proving an agreement can sometimes be a complicated factual question, though often there are good clues, especially when joint activity is coordinated through a trade association (antitrust lawyers love to quote Adam Smith on trade associations: “People of the same trade seldom meet together, even for merriment and diversion, but the conversation ends in a conspiracy against the public, or in some contrivance to raise prices.”)  In this case, the plaintiff says that the agreements are so obvious that they are in fact published openly in several portions of the STM’s “International Ethical Principles for Scholarly Publication” which are then implemented and enforced by each publisher.  

Proving the second part, that the agreement is in restraint of trade or unreasonably harmful to competition can be more complicated. The courts have developed three different analytical frameworks for evaluating whether conduct harms competition in this context:

  1. A “per se” rule for agreements that are “nakedly” anticompetitive. Examples include agreements to fix prices (what’s alleged here, at least with respect to payment for peer review), bid-rigging, agreements to divide markets, and a few other kinds of less common agreements. 
  2. A “rule of reason” test—which applies in most cases—that weighs the pro-competitive effects of the agreement against anti-competitive effects and prohibits agreements only when the anti-competitive harms outweigh the benefits.
  3. “Intermediate” or “quick-look” scrutiny, which covers a small number of cases in which agreements look suspicious on their face but are not so obviously anticompetitive as to fall under the “per se” rule. 

The plaintiff’s complaint claims that the publishers’ agreements violate the Sherman Act regardless of which of these three tests apply.  But often, some of the most significant battles in antitrust lawsuits are about which of these standards apply since the costs associated with litigating a suit can change dramatically depending on which is used. If the court accepts that the “per se” standard applies, the plaintiff likely wins. If the “quick-look” rule applies, the burden is on the defendant to show that its conduct is not anticompetitive. But if the “rule of reason” standard applies, the suit will likely involve extensive discovery, expert witnesses, and other factual evidence about what exactly the market is, whether the defendants had sufficient market power to negatively affect competition, and whether the agreement would negatively or positively affect competition. 

In these cases, defining the relevant market is often crucial. In most antitrust cases, defining the relevant market involves identifying substitutes for the product under review. On this point, the plaintiff argues:  

 “Publication by peer-reviewed journals is a relevant antitrust market. For scholars who seek to communicate their scientific research, there is no adequate substitute for publication in a peer-reviewed journal. Peer-reviewed journals establish the validity of scientific research through the peer review process, communicate that research to the scientific community, and avoid competing claims to the same scientific discovery.”

For market definition, it is important to include only close substitutes and exclude those that are distant. For academic publishing, even though there are many ways authors can share their manuscripts—from emailing their colleagues, to posting on their personal websites, to publishing with upstart new journals—the fact remains that the journals in question are the primary means of dissemination and are the most heavily read and heavily cited. So at first glance, the complaint seems realistic in its formulation of the market at issue. 

The complaint then goes on to argue that the publishers hold significant power in this market and misuse that power, touching on familiar themes such as how these academic publishers extract significant profits, charge high rates for access and increasingly high fees for authors to publish openly, and so on. The plaintiffs allege that this market power has allowed the publishers to make agreements amongst each other (the three allegations noted above) in ways that allow them to maximize profits while also maintaining their market power. We note that it’s true that there may be some other explanations for these practices—in fact, some authors may be proponents of some of them, for example, the single-submission rule. But even if there are other explanations for these rules, with the current arrangement agreed through STM, the allegation is that no member publisher will even try to compete or develop other approaches that may drive up the price for peer reviewers, compete for placement of papers, etc.

How this lawsuit will turn out is hard to predict. This lawsuit flags some very problematic practices enforced on the academic publishing industry by prominent publishers. But there are many other problems with the academic publishing industry not discussed in the complaint. We’ve long thought that the public-interest nature of academic research and publishing is complicated when paired with commercial publishers who have strong incentives to maximize profits. Of course, even for-profit firms are expected to operate within the law; profit-maximizing to the point of adopting anti-competitive practices is fundamentally at odds with their essential social responsibilities.

Hachette v. Internet Archive Update: Second Circuit Court of Appeals Rules Against Internet Archive

Posted September 5, 2024

We got a disappointing decision yesterday from the Second Circuit Court of Appeals in the long-running Hachette v. Internet Archive (IA) copyright lawsuit about IA’s digitization and lending of books. The Court affirmed the district court’s decision that IA cannot circulate digital copies of books they have legitimately acquired in physical copies, even when only the same number of copies as legitimately acquired are circulated to a single user at a time—just as a physical book would be loaned.

The Court, focusing on IA’s lending of digitized books that were available for license as ebooks from the publishers, concluded that IA’s fair use defense fails. We think this decision will result in a meaningful reduction in access to knowledge. This is sad news for many authors who have relied on IA’s Open Library for research and discovery, and  for readers who have used Open Library to find authors works. However, we also view it as a decision limited to its facts—that is, IA’s particular implementation of controlled digital lending (CDL), and more specifically, its lending of books that are already available in licensed digital formats. 

We plan to do a more in-depth analysis of the Court’s decision later, but for now, we offer some initial thoughts. First, there are a couple of bright spots in the opinion: 

1) The Court rejected the district court’s conclusion that IA was engaged in commercial use when looking at the first factor of fair use. The publishers argued IA’s lending of digitized books was commercial in nature because IA received a few thousand dollars from a for-profit used-bookseller and also solicited donations on its website. The Court rightly pointed out that if that was the standard, virtually every nonprofit that solicits donations would by default only be able to engage in commercial use. This was an issue we and others strongly urged the Court to address, and we’re glad it did. 

2)  For the most part, the Court focused its analysis on the facts of the case, which was really about IA lending digitized copies of books that were already available in ebook form and licensable from the publishers. The legal analysis in several places turned on this fact, which we think leaves room to make fair use arguments regarding programs to digitize and make available other books, such as print books for which there is no licensed ebook available, out-of-print books, or orphan works. CDL will remain an important framework, especially considering the lack of an existing digital first-sale doctrine.  

We are also disappointed by several key points in the decision: 

One was the Court’s assessment of the first fair use factor, “purpose and character of the use.” The Court’s analysis of this factor was in some ways unsurprising but nevertheless disappointing. The Court did little more than conclude that the use was not transformative and, therefore, not fair use. Though we think there are strong arguments that CDL is transformative, whether CDL is “transformative” is just one of the supporting rationales for the argument that CDL is fair use. The other justifications—that CDL supports teaching, scholarship, and research, along with complementing the first sale doctrine and supporting the public-interest mission of libraries—are at the heart of CDL. The Court didn’t engage with those other arguments at all and also ignored meaningful discussion of cases where non-transformative copying supported a fair use finding because of the public benefits.

A second key issue is about whether IA’s digital lending negatively impacts the market for the original works. This issue probably deserves a whole blog post to itself, but in short the analysis came down to who shoulders the burden of proving or disproving market harm, and what default assumptions the court has about market harm.  The following quotes from the decision will give you a sense of how the Court analyzed the issue: 

[a]lthough they do not provide empirical data of their own, Publishers assert that they (1) have suffered market harm due to lost eBook licensing fees and (2) will suffer market harm in the future if IA’s practices were to become widespread.  IA argues that Publishers cannot rely on the “common-sense inference” of market harm without data to back that up, citing American Society for Testing & Materials v. Public.Resource.Org, Inc. [citations omitted]. . . . We agree with Publishers’ assessment of market harm. 

Despite IA’s experts having offered meaningful data and analysis indicating a lack of market harm on sales of publishers’ books, the Court went on to say: 

We are likewise convinced that “unrestricted and widespread conduct of the sort engaged in by [IA] would result in a substantially adverse impact on the potential market for [the Works in Suit]. . . . Though Publishers have not provided empirical data to support this observation, we routinely rely on such logical inferences where appropriate in assessing the fourth fair use factor. . . . Thus, we conclude it is “self-evident” that if IA’s use were to become widespread, it would adversely affect Publishers’ markets for the Works in Suit.

We are also disappointed by how the Court portrayed the overall public benefit of IA’s lending and its long-term effect: “while IA claims that prohibiting its practices would harm consumers and researchers, allowing its practices would―and does―harm authors.” We think this is a gross generalization and mischaracterization of how IA’s digital lending affects most authors. Authors are researchers. Authors are readers. IA’s digital library helps authors create new works and supports their interests in having their works read. This ruling may benefit the largest publishers and most prominent authors, but for most, it will end up harming more than it will help. 

The AI Copyright Hype: Legal Claims That Didn’t Hold Up

Posted September 3, 2024

Over the past year, two dozen AI-related lawsuits and their myriad infringement claims have been winding their way through the court system. None have yet reached a jury trial. While we all anxiously await court rulings that can inform our future interaction with generative AI models, in the past few weeks, we are suddenly flooded by news reports with titles such as “US Artists Score Victory in Landmark AI Copyright Case,” “Artists Land a Win in Class Action Lawsuit Against A.I. Companies,” “Artists Score Major Win in Copyright Case Against AI Art Generators”—and the list goes on. The exuberant mood in these headlines mirror the enthusiasm of people actually involved in this particular case (Andersen v. Stability AI). The plaintiffs’ lawyer calls the court’s decision “a significant step forward for the case.” “We won BIG,” writes the plaintiff on X

In this blog post, we’ll explore the reality behind these headlines and statements. The “BIG” win in fact describes a portion of the plaintiffs’ claims surviving a pretrial motion to dismiss. If you are already familiar with the motion to dismiss per Federal Rules of Civil Procedure Rule 12(b)(6), please refer to Part II to find out what types of claims have been dismissed early on in the AI lawsuits. 

Part I: What is a motion to dismiss?

In the AI lawsuits filed over the last year, the majority of the plaintiffs’ claims have struggled to survive pretrial motions to dismiss. That may lead one to believe that claims made by plaintiffs are scrutinized harshly at this stage. But that is far from the truth. In fact, when looking at the broader legal landscape beyond the AI lawsuits, Rule 12(b)(6) motions are rarely successful.

In order to survive a Rule 12(b)(6) motion to dismiss filed by AI companies, plaintiffs in these lawsuits must make “plausible” claims in their complaint. At this stage, the court will assume that all of the factual allegations made by the plaintiffs are true and interpret everything in a way most favorable to plaintiffs. This allows the court to focus on the key legal questions without getting caught up in disputes about facts. When courts look at plaintiffs’ factual claims in the best possible light, if the defendant AI companies’ liability can plausibly be inferred based on facts stated by plaintiffs, then the claims will survive a motion to dismiss. Notably, the most important issues at the core of these AI lawsuits—namely, whether there has been direct copyright infringement and what may count as a fair use—are rarely decided at this stage, because these claims raise questions about facts as well as the law. 

On the other hand, if the AI companies will prevail as a matter of law even when the plaintiffs’ well-pleaded claims are taken as entirely true, then the plaintiffs’ claims will be dismissed by court. Merely stating that it is possible that the AI companies have done something unlawful, for instance, will not survive a motion to dismiss; there must be some reasonable expectation that evidence can be found later during discovery to support the plaintiffs’ claims. 

Procedurally, when a claim is dismissed, the court will often allow the plaintiffs to amend their complaint. That is exactly what happened with Andersen v. Stability AI (the case mentioned at the beginning of this blog post): the plaintiffs’ claims were first dismissed in October last year, and the court allowed the plaintiffs to amend their complaint to address the deficiencies in their allegations. The newly amended complaint contains infringement claims that survived new motions to dismiss, as well as other breach of contract, unjust enrichment, and DMCA claims that again were dismissed.

As you may have guessed, including something like the “motion to dismiss” in our court system can help save time and money, so parties don’t waste precious resources on meritless claims at trial. One judge dismissed a case against OpenAI earlier this year, stating that “the plaintiffs need to understand that they are in a court of law, not a town hall meeting.” The takeaway: plaintiffs need to bring claims that can plausibly entitle them to relief.

Part II: What claims are dismissed so far?

Most of the AI lawsuits are still at an early stage, and most of the court rulings we have seen so far are in response to the defendants’ motions to dismiss. From these rulings, we have learned which claims are viewed as meritless by courts. 

The removal of copyright management information (“CMI,” which includes information such as the title, the copyright holder, and other identifying information in a copyright notice) is a claim included in almost all plaintiffs’ complaints in the AI lawsuits, and this claim has failed to survive motions to dismiss without exception. DMCA Section 1202(b) restricts the intentional, unauthorized removal of CMI. Experts initially considered DMCA 1202(b) one of the biggest hurdles for non-licensed AI training. But courts so far have dismissed all DMCA 1202(b) claims, including in J. Doe 1 v. GitHub, Tremblay v. OpenAI, Andersen v. Stability AI, Kadrey v. Meta Platforms, and Silverman v. OpenAI. The plaintiffs’ DMCA Section 1202(b)(1) claims have failed because plaintiffs were not able to offer any evidence showing their CMI has been intentionally removed by the AI companies. For example, in Tremblay v. OpenAI and Silverman v. OpenAI, the courts held that the plaintiffs did not argue plausibly that OpenAI has intentionally removed CMI when ingesting plaintiffs’ works for training. Additionally, plaintiffs’ DMCA Section 1202(b)(3) have failed thus far because the plaintiffs’ claims did not fulfill the identicality requirement. For example, in J. Doe 1 v. GitHub, the court pointed out that Copilot’s output did not tend to represent verbatim copies of the original ingested code. We now see plaintiffs voluntarily dropping the DMCA claims in their amended complaints, such as in Leovy v Google (formerly J.L. vs Alphabet). 

Another claim that has been consistently dismissed by courts is that AI models are infringing derivative works of the training materials. The law defines a derivative work as “a work based upon one or more preexisting works, such as a translation, musical arrangement, … art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted.” To most of us, the idea that the model itself (as opposed to, say, outputs generated by the model) can be considered a derivative work seems to be a stretch. The courts have so far agreed. On November 20, 2023, the court in Kadrey v. Meta Platforms said it is “nonsensical” to consider an AI model a derivative work of a book just because the book is used for training. 

Similarly, claims that all AI outputs should be automatically considered infringing derivative works have been dismissed by courts, because the claims cannot point to specific evidence that an instance of output is substantially similar to an ingested work. In Andersen v. Stability AI, plaintiffs tried to argue “that all elements of [] Anderson’s copyrighted works [] were copied wholesale as Training Images and therefore the Output Images are necessarily derivative;” the court dismissed the argument because—besides the fact that plaintiffs are unlikely able to show substantial similarity—“it is simply not plausible that every Training Image used to train Stable Diffusion was copyrighted [] or that all [] Output Images rely upon (theoretically) copyrighted Training Images and therefore all Output images are derivative images. … [The argument for dismissing these claims is strong] especially in light of plaintiffs’ admission that Output Images are unlikely to look like the Training Images.”

Several of these AI cases have raised claims of vicarious liability—that is, liability for the service provider based on the actions of others, such as users of the AI models. Because a vicarious infringement claim must be based on a showing of direct infringement, the vicarious infringement claims are also dismissed in Tremblay v. OpenAI and Silverman v. OpenAI, when plaintiffs cannot point to any infringing similarity between AI output and the ingested books.

Many plaintiffs have also raised a number of non-copyright, state law claims (such as negligence or unfair competition) that have largely been dismissed based on copyright preemption. Copyright preemption prevents duplicitous state law claims when those state law claims are based on an exercise of rights that are equivalent to those provided for under the federal Copyright Act. In Andersen v. Stability AI, for example, the court dismissed the plaintiffs’ unjust enrichment claim because the plaintiffs failed to add any new elements that would distinguish their claim based on California’s Unfair Competition Law or common law from rights under the Copyright Act.

It is interesting to note that many of the dismissed claims in different AI lawsuits closely mimic one another, such as in Kadrey v. Meta Platforms, Andersen v. Stability AI, Tremblay v. OpenAI, and Silverman v. OpenAI. It turns out that the similarities are no coincidence—all these lawsuits are filed by the same law firm. These mass-produced complaints not only contain overbroad claims that are prone to dismissal, they also have overbroad class designations. In the next blog post, we will delve deeper into the class action aspect of the AI lawsuits. 

Authors Alliance and SPARC Supporting Legal Pathways to Open Access for Scholarly Works

Posted August 27, 2024

Authors Alliance and SPARC are excited to announce a new collaboration to address critical legal issues surrounding open access to scholarly publications. 

One of our goals with this project is to clarify legal pathways to open access in support of federal agencies working to comply with the Memorandum on “Ensuring Free, Immediate, and Equitable Access to Federally Funded Research,” (the “Nelson Memo”) which was issued by the White House’s Office of Science and Technology Policy in 2022. For more than a decade, federal open access policy was based on an earlier memo instructing federal agencies with research and development budgets over $100 million to make their grant-funded research publicly accessible for free online. The Nelson Memo, drawing from lessons learned during the COVID-19 Pandemic, provides important updates to the prior policy. Among the key changes are extending the requirements to all agencies, regardless of budget, and eliminating the 12-month post-publication embargo period on articles. 

The Nelson Memo raises important legal questions for agencies, universities, and individual researchers to consider. To help ensure smooth implementation of the Nelson Memo, we plan to produce a series of white papers addressing these questions. For example, a central issue is the nature and extent of the pre-existing license, known as the “Federal Purpose License,” which all federal grant-making agencies have in works produced using federal funds.  The white papers will outline the background and history of the License, and also address commonly raised questions, including whether the License would support the application of Creative Commons or other public licenses; possible constitutional or statutory obstacles to the use of the License for public access; whether the License may apply to all versions of a work; and whether the use of the License for public access would require modification of university intellectual property policies. 

In addition to the white paper series, we plan to convene a group of experts to update the SPARC Author Addendum. The Addendum was created in 2007 and has been an extremely useful tool in educating authors on how to retain their rights, both to provide open access to their scholarship and to allow for wide use of their work. However, in the nearly two decades since its creation, models for open access and scholarly publishing have changed dramatically. We aim to update the Addendum to more closely reflect the present open access landscape and to help authors to better achieve their scholarship goals.

A final piece of the project is to develop a framework for universities looking to recover rights for faculty in their works, particularly backlist and out-of-print books that are unavailable in electronic form. Though the open access movement has made significant strides in advancing free availability and reuse of scholarly articles, that progress has generally not extended to books and other monographic works, in part because of the non-standard and often complicated nature of book publishing licenses. It has also not done as much to open backfile access to older journal articles. We think a framework for identifying opportunities to recover rights and relicense them under an open access license will help advance open access of these works.

Eric Harbeson

The project will be spearheaded by Eric Harbeson, who joined the Authors Alliance this week as Scholarly Publications Legal Fellow. Eric is a recent graduate of the University of Oregon School of Law. Prior to law school, Eric had a dual career as a librarian/archivist and a musicologist. Eric did extensive work advocating for libraries’ and archives’ copyright interests, especially with respect to preservation of music and sound recordings. Eric’s publications include a well-regarded report on the Music Modernization Act, as well as two scholarly music editions. Eric can be reached at eric@authorsalliance.org.

Clickbait arguments in AI Lawsuits (will number 3 shock you?)

Posted August 15, 2024

Image generated by Canva

The booming AI industry has sparked heated debates over what AI developers are legally allowed to do. So far, we have learned from the US Copyright Office and courts that AI created works are not protectable, unless it is combined with human authorship. 

As we monitor two dozen ongoing lawsuits and regulatory efforts that address various aspects of AI’s legality, we see legitimate legal questions that must be resolved. However, we also see some prominent yet flawed arguments that have been used to enflame discussions, particularly by publisher-plaintiffs and their supporters. For now, let’s focus on some clickbait arguments that sound appealing but are fundamentally baseless. 

Will AI doom human authorship?

Based on current research, AI tools can actually help authors improve creativity, productivity, as well as the longevity of their career

When AI tools such as ChatGPT first appeared online, many leading authors and creators publicly endorsed it as a useful tool like any other tech innovation that came before it. At the same time, many others claimed that authors and creators of lesser caliber will be disproportionately disadvantaged by the advent of AI. 

This intuition-driven hypothesis, that AI will be the bane of average authors, has so far proved to be misguided.

We now know that AI tools can greatly help authors during the ideation stage, especially for less creative authors. According to a study published last month, AI tools had minimal impact on the output of highly creative authors, but were able to enhance the works of less imaginative authors. 

AI can also serve as a readily-accessible editor for authors. Research shows that AI enhances the quality of routine communications. Without AI-powered tools, a less-skilled person will often struggle with the cognitive burden of managing data, which limits both the quality and quantity of their potential output. AI helps level the playing field by handling data-intensive tasks, allowing writers to focus more on making creative and other crucial decisions about their works. 

It is true that entirely AI-generated works of abysmal quality are available for purchase on some platforms. Some of these works are using human authors’ names without authorization. These AI-generated works may infringe on authors’ right of publicity, but they do not present commercially-viable alternatives to books authored by humans. Readers prefer higher-quality works produced with human supervision and interference (provided that digital platforms do not act recklessly towards their human authors despite generating huge profits from human authors).

Are lawsuits against AI companies brought with authors’ best interest in mind? 

In the ongoing debate over AI, publishers and copyright aggregators have suggested that they have brought these lawsuits to defend the interests of human authors. Consider the New York Times for example, in its complaint against OpenAI, NY Times describes their operations as “a creative and deeply human endeavor (¶31)” that necessitates “investment of human capital (¶196).” NY Times argues that OpenAI has built innovation on the stolen hard work and creative output from journalists, editors, photographers, data analysts, and others—an argument contrary to what the NY Times once argued in court in New York Times v. Tasini,  that authors’ rights must take a backseat to NY Times’ financial interests in new digital uses.  

It is also hard to believe that many of the publishers and aggregators are on the side of authors when we look at how they have approached licensing deals for AI training. These licensing deals can be extremely profitable for the publishers. For example, Taylor and Francis sold AI training data to OpenAI for 10 million USD. John Wiley and Sons earned $23 million from a similar deal with a non-disclosed tech company. Though we don’t have the details of these agreements, it seems easy to surmise that in return for the money received, the publishers will not harass the AI companies with future lawsuits. (See our previous blog post about these licensing deals and what you can do as an author.) It is ironic how an allegedly unethical and harmful practice quickly becomes acceptable once the publishers are profiting from it.

How much of the millions of dollars changing hands will go to individual authors? Limited data exist. We know that Cambridge University Press, a good-faith outlier, is offering authors 20% royalties if their work is licensed for AI training. Most publishers and aggregators are entirely opaque about how authors are to be compensated in these deals. Take the Copyright Clearance Center (CCC) for example, it offers zero information about how individual authors are consulted or compensated when their works are sold for AI training under CCC AI training license.

This is by no means a new problem for authors. We know that traditionally-published book authors receive around 10% of royalties from their publishers: a little under $2 per copy for most books. On an ebook, authors receive a similar amount for each “copy” sold. This little amount handed to authors only starts to look generous when compared to academic publishing, where authors increasingly pay publishers to have their articles published in journals. The journal authors receive zero royalties, despite the publishers’ growing profit

Even before the advent of AI technology, most authors were struggling to make a living on writing alone. According to an Authors Guild’s survey in 2018, the median income for full-time writers was $20,300, and for part-time writers, a mere $6,080. Fair wage and equitable profit sharing is an issue that needs to be settled between authors and publishers, even if publishers try to scapegoat AI companies. 

It’s worth acknowledging that it’s not just publishers and copyright industry organizations filing these lawsuits. Many of these ongoing lawsuits have been filed as class actions, with the plaintiffs claiming to represent a broad class of people who are similarly situated and (thus they alleged) hold similar views. Most notably, in Authors Guild v. OpenAI, Authors Guild and its named individual plaintiffs claim to represent all fiction writers in the US who have sold more than 5000 copies of a work. There’s also another case where plaintiff claims to represent all copyright holders of non-fiction works, including authors of academic journal articles, which got support from Authors Guild, and several others in which an individual plaintiff asserts the right to represent virtually all copyright holders of any type

As we (along with many others) have repeatedly pointed out, many authors disagree with the publishers and aggregators’ restrictive view on fair use in these cases, and don’t want or need a self-appointed guardian to “protect” their interests.  We have seen the same over-broad class designation in the Authors Guild v. Google case, which caused many authors to object, including many of our own 200 founding members.

Respect for copyright and human authors’ hard work means no more AI training under US copyright law? 

While we wait for courts to figure out the key questions on infringement and fair use, let’s take a moment to remember what copyright law does not regulate.

Copyright law in the US exists to further the Constitutional goal to “promote the Progress of Science and useful Arts.” In 1991, the Supreme Court held in Feist v. Rural Telephone Service that copyright cannot be granted solely based on how much time or energy authors have expended. “Compensation for hard work“ may be a valid ethical discussion, but it is not a relevant topic in the context of copyright law.

Publishers and aggregators preach that people must “respect copyright,” as if copyright is synonymous with the exclusive rights of the copyright holder. This is inaccurate and misleading. In order to safeguard the freedom of expression, copyright is designed to embody not only the rightsholders’ exclusive rights but also many exceptions and limitations to the rightsholders’ exclusive rights. Similarly, there’s no sound legal basis to claim that authors must have absolute control over their own work and its message. Knowledge and culture thrives because authors are permitted to build upon and reinterpret the works of others

Does this mean I should side with the AI companies in this debate?

Many of the largest AI companies exhibit troubling traits that they have in common with many publishers, copyright aggregators, digital platforms (e.g., Twitter, TikTok, Youtube, Amazon, Netflix, etc.), and many other companies with dominant market power. There’s no transparency or oversight afforded to the authors or the public. The authors and the public have little say in how the AI models are trained, just like how we have no influence over how content is moderated on digital platforms, how much royalties authors receive from the publishers, or how much publishers and copyright aggregators can charge users. None of these crucial systematic flaws will be fixed by granting publishers a share of AI companies’ revenue. 

Copyright also is not the entire story. As we’ve seen recently, there are some significant open questions about the right of publicity and somewhat related concerns about the ability of AI to churn out digital fakes for all sorts of purposes, some of which are innocent, but others are fraudulent, misleading, or exploitative. The US Copyright Office released a report on digital replicas on July 31 addressing the question of digital publicity rights, and on the same day the NO FAKES Act was officially introduced. Will the rights of authors and the public be adequately considered in that debate? Let’s remain vigilant as we wait to see the first-ever AI-generated public figure in a leading role to hit theaters in September 2024.

Some Initial Thoughts on the US Copyright Office Report on AI and Digital Replicas

Posted August 1, 2024

On July 31, 2024, the U.S. Copyright Office published Part 1 of its report summarizing the Office’s ongoing initiative of artificial intelligence. This first part of the report addresses digital replicas, in other words, how AI is used to realistically but falsely portray people in digital media. The Office in its report recommends new federal legislation that would create a new right to control “digital replicas” which it defines as  “a video, image, or audio recording that has been digitally created or manipulated to realistically but falsely depict an individual.”

We remain somewhat skeptical that such a right would do much to address the most troubling abuses such as deepfakes, revenge porn, and financial fraud. But, as the report points out, a growing number of varied state legislative efforts are already in the works, making a stronger case for unifying such rules at the federal level, with an opportunity to ensure adequate protections are in place for creators.  

The backdrop for the inquiry and report is a fast-developing space of state-led legislation, including legislation with regard to deepfakes. Earlier this year, Tennessee became the first state to enact such a law, the ELVIS Act (TN HB 2091), while other states mostly focused on addressing deepfakes in the context of sexual acts and political campaigns. New state laws are continuing to be introduced, making it harder and harder to navigate the space for creators, AI companies, and consumers alike. A federal right of publicity in the context of AI has already been discussed in Congress, and just yesterday a new bill was formally introduced, titled the “NO AI Fakes Act.” 

Authors Alliance has watched the development of this US Copyright Office initiative closely. In August 2023, the Office issued a notice of inquiry, asking stakeholders to weigh in on a series of questions about copyright policy and generative AI.  Our comment in response to the inquiry was devoted in large part to sharing the ways that authors are using generative AI, how fair use should apply to training AI, and that the USCO should be cautious in recommending new legislation to Congress

This report and recommendation from the Copyright Office could have a meaningful impact on authors and other creators, including both those whose personality and images are subject to use with AI systems, and those who are actively using AI in the writing and research. Below are our preliminary thoughts on what the Copyright Office recommends, which it summarizes in the report as follows: 

“We recommend that Congress establish a federal right that protects all individuals during their lifetimes from the knowing distribution of unauthorized digital replicas. The right should be licensable, subject to guardrails, but not assignable, with effective remedies including monetary damages and injunctive relief. Traditional rules of secondary liability should apply, but with an appropriately conditioned safe harbor for OSPs. The law should contain explicit First Amendment accommodations. Finally, in recognition of well-developed state rights of publicity, we recommend against full preemption of state laws.”

Initial Impressions

Overall, this seems like a well-researched and thoughtful report, given that the Office had to navigate a huge number of comments and opinions (over 10,000 comments were submitted). The report also incorporates the many more recent developments that included numerous new state laws and federal legislative proposals.  

Things we like: 

  • In the context of an increasing number of state legislative efforts—some overbroad and more likely than not to harm creators than help them—we appreciate the Office’s recognition that a patchwork of laws can pose a real problem for users and creators who are trying to understand their legal obligations when using AI that references and implicates real people.
  • The report also recognizes that the collection of concerns motivating digital replica laws—things like control of personality, privacy, fraud, and deception—are not at their core copyright concerns. “Copyright and digital replica rights serve different policy goals; they should not be conflated.” This matters a lot for what the scope of protection and other details for a digital replica right looks like. Copy-pasting copyright’s life+70 term of protection, for example, makes little sense (and the Office recognizes this, for example, by rejecting the idea of posthumous digital replica rights). 
  • The Office also suggests limiting the transferability of rights. We think this is a good idea to protect individuals from unanticipated downstream use by companies that may persuade individuals to sign deals that would lock them into unfavorable long-term deals. “Unlike publicity rights, privacy rights, almost without exception, are waivable or licensable, but cannot be assigned outright. Accordingly, we recommend a ban on outright assignments, and the inclusion of appropriate guardrails for licensing, such as limitations in duration and protection for minors.” 
  • The Office explicitly rejects the idea of a new digital replica right covering “artistic style.” We agree that protection of artistic style is a bad idea. Creators of all types have always used existing styles and methods as a baseline to build upon, and it’s resulted in a rich body of new works. Allowing for control over “style” however well-defined, would impinge on these new creations. Strong federal protection over “style” would also contradict traditional limitations on rights, such as Section 102(b)’s limits on copyrightable subject matter and the idea/expression dichotomy, which are rooted in the Constitution. 

Some concerns: 

  • The Office’s proposal would apply to the distribution of digital replicas, which are defined as “a video, image, or audio recording that has been digitally created or manipulated to realistically but falsely depict an individual.” This definition is quite broad and could potentially include a large number of relatively common and mostly innocuous uses—e.g., taking a photo with your phone of a person and applying a standard filter on your camera app could conceivably fall within the definition. 
  • First Amendment rights to free expression are critical for protecting uses for news reporting, artistic uses, parody and so on. Expressive uses of digital replicas—e.g., a documentary that uses AI to replicate a recent event involving recognizable people, or reproduction in a comedy show to to poke fun at politicians—could be significantly hindered by an expansive digital replica right unless it has robust free expression protections. Of course, the First Amendment applies regardless of the passing of a new law, but it will be important for any proposed legislation to find ways to allow people to exercise those rights effectively. As the report explains, comments were split. Some like the Motion Picture Association proposed enumerated exceptions for expressive use, while others such as the Recording Industry Association of America took the position that “categorical exclusions for certain speech-oriented uses are not constitutionally required and, in fact, risk overprotection of speech interests at the expense of important publicity interests.” 

We tend to think that most laws should skew toward “overprotection of speech interests,” but the devil is in the details on how to do so. The report leaves much to be desired on how to do this effectively in the context of digital replicas. For its part, “[t]he Office stresses the importance of explicitly addressing First Amendment concerns. While acknowledging the benefits of predictability, we believe that in light of the unique and evolving nature of the threat to an individual’s identity and reputation, a balancing framework is preferable.” One thing to watch in future proposals is what such a balancing framework actually includes, and how easy or difficult it is to assert protection of First Amendment rights under this balancing framework. 

  • The Office rejects the idea that Section 230 should provide protection for online service providers if they host content that runs afoul of the proposed new digital replica rights. Instead, the Office suggests something like a modified version of the Copyright Act’s DMCA section 512 notice and takedown process. This isn’t entirely outlandish—the DMCA process mostly works, and if this new proposed digital replica right is to be effective in practice, asking large service providers that are benefiting from hosting content to be responsive in cases of alleged infringing content may make sense. But, the Office says that it doesn’t believe the existing DMCA process should be the model, and points to its own Section 512 report for how a revised version for digital replicas might work. If the Office’s 512 study is a guide to what a notice-and-takedown system could look like for digital replicas, there is reason to be concerned.  While the study rejected some of the worst ideas for changing the existing system (e.g., a notice-and-staydown regime), it also repeatedly diminished the importance of ideas that would help protect creators with real First Amendment and fair use interests. 
  • The motivations for the proposed digital replica right are quite varied. For some commenters, it’s an objection to the commercial exploitation of public figures’ images or voices. For others, the need is to protect against invasions of privacy. For yet others, it is to prevent consumer confusion and fraud. The Office acknowledges these different motivating factors in its report and in its recommendations attempts to balance competing interests among them. But, there are still real areas of discontinuity—e.g., the basic structure of the right the Office proposes is intellectual-property-like. But it doesn’t really make a lot of sense to try to address some of the most pernicious fraudulent uses, such as deepfakes to manipulate public opinion, revenge porn, or scam phone calls, with a privately enforced property right oriented toward commercialization. Discovering and stopping those uses requires a very different approach and one that this particular proposal seems ill-equipped to deal with. 

Barely a few months ago, we were extremely skeptical that new federal legislation on digital replicas was a good idea. We’re still not entirely convinced, but the rash of new and proposed state laws does give us some pause. While the federal legislative process is fraught, it is also far from ideal for authors and creators to operate under a patchwork of varying state laws, especially those that provide little protection for expressive uses. Overall, we hope certain aspects of this report can positively influence the debate about existing federal proposals in Congress, but remain concerned about the lack of detail about protections for First Amendment rights. 

In the meantime, you can check out our two new resource pages on Generative AI and Personality Rights to get a better understanding of the issues.

What happens when your publisher licenses your work for AI training? 

Posted July 30, 2024
Photo by Andrew Neel on Unsplash

Over the last year, we’ve seen a number of major deals inked between companies like OpenAI and news publishers. In July 2023, OpenAI entered into a two-year deal with The Associated Press for ChatGPT to ingest the publisher’s news stories. In December 2023, Open AI announced its first non-US partnership to train ChatGPT on German publisher Axel Springer’s content, including Business Insider. This was then followed by a similar deal in March 2024 with Le Monde and Prisa Media, news publishers from France and Spain. These partnerships are likely sought in an effort to avoid litigation like the case OpenAI and Microsoft are currently defending from the New York Times.

As it turns out, such deals are not limited to OpenAI or newsrooms. Book publishers have also gotten into the mix. Numerous reports recently pointed out that based on Taylor and Francis’s parent company’s market update, the British academic publishing giant has agreed to a $10 million USD AI training deal with Microsoft. Earlier this year, another major academic publisher, John Wiley and Sons, recorded $23 million in one-time revenue from a similar deal with a non-disclosed tech company. Meta even considered buying Simon & Schuster or paying $10 per book to acquire its rights portfolio for AI training. 

With few exceptions (a notable one being Cambridge University Press), publishers have not bothered to ask their authors whether they approve of these agreements. 

Does AI training require licensing to begin with? 

First, it’s worth appreciating that these deals are made in the backdrop of some legal uncertainty. There are more than two dozen AI copyright lawsuits just in the United States, most of them turning on one key question: whether AI developers should have to obtain permission to scrap content to train AI models or whether fair use already allows this kind of training use even without permission. 

The arguments for and against fair use for AI training data are well explained elsewhere. We think there are strong arguments, based on cases like Authors Guild v. Google, Authors Guild v. HathiTrust, and AV ex rel Vanderhye v. iParadigms, that the courts will conclude that copying to training AI models is fair use. We also think there are really good policy reasons to think this could be a good outcome if we want to encourage future AI development that isn’t dominated by only the biggest tech giants and that results in systems that produce less biased outputs. But we won’t know for sure whether fair use covers any and all AI training until some of these cases are resolved. 

Even if you are firmly convinced that fair use protects this kind of use (and AI model developers have strong incentives to hold this belief), there are lots of other reasons why AI developers might seek licenses in order to navigate around the issue. This includes very practical reasons, like securing access to content in formats that make training easier, or content accompanied by structured, enhanced metadata. Given the pending litigation, licenses are also a good way to avoid costly litigation (copyright lawsuits are expensive, even if you win). 

Although one can hardly blame these tech companies for making a smart business decision to avoid potential litigation, this could have a larger systematic impact on other players in the field, including the academic researchers who would like to rely on fair use to train AI. As IP scholar James Gibson explains, when risk-averse users create new licensing markets in gray areas of copyright law, copyright holders’ exclusive rights expands, and public interest diminishes. The less we rely on fair use, the weaker it becomes.

Finally, it’s worth noting that fair use is only available in the US and a few other jurisdictions. In other jurisdictions, such as within the EU, using copyrighted materials for AI training (especially for commercial purposes) may require a license. 

To sum up: even though it may not be legally necessary to acquire copyright licenses for AI training, it seems that licensing deals between publishers and AI companies are highly likely to continue. 

So, can publishers just do this without asking authors? 

In a lot of cases, yes, publishers can license AI training rights without asking authors first. Many publishing contracts include a full and broad grant of rights–sometimes even a full transfer of copyright to the publisher for them to exploit those rights and to license the rights to third parties. For example, this typical academic publishing agreement provides that “The undersigned authors transfer all copyright ownership in and relating to the Work, in all forms and media, to the Proprietor in the event that the Work is published.” In such cases, when the publisher becomes the de facto copyright holder of a work, it’s difficult for authors to stake a copyright claim when their works are being sold to train AI.

Not all publishing contracts are so broad, however. For example, in the Model Publishing Contract for Digital Scholarship (which we have endorsed), the publisher’s sublicensing rights are limited and specifically defined, and profits resulting from any exploitation of a work must be shared with authors.  

There are lots of variations, and specific terms matter. Some publisher agreements are far more limited–transferring only limited publishing and subsidiary rights. These limitations in the past have prompted litigation over whether the publisher or the author gets to control rights for new technological uses. Results have been highly dependent on the specific contract language used. 

There are also instances where publishers aren’t even sure of what they own. For example, in the drawn-out copyright lawsuit brought by Cambridge University Press, Oxford University Press and Sage against Georgia State University, the court dropped 16 of the alleged 74 claimed instances of infringement because the publishers couldn’t produce documentation that they actually owned rights in the works they were suing over. This same lack of clarity contributed to the litigation and proposed settlement in the Google Books case, which is probably our closest analogy in terms of mass digitization and reuse of books (for a good discussion of these issues, see page 479 of this law review article by Pamela Samuelson about the Google Books settlement). 

This is further complicated by the fact that authors sometimes are entitled to reclaim their rights, such as by rights reversion clause and copyright termination. Just because a publisher can produce the documentation of a copyright assignment, does not necessarily mean that the publisher is still the current copyright holder of a work. 

We think it is certainly reasonable to be skeptical about the validity of blanket licensing schemes between large corporate rights holders and AI companies, at least when they are done at very large scale. Even though in some instances publishers do hold rights to license AI training, it is dubious whether they actually hold, and sufficiently document, all of the purported rights of all works being licensed for AI training.

Can authors at least insist on a cut of the profit? 

It can feel pretty bad to discover that massively profitable publishers are raking in yet more money by selling licensing rights to your work, while you’re cut out of the picture. If they’re making money, why not the author? 

It’s worth pointing out that, at least for academic authors, this isn’t exactly a novel situation–most academic authors make very little in royalties on their books, and nothing at all on their articles, while commercial publishers like Elsevier, Wiley, and SpringerNature sell subscription access at a healthy profit.  Unless you have retained sublicensing rights, or your publishing contract has a profit-sharing clause, authors, unfortunately, are not likely to profit from the budding licensing market for AI training.

So what are authors to do? 

We could probably start most posts like this with a big red banner that says “READ YOUR PUBLISHING CONTRACT!! (and negotiate it too)”  Be on the lookout for what you are authorizing your publisher to do with your rights, and any language in it about reuse or the sublicensing of subsidiary rights. 

You might also want to look for terms in your contract that speak to royalties and shares of licensing revenue. Some contracts have language that will allow you to demand an accounting of royalties; this may be an effective means of learning more about licensing deals associated with your work. 

You can also take a closer look at clauses that allow you to revert rights–many contracts will include a clause under which authors can regain rights when their book falls below a certain sales threshold or otherwise becomes “out of print.” Even without such clauses, it is reasonable for authors to negotiate a reversion of rights when their books are no longer generating revenue. Our resources on reversion will give you a more in-depth look at this issue.

Finally, you can voice your support for fair use in the context of licensing copyrighted works for AI training. We think fair use is especially important to preserve for non-commercial uses. For example, academic uses could be substantially stifled if paid-for licensing for permission to engage in AI research or related uses becomes the norm. And in those cases, the windfall publishers hope to pocket isn’t coming from some tech giant, but ultimately is at the expense of researchers, their libraries and universities, and the public funding that goes to support them.