Tag Archives: Fair Use

Artist Left with Heavy Fees by Copyright Troll Law Firm

Posted October 11, 2024

Facts of the Case & Fair Use

On September 18, the 5th Circuit decided in Keck v. Mix Creative Learning Center that using copyrighted artwork to teach children how to make art in a similar style does not constitute copyright infringement. The case adds to the well-developed jurisprudence that teaching with copyrighted materials is often protected by fair use.

This case was initially filed in 2021 by plaintiff’s counsel, Mathew Kidman Higbee of Higbee & Associates, a known and prolific copyright litigation firm sometimes accused of troll-like behavior. During the pandemic, the defendant sold a total of six art kits (two of which were purchased by the plaintiff herself) that included images of the plaintiff’s dog-themed artworks, biographical information, and details on her artistic style. The kits also included paint, paintbrushes, and collage paper. The plaintiff’s side argued that including the artworks in teaching kits constituted willful copyright infringement and demanded $900,000 in damages, even though the defendant had made only $250 in sales.

The district court dismissed all infringement claims in 2022, and last month the 5th Circuit affirmed that including copies of the plaintiff’s artwork in a teaching kit is fair use.

The courts found the first and fourth fair use factors to favor the defendant. Under the first factor, even though the defendant’s use was commercial in nature, by accompanying the artworks with art theory and history, the teaching kit transformed the original decorative purpose of the dog-themed artworks. The 5th Circuit distinguished this case from Warhol by pointing out that, in the Warhol case, the infringing use served the same illustrative purpose as the original work, while in this case, “the art kits had educational objectives, while the original works had aesthetic or decorative objectives.”  

Under the fourth factor, the courts explained that they could not imagine how including the plaintiff’s dog-themed artworks in children’s art lesson kits could decrease the works’ market value. The 5th Circuit further pointed out that there was no evidence that a market for licensing artworks for similar teaching kits exists now or is ever likely to develop.

Because these “two most important” factors favored the defendant, the defendant’s use was fair use.

Fee Shifting: Plaintiffs Beware of Copyright Troll Law Firms!

The final outcome of the case: the plaintiff was ordered to cover $102,404 in fees and $165.72 in costs for the defendant.

Even though we are happy for the defendant and her counsel that this well-deserved victory has finally been won after a prolonged legal battle, it is nevertheless disheartening to see the plaintiff-artist left alone to face the high legal fees of this ill-conceived lawsuit. The plaintiff’s counsel not only failed to advise the plaintiff to act in her own best interest (whether by settling the case at the right moment or by pursuing more plausible claims), but also conjured up willful infringement claims that were clearly meritless to any trained eye. Even the 5th Circuit lamented this in its opinion, as it reluctantly upheld, under the deferential abuse-of-discretion standard it must apply, the district court’s decision not to hold the law firm responsible for the fees:

It is troubling that Keck alone will be liable for the high fees incurred by Defendants largely because of Higbee & Associates’ overly aggressive litigation strategy. From our review of the record, the law firm lacked a firm evidentiary basis to pursue hundreds of thousands of dollars in statutory damages against Defendants for willful infringement. Nevertheless, we cannot say, on an abuse of discretion standard, that the district court erred by determining that there was insufficient evidence that the firm’s conduct was both unreasonable and vexatious. … But we warn Higbee & Associates that future conduct of this nature may well warrant sanctions, and nothing in this opinion prevents Higbee & Associates from compensating its client, if appropriate, for the fees that she is now obliged to pay Defendants.

This should serve as a cautionary tale for would-be plaintiffs: copyright lawsuits, like any other type of litigation, are primarily meant to address the damages plaintiffs actually suffered, and the final settlement should make plaintiffs whole again, that is, put them where they would be had no infringement ever occurred. Copyright lawsuits (or threats to sue) should not be undertaken as a way to create brand-new income streams, as was the case in the lawsuit described above.

When someone aggressively enforces dubious copyright claims with the sole purpose of collecting exorbitant fees rather than protecting any underlying copyrights, they are called a “copyright troll.” Regrettably, beyond the disreputable law firms that are eager to pursue aggressive claims, many services now exist to tempt creators into troll-like behavior by promising “new licensing income.” The true aim of these services is to collect high representation fees from creators out of the exorbitant settlements that users of the creators’ works are harassed into paying. Many victims agree to pay just to make the nuisance stop. This predatory business model has been repeatedly exposed by creators and authors, including, famously, by Cory Doctorow.

Needless to say, copyright trolls are harmful to the copyright ecosystem. Innocent users are obviously harmed when slapped with unreasonable demand letters or even frivolous lawsuits. Worse, creators are misled into supporting this unethical practice while deluded into believing they are doggedly following the spirit of the law. Sometimes, as in this case, they are left to face the inevitable consequences of bringing a frivolous lawsuit, while the lawyer or agent who led them into the mire walks away free, onward and upward to their next “representation.”

It was very unfortunate that the district court did not fully study the plaintiff’s counsel’s track record and issue appropriate disciplinary orders against him. The problem of copyright trolls will have to be addressed soon in order to preserve a healthy copyright system. 

What is “Derivative Work” in the Digital Age?

Posted October 7, 2024
Image: Seltzer v. Green Day (top); Kienitz v. Sconnie Nation (bottom)

Part I: The Problem with “Derivative Work”

The right to prepare derivative works is one of the exclusive rights copyright holders have under §106 of the Copyright Act. Copyright holders’ other exclusive rights include the rights to make and distribute copies of a work and to display or perform it publicly.

Lately, we’ve seen a congeries of novel conceptions about “derivative works.” For example, a reader of our blog stated that, when it comes to AI models and AI outputs, works should be considered infringing “derivatives” even when there is no substantial similarity between the allegedly infringing AI model or outputs and the ingested originals. We’ve seen confusion in the courts, too. The Hachette v. Internet Archive decision, for example, presented us with the following statement about derivative works:

Changing the medium of a work is a derivative use rather than a transformative one. . . . In fact, we have characterized this exact use―“the recasting of a novel as an e-book”―as a “paradigmatic” example of a derivative work. [citation omitted; emphasis added]

These statements leave one wondering: what are a copy, a derivative work, an infringing use, and a transformative fair use in the context of U.S. copyright law? To bring some clarity to these questions, it’s helpful to juxtapose “derivative works” first with “copies” and then with “transformative uses.” We think the confusion about derivative works and related concepts arises because the phrase is used to mean both “a work that is substantially similar to the original work” and “a work that is so in an unauthorized way, not excused from liability.”

Confusion over the meaning of “derivative work” has many immediate real-world implications. In privately negotiated agreements, licensees who have a right to make reproductions but not derivative works may be confused as to which medium their use is restricted to. For example, a book publisher whose license allows it to make reproductions but not derivatives might be unsure whether, under the Hachette court’s reasoning, it may republish a print book in a digital format, such as a simple PDF of a scan. Similarly, under public licenses such as the Creative Commons NoDerivatives (ND) licenses, where a licensor restricts the creation of derivative works, downstream users may be confused about whether, say, converting a PDF into a Word document is allowed.

The question also matters both in the heated recent debates over Controlled Digital Lending and generative artificial intelligence and in authors’ everyday work. For instance, would quoting someone else’s work make your article or book a derivative work of the original?

Part II: “Copies” and “Derivatives”

Our basic understanding of derivative works comes from the 1976 Copyright Act. The §101 definition tells us:

A “derivative work” is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications which, as a whole, represent an original work of authorship, is a “derivative work”.

The U.S. Copyright Office’s Circular 14 gives some further helpful guidance on what a §106 derivative work looks like:

To be copyrightable, a derivative work must incorporate some or all of a preexisting “work” and add new original copyrightable authorship to that work. The derivative work right is often referred to as the adaptation right. The following are examples of the many different types of derivative works: 

  • A motion picture based on a play or novel 
  • A translation of a novel written in English into another language
  • A revision of a previously published book 
  • A sculpture based on a drawing 
  • A drawing based on a photograph 
  • A lithograph based on a painting 
  • A drama about John Doe based on the letters and journal entries of John Doe 
  • A musical arrangement of a preexisting musical work 
  • A new version of an existing computer program 
  • An adaptation of a dramatic work 
  • A revision of a website

One immediate observation from these lists is that an “ebook” or “digitized version of a work” is neither listed as, nor similar to, any of the exemplary derivative works in the Copyright Act or the Copyright Office Circular. By contrast, an “ebook” or “digitized version of a work” fits much better under the §101 definition of “copies”:

“Copies” are material objects, other than phonorecords, in which a work is fixed by any method now known or later developed, and from which the work can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device. The term “copies” includes the material object, other than a phonorecord, in which the work is first fixed.

The most crucial difference between a “copy” and a “derivative work” is whether new authorship is added. If no new authorship is added, merely changing the material in which the work is fixed does not create a new copyrightable derivative work. Many courts recognized this before Hachette. For example, in Bridgeman Art Library v. Corel Corp., the court unequivocally held that photographic reproductions of public domain paintings receive no new copyright.

Additionally, as we know from Feist v. Rural Tel., “[t]he mere fact that a work is copyrighted does not mean that every element of the work may be protected.” Copyright protection extends only to the original elements of a work, so we cannot call a work “derivative” of another if it does not incorporate any copyrightable elements from the original copyrighted work. For example, in Lewis Galoob Toys v. Nintendo, the court held that the “Game Genie” device, which let players change elements of a Nintendo game, was not a derivative work because it did not incorporate any part of the Nintendo game.

It is clear from this examination that sometimes a later-created work is a copy, sometimes a derivative, and sometimes it may not implicate any of the exclusive rights of the original.

Part III: “Derivative” and “Transformative” Works

Let’s quickly recap the context in which courts are confusing “derivative” and “transformative” works:

A prima facie case of copyright infringement requires the copyright holder to prove (1) ownership of a valid copyright, and (2) inappropriate copying of original elements. We will not go into more detail here, but essentially, the inappropriate copying prong requires plaintiffs to assert and prove the defendant’s access to the plaintiff’s work as well as a level of similarity between the works in question that shows improper appropriation of the plaintiff’s work. If the similarity between the defendant’s work and the protectable elements in the plaintiff’s work is minimal, then there is no infringement. As the “Game Genie” example above shows, courts can rely on substantial similarity analysis to determine whether a work is indeed a potentially infringing copy or derivative of the plaintiff’s work.

Once the plaintiff establishes a prima facie infringement case (e.g., the defendant’s work is shown to be a derivative or a copy of the plaintiff’s registered work), the defendant may nevertheless be free to make the use if it falls outside the ambit of the copyright holder’s §106 rights, as fair uses do. Whether a work is a derivative work under §106 is no longer a relevant inquiry once a prima facie case is established. This point becomes starkly obvious when we consider the many plausible defenses a defendant can raise (including fair use) under which even verbatim copying of a work is authorized by law.

As the court stated in Authors Guild v. HathiTrust, “there are important limits to an author’s rights to control original and derivative works. One such limit is the doctrine of ‘fair use,’ which allows the public to draw upon copyrighted materials without the permission of the copyright holder in certain circumstances.” When a prima facie infringement case has already been established and a court still discusses whether the defendant’s work is a “derivative work,” the court at a minimum adds confusion by stretching the term beyond its §101 definition.

In fact, in recent years “derivative work” has been given a distinct new significance in the context of the “purpose and character” factor of fair use, specifically when courts analyze whether a use has a transformative purpose. A shift in the meaning of a word or concept is not per se unimaginable or objectionable; it is misguided to treat the copyright legal landscape as static. As law professor Pamela Samuelson has pointed out, before the mid-19th century, most courts did not even think copyright holders were entitled to demand compensation from others preparing derivative works. The 1976 Copyright Act finally codified copyright holders’ exclusive right to prepare derivative works. And now, some rights holders want the courts to say there are categorical derivative uses that can never be considered fair use.

The Hachette court is among those that have unfortunately bought into this novel approach. The court seems not only to misconstrue the salient distinction between a “copy of a work” and a “derivative work,” but also to give heightened protection to works it now defines as “derivative.” If this misconception becomes widespread, we will be living in a world where a use that is “derivative” in this new sense is never transformative (and, if it is not transformative, it is likely not fair). Ultimately, it is purely circular for a court to say that the reason for denying the fair use defense is that the use is derivative. Once we buy into the “derivative vs. transformative” framing, it becomes difficult ever to say with confidence that a work is transformative, because many transformative uses also fit the actual §101 definition of a derivative work, just like Green Day’s rendition of the plaintiff’s art in Seltzer v. Green Day.

Clearly, if we take “derivative work” at its true §101 definition, then among all potentially infringing works, “transformative fair use” is not the absolute complement of derivative works but a possible subset of them. We know from Campbell v. Acuff-Rose that “transformativeness is a matter of degree, not a binary,” whereas no such sliding scale is plausible for derivative works. A work is either a derivative or it is not: there’s never a “somewhat derivative” work in copyright. All in all, it makes little sense to frame the issues as “transformative vs. derivative work”; such discussions inevitably buy into the rhetoric of copyright expansionists. We have already warned the court in Warhol against the danger of speaking heedlessly about derivative works in the context of fair use. We must ensure that the “derivative vs. transformative” dichotomy does not come to dominate future discussions of fair use, so that we preserve the utility and clarity of the fair use doctrine.

The expansion of the relevance of “derivative work” beyond the establishment of a prima facie infringement case not only creates circular reasoning for denying fair use, but also makes it impossible to make sense of the case law we have accumulated on fair use. Take Seltzer v. Green Day: the court held that a work can be transformative even if it “makes few physical changes to the original.” Green Day’s concert backdrop, which superimposed a red cross on the original street art, was found to be a fair use of that art: a classic example of how a prima facie infringing derivative work can nevertheless be a transformative, and thus fair, use. Similarly, in Kienitz v. Sconnie Nation, a derivative use of a photo on a t-shirt was found to be a fair use. Ideas and concepts, including “derivative works,” are only important to the extent they elucidate our understanding of the world. When the term “derivative works” leads to more confusion than clarity, we should be cautious about adopting the new meaning being superimposed on it.

Clickbait arguments in AI Lawsuits (will number 3 shock you?)

Posted August 15, 2024

Image generated by Canva

The booming AI industry has sparked heated debates over what AI developers are legally allowed to do. So far, we have learned from the US Copyright Office and the courts that AI-generated works are not protectable unless they are combined with human authorship.

As we monitor two dozen ongoing lawsuits and regulatory efforts that address various aspects of AI’s legality, we see legitimate legal questions that must be resolved. However, we also see some prominent yet flawed arguments that have been used to inflame discussions, particularly by publisher-plaintiffs and their supporters. For now, let’s focus on some clickbait arguments that sound appealing but are fundamentally baseless.

Will AI doom human authorship?

Based on current research, AI tools can actually help authors improve their creativity, productivity, and career longevity.

When AI tools such as ChatGPT first appeared online, many leading authors and creators publicly endorsed them as useful tools, like any other tech innovation that came before. At the same time, many others claimed that authors and creators of lesser caliber would be disproportionately disadvantaged by the advent of AI.

This intuition-driven hypothesis, that AI will be the bane of average authors, has so far proved to be misguided.

We now know that AI tools can greatly help authors during the ideation stage, especially for less creative authors. According to a study published last month, AI tools had minimal impact on the output of highly creative authors, but were able to enhance the works of less imaginative authors. 

AI can also serve as a readily-accessible editor for authors. Research shows that AI enhances the quality of routine communications. Without AI-powered tools, a less-skilled person will often struggle with the cognitive burden of managing data, which limits both the quality and quantity of their potential output. AI helps level the playing field by handling data-intensive tasks, allowing writers to focus more on making creative and other crucial decisions about their works. 

It is true that entirely AI-generated works of abysmal quality are available for purchase on some platforms, and some of these works use human authors’ names without authorization. These AI-generated works may infringe on authors’ right of publicity, but they do not present commercially viable alternatives to books authored by humans. Readers prefer higher-quality works produced with human supervision and involvement (provided that digital platforms, which generate huge profits from human authors, do not treat those authors recklessly).

Are lawsuits against AI companies brought with authors’ best interest in mind? 

In the ongoing debate over AI, publishers and copyright aggregators have suggested that they brought these lawsuits to defend the interests of human authors. Consider the New York Times. In its complaint against OpenAI, the Times describes its operations as “a creative and deeply human endeavor (¶31)” that necessitates “investment of human capital (¶196).” The Times argues that OpenAI built its innovations on the stolen hard work and creative output of journalists, editors, photographers, data analysts, and others. That argument runs contrary to what the Times once argued in New York Times v. Tasini: that authors’ rights must take a backseat to the Times’ financial interests in new digital uses.

It is also hard to believe that many of the publishers and aggregators are on the side of authors when we look at how they have approached licensing deals for AI training. These licensing deals can be extremely profitable for the publishers. For example, Taylor and Francis sold AI training data to Microsoft for $10 million, and John Wiley and Sons earned $23 million from a similar deal with an undisclosed tech company. Though we don’t have the details of these agreements, it seems reasonable to surmise that, in return for the money received, the publishers will not harass the AI companies with future lawsuits. (See our previous blog post about these licensing deals and what you can do as an author.) It is ironic how an allegedly unethical and harmful practice quickly becomes acceptable once the publishers are profiting from it.

How much of the millions of dollars changing hands will go to individual authors? Limited data exist. We know that Cambridge University Press, a good-faith outlier, is offering authors 20% royalties if their work is licensed for AI training. Most publishers and aggregators are entirely opaque about how authors are to be compensated in these deals. The Copyright Clearance Center (CCC), for example, offers no information about how individual authors are consulted or compensated when their works are sold for AI training under its AI training license.

This is by no means a new problem for authors. We know that traditionally published book authors receive royalties of around 10% from their publishers: a little under $2 per copy for most books. For an ebook, authors receive a similar amount for each “copy” sold. This small amount only starts to look generous when compared to academic publishing, where authors increasingly pay publishers to have their articles published in journals. Journal authors receive zero royalties, despite the publishers’ growing profits.

Even before the advent of AI technology, most authors were struggling to make a living on writing alone. According to a 2018 Authors Guild survey, the median income for full-time writers was $20,300, and for part-time writers, a mere $6,080. Fair wages and equitable profit sharing are issues that need to be settled between authors and publishers, even if publishers try to scapegoat AI companies.

It’s worth acknowledging that publishers and copyright industry organizations are not the only ones filing these lawsuits. Many of the ongoing lawsuits have been filed as class actions, with the plaintiffs claiming to represent a broad class of people who are similarly situated and (they allege) hold similar views. Most notably, in Authors Guild v. OpenAI, the Authors Guild and its named individual plaintiffs claim to represent all fiction writers in the US who have sold more than 5,000 copies of a work. In another case, supported by the Authors Guild, the plaintiff claims to represent all copyright holders of non-fiction works, including authors of academic journal articles. In several others, an individual plaintiff asserts the right to represent virtually all copyright holders of any type.

As we (along with many others) have repeatedly pointed out, many authors disagree with the restrictive view of fair use that publishers and aggregators take in these cases, and they don’t want or need a self-appointed guardian to “protect” their interests. We saw the same over-broad class designation in the Authors Guild v. Google case, which caused many authors to object, including many of our own 200 founding members.

Respect for copyright and human authors’ hard work means no more AI training under US copyright law? 

While we wait for courts to figure out the key questions on infringement and fair use, let’s take a moment to remember what copyright law does not regulate.

Copyright law in the US exists to further the Constitutional goal to “promote the Progress of Science and useful Arts.” In 1991, the Supreme Court held in Feist v. Rural Telephone Service that copyright cannot be granted solely based on how much time or energy authors have expended. “Compensation for hard work” may be a valid ethical concern, but it is not a relevant topic in the context of copyright law.

Publishers and aggregators preach that people must “respect copyright,” as if copyright were synonymous with the exclusive rights of the copyright holder. This is inaccurate and misleading. In order to safeguard the freedom of expression, copyright is designed to embody not only the rightsholders’ exclusive rights but also many exceptions and limitations to those rights. Similarly, there’s no sound legal basis to claim that authors must have absolute control over their own work and its message. Knowledge and culture thrive because authors are permitted to build upon and reinterpret the works of others.

Does this mean I should side with the AI companies in this debate?

Many of the largest AI companies exhibit troubling traits that they have in common with many publishers, copyright aggregators, digital platforms (e.g., Twitter, TikTok, YouTube, Amazon, Netflix, etc.), and many other companies with dominant market power. There’s no transparency or oversight afforded to the authors or the public. The authors and the public have little say in how the AI models are trained, just as we have no influence over how content is moderated on digital platforms, how much in royalties authors receive from the publishers, or how much publishers and copyright aggregators can charge users. None of these crucial systemic flaws will be fixed by granting publishers a share of AI companies’ revenue.

Copyright also is not the entire story. As we’ve seen recently, there are some significant open questions about the right of publicity and somewhat related concerns about the ability of AI to churn out digital fakes for all sorts of purposes, some of which are innocent, but others are fraudulent, misleading, or exploitative. The US Copyright Office released a report on digital replicas on July 31 addressing the question of digital publicity rights, and on the same day the NO FAKES Act was officially introduced. Will the rights of authors and the public be adequately considered in that debate? Let’s remain vigilant as we wait to see the first-ever AI-generated public figure in a leading role to hit theaters in September 2024.

Introducing the Authors Alliance’s First Zine: Can Authors Address AI Bias?

Posted May 31, 2024

This guest post was jointly authored by Mariah Johnson and Marcus Liou, student attorneys in Georgetown’s Intellectual Property and Information Policy (iPIP) Clinic.

Generative AI (GenAI) systems perpetuate biases, and authors can have a potent role in mitigating such biases.

But GenAI is generating controversy among authors. Can authors do anything to ensure that these systems promote progress rather than prevent it? Authors Alliance believes the answer is yes, and we worked with them to launch a new zine, Putting the AI in Fair Use: Authors’ Abilities to Promote Progress, that demonstrates how authors can share their works broadly to shape better AI systems. Drawing together Authors Alliance’s past blog posts and advocacy discussing GenAI, copyright law, and authors, this zine emphasizes how authors can help prevent AI bias and protect “the widest possible access to information of all kinds.” 

As former Register of Copyrights Barbara Ringer articulated, protecting that access requires striking a balance with “induc[ing] authors and artists to create and disseminate original works, and to reward them for their contributions to society.” The fair use doctrine is often invoked to do that work. Fair use is a multi-factor standard that allows limited use of copyrighted material, even without authors’ credit, consent, or compensation. It asks courts to examine:

(1) the purpose and character of the use, 

(2) the nature of the copyrighted work, 

(3) the amount and substantiality of the portion used, and 

(4) the effect of the use on the potential market for or value of the work. 

While courts have not decided whether using copyrighted works as training data for GenAI is fair use, past fair use decisions involving algorithms, such as Perfect 10, iParadigms, Google Books, and HathiTrust, favored the consentless use of other people’s copyrighted works to create novel computational systems. In those cases, judges repeatedly found that algorithmic technologies aligned with the Constitutional justification for copyright law: promoting progress.

But some GenAI outputs prevent progress by projecting biases. GenAI outputs are biased in part because they use biased, low friction data (BLFD) as training data, like content scraped from the public internet. Examples of BLFD include Creative Commons (CC) licensed works, like Wikipedia, and works in the public domain. While Wikipedia is used as training data in most AI systems, its articles are overwhelmingly written by men, and that bias is reflected in shorter and fewer articles about women. And because the public domain cuts off in the mid-1920s, those works often reflect the harmful gender and racial biases of that time. However, if authors allow their copyrighted works to be used as GenAI training data, those authors can help mitigate some of the biases embedded in BLFD.

Current biases in GenAI are disturbing. As we discuss in our zine, word2vec, a very popular toolkit used to help machine learning (ML) models recognize relationships between words, can learn biased associations, like linking women with homemakers and Black men with the word “assaulted.” Similarly, when asked to generate letters of recommendation, OpenAI’s GenAI chatbot ChatGPT used “expert,” “reputable,” and “authentic” to describe men and “beauty,” “stunning,” and “emotional” for women, discounting women’s competency and reinforcing harmful stereotypes about working women. An intersectional perspective can help authors see the compounding impact of these harms. Intersectionality began as a legal framework to describe why discrimination law did not adequately address harms facing Black women, and it is now used as a wider lens to consider how marginalization affects all people with multiple identities. Coined by Professor Kimberlé Crenshaw in the late 1980s, intersectionality combines critical theories such as Critical Race Theory, feminism, and working-class studies as “a lens . . . for seeing the way in which various forms of inequality often operate together and exacerbate each other.” Contemporary authors’ copyrighted works often reflect the richness of intersectional perspectives, and using those works as training data can help mitigate GenAI bias against marginalized people by introducing diverse narratives and inclusive language. Not always, since even recent works reflect bias, but more often than is currently possible.

Which brings us back to fair use. Some corporations may rely on the doctrine to include more works by or about marginalized people in an attempt to mitigate GenAI bias. Professor Mark Lemley and Bryan Casey have suggested “[t]he solution [to facial recognition bias] is to build bigger databases overall or to ‘oversample’ members of smaller groups” because “simply restricting access to more data is not a viable solution.” Similarly, Professor Matthew Sag notes that “[r]estricting the training data for LLMs to public domain and open license material would tend to encode the perspectives, interests, and biases of a distinctly unrepresentative set of authors.” However, many marginalized people may wish to be excluded from these databases rather than have their works or stories become grist for the mill. As Dr. Anna Lauren Hoffman warns, “[I]nclusion reinforces the structural sources of violence it supposedly addresses.”

Legally, if not ethically, fair use may moot the point. The doctrine is flexible, fact-dependent, and fraught. It’s also fairly predictable, which is why legal precedent and empirical work have led many legal scholars to believe that using copyrighted works as training data to debias AI will be fair use, even if doing so carries some public harms. Back in 2017, Professor Ben Sobel concluded that “[i]f engineers made unauthorized use of copyrighted data for the sole purpose of debiasing an expressive program, . . . fair use would excuse it.” Professor Amanda Levendowski has explained why and how “[f]air use can, quite literally, promote creation of fairer AI systems.” More recently, Dr. Mehtab Khan and Dr. Alex Hanna observed that “[a]ccessing copyright work may also be necessary for the purpose of auditing, testing, and mitigating bias in datasets . . . [and] it may be useful to rely on the flexibility of fair use, and support access for researchers and auditors.”

No matter how you feel about it, fair use is not the end of the story. It is ill-equipped to solve the troubling growth of AI-powered deepfakes. After being targeted by sexualized deepfakes, Rep. Ocasio-Cortez described “[d]eepfakes [as] absolutely a way of digitizing violent humiliation against other people.” Fair use will not solve the intersectional harms of AI-powered face surveillance either. Dr. Joy Buolamwini and Dr. Timnit Gebru evaluated leading commercial gender classifiers of the kind used in face surveillance technologies and discovered that they classified male faces more accurately than female faces, and lighter-skinned faces more accurately than darker-skinned ones. The researchers also discovered that the “classifiers performed worst on darker female subjects.” While legal scholars like Professors Shyamkrishna Balganesh, Margaret Chon, and Cathay Smith argue that copyright law can protect privacy interests, like the ones threatened by deepfakes or face surveillance, federal privacy laws would offer a more permanent, comprehensive way to address these problems.

But who has time to wait on courts and Congress? Right now, authors can take proactive steps to ensure that their works promote progress rather than prevent it. Check out the Authors Alliance’s guides to Contract Negotiations, Open Access, Rights Reversion, and Termination of Transfer to learn how–or explore our new zine, Putting the AI in Fair Use: Authors’ Abilities to Promote Progress.

You can find a PDF of the Zine here, as well as printer-ready copies here and here.