Yesterday, Authors Alliance joined a diverse group of creators of online content on an amicus brief in Gonzalez v. Google, a case before the Supreme Court. The case is about Section 230 of the Communications Decency Act and whether it protects curated recommendations by platforms. Section 230 protects online service providers from legal liability for content generated by users, and is considered by many to be essential for a vibrant and diverse internet. By shielding platforms from liability for their users’ speech, Section 230 enables the free flow of ideas and expression online, including speech on controversial topics. This is consistent with First Amendment values and the functioning of the internet as we know it.
The case concerns ISIS recruitment videos posted on YouTube, which the petitioner alleges were recommended by the platform. Gonzalez argues that Section 230 should not shield Google from liability, and that it aided in ISIS recruitment by recommending these videos to users. Google, on the other hand, contends that Section 230 shields it from liability for recommendations made on the platform, including the recommendations at issue in the case.
Our brief makes three principal arguments. First, it argues that Congress intended Section 230 to foster a free Internet where diverse and independent expression thrives. We explain that Section 230 was meant to facilitate free expression online, which is precisely what it continues to do.
Second, our brief argues that platform recommendations contribute to the flourishing of free expression, creativity, and innovation online. Authors like our members are served by platform recommendations and curation: for authors whose works may not appeal to a general audience, platform recommendations enable readers interested in a particular topic or type of work to discover them. In this way, platform recommendation can serve authors’ interests in seeing their works reach broad and diverse audiences. This is particularly important for authors just starting out in their careers who have not yet found an audience, and platform recommendations can and do help these authors grow their audiences.
Finally, we argue that altering Section 230’s protections for recommendations could have dire consequences for current and future creators—including authors—and could chill the free flow of ideas online. If platforms were to be held liable for content created by users, we believe they would be inclined to take a more conservative approach, moderating content to avoid the threat of a lawsuit or other legal action. This could reasonably lead platforms to avoid hosting content on controversial topics or content by new and emerging creators whose views are unknown. An author’s ability to write freely, including on controversial topics, is essential for a vibrant democratic discourse. And if platforms were reluctant to recommend content by new creators, who may be seen as less “safe,” dominant and established creators could be entrenched, to the detriment of less established creators. Were platforms to censor certain writings or ideas to avoid lawsuits, the internet would become less free, less vibrant, and more sanitized—doing a disservice to all of us.
Authors Alliance thanks Keker, Van Nest & Peters LLP for their invaluable support and contributions to this brief, as well as our fellow amici for sharing their stories.
Literary aficionados and copyright buffs alike have something to celebrate as we welcome 2023: A new batch of literary works published in 1927 entered the public domain on January 1st, when the copyrights in those works expired. The public domain refers to the commons of creative expression that is not protected by copyright. When a work enters the public domain, anyone may do anything they want with that work, including activities that were formerly the “exclusive right” of the copyright holder like copying, sharing, translating, or adapting the work.
Some of the more recognizable books entering the public domain this year include:
Virginia Woolf’s To the Lighthouse
William Faulkner’s Mosquitoes
Agatha Christie’s The Big Four
Edith Wharton’s Twilight Sleep
Herbert Asbury’s The Gangs of New York (the original 1927 publication)
Franklin W. Dixon’s (a pseudonym) The Tower Treasure (the first Hardy Boys book)
Literary works can be a part of the public domain for reasons other than the expiration of copyright—such as when a work is created by the government—but copyright expiration is the major way that literary works become a part of the public domain. Copyright owners of works first published in the United States in 1927 needed to renew that work’s copyright in order to extend the original 28-year copyright term. Initially, the renewal term also lasted for 28 years, but over time the renewal term was extended to give the copyright holder an additional 67 years of copyright protection, for a total term of 95 years. This means that works that were first published in the United States in 1927—provided they were published with a copyright notice, were properly registered, and had their copyright renewed—were protected through the end of 2022.
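The term arithmetic above can be sketched as a small function (an illustrative simplification, not legal advice; the function name is ours, and it assumes a work first published in the U.S. with notice, properly registered, and renewed):

```python
# Illustrative sketch of the U.S. copyright term arithmetic described above.
# Assumes a work first published in the U.S. with notice, properly
# registered, and renewed, so the full 95-year term applies.
ORIGINAL_TERM = 28   # initial term under the 1909 Act
RENEWAL_TERM = 67    # extended renewal term (28 + 67 = 95 years total)

def us_public_domain_year(publication_year: int) -> int:
    """Year on whose January 1st the work enters the public domain."""
    # Protection runs through the end of the 95th year after publication,
    # so the work enters the public domain on January 1st of the next year.
    return publication_year + ORIGINAL_TERM + RENEWAL_TERM + 1

print(us_public_domain_year(1927))  # works from 1927 entered the public domain in 2023
```

Running it for 1927 confirms the January 1st, 2023 date discussed above; works from 1928 will follow in 2024.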
We’re very pleased to announce a new project for 2023, “Text and Data Mining: Demonstrating Fair Use,” which is generously supported by the Mellon Foundation. The project will focus on lowering and overcoming legal barriers for researchers who seek to exercise their fair use rights, specifically within the context of text data mining (“TDM”) research under current regulatory exemptions.
Fair use is one of the primary legal doctrines that allow researchers to copy, transform, and analyze modern creative works—almost all of which are protected by copyright—for research, educational, and scholarly purposes. Unfortunately, in practice, not everyone is able to use this powerful right. Researchers today face the challenge that fair use is often overridden by a complex web of copyright-adjacent laws. One major culprit is Section 1201 of the Digital Millennium Copyright Act (“DMCA”), which imposes significant liability on users of copyrighted works who circumvent technical protection measures (e.g., the Content Scramble System used to encrypt DVDs), unless those users comply with a series of specific exemptions to Section 1201. These exemptions are lengthy and complex, as is the process to petition for their adoption or renewal, which recurs every three years.
Text data mining is a prime example of work that demonstrates the power of fair use, as it allows researchers to discover and share new insights about how modern language and culture reflect on important issues ranging from our understanding of science to how we think about gender, race, and national identity. Authors Alliance has worked extensively on supporting TDM work in the past, including by successfully petitioning the Copyright Office for a DMCA exemption to allow researchers to break digital locks on films and literary works distributed electronically for TDM research purposes, and this project builds on those previous efforts.
The Text Data Mining: Demonstrating Fair Use project has two goals in 2023:
1) To help a broader and more diverse group of researchers understand their fair use rights and their rights under the existing TDM exemption through one-on-one consultations, creating educational materials, and hosting workshops and other trainings; and
2) To collect and document examples of how researchers are using the current TDM exemption, with the aim of illustrating how the TDM exemption can be applied and highlighting its limitations so that policymakers can improve it in the future.
We’ll be working closely with TDM researchers across the United States, as well as organizations such as the Association for Computers and the Humanities, and will be actively exploring opportunities to work with others. If you have an interest in this project, we would love to hear from you!
About The Andrew W. Mellon Foundation
The Andrew W. Mellon Foundation is the nation’s largest supporter of the arts and humanities. Since 1969, the Foundation has been guided by its core belief that the humanities and arts are essential to human understanding. The Foundation believes that the arts and humanities are where we express our complex humanity, and that everyone deserves the beauty, transcendence, and freedom that can be found there. Through our grants, we seek to build just communities enriched by meaning and empowered by critical thinking, where ideas and imagination can thrive. Learn more at mellon.org.
Last week saw a flurry of news about the Journalism Competition and Preservation Act (“JCPA”), proposed legislation that would create an exemption to antitrust law allowing certain news publishers to join together and collectively negotiate payments from digital platforms for carrying their content. Authors Alliance has consistently opposed the JCPA, as we believe it would harm small publishers and creators, while further entrenching major players in the news media industry.
Last Monday, December 5th, it was uncovered that the revised JCPA had been included in a “must pass” defense spending bill (the National Defense Authorization Act, or NDAA), leading the legislation’s opposition to promptly decry the move and caution against it. Then, the next day, news broke that Congress had removed the JCPA from the legislation—something to celebrate for those, like Authors Alliance, who believed this was ill-advised legislation that would not have served the interests of the creators who contribute to the news media.
The JCPA was first proposed as separate bills in the Senate and House of Representatives in March 2021. The JCPA has laudable goals: to preserve a strong, diverse, and independent press, responding to ongoing crises in local and national journalism. But the actual text of the JCPA doesn’t meet those goals, while causing other problems. One major problem has been that the JCPA implicitly expands the scope of copyright, and would potentially require payment for activities like linking or using brief snippets of content that are not only fair uses, but are crucial for digital scholarship. In June 2021, Authors Alliance joined a group of like-minded civil society organizations on a letter urging Congress to clarify that the bill would not expand copyright protection to article links, and that authors and other internet users would not have to pay to link to articles or for the use of headlines and other snippets that fall within fair use.
Then, this September, a new version of the bill was released in the Senate. While the revised language made some improvements—like clarifying that the bill would not modify, expand, or alter the rights guaranteed under copyright—it still failed to clarify that the bill would not cover activities like linking that are fundamental for authors creating digital scholarship. And some changes to the legislation posed serious First Amendment concerns. For example, new language in the bill would have forced platforms to carry content of digital journalism organizations that participated in the collective bargaining, regardless of extreme views or misinformation. The revised bill could also have hurt authors of news articles financially, because it failed to include a provision that would require authors of the press articles to be compensated as part of the collective bargaining it envisioned.
Inclusion in the NDAA
Last week, the news that the JCPA had been included in the NDAA was met with outcry. Its opponents argued that the bill was far too complex to be included in must-pass legislation, and merited further discussion and revision before becoming law. The JCPA was never marked up in the House of Representatives, nor did it receive a hearing there. Authors Alliance once again joined 26 other civil society organizations on a letter protesting the move and urging Congress not to include the JCPA in military spending or other must-pass legislation.
A wide variety of other stakeholders also objected to the inclusion of the JCPA in the NDAA. Small publications, lobbyists for platforms, and even journalism trade groups reiterated their opposition. Meta, the company that owns Facebook, even threatened to remove news from their platform were the legislation to pass (in response to a similar bill being passed in Australia, Meta did in fact remove news from its platform in the country). Then, late on Tuesday, December 6th, the latest version of the bill’s text was released, with the JCPA omitted. The NDAA was approved by the House a few days later.
A Victory for Now
Because the JCPA was removed from the NDAA before its passage, it is no longer on the brink of becoming law. What happens next with the JCPA is less certain. There have already been multiple iterations of the bill, and it could be reintroduced, with or without modifications, in the next legislative session. While it’s unclear how the new makeup of Congress following the midterm elections might affect the JCPA’s chance of becoming law, this is certainly a factor in the bill’s future. This was also not the first time that the government has attempted to support journalism and local news through proposals that could affect users’ and authors’ ability to rely on fair use. Just last year, the Copyright Office conducted a study on establishing a new press publishers’ right in the United States, which would have required news aggregators to pay licensing fees for their aggregation of headlines, ledes, and short phrases of news articles (you can read about Authors Alliance’s reply comment in that study here). While the Office ultimately decided not to recommend the adoption of a new press publishers’ right, its study shows that the government may continue to investigate these policies from other fronts.
Since 2014, you have helped Authors Alliance fulfill our mission to advance the interests of authors who want to make the world a fairer and more just place, to spark new conversations, and to be read by wide audiences. But our continued existence is not guaranteed, and we need your help to continue to advocate for authors who write to be read. Each year, we launch a year-end fundraising campaign and this year, we need your support more than ever.
We’re proud of our many accomplishments in 2022, and cannot wait for you to see what we have in store for 2023. You can expect a brand new guide to legal issues related to writing about real people, a wealth of advocacy work related to strengthening authors’ ability to engage in text data mining, and more amicus briefs to represent your interests in the courts.
Please consider making a tax-deductible donation today to help us carry on our work in 2023. Every contribution enables us to do our part to help you keep writing to be read!
We are so pleased to be able to co-sponsor this next book talk with Internet Archive, hosted on December 15 at 10am PT/ 1pm ET. Join copyright scholar and Authors Alliance Board President PAMELA SAMUELSON for a discussion with historian and Authors Alliance Advisory Board member PETER BALDWIN about his book THE COPYRIGHT WARS, covering three centuries’ worth of trans-Atlantic copyright battles.
Today’s copyright wars can seem unprecedented. Sparked by the digital revolution that has made copyright—and its violation—a part of everyday life, fights over intellectual property have pitted creators, Hollywood, and governments against consumers, pirates, Silicon Valley, and open-access advocates. But while the digital generation can be forgiven for thinking the dispute between, for example, the publishing industry and libraries is completely new, the copyright wars in fact stretch back three centuries—and their history is essential to understanding today’s battles. THE COPYRIGHT WARS—the first major trans-Atlantic history of copyright from its origins to today—tells this important story.
Last week, the district court released its opinion in United States v. Bertelsmann, an antitrust case concerning a proposed merger between Penguin Random House (“PRH”) and Simon & Schuster (“S&S”), which the court blocked (an “amended opinion” was released earlier this week, but the two documents differ only in their concluding language). Authors Alliance has been covering this case on our blog for the past year, and we were eager to read Judge Pan’s full opinion now that redactions had been made and the opinion made public. This post gives an overview of the opinion; shares our thoughts about what Judge Pan got right, got wrong, and left out; and discusses what the case could mean for the vast majority of authors who are not represented in the discussion.
The Department of Justice initiated this antitrust proceeding after PRH and S&S announced that they intended to merge, with Bertelsmann, PRH’s parent company, purchasing S&S from its parent company, Paramount Global. The trade publishing industry has long been dominated by a few large publishing houses which have merged and consolidated over time. Today, the trade industry is dominated by the “Big Five” publishers: PRH, S&S, HarperCollins, Hachette Book Group, and Macmillan. And a sub-section of the trade publishing industry, “anticipated top sellers,” is the focus of the government’s argument and Judge Pan’s opinion. This market segment is defined as books for which authors receive an advance of $250,000 or higher (a book advance is an up-front payment made to authors when they publish a book, and often the only money these authors receive for their works).
The main thrust of Judge Pan’s opinion is simple: the proposed merger would have led to lower advances for authors of anticipated top sellers, and the market harm that would flow from the decreased competition in the industry is substantial enough that the merger cannot go forward under U.S. antitrust law. To arrive at this conclusion, the court considered testimony from a variety of publishing industry insiders, experts in economics, and authors.
Defining the Market
Trade publishing houses are those that distribute books on a national scale and sell them in non-specialized channels, like general interest bookstores or Amazon. Trade publishing stands in contrast to self-publishing, academic publishing, and publishing with specialized boutique presses. But changes in how we read and how books are distributed have complicated these distinctions. For example, university presses are sometimes considered to be non-trade publishers, despite the fact that many also publish trade books. University presses are particularly well poised to publish books that bridge the gap between the scholarly and the popular—Harvard University Press’s publication of Thomas Piketty’s Capital in the Twenty-First Century, an unexpected bestseller, is one example. Similarly, Amazon sells trade books alongside other types of books. The Authors Alliance Guide to Understanding Open Access is available as a print book on Amazon, but it is one we released under an open access license, and is far from a trade book. Consumers increasingly buy books online as brick and mortar bookstores across the country close or downsize, and the Amazon marketplace obscures the distinction between trade publishing and other types of publishing.
Within trade publishing, there is a small segment of books which are seen as “hot,” which the DOJ calls anticipated top sellers. While PRH argued that this distinction was pulled out of whole cloth, the popular “Publishers Marketplace,” a subscription-based service for those in the industry, uses certain terms (essentially code words) to indicate the size of the advance when a book deal is announced. “Deals under $50,000 are ‘nice,’ those up to $100,000 are ‘very nice,’ those up to $250,000 are ‘good,’ those up to $500,000 are ‘significant,’ and larger deals are ‘major.’”
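The quoted tiers amount to a simple lookup table. As an illustrative sketch (the function name is ours, and the handling of exact threshold values is our assumption, since the quoted “under” and “up to” language is ambiguous at the boundaries):

```python
# Hypothetical helper mapping an advance to the Publishers Marketplace
# code word quoted above. Handling of exact threshold values is our
# assumption; the quoted language ("under", "up to") is ambiguous there.
def deal_label(advance: int) -> str:
    if advance < 50_000:
        return "nice"
    elif advance <= 100_000:
        return "very nice"
    elif advance <= 250_000:
        return "good"
    elif advance <= 500_000:
        return "significant"
    else:
        return "major"

print(deal_label(300_000))  # a $300,000 deal would be announced as "significant"
```

Under this reading, the “anticipated top seller” segment at issue in the case corresponds to deals labeled “significant” or “major.”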
For the market for anticipated top sellers (trade books with advances of $250,000 or higher), the Big Five collectively control 91% of the market share. In contrast, for books where an author receives an advance under $250,000, the Big Five control just 55% of the market, with non-Big Five trade publishers publishing a significant portion of trade books in this category. Post-merger, the combined PRH and S&S were expected to have a 49% share of the market for anticipated top sellers, according to expert testimony—more than the rest of the Big Five put together. For these reasons, the merger was determined to be improper as a matter of antitrust law.
Beyond Anticipated Top Sellers
While Judge Pan’s opinion is measured, thoughtful, and reaches (from our perspective) the correct result, the broader context of the publishing industry shows how narrow the subset of authors in this market is, and how some authors were left out. The market the court considered in this case is “a submarket of the broader publishing market for all trade books.” In its pre-trial brief, PRH asserted that “[s]ome 57,000 to 64,000 books are published in the [U.S.] each year by one of more than 500 different publishing houses” and “another 10,000-20,000 are self published.” It is unclear whether the first number includes academic books and other non-trade titles. “[A]nticipated top-selling books” account for just 2% of “all books published by commercial publishers” (again, it is unclear what “commercial publishers” means in this context), and an even smaller share of all books published in the U.S. in a given year (a difficult statistic to pin down, but somewhere between 300,000 and 1,000,000 books per calendar year, depending on who you ask).
It is not just that the authors at the center of this case are unique in the high advances they receive for their books; it is that the business of publishing a book is fundamentally different for these authors than for less commercially successful authors. And this is what is missing from Judge Pan’s opinion: the economic system of Big Five book acquisitions for anticipated top sellers is totally unlike many authors’ experiences getting their work published. While many authors struggle to find a publisher willing to publish their book, and more still struggle to convince their publisher to do so on terms that are acceptable to them, anticipated top sellers are generally the subject of book auctions, whereby editors bid on the rights to a manuscript in an auction held by an author’s literary agent. It is important to keep in mind that for the vast majority of working authors, these auctions do not take place.
The language in the opinion shows how it generalizes the experiences of commercially successful trade authors to authors more broadly, doing a disservice to the multitude of authors whose book deals do not look like the transactions it describes. Judge Pan states that “[a]uthors are generally represented by literary agents, who use their judgment and experience to find the best home for publishing a book.” Literary agents play an important role in the publishing ecosystem, and serve as intermediaries for some authors to help them develop their manuscripts and get the best deal possible. But the author-agent relationship is also a financial one: agents receive a “commission” of around 15% of all monies paid to the author. It stands to reason that an author who cares more about their work reaching a broad audience than receiving a large advance, or even an advance at all, is much less likely to be represented by an agent. And these authors too care about finding the right home for their work, getting a book deal with favorable terms, and feeling confident that their publisher is invested in their work. Making the publishing industry less diverse, with fewer houses overall, is just as detrimental to these authors as it is to top-selling ones.
What is troubling about the decision is not that it focuses in on a certain type of author and certain type of book—the question of what the relevant “market” is in antitrust cases is a complicated one—but that the vision of authorship and publication it presents as typical does not reflect the lived experiences of most authors. The dominant narrative that “authors” are famous people who make a living from their writing, primarily through the high advances they receive from trade publishers, simply does not bear out in today’s information economy.
Overall, the decision in this case is in many ways a boon for authors who care about a vibrant and diverse publishing ecosystem—whether they are authors of anticipated top sellers or authors who forgo compensation and publish open access. When publishing houses consolidate, fewer books are published, and fewer authors can publish with these publishers. This could lead less commercially successful trade authors to turn to other publishers (whether small trade publishers, university presses, or boutique publishers), who would then be forced to take on fewer books by less commercially successful authors. Self-publishing is an option for authors no longer able to find publishers willing to take their work on, but self-published authors earned 58% less than traditionally published authors as of 2017, and this decrease could lead some authors to abandon their writing projects altogether. These downstream effects may be speculative, but they deserve attention: this is almost certainly not the last we will hear about anticompetitive behavior in the publishing industry, and the effects of this behavior on non-top selling authors also matter. We hope that future judges considering these thorny questions will remember that authors are not a monolith, yet all are affected by drastic changes to the publishing ecosystem.
We’re excited to invite you to join us for another book talk, co-sponsored with Internet Archive, with author Sarah Lamdan about her book Data Cartels.
Join SPARC’s Heather Joseph for a chat with author Sarah Lamdan about the companies that control & monopolize our information.
Book Talk: Data Cartels with Sarah Lamdan & Heather Joseph Co-sponsored by Internet Archive & Authors Alliance Wednesday, November 30 @ 10am PT / 1pm ET Register now for the virtual discussion. Purchase Data Cartels from The Booksmith
In our digital world, data is power. Information hoarding businesses reign supreme, using intimidation, aggression, and force to maintain influence and control. Sarah Lamdan brings us into the unregulated underworld of these “data cartels”, demonstrating how the entities mining, commodifying, and selling our data and informational resources perpetuate social inequalities and threaten the democratic sharing of knowledge.
Sarah Lamdan is Professor of Law at the City University of New York School of Law. She also serves as a Senior Fellow for the Scholarly Publishing and Academic Resources Coalition and as a Fellow at NYU School of Law’s Engelberg Center on Innovation Law and Policy.
Heather Joseph is a longtime advocate and strategist in the movement for open access to knowledge. She is the Executive Director of SPARC, an international alliance of libraries committed to creating a more open and equitable ecosystem for research and education. She leads SPARC’s policy efforts, which have produced national laws and executive actions supporting the free and open sharing of research articles, data, and textbooks, and has worked on international efforts to promote open access with organizations including the United Nations, the World Bank, UNESCO, and the World Health Organization.
The problem with termination of transfer is that almost no one uses it. Professor Rebecca Giblin has written about this extensively–for example, in an article she co-authored last year demonstrating that in the eight years since works first became eligible for termination under Section 203, creators exercised their termination rights for very few works (e.g. only around 800 books over that time period, a tiny fraction of those eligible).
The system is incredibly complex and confusing, with numerous exceptions and technical requirements, such that creators can’t reasonably navigate it without significant time, expense, and usually a team of lawyers. I won’t go into all the gory details, but this report by Public Knowledge provides a good overview, highlighting the ways that Termination of Transfer in practice fails creators. This is due to the law’s complexity and the ways publishers and other corporate rightsholders systematically weaponize that complexity to prevent creators from benefiting from termination of transfer. It can be hard for creators to know how to stand up for their rights, or even to know that their rights are at risk in the first place.
Exhibit 1: The Mechanical Licensing Collective’s Attempt to Erase the Termination Right for Songwriters
In a recent notice of proposed rulemaking, the Copyright Office recounts an effort by music industry powers to essentially eliminate the termination right for songwriters who would otherwise be entitled to royalties for their songs when sold or streamed digitally. Thankfully, the Copyright Office is paying attention and has crafted a proposed rule to prevent such abuse.
A little bit of background: in the world of music licensing, songwriters often transfer their rights to music publishers. Among the ways that those publishers make money is by licensing the underlying musical composition (lyrics, music) for use in actual sound recordings. These are typically referred to as “mechanical licenses.” In 2018, Congress passed the Music Modernization Act (“MMA”), which established a new blanket licensing system for digital music providers (e.g. Spotify, YouTube Music, and Pandora) that want to stream or offer downloadable digital copies and need to obtain mechanical rights. The system is operated by something called the “Mechanical Licensing Collective,” a nonprofit designated by the Copyright Office pursuant to the MMA and run by a board of 13 directors (ten music publishing executives and three songwriters).
Given this new system of blanket licensing, the MLC had to decide how it would pay out royalties in situations where a songwriter terminated her transfer of rights to a music publisher. The way this works in other contexts–e.g. when ASCAP receives notice from a creator that a grant has been terminated–is that the licensing intermediary holds onto any royalties until it is clear (either by agreement of the parties, or court order) who owns the rights, and then pays out royalties to the appropriate party.
The MLC decided to take a different approach–it proposed a default rule that said that, even when a creator terminates rights, the appropriate payee would be whoever held rights in the work at the time when it happened to have been saved on a digital music provider’s server. This bizarre proposal is a bit easier to understand when you consider that it would also conveniently mean that the publishers would almost always be entitled to all future mechanical license royalties. The MLC, after finding that the Copyright Office and many creators objected to this brazen proposal, changed course (modestly) by adopting a different rule that did basically the same thing. Instead of establishing a process for holding funds until a dispute was resolved, the MLC adopted a rule that as long as a publisher had actively licensed the work and used it at least once before the termination date, the publisher would forever receive royalties from the MLC, and not the creator who terminated rights.
The MLC’s legal rationale for its default rules was based on an incredibly generous (to publishers) reading of one of the exceptions to the termination right: the “derivative work” exception, which states that “a derivative work prepared under authority of the grant before its termination may continue to be utilized under the terms of the grant after its termination.” The MLC’s position was that this exception applied to any of the sound recordings used by digital music providers that incorporate music from songwriters, despite the statutory language in the MMA and elsewhere indicating that funds for mechanical rights under the statutory blanket license should be paid out to whoever the copyright holder is at the time of the actual use.
Thankfully, and unlike the MLC, the Copyright Office decided it would read the law for what it says. It concluded, reasonably, that the correct rule should be that whoever actually owns the rights should receive payment at the time the work is used. We plan to submit a comment supporting the Copyright Office’s proposed rule. You can do so too, here.
Making Termination Easier
We strongly believe that it should be easier for creators to exercise their termination rights, without having to jump through complex hoops and without having to battle with moneyed industry interests that seek to exploit and expand exceptions to the rule. We’ve created a number of resources to help authors terminate their transfers and regain their rights. These include a set of Frequently Asked Questions, a tool (created with Creative Commons) to guide authors through the process, and guidance and templates for how to effectuate a termination request. If you have questions or ideas on how we can help make the process easier, including advocating for changes in the law to make the system better, we want to hear from you! You can find us at email@example.com or online on Twitter at @auths_alliance.
Yesterday there was a pretty interesting class action lawsuit filed against Github and Microsoft. The suit is about Github’s Copilot service, which it advertises as “Your AI pair programmer.” As described by Github, Copilot is “trained on billions of lines of code” and “turns natural language prompts into coding suggestions across dozens of languages.” The suit focuses on Github’s reuse of code deposited with it by programmers, mostly under open source licenses, which Github has used to train the Copilot AI. Those licenses generally allow reuse but commonly come with strings attached–such as requiring attribution and relicensing the new work under the same or similar terms. The class action asserts, among other things, that Github hasn’t followed those terms because it hasn’t attributed the source adequately and has removed copyright-relevant information.
Sounds interesting, but you might be wondering why we care about this lawsuit. For a few reasons: one, it raises some important questions about the extent to which researchers can use AI to train and produce outputs based on datasets of copyrighted materials, even materials thought generally “safe” because they’re available under open licenses. As the suit highlights, materials that are openly licensed aren’t without any restrictions (most include attribution requirements), but when those materials are aggregated and used to craft new outputs, it can be seriously complicated to find the right way to attribute all the underlying creators. If this suit raises the barrier to using such materials, it could pose real problems for many existing research projects. It could also result in further narrowing of what datasets are likely to be used by AI researchers–resulting in an even smaller group of materials that includes what law professor Amanda Levendowski refers to as “biased, low-friction data” (BLFD), which can lead to some pretty bad and biased results. How and when open license attribution requirements apply is important for anyone doing research with such materials in aggregate.
Second, the suit at least indirectly implicates some of the same legal principles that authors working on text-data mining projects rely on. We’ve argued (successfully, before the U.S. Copyright Office) that such uses are generally not infringing–particularly for research and educational purposes–because fair use allows for it. Several others, such as Professors Michael Carroll and Matthew Sag, have made similar arguments. Of course, Github Copilot has some meaningful differences from text-data mining for academic research; e.g., it is producing textual outputs based on the underlying code for a commercial application. But the fair use issue in this case could have a direct impact on other applications.
Interestingly, the Github Copilot suit doesn’t actually allege copyright infringement, which is how fair use would most naturally be raised as a defense. Instead, the plaintiffs, as class representatives, make two claims that could implicate a fair use defense: 1) a contractual claim that Github has violated the open source licenses covering the underlying code, which generally require attribution among other things; and 2) a claim that Github has violated Section 1202 of the Digital Millennium Copyright Act by removing copyright management information (“CMI”) (e.g., copyright notice, titles of the underlying works).
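To make the CMI claim concrete, here is a minimal, purely hypothetical sketch of what “removing CMI” from source code looks like in practice. The file contents, author name, and license header below are invented for illustration–they are not from the actual complaint:

```python
# Illustrative only: a hypothetical source file whose comment header
# carries copyright management information (CMI) -- a copyright notice,
# a title, and license terms.
source = '''\
# Copyright (c) 2020 Jane Developer
# example_module.py -- Licensed under the MIT License
def add(a, b):
    return a + b
'''

# Dropping the comment header strips the CMI while leaving the code
# functionally identical -- the kind of removal Section 1202 addresses.
stripped = "\n".join(
    line for line in source.splitlines() if not line.startswith("#")
)
print(stripped)
```

The remaining code works exactly as before; only the information identifying the author and license terms is gone, which is why the plaintiffs frame this as a Section 1202 claim rather than ordinary infringement.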
The complaint attempts to avoid the fair use issue, asserting that “the Fair Use affirmative defense is only applicable to Section 501 copyright infringement. It is not a defense to violations of the DMCA, Breach of Contract, nor any other claim alleged herein.” The plaintiffs may well be trying to follow the playbook of another recent open source licensing case, Software Freedom Conservancy v. Vizio, in which Software Freedom Conservancy successfully convinced a federal court that its breach of contract claims, based on an alleged breach of the GPLv2 license, should be considered separate and apart from a copyright fair use defense.
This suit is a little different though. For one, at least five of the eleven licenses at issue explicitly recognize the applicability of fair use; for example, the GNU General Public License version 3 provides that “This License acknowledges your rights of fair use or other equivalent, as provided by copyright law.” It would seem more of a challenge to convince a court that a fair use defense doesn’t matter when almost half of the licenses explicitly say it does. Likewise, while the text of Section 1202 doesn’t explicitly allow for a fair use defense, its restrictions are only applicable to the removal of CMI when it is done “without the authority of the copyright owner or the law.” The plaintiffs claim that fair use isn’t a defense to allegations of a Section 1202 violation, but that’s far from clear, and it may be that removal of information pursuant to a valid fair use claim should qualify as removal with the “authority . . . of the law.”
The lawsuit is a class action, so it faces some special hurdles that a typical suit would not. For example, the plaintiffs must demonstrate that they can adequately represent the interests of the class, which they have defined as:
All persons or entities domiciled in the United States that, (1) owned an interest in at least one US copyright in any work; (2) offered that work under one of GitHub’s Suggested Licenses; and (3) stored Licensed Materials in any public GitHub repositories at any time between January 1, 2015 and the present (the “Class Period”).
That could pose a challenge given that it seems likely that at least a portion–if not a sizable portion–of those who contributed code to Github under those open licenses may be more sympathetic to Github’s reuse than the claims of the plaintiffs. In Authors Guild v. Google, another class action suit involving mass copying to facilitate computer-aided search and outputs like snippet view in Google Books, similar intra-class conflicts posed a challenge to class certification (including objections we raised on behalf of academic authors). The Github Copilot suit also includes a number of other claims that mean it could be resolved without addressing the copyright and licensing issues noted above. For now, we’ll monitor the case and update you on outcomes relevant to authors.