Update: Hearing on New 1201 Exemption to Enable Text and Data Mining Research

Posted April 8, 2021

Authors Alliance, joined by the Library Copyright Alliance and the American Association of University Professors, is petitioning the Copyright Office for a new three-year exemption to the Digital Millennium Copyright Act (“DMCA”) as part of the Copyright Office’s eighth triennial rulemaking process. If granted, our proposed exemption would allow researchers to bypass technical protection measures (“TPMs”) in order to conduct text and data mining research on motion pictures and on literary works that are distributed electronically. Yesterday, Authors Alliance participated in public hearings hosted by the Copyright Office to consider the merits of the proposed exemption.

Text and data mining (“TDM”) refers to automated analytical techniques aimed at analyzing digital text and data in order to generate information that reveals patterns, trends, and correlations in that text or data. TDM has great potential to enable groundbreaking research and contribute to the commons of knowledge. As a highly transformative use of copyrighted works done for purposes of research and scholarship, TDM fits firmly within the ambit of fair use.

But TDM researchers are currently hindered by Section 1201 of the DMCA, which prohibits the circumvention of TPMs used by copyright owners to control access to their works. Section 1201 makes TDM research on texts and films time-consuming and inefficient, and in some cases impossible, working against the progress of knowledge and the useful arts that copyright law is designed to promote.

At yesterday’s hearing, the clinical team from the Samuelson Law, Technology & Public Policy Clinic at UC Berkeley Law School representing Authors Alliance testified about the details of the exemption and its immense value for TDM researchers. The team explained how section 1201 prevents those researchers from creating the corpora of works they need to discover new insights from text and data mining, interfering with their ability to generate new copyrighted works that add to our cultural understanding and advance human knowledge.

Specifically, clinic students Ziyad Alghamdi, Tait Anderson, and Erin Moore and clinical supervisor Professor Erik Stallman explained how section 1201’s prohibitions chill new research and hinder the progress of knowledge in at least three ways: (1) forcing researchers to limit datasets in a way that makes their findings less illuminating than they would otherwise be, (2) causing researchers to artificially constrain research to public domain texts, and (3) leading researchers to abandon potential TDM projects altogether.

Opponents of the exemption testifying at the hearing, who represented publishers, the software industry, and content licensing organizations, raised concerns about whether TDM is fair use under copyright law, whether the proposed security measures for the TDM corpora are sufficient, and whether alternatives like pre-assembled TDM corpora would be adequate for TDM researchers.

Regarding fair use, Erin Moore testified that relevant case law firmly establishes TDM as a fair use, and that a use is not rendered unfair merely because the copyright holder could have licensed it. Moore also emphasized the noncommercial and educational nature of the uses TDM researchers seek to make under this exemption, which are classic features of fair use. To address opponents’ security concerns, Tait Anderson explained that the term “reasonable security measures” as used in our petition is concrete enough to require researchers to take precautions against public dissemination and unauthorized sharing, yet not so prescriptive that it cannot accommodate a wide range of TDM projects with different levels of sensitivity in the underlying data. On the topic of existing alternatives to the corpora that TDM researchers seek to compile, Ziyad Alghamdi highlighted the limitations of commercial TDM databases like HathiTrust, which are limited both in the scope of works they contain and in how TDM research can be conducted using those works. TDM researchers are seeking this exemption in part because these databases are costly, difficult to use, and too incomplete to answer research questions about contemporary literary works and films.

Other topics discussed during the lively hearing included whether the proposed exemption should align with similar carve-outs for TDM research in Europe and Japan, how sharing corpora with affiliated researchers for peer review purposes might work, and whether and how literary works and films should be analyzed differently for the purposes of the exemption. The Librarian of Congress is expected to issue a final decision on the proposed exemption in October 2021. We will keep our members and readers apprised of any updates as the process moves forward. We’re grateful to the law students from the Samuelson Law, Technology & Public Policy Clinic at UC Berkeley Law School for their work supporting our petition for this new exemption.