Update: Post-Hearing Letter on 1201 Exemption to Enable Text and Data Mining Research

Posted May 25, 2021
Photo by Michael Dziedzic on Unsplash

Authors Alliance, joined by the Library Copyright Alliance and the American Association of University Professors, is petitioning the Copyright Office for a new three-year exemption to the Digital Millennium Copyright Act (“DMCA”) as part of the Copyright Office’s eighth triennial rulemaking process. If granted, our proposed exemption would allow researchers to bypass technical protection measures (“TPMs”) in order to conduct text and data mining research on electronically distributed literary works and on motion pictures. Last week, Authors Alliance responded to questions the Copyright Office posed after last month’s public hearing, as the Office weighs the merits of the proposed exemption.

Text and data mining (“TDM”) refers to automated techniques for analyzing digital text and data in order to generate information that reveals patterns, trends, and correlations in that text or data. TDM has great potential to enable groundbreaking research and contribute to the commons of knowledge. As a highly transformative use of copyrighted works for purposes of research and scholarship, TDM fits firmly within the ambit of fair use. But TDM researchers are currently hindered by Section 1201 of the DMCA, which prohibits the circumvention of TPMs that copyright owners use to control access to their works. Section 1201 makes TDM research on texts and films time-consuming, inefficient, and in some cases impossible, working against the progress of knowledge and the useful arts that copyright law is designed to promote.

Following last month’s hearing, the Copyright Office asked proponents and opponents of the proposed exemption to (1) describe the minimum security measures that eligible institutions should be required to use to secure corpora of literary works or motion pictures used for TDM research, and (2) share views on potential regulatory language that would limit a researcher’s ability to view literary works or motion pictures included in corpora. In addition, opponents were given the opportunity to respond to the changes we proposed in our reply comment to address their concerns.

Security Standards and Controls

With respect to security measures, our response describes the flexible process that information security and data management professionals at research institutions use to select and apply security controls to research data. This approach tracks the processes laid out in international standards and federal agency procedures. We explain why these risk assessment frameworks are superior to a fixed list of minimum security requirements applied uniformly across all settings, and how they are consistent with the Office’s approach to information security in previous exemptions.

Our letter provides examples of common and effective security controls used in many research settings, including user authentication, encryption, event logging, and physical security for the resources that house the data. We recommend that the Office identify these controls as examples of reasonable security measures, while leaving room for information security departments and researchers to tailor the precise controls to the specific research corpus and the information system in which it is housed.

Prohibiting Researchers from Viewing Text and Images

With respect to the extent to which regulatory language should limit a researcher’s ability to view literary works or motion pictures included in corpora, we clarify that researchers do not need this exemption to view the full text or images of works that they or their institutions have already lawfully obtained. Researchers must, however, be able to verify their research methods and results, and this verification requires some ability to view corpus text or images. That ability is consistent with the research environments of both HathiTrust Data Capsules and Google Book Search, and it is consistent with fair use precedent.

Our letter explains why researchers need to view enough of a corpus to verify their methods and findings. For example, suppose an algorithm tells the researcher that frame #133292 of a corpus copy has a high probability of depicting a scene of violence, and that frame corresponds to a scene in the film Pulp Fiction. The researcher would not watch a copy of the DVD or digital download in its original format to verify that finding. But at some point, the researcher or a peer reviewer may need to locate and examine frame #133292, a designation that exists only in the corpus copy, to verify the algorithmic finding. Our response explains that a blanket prohibition on viewing text or images would comprehensively undermine TDM research relying on the exemption while providing little added value or protection, given the other restrictions in the proposed exemption. For this reason, although we do not believe an express viewability limitation is warranted, we recommend that, should the Office choose to include one, it follow the model of the HathiTrust Research Center’s Non-Consumptive Use Policy rather than impose an outright ban on viewing.

* * *

The Librarian of Congress is expected to issue a final decision on the proposed exemption in October 2021. We will keep our members and readers apprised of any updates on our proposed exemption as the process moves forward. We’re grateful to law students and faculty from the Samuelson Law, Technology & Public Policy Clinic at UC Berkeley Law School for their work supporting our petition for this new exemption.