In December 2020, Authors Alliance, joined by the Library Copyright Alliance and the American Association of University Professors, filed a comment with the Copyright Office in support of a new three-year exemption to the Digital Millennium Copyright Act (“DMCA”) as part of the Copyright Office’s eighth triennial rulemaking process. If granted, our proposed exemption would allow researchers to bypass technical protection measures (“TPMs”) in order to conduct text and data mining research on literary works that are published electronically and motion pictures. This week, commenters who oppose the petition for this exemption were given an opportunity to respond to our proposed exemption.
Text and data mining (TDM) refers to automated analytical techniques aimed at analyzing digital text and data in order to generate information that reveals patterns, trends, and correlations in that text or data. TDM has great potential to enable groundbreaking research and contribute to the commons of knowledge. As a highly transformative use of copyrighted works done for purposes of research and scholarship, TDM fits firmly within the ambit of fair use.
But TDM researchers are currently hindered by Section 1201 of the DMCA, which prohibits the circumvention of TPMs used by copyright owners to control access to their works. Section 1201 makes TDM research on texts and films time consuming and inefficient—and in some cases, impossible—working against the promotion of the progress of knowledge and the useful arts that copyright law has been designed to incentivize. What’s more, Section 1201’s prohibitions force some TDM scholars to focus on works first published before 1925, which are in the public domain. Because authorship was far less diverse in 1925 than it is today, focusing TDM on pre-1925 texts privileges white male voices rather than being representative of authors contributing to the commons of knowledge today. For these reasons, our petition and supporting comments ask the Librarian of Congress to grant a new exemption to Section 1201’s anti-circumvention prohibitions that would allow researchers to bypass TPMs on e-books and films for the purpose of conducting TDM research.
Our response comment is due on March 10, 2021, and we look forward to working with the commenters to address their concerns and with the Copyright Office as it evaluates our petition for this new exemption to facilitate TDM research. TDM researchers who have information they would like to share with us to support our response are invited to contact us today.
The Librarian of Congress is expected to issue a final decision on the proposed exemption in October 2021. We will keep our members and readers apprised of any updates on our proposed exemption as the process moves forward. We’re grateful to law students from the Samuelson Law, Technology & Public Policy Clinic at UC Berkeley Law School for their work supporting our petition for this new exemption.
Authors Alliance is pleased to announce our partnership with Library Futures, a brand new organization which seeks to “empower libraries to fulfill their mission and provide non-discriminatory, open access to culture for the public good.” Last week, Library Futures officially launched with the stated goal of addressing the “deleterious impacts of an inequitable knowledge ecosystem.” The organization will engage in advocacy work, grant making, educational campaigns, and community building to effectuate its mission and work towards a technology-positive future for libraries.
We are excited to be a partner organization of Library Futures as it fights for equitable access to knowledge—an important issue for our members and authors writ large. Authors have an interest in a technology-forward future for libraries that ensures that readers, learners, and the general public can continue to discover and access their books in the digital age. We believe that the initiatives of Library Futures will help authors reach the audiences for which they write, advancing our own mission of supporting writers who write to be read.
Jennie Rose Halperin, the organization’s executive director, has said she is “honored to be leading this organization, which will take on major issues in libraries and help usher in a more inclusive digital future for teachers, learners, and researchers from every walk of life.” Library Futures board member Kyle Courtney has said he is hopeful that the organization can make real change on the issues of access and equity that are challenging libraries today: “Digital library books—when loaned correctly—can be a pivotal tool libraries use to preserve great works, provide patrons with access to books, and defend patron privacy. I hope the community will join us in standing up for the future of libraries.”
The Library Futures coalition, of which Authors Alliance is delighted to be a part, is a public interest alliance that “seeks to enable collective action while building power through an innovative advocacy organization.” Other coalition partners include the Internet Archive, Public Knowledge, Creative Commons, SPARC, and the Boston Public Library. We are excited to collaborate with Library Futures and our coalition partners to work towards a better, more equitable future for our libraries!
Last month was a busy one for copyright law (although we cannot fault you if you were distracted by other things going on in the world!). Now that the dust has settled on 2020, we are pleased to share this roundup of copyright developments that happened during the final weeks of last year. First, we saw a new draft bill seeking to reform the Digital Millennium Copyright Act (“DMCA”), and second, we saw two new copyright provisions included within the year-end stimulus package.
The Digital Copyright Act of 2021
In late December 2020, Senator Thom Tillis released a draft bill which aimed to make several reforms to the DMCA. Senator Tillis released this bill after posing a series of questions for stakeholders regarding how the DMCA could be reformed to reflect the needs of copyright holders and the state of the world 22 years after the DMCA was passed. Authors Alliance submitted a response to these questions, as did a multitude of other organizations and individuals. Our response cautioned against a notice-and-staydown system, and instead advised Senator Tillis that copyright law should seek to align the interests of individual creators with the interests of the public for whom they create. We also suggested several existing and new temporary exemptions to DMCA section 1201’s prohibition on bypassing technical protection measures that could be made permanent, and supported a proposal to streamline the section 1201 rule-making process. Finally, we argued that any reforming legislation should require a nexus between the relevant use and copyright infringement for there to be a violation of section 1201.
Senator Tillis’s bill proposes many reforms to copyright law, and unfortunately incorporates few of our suggestions. Most concerningly, the bill replaces the current “notice-and-takedown” system with a “notice-and-staydown” system whereby, once a copyright holder notifies a service provider that they believe a particular use is infringing, the service provider must remove all subsequent infringing uses unless the user makes a statement that the use is licensed or otherwise authorized by law (such as being a fair use). The draft bill also lowers the specificity required in takedown notices, establishes the Copyright Office as a division of the Department of Commerce, limits liability for users who use orphan works after a diligent but unsuccessful search for the copyright holder, and, with the aim of streamlining the process, makes changes to the Copyright Office’s triennial rule-making process and to the exemptions to the DMCA’s prohibition on bypassing technical protection measures. Senator Tillis has invited stakeholders to submit reply comments to the draft bill by March 5th.
Copyright Alternative in Small-Claims Enforcement Act of 2020 (CASE Act)
The year-end stimulus package included a provision Authors Alliance has spoken out against before: the CASE Act, co-sponsored by several members of Congress. In short, the CASE Act creates a small claims tribunal, known as the Copyright Claims Board (“CCB”), within the Copyright Office for copyright disputes as an alternative to pursuing copyright claims in federal court. Proponents of the CASE Act argue that it will help individual creators, who often cannot afford the expense of bringing litigation in federal court, but are more likely to be able to afford the lesser costs associated with pursuing the dispute in the CCB. A more accessible forum for resolving copyright disputes is an admirable goal, but the CASE Act seeks to achieve it in a way that is, in our view, extremely flawed. The CASE Act allows for excessive damages, does not provide for review by a court in most cases, and establishes an overall scheme that we fear will invite litigation by copyright trolls.
In September 2019, we wrote to Congress voicing our concerns about the CASE Act, but unfortunately it was signed into law last month as part of the year-end stimulus package, leading critics to note that it had little to nothing to do with the “must-pass spending bill.” The CCB is set to begin operations by the end of December 2021, unless the Copyright Office makes the determination to delay implementation.
Protecting Lawful Streaming Act of 2020
Also included in the year-end stimulus package was a provision known as the Protecting Lawful Streaming Act. The Act—sponsored and led by Senator Thom Tillis and Senator Patrick Leahy—targets and punishes “commercial, for profit” services that stream large amounts of copyrighted content without proper authorization. Senator Tillis has said that these services cost the U.S. economy billions of dollars annually. The provision drew attention in part because of its harsh penalties—violators can be sentenced to up to 10 years in prison.
The Protecting Lawful Streaming Act is not intended to apply to individual Internet users who access such unauthorized streams, and co-sponsor Senator Leahy has characterized the law as a “narrow” one which “target[s] only commercial, for-profit criminal piracy.” Critics have noted that there is no glaring need for harsher criminal penalties for copyright infringement, which can already be incredibly costly for alleged infringers, but also acknowledged that the Act is narrow enough that it is unlikely to create liability for individual users or institutional actors acting in good faith. This law is also unlikely to directly negatively affect authors, though we are always wary of expanding copyright liability where there may not be a particular need.
We are pleased to share the highlights of Authors Alliance’s work in 2020 to promote laws, policies, and practices that enable authors to reach wide audiences. Inside, you’ll find details of how we’re helping authors leverage their rights to make—and keep—their works available in the ways they want.
We thank Authors Alliance board member Thomas Leonard for this guest post. Leonard is a University Librarian Emeritus and a Professor of Journalism Emeritus at the University of California, Berkeley. He has served as the president of the Association of Research Libraries and as an Associate Editor of American National Biography. Leonard is also the author of three books on the development of American media.
This blog space has featured the most provocative books, briefs, and cases that our well-informed members discovered in 2020. With the benefit of hindsight, we probably made the wisest decision in illustrating a post, just before Thanksgiving, with the picture of an intense reader and Dr. Seuss’s Hop on Pop. No work we cited, after all, has gained more attention in the English-speaking world.
Hop on Pop is the only volume mentioned last year that has its own Wikipedia page and, word for word (the volume we pictured has fewer than 150), it may be the shortest literary work to earn the honor of a Wikipedia page. The lessons for IP grown-ups are many.
Theodor Seuss Geisel (1904-1991) has a place in the origin story of how open access publishing and an enormous corpus of digitized works open to readers took shape in the early years of this century. The Geisel Library at the University of California San Diego provided leadership. All the hard work of UC librarians across the state now rests safely in a full-text home for 8.5 million book titles. This repository has a name that Dr. Seuss would have loved: The HathiTrust, so spelled and punctuated, with a quizzical elephant the first thing you will see.
Hop on Pop draws on an honored tradition in children’s literature: Unaccountable but engaging violence. Authors Alliance pictured an edition of Hop on Pop that dialed this down. The volume that the girl is reading has been abridged so that it will easily fit into her small hands. “Night Fight” is one of the adventures that was cut, though plenty of mayhem remains. For those looking for bright ideas about how to adjust content to age level, here is one. (Older kids who use the longer Hop on Pop to build reading skills get the full story on all the ways to gouge and bash at home.)
If you have encountered Dr. Seuss scholarship (conveniently available on a post from the BBC’s culture desk), you may believe that Hop on Pop was Hop on Politics in Geisel’s mind. Famously, Dr. Seuss did this with the environmental movement in The Lorax. We need not take an excursion back to the eddies of World War II propaganda and Cold War defiance that marked Geisel’s career, as enlightening as this may be. It is the Seussville website where we should go, where the publisher of Hop on Pop for the past six decades has now done some surprising things.
Random House, owned by Bertelsmann, often does in court what we expect large media companies to do in protecting IP. Not long ago, the publisher challenged a mash-up of Dr. Seuss and Star Trek. (The Ninth Circuit recently ruled that this was not fair use.)
Seussville is obviously a marketing scheme to sell books and merchandise. But there is more to it than that. Random House offers lots of space for followers of Dr. Seuss to be creative with his work; the publisher preaches civic engagement. These are welcome but unexpected outcomes for guardians of intellectual property.
Scores of Seuss characters, games and other activities are “Printable” on this site, surely a godsend to COVID-confined Americans with young families or their online teachers. The spirit of Seussville is that Seuss creations should be used and shared; no one pokes you in the ribs to watch out for copyright. The content is free.
Dr. Seuss’s lessons go deeper than you’d expect, and not only in The Lorax tradition. One of the many elephants who is excited about the 2020 U.S. Census alerts kids that April 1, 2020 was the cut-off date for counting newborns. “Learn About Government with Dr. Seuss!” the site thundered during the 2020 campaign. Under the gaze of characters who were having bad hair days, school kids were instructed on how to cast ballots and, presciently, how to count them. This was a drive to get kids moving, “with a focus on the American Presidency!”
Much of IP talk is the talk of a zero-sum game, in which only a rights holder or a consumer can win. In 2021, and especially for kids, allowing for a liberal use of Dr. Seuss shows that this need not be true.
Earlier this month, we celebrated the new batch of literary works entering the public domain, and shared with you some common ways that works enter the public domain. Once a work is in the public domain, authors and the public at large can make any use of it in any way they wish, including uses that were formerly the exclusive right of the copyright holder. One such right is the right to prepare derivative works based on the public domain work. Derivative works are new works which build off of pre-existing works, such as translations or theatrical adaptations. Today, we will discuss new uses that can be made of works that have fallen into the public domain using examples from popular films and literature.
One new derivative work based on The Great Gatsby and published just this month is Michael Farris Smith’s Nick, a new prequel. Nick imagines Nick Carraway’s life prior to his time at West Egg, explores Nick’s trauma, and describes a stay in New Orleans after World War I. While the Fitzgerald Trust, which controls the rights to Fitzgerald’s works under copyright, has been selective in granting licenses to prepare derivative works based on Gatsby in the past, it can no longer “try and safeguard the text, to guide certain projects and try to avoid unfortunate ones.” For instance, one recently licensed derivative work of Gatsby was a graphic novel published in June 2020. Fitzgerald Trustee Blake Hazard “was closely involved with the graphic novel” and selected the illustrator herself. Now, anyone is free to use Gatsby as a building block for add-on creation like graphic novels without permission from the Fitzgerald Trust. And we are sure to see new derivative works emerge in the coming months and years: trade publishers are planning new hardcover editions, and fans have recently called for a Muppet version of the novel (though we note that this is complicated by the fact that Disney controls the copyright in the Muppets).
Derivative Works in Popular Culture
Derivative works based on works that have entered the public domain are nothing new. Shakespeare’s plays, which have always existed in the public domain because their publication predated the first copyright law, have inspired a multitude of beloved derivative works, from the films Ten Things I Hate About You (The Taming of the Shrew) and She’s the Man (Twelfth Night) to Ray Bradbury’s Something Wicked This Way Comes (Macbeth), as well as numerous loose retellings such as Brave New World (The Tempest) and even Disney’s The Lion King (Hamlet).
In fact, derivative works based on public domain works will themselves eventually enter the public domain once their copyrights expire, enabling the creation of new derivative works based on now-public domain derivative works. For example, the musical and film, West Side Story, is a derivative work based on Shakespeare’s Romeo and Juliet, a play which itself drew heavily on Ovid’s Pyramus and Thisbe, such that Romeo and Juliet too could be considered a derivative work. Both Romeo and Juliet and Pyramus and Thisbe were published prior to the passage of the first copyright law, but this example illustrates how derivative works based on public domain works can lead to the evolution of popular stories over time. In this way, creating derivative works based on works in the public domain fosters the development of culture and knowledge—a core purpose of copyright law.
Reaching New Audiences with Derivative Works
Derivative works can also enable the original work to reach new audiences. Shakespeare’s plays can be daunting for contemporary readers, using unfamiliar language and conventions. But the multitude of derivative works based on Shakespeare plays brings the stories to audiences who may not be interested in reading the original works, enhancing access to the stories in the process.
It may surprise you to learn that Disney, a colossal and vocal defender of copyright protection, has for decades taken advantage of the public domain to produce some of its most popular and successful films. In the 90s, Disney co-produced with Jim Henson’s studio two Muppets movies based on public domain books: Muppet Treasure Island and The Muppet Christmas Carol, based on out-of-copyright works by Robert Louis Stevenson and Charles Dickens respectively. The list goes on: Snow White, Cinderella, and Sleeping Beauty are all based on Grimms’ Fairy Tales; The Little Mermaid is based on a Hans Christian Andersen story, as is the more recent Frozen, a retelling of Andersen’s The Snow Queen. In general, the Disney adaptations made these stories more palatable for children, such as by changing the ending of The Little Mermaid from one in which “[Ariel’s] heart is broken when her prince marries someone else” and she ultimately sacrifices herself rather than kill the prince, as Ursula demands, to the happily-ever-after ending we know today.
In this way, new derivative works based on public domain works can enable the original work to reach new audiences. Public domain texts can be made freely available online for anyone to read, enhancing access to those texts for those without access to the print editions. Translations are derivative works which allow public domain texts to reach audiences who lack fluency in the work’s original language, and a wide variety of adaptations—from abridged versions for less advanced readers to so-called critical editions for college students—can help the work reach readers of different demographics.
The possibilities for add-on creation to works that have entered the public domain are endless. We encourage our members and readers to explore the public domain and discover new sources of inspiration!
Last week, we celebrated a new batch of works from 1925 entering the public domain. In copyright, the public domain is the commons of material that is not protected by copyright. When a work enters the public domain, anyone may do anything they want with the work, including activities that were formerly the “exclusive right” of the copyright holder like making copies of, sharing, and adapting the work.
Some people mistakenly think that the “public domain” means anything that is publicly available. This is wrong: The public domain has nothing to do with what is readily available for public consumption. Just because a work is freely available on the internet, for example, doesn’t mean the work is in the public domain. Under today’s copyright laws, copyright protection is automatic. This means, for example, that a photographer could take and upload a photograph to a publicly accessible website, and—despite its public availability online—unauthorized uses of the photograph may be infringing, unless the use is otherwise allowed under an exception to copyright.
Just how do works become a part of the public domain? In this post, we’ll share some of the ways in which works enter the public domain or simply exist as a part of the public domain because of the limits of copyright.
One way that works become a part of the public domain is the expiration of their copyright protection. Copyright protects works for a limited time and after that, the copyright expires and works fall into the public domain. Under U.S. copyright law, as of 2021, all works first published in the United States in 1925 or earlier are now in the public domain due to copyright expiration. Copyright law has changed over time and the term of copyright is now calculated based on the life of the author. Under today’s copyright laws, works created by an individual author today won’t enter the public domain until 70 years after the author’s death.
While 2021 brings certainty that works first published in the United States in 1925 are in the public domain, changes in copyright duration and renewal requirements during the 20th century mean that works first published in the United States between 1926 and March 1, 1989 could also be in the public domain because their copyrights were not renewed or because the copyright owner failed to comply with other “formalities” that used to be required for copyright protection. These formalities included requirements that the copyright owner register her work with the Copyright Office and mark the work with a copyright notice upon publication. Analysis from the New York Public Library revealed that approximately 75% of copyrights for books published between 1923 and 1964 were not renewed, meaning roughly 480,000 books from this period are most likely in the public domain.
Under today’s copyright laws, authors of new published works are no longer required to comply with any formalities to be eligible for copyright protection, though there are significant benefits to doing so.
Uncopyrightable Subject Matter
Copyright law is not unlimited. There are certain things that are seen as fundamental building blocks of creativity and authorship and are therefore simply not protected by copyright, entering the public domain automatically.
An important category of things that are not copyrightable are facts—even if those facts are obscure or were difficult to collect. For instance, suppose that a historian spent several years reviewing field reports and compiling an exact, day-by-day chronology of military actions during the Vietnam War. Even though the historian expended significant time and resources to create this chronology, the facts themselves would be free for anyone to use. That said, the way that the facts are expressed—such as how they are articulated in an article or a book—is copyrightable. The lack of copyright protection for facts is central to copyright law: Even “asserted truths,” or information presented as factual which later turns out to be untrue, are part of the public domain.
Ideas, themes, and scènes à faire are categories of expression that are also outside of copyright protection. These concepts are closely related, and the overarching justification for excluding them from copyright protection is that they are simply too general and standard to a particular genre or convention for an individual creator to be granted a temporary monopoly on them. Here again, though copying the words used to express the idea or theme could constitute infringement, the similarity of general ideas, themes, or other elements of a work which are standard in the treatment of a given topic cannot form the basis of an infringement claim. For more on ideas, themes, and scènes à faire, check out our post on uncopyrightable subject matter for fiction writers.
The U.S. Copyright Office provides information about additional types of works and subject matter that do not qualify for copyright protection, including names, titles, and short phrases; typeface, fonts, and lettering; blank forms; and familiar symbols and designs. It is worth noting that other areas of intellectual property, such as patent or trademark law, could provide protection for categories that are not eligible for copyright protection.
The Copyright Act provides that works created by the United States federal government are never eligible for copyright protection, though this rule does not apply to works created by U.S. state governments or foreign governments. And under the government edicts doctrine, judicial opinions, administrative rulings, legislative enactments, public ordinances, and similar official legal documents are not copyrightable for reasons of public policy.
The U.S. Copyright Office also reminds potential registrants that works that “lack human authorship” are uncopyrightable, using as an example “a photograph taken by a monkey.” Sound familiar?
Abandonment / No Rights Reserved
In theory, a copyright owner can voluntarily abandon her copyright prior to the expiration of the work’s copyright term by engaging in an overt act reflecting the intent to relinquish her rights. Abandoned works then become part of the public domain, free from copyright and available for anyone to use.
Creative Commons offers a “No Rights Reserved” tool for copyright owners who wish to waive copyright interests in their works and thereby place them as completely as possible in the public domain. And recently, satirist Tom Lehrer added a statement to his website granting permission to the public to download and reuse his lyrics, noting that they “should be treated as though they were in the public domain.” That said, a scholarly article by Dave Fagundes and Aaron Perzanowski criticizes the current state of the law surrounding copyright abandonment. The authors assert that the lack of a clear, reliable way to abandon copyright frustrates authors who wish to abandon their copyrights, and the practical effectiveness of abandonment is undermined by the lack of a broadly accessible record of abandoned works.
Literary aficionados and copyright buffs alike have something to celebrate as we welcome 2021: A new batch of works published in 1925 is entering the public domain on January 1. In copyright, the public domain is the commons of material that is not protected by copyright. When a work enters the public domain, anyone may do anything they want with the work, including activities that were formerly the “exclusive right” of the copyright holder like copying, sharing, and adapting the work.
If you agree with BBC Culture’s assessment that the year 1925 was a “golden moment in literary history,” and maybe even “literature’s greatest year,” there is reason to be excited about the latest collection of books to enter the public domain in the United States. Some of the more recognizable titles include:
F. Scott Fitzgerald’s The Great Gatsby (published in 1925, renewed in 1953)
Theodore Dreiser’s An American Tragedy (published in 1925, renewed in 1953)
Copyright owners of works first published in the United States in 1925 needed to renew the work’s copyright in order to extend the original 28-year copyright term. Initially, the renewal term also lasted for 28 years, but over time the renewal term was extended to give the copyright holder an additional 67 years, for a total term of 95 years. This means that works that were first published in the United States in 1925—provided they were published with a copyright notice, were properly registered, and had their copyright renewed—are protected through the end of 2020.
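The term arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers in this post, not legal advice: the function names are hypothetical, and the sketch assumes a work that was published with notice, properly registered, and renewed, with protection running through the end of the calendar year.

```python
# Illustrative arithmetic only; assumes a U.S. work published with notice,
# properly registered, and renewed (the conditions described above).
INITIAL_TERM_YEARS = 28  # original copyright term
RENEWAL_TERM_YEARS = 67  # renewal term after later extensions


def last_protected_year(publication_year):
    """Return the final calendar year of protection (hypothetical helper)."""
    total_term = INITIAL_TERM_YEARS + RENEWAL_TERM_YEARS  # 28 + 67 = 95
    return publication_year + total_term


def public_domain_year(publication_year):
    """Return the year in which the work enters the public domain on January 1."""
    return last_protected_year(publication_year) + 1


print(last_protected_year(1925))  # 2020
print(public_domain_year(1925))   # 2021
```

For a work such as The Great Gatsby, published in 1925, this gives protection through the end of 2020 and entry into the public domain on January 1, 2021.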
So what new creativity might we have to look forward to with the current collection of 1925 works entering the public domain? Blake Hazard, F. Scott Fitzgerald’s great-granddaughter and a trustee of his literary estate, offers one possibility. Hazard told the Associated Press that, as The Great Gatsby’s 95 years of copyright protection were coming to a close, “We’re now looking to a new period and trying to view it with enthusiasm, knowing some exciting things may come. […] I would love to see an inclusive adaptation of Gatsby with a diverse cast. Though the story is set in a very specific time and place, it seems to me that a retelling of this great American story could and should reflect a more diverse America.”
Yesterday, Authors Alliance, joined by the Library Copyright Alliance and the American Association of University Professors, filed a comment with the Copyright Office for a new three-year exemption to the Digital Millennium Copyright Act (“DMCA”) as part of the Copyright Office’s eighth triennial rulemaking process. Our proposed exemption would allow researchers to bypass technical protection measures (“TPMs”) in order to conduct text and data mining research on both literary works that are published electronically and motion pictures.
Background: Section 1201 and Exemptions
Section 1201 of the DMCA prohibits the circumvention of TPMs used by copyright owners to control access to their works. It also prohibits the manufacture or sale of devices or programs designed to circumvent these TPMs. In other words, section 1201 prevents individuals from breaking digital locks on copyrighted works, even when they seek to make a fair use of those copyrighted works or engage in otherwise non-infringing activities.
Because section 1201’s prohibitions can interfere with fair and socially beneficial uses of copyrighted works, the DMCA also provides for a triennial rulemaking process to grant temporary exemptions to the prohibitions. Authors Alliance has participated in each 1201 rulemaking cycle since our founding, petitioning for exemptions and their renewals to help authors enjoy their rights while ensuring their creations reach new audiences during the 2015 and 2018 cycles. For the upcoming 2021 rulemaking, we have petitioned for a new exemption that would allow researchers to bypass TPMs on literary works distributed electronically and films for the purpose of conducting text and data mining (“TDM”) research, in addition to our petition to renew an exemption for multimedia e-books.
Text and Data Mining
Text and data mining refers to automated analytical techniques aimed at analyzing digital text and data in order to generate information that reveals patterns, trends, and correlations in that text or data. TDM has great potential to enable groundbreaking research and contribute to the commons of knowledge. As a highly transformative use of copyrighted works done for purposes of research and scholarship, TDM fits firmly within the ambit of fair use. But the current prohibition on bypassing TPMs in section 1201 makes TDM research on texts and films time consuming and inefficient—and in some cases, impossible—working against the promotion of the progress of knowledge and the useful arts that copyright law has been designed to incentivize.
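To make the idea of "revealing patterns and trends" concrete, here is a minimal sketch of one very simple TDM technique: tracking how often a term appears per decade. The corpus, the function names, and the chosen term are all hypothetical and invented for illustration; real TDM projects work on thousands of full texts and use far more sophisticated methods.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (publication year, text) pairs.
# These sentences are invented for illustration; they are not real data.
TOY_CORPUS = [
    (1952, "The editor read the manuscript and the editor smiled."),
    (1958, "An independent press printed the novel."),
    (1994, "The conglomerate bought the press and the imprint."),
    (1999, "The conglomerate merged with another conglomerate."),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency_by_decade(corpus, term):
    """Return {decade: share of all tokens in that decade equal to term}."""
    term_counts = Counter()
    total_counts = Counter()
    for year, text in corpus:
        decade = (year // 10) * 10
        tokens = tokenize(text)
        total_counts[decade] += len(tokens)
        term_counts[decade] += tokens.count(term)
    return {d: term_counts[d] / total_counts[d] for d in sorted(total_counts)}


trend = relative_frequency_by_decade(TOY_CORPUS, "conglomerate")
for decade, share in trend.items():
    print(f"{decade}s: {share:.3f}")
```

Note that the analysis only produces aggregate statistics about the texts; it does not reproduce the works themselves for readers.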
Because literary works distributed electronically and motion pictures are protected by TPMs, researchers, unable to bypass these TPMs due to section 1201, can turn instead to works in the public domain for their TDM research. With regard to films, this avenue is effectively unavailable, since works published after 1925 generally remain under copyright. For literary TDM scholars, literary works published before 1925 remain a potential alternative area of study, but focusing TDM on pre-1925 texts “further reinscribes white men as the center of the field and further marginalizes women and people of color.” Authorship was far less diverse in 1925 than it is today, so TDM research on public domain texts ends up privileging white male voices rather than being representative of authors contributing to the commons of knowledge today.
Our petition for a TDM exemption is accompanied by letters of support from 14 separate authors and researchers currently engaged in TDM research on literary works and films whose work has been hampered by section 1201, and two additional letters from experts who support TDM researchers. Here are just a few examples of their experiences:
The Data Sitters Club is a group of scholars under the Stanford University Literary Lab, “a research collective that applies computational criticism, in all its forms, to the study of literature.” The Data Sitters Club explores research questions in relation to the well-known Baby-Sitters Club series, a series for elementary- and middle-school-aged girls that was popular primarily in the 80s and 90s. The group would like to use computational analysis to investigate the extent to which the characters have distinct voices and explore the series’ treatment of religion, race, adoption, divorce, and disability. The Data Sitters Club sees its study as a step towards exploring the worldview of American women in their 30s and 40s who read the Baby-Sitters Club books as children. It also has the goal of investigating common tropes in the books to explore these questions further.
There are over 200 books in the series, yet literary scholarship on the Baby-Sitters Club is sparse. Given this gap, the power of TDM to yield new insights into large quantities of text, and the formative effect of children’s literature on its readers, the group sees a particular impetus to explore how the “iconic depiction of girlhood in the upper-middle-class American suburbs” has both mirrored and shaped its readers’ views of the world. Yet because the Baby-Sitters Club books were all written during the latter half of the 20th century, they remain under copyright, and the e-book versions are protected by technical protection measures, as is almost always the case with e-books. Because of section 1201’s prohibition on bypassing TPMs, the Data Sitters Club cannot use the Baby-Sitters Club e-books for its project and is instead forced to manually scan physical books and correct any transcription errors before applying computational analysis to the texts, limiting the number of texts the group can study and detracting from the time it can spend on its important research questions.
Professor Dan Sinykin, an assistant professor at Emory University who teaches English and computational analysis, is currently at work on a book, The Conglomerate Era, which seeks to explore how the conglomeration of U.S. publishing changed fiction: in the 1950s, almost every publisher in the country was independent, but today, despite the continued presence of some independent publishers in the ecosystem, only five multinational media conglomerates dominate the trade market (soon to be four, with the planned merger of Penguin Random House and Simon & Schuster). Professor Sinykin would like to use TDM “to detect patterns of change across thousands of novels across decades” in a groundbreaking exploration of literary history. However, because he seeks to study works published after 1945, which remain protected under copyright, Professor Sinykin’s project is made much more difficult by section 1201’s prohibition on bypassing technical protection measures.
Because he cannot use the e-book versions of late-20th century novels to do his analysis, Professor Sinykin must use HathiTrust, a digital corpus of works under copyright that scholars can use for TDM purposes with subscriptions or institutional affiliations. Professor Sinykin points out the weaknesses of this approach: HathiTrust’s “data capsules” are cumbersome to use, offer limited computing power, and are difficult to access securely. The HathiTrust capsules are also limited to “holdings of select university libraries” and are not representative of fiction during the time period Sinykin wishes to study. Importantly, HathiTrust is not free, making the type of research Sinykin is currently undertaking inaccessible to scholars with fewer resources. If Professor Sinykin could bypass TPMs on e-books and use those for his project, he could use more representative fiction texts and would thus be enabled to “write a better, truer book about conglomeration.” He could also teach TDM to his students, the next generation of scholars, to ensure that this work continues in the future.
Professor David Bamman is an assistant professor at UC Berkeley whose research focuses on natural language processing and cultural analytics, and whose current TDM project involves films. Professor Bamman also has experience applying natural language processing to a digitized collection of books which he and his team manually scanned themselves (similar to the Data Sitters Club’s workaround) due to concern over section 1201 liability if they instead bypassed TPMs.
In 2018, he became interested in applying TDM techniques (specifically, computer vision and video processing techniques) to film, and decided to compile a data set of films to explore whether directorial style in movies can be measured and quantified. Professor Bamman estimated that a dataset of approximately 10,000 films would allow him to conduct this research and explore how directorial style can be decomposed and measured, such as through types and lengths of shots and the color palette used in the film. Yet, cognizant of section 1201’s prohibition on bypassing technical protection measures, Professor Bamman purchased individual DVDs and underwent the burdensome process of playing them on a computer and using “screen-capture” software to record each movie as it played in real time. This method does not violate section 1201, but it proved insufficient for Professor Bamman’s project, as it would apparently have taken a human operator 10 years to manually screen capture enough films for him to complete his corpus. As a result, Professor Bamman has abandoned this line of research, despite seeing immense value in research questions around “historical trends in film over the past century.”
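A rough back-of-the-envelope calculation shows why real-time screen capture does not scale to a corpus of this size. In the sketch below, only the 10,000-film target comes from the account above; the average runtime and the operator's working hours are illustrative assumptions, not figures from Professor Bamman's letter.

```python
# Only FILMS_NEEDED comes from the text; the other inputs are assumptions.
FILMS_NEEDED = 10_000           # corpus size estimated in the text
AVG_RUNTIME_HOURS = 2.0         # assumption: average feature-film length
WORK_HOURS_PER_YEAR = 40 * 50   # assumption: 40 hours/week, 50 weeks/year


def years_to_capture(films, runtime_hours, hours_per_year):
    """Operator-years needed when each film must play in real time."""
    total_hours = films * runtime_hours
    return total_hours / hours_per_year


estimate = years_to_capture(FILMS_NEEDED, AVG_RUNTIME_HOURS, WORK_HOURS_PER_YEAR)
print(f"{estimate:.1f} operator-years")
```

Under these assumptions the estimate is on the same order as the ten years reported, because capture time can never be shorter than the combined running time of the films.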
We’re grateful to law students from the Samuelson Law, Technology & Public Policy Clinic at UC Berkeley Law School for their work preparing the comment. Responses from commenters who oppose the petition for this exemption are due February 9, 2021 and further comments in support of the petition, or from those who neither support nor oppose the petition, are due March 10, 2021. The Librarian of Congress is expected to issue a final decision on the proposed exemption in October 2021. We will keep our members and readers apprised of any updates on our proposed exemption as the process moves forward.
Since 2014, you have helped us fulfill our mission to advance the interests of authors who want to serve the public good by sharing their creations broadly. If 2020 has made anything clear, it is that empowering authors in the public sphere is more important than ever before.
We’re proud of our accomplishments in 2020, but we cannot continue to do this work without your support. Please consider making a tax-deductible donation today to help us carry on our work in 2021. Every contribution enables us to do our part to help you keep writing to be read!