Prosecraft, text and data mining, and the law

Posted August 14, 2023

Last week you may have read about a website called prosecraft, a site with an index of some 25,000 books. It provided a variety of data about the texts (how long, how many adverbs, how much passive voice), a chart showing sentiment analysis of each work in its collection, and short snippets from the texts themselves: two paragraphs representing the most and least vivid passages. Overall, it was a somewhat interesting tool, promoted to authors as a way to better understand how their work compares to other published works.

The news cycle about prosecraft centered on the campaign to get its creator, Benji Smith, to take the site down (he now has) based on allegations of copyright infringement. A Gizmodo story about it generated lots of attention, and it’s been written up extensively, for example here, here, here, and here.

It’s been written about enough that I won’t repeat the whole saga here. However, I think a few observations are worth sharing:

1) Don’t get your legal advice from Twitter (or whatever it’s called)

“Fair Use does not, by any stretch of the imagination, allow you to use an author’s entire copyrighted work without permission as a part of a data training program that feeds into your own ‘AI algorithm.’” – Linda Codega, Gizmodo (a sentiment that was retweeted extensively)

Fair use actually allows quite a few situations where you can copy an entire work, including situations when you can use it as part of a data training program (and calling an algorithm “AI” doesn’t magically transform it into something unlawful). For example, way back in 2002 in Kelly v. Arriba Soft, the 9th Circuit concluded that it was fair use to make complete copies of images found on the internet for the purpose of enabling web image search. Similarly, in AV ex rel Vanderhye v. iParadigms, the 4th Circuit in 2009 concluded that it was fair use to make full text copies of academic papers for use in a plagiarism detection tool.

Most relevant to prosecraft, in Authors Guild v. HathiTrust (2014) and Authors Guild v. Google (2015), the Second Circuit held that Google’s copying of millions of books for purposes of creating a massive search engine of their contents was fair use. Google produced full-text searchable databases of the works, and displayed short snippets containing whatever term the user had searched for (quite similar to prosecraft’s outputs). That functionality also enabled a wide range of computer-aided textual analysis, as the court explained:

The search engine also makes possible new forms of research, known as “text mining” and “data mining.” Google’s “ngrams” research tool draws on the Google Library Project corpus to furnish statistical information to Internet users about the frequency of word and phrase usage over centuries.  This tool permits users to discern fluctuations of interest in a particular subject over time and space by showing increases and decreases in the frequency of reference and usage in different periods and different linguistic regions. It also allows researchers to comb over the tens of millions of books Google has scanned in order to examine “word frequencies, syntactic patterns, and thematic markers” and to derive information on how nomenclature, linguistic usage, and literary style have changed over time. Authors Guild, Inc., 954 F.Supp.2d at 287. The district court gave as an example “track[ing] the frequency of references to the United States as a single entity (‘the United States is’) versus references to the United States in the plural (‘the United States are’) and how that usage has changed over time.”
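The district court’s example, comparing “the United States is” with “the United States are” over time, can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `phrase_counts_by_year` and the toy corpus are invented for illustration, the matching is naive substring counting, and nothing here reflects how Google’s ngrams tool is actually built.

```python
from collections import Counter


def phrase_counts_by_year(corpus, phrases):
    """Count case-insensitive occurrences of each phrase per publication year.

    corpus: iterable of (year, text) pairs. This is a hypothetical stand-in
    for a collection of scanned books, not a real data format.
    """
    counts = {phrase: Counter() for phrase in phrases}
    for year, text in corpus:
        # Lowercase and collapse whitespace so line breaks do not hide a phrase.
        normalized = " ".join(text.lower().split())
        for phrase in phrases:
            # Naive substring match; a real tool would tokenize the text first.
            counts[phrase][year] += normalized.count(phrase.lower())
    return counts


# Invented toy sentences, not real historical data.
toy_corpus = [
    (1850, "The United States are a confederation. The United States are growing."),
    (1850, "Some say the United States is one nation."),
    (1950, "The United States is a world power. The United States is large."),
]

result = phrase_counts_by_year(
    toy_corpus, ["the United States is", "the United States are"]
)
for phrase, by_year in result.items():
    print(phrase, dict(by_year))
```

Note that the output is a table of counts per year: the analysis produces facts about the texts rather than a substitute for reading them.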

While there are a number of generative AI cases pending (a nice summary of them is here) that I agree raise some additional legal questions beyond those directly answered in Google Books, the kind of textual analysis that prosecraft offered seems remarkably similar to the kinds of things that the courts have already said are permissible fair uses.

2) Text and data mining analysis has broad benefits

Not only is text mining fair use, it also yields some amazing insights that truly “promote the progress of Science,” which is what copyright law is all about. Prosecraft offered some pretty basic insights into published books: how long, how many adverbs, and the like. I can understand opinions being split on whether that kind of information is actually helpful for current or aspiring authors. But text mining can reveal so much more.
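To give a sense of how basic these statistics are, here is a rough sketch of a word count, a sentence count, and an adverb estimate. It is not prosecraft’s method, which was not published in this post. The function `basic_text_stats` is hypothetical, and counting words that end in “-ly” is only a crude proxy for adverbs, assumed here to keep the example free of external libraries.

```python
import re


def basic_text_stats(text):
    """Return simple length statistics and a crude adverb estimate for a text."""
    words = re.findall(r"[A-Za-z']+", text)
    # Treat runs of ., ! or ? as sentence boundaries (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Heuristic assumption: words ending in "ly" stand in for adverbs.
    # It miscounts words such as "family" and misses adverbs such as "often".
    ly_words = [w for w in words if len(w) > 3 and w.lower().endswith("ly")]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "ly_word_count": len(ly_words),
        "ly_share": len(ly_words) / len(words) if words else 0.0,
    }


sample = "She ran quickly. He spoke softly, and the family waited."
stats = basic_text_stats(sample)
print(stats)
```

In the sample, “family” is counted alongside “quickly” and “softly,” which shows why serious projects use part-of-speech tagging instead of a suffix rule.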

In the submission Authors Alliance made to the US Copyright Office three years ago in support of a Section 1201 Exemption permitting text data mining, we explained:

TDM makes it possible to sift through substantial amounts of information to draw groundbreaking conclusions. This is true across disciplines. In medical science, TDM has been used to perform an overview of a mass of coronavirus literature. Researchers have also begun to explore the technique’s promise for extracting clinically actionable information from biomedical publications and clinical notes. Others have assessed its promise for drawing insights from the masses of medical images and associated reports that hospitals accumulate.

In social science, studies have used TDM to analyze job advertisements to identify direct discrimination during the hiring process. It has also been used to study police officer body-worn camera footage, uncovering that police officers speak less respectfully to Black than to white community members even under similar circumstances.

TDM also shows great promise for drawing insights from literary works and motion pictures. Regarding literature, some 221,597 fiction books were printed in English in 2015 alone, more than a single scholar could read in a lifetime. TDM allows researchers to “‘scale up’ more familiar humanistic approaches and investigate questions of how literary genres evolve, how literary style circulates within and across linguistic contexts, and how patterns of racial discourse in society at large filter down into literary expression.” TDM has been used to “observe trends such as the marked decline in fiction written from a first-person point of view that took place from the mid-late 1700s to the early-mid 1800s, the weakening of gender stereotypes, and the staying power of literary standards over time.” Those who apply TDM to motion pictures view the technique as every bit as promising for their field. Researchers believe the technique will provide insight into the politics of representation in the Network era of American television, into what elements make a movie a Hollywood blockbuster, and into whether it is possible to identify the components that make up a director’s unique visual style [citing numerous letters in support of the TDM exemption from researchers].

3) Text and data mining is not new and it’s not a threat to authors

Text mining of the sort it seemed prosecraft employed isn’t some kind of new phenomenon. Marti Hearst, a professor at UC Berkeley’s iSchool, explained the basics in this classic 2003 piece. Each year, scores of computer science students experiment in their courses with projects that do almost exactly what prosecraft was producing. Textbooks like Matt Jockers’s Text Analysis with R for Students of Literature have been widely used and adopted all across the U.S. to teach these techniques. Our submissions during our petition for the DMCA exemption for text and data mining back in 2020 included 14 separate letters of support from authors and researchers engaged in text data mining research, and even more researchers are currently working on TDM projects. While fears over generative AI may be justified for some creators (and we are certainly not oblivious to the threat of various forms of economic displacement), it’s important to remember that text data mining on textual works is not the same as generative AI. On the contrary, it is a fair use that enriches and deepens our understanding of literature rather than harming the authors who create it.

Copyright Office Holds Listening Session on Copyright Issues in AI-Generated Visual Works

Earlier this week, the Copyright Office convened a second listening session on the topic of copyright issues in AI-generated expressive works, a part of its initiative to study and understand the issue, and following its listening session on copyright issues in AI-generated textual works a few weeks back (in which Authors Alliance participated). Tuesday’s sessions covered copyright issues in images created by generative AI programs, a topic that has garnered substantial public attention and controversy in recent months.

Participants in the listening sessions included a variety of professional artist organizations like National Press Photographers Association, Graphic Artists Guild, and Professional Photographers of America; companies that have created the generative AI tools under discussion, like Stability AI, Jasper AI, and Adobe; several individual artists; and a variety of law school professors, attorneys, and think tanks representing varied and diverse views on copyright issues in AI-generated images. 

Generative AI as a Powerful Artistic Tool

Most if not all of the listening sessions’ participants agreed that generative AI programs had the potential to be incredible tools for artists. Like earlier technological developments such as manual cameras and, much more recently, image editing software like Photoshop, generative AI programs can minimize or eliminate some of the “mechanical” aspects of creation, making creation less time-consuming. But participants disagreed on the impact these tools are having on artists and whether the tools themselves or copyright law ought to be reformed to address these effects. 

Visual artists, and those representing them, tended to caution that these tools should be developed in a way that does not hurt the livelihoods of the artists who created the images the programs are trained on. While a more streamlined creative process makes things easier for artists relying on generative AI in their creation, it could also mean fewer opportunities for other artists. When a single designer can easily create background art with Midjourney, for example, they might not need to hire another designer for that task. This helps the first designer to the detriment of the second. Those representing the companies that create and market generative AI programs, including Jasper AI and Stability AI, focused on the ways that their tools are already helping artists: these tools can generate inspiration images as “jumping off points” for visual artists and lower barriers to entry for aspiring visual artists who may not have the technical skills to create visual art without support from these kinds of tools, for example.

On the other hand, some participants voiced concerns about ethical issues in AI-generated works. A representative from the National Press Photographers Association mentioned concerns that AI-generated images could be used for “bad uses,” and creators of the training data could be associated with these kinds of uses. Deepfakes and “images used to promote social unrest” are some of the uses that photojournalists and other creators are concerned about. 

Copyright Registration in AI-Generated Visual Art

Several participants expressed approval of the Copyright Office’s recent guidance regarding registration in AI-generated works, but others called for greater clarity in the registration guidance. The guidance reiterates that there is no copyright protection in works created by generative AI programs, because of copyright’s human authorship requirement. It instructs creators that they can only obtain copyright registration for the portions of the work they actually created, and must disclose the role of generative AI tools in creating their works if it is more than de minimis. An author can also obtain copyright protection for a selection and arrangement of AI-generated works as a compilation, but not in the AI-generated images themselves. Yet open questions, particularly in the context of AI-generated visual art, remain: how much does an artist need to add to an image to render it their own creation, rather than the product of a generative AI tool? In other words, how much human creativity is needed to transform an AI-generated image into the product of original human creation for the purposes of copyright? How are we to address situations where a human and AI program “collaborate” on the creation of a work? The fact that the Office’s guidance requires applicants to disclose if they used AI programs in the creation of their work also leaves open questions. If an artist uses a generative AI program to create just one element of a larger work, or as a tool for inspiration, must that be disclosed in copyright registration applications? 

The attorney for Kristina Kashtanova, the artist who applied for a copyright registration for her graphic novel, Zarya of the Dawn, also spoke. If you haven’t been tracking it, Zarya of the Dawn included many AI-generated images and sparked many of the conversations around copyright in AI-generated visual works (you can read our previous coverage of the Office’s decision letter on Zarya of the Dawn here). Kashtanova’s attorney raised more questions about the registration guidance. She pointed out that the amount of creativity required to create a copyrighted work is very low: there need only be a “modicum” of creativity, meaning that vast quantities of works (like each of the photographs we take with our smartphones) are eligible for copyright protection. Why, then, is the bar higher when it comes to AI-generated works? Kashtanova certainly had to be quite creative to put together her graphic novel, and the act of writing a prompt for the image generator, refining that prompt, and re-prompting the tool until the creator gets an image they are satisfied with requires a fair amount of creative human input, more, one might argue, than is required to take a quick digital photograph. The registration guidance attempts to solve the problem of copyright protection in works not created by a human, but in so doing, it creates different copyrightability standards for different types of creative processes.

These questions will become all the more relevant as artists increasingly rely on AI programs to create their works. The representative from Getty Images stated that more than half of their consumers now use generative AI programs to create images as part of their workflows, and several of the professional artist organizations noted that many of their members were similarly taking up generative AI tools in their creation.

Calls For Greater Transparency

Many participants expressed a desire for the companies designing and making available generative AI programs to be more transparent about the contents of these tools’ training data. This appealed both to artists who were concerned that their works were used to train the models, and felt this was fundamentally unfair, and to those with ethical concerns around scraping or potential copyright infringement. In response to these critiques, Adobe explained that it sought to develop its new AI image generator, Firefly (which is currently in beta testing), in a way that addresses these kinds of concerns. Adobe explained that it planned to train its tool on openly licensed images, seeking to “drive transparency standards” and “deploy [the] technology responsibly in a way that respects creators and our communities at large.” The representative from Getty Images also called for greater transparency in training data. Getty stated that transparency could help mitigate the legal and economic risks associated with the use of generative AI programs: potential copyright claims as well as the possibility of harming the visual artists who created the underlying works they are trained on.

Opt-Outs and Licensing 

Related to calls for transparency, much of the discussion centered around attempts to permit artists to opt out of having their works included in the training data used for generative AI programs. Several participants discussed a “do not train” tag as a way for creators to opt out of being included in the training data, much like robots.txt, a file that allows websites to tell web crawlers and other web robots which parts of the site they should not visit. Adobe said it intended to train its new generative AI tool, Firefly, on openly licensed images and make it easy for artists to opt out with a “do not train” tag, apparently in response to these types of concerns. Yet some rightsholder groups pointed out that compliance with this tag may be uneven. Indeed, robots.txt itself is a voluntary standard, and so-called bad robots like spam bots often ignore it.
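For readers unfamiliar with how robots.txt works in practice, the sketch below uses Python’s standard `urllib.robotparser` module to check a set of rules the way a well-behaved crawler would. The crawler names `ExampleTrainingBot` and `SomeSearchBot`, the rules, and the URLs are all made up for illustration. It demonstrates only robots.txt, not any actual “do not train” tag, whose format the participants did not specify.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block one (invented) training crawler
# entirely, and keep every other crawler out of /private/ only.
rules = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant crawler asks before fetching; nothing technically stops a
# non-compliant crawler from skipping this check.
training_bot_allowed = parser.can_fetch(
    "ExampleTrainingBot", "https://example.com/gallery/art.png"
)
search_bot_allowed = parser.can_fetch(
    "SomeSearchBot", "https://example.com/gallery/art.png"
)
search_bot_private = parser.can_fetch(
    "SomeSearchBot", "https://example.com/private/draft.png"
)

print(training_bot_allowed, search_bot_allowed, search_bot_private)
```

The check happens entirely on the crawler’s side, which is why the rightsholder groups’ worry about uneven compliance applies to any opt-out signal modeled on this approach.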

Works available under permissive licenses like Creative Commons’ various licenses have been suggested as good candidates for training data to avoid potential rights issues, though several participants pointed out that there may be compliance issues when it comes to commercial uses of these tools, as well as attribution requirements. And the participant representing the American Society for Collective Rights Licensing voiced support for proposals to implement a collective licensing scheme to compensate artists whose works are used to train generative AI programs, echoing earlier suggestions by groups such as the Authors Guild.

One visual artist argued fervently that an opt out standard was not enough: in her view, visual artists should have to opt in to having their works included in training data, as, in her view, an opt out system harms artists without much of an online presence or the digital literacy to affirmatively opt out. In general, the artist participants voiced strong opposition to having their works included without compensation, a position many creators with concerns about generative AI have taken. But Jasper AI expressed its view that training generative AI programs with visual works found across the Internet was a transformative use of that data, all but implying that this kind of training was fair use (a position Authors Alliance has taken). It was notable that so few participants suggested that the ingestion of visual works of art for the purposes of training generative AI programs was a fair use, particularly compared to the arguments in the listening session on text-based works. This may well be due to ongoing lawsuits, inherent differences between image based and text based outputs, or the general tenor of conversations around AI-generated visual art. Many of the participants spoke of anecdotal evidence that graphic artists are already facing job loss and economic hardship as a result of the emergence of AI-generated visual art.

‘Negotiating with the Dead’

Posted January 30, 2023

This is a guest post by Meera Nair, PhD, Copyright Specialist for the Northern Alberta Institute of Technology (NAIT), commenting on the recent extension of copyright term in Canada. It was originally published at

When it became evident that our copyright term was to be extended by twenty years, with no measures to mitigate the excess damage wrought by such action, Margaret Atwood’s book of this title kept returning to mind. A foray into the relationships that exist between writers and writing, a book where the word copyright did not feature among those ruminations, the title nonetheless feels apt for the days ahead.

Works of long-since-dead authors will now—in the best of situations—literally become objects of negotiation. This is purportedly to the benefit of those authors’ heirs, whereas on balance the true beneficiaries will be international publishing conglomerates and collective societies. In the worst of situations though, works will simply fade away with no surviving copy to emerge seventy years after their authors’ deaths. Those authors will be forgotten, and the public domain will remain poorer.

Atwood has been a prominent advocate for a stronger scope of protection in the name of copyright, famously remembered for her characterization of exceptions as expropriation and theft during a Standing Committee Meeting of the Department of Canadian Heritage in 1996. Two decades later, when she gave the 2016 CLC Kreisel Lecture at the University of Alberta, fair dealing was called out by name. Nonetheless, that lecture was a delight to listen to, grounded as it was on Atwood’s own experiences of being a Canadian writer.

It is her life that lies at the foundation of Negotiating, which took form through the Empson Lectures at the University of Cambridge in 2000. The combination of literature, literary criticism, book history, and history itself, written as only Margaret Atwood can, makes for compelling reading. In this book she comes perhaps closest to answering an age-old question about writing: what does it mean to write? There is no neat and tidy answer; at the very least it is blood, sweat, and tears amid negotiations between oneself, the society of the living, but also that of the dead.

To be sure, financial wherewithal is relevant to any impetus to write. Money appears approximately three times among the 74 reasons for writing taken “from the words of writers themselves (xx-xxii).” Yet, perhaps unintentionally, Atwood lays bare why copyright was not, nor ever will be, a broad determinant of success (either literary or material) for Canadian writers and publishers. From identifying the limitations of the Canadian publishing sector in the early to mid-twentieth century (to say there was disinterest in Canadian authors is putting it mildly), to stripping away the facades of originality and individuality (which underpin copyright’s structure of rights) in literary endeavor, there is much here to remind us that Canada’s phenomenal success in developing literary talent (see here and here) has occurred despite copyright, not because of it.

After borrowing the book repeatedly from the Edmonton Public Library, I had to buy it. Or rather, I had to buy it in the original form. Because what I had borrowed was a book titled On Writers and Writing, by Margaret Atwood, identified as a Canadian reprint of her earlier work, Negotiating with the Dead.

My preference was to buy Negotiating; in the peculiarities of my own mind, somehow it felt more authentic. As it turned out though, my instincts were correct. The two books are not the same. The difference lies, not in Atwood’s words, but in the representation of what copyright is. While both books specify the copyright as belonging to O.W. Toad (the name of Atwood’s enterprise), similarity ends there.

In Negotiating, published by The Press Syndicate of The University of Cambridge, readers are told: “This book is in copyright. Subject to statutory exceptions and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press (emphasis mine).”

There it is. A clear indication that statutory exceptions exist and are relevant; meaning that some reproduction might not require permission. Whereas in Writers, published by Emblem (an imprint of McClelland & Stewart, a division of Random House of Canada Limited, a Penguin Random House Company), readers are told that permission is always needed for even a particle copied:

“All rights reserved. The use of any part of this publication reproduced, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, or stored in a retrieval system, without the prior written consent of the publisher – or, in the case of photocopying or other reprographic copying, a license from the Canadian Copyright Licensing Agency – is an infringement of the copyright law (emphasis mine).”

Despite what a publisher might prefer, Canada’s Copyright Act permits unauthorized uses of insubstantial parts of a work and unauthorized uses of substantial parts which comport with fair dealing or other exceptions. As the Supreme Court (with unanimity) stated in 2004, “the fair dealing exception is perhaps more properly understood as an integral part of the Copyright Act than simply a defence. Any act falling within the fair dealing exception will not be an infringement of copyright (para 48).” And yet, willful misinformation is standard fare among books issued in Canada.

Given the stunting of our public domain by term extension, fair dealing is even more important now as it provides some allowance of use of older, protected, material. But even a large and liberal interpretation of fair dealing, as required by our Supreme Court, is no substitute for a vibrant public domain.

With the Act expected to undergo change this year, Canada could still introduce a system of registration associated with a longer term of copyright. Owners of works which continue to be commercially successful fifty years after an author’s death will likely choose to register and thus receive the additional twenty years of protection. Whereas works that did not have such longevity with respect to commercialization, and works that were never intended for revenue generation, would likely not be registered and thus would enter the public domain without the twenty year delay. Such a system was recommended by a former Industry Committee to uphold our obligations under CUSMA, ensure that commercial works which may benefit by a longer term are able to capture that gain, and continue to grow the public domain.

The difficulty is to convey to current Canadian lawmakers the importance of the public domain. Too often, its intangibility has meant that the public domain is perceived as being of lesser value. That an author’s work is not protected somehow deems it and the author as being unworthy. Even the way older works are spoken of, that they have “fallen into the public domain,” carries an aura of degradation familiar to the plight of “fallen women.” Whereas the public domain is precisely the opposite; it enables new works to emerge. As Jessica Litman wrote in The Public Domain (1990):

To say that every new work is in some sense based on the works that preceded it is such a truism that it has long been a cliche, invoked but not examined. …  The public domain should be understood not as the realm of material undeserving of protection, but as a device that permits the rest of the system to work by leaving the raw material of authorship available for authors to use (966-968).

That this truism went unexamined and unarticulated is a testament to the difficulty of capturing the intricacy of the relationships between old works and new authors. Margaret Atwood not only undertook such an exploration but also elegantly articulated the journey that underlies every literary endeavor.

It is only fitting then that Margaret Atwood should have the last words:

… All writers must go from now to once upon a time; all must go from here to there; all must descend to where the stories are kept; all must take care not to be captured and held immobile by the past. And all must commit acts of larceny, or else of reclamation, depending how you look at it. The dead may guard the treasure, but it’s useless treasure unless it can be brought back into the land of the living and allowed to enter time once more – which means to enter the realm of audience, the realm of readers, the realm of change (p.178).

Authors Alliance Annual Report: 2022 In Review

Authors Alliance is pleased to share this year’s annual report, where you can find highlights of our work in 2022 to promote laws, policies, and practices that enable authors to reach wide audiences. In the report, you can read about how we’re helping authors meet their dissemination goals for their works, representing their interests in the courts, and otherwise working to advocate for authors who write to be read. 

Click here to view the report in your browser.

Authors speak out: an update on the Wiley ebook situation

Last week we wrote about publisher John Wiley & Sons abruptly removing some 1,300 ebooks from library collections, and then (in the face of significant public outcry from librarians, authors, and instructors) temporarily restoring access for the academic year.

Authors Alliance has heard from a number of authors expressing their strong disapproval of Wiley’s actions. To help them express their concerns, we co-wrote a letter with #ebookSOS that authors of the Wiley books can sign on to, calling on Wiley to change their practices. The text of the letter can be read here. We’re still working on reaching out to all of the individual authors of these books (if you are inclined to help find contact info, you can contribute it here), but already we’re hearing back from authors with comments of their own. For example, authors wrote us to express their frustration over lack of respect for their interests in seeing their books put into the hands of readers:

“I find the removal of eBooks arbitrary and infringing on the rights of the authors and the prospective readers/users of these book.”

“I strongly agree with your approach concerning the e-books. Wiley is evidently the only beneficiary of this system, which works against the authors and readers.”

“I would like my book to be available to as many students as possible.”

Unsurprisingly, the question of royalties paid out to the authors of top-selling titles is a frequent topic of discontent, highlighting the mismatch between the high prices that Wiley charges for access and the funds that actually make their way to authors. For example, authors wrote us to say:

“Recently I wrote Wiley if I can get a yearly list of royalty payments corresponding to hard cover, e-book and, if appropriate, solution manual. Because for a book in the forefront of the technology, I received only $8 as the last royalty payment. The result was no answer. All these make me question if they calculate the royalty payments honestly. I was not intending to get rich when I decided to write this book, but the return of time and effort I put for writing such a book is not fair. I wonder whether the return for Wiley is also that low.”

“Wiley created a lot of problems in royalty payments. I had to write a letter of complaint to the CEO of Wiley in order to get my first royalty payment after approximately three years after the publication of the book. The payment department was uncooperative.”

As we note above, in response to mounting pressure, Wiley did recently announce it will reinstate the withdrawn books, but only until June 2023. After hearing from the authors we’ve reached out to, Authors Alliance and #ebookSOS agree that the problem is in no way solved and are continuing their efforts to raise awareness with authors.

For more information please contact us at or at

Note: this initiative is part of a wider joint project, to educate and empower authors, who rarely know how their work is managed post-publication, to hold publishers to account. If you want to help on this project please get in touch.

Biden’s Open Access to Research Policy and How it Affects Authors

Several weeks ago, the White House Office of Science and Technology Policy (OSTP) issued a memo titled Ensuring Free, Immediate, and Equitable Access to Federally Funded Research. The memo, which builds upon earlier policies including this 2013 Obama administration open access memo and this 2008 National Institutes of Health policy, directs all federal agencies with research and development spending to take steps to ensure that federally sponsored research in the form of scholarship and research data will be available free of charge from the day of publication.

The initial release of the Biden OSTP memo generated a rush of news speculating about its impact on scholarly publishing: how major publishers would react, how academic institutions would respond (and specifically whether it would result in a shift towards more “Gold Open Access” publishing in which authors pay publishers an article processing charge to publish their article openly), and how many articles this change would affect. SPARC, a nonprofit dedicated to supporting open research, has a great summary of the policy and related news.

We recently caught up with Peter Suber, Senior Advisor on Open Access at Harvard University Library, to talk about the implications of the OSTP policy for authors. Peter is a founding member of Authors Alliance who has been deeply involved in advocating for open access for several decades. 

Q: Give us a brief overview of the new OSTP policy – what is this and why is it important? 

The key background is that back in 2013, the Obama White House OSTP issued a memo asking the 20 largest federal funding agencies to adopt OA policies. The memo applied to agencies spending over $100 million per year on extramural research funding. The new memo from the Biden White House extends and strengthens the Obama memo in three important ways: 

  • It covers all federal funders, not just the largest ones. I’ve seen estimates that the new memo covers more than 400 agencies, but the OSTP has not yet released a precise number. Among other agencies, the new memo covers the National Endowment for the Humanities. So for the first time federal OA policies will cover the humanities and not just the sciences.
  • The Obama memo permitted embargoes of up to 12 months, and publishers routinely demanded maximum embargoes. The Biden memo eliminates embargoes and requires federally funded research to be open on the date of publication. Like the Gates Foundation —which I believe was the first funder to require unembargoed OA to the research it funded—the White House announced its no-embargo policy several years before it will take effect, giving publishers plenty of time to adapt. 
  • The Biden memo covers data, not just articles. This is an important step to cover more research outputs and more of the practices that make up open science and open scholarship.

How will publishers react to this new policy? Of course they have the right to refuse to publish federally funded research. When the NIH policy was new in 2008, we didn’t know whether any publishers would refuse. Many publishers lobbied bitterly against it, and we thought some might do just that. But it turns out that none refused. It’s hard to prove a negative, but the Open Access Directory keeps a crowd-sourced list of publisher policies on NIH-funded authors, and has so far turned up no publishers who refuse to publish NIH-funded authors just because they are covered by a mandatory OA policy.

Of course one reason is that the NIH is so large. It’s by far the world’s largest funder of non-classified research. Any publishers who refuse to publish NIH-funded authors would abandon a huge vein of high-quality research to their rivals. But when federal OA policy covers smaller agencies as well, some publishers might well refuse to publish, say, NEH-funded research, because they don’t receive many submissions from NEH-funded authors. This is something to watch. 

Q: The Biden memo does not address ownership of rights or licensing for either scholarship or data. How do you think agencies will address rights issues in their implementation? 

Good point. Neither the Obama nor the Biden memo explicitly requires open licenses. But both require that agency policies permit “reuse”, which will require open licenses in practice. Unfortunately the Obama White House approved agency policies that did not live up to this requirement. We can hope the Biden White House will do better on that point. Of course Plan S requires a CC-BY license and the Biden memo conspicuously stops short of that. As a result, we can expect lots of lobbying, either at the agency level or the OSTP level, for and against explicit open licensing requirements, and for and against specific licenses like CC-BY.

Q: Some people have written about how open access policies, and the Plan S Rights Retention Strategy in particular, undermine authors’ rights (e.g., this post on Scholarly Kitchen). Our point of view is that those policies address a negotiating imbalance that has traditionally favored publishers, and allow academic authors, who on the whole prefer broad reach and access to their work, to switch the “default” to open for their articles even when their publishers wish otherwise. Do you have a response to the argument that OA policies for funded research undermine authors’ rights?

I’ve never seen a good argument that rights retention policies harm authors or limit their rights. On the contrary, these policies help authors and enlarge their rights. I’ve made this case in response to criticisms of the rights-retention OA policies at Harvard, and I’ve enumerated the benefits of rights-retention policies for authors. (For background on the Harvard rights-retention policies, I can recommend a handout I wrote for a talk last year.)  

One criticism is that rights-retention OA policies will reduce author choice by causing some publishers to refuse to publish covered authors. But in practice there is no evidence that this actually happens. I’m not aware of a single instance of this happening in the 14 years that Harvard has had its rights-retention policies. The same goes for the more than 80 Harvard-style policies now in effect in North America, Europe, Africa, Asia, and Australasia.

In fairness, Harvard-style policies give authors the right to waive the open license. By default the university has non-exclusive rights, but authors can waive that license if they wish, and publishers can demand that authors get the waiver. But that too is rare. In our case, very few publishers – just two or three – systematically make that demand, and I haven’t heard that it’s common anywhere else. Our waiver rate is below 5%. Even with waiver options, these policies definitely shift the default to open.

Under the Plan S rights retention strategy, authors add a paragraph to their submission letter saying that any accepted manuscript arising from the submission is already under a CC-BY license. Publishers have the right to desk-reject those articles upon submission. But we don’t know whether any will actually do so. Plan S has a tool to track journal compliance with the Plan S terms, and it will alert authors to steer clear of those publishers. 

Q: There has been speculation that the Biden memo will accelerate the rate at which publishers adopt an “article processing charge” Gold OA model that will require all authors (or their funders or universities) to pay for their articles to be published. What do you think?

First we should note that the White House guidelines are 100% repository-based or “green”. They require deposit in OA repositories, not publication in OA journals. As far as I can tell, publishing in an OA journal would not even count toward compliance, since those authors would still have to deposit their texts and data in suitable repositories. 

Publishers could say to federally-funded authors, “You can publish with us only if you pay an APC [article processing charge] for our gold option.” Authors could take them up on that or they could withdraw their submissions and look elsewhere. The new OSTP memo lets authors use some of their grant funds to pay “reasonable publication costs”. Some authors may be fooled and think that paying the fee is the only way to comply with the funder policy. But that would be untrue. As more and more authors realize that they can comply with the funder policy by depositing in a repository, at no charge, I predict that they will divide. Some will take the costless path to compliance and refuse to pay what I’ve called a prestige tax just to publish in a certain journal. Others will pay the prestige tax for a journal’s brand and reputation, if only because journal prestige still carries a lot of weight with academic authors. This obstacle to frictionless sharing is a cultural obstacle that new policies cannot directly dismantle. But we should remember that when publishers demand a publication fee and authors pay it, the authors are paying for the journal brand. They are not paying to comply with the funder policy, which they could do at no charge.

The Biden memo is equivocal about this possibility. On the one hand, it lets federal grantees use grant money to pay reasonable publication costs. On the other hand, it requires that agency policies “ensure equitable delivery” of federally funded research. The memo uses “equity” language in similar contexts half a dozen times. On one natural interpretation, this language rules out APC barriers to compliance, because APCs exclude some authors on economic grounds. This is another front on which there will be lots of lobbying as the agencies put their policies into writing. In fact, the lobbying has already begun.

Some publishers will undoubtedly demand fees or try to demand fees to publish federally-funded authors. But we already know that some will not. Science, for example, has already said that it will publish federally-funded authors without requiring them to buy its “gold” OA option. AAAS said that “it is already our policy that authors who publish with one of our journals can make the accepted version of their manuscript publicly available in institutional repositories immediately upon publication, without delay.” In a related editorial, Science explained that its authors may already deposit in the OA repositories of their choice “without delay or incurring additional fees.” It opposes a full shift toward “author pays” gold OA because it discriminates against many kinds of researchers, such as early-career researchers, researchers from smaller schools, and those in underfunded disciplines. It agrees that the APC model “can be inequitable for many scientists and institutions.” Some journals will follow Science, because it’s Science. Some will do so to avoid the equity barrier. And some will do so to signal that they will only evaluate submissions on their merits.

Q: As agencies go about developing their own plans for implementing this policy, will authors or others have an opportunity to give input, or will this be a closed-door process? 

We don’t know yet. The White House didn’t solicit public comments for the 2022 memo, which angered some publishers. The Obama White House did solicit public comments on its memo, twice, and both times the comments overwhelmingly favored the policy.

It seems that agencies could still call for public comments before they finalize their policies. The actual development of the policies will be coordinated by three agencies: the OSTP, the Office of Management and Budget, and the National Science and Technology Council Subcommittee on Open Science. We don’t know what guidelines, if any, they will lay down for that coordination. 

The background on coordination goes back to the Obama White House. When it told the large agencies that they must adopt OA policies, it allowed the policies to differ but asked agencies to work together to ensure that the policies aligned. In the end, I believe the policies differed too much. Universities really feel this because they have to comply with all of the policies, since they receive grants from each agency. Like the Obama memo, the Biden memo allows the policies to differ and calls for coordination. We can hope for less divergence than in the past. 

Authors Alliance Supports Model Ebook Legislation

Posted July 7, 2022

Authors Alliance is pleased to add its support to an effort that would restore balance to the ebook market. Last week, Library Futures released a policy statement and model legislation that would help state legislatures resolve a host of challenges that make it difficult for libraries to lend ebooks and ensure their long-term availability. 

Why we care: For authors who want to have an enduring impact on the world, having their writings fall into obscurity is a major concern. For print books, libraries have historically played an important role in making sure books did not drop out of circulation. No matter the current state of the market or what is going to make the biggest splash on a publisher’s bottom line (or even whether a book has long gone out of print), libraries make authors’ books available to readers in both the short and long term. Copyright law plays an important role in balancing the level of control that publishers have over copies they sell and the rights that libraries and their readers have to access and use those copies. Current law gives libraries fundamental rights, including the ability to buy copies on the open market, lend copies to users one at a time, and preserve books for the long-term. 

For ebooks, publishers have been able to write their own rules for how libraries can use those books. That’s because every ebook that you buy, or that a library acquires, is not actually owned by the purchaser, but licensed by the publisher or ebook distributor. The terms of these licenses are largely dictated by publishers. In recent history, libraries have contended with contracts that severely limit how they can fulfill their mission: contracts that charge libraries multiple times more than what consumers pay, place strict limits on how many times an ebook may be lent out before the library has to pay for it again, and place limits on how long the library can have access, how long a user can check out a book, and how researchers can use that book (e.g., limiting text and data mining), among many other limitations that don’t apply in the print world.

These kinds of contractual restrictions make an end run around the traditional market balance that existed in the print world. The end result is that for ebooks, authors are less likely to reach the readers they hope to, especially cutting out readers who rely on libraries for access and don’t have the financial means to purchase ebooks themselves. Restrictions on library preservation activities can also jeopardize long-term availability. While libraries are committed to continuing to pay for the care needed to maintain books long into the future, commercial publishers have no such incentive beyond the window in which a work is commercially viable. That window for commercial viability is short – on average, only 5 years. While some publishers (mostly academic publishers) have been willing to agree to license terms with libraries that provide for long-term availability for ebooks, most others have not, and instead actively frustrate library efforts to ensure long-term access. All of this means that as time goes on,  these types of restrictions could make authors’ books harder—if not impossible—to find online.

What the proposed ebooks law does: We wrote last year about a Maryland law passed in 2021 that aimed to force publishers that sold ebooks in the state to also license to libraries on “reasonable terms,” addressing many of the licensing problems we note above.

While the bill passed the legislature and became law, it ran into trouble almost immediately. The Association of American Publishers brought a lawsuit to enjoin its enforcement, arguing that because copyright law is the domain of federal law, state legislation governing ebooks is preempted by federal law and therefore unenforceable. Earlier this year, the Federal District Court of Maryland agreed, holding that because the Maryland law required that publishers “shall offer” a license to libraries whenever they offer an ebook to the public, it effectively forced publishers to grant these licenses, conflicting with the copyright holder’s federally granted exclusive right to control public distribution. The court explained, “[f]orcing publishers to forgo offering their copyrighted works to the public in order to avoid the ambit of the Act interferes with their ability to exercise their exclusive right to distribute. Alternatively, forcing publishers to offer to license their works to public libraries also interferes with their exclusive right to distribute.” The decision tracks closely with an opinion that Shira Perlmutter, Register of Copyrights and Director of the U.S. Copyright Office, released in 2021. The Copyright Office drew a sharp distinction between state laws that purport to regulate the terms of a contract (which she concluded are unlikely to be preempted since they do not interfere with the right to distribute) and state laws that require publishers to grant a license (likely to be preempted). Perlmutter explained that “[b]oth the Third Circuit and the District of Utah have explicitly excluded from permissible state regulations those that ‘appropriate[] a product protected by the copyright law for commercial exploitation against the copyright owner’s wishes.’”

Since Maryland passed its legislation, numerous other states have taken up the same issue, with slight variations on their approach. Given the failure of the Maryland law, how states craft such legislation is clearly important. 

Authors Alliance supports the Library Futures policy paper and model legislation because it offers a reasonable, productive, and viable alternative pathway for states to address inequities and unequal bargaining power in the ebook marketplace. Specifically, it proposes an approach that does not demand that publishers license to libraries on certain terms, but instead focuses on the state’s traditional and well accepted role in regulating how its own state contract law will apply, particularly in cases of unequal bargaining power.  We encourage states to utilize the framework set out by Library Futures rather than repeating the same framework as the Maryland law.

Update: Fair Use in the Courts in 2021

Posted August 31, 2021
“Prince Mural” by red.wolf is licensed under CC BY-NC-SA 2.0

In April, we published a post on two major fair use decisions from this year: Google v. Oracle and The Andy Warhol Foundation v. Goldsmith. In the post, we expressed our uncertainty about how the decision in Google, which concerned a specific question related to software, would impact fair use analysis for literary and artistic works. Earlier this month, the Second Circuit answered this question, at least with regard to fair use jurisprudence in that circuit.

The Andy Warhol Foundation v. Goldsmith concerned the question of whether Warhol’s screen prints of Prince, based in part on a photograph taken by Goldsmith, constituted fair use. The court found that the works were not fair use, in large part because it believed that Warhol’s screen prints were not transformative, but instead, the same works as Goldsmith’s photograph, but with a new aesthetic. The court signaled that the screen prints were closer to derivative works based on the original photograph than fair uses of the photograph. In contrast, the Supreme Court in Google v. Oracle did find that Google’s use of Oracle’s APIs in its Android platform was a fair one, in part because the Court found the use to be highly transformative.  

After the Google decision was handed down, the Warhol Foundation requested a re-hearing in its case, asking the Second Circuit to consider whether the Google decision would change its fair use determination. The court then issued an amended decision, and for the most part affirmed its earlier ruling, reiterating that the screen prints did not constitute fair use. The court held that the ruling in Google v. Oracle did not have much bearing on determinations about fair use when it comes to literary and artistic works. The court also underscored the Supreme Court’s statement that copyright protection is weaker for functional works—like software—and stronger for literary or artistic works—like Warhol’s screen prints, further making the Google decision inapplicable to its case. 

Another small revision in the Warhol court’s amended decision was notable for its bearing on fair use: the original decision stated that derivative works were “specifically excluded” from being considered fair use as a categorical matter, but in the amended decision, the court stated that derivative works may fail to qualify as fair use, walking back its earlier statement. By leaving open the possibility that a derivative work might still be a fair use, the court reinforced the idea that fair use is a context and fact-specific determination, a principle that also animated the decision in the Google case.

For an in-depth discussion of Google v. Oracle and the original decision in The Warhol Foundation v. Goldsmith, see our earlier post.

Update: Library E-Book Lending Legislation and Partnerships

Posted July 27, 2021
Photo by Perfecto Capucine on Unsplash

It is no secret that Authors Alliance loves libraries, and we support policies that help libraries fulfill their essential role of making knowledge and culture available and accessible to all. In recent months, several states have proposed and in some cases passed legislation that requires publishers to license e-books to libraries under “reasonable terms.” Similarly, bookselling and publishing giant Amazon has taken steps to make its content available to libraries, following years of refusal to license e-books to libraries altogether. In today’s post, we will share some of the details of these exciting developments. 

State Legislation

Over the course of the past year, three state legislatures have introduced legislation that would impose limits on a publisher’s ability to sell e-books to libraries at a high cost. Under the current licensing model, libraries can pay as much as $60 per title for an e-book license, often on very restrictive terms, whereas consumers can purchase an e-book license for the same title at a fraction of the cost. The first of these bills was passed in Maryland, and the New York state legislature has also recently approved the New York bill. A bill in Rhode Island is currently pending. Additionally, groups in Connecticut, Texas, Virginia, and Washington have reportedly begun advocating for similar legislation.

Maryland’s Library E-Book Lending Law

Maryland was the first state to enact legislation requiring publishers to offer libraries e-book and digital audiobook licenses on reasonable terms. The Maryland state legislature unanimously passed the bill in March, but before it was approved by the governor, it faced last-minute opposition from the Association of American Publishers (“AAP”), which claimed the bill was unconstitutional. Despite these challenges, Governor Larry Hogan announced that the bill was enacted into law in late May. The law will go into effect in January 2022, and requires publishers who license “electronic literary products” (which may be broader in scope than “e-books”) to the general public to “offer to license the product to public libraries in the State on reasonable terms that would enable public libraries to provide library users with access[.]” It remains to be seen what will constitute “reasonable terms” under the new Maryland law, but the Maryland Library Association has recently issued a statement providing guidance on what might constitute reasonable terms and how these might be developed.

Despite the tough opposition it faced from publishers, the Maryland law has been described by its proponents as “fairly mild.” This is because it does not fundamentally change the e-book licensing scheme employed by publishers, whereby e-books are temporarily licensed to libraries, who remain unable to actually own these digital copies. Instead, the law simply requires publishers to offer e-book licenses to libraries on terms they can afford in order to allow libraries to perform their essential function of serving patrons: readers are not served when libraries cannot afford e-book licenses. This problem took on particular salience during the pandemic, when many readers were unable to access physical books at all. The new Maryland law takes aim at this issue without disrupting the traditional e-book licensing model that publishers are reluctant to abandon. Nonetheless, the AAP has since affirmed its opposition to these legislative efforts, maintaining that the Maryland law and other state legislation like it are inconsistent with federal copyright law.

New York’s Library E-Book Lending Bill

Last month, the New York state legislature passed a bill similar to the Maryland bill. Just as in Maryland, state legislators voted unanimously in favor of the bill’s passage. The New York bill also requires publishers to offer libraries e-book licenses on “reasonable terms” if those e-book licenses are also available for purchase by the public. The New York bill proceeds from the premise that “[p]ublic libraries provide equitable access to information for all.” Because many New Yorkers (like many readers writ large) prefer digital books over physical ones, whether due to print or mobility disabilities or for ease of access, the bill takes aim at “discriminatory practices” such as e-book embargos, whereby libraries must wait months to purchase licenses for new e-books.

The New York bill has not yet been sent to Governor Andrew Cuomo for his signature, but advocates are “cautiously optimistic” that he will sign once it has been sent. The bill must be sent to the governor by the end of the calendar year, and once signed, will take effect after just 19 days. This means that while the New York bill is not yet law, it may well take effect before Maryland’s new law if sent to and signed by Governor Cuomo. 

Rhode Island’s Library E-Book Lending Bill

In Rhode Island, the analogue to the Maryland and New York bills was re-introduced in April of this year after a similar bill failed to gain momentum in the last legislative session. The 2021 bill, which, like the Maryland legislation, includes digital audiobooks, was then recommended for further study by the House Corporations Committee, with no further updates since late April. Former Rhode Island state senator Mark McKenney penned an op-ed voicing his support for the bill, pointing out that “libraries lending books to patrons hasn’t put publishers out of business,” and calling out Amazon specifically for its policy of refusing to sell or license e-books it publishes to libraries and schools altogether.

Amazon and the Digital Public Library of America

In December 2020, Amazon announced it was in talks with the Digital Public Library of America (“DPLA”) to make thousands of books it publishes available to public libraries via the DPLA exchange. The long-awaited deal between the organizations was signed in May, and is set to go into effect sometime this summer. The partnership contemplates several different licensing models, including flexible “bundles” of lends and more traditional models involving time limits and restrictions on how many patrons can check out an e-book at a time. Librarians have applauded Amazon for offering the less restrictive “bundle” models, which provide additional flexibility for libraries. Unlike the state library e-book lending legislation, the Amazon-DPLA partnership will offer an alternative to the traditional licensing scheme.

Library advocates are cautiously optimistic about the Amazon-DPLA partnership, but also note that how much it will help libraries will depend on how Amazon prices its e-books for libraries, which is at this point unknown. Unlike the state library e-book lending legislation discussed above, the Amazon deal makes no mention of how library e-book licenses will be priced. Moreover, not all Amazon-published titles will be made available through the partnership—self-published Kindle originals and Audible audiobooks are not included in the program, for example. Another limitation of the Amazon-DPLA partnership is that it requires libraries to participate in the DPLA marketplace, and will make the e-books readable with the SimplyE reading app, an open source e-reading platform developed by the New York Public Library. Many library patrons today access e-books via more popular marketplaces such as OverDrive, and both iBooks and Kindle are much more popular e-reading platforms with which patrons are likely to be more familiar. Yet the Amazon-DPLA partnership is undoubtedly a step in the right direction towards ensuring greater access to books published by Amazon. Moreover, the deal is not exclusive, meaning that Amazon could develop similar partnerships in the future in order to make its e-books even more accessible to library patrons. 

Copyright and American Independence Day

Posted July 6, 2021
Photo by Tim Mossholder on Unsplash

In today’s post, we will be sharing some facts about copyright law and American Independence Day. While the two might not seem to be closely connected, both the history of the Fourth of July and the ways in which we celebrate today implicate copyright law in some unexpected ways.

Patriotic Public Domain Works: The Declaration of Independence

The Fourth of July celebrates the anniversary of the signing of the Declaration of Independence in 1776, whereupon the American colonies declared themselves to be independent from England. The Declaration of Independence is in the public domain for several reasons. Copyright buffs may recall that works published prior to 1926 are in the public domain, and this principle applies to this historic document. But in fact, the lack of a system of copyright protection in the American colonies at the time of the Declaration’s issuance means that it was probably never protected by U.S. copyright law. The first federal copyright law was not passed until 1790, and did not apply retrospectively, but only to new works of authorship. And today, literary works authored by the federal government are automatically in the public domain. 

The Library of Congress makes scanned copies of early historic documents in the public domain, including the Declaration of Independence, available online. Because they are in the public domain, anyone is free to use these documents in whatever manner they wish—reading them aloud to crowds, translating them into other languages, or printing and distributing copies—without fear of copyright liability.

Patriotic Public Domain Works: “The Star Spangled Banner”

The American national anthem too is a part of the public domain. The lyrics to “The Star Spangled Banner” originate from a poem, “Defence of Fort M’Henry,” written by Francis Scott Key in 1814. The musical composition was taken from an earlier song, “The Anacreontic Song” (the official song of a British gentlemen’s club, the Anacreontic Society), which was already in the public domain at the time, having been written in the late 1700s (you can hear a full recording of “The Anacreontic Song” on the Smithsonian’s website). Key’s poem set to this tune was subsequently re-titled “The Star Spangled Banner.” The patriotic song remained popular for years, and was officially adopted as the American national anthem in 1931.

Interestingly, only Key’s lyrics were officially adopted as the national anthem. While “The Anacreontic Song” has remained the unofficial, traditional musical composition for “The Star Spangled Banner,” creators have been empowered to create their own adaptations, which could potentially rise in popularity and usurp the original tune (though this has not happened). These adaptations can also draw heavily from the original tune because it is in the public domain: such adaptations could have been considered derivative works that would infringe the copyright owner’s exclusive rights, were the song protected by copyright. Well-known adaptations of “The Star Spangled Banner,” like Igor Stravinsky’s four arrangements of the song and Jimi Hendrix’s instrumental rendition at Woodstock, may not have been possible without the freedom to adapt that the public domain enables. 

Copyright in Revolutionary America

Prior to the issuance of the Declaration of Independence, when the American colonies remained under British rule, there was no copyright protection in the present-day United States. This is because the British copyright law, the Statute of Anne, did not apply to the American colonies. As a result, creators had little to no control over the dissemination of their works, and were not entitled to royalty payments. However, the largely agrarian nature of the present-day U.S. at the time made copyright protections less of a priority for the colonists and revolutionaries. Over the next 14 years, as the country evolved, the Continental Congress and later the Congress of the Confederation (the legislative body established under the Articles of Confederation) allowed for private copyright acts and state law copyright acts, resulting in inconsistencies across states and limited protection for creators. Finally, the first federal copyright bill was signed into law by George Washington in 1790. This law mirrored the Statute of Anne nearly word for word, though U.S. and U.K. copyright laws have evolved in different ways in the intervening centuries.

The Fourth of July Today: Copyright in Fireworks Displays

Across the U.S., many celebrate the Fourth of July with fireworks displays, which can be expressive and creative. But fireworks displays are an example of the kind of creative expression that copyright typically does not protect. This is because of the fixation requirement in American copyright law: for creative expression to receive copyright protection, the Copyright Act requires that it be fixed in a tangible medium of expression. The fixation requirement means, for example, that an improvised speech which is not recorded or documented in any way cannot be protected by copyright. Similarly, fireworks displays are simply too ephemeral and intangible to satisfy the fixation requirement.

But the story does not end there: photographs or film recordings of fireworks displays are eligible for copyright protection, because those types of expression are fixed—recorded on film or saved on a digital camera. Images or recordings of fireworks displays further possess the “modicum of creativity” necessary for a work to be protected, since the person who captured the image or recording made at least some creative choices in how they captured the display. Additionally, a recent court case found that “command protocols” and the underlying computer codes for the actual launching of fireworks were copyrightable as software, which is a type of literary work eligible for copyright protection. So while the actual display of fireworks—what you may have witnessed this Independence Day—cannot be protected by copyright, fixed images of the fireworks and the computer program that made their display possible can be protected.