Author Archives: Dave Hansen

Coalition Letter to Congress on Copyright and AI

Posted September 11, 2023

Photo by Chris Grafton on Unsplash

Earlier today Authors Alliance joined a broad coalition of public interest organizations, creators, academics, and others in a letter to members of Congress urging caution when considering proposals to revamp copyright law to address concerns about artificial intelligence. As we explained previously, we believe that copyright law currently has the appropriate tools needed to both protect creators and encourage innovation.  

As the letter states, the signatories share a common interest in ensuring that artificial intelligence meets its potential to enrich the American economy, empower creatives, accelerate the progress of science and useful arts, and expand humanity’s overall welfare. Many creators are already using AI to conduct innovative new research, address long-standing questions, and produce new creative works. Some, such as some of these artists, have used this technology for many years.

So our message is simple: existing copyright doctrine has evolved and adapted to accommodate many revolutionary technologies, and is well equipped to address the legitimate concerns of creators. Our courts are the proper forum to apply those doctrines to the myriad fact patterns that AI will present over the coming years and decades. 

You can read the full letter here. 

Current copyright law isn’t perfect, and we certainly believe creativity and innovation would benefit from some changes. However, we should be careful about reactionary, alarmist politics. It seldom makes for good law. Unfortunately, that’s what we’re seeing right now with AI, and we hope that Congress has the wisdom to see through it. 

We encourage our members to reach out to your own Congressional representative to express the need to tread carefully, and (if you are) to explain how you are using AI in your work.  We’d also be very happy to hear from you as we develop our own further policy communications to Congress and to agencies such as the U.S. Copyright Office. 

Open Access and University IP Policies in the United States

Posted August 18, 2023

Perhaps the most intuitive statement in the whole of the U.S. Copyright Act is this: “Copyright in a work protected under this title vests initially in the author. . . ..” Of course authors are the owners of the copyright in their works. 

In practice, however, control over copyrighted works is often more complicated. When it comes to open access scholarly publishing, the story is particularly complicated because the default allocation of rights is often modified by an complex series of employment agreements, institutional open access policies, grant terms, relationships (often not well defined) between co-authors, and of course the publishing agreement between the author and the publisher. Because open access publishing is so dependent on those terms, it’s important to have a clear understanding of who holds what rights and how they can exercise them.

Work for Hire and the “Teacher Exception”

First, it’s important to figure out who owns rights in a work when it’s first created. For most authors, the answer is pretty straightforward. If you’re an independent creator, you as the author generally own all the rights under copyright. If co-authors create a joint work (e.g., co-author an article), they both hold rights and can freely license that work to others, subject to an accounting to each other. 

If, however, you work for a company and create a copyrighted work in the scope of your employment (e.g., I’m writing this blog post as part of my work for Authors Alliance) then at least in the United States, the “work for hire” doctrine applies and, the law says, “the employer or other person for whom the work was prepared is considered the author.” For people who aren’t clearly employees, or who are commissioned to make copyrighted works, whether their work is considered “work for hire” can sometimes be complicated, as illustrated in the seminal Supreme Court case CCNV v. Reid, addressing work for hire in the context of a commissioned sculpture.  

For employees of colleges or universities who create scholarly works, the situation is a little more complicated because of a judicially developed exception to the work-for-hire doctrine known as the “teacher exception.” In a series of cases in the mid-20th Century, the courts articulated an exception to the general rule that creative works produced within the scope of one’s employment were owned by the employer for teachers or educators. Those cases each have their own peculiar facts, however, and most significantly, they predated the 1976 Copyright Act, which was a major overhaul of U.S. copyright law. Whether the “teacher exception” continues to survive as a judge-made doctrine is highly contested. Despite the massive number of copyrighted works authored by university faculty after the 1976 Act (well over a hundred million scholarly articles alone, not to mention books and other creative works), we have seen very few cases addressing this particular issue.  

There are a number of law review articles and books on the subject. Among the best, I think, is Professor Elizabeth Townsend-Gard’s thorough and worthwhile article. She concludes, based on a review of past and modern case law, that the continued survival of the teacher exception is tenuous at best: 

“The teacher exception was established under the 1909 act by case law, but because the 1976 act did not incorporate it, the “teacher exception” was subsumed by a work-for-hire doctrine that the Supreme Court’s definition of employment in CCNV v. Reid places teachers’ materials under the scope of employment. Thus the university-employers own their original creative works. No court has decided whether the “teacher exception” survived Reid, but the Seventh Circuit in Weinstein, decided two years before Reid, had already transferred the “teacher exception” from a case-based judge made law to one dictated by university policy.”

University Copyright and IP policies

Whatever the default initial allocation of copyright ownership, authors of all types must also understand how other agreements may modify control and exercise of copyright. These policies can be somewhat difficult to untangle because there actually may be layers of agreements or policies that cross reference each other and are buried deep within institutional policy handbooks. 

For academic authors, this collection of agreements typically includes something like an employee handbook or academic policy manual, which will include policies that all university employees must agree to as a condition of employment. Typically, that will include a policy on copyright or intellectual property. Regardless of whether the teacher exception or work-for-hire applies, these agreements can override that default allocation of rights and transfer them, both from the creator to the university, or from university to the creator. 

These policies differ significantly in the details, but most university IP policies choose to allocate all or substantially all rights under copyright to individual creators of scholarly works, notwithstanding the potential application of the work for hire doctrine. In other words, even though copyright in faculty scholarly works may initially be held by the university, through university policy those rights are mostly handed over to individual creators. The net effect is that most university IP policies treat faculty as the initial copyright holders even if the law isn’t clear that they actually are.

Some universities, like Duke University, say nothing about “work for hire” in their IP policies but merely “reaffirm[] its traditional commitment to the personal ownership of intellectual property rights in works of the intellect by their individual creators.” Others like Ohio State, are similar, stating that copyright in scholarly works “remains” with their creators, but then also provide that “the university hereby assigns any of its copyrights in such works, insofar as they exist, to their creators,” which can act as a sort of savings clause to address circumstances in which the there may be uncertainty about ownership by individual creators. 

Others, like Yale, are a little clearer about their stance on work-for-hire. Yale explains that “The law provides . . . that works created by faculty members in the course of the their teaching and research, and works created by staff members in the course of their jobs, are the property of the University,” but then goes on to recognize that “[i]t is traditional at Yale and other universities, however, for books, articles and other scholarly writings by a faculty member to be deemed the property of the writer . . . . In recognition of that longstanding practice, the University disclaims ownership of works by faculty, staff, postdoctoral fellows and postdoctoral associates and students. . . .” Another example of a university taking a similar approach is the University of Michigan.

Carve outs and open access policies

Every university copyright or IP policy that I’ve seen includes some carve outs from the general rule that copyright will, one way or another, end up being held by individual creators. Almost universally, universities IP policies provide that the university will retain rights sufficient to satisfy grant obligations. Some universities’ IP policies simply provide that, for example, ownership shall be determined by the terms of the grant (see, for example, the University of California system policy). In other cases, however, university IP policy accomplishes compliance with grants simply stating that all intellectual property of any kind (including copyright) created under a grant is owned by the university, full stop. This, therefore, gives the university sufficient authority to satisfy whatever grant obligations it may have. For example, the University of Texas system states that it will not assert ownership of copyright in scholarly works, but that provisio is subject to the limitation that “intellectual property resulting from research supported by a grant or contract with the government (federal and/or state) or an agency thereof is owned by the Board of Regents.” These kinds of broad ownership claw-backs raise some hard questions when it comes to publishing scholarly work. For example, when a UT author personally signs a publication agreement transferring copyright for an article that is the result of grant funding, do they actually hold the rights to make that transfer effective? 

For open access, these grant clauses are important because they are the operative terms through which the university complies with funder open access requirements. Sometimes, these licensing clauses lie somewhat dormant, with funders holding but not necessarily exercising the full scope of their rights. For example, for every article or other copyrighted work produced under a federal grant, even prior to the recent OSTP open access announcement, the government already reserved for all works produced under federal grants a broad “royalty-free, nonexclusive and irrevocable right to reproduce, publish, or otherwise use the work for Federal purposes, and to authorize others to do so.” 

Some universities also retain a broad, non-exclusive license for themselves to make certain uses of faculty-authored scholarly work, even while providing that the creator owns the copyright. For example, Georgia Tech’s policy provides that individual creators own rights in scholarly works, but Georgia Tech retains a “fully paid up, universe-wide, perpetual, non-exclusive, royalty-free license to use, re-use, distribute, reproduce, display, and make derivative works of all scholarly and creative works for the educational, research, and administrative purposes of [Georgia Tech].” Others such as the University of Maryland are less specific, providing simply that although the individual creator owns rights to their work, “the University reserves the right at all times to exercise copyright in Traditional Scholarly Works as authorized under United States Copyright Law.” Those kinds of broad licenses would seem to give the university discretion to make use of scholarly work, including, I think, for open access uses should the university decide that such uses are desirable.

Finally, a growing number of universities have policies, enacted at the behest of faculty, that specifically provide rights to make faculty scholarship openly available. The “Harvard model” is probably the most common, or at least the most well known. These types of policies allocate a license to the university, to exercise on behalf of the individual creator, with the specific intent of making the work available free of charge. Often these policies will include special limitations (e.g., the university cannot sell access to the article) or allow for faculty to opt-out (often by seeking a waiver). 

Pre-existing licenses and publishing agreements

The maze of policies and agreements can matter a great deal for the legal mechanics of effectively publishing an article openly. Of course in the scenario where authors hold rights themselves, they can retain sufficient rights through their publishing contract so they can make their work openly available, typically either via “green open access” by posting their own article to an institutional repository, or by “gold open access” directly from the publisher (though these are sometimes accompanied by a hefty article processing fee). Tools like the SPARC open access addendum are wonderful negotiating tools to ensure authors retain sufficient rights to achieve OA.

That works sometimes, but often publishing contracts come with unacceptably restrictive strings attached. For individual authors publishing with journals and publishers that have great market power, they often have little ability to negotiate for OA terms that they would prefer. 

In these situations, a pre-existing license can be a major advantage for an author. For example, for authors who are writing under the umbrella of a Harvard-style open access policy, the negotiating imbalance with journals is leveled, at least in part  because the journal knows that the university has a pre-existing OA license and also knows that although those policies often permit waivers, it’s not as easy as just telling the author “no” to claw that license back. The same is true about other forms of university pre-existing licenses that could be used to make a work available openly, such as those general licenses I mention that are retained by Georgia Tech or Maryland. While these kinds of pre-existing licenses are seldom acknowledged in journal publishing agreements, sophisticated publishers with large legal teams are undoubtedly aware of them. Because of that, I think there are strong arguments that their publishing agreements with authors implicitly incorporate them (or, if not, good arguments that a publisher that does not recognize them is intentionally interfering with a pre-existing contractual relationship between author and their university). Funder mandates, made effective through university IP policies, take the scenario a step further and force the issue: either the journal acquiesces or it doesn’t publish the paper at all. There is often no waiver option. Of course there are other pathways that both funders and journals may be willing to accept – many funders are willing to support OA publishing fees, and many journals will happily accept OA license terms for a price. 


Although the existing, somewhat messy, maze of institutional IP policies, publishing agreements, and OA policies can seem daunting, understanding their terms is important for authors who want to see their works made openly available. I’ll leave for another day to explore whether it’s a good thing that the rights situation is so complex. In many situations, rights thickets like these can be a real detriment to authors and access to their works. In this case the situation is at least nuanced such that authors are able to leverage pre-existing licenses to avoid negotiating away the bundle of rights they need to see their works made available openly. 

Prosecraft, text and data mining, and the law

Posted August 14, 2023

Last week you may have read about a website called, a site with an index of some 25,000 books that provided a variety of data about the texts (how long, how many adverbs, how much passive voice) along with a chart showing sentiment analysis of the works in its collection and displayed short snippets from the texts themselves, two paragraphs representing the most and least vivid from the text. Overall, it was a somewhat interesting tool, promoted to authors to better understand how their work compares to those of other published works. 

The news cycle about was about the campaign to get its creator Benji Smith to take the site down (he now has) based on allegations of copyright infringement. A Gizmodo story about it generated lots of attention, and it’s been written up extensively, for example here, here, here, and here.  

It’s written about enough that I won’t repeat the whole saga here. However, I think a few observations are worth sharing:  

1) Don’t get your legal advice from Twitter (or whatever its called)

Fair Use does not, by any stretch of the imagination, allow you to use an author’s entire copyrighted work without permission as a part of a data training program that feeds into your own ‘AI algorithm.’”  – Linda Codega, Gizmodo (a sentiment that was retweeted extensively)

Fair use actually allows quite a few situations where you can copy an entire work, including situations when you can use it as part of a data training program (and calling an algorithm “AI” doesn’t magically transform it into something unlawful). For example, way back in 2002 in Kelly v. Ariba Soft, the 9th Circuit concluded that it was fair use to make full text copies of images found on the internet for the purpose of enabling web image search. Similarly, in AV ex rel Vanderhye v. iParadigms, the 4th Circuit in 2009 concluded that it was fair use to make full text copies of academic papers for use in a plagiarism detection tool.  

Most relevant to prosecraft, in Authors Guild v. HathiTrust (2014)  and Authors Guild v. Google (2015) the Second Circuit held that Google’s copying of millions of books for purposes of creating a massive search engine of their contents was fair use . Google produced full-text searchable databases of the works, and displayed short snippets containing whatever term the user had searched for (quite similar to prosecraft’s outputs). That functionality also enabled a wide range of computer-aided textual analysis, as the court explained: 

The search engine also makes possible new forms of research, known as “text mining” and “data mining.” Google’s “ngrams” research tool draws on the Google Library Project corpus to furnish statistical information to Internet users about the frequency of word and phrase usage over centuries.  This tool permits users to discern fluctuations of interest in a particular subject over time and space by showing increases and decreases in the frequency of reference and usage in different periods and different linguistic regions. It also allows researchers to comb over the tens of millions of books Google has scanned in order to examine “word frequencies, syntactic patterns, and thematic markers” and to derive information on how nomenclature, linguistic usage, and literary style have changed over time. Authors Guild, Inc., 954 F.Supp.2d at 287. The district court gave as an example “track[ing] the frequency of references to the United States as a single entity (‘the United States is’) versus references to the United States in the plural (‘the United States are’) and how that usage has changed over time.”

While there are a number of generative AI cases pending (a nice summary of them is here) that I agree raise some additional legal questions beyond those directly answered in Google Books, the kind of textual analysis that offered seems remarkably similar to the kinds of things that the courts have already said are permissible fair uses. 

2) Text and data mining analysis has broad benefits

Not only is text mining fair use, it also yields some amazing insights that truly “promote the progress of Science,” which is what copyright law is all about.  Prosecraft offered some pretty basic insights into published books – how long, how many adverbs, and the like. I can understand opinions being split on whether that kind of information is actually helpful for current or aspiring authors. But, text mining can reveal so much more. 

In the submission Authors Alliance made to the US Copyright Office three years ago in support of a Section 1201 Exemption permitting text data mining, we explained:

TDM makes it possible to sift through substantial amounts of information to draw groundbreaking conclusions. This is true across disciplines. In medical science, TDM has been used to perform an overview of a mass of coronavirus literature.Researchers have also begun to explore the technique’s promise for extracting clinically actionable information from biomedical publications and clinical notes. Others have assessed its promise for drawing insights from the masses of medical images and associated reports that hospitals accumulate. 

In social science, studies have used TDM to analyze job advertisements to identify direct discrimination during the hiring process.7 It has also been used to study police officer body-worn camera footage, uncovering that police officers speak less respectfully to Black than to white community members even under similar circumstances.

TDM also shows great promise for drawing insights from literary works and motion pictures. Regarding literature, some 221,597 fiction books were printed in English in 2015 alone, more than a single scholar could read in a lifetime. TDM allows researchers to “‘scale up’ more familiar humanistic approaches and investigate questions of how literary genres evolve, how literary style circulates within and across linguistic contexts, and how patterns of racial discourse in society at large filter down into literary expression.” TDM has been used to “observe trends such as the marked decline in fiction written from a first-person point of view that took place from the mid-late 1700s to the early-mid 1800s, the weakening of gender stereotypes, and the staying power of literary standards over time.” Those who apply TDM to motion pictures view the technique as every bit as promising for their field. Researchers believe the technique will provide insight into the politics of representation in the Network era of American television, into what elements make a movie a Hollywood blockbuster, and into whether it is possible to identify the components that make up a director’s unique visual style [citing numerous letters in support of the TDM exemption from researchers].

3) Text and data mining is not new and it’s not a threat to authors

Text mining of the sort it seemed prosecraft employed isn’t some kind of new phenomenon. Marti Hearst, a professor at UC Berkeley’s iSchool explained the basics in this classic 2003 piece. Scores of computer science students experiment with projects to do almost exactly what prosecraft was producing in their courses each year. Textbooks like Matt Jockers’s Text Analysis with R for Students of Literature have been widely used and adopted all across the U.S. to teach these techniques. Our submissions during our petition for the DMCA exemption for text and data mining back in 2020 included 14 separate letters of support from authors and researchers engaged in text data mining research, and even more researchers are currently working on TDM projects. While fears over generative AI may be justified for some creators (and we are certainly not oblivious to the threat of various forms of economic displacement), it’s important to remember that text data mining on textual works is not the same as generative AI. On the contrary, it is a fair use that enriches and deepens our understanding of literature rather than harming the authors who create it.

The appropriation bill that would defund the OSTP open access memo

Posted July 27, 2023

A couple of weeks ago the U.S. House Appropriations Subcommittee on Commerce, Justice, and Science (CJS) released an appropriations bill containing language that would defund efforts to implement a federal, zero-embargo open access policy for federally funded research.

We think this is a fantastically bad idea. One of the most important developments in the movement for open access to scholarship came last year when Dr. Alondra Nelson, Director of the Office of Science and Technology Policy, issued a memorandum mandating that all federal agencies that sponsor research put in place policies to ensure immediate open access to published research, as well as access to research data. The agencies are at various stages of implementing the Nelson memo now, but work is well underway. This appropriations bill specifically targets those implementation efforts and would prevent any federal government expenditures from being used to further them. 

For the vast majority of scholarly works, the primary objective of the authors is to share their research as widely as possible. Open access achieves that for authors (if they can only get their publishers to agree). The work is already funded, already paid for. As you might imagine, those opposed to the memo are primarily publishers who have resisted adapting the business model they’ve built of putting a paywall in front of publicly-funded work, largely for profit. 

Thankfully, the CJS appropriations bill, one of twelve appropriation bills, is just a first crack at how to fund the government in the coming year. The Senate, of course, will have their say, as will the President. With the current division in Congress, combined with the upcoming recess (Congress will be on recess in August and reconvene in September), the smart bet is that none of these bills will be enacted in time for the federal government’s new fiscal year on October 1. Instead, a continuing resolution–funding the government under the status quo, as Congress frequently does–will likely be enacted as a stop gap until a compromise can be reached later in the year. 

It is important, however, that legislators understand that this attempt to defund OA efforts is majorly concerning, especially for authors, universities, and libraries that believe that federally funded research should be widely available on an open access basis. It’s a good moment to speak out. SPARC has put together a helpful issue page on this bill, complete with sample text for how to write to your representative or senator. 

As you’ll see if you read the proposed appropriations bill, it is loaded with politics. The relevant OSTP memo language is located amongst other clauses that defund the Biden administration’s efforts to implement diversity initiatives at various agencies, address gun violence, sue states over redistricting, and dozens of other hot-button issues as well. It’s pretty easy for an issue like access to science to get lost in the political shuffle, but we hope with some attention from authors and others, it won’t.

The Anti-Ownership Ebook Economy

Posted July 25, 2023
The Anti-Ownership Ebook Economy

Earlier this month, the Engelberg Center on Innovation Law and Policy at NYU Law released a groundbreaking new report: The Anti-Ownership Ebook Economy: How Publishers and Platforms Have Reshaped the Way We Read in the Digital Age is a detailed report that traces the history of ebooks and, through a series of interviews with publishers, platforms, librarians, and others, explains how the law and the markets have converged to produce the dysfunction we see today in the ebook marketplace.

The report focuses especially closely on the role of platform companies, such as Amazon, Apple and OverDrive, which now play an enormous role in controlling how readers interact with ebooks. “Just as platforms control our tweets, our updates, and the images that we upload, platforms can also control the books we buy, keeping tabs on how, when, and where we use them, and at times, modifying or even deleting their content at will.” 

Claire Woodcock

Last Friday, I spoke with one of the authors, Claire Woodcock, to learn a little bit more about the project and its goals: 

Q: What was your motivation to work on this project? 

A: My co-authors, Michael Weinberg, Jason Schultz, and Sarah Lamdan had all been working on this for well over a year [before] I joined. I knew Sarah from another story I’d written about an ebook platform that was prioritizing the platforming of disinformation last year, and she had approached me about this project. When I hopped on a call with the three of them, I believe it was Michael who posed the core question of this project: “Why can we not own, but only license ebooks?” 

I’ve thought about that question ever since. So my role in joining the project was to help talk to as many people as we could – publishers, librarians, platforms, and other stakeholders to try to understand why not. It seems like a simple question but there are so many convoluted reasons and we wanted to try to distill this down. 

Q: Many different people were interviewed for this project. Tell me about how that went. 

A: There was actually some hesitation to talk; I think a reason why was almost extreme fear of retaliation. So, it took a while to crack into learning about some of the different areas, especially with some publishers and platforms. I wish there was more of a willingness to engage on the part of some publishers, who would flat out tell me things like they weren’t authorized to talk about their company’s internal practices , or from platforms like OverDrive, who we sent our list of questions over to and never heard from again (until I ran into Steve Potash at the American Library Association’s Annual Conference). I’d have loved to hear more from them directly when I was actively conducting interviews.

Q: I noticed there weren’t many interviews with authors. Can you say why not? 

A: Authors weren’t as big of a focus because we realized, particularly in talking with several literary agents, that from a business and legal perspective authors don’t have much of a say in how their books are distributed. Contractually, they aren’t involved in downstream use. I think it would be really interesting to do a follow up with authors to get their perspective on how their books are licensed or sold online.

Q: The report contains a number of conclusions and recommendations. Which among them are your favorite? 

A: One of the most striking things I learned, and what stuck out to me the most when I went back and listened to the interviews, is the importance of market consolidation and lack of competition. OverDrive has roughly 95% of the ebook marketplace for libraries (and I know it’s different for academic publishing, for sure). The lack of competition in our society, especially in this area, makes it hard to speak up and speak out when a certain stakeholder has issues with the dominant market players. Because of that, looking at each of the groups of stakeholder types we spoke with, each could point to other groups causing the problem (it reminds me of the spiderman meme) and there are platforms and other publishers, mostly smaller, who want to make this work but the major players are not doing that. It also stuck out that, almost everyone we talked to talks about librarians as partners, but when we talk to the librarians, they say “they think we are partners, but we don’t feel like we have a seat at the table, decisions that impact us are often made without consulting us in a way that is transparent.” 

Q: If you could do a follow up study, what additional big questions would you focus on? 

A: Lots of people talked about audiobooks. We were focused on ebooks, but the audiobook market is even more concentrated, and lots of people raised the issue that ebooks are only part of the issue. There is a version of this that is happening with audiobooks as well. I also think that the intersections of this market with television, platform streaming, and even other consumer goods like toys and other parts of the market are really interesting. What we’re seeing here, it’s a version of what’s happening in other creative industries. 

I also think it would be worth learning more about how libraries and others are working around the current issues. For example, lots of libraries ask for perpetual licenses, since they’re looking at working within the current context and looking at contracts so they can get assurances, for example if something happens to the publishers platform, the library could still get some assurances that even if something happened to the company, the license agreement could still be honored. But are those efforts actually effective? And, given the importance of licensing, it might also be interesting to explore how libraries are resourced to negotiate those agreements – for example, training and staff to negotiate. I think if libraries were better funded they would probably be able to better handle these challenges. 

Authorship and Ebook Licensing: Introducing the Library Ebook Pledge

Posted July 12, 2023

Authors rarely have meaningful rights to say how their publisher licenses or distributes their book. 

A typical publishing contract will grant the publisher broad discretion to determine the format, price and sublicensing terms under which an author’s book is made available. It can be hard to negotiate for the right to have a say over those terms. Even contracts designed to prioritize authors’ rights, such as the Authors Guild model trade contract,  don’t contemplate an author exercising much control over these matters, and leave most publication and distribution details “as Publisher determines.”

In many cases, ceding control can be OK as long as the interests of the publisher and author are tightly aligned. It’s why we recommend authors pay close attention to the mission and practices of their publisher before signing a contract. But even when a publisher purports to share the author’s interests, this could change in the future, and information the publisher provides about itself can be misleading.

Sometimes, it’s hard to see how those interests diverge until it’s too late. For example, recall last year when academic publisher Wiley decided to remove some 1,300 ebooks from online library collections. We quickly found that many authors of those books objected strongly, and joined us in a letter that outlined concerns and expressed dismay that Wiley, an academic publisher that supposedly prioritizes” access to knowledge,” would make such an aggressive and profit-maximizing decision. But under their contracts, those authors had no legal grounds to push back. 

Library distribution in particular is an area of concern. Libraries provide an important way for authors to connect with readers, and provide a means of access to their books for many people who might otherwise never read them. Libraries also serve an important democratic function in supporting widespread learning that we all benefit from. We’ve written several times over the years about challenges that libraries face in licensing ebooks, and it’s why we’ve supported model state legislation to address the problem and also why we’ve supported models like controlled digital lending that allow for limited access outside of the licensing model.

In addition to basic economic concerns about gouging libraries on price (in some cases publishers have decided to charge libraries 10x the consumer list price for ebook access), some publishers have imposed a variety of other terms that we find unreasonable. This includes, for example, only offering ebooks to libraries through large bundles of content rather than title-by-title, which forces libraries to buy access to books that aren’t necessarily relevant for their community (a practice which also obfuscates and dilutes per-title sales and consequently author royalties). Or limiting access for use only on platforms controlled by the publisher, which can contain significant compromises for reader privacy. Perhaps the most frustrating is the flat refusal to deal – with some publishers refusing to sell some ebooks to libraries at all, in the hopes of driving some would-be library readers (likely a very small percentage of them)  into buying a personal copy. 

Introducing the Library Ebook Pledge

What libraries need to do their jobs in the digital environment isn’t all that complicated. For physical books, libraries have been successful in reaching readers  because they have had clear rights to purchase, lend, and preserve. Publishers have limited, by contract, libraries’ ability to do those same activities with ebooks, but it doesn’t have to be that way. That’s why we’ve been pleased to work with Knowledge Rights 21 and Library Futures to outline twelve basic principles that represent a reasonable approach to ensuring that libraries can continue to do their jobs online.

We know that many publishers care deeply about the role of libraries in supporting research, education and learning. This Pledge, which can be viewed here,  offers a way for those publishers to express their support and commitment to 21st century libraries, so libraries can provide meaningful preservation of and access to ebooks for their readers. We’re encouraged to see some publishers already signing on, and encourage others to do so as well.

We also think this pledge is a valuable tool for authors who care about access to their works. While negotiating for control over distribution can be a challenge, we are hopeful that authors can try to incorporate these principles into their contracts and use this pledge to ask publishers to publicly communicate their intent to license ebooks in ways that will account for the public interest. 

The JCPA, Again

Posted June 15, 2023
Photo by AbsolutVision on Unsplash

For those of you following along, you’ve seen the numerous posts we’ve made about the Journalism Competition and Preservation Act, e.g., here, here, and here. The bill, which neither supports competition nor preservation of journalism, does have a really compelling story. Its apparent goal is to bolster local newsrooms and journalists by making it easier for them to negotiate with companies like Google or Meta (which links to news content), adding revenue to help aid in their operations. 

Today’s update is that the JCPA is a little closer to becoming law, with the Senate Judiciary Committee voting to move the bill forward on a 14-7 vote. We again joined a group of more than two dozen civil society organizations in opposing the bill in this letter led by Public Knowledge. We also joined a large group of organizations opposing a very similar bill that was introduced earlier this year in California, with similar aims. 

While the bill has some wonderful goals, it seems destined to fail at achieving them, while doing real damage to the broader online information ecosystem. As we’ve detailed before, the JCPA seems to create a pseudo-copyright regime in which platforms would have to pay for linking to news, which is a radical change in how the internet functions. It also includes provisions that would effectively force social media platforms to carry certain news outlet coverage, even when a platform disagrees with the views that those news outlets express, thus undermining Section 230 protections for platforms that want to remove false or misleading content from their websites. 

For the actual competition issues, the bill has also been contorted so that its aims–competition and support for small news outlets–have been co-opted by the biggest commercial publishers. For example, the bill’s supporters say it doesn’t benefit the biggest news outlets, but its cap of 1,500 employees would exclude a grand total of *3* of the largest newspapers in the US, while the JCPA’s minimum threshold of $100,000 in revenue  would leave out the smallest, most vulnerable newsrooms. Further, that numerical cap also doesn’t apply to broadcasters at all, which means it actually favors companies like News Corp., Sinclair, iHeartRadio, and NBCU. 

The Senate Judiciary Committee markup earlier today (you can watch the recording here) was relatively tame, but it was clear that there was very little agreement about what the bill would actually accomplish, or what its unintended consequences might be. The recurring theme throughout was that something must be done to protect and support journalism and that it is unfair that big tech companies are reaping incredible profits while small news publishers are getting very little of the financial pie and are struggling to survive. While we agree with both of these propositions, unfortunately, the JCPA seems uniquely ineffective at fixing the problem. 

Athena Unbound and Untangling the Law of Open Access

Posted May 26, 2023

A few months ago, Authors Alliance and the Internet Archive co-hosted an engaging book talk featuring historian Peter Baldwin and librarian Chris Bourg. They discussed Baldwin’s new book, Athena Unbound: Why and How Scholarly Knowledge Should be Free For All. You can watch the recording of the talk here and access the book for free in open access format here.

Today, I’m beginning a series of posts aimed at clarifying legal issues in open access scholarship. Reflecting on some key takeaways from Athena Unbound seemed like a great place to start.

For those already well-versed in the open access community, you know that there is an abundance of literature covering the theory, economics, and sociological dimensions of OA. But, it’s easy to lose the forest for the trees.  Athena Unbound stands out by providing a comprehensive, high-level explanation of how we have reached the current state of open access affairs. The book offers much more than just commentary on the underlying legal structures that impact access to scholarly works. But, as we delve deeper into the legal aspects of open access in this series, I want to highlight three key takeaways on this issue:

  1. Copyright law does not cater to most academic authors.

“Open access does not seek to dispossess authors of their property nor to stint them of their rightful earnings. But authors are not all alike. Those whose creativity supplies their livelihood are entitled to the fruits of their labor. But most authors either do not make a living from their work or are already supported in other ways.” – Athena Unbound, Chapter 2, “The Variety of Authors and Their Content”

In theory, copyright law in the United States is designed to incentivize the creation of new works by granting strong and long-lasting economic rights. This framework assumes authors primarily function as independent operators (Baldwin likens them to “bohemian artistes”) who can negotiate these rights with publishers or directly with members of the public in exchange for financial support.

However, this framework does not align with the reality faced by most academic authors, who number in the millions. While scholarly authors deserve compensation for their work, their remuneration also often comes from sources like university employment. Their motivation to create stems from incentives to share ideas and discoveries with the world, as well as personal gains such as recognition and career advancement. For these authors, the publishing system and the laws that govern it have clash with their interests to such an extent that we now witness academic authors willingly paying thousands of dollars to persuade publishers to distribute their articles for free.

If anything, copyright law, with its excessively long duration, extensive economic control, and limited freedom for researchers to engage with creative works, hampers those authors’ goals in practice. As Baldwin explains, “the fundamental problem open access faces is worth restating. Copyright has become bloated, prey to the rent-seeking academic publishing industry… Legislators, dazzled into submission by the publishing industry’s success in portraying itself as the defender of creativity and cultural patrimony, bear much responsibility.”

As we explore the legal mechanisms that influence open access, it is crucial to remember that the default rules of the system are more often than not at odds with the goals of open access authors. 

  1. Open access must encompass more than contemporary scientific articles.

While much of the current open access discourse revolves around providing access to the latest scholarly research, particularly scientific articles, there is a vast amount of past scholarship that remains inaccessible. An inclusive approach to open access should address how to provide access to these works as well. The majority of research library holdings are not available online in any form. Baldwin uses the term “grey literature” to describe the extensive collections in research libraries that are no longer commercially available. As he points out, most books lose commercial viability rather quickly. “Of the 10,000 US books published in 1930, only 174 were still in print in 2001. Of the 63 books that won Australia’s Miles Franklin prize over the past half-century, ten are unavailable in any format.”

Many of these works have become so-called orphan works: they are so detached from the commercial marketplace that their publishers have gone out of business, authors have passed away, and any remaining rights holders who would benefit from potential sales are obscure, if they exist at all. Even Maria Pallante, former Register of Copyrights and current AAP president, agrees that in the case of true orphan works, “it does not further the objectives of the copyright system to deny use of the work, sometimes for decades. In other words, it is not good policy to protect a copyright when there is no evidence of a copyright owner.”

In addition to this issue around orphan works, a subset of what is known as the “20th Century black hole,” Athena Unbound also sheds light on the various concerns and challenges that act as barriers to open access in scholarly fields outside of the sciences. While the goals of open access may be the same across these different areas, the implementation can vary significantly. In the case of certain scholarly works, such as older books entangled in complex rights issues, we may need to settle for an imperfect form of “open,” such as read-only viewing via controlled digital lending—a far cry from what many consider true open access.

  1. The intricacies of ownership are significant.

Although this is not the primary focus of Athena Unbound, it is an important aspect that deserves attention. In simple terms, the legal pathway to open access appears straightforward: authors, often depicted as individual, independent actors, must retain sufficient rights to allow them to legally share and allow reuse of their writing.

However, reality is far more complex. Multiple-authored works, including in extreme cases thousands of joint authors on one scientific article, can complicate our understanding of who actually holds a copyright interest in a work and can therefore authorize an open license on it. 

Moreover, many if not most academic authors are employed by colleges or universities, each with its own perspective on copyright ownership of scholarly publications. In most cases, as Baldwin explains, universities have been hesitant to assert ownership of scholarly publications under the work-for-hire doctrine (a topic I will cover in a subsequent post), possibly based on the increasingly tenuous “teacher exception” to the work-for-hire doctrine. However, this approach is not universally adopted. For instance, some universities assert ownership of specific categories of scholarly work, such as articles produced under grant-funded projects. Others reserve broad licenses to use scholarly work for university purposes, albeit with ill-defined parameters.

Open access, or at least the type we commonly think of—copyrighted articles typically licensed under Creative Commons or similar licenses—depends heavily on obtaining affirmative permission from the rightsholder. But the identity of the rightsholder, whether it be the university, author, or even the funder, can vary significantly due to a wide range of factors, including state laws, university IP policies, and funder grant contracts. 

Stay tuned for more in this series, and if you have questions in the meantime, check out our open access guide and resource page.

Book Talk: Against Progress by Jessica Silbey

Posted May 8, 2023

Join journalist MARIA BUSTILLOS for a virtual book talk with author & professor of law JESSICA SILBEY for her latest book, AGAINST PROGRESS.


When first written into the Constitution, intellectual property aimed to facilitate “progress of science and the useful arts” by granting rights to authors and inventors. Today, when rapid technological evolution accompanies growing wealth inequality and political and social divisiveness, the constitutional goal of “progress” may pertain to more basic, human values, redirecting IP’s emphasis to the commonweal instead of private interests.

Against Progress considers contemporary debates about intellectual property law as concerning the relationship between the constitutional mandate of progress and fundamental values, such as equality, privacy, and distributive justice, that are increasingly challenged in today’s internet age. Following a legal analysis of various intellectual property court cases, Jessica Silbey examines the experiences of everyday creators and innovators navigating ownership, sharing, and sustainability within the internet eco-system and current IP laws. Crucially, the book encourages refiguring the substance of “progress” and the function of intellectual property in terms that demonstrate the urgency of art and science to social justice today.

Purchase Against Progress from Stanford University Press.

JESSICA SILBEY is Professor of Law at the Boston University School of Law. She is the author of Against Progress: Intellectual Property and Fundamental Values in the Internet Age (Stanford, 2022), The Eureka Myth: Creators, Innovators, and Everyday Intellectual Property (Stanford, 2015), and was a Guggenheim Fellow in 2018.

May 9 @ 10am PT / 1pm ET
Register now for the free, virtual event

An Update on our Text and Data Mining: Demonstrating Fair Use Project

Posted April 28, 2023

Back in December we announced a new Authors Alliance’s project, Text and Data Mining: Demonstrating Fair Use, which is about lowering and overcoming legal barriers for researchers who seek to exercise their fair use rights, specifically within the context of text data mining (“TDM”) research under current regulatory exemptions. We’ve heard from lots of you about the need for support in navigating the law in this area. This post gives a few updates. 

Text and Data Mining Workshops and Consultations

We’ve had a tremendous amount of interest and engagement with our offers to hold hands-on workshops and trainings on the scope of legal rights for TDM research. Already this spring, we’ve been able to hold two workshops in the Research Triangle hosted at Duke University, and a third workshop at Stanford followed by a lively lunch-time discussion. We have several more coming. Our next stop is in a few weeks at the University of Michigan, and we have plans in the works for workshops in the Boston area, New York, a few locations on the West Coast, and potentially others as well. If you are interested in attending or hosting a workshop with TDM researchers, librarians, or other research support staff, please let us know! We’d love to hear from you. The feedback so far has been really encouraging, and we have heard both from current TDM researchers and those for whom the workshops have opened their eyes to new possibilities. 

ACH Webinar: Overcoming Legal Barriers to Text and Data Mining
Join us! In addition to the hands-on in-person workshops on university campuses, we’re also offering online webinars on overcoming legal barriers to text and data mining. Our first is hosted by the Association for Computers and the Humanities on May 15 at 10am PT / 1pm ET. All are welcome to attend, and we’d love to see you online!
Read more and register here. 


A second aspect of our project is to research how the current law can both help and hinder TDM researchers, with specific attention to fair use and the DMCA exemption that Authors Alliance obtained for TDM researchers to break digital locks when building a corpus of digital content such as ebooks or DVDs.

Christian Howard-Sukhil, Authors Alliance Text and Data Mining Legal Fellow

To that end, we’re excited to announce that Christian Howard-Sukhil will be joining Authors Alliance as our Text and Data Mining Legal Fellow. Christian holds a PhD in English Language and Literature from the University of Virginia and is currently pursuing a JD from the UC Berkeley School of Law. Christian has extensive digital humanities and text data mining experience, including in previous roles at UVA and Bucknell University. Her work with Authors Alliance will focus on researching and writing about the ways that current law helps or hinders text and data mining researchers in the real world. 

The research portion of this project is focused on the practical implications of the law and will be based heavily on feedback we hear from TDM researchers. We’ve already had the opportunity to gather some feedback from researchers including through the workshops mentioned above, and plan to do more systematic outreach over the coming months. Again, if you’re working in this field (or want to but can’t because of concerns about legal issues), we’d love to hear from you. 

At this stage we want to share some preliminary observations, based on recent research into these issues (supported by the work of several teams of student clinicians) as well as our recent and ongoing work with TDM researchers:

1) Licenses restrictions are a problem. We’ve heard clearly that licenses and terms of use impose a significant barrier to TDM research. While researchers are able to identify uses that would qualify as fair use and also many uses that likely qualify under the DMCA exemption, terms of use accompanying ebook licenses can override both. These terms vary, from very specific prohibitions–e.g., Amazon’s, which says that users “may not attempt to bypass, modify, defeat, or otherwise circumvent any digital rights management system”–to more general prohibitions on uses that go beyond the specific permissions of the license–e.g., Apple’s terms, which state that “No portion of the Content or Services may be transferred or reproduced in any form or by any means, except as expressly permitted.” Even academic licenses, often negotiated by university libraries to have  more favorable terms, can still impose significant restrictions on reuse for TDM purposes. Although we haven’t heard of aggressive enforcement of those terms to restrict academic uses, even the mere existence of those terms can have chilling and negative real world impacts on research using TDM techniques.

The problem of licenses overriding researchers rights under fair use and other parts of copyright law is of course not limited to just inhibiting text and data mining research. We wrote about the issue, and how easy it is to evade fair use, a few months ago, discussing the many ways that restrictive license terms can inhibit normal, everyday uses of works such as criticism, commentary and quotation. We are currently working on a separate paper documenting the scope and extent of “contractual override,” and will be part of a symposium on the subject in May, hosted by the Association of Research Libraries and the American University, Washington College of Law Program on Information Justice and Intellectual Property.

2) The TDM exemption is flexible, but local interpretation and support can vary. We’ve heard that the current TDM exemption–allowing researchers to break technological protection measures such as DRM on ebooks and CSS on DVDs–is an important tool to facilitate research on modern digital works. And we believe the terms of that exemption are sufficiently flexible to meet the needs of a variety of research applications (how wide a variety remains to be seen through more research). But local understanding and support for researchers using the exemption can vary. 

For example, the exemption requires that the university that the TDM research is associated with implement “effective security measures” to ensure that the corpus of copyrighted works isn’t used for another purpose. The regulation further explains that in the absence of a standard negotiated with content holders, “effective security measures” means “measures that the institution uses to keep its own highly confidential information secure.” University  IT data security standards don’t always use the same language or define their standard to cover “highly confidential information” and so university IT offices must interpret this language and implement the standard in their own local context. This can create confusion about what precisely universities need to do to secure the TDM corpora. 

Some of these definitional issues are likely growing pains–the exemption is still new and universities need time to understand and implement standards to satisfy its terms in a reasonable way–it will be important to explore further where there is confusion on similar terms and how that might best be resolved. 

3) Collaboration and sharing are important. Text and data mining projects are often conceived of as part of a much larger research agenda, with multiple potential research outputs both from the initial inquiry and follow-up studies with a number of researchers, sometimes from a number of institutions. Fair use clearly allows for collaborative TDM work –e.g., in  Authors Guild v. HathiTrust, a foundational fair use case for TDM research in the US, we observe that the entire structure of HathiTrust is a collective of a number of research institutions with shared digital assets. And likewise, the TDM exemption permits a university to provide access to “researchers affiliated with other institutions of higher education solely for purposes of collaboration or replication of the research.” The collaborative aspect of this work raises some challenging questions, both operationally and conceptually. For example, the exemption for breaking digital locks doesn’t define precisely who qualifies as a researcher who is “affiliated,” leaving open questions for universities implementing the regulation. More conceptually, the issue of research collaboration raises questions about how precisely the TDM purpose must be defined when building a corpora under the existing exemption, for example when researchers collaborate but investigate different research questions over time. Finally, the issue of actually sharing copies of the corpus with researchers at other institutions is important because at least in some cases, local computing power is needed to effectively engage with the data. 

Again, just preliminary research, but some interesting and important questions! If you are working in this area in any capacity, we’d love to talk. The easiest way to reach us is at

Want to Learn More?
This current Authors Alliance project is generously supported by the Mellon Foundation, which has also supported a number of other important text and data mining projects. We’ve been fortunate to be part of a broader network of individuals and organizations devoted to lowering legal barriers for TDM researchers. This includes efforts spearheaded by a team at UC Berkeley to produce the “Legal Literacies for Text Data Mining” and its current project to address cross-border TDM research, as well as efforts from the Global Network on Copyright and User Rights, which has (among other things) led efforts on copyright exceptions for TDM globally.