
Below is an interview with Charles Watkinson, Director of the University of Michigan Press and Associate University Librarian for Publishing at the University of Michigan Library. In it, we explore Michigan’s approach to artificial intelligence licensing, transparency, and attribution, as well as the Press’s approach to open access.
Dave: Thank you for agreeing to talk with me. My first question is about AI. AI is just sucking the air out of the room in every conversation I’ve had with authors and publishers over the last two years. I wonder if you could talk about what your goals are for the Press when it comes to artificial intelligence.
Charles: There are clearly lots of opportunities to use these technologies broadly referred to as AI. Generative AI is the latest focus, but of course a lot of work happened before that. I think the balance is between embracing those opportunities and staying grounded in values.
My background is in archaeology, so I’m particularly excited to think through the opportunities from that disciplinary perspective. One of the challenges in archaeology is very poorly organized information: no common taxonomies for scholarly information, lots of different national traditions, lots of different languages, different names for the same artifacts in different countries. So the opportunity to do really deep searching, to make sense of a whole mass of poorly structured information, feels like one of the most exciting opportunities that these technologies offer. I’m excited by the opportunities that exist.
But I also recognize that there are some real problems in the way that generative AI has been implemented by commercial companies in the last couple of years. For example, taking an approach that strips out provenance is a big issue. I think that’s the number one thing we hear from our authors: they want to get credit for their ideas. And that is not an unreasonable request. And when they look at AI, the work being done on the structure of language and on language training is a lot less interesting to them than the work applied to particular domains of knowledge.
I do have colleagues, in the library as well as in the press, who are very adamantly opposed to AI. And I recognize that they have very valid concerns about labor and about environmental impact. I worry about being held back from exploring some of the benefits by concerns that we have so little control over, while at the same time recognizing that those concerns are extremely valid. But I do think that provenance and transparency are the issues where we really can make a difference.
Dave: Thanks. So you’re aware that lots of licensing is happening in this space right now. And we see big, big deals announced, you know, Getty and AP and these kinds of big media conglomerates, but we’re also starting to see this from presses, including academic presses. Could you talk a little bit about Michigan’s approach to thinking about licensing for AI?
Charles: Yes. We started to get approached by big technology companies in 2023 and had extensive conversations. It was clear that there was very little alignment about what was important, and some aspects of those conversations in the end led them to collapse. One of the big issues was their unwillingness to allow us to be transparent about the identities of the companies we were working with. It was very difficult to envisage why we would or could hide that information from our authors.
I mean, turning around to an author and saying, “I can’t tell you who I’m licensing your content to” doesn’t seem like a good idea. It’s not good for trust. The other thing was an unreasonable expectation that we would indemnify the outputs. It’s fine that we should indemnify the inputs, because we know our own content and what would be going into these systems, but how could we possibly indemnify the outputs when we had no clue what these companies would be producing? So that was weird. Completely unreasonable. And then there was just a general “scope” problem. The scope of what they wanted to do with our content was extremely broad and would include derivative works of various sorts. We wanted to keep it to just one kind of use: the training of the language models that they’d originally approached us about. So, anyhow, a complete values misalignment and a big power differential.
Those experiences have led us to take an approach of asking authors whether they would like to opt into these kinds of agreements, because we wanted to really focus on author choice. We surveyed a whole bunch of our authors and got about 60% opt-in and 40% opt-out. So we now have a group of authors who would like to be involved in this licensing work, and we are pursuing licensing, but through third parties, because we realized that we’re just too small to engage individually with these technology companies and get anything like a good deal out of them. We haven’t actually made any money yet, but now I’m on the hook to make some money for the authors who opted in, which is quite a situation. What also strikes me is that this activity so far has all been focused on language training for AI systems. What’s much more exciting is the solutions that come next, where the tools are interested in the knowledge embedded in the language as well as the language itself. I’d love to take a different approach to licensing for those purposes than working with these big, values-misaligned companies.
Dave: I’m curious about feedback that you’ve heard from authors. This is really interesting to hear: 60% or so opting in. I wonder if you got any feedback either formally through a survey or otherwise about the things that authors care about here, like is it just, “I’d like a paycheck out of this” or are there other factors?
Charles: I think there was a definite disciplinary divide. We publish a lot in political science and it was social scientists who were totally fine with the licensing. And I think they are more used to thinking about their books as data essentially. And they’re also the authors who are most interested in metrics and indexing of various sorts. So I think they saw it as an extension of that.
It’s the authors who were really focused on the craft of writing, the form of writing, the experience that a scholar gets through writing a monograph, the whole idea of the book as an intellectual process as well as a product. Those authors were the ones who were most concerned. For them, I think it really was about the preciousness, in a good way, of their labor, and the worry about the craft being undermined for their colleagues.
So rather than very specific things, there was a lot of emotion. I don’t mean that in a way that diminishes its validity; it’s a lot of investment and feeling of care around the work. But the specifics that did come up were credit, provenance, and integrity. Those were important to them.
Dave: Speaking of the craft of writing. Lots of publishers have said that they’re getting flooded with submissions of low quality, because AI makes it very, very easy to just produce large amounts of text. Could you talk a little bit about how the press is handling AI in the writing process or the submissions that you’re receiving from authors?
Charles: We are not seeing that trend at the moment because the submissions we receive are very pre-vetted. By that I mean that we publish in very, very specific areas of the humanities and social sciences, we are known in those areas, and we tend to receive submissions that authors have deliberately chosen to send us. Then we have acquisitions editors who are the first filter and are less tolerant of works that don’t fit in some way. And then we have a very elaborate process: acquisitions editors filter and then run peer review, quite often a series editor also reviews, and then the acceptable works go to our executive committee of faculty members at the University of Michigan, who also look at the whole manuscript and screen it. Often it will then go through another iteration of the process. Passages of poor writing where generative AI has produced the text will, I believe, be spotted fairly quickly and, if they’re irrelevant to the work, eradicated.
Now, where AI has assisted is a different issue. I think we need to be very open to AI assistance, because all of us are using it without knowing it anyhow. I use Grammarly myself, and I know it’s got substantial AI tools built in. So drawing the line between AI generation and AI assistance will be our big challenge. The place where we are possibly seeing more AI is in images. We’ve had a couple of AI-generated covers come through, with authors very passionate about them. They have the characteristics of collage, which is always a warning sign for our books, but we’ve let some authors use them. Then there’s the question of image manipulation or cleanup, which is a really interesting one; I think that’s where we may be seeing the most AI intervention from authors. And there are good areas that we really welcome, such as using AI to pre-describe images so that the author can get more quickly to an acceptable alt text description. That’s an exciting area that we’re really on board with.
Dave: Interesting. So I want to switch gears a little bit and talk about open access. And I’ll circle back to AI in a minute and ask how those two things fit together. But maybe first you could talk a little bit about Fund to Mission and just how Michigan has gone about pursuing an open access program.
Charles: Fund to Mission started at the same time as, and in parallel with, the MIT Press Direct to Open program, and they’re very similar. The idea is that a library will purchase our scholarly frontlist, which is about 80 monographs a year, and will be willing to pay what they would have paid had the books all been restricted access, with the understanding that 75% of the books will be open access. As a thank you, they receive term access to the entire backlist, meaning access only during the year of purchase. And it’s been going well. We have about 250 libraries that support the program.
The cool thing about Fund to Mission that’s different from Direct to Open, and is a product of our place within the university, is that library investments are matched by the Provost’s investments at the University of Michigan. So it’s almost like a public radio challenge grant; that’s how it works out. Libraries can see that the University of Michigan has skin in the game and that they’re not subsidizing the whole thing.
And then we also have two other pillars. One is sales of print books, which continue but are not, in most cases, increased by open access. In fact, they have probably decreased by about 40% for our specialist books. So those print sales continue, but they contribute less than perhaps we’d hoped. The other is that we will always ask whether an author has support from their institution, but we will never require it. That’s a crucial distinction: this is not “BPC (book processing charge) required.” It is an option. The overall idea, though, is that open access costs should be spread across the different beneficiaries, and that libraries cannot be expected to support the whole flip to open access for books. So it’s going pretty well. We’ve been doing this for about two or three years now, and I think we’re close to being sustainable in this model.
Dave: I think I remember reading when it first rolled out, the goal was 75% of UM press monographs would be open by the end of 2023. Have you hit that?
Charles: Yes. That’s been the level for two to three years, and I think that’s where we’ll stick. The reason is that we do have a number of cases where immediate open access is not right for the authors we work with, and trying to understand that 25% is really interesting. One of the things we’ve learned is that an author may be willing to have their book go open access after a period of time. So we participate in the JSTOR Path to Open project, with a three-year embargo, for a subset of that 25%.
In some cases, image or text permissions are holding back the author’s ability to go open access, and I would say that still remains the most insurmountable challenge, especially in fields like classics, where we’re dealing with museums with fairly old-fashioned approaches to image licensing or with estates of authors who have peculiar ideas. But I think we’ll gradually start whittling away at that 25%, probably with embargoes. One of the things about embargoes is that they really help with a not unusual situation: an assistant professor going up for tenure whose committee looks as if it might be resistant to OA. That person may be very committed to open access themselves but cannot afford to risk their committee not accepting it.
Dave: So interesting! I’m a strong advocate of open access and part of the reason why is I think if you want your books to be read, lowering barriers to them – especially paywall barriers – is one of the best ways you can do that. Can you talk a little bit about the success you’ve seen in terms of usage of these titles (I assume you’ve been tracking it)?
Charles: I’ve been very, very interested in usage, and we’ve really made it a priority to try to understand what happens. The headline is the big number: over 5 million uses of the 500 books that we’ve published open access. What that actually means is more of an issue, and there are lots of little caveats, to do with whether content was delivered by chapter or by full book, or on a different platform. And how well does the COUNTER standard strip out bots and non-human actors? Lots of questions. So the thing that really, really interests me is our user survey.
When a user opens an open access book on our Fulcrum platform, they get a quick pop-up asking if they’d be willing to answer a few questions: How did you discover this book? How did you acquire this book? And what did you do with this book? We have about 10,000 responses to that survey, and they’re really interesting. Recently at ACRL, I met Ameet Doshi, who’s the head of the social sciences library at Princeton University. He’s been working with colleagues on the six million plus survey responses about the National Academies’ public access books, and they have developed a really nice typology of the types of users. Even though our data set is much smaller, it was clear that we both have really strong evidence for use outside the academy, and we’re starting to make sense of who those people are.
And that’s the exciting thing. These are people who would not have used the book, and would not have been able to gain access to it, had they not had an open access copy. An awful lot of those people are advocates for various causes, doing public policy work but outside central government: in local government, local advocacy, or non-governmental organizations. They’re international, and they’re using the information we’re providing in really creative ways, as authoritative evidence to back up the arguments they’re making. So that’s extremely exciting. And we see the same pattern he’s seeing, particularly in areas like public health.
Nurses are very vigorously consuming some of the literature. For us, it’s in public policy areas; for the National Academies, of course, it’s more in straight medicine. But that’s what’s exciting. It’s absolutely, 100% happening: we’re gaining new readers who are actually doing things with the knowledge we’re delivering from our authors.
Dave: That’s really, really cool. 10,000 responses. You described it as small, but that actually seems pretty large to me. That’s amazing.
Charles: It’s really interesting. I mean, you know, why not? I mean, it’s an unbalanced reciprocity. You know, we’re giving something away for free and people are willing to give something back to us. And it is entirely optional. It’s an entirely opt-in thing. So super exciting.
Dave: All right. So let’s talk about open access and AI. OA titles are typically released under very permissive licenses that allow for reuse. I guess one place to start is: are you seeing authors concerned, or are you yourself concerned, about how those open licenses might allow for AI uses that were not contemplated, or even on the horizon, when the open access license was put on a particular title?
Charles: I have a fairly naive view of this, I think, compared to your expertise. Our default license is CC BY-NC; it’s extremely rare that we use CC BY. We used to use CC BY-NC-ND as our default, just because that was recognized as something humanists liked. But it seems that the ND (NoDerivatives) restriction is not quite as important, and we wanted to open the books to text use. Anyhow, all this is to say that the particular concern authors have is commercial entities harvesting the content. Whether the NC (NonCommercial) license does anything to protect them is, I suppose, open to question, but I think it should. I think an AI harvester that is not providing any credit to the source and is also creating tools that it sells is not following the terms of the license. And this is completely agnostic as to what will happen with fair use and so on in the future. But at this point, it seems to me that any organization that is harvesting our open access content for commercial uses and is not giving any credit to our authors, or to us as publisher, is behaving in a way that is illegal.
I don’t see that being any different, really, from any of our restricted access books. So when authors do start to ask questions like, “I’m adamantly against AI. If I make my book open access, will it be more likely to be consumed?”, what I tend to say is that your book is going to be consumed anyway, for example through a pirate site like LibGen.
But look at the extra benefits you’re going to get with open access. And those extra benefits in reach and impact do not undermine the version of record. So, anyhow, it’s the same argument we made before to authors who didn’t want their books released as ebooks: look, your print book is only a Library Genesis scan away from being made available in the ways you fear, so why not get the benefit of the ebook? It’s a complicated mixture of motivations, but that’s how I feel about open access and AI.
Dave: That makes sense. It’s very similar to a conversation I’ve had with many authors over the years about Creative Commons licensing more generally: there’s no good way to allow broad and permissive reuse, with all the benefits that brings, while also trying to restrict access by particular users you just don’t like for whatever reason.
And so I think your choices there are either cut out all of those beneficial uses or find other ways to navigate the issue of people using your works who maybe you don’t agree with. And of course, all of this is still with the backdrop of fair use – we’re still going to kind of have to wait to see how that works out in the courts in some of these cases.
Charles: Yeah, I think the thing that the authors I’ve spoken to are really most concerned about is very mixed up with their concern about plutocratic technology companies. And I think the “non-commercial” restriction is the important thing for them.
Dave: Yes.
Charles: I am very struck by the fact that making content machine-operable is really important for the future, and also by the fact that, when we work on accessibility, we are working on machine operability, because we’re trying to get screen readers to use the content easily. That is not really different from getting AI tools to use the content. I’m also very struck by the prospect of AI agents going out and doing literature reviews, etc., as being good for scholarship. That also comes back to the earlier point I made about bots in COUNTER stats. We tend to have a negative view of COUNTER usage stats that include spiders, crawlers, and bots of various sorts, but how should we view COUNTER stats that are generated by AI agents? Those agents are really doing work that a graduate student assistant might have done; they’re going out and doing real scholarly work. So should we include them or not? I think we really need to be open to machine use of the content. And I think it’s the concern about whose machines these are, and the motivations of the people behind the scenes, that is driving authors to resist.
Dave: Yeah. One of the concerns I see is a real lack of transparency, both about what material is being used to train models and about the bots, right? Like, whose are they? What are they taking? How are they using it? I think it’s an unfortunate byproduct of the current litigation environment: if you’re an AI company, you really don’t want to tell people what you’re ingesting, because that’s just going to cause all sorts of problems for you, with potential lawsuits popping up.
Charles: And just one other thing on that. I think one thing that’s really striking across the industry is the huge burden these tools are putting on our platforms. Eric Hellman at the Free Ebook Foundation has written about this. The OAPEN platform has been taken offline for several days, and our own platform, Fulcrum, has been suffering constant performance issues recently. Part of what’s happening is that the tools are not only obscuring where they’re coming from, with their IP addresses shifting all the time; they’re also using our platforms in the most inconsiderate way possible. Eric points out that there’s a very easily extractable file of all the free ebooks he delivers, but the bots ignore it. Instead, they just hammer across the platform, wasting huge amounts of energy and really hurting performance. So, yes: absolutely hiding their origins, being unbelievably inconsiderate, and being poor citizens.
Dave: That’s really interesting to hear. I know that there are a couple of groups trying to investigate this and learn more about the effect on open platforms and how scraping is affecting them.
So I have a question about academic use. Of course, we have Microsoft, OpenAI, and these other big tech companies that are ingesting content and then doing what they do with it. One concern I have is that if we react strongly against those uses and try to clamp down on the use of scholarly works more generally for computational research and machine learning, that’s going to have a real negative effect on academic researchers who are trying to do the same thing. So I wonder if you’ve thought about that distinction, and how we might help support academic research uses that are, or at least seem to me, very aligned with the mission of a press like the University of Michigan Press, as opposed to the, I think you called them plutocratic, concerns that show up with the big tech companies.
Charles: Yeah, I’m extremely enthusiastic about making this content available to researchers. And it doesn’t have to be researchers who are only in university settings. If only we could work collaboratively with researchers within organizations like Amazon or Microsoft; we’ve got good precedents for that. If we could work with the research side of things in a spirit of research, as opposed to a spirit of exploitation and what “corporate” wants, I think we could all achieve really good things together. But I am particularly excited by the opportunity to make the content we have created on behalf of authors available to the academy. I think it would be desirable to work together, because we have lots of different versions of that content, and some are much richer in versioning and structure than others. Any approach based on harvesting is not going to get the best versions of our work. I would love to have a set of principles in place that would allow us to work actively with researchers, especially in the academy, but not only in the academy.
Dave: Yeah, well, as you know, I’ll give a little plug for our project on the Public Interest Corpus, because I think this hits on a few of that project’s aims. One is that working from scanned LibGen PDFs of books is really not good for anybody. So having higher-quality files with structured metadata and everything surrounding them, along with a sense that this corpus is really intended to advance knowledge and benefit the public, is an important thing.
Charles: I think that’s an awesome initiative. I really do think it’s important.
Dave: So my last question is pretty much an open-ended, you know, tell me your thoughts on the world. I’d love to know just your views on the future of university press publishing. What do you think is exciting? What’s terrifying to you? And how are you thinking about this future in terms of I guess more specifically the future of your press?
Charles: I have a lot of optimism here. I think the great opportunity that university presses have, like other publishers within university settings, is proximity to researchers: the fact that we are on the same payroll as people doing amazing work. That is really valuable, in addition to the physical proximity. So the more we can embed with authors, creators, and teams, the more exciting it becomes, especially if you’re a publisher in a limited number of subject areas. One of the things we’ve done at the University of Michigan Press is move recently to the center of campus.
Now that we’re within the main library building, we see an opportunity that is going to be really important for us going forward: to be right within the space of the collections, especially special and unique collections, and to find new ways of providing access to them. We’ve been more actively publishing special collections material recently. There’s also the opportunity to be situated within the team-based humanities projects that your work has referred to as “expansive digital publishing” projects. That’s really exciting, because that’s absolutely what we’re seeing: big interdisciplinary, team-based projects where having a publisher involved early on, along with other information professionals, really helps the project later on.
So that’s what’s exciting. What’s terrifying? Clearly, the economic situation that the universities that tend to have university presses are about to face. University press publishing is very heavily supported by research-intensive universities, there are clearly going to be budget cuts of various sorts, and relevance to the core activities of the university may be one of the measures. University presses are always stuck in the awkward situation where the majority of their authors are from outside the parent institution.
So it’s quite easy to imagine a very short-sighted administrator looking at that picture and wondering why a university press exists, without understanding the broader infrastructural contribution made by this network of university presses, each doing slightly different things and together creating a kind of map of nodes. In the same way that we’re seeing infrastructure under attack at the national government level in the US, I do worry about distributed infrastructures like university press publishing.
I think the answer is doubling down on relationships and making oneself absolutely essential to disciplinary areas by participating in work that’s happening on campus, while recognizing that this work has lots of other authors and lots of other institutions involved. So, yes, but I’m optimistic.
Dave: Right. That’s good. So that’s all the questions I have. Is there anything you were dying to say to Authors Alliance members and readers that you didn’t get to?
Charles: Only to say that I’m a big supporter of the Authors Alliance. I’m really excited by the participatory approach you’re taking going forward. It doesn’t feel oppositional. We’re going to have areas of tension, but I think this is a space that mission-driven publishers should be more involved in. The Authors Alliance is a more and more exciting organization, and thank you.