Understanding the Real Barriers to Text and Data Mining

Syn Ong representing Authors Alliance at AWP 2025

[This blog post is authored by Syn Ong. As our law student intern this semester, she explored the barriers to TDM research in Singapore, the UK, and the US.]

Over the past semester, I have had the opportunity to dig deep into the legal and practical challenges facing researchers who engage in text and data mining (TDM), a critical method for analyzing large datasets and unlocking new insights across disciplines. With the rise of AI tools, TDM is no longer a niche technique; it is quickly becoming a cornerstone of modern academic research. Yet despite growing recognition of its value, many researchers still cannot access the materials they need to build a research corpus, even when the law appears to permit it.

My research for Authors Alliance takes a comparative look at TDM governance in the United States, United Kingdom, and Singapore. These three jurisdictions span a spectrum: the U.S. relies on the fair use doctrine, the UK offers a statutory exception for non-commercial research, and Singapore provides one of the most expansive exceptions globally, covering both commercial and non-commercial TDM.

At first glance, Singapore’s model might seem like a best-case scenario for those seeking to engage in TDM. But legal frameworks tell only part of the story. Through interviews with librarians, copyright professionals, and publishers, my report reveals how real-world barriers—like restrictive license terms, technical protection measures (TPMs), and institutional risk aversion—can effectively nullify statutory rights.

Some key takeaways:

Legal permission ≠ access: Even in countries with strong exceptions, publishers often include restrictive clauses in their licenses—and institutions frequently lack the leverage or capacity to negotiate more favorable terms for TDM.

Fair use is flexible, but fragile: In the U.S., fair use might cover TDM in theory, but contractual override and TPMs often block researchers before they can begin collecting a corpus for their research.

Awareness is patchy, and risk aversion runs deep: Across jurisdictions, researchers hesitate to rely on exceptions they don’t fully understand. Legal uncertainty, combined with institutional caution, chills legitimate use.

The problem is structural: This isn’t just about outdated copyright law. It’s about a system where private contracts and digital platforms govern access, often sidelining public-interest research.

The full report will be published soon on the Authors Alliance website. If you’re a researcher, librarian, policymaker, or just curious about the future of copyright and AI, we hope this work will offer useful insights into how law and licensing shape access in practice.

Stay tuned for more from us—and in the meantime, thank you for supporting our mission to ensure authors and researchers can make their works and ideas widely available and impactful.
