
This is a post by Syn Ong, AI Policy Researcher at Authors Alliance.
Authors increasingly rely on text and data mining (TDM) to analyze large corpora across disciplines. Our new working paper, Beyond the Exception: Licensing, Access, and the Realities of Text and Data Mining in the US, UK, and Singapore, finds that formal legal permissions alone do not secure usable access for TDM research. Instead, usable access turns on how statutory rules interact with private licenses, platform architectures, and technological protection measures (TPMs).
Scope and methods
The paper combines doctrinal analysis with an empirical component: six semi-structured interviews, two in each jurisdiction, with librarians, licensing officers, university counsel, and publishing professionals. Interviews were anonymized and thematically coded around three lenses used throughout the paper: (1) the legal framework, (2) the licensing environment, and (3) technical and infrastructural conditions.
Key findings
- United States. There is no dedicated TDM exception; researchers rely on fair use. In practice, private contracts often restrict TDM, and DMCA § 1201 generally prohibits circumvention of access-control TPMs, with only narrow, time-limited TDM exemptions. The result is a “negative space” in which TDM may be lawful in principle yet blocked in operation.
- United Kingdom. CDPA § 29A authorizes non-commercial TDM and voids contractual override, but there is no right to circumvent TPMs. Ambiguity around “non-commercial” and differences in institutional capacity produce uneven access across universities.
- Singapore. Copyright Act 2021 §§ 243–244, the computational data analysis (CDA) exception, cover commercial and non-commercial TDM and limit contractual override, yet Part 8 anti-circumvention rules prohibit bypassing access controls. Foreign-law licences and bargaining asymmetries can blunt the on-paper breadth of the CDA exception.
Interviewees reported that uncertainty about TPM circumvention and contractual limits, rather than the legality of analytical use per se, most often determines whether projects proceed.
Recommendations
- Law and policy. In the U.S., agencies and professional bodies should publish plain-English guidance confirming that research TDM can qualify as fair use, and the government should make the DMCA § 1201 exemptions easier for researchers to locate and use, for example through educational materials explaining how the exemptions work in practice. In the UK, the government should clarify what “non-commercial research” covers and make the TPM complaints process fast, transparent, and usable. In Singapore, authorities can help universities enforce the CDA by issuing model contract language, building CDA-respecting terms into agreements with vendors, and offering a central help desk for tricky cases.
- Licensing practice. Libraries should work through existing consortia to push for aligned, TDM-friendly terms. A short addendum can do most of the work: confirm that automated analysis is allowed on content the institution can legally access, require a workable route for access (API or bulk download with reasonable limits), allow researchers to keep and share non-expressive outputs (like embeddings or term counts), and prevent contract language from stripping away rights granted by law.
- Infrastructure. Legal permission only helps if systems exist to use it. Universities and publishers should provide secure, auditable TDM platforms with sensible permissions and logging. Because researchers often need to analyze materials across multiple vendors, these platforms should be interoperable and support cross-collection workflows. When bulk access isn’t available, prefer publisher APIs or “contained” analysis environments with clear rate limits and uptime so research is predictable and reproducible.
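For readers less familiar with what a “non-expressive output” looks like in practice, the following is a minimal Python sketch of the term counts mentioned in the licensing recommendation. It is an illustration only and is not drawn from the working paper: the function name `term_counts`, the simple tokenizer, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def term_counts(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top_n most frequent lowercase word tokens in text.

    Hypothetical helper for illustration: the result records how often
    each term appears, but not the order in which the words were written.
    """
    # Deliberately simple tokenizer (an assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)


# Made-up sample sentence, not taken from any licensed corpus.
sample = "Access to text is not access to meaning; access is only the start."
counts = term_counts(sample, top_n=3)
print(counts)
```

The output is a short list of terms and frequencies. The original sentence cannot be read back from it, which is the sense in which such outputs are described as non-expressive. Embeddings and other derived data raise their own questions and are not covered by this sketch.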
Access
The paper is open access and available here. Authors Alliance welcomes reader feedback and additional case studies to inform future guidance for research communities.