On Feb 5, 2026, we hosted a workshop on DMCA §1202 and Attribution Standards for AI. In brief, we wanted to have a conversation about how attribution standards should be developed and implemented in a category of technology that trends toward abstracting (forgetting? obscuring? diluting?) its inputs.
We think attribution is extremely important for a healthy information ecosystem. Without it, we lose the ability to robustly interrogate research that needs to be verified, validated, tested, retested, and scrutinized in all the ways that happen in a properly functioning system. If AI models move us away from verifiable systems and undermine or eliminate our ability to trust information, the results will be catastrophic.
With those stakes in mind, we brought together people who either are authors or represent authors, people who have deep expertise in copyright law, and people who are working closely with AI companies. We viewed the workshop as an entry point for a conversation and possible future interventions, rather than as a means to find an immediate solution to a problem.
Below, I will try to capture some of the questions I had going into the workshop, some of the observations and insights I gained from the workshop, and some possible future steps.
If we want to find a way to compel AI companies to take attribution seriously, how useful is §1202?
I think it’s fair to say that a lot of us don’t view §1202 as fit for purpose when it comes to incentivizing attribution in the AI space (in fairness, this was never its purpose!).
Pamela Samuelson, Erik Stallman, and Jennifer M. Urban have written a really illuminating and compelling article about §1202, Unbundling 17 U.S.C. §1202: Construing the Law’s Scope in Light of Its Text, Purpose, and Remedies, which unpacks the history of §1202, its intended purpose, how courts have interpreted it and managed its imprecision, and some of the perils that would emerge if it were applied too broadly.
Among the many strong points made in that article and in our workshop conversation about the utility of §1202 more broadly, here are some highlights:
- §1202 was really about “massive piracy via perfect copies.” In the late 1990s, when the internet was still in a fairly early stage, there was a lot of concern that you could make a perfect copy of a song, a book, or a movie and distribute that copy on a massive scale. Stripping copyright management information from a work was viewed as a key step toward that kind of massive infringement, and that is the problem §1202 was meant to address; we should read and understand the law with that context in mind.
- Section 1202 was poorly drafted and risks sweeping far too broadly. One workshop participant said that “1202 looks like it was drafted under duress” (that may have been the gentlest thing said about 1202 during the entire workshop). The poor drafting of 1202 is a problem in and of itself, but the problem is magnified because it’s quite easy to bring a claim under 1202: you don’t need to be a rightsholder, and you don’t need a copyright registration. When the barrier to bringing a claim is that low, the potential for poorly drafted language to wreak havoc is high.
- If there aren’t useful tools to filter out claims at an early stage, §1202 could do a lot of damage in the AI space. The prevailing view in the room was that the “identicality requirement,” as an analytical tool, “allowed courts to make the right inferences” about claims and helped filter out claims that would undermine many legitimate uses of works. While many people in the room expressed skepticism about “identicality” as the test, there was high confidence that the purpose it serves is vital.
Even if §1202 is a bad fit for the attribution we want to see, might it still play a role in this space?
The short answer to this question is that it’s already playing a role and that could continue, regardless of how well it meets the need.
§1202’s prohibitions on the removal of Copyright Management Information already feature in several ongoing AI/copyright lawsuits. Most recently, oral argument in Doe v. GitHub was heard in the 9th Circuit on February 11, 2026. Briefs in that case can be found here (we submitted this amicus brief in support of defendants-appellees).
The certified question of identicality is one we’ve discussed previously and is explored in great detail in the briefs. The outcome of the current appeal will determine whether the defendants can successfully move to dismiss the plaintiffs’ claim. If they cannot, the lawsuit will move forward to more intensive, contentious, and expensive stages of litigation – discovery, pre-trial motions, and possibly trial.
The outcome of this case is important and will inform the litigation strategy of other parties in the 9th Circuit and elsewhere. If the defendants prevail on their motion to dismiss, we will either see fewer §1202 claims or a new breed of claims shaped by the court’s opinion.
If the plaintiffs prevail and this case moves into discovery, becoming more expensive to litigate, the impact of the lawsuit may be truly profound for researchers and for small and medium-sized AI companies. Protracted litigation requires considerable energy and financial resources – if it looks like most claims will survive a motion to dismiss, incentives to settle will go up, risk profiles will change, and a resource threshold for working in this space may emerge. There is a version of the future where §1202 acts as a moat for larger AI companies, with the specter of bankruptcy-inducing litigation and settlements hanging over all smaller actors. As advocates for researchers and smaller actors, we do not think that would be a good outcome.
Setting 1202 aside, how do we foster attribution in the ways we care about?
To begin, here are some of the facets of attribution that we know we care about:
1. Credit – Recognition of the creator’s contribution. For academic authors, this is vital to careers: tenure, promotion, reputation, and impact metrics all depend on it.
2. Provenance – Enabling readers/users to assess the quality, reliability, and authority of information. Peer-reviewed source vs. hallucinated text. This is especially urgent in an era of AI-generated content.
3. Discoverability – Surfacing source works so they can be found, read, and built upon.
4. Accountability/Verifiability – Allowing downstream users to check claims, trace reasoning, and hold producers of knowledge accountable.
5. Licensing and rights management – Identifying who holds rights, under what terms, so that permissions can be obtained when necessary.
6. Relational acknowledgment / scholarly norms – Citation as a prosocial act: “I could not have done this without your contribution.”
It was pretty clear from the conversation that Section 1202 was viewed as a distracting force, sapping energy, resources, and attention, rather than one that would lead to the types of attribution solutions we’re most interested in.
In the realm of solutions, one refrain we heard over and over again was that trying to force attribution at the training stage of AI is bound to be a dead end. The universal comment was “Inference is the place for solutions, not training.”
One anecdote from the meeting demonstrates some of the complexity here and offers a nice illustration of why seeking to solve attribution problems at the training stage may lead to unintended and unwanted consequences. Some participants highlighted a model trained on Icelandic works in which copyright management information identifying the authors had not been removed. Researchers found that the resulting model was highly adept at replicating the literary style of specific authors, a result that went far beyond attribution and was not at all desired by the authors associated with the project. In that example, removing CMI would have been more closely aligned with the authors’ best interests.
In addition to unintended consequences, it is widely believed that we can, at best, achieve only approximations of attribution via the training stage. Techniques like influence functions, data Shapley values, and TracIn can produce estimates of how much a given training example contributed to a particular model output. But what these methods produce is not attribution or “provenance” in any sense an information professional would recognize. They produce a statistical approximation—a fuzzy, reverse-engineered guess about the degree to which a training input nudged model weights in a direction that later influenced an output.
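To make the “statistical approximation” point concrete, below is a minimal sketch of one of those techniques, a TracIn-style checkpoint method, written against a hypothetical small PyTorch model. The function and its inputs are illustrative assumptions, not the interface of any production attribution tool, and it omits details (such as per-step learning-rate weighting) that the published method includes.

```python
# A minimal, illustrative sketch of a TracIn-style influence estimate.
# Assumes a small PyTorch model whose parameters all receive gradients;
# names and inputs are hypothetical, not a real attribution library's API.
import torch

def tracin_score(model, checkpoints, loss_fn, train_example, test_example):
    """Sum gradient dot-products across saved training checkpoints.

    The result is a single similarity score estimating how much
    `train_example` nudged the model toward its behavior on `test_example`.
    Nothing here identifies the author, source, or license of the example.
    """
    score = 0.0
    for state_dict in checkpoints:  # parameter snapshots saved during training
        model.load_state_dict(state_dict)

        # Gradient of the loss on the training example w.r.t. the parameters.
        x_tr, y_tr = train_example
        model.zero_grad()
        loss_fn(model(x_tr), y_tr).backward()
        g_train = [p.grad.detach().clone() for p in model.parameters()]

        # Gradient of the loss on the test example w.r.t. the parameters.
        x_te, y_te = test_example
        model.zero_grad()
        loss_fn(model(x_te), y_te).backward()
        g_test = [p.grad.detach() for p in model.parameters()]

        # Influence estimate at this checkpoint: inner product of the gradients.
        score += sum(torch.sum(a * b).item() for a, b in zip(g_train, g_test))
    return score
```

The output is one number per training example – a gradient-similarity score with no author, source, or license attached – which is precisely the gap between these techniques and attribution as information professionals understand it.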
Instead of lawsuits and forcing attribution at the training stage, participants emphasized the importance of developing best practices and model attribution standards that could then be adopted widely. One might be cynical about this, but there was clearly a desire to better understand what the community wants and build around that, rather than merely a self-interested desire to avoid litigation.
What are the opportunities here, and what good work might be done in the future?
I must confess that I walked into the workshop hoping I might hear some version of “Oh, this is an easily solved problem” or “We know people who are working on this and they are very close to a solution.” I didn’t hear that. I don’t think a solution to this problem is a month away or just about to be released.
Instead, I heard a few things. One, authors and academic communities will likely need to do a healthy amount of the design around attribution standards.
At least two things are true here – (1) AI companies don’t want to expend a lot of resources coming up with a solution that nobody likes, and (2) AI companies, as well-resourced as many are, have an enormous number of competing priorities right now. Developer time is still a precious resource for them, and attribution standards are not necessarily where that resource is currently being allocated. I make this point with little confidence that I have a clear or full picture of which priorities are highest right now, and I imagine it is highly variable. That acknowledged, it was clear that attribution could become a priority if a strong business case could be made, coupled with a path forward.
More optimistically, we heard that if anyone starts getting attribution right, and is celebrated for it, other companies are likely to follow quickly (“stealing, but in the good sense”). There wasn’t any opposition to attribution, but rather a recognition that it may be challenging to truly get it right.
In discussing concrete pathways toward developing standards, one workshop participant highlighted the National Academies and American Academy of Arts and Sciences as potential future collaborators and a mechanism for both elevating the visibility of this issue and developing models to address it. Many in the room responded positively to this suggestion and it seems like a very good next step.
What this would ultimately look like is something we’re now beginning to think about. The National Academies, for example, have a process for Consensus Studies, which may ultimately provide the appropriate level of rigor for setting attribution standards. Both organizations have robust mechanisms for engaging and convening stakeholders around common problems; the challenge before us is to work with their leadership to identify the best fit between this challenge and their mechanisms for taking on the work.
Side discussions with workshop participants who had greater insight into AI development made it clear that these standards, once established, might be helpful in future conversations with the product development leadership at the major AI companies. From what I could glean, “product leads” are the people who could respond best to a tightly defined set of standards, and make the business case for deploying them in future AI models.