Local news organizations discover the value of their own archives

Nieman Lab · Derek Willis

For the past several months, journalism students at the University of Maryland have worked extensively with the archives of a local newspaper in our state. Their goal is to use past stories to build reporting guides, or “beat books,” for the mostly inexperienced journalists who arrive in an unfamiliar place without knowing the local context.

Their work, and the lessons I’ve drawn from it, are the source of my prediction for 2026: Local news organizations will begin to realize the financial value of their archives, including by selling access to companies that are building large language models and other artificial intelligence products.

Most local news organizations already place some value on their archives. The most common way they show it is by putting most of the archive behind a paywall, a reasonable choice in an era when many tech companies would be pleased to scrape the contents of any reliable source they can find.

But local news archives aren’t just a record of the past; in many cases they are irreplaceable civic infrastructure, a critical part of understanding communities. They are among the first places researchers look when trying to piece together what happened decades ago or last month. And I don’t mean just stories: My students can see the value in published legal notices, event calendars, obituaries, and high school sports statistics. Together, these records have real financial value to companies that want to provide their users with accurate local information.

Tech companies, especially those that seek useful and accurate training material for AI products, should pay for that access. Too often they have not, and publishers are right to enforce their copyright claims. There is another path, one in which local news organizations provide programmatic access to their news for a price. The idea is not new, but local news organizations are in a unique position compared to some of their larger colleagues. There are thousands of them, which means that tech companies that want to scrape their content have to do so at scale. That they are doing so, prompting a countermeasure from the CEO of Cloudflare, reinforces the value of the information.

Tech companies can say only so much about local news and events. The existing resources they have access to, legally or otherwise, do not provide the same depth or context as a local news organization. Reddit, Nextdoor, and Facebook groups are useful, to a point. But they cannot offer the consistency and focus of local news organizations, even ones that have been forced to reduce their staff and curb their coverage.

To accomplish this, several things would need to happen. Above all, news organizations would have to rethink how they publish and store their stories. It would mean treating archives as a first-class product of the newsroom, not a byproduct of it. In many cases, that means exerting more direct control over those archives, which would require some technical knowledge and skills. And that’s before we get to the generative AI aspect, which comes with a host of complications, not least of which is the decision to embrace an AI-heavy future. But local newsrooms have a uniquely valuable product to offer, and they should market it that way.

In working with the Maryland newspaper’s archives, my students have come to realize the depth and breadth of information in them, details they would be hard-pressed to find anywhere else. Those details would be prohibitively expensive for tech companies to recreate, and they would improve the companies’ own offerings. I hope, and predict, that local news organizations will charge them for the privilege.

Derek Willis is a lecturer in data and computational journalism at the University of Maryland.