
AI-powered search is fueling a wave of Epstein Files transparency projects

Nieman Lab · Andrew Deck · last updated March 4, 2026 – 7:25 AM

The Epstein Files Transparency Act (EFTA) requires that the millions of documents collected by the Department of Justice (DOJ) about Jeffrey Epstein be shared with the public in a “searchable and downloadable” format. In practice, though, the searchability of the DOJ releases has been crude at best. Keywords may turn up individual links to PDFs, but users have reported major search malfunctions and limitations handling the documents at scale.

As the American public and people around the world try to understand the over 3 million pages of documents, 180,000 images, and 2,000 videos in the latest Epstein Files drop, these search limitations are a serious barrier to entry. In the vacuum created by the DOJ, journalists and engineers have stepped in to fulfill EFTA’s transparency promise. Many of them are using AI tools to create alternative databases and release them for the general public — making the files more easily searched, analyzed, and understood by the average person.

Take Jmail, an interactive archive that has transformed the dense PDF files of Epstein’s emails into a familiar, searchable Gmail-style inbox. Last month, Riley Walz, one of Jmail’s creators, announced that the Jmail website had surpassed 450 million page views.

“They’re trying to get as many eyes on [the Epstein Files] and as much public awareness, knowledge, and understanding of this as possible,” Dan Rosenheck, the editor of The Economist’s data team, said of Jmail and the work of its volunteer engineers. “They built something that the public can use directly, rather than having it be intermediated by journalists, basically having it be in a format that so many people use in their everyday life.”

These types of AI-powered transparency projects have only become more important as trust in government institutions and the Trump administration’s handling of the files erodes. Last week, NPR reported that the DOJ intentionally withheld and removed documents in the Epstein Files that named Donald Trump, including an accusation by a woman that he had sexually abused her when she was a minor.

“In a time where people feel the government’s not being transparent, I think it’s even more important for media outlets to provide that service, to give people access, to feel empowered and feel like they can take control over the information that’s out there,” said Camaron Stevenson, a national correspondent for Courier Newsroom. (Courier’s publisher and CEO, Tara McGowan, is a former Democratic political strategist, and Courier newsrooms explicitly support Democratic candidates in battleground states.)

Stevenson built two public searchable databases using files released by the DOJ and Congress last year, before the most recent DOJ drop. He said the project has given his readers a sense of agency in a news cycle that often leaves them feeling powerless. It has also returned hundreds of tips to fuel his investigative reporting.

Building AI document search for readers

Since the first Epstein Files were released last year, newsrooms have been using machine learning and LLMs to parse documents and find story leads.

Earlier this month, New York Times AI projects editor Dylan Freedman explained how he and his colleagues built “bespoke software applications” to help reporters search photos visually, identify document duplicates, and generate video and audio transcripts. The Times has also been using a proprietary search tool developed by its Interactive News desk to break news about the files and comb through the documents for investigative leads.

“If we had 50 reporters reading 500 documents a day [it] would take us four months to get through all the documents,” said Nicholas Confessore, an investigative reporter for The New York Times, in a recent episode of The Daily, talking about the January DOJ release. “And that’s just to read them.”
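The arithmetic behind that estimate can be checked with a rough sketch. It assumes, as this article's own figure suggests but the quote does not state, that the release amounts to about 3 million documents (treating each page as one document) and that a month is 30 days. The constant and function names are illustrative only.

```python
# Back-of-the-envelope check of the reading-time estimate in the quote.
REPORTERS = 50
DOCS_PER_REPORTER_PER_DAY = 500
# Assumption: about 3 million documents, one per page of the latest release.
TOTAL_DOCS = 3_000_000
# Assumption: a 30-day month.
DAYS_PER_MONTH = 30


def days_to_read(total_docs: int, reporters: int, docs_per_day: int) -> float:
    """Days needed for a team to read every document once."""
    team_rate = reporters * docs_per_day  # documents read per day by the team
    return total_docs / team_rate


days = days_to_read(TOTAL_DOCS, REPORTERS, DOCS_PER_REPORTER_PER_DAY)
months = days / DAYS_PER_MONTH
print(f"{days:.0f} days, or about {months:.0f} months")
```

Under those assumptions the team reads 25,000 documents a day and needs 120 days, which is consistent with the four months Confessore cites.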

The Guardian and the BBC have also been using similar proprietary search tools, according to the Reuters Institute for the Study of Journalism.

While these AI tools frequently serve reporters, the same AI technologies and techniques are increasingly being used to build reader-facing products about the files.

Last October, after Congress released 20,000 documents from Epstein’s estate, Courier’s Stevenson uploaded the files to Google Pinpoint, which is marketed as a free “AI research tool for journalists.” Pinpoint uses optical character recognition (OCR) to make text across thousands of documents machine-readable. It also leans on Google’s Gemini models to generate transcriptions of audio files.

Stevenson was able to use his Pinpoint database to do keyword searches for relevant people and organizations in the files, and search images using basic descriptions. It also helped him maintain a stable archive of files, as the Trump administration continued to redact and remove documents from the official DOJ archive in the weeks after their release. After the first DOJ drop in December 2025, Stevenson created a second database using Pinpoint.

Homepage of Camaron Stevenson's Google Pinpoint project for the Epstein Files.

Rather than limiting access to these Pinpoint projects to Courier staff, Stevenson published both on their site and shared them on social media. The posts included a call to action, asking Courier readers to flag any documents of interest they found using the tool. Some tips Stevenson received have directly supported his reporting, including coverage of Epstein’s ties to Jes Staley, a former J.P. Morgan executive.

“Even the ones where it’s not necessarily something I can use for a story, it’s been very helpful to build trust with the general public and restore faith in our broader institutions in a way that people feel the government is failing to do,” Stevenson said.

Pinpoint does have serious limitations. The tool can’t handle video and has limited photo-processing capabilities. There are also caps on the number of documents that users can upload.

“I can only upload 250,000 documents, which in any other case would be fine,” said Stevenson. The three million pages of documents in the most recent DOJ dump, however, blew past that cap. Stevenson says he’s been taking calls with engineers and companies who reached out to offer their services to find an alternative hosting platform or build a custom tool to continue the project. “We’re in the process of developing something we can use and the public can use because, unfortunately, a free tool from Google is not going to cut it anymore.”

“Our project was being used to go after the perpetrators”

Jmail has leaned on over a dozen volunteer engineers to tackle the new DOJ release, according to Luke Igel, one of its co-creators and the CEO of Kino AI.

At first, Igel says, he and collaborators ran the files through Cursor, a generative AI product built on top of Anthropic’s Claude models. Routine errors pushed the team to start using a more boutique PDF extraction tool. For most of the files, they have used tools built by the startup Reducto AI.

Faced with a PDF of an email from Epstein — or in many cases a PDF of a photocopy of a print-out of an email from Epstein — Reducto was able to identify and pull out the subject line, sender, and body of the email. Jmail engineers then used the corresponding JSON data to populate their Gmail imitation app. In the months since Jmail’s launch last November, the team has released several spin-off projects that mimic Google’s suite of products, including JPhotos for images in the files, JeffTube for videos, and JFlights for Epstein’s flight records and passenger lists.
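As a rough illustration of the extraction-to-inbox step described above, the sketch below loads structured records and renders inbox rows. It is not Jmail's or Reducto's actual code or schema: the field names (`subject`, `sender`, `body`, `source_pdf`), the functions, and the sample records are all hypothetical placeholders.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical extraction output; real field names and contents are not
# described in the article, so these are placeholders.
SAMPLE_JSON = """
[
  {"subject": "Placeholder subject A", "sender": "sender.a@example.com",
   "body": "Placeholder body text about a schedule.",
   "source_pdf": "example-0001.pdf"},
  {"subject": null, "sender": "sender.b@example.com",
   "body": "Another placeholder body.",
   "source_pdf": "example-0002.pdf"}
]
"""


@dataclass
class Email:
    subject: str
    sender: str
    body: str
    source_pdf: str  # kept so a reader can click through to the original


def parse_extracted(raw_json: str) -> List[Email]:
    """Turn extracted JSON records into Email objects, filling in gaps."""
    emails = []
    for record in json.loads(raw_json):
        emails.append(Email(
            subject=record.get("subject") or "(no subject)",
            sender=record.get("sender") or "(unknown sender)",
            body=record.get("body") or "",
            source_pdf=record["source_pdf"],
        ))
    return emails


def search(emails: List[Email], query: str) -> List[Email]:
    """Case-insensitive substring search over subject, sender, and body."""
    q = query.lower()
    return [e for e in emails
            if q in e.subject.lower()
            or q in e.sender.lower()
            or q in e.body.lower()]


def inbox_row(email: Email, width: int = 40) -> str:
    """Format one email as a single inbox-style line with a body snippet."""
    snippet = " ".join(email.body.split())[:width]
    return f"{email.sender:<24}{email.subject} | {snippet}"


emails = parse_extracted(SAMPLE_JSON)
for email in search(emails, "placeholder"):
    print(inbox_row(email))
```

Carrying the source file name on every record is one way to support the kind of click-through to the original document that lets readers check a transcription against its scan.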

While Igel does not consider himself a journalist, he does subscribe to a similar set of values. His biggest inspiration for the project, he says, is the Pentagon Papers. Before the Supreme Court ruled that The New York Times and The Washington Post could report on the leaked military documents, the Pentagon Papers circulated throughout Washington, D.C.

“Senator Mike Gravel, to get it in the Congressional Record, just had to read it out loud,” said Igel, referring to when the then-senator from Alaska read aloud 4,100 pages of leaked government documents about the Vietnam War during a Congressional subcommittee meeting in 1971. “Everyone was so terrified of what would happen — the legal and social consequences of just posting such insane leaked materials.”

Igel sees Jmail as, similarly, entering the Epstein Files into the public record.

Epstein Files Jmail homepage

Even with custom PDF extraction tools, though, errors can slip through. LLM hallucinations and more basic OCR or transcription errors are a risk with any AI-powered tool that touches the files. The risks are only compounded when these tools are released directly to the public, without a layer of verification or fact-checking.

“It’s impossible for us to make sure that every single email is correctly verified,” Igel told me, speaking to the resource constraints of the small volunteer team. He emphasized that Jmail has a button in the upper right-hand corner of every email that users can click to see the original source in the files. “That’ll immediately give you more trust in our system, because you’ll see, this is completely one-to-one. Then on the off case where it might not be good, you can click and see the original.”

To avoid any further violations of victim privacy by the DOJ, Igel also says the Jmail team has been “proactively and reactively” redacting names that should not have been released.

The Epstein Files are part of an information ecosystem that is already riddled with misinformation and speculation. Whether to maintain editorial standards or avoid liability, most major news organizations have chosen not to take on the risk of publishing inaccuracies or breaching privacy by making their internal AI search tools available to readers.

Still, Jmail has collaborated with several news organizations since its launch last fall. In February, The Economist published a story that analyzed and visualized Jmail’s underlying data. The story identified the 500 most-represented public figures in Jeffrey Epstein’s emails and organized them by industry, showing how frequently notable people like Michael Wolff, Ariane de Rothschild, and Sultan bin Sulayem were in touch with Epstein.

“As a data journalist working on a project, I wanted the giant CSV and to have it be as correct as possible,” said Rosenheck, explaining that his team worked with Jmail’s team to vet the structured data for accuracy before publication. “It was very fortunate that, when we got this idea, someone else had gotten 90% of the way there already.”

Drop Site, a Substack founded by former Intercept investigative reporters Ryan Grim, Jeremy Scahill, and Nausicaa Renner, has also collaborated with Jmail. Last fall, Drop Site obtained access, via the nonprofit organization Distributed Denial of Secrets (DDoSecrets), to Epstein’s Yahoo email account archives, a leaked dataset separate from the DOJ and Congressional Epstein Files releases. While many of the Yahoo emails had first been obtained and reported on by Bloomberg, they had not been released.

After connecting with Igel and his team, Drop Site published their files on Jmail’s website in December, making their cache of Yahoo emails publicly available and searchable.

Last month, Les Wexner, a retail billionaire whom the FBI once labeled a co-conspirator of Epstein’s, was asked about one of those emails during a Congressional deposition. Wexner had emailed Epstein right after Epstein’s 2008 sex crimes conviction, writing, “You violated your own number 1 rule… always be careful.” Through the deposition, the exchange was entered into the congressional record.

“This is something that you could only find on Jmail,” said Igel. “It was very satisfying to see that the more original stuff in our project was being used to go after the perpetrators.”

Photo of Epstein Library search bar on DOJ website by Lucky Pics used under an Adobe Stock license.
