News

How this year’s Pulitzer awardees used AI in their reporting

Nieman Lab · Andrew Deck

A visualization of Elon Musk’s political reinvention. A genealogical investigation that hinged on land grants from the mid-1800s. A visual forensics analysis that disproved the Israeli military’s narrative about killing two journalists. A years-long probe that produced a new database of people killed in the U.S. by police officers who used “less-lethal force.”

On May 5, the Pulitzer Prizes recognized these stories among the winners and finalists across 15 journalism categories. For the second year in a row, the Prizes required entrants to disclose whether they relied on AI technologies. Each of these stories, one award winner and three finalists, came with an AI usage disclosure to the judging committee.

As with last year’s cohort, generative AI tools were not well represented among these disclosures. Rather, the reporters I spoke to who worked on these stories primarily used machine learning techniques that preceded the release of ChatGPT and the rise of large language models (LLMs). Overall, many of the AI tools used this year count investigative reporters as their earliest adopters in newsrooms.

“At this early juncture, we see responsible AI use as a significant component in the increasingly versatile toolkit utilized by today’s working journalists,” said Marjorie Miller, the administrator of the Pulitzer Prizes, who also called attention to other tools represented among the winners, including statistical analysis, public record requests, and visual forensics. “[AI] technology, when used appropriately, seems to add agility, depth and rigor to projects in ways that were not possible a decade ago.”

Last year, I spoke to the first two Pulitzer winners to disclose using AI in their work. In another first, this year the Prizes required entrants in the Breaking News Photography and Feature Photography categories to submit their original camera-recorded files (and not just screenshots) alongside published images.

The new policy offers “a clear chain of custody for judges in those categories,” according to Miller, and went into effect as debates around AI manipulation and authorship took hold in photojournalism.

Mapping Musk’s political reinvention

Last year, The Wall Street Journal set out to visualize Elon Musk’s rhetoric on X (formerly Twitter). Since acquiring Twitter in 2022, Musk has increasingly used his personal account — and its more than 200 million followers — to advance his political agenda. The Journal’s reporting captured just how much the billionaire has reinvented his online persona in that time.

“Even though I had in the back of my head that we were going to see some move towards politics, the starkness of it really surprised me,” said John West, a computational journalist at the Journal who co-reported the analysis of more than 41,000 of Musk’s interactions on X. The story was one of several about Musk’s political influence to win the Journal staff a Pulitzer for national reporting this year.

The Journal started with a dataset from Clemson University’s Media Forensics Hub, which included most of Musk’s posts on X dating back to 2019. After some adjustments and additions, West and his colleagues converted the dataset into vectors: numeric representations, produced by machine learning models, that capture each post’s semantic relationships and place it in a shared space. The visualization of those vectors grouped posts with similar keywords into clusters and showed that Musk’s rhetoric had shifted over time.

“The top is all immigration and politics and more divisive political issues, and the bottom is memes and Tesla stuff,” said West. “It shows the way his speech on Twitter has moved from Musk the business guy to Musk as political figure.”

The reporting did not use generative AI models. Rather, West says they pulled both text and image embedding models — models used to create vectors — from Google’s software development kit (SDK). “We wanted to capture the semantic meaning of memes,” said West, referencing Musk’s reputation for posting viral fodder to his account. “Image embedding models are quite powerful now. We couldn’t have done that a year before.”
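The Journal hasn’t published its pipeline, but the underlying technique is straightforward to sketch. The example below is a minimal illustration, not the Journal’s code: it embeds a few invented posts with an open-source text-embedding model (sentence-transformers, standing in for the Google SDK models the team used) and projects the vectors to two dimensions so semantically similar posts land near each other.

```python
# Minimal sketch of embed-and-project, not the Journal's actual pipeline.
# Assumes the open-source sentence-transformers and scikit-learn packages;
# the sample posts are invented placeholders.
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

posts = [
    "Excited about the new Tesla software update",
    "The border crisis is out of control",
    "Who did this meme, it's perfect",
    "Legacy media won't tell you the truth about this election",
]

# Each post becomes a high-dimensional vector whose distances
# reflect semantic similarity.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(posts)  # shape: (4, 384)

# Project to 2D: posts with similar meaning cluster together,
# the basis for a Journal-style map of Musk's rhetoric over time.
coords = PCA(n_components=2).fit_transform(embeddings)
for post, (x, y) in zip(posts, coords):
    print(f"({x:+.2f}, {y:+.2f})  {post}")
```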

Miller, the Pulitzer administrator, took note of the Journal’s innovative visualizations. “A decade ago, this would have been the somewhat opaque province of scholars in communications studies and the digital humanities,” she said. But a decade from now, “these tools may be available as turnkey products for independent journalists.”

For now, though, West says he finds it difficult to explain the type of AI he uses in his work, even to his own parents. “It’s not like we fed the data into ChatGPT,” he said, echoing comments from other Pulitzer awardees I spoke to. “AI has become synonymous with a generative-style, transformer model. It would be great to have a word that means ‘a generative AI model did this thing’ versus ‘an AI model that’s more like machine learning did this thing.’ But we don’t really have that nuance available to us.”

Excavating land grants

A finalist for explanatory reporting, the series “40 Acres and a Lie” dug deep into Reconstruction history and exposed the legacy of the government program 40 Acres and a Mule. Journalists from the now-shuttered Center for Public Integrity, Reveal, and Mother Jones identified over a thousand formerly enslaved Black men and women who were given land after the Civil War, only to have it stripped away within a year and a half.

“[AI] tools helped us access information that we could not have gathered manually unless we’d had unlimited time to examine all 1.8 million digitized Freedmen’s Bureau records,” said Alexia Fernández Campbell, one of the lead reporters on the story.

Land titles in the archive hadn’t been indexed or clearly labeled and were handwritten in Spencerian script, a cursive style from the 1800s. To sort through them, Pratheek Rebala, a computational journalist, developed a custom image-recognition algorithm. He used land titles and land registers the team had already identified as training data, then ran the resulting model across the entire Freedmen’s Bureau records collection.
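Rebala’s algorithm is custom-built and hasn’t been released, so the sketch below is only a hedged illustration of the general approach: fine-tune a pretrained image classifier on pages the team had already identified by hand, then score the rest of the archive. The directory names and training settings are assumptions.

```python
# Illustrative sketch only; Rebala's actual algorithm is not public.
# Assumes a labeled/ directory with land_title/ and other/ subfolders
# holding pages the team had already identified by hand.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # archival scans are monochrome
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("labeled/", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)

# Start from an ImageNet-pretrained ResNet and retrain only the final
# layer for the two-class task: land title vs. everything else.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()

# The trained classifier can then score every digitized page in the
# 1.8 million-record collection, flagging likely land titles for review.
```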

“That search turned up hundreds more names and land records than I had found manually,” said Fernández Campbell, who had already done extensive research using the collection. This combination of painstaking archival research and custom-built AI detection tools allowed the team to identify roughly 500 additional formerly enslaved people who had received land grants. “The AI tools helped us broaden the scope of the project and show that the 40 Acres and a Mule program impacted more people than many had realized,” she said.

These records were also a starting point for the team’s genealogical investigations. They tracked down several descendants and spoke to them about the properties and inheritances that had been taken from their families.

Remote war reporting

For its story scrutinizing the Israeli military’s justification for killing two Al Jazeera journalists in Gaza, the visual forensics team at The Washington Post took a close look at drone and satellite footage from near the deadly strike. One of several stories recognized by the Pulitzers as a finalist for international reporting, this work included consulting with Preligens, a geospatial AI firm that uses object detection models to identify military vehicles in satellite imagery.

The narrative from the Israel Defense Forces (IDF) was that it had struck “a terrorist operating an aircraft that posed a threat to IDF troops.” That strike killed Hamza Dahdouh and Mustafa Thuraya, and severely injured two other Palestinian freelance journalists. On January 7, 2024, the day of the strike, Preligens found no military vehicles within 10 miles of where the journalists had been located. The analysis confirmed what drone footage revealed and what other analysts were telling the Post: There was no immediate threat to the IDF present at the time of the strike.

Increasingly, visual investigation teams in large newsrooms have been using AI models to analyze satellite imagery, particularly in Gaza, where Israel has banned most international journalists from entering and reporting on the ground. Last year, I spoke to a Pulitzer-winning computational journalist at The New York Times who used object detection models to analyze crater formations across Gaza and helped prove that the IDF was using some of its heaviest arsenal in areas marked as safe for civilians. The Post used Preligens’ model in a similar way, essentially as a pattern recognizer that could comb through large swaths of imagery.
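Preligens’ detection models are proprietary, so the sketch below only illustrates the general pattern with an off-the-shelf open-source detector (Ultralytics YOLO, an assumption, using a generic pretrained checkpoint that knows nothing about military vehicles): tile a large satellite scene, run the detector on each tile, and map detections back to scene coordinates for human review.

```python
# Illustrates the tiling-plus-detection pattern, not Preligens' system.
# The model weights and scene file name are stand-ins.
from ultralytics import YOLO
from PIL import Image

model = YOLO("yolov8n.pt")  # generic pretrained detector, stand-in only
scene = Image.open("satellite_scene.tif")  # hypothetical satellite scene

TILE = 640
hits = []
for top in range(0, scene.height - TILE + 1, TILE):
    for left in range(0, scene.width - TILE + 1, TILE):
        tile = scene.crop((left, top, left + TILE, top + TILE))
        for result in model(tile, verbose=False):
            for box in result.boxes:
                # Keep the tile offset so each detection maps back
                # to a location in the full scene.
                hits.append((left, top, box.xyxy.tolist(), float(box.conf)))

print(f"{len(hits)} candidate objects flagged for human review")
```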

Building a national database

In 2021, the year after the murder of George Floyd, The Associated Press launched an investigation into how many other people in the U.S. had been killed by law enforcement without the use of firearms. The government had often failed to properly record and track these deaths, leaving the AP with the daunting task of building its own national database of “lethal restraint” cases. The AP recruited collaborators, including the Howard Center for Investigative Reporting at the University of Maryland, which ultimately drove the investigation’s adoption of machine learning and AI tools.

Over three years, the team that reported the Investigative Reporting finalist “Lethal Restraint” (https://apnews.com/projects/investigation-police-use-of-force/) collected more than 200,000 pages of digital documents, including court filings, police reports, autopsies, and death certificates. Many were handwritten or included low-quality scans. Optical character recognition (OCR), a process that extracts text from images of documents, was essential to finding needles in this haystack. Tools like Amazon’s Textract, in particular, made thousands of documents more readable, and more searchable, for reporters.
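The AP hasn’t published its document pipeline; as a rough illustration of the Textract step, the sketch below sends a single scanned page to the service and pulls out its text lines. The file name is hypothetical, and AWS credentials are assumed to be configured.

```python
# Minimal sketch of Textract-style OCR, not the AP's actual pipeline.
# Assumes configured AWS credentials; the file name is hypothetical.
import boto3

textract = boto3.client("textract")

with open("autopsy_report_scan.png", "rb") as f:
    response = textract.detect_document_text(Document={"Bytes": f.read()})

# Textract returns nested blocks; LINE blocks carry the extracted text,
# which can then be indexed and searched across thousands of documents.
lines = [
    block["Text"]
    for block in response["Blocks"]
    if block["BlockType"] == "LINE"
]
print("\n".join(lines))
```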

“We looked for opportunities to get to key facts more efficiently,” said Sean Mussenden, the data editor at the Howard Center, of their use of machine learning tools like OCR. “They were critical to building the core database, and the core database was critical to every single story.” In particular, Mussenden pointed to how machine learning assisted in their work indexing the means and cause of death in over 1,000 “lethal restraint” cases, which exposed patterns of negligence and bias by medical examiners.

When the investigation kicked off, ChatGPT was over a year away from release, but as more generative AI tools hit the market, the team at Howard decided to experiment. “It was also pretty incredible to be doing this work both before and after the generative language model explosion. It seemed like every day there was some new model or tool worth testing,” said Mussenden. Most notably, his team used Whisper, OpenAI’s speech recognition model, to transcribe audio from hundreds of hours of police body camera footage they collected, often obtained through public records requests.
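Whisper is open source, so this step is easy to illustrate. The sketch below is a hedged example, not the team’s actual configuration: the model size and file name are assumptions.

```python
# Hedged sketch of the Whisper transcription step; the AP/Howard team's
# model size and settings are not public. File name is hypothetical.
import whisper

model = whisper.load_model("base")  # larger checkpoints trade speed for accuracy
result = model.transcribe("bodycam_clip_0042.mp3")

# Segment timestamps let reporters jump to key moments in the footage.
for seg in result["segments"]:
    print(f"[{seg['start']:7.1f}s] {seg['text'].strip()}")
```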

“We never directly published the output of these models. They were used as tools to get reporters closer to critical information, but there were multiple layers of human review — including a robust fact-checking process,” said Mussenden. “The most important reporting method, by far, was human reporters reading and extracting information from the documents.”