Researchers document AI firms’ pilfering of news sites
If they were stealing jewels or pirating movies, AI companies might be prosecuted.
But they face few consequences for ripping off news publishers, using copyrighted work and rarely providing even attribution.
This pilferage is documented by an “AI news audit” released Monday by Canadian researchers at McGill University in Montreal.
It found AI models to be quite knowledgeable about current news stories. But in queries involving web searches, they provided no source attribution in 82% of the responses.
Professors Taylor Owen and Aengus Bridgman at McGill’s Centre for Media, Technology and Democracy tested four major AI models to see how much they knew about current news stories in Canada and how much credit they give to outlets that originally reported the stories.
“AI companies have built commercial products that depend, in significant part, on the reporting that Canadian journalists produce,” the professors wrote. “They have done so without compensation, without attribution, and without any obligation to sustain the infrastructure they are drawing from. The result is a system that accelerates the economic decline of the journalism it relies on.”
Research like this, along with publisher lawsuits producing similar evidence of theft, should prod AI companies to pay up. If they don’t, government should step in and hold them to account.
The audit ran two tests. One examined the use of news to train AI models. Another looked at how the models cited news when they incorporated web searches into the answers they delivered.
They tested ChatGPT, Gemini, Claude and Grok on a sample of 2,267 Canadian news stories.
With web search enabled, 52% of responses had at least one link to a Canadian news site, but the source was named in the response text only 28% of the time.
When asked about a story from a specific outlet, the responses named the source 74% to 97% of the time. That indicates the companies are technically capable of naming sources but are making a “design choice” not to, the audit states.
“The chatbots surface journalistic content because it has accurate information … so these companies recognize the enormous value that journalism provides,” Bridgman said in an interview.
They are using it in consumer-facing products and “there should be acknowledgment and financial recognition of that value.”
Even if links are included in AI summaries, most people don’t click them. So AI companies are enabling people to “get the news” without visiting news sites. AI companies get the subscription and advertising revenue, instead of news sites that paid to report, edit and publish the stories.
Bridgman suggested the links could mostly be “a credibility building exercise,” saying “you can trust us, because ‘look at our sources.’”
The audit found occasions where AI companies cited stories behind news sites’ paywalls, “suggesting that paywalls may not block automated retrieval the way they block human readers,” it states.
Additional research is being done at McGill on the “piercing” of paywalls. Others have found that software guardrails to prevent AI companies from scraping news stories are widely ignored.
Bridgman noted that AI companies are using different approaches to answer queries about news.
In some cases, they act like ordinary people trying to learn about a story. If they come to a news site’s paywall, they may decline to pay and scrounge around the web trying to get the same information for free. Often they can find enough from various free sources to provide the gist of a story.
I guess if you wanted to avoid paying for a new movie in theaters, you could search for free trailers and snippets posted on social media. With powerful computers, you could quickly stitch them into an approximation.
Then, if you had no scruples, you could charge people for the service providing your Frankenstein version and not pay anything to the people who wrote, directed, edited and acted in the actual movie.
Eventually, there wouldn’t be any new trailers, snippets or movies.
There’s concern about that happening to local news, which is considered essential to civic literacy and democracy. But attempts to secure fair payment and help ensure its survival are routinely swatted down by the tech lobby and its allies.
Canada is one of the few countries to resist that pressure. Since 2023 it has required tech giants profiting from news to compensate publishers, under a policy called the Online News Act.
Google has since paid publishers $100 million Canadian per year. Meta chose to block news on its platforms in Canada, to avoid paying. Now Meta’s reportedly considering paying some publishers, on condition they oppose the legislation.
After seeing the audit, Culture Minister Marc Miller said the Online News Act is about “people paying their fair share” and that principle doesn’t change with AI’s emergence, The Canadian Press reported.
“Having the news cannibalized and regurgitated undermines the spirit of the use of that news in the first place and the purpose for which it’s used and we have to have a serious conversation with the platforms that purport to use it including AI shops,” he said, per The Canadian Press.
A similar policy in the United States, the Journalism Competition and Preservation Act, had bipartisan support but stalled in Congress in 2023.
It’s past time for a new version of the JCPA, addressing how AI companies are changing the way people get information and preventing them from suffocating the local news industry.
To help get the ball rolling, I encourage academics in the U.S. to connect with Owen and Bridgman, who are willing to share their models, and produce similar audits here.
Such research won’t produce definitive answers to many of the questions around AI.
But like an unscrupulous chatbot, it should provide a fairly good idea of what’s happening.
Brier Dudley is editor of The Seattle Times Save the Free Press Initiative. This column was originally published on March 19th and is reprinted here with permission. Dudley’s work will appear regularly on the Medill Local News Initiative site.