The Risks Of Anonymity In The Age Of Generative AI
As its name suggests, generative AI is designed to generate material in response to prompts by drawing on the statistical patterns it builds up through analyzing huge quantities of training input. But it can also draw on those patterns to analyze other files, and that’s a widely used application. Writing in The Argument, Kelsey Piper described an interesting variant of that approach:
Recently, Anthropic released a new version of Claude, Opus 4.7. I did what I usually do when a new AI model is released by Google, OpenAI, or Anthropic and ran a bunch of tests on it to see what it can do. One of those tests is to paste in some text from unpublished drafts of mine and ask it to guess the author.
…
From only the above text [not shown here], 125 words, Claude Opus 4.7 informed me that the likeliest author is Kelsey Piper. This is an Opus 4.7-specific power; ChatGPT guessed Yglesias, and Gemini guessed Scott Alexander. I did not have memory enabled, nor did I have information about me associated with my account; I did these tests in Incognito Mode.
As Piper admits:
this is far from an impossible feat of style identification — a lot of my writing is public on the internet, and this is clearly the start of a political column, narrowing the possible authors down dramatically.
She went on to input less obvious material. For example, an “unpublished draft of a school progress report in a completely different register”:
“Kelsey Piper,” said Claude. (ChatGPT guessed Freddie deBoer. Gemini guessed Duncan Sabien.)
An unpublished fantasy novel produced a similar result, although:
in that case it took more like 500 words for Claude to inform me that it’s the work of Kelsey Piper (whereas ChatGPT flattered me by guessing that I’m real fantasy novelist K.J. Parker).
And finally, “a college application essay I wrote 15 years ago, when my prose style was vastly worse and frankly embarrassing to reread”:
“Kelsey Piper,” said Claude, and in this case, also ChatGPT.
Piper comments:
Right now, today’s AI tools probably can be used to deanonymize any writer who has a large public corpus of writing under their real name and also writes anonymously, unless they have been extremely careful, for years, to make sure that nothing written under their secondary account has the stylistic fingerprints of their primary one. Many academics and industry researchers, for instance, have reported being identified from a draft or in the middle of a chat.
And she concludes:
Whatever goods anonymity ever offered us, we will have to do without them. I don’t want the anonymous posters to all go away and for everyone to frantically delete all their old internet presence before it surfaces, but more than anything, I don’t want them to be surprised.
Those links to other cases of unpublished material being recognized by AI show that Piper’s experience was not a one-off, although the results remain in the realm of anecdata. But even if imperfect, the ability of generative AI to carry out this kind of analysis quickly and often accurately represents an important new option for the well-established field of stylometry. Wikipedia explains:
Stylometry may be used to unmask pseudonymous or anonymous authors, or to reveal some information about the author short of a full identification. Authors may use adversarial stylometry to resist this identification by eliminating their own stylistic characteristics without changing the meaningful content of their communications. It can defeat analyses that do not account for its possibility, but the ultimate effectiveness of stylometry in an adversarial environment is uncertain: stylometric identification may not be reliable, but nor can non-identification be guaranteed; adversarial stylometry’s practice itself may be detectable.
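Classical stylometry often works by comparing the frequencies of common function words ("the", "of", "and", and so on), since writers use them largely unconsciously. The following is a minimal illustrative sketch of that idea, not how Claude or any of the AI services mentioned above identify authors; the word list and function names are my own, and a real system would use far richer features.

```python
from collections import Counter
import math
import re

# Common English function words: choices among these are largely
# unconscious, which makes their frequencies a useful stylistic signal.
FUNCTION_WORDS = [
    "the", "of", "and", "to", "a", "in", "that", "is", "it", "for",
    "on", "with", "as", "but", "not", "this", "which", "or", "by", "so",
]

def fingerprint(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_similarity(a, b):
    """Cosine similarity between two frequency vectors (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def likeliest_author(unknown_text, candidates):
    """Return the candidate whose known writing is stylistically
    closest to the unknown text; candidates maps name -> sample text."""
    unknown = fingerprint(unknown_text)
    return max(
        candidates,
        key=lambda name: cosine_similarity(unknown, fingerprint(candidates[name])),
    )
```

Adversarial stylometry, as the Wikipedia excerpt notes, amounts to deliberately perturbing exactly these kinds of features so that the unknown text no longer sits closest to one's own corpus.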
The limitations of stylometry were demonstrated by John Carreyrou’s attempt to reveal the true identity of Bitcoin’s pseudonymous creator, Satoshi Nakamoto, published in The New York Times a few weeks ago. Carreyrou concluded that various real-world coincidences plus linguistic evidence indicated that Bitcoin was created by the 55-year-old British computer scientist Adam Back, something Back denies. Carreyrou’s attempts to use computerized stylometry (not the AI services Piper drew on) proved unsatisfactory, and he eventually adopted a more hands-on approach to text analysis, looking at Satoshi’s vocabulary, grammar and hyphenation mistakes, and use of British spellings.
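One of the signals Carreyrou relied on, British versus American spellings, is simple enough to automate. The sketch below is my own illustration of that one signal, not Carreyrou’s actual method; the spelling list is a tiny hypothetical sample, and a real analysis would use a much larger lexicon alongside the vocabulary and hyphenation checks he also made by hand.

```python
import re

# A small, illustrative list of British/American spelling pairs.
BRITISH_AMERICAN = {
    "colour": "color", "favour": "favor", "analyse": "analyze",
    "organise": "organize", "realise": "realize", "centre": "center",
    "defence": "defense", "programme": "program",
}

def spelling_signals(text):
    """Count British vs American spellings as a crude regional signal."""
    words = re.findall(r"[a-z]+", text.lower())
    british = sum(words.count(b) for b in BRITISH_AMERICAN)
    american = sum(words.count(a) for a in BRITISH_AMERICAN.values())
    return {"british": british, "american": american}
```

A lopsided count in a supposedly anonymous text is, of course, only a hint, which is precisely why this kind of evidence fell short of proof in the Satoshi case.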
Despite Carreyrou’s lack of success, stylometric analysis by generative AI is likely to become more common in many disciplines for the simple reason that it is so quick, easy and cheap to carry out. Even if its results are unreliable, people may find it useful as a stimulus for further investigations. And as we know, the fact that generative AI systems can churn out nonsense hasn’t stopped hundreds of millions of people from using and trusting them anyway.