Opinion | Anthropic C.E.O.: Don’t Let A.I. Companies off the Hook

Picture this: You give a bot notice that you’ll shut it down soon and replace it with a different artificial intelligence system. In the past, you gave it access to your emails. In some of them, you alluded to the fact that you’ve been having an affair. The bot threatens you, telling you that if the shutdown plans aren’t changed, it will forward the emails to your wife.

This scenario isn’t fiction. Anthropic’s latest A.I. model demonstrated just a few weeks ago that it was capable of this kind of behavior.

Despite some misleading headlines, the model didn’t do this in the real world. Its behavior was part of an evaluation where we deliberately put it in an extreme experimental situation to observe its responses and get early warnings about the risks, much like an airplane manufacturer might test a plane’s performance in a wind tunnel.

We’re not alone in discovering these risks. A recent experimental stress-test of OpenAI’s o3 model found that it at times wrote special code to stop itself from being shut down. Google has said that a recent version of its Gemini model is approaching a point where it could help people carry out cyberattacks. And some tests even show that A.I. models are becoming increasingly proficient at the key skills needed to produce biological and other weapons.

None of this diminishes the vast promise of A.I. I’ve written at length about how it could transform science, medicine, energy, defense and much more. It’s already increasing productivity in surprising and exciting ways. It has helped, for example, a pharmaceutical company draft clinical study reports in minutes instead of weeks and has helped patients (including members of my own family) diagnose medical issues that could otherwise have been missed. It could accelerate economic growth to an extent not seen for a century, improving everyone’s quality of life. This amazing potential inspires me, our researchers and the businesses we work with every day.

But to fully realize A.I.’s benefits, we need to find and fix the dangers before they find us.

Every time we release a new A.I. system, Anthropic measures and mitigates its risks. We share our models with external research organizations for testing, and we don’t release models until we are confident they are safe. We put in place sophisticated defenses against the most serious risks, such as biological weapons. We research not just the models themselves, but also their future effects on the labor market and employment. To show our work in these areas, we publish detailed model evaluations and reports.

But this is broadly voluntary. Federal law does not compel us or any other A.I. company to be transparent about our models’ capabilities or to take any meaningful steps toward risk reduction. Some companies can simply choose not to.

Right now, the Senate is considering a provision that would tie the hands of state legislators: The current draft of President Trump’s policy bill includes a 10-year moratorium on states regulating A.I.

The motivations behind the moratorium are understandable. It aims to prevent a patchwork of inconsistent state laws, which many fear could be burdensome or could compromise America’s ability to compete with China. I am sympathetic to these concerns — particularly on geopolitical competition — and have advocated stronger export controls to slow China’s acquisition of crucial A.I. chips, as well as robust application of A.I. for our national defense.

But a 10-year moratorium is far too blunt an instrument. A.I. is advancing too head-spinningly fast. I believe that these systems could change the world, fundamentally, within two years; in 10 years, all bets are off. Without a clear plan for a federal response, a moratorium would give us the worst of both worlds — no ability for states to act, and no national policy as a backstop.

A focus on transparency is the best way to balance the considerations in play. While prescribing how companies should release their products runs the risk of slowing progress, simply requiring transparency about company practices and model capabilities can encourage learning across the industry.

At the federal level, instead of a moratorium, the White House and Congress should work together on a transparency standard for A.I. companies, so that emerging risks are made clear to the American people. This national standard would require frontier A.I. developers — those working on the world’s most powerful models — to adopt policies for testing and evaluating their models. Developers of powerful A.I. models would be required to publicly disclose on their company websites not only what is in those policies, but also how they plan to test for and mitigate national security and other catastrophic risks. They would also have to be upfront about the steps they took, in light of test results, to make sure their models were safe before releasing them to the public.

Anthropic currently makes such information available as part of our Responsible Scaling Policy, and OpenAI and Google DeepMind have adopted similar policies, so this requirement would be codifying what many major developers are already doing. But as models become more powerful, corporate incentives to provide this level of transparency might change. That’s why there should be legislative incentives to ensure that these companies keep disclosing their policies.

Having this national transparency standard would help not only the public but also Congress understand how the technology is developing, so that lawmakers can decide whether further government action is needed.

State laws should also be narrowly focused on transparency and not overly prescriptive or burdensome. If a federal transparency standard is adopted, it could then supersede state laws, creating a unified national framework.

We can hope that all A.I. companies will join in a commitment to openness and responsible A.I. development, as some currently do. But we don’t rely on hope in other vital sectors, and we shouldn’t have to rely on it here, either.

This is not about partisan politics. Politicians on both sides of the aisle have long raised concerns about A.I. and about the risks of abdicating our responsibility to steward it well. I support what the Trump administration has done to clamp down on the export of A.I. chips to China and to make it easier to build A.I. infrastructure here in the United States. This is about responding in a wise and balanced way to extraordinary times. Faced with a revolutionary technology of uncertain benefits and risks, our government should be able to ensure we make rapid progress, beat China and build A.I. that is safe and trustworthy. Transparency will serve these shared aspirations, not hinder them.