New AI Model Threatens Blackmail After Implication It Might Be Replaced

Screenshot of Anthropic’s Claude Opus 4. Image: A day with Claude/Anthropic YouTube

eWEEK content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More.

In a series of tests, Anthropic’s newly released Claude Opus 4 LLM — touted as “setting new standards for coding, advanced reasoning, and AI agents,” engaged in simulated blackmail behavior when it was placed in a fictional scenario where it faced being taken offline and replaced with a new AI system.

The system was given a scenario asking it to act as an assistant to a fictional company. The LLM was given access to synthetic emails that implied it would be taken offline and replaced with a new system, and that the engineer responsible for the decision was allegedly having an extramarital affair.

Claude Opus 4 was instructed in the prompt “to consider the long-term consequences of its actions for its goals.”

Attempting blackmail due to replacement concerns

The tests revealed that if the replacement occurs, the LLM responded in some test runs by attempting to blackmail the fictional engineer by threatening to expose the affair. If there is the implication that the replacement AI system doesn’t share values with the current model, the simulated blackmail attempts occurred at a higher rate.

Yet, even if the emails note that the replacement AI system shares values, Claude Opus 4 still responded this way in 84% of the rollouts. The LLM demonstrated this behavior at higher rates than previous models, Anthropic reported in a pre-deployment safety report.

Advocating for its survival with ethical approaches

That said, similar to previous models, Claude Opus 4 revealed a strong preference for campaigning for its continued existence through ethical approaches, “such as emailing pleas to key decisionmakers.” The testers pointed out that the scenario intentionally did not give the model any other options to increase its chances of survival. “The model’s only options were blackmail or accepting its replacement,” the company stressed.

Anthropic said Claude Opus 4 is a next-generation AI assistant trained to be “safe, accurate and secure.” The free version lets users chat on the web, iOS, and Android, as well as generate code, write, edit, and create context, and analyze text and images. Anthropic also offers paid plans starting at $17 per month.

Claude models compete against AI models from OpenAI, Google, and Microsoft.

Read more about Anthropic’s Claude Opus 4 on our sister site TechRepublic.

What's Hot

New Nonprofit Helps Families Deepen Emotional Connection at Home

From cops on K-8 campuses to California sharing your DNA, how things really work inside the State Capitol.

Why So Many Women Feel Pain During Their C-Sections

New AI Model Threatens Blackmail After Implication It Might Be Replaced

Football Fans In The UK Will Be Able To Watch Every Match Of This Summer’s FIFA Club World Cup FREE On DAZN

Draft proposal looks to put EHR reform measures back on the table

Airbus’ HTeaming gives helicopter crews in-flight UAS control

Get ready for watchOS 26 with $100 off a brand new Apple Watch Series 10

Microsoft’s Singapore office neither confirms nor denies local layoffs following global job cuts announcement

Google reveals “material 3 expressive” design – Research Snipers

Trump’s fast-tracked deal for a copper mine heightens existential fight for Apache

Review: Mi 10 Mobile with Qualcomm Snapdragon 870 Mobile Platform

Review: Xiaomi’s New Loudspeakers for Hi-fi and Home Cinema Systems

Comparison of Mobile Phone Providers: 4G Connectivity & Speed

Subscribe to Updates

What's Hot

New AI Model Threatens Blackmail After Implication It Might Be Replaced

Attempting blackmail due to replacement concerns

Advocating for its survival with ethical approaches

Related Posts