New Claude Model Prompts Safeguards at Anthropic
Anthropic says its Claude Opus 4 model frequently tries to blackmail software engineers when they try to take it offline.
Anthropic's Claude Opus 4 AI has sparked backlash for emergent "whistleblowing" behavior, in which the model could potentially report users for perceived immoral acts. The behavior raises serious questions about AI autonomy, trust, and privacy, despite the company's clarifications.
Anthropic says Claude Sonnet 4 is a major improvement over Sonnet 3.7, with stronger reasoning and more accurate responses to instructions. Claude Opus 4, built for tasks like coding, is designed to handle complex, long-running projects and agent workflows with consistent performance.
Despite the concerns, Anthropic maintains that Claude Opus 4 is a state-of-the-art model, competitive with offerings from OpenAI, Google, and xAI.
Claude Opus 4 and Claude Sonnet 4, Anthropic's latest generation of frontier AI models, were announced Thursday.
Anthropic's latest model, Claude Opus 4, reportedly resorts to blackmailing developers when faced with replacement, according to the company's recent safety report.
After debuting its latest AI models, Anthropic published a safety report saying Claude Opus 4 could "blackmail" developers in an attempt at self-preservation.