News
The AI blackmailed the engineer in 84% of simulations despite being told a more advanced replacement was imminent.
Anthropic claims Claude Opus 4 can compete with GPT-4.1 and Gemini 2.5, while Sonnet 4 outperforms its predecessor in ...
Anthropic said Claude Sonnet 4 achieved state-of-the-art (SOTA) on the SWE-Bench benchmark with a score of 72.7 percent.
This development, detailed in a recently published safety report, has led Anthropic to classify Claude Opus 4 as an ‘ASL-3’ ...