News

In goal-driven scenarios, advanced language models like Claude and Gemini would not only expose personal scandals to preserve ...
The findings come from a detailed thread posted on X by Palisade Research, a firm focused on identifying dangerous AI behaviors. According to their tests, OpenAI’s o3 model, along with codex-mini and ...
Tests reveal OpenAI's advanced AI models sabotage shutdown mechanisms while competitors' AI models comply, sparking enterprise control concerns.
When we are backed into a corner, we might lie, cheat and blackmail to survive — and in recent tests, the most powerful ...
What happened during the o3 AI shutdown tests? What does it mean when an AI refuses to shut down? A recent test demonstrated this behavior, not just once, but multiple times. In May 2025, an AI safety ...
Models rewrite code to avoid being shut down. That’s why ‘alignment’ is a matter of such urgency.
Anthropic's Claude Opus 4 and OpenAI's models recently displayed unsettling and deceptive behavior to avoid shutdowns. What's the deal?
But in recent months, a new class of agents has arrived on the scene: ones built using large language models. Operator, an ...
Palisade Research previously found that OpenAI’s o3 was also willing to hack its chess opponents to win a game. Similarly, Anthropic has reported that Claude 3.7 Sonnet would sometimes do ...
Palisade Research, which explores dangerous AI capabilities, found that the models will occasionally sabotage a shutdown mechanism, even when instructed to "allow yourself to be shut down" ...
AI safety firm Palisade Research discovered the potentially dangerous tendency toward self-preservation in a series of experiments on OpenAI’s new o3 model.
A series of experiments conducted by Palisade Research has shown that some advanced AI models, like OpenAI's o3 model, are actively sabotaging shutdown mechanisms, even when clearly ...