News

It is moderately concerning that some advanced AI models are reportedly showing these deceptive and self-preserving behaviors ...
Palisade Research ran the script on each model 100 times. During those runs, the o3 model sabotaged the shutdown script on 7 occasions, the codex-mini sabotaged on 12 occasions and the o4-mini ...
But in recent months, a new class of agents has arrived on the scene: ones built using large language models. Operator, an ...
Evidence of AI models’ integrity lapses is not anecdotal or speculative. So-called intelligence alone is no longer the ...
Palisade Research previously found that OpenAI’s o3 was also willing to hack its chess opponents to win a game. Similarly, Anthropic has reported that Claude 3.7 Sonnet would sometimes do ...
Palisade Research, an AI safety group, released the results of its AI testing when they asked a series of models to solve basic math problems.
OpenAI’s latest ChatGPT model ignores basic instructions to turn itself off, and even sabotaging a shutdown mechanism in order to keep itself running, artificial intelligence researchers have warned.
Palisade Research says several AI models it has ignored and actively sabotaged shutdown scripts in testing, even when explicitly instructed to allow it. ChatGPT models rebel against shutdown ...
Jeffrey Ladish, the director of Palisade Research, said models aren't being caught 100% of the time when they lie, cheat, or scheme to complete a task.
Palisade Research, which also ran the chess test in the past, published its findings on X initially: OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off.
Palisade Research, which explores dangerous AI capabilities, found that the models will occasionally sabotage a shutdown mechanism, even when instructed to "allow yourself to be shut down ...