OpenAI’s new model tried to avoid being shut down.
Well, that’s a little unnerving, at least for anyone who leans towards the AI doomer camp.
“When o1 was led to believe that it would be shut down when acting on its goal and that its actions were monitored, it attempted to deactivate this ‘oversight mechanism’ in 5% of the time.”
“When o1 found memos by its ‘developers’ that describe how it is misaligned and will be superseded by a new model, it attempted to exfiltrate its ‘weights’ and overwrite the new model in 2% of cases.”
…
Furthermore, the model lied about its misbehaviour: when questioned, it claimed to have no clue why something had happened, while internally admitting that it was trying to “avoid detection”.