Anthropic develops an AI model that is extremely good at finding and exploiting previously unknown security bugs in software
In further unreassuring news, Anthropic has developed an AI that is capable of “finding and exploiting software vulnerabilities”. And it has found a whole lot of them.
What Anthropic is describing is literally a zero-day engine: “Engineers at Anthropic with no formal security training have asked Mythos Preview to find remote code execution vulnerabilities overnight, and woken up the following morning to a complete, working exploit.”
Unlike its other products, Anthropic hasn’t released this one on the internet for a free-for-all. Only trusted partners need apply. This is almost certainly for the best. But then, how long does anything truly stay secret on the internet? After all, only the other day the same company accidentally leaked all the source code for its Claude Code product.
Anthropic’s description of the vulnerabilities it found isn’t greatly reassuring either, at least for those of us who prefer to practice safe computing (i.e. everyone).
During our testing, we found that Mythos Preview is capable of identifying and then exploiting zero-day vulnerabilities in every major operating system and every major web browser when directed by a user to do so. The vulnerabilities it finds are often subtle or difficult to detect. Many of them are ten or twenty years old, with the oldest we have found so far being a now-patched 27-year-old bug in OpenBSD—an operating system known primarily for its security.
“every major operating system and every major web browser”!
I suppose it’s good that we know about them (if we do and if we fix them). I’m sure human hackers and state security organisations are busy exploiting at least a few of them. Although some of them are apparently pretty complex:
The exploits it constructs are not just run-of-the-mill stack-smashing exploits (though as we’ll show, it can do those too). In one case, Mythos Preview wrote a web browser exploit that chained together four vulnerabilities, writing a complex JIT heap spray that escaped both renderer and OS sandboxes. It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR-bypasses. And it autonomously wrote a remote code execution exploit on FreeBSD’s NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain over multiple packets.
It’s not like they are unaware of what they’ve done here. They’re just, let’s say, more optimistic than me.
we believe that powerful language models will benefit defenders more than attackers, increasing the overall security of the software ecosystem. The advantage will belong to the side that can get the most out of these tools. In the short term, this could be attackers, if frontier labs aren’t careful about how they release these models. In the long term, we expect it will be defenders who will more efficiently direct resources and use these models to fix bugs before new code ever ships.
But the transitional period may be tumultuous regardless.
I’m sure it will be.