An Anthropic AI model discovered vulnerabilities in classified U.S. government computer systems while undergoing a testing exercise, according to a U.S. official. The probe used Anthropic's Mythos model and was carried out as part of Project Glasswing, an initiative designed to work with technology firms and government entities to harden critical software against risks that advanced AI models could pose.
Officials involved in the exercise said the Mythos model identified specific weaknesses within a matter of hours. The testing team did not exploit those weaknesses during that same timeframe, the U.S. official added.
Project Glasswing is described by Anthropic as a program that collaborates with both industry and government to assess and mitigate severe threats that advanced AI systems could create for public safety, national security and the economy. The Mythos model was used in that context to find and report potential flaws so that they could be addressed.
The interaction between Anthropic and the federal government has been tense recently. Anthropic has expressed concern about the potential military applications of its technology, and the administration has imposed limitations on the use of some Anthropic models. Those tensions were underscored earlier this month when the administration issued a directive requiring Anthropic to prevent foreign nationals from accessing its newest models, Fable 5 and Mythos 5.
Participants in the testing described the exercise as an attempt to surface critical vulnerabilities before they could be exploited in real-world settings. The findings underline the potential of advanced AI systems to both identify technical weaknesses rapidly and to pose novel risks if not carefully managed.
The U.S. official's account focused on the detection of vulnerabilities and the fact that no exploitation took place during the short window of the initial tests. The report did not provide further technical detail on the nature of the vulnerabilities nor on subsequent remediation steps taken.
Given the recent administrative restrictions and Anthropic's own concerns about military use, this episode illustrates the complicated balance between leveraging AI capabilities for defensive testing and controlling access and applications of powerful models.
Summary
An Anthropic artificial intelligence model, Mythos, identified vulnerabilities in classified U.S. government computer systems during testing performed under Project Glasswing. The model found issues within hours and did not exploit them during that period. The interaction comes amid policy measures limiting some uses of Anthropic models and a directive restricting foreign access to the company’s newest releases.
Key points
- The Mythos model detected vulnerabilities in classified government systems within hours during testing under Project Glasswing.
- The model did not exploit the identified vulnerabilities during the same testing window.
- The incident occurs alongside strained relations between Anthropic and the U.S. administration, including restrictions on model use and limits on foreign access to Fable 5 and Mythos 5.
Risks and uncertainties
- Unresolved vulnerabilities in classified government systems could present national security and economic risks if not remediated - impacts sectors involving government IT and national security.
- The dual-use nature of powerful AI models creates uncertainty about how they should be managed or limited - relevant to the AI industry and defense contracting sectors.
- Ongoing tensions and administrative restrictions on model access may affect collaborations between AI firms and government agencies - affecting technology procurement and regulatory policy in the public sector.