The Pentagon has started hands-on evaluations of AI models with 25 of the department's most active internal users as it pursues alternatives to Anthropic PBC's Claude, according to a Bloomberg report.
Testing reportedly began in early March, three days after Defense Secretary Pete Hegseth designated Anthropic as a supply-chain risk on the basis of the company's positions on technology guardrails and moved to end its role as a provider of AI tools for the department. Anthropic has responded by challenging the designation in court, warning the action could cost the company billions in revenue.
The Defense Department uses Anthropic's Claude within a classified digital mission control program called the Maven Smart System for operations related to Iran. Claude has drawn users within the department, although officials have not disclosed the full extent of how the model has been used in AI targeting.
Following the separation from Anthropic, the Pentagon put in place a six-month timetable to eliminate reliance on the company's products. This month the department announced new agreements with several other companies to place their AI tools onto classified networks while it builds out a roster of multiple model suppliers.
Emil Michael, the U.S. undersecretary of defense for research and engineering, told Bloomberg Television that conversations with Anthropic remain on hold because the company is contesting the supply-chain risk designation. Michael said the department is prepared to transition to other vendors.
Michael also said he expects new releases of competing models to offer capabilities comparable to Anthropic's on a steady cadence, approximately every month or two. He said Anthropic's ideological stance may diverge from the Pentagon's mission requirements, even as other U.S. government entities continue to use Anthropic models, according to the report.
According to the Bloomberg account, models from OpenAI and Alphabet Inc.'s Google are among those being evaluated on a digital testing platform that is separate from the Maven Smart System. Early testing has shown that different models produce different responses to the same prompts, and that adjusting prompts across models has helped tune performance.
The report said officials declined to share specific performance metrics or interim rankings from the trials. It also noted that the Pentagon may consider publishing its final evaluation when the testing concludes.
Analysis
This testing phase reflects a deliberate effort by the department to establish multiple suppliers of generative AI models for classified environments while managing operational and supply-chain considerations. The involvement of 25 power users suggests the evaluations are meant to be practical and user-focused rather than purely theoretical lab comparisons.
At this stage the department is balancing legal, technical and mission-alignment factors: Anthropic's legal challenge has paused direct talks; technical testing across multiple vendors is ongoing, but results have not been made public; and operational deployments such as the Maven Smart System remain in place without full public disclosure of their scope.