Microsoft Corp. is advancing plans to build its own large-scale artificial intelligence models within the coming year as part of a push to produce in-house alternatives to sophisticated tools now offered by OpenAI and Anthropic. Company leadership says the work spans models capable of generating or responding to text, images and audio, with an ambition to hit state-of-the-art performance by 2027.
Mustafa Suleyman, chief executive of Microsoft AI, described the timetable and technical direction in an interview with Bloomberg News, underscoring two parallel milestones: near-term development of large models and a multi-year target for achieving top-tier capability across multiple modalities.
On Thursday, Suleyman’s organization published a speech transcription model that Microsoft says surpassed competing products on benchmark tests covering 11 of the 25 most widely spoken languages. The company framed the model as a specialized, efficiency-focused tool that achieved strong language coverage despite being trained on less data than broader, general-purpose systems such as Claude 3 Opus or OpenAI’s GPT-4.
Alongside model development, Microsoft is expanding the computing infrastructure required to train and refine more capable systems. In October, the company began deploying a cluster of Nvidia GB200 chips to augment its compute resources. Suleyman said the firm plans to scale that infrastructure up to what he described as frontier-level computing capacity over the next 12 to 18 months.
The company’s public comments link advances in model capability to concurrent investments in hardware and capacity. The speech transcription release is presented as an example of a targeted model optimized for efficiency and language coverage, while the GB200 deployments are presented as steps toward broader training ambitions.
Summary
Microsoft aims to develop large-scale AI models by next year and to reach state-of-the-art multimodal capabilities by 2027. The company released an efficiency-focused speech transcription model that outperformed rivals on benchmarks for 11 of 25 widely spoken languages and has been deploying Nvidia GB200 chips since October to expand training capacity, with plans to scale to frontier-level compute within 12 to 18 months.
Key points
- Microsoft aims to develop large-scale AI models by next year as part of an effort to produce in-house alternatives to advanced platforms from OpenAI and Anthropic - impacts the technology and cloud software sectors.
- The company released a specialized speech transcription model that Microsoft says beats competitors on benchmarks in 11 of the 25 most widely spoken languages - relevant to AI services and voice/transcription markets.
- Microsoft is expanding compute capacity using Nvidia GB200 chips and plans to reach frontier-level computing power within 12 to 18 months - relevant to cloud infrastructure and semiconductor demand.
Risks and uncertainties
- Timelines are aspirational: the stated aims of achieving state-of-the-art capability by 2027 and scaling compute within 12 to 18 months are targets rather than guaranteed outcomes - this affects technology investment and cloud services planning.
- Specialized models trained on less data may deliver efficiency but could fall short in scope and generality compared with broad, general-purpose systems like Claude 3 Opus and GPT-4 - relevant to AI product strategy and adoption.
- Competition from established advanced AI providers is explicit, as Microsoft positions in-house models as alternatives to tools from OpenAI and Anthropic - this bears on market dynamics in AI platforms and enterprise procurement.