OpenAI on Thursday introduced a trio of audio models aimed at expanding the company's developer platform into real-time voice applications. The new application programming interfaces let software agents listen and respond during live conversations rather than only performing offline transcription or text chat.
The three models are offered under the GPT-Realtime name and are available for developers to test in OpenAI's developer playground. They are:
- GPT-Realtime-2 - Built to tackle more complex voice requests, this model is described as capable of calling external tools, handling interruptions, and maintaining conversational context across longer voice sessions.
- GPT-Realtime-Translate - Designed for live translation tasks, it supports input from more than 70 languages and can output into 13 target languages. OpenAI positions this model for use cases such as customer support and education, among other settings.
- GPT-Realtime-Whisper - A live speech-to-text model intended to produce captions, meeting notes, and workflow updates as a speaker is talking.
OpenAI said companies already testing the new audio tools include online real estate marketplace Zillow, online travel agency Priceline and European telecommunications firm Deutsche Telekom.
OpenAI disclosed pricing for each model. GPT-Realtime-2 starts at $32 per million audio input tokens. GPT-Realtime-Translate is priced at $0.034 per minute, and GPT-Realtime-Whisper at $0.017 per minute.
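To put those rates in context, here is a minimal cost sketch in Python. The dollar figures come from the announcement; the function names, the example session length, and the example token count are illustrative, and note that the announcement quotes only input-token pricing for GPT-Realtime-2.

```python
# Per-model rates as quoted in the announcement.
RATES_USD = {
    "gpt-realtime-2": 32.00,          # per 1M audio input tokens
    "gpt-realtime-translate": 0.034,  # per minute
    "gpt-realtime-whisper": 0.017,    # per minute
}

def per_minute_cost(model: str, minutes: float) -> float:
    """Cost of a session for the per-minute models."""
    return minutes * RATES_USD[model]

def realtime2_input_cost(input_tokens: int) -> float:
    """Input-token cost only; output pricing is not quoted in the article."""
    return input_tokens / 1_000_000 * RATES_USD["gpt-realtime-2"]

# Example: a 60-minute live captioning session with GPT-Realtime-Whisper.
print(per_minute_cost("gpt-realtime-whisper", 60))  # 1.02

# Example: 2 million audio input tokens through GPT-Realtime-2.
print(realtime2_input_cost(2_000_000))  # 64.0
```

At these rates, an hour of live captioning costs roughly a dollar, while the per-token pricing of GPT-Realtime-2 depends on how much audio a session actually consumes.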
The immediate availability of these models in a testing environment underscores OpenAI's move beyond basic transcription and chat toward agents that can act, translate and update workflows in real time. The company provided usage rates and named early testers but did not disclose other operational details, such as production-readiness timelines, leaving developers to evaluate the models in the playground.