Stock Markets May 7, 2026 02:12 PM

OpenAI Releases Three Real-Time Audio Models for Developers

New GPT-Realtime family adds live comprehension, translation and speech-to-text capabilities with usage-based pricing

By Caleb Monroe

OpenAI announced three audio-focused models for its developer platform that enable real-time listening, translation and transcription. The models - GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper - are available for testing in OpenAI's developer playground. Selected corporate testers include Zillow, Priceline and Deutsche Telekom. Pricing is usage-based and varies by model.

Key Points

  • OpenAI launched three GPT-Realtime audio models for its developer platform: GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper. All three are available to test in the developer playground.
  • GPT-Realtime-2 is tailored for complex voice interactions, including tool calls, interruption handling and extended context; GPT-Realtime-Translate supports input from over 70 languages into 13 output languages and is targeted at customer support and education; GPT-Realtime-Whisper provides live speech-to-text for captions and notes.
  • Early testers include Zillow, Priceline and Deutsche Telekom; pricing is usage-based with GPT-Realtime-2 starting at $32 per million audio input tokens, GPT-Realtime-Translate at $0.034 per minute and GPT-Realtime-Whisper at $0.017 per minute.

OpenAI on Thursday introduced a trio of audio models aimed at expanding the company's developer platform into real-time voice applications. The new application programming interfaces let software agents listen and respond during live conversations rather than only performing offline transcription or text chat.

The three models are offered under the GPT-Realtime name and are available for developers to test in OpenAI's developer playground. They are:

  • GPT-Realtime-2 - Built to tackle more complex voice requests, this model is described as capable of calling external tools, handling interruptions, and maintaining conversational context across longer voice sessions.
  • GPT-Realtime-Translate - Designed for live translation tasks, it supports input from more than 70 languages and can output into 13 target languages. OpenAI positions this model for use cases such as customer support and education, among other settings.
  • GPT-Realtime-Whisper - A live speech-to-text model intended to produce captions, meeting notes, and workflow updates as a speaker is talking.

OpenAI said companies already testing the new audio tools include online real estate marketplace Zillow, online travel agency Priceline and European telecommunications firm Deutsche Telekom.

Pricing was disclosed by model. GPT-Realtime-2 begins at $32 per million audio input tokens. The translation model, GPT-Realtime-Translate, is priced at $0.034 per minute. The speech-to-text model, GPT-Realtime-Whisper, is priced at $0.017 per minute.
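
As a rough illustration of how these rates translate into spend, the following Python sketch multiplies usage by the quoted prices. Only the three rates come from the figures above; the function names and the example volumes are hypothetical, and the sketch covers only the audio input rate for GPT-Realtime-2, since that is the only token rate stated here.

```python
# Rates quoted in the article, in US dollars.
REALTIME_2_PER_MILLION_INPUT_TOKENS = 32.0  # GPT-Realtime-2, audio input tokens
TRANSLATE_PER_MINUTE = 0.034                # GPT-Realtime-Translate
WHISPER_PER_MINUTE = 0.017                  # GPT-Realtime-Whisper


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost of a per-minute model for the given number of audio minutes."""
    return minutes * rate_per_minute


def input_token_cost(
    tokens: int,
    rate_per_million: float = REALTIME_2_PER_MILLION_INPUT_TOKENS,
) -> float:
    """Cost of audio input tokens at a per-million-token rate."""
    return tokens / 1_000_000 * rate_per_million


if __name__ == "__main__":
    # Hypothetical monthly volumes, chosen only for illustration.
    minutes = 1_000
    tokens = 2_500_000
    print(f"Translate, {minutes} min: ${per_minute_cost(minutes, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, {minutes} min: ${per_minute_cost(minutes, WHISPER_PER_MINUTE):.2f}")
    print(f"Realtime-2, {tokens} input tokens: ${input_token_cost(tokens):.2f}")
```

At the quoted rates, the translation model costs twice as much per minute as the speech-to-text model, so a workload that needs only captions would pay half as much on GPT-Realtime-Whisper.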


The immediate availability of these models in a testing environment highlights OpenAI's move beyond basic transcription and chat features toward agents that can act, translate and update workflows in real time. The company disclosed usage rates and named early testers but did not detail other operational matters, such as timelines for production readiness.

Risks

  • The models are described as available to test in the developer playground, indicating they are in a testing environment; production readiness and real-world performance are not detailed.
  • GPT-Realtime-Translate accepts more than 70 input languages but outputs to only 13, so output coverage is far narrower than input coverage.
  • Each model carries a distinct usage-based price point - GPT-Realtime-2 at $32 per million audio input tokens, GPT-Realtime-Translate at $0.034 per minute and GPT-Realtime-Whisper at $0.017 per minute - creating a defined cost structure for deployment.
