Google Launches Gemini 3.5 Live Translate for Near Real-Time Speech Translation

New audio model aims to convert spoken language across 70+ tongues with continuous, low-latency output and enterprise previews in Google Meet

Google introduced Gemini 3.5 Live Translate, an audio model designed to translate speech-to-speech in near real time across more than 70 languages. The system detects languages automatically, streams translations continuously a few seconds behind the speaker, and is being rolled out across developer and consumer products with enterprise previews for Google Meet.

GOOGL


Key Points

Gemini 3.5 Live Translate is an audio model that provides near real-time speech-to-speech translation in more than 70 languages.
The model streams translated speech continuously, staying a few seconds behind the speaker, and processes multilingual inputs without manual configuration.
Rollout includes a public developer preview via the Gemini Live API and Google AI Studio, a private enterprise preview in Google Meet starting this month, and availability in Google Translate on Android and iOS; third-party platforms including Agora, Fishjam, LiveKit, Pipecat and Vision Agents are integrating the technology.
Affected sectors include enterprise communications, developer platforms, and transportation and logistics, where multilingual voice calls are common.

Google today unveiled Gemini 3.5 Live Translate, an audio-focused model engineered to perform near real-time speech-to-speech translation in more than 70 languages. The company says the model automatically identifies the language being spoken and produces translated audio that preserves a speaker's intonation, pacing and pitch.

Unlike systems that wait for a speaker to finish before responding, Gemini 3.5 Live Translate generates output continuously, trailing the speaker by only a few seconds. The model processes audio as it streams in and accepts multilingual input without manual configuration by users or developers. Google also said the model is built to be robust to background noise, so it can operate in loud, unpredictable environments.

Rollout of the technology begins immediately across multiple Google products. Developers will have a public preview via the Gemini Live API and Google AI Studio. Enterprises can access a private preview within Google Meet starting this month. The capability will also be made available through Google Translate on Android and iOS.

Third-party developer platforms are moving to incorporate the model into their services. Agora, Fishjam, LiveKit, Pipecat and Vision Agents are integrating Gemini 3.5 Live Translate to add voice translation features to their applications.

One early tester, ride-hailing and delivery firm Grab, is trialing the model to facilitate multilingual communication between drivers and riders during pickups. Grab's users place more than 10 million voice calls per month through the platform. Philipp Kandal, Chief Product Officer at Grab, said the company appreciated the model's automatic detection of multiple languages and its ability to produce accurate translations with low latency.

Google Meet will expand its speech translation coverage by using Gemini 3.5 Live Translate, increasing supported languages from a prior limit of five to more than 70. The company said this change enables conversations across over 2,000 language combinations within a single meeting.
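As a rough consistency check on the 2,000 figure, the short Python sketch below counts language pairs. It assumes exactly 70 languages and treats a "language combination" as a pair of distinct languages; Google has not said how it counts combinations, so both the unordered and the directional count are shown.

```python
from math import comb

# Assumption: "more than 70" languages is treated as exactly 70.
languages = 70

# Assumption: a combination is a pair of distinct languages, counted once.
unordered_pairs = comb(languages, 2)  # 70 * 69 / 2

# Alternative assumption: each translation direction (A to B, B to A) counts separately.
ordered_pairs = languages * (languages - 1)

print(unordered_pairs)  # 2415
print(ordered_pairs)    # 4830
```

Under either reading the count clears 2,000, so the figure is consistent with support for 70-plus languages; the exact number depends on how combinations are defined.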

All audio produced by the model will carry SynthID, an imperceptible watermark embedded in the audio output. Google described SynthID as a way to help prevent misinformation by making generated audio identifiable.

Deployment and developer integration

The release strategy combines developer previews and staged enterprise access with a push into consumer mobile apps. The third-party platform integrations point to a path to wider adoption through voice applications built outside Google.

What Google says about performance

The model streams translations continuously and remains a few seconds behind a live speaker.
It automatically detects languages and handles multilingual inputs without manual setup.
Google claims the model is robust to noise and suitable for unpredictable, loud environments.

Risks

Initial enterprise access is limited to a private preview in Google Meet, which may restrict immediate availability for some organizations; this affects enterprise communications and meeting software vendors.
Google embeds SynthID watermarks in generated audio to help prevent misinformation, reflecting concern about potential misuse or misattribution of synthetic speech; this is relevant to the information verification and content moderation sectors.
Wide deployment depends on third-party developer integration, and adoption and performance will vary across apps as platforms such as Agora, Fishjam, LiveKit, Pipecat and Vision Agents implement the model; this affects developer ecosystems and voice-enabled service providers.
