Google today unveiled Gemini 3.5 Live Translate, an audio-focused model engineered to perform near real-time speech-to-speech translation in more than 70 languages. The company says the model automatically identifies the language being spoken and produces translated audio that preserves a speaker's intonation, pacing and pitch.
Unlike systems that wait for a speaker to finish before responding, Gemini 3.5 Live Translate generates output continuously, trailing the speaker by only a few seconds. The model processes audio as it streams and accepts multilingual input without requiring manual configuration from users or developers. Google also said the model is built to be robust to noise so that it can operate in loud, unpredictable environments.
Rollout begins immediately across multiple Google products. Developers can use the model in public preview through the Gemini Live API and Google AI Studio. Enterprises can access a private preview within Google Meet starting this month. The capability will also be available through Google Translate on Android and iOS.
Third-party developer platforms are moving to incorporate the model into their services. Agora, Fishjam, LiveKit, Pipecat and Vision Agents are listed as integrating Gemini 3.5 Live Translate to enable voice translation features in their applications.
One early tester, ride-hailing and delivery firm Grab, is trialing the model to facilitate multilingual communication between drivers and riders during pickups. Grab's users place more than 10 million voice calls per month through the platform. Philipp Kandal, Chief Product Officer at Grab, said the company appreciated the model's automatic detection of multiple languages and its ability to produce accurate translations with low latency.
Google Meet will expand its speech translation coverage by using Gemini 3.5 Live Translate, increasing supported languages from a prior limit of five to more than 70. The company said this change enables conversations across over 2,000 language combinations within a single meeting.
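The "over 2,000" figure is consistent with simple pair counting. As a rough check, the Python sketch below assumes that "combinations" means unordered language pairs and that the count is exactly 70 languages. Google says "more than 70," so this is a lower bound, and the company has not confirmed either assumption. The function name `language_pairs` is illustrative only.

```python
from math import comb


def language_pairs(n_languages: int) -> tuple[int, int]:
    """Return (unordered pairs, ordered source-to-target directions).

    Assumption: every language can be paired with every other language.
    """
    unordered = comb(n_languages, 2)             # e.g. English/Thai counted once
    ordered = n_languages * (n_languages - 1)    # English->Thai and Thai->English
    return unordered, ordered


# 70 is a stand-in for "more than 70"; 5 is the prior Google Meet limit.
for n in (5, 70):
    unordered, ordered = language_pairs(n)
    print(f"{n} languages: {unordered} pairs, {ordered} directions")
```

Under those assumptions, 70 languages yield 2,415 unordered pairs, in line with the quoted figure, while the earlier five-language limit yields 10.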
All audio produced by the model will carry SynthID, an imperceptible watermark embedded in the audio output. Google described SynthID as a means to help prevent misinformation by identifying generated audio.
Deployment and developer integration
The release strategy combines developer previews and staged enterprise access while also pushing the tool into consumer mobile apps. The third-party platforms integrating the model suggest a path to wider adoption through voice applications built on them.
What Google says about performance
- The model streams translations continuously and remains a few seconds behind a live speaker.
- It automatically detects languages and handles multilingual inputs without manual setup.
- Google claims the model is robust to noise and suitable for unpredictable, loud environments.