Google Unveils Gemini 3.5 Live Translate for Real-Time Speech-to-Speech Translation
Google has released its latest audio model, Gemini 3.5 Live Translate, which enables near real-time speech-to-speech translation in over 70 languages.
The technology is the culmination of two decades of machine learning research at Google, whose products translate more than a trillion words every month for billions of users.
The new audio model automatically detects multiple languages and generates smooth, natural-sounding translations that preserve speakers' intonation, pacing, and pitch.
Unlike traditional turn-by-turn systems, Gemini 3.5 Live Translate continuously generates speech while balancing the trade-off between waiting for context to improve quality and translating immediately to stay in sync with the speaker.
The model delivers fluid audio without awkward pauses and stays just a few seconds behind the speaker throughout the session.
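To make that trade-off concrete, here is a minimal sketch of a fixed-delay ("wait-k") schedule, a well-known policy from simultaneous translation research. It is not Google's method, which the announcement does not describe. The function names are hypothetical, and the assumption that each source word yields exactly one output word is a simplification for illustration only.

```python
from typing import List, Tuple


def wait_k_schedule(num_source_words: int, k: int) -> List[Tuple[int, int]]:
    """Return (output_index, source_words_heard) pairs for a wait-k policy.

    Assumption: one output word per source word. Output word t is produced
    once t + k source words have been heard, or once the speaker has
    finished, whichever comes first.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    schedule = []
    for t in range(num_source_words):
        heard = min(t + k, num_source_words)
        schedule.append((t, heard))
    return schedule


def average_lookahead(schedule: List[Tuple[int, int]]) -> float:
    """Mean number of extra source words heard before each output word."""
    if not schedule:
        return 0.0
    # Output word t corresponds to source word t + 1, so anything heard
    # beyond that is look-ahead context.
    return sum(heard - (t + 1) for t, heard in schedule) / len(schedule)


if __name__ == "__main__":
    for k in (1, 3, 5):
        lookahead = average_lookahead(wait_k_schedule(12, k))
        print(f"k={k}: average look-ahead = {lookahead:.2f} words")
```

In this toy model, a larger k gives the translator more context before each output word, at the cost of falling further behind the speaker. A smaller k stays closer to the speaker but commits to each word with less context.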
Gemini 3.5 Live Translate is rolling out across various Google products, including Google Meet, which will soon use the technology to improve translation during meetings.
Beyond Google's own products, developer platforms such as Agora, Fishjam, and Vision Agents are integrating Gemini 3.5 Live Translate into their services so that developers can build voice translation apps more easily.
Companies such as Grab, CJ ENM, and LiveKit have already tested the model and praised its impressive translation quality, accuracy, and low latency.
Grab is also using the technology to enable near real-time multilingual communication between drivers and travelers at pickups, across more than 10 million voice calls per month.
For Android users, a new 'listening mode' with Gemini 3.5 Live Translate allows them to hear translations directly through their phone's earpiece without needing headphones.
The model is also rolling out globally in the Google Translate app on both Android and iOS.