DeepMind Announces Gemini 3.5 Live Translate! Real-time Voice Translation Overcoming Language Barriers

Gemini 3.5 Live Translate Emerges, Paving the Way for a Future Without Language Barriers

On June 9, 2026, Google DeepMind announced its latest audio model, 'Gemini 3.5 Live Translate.' The model automatically detects more than 70 languages and translates speech into spoken audio in another language in near real time. What sets it apart from conventional translation tools is that the translated speech sounds natural while preserving the speaker's vocal nuances, pace, and pitch. The technology will be rolled out progressively to familiar services such as the Google Translate app and Google Meet, and an API will also be made available so that developers can build it into their own applications.

Technical Details: From 'Turn-Based' to 'Streaming Translation'

Most conventional real-time translation systems have used a 'turn-based' approach: they wait for the speaker to finish, then run the utterance through a pipeline of 'speech recognition → text conversion → machine translation → speech synthesis.' The main drawback of this method is that the delay of each step adds up, creating unnatural pauses in conversation. Gemini 3.5 Live Translate addresses this with a 'streaming audio translation model' architecture, which processes audio continuously from the moment the speaker starts talking and begins generating translated speech while the speaker is still talking. Because it takes an audio-to-audio approach that converts audio input directly into audio output, it either bypasses or streamlines the intermediate text conversion step, keeping the delay to only a few seconds. The result is smooth, natural multilingual communication, as if an interpreter were standing right next to you.
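To make the latency argument concrete, here is a minimal back-of-the-envelope sketch in Python. It is not how Gemini 3.5 Live Translate is implemented. The function names and all the numbers are assumptions chosen for illustration: a 6-second utterance, three pipeline stages, and a 0.5-second audio chunk.

```python
from typing import Sequence


def turn_based_delay(utterance_s: float, stage_delays_s: Sequence[float]) -> float:
    """Seconds from speech start until translated audio can begin (turn-based).

    The pipeline waits for the whole utterance, then runs each stage in
    sequence, so the stage delays add up on top of the utterance length.
    """
    return utterance_s + sum(stage_delays_s)


def streaming_delay(chunk_s: float, per_chunk_processing_s: float) -> float:
    """Seconds from speech start until the first translated audio (streaming).

    Only the first chunk has to be captured and processed; the rest of the
    utterance is handled while the speaker keeps talking.
    """
    return chunk_s + per_chunk_processing_s


if __name__ == "__main__":
    # Illustrative numbers only (assumptions, not measurements).
    utterance_s = 6.0
    stages = [0.5, 0.25, 0.25]  # recognition, translation, synthesis
    print(turn_based_delay(utterance_s, stages))  # 7.0
    print(streaming_delay(0.5, 0.25))             # 0.75
```

In this simplified model the turn-based delay grows with the length of the utterance, while the streaming delay depends only on the chunk size and per-chunk processing time. A real streaming translator also has to wait for enough context before committing to a translation, since word order differs between languages, which fits the article's figure of a few seconds rather than a fraction of a second.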

Impact and Outlook for Engineers

The arrival of Gemini 3.5 Live Translate will have a significant impact on engineers in Japan. First, communication within global development teams should become much smoother: real-time discussions and pair programming become possible without having to think about the language barrier, which should improve development efficiency. Second, the release of the 'Gemini Live API' will significantly lower the barrier to building multilingual applications. It should accelerate the creation of services that were previously hard to realize, such as online event platforms with built-in real-time translation, customer support tools that allow smooth conversations with international clients, and new voice-centric social applications. Looking ahead, we can expect further reductions in latency, support for more languages, better accuracy with non-native accents, and stronger translation in complex situations where several people speak at once.
