Google's Gemini 2.5 enables voice generation in multiple languages, including Japanese

Gemini 2.5 now includes a new voice generation feature called 'native audio,' which can generate speech with human-like expressiveness. You can try it for free on Google AI Studio and other platforms.
Gemini 2.5's native audio capabilities
The newly integrated native audio features include real-time voice dialogue and controllable text-to-speech.
◆ Real-time voice dialogue
A high-quality, expressive voice dialogue function. You can use natural language prompts to set the speaking accent and adjust the tone of voice. Japanese is also supported.
You can generate audio by entering a prompt in the 'Stream' tab of Google AI Studio and clicking 'Run.'

I entered some prompts and had it respond aloud. It naturally expressed the relaxed feel of sentence endings such as 'masukanee,' but its intonation in the Kansai dialect was quite unnatural.
◆ Controllable text-to-speech
This feature lets you control the generated voice through natural language prompts, from short sentences to full-length narration, so you can dictate the style, tone, emotion and performance.
To use it, select 'Gemini speech generation' in the 'Generate Media' tab in Google AI Studio.

Enter the text you want to have read out into 'Raw structure'.

Mark each speaker in the input text with a 'Name:' prefix, and enter the same name in the 'Name' box on the right. This lets you generate a conversation between up to two speakers.
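As a rough illustration of the 'Name:' format described above, here is a minimal Python sketch that assembles a two-speaker script ready to paste into the text box. The function name `build_script`, the `style_note` parameter and the sample names are hypothetical and not part of Google AI Studio. The two-speaker limit is taken from the article.

```python
from typing import List, Tuple

MAX_SPEAKERS = 2  # the article states that up to two speakers are supported


def build_script(turns: List[Tuple[str, str]], style_note: str = "") -> str:
    """Format (speaker, line) pairs as 'Name: line' text, one turn per line.

    `style_note` is an optional natural-language instruction placed on the
    first line (hypothetical convention for this sketch).
    """
    # Collect distinct speaker names in order of first appearance.
    speakers: List[str] = []
    for name, _ in turns:
        if name not in speakers:
            speakers.append(name)
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(
            f"{len(speakers)} speakers given, at most {MAX_SPEAKERS} allowed"
        )

    lines = [f"{name}: {text.strip()}" for name, text in turns]
    if style_note:
        lines.insert(0, style_note.strip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_script(
        [("Alice", "Have you tried the new voices?"),
         ("Bob", "Not yet, let's test them now.")],
        style_note="Read this as a relaxed, friendly chat:",
    ))
```

The names used in the script would then need to match what is entered in the 'Name' boxes.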

I had it read a script aloud. The result can be heard in the YouTube video 'I tried out controllable text reading with 'Gemini 2.5' in Google AI Studio.'
Native audio functionality is available in Google AI Studio, as well as in Vertex AI via the Gemini API.
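For readers who want to try this through the API rather than the Studio interface, the following is a minimal sketch of what a speech-generation request body might look like, along with a helper for saving the returned audio. It makes several assumptions that are not from the article and should be checked against the current Gemini API documentation: the model name `gemini-2.5-flash-preview-tts`, the voice name `Kore`, the JSON field names, and the output being raw 24 kHz, 16-bit mono PCM. The sketch only builds the payload and does not send any request.

```python
import io
import json
import wave

# Assumed model name; verify against the current Gemini API docs.
MODEL = "gemini-2.5-flash-preview-tts"


def build_tts_request(text: str, voice_name: str = "Kore") -> dict:
    """Build a JSON-serializable body for a generateContent-style request.

    Field names follow the REST style we assume the API uses; treat them
    as illustrative rather than authoritative.
    """
    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {
                    "prebuiltVoiceConfig": {"voiceName": voice_name}
                }
            },
        },
    }


def pcm_to_wav_bytes(pcm: bytes, rate: int = 24000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container (assumed 24 kHz, 16-bit mono)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    body = build_tts_request("Say in a calm voice: Good morning.")
    print(json.dumps(body, indent=2))
```

The style instruction is written directly into the prompt text, which mirrors the article's point that tone and delivery are steered with natural language.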
According to Google, all generated audio is embedded with Google's watermarking technology, SynthID.
in Review, Software, Web Application, Video, Posted by log1p_kr