
Google has developed AI voice that sounds indistinguishable from the voice of a real human

By Victor Hristov

Updated: Jan 02, 2018, 2:50 AM



A mel-scale spectrogram for the word "whoa", used by Google's Tacotron 2 AI voice system

Google has developed a new text-to-speech system that it calls Tacotron 2, and it works with stunning accuracy, delivering voice narration that sounds indistinguishable from the voice of a real human. Tacotron 2 is the second generation of the technology and consists of two deep neural networks: the first converts the text into a mel-scale spectrogram (like the one in the picture above), and the second, WaveNet, reads that spectrogram and turns it into an audible voice.
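To give a rough idea of what "mel-scale" means in that spectrogram, here is a minimal Python sketch. It uses one common textbook formula for converting between hertz and mels; the article does not say which variant Google uses, and the band count and frequency range below are illustrative assumptions, not values taken from the article.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (common 2595 * log10(1 + f/700) form)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands, f_min, f_max):
    """Return center frequencies (Hz) of n_bands spaced evenly on the mel scale.

    The centers are evenly spaced in mels, so in Hz they sit close together
    at low frequencies and spread out at high frequencies.
    """
    mel_min = hz_to_mel(f_min)
    mel_max = hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_bands + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_bands)]


if __name__ == "__main__":
    # Illustrative values only (assumed, not from the article).
    for center in mel_band_centers(n_bands=8, f_min=125.0, f_max=7600.0):
        print(f"{center:8.1f} Hz")
```

Each row of a mel-scale spectrogram corresponds to one such band, so the chart devotes more of its resolution to the lower frequencies, where human hearing is more sensitive to changes in pitch.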

The system is currently trained only in English, with the one female voice that you can hear below. It does more than read text aloud: it picks up on nuance, and if a word is written in all caps, it stresses that word. It can also cope with a small number of typing errors.

Here are a few examples, showing the Tacotron 2 in action:

“That girl did a video about Star Wars lipstick.”

We find it hard to tell which of the two recordings is narrated by a real person and which is the computer-generated voice (it's the second one).

Here are a few more examples showing the capabilities of the system. Note that Tacotron 2 did not see any of the phrases below during training.

Tacotron 2 text-to-speech system in action

Complex words in sentences: "Generative adversarial network or variational auto-encoder."

"Basilar membrane and otolaryngology are not auto-correlations."

Tacotron 2 knows the right pronunciation depending on semantics: "He has read the whole thing."

"He reads books."

"Don't desert me here in the desert!"

"He thought it was time to present the present."

It can deal with typing errors: "Thisss isrealy awhsome."

It changes prosody with punctuation (notice the comma): "This is your personal assistant, Google Home."

"This is your personal assistant Google Home."

It can adapt to stress with intonation: "The buses aren't the problem, they actually provide a solution."

"The buses aren't the PROBLEM, they actually provide a SOLUTION."

And it even handles tongue-twisters: "Peter Piper picked a peck of pickled peppers. How many pickled peppers did Peter Piper pick?"

"She sells sea-shells on the sea-shore. The shells she sells are sea-shells I'm sure."

What is most impressive about Tacotron 2 is that it is not a technology that will stay in the lab. Google is already using the WaveNet network to generate a more realistic voice in Google Assistant. Once Tacotron 2 is polished, it will roll out to systems like the Assistant.

source: Google Research Paper via Quartz


