Google has developed an AI voice that sounds indistinguishable from a real human


Google has developed a new text-to-speech system called Tacotron 2, and it works with stunning accuracy, delivering narration that is indistinguishable from the voice of a real human. This is not an exaggeration. Tacotron 2 is the second generation of the technology, and it consists of two deep neural networks: the first converts the text into a spectrogram (like the one you see in the picture above), and the second, WaveNet, reads that spectrogram and turns it into speech audio.
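For readers who like to see the idea in code, here is a toy sketch of that two-stage flow. It is not Google's implementation and contains no neural networks: the function names, the eight "frequency bands" and the sine-wave "vocoder" are all made-up stand-ins, chosen only to show how one stage produces a spectrogram-like grid and a second stage turns that grid into audio samples.

```python
import math

# All constants and names below are hypothetical, for illustration only.
N_BANDS = 8              # frequency bands per spectrogram frame
SAMPLES_PER_FRAME = 40   # audio samples produced for each frame
SAMPLE_RATE = 8000       # samples per second


def text_to_spectrogram(text):
    """Stage 1 stand-in: one frame per character, with energy in one band.

    In the real system this stage is a trained neural network.
    """
    frames = []
    for ch in text.lower():
        frame = [0.0] * N_BANDS
        if ch.isalpha():
            frame[(ord(ch) - ord("a")) % N_BANDS] = 1.0
        frames.append(frame)  # spaces and punctuation become silent frames
    return frames


def vocoder(frames):
    """Stage 2 stand-in: turn each frame into samples by summing sine waves.

    In the real system this stage is the WaveNet network.
    """
    samples = []
    for i, frame in enumerate(frames):
        for n in range(SAMPLES_PER_FRAME):
            t = (i * SAMPLES_PER_FRAME + n) / SAMPLE_RATE
            value = sum(
                energy * math.sin(2 * math.pi * 200 * (band + 1) * t)
                for band, energy in enumerate(frame)
            )
            samples.append(value)
    return samples


def synthesize(text):
    """Chain the two stages: text -> spectrogram frames -> audio samples."""
    return vocoder(text_to_spectrogram(text))


if __name__ == "__main__":
    audio = synthesize("He reads books.")
    print(len(audio), "samples generated")
```

The point of the split is that the first stage only has to decide what the speech should look like over time, while the second stage only has to make that picture sound natural.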

The system is currently trained only in English, with the single female voice that you can hear below. It does more than read the text: it picks up on nuance, and if a word is written in all caps, it stresses that word. It can also cope with a small number of typing errors.

Here are a few examples showing Tacotron 2 in action:

“That girl did a video about Star Wars lipstick.”



We find it hard to tell which one is narrated by a real person and which one is the computer-generated voice (it's the second one).

Here are a few more examples showing the capabilities of the system. Note that Tacotron 2 did not see any of the phrases below during training.

Tacotron 2 text-to-speech system in action


Complex words in sentences: "Generative adversarial network or variational auto-encoder."


"Basilar membrane and otolaryngology are not auto-correlations."


Tacotron 2 knows the right pronunciation depending on semantics: "He has read the whole thing."


"He reads books."


"Don't desert me here in the desert!"


"He thought it was time to present the present."


It can deal with typing errors: "Thisss isrealy awhsome."


It changes prosody with punctuation (notice the comma): "This is your personal assistant, Google Home."


"This is your personal assistant Google Home."


It can adapt its intonation to stressed words: "The buses aren't the problem, they actually provide a solution."


"The buses aren't the PROBLEM, they actually provide a SOLUTION."


And it even handles tongue-twisters: "Peter Piper picked a peck of pickled peppers. How many pickled peppers did Peter Piper pick?"


"She sells sea-shells on the sea-shore. The shells she sells are sea-shells I'm sure."


What is most impressive about Tacotron 2 is that it is not a technology that will stay in the lab. Google is already using WaveNet to generate the more realistic voice in Google Assistant. Once Tacotron 2 is polished, it will roll out to systems like the Assistant.

9 Comments

1. surethom

Posts: 1662; Member since: Mar 04, 2009

This urgently need to come to Google Home that female UK voice is robotic & sounds even more robotic on the large Google home as it has way to much base.

2. Peacetoall unregistered

Reading this article reminds me of movie named ''HER'' I suggest anyone who haven't seen it please watch it. Good interesting romantic story

3. Cat97

Posts: 1861; Member since: Mar 02, 2017

So is this the level of AI in 2017 ? On learning how to pronounce properly ? There's such a long way to go...

5. worldpeace

Posts: 3127; Member since: Apr 15, 2016

No worries there's still enough of time to make cyborg assassin by 2029 just like in terminator movie.. or push the year all the way back to 2200 (but probably we already in the matrix by then)

6. RebelwithoutaClue unregistered

This is just part of what AI is learning and actually, language is a pretty hard thing to master.

4. An.Awesome.Guy

Posts: 636; Member since: Jan 12, 2015

Great Sound Kudos for them to do such an amazing pronunciation

7. shshxiaojing

Posts: 13; Member since: Aug 18, 2016

It seems that I need to know and learn about AI. Trended.

8. Zylam

Posts: 1813; Member since: Oct 20, 2010

Incredible. Thank for implementing all the voice recording Victor!

9. MarmiteTheDog

Posts: 191; Member since: Jul 31, 2017

Will it be able to emphasise words correctly, or will it only do American?
