Can you tell the difference between a real human voice and Google's new AI voice?


Google has developed a new AI-based text-to-speech system, Tacotron 2, that sounds indistinguishable from a real human voice. At least, that is what Google claims.

To be exact, Tacotron 2 received a mean opinion score (MOS) of 4.53 on a 1-to-5 scale, close to the MOS of 4.58 given to professionally recorded speech.

The new system does not sound robotic or digitized in any easily noticeable way. It can even pick the correct pronunciation of a word based on its meaning in context. It can also cope with minor typos and handle tongue twisters.

We tried it, and we could not tell the difference between a human voice and the new system. So let's play a game. Below you will find four sentences, each recorded twice: once by the Tacotron 2 AI voice and once by a professional human narrator. Can you tell which is which? (The answers are at the end of this article.)

1. Which one is the real voice?



2. Which one is the real voice?



3. Which one is the real voice?



4. Which one is the real voice?



Could you tell the difference?

We could hear the difference only after listening to the samples a few times, not at first.

Here are the correct answers:


A - Human voice
B - Tacotron 2 AI voice

C - Tacotron 2 AI voice
D - Human voice

E - Tacotron 2 AI voice
F - Human voice

G - Human voice
H - Tacotron 2 AI voice

