Google video walks through where we are with speech recognition and where we'll go next
Over the past couple of years, we've heard bits and pieces about how Google has been working on speech recognition, and we've seen those efforts become part of the software we love in Google Now, Chrome, and Android. We've heard a bit about the "neural network" that Google created to mimic the human brain, and how it led to a breakthrough in speech recognition accuracy. Now, Google wants to walk us through the larger history of speaking to computers.
The new video is titled "Behind the Mic: The Science of Talking with Computers", and it starts with a pretty basic idea: humans are born with an innate ability to communicate in one way or another. The most efficient way we've found so far is speech, but until recently, we've only been able to communicate with computers by "passing notes", which is much more difficult. The video gives a brief history of speech recognition and then puts where we are today into stark context: still surprisingly rudimentary in the grand scheme of things.
Right now, computers and mobile devices are great at transcription and at basic queries and commands. But Google wants to tackle the next level of speech recognition: understanding meaning. This is incredibly difficult because of the many subtleties of tone, sarcasm, irony, semantics, and accents, and even physical cues like facial expressions. The video goes on to explain in a (relatively) simple way how neural nets work, and how computers are being taught to gather meaning. (The example of Paris - France + Italy = Rome is quite a brilliant moment.)
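That "Paris - France + Italy = Rome" trick refers to word-vector arithmetic: systems trained on large amounts of text learn to represent each word as a list of numbers, and relationships like "capital of" show up as consistent directions in that number space. Here is a minimal sketch of the idea using tiny hand-invented 3-dimensional vectors (real systems learn vectors with hundreds of dimensions from huge corpora; these values and words are made up purely for illustration):

```python
# Toy illustration of word-vector arithmetic ("Paris - France + Italy ≈ Rome").
# These hand-crafted vectors are invented for this sketch; real embeddings
# are learned automatically from text, not written by hand.
import math

embeddings = {
    # made-up dimensions: [France-ness, Italy-ness, capital-city-ness]
    "France": [1.0, 0.0, 0.0],
    "Italy":  [0.0, 1.0, 0.0],
    "Paris":  [1.0, 0.0, 1.0],
    "Rome":   [0.0, 1.0, 1.0],
    "Milan":  [0.0, 1.0, 0.3],  # Italian, but not the capital
}

def analogy(a, b, c):
    """Return the word whose vector is closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in
              zip(embeddings[a], embeddings[b], embeddings[c])]
    # Exclude the query words themselves, then pick the nearest neighbor.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(embeddings[w], target))

print(analogy("Paris", "France", "Italy"))  # Rome
```

Subtracting "France" strips away the French part of "Paris", leaving something like "capital city", and adding "Italy" lands you near "Rome". That geometric regularity is what makes the moment in the video so striking.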
Ultimately, the hope is that within five years we'll see huge improvements in speech recognition. Right now, you can easily ask your phone about places, weather, and more. But what will be possible five years from now?