Google video walks through where we are with speech recognition, and where we'll go next
The new video is titled "Behind the Mic: The Science of Talking with Computers", and it starts with a pretty basic idea: humans are born with the innate ability to communicate in one way or another. The most efficient way we've found so far is speech, but until recently we've only been able to communicate with computers by "passing notes", which is much more difficult. The video gives a brief history of speech recognition and then puts where we are today into stark context: surprisingly rudimentary in the grand scheme of things.
Right now, computers and mobile devices are great at transcription and at basic queries and commands. But Google wants to tackle the next level of speech recognition: understanding meaning. This is incredibly difficult because of the many subtleties of tone, sarcasm, irony, semantics, accents, and even physical cues like facial expressions. The video goes on to explain, in a (relatively) simple way, how neural nets work and how computers are being taught to gather meaning. (The example of Paris - France + Italy = Rome is quite a brilliant moment.)
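To make the Paris - France + Italy = Rome idea a little more concrete, here is a minimal sketch of that kind of word arithmetic. The vectors below are hand-made for illustration (an assumption, not anything from the video); real systems learn vectors with hundreds of dimensions from large amounts of text, and the names `TOY_VECTORS`, `cosine`, and `analogy` are made up for this example.

```python
import math

# Hand-made toy vectors: an assumption for illustration, NOT learned embeddings.
# Dimensions: [is_capital_city, France, Italy, Germany]
TOY_VECTORS = {
    "Paris":   [1.0, 1.0, 0.0, 0.0],
    "France":  [0.0, 1.0, 0.0, 0.0],
    "Rome":    [1.0, 0.0, 1.0, 0.0],
    "Italy":   [0.0, 0.0, 1.0, 0.0],
    "Berlin":  [1.0, 0.0, 0.0, 1.0],
    "Germany": [0.0, 0.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Return the word closest to vec(a) - vec(b) + vec(c), skipping the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("Paris", "France", "Italy"))  # prints "Rome"
```

Subtracting France from Paris leaves something like "capital-ness", and adding Italy points that at Rome. With these toy numbers the result lands exactly on Rome; with learned vectors it would only land near it, which is why the sketch picks the closest word rather than checking for an exact match.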
Ultimately, the hope is that within 5 years we'll see huge improvements in speech recognition. Right now, you can easily ask your phone about places, weather, and more. But what will be possible in 5 years?