
Why can't Google Now, Siri, and Cortana offer full voice control?

Posted by Michael H.

I love voice control. Let's just get that out there right from the start. I may be a writer, which means I am at my best conveying my thoughts through the written word rather than talking on the spot, but I am also a lazy man, and I like to get things done with a minimal amount of interaction with my computing devices. As such, I can't help but wonder: why can't Google Now, Siri, and Cortana offer full voice control?

As already mentioned, I love voice control. It is one of the main reasons why I traded in my Nexus 5 for a Moto X: I wanted the Touchless Controls. And, as much as I love Touchless Controls on my Moto X, I can't help but want more. Still, there is a limit to what I can accomplish with voice commands alone. There is a huge assortment of options for voice commands. I can send emails and texts, navigate to websites, ask questions, get directions, set alarms, set reminders, play music, and plenty more. The trouble is that once that first command is done, there is nothing left for me to do by voice.

One of the best innovations of recent years is Google's conversational speech recognition in Search. From a technical standpoint, it means that Google can understand pronouns and connect them to previous requests. So, if you ask about Kawhi Leonard in one voice action, then ask a follow-up question using the pronoun "him", Google will understand and give you the information that you want. That is an amazing piece of tech that most don't fully appreciate. It creates a back-and-forth with your device that feels natural. Unfortunately, that back-and-forth doesn't extend into more useful scenarios.
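To make the pronoun idea concrete, here is a minimal sketch of conversational context in Python. It is not how Google does it; the class and method names are hypothetical, and it assumes some other component has already recognized the subject of the previous request. All it does is remember that subject and substitute it for a pronoun in the follow-up.

```python
import re

# Deliberately small pronoun list; "her" is treated as an object pronoun only.
PRONOUN_RE = re.compile(r"\b(he|him|his|she|her|they|them|their|it|its)\b",
                        re.IGNORECASE)
POSSESSIVE = {"his", "their", "its"}


class ConversationContext:
    """Hypothetical helper that carries the last subject between requests."""

    def __init__(self):
        self.last_subject = None

    def remember(self, subject):
        # Assumption: entity recognition on the first request happens elsewhere.
        self.last_subject = subject

    def resolve(self, query):
        """Return the query with pronouns replaced by the remembered subject."""
        if self.last_subject is None:
            return query

        def swap(match):
            word = match.group(0).lower()
            if word in POSSESSIVE:
                return self.last_subject + "'s"
            return self.last_subject

        return PRONOUN_RE.sub(swap, query)
```

After `remember("Kawhi Leonard")`, a follow-up like "how tall is he" resolves to "how tall is Kawhi Leonard". The same carried-over state is what a follow-up device command would need: the system has to know which app or task the previous command started.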

Natural controls


It's nice to be able to run follow-up commands, but the current implementation is fairly limited. I simply don't have many instances where I need to ask a follow-up question about a person or place. I would much rather be able to continue a device command in that same conversational way. My issue is that other voice commands don't offer similar follow-up options. For example, let's say that I ask my Moto X to play a song by Me'Shell Ndegéocello, because I haven't yet had a chance to listen to her new album. That first request should go through without a hitch (assuming I can pronounce her name correctly; otherwise I'll just opt for a safer voice recognition name like Gregory Porter). The trouble is that once the music starts, my voice command options run dry. All I can do from there is submit a voice command to play another artist or song. What I really want is to be able to tell my device to do one of a multitude of things, like "pause", "next track", "lower/raise volume", or "repeat track". Unfortunately, I can't.

I don't really understand why I can't do this. From a technical standpoint, there are almost no barriers to allowing me this sort of full voice control over my device. Starting with voice recognition, we're golden. All voice command systems can understand simple words like play, pause, next, previous, and repeat. As far as a touchless trigger, that's possible too. Google has recently expanded its hotwords to allow the "OK, Google" command to be initiated from anywhere. There are rumors that the next iPhone will offer similar functionality for Siri, and there's no reason why Cortana couldn't do the same for Windows Phone users. Always-listening is becoming the norm, so that shouldn't be an issue.

I can understand that more voice interaction would likely mean more drain on the battery, which is always a point of concern for manufacturers, but it seems like a problem with a relatively easy solution. An "always-listening" device is already possible, especially when the device has a companion core or optimized processor (anything from a Snapdragon 800 and newer) specifically dedicated to listening for voice commands. That takes care of the battery issues. The other side of the issue should be a simple API, at least to get things started.

That is what Ubuntu Touch is planning to implement. Once you're inside an app, there is a fairly limited selection of commands that one might want to use via voice. News apps and other reading apps might not have much use for voice command, but even implementing simple commands like "back", "scroll down/up", "search", and "share to..." would add a wealth of functionality for the vast majority of apps. Once you jump into apps that have more options for standard voice commands, like media consumption apps, the possibilities become much clearer. Imagine having full media controls with voice, like "play/pause", "next/previous", "rewind/fast forward", "volume up/down", or even "skip to (time)". Of course, even dynamic commands shouldn't be any trouble, because in-app commands will mostly be one- or two-word commands, many of which would overlap between apps, allowing for easier implementation of a standard API. And as mentioned before, recognition of those commands shouldn't be a problem.
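As a rough illustration of what such a standard API could look like, here is a minimal Python sketch. Everything in it is hypothetical: the command list, `VoiceCommandRegistry`, and the toy `MusicPlayer` are invented for this example and do not correspond to any real platform API. The idea is simply that the platform owns a small shared vocabulary, and whichever app is in the foreground registers handlers for the phrases it supports.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical shared vocabulary; a real platform would define its own list.
STANDARD_COMMANDS = {
    "play", "pause", "next", "previous",
    "volume up", "volume down", "back", "search",
}


class VoiceCommandRegistry:
    """Maps standard spoken phrases to handlers supplied by the current app."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], str]] = {}

    def register(self, command: str, handler: Callable[[], str]) -> None:
        if command not in STANDARD_COMMANDS:
            raise ValueError(f"not a standard command: {command!r}")
        self._handlers[command] = handler

    def dispatch(self, utterance: str) -> Optional[str]:
        # Normalize case and whitespace; speech recognition is assumed done.
        phrase = " ".join(utterance.lower().split())
        handler = self._handlers.get(phrase)
        return handler() if handler else None


class MusicPlayer:
    """Toy media app that exposes a few of the standard commands."""

    def __init__(self, tracks: List[str]) -> None:
        self.tracks = tracks
        self.index = 0
        self.volume = 5

    def play(self) -> str:
        return f"playing {self.tracks[self.index]}"

    def pause(self) -> str:
        return "paused"

    def next(self) -> str:
        self.index = (self.index + 1) % len(self.tracks)
        return self.play()

    def volume_up(self) -> str:
        self.volume = min(10, self.volume + 1)
        return f"volume {self.volume}"


def registry_for(player: MusicPlayer) -> VoiceCommandRegistry:
    """Wire the player's methods to the shared phrases it supports."""
    registry = VoiceCommandRegistry()
    registry.register("play", player.play)
    registry.register("pause", player.pause)
    registry.register("next", player.next)
    registry.register("volume up", player.volume_up)
    return registry
```

A phrase the app has not registered, such as "search" here, simply returns `None`, which is where a platform could fall back to its general assistant. Because the vocabulary is shared, a video app or podcast app could register the same phrases without the recognizer needing to learn anything new.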

Who does it first?


It's not like this sort of functionality is completely new. Windows 7 and 8 offer much broader voice command functionality, allowing for full navigation of the screen just with voice commands. Many would say that's desktop, and mobile is a different world with more limited options, but that sort of thinking doesn't hold as true anymore. Mobile platforms are becoming more and more advanced, and bridging the functionality gap with desktops in many ways. One of the big plans for Ubuntu Touch has been to allow for wider voice commands within apps. One of the first demos that Canonical showed had the standard items in a dropdown menu being actionable via voice, meaning in-app search, and commands like "open", "save", "crop", etc. 

Canonical has not yet gotten that functionality working in Ubuntu Touch, but frankly there is still a lot in Ubuntu Touch that doesn't yet work to its full potential. My question concerns the established platforms. Sure, Google and Apple continue to expand the functionality of Google Now and Siri, respectively, and Microsoft looks to be coming out of the gate with an impressive feature set for Cortana, but none appear to have any plans to offer full voice control, which is pretty disappointing. The best we can hope for right now is a back-and-forth conversation to make sure that your voice command is handled properly and that all of the relevant information is included, as with calendar events or reminders.

In the end, we're definitely going to get full voice control; it's more a matter of who implements it first. As mentioned, Microsoft has it working in Windows, but not Windows Phone. Microsoft has stated intentions to bring "Kinect-like" control to its platforms, but there is no way to tell what the timeline is on those features. It seems most likely to be in Windows Phone 9, which is expected next year. Canonical is building it for Ubuntu, but it isn't ready yet. Apple hasn't given any outward sign that it even has this functionality on its radar yet, but it seems likely that it is at least in R&D. Samsung also hasn't shown any inclination towards offering this feature across its devices. It already offers some features like this, and S Voice is powered by Nuance, which is also behind Siri's voice recognition. Obviously the capabilities are there, but Samsung (not surprisingly) has limited the features to its own apps rather than making them available globally on its devices. That just leaves Google.

In various Android Wear videos, Google has teased that there is an expansion of voice commands on the way. One video showed someone on a bike using a command like "OK Google, open the garage door". Unfortunately, it's hard to tell what this means. It could be that Google will be opening up voice commands to developers, allowing for deeper integration into apps and for developers to create custom voice actions. It seems more likely that it will be a new set of standard actions that apps can hook into, like how the standard "note to self" command can be used with email, Keep, Evernote, and other apps. Google has shown an option to say "OK Google, call me a car", and let you choose an app to handle that request. The first option, custom voice actions, could lead to a lot more functionality, although it would be something of a mess. The second, standard actions, would keep functionality limited, but more consistent. Either way, it does look like Google will be the first to add more full-featured voice control.
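The "standard actions that apps can hook into" model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ActionRouter` and its methods are hypothetical names, not Android APIs, and the handlers are stand-ins for real apps. Several apps register for the same spoken action, and when more than one is available, a chooser (standing in for the user's pick) decides which one handles the request.

```python
from collections import defaultdict


class ActionRouter:
    """Hypothetical router from a standard voice action to registered apps."""

    def __init__(self):
        # action phrase -> list of (app name, handler) in registration order
        self._apps = defaultdict(list)

    def register(self, action, app_name, handler):
        self._apps[action].append((app_name, handler))

    def candidates(self, action):
        """Names of the apps that can handle this action."""
        return [name for name, _ in self._apps.get(action, [])]

    def handle(self, action, payload, choose=None):
        """Run the action in one app; return None if no app supports it."""
        apps = self._apps.get(action, [])
        if not apps:
            return None
        handlers = dict(apps)
        names = list(handlers)
        # With a single app, or no chooser supplied, use the first registered.
        name = choose(names) if (choose and len(names) > 1) else names[0]
        return handlers[name](payload)
```

For example, if Keep and Evernote both register for "note to self", `handle("note to self", "buy milk", choose=...)` hands the note to whichever app the chooser returns. An unregistered phrase like "open the garage door" returns `None`, which is the limitation of this model: nothing works by voice until someone defines a standard action for it.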

Conclusion


The "What?" and "Why?" are easy: full voice control, because we all want to live in Star Trek. The "How?" also seems to be answered: always-listening and APIs. The answer to "Who?" is really everyone, but it does look like Google will be the first out of the gate to offer full voice control. So, that just leaves the last question: "When?"

Given what Google has teased, it's hard to imagine full voice control starting to roll out before the end of this year. The functionality would probably need to be part of Android L, and Google made no real mention of it during the I/O keynote. This kind of deeper integration into apps would need to be at the system level, and not just use Android's app handler calls. It does seem like Google may at least be putting down the foundations for full voice control. Unfortunately, regardless of your platform of choice, it is likely that full voice control isn't in the cards until 2015 at the earliest. I'm a patient person, but that seems like a long time to wait for a feature that should already be in the works at all of the big platforms.

32 Comments




posted on 30 Jun 2014, 12:08 5

1. ArtSim98 (Posts: 2428; Member since: 21 Dec 2012)


I never use Google Now. I'm just not interested in talking to my phone. At least yet.

posted on 30 Jun 2014, 12:16 12

2. ArtSim98 (Posts: 2428; Member since: 21 Dec 2012)


Oh, almost forgot. Great article Michael!

posted on 30 Jun 2014, 12:27 1

3. kosal (Posts: 14; Member since: 19 Oct 2013)


Try Google now to set reminders and alarms thats cool enough dude...

posted on 30 Jun 2014, 12:56

7. ihavenoname (Posts: 1313; Member since: 18 Aug 2013)


I use Google Now cards, but voice recognition is gimmick for me. It works very well, but almost always I prefer to type and because "Ok Google" doesn't work with my language (yet), I have to press the button anyway.

posted on 30 Jun 2014, 13:03

8. ArtSim98 (Posts: 2428; Member since: 21 Dec 2012)


I think I can do it quicker by just going into the calendar. And I'm not sure if I can set a SmartBand alarm in GN. I haven't tried though.

posted on 30 Jun 2014, 18:46 1

24. joey_sfb (Posts: 2716; Member since: 29 Mar 2012)


Are you speech impaired? It take me 2 sec to finish my one sentence setting up my reminder or alarm with Google now.

posted on 30 Jun 2014, 18:59 1

27. joey_sfb (Posts: 2716; Member since: 29 Mar 2012)


50 examples on how to use Google Now.

http://www.youtube.com/watch?v=2vT0AWDq3DE

posted on 30 Jun 2014, 14:29

14. InspectorGadget80 (Posts: 6215; Member since: 26 Mar 2011)


Same cause voice recognition can't hear us clearly that well.

posted on 30 Jun 2014, 12:31 6

4. Penny (Posts: 1125; Member since: 04 Feb 2011)


Hmm, I don't know when Google's timeline for full OS voice control is, but Microsoft might be headed in this direction by the end of the year.

It has been reported that Microsoft is looking to bring "kinect-like" gesture controls to WP, and my guess is that they would try to bring along "kinect-like" voice controls with that. Also, Cortana seems to be the furthest ahead in the way of APIs; it allows third-party applications to tap into the power of Cortana already and add their own unique commands to its list.

Google, meanwhile, seems be the furthest ahead on the actual hardware implementation side of things. The Moto X was essentially a proof of concept to demonstrate always-on listening with a dedicated chip that does not harm battery life, and Google's Now will only continue to gain more capabilities.

posted on 30 Jun 2014, 12:46 1

5. TheGenius (Posts: 297; Member since: 06 Mar 2014)


Samsung already offers the functionality to play, pause, next, previous, inc/Dec the volume in its music player.

posted on 30 Jun 2014, 12:53 5

6. teerex42 (Posts: 152; Member since: 14 Jun 2012)


I love google now..it's the best virtual assistant out there. Especially now that you can ask it a question from the lock screen with the latest version, it's even more functional. I never liked siri cause it was gimmicky in how it joked with you if you were to make jokes with it. Google now is pure answers and has a much more natural voice, not robotic like siri. I haven't tried cortana but probably won't cause it's windows based.

posted on 30 Jun 2014, 13:04 6

9. ArtSim98 (Posts: 2428; Member since: 21 Dec 2012)


I think Cortana seems to be the best one out there. Google Now is the only one I have tried though,

posted on 30 Jun 2014, 14:36 4

15. Deaconclgi (Posts: 221; Member since: 03 Nov 2012)


I use Siri, Google Now and Cortana and Cortana has been the best for me, offering features that Siri and Google Now don't offer and in a more natural sounding voice as well.

posted on 30 Jun 2014, 18:54 2

26. joey_sfb (Posts: 2716; Member since: 29 Mar 2012)


Cortana is a halo character I don't care much about. Its has a flat personality.

Anyway, good thing I stay away from wp and rt. They really bundled their ugly tiles, low feature set and now lifeless Cortana in one convenient package for me to avoid.

posted on 01 Jul 2014, 03:49

30. sbw44 (Posts: 380; Member since: 04 Dec 2012)


Everytime someone mentions WP you jump to all these bashing and trolling! I mean seriously? Cortana Lifeless? just shows much of your credibility.

posted on 01 Jul 2014, 06:43

31. jojon (Posts: 72; Member since: 11 Feb 2014)


thats a blessing then

posted on 01 Jul 2014, 20:11

32. eharris560 (Posts: 59; Member since: 28 Dec 2012)


People he's here all month.

posted on 30 Jun 2014, 15:02 2

20. elitewolverine (Posts: 1308; Member since: 28 Oct 2013)


I have tried all three, at work we regularly pull up the 5s, note 3, and my 925, it is fun to see how all them compare to the exact same commands.

So far Now and Cortana are my favorite.

I use cortana for alot of reminders, not so much voice things. If it always listened i would like it like my xbox. Just go to my room, xbox on, netflix, play. I wouldnt mind that on a phone as the article stated.

posted on 01 Jul 2014, 03:44 1

29. sbw44 (Posts: 380; Member since: 04 Dec 2012)


You haven't used Cortana but you claim Google Now is the best out there? Seriously? It's like saying a Ferrari is the fastest car out there but you haven't tried a Veyron because it does not look nice!

Plus I have used Google Now a couple of time and seriously if you think that does not sound robotic then there must be something wrong with your hearing.

I say all 3 of them offer something different, its just a matter of time before one of them offers the full package. But like Penny said Cortana is currently the better plus if rumors are true about the Kinect features coming to WP it won't be long before we get the full package!

posted on 30 Jun 2014, 13:23 3

10. jibraihimi (Posts: 667; Member since: 29 Nov 2011)


It's time to grab popcorn, as soon as you see the name of Michael H. on the top of the article...... Once again great article Michael, always enjoy your long in-depth articles..... Keep up the good work, and looking forward to your next article......... Btw I don't use voice assistants like Google Now on my Android or Siri on my iPhone; in the beginning they looked fun to use, but eventually their attraction diminished, and they began to look gimmicky, though sometimes I do find it useful to get some relevant cards from Google Now.......

posted on 30 Jun 2014, 14:21

11. LuckyS (Posts: 42; Member since: 07 Dec 2013)


All of this is possible with the help of the Tasker and AutoVoice applications. It requires some basic setup and may be difficult for beginners.

Root is needed too, plus the Xposed framework and the Google Now API module. Every possible command may be programmed and executed by voice. What it does is reroute the voice command from Google Now to Tasker, which then executes the command.

posted on 30 Jun 2014, 14:24

12. b0wzer (Posts: 38; Member since: 07 Feb 2014)


I can't agree more with this article. I was so impressed after saying "ok google, what is the circumference of a circle whose diameter is 42cm" and having google read me out the answer. I love it! More!

posted on 30 Jun 2014, 14:27

13. mistertimi (Posts: 73; Member since: 28 May 2014)


Can't stand articles like these. It reminds us devs just why we hate consumers who just expect things. Nevermind the thousands and thousands of hours it takes to get something this complicated working.

posted on 30 Jun 2014, 14:36 3

16. Awalker (Posts: 217; Member since: 15 Aug 2013)


I don't think Google intended for Now to be a personal assistant. It was just suppose to be search with voice. Now they're slowly changing it into a personal assistant.

posted on 30 Jun 2014, 14:50 6

17. NokiaFTW (Posts: 1888; Member since: 24 Oct 2012)


I really love Cortana. Instead of typing in my search queries, I talk to Cortana. Its really fast, and honestly, after using both Google Now and Siri, I can safely say that Cortana is the best virtual voice assistant out there, more so since its in BETA unlike the other two.

posted on 30 Jun 2014, 14:51

18. frustyak (Posts: 147; Member since: 08 Mar 2010)


I'm still at a point where using voice control doesn't feel natural at all. I feel too self-conscious in public every time I've been tempted to use it. Plus, every time I've seen someone use it, it has always been a game, let's see what neat tricks Siri or Google now can do. I have yet to see anyone actually honest-to-god using it, in public, to get things done.

posted on 30 Jun 2014, 15:00

19. tacarat (Posts: 130; Member since: 22 Apr 2013)


Full voice interaction will be what forces Skynet to move from whatever it's doing now to actively exterminating the human race. Have you watched somebody yelling at their phone without voice interaction? "Hey Siri, can you do that thing you did last night?".

posted on 30 Jun 2014, 16:24 1

21. Vexify (Posts: 294; Member since: 16 Jun 2014)


Why can't we stop being so spoiled?

posted on 30 Jun 2014, 16:25 2

22. -box- (Posts: 3735; Member since: 04 Jan 2012)


My wife and I will have mini competitions between my Lumias with Cortana and her GS5, and Cortana wins every time, and sounds like a person, rather than the highly-robotic google now.

posted on 30 Jun 2014, 16:27 1

23. -box- (Posts: 3735; Member since: 04 Jan 2012)


Also, GN rarely actually understands her commands; maybe 1/10 tries are successful. Cortana is just the opposite.

posted on 30 Jun 2014, 18:50

25. eharris560 (Posts: 59; Member since: 28 Dec 2012)


@Michael H. I just did a test: Cortana can pause, resume, and play the next track using voice commands. iOS can also play the next track. Neither can play the previous track.

posted on 01 Jul 2014, 02:48 3

28. Gemmol (Posts: 492; Member since: 09 Nov 2011)


Cortana can do this, I bet the person just tested Siri and google now and assume Cortana cannot do that.......Cortana way advance compare to the rest
