In this 10-year time frame, I believe that we’ll not only be using the keyboard and the mouse to interact, but during that time we will have perfected speech recognition and speech output well enough that those will become a standard part of the interface.
Well, 6 years after Gates’ time frame ended, “perfect” voice input and output is not yet a user interface standard. It appears to be the only hope for mobile purists, who have yet to find a proper input replacement for ye olde keyboard.
I’m here to tell you: you’d better find something new. Voice input isn’t going to save you.
It’s not about latency, and it’s not about the quality of voice recognition (Siri aside, which has both problems). Have you used Google’s mobile voice recognition? I have a hard time confusing it, and it recognizes my words as soon as I speak them (sometimes before: thanks, autocomplete). At what tipping point will it be “good enough” for you to realize the problem is the modality itself?
The two major problems with voice commands are:
1. Voice control systems do not have very good affordance. That is to say, it’s hard to know what the system allows you to do and what syntax is required. A graphical user interface makes designing for affordance much easier: if a toggle exists on screen, it is something you can change, and its image can suggest how to manipulate the value along with its possible states. On a voice-controlled system, as on the command line, one must do more exploring.
One solution to this problem is for voice systems to allow you to do anything: any possible query will give you a directional answer. This is one of the reasons it is nicer to use Google search via voice command than Siri. Google has a relatively unlimited argument space (anything that can be converted into text), while Siri has what feels like an arbitrary and bizarre set of functional restrictions.
Perhaps you will one day be able to say anything in any format and have the system understand you and execute it. This is essentially an AI problem. But even if we solve it (I am more optimistic than most people, less optimistic than the Singularity folks), the second problem remains.
2. By far the bigger problem is this: it feels weird to interact with a computer or smartphone with your voice. I don’t want to stand in line at the post office dictating my bizarre Wikipedia queries: I don’t even want to do that while my girlfriend is in the next room. It is so uncomfortable that I won’t even do it sitting home alone or walking down an empty street. It is hard to change this cultural norm, and I don’t think we will. The only time I feel comfortable using voice is in the car, and that is where I expect it will stay for most users.
The interfaces that enable the next generation of input will be native to their host systems, not bolted on later like voice. On the tablet, this means fingers on glass. What will need to change is the metaphors, and the software that communicates the metaphors to us, and us to them.