by Alexander
Voice Code
Almost every science fiction movie made so far has featured some form of speech recognition, in which all sorts of computers, robots or even sliding doors accurately understand and respond to human speech under any conditions. Although speech recognition technology hasn’t progressed quite as fast as SF writers had imagined, it is increasingly being used in everyday products and services: a handful of mobile phones can already respond to basic voice commands without any prior training on samples of your voice, and Interactive Voice Response (IVR) systems will pick up your call, listen to what you want, and tell you where the nearest restaurant is or when the next train leaves.
The reason such applications have been (relatively) successful is that they manage to overcome the inaccuracies of speech recognition and the ambiguity of human speech by limiting themselves to a particular domain (e.g. requests about movie showtimes) and taking into account the specific domain knowledge (e.g. users are likely to pronounce the name of a movie currently on show). By carefully following this principle, researchers from the National Research Council of Canada have created VoiceCode, an application that allows computer programmers to dictate program code to their computer, with virtually no need to touch the keyboard.

The main issue with dictating program code is that programming languages were never meant to be spoken but were designed to have a clear, simple structure that can be easily understood by a computer. A simple example of this is the naming of functions and variables: in program code it is not uncommon to see abbreviations such as nInMsgs for a variable that represents the number of incoming messages. Without VoiceCode, one would have to spell out the abbreviation as something like “n-capital-i-n-capital-m-s-g-s”. VoiceCode, however, can learn and adapt to the ways that computer programmers abbreviate such names, and match a full spoken phrase with its likely abbreviations. But such heuristics are bound to fail once in a while, so what happens then? VoiceCode again follows a basic principle of interaction design: in case of error, give the user a quick way to recover. Whether the error was made at the voice recognition stage or in translating natural language to program code, it can be easily corrected by selecting one of the alternative interpretations presented in a popup window.
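To make the abbreviation matching a bit more concrete, here is a minimal Python sketch of how a spoken phrase might be matched against identifiers already present in the code. This is not VoiceCode's actual algorithm, which the post does not describe in detail; the function names, the filler-word list and the "same first letter, letters in order" rule are all assumptions made purely for illustration.

```python
import re

# Hypothetical list of filler words that programmers tend to drop from names.
FILLER_WORDS = {"of", "the", "a", "an"}


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase fragments."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def is_abbreviation(fragment, word):
    """Assumed heuristic: the fragment starts with the same letter as the
    word, and its letters appear in the word in the same order."""
    if not fragment or fragment[0] != word[0]:
        return False
    remaining = iter(word)
    # `ch in remaining` consumes the iterator, so order is enforced.
    return all(ch in remaining for ch in fragment)


def candidate_symbols(spoken_phrase, known_symbols):
    """Return the known symbols that could abbreviate the spoken phrase."""
    words = [w for w in spoken_phrase.lower().split() if w not in FILLER_WORDS]
    matches = []
    for symbol in known_symbols:
        fragments = split_identifier(symbol)
        if len(fragments) == len(words) and all(
            is_abbreviation(f, w) for f, w in zip(fragments, words)
        ):
            matches.append(symbol)
    return matches


if __name__ == "__main__":
    symbols = ["nInMsgs", "numOutMsgs", "inMsgCount", "num_in_msgs"]
    print(candidate_symbols("number of incoming messages", symbols))
```

With these sample symbols, the phrase “number of incoming messages” matches both nInMsgs and num_in_msgs. A heuristic like this can easily return more than one candidate, which is the kind of ambiguity a list of alternative interpretations is meant to resolve.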
It’s impossible to describe all the features that make this piece of software a great case study in voice recognition, so I highly recommend watching the demo video that was presented at the CHI 2006 conference. However, as impressive as this video may be, don’t rush to throw away your keyboards just yet. You may escape RSI, but the VoiceCode authors are quick to point out that a similar condition, voice strain, may be linked to the continuous use of speech recognition products. One can only wonder whether such problems may also arise with more futuristic input technologies, such as brain wave reading.
- Categories: ergonomics, voice & sound, input techniques, accessibility
11:31
Is there any chance this would be faster than typing? If not, I’d suggest it’s got zero chance of any significant use.
14:38
I think their main target audience seems to be programmers who suffer from RSI. I can also see this getting used by people with physical disabilities who cannot use a mouse or keyboard. There’s another system that targets these users specifically.
Still, it would indeed be interesting to know whether it’s fast enough to tempt even “manual” programmers. Unfortunately, their paper doesn’t talk about this.
21:34
On phones running Nokia S40 Series 3rd Edition, the voice recognition is quite good. The correct name is usually picked, unless you have many similar names in your address book, like “Paul Johnsen” and “Paul Johnsted”.
In that case, the programmers have been careful enough to introduce a small delay in which the user can select the correct contact from the phone menu - if he happens to be looking at his screen.
To my knowledge, however, it’s not possible to “cancel” an outgoing call by voice command. The headset will duly repeat the phrase the phone believes it has picked up, and then - if it has picked the wrong contact - you’ll have to immediately grab your phone to hang up the call before the wrong person answers.
23:06
Paul: I don’t have any hard data at hand (although I guess they do exist) about programmer productivity, but from my personal experience writing software, I doubt that the #1 goal is the speed of data entry. Modern IDEs (such as Eclipse) have already reduced the need for a lot of typing by providing code templates and context-sensitive completions, features also present in VoiceCode. I don’t think that triggering these features by voice is going to be much slower than triggering them with a couple of keystrokes.
Sven: Good observation. As I said, there always needs to be a quick way of correcting mistakes made by voice recognition. Although I suspect that if you’re talking to the phone through a headset, the headset should have a button that allows you to end the call.