by Alexander
Voice Code
Almost every science fiction movie made so far has featured some form of speech recognition, where all sorts of computers, robots or even sliding doors accurately understand and respond to human speech under any conditions. Although speech recognition technology hasn’t progressed quite as fast as SF writers had imagined, it is increasingly being used in everyday products and services: a handful of mobile phones can already respond to basic voice commands without any prior training on samples of your voice, and Interactive Voice Response (IVR) systems will pick up your call, listen to what you want, and tell you where the nearest restaurant is or when the next train leaves.
The reason such applications have been (relatively) successful is that they overcome the inaccuracies of speech recognition and the ambiguity of human speech by limiting themselves to a particular domain (e.g. requests about movie showtimes) and exploiting knowledge specific to that domain (e.g. users are likely to say the name of a movie that is currently showing). By carefully following this principle, researchers from the National Research Council of Canada have created VoiceCode, an application that allows computer programmers to dictate program code to their computer, with virtually no need to touch the keyboard.

The main issue with dictating program code is that programming languages were never meant to be spoken; they were designed to have a clear, simple structure that can be easily understood by a computer. A simple example of this is the naming of functions and variables: in program code it is not uncommon to see abbreviations such as nInMsgs for a variable that represents the number of incoming messages. Without VoiceCode, one would have to spell out the abbreviation with something like “n-capital-i-n-capital-m-s-g-s”. VoiceCode, however, can learn and adapt to the ways that computer programmers abbreviate such names, and match a full spoken phrase with its likely abbreviations. But such heuristics are bound to fail once in a while, so what happens then? VoiceCode again follows a basic principle of interaction design: in case of error, give the user a quick way to recover. Whether the error was made during voice recognition or while translating natural language to program code, it can be easily corrected by selecting one of the alternative interpretations presented in a popup window.
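
To make the idea of matching a spoken phrase to its likely abbreviations more concrete, here is a minimal Python sketch. It is not VoiceCode's actual algorithm, which this post does not describe in detail; it only assumes a simple rule (each piece of an identifier must start with the same letter as the spoken word and be a subsequence of it) and a small list of filler words to ignore. The names `split_identifier`, `is_abbreviation`, `match_symbols` and `FILLER_WORDS` are made up for this illustration.

```python
import re

# Hypothetical list of words a speaker might say that rarely appear in identifiers.
FILLER_WORDS = {"of", "the", "a", "an"}


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase pieces."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def is_abbreviation(abbr, word):
    """True if abbr starts like word and its letters appear in word in order."""
    if not abbr or abbr[0] != word[0]:
        return False
    remaining = iter(word)
    # 'ch in remaining' consumes the iterator, so letter order is enforced.
    return all(ch in remaining for ch in abbr)


def match_symbols(phrase, known_symbols):
    """Return the known symbols that could abbreviate the spoken phrase."""
    words = [w for w in phrase.lower().split() if w not in FILLER_WORDS]
    matches = []
    for symbol in known_symbols:
        pieces = split_identifier(symbol)
        if len(pieces) == len(words) and all(
            is_abbreviation(p, w) for p, w in zip(pieces, words)
        ):
            matches.append(symbol)
    return matches


if __name__ == "__main__":
    symbols = ["nInMsgs", "numIncomingMessages", "nOutMsgs", "inMsgCount"]
    print(match_symbols("number of incoming messages", symbols))
```

When more than one symbol matches, as happens with this sample list, the candidates play the role of the alternative interpretations that a tool could offer in a correction popup.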
It’s impossible to describe all the features that make this piece of software a great case study in voice recognition, so I highly recommend watching the demo video that was presented at the CHI 2006 conference. However, as impressive as this video may be, don’t rush to throw away your keyboard just yet. You may escape RSI, but the VoiceCode authors are quick to point out that a similar condition, voice strain, may be linked to the continuous use of speech recognition products. One can only wonder whether such problems will also arise with more futuristic input technologies, such as brain wave reading.
- Categories: ergonomics, voice & sound, input techniques, accessibility