Alexander’s entries

Voice Code

Science fiction movies have long featured some form of speech recognition, with all sorts of computers, robots or even sliding doors accurately understanding and responding to human speech under any conditions. Although speech recognition technology hasn’t progressed quite as fast as SF writers had imagined, it is increasingly being used in everyday products and services: a handful of mobile phones can already respond to basic voice commands without needing any previous training with samples of your voice, and Interactive Voice Response (IVR) systems will pick up your call, listen to what you want, and tell you where the nearest restaurant is or when the next train leaves.

The reason such applications have been (relatively) successful is that they manage to overcome the inaccuracies of speech recognition and the ambiguity of human speech by limiting themselves to a particular domain (e.g. requests about movie showtimes) and taking into account domain-specific knowledge (e.g. users are likely to say the name of a movie that is currently showing). By carefully following this principle, researchers from the National Research Council of Canada have created VoiceCode, an application that allows computer programmers to dictate program code to their computer, with virtually no need to touch the keyboard.


The main issue with dictating program code is that programming languages were never meant to be spoken; they were designed to have a clear, simple structure that can be easily understood by a computer. A simple example of this is the naming of functions and variables: in program code it is not uncommon to see abbreviations such as nInMsgs for a variable that represents the number of incoming messages. Without VoiceCode, one would have to spell out the abbreviation with something like “n-capital-i-n-capital-m-s-g-s”. VoiceCode, however, can learn and adapt to the ways that computer programmers abbreviate such names, and match a full spoken phrase with its likely abbreviations. But such heuristics are bound to fail once in a while, so what happens then? VoiceCode again follows a basic principle of interaction design: in case of error, give the user a quick way to recover. Whether the error was made at the voice recognition stage or in translating natural language to program code, it can be easily corrected by selecting one of the alternative interpretations presented in a popup window.
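To make the idea of matching a spoken phrase to an abbreviated name more concrete, here is a minimal sketch in Python. It is not VoiceCode's actual algorithm, which is not described here; it only assumes that an abbreviation keeps the first letter of a word and some of its later letters in order, and that small words such as “of” are dropped. The function names and the stop-word list are made up for illustration.

```python
import re

# Hypothetical list of small words that programmers tend to drop from names.
STOP_WORDS = {"of", "the", "a", "an", "to", "for"}


def split_identifier(name):
    """Split a camelCase or snake_case name into lowercase segments."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def is_abbreviation(abbr, word):
    """True if abbr starts with the word's first letter and its letters
    appear in the word in the same order (a subsequence)."""
    if not abbr or not word or abbr[0] != word[0]:
        return False
    letters = iter(word)
    # "ch in letters" consumes the iterator up to the match, so each
    # letter of abbr must be found after the previous one.
    return all(ch in letters for ch in abbr)


def matching_identifiers(phrase, identifiers):
    """Return the known identifiers that the spoken phrase could refer to."""
    words = [w for w in phrase.lower().split() if w not in STOP_WORDS]
    matches = []
    for name in identifiers:
        segments = split_identifier(name)
        if len(segments) == len(words) and all(
            is_abbreviation(seg, word) for seg, word in zip(segments, words)
        ):
            matches.append(name)
    return matches


known_names = ["nInMsgs", "numOutMsgs", "inMsgCount", "n"]
print(matching_identifiers("number of incoming messages", known_names))
```

When more than one name matches, the whole list could be offered to the user as alternatives, in the spirit of the correction popup described above.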

It’s impossible to describe all the features that make this piece of software a great case study in voice recognition, so I highly recommend watching the demo video that was presented at the CHI 2006 conference. However, as impressive as this video may be, don’t rush to throw away your keyboard just yet. You may escape RSI, but VoiceCode’s authors are quick to point out that a similar condition, voice strain, may be linked to the continuous use of speech recognition products. One can only wonder whether such problems may also arise with more futuristic input technologies, such as brain wave reading.

Context photography: capturing more than meets the eye

It is often said that a picture is worth a thousand words. In the case of a picture created through photography, what would these thousand words actually describe? A photograph is most often a static depiction of a scene at one specific moment in time. Although with modern cameras it is relatively easy to capture all the visual detail of a scene, this is rarely sufficient for portraying the context in which this scene was set, which is usually left to the imagination of the viewer.

A group of researchers from the Future Applications Lab at the Viktoria Institute in Sweden set out to change this status quo by proposing context photography, a novel way of representing within the photo itself the context in which it was taken. In their experiments, they chose two parameters to represent context: camera movement and ambient sound. Their prototype “context camera”, implemented using a camera-phone, monitors these parameters in real time and feeds them into a set of visual effects that are applied to the picture as it is shot with the phone camera. This process is adjustable by the photographer so that, for example, one can choose to associate the effect of colour shadows with the presence of high-pitched ambient noise. Some examples of the resulting photographs are visible in the following pictures:

Context photography prototype and samples
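The adjustable link between context parameters and visual effects can be pictured as a small table of bindings. The Python sketch below is only an illustration under my own assumptions, not the researchers’ implementation: it supposes that each reading is normalised to a value between 0 and 1, that each effect is bound to one parameter with a threshold below 1, and that the effect grows linearly above that threshold. The names ContextSample and effect_strengths, and the “blur” effect, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContextSample:
    """One reading of the context, with every value normalised to 0..1
    (an assumption made for this sketch)."""
    movement: float      # 0.0 = camera is still, 1.0 = strong movement
    sound_level: float   # 0.0 = silence, 1.0 = very loud
    sound_pitch: float   # 0.0 = low-pitched, 1.0 = high-pitched


def effect_strengths(sample, bindings):
    """Turn a context sample into a strength (0..1) for each bound effect.

    bindings maps an effect name to (parameter name, threshold), where the
    threshold must be below 1.0. Readings at or below the threshold give no
    effect; above it, the strength rises linearly to 1.0.
    """
    strengths = {}
    for effect, (parameter, threshold) in bindings.items():
        value = getattr(sample, parameter)
        if value <= threshold:
            strengths[effect] = 0.0
        else:
            strengths[effect] = (value - threshold) / (1.0 - threshold)
    return strengths


# The photographer's calibration: colour shadows react to high-pitched
# sound, while a hypothetical blur effect reacts to any camera movement.
bindings = {
    "colour_shadows": ("sound_pitch", 0.5),
    "blur": ("movement", 0.0),
}
sample = ContextSample(movement=0.25, sound_level=0.1, sound_pitch=0.75)
print(effect_strengths(sample, bindings))
```

Changing the calibration then amounts to editing the bindings table, for example by pointing colour shadows at sound_level instead of sound_pitch.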

In their latest research paper, presented at the NordiCHI 2006 conference, the researchers describe the reactions of photographers who used context camera-phone prototypes for a period of six weeks and submitted a total of around 300 pictures. Although each participant’s experience with context photography was unique, possibly reflecting their different attitudes to personal photography in general, some common themes did emerge from the participants’ photographs and comments.

First, as with many innovative interactive technologies, context photography has found unexpected uses. Although it was probably conceived as a way of capturing the existing context, users would attempt to artificially “create” some context in order to trigger the contextual effects. For example, some users would scream if there were no ambient sounds, or would try moving the camera in different ways if there was no natural movement in the scene. In doing so, they turned the contextual parameters into yet another input to the creative process of photography.

However, users did not generally agree on how these inputs should affect the resulting image. The aesthetics of the visual effects applied by the context camera proved to be “highly subjective and […] a matter of personal taste”. Although each photographer could adjust the visual effects through a calibration process, some found this process complicated or ambiguous. The researchers themselves concede that “designing effects to suit a high number and wide range of users becomes a challenging task”.

Although it may take some time for context photography to be optimised for the everyday user, Apple’s iPhone is already pioneering the use of sensors that measure ambient light, body proximity and acceleration, and could become an interesting platform for context photography experimentation. How about a mobile, context photography-based version of Apple Photo Booth effects?