Pinching thin air

Multi-touch is all the rage these days: presenters resize, zoom and rotate photographs and maps, all with a simple movement of two hands or fingers across the display surface. Regardless of how likely you are to actually want to do these things, it is a compelling interaction technique, because it is natural and direct. However, touch screens have their downsides and limitations, and it is uncertain whether they will ever displace the keyboard and the mouse from their spot on our desktops.

An alternative route to gestural input is computer vision, as used in the EyeToy for the Sony PlayStation 2. However, accurate recognition of complex gestures involving fingers poses a challenge for these systems.

Computer scientist Andrew Wilson has now found a new way of achieving vision-based gestural input using a much simpler method than previous approaches. Using a standard webcam looking down onto your keyboard, his software can recognise when you put your thumb and forefinger together, and then lets you move, zoom and rotate the content thus “grabbed”. In addition to the two-point manipulations that multi-touch allows, you can also pinch with only one hand and twist it to rotate, or move it up towards the camera to zoom in.

Andrew Wilson demonstrating vision-based pinching interface

I highly recommend watching the video. (Try to ignore Robert Scoble’s musings in between Wilson’s explanations. Credit to him, though, for shooting this as part of his tour of Microsoft Research.)

The way it works is simple but ingenious. Instead of trying to recognise complex shapes of the hand, Wilson’s solution uses a simple heuristic: while your hands are in the picture without any fingers touching, the background is one single continuous region; when you put your thumb and forefinger together, you are “pinching off” a piece of the background, creating a new region in the image. Regions touching the edge of the image are ignored, which avoids interpreting shapes that you may inadvertently create by cutting across the corner of the image with your arm. To allow the one-handed manipulations, the software further has to find the approximate ellipse formed between the fingers, and react to changes in its orientation and size.
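The heuristic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Wilson's actual code: it assumes the camera image has already been reduced to a binary mask (1 for hand, 0 for background), the names find_pinch_regions and describe_region are made up for this example, and the ellipse is approximated with image moments, a standard technique swapped in here because the exact fitting method is not described.

```python
import math
from collections import deque

def find_pinch_regions(mask):
    """Return background regions that are fully enclosed by the hand.

    mask: list of rows, each a list of 0 (background) or 1 (hand).
    Hypothetical helper; assumes segmentation has already been done.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y0 in range(h):
        for x0 in range(w):
            if mask[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one 4-connected component of background pixels.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            touches_edge = False
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                if x == 0 or y == 0 or x == w - 1 or y == h - 1:
                    touches_edge = True
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Regions touching the image edge are ignored, as in the article.
            if not touches_edge:
                regions.append(pixels)
    return regions

def describe_region(pixels):
    """Approximate a region's ellipse: area, centre and orientation (radians)."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments give the orientation of the major axis.
    mxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    myy = sum((y - cy) ** 2 for _, y in pixels) / n
    mxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    angle = 0.5 * math.atan2(2 * mxy, mxx - myy)
    return {"area": n, "center": (cx, cy), "angle": angle}

# Toy masks: an open "C" shape (no pinch) and a closed ring (pinch).
OPEN_HAND = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
PINCHED = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(len(find_pinch_regions(OPEN_HAND)))   # no enclosed region
    for region in find_pinch_regions(PINCHED):  # one enclosed region
        print(describe_region(region))
```

In a sketch like this, a one-handed twist would show up as a change in the angle between frames, and moving the hand towards the camera as a growing area; how the real system maps those changes to rotation and zoom is not something this example tries to reproduce.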

The solution has some limitations. Although the interaction seems simple and natural, the hands and fingers have to be held in a particular way and within a certain area. The appearance of the background also matters, since the hands have to be distinguishable from it. And although the technique can be used to control the mouse cursor, the interaction for this is not as natural as for direct manipulation. There are also no empirical results from user testing yet, so there may be further usability issues lurking. Despite its limitations, however, this interface looks promising in that it may allow ad-hoc gestural interaction to complement our keyboard and mouse, without requiring expensive new hardware.