Would you like to go Super Size?

Have you ever stood in front of one of those dual-30″ Cinema Display setups in an Apple Store and wondered whether you’d get a stiff neck working with so much screen real-estate? With desktop display sizes growing quickly, and more and more employers recognising that dual-screen setups can increase productivity, it’s actually becoming a valid question whether there’s a limit past which this trend becomes unreasonable. In certain scientific and military applications, visualisations are already big enough to require physically walking from one part of the display to another.
100 Mpixel display at Calit2, UCSD

At last week’s CHI ’07 conference, two studies from Virginia Tech were presented that fit into this theme. The first one looked into what would happen to users’ performance if a display was so big that it required walking. They tested both a spatial, map-based visualisation and a more abstract grid-based design, at 2560 × 768, 5120 × 1536 and a whopping 10240 × 3072 pixels (about 2.7m × 1.0m or 9′ × 3.5′). The tasks on the larger displays involved more data, and so would be expected to take longer. But it seems that our ability to process visual information scales quite well: people took on average only about three times as long when the visualisation was sixteen times larger (with variation between tasks).
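To make the scaling claim concrete, here is a quick back-of-the-envelope calculation using only the resolutions and the "about three times as long" figure quoted above. The variable and function names are my own, and the result is a rough ratio, not a number reported by the study.

```python
# Display resolutions quoted in the post, as (width, height) in pixels.
sizes = {"small": (2560, 768), "medium": (5120, 1536), "large": (10240, 3072)}

def pixel_count(size):
    """Total number of pixels for a (width, height) pair."""
    width, height = size
    return width * height

def area_ratio(a, b):
    """How many times more pixels display b has than display a."""
    return pixel_count(b) / pixel_count(a)

# The largest display has sixteen times the pixels of the smallest.
size_factor = area_ratio(sizes["small"], sizes["large"])

# "About three times as long", taken from the post (an approximate average).
time_factor = 3.0

# Time spent per unit of display area on the large display, relative to the small one.
relative_cost = time_factor / size_factor
print(size_factor, relative_cost)
```

In other words, each unit of display area cost participants less than a fifth of the time it did on the smallest display, bearing in mind the variation between tasks.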

The second study also tested different display sizes (the largest one being the same as in the other study), but with the aim of comparing physical navigation to its “virtual” counterparts, panning and zooming. With the larger displays, participants tended to rely less on virtual navigation, showing that people do in fact prefer moving around or turning their head. This turned out to be the right choice, as it was also more efficient than panning and zooming.

Putting these results together, it would seem that having a larger display always pays off in terms of cognitive efficiency, navigation efficiency and user preference, even if it’s too big to see all at once. Interestingly, both studies found that spatial visualisations benefit more from the extra screen real-estate than non-spatial ones.

Although efficiency is important, it would also be interesting to see a physical ergonomist’s take on the issue. Do extra-large displays hold new risks of work-related injuries, or is the extra movement actually healthier than our traditionally static workstations?

Voice Code

Science fiction movies have long featured some form of speech recognition, where all sorts of computers, robots or even sliding doors can accurately understand and respond to human speech under any conditions. Although speech recognition technology hasn’t progressed quite as fast as SF writers had imagined, it is increasingly being used in everyday products and services: a handful of mobile phones can already respond to basic voice commands without needing any previous training with samples of your voice, and Interactive Voice Response (IVR) systems will pick up your call, listen to what you want, and tell you where the nearest restaurant is or when the next train leaves.

The reason such applications have been (relatively) successful is that they manage to overcome the inaccuracies of speech recognition and the ambiguity of human speech by limiting themselves to a particular domain (e.g. requests about movie showtimes) and taking into account the specific domain knowledge (e.g. users are likely to pronounce the name of a movie currently on show). By carefully following this principle, researchers from the National Research Council of Canada have created VoiceCode, an application that allows computer programmers to dictate program code to their computer, with virtually no need to touch the keyboard.

Voice Code

The main issue with dictating program code is that programming languages were never meant to be spoken but were designed to have a clear, simple structure that can be easily understood by a computer. A simple example of this is the naming of functions and variables: in program code it is not uncommon to see abbreviations such as nInMsgs for a variable that represents the number of incoming messages. Without VoiceCode, one would have to spell out the abbreviation, something like “n-capital-i-n-capital-m-s-g-s”. VoiceCode, however, can learn and adapt to the ways that computer programmers abbreviate such names, and match a full spoken phrase with its likely abbreviations. But such heuristics are bound to fail once in a while, so what happens in this case? VoiceCode again follows a basic principle of interaction design: in case of error, give the user a quick way to recover. Whether the error was made in the stage of voice recognition or in translating natural language to program code, it can be easily corrected by selecting one of the alternative interpretations presented in a popup window.
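To give a feel for how a spoken phrase might be matched against abbreviated names, here is a minimal sketch. It is not VoiceCode's actual algorithm, which the post does not describe in detail. The heuristic (each part of an identifier must start with the same letter as the spoken word and have its letters appear in that word in order) and all function names are my own assumptions for illustration.

```python
import re

# Small, hypothetical list of filler words to drop from the spoken phrase.
STOP_WORDS = {"of", "the", "a", "an"}

def split_identifier(name):
    """Split a camelCase or snake_case identifier into lower-case parts."""
    parts = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", name)
    return [part.lower() for part in parts]

def is_abbreviation(abbrev, word):
    """True if abbrev starts like word and its letters occur in word in order."""
    if not abbrev or not word or abbrev[0] != word[0]:
        return False
    letters = iter(word)
    # `ch in letters` consumes the iterator, so this is a subsequence check.
    return all(ch in letters for ch in abbrev)

def matches(spoken_phrase, identifier):
    """True if every identifier part abbreviates the corresponding spoken word."""
    words = [w for w in spoken_phrase.lower().split() if w not in STOP_WORDS]
    parts = split_identifier(identifier)
    return len(words) == len(parts) and all(
        is_abbreviation(part, word) for part, word in zip(parts, words)
    )

def candidates(spoken_phrase, known_identifiers):
    """All known identifiers that could be abbreviations of the spoken phrase."""
    return [name for name in known_identifiers if matches(spoken_phrase, name)]

known = ["nInMsgs", "numInMsgs", "nOutMsgs", "msgCount"]
print(candidates("number of incoming messages", known))
```

When more than one candidate survives, as with nInMsgs and numInMsgs here, the choice would go to the user, which is exactly where a list of alternative interpretations earns its keep.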

It’s impossible to describe all the features that make this piece of software a great case study in voice recognition, so I highly recommend watching their demo video, which was presented at the CHI 2006 conference. However, as impressive as this video may be, don’t rush to throw away your keyboards just yet. You may escape RSI, but VoiceCode’s authors are quick to point out that a similar condition, voice strain, may be linked to the continuous use of speech recognition products. One can only wonder if such problems may also arise with more futuristic input technologies, such as brain wave reading.

Crossmodal ambient displays

As a way of enriching the way we interact with and perceive the physical spaces we live and work in, more and more information technology is being integrated in architecture. Video screens in elevators, bars that react to touch and buildings that let passers-by catch a glimpse of the activity inside are all examples of ambient displays. They provide peripheral information, are smoothly integrated into the physical environment and usually have a focus on aesthetic appeal.

A limitation of most ambient displays, and in fact of public displays in general, is that they are not personal: everyone gets to see the same information. This limits their possible applications, leaving hand-held devices as the only means of getting more personalised information.

However, there is a way around this. If a display “broadcasts” to the public by cycling through all the information people might need, individuals can tune in to the part they’re interested in by paying attention to the appropriate time slot in each cycle. Many public displays already do this kind of multiplexing. For example, train times may be shown on two alternating “pages” on a screen, or that display in the elevator may cycle through the weather forecast, news headlines and celebrity gossip. The problem here is that you need to watch constantly to pick out the parts you want. The interaction is no longer peripheral, instead becoming the main focus of your attention.

Insights from cognitive neuroscience into how our brain can integrate information from two different senses, or modalities, come to the rescue here. Researchers from the University of Newcastle upon Tyne realised that you could cue users through a modality other than vision to guide their attention towards the right time slot in a display’s cycle.

They designed a navigation system called CrossFlow, which projects arrows onto the floor, pointing in each of the possible directions in turn, in a repeating cycle. To know which set of arrows to follow, a user specifies their destination on a mobile device. The device then figures out the schedule of the relevant arrows, and vibrates and/or beeps in sync with them. This cross-modal cue allows the user to focus on a particular direction, without having to pay constant attention to either the ambient display or the mobile device.
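The timing side of this idea can be sketched in a few lines. The following is only an illustration under my own assumptions: that every direction gets an equal-length slot, that the display and the mobile device share a clock, and that times are measured in seconds since the cycle started. None of these details, nor the function names, come from the CrossFlow paper.

```python
import math

def next_cue_times(directions, target, slot_seconds, now, count=3):
    """Start times of the next `count` slots in which `target` is displayed."""
    cycle = slot_seconds * len(directions)
    offset = directions.index(target) * slot_seconds
    # Skip whole cycles until the slot start is no longer in the past.
    cycles_to_skip = max(0, math.ceil((now - offset) / cycle))
    first = offset + cycles_to_skip * cycle
    return [first + i * cycle for i in range(count)]

def is_showing(directions, target, slot_seconds, now):
    """True if the display is showing the arrows for `target` at time `now`."""
    cycle = slot_seconds * len(directions)
    return int((now % cycle) // slot_seconds) == directions.index(target)

# Hypothetical setup: four directions, two seconds each, so an 8-second cycle.
directions = ["north", "east", "south", "west"]
print(next_cue_times(directions, "south", slot_seconds=2, now=7))
```

A device following this schedule would vibrate or beep at each returned time, so the user only needs to glance at the floor when cued.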

CrossFlow illustration

Testing of the system against using a map showed improved performance both in navigating and in arithmetic tasks they had to do at the same time, and participants perceived their mental workload to be lower.

I find something strangely elegant and compelling about this concept of a public-private information display. To bystanders, the public, visual component of the display presents a mysterious and aesthetic phenomenon. Only those who receive the other half of the information in the form of haptic or auditory cues can make sense of it. And, as long as only vibration is used, you won’t know how the person standing next to you is perceiving it all.

Content-aware scrolling

When working with digital content on a screen, we spend an awful lot of our time scrolling. Two things in particular can make this very inefficient. One is that you often want to move linearly through content that is laid out in two dimensions, for example some text that’s in several columns on a page. If your screen isn’t big enough to fit all columns, you end up having to scroll up and down and left to right repeatedly to read it.

The other problem is that a lot of the stuff you scroll through may not actually be important. If you’re interested in particular parts of a document, everything in between feels like a waste of space while scrolling.

Edward Ishak and Steven Feiner of Columbia University have devised a technique for dealing with these issues. Their solution is to identify the content of a document that’s relevant to a task and to determine a meaningful path through it, which the user can then move along with a special scrollbar. This achieves two things: the user’s one-dimensional action can be translated into movement through two-dimensional content, and “unimportant” areas can be skipped automatically. Actually, rather than simply skipping them, their system “flies” over these areas at high speed, while at the same time zooming out to help you keep your orientation.
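Here is a rough sketch of how a one-dimensional scrollbar value could be mapped onto a two-dimensional path while rushing through the unimportant stretches. It is my own simplification, not Ishak and Feiner's implementation: the path format, the `fly_weight` parameter and the function name are all assumptions, and the zooming-out behaviour is left out entirely.

```python
import math

def scroll_position(path, t, fly_weight=0.1):
    """Map a scrollbar fraction t in [0, 1] to an (x, y) point on the path.

    path: list of (start, end, important) segments, start and end being (x, y).
    Unimportant segments use only `fly_weight` of the scrollbar travel per unit
    of length, so the view passes over them quickly.
    """
    weights = [
        math.dist(start, end) * (1.0 if important else fly_weight)
        for start, end, important in path
    ]
    target = min(max(t, 0.0), 1.0) * sum(weights)
    for (start, end, _), weight in zip(path, weights):
        if weight > 0 and target <= weight:
            f = target / weight
            return (start[0] + f * (end[0] - start[0]),
                    start[1] + f * (end[1] - start[1]))
        target -= weight
    # Rounding can leave a tiny remainder at t = 1; fall back to the path's end.
    return path[-1][1]

# Hypothetical two-column page: read down column one, fly to the top of
# column two, then read down column two.
path = [
    ((0, 0), (0, 80), True),
    ((0, 80), (60, 0), False),
    ((60, 0), (60, 80), True),
]
print(scroll_position(path, 0.5))
```

With these numbers the jump between columns is the longest segment on the page, yet it takes up only a small slice of the scrollbar's travel.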

You can see the technique in action in this movie, which gives you a good idea of how it works. As part of their research, they implemented this for reading multi-column PDF documents, for jumping between search results in a text, and even for traversing all the faces in a photograph.

Content-aware scrolling illustration

The content-aware scrolling path through a two-column text document, for search results and for faces in a photo. Dashed segments are flown over automatically.

The issues this design addresses are particularly pertinent to hand-held devices with small screens. Other approaches in this area include tilt-based scrolling, momentum-based scrolling and zooming. But even though scrolling in two dimensions may not be so common on today’s large desktop screens, finding a place in a long document is, and content-aware scrolling has the potential to help even here. Unfortunately, this first study didn’t include formal user testing, so the real-world usability of the technique is still uncertain.

Four seconds at a time…

More and more computing power is being put into handheld devices like mobile phones. A wealth of applications that were previously only available on desktop computers are now available in the palm of your hand. Taking these applications with you on the move is often as useful as the designers hope, but once, as I tried to complete a game of Monopoly in London’s Victoria Station at rush hour, I knew that I was doing two things that were fundamentally incompatible with each other.

A group of Finnish researchers have asked the question: just how much does a busy, demanding real-world context affect the user’s interaction with mobile devices? The answer, in case you’re walking across Victoria Station too, is a lot.

In fact, this study clearly demonstrates that mobility itself - the very thing we value mobile phones for - directly limits our ability to use mobile applications.

The researchers argue that we can’t be mobile without dedicating some of our precious cognitive resources to maintaining our position in the world. We have to perform social tasks like maintaining a sense of privacy on a crowded train, by monitoring the position of others and then shifting position appropriately. We also have to perform navigational tasks when mobile, like finding our current location, planning a route, walking purposefully to avoid oncomers, buses, and so forth. Simply put, the time we spend thinking about these things means we can’t think about the device we’re trying to use at the same time.

Image courtesy of evanrude

This paper gives an idea of just how much our attention falls away in demanding mobile settings. In the lab, the participants looked at the mobile screen for an average of about 14s before looking away. When the participants were on a busy street, it was only 4s - less than a third of the time. This pattern continued throughout many measures: the strongest in my opinion is that in the lab, participants looked at their external environment for only 5% of the time they were using the device, while in the busy street participants were distracted enough that they looked away from the screen 51% of the time.

Two small quibbles with these figures are that they only measured visual attention, and only while a new internet page was loading. We can attend to our external environment in ways other than looking at it, and we’re more likely to look away from the screen while it’s loading than at other times. So I’d say keep in mind the ratios between the figures, rather than the absolute values.

And clearly, the users were in a difficult position: trying to use this new device while not walking into anyone or anything. How did they cope with these conflicting demands? The researchers reported that users seemed to pay lots of attention to a setting just after they entered it, then settled into long periods of using the device, punctuated by brief periods of attending to the environment. Perhaps most interesting, the researchers noted that when the environment demanded social interaction, users almost always stopped using the mobile device until the social demand was fulfilled. This hints that users give social demands a much higher priority than using their device.

For those of us looking to design, this paper is worth knowing about because it provides some hard guidelines to a previously intuitive idea of mobile contexts. For instance, don’t dream that your users spend as long as you do looking at the screen: count on 4s of your user’s time before they have to look away. You might get half your user’s time if they’re busy, so respect what time you get. Provide a simple, fast, easy-to-scan interface. Let them finish quickly, so they can get on with the more pressing task of making their way across Victoria Station.

Pinching thin air

Multi-touch is all the rage these days. Presenters resize, zoom and rotate photographs and maps, all with a simple movement of two hands or fingers across the display surface. Regardless of how likely you are to actually want to do these things, it is a compelling interaction technique, because it is natural and direct. However, touch screens have their downsides and limitations, and it is uncertain whether they will ever displace the keyboard and the mouse from their spot on our desktops.

An alternative for gestural input is through computer vision, as is used in the EyeToy for the Sony PlayStation 2. However, accurate recognition of complex gestures involving fingers poses a challenge for these systems.

Computer scientist Andrew Wilson has now found a new way of achieving vision-based gestural input using a much simpler method than previous approaches. Using a standard web cam looking down onto your keyboard, his software can recognise when you put your thumb and forefinger together, and allows you to then move, zoom and rotate content thus “grabbed”. In addition to the two-point manipulations that multi-touch allows, you can also pinch with only one hand and twist it to rotate, or move it up towards the camera to zoom in.

Andrew Wilson demonstrating vision-based pinching interface

I highly recommend watching the video. (Try to ignore Robert Scoble’s musings in between Wilson’s explanations. Credit to him, though, for shooting this as part of his tour of Microsoft Research.)

The way it works is simple but ingenious. Instead of trying to recognise complex shapes of the hand, Wilson’s solution uses a simple heuristic: while your hands are in the picture without any fingers touching, the background is one single continuous region; when you put your thumb and forefinger together, you are “pinching off” a piece of the background, creating a new region in the image. Regions touching the edge of the image are ignored, which avoids interpreting shapes that you may inadvertently create by cutting across the corner of the image with your arm. To allow the one-handed manipulations, the software further has to find the approximate ellipse formed between the fingers, and react to changes in its orientation and size.
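The region heuristic is easy to sketch. The code below is only an illustration of the idea as described above, not Wilson's implementation: it assumes the camera image has already been reduced to a binary mask (1 for hand pixels, 0 for background), uses a plain flood fill, and leaves out the ellipse fitting needed for one-handed rotation and zoom. The function name and the tiny test images are hypothetical.

```python
from collections import deque

def pinched_regions(mask):
    """Return background regions that are fully enclosed by the hand.

    mask: list of equal-length rows, 1 = hand pixel, 0 = background pixel.
    Each region is a set of (row, col) pixels. Regions that touch the edge
    of the image are ignored.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0][c0] != 0 or seen[r0][c0]:
                continue
            # Flood-fill one connected background region (4-connectivity).
            region, touches_edge = set(), False
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.add((r, c))
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_edge = True
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] == 0 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if not touches_edge:
                regions.append(region)
    return regions

# Thumb and forefinger apart: the gap between them opens onto the background.
open_hand = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
# Thumb and forefinger touching: a piece of background is "pinched off".
pinch = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(len(pinched_regions(open_hand)), len(pinched_regions(pinch)))
```

The appeal of the approach shows even in this toy version: no model of the hand is needed, only a count of enclosed background regions.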

The solution has some limitations. Although the interaction seems simple and natural, the hands and fingers have to be held in a particular way and within a certain area. The appearance of the background is also important so that hands can be recognised. And although the technique can be used to control the mouse cursor, the interaction for this is not as natural as for direct manipulation. There are also no empirical results from user testing yet, so there may be further usability issues lurking. Despite its limitations, however, this interface looks promising in that it may allow ad-hoc gestural interaction to complement our keyboard and mouse, without requiring expensive new hardware.

Context photography: capturing more than meets the eye

It is often said that a picture is worth a thousand words. In the case of a picture created through photography, what would these thousand words actually describe? A photograph is most often a static depiction of a scene at one specific moment in time. Although with modern cameras it is relatively easy to capture all the visual detail of a scene, this is rarely sufficient for portraying the context in which this scene was set, which is usually left to the imagination of the viewer.

A group of researchers from the Future Applications Lab at the Viktoria Institute in Sweden set out to change this status quo by proposing context photography, a novel way of representing within the photo itself the context in which it was taken. In their experiments, they chose two parameters to represent context: camera movement and ambient sound. Their prototype “context camera”, implemented using a camera-phone, monitors these parameters in real time and feeds them into a set of visual effects that are applied to the picture as it is shot with the phone camera. This process is adjustable by the photographer so that, for example, one can choose to associate the effect of colour shadows with the presence of high-pitched ambient noise. Some examples of the resulting photographs are visible in the following pictures:

Context photography prototype and samples

In their latest research paper, published at the 2006 NordiCHI conference, the researchers describe the reactions of photographers who used context camera-phone prototypes for a period of six weeks and submitted a total of around 300 pictures. Although each participant’s experience with context photography was unique, possibly reflecting their different attitudes to personal photography in general, some common themes did emerge from the participants’ photographs and comments.

First, as with many innovative interactive technologies, context photography has found unexpected uses. Although it was probably conceived as a way of capturing the existing context, it was found that users would attempt to artificially “create” some context in order to trigger the contextual effects. For example, some users would scream if there were no ambient sounds, or would try moving the camera in different ways if there was no natural movement in the scene. In doing so, they turned the contextual parameters into yet another input in the creative process of photography.

However, users did not generally agree on how these inputs should affect the resulting image. The aesthetics of the visual effects applied by the context camera proved to be “highly subjective and […] a matter of personal taste”. Although each photographer could adjust the visual effects through some calibration process, some found this process complicated or ambiguous. The researchers themselves concede that “designing effects to suit a high number and wide range of users becomes a challenging task”.

Although it may take some time for context photography to be optimised for the everyday user, Apple’s iPhone is already pioneering the use of sensors that measure ambient light, body proximity and acceleration, and could become an interesting platform for context photography experimentation. How about a mobile, context photography-based version of Apple Photo Booth effects?

Feeling unhappy? Try ligatures.

The question of whether applying proper typographic rules really makes text more legible or aesthetically pleasing to anyone other than typography geeks has no doubt been debated to death.

Certain aspects of text presentation, such as line width, leading and anti-aliasing, have been shown to cause differences in reading speed and/or comprehension. However, do the more subtle aspects that typographers pay attention to, such as ligatures and kerning (allowing the space occupied by two characters to overlap), really make any difference?

As part of a series of studies, a group of researchers around Microsoft’s Kevin Larson tested the use of advanced typographic features of OpenType (kerning, ligatures, small caps, non-lining numerals, subscript and superscript) against text without these features.

OpenType illustration

The result was that they made no significant difference to reading speed or comprehension, and in fact not even to subjective ratings: about half the people preferred the non-OpenType version of the text.

However, they then went on to determine participants’ affect, or emotional state. One way they did this was by measuring activation of the facial corrugator muscle. Surprisingly, participants turned out to frown less, and could therefore be said to have been “happier”, when reading text with the enhanced typography.

In another test, people were given creative problem solving tasks after they had done the reading. It had previously been shown that performance on these correlates with positive affect, so it was hoped that the outcome would capture aesthetic appeal. Indeed, participants who read text with good typography did perform better on the tests.

These results are interesting in themselves, but proving the merit of good typography wasn’t the study’s only goal. Another main motivation was to find new ways of measuring the effect of aesthetic factors. These are often too subtle to be noticed consciously, and therefore can’t be tested through questionnaires. It looks like measuring facial muscle activation and creative cognitive task performance may be sensitive and reliable enough to do the job.

(These results were presented at the British HCI 2006 conference, but the paper, Measuring the Aesthetics of Reading, is not yet available online. However, you can get a precursory paper that covers part of the work.)

Introducing UIScape

When the computer mouse first successfully made it onto people’s desktops in the early ’80s, it was already a twenty-year-old invention. When Steve Jobs showed off the iPhone’s multi-touch interface in early 2007, he was presenting a technique researchers had been experimenting with for twenty-five years.

Why do new models of interaction that have the potential to truly revolutionise the way we use technology take so long to make it into our lives? In many cases, it’s because the technology required to make them feasible and affordable is not available for years after their invention. There is also a certain inertia in the market that makes consumers shy away from radically new ways of doing things, because of questions about compatibility, because of the need to relearn and because of the subtle rules of our social framework.

However, we believe that there is another factor that contributes to these ideas not being picked up. Those who work in the field of human–computer interaction (HCI) research are eager to get their work noticed and to make a difference to people’s lives. However, the primary way for them to publicise their ideas is through conferences and journals, which are generally not accessible to (or at least not accessed by) those not working in research labs or academia. As a result, researchers seem to be communicating their ideas mainly to each other, not the other people to whom their work is highly relevant: the designers, engineers, marketers and users of technology.

UIScape is our humble attempt at bridging this divide between the research world and the rest of the world. We’ll be keeping up with the latest interaction-related research, picking out the bits we think you will find interesting and presenting them here in an easy-to-digest format. There is also plenty of older work which deserves more attention, so expect some ventures into history.

HCI is an insanely broad field, drawing from psychology, ergonomics, design theory, computer science, sociology and anthropology. HCI research can involve studying users, modelling human behaviour, designing and building solutions, and experiments to test hypotheses and designs. This breadth not only makes it hard to define what HCI is, but also to predict what you will find on this site. However, what all the work has in common is that it is relevant in some way to how humans interact with technology, and therefore potentially interesting to anyone involved with this aspect of design. Whether you are an interaction designer, software developer, product designer, architect or simply a design and technology enthusiast, we’re sure there’ll be plenty of interesting stuff for you in there.