Amar’s entries

Would you like to go Super Size?

Have you ever stood in front of one of those dual-30″ Cinema Display setups in an Apple Store and wondered whether you’d get a stiff neck working with so much screen real-estate? With desktop display sizes growing quickly, and more and more employers recognising that dual-screen setups can increase productivity, it’s actually becoming a valid question whether there’s a limit past which this trend becomes unreasonable. In certain scientific and military applications, visualisations are already big enough to require physically walking from one part of the display to another.
100 Mpixel display at Calit2, UCSD

At last week’s CHI ’07 conference, two studies from Virginia Tech were presented that fit into this theme. The first one looked into what would happen to users’ performance if a display was so big that it required walking. They tested both a spatial, map-based visualisation and a more abstract grid-based design, at 2560 × 768, 5120 × 1536 and a whopping 10240 × 3072 pixels (about 2.7m × 1.0m or 9′ × 3.5′). The tasks on the larger displays involved more data, and so would be expected to take longer. But it seems that our ability to process visual information scales quite well: people took on average only about three times as long when the visualisation was sixteen times larger (with variation between tasks).
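To make the “sixteen times” figure concrete, here is a quick check in Python. It assumes that “larger” refers to the total pixel count of each configuration; the names are purely illustrative.

```python
# Display resolutions tested in the study, as (width, height) in pixels.
SIZES = [(2560, 768), (5120, 1536), (10240, 3072)]


def pixel_ratios(sizes):
    """Return each display's pixel count relative to the smallest one."""
    base = sizes[0][0] * sizes[0][1]
    return [(w * h) / base for w, h in sizes]


print(pixel_ratios(SIZES))  # [1.0, 4.0, 16.0]
```

In other words, a sixteenfold increase in pixels came with only about a threefold increase in task time.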

The second study also tested different display sizes (the largest one being the same as in the other study), but with the aim of comparing physical navigation to its “virtual” counterparts, panning and zooming. With the larger displays, participants tended to rely less on virtual navigation, showing that people do in fact prefer moving around or turning their head. This turned out to be the right choice, as it was also more efficient than panning and zooming.

Putting these results together, it would seem that having a larger display always pays off in terms of cognitive efficiency, navigation efficiency and user preference, even if it’s too big to see all at once. Interestingly, both studies found that spatial visualisations benefit more from the extra screen real-estate than non-spatial ones.

Although efficiency is important, it would also be interesting to see a physical ergonomist’s take on the issue. Do extra-large displays hold new risks of work-related injuries, or is the extra movement actually healthier than our traditionally static workstations?

Crossmodal ambient displays

As a way of enriching the way we interact with and perceive the physical spaces we live and work in, more and more information technology is being integrated in architecture. Video screens in elevators, bars that react to touch and buildings that let passers-by catch a glimpse of the activity inside are all examples of ambient displays. They provide peripheral information, are smoothly integrated into the physical environment and usually have a focus on aesthetic appeal.

A limitation of most ambient displays, and in fact of public displays in general, is that they are not personal: everyone gets to see the same information. This limits their possible applications, leaving hand-held devices as the only means of getting more personalised information.

However, there is a way around this. If a display “broadcasts” to the public by cycling through all the information people might need, individuals can tune in to the part they’re interested in by paying attention to the appropriate time slot in each cycle. Many public displays already do this kind of multiplexing. For example, train times may be shown on two alternating “pages” on a screen, or that display in the elevator may cycle through the weather forecast, news headlines and celebrity gossip. The problem here is that you need to watch constantly to pick out the parts you want. The interaction is no longer peripheral, instead becoming the main focus of your attention.

Insights from cognitive neuroscience into how our brain can integrate information from two different senses, or modalities, come to the rescue here. Researchers from the University of Newcastle upon Tyne realised that you could cue users through a modality other than vision to guide their attention towards the right time slot in a display’s cycle.

They designed a navigation system called CrossFlow, which projects arrows onto the floor, pointing in each of the possible directions in turn, in a repeating cycle. To know which set of arrows to follow, a user specifies their destination on a mobile device. The device then figures out the schedule of the relevant arrows, and vibrates and/or beeps in sync with them. This cross-modal cue allows the user to focus on a particular direction, without having to pay constant attention to either the ambient display or the mobile device.
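The scheduling idea can be sketched in a few lines of Python. This is only an illustration and not the researchers’ implementation: the direction names, the one-second slot length and the function name cue_times are all assumptions made for the example.

```python
def cue_times(directions, slot_seconds, wanted, cycles):
    """Return the start times (in seconds) of the slots showing `wanted`.

    Assumption: the display shows each direction for `slot_seconds`,
    in the given order, and then repeats the whole cycle.
    """
    index = directions.index(wanted)  # position of our slot within one cycle
    cycle_length = len(directions) * slot_seconds
    return [c * cycle_length + index * slot_seconds for c in range(cycles)]


# Hypothetical display cycling through four directions, one second each.
DIRECTIONS = ["north", "east", "south", "west"]

# Times at which the mobile device would vibrate or beep for "south".
print(cue_times(DIRECTIONS, 1.0, "south", 3))  # [2.0, 6.0, 10.0]
```

The device only needs to know the cycle order, the slot length and a shared clock; everything else is simple arithmetic, which is part of what makes the approach attractive for a low-powered hand-held.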

CrossFlow illustration

Tested against a map, the system improved participants’ performance both in navigating and in the arithmetic tasks they had to do at the same time, and participants perceived their mental workload to be lower.

I find something strangely elegant and compelling about this concept of a public-private information display. To bystanders, the public, visual component of the display presents a mysterious and aesthetic phenomenon. Only those who receive the other half of the information in the form of haptic or auditory cues can make sense of it. And, as long as only vibration is used, you won’t know how the person standing next to you is perceiving it all.

Content-aware scrolling

When working with digital content on a screen, we spend an awful lot of our time scrolling. Two things in particular can make this very ineffective. One is that you often want to traverse content linearly that is represented in two dimensions, for example some text that’s in several columns on a page. If your screen isn’t big enough to fit all columns, you end up having to scroll up and down and left to right repeatedly to read it.

The other problem is that a lot of the stuff you scroll through may not actually be important. If you’re interested in particular parts of a document, everything in between feels like a waste of space while scrolling.

Edward Ishak and Steven Feiner of Columbia University have devised a technique for dealing with these issues. Their solution is to identify the content of a document that’s relevant to a task and to determine a meaningful path through it, which the user can then move along with a special scrollbar. This achieves two things: the user’s one-dimensional action can be translated into movement through two-dimensional content, and “unimportant” areas can be skipped automatically. Actually, rather than simply skipping them, their system “flies” over these areas at high speed, while at the same time zooming out to help you keep your orientation.
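A minimal sketch of the core mapping, in Python, may help here. It is not Ishak and Feiner’s code: the segment format, the fly_factor parameter and the function name are assumptions, and the zooming-out behaviour is left out entirely. The idea is simply that “unimportant” stretches of the path take up much less of the scrollbar than their true length.

```python
import math


def point_on_path(segments, t, fly_factor=0.1):
    """Map a scrollbar position t in [0, 1] to an (x, y) point on the path.

    `segments` is a list of (start, end, important) tuples. Unimportant
    segments occupy only `fly_factor` of their true length on the
    scrollbar, so the view crosses them quickly.
    """
    def length(seg):
        (x0, y0), (x1, y1), _ = seg
        return math.hypot(x1 - x0, y1 - y0)

    # Scrollbar "cost" of each segment: full length if important, reduced otherwise.
    costs = [length(s) * (1.0 if s[2] else fly_factor) for s in segments]
    remaining = min(max(t, 0.0), 1.0) * sum(costs)

    for seg, cost in zip(segments, costs):
        if remaining <= cost and cost > 0:
            (x0, y0), (x1, y1), _ = seg
            f = remaining / cost  # fraction of the way along this segment
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
        remaining -= cost
    return segments[-1][1]  # rounding fallback: the end of the path


# Hypothetical two-column page: read down column 1, fly to the top of column 2.
PATH = [
    ((0, 0), (0, 100), True),     # column 1
    ((0, 100), (50, 0), False),   # jump between columns, flown over
    ((50, 0), (50, 100), True),   # column 2
]

print(point_on_path(PATH, 0.5))
```

With this path, the midpoint of the scrollbar lands roughly halfway through the jump between the columns, at about (25, 50), even though the jump is the longest segment on the page.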

You can see the technique in action in this movie, which gives you a good idea of how it works. As part of their research, they implemented this for reading multi-column PDF documents, for jumping between search results in a text, and even for traversing all the faces in a photograph.

Content-aware scrolling illustration

The content-aware scrolling path through a two-column text document, for search results and for faces in a photo. Dashed segments are flown over automatically.

The issues this design addresses are particularly pertinent to hand-held devices with small screens. Other approaches in this area include tilt-based scrolling, momentum-based scrolling and zooming. But even though scrolling in two dimensions may not be so common on today’s large desktop screens, finding a place in a long document is, and content-aware scrolling has the potential to help even here. Unfortunately, this first study didn’t include formal user testing, so the real-world usability of the technique is still uncertain.

Pinching thin air

Multi-touch is all the rage these days. Presenters resize, zoom and rotate photographs and maps, all with a simple movement of two hands or fingers across the display surface. Regardless of how likely you are to actually want to do these things, it is a compelling interaction technique, because it is natural and direct. However, touch screens have their downsides and limitations, and it is uncertain whether they will ever displace the keyboard and the mouse from their spot on our desktops.

An alternative for gestural input is through computer vision, as is used in the EyeToy for the Sony PlayStation 2. However, accurate recognition of complex gestures involving fingers poses a challenge for these systems.

Computer scientist Andrew Wilson has now found a new way of achieving vision-based gestural input using a much simpler method than previous approaches. Using a standard web cam looking down onto your keyboard, his software can recognise when you put your thumb and forefinger together, and allows you to then move, zoom and rotate content thus “grabbed”. In addition to the two-point manipulations that multi-touch allows, you can also pinch with only one hand and twist it to rotate, or move it up towards the camera to zoom in.

Andrew Wilson demonstrating vision-based pinching interface

I highly recommend watching the video. (Try to ignore Robert Scoble’s musings in between Wilson’s explanations. Credit to him, though, for shooting this as part of his tour of Microsoft Research.)

The way it works is simple but ingenious. Instead of trying to recognise complex shapes of the hand, Wilson’s solution uses a simple heuristic: while your hands are in the picture without any fingers touching, the background is one single continuous region; when you put your thumb and forefinger together, you are “pinching off” a piece of the background, creating a new region in the image. Regions touching the edge of the image are ignored, which avoids interpreting shapes that you may inadvertently create by cutting across the corner of the image with your arm. To allow the one-handed manipulations, the software further has to find the approximate ellipse formed between the fingers, and react to changes in its orientation and size.
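The heuristic is easy to try out on a toy image. The following Python sketch is not Wilson’s code; it assumes the camera frame has already been reduced to a small binary mask (1 for hand, 0 for background), and the function name is made up for the example. It flood-fills each background region and keeps only those that never touch the image edge.

```python
from collections import deque


def enclosed_background_regions(mask):
    """Return background regions (lists of (row, col)) that do not touch the edge.

    `mask` is a list of rows; 1 marks hand pixels, 0 marks background.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []

    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 0 or seen[r][c]:
                continue
            # Flood-fill one 4-connected background region.
            queue, region, touches_edge = deque([(r, c)]), [], False
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                if y in (0, rows - 1) or x in (0, cols - 1):
                    touches_edge = True
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_edge:
                regions.append(region)
    return regions


# Thumb and forefinger touching: the ring of 1s pinches off one background pixel.
PINCH = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

# Fingers apart: the gap on the right joins the inside to the outer background.
OPEN = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

print(enclosed_background_regions(PINCH))  # [[(2, 2)]]
print(enclosed_background_regions(OPEN))   # []
```

A real system would still have to segment the hand from the background and fit an ellipse to each enclosed region, but the pinch test itself comes down to this kind of region labelling.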

The solution has some limitations. Although the interaction seems simple and natural, the hands and fingers have to be held in a particular way and within a certain area. The appearance of the background is also important so that hands can be recognised. And although the technique can be used to control the mouse cursor, the interaction for this is not as natural as for direct manipulation. There are also no empirical results from user testing yet, so there may be further usability issues lurking. Despite its limitations, however, this interface looks promising in that it may allow ad-hoc gestural interaction to complement our keyboard and mouse, without requiring expensive new hardware.

Feeling unhappy? Try ligatures.

The question of whether applying proper typographic rules really makes text more legible or aesthetically pleasing to anyone other than typography geeks has no doubt been debated to death.

Certain aspects of text presentation, such as line width, leading and anti-aliasing, have been shown to cause differences in reading speed and/or comprehension. However, do the more subtle aspects that typographers pay attention to, such as ligatures and kerning (allowing the space occupied by two characters to overlap), really make any difference?

As part of a series of studies, a group of researchers around Microsoft’s Kevin Larson tested the use of advanced typographic features of OpenType (kerning, ligatures, small caps, non-lining numerals, subscript and superscript) against text without these features.

OpenType illustration

The result was that they made no significant difference to reading speed or comprehension, and in fact not even to subjective ratings: about half the people preferred the non-OpenType version of the text.

However, they then went on to determine participants’ affect, or emotional state. One way they did this was by measuring activation of the facial corrugator muscle. Surprisingly, participants turned out to frown less, and could therefore be said to have been “happier”, when reading text with the enhanced typography.

In another test, people were given creative problem solving tasks after they had done the reading. It had previously been shown that performance on these correlates with positive affect, so it was hoped that the outcome would capture aesthetic appeal. Indeed, participants who read text with good typography did perform better on the tests.

These results are interesting in themselves, but proving the merit of good typography wasn’t the study’s only goal. Another main motivation was to find new ways of measuring the effect of aesthetic factors. These are often too subtle to be noticed consciously, and therefore can’t be tested through questionnaires. It looks like measuring facial muscle activation and creative cognitive task performance may be sensitive and reliable enough to do the job.

(These results were presented at the British HCI 2006 conference, but the paper, Measuring the Aesthetics of Reading, is not yet available online. However, you can get a precursory paper that covers part of the work.)

Introducing UIScape

When the computer mouse first successfully made it onto people’s desktops in the early ’80s, it was already a twenty-year-old invention. When Steve Jobs showed off the iPhone’s multi-touch interface in early 2007, he was presenting a technique researchers had been experimenting with for twenty-five years.

Why do new models of interaction that have the potential to truly revolutionise the way we use technology take so long to make it into our lives? In many cases, it’s because the technology required to make them feasible and affordable is not available for years after their invention. There is also a certain inertia in the market that makes consumers shy away from radically new ways of doing things, because of questions about compatibility, because of the need to relearn and because of the subtle rules of our social framework.

However, we believe that there is another factor that contributes to these ideas not being picked up. Those who work in the field of human–computer interaction (HCI) research are eager to get their work noticed and to make a difference to people’s lives. However, the primary way for them to publicise their ideas is through conferences and journals, which are generally not accessible to (or at least not accessed by) those not working in research labs or academia. As a result, researchers seem to be communicating their ideas mainly to each other, not the other people to whom their work is highly relevant: the designers, engineers, marketers and users of technology.

UIScape is our humble attempt at bridging this divide between the research world and the rest of the world. We’ll be keeping up with the latest interaction-related research, picking out the bits we think you will find interesting and presenting them here in an easy-to-digest format. There is also plenty of older work which deserves more attention, so expect some ventures into history.

HCI is an insanely broad field, drawing from psychology, ergonomics, design theory, computer science, sociology and anthropology. HCI research can involve studying users, modelling human behaviour, designing and building solutions, and running experiments to test hypotheses and designs. This breadth makes it hard not only to define what HCI is, but also to predict what you will find on this site. However, what all the work has in common is that it is relevant in some way to how humans interact with technology, and therefore potentially interesting to anyone involved with this aspect of design. Whether you are an interaction designer, software developer, product designer, architect or simply a design and technology enthusiast, we’re sure there’ll be plenty of interesting stuff for you in there.