Computer Vision Based Human-Computer Interaction

- a collaboration between

Computational Vision and Active Perception Lab (CVAP)

Centre for User Oriented IT Design (CID)

Department of Numerical Analysis and Computing Science, KTH, S-100 44 Stockholm, Sweden


With the development of information technology in our society, we can expect computer systems to be embedded into our environment to a larger extent. These environments will call for new types of human-computer interaction, with interfaces that are natural and easy to use. In particular, the ability to interact with computerized equipment without the need for special external equipment is attractive.

Today, the keyboard, the mouse and the remote control are the main interfaces for transferring information and commands to computerized equipment. In some applications involving three-dimensional information, such as visualization, computer games and control of robots, other interfaces based on trackballs, joysticks and data gloves are used. In our daily life, however, we humans use our vision and hearing as our main sources of information about our environment. Therefore, one may ask to what extent it would be possible to develop computerized equipment able to communicate with humans in a similar way, by understanding visual and auditory input.

The purpose of this project is to develop new perceptual interfaces for human-computer interaction based on visual input captured by computer vision systems, and to investigate how such interfaces can complement or replace traditional interfaces based on keyboards, mice, remote controls, data gloves or speech. Examples of applications of gesture recognition include:

  • Control of consumer electronics
  • Interaction with visualization systems
  • Control of mechanical systems
  • Computer games

The main advantage of using visual input in this context is that it makes it possible to communicate with computerized equipment at a distance, without the need for physical contact with the equipment to be controlled. Compared to speech commands, hand gestures are advantageous in noisy environments, in situations where speech commands would be disturbing, and for communicating quantitative information and spatial relationships. The idea is that users should be able to control equipment in their environment as they are, without the need for specialized external equipment such as a remote control.

The project combines CVAP's expertise in computer vision with CID's experience in designing and evaluating new human-machine interfaces. Initially, the focus is on developing algorithms for recognizing hand gestures and on building prototype systems that make it possible to test perceptual interfaces in practical applications. An important component of the work is to perform continuous user studies in close connection with the development work.

It may be worth emphasizing that the aim is not to recognize the kind of expressive gestures that are tightly coupled to our speech, or sign languages aimed at inter-human communication. The goal is to explore hand gestures suitable for various control tasks in human-machine interaction. Multi-modal interfaces including hand gesture recognition, face and gaze tracking and speech recognition will also be considered.

Detailed information

  • A prototype system for computer vision based human computer interaction based on hand gesture recognition -- Includes a demonstration of how a user can control different types of consumer electronics using hand gestures.

  • Camera mouse control -- A simple demonstration of how the cursor on a computer screen can be controlled by hand motions, and how hand gestures can be used for interacting with programs via this cursor.

  • Simultaneous hand tracking and hand posture recognition -- Includes a demonstration where the estimated hand motions and the recognized hand postures control the motion of a drawing device. (In the video clip, the type of hand posture controls the type of action that is performed, while the estimated hand motion controls the motion of the pencil, the cursor or the drawing.)

    [Image: example of internal states of the multi-state particle filtering algorithm]

  • The 3-D hand mouse -- A demonstration of how it is possible to measure the three-dimensional motion (translations and rotations) of a human hand, and to use such motion estimates to control the three-dimensional motion of other computerized devices. (In the video clip below, the cube moves according to the estimated motion of the hand.)

  • People -- who have contributed to this project
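
As an illustration of the simultaneous hand tracking and posture recognition demo described above, the following minimal Python sketch shows one way a recognized posture could select the action while the estimated hand motion drives it. It is not the project's implementation: the posture labels, the `ACTIONS` table, the `DrawingState` class and the `step` function are all hypothetical names introduced for this example, and the tracker output is assumed to arrive as a posture label plus a 2-D displacement per frame.

```python
from dataclasses import dataclass, field

# Hypothetical posture labels and actions; the project's actual
# posture set is not specified on this page.
ACTIONS = {
    "point": "move_cursor",
    "pinch": "draw",
    "open_hand": "move_drawing",
}


@dataclass
class DrawingState:
    cursor: tuple = (0.0, 0.0)   # current cursor / pencil position
    offset: tuple = (0.0, 0.0)   # translation applied to the whole drawing
    strokes: list = field(default_factory=list)  # drawn line segments


def step(state, posture, motion):
    """Apply one frame of tracker output to the drawing state.

    posture: label of the recognized hand posture (selects the action)
    motion:  (dx, dy) estimated hand displacement (drives the action)
    """
    dx, dy = motion
    action = ACTIONS.get(posture)
    if action is None:
        return state  # unrecognized posture: ignore this frame
    if action == "move_drawing":
        state.offset = (state.offset[0] + dx, state.offset[1] + dy)
        return state
    old = state.cursor
    state.cursor = (old[0] + dx, old[1] + dy)
    if action == "draw":
        # The pencil is down: record the segment traced in this frame.
        state.strokes.append((old, state.cursor))
    return state


state = DrawingState()
frames = [("point", (2, 1)), ("pinch", (3, 0)), ("open_hand", (-1, 4))]
for posture, motion in frames:
    step(state, posture, motion)
```

Keeping the posture-to-action mapping in a table separates the recognition result from the control logic, so a different command set can be tried without touching the motion handling.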


A prototype hand gesture recognition system constructed within this project was presented at the Swedish IT-fair Connect 2001 at Älvsjömässan, Stockholm, Sweden, April 23-25, 2001. View the news reports about our gesture control system at www.idg.se, in Expressen and Dagens IT.

References

S. Lenman, L. Bretzner, and B. Thuresson, "Using marking menus to develop command sets for computer vision based hand gesture interfaces", in Second Nordic Conference on Human-Computer Interaction, NordiCHI02, (Aarhus, Denmark), pp. --, Oct. 2002.

L. Bretzner, I. Laptev and T. Lindeberg, "Hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering", in Proc. 5th IEEE International Conference on Automatic Face and Gesture Recognition, Washington D.C., May 2002.

L. Bretzner and T. Lindeberg, "Qualitative multi-scale feature hierarchies for object tracking", in Proc. 2nd International Conference on Scale-Space Theories in Computer Vision (O. F. Olsen, M. Nielsen, P. Johansen and J. Weickert, eds.), vol. 1682 of Lecture Notes in Computer Science, (Corfu, Greece), pp. 117--128, Springer Verlag, Sept. 1999. (Extended version available as Tech. Rep. ISRN KTH/NA/P--99/09--SE.)

L. Bretzner and T. Lindeberg, "Qualitative multi-scale feature hierarchies for object tracking", Journal of Visual Communication and Image Representation, vol. 11, pp. 115--129, June 2000.

L. Bretzner and T. Lindeberg, "Use your hand as a 3-D mouse or relative orientation from extended sequences of sparse point and line correspondences using the affine trifocal tensor", in Proc. 5th European Conference on Computer Vision (H. Burkhardt and B. Neumann, eds.), vol. 1406 of Lecture Notes in Computer Science, (Freiburg, Germany), pp. 141--157, Springer Verlag, Berlin, June 1998.

L. Bretzner, I. Laptev, T. Lindeberg, S. Lenman and Y. Sundblad, "A prototype system for computer vision based human computer interaction", Technical report ISRN KTH/NA/P-01/09-SE, April 2001.

I. Laptev and T. Lindeberg, "Tracking of multi-state hand models using particle filtering and a hierarchy of multi-scale image features", Technical report ISRN KTH/NA/P-00/12-SE, September 2000. Shortened version in IEEE Workshop on Scale-Space and Morphology, Vancouver, Canada, July 2001, Springer Verlag Lecture Notes in Computer Science.

I. Laptev and T. Lindeberg, "A multi-scale feature likelihood map for direct evaluation of object hypotheses", Technical report ISRN KTH/NA/P-01/03-SE, March 2001. Shortened version in IEEE Workshop on Scale-Space and Morphology, Vancouver, Canada, July 2001, Springer Verlag Lecture Notes in Computer Science.

T. Lindeberg and L. Bretzner, "Method and arrangement for controlling means for three-dimensional transfer of information by motion detection", International patent application PCT/SE1999/000402, 1999 (now released).

T. Lindeberg and L. Bretzner, "Förfarande och anordning för överföring av information genom rörelsedetektering, samt användning av anordningen", Swedish patent 9800884-0, March 1998 (now released).


This work has been made possible by support from the Swedish National Board for Industrial and Technical Development, NUTEK, and the Swedish Research Council for Engineering Sciences, TFR.
Responsible for this page: Lars Bretzner and Tony Lindeberg