Where are you going little robot? –
Prospects of Human-Robot Interaction

Position paper for the
CHI 99 Basic Research Symposium

Lars Oestreicher
Helge Hüttenrauch
Kerstin Severinsson-Eklundh

Interaction and Presentation Laboratory
NADA

Royal Institute of Technology
Stockholm, SWEDEN

 

Introduction

In science fiction there are many examples of intelligent, autonomous, and automated machines or robots. While the fantastic, sometimes frightening, operations that robots perform in books and movies are technologically complex, their interaction with humans is often depicted as natural. These fictitious robot characters are designed primarily to entertain. Neither mechanically nor structurally are real robots today anywhere close to handling the same kind of tasks as their fictitious counterparts. Even if we do not strive for the full functionality of such colourful creations as the autonomous robots "C-3PO" and "R2-D2" from the Star Wars movies, the tasks that automated robot systems are able to handle are limited to a few very special domains. In domestic settings, for example, most seemingly easy household tasks still prove highly challenging for robots.

Despite this, and while many questions about robots are still in need of further investigation, companies have already begun to target the consumer market (cf. the automatic vacuum-cleaner, Electrolux, 1999). The human-robot interaction group at the Interaction and Presentation Laboratory at NADA believes that these artefacts of digital, electrical, and mechanical engineering efforts will provide a new research frontier in interactive systems. Mobile robots differ substantially from other computational systems people interact with. They might consequently require adapted or new methods for design and engineering. In two related projects ("Intelligent Service Robots" and "A fetch-and-carry robot for people with special needs") we are studying the following fundamental issues in the light of Human-Robot Interaction:

  • Multi-modal Human-Robot Communication and Interaction: multimodality in Human-Robot interfaces.

    As a pilot study, a questionnaire survey has recently been performed with the purpose of assessing people’s attitudes towards obtaining help from an intelligent service robot in household tasks (Khan, 1998). The study included 134 participants of various backgrounds and levels of education, with equal participation of both sexes. The results showed that a comparatively large proportion (30-50%) of those who needed help with a household task were positive towards having a robot help with it. The following table shows the tasks that were most relevant for robot help according to this criterion.

    Task                             Need help   Want help from a robot
    Polishing windows                113         39 (34.5%)
    Cleaning ceiling and walls       113         34 (30.1%)
    Wet cleaning                     104         39 (37.5%)
    Other cleaning, incl. dusting    107         44 (41.1%)
    Moving heavy objects             103         29 (28.2%)
    Washing clothes                   86         37 (43.0%)
    Ironing clothes                   89         34 (38.2%)
    Dishwashing                       90         43 (47.8%)
    Wiping surfaces                   97         31 (32.0%)
    Repair and maintenance            89         35 (39.3%)

    Table 1. Some domestic tasks that were judged to be interesting to support by an autonomous service robot.
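
    The percentages in Table 1 are the share of respondents wanting robot help among those who reported needing help with the task. As a minimal sketch of that arithmetic, the following recomputes the shares from the counts in the table and ranks the tasks; the names TABLE_1 and robot_share are illustrative only and do not come from the survey.

```python
# Counts transcribed from Table 1: (need help, want help from a robot).
TABLE_1 = {
    "Polishing windows": (113, 39),
    "Cleaning ceiling and walls": (113, 34),
    "Wet cleaning": (104, 39),
    "Other cleaning, incl. dusting": (107, 44),
    "Moving heavy objects": (103, 29),
    "Washing clothes": (86, 37),
    "Ironing clothes": (89, 34),
    "Dishwashing": (90, 43),
    "Wiping surfaces": (97, 31),
    "Repair and maintenance": (89, 35),
}


def robot_share(need_help: int, want_robot: int) -> float:
    """Percentage of those needing help who want it from a robot (one decimal)."""
    return round(100.0 * want_robot / need_help, 1)


# Share per task, then tasks ranked from highest to lowest share.
shares = {task: robot_share(*counts) for task, counts in TABLE_1.items()}
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)

for task, pct in ranked:
    print(f"{task:32s}{pct:5.1f}%")
```

    Ranked this way, dishwashing has the highest share and moving heavy objects the lowest.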

     

    There were certain gender differences in the results to the effect that female respondents were more negative towards the use of a robot for some of the tasks.

    Responses to the questions on privacy and security showed that a great majority (71%) did not feel that a robot would intrude on their privacy. Furthermore, most participants (66%) would feel safe with a robot doing various tasks in their home.

    One question concerned a robot’s independence. The participants were given the following choices with respect to how they would allow the robot to conduct household tasks:

  • to perform actions and solve problems by itself (learning by doing), or
  • to be a smart robot, i.e. to take initiatives and suggest actions.
    The results show a mixed picture of user preferences. With respect to the "programmed robot" alternative, 78% of the participants were positive, whereas the figures were 48% and 54% for the "learning by doing" and "smart robot" respectively.

    The survey reported by Khan (1998) included questions about the preferred way of communicating with a service robot. The questions included, among other things, the choice of modality for the interaction. The results showed that most participants preferred speech (82%), followed by touch screen (63%), gestures (51%) and command language (45%). Altogether, the survey showed that speech is the preferred form of interacting with a household robot, but that several other forms of interaction are acceptable as a complement.

    The survey provides a useful background for the problem both of finding out the proper tasks that the robot should perform, and of how to communicate with the robot in the best way. We will continue to discuss these two problems in the following.

    In a household environment, simple and common tasks such as ironing, folding sheets, laying tables, or picking up toys from the floor prove demanding even for advanced robots. These tasks require large efforts in fields such as visual recognition (and image processing), eye-hand co-ordination, planning strategies, and other similarly technology-oriented areas (Kortenkamp, Bonasso, & Murphy, 1998).

    We suggest that robots for domestic usage should be regarded as end-user oriented computational, mobile devices (and/or agents). We believe that interaction with robots will introduce highly interesting problems from an HCI point of view in the near future, and that the introduction of this perspective will be beneficial both for robotics as a general application area and for other applications that display similar characteristics.

    Figure 1. The currently used fetch-and-carry robot – A Nomadic systems SuperScout — in a joystick-controlled session.

     

    Important factors in the definition of usability are user acceptability, utility, ease of learning, and reliability (Nielsen, 1992; Nielsen, 1993). User acceptability is based on the physical design as well as the system’s functionality. It furthermore depends on the extent to which the system satisfies the users’ needs by performing the wanted tasks, and on how efficient the user experiences the gadget(s) to be. To design for or judge these properties in a product, a (series of) prototype(s), mock-up(s) or simulation(s) is often used.

    However, a robot prototype, mock-up, or simulation is often difficult to produce and too unrealistic or too limited in scope, so that in effect it does not model the robot's characteristics truthfully from a usability research point of view. The robots that people will need to interact with in our research scenarios have a physical presence, which we would like to see represented in the usability testing as well. We believe that the "real-world" characteristics of a working mobile robot will provide the experimental settings that this research needs. For early evaluation purposes, Wizard-of-Oz studies may be conducted using more primitive means of operating the robot (cf. Figure 1).

    Task analysis can serve as an example of a method that has to be reconsidered in this changed research context.

    Figure 2. An impression of how the fetch-and-carry robot follows a person at close distance on the way to, e.g., a copying machine. In the real prototype, the books on top of the robot will be placed in some kind of wire basket or similar attachment.

     

    Task Analysis

    The problems mentioned in the previous section give rise to several interesting research issues within the context of task analysis. We would like to invite the HCI research community to discuss and investigate the following:

    It is probable that the traditional methods will be insufficient for the description of robot tasks. One reason is that in most cases they do not consider a dynamically changing environment, in which both the context and the "whereabouts" of the activities change on their own as well as through the robot's own activities. The robot both manipulates and moves around within a physical environment. Furthermore, physical object manipulation by a machine is difficult to describe in a task analysis, since there are many considerations, such as object recognition, that are difficult for the robot to make and that will have to be addressed in the task analysis. To give a concrete example:

    When asked to pick up a bowl on the living room table, should the robot deliver the bowl including its content of popcorn, or should it empty the bowl (carefully) and bring it empty? Similarly, when asked to bring a certain book from the table, should it bring only the book the user asked for, or include the other book(s) that lie on top of the requested book?

    In some sense these two tasks are similar in that the target object is a composed object (the bowl with popcorn and the pile of books on the table, respectively). However, the correct action to take in each situation depends on the properties of the target object. Only a difference in knowledge about the environment would allow a robot to choose the right action in each case. There are similar problems involved even with a simple fetch-and-carry robot (cf. Figure 2). What should the robot do if the path is blocked? What should the robot do if the person it follows just stops in the middle of the path? Both these problems can be described by a task analysis, but only with modifications to the existing methods.
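
    The dependence on knowledge about the environment can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the TargetObject type, the parts_belong attribute (standing in for whatever world knowledge the robot has about the composed object), and the three actions. It is not a proposed task analysis notation, and the answers it gives for the bowl and the book are only one plausible reading of the example.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class Action(Enum):
    BRING_AS_IS = auto()      # deliver the composed object unchanged
    SEPARATE_FIRST = auto()   # remove the other items, bring only the target
    ASK_USER = auto()         # knowledge is missing, start a clarifying dialogue


@dataclass
class TargetObject:
    name: str
    attached: List[str] = field(default_factory=list)  # items in or on the target
    # Hypothetical world knowledge: True if the attached items belong with the
    # target, False if they merely rest on it, None if the robot cannot tell.
    parts_belong: Optional[bool] = None


def choose_action(target: TargetObject) -> Action:
    """Pick an action for a fetch request on a possibly composed object."""
    if not target.attached:
        return Action.BRING_AS_IS
    if target.parts_belong is None:
        return Action.ASK_USER
    return Action.BRING_AS_IS if target.parts_belong else Action.SEPARATE_FIRST


# Assumed knowledge for the two examples in the text.
bowl = TargetObject("bowl", attached=["popcorn"], parts_belong=True)
book = TargetObject("requested book", attached=["other book"], parts_belong=False)
vase = TargetObject("vase", attached=["flowers"])  # nothing known about this one
```

    The two requests have the same structure and differ only in the value of parts_belong, which is exactly the piece of information a traditional task description does not carry.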

    Task analysis in computer contexts has been studied for a long time, and the older methods date back to the late 1960s (Shepherd, 1989). Still, only a few general paradigms are mentioned in the literature (Dix, Finlay, Abowd, & Beale, 1998; Diaper, 1989):

    Furthermore, these methods are directed at highly interactive tasks, where the user and the computer work together in a relatively close interaction, e.g., word processing, e-mail systems (cf. Oestreicher, 1987).

    Within these paradigms we find many different methods, varying in the more detailed aspects of how tasks are analysed for the design of computer systems or for other purposes. Common to most of these methods is that they regard the computer as a more or less closed entity (no external context) within which the human operator performs the work. Many of the task analysis methods used for traditional software design are also focused on goal fulfilment, i.e. the goals of the tasks are stressed in the descriptions (cf. Card, Moran, & Newell, 1983; Luczak, 1997; Shepherd, 1989).

    For intelligent service robots the work context differs from that of traditional software. Whereas in traditional computer systems the task is mostly internal to the computer, the robot interacts heavily with the real world. Most of the tasks that we envision the robot being asked to perform presume a large degree of autonomous activity on the robot’s side. Even in tasks where the robot needs to interact with the user in order to clarify its task, the robot is expected to be instructed in fairly large chunks.

    In a usage situation, the user is supposed to state the task for the robot, which should then carry it out without "too much" consultation of the user. An interesting issue is how many interactions with the user the robot may engage in before the required level of information input, clarification, or confirmation becomes too disturbing for the user. The degree of interactivity depends on how well the tasks can be described and foreseen in advance. This situation also indicates a need for a shift in perspective when the tasks are being modelled.

    The robot has to fit into the user’s situation of use and context, which might mean that it will be assigned autonomous tasks by the user, tasks which will bridge over to the user’s goal states. Whether the user will trust the robot to do larger tasks, even tasks of autonomous character, will depend largely on how acceptable the robot appears from the user’s point of view. The amount and level of interaction will have to be determined from, e.g., a task analysis, or as a result of a usability test.

    The tasks for the robot will often consist of smaller parts (subtasks) of the user's main task. These subtasks may not necessarily seem meaningful in themselves to the user. Sometimes they are not even perceived as tasks by the user. To give an example for a simple fetch-and-carry robot: when the robot is supposed to carry a book to another person, it will have to load the book onto itself (or have it placed on itself). For the user, this action may be hidden in a single pickup action done by the user upon leaving the room.
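
    One way to picture this is a task hierarchy in which every subtask records who performs it and whether the user perceives it as a task at all. The sketch below is only an illustration under our own assumptions: the Task type, its fields, and the decomposition of the book delivery are hypothetical and not the output of an actual task analysis.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    actor: str                      # "user" or "robot"
    perceived_by_user: bool = True  # does the user see this as a task at all?
    subtasks: List["Task"] = field(default_factory=list)


def hidden_robot_subtasks(task: Task) -> List[str]:
    """Collect robot subtasks that the user does not perceive as tasks."""
    hidden: List[str] = []
    for sub in task.subtasks:
        if sub.actor == "robot" and not sub.perceived_by_user:
            hidden.append(sub.name)
        hidden.extend(hidden_robot_subtasks(sub))  # descend into nested subtasks
    return hidden


# Hypothetical decomposition of the fetch-and-carry example from the text.
deliver_book = Task("deliver book to another person", actor="user", subtasks=[
    Task("load book onto robot", actor="robot", perceived_by_user=False),
    Task("travel to the other person", actor="robot"),
    Task("hand over book", actor="robot"),
])

print(hidden_robot_subtasks(deliver_book))
```

    A description of this kind makes explicit which parts of the robot's work the user never states, and which the task analysis therefore has to supply.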

    Communication and Modality

    The results from the survey reported in the introduction express a clear preference for using speech in combination with other interaction modalities. Usability research into Human-Robot communication and interaction will therefore need to investigate whether

    The range of communication and interaction systems that users are experienced with and use skilfully includes face-to-face communication, mediated human-to-human communication, and man-machine communication and interfaces. This prior knowledge will be of importance in evaluating the robot's characteristics and the perceived usability of its expressiveness. In face-to-face communication people use (spoken) language, gestures, and gaze to exchange meaning, attitudes and opinions. Typically, human communication is rich in phenomena like ellipses, indirect speech acts, and situated object or action references (cf. Donellan, 1966; Milde, Peters, & Strippgen, 1997). The ambiguities incorporated in a human-to-human conversation need to be carefully thought through and designed for in Human-Robot Interaction (see, e.g., Grice, 1975).

    While the characteristics of Natural Language Interfaces (NLIs) have been discussed previously (Ogden & Bernick, 1997), and design principles have even been put forward for telephone-based dialogue systems (Bernsen, Dybkjær, & Dybkjær, 1997), the physical embodiment of a robot might require new dialogue strategies that differ from those of both telephony-based and workstation-based NLI systems. A simple example is the ambiguity that comes with spatial relationships, and the resulting need for adapted dialogues:

    The mobile robot and the user are physically in the same room. The robot is told to "go left". Depending on the location of the robot with regard to the user, the 'correct' execution might mean two different directions. The robot will need to resolve this by detecting the ambiguity (and solving it intelligently) and/or by initiating an appropriate dialogue.
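
    A minimal sketch of how such an ambiguity could be detected follows. It simplifies the situation to the facing directions of robot and user, given as headings in degrees (counter-clockwise) in a shared room frame; the function names and the 30-degree tolerance are assumptions chosen for illustration, not parameters of our prototype.

```python
from typing import Tuple, Union


def left_of(heading_deg: float) -> float:
    """Direction (in the room frame) that is 'left' for an agent facing heading_deg."""
    return (heading_deg + 90.0) % 360.0


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two directions, in the range 0 to 180."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def interpret_go_left(
    robot_heading: float, user_heading: float, tolerance: float = 30.0
) -> Tuple[str, Union[float, Tuple[float, float]]]:
    """Decide whether "go left" can be executed or needs a clarifying dialogue.

    Returns ("move", direction) when the robot-centred and user-centred readings
    roughly agree, otherwise ("clarify", (robot_left, user_left)).
    """
    robot_left = left_of(robot_heading)
    user_left = left_of(user_heading)
    if angular_difference(robot_left, user_left) <= tolerance:
        return ("move", robot_left)
    return ("clarify", (robot_left, user_left))


# Robot and user face the same way: both readings agree.
print(interpret_go_left(0.0, 0.0))
# Robot and user face each other: the two readings point in opposite directions.
print(interpret_go_left(0.0, 180.0))
```

    When the two readings diverge, the sketch only reports the conflict; whether the robot should then ask, guess from context, or follow a convention is exactly the dialogue design question raised above.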

    We would like to clarify in which situations communication and interaction with such devices of 'physical presence and mobility' might need to be handled differently.

    For this reason we also turn to the research on intelligent, multi-modal interfaces and (software) agents. Multi-modal interfaces are supposed to be beneficial due to their potentially high redundancy, higher perceptibility, increased accuracy, and possible synergy effects when the individual communication modes are taken together. In human-machine communication and interaction today, most computer systems place heavy restrictions upon the modalities that can be used.

    For the prominent interaction mode of such systems, the physical act is thus restricted to the direct manipulation of the input devices (for machine tools or artefacts this often translates to the manipulation of dials, knobs and buttons). The graphical act is often dependent on and bounded by looking at a display, which resides on the system and is an integral part of it for interaction and visual feedback.

    In line with Bolt's description in the "Put-That-There" article (Bolt, 1980), we believe we have evidence that a ‘robot-user’ expects the location of interaction to move and expand from the screen's surface (as a single point of entry) to the real space of a room that the user and the robot share, or through which they are connected by means of a network. Bolt describes this as the "continuous interactive space".

    Taking the robot as such an extended interactive system we would like to put the challenge of designing for an adequate combination of communication and interaction modalities up for discussion:

    What should be the principles guiding the design of interactive systems like mobile robots and how can we possibly avoid or minimise the complexity and intrusiveness of systems like data-gloves, eye-tracking, head-mounted microphones, to name but a few of the input devices, which have been used in multi-modal interaction research so far?

    The "Three Laws of Robotics" (Asimov, 1995) might also provide a starting point by stating the necessary principles dealing with safety, subordination, and command authority.

    Another research question is whether it really

    "is becoming more desirable to communicate with machines rather than operating them."

    as Koons et al. (1993) postulate. In short, what metaphor(s) is (are) appropriate for what kind of robots?

    The answer to this question is likely to fundamentally influence the interaction design and therefore needs to be studied and discussed.

    If, for example, a fully trusted, potentially autonomous intelligent agent embodiment is preferred, which heuristics do we foresee for such a device, and what do the extent, range and design of the communication and interaction look like? If the robot acts as a social agent, should it have an illustrative interface, or should it be allowed to initiate communication not only with its primary user but also, e.g., to leave a voice message on a third person's answering machine or send him/her an email if some intended action could not be completed successfully?

    As we are currently building prototypes and conducting user studies on these kinds of questions, we are extremely interested in putting them up for discussion and in receiving feedback on the questions raised and on our line of argumentation.

    Conclusion

    We propose that the area of domestic robots is not only a suitable but also a challenging field of Human-Computer Interaction, which contains its own specific research problems. The main problem statements in HCI of course remain the same, but there are additional problems that the research needs to address, e.g. the dynamic environment, object and context recognition, and HCI for autonomous agents in a physical environment, to mention just a few.

    The need to expand HCI into this area is manifold, and we will just mention a few examples:

    This paper raises the issue of Human-Robot Interaction (HRI) from primarily two points of focus, namely task analysis and modelling on the one hand, and communication and communication modality on the other.

    Presentation of Authors

    Lars Oestreicher has been a university lecturer in HCI since 1991, and has been teaching HCI, Task Analysis, Usability Engineering and Cognitive Psychology to Computer Scientists since 1988. He has been involved in research on HCI topics since 1985, and is now writing his PhD thesis on task analysis for intelligent service robots.

    Helge Hüttenrauch is a Ph.D. student of the Graduate School for Human-Machine Interaction (HMI) at the Royal Institute of Technology in Stockholm. He received his diploma in communication and computer science in 1995 from the University of Applied Science, Furtwangen, and then joined Ericsson Business Networks. Helge currently researches the communication and interaction modalities with intelligent, mobile autonomous robots.

    Kerstin Severinsson-Eklundh is Professor in Human-Computer Interaction and director of the Graduate School for Human-Machine Interaction, at the Royal Institute of Technology in Stockholm.

     

    Bibliography

    Asimov, I. (1995). The Complete robot - The Definitive Collection of Robot Stories. London: Harper Collins.

    Bernsen, N. O., Dybkjær, H., & Dybkjær, L. (1997). What Should Your Speech System Say. IEEE Computer, 30(12), December.

    Bolt, R. A. (1980). "Put-That-There": Voice and Gesture at the Graphics Interface. Computer Graphics, 14(3), 262-270.

    Card, S. K., Moran, T. P., & Newell, A. (1983). The Psychology of Human-Computer Interaction. Hillsdale, New Jersey: Lawrence Erlbaum.

    Dix, A., Finlay, J., Abowd, G., & Beale, R. (1998). Human-Computer Interaction (2nd ed.). London, England: Prentice Hall.

    Donellan, K. (1966). Reference and Definite Descriptions. Philosophical Review, LXXV, 281-304.

    Electrolux (1999). Robot vacuum cleaner prototype. Available at: http://www3.electrolux.se/robot/meny.html, 5/2 1999.

    Grice, H. P. (1975). Logic and Conversation. In: P. Cole & J. L. Morgan (Eds.), Syntax and Semantics, Vol. 3: Speech Acts. New York: Seminar Press.

    Khan, Z. (1998). Attitudes towards Intelligent Service Robots, IpLab, Nada, Royal Institute of Technology.

    Koons, D. B., Sparrell, C. J., & Thórisson, K. R. (1993). Integrating Simultaneous Input from Speech, Gaze and Hand Gestures. In: M. T. Maybury (Ed.), Intelligent Multi-Media Interfaces (pp. 252-276). Cambridge, MA: AAAI Press/M.I.T. Press.

    Kortenkamp, D., Bonasso, R. P., & Murphy, R. (Eds.). (1998). Artificial Intelligence and Mobile Robots - Case Studies of Successful Robot Systems. Menlo Park, CA.: AAAI Press/The MIT Press.

    Luczak, H. (1997). Task Analysis. In: G. Salvendy (Ed.), Handbook of Human Factors (pp. 341-416). New York: John Wiley and Sons, Inc.

    Milde, J.-T., Peters, K., & Strippgen, S. (1997). Situated communication with Robots. First International Workshop on Human-Computer Conversation. Bellagio, Italy.

    Nielsen, J. (1992). The Usability Engineering Life Cycle. IEEE Computer, 25(3), 12-22.

    Nielsen, J. (1993). Usability Engineering. San Diego, California: Academic Press.

    Oestreicher, L. (1987). The Human-Computer Interface - A User's Guide? M.Sc. Thesis, Uppsala.

    Ogden, W. C., & Bernick, P. (1997). Using Natural Language Interfaces. In: M. Helander, T. K. Landauer, & P. Prabhu (Eds.), Handbook of Human-Computer Interaction. Amsterdam: Elsevier Science Publishers B.V.

    Shepherd, A. (1989). Analysis and Training in Information Technology Tasks. In: D. Diaper (Ed.), Task Analysis for Human-Computer Interaction (pp. 15-54). Chichester, England: Ellis Horwood.