By Henrry Rodriguez, henrry@nada.kth.se
In this paper I compare two approaches in the field of Human-Computer Interaction: Activity theory and Cognitive science. I will not give a very detailed description of either approach, because this is the term paper for the course 'Activity theory in HCI', given on Nov. 13 and Nov. 20 by Victor Kaptelinin.
If we want to understand what a person does, we first have to know the context that person is in. This brings to mind a technique often used by film makers: they open the film with a scene that has no meaning for the audience yet and only fills them with questions and curiosity. What the film makers are doing is capturing the audience's attention. The rest of the film then explains the global context in which that first action took place. This simple example illustrates how important it is to understand the context in which any action takes place. If we understand the context, we have an idea of which actions can be performed and in which sequence. In the field of HCI, knowing this context is very important, and a good description of it is always hard to obtain: in attempting one we may include a lot of information that is not relevant to the design of an interface, and we may just as easily fail to recognize a situation that is important and could change the context, because we do not give it the weight it deserves. The combination of psychology, ergonomics, and computer technology has generated an area of interdisciplinary knowledge known as 'Human-Computer Interaction' (HCI). There are guidelines to assist designers, who no longer have to rely on guesswork or personal experience and expertise to decide between possibilities.
Activity theory originated in the former Soviet Union as part of the cultural-historical school of psychology founded by Vygotskij, Leontjev and Lurija. The theory is a philosophical framework for studying different forms of human praxis as developmental processes, with both the individual and the social level interlinked.
In activity theory the unit of analysis is an activity, which is composed of subject, object, actions, and operations. A subject is a person or a group engaged in an activity. An object is held by the subject and motivates the activity. 'Behind the object there always stands a need or a desire, to which [the activity] always answers.'
Cognitive Science is a rapidly expanding field of study aimed at understanding the mental processes that underlie cognitive abilities. The questions asked by Cognitive Science are not new. Philosophers, Psychologists, Linguists, Neuroscientists and Computer Scientists have all approached the basic questions posed by the nature of mental processes in their own ways as part of the broader endeavours of their respective fields. Cognitive Science is distinguished from these traditional disciplines by its highly interdisciplinary approach. Its defining technique is to bring expertise gained from the related disciplines to bear on a set of common questions: What are the basic components of cognitive processes? Are they subsumed by a common mental mechanism? What is the relationship between the physical apparatus and cognition? To answer these questions Cognitive Scientists engage in empirical studies aimed at assessing their formal and computational models of various aspects of cognition. The sorts of areas investigated include the information-acquisition and information-processing mechanisms underlying cognitive abilities like perception, recognition, information storage and information retrieval, language acquisition, comprehension and production, concept acquisition, problem solving, and reasoning.
Since the Seventeenth Century, the development of a unified science of the mind has been frustrated by the fact that questions about perception, thought, memory, imagination, language comprehension and learning, and other mental phenomena fall under the purview of several distinct sciences, each with its own methodology, conception of explanation, and preferred set of explanatory models. Until recently, most psychologists, philosophers, computer scientists, linguists, and neurobiologists have been content to pursue these questions in relative isolation, awaiting, it seems, the arrival of some modern-day Newton of the mind. In the last two decades, however, the gradual emergence in each of these disciplines of some version of the view that mental phenomena can be fruitfully understood as operations on symbolic representations and that the mind is thus, in some sense or other, an information processor, has made possible a truly interdisciplinary approach, cognitive science, that holds the promise of being the long sought unified science of the mind.
Cognitive science is not really new: the phenomena of thought and language have been of interest to philosophers and scientists for a very long time. Cognitive science needs to be distinguished from cognitive psychology, which is the branch of traditional psychology dealing with cognition. Although cognitive psychology constitutes a substantial part of what is regarded as cognitive science, it follows specific methodological principles that limit its scope.
In the field of Cognitive science there are several approaches to modelling the interaction between a user and a device. GOMS (Goals, Operators, Methods, and Selection rules) and the production system model are known as 'process' models because they attempt to supply a simple model of the mental processes involved in using an interface, including remembering items, starting a new subgoal, etc. They yield predictions which are less quantitative in nature, based perhaps on how many items must be simultaneously retained in working memory or on other measures. Both models characterize the knowledge necessary to perform routine tasks like text editing.
The GOMS model represents a user's knowledge of how to carry out routine skills in terms of goals, operators, methods, and selection rules. GOMS describes the operation of the interface in terms of a 'state space': the user's goal is to achieve a particular state, and each available operator takes the user to the same state or to a new state, in which different operators will be available.
Goals represent a user's intention to perform a task, a subtask, or a single cognitive or physical operation. Goals are organized into structures of interrelated goals that sequence cognitive operations and user actions.
Operators characterize elementary physical actions (e.g., pressing a function key or typing a string of characters), and cognitive operations not analysed by the theory (e.g., perceptual operations, retrieving an item from memory, or reading a parameter and storing it in working memory).
A user's knowledge is organized into methods which are subroutines. Methods generate sequences of operations that accomplish specific goals or subgoals. The goal structure of a method characterizes its internal organization and control structure.
Selection rules specify the conditions under which it is appropriate to execute a method to effectively accomplish a specific goal in a given context. They are compiled pieces of problem solving knowledge. They function by asserting the goal to execute a given method in the appropriate context.
Content and Structure of a User's Knowledge

The GOMS model assumes that execution of a task involves decomposition of the task into a series of subtasks. A skilled user has effective methods for each type of subtask. Accomplishing a task involves executing the series of specialized methods that perform each subtask. There are several kinds of methods. High-level methods decompose the initial task into a sequence of subtasks. Intermediate-level methods describe the sequence of functions necessary to complete a subtask. Low-level methods generate the actual sequence of user actions necessary to perform a function.
A user's knowledge is a mixture of task-specific information, the high-level methods, and system-specific knowledge, the low-level methods. The knowledge captured in the GOMS representation describes both general knowledge of how the task is to be decomposed as well as specific information on how to execute functions required to complete the task on a given system.
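The structure described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not a notation taken from the GOMS literature: the class names (`Operator`, `Goal`, `Method`, `GomsModel`), the `expand` function, and the text-editing task with its method and operator names are all hypothetical. A method is a list of operators and subgoals, and a selection rule picks the method for a goal from the current context.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass(frozen=True)
class Operator:
    """Elementary physical or cognitive act, not analysed further."""
    name: str


@dataclass(frozen=True)
class Goal:
    """A user's intention to perform a task or subtask."""
    name: str


Step = Union[Operator, Goal]


@dataclass
class Method:
    """A subroutine: a sequence of operators and subgoals."""
    name: str
    steps: List[Step]


# A selection rule looks at the context and names the method to execute.
SelectionRule = Callable[[dict], str]


@dataclass
class GomsModel:
    methods: Dict[str, Method]       # method name -> method
    rules: Dict[str, SelectionRule]  # goal name -> selection rule

    def expand(self, goal: Goal, context: dict) -> List[Operator]:
        """Decompose a goal into the flat operator sequence that achieves it."""
        method = self.methods[self.rules[goal.name](context)]
        operators: List[Operator] = []
        for step in method.steps:
            if isinstance(step, Goal):
                # Subgoal: recurse, which mirrors the hierarchical goal stack.
                operators.extend(self.expand(step, context))
            else:
                operators.append(step)
        return operators


def build_example_model() -> GomsModel:
    """Hypothetical text-editing example; every name here is made up."""
    locate = Method("locate-by-search", [
        Operator("type-search-command"),
        Operator("type-word"),
        Operator("press-enter"),
    ])
    by_keys = Method("delete-by-keys", [
        Goal("locate-word"),
        Operator("press-delete-word-key"),
    ])
    by_menu = Method("delete-by-menu", [
        Goal("locate-word"),
        Operator("select-word"),
        Operator("choose-cut-from-menu"),
    ])
    return GomsModel(
        methods={m.name: m for m in (locate, by_keys, by_menu)},
        rules={
            "locate-word": lambda ctx: "locate-by-search",
            "delete-word": lambda ctx: (
                "delete-by-keys" if ctx.get("knows_shortcut") else "delete-by-menu"
            ),
        },
    )
```

Calling `build_example_model().expand(Goal("delete-word"), {"knows_shortcut": True})` yields the low-level operator sequence for the shortcut method. The high-level method carries the task-specific knowledge, while the operators at the bottom are system-specific.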
Kieras, Bovair and Polson among others have successfully tested assumptions underlying these predictions. These authors have shown that the amount of time required to learn a task is a linear function of the number of new rules that must be acquired in order to successfully execute the task and that execution time is the sum of the execution times for the rules that fire in order to complete the task. They have shown that transfer of training can be characterized in terms of shared rules.
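The three claims above (linear learning time, additive execution time, transfer through shared rules) can be written down as a small sketch. The function names, the `base_time` intercept, and all numeric values a caller might pass in are my own illustrative assumptions. They are not parameters reported by these authors.

```python
from typing import Dict, Iterable, Set


def new_rules(task_rules: Set[str], known_rules: Set[str]) -> Set[str]:
    """Rules the user still has to acquire; shared rules transfer for free."""
    return task_rules - known_rules


def learning_time(task_rules: Set[str], known_rules: Set[str],
                  time_per_rule: float, base_time: float = 0.0) -> float:
    """Learning time as a linear function of the number of new rules.

    time_per_rule and base_time are hypothetical coefficients that would
    have to be estimated from data.
    """
    return base_time + time_per_rule * len(new_rules(task_rules, known_rules))


def execution_time(fired_rules: Iterable[str],
                   rule_times: Dict[str, float]) -> float:
    """Execution time as the sum of the times of the rules that fire."""
    return sum(rule_times[rule] for rule in fired_rules)
```

Under this reading, transfer of training shows up as the set difference in `new_rules`: the more rules a new task shares with what the user already knows, the fewer rules contribute to the learning time.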
People who work with computers extensively build up a repertoire of efficient, smooth, learned behaviours for carrying out their routine communicative activities. Yet the interaction is intensely cognitive. The skills are wielded within a problem-solving context, and the skills themselves involve the processing of symbolic information: the interpretation of instructions, the formulation of sequences of commands, and the communication of these commands to the computer are always required.
Susanne Bødker points out that the conditions that trigger a certain operation from the repertoire of operations are what we need to investigate in user interface design.
Terry Winograd points out that 'many difficult issues are raised by the attempt to relate programs to theory and to cognitive mechanism. Within the Cognitive sciences community, there is much debate about just what role computer programs have in developing and testing theories'. He says that Cognitive science will have important limitations in its scope and in its power to explain what we are and what we do.
Maturana (1970) says that 'Learning is not a process of accumulation of representations of the environment; it is a continuous process of transformation of behavior through continuous change in the capacity of the nervous system to synthesize it. Recall does not depend on the indefinite retention of a structural invariant that represents an entity (an idea, image, or symbol), but on the functional ability of the system to create, when certain recurrent conditions are given, a behavior that satisfies the recurrent conditions or that the observer would class as a reenacting of a previous one.'
In his book, Winograd gives a very simple illustration of why it is impossible to establish a context-independent basis for circumscribing the literal use of a term, even one as seemingly simple as 'water':
A: Is there any water in the refrigerator?
B: Yes.
A: Where? I don't see it.
B: In the cells of the eggplant.
As we can see, these approaches (Activity theory, GOMS and CTA) all try to provide a framework for the design of interfaces in the field of HCI. I will now try to explain some of their limitations.
For error-free behavior, a GOMS model provides a complete dynamic description of behavior, measured at the level of goals, methods, and operators. Given a specific task, this description can be instantiated into a sequence of operators. By associating times with each operator, such a model will make total time predictions. If these times are given as distributions, it will make statistical predictions. But, without augmentation, the model is not appropriate if errors occur. Yet errors exist in routine cognitive skilled behavior. Indeed, error rates may not even be small, in the sense of having negligible frequency, taking negligible time, or having negligible consequences. For skilled behavior the detection and correction of errors is mostly routine. It cannot be entirely routine, since the occurrence of rare types of errors for which the user is unprepared is always possible. But in the main, errors are quickly detected and result in additional time to correct the error. The final effect of the behavior remains relatively error-free, and the behavior can be characterized solely by the time to completion. Thus, errors can be converted to variance in operator times, so that GOMS theory can be applied to actual behavior at the price of degraded accuracy. For a general treatment of errors and interruptions of the user, the hierarchical control structure of a GOMS model is inadequate; a more general control structure is required. The use of a stack-discipline GOMS model instead of a more general control structure, such as a production system (Newell & Simon, 1972), should be taken as an approximation especially appropriate for skilled cognitive behavior and preferred here because of its greater simplicity. The very limited degree to which this analysis involves any psychological process model can be assessed from the amount of reasoning behind it. The analysis is based on two basic principles of psychology: firstly, that people act so as to attain their goals through rational action given the structure of the task, and secondly, that problem solving activity can be described in terms of a set of knowledge states; operators for changing states; and control knowledge for applying knowledge. Since 'Operators are elementary . . . Elementary processing acts, whose execution is necessary to change any aspects of the user's memory . . .' (Card, 1978, p. 58), the model has the potential for processing operations.
On the other hand, GOMS uses only the knowledge in the design and produces absolute estimates of performance time: 'it will take 3.45 seconds for a skilled user to perform this task using this system'.
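The way such a time estimate arises from an operator sequence can be sketched as follows. Each operator is given a mean time and a variance, and the prediction is the sum over the sequence. The function name, the operator names, and the idea of treating operator times as statistically independent (so that variances add) are assumptions of mine for the sake of illustration. Any numbers supplied by a caller would be placeholders, not measured values.

```python
import math
from typing import Dict, List, Tuple

# (mean seconds, variance in seconds squared) for each operator.
OperatorStats = Dict[str, Tuple[float, float]]


def predict_total_time(sequence: List[str],
                       stats: OperatorStats) -> Tuple[float, float]:
    """Return (mean, standard deviation) of the total time for a sequence.

    The mean is the sum of operator means. The variance is the sum of
    operator variances, which holds only under the assumed independence of
    operator times. Time lost to routine error correction can be folded in
    as extra variance on the affected operators.
    """
    mean = sum(stats[op][0] for op in sequence)
    variance = sum(stats[op][1] for op in sequence)
    return mean, math.sqrt(variance)
```

With all variances set to zero the function reduces to a single absolute estimate of the kind quoted above. With non-zero variances it gives the statistical prediction, at the price of the degraded accuracy already mentioned.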
CTA is first and foremost a means of making relationships explicit between approximate knowledge representations and cognitive limitations on their mental processing. The characteristics and limitations are specified in terms of the properties of mental codes; restricted capabilities for coordinating and controlling processes which handle those codes; and more specific limitations such as recency and description effects in memory retrieval. The approach essentially provides a language in which such constraints can be specified. The language refers to processes and coded mental representations which can be described in terms of their attributes. In its present form, only a limited range of attributes and constraints is actually utilised. They can, however, be added to as further analyses of user performance provide additional empirical justification for extending that range.
In my view, Activity theory still has a lot to give to HCI. The most important point is that the elementary unit of study in Activity theory is the action, and when we interact with a computer there are a lot of actions, so a framework in which action is the main object of study, as in Activity theory, will make the study of this field much easier. Another very important aspect is that Activity theory is not rigid but flexible: it makes it possible to move from one level to another in both directions. When we work with different users (and as far as I know no system is built to be run by one specific person), we have to be very careful, because it is not easy to fit all these users into one frame and start developing. That is why it is very important to have a flexible approach.
Nardi points out that the use of Activity theory framework implies
If we use a model under the frame of Activity theory then we will have a model that:
In my opinion, what AT can offer the field of HCI is still being worked out, and only through practice will we find out whether, working within this framework, the field will find a common and general approach to its research. The answer will only come with time and with use of the framework. It is an opportunity with a very wide set of possibilities, and I do not see any reason why we should not try new ways of doing research; in fact, that is what keeps science going.