Local Spatio-Temporal Image Features for Motion Interpretation

Ivan Laptev

PhD Thesis, defended on June 11, 2004, at the Computational Vision and Active Perception Laboratory (CVAP), NADA, KTH, Stockholm

Abstract

Visual motion carries information about the dynamics of a scene. Automatic interpretation of this information is important when designing computer systems for visual navigation, surveillance, human-computer interaction, browsing of video databases, and other emerging applications.

In this thesis, we address the issue of motion representation for the purpose of detecting and recognizing motion patterns in video sequences. We localize motion in space and time and propose to use local spatio-temporal image features as primitives for representing and recognizing motions. To detect such features, we propose to maximize a measure of local variation of the image function over space and time, and we show that this method detects meaningful events in image sequences. Due to its local nature, the method is largely unaffected by global variations in the scene and removes the need for spatial segmentation and tracking prior to motion recognition. These properties are shown to be highly useful when recognizing human actions in complex scenes.
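One way to make "a measure of local variation of the image function over space and time" concrete is the Harris-type measure from the related "Space-time interest points" publication: H = det(mu) - k * trace(mu)^3, where mu is the 3x3 spatio-temporal second-moment matrix of the smoothed gradients. The NumPy sketch below assumes that measure and is not the thesis's implementation. The function names, the default scales sigma, tau and s, the value k = 0.005, rel_threshold and the synthetic test blob are all illustrative assumptions, and scale normalization and scale/velocity adaptation are omitted.

```python
import numpy as np


def gaussian_kernel_1d(std):
    """Sampled 1-D Gaussian, truncated at about 3 standard deviations, summing to 1."""
    radius = max(1, int(np.ceil(3.0 * std)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / std) ** 2)
    return kernel / kernel.sum()


def _convolve_same(signal, kernel):
    """1-D convolution with edge padding so the output keeps the input length."""
    r = len(kernel) // 2
    return np.convolve(np.pad(signal, r, mode="edge"), kernel, mode="valid")


def smooth(volume, spatial_std, temporal_std):
    """Separable Gaussian smoothing of a (t, y, x) volume."""
    out = np.asarray(volume, dtype=float)
    for axis, std in ((0, temporal_std), (1, spatial_std), (2, spatial_std)):
        out = np.apply_along_axis(_convolve_same, axis, out, gaussian_kernel_1d(std))
    return out


def harris3d_response(volume, sigma=1.5, tau=1.5, s=2.0, k=0.005):
    """Assumed measure H = det(mu) - k * trace(mu)**3 (hypothetical defaults).

    sigma and tau are spatial and temporal standard deviations of the local
    scale; the integration scale uses variances s * sigma**2 and s * tau**2.
    """
    L = smooth(volume, sigma, tau)
    Lt, Ly, Lx = np.gradient(L)  # derivatives along axes (t, y, x)
    si, ti = np.sqrt(s) * sigma, np.sqrt(s) * tau
    a = smooth(Lx * Lx, si, ti)  # mu_xx
    b = smooth(Ly * Ly, si, ti)  # mu_yy
    c = smooth(Lt * Lt, si, ti)  # mu_tt
    d = smooth(Lx * Ly, si, ti)  # mu_xy
    e = smooth(Lx * Lt, si, ti)  # mu_xt
    f = smooth(Ly * Lt, si, ti)  # mu_yt
    # Determinant of the symmetric matrix [[a, d, e], [d, b, f], [e, f, c]].
    det = a * b * c + 2 * d * e * f - a * f**2 - b * e**2 - c * d**2
    trace = a + b + c
    return det - k * trace**3


def detect_interest_points(volume, rel_threshold=0.1, **scales):
    """Return (t, y, x, H) for 3x3x3 local maxima of H above rel_threshold * max(H)."""
    H = harris3d_response(volume, **scales)
    T, Y, X = H.shape
    padded = np.pad(H, 1, mode="constant", constant_values=-np.inf)
    neighborhood_max = np.full(H.shape, -np.inf)
    for dt in range(3):
        for dy in range(3):
            for dx in range(3):
                neighborhood_max = np.maximum(
                    neighborhood_max, padded[dt:dt + T, dy:dy + Y, dx:dx + X])
    mask = (H >= neighborhood_max) & (H > rel_threshold * H.max())
    points = [(int(t), int(y), int(x), float(H[t, y, x])) for t, y, x in np.argwhere(mask)]
    return sorted(points, key=lambda p: p[3], reverse=True)


# Hypothetical test input: an isotropic Gaussian "event" centred at (t, y, x) = (16, 16, 16).
t, y, x = np.mgrid[0:32, 0:32, 0:32]
blob = np.exp(-((t - 16) ** 2 + (y - 16) ** 2 + (x - 16) ** 2) / (2 * 2.0 ** 2))
points = detect_interest_points(blob)
print(points[0][:3])
```

Because H rewards high variation along all three directions at once, a structure that is localized in both space and time, such as the blob above, produces a strong response. A pattern that varies only in space, such as a static corner, or only in time, such as a uniform flicker, is suppressed.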

Variations in scale and in the relative motion of the camera may strongly influence the structure of image sequences and therefore the performance of recognition schemes. To address this problem, we develop a theory of local spatio-temporal adaptation and show that this approach provides invariance when analyzing image sequences under scaling and velocity transformations. To obtain discriminative representations of motion patterns, we also develop several types of motion descriptors and use them for classifying and matching local features in image sequences. We evaluate this approach extensively and present results for the problem of human action recognition.
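The invariance claim can be illustrated with a standard covariance property of Gaussian scale space. This is a sketch of the underlying idea, not the thesis's own derivation, and the notation (S, G, Sigma, s_t, v_x, v_y) is introduced here for illustration. The property says that if the smoothing kernel's covariance matrix is transformed together with the image, the smoothed values correspond point by point. A detector whose covariance is adapted locally, by estimating the scale and velocity parameters at each feature, therefore sees the same local structure before and after the transformation.

```latex
% Spatio-temporal point p = (x, y, t)^T; linear transformations p' = A p:
\[
  S = \begin{pmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & s_t \end{pmatrix}
  \quad\text{(scaling)},\qquad
  G = \begin{pmatrix} 1 & 0 & v_x \\ 0 & 1 & v_y \\ 0 & 0 & 1 \end{pmatrix}
  \quad\text{(Galilean motion: } x' = x + v_x t,\ y' = y + v_y t\text{)}.
\]
% Gaussian scale space with a full 3x3 covariance matrix \Sigma:
\[
  L(\cdot;\Sigma) = g(\cdot;\Sigma) * f,\qquad
  g(p;\Sigma) = \frac{1}{(2\pi)^{3/2}\sqrt{\det\Sigma}}\,
  \exp\!\left(-\tfrac{1}{2}\, p^{T}\Sigma^{-1}p\right).
\]
% If the transformed sequence satisfies f'(Ap) = f(p) for A = S or A = G, then
\[
  L'\!\left(Ap;\, A\Sigma A^{T}\right) = L(p;\Sigma).
\]
```

The identity follows from substituting q = Ap in the convolution integral. The Jacobian factor |det A| cancels against the normalization of g(.; A Sigma A^T). A fixed, non-adapted Sigma generally breaks this correspondence, which is why the kernel shape must follow the local scale and velocity.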

In summary, this thesis provides the following contributions: (i) it introduces the notion of local features in space-time and demonstrates the successful application of such features for motion interpretation; (ii) it presents a theory and an evaluation of methods for local adaptation with respect to scale and velocity transformations in image sequences; and (iii) it presents and evaluates a set of local motion descriptors which, in combination with methods for feature detection and feature adaptation, allow robust recognition of human actions in complex scenes with cluttered and non-stationary backgrounds as well as camera motion.

PDF (8 MB)

PowerPoint presentation (11 MB), including video demonstrations (36 MB)

Related projects: Recognition of human actions

Related publications: Local descriptors for spatio-temporal recognition; Velocity adaptation of space-time interest points; Recognizing Human Actions: A Local SVM Approach; Space-time interest points; Interest point detection and scale selection in space-time; Velocity-adaptation of spatio-temporal receptive fields for direct recognition of activities: An experimental study; Monograph on scale-space theory
