HEDVIG KJELLSTRÖM (Sidenbladh)

Research

PhD Students
Martin Hjelm (co-supervisor)
Alessandro Pieropan
Cheng Zhang
Javier Romero (co-supervisor, PhD 2011, now at MPI Tübingen, Germany)

MSc Students
Saad Ullah Akram
Sina Nakhostin (guest from Örebro U)
Akshaya Thippur
Sara Mansouri (MSc 2011, guest from Chalmers U of Technology)
Cheng Zhang (MSc 2011)
Nataliya Shapovalova (MSc 2010, guest from U Bourgogne, France)
Simon Bos (MSc 2008, guest from U Rennes, France)
Anette Larsson (MSc 2007)
Josef Grahn (MSc 2005)
Matthieu Bray (MSc 2001, guest from U Blaise Pascal, France)

BSc Students
Björn Hegstam (BSc 2011)
Joakim Hugmark (BSc 2011)
Oliver Schneider (BSc 2010, guest from U Karlsruhe, Germany)

Research Engineers
Sriram Elango
Joakim Hugmark

TOMSY - Topology based motion synthesis for dexterous manipulation (EU FP7, 2011-present)

The aim of TOMSY is to enable a generational leap in the techniques and scalability of motion synthesis algorithms. We propose to do this by learning and exploiting appropriate topological representations and testing them on challenging domains of flexible, multi-object manipulation and close-contact robot control and computer animation. Traditional motion planning algorithms have struggled to cope with both the dimensionality of the state and action space and the generalisability of solutions in such domains. This proposal builds on existing geometric notions of topological metrics and uses data-driven methods to discover multi-scale mappings that capture key invariances, blending between symbolic, discrete and continuous latent space representations. We will develop methods for sensing, planning and control using such representations.

Joint work with Carl Henrik Ek, Martin Hjelm, Danica Kragic, Alessandro Pieropan, Florian Pokorny and Sethu Vijayakumar.

Project home page

Robust image-based recognition of sign language signs (KTH, 2011-present)

Automatic recognition of sign language is a research area pertaining to several different areas in computer science, such as computer vision and language technology. Sign language technology research has attracted a lot of attention recently, with a potential to dramatically improve accessibility in society much in the same way as speech technology has in recent years.
The goal of this project is a method for automatic visual recognition of Swedish sign language. The method will build on a hand tracking method developed within the PACO-PLUS project (below), and will use data from video and/or 3D sensor input (e.g. Microsoft Kinect).

Joint work with Jonas Beskow.

Example of training data

Gesture-based violin synthesis (KTH, 2011-present)

There are many commercial applications of synthesized music from acoustic instruments, e.g. generation of orchestral sound from sheet music. Whereas the sound generation process of some types of instruments, like the piano, is fairly well known, the sound of a violin has proven extremely difficult to synthesize. The reason is that the underlying process is highly complex: the art of violin playing involves extremely fast and precise motion, with timing on the order of milliseconds.
We believe that ideas from machine learning can be employed to build better violin sound synthesizers. The task of this project is to use learning methods to create a generative model of violin sound from sheet music, using an intermediate representation of the kinematic system (violin and bow) generating the sound. To train the generative model, a database with motion capture of bowing will be used, containing a large set of bowing examples performed by six professional violinists.

Joint work with Anders Askenfelt.

Example of motion capture data
Parameters extracted from this motion capture data

HumanAct - Visual and multi-modal learning of human activity and interaction with the surrounding scene (VR, 2010-present)

The overwhelming majority of human activities are interactive in the sense that they relate to the world around the human (in Computer Vision called the "scene"). Despite this, visual analyses of human activity very rarely take scene context into account. The objective in this project is modeling of human activity with object and scene context.
The methods developed within the project will be applied to the task of Learning from Demonstration, where a (household) robot learns how to perform a task (e.g. preparing a dish) by watching a human perform the same task.

For more information, see the book chapter (Kjellström, 2011) and the papers (Kjellström et al., 2011) and (Kjellström et al., 2010) in the publication list.

Joint work with Michael Black, Alessandro Pieropan and Cheng Zhang.

Videos from (Kjellström et al., 2010)

PACO-PLUS - Perception, action and cognition through learning of object-action complexes (EU FP6, 2007-2010)

The EU project PACO-PLUS brings together an interdisciplinary research team to design and build cognitive robots capable of developing perceptual, behavioural and cognitive categories that can be used, communicated and shared with other humans and artificial agents. In my part of the project, we are interested in programming by demonstration applications, in which a robot learns how to perform a task by watching a human do the same task. This involves learning about the scene, objects in the scene, and actions performed on those objects. It also involves learning grammatical structures of actions and objects involved in a task.

For more information, see papers (Kjellström et al., 2011), (Kjellström et al., 2008b), (Romero et al., 2010) and (Sanmohan et al., 2011) in the publication list.

Joint work with Tamim Asfour, Jan-Olof Eklundh, Danica Kragic, Volker Krüger and Javier Romero.

Project home page
Video from (Romero et al., 2010)

ARTUR - A multi-modal articulation tutor (VR, 2004-2006)

The intended outcome of the project is a system that simplifies articulation training for hearing-impaired or second language learners by providing both acoustic and articulatory feedback. This involves two research problems spanning speech technology, computer vision and human-computer interaction. Firstly, methods to reconstruct the motion of the face, lips and vocal tract from speech and video of the face have to be developed. Secondly, presentation strategies have to be developed and tested. The feedback to the user will consist of the reconstructed face, lips and vocal tract, along with explanations of what was correct and what should be changed in the articulation.

For more information, see papers (Kjellström and Engwall, 2009) and (Engwall et al., 2006b) in the publication list.

Joint work with Olle Bälter and Olov Engwall.

Project home page

Detection of humans in images (FOI, 2003-2006)

The scope of this work is methods for detection of humans in video. This encompasses human motion detection for automatic visual surveillance, and fast detection of pedestrians from IR imagery for car safety applications, in collaboration with industry.

For more information, see the paper (Sidenbladh, 2004) in the publication list.

Joint work with Jörgen Karlholm.

Monte Carlo methods for information fusion (FOI, 2002-2005)

In this work, particle filter methods for tracking of a changing and unknown number of objects were developed. One application is to track multiple vehicles or units in terrain, observed by aerial vehicles, ground sensor networks and humans on the ground.

For more information, see papers (Sidenbladh, 2003) and (Sidenbladh and Wirkander, 2003) in the publication list.
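
As a rough illustration of the particle filter idea mentioned above, the following Python sketch tracks a single object in one dimension. It is not the method from the cited papers, which handle a changing and unknown number of objects. The function name, the random-walk motion model, the Gaussian observation model and all parameter values are assumptions made for this example only.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              motion_std=1.0, obs_std=2.0, seed=0):
    """Track one 1D position from noisy observations (illustrative sketch)."""
    rng = random.Random(seed)
    # Initialise particles around the first observation (assumed prior).
    particles = [rng.gauss(observations[0], obs_std) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: propagate each particle through a random-walk motion model.
        particles = [p + rng.gauss(0.0, motion_std) for p in particles]
        # Update: weight each particle by the Gaussian likelihood of z.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: posterior mean under the weighted particle set.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Synthetic example: an object drifting at constant speed, observed with noise.
true_path = [0.5 * t for t in range(30)]
noise = random.Random(1)
observed = [x + noise.gauss(0.0, 2.0) for x in true_path]
estimates = bootstrap_particle_filter(observed)
```

Each iteration follows the usual predict, weight, estimate and resample cycle. Extending such a filter to an unknown and changing number of targets requires additional machinery that this sketch does not attempt.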

We also developed methods for comparison of situation pictures of different granularity or at different points in time. One application is to obtain a measure of system reliability: if the situation pictures obtained from two independent systems (e.g. trackers) differ greatly, one or both systems are wrong or have too little information. This can be used to give a human operator of the system information about the reliability of the situation pictures.

For more information, see papers (Sidenbladh et al., 2005) and (Sidenbladh et al., 2004) in the publication list.

The methods were part of a larger data fusion system, intended to fuse data presented to operators of a military command and control system. Here, fusing means combining data statistically to reduce the amount of data presented to the operator, while at the same time enhancing the quality of the presented data.

Joint work with Johan Schubert, Pontus Svenson and Sven-Lennart Wirkander.

Videos from (Sidenbladh, 2003)

3D reconstruction of human motion (SSF, 1997-2001)

The subject of my PhD work was Bayesian methods (e.g. particle filtering) for tracking and reconstruction of human motion in 3D from image sequences. Possible applications are video-based human motion capture and recognition of human actions and gestures. Of course this is a highly underdetermined problem. The depth information that was lost in the projection of the human onto the camera plane has to be inferred from other data. Therefore, models of how humans normally move and appear were learned from training data (images and motion capture data), and used to infer 3D motion from the 2D image sequence.

For more information, see the PhD thesis or papers (Sidenbladh and Black, 2003), (Sidenbladh et al., 2002) and (Sidenbladh et al., 2000b) in the publication list.

Joint work with Michael Black, Fernando De la Torre, Jan-Olof Eklundh and David Fleet.

Image data and example code from the thesis
Video tracking examples from the thesis