Research
PhD Students
Martin Hjelm (co-supervisor)
Alessandro Pieropan
Cheng Zhang
Javier Romero (co-supervisor, PhD 2011, now at MPI Tübingen, Germany)
MSc Students
Saad Ullah Akram
Sina Nakhostin (guest from Örebro U)
Akshaya Thippur
Sara Mansouri (MSc 2011, guest from Chalmers U of Technology)
Cheng Zhang (MSc 2011)
Nataliya Shapovalova (MSc 2010, guest from U Bourgogne, France)
Simon Bos (MSc 2008, guest from U Rennes, France)
Anette Larsson (MSc 2007)
Josef Grahn (MSc 2005)
Matthieu Bray (MSc 2001, guest from U Blaise Pascal, France)
BSc Students
Björn Hegstam (BSc 2011)
Joakim Hugmark (BSc 2011)
Oliver Schneider (BSc 2010, guest from U Karlsruhe, Germany)
Research Engineers
Sriram Elango
Joakim Hugmark
TOMSY - Topology based motion synthesis for dexterous manipulation (EU FP7, 2011-present)
The aim of TOMSY is to enable a generational leap in the techniques and scalability of motion synthesis algorithms. We propose to do this by learning and exploiting appropriate topological representations and testing them on challenging domains of flexible, multi-object manipulation and close-contact robot control and computer animation. Traditional motion planning algorithms have struggled to cope with both the dimensionality of the state and action space and the generalisability of solutions in such domains. This proposal builds on existing geometric notions of topological metrics and uses data-driven methods to discover multi-scale mappings that capture key invariances, blending between symbolic, discrete and continuous latent space representations. We will develop methods for sensing, planning and control using such representations.
Joint work with Carl Henrik Ek, Martin Hjelm, Danica Kragic, Alessandro Pieropan, Florian Pokorny and Sethu Vijayakumar.
Project home page
Robust image-based recognition of sign language signs (KTH, 2011-present)
Automatic recognition of sign language is a research area that spans several fields of computer science, such as computer vision and language technology. Sign language technology research has attracted a lot of attention recently, with the potential to dramatically improve accessibility in society, much in the same way as speech technology has in recent years.
The goal of this project is a method for automatic visual recognition of Swedish sign language. The method will build on a hand tracking method developed within the PACO-PLUS project (below), and will use data from video and/or 3D sensor input (e.g. Microsoft Kinect).
Joint work with Jonas Beskow.
Example of training data
Gesture-based violin synthesis (KTH, 2011-present)
There are many commercial applications of synthesized music from acoustic instruments, e.g. generation of orchestral sound from sheet music. Whereas the sound generation process of some types of instruments, like the piano, is fairly well known, the sound of a violin has proven extremely difficult to synthesize. The reason is that the underlying process is highly complex: the art of violin-playing involves extremely fast and precise motion with timing on the order of milliseconds.
We believe that ideas from Machine Learning can be employed to build better violin sound synthesizers. The task of this project is to use learning methods to create a generative model of violin sound from sheet music, using an intermediate representation of the kinematic system (violin and bow) generating the sound. To train the generative model, a database with motion capture of bowing will be used, containing a large set of bowing examples, performed by 6 professional violinists.
Joint work with Anders Askenfelt.
Example of motion capture data
Parameters extracted from this motion capture data
HumanAct - Visual and multi-modal learning of human activity and interaction with the surrounding scene (VR, 2010-present)
The overwhelming majority of human activities are interactive in the sense that they relate to the world around the human (in Computer Vision called the "scene"). Despite this, visual analyses of human activity very rarely take scene context into account. The objective of this project is to model human activity together with object and scene context.
PACO-PLUS - Perception, action and cognition through learning of object-action complexes (EU FP6, 2007-2010)
The EU project PACO-PLUS brings together an interdisciplinary research team to design and build cognitive robots capable of developing perceptual, behavioural and cognitive categories that can be used, communicated and shared with other humans and artificial agents. In my part of the project, we are interested in programming by demonstration applications, in which a robot learns how to perform a task by watching a human do the same task. This involves learning about the scene, objects in the scene, and actions performed on those objects. It also involves learning grammatical structures of actions and objects involved in a task.
For more information, see papers (Kjellström et al., 2011), (Kjellström et al., 2008b), (Romero et al., 2010) and (Sanmohan et al., 2011) in the publication list.
Joint work with Tamim Asfour, Jan-Olof Eklundh, Danica Kragic, Volker Krüger and Javier Romero.
Project home page
Video from (Romero et al., 2010)
ARTUR - A multi-modal articulation tutor (VR, 2004-2006)
The intended outcome of the project is a system that simplifies articulation training for hearing-impaired or second-language learners by providing both acoustic and articulatory feedback. This involves two research problems spanning speech technology, computer vision and human-computer interaction. Firstly, methods to reconstruct the motion of the face, lips and vocal tract from speech and video of the face have to be developed. Secondly, presentation strategies have to be developed and tested. The feedback to the user will consist of the reconstructed face, lips and vocal tract, along with explanations of what was correct and what should be changed in the articulation.
For more information, see papers (Kjellström and Engwall, 2009) and (Engwall et al., 2006b) in the publication list.
Joint work with Olle Bälter and Olov Engwall.
Project home page
Detection of humans in images (FOI, 2003-2006)
This work covers methods for detection of humans in video. This encompasses human motion detection for automatic visual surveillance, and fast detection of pedestrians from IR imagery for car safety applications, in collaboration with industry.
For more information, see the paper (Sidenbladh, 2004) in the publication list.
Joint work with Jörgen Karlholm.
Monte Carlo methods for information fusion (FOI, 2002-2005)
In this work, particle filter methods for tracking a changing and unknown number of objects were developed. One application is tracking multiple vehicles or units in terrain, observed by aerial vehicles, ground sensor networks and humans on the ground.
For more information, see papers (Sidenbladh, 2003) and (Sidenbladh and Wirkander, 2003) in the publication list.
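For readers unfamiliar with particle filters, the basic predict, weight and resample cycle can be sketched as below. This is a minimal bootstrap filter for a single object moving in one dimension, not the multi-object method of the cited papers. The random-walk motion model, the Gaussian measurement model, and all function names and noise parameters are assumptions made for illustration.

```python
import math
import random


def gaussian_likelihood(z, x, sigma):
    """Unnormalised Gaussian likelihood of measurement z given state x."""
    return math.exp(-0.5 * ((z - x) / sigma) ** 2)


def particle_filter_step(particles, z, rng, motion_sigma=0.5, meas_sigma=1.0):
    """One predict/weight/resample step for a 1D position state.

    The noise levels are illustrative assumptions, not values from the papers.
    """
    # Predict: move each particle with random-walk motion noise.
    predicted = [p + rng.gauss(0.0, motion_sigma) for p in particles]
    # Weight: score each predicted particle against the new measurement.
    weights = [gaussian_likelihood(z, p, meas_sigma) for p in predicted]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def track(measurements, n_particles=500, seed=0):
    """Return the posterior mean position after each measurement."""
    rng = random.Random(seed)
    # Initialise particles uniformly over an assumed search interval.
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        particles = particle_filter_step(particles, z, rng)
        estimates.append(sum(particles) / len(particles))
    return estimates
```

Calling `track([3.1, 2.9, 3.0, 3.2, 2.8] * 4)` returns one estimate per measurement, and the estimates should settle near 3. Handling a changing and unknown number of objects, as in the work described here, requires extending the state with the object count, which this sketch does not attempt.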
We also developed methods for comparing situation pictures of different granularity or at different points in time. One application is to obtain a measure of system reliability: if the situation pictures obtained from two independent systems (e.g. trackers) differ greatly, one or both systems are wrong or have too little information. This can be used to give a human operator of the system information about the reliability of the situation pictures.
For more information, see papers (Sidenbladh et al., 2005) and (Sidenbladh et al., 2004) in the publication list.
The methods were part of a larger data fusion system, intended to fuse data presented to operators of a military command and control system. Here, fusing means combining data statistically to reduce the amount of data presented to the operator, while at the same time enhancing the quality of the presented data.
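As a rough illustration of how statistical combination can both reduce data volume and improve quality, the sketch below fuses several independent noisy reports of the same quantity into a single estimate by inverse-variance weighting. This is a textbook technique chosen for illustration only; the page does not say how the fusion system combined data, and the function name `fuse` and the (value, variance) report format are assumptions.

```python
def fuse(reports):
    """Fuse independent (value, variance) reports into one estimate.

    Uses inverse-variance weighting, so more certain reports count more.
    Assumes the reports are independent and all variances are positive.
    """
    if not reports:
        raise ValueError("need at least one report")
    weights = [1.0 / var for _, var in reports]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, reports)) / total
    # The fused variance is never larger than the smallest input variance.
    return value, 1.0 / total
```

For example, two reports `(10.0, 4.0)` and `(12.0, 4.0)` fuse to the value 11.0 with variance 2.0: the operator sees one item instead of two, and it is more certain than either input.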
Joint work with Johan Schubert, Pontus Svenson and Sven-Lennart Wirkander.
Videos from (Sidenbladh, 2003)
3D reconstruction of human motion (SSF, 1997-2001)