

Motivation and Related Work

For manipulation, it is usually necessary to accurately estimate the pose of the object, for example to align the robot arm with the object or to generate a feasible grasp and execute it. Using prior knowledge about the object, a dedicated representation can further increase the robustness of the tracking system. Along with commonly used CAD models (wire-frame models), view- and appearance-based representations may be employed [2].

A recent study of human visually guided grasps, in situations similar to those typically encountered in visual servoing control [9], has shown that the human visuo-motor system plans and controls the required movements using the three-dimensional geometric features of the target objects rather than their two-dimensional projected image. These computations are more complex than those typically carried out in visual servoing systems and permit humans to operate in a large range of environments.

Figure: Some of the objects we want the robot to manipulate.
[Image: obj.eps]

We have therefore decided to integrate both appearance-based and geometrical models to solve different steps of a manipulation task. Many similar systems use manual pose initialization, where the correspondence between the model and object features is given by the user (see [8] and [5]). Although there are systems where this step is performed automatically [7], [13], the proposed approaches are time consuming and not appealing for real-time applications. An additional problem, in our case, is that the objects to be manipulated by the robot are highly textured and therefore not suited for matching approaches based on, for example, line features [17], [11], [18].

Figure: The small image shows the training image used to estimate the nearest pose of the object for the current image. Left: the initial pose overlaid on the current image; right: the final pose obtained by the local refinement method.
[Image: ref1.eps]

After the object has been recognized and its position in the image is known, an appearance-based method is employed to estimate its initial pose. The method we have implemented was initially proposed in [15], where only three pose parameters were estimated and used to move a robotic arm to a predefined pose with respect to the object. In contrast to our approach, where the pose is expressed relative to the camera coordinate system, they express the pose relative to the current arm configuration, which makes their approach unsuitable for robots with a different number of degrees of freedom.
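To make this initialization step concrete, the following is a minimal sketch of appearance-based nearest-pose lookup: training views with known poses are stored as normalized intensity vectors, and the pose of the training view that best correlates with the current (segmented and rescaled) image region is returned as the initial estimate. This is an illustrative simplification, not the exact method of [15] or of our system; the class and function names and the plain correlation matching are assumptions made for the sketch.

import numpy as np

def normalize(patch):
    """Flatten an image patch into a zero-mean, unit-norm vector so the
    comparison is insensitive to global brightness and contrast."""
    v = patch.astype(np.float64).ravel()
    v -= v.mean()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

class NearestPoseEstimator:
    """Appearance-based initial pose estimation by nearest-neighbour
    lookup over stored training views (illustrative sketch only)."""

    def __init__(self):
        self.views = []   # normalized appearance vectors of training images
        self.poses = []   # corresponding pose parameters, e.g. (rx, ry, rz)

    def add_training_view(self, image, pose):
        self.views.append(normalize(image))
        self.poses.append(pose)

    def estimate(self, image):
        """Return the pose of the training view whose normalized
        correlation with the query region is highest."""
        query = normalize(image)
        scores = [float(np.dot(query, v)) for v in self.views]
        return self.poses[int(np.argmax(scores))]

# Hypothetical usage: all crops are assumed rescaled to a common size
# by an earlier recognition/segmentation stage.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    estimator = NearestPoseEstimator()
    for angle in range(0, 360, 10):     # coarse sampling of one rotation axis
        estimator.add_training_view(rng.random((32, 32)), (angle, 0.0, 0.0))
    # Query with a slightly perturbed copy of one stored view.
    query = estimator.views[5].reshape(32, 32) + 0.01 * rng.random((32, 32))
    print(estimator.estimate(query))    # expected: (50, 0.0, 0.0)

In practice, appearance-based systems of this kind typically compress the training views (for example with PCA/eigenspaces) rather than storing raw pixels; the brute-force correlation above is kept only for clarity.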

Compared to the system proposed in [19], where the network was trained entirely on simulated images, we use real images for training, with no particular background assumed. As pointed out in [19], the illumination conditions (as well as the background) strongly affect the performance of their system, and these effects cannot easily be reproduced with simulated images. In addition, the idea of projecting just the wire-frame model to obtain training images cannot be employed in our case because of the objects' texture. The system proposed in [17] also employs a feature-based approach, where lines, corners and circles are used to provide the initial pose estimate. However, this initialization approach is not applicable in our case since, due to the geometry and textural properties of our objects, these features are hard to find with high certainty.

