Vision systems, such as ``seeing'' robots, should be able to operate robustly in generic environments. In this thesis, we investigate certain aspects of how a systems approach to vision can meet this demand for robustness.
Firstly, we suggest that robustness can be improved by fusing the variety of information offered by the environment; we therefore investigate the effectiveness of using the coincidence of multiple cues. Secondly, we are concerned with the use of coarse algorithms. Even though the environment provides much information, it is neither necessary nor possible to extract all of it. We therefore show that coarse algorithms suffice for certain problems.
To investigate the effectiveness of using the coincidence of multiple cues, we perform a series of experiments on detecting planar surfaces in binocular images. These experiments are based on two schemes of a somewhat different character.
The first one is a hypothesis-and-test scheme that incorporates the cues in a certain order and hence, by design, imposes a ranking on them. The general idea is to use arbitrary cues exploiting local image data to get an idea of whether the model (a planar surface) is seen in the image and at which location it is found. If one or more cues strongly indicate a certain instance of a model, then this observation serves as a hypothesis, which other cues then test in order to support or reject it. Compared with the cues used for hypothesis generation, those used for hypothesis testing should be more reliable; they can also have a higher computational complexity, since they are only employed when needed.
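As a rough illustration of this control flow, the following Python sketch separates cheap hypothesis-generating cues from costlier hypothesis-testing cues. It is not the implementation used in the thesis: the names `Hypothesis`, `hypothesis_and_test`, and `min_score`, the rectangular region format, and the rule that every testing cue must support a hypothesis are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class Hypothesis:
    """A candidate instance of the model (hypothetical representation)."""
    region: Tuple[int, int, int, int]  # assumed (x, y, width, height)
    score: float                       # strength of the generating cue's response


# A generating cue proposes candidates from local image data.
GenerateCue = Callable[[Any], List[Hypothesis]]
# A testing cue supports (True) or rejects (False) one candidate.
TestCue = Callable[[Any, Hypothesis], bool]


def hypothesis_and_test(
    image: Any,
    generators: Sequence[GenerateCue],
    testers: Sequence[TestCue],
    min_score: float = 0.5,  # assumed threshold for a "strong" indication
) -> List[Hypothesis]:
    """Return the hypotheses that survive testing."""
    accepted = []
    for generate in generators:
        for hyp in generate(image):
            if hyp.score < min_score:
                continue  # weak indications never become hypotheses
            # Testing cues run only here, so they may be expensive.
            # Assumption: a hypothesis is kept only if every tester supports it.
            if all(test(image, hyp) for test in testers):
                accepted.append(hyp)
    return accepted
```

The ordering of the cues is fixed by the structure of the loop, which is the ranking the scheme imposes by design: the testing cues are evaluated only for candidates that a generating cue has already flagged.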
The general idea of the second scheme is to first use a simple and quick cue exploiting local image data to get an idea of where in the image the model (a planar surface) could be found. After this initial localization step, all cues that can be computed are gathered and allowed to vote for the occurrence of the model in the hypothesized region. The initialization of this approach is a hypothesis-forming step, similar to that of the hypothesis-and-test approach. This step, though, is much weaker because it only indicates a region of the images in which to look. The approach allows direct fusion of incommensurable cues, such as intensity and surface orientation. Generally, it can be regarded as a less restrictive approach than the hypothesis-and-test approach.
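The voting step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure used in the thesis: the function name `vote_for_model`, the convention that a cue returns `None` when it cannot be computed, and the `quorum` fraction are all hypothetical.

```python
from typing import Any, Callable, Optional, Sequence

# Each cue inspects the hypothesized region and returns True (model present),
# False (model absent), or None (the cue could not be computed there).
Cue = Callable[[Any], Optional[bool]]


def vote_for_model(region: Any, cues: Sequence[Cue], quorum: float = 0.5) -> bool:
    """Decide whether the model occurs in `region` by counting cue votes."""
    votes = []
    for cue in cues:
        vote = cue(region)
        if vote is not None:  # only cues that can be computed get a vote
            votes.append(vote)
    if not votes:
        return False  # assumption: no computable cue means no detection
    # Assumed decision rule: accept if the fraction of supporting votes
    # reaches the quorum.
    return sum(votes) / len(votes) >= quorum
```

Because every cue is reduced to a vote before the cues are combined, measurements with unrelated units, such as intensity and surface orientation, never have to be compared directly.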
We propose that coarse algorithms may be motivated from a robustness and flexibility point of view. Our experiments demonstrate that there is support for this claim, at least for some tasks of relevance, such as finding planar surfaces or similar simple models.
The full version of the thesis is available as Technical Report ISRN KTH/NA/P-98/11-SE, from KTH, NADA, S-100 44 Stockholm, Sweden.