
Scale-Space Theory in Computer Vision

Tony Lindeberg

KTH (Royal Institute of Technology)
Stockholm, Sweden


We perceive objects in the world as having structures at both coarse and fine scales. A tree, for instance, may appear to have a roughly round or cylindrical shape when seen from a distance, even though it is built up from a large number of branches. At a closer look, individual leaves become visible, and we can observe that the leaves in turn have texture at an even finer scale.

The fact that objects in the world appear in different ways depending upon the scale of observation has important implications when analysing measured data, such as images, with automatic methods. A straightforward way of exemplifying this is to note that every operation on image data must be carried out on a window, whose size can range from a single point to the whole image. The type of information we can get from such an operation is largely determined by the relation between the size of the structures in the image and the size of the window. Hence, without prior knowledge about what we are looking for, there is no reason to favour any particular scale. We should therefore operate at all window sizes.

These insights are not completely new in computer vision. Multi-scale representations of images in terms of pyramids were already being developed around 1970. A main motivation then was to achieve computational efficiency by coarse-to-fine strategies. This approach was also supported by findings in neurophysiology about the primate visual system. However, it was soon discovered that relating structures from different levels in the multi-scale representation was far from trivial. Structures at coarse levels could sometimes not be assigned any direct interpretation, since they were hard to trace to finer scales. Despite considerable efforts to develop techniques for matching between scales, a theoretical foundation was missing.

In 1983, Witkin proposed that scale could be considered as a continuous parameter, thereby generalizing the existing notion of Gaussian pyramids. He noted the relation to the diffusion equation and hence found a well-founded way of relating image structures between different scales. Koenderink soon furthered the approach, which has been developed into what we now know as scale-space theory.
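To make the idea of a continuous scale parameter concrete, the following is a minimal sketch in Python, under stated assumptions. It builds a one-dimensional scale-space L(x; t) = g(x; t) * f(x) by convolving a signal f with Gaussian kernels of increasing variance t, which corresponds to solving the diffusion equation dL/dt = (1/2) d²L/dx² with f as the initial condition. The function names (`gaussian_kernel`, `smooth`, `count_extrema`), the truncation at four standard deviations, the mirrored boundary handling, and the test signal are all illustrative choices, not taken from the book. A sampled and truncated Gaussian is only one simple discretization and does not necessarily inherit every property of the continuous theory.

```python
import math


def gaussian_kernel(t, truncate=4.0):
    """Sampled Gaussian with variance t, truncated and normalized to sum to 1.

    Assumption: truncating at `truncate` standard deviations is accurate
    enough for illustration.
    """
    sigma = math.sqrt(t)
    radius = max(1, int(math.ceil(truncate * sigma)))
    weights = [math.exp(-(k * k) / (2.0 * t)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, t):
    """Return the signal at scale t (t = 0 gives back the original signal)."""
    if t <= 0:
        return list(signal)
    kernel = gaussian_kernel(t)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Mirror indices at the borders (an illustrative boundary choice).
            idx = (i + j - radius) % (2 * n)
            if idx >= n:
                idx = 2 * n - 1 - idx
            acc += w * signal[idx]
        out.append(acc)
    return out


def count_extrema(signal):
    """Count strict local maxima and minima in the interior of the signal."""
    count = 0
    for i in range(1, len(signal) - 1):
        left = signal[i] - signal[i - 1]
        right = signal[i + 1] - signal[i]
        if left * right < 0:
            count += 1
    return count


# Hypothetical test signal: a coarse oscillation plus fine-scale detail.
signal = [
    math.sin(2 * math.pi * i / 64) + 0.3 * math.sin(2 * math.pi * i / 4)
    for i in range(128)
]

# One smoothed copy of the signal per scale level t.
scale_space = {t: smooth(signal, t) for t in (0, 1, 4, 16)}

for t, level in scale_space.items():
    print(f"t = {t:2d}: {count_extrema(level)} local extrema")
```

In this sketch the fine-scale oscillation is suppressed as t grows while the coarse oscillation remains, so structures at a coarse scale can be traced back through the intermediate levels to the original signal.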

Since that work, we have seen the theory develop in many ways, and have also realized that it provides a framework for early visual computations of a more general nature. The aim of this book is to provide a coherent overview of this recently developed theory, and to make material that has previously existed only in the form of research papers available to a larger audience. The presentation provides an introduction to the general foundations of the theory and shows how it applies to essential problems in computer vision such as computation of image features and cues to surface shape. The subjects range from the mathematical foundation to practical computational techniques. The power of the methodology is illustrated by a rich set of examples.

I hope that this work can serve as a useful introduction, reference, and inspiration for fellow researchers in computer vision and related fields such as image processing, signal processing in general, photogrammetry, and medical image analysis. Whereas the book is mainly written in the form of a research monograph, the level of presentation has been adapted so that it can be used as a basis for advanced courses in these fields.

The presentation is organized in a logical bottom-up way, following the ordering of the processing modules in an imagined vision system. It is, however, not necessary to read the book in such a sequential manner. Several of the chapters are relatively self-contained, and it should be possible to read them independently. A guide to the reader describing the mutual dependencies is given in section 1.7 (page 22). I wish the reader a pleasant tour into this highly stimulating and challenging subject.

Stockholm, September 1993,
Tony Lindeberg