Machine Vision for Three-Dimensional Scenes

Edited by Herbert Freeman
CAIP Center, Rutgers University, Piscataway, New Jersey

ACADEMIC PRESS, INC.
Harcourt Brace Jovanovich, Publishers
Boston  San Diego  New York  London  Sydney  Tokyo  Toronto

This book is printed on acid-free paper.

Copyright © 1990 by Academic Press, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Figures 6-7 in "Segmentation and Analysis of Multi-Sensor Images" are from "Integrated Analysis of Thermal and Visual Images for Scene Interpretation," by N. Nandhakumar and J. K. Aggarwal, IEEE Trans. Pattern Analysis and Machine Intelligence, PAMI-10(4), 1988, pp. 469-481. © 1988 by IEEE.

Figures 11-15 in "Segmentation and Analysis of Multi-Sensor Images" are from "Integrated Modelling of Thermal and Visual Image Generation," by C. Oh, N. Nandhakumar, and J. K. Aggarwal, Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, June 4-8, 1989, San Diego, CA. © 1989 by IEEE.

Figures 2, 3, 5, and 10 in "A Framework for 3D Recognition" are from "Visual Recognition Using Concurrent and Layered Parameter Networks," by R. M. Bolle, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, June 4-8, 1989, San Diego, CA. © 1989 by IEEE.

ACADEMIC PRESS, INC., 1250 Sixth Avenue, San Diego, CA 92101
United Kingdom Edition published by ACADEMIC PRESS LIMITED, 24-28 Oval Road, London NW1 7DX

Library of Congress Cataloging-in-Publication Data
Machine vision for three-dimensional scenes / edited by Herbert Freeman.
p. cm.
Includes bibliographical references and index.
ISBN 0-12-266722-0 (alk. paper)
1. Computer vision. 2. Three-dimensional display systems. I. Freeman, Herbert.
TA1633.M3365 1990
006.37-dc20  90-36164 CIP

Printed in the United States of America
90 91 92 93  9 8 7 6 5 4 3 2 1

Preface

Since 1986, an annual workshop dealing with machine vision has been held in New Brunswick, New Jersey, under the auspices of the Center for Computer Aids for Industrial Productivity (CAIP) of Rutgers University. Some 80 persons, drawn approximately equally from industry and university research laboratories, have typically participated in the workshops, and they have come from all over the United States as well as from countries overseas.

The objective of the workshops has been to exchange information on the current state of the art, to identify the key obstacles to further progress, and generally to determine where machine vision stood and where it was going. Each workshop has had a particular theme. The first was entitled "Machine Vision - Algorithms, Architectures and Systems"; the second, "Machine Vision for Inspection and Measurement"; and the latest one, "Machine Vision - Acquiring and Interpreting the 3D Scene". All the workshops have been concerned with the solution of real industrial problems; and, although fairly long-term approaches were at times discussed, the overriding objective throughout was ultimately to obtain practical solutions to real problems.

Presented here is a collection of 14 articles that have emanated from the most recent workshop, held in April 1989. Emphasis was on image acquisition as well as on 3D scene interpretation.
Of the three articles dealing specifically with image sensing, one addresses the problem of segmentation of multi-sensor images, another is concerned with the placement of sensors so as to minimize occlusion, and a third describes the use of light striping to obtain range data. The problem of machine vision inspection is the subject of two other papers. One describes the current state of the LESTRADE project under development in Rutgers' CAIP Center, in which eye-tracking is utilized to train a vision system so as to learn from, and eventually emulate, the inspection capabilities of a human inspector.

More than half of the papers deal with images of three-dimensional scenes and the attendant problems of image understanding, including one paper that specifically addresses the problem of object motion. Also included are summaries of two stimulating panel discussions, one dealing with real-time range mapping and the other with the relationship between the developing technology and the marketplace.

The workshops are made possible through the generous support of the New Jersey Commission on Science and Technology as well as that of the CAIP Center's industrial members. This support is gratefully acknowledged. Thanks are also due to Ruye Wang and Shuang Chen, both graduate students in the CAIP Center, for their diligence in formatting the chapters in LaTeX and putting the manuscript into camera-ready form.

Herbert Freeman

Contributors

Numbers in parentheses indicate the pages on which authors' contributions begin.

J.K. Aggarwal (267), Computer and Vision Research Center, The University of Texas at Austin, Austin, TX 78712
Paul J. Besl (25), Computer Science Department, General Motors Research Laboratories, Warren, MI 48090-9055
Ruud M. Bolle (1), IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598
Andrea Califano (1), IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598
L. Richard Carley (381), Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213
Per-Erik Danielsson (347), Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden
M. De Groof (163), Department of Electrical Engineering, Katholieke Universiteit Leuven, Kardinal Mercierlaan 94, 30330 Heverlee, Belgium
Herbert Freeman (109, 219), CAIP Center, Rutgers University, P.O. Box 1390, Piscataway, NJ 08855-1390
W. Eric L. Grimson (73), Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
Andrew Gruss (381), School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
T.S. Huang (195), Coordinated Science Laboratory, University of Illinois, 1101 W. Springfield Avenue, Urbana, IL 61801
George Hung (219), Department of Biomedical Engineering, Rutgers University, P.O. Box 909, Piscataway, NJ 08854
Takeo Kanade (381), School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
Richard Mammone (219), CAIP Center, Rutgers University, P.O. Box 1390, Piscataway, NJ 08855-1390
G. Marchal (163), Department of Electrical Engineering, Katholieke Universiteit Leuven, Kardinal Mercierlaan 94, 30330 Heverlee, Belgium
A.N. Netravali (195), AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974
J. Nuyts (163), Department of Electrical Engineering, Katholieke Universiteit Leuven, Kardinal Mercierlaan 94, 30330 Heverlee, Belgium
A. Oosterlinck (163), Department of Electrical Engineering, Katholieke Universiteit Leuven, Kardinal Mercierlaan 94, 30330 Heverlee, Belgium
O. Seger (347), Department of Electrical Engineering, Linköping University, S-581 83 Linköping, Sweden
Albert Sicignano (243), Philips Laboratories, North American Philips Corporation, Briarcliff Manor, NY 10563
C. Smets (163), Department of Electrical Engineering, Katholieke Universiteit Leuven, Kardinal Mercierlaan 94, 30330 Heverlee, Belgium
P. Suetens (163), Department of Electrical Engineering, Katholieke Universiteit Leuven, Kardinal Mercierlaan 94, 30330 Heverlee, Belgium
Konstantinos Tarabanis (301), Computer Science Department, Columbia University, New York, NY 10027
Roger Y. Tsai (301), IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598
Arend van de Stadt (243), Philips CFT-Briarcliff, North American Philips Corporation, Briarcliff Manor, NY 10510
D. Vandermeulen (163), Department of Electrical Engineering, Katholieke Universiteit Leuven, Kardinal Mercierlaan 94, 30330 Heverlee, Belgium
Ruye Wang (109), CAIP Center, Rutgers University, P.O. Box 1390, Piscataway, NJ 08855-1390
Joseph Wilder (219, 341), CAIP Center, Rutgers University, P.O. Box 1390, Piscataway, NJ 08855-1390
Nello Zuech (399), Fairchild Weston, 3 Milton Dr., Yardley, PA 19067

A Framework for 3D Recognition

Ruud M. Bolle and Andrea Califano
Exploratory Computer Vision Group
IBM Thomas J. Watson Research Center

Abstract

This paper describes a modular and coherent approach to 3D object recognition intended to deal with objects drawn from a large visual world. We discuss the problems that arise when dealing with large object databases and propose solutions to these problems.

1 Introduction

The ultimate goal of the vision system currently under development at the IBM Thomas J. Watson Research Center is to recognize objects drawn from a large, cluttered visual world. The surfaces of the objects in this visual world are represented as collections of patches of planes and quadrics of revolution (spheres, cylinders, cones, etc.). As additional information, the object models contain information about 3D curves, i.e., surface intersections and occluding boundaries. A large percentage of man-made objects can be exhaustively described using this set of primitives [18]. Because the recognition paradigm is highly modular, enriching the primitive set is a straightforward operation; in fact, the vision system has already evolved from a stage in which the primitive set contained just planes and spheres to the variety of feature types of the current incarnation.

For an arbitrary input scene, it is a priori unknown which of the contained features are relevant and which are not. Therefore, an abundance of features is extracted in a highly modular and parallel fashion. That is, the parameters of the surfaces and the 3D curves present in the input data are extracted simultaneously. In concert, this information is used to arrive at a consistent global interpretation of the scene in terms of objects. This produces a highly homogeneous paradigm for recognition.

The input to the system is, for the moment, a depth map obtained from a laser range finder [?]. In the near future, we will incorporate other sources of sensory information, e.g., reflectance data. Systems such as the one described in [24] can be easily integrated thanks to the modularity and homogeneity of the paradigm.
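To make the primitive representation concrete, the following is a minimal sketch of how such a primitive set might be organized; all class and field names are this sketch's assumptions, not data structures from the paper:

```python
# A minimal sketch of the primitive set described above: surface patches
# (planes and quadrics of revolution) plus 3D curves. All class and field
# names are illustrative assumptions, not the authors' data structures.
from dataclasses import dataclass, field

@dataclass
class PlanePatch:
    normal: tuple      # unit surface normal (nx, ny, nz)
    distance: float    # signed distance of the plane from the origin

@dataclass
class QuadricOfRevolutionPatch:
    kind: str          # "sphere", "cylinder", or "cone"
    axis_point: tuple  # a point on the axis of revolution
    axis_dir: tuple    # unit vector along the axis
    radius: float      # radius (sphere, cylinder) or opening parameter (cone)

@dataclass
class Curve3D:
    kind: str          # "surface_intersection" or "occluding_boundary"
    points: list       # sampled 3D points along the curve

@dataclass
class ObjectModel:
    name: str
    patches: list = field(default_factory=list)  # plane/quadric patches
    curves: list = field(default_factory=list)   # Curve3D instances
```

Under a layout like this, enriching the primitive set would amount to adding one more patch type together with a procedure that extracts its parameters, which matches the modularity claimed above.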
The requirement to recognize a large number of objects immediately poses many problems, especially at the so-called higher-level processing stages. In the most important part of this paper, our solutions, and proposed solutions, to these problems are discussed. We touch upon, for example, parallel layered parameter transforms, the use of long-distance correlation, feature matching, multiple-resolution object modeling, and processing at multiple resolutions. But let us first introduce our paradigm for recognition.

2 The vision system

The system is intended to recognize complex 3D objects in cluttered environments, for example a bin of parts. We have proposed a homogeneous framework for recognition; Figure 1 gives an overview of this approach.

Figure 1: System architecture.

Recognition is structured as a hierarchy of layered and concurrent parameter transforms [3] for feature extraction. Features that are structurally independent, for instance planes and linear 3D edges, form concurrent paths of recognition. Features that depend upon other low-level features, for example boxes and planar patches, are placed in hierarchical layers within a path. Parameter transforms generate hypotheses about primitive shapes in the scene. Evidence for the various hypotheses is fused using constraint satisfaction networks [14], in a fashion motivated by work on connectionist networks [15][27]. The result is a highly parallel, modular system for visual recognition in which the search is controlled in the same fashion at every recognition stage, from low to high level.

The most important aspect of the approach is the homogeneity that allows different feature types (such as surfaces and curves) and potentially different input sources (range, reflectance, tactile data, ...) to be easily integrated. This homogeneity is obtained by introducing a generalized feature concept, which allows geometric knowledge to be treated uniformly at every level. Each feature type is defined by a parameterization and by relationships to other features. New feature types can be introduced by defining a procedure to compute their parametric representation from the input data or from lower-level features (parameter transforms) and by defining relationships (compatibility relationships) to other feature types. A global interpretation of the scene is arrived at through the fusion of these diverse sources of evidence. Note that this approach deviates significantly from classical "segmentation followed by interpretation" schemes (e.g., [16]): "hard" decisions are deferred until the later stages of processing.
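As a rough illustration of how a generalized feature type and evidence fusion could fit together, consider the following sketch; the names, the update rule, and the confidence bookkeeping are assumptions of this sketch rather than the authors' formulation:

```python
# Sketch of a generalized feature type: a parameter transform that
# generates hypotheses, plus compatibility relationships used to fuse
# evidence in a constraint-satisfaction style. Illustrative only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Hypothesis:
    feature_type: str
    params: tuple            # point in the feature's parameter space
    confidence: float        # evidence accumulated so far

@dataclass
class FeatureType:
    name: str
    transform: Callable      # lower-level hypotheses -> new Hypothesis list
    compatibility: Callable  # (Hypothesis, Hypothesis) -> float in [-1, 1]

def relax(hypotheses, compatibility, iterations=10, rate=0.1):
    """One crude constraint-satisfaction pass: each hypothesis's confidence
    is nudged by the compatibility-weighted confidence of the others, so
    mutually consistent hypotheses reinforce and inconsistent ones compete."""
    for _ in range(iterations):
        support = [sum(compatibility(h, other) * other.confidence
                       for other in hypotheses if other is not h)
                   for h in hypotheses]
        for h, s in zip(hypotheses, support):
            h.confidence = min(1.0, max(0.0, h.confidence + rate * s))
    return hypotheses
```

The relaxation loop mirrors the deferred decision-making described above: no hypothesis is accepted or rejected outright; compatible hypotheses accumulate confidence while inconsistent ones are gradually suppressed.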
3 Some details of the implementation

Robust geometric shape extraction is the basis for recognition. To extract the parameters of complex geometric entities, one would like to devise an M × M operator that computes some parametric description of the curves and surfaces. To avoid interference from nearby local features, the size M of the operator should be small, but a small operator makes estimates of higher-order properties of the curves and surfaces inaccurate. To resolve this conflict, we use the long-distance correlation between different windows on the same feature. For both curve and surface extraction, we examine a global neighborhood to extract the parameters of our primitive features, using a set of nearby windows. (The multiple window approach is described in detail in [10][11][12].)
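A toy version of the multiple-window idea for the simplest primitive, a plane, might look as follows; the synthetic depth map, window sizes, and positions are invented for illustration, and the pooled least-squares fit merely stands in for the correlation machinery of [10][11][12]:

```python
# Toy illustration of multi-window parameter extraction (not the authors'
# algorithm): pooling samples from several distant windows on the same
# surface makes a plane fit well-conditioned even though each individual
# window is small. Assumes a depth map z[row, col] over a regular grid.
import numpy as np

def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c through an Nx3 point set."""
    A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coeffs  # (a, b, c)

def window_points(depth, top, left, size):
    """Collect (x, y, z) samples from one size x size window of the map."""
    rows, cols = np.mgrid[top:top + size, left:left + size]
    return np.column_stack([cols.ravel(), rows.ravel(),
                            depth[top:top + size, left:left + size].ravel()])

# Synthetic depth map of a single plane, observed through three small,
# widely separated windows; jointly they pin down the plane parameters.
depth = np.fromfunction(lambda r, c: 0.1 * c - 0.05 * r + 2.0, (128, 128))
windows = [(10, 10), (15, 100), (100, 60)]
pooled = np.vstack([window_points(depth, r, c, 5) for r, c in windows])
print(fit_plane(pooled))  # ~ [0.1, -0.05, 2.0]
```

Each 5 × 5 window alone constrains the plane only over a tiny patch; pooling windows that lie far apart on the same surface is what stabilizes the higher-order estimates, which is the motivation given above for exploiting long-distance correlation.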
