
Interpretation of Visual Motion: A Computational Study PDF

146 Pages·1988·9.216 MB·English


to my parents

Muralidhara Subbarao
Department of Electrical Engineering
State University of New York at Stony Brook

Interpretation of Visual Motion: A Computational Study

Pitman, London
Morgan Kaufmann Publishers, Inc., San Mateo, California

PITMAN PUBLISHING
128 Long Acre, London WC2E 9AN

© Muralidhara Subbarao 1988

First published 1988

Available in the Western Hemisphere from MORGAN KAUFMANN PUBLISHERS, INC., 2929 Campus Drive, San Mateo, California 94403

ISSN 0268-7526

British Library Cataloguing in Publication Data
Subbarao, Muralidhara
Interpretation of visual motion: a computational study. (Research notes in artificial intelligence, ISSN 0268-7526).
1. Motion. Visual perception by man. Interpretation Applications of computer systems
I. Title II. Series
152.1'425'0285
ISBN 0-273-08792-4

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording and/or otherwise, without either the prior written permission of the Publishers or a licence permitting restricted copying in the United Kingdom issued by the Copyright Licencing Agency Ltd, 33-34 Alfred Place, London WC1E 7DP. This book may not be lent, resold, hired out or otherwise disposed of by way of trade in any form of binding or cover other than that in which it is published, without the prior consent of the publishers.

Reproduced and printed by photolithography in Great Britain by Biddles Ltd, Guildford

Preface

This book is based on my doctoral dissertation completed in 1986 at Computer Vision Laboratory, University of Maryland. The original thesis has been revised and updated in many respects. Also some new material has been added.

A changing scene produces a changing image or visual motion on the eye's retina. The human visual system is able to recover useful three-dimensional information about the scene from this two-dimensional visual motion.
This thesis is a study of this phenomenon from an information processing point of view. A computational theory is formulated for recovering the scene from monocular visual motion. This formulation deals with determining the local geometry and the rigid body motion of surfaces from spatio-temporal parameters of visual motion. Based on this formulation, a computational approach is presented. A notable characteristic of this approach is a uniform representation scheme and a unified algorithm which is flexible and extensible.

Visual motion, also referred to as optical flow in the literature, has been the topic of intense research activity in the last few years. An important outcome of this study is combining a large body of previous work to yield a coherent theoretical framework and a unified computational approach. Many of the results originally obtained by other researchers and by myself in the early stages of my research in this area have been rederived, sometimes by a simpler method, in the new framework. Thus, this thesis provides a theoretical and computational framework for future research on visual motion, both in human vision and machine vision areas.

I have made an attempt to make this book somewhat self-contained and suitable for reading by a non-specialist. The basic concepts behind the approach taken here and a summary of the results in this thesis are given in Chapter 1. The background material necessary to understand the later chapters is provided through a broad introduction to the field and review of previous literature in Chapter 2. Mathematical details of many results and implementation details of many computational algorithms are excluded from the main body of the book. They are included in the appendices as they are useful in computer vision applications.

Stony Brook, N.Y.
January, 1988.
M. S.

Acknowledgements

It is a delight to thank here those people who made this thesis possible, and this period of my life enjoyable and exciting.

Dr. Allen Waxman, who has influenced every aspect of this work, both in spirit and in content. His critical and insightful comments at every stage of this research have been invaluable.

Dr. Larry Davis, who has been a marvelous adviser to me all through my graduate studies. He and Dr. Waxman have given me immense freedom and have been extremely encouraging and supportive of the paths I have taken.

Dr. Azriel Rosenfeld, for providing me the unique opportunity to pursue my research in the Computer Vision Laboratory. His comments on my research have contributed greatly towards improving it. It is a pleasure to express my deep gratitude to him.

Dr. Behrooz Kamgar-Parsi, Prof. Ken-ichi Kanatani (of Gunma University, Japan), and the anonymous referees of my research papers have offered their valuable comments at various stages of this work. It is a delight to express my thanks to them all.

I express my heartfelt thanks to my friends in Maryland, Boston, and Stony Brook for good friendship and great times! Shah Ashiquzzaman, Roger Eastman, David Harwood, Venugopal Iyengar, Donna Marie, Stephen Omohundro, Raja Sekar, Yogendra Simha, Babu Srinivasan, Kambhampati Subbarao, Kwangyoen Wohn, and many more. I have learnt a great deal from these people and have enjoyed their friendship immensely.

A significant part of this research was done at the Computer Vision Laboratory, University of Maryland. The staff of the laboratory has been extremely helpful to me in all administrative matters. My sincere thanks to all members of the staff. The remaining part of this research was done at Thinking Machines Corporation, Cambridge, Massachusetts, where excellent computing and office facilities were made available to me. I thank the concerned people for this and for their encouragement in my graduate research.

This book was prepared using the troff¹ text processing facility. Most of the diagrams in here were drawn using the PED program written by Prof. Theo Pavlidis.
Arun Simha helped in proofreading.

This research was funded by the Defense Advanced Research Projects Agency and the United States Army Night Vision and Electro-Optics Laboratory under Contract DAAK70-83-K-0018 (DARPA Order 3206). I gratefully acknowledge this support.

Most importantly, my mother Sharada and father Subbarao, who are my model of ideal parents. My sisters Krishnaveni and Manjula, and brothers Sreesha and Jagannatha, for their love and their support all along.

¹ troff is a trade mark of Bell Laboratories.

1 Introduction

"One of the principal objects of theoretical research in any department of knowledge is to find the point of view from which the subject appears in its greatest simplicity".
- J.W. Gibbs

1.1. Problem description

Humans effortlessly perceive the shape and motion of unfamiliar objects from their changing images. From a purely theoretical point of view this ability of the human visual system has intrigued perception psychologists for decades, and more recently computer vision scientists, for two reasons. First, valuable information is lost due to the projection of the three-dimensional scene onto the two-dimensional retina, and second, light as an intermediary of information transmission introduces certain ambiguities (e.g. the aperture problem discussed later) into the interpretation process. A primary goal of this research is to understand this phenomenon at the level of its computational theory (Marr, 1982).

A computational study of this phenomenon is usually carried out in two stages. First is the measurement of visual motion. This involves computing the motion of image elements from the changing intensity pattern on the eye's retina. The second stage is the interpretation of this visual motion, i.e., to infer the three-dimensional shape and motion of objects given the visual motion.
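The loss of information under projection mentioned above can be illustrated with a small numerical sketch. It assumes a simple pinhole (perspective) camera model, which the text has not specified at this point; the function name `project`, the focal length, and the sample points are all hypothetical choices made for illustration. The sketch shows that a scene and a copy of it that is three times larger and three times farther away produce identical images, so a single image cannot fix absolute depth.

```python
import numpy as np

def project(points, focal_length=1.0):
    """Perspective projection of N x 3 points (with Z > 0) onto the image plane.

    Assumed pinhole model: (X, Y, Z) maps to (f * X / Z, f * Y / Z).
    """
    points = np.asarray(points, dtype=float)
    return focal_length * points[:, :2] / points[:, 2:3]

# Two hypothetical scene points given as (X, Y, Z), with Z the depth from the viewer.
scene = np.array([[1.0, 2.0, 4.0],
                  [-0.5, 0.3, 2.0]])

# The same scene scaled uniformly: three times larger and three times farther away.
scaled_scene = 3.0 * scene

image_a = project(scene)
image_b = project(scaled_scene)

# Both scenes yield the same image coordinates, so depth is lost in projection.
same_image = np.allclose(image_a, image_b)
```

Under this assumed model the scale factor cancels in the ratio X / Z, which is one reason additional cues, such as the change of the image over time, are needed to constrain three-dimensional structure.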
The first stage, the measurement of visual motion, has been intensively studied by a number of researchers (e.g., Horn and Schunck, 1981; Hildreth, 1983; Waxman and Wohn, 1985). This book is concerned with the second stage, the interpretation of visual motion. It extends the previous work of Longuet-Higgins and Prazdny (1980) and Waxman and Ullman (1985) in this area in many ways. A general formulation of the problem and algorithms for the interpretation process are presented.

1.2. Motivations for the study

An important goal of computer vision is to understand human vision from an information processing perspective. For this purpose, the task of vision can be divided into several stages, at least as a first approximation. Marr has suggested that the goal of the first stage of visual processing is to obtain descriptions of the physical properties of visible surfaces with respect to the viewer, properties such as distance, orientation, texture, and reflectance. This stage has been termed the 2½-D sketch and the processes involved are called early vision processes. This early stage of processing is primarily bottom-up, relying on general knowledge about the world, but not on special high-level information about the scene to be analyzed.

Computational studies and perceptual experiments (Marr and Poggio, 1977) suggest that early vision processes are generic ones that correspond to conceptually independent modules that can be studied, at least to a first order, in isolation. Examples of early vision processes are edge detection for finding sharp intensity changes, stereopsis for computing a depth-map from a stereo pair of images, shape from shading, shape from texture, measurement of visual motion and interpretation of visual motion. There is no proof yet that the paradigm for computational vision proposed by Marr and his collaborators is correct, but we adopt it in the belief that something similar should be true.
In this framework we contend that a rigorous and thorough analysis of the individual visual modules is fundamental to understanding vision as an information processing task. For this reason, this study focuses on one module, the visual motion module. Existence of this module as an independent process in the human visual system is demonstrated in many perceptual studies (Wallach and O'Connell, 1953; Johansson, 1973, 1975; Ullman, 1979; see Figure 1).

Another very important motivation for this research arises from its potential applications in machine vision systems. This work is directly relevant to autonomous land vehicle and aircraft navigation, robot manipulation of moving machine parts, and general machine vision systems.

Figure 1. Ullman's (1979) two cylinder experiment which demonstrates the perception of three-dimensional shape and motion from visual motion. Positions of about 100 points lying on the surfaces of two coaxial cylinders were stored in a computer's memory. The orthographic projections of these points on a frontal plane were computed and displayed on a CRT screen. (The outlines of the cylinders were not presented in the actual display.) Although the density of dots in the projected image increased at the edges of each cylinder, in the combined image of the two cylinders the dot pattern was complex and was ineffective in revealing the two cylinders. However when a changing projection of the dots was presented corresponding to a rotation of the two cylinders, the shape and motion of the cylinders were immediately perceived. Since each instantaneous view contained no shape information, visual motion alone was sufficient to recover both shape and motion of the cylinders.

1.3. The approach

This thesis is a computational study of the problem of visual motion interpretation. Before proceeding further we define the two key terms computational theory and computational approach. These two terms were originally elucidated by Marr.
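The display described in the Figure 1 caption can be sketched in a few lines of code. This is a minimal illustration and not a reconstruction of Ullman's actual stimulus: the cylinder radii, height, rotation rates, the split of 50 points per cylinder, and the helper names `sample_cylinder`, `rotate_about_y`, `orthographic` and `frame` are all assumptions introduced here. Only the overall recipe follows the caption: about 100 points on two coaxial cylinders, rotated and projected orthographically onto a frontal plane.

```python
import numpy as np

def sample_cylinder(radius, n_points, height, rng):
    """Random points on the surface of a cylinder whose axis is the Y axis."""
    angles = rng.uniform(0.0, 2.0 * np.pi, n_points)
    y = rng.uniform(-height / 2.0, height / 2.0, n_points)
    return np.column_stack([radius * np.cos(angles), y, radius * np.sin(angles)])

def rotate_about_y(points, angle):
    """Rotate N x 3 points by `angle` radians about the (shared) Y axis."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, 0.0, s],
                         [0.0, 1.0, 0.0],
                         [-s, 0.0, c]])
    return points @ rotation.T

def orthographic(points):
    """Orthographic projection onto the frontal (X, Y) plane: discard depth Z."""
    return points[:, :2]

rng = np.random.default_rng(0)
inner = sample_cylinder(radius=1.0, n_points=50, height=2.0, rng=rng)  # assumed sizes
outer = sample_cylinder(radius=2.0, n_points=50, height=2.0, rng=rng)

def frame(step, inner_rate=0.10, outer_rate=-0.05):
    """Dot positions for one display frame; the rotation rates are assumed values."""
    dots = np.vstack([rotate_about_y(inner, step * inner_rate),
                      rotate_about_y(outer, step * outer_rate)])
    return orthographic(dots)

# A short sequence of frames; each one alone is just a flat pattern of 100 dots.
frames = [frame(t) for t in range(20)]
```

Plotting any single entry of `frames` as a scatter of dots gives a static pattern of the kind the caption describes, while showing the entries in succession gives the changing projection. Because the projection is orthographic and the rotation is about the vertical axis, only the horizontal coordinate of each dot changes from frame to frame.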
