
Video Object Extraction and Representation: Theory and Applications PDF

182 Pages·2002·4.05 MB·English


THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE

VIDEO OBJECT EXTRACTION AND REPRESENTATION
Theory and Applications

by
I-JONG LIN, Hewlett-Packard Laboratories
S.Y. KUNG, Princeton University

KLUWER ACADEMIC PUBLISHERS
New York / Boston / Dordrecht / London / Moscow

eBook ISBN: 0-306-47037-3
Print ISBN: 0-792-37974-8

©2002 Kluwer Academic Publishers
New York, Boston, Dordrecht, London, Moscow

All rights reserved. No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher.

Created in the United States of America

Visit Kluwer Online at: http://www.kluweronline.com
and Kluwer's eBookstore at: http://www.ebooks.kluweronline.com

For my first mentor, Professor S.Y. Kung
I-Jong Lin

Contents

Preface
Explanation and Index of Outline Pages
Acknowledgments

1. INTRODUCTION TO CONTENT-BASED VISUAL PROCESSING
   1. Content-Based Information Processing
   2. Scope of Book
   3. Physical Object vs. Video Object
   4. Convergence of Technologies
   5. The MPEG Standards
   6. Specific Challenges of the MPEG Standards
   7. Research Contributions
   8. Book Outline
   9. Conclusion

2. EXISTING TECHNIQUES OF VISUAL PROCESSING
   1. Design Implications of Human Vision
   2. Image Segmentation
   3. Motion Estimation
   4. Knowledge Representation
   5. Dynamic Programming
   6. Compressed Domain
   7. Conclusion

3. VORONOI ORDERED SPACE
   1. Previous Work
   2. Notation
   3. Definition of Voronoi Ordered Space
   4. Properties of Voronoi Ordered Space
   5. Definition of the Voronoi Ordered Skeleton (VOS)
   6. Multiresolution Property of the VOS
   7. Applications
   8. Conclusion

4. A SYSTEM FOR VIDEO OBJECT SEGMENTATION
   1. Previous Work
   2. Problem Simplifications
   3. Problem Formulation via Surface Optimization
   4. The Bootstrap Stage
   5. Surface Optimization
   6. Results and Analysis
   7. Conclusion

5. ROBUST REPRESENTATION OF SHAPE WITH DAGS
   1. Previous Work
   2. Order, Mapping and DAGs
   3. Partitioning
   4. DAG-Coding
   5. Comparing DAG Representations
   6. Recursive Structure of DAGs
   7. Conclusion

6. A SYSTEM FOR IMAGE/VIDEO OBJECT QUERY BY SHAPE
   1. Previous Work
   2. Problem Description
   3. Shape Extraction via VOS
   4. VOS Representation via DAG-Ordered Trees (DOTs)
   5. System Design
   6. Results and Analysis
   7. Conclusion

7. THE FUTURE OF CONTENT-BASED VIDEO PROCESSING
   1. Universal Multimedia Access
   2. MPEG-4 and MPEG-7 System Synergy
   3. Intelligent Video Transport
   4. The “New” Media

Index

Preface

“If you have built castles in the air, your work need not be lost; that is where they should be. Now put the foundations under them.”
- Henry David Thoreau, Walden

Although engineering is a study entrenched firmly in a belief in pragmatism, I have always believed its impact need not be limited to pragmatism. Pragmatism is not the boundary that defines engineering, just the (sometimes unforgiving) rules by which we sight our goals.

This book studies two major problems of content-based video processing for a media-based technology: Video Object Plane (VOP) Extraction and Representation, in support of the MPEG-4 and MPEG-7 video standards, respectively. After reviewing relevant image and video processing techniques, we introduce the concept of Voronoi Ordered Spaces for both VOP extraction and representation to integrate shape information into low-level optimization algorithms and to derive robust shape descriptors, respectively. We implement a video object segmentation system with a novel surface optimization scheme that integrates Voronoi Ordered Spaces with existing techniques to balance visual information against predictions of models of a priori information.
With these VOPs, we have explicit forms of video objects that give users the ability to address and manipulate video content. We outline a general methodology of robust data representation and comparison through the concept of complex partitioning mapped onto Directed Acyclic Graphs (DAGs). We produce a novel, intuitively interfaced image/video query by shape system for our VOPs whose extraction is based upon Voronoi Ordered Space and whose representation and comparison algorithms are based on our work on DAGs. Within a media-based context, this Image/Video Query by Shape is an important functionality in the creation of a “new” media: an intuitive search over video content accessible both locally and over the Internet. In conclusion, we outline the future applications of content-based video processing and link these content-based processing systems synergistically in a proposed MPEG-4/7 hybrid system that uses high-level VOP representations to aid in the low-level VOP extraction.

In other words, my nearly five years at Princeton have been spent searching in the library, pounding code on the keyboard, reading papers, writing papers, or translating theory into code. This book is the readable distillation of that process, and it is meant to serve as a reference or guidepost to those who wish to pursue this area of research.

More broadly, those nearly five years have been spent on the aspects of human computation that we take for granted: generalization, recognition and segmentation. However, in deference to pragmatism and the coming MPEG-4/7 standards, the main topic of this book is the video object segmentation problem, viewed from an engineering perspective. Although our stated purpose is to support the MPEG-4/7 standards, I hope the connections of this work run deeper. For the human mind, the video object segmentation problem is a solved problem. It is one of the mind’s simple preprocessing steps before visual understanding.
In fact, it is such an inherent reflex of our visual processing that it must be connected to the structure of the mind and its processes. I hope the system, its algorithms, its analysis, and its results may shed some more light on these mysterious processes that allow us to make sense of this world visually.

What are Outline Pages?

Outline pages are:
- A synthesis of presentation slides from conferences and the Ph.D. dissertation
- Written for a general engineering audience
- A quick summary of a part of a chapter
- Meant to periodically familiarize and orient the reader to the chapter material
- Formatted like this page:
  - Full Page
  - Outline Box
  - Question as a title
  - Conclusion as an answer to the title

Outline pages are NOT:
- Formal
- Precise
- Implementation-Oriented

Index of Outline pages
pg. 5 Why are Video Objects Important?
pg. 8 What Technologies Do We Need?
pg. 14 How are MPEG Standards Evolving?
pg. 27 How do We Best Leverage Current Technologies?
pg. 49 How do we use Voronoi Order?
pg. 56 How do we use Voronoi Order Skeleton?
pg. 69 How are Video Object Segmentation and MPEG-4 Related?
pg. 72 How do we Extract Video Objects?
pg. 108 How to use DAGs for Robust Recognition?
pg. 110 What do we need to apply our DAG approach?
pg. 129 Why do we use Shape for Content Query?
pg. 138 How do we use DAG-Ordered Trees for Recognition?
pg. 160 What do we need for the “New” Media?

Outline Pages are a useful tool for orienting the reader within the text.

Acknowledgments

There are many people without whose support I could not have finished graduate school and the book writing that followed. First, I would like to thank Princeton University for providing an environment where academic and deep work can flourish. Karen Williams, Sheila Gunning, Stephanie Constanti, John Bittner, Eugene Conover and Jay Plett have provided me with support and advice.
Professors Wayne Wolf, Peter Ramadge, Ruby Lee, Sharad Malik, and Bede Liu have all contributed to my book with their insights and conversations. I would especially like to thank Prof. Ling Guan and Peter Ramadge for their conscientious and careful reading.

The cold hard cash helps also. I would like to thank Mitsubishi Advanced Television Lab for supporting my work in content-based video processing: Huifang Sun for blending their industrial efforts with my thesis work, Anthony Vetro for his expert help in deciphering and translating MPEG documents into English, and Ajay Divakaran for deciphering and translating MPEG-7 and advocating our efforts at the MPEG-7 meetings. Their enthusiasm and help made my work that much better.

Having a publisher is also a good thing. Kluwer Academic Publishers have graciously given me this opportunity for my thesis to be something more than three unread copies in a Princeton Library. Many thanks go to C. Anne Murray for getting the details correct and Jennifer Evans for guiding me through the publication process.

As I’ve noted in my previous incarnation at Stanford, what I remember most from a school is the students. The diversity and character of Princeton students create a creative and pleasant atmosphere of learning. First, in expressing my frustration with the thesis, I have played a goodly amount of basketball and I would like to thank the morning basketball crew. It’s hard to find a good pick-up basketball game
