
Multimedia information extraction: advances in video, audio, and imagery analysis for search, data mining, surveillance, and authoring

475 Pages·2012·5.636 MB·English


MULTIMEDIA INFORMATION EXTRACTION

Press Operating Committee

Chair
James W. Cortada, IBM Institute for Business Value

Board Members
Richard E. (Dick) Fairley, Founder and Principal Associate, Software Engineering Management Associates (SEMA)
Cecilia Metra, Associate Professor of Electronics, University of Bologna
Linda Shafer, former Director, Software Quality Institute, The University of Texas at Austin
Evan Butterfield, Director of Products and Services
Kate Guillemette, Product Development Editor, CS Press

IEEE Computer Society Publications

The world-renowned IEEE Computer Society publishes, promotes, and distributes a wide variety of authoritative computer science and engineering texts. These books are available from most retail outlets. Visit the CS Store at http://computer.org/store for a list of products.

IEEE Computer Society / Wiley Partnership

The IEEE Computer Society and Wiley partnership allows the CS Press authored book program to produce a number of exciting new titles in areas of computer science, computing and networking with a special focus on software engineering. IEEE Computer Society members continue to receive a 15% discount on these titles when purchased through Wiley or at wiley.com/ieeecs.

To submit questions about the program or send proposals please e-mail [email protected] or write to Books, IEEE Computer Society, 10662 Los Vaqueros Circle, Los Alamitos, CA 90720-1314. Telephone +1-714-816-2169.

Additional information regarding the Computer Society authored book program can also be accessed from our web site at http://computer.org/cspress.

MULTIMEDIA INFORMATION EXTRACTION
Advances in Video, Audio, and Imagery Analysis for Search, Data Mining, Surveillance, and Authoring

Edited by
MARK T. MAYBURY

A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2012 by IEEE Computer Society. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data:

Maybury, Mark T.
Multimedia information extraction : advances in video, audio, and imagery analysis for search, data mining, surveillance, and authoring / by Mark T. Maybury.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-118-11891-7 (hardback)
1. Data mining. 2. Metadata harvesting. 3. Computer files. I. Title.
QA76.9.D343M396 2012
006.3'12–dc23
2011037229

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

CONTENTS

FOREWORD
    Alan F. Smeaton
PREFACE
    Mark T. Maybury
ACKNOWLEDGMENTS
CONTRIBUTORS

1 INTRODUCTION
    Mark T. Maybury
2 MULTIMEDIA INFORMATION EXTRACTION: HISTORY AND STATE OF THE ART
    Mark T. Maybury

SECTION 1 IMAGE EXTRACTION

3 VISUAL FEATURE LOCALIZATION FOR DETECTING UNIQUE OBJECTS IN IMAGES
    Madirakshi Das, Alexander C. Loui, and Andrew C. Blose
4 ENTROPY-BASED ANALYSIS OF VISUAL AND GEOLOCATION CONCEPTS IN IMAGES
    Keiji Yanai, Hidetoshi Kawakubo, and Kobus Barnard
5 THE MEANING OF 3D SHAPE AND SOME TECHNIQUES TO EXTRACT IT
    Sven Havemann, Torsten Ullrich, and Dieter W. Fellner
6 A DATA-DRIVEN MEANINGFUL REPRESENTATION OF EMOTIONAL FACIAL EXPRESSIONS
    Nicolas Stoiber, Gaspard Breton, and Renaud Seguier

SECTION 2 VIDEO EXTRACTION

7 VISUAL SEMANTICS FOR REDUCING FALSE POSITIVES IN VIDEO SEARCH
    Rohini K. Srihari and Adrian Novischi
8 AUTOMATED ANALYSIS OF IDEOLOGICAL BIAS IN VIDEO
    Wei-Hao Lin and Alexander G. Hauptmann
9 MULTIMEDIA INFORMATION EXTRACTION IN A LIVE MULTILINGUAL NEWS MONITORING SYSTEM
    David D. Palmer, Marc B. Reichman, and Noah White
10 SEMANTIC MULTIMEDIA EXTRACTION USING AUDIO AND VIDEO
    Evelyne Tzoukermann, Geetu Ambwani, Amit Bagga, Leslie Chipman, Anthony R. Davis, Ryan Farrell, David Houghton, Oliver Jojic, Jan Neumann, Robert Rubinoff, Bageshree Shevade, and Hongzhong Zhou
11 ANALYSIS OF MULTIMODAL NATURAL LANGUAGE CONTENT IN BROADCAST VIDEO
    Prem Natarajan, Ehry MacRostie, Rohit Prasad, and Jonathan Watson
12 WEB-BASED MULTIMEDIA INFORMATION EXTRACTION BASED ON SOCIAL REDUNDANCY
    Jose San Pedro, Stefan Siersdorfer, Vaiva Kalnikaite, and Steve Whittaker
13 INFORMATION FUSION AND ANOMALY DETECTION WITH UNCALIBRATED CAMERAS IN VIDEO SURVEILLANCE
    Erhan Baki Ermis, Venkatesh Saligrama, and Pierre-Marc Jodoin

SECTION 3 AUDIO, GRAPHICS, AND BEHAVIOR EXTRACTION

14 AUTOMATIC DETECTION, INDEXING, AND RETRIEVAL OF MULTIPLE ATTRIBUTES FROM CROSS-LINGUAL MULTIMEDIA DATA
    Qian Hu, Fred J. Goodman, Stanley M. Boykin, Randall K. Fish, Warren R. Greiff, Stephen R. Jones, and Stephen R. Moore
15 INFORMATION GRAPHICS IN MULTIMODAL DOCUMENTS
    Sandra Carberry, Stephanie Elzer, Richard Burns, Peng Wu, Daniel Chester, and Seniz Demir
16 EXTRACTING INFORMATION FROM HUMAN BEHAVIOR
    Fabio Pianesi, Bruno Lepri, Nadia Mana, Alessandro Cappelletti, and Massimo Zancanaro

SECTION 4 AFFECT EXTRACTION FROM AUDIO AND IMAGERY

17 RETRIEVAL OF PARALINGUISTIC INFORMATION IN BROADCASTS
    Björn Schuller, Martin Wöllmer, Florian Eyben, and Gerhard Rigoll
18 AUDIENCE REACTIONS FOR INFORMATION EXTRACTION ABOUT PERSUASIVE LANGUAGE IN POLITICAL COMMUNICATION
    Marco Guerini, Carlo Strapparava, and Oliviero Stock
19 THE NEED FOR AFFECTIVE METADATA IN CONTENT-BASED RECOMMENDER SYSTEMS FOR IMAGES
    Marko Tkalčič, Jurij Tasič, and Andrej Košir
20 AFFECT-BASED INDEXING FOR MULTIMEDIA DATA
    Gareth J. F. Jones and Ching Hau Chan

SECTION 5 MULTIMEDIA ANNOTATION AND AUTHORING

21 MULTIMEDIA ANNOTATION, QUERYING, AND ANALYSIS IN ANVIL
    Michael Kipp
22 TOWARD FORMALIZATION OF DISPLAY GRAMMAR FOR INTERACTIVE MEDIA PRODUCTION WITH MULTIMEDIA INFORMATION EXTRACTION
    Robin Bargar
23 MEDIA AUTHORING WITH ONTOLOGICAL REASONING: USE CASE FOR MULTIMEDIA INFORMATION EXTRACTION
    Insook Choi
24 ANNOTATING SIGNIFICANT RELATIONS ON MULTIMEDIA WEB DOCUMENTS
    Matusala Addisu, Danilo Avola, Paola Bianchi, Paolo Bottoni, Stefano Levialdi, and Emanuele Panizzi

ABBREVIATIONS AND ACRONYMS
REFERENCES
INDEX

FOREWORD

I was delighted when I was asked to write a foreword for this book as, apart from the honor, it gives me the chance to stand back and think a bit more deeply about multimedia information extraction than I would normally do and also to get a sneak preview of the book.

One of the first things I did when preparing to write this was to dig out a copy of one of Mark T. Maybury’s previous edited books, Intelligent Multimedia Information Retrieval from 1997.[1] The bookshelves in my office don’t actually have many books anymore: a copy of Keith van Rijsbergen’s Information Retrieval from 1979 (well, he was my PhD supervisor!); Negroponte’s book Being Digital; several generations of TREC, SIGIR, and LNCS proceedings from various conferences; and some old database management books from when I taught that topic to undergraduates. Intelligent Multimedia Information Retrieval was there, though, and had survived the several culls that I had made to the bookshelves’ contents over the years, each time I’ve had to move office or felt claustrophobic and wanted to dump stuff out of the office. All that the modern professor, researcher, student, or interested reader might need to have these days is accessible from our fingertips anyway; and it says a great deal about Mark T.
Maybury and his previous edited collection that it survived these culls; that can only be because it still has value to me. I would expect the same to be true for this book, Multimedia Information Extraction.

Finding that previous edited collection on my bookshelf was fortunate for me because it gave me the chance to reread the foreword that Karen Spärck Jones had written. In that foreword, she raised the age-old question of whether a picture was worth a thousand words or not. She concluded that the question doesn’t actually need answering anymore, because now you can have both. That conclusion was in the context of discussing the natural hierarchy of information types (multimedia types, if you wish) and the challenge of having to look at many different kinds of information at once on your screen. Karen’s conclusion has grown to be even more true over the years, but I’ll bet that not even she could have foreseen exactly how true it would become today.

[1] Maybury, M.T., ed., Intelligent Multimedia Information Retrieval (AAAI Press, 1997).

The edited collection of chapters, published in 1997, still has many chapters that are relevant and good reading today, covering the various types of content-based information access we aspired to then, and, in the case of some of those media, the kind of access to which we still aspire. That collection helped to define the field of using intelligent, content-based techniques in multimedia information retrieval, and the collection as a whole has stood the test of time.

Over the years, content-based information access has changed, however; or rather, it has had to shift sideways in order to work around the challenges posed by analyzing and understanding information encoded in some types of media, notably visual media.
Even in 1997, we had more or less solved the technical challenges of capturing, storing, transmitting, and rendering multimedia, specifically text, image, audio, and moving video; and seemingly the only major challenges remaining were multimedia analysis so that we could achieve content-based access and navigation, and, of course, scale it all up. Standards for encoding and transmission were in place, network infrastructure and bandwidth was improving, mobile access was becoming easy, and all we needed was a growing market of people to want the content and somebody to produce it. Well, we got both; but we didn’t realize that the two needs would be satisfied by the same source: the ordinary user. Users generating their own content introduced a flood of material; and professional content-generators, like broadcasters and musicians, for example, responded by opening the doors to their own content so that within a short time, we have become overwhelmed by the sheer choice of multimedia material available to us.

Unfortunately, those of us who were predicting back in 1997 that content-based multimedia access would be based on the true content are still waiting for this to happen in the case of large-scale, generic, domain-independent applications. Content-based multimedia retrieval does work to some extent on smaller, personal, or domain-dependent collections, but not on the larger scale. Fully understanding media content to the level whereby the content we identify automatically in a video or image can be used directly for indexing has proven to be much more difficult than we anticipated for large-scale applications, like searching the Internet. For achieving multimedia information access, searching, summarizing, and linking, we now leverage more from the multimedia collateral (the metadata, user-assigned tags, user commentary, and reviews) than from the actual encoded content.
YouTube videos, Flickr images, and iTunes music, like most large multimedia archives, are navigated more often based on what people say about a video, image, or song than what it actually contains. That means that we need to be clever about using this collateral information, like metadata, user tags, and commentaries. The challenges of intelligent multimedia information retrieval in 1997 have now grown into the challenges of multimedia information mining in 2012, developing and testing techniques to exploit the information associated with multimedia information to best effect. That is the subject of the present collection of articles: identifying and mining useful information from text, image, graphics, audio, and video, in applications as far apart as surveillance or broadcast TV.

In 1997, when the first of this series of books edited by Mark T. Maybury was published, I did not know him. I first encountered him in the early 2000s, and I remember my first interactions with him were in discussions about inviting a keynote speaker for a major conference I was involved in organizing. Mark suggested somebody named Tim Berners-Lee who was involved in starting some initiative he called the “semantic web,” in which he intended to put meaning representations behind the content in web pages. That was in 2000 and, as always, Mark had his finger on the pulse of what is happening and what is important in the broad information field. In the years that followed, we worked together on a number of program committees (SIGIR, RIAO, and others), and we were both involved in the development of LSCOM, the Large Scale Ontology for Broadcast TV news, though his involvement was much greater than mine. In all the interactions we have had, Mark’s inputs have always shown an ability to recognize important things at the right time, and his place in the community of multimedia researchers has grown in importance as a result of that.

That brings us to this book.
When Karen Spärck Jones wrote her foreword to Mark’s edited book in 1997 and alluded to pictures worth a thousand words, she may have foreseen how creating and consuming multimedia, as we do each day, would be easy and ingrained into our society. The availability, the near absence of technical problems, the volume of materials, the ease of access to it, and the ease of creation and upload were perhaps predictable to some extent by visionaries. However, the way in which this media is now enriched as a result of its intertwining with social networks, blogging, tagging, and folksonomies, user-generated content of the wisdom of crowds: that was not predicted. It means that being able to mine information from multimedia, information culled from the raw content as well as the collateral or metadata information, is a big challenge.

This book is a timely addition to the literature on the topic of multimedia information mining, as it is needed at this precise time as we try to wrestle with the problems of leveraging the “collateral” and the metadata associated with multimedia content. The five sections covering extraction from image, from video, from audio/graphics/behavior, the extraction of affect, and finally the annotation and authoring of multimedia content, collectively represent what is the leading edge of the research work in this area. The more than 80 coauthors of the 24 chapters in this volume have come together to produce a volume which, like the previous volumes edited by Mark T. Maybury, will help to define the field.

I won’t be so bold, or foolhardy, as to predict what the multimedia field will be like in 10 or 15 years’ time, what the problems and challenges will be and what the achievements will have been between now and then. I won’t even guess what books might look like or whether we will still have bookshelves.
I would expect, though, that like its predecessors, this volume will still be on my bookshelf in whatever form; and, for that, we have Mark T. Maybury to thank. Thanks, Mark!

Alan F. Smeaton
