Table of Contents

3C Vision: Cues, Contexts, and Channels

Virginio Cantoni, Università di Pavia
Stefano Levialdi, Sapienza, Università di Roma
Bertrand Zavidovique, Université Paris-Sud 11, Institute for Electronics Fundamentals, Orsay, France

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Elsevier
32 Jamestown Road, London NW1 7BY
225 Wyman Street, Waltham, MA 02451, USA

First edition 2011

Copyright © 2011 Elsevier Inc. All rights reserved

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangement with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

ISBN: 978-0-12-385220-5

For information on all Elsevier publications visit our website at www.elsevierdirect.com

This book has been manufactured using Print On Demand technology. Each copy is produced to order and is limited to black ink. The online version of this book will show color figures where appropriate.

Foreword

Digital images are the product of two technologies: cameras and graphics. This book is about both kinds of images. Images from cameras may be fed into computer vision and image-processing systems; the word “vision” in the title of this book refers first to this kind of analysis, which depends on cues (visual features) and contexts (frameworks and predispositions for interpretation). Images that result from graphical synthesis represent the visions (imaginations) of their creators, and this is the second meaning of “vision” in the title. The prime purpose of the image here is to communicate the vision, and the specific medium (e.g., still image, video, virtual world, game, and so on) is a “channel.”

The strength of this book lies in its integration of technical information, application, and reflections on the social consequences of this rapidly expanding aspect of the human experience. Let us consider briefly the two main phenomena that are driving these ideas.

First, we have digital cameras rapidly becoming more powerful in three respects:

1. Better imaging, with automatic focus, exposure, high dynamic range, image stabilization, color balance, red-eye reduction, panorama, stereo, and so on.
2. Ubiquity, with cameras appearing as standard items in laptops, cell phones, and automobiles.
3. Connectedness, with many cameras integrated with the Internet via their cell phone hosts.

Thus, more than ever before, digital images are being generated in greater numbers, in higher quality, and as part of the Internet.
The second phenomenon in the rise of the image is incredible technologies for producing synthetic images, such as the animation systems that helped produce the 3D film Avatar, and the powerful gaming graphics chips from companies like NVIDIA that can turn polygon lists into realistic shaded scenes at high speed.

What does this mean? The mindshare of digital images, rather than text, for most of the population is growing. Daily experience is more image oriented and less text oriented for increasingly more people. It is important, therefore, for educated people to understand what is going on with images and to be familiar with the basic concepts of how the images are processed, how they are sent through the Internet, and how they relate to other media, such as interactive games and audio.

This book does an admirable job of giving an overview, integrating basic technical concepts of image processing with a discussion of a key class of applications: web search involving images (either as queries or as targets), plus a reflection on the role played by images in current and emerging communication media, such as online virtual worlds.

Another strength of this book is its bringing together of many references, put into context by the narrative, so that the reader can easily dig into the literature to follow up on a topic of particular interest. These references span a wide gamut, from articles and texts on computer graphics and computer vision to works on the philosophy of communication.

Steven Tanimoto
University of Washington

Preface

This book analyzes the visual hints by which both humans and computer programs generally interpret, process, and exploit images. The chapters in this book provide a unified framework in which both biological and artificial vision are discussed through visual cues, the role of contexts, and the available multichannels with which information is delivered. These various subjects are traditionally investigated within different scientific communities producing specific technical literature.
The exponential explosion of images and videos concerns everybody because media are now present everywhere and in all human activities. At this stage, it seems important that scientists, artists, and engineers in all fields be aware of how images are essential information carriers. Images carry a strong evocative power because they quickly bring to mind a number of related pictures of past experiences, or even abstract concepts like pleasure, attraction, or aversion.

Analyzing the impact that images have on people is the thread of this book. It puts forward the connection between technical issues that can be extracted from objective measurements and psychological issues that emerge from perception, bound to the context and the time at which such images are observed. In short, the book advocates that, as is commonly stated, “there is more than meets the eye” when a picture, a graphical representation, a sketch, or an icon is looked at; for this reason, it is hoped the reader will acquire a deeper insight into joint computer-vision and human-brain machinery when looking at a picture, be it natural, digitized, or artificially generated.

Images play a fundamental role in many application domains such as video surveillance, biomedical diagnostics, remote sensing, automatic inspection, robot vision, and so on. For this reason, the automatic management, retrieval, and processing of images require sophisticated tools to effectively perform such tasks.
Present techniques in most applications (for example, vehicle autonomy from vision, image-based retrieval, or automatic video management) benefit from a better integration of bio-inspired solutions and advanced information-processing facilities. In this connection, a controlled view on the vision process involves:

• concurrent basic information and relative description (cues from image input);
• intended action, a priori knowledge, and common culture (context); and
• media to convey processed information, including the coding scheme (through-channels output).

When targeting artificial vision systems, it is worth considering the perception process in a goal-oriented fashion, therefore including an intention. There is always an action to be performed at the end. For instance, in character recognition, a generic action could be rewriting; in understanding an outdoor scene, the action may be survival. Therefore, the computer vision research community has agreed on a two-stage analytic strategy:

• Systems first deal with objects, aiming at capturing their interrelations from current scene analysis. To do that, common sense knowledge is employed, usually based on intuitive geometry (perspective, projections, etc.) or naïve physics (e.g., lighting, shadows, and more generally photometry).
• Systems next discover relations among objects and tackle the question of scene understanding for decision and action planning. For this, elaborate or expert knowledge is necessary: This is usually stored independently from the current acquisition and contingent information, allowing for a larger potential interpretation.

Note that such steps follow the historical evolution in human visual communication. During the Renaissance period, artists studied and began to understand the physical rules of their environment, to enable faithful rendering, for instance, of natural landscapes (complying with perspective and light propagation laws).
During the last century, the impressionists, although they fully knew the physical laws of light, overcame the relative constraints. This allowed for the conveyance of abstraction, for instance, for the viewer’s immediate access to feelings based on color or texture that predominate over the structure of the painting as a whole. Later, cubists and their followers even used paradoxes triggered by the fragmented image, enabling enriched and more complex nuances to emerge in the observer’s mind.

In this book, the evolution of artificial vision will be considered on the basis of a growing understanding of the human visual system. This does not imply in any way that machines can or should mimic human perception. The computer vision community has been proving, for the last decade, that a synthesis process is feasible for interhuman mediation through machines or for interfacing machines with humans. Communication systems employ many information channels and an actual representation of the intended goal. They use multiple media to purposely stress a selective presentation of scene components, favoring the given goal of a human or technological agent. This process currently corresponds to a simplistic version of the basic human information exchange, merging images and sounds with actions, such as made popular over centuries by theater or opera, and more recently by comics, movies, and so forth.

Note that, whether computer vision is considered for solving the inverse (image analysis) or direct (image synthesis) problem, whenever interaction occurs, the more intimate and cultivated people are, the more abstract and subjective the process becomes. Abstraction usually goes hand in hand with flexibility (relaxation of constraints), thus lowering precision and favoring subjectivity (interpretation). In this process, the two analysis steps (cues detection and context exploitation) do not have strict parameters.
Moreover, objects can also be described qualitatively, suggesting or triggering the observer’s feelings. The pattern of the Mona Lisa’s lips has given rise to different interpretations for centuries, and this will probably continue regardless of any response given by anthropometry software used to analyze it. Conversely, detecting objects without directly sensing them, but only by analyzing relationships among them or from physical laws, is the essence of the human evolution by abduction. A recent example is the discovery of remote planets without being able to actually see them.

It is no wonder that human subjectivity turns out to be the real technological challenge in man-machine interaction. In fact, the difficulty lies in correctly interpreting the human goal: Machines cannot capture it and normally do not easily decipher concepts.

One of the primary concerns of an artist in successfully conveying his or her message is that of directing the attention of a viewer and concentrating it on some salient detail. In image analysis by computers, the main process is the so-called attention focusing, dual of the pop-out/interrupt phenomenon that triggers human attention. Artists started to highlight objects through their position: central, aligned with a diagonal, plausible, or unexpected (as the egg above the head of the Madonna in The Sacred Conversation by Piero della Francesca); through their dimension, for example, the especially large apple tree seen before the naked bodies up front in The Tree of Knowledge by Lucas Cranach; through lighting, as in the little naked model in L'Atelier du Peintre by Gustave Courbet and the gigantic cavity in The Climbing to Paradise by Jerome Bosch; or through their genuine importance, as seen in depictions of Jesus Christ in most sacred paintings.

More recently, creators work at a conceptual level by exploiting the idea of disruption.
For instance, René Magritte painted faces having birds, apples, eggs, and such, to describe their inner personalities; Fernando Botero made a specialty of fat bodies; Maurits Escher played with laws of perspective and gravity; Wassily Kandinsky invented a theory of color effects; and Edvard Munch amplified all movements to evoke missing senses, for example, by actually merging lovers in The Kiss or by distorting the features in the melting mouth, face, and scenery in The Scream.

A key factor of successful man-machine interaction is real-time performance. A natural way to reach real-time image information exchange is to use only the required data at the right time. Biological strategies like attention focusing, smart path scanning, multiresolution analysis, and motion tracking are discussed later in this book. Note that either the significant information is known a priori and a predefined sequence of hypothesis tests is applied, or it must be directly derived from the image data by trivial operators. A number of techniques for simulating biological vision systems and for optimizing limited resources in a “wide field of view” are presented in this text, together with some examples of existing systems based on these principles.

One aim of this book is to trigger curiosity in the reader, introducing him or her to the large world of imagery on which many human activities are based, from politics to entertainment, to technical reports, to artistic creations. To be involved in these creative activities in today’s world, it is necessary to understand and use recent technological multimedia tools. In this respect, the book can be seen as a general-purpose guide into the image culture, its development, and potential leverage.

The book is also useful in universities as a comprehensive introductory text for entry-level postgraduate students acquiring a general understanding of image analysis.
Our preference not to resort to strictly technical matters will make it easier for most students, whatever their technical skills, to better understand the modern use of computerized images and videos. The book involves an approach to teaching that supports a brief introduction to algorithms and data structures, and does not waste too much of the readers’ attention on unnecessary technical details. To that aim, all technical matter (including mathematics and technology-based information or comments) is presented in a smaller, italicized font so that a nonexpert reader can skip it at first reading.

The content of the book is subdivided into four chapters.

Chapter 1: Natural and artificial vision discusses differences, analogies, and possible synergies between natural and artificial vision systems. A number of concepts related to the analysis and synthesis of images (V-schema) are described. Different strategies for image exploration within artificial vision are explained and presented along with corresponding sketches and figures.

Chapter 2: Visual cues explains the different visual cues that can be used both for image analysis and rendering, following three approaches: photometric, morphologic, and spatial. The photometric track provides a general description of light image models, followed by explanations of image synthesis methods and chromatic clues. The morphological track considers different shape representations, transformations, and extraction methods. The spatial track covers 3D vision and modeling, including motion analysis.

Chapter 3: The role of contexts considers the various functions that context may play in different communities (artistic, scientific, etc.) and explains how an image can be retrieved and interpreted both in human and computer-based systems. Additionally, a number of methods (statistical, linguistic, structural, etc.) are explained and examples given.
The essential link of context with new technologies is illustrated through two generic operations: image retrieval and sensor networking.

Chapter 4: Channeling the information deals with the use of different channels for conveying multimedia information thanks to a variety of methods: by using icons and metaphors, using multiple channels, and capitalizing on the network properties. The latest technologies that allow virtual realism, ambient intelligence, and sophisticated 3D animation find a place at the end of the book.

Acknowledgments

We gratefully acknowledge the assistance and support of Alessandra Setti to properly incorporate all the figures in this book as well as the cooperation of Samia Bouchafa and Michèle Gouiffès, who provided advice on different topics covered. Last, thanks are given to the Skype desktop sharing system for all the hours we spent using it during the final revision of our manuscript.

3C Vision. Cues, Context and Channels PDF

246 Pages·2011·12.792 MB·English

by Virginio Cantoni, Stefano Levialdi and Bertrand Zavidovique (Auth.)
