
Multimodal Interactive Pattern Recognition and Applications

292 Pages·2011·6.43 MB·English


Alejandro Héctor Toselli · Enrique Vidal · Francisco Casacuberta

Dr. Alejandro Héctor Toselli
Instituto Tecnológico de Informática
Universidad Politécnica de Valencia
Camino de Vera, s/n
46022 Valencia
Spain
[email protected]

Prof. Francisco Casacuberta
Instituto Tecnológico de Informática
Universidad Politécnica de Valencia
Camino de Vera, s/n
46022 Valencia
Spain
[email protected]

Dr. Enrique Vidal
Instituto Tecnológico de Informática
Universidad Politécnica de Valencia
Camino de Vera, s/n
46022 Valencia
Spain
[email protected]

ISBN 978-0-85729-478-4
e-ISBN 978-0-85729-479-1
DOI 10.1007/978-0-85729-479-1
Springer London Dordrecht Heidelberg New York

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2011929220

© Springer-Verlag London Limited 2011

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Cover design: VTeX UAB, Lithuania

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Foreword

Traditionally, the aim of pattern recognition is to automatically solve complex recognition problems. However, it has been realized that many real-world applications need a correct recognition rate higher than the one reachable with completely automatic systems. Therefore, some sort of post-processing is applied where humans correct the errors committed by the machine. It turns out, however, that very often this post-processing phase is the bottleneck of a recognition system, causing most of its operational costs.

The current book possesses two unique features that distinguish it from other books on Pattern Recognition. First, it proposes a radically different approach to correcting the errors committed by a system. This approach is characterized by human and machine being tied up in a much closer loop than usual. That is, the human gets involved not only after the machine has finished producing its recognition result, in order to correct errors, but also during the recognition process. In this way, many errors can be avoided beforehand and correction costs can be reduced. The second unique feature of the book is that it proposes multimodal interaction between man and machine in order to correct and prevent recognition errors. Such multimodal interactions may include input via handwriting, speech, or gestures, in addition to the conventional input modalities of keyboard and mouse.

The material of the book is presented on the basis of well-founded mathematical principles, mostly Bayes theory. It includes various fundamental results that are highly original and relevant for the emerging field of interactive and multimodal pattern recognition. In addition, the book discusses in detail a number of concrete applications where interactive multimodal systems have the potential to be superior to traditional systems that consist of a recognition phase, conducted autonomously by the machine, followed by a human post-processing step. Examples of such applications include unconstrained handwriting recognition, speech recognition, machine translation, text prediction, image retrieval, and parsing.

To summarize, this book provides a very fresh and novel look at the whole discipline of pattern recognition. It is the first book, to my knowledge, that addresses the emerging field of interactive and multimodal systems in a unified and integrated way. This book may in fact become a standard reference for this emerging and fascinating new area. I highly recommend it to graduate students, academic and industrial researchers, lecturers, and practitioners working in the field of pattern recognition.

Bern, Switzerland
Horst Bunke

Preface

Our interest in human–computer interaction started with our participation in the TT2 project (“Trans–Type-2”, 2002–2005, http://www.tt2.atosorigin.es), funded by the European Union (EU) and coordinated by Atos Origin, which dealt with the development of statistically based technologies for computer-assisted translation. Several years earlier, we had coordinated one of the first EU-funded projects on spoken machine translation (EuTrans, 1996–2000, http://prhlt.iti.es/w/eutrans) and, by the time TT2 started, we had already been working for years on machine translation (MT) in general. So we knew very well one of the major bottlenecks for the adoption of the MT technology then available by professional translation agencies: many professional translators preferred to type all the text themselves from scratch, rather than trying to take advantage of the (few) correct words of an MT-produced text while fixing the (many) translation errors and sloppy sentences. Clearly, by post-editing the error-prone text produced by an MT system, these professionals felt they were not in command of the translation process; instead, they saw themselves just as dumb assistants to a foolish system that was producing flaky results they had to figure out how to amend (the state of affairs regarding post-editing has improved over the years, but the feeling of lack of control persists).

In TT2 we learnt quite a few facts about the central role of human feedback in the development of assistive technologies, and about how this feedback can lead to great improvements in human/machine performance if it is adequately taken into account in the mathematical formulation under which systems are developed. We also understood very well that, in these technologies, the traditional accuracy-based performance criteria are not sufficiently adequate, and performance has to be assessed mainly in terms of estimated human–machine interaction effort. In a word, assistive technology has to be developed in such a way that the human user feels in command of the system, rather than the other way around, and reducing human interaction effort must be the fundamental driving force behind system design. In TT2 we also started to realize that multimodal processing is somehow implicitly present in all interactive systems, and that this can be advantageously exploited to improve overall system performance and usability.

After the success of TT2, our research group (PRHLT, http://prhlt.iti.upv.es) started to look at how these ideas could be applied in many other Pattern Recognition (PR) fields where assistive technologies are in increasing demand. As a result, we soon found ourselves coordinating a large and ambitious Spanish research program called Multimodal Interaction in Pattern Recognition and Computer Vision (MIPRCV, 2007–2012, http://miprcv.iti.upv.es). This program, which involves more than 100 highly qualified Ph.D. researchers from ten research institutions, aims at developing core assistive technologies for interactive application fields as diverse as language and music processing, medical image recognition, biometrics and surveillance, advanced driving assistance systems and robotics, to name but a few.

To a large extent, this book is the result of work carried out by the PRHLT research group within the MIPRCV consortium. Therefore, it owes credit to many MIPRCV researchers who have directly or indirectly contributed ideas, discussions and technical collaboration in general, as well as to all the members of PRHLT who, in one manner or another, have made it possible.

These works are presented in this book in a unified way, under the PR framework of Statistical Decision Theory. First, fundamental concepts and general PR approaches for Multimodal Interaction modelling and search (or inference) are presented. Then, systems developed on the basis of these concepts and approaches are described for several application fields. These include interactive transcription of handwritten and spoken documents, computer-assisted language translation, interactive text generation and parsing, and relevance-based image retrieval. Finally, several prototypes developed for these applications are reviewed in the last chapter. Most of these prototypes are live demonstrators that can be publicly accessed through the Internet, so readers of this book can easily try them themselves to get a first-hand idea of the interesting possibilities of placing Pattern Recognition technologies within the Multimodal Interaction framework.

Chapter 1 provides an introduction to Interactive Pattern Recognition, examining the challenges and research opportunities entailed by placing PR within the human-interaction framework. Moreover, it introduces general approaches available to solve the underlying interactive search problems on the basis of existing methods for solving the corresponding non-interactive counterparts, and gives an overview of modern machine learning approaches which can be useful in the interactive framework.

Chapter 2 establishes the common basics and framework on which the computer-assisted transcription approaches described in the three subsequent chapters (Chaps. 3, 4 and 5) are grounded. On the one hand, Chaps. 3 and 5 are devoted to handwritten document transcription, providing different approaches which cover aspects such as multimodality, user interaction modes and ergonomics, active learning, etc. On the other hand, Chap. 4 focuses on the transcription of speech signals, employing an approach similar to the one described in Chap. 3.

Likewise, Chap. 6 addresses the general topic of Interactive Machine Translation, providing an adequate human–machine interactive framework to produce high-quality translation between any pair of languages. It will be shown how this also allows one to take advantage of some available multimodal interfaces to increase productivity. Multimodal interfaces and adaptive learning in Interactive Machine Translation will be covered in Chaps. 7 and 8, respectively.

With significant differences in relation to previous chapters, Chaps. 9–11 introduce three other Interactive Pattern Recognition topics: Interactive Parsing, Interactive Text Generation and Interactive Image Retrieval. The second one, for example, is characterized by not using an input signal, whereas the first and third are characterized by not following the left-to-right protocol in the analysis of their corresponding inputs.

Finally, Chap. 12 presents several fully working prototypes and demonstrators of multimodal interactive pattern recognition applications. As previously mentioned, all of these systems serve as validating examples for the approaches that have been proposed and described throughout this book. Among other interesting things, they are designed to enable true human–computer interaction on selected tasks.

Valencia, Spain
E. Vidal
A.H. Toselli
F. Casacuberta

Description:
Many real-world applications of pattern recognition (PR) systems require human post-processing to correct the errors committed by machines. This can create bottlenecks in recognition systems, yielding high operational costs. This important text/reference proposes a radically different approach to thi…
