
Hierarchical Neural Networks for Image Interpretation PDF

244 Pages·2003·9.219 MB·English


Sven Behnke
Hierarchical Neural Networks for Image Interpretation
June 13, 2003
Draft submitted to Springer-Verlag. Published as volume 2766 of Lecture Notes in Computer Science. ISBN: 3-540-40722-7

Foreword

It is my pleasure and privilege to write the foreword for this book, whose results I have been following and awaiting for the last few years. This monograph represents the outcome of an ambitious project oriented towards advancing our knowledge of the way the human visual system processes images, and of the way it combines high-level hypotheses with low-level inputs during pattern recognition. The model proposed by Sven Behnke, carefully exposed in the following pages, can now be applied by other researchers to practical problems in the field of computer vision, and it also provides clues for reaching a deeper understanding of the human visual system.

This book arose out of dissatisfaction with an earlier project: back in 1996, Sven wrote one of the handwritten digit recognizers for the mail sorting machines of the Deutsche Post AG. The project was successful because the machines could indeed recognize the handwritten ZIP codes, at a rate of several thousand letters per hour. However, Sven was not satisfied with the amount of expert knowledge that was needed to develop the feature extraction and classification algorithms. He wondered whether the computer could extract meaningful features by itself, and use these for classification. His experience in the project told him that forward computation alone would be incapable of improving the results already obtained. From his knowledge of the human visual system, he postulated that only a two-way system could work: one that could advance a hypothesis by focussing the attention of the lower layers of a neural network on it. He spent the next few years developing a new model for tackling precisely this problem.

The main result of this book is the proposal of a generic architecture for pattern recognition problems, called the Neural Abstraction Pyramid (NAP). The architecture is layered, pyramidal, competitive, and recurrent.
It is layered because images are represented at multiple levels of abstraction. It is recurrent because backward projections connect the upper to the lower layers. It is pyramidal because the resolution of the representations is reduced from one layer to the next. It is competitive because in each layer units compete against each other, trying to classify the input best. The main idea behind this architecture is letting the lower layers interact with the higher layers. The lower layers send some simple features to the upper layers; the upper layers recognize more complex features and bias the computation in the lower layers. This in turn improves the input to the upper layers, which can refine their hypotheses, and so on. After a few iterations the network settles into the best interpretation. The architecture can be trained in supervised and unsupervised mode.

Here, I should mention that there have been many proposals of recurrent architectures for pattern recognition. Over the years we have tried to apply them to non-trivial problems. Unfortunately, many of the proposals advanced in the literature break down when confronted with non-toy problems. Therefore, one of the first advantages of Behnke's architecture is that it actually works, even when the problem is difficult and really interesting for commercial applications.

The structure of the book reflects the road taken by Sven to tackle the problem of combining top-down processing of hypotheses with bottom-up processing of images. Part I describes the theory and Part II the applications of the architecture. The first two chapters motivate the problem to be investigated and identify the features of the human visual system which are relevant for the proposed architecture: retinotopic organization of feature maps, local recurrence with excitation and inhibition, hierarchy of representations, and adaptation through learning.

Chapter 3 gives an overview of several models proposed in recent years and provides a gentle introduction to the next chapter, which describes the NAP architecture.
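The interaction loop described above (bottom-up evidence, competition among units, top-down bias, repeated until the network settles) can be sketched in a few lines of NumPy. This is a minimal illustration only: the two layer sizes, the random weights, and the simplified update rule are my own assumptions, not the trained networks of the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-level "pyramid": a low layer (input features) and a high layer
# (abstract hypotheses). Random weights stand in for learned projections.
n_low, n_high = 16, 4
W_up = rng.normal(scale=0.3, size=(n_high, n_low))   # bottom-up projections
W_down = 0.5 * W_up.T                                # top-down projections

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def interpret(image, iterations=10):
    """Iterative interpretation: units in the high layer compete (softmax),
    and the winning hypotheses feed back to bias the low-level input."""
    low = image.copy()
    for _ in range(iterations):
        high = softmax(W_up @ low)       # competition within the upper layer
        low = image + W_down @ high      # top-down feedback refines the input
    return high

hypothesis = interpret(rng.random(n_low))
print(hypothesis)  # a distribution over the 4 high-level hypotheses
```

Each pass improves the input seen by the upper layer, which in turn sharpens its hypothesis, mirroring the settling behavior described in the text.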
Chapter 5 deals with a special case of the NAP architecture, in which only forward projections are used and features are learned in an unsupervised way. With this chapter, Sven came full circle: the digit classification task he had solved for mail sorting, using a hand-designed structural classifier, was now outperformed by an automatically trained system. This is a remarkable result, since much expert knowledge went into the design of the hand-crafted system.

Four applications of the NAP constitute Part II. The first application is the recognition of meter values (printed postage stamps), the second the binarization of matrix codes (also used for postage), the third the reconstruction of damaged images, and the last the localization of faces in complex scenes. The image reconstruction problem is my favorite regarding the kind of tasks solved. A complete NAP is used, with all its lateral, feed-forward, and backward connections. In order to infer the original images from degraded ones, the network must learn models of the objects present in the images and combine them with models of typical degradations.

I think that it is interesting how this book started from a general inspiration about the way the human visual system works, how Sven then extracted some general principles underlying visual perception, and how he applied them to the solution of several vision problems. The NAP architecture is what the Neocognitron (a layered model proposed by Fukushima in the 1980s) aspired to be. It is the Neocognitron gotten right. The main difference between one and the other is the recursive nature of the NAP. Combining the bottom-up with the top-down approach allows for iterative interpretation of ambiguous stimuli.

I can only encourage the reader to work his or her way through this book. It is very well written and provides solutions for some technical problems as well as inspiration for neurobiologists interested in common computational principles in human and computer vision. The book is like a road that will lead the attentive reader to a rich landscape, full of new research opportunities.
Berlin, June 2003
Raúl Rojas

Preface

This thesis is published in partial fulfillment of the requirements for the degree of 'Doktor der Naturwissenschaften' (Dr. rer. nat.) at the Department of Mathematics and Computer Science of Freie Universität Berlin. Prof. Dr. Raúl Rojas (FU Berlin) and Prof. Dr. Volker Sperschneider (Osnabrück) acted as referees. The thesis was defended on November 27, 2002.

Summary of the Thesis

Human performance in visual perception by far exceeds the performance of contemporary computer vision systems. While humans are able to perceive their environment almost instantly and reliably under a wide range of conditions, computer vision systems work well only under controlled conditions in limited domains.

This thesis addresses the differences in data structures and algorithms underlying the differences in performance. The interface problem between symbolic data manipulated in high-level vision and signals processed by low-level operations is identified as one of the major issues of today's computer vision systems. This thesis aims at reproducing the robustness and speed of human perception by proposing a hierarchical architecture for iterative image interpretation.

I propose to use hierarchical neural networks for representing images at multiple abstraction levels. The lowest level represents the image signal. As one ascends these levels of abstraction, the spatial resolution of two-dimensional feature maps decreases while feature diversity and invariance increase. The representations are obtained using simple processing elements that interact locally. Recurrent horizontal and vertical interactions are mediated by weighted links. Weight sharing keeps the number of free parameters low. Recurrence makes it possible to integrate bottom-up, lateral, and top-down influences.

Image interpretation in the proposed architecture is performed iteratively. An image is interpreted first at positions where little ambiguity exists.
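The layer geometry described in the summary, spatial resolution shrinking from level to level while the number of feature maps grows, can be sketched as follows. The concrete sizes (a 32x32 input, four levels, features doubling per level) are illustrative choices of mine, not the dimensions used in the thesis.

```python
def pyramid_shapes(input_size=32, base_features=4, levels=4):
    """Return (feature maps, height, width) per level of a toy abstraction
    pyramid: resolution halves and feature count doubles at each step."""
    shapes = []
    size, features = input_size, base_features
    for _ in range(levels):
        shapes.append((features, size, size))
        size //= 2        # coarser spatial resolution at the next level
        features *= 2     # but a more diverse set of feature maps
    return shapes

for level, shape in enumerate(pyramid_shapes()):
    print(level, shape)
# level 0: (4, 32, 32) ... level 3: (32, 4, 4)
```

The total number of cells shrinks by half per level here, so abstraction is bought by trading spatial detail for feature diversity.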
Partial results then bias the interpretation of more ambiguous stimuli. This is a flexible way to incorporate context. Such refinement is most useful when the image contrast is low, noise and distractors are present, objects are partially occluded, or the interpretation is otherwise complicated.

The proposed architecture can be trained using unsupervised and supervised learning techniques. This makes it possible to replace the manual design of application-specific computer vision systems with the automatic adaptation of a generic network. The task to be solved is then described using a dataset of input/output examples.

Applications of the proposed architecture are illustrated using small networks. Furthermore, several larger networks were trained to perform non-trivial computer vision tasks, such as the recognition of the value of postage meter marks and the binarization of matrix codes. It is shown that image reconstruction problems, such as super-resolution, filling-in of occlusions, and contrast enhancement/noise removal, can be learned as well. Finally, the architecture was applied successfully to localize faces in complex office scenes. The network is also able to track moving faces.

Acknowledgements

My profound gratitude goes to Professor Raúl Rojas, my mentor and research advisor, for guidance, contribution of ideas, and encouragement. I salute Raúl's genuine passion for science, discovery, and understanding, his superior mentoring skills, and his unparalleled availability.

The research for this thesis was done at the Computer Science Institute of the Freie Universität Berlin. I am grateful for the opportunity to work in such a stimulating environment, embedded in the exciting research context of Berlin. The AI group has been host to many challenging projects, e.g. the RoboCup FU-Fighters project and the E-Chalk project. I owe a great deal to the members and former members of the group. In particular, I would like to thank Alexander Gloye, Bernhard Frötschl, Jan Dösselmann, and Dr. Marcus Pfister for helpful discussions.

Parts of the applications were developed in close cooperation with Siemens ElectroCom Postautomation GmbH.
Testing the performance of the proposed approach on real-world data was invaluable to me. I am indebted to Torsten Lange, who was always open for unconventional ideas and gave me detailed feedback, and to Katja Jakel, who prepared the databases and did the evaluation of the experiments.

My gratitude goes also to the people who helped me prepare the manuscript of the thesis. Dr. Natalie Hempel de Ibarra made sure that the chapter on the neurobiological background reflects current knowledge. Gerald Friedland, Mark Simon, Alexander Gloye, and Mary Ann Brennan helped by proofreading parts of the manuscript. Special thanks go to Barry Chen, who helped me prepare the thesis for publication.

Finally, I wish to thank my family for their support. My parents have always encouraged and guided me to independence, never trying to limit my aspirations. Most importantly, I thank Anne, my wife, for showing untiring patience and moral support, reminding me of my priorities and keeping things in perspective.

Berkeley, June 2003
Sven Behnke

Table of Contents

Foreword  V
Preface  VII

1. Introduction  1
  1.1 Motivation  1
    1.1.1 Importance of Visual Perception  1
    1.1.2 Performance of the Human Visual System  2
    1.1.3 Limitations of Current Computer Vision Systems  6
    1.1.4 Iterative Interpretation – Local Interactions in a Hierarchy  9
  1.2 Organization of the Thesis  12
  1.3 Contributions  13

Part I. Theory

2. Neurobiological Background  17
  2.1 Visual Pathways  18
  2.2 Feature Maps  22
  2.3 Layers  24
  2.4 Neurons  27
  2.5 Synapses  28
  2.6 Discussion  30
  2.7 Conclusions  34

3. Related Work  35
  3.1 Hierarchical Image Models  35
    3.1.1 Generic Signal Decompositions  35
    3.1.2 Neural Networks  41
    3.1.3 Generative Statistical Models  46
  3.2 Recurrent Models  51
    3.2.1 Models with Lateral Interactions  52
    3.2.2 Models with Vertical Feedback  57
    3.2.3 Models with Lateral and Vertical Feedback  61
  3.3 Conclusions  64

4. Neural Abstraction Pyramid Architecture  65
  4.1 Overview  65
    4.1.1 Hierarchical Network Structure  65
    4.1.2 Distributed Representations  67
    4.1.3 Local Recurrent Connectivity  69
    4.1.4 Iterative Refinement  70
  4.2 Formal Description  71
    4.2.1 Simple Processing Elements  71
    4.2.2 Shared Weights  73
    4.2.3 Discrete-Time Computation  75
    4.2.4 Various Transfer Functions  77
  4.3 Example Networks  79
    4.3.1 Local Contrast Normalization  79
    4.3.2 Binarization of Handwriting  83
    4.3.3 Activity-Driven Update  90
    4.3.4 Invariant Feature Extraction  92
  4.4 Conclusions  96

5. Unsupervised Learning  97
  5.1 Introduction  98
  5.2 Learning a Hierarchy of Sparse Features  102
    5.2.1 Network Architecture  102
    5.2.2 Initialization  104
    5.2.3 Hebbian Weight Update  104
    5.2.4 Competition  105
  5.3 Learning Hierarchical Digit Features  106
  5.4 Digit Classification  111
  5.5 Discussion  112

6. Supervised Learning  115
  6.1 Introduction  115
    6.1.1 Nearest Neighbor Classifier  115
    6.1.2 Decision Trees  116
    6.1.3 Bayesian Classifier  116
    6.1.4 Support Vector Machines  117
    6.1.5 Bias/Variance Dilemma  117
  6.2 Feed-Forward Neural Networks  118
    6.2.1 Error Backpropagation  119
    6.2.2 Improvements to Backpropagation  121
    6.2.3 Regularization  124
  6.3 Recurrent Neural Networks  124
    6.3.1 Backpropagation Through Time  125
    6.3.2 Real-Time Recurrent Learning  126
    6.3.3 Difficulty of Learning Long-Term Dependencies  127
    6.3.4 Random Recurrent Networks with Fading Memories  128
    6.3.5 Robust Gradient Descent  130
  6.4 Conclusions  131

Part II. Applications

7. Recognition of Meter Values  135
  7.1 Introduction to Meter Value Recognition  135
  7.2 Swedish Post Database  136
  7.3 Preprocessing  137
    7.3.1 Filtering  137
    7.3.2 Normalization  140
  7.4 Block Classification  142
    7.4.1 Network Architecture and Training  144
    7.4.2 Experimental Results  144
  7.5 Digit Recognition  146
    7.5.1 Digit Preprocessing  146
    7.5.2 Digit Classification  148
    7.5.3 Combination with Block Recognition  151
  7.6 Conclusions  153

8. Binarization of Matrix Codes  155
  8.1 Introduction to Two-Dimensional Codes  155
  8.2 Canada Post Database  156
  8.3 Adaptive Threshold Binarization  157
  8.4 Image Degradation  159
  8.5 Learning Binarization  161
  8.6 Experimental Results  162
  8.7 Conclusions  171

9. Learning Iterative Image Reconstruction  173
  9.1 Introduction to Image Reconstruction  173
  9.2 Super-Resolution  174
    9.2.1 NIST Digits Dataset  176
    9.2.2 Architecture for Super-Resolution  176
    9.2.3 Experimental Results  177
  9.3 Filling-in Occlusions  181
    9.3.1 MNIST Dataset  182
    9.3.2 Architecture for Filling-In of Occlusions  182
    9.3.3 Experimental Results  183
  9.4 Noise Removal and Contrast Enhancement  186
    9.4.1 Image Degradation  187
    9.4.2 Experimental Results  187
  9.5 Reconstruction from a Sequence of Degraded Digits  189
    9.5.1 Image Degradation  190
    9.5.2 Experimental Results  191
  9.6 Conclusions  196

10. Face Localization  199
  10.1 Introduction to Face Localization  199
  10.2 Face Database and Preprocessing  202
  10.3 Network Architecture  203
  10.4 Experimental Results  204
  10.5 Conclusions  211

11. Summary and Conclusions  213
  11.1 Short Summary of Contributions  213
  11.2 Conclusions  214
  11.3 Future Work  215
    11.3.1 Implementation Options  215
    11.3.2 Using More Complex Processing Elements  216
    11.3.3 Integration into Complete Systems  217
