Computational Models for Cognitive Vision

IEEE Press
445 Hoes Lane
Piscataway, NJ 08854

IEEE Press Editorial Board
Ekram Hossain, Editor in Chief

Jón Atli Benediktsson
David Alan Grier
Elya B. Joffe
Xiaoou Li
Peter Lian
Andreas Molisch
Saeid Nahavandi
Jeffrey Reed
Diomidis Spinellis
Sarah Spurgeon
Ahmet Murat Tekalp

About IEEE Computer Society

IEEE Computer Society is the world’s leading computing membership organization and the trusted information and career-development source for a global workforce of technology leaders including: professors, researchers, software engineers, IT professionals, employers, and students. The unmatched source for technology information, inspiration, and collaboration, the IEEE Computer Society is the source that computing professionals trust to provide high-quality, state-of-the-art information on an on-demand basis. The Computer Society provides a wide range of forums for top minds to come together, including technical conferences, publications, and a comprehensive digital library, unique training webinars, professional training, and the Tech Leader Training Partner Program to help organizations increase their staff’s technical knowledge and expertise, as well as the personalized information tool my Computer. To find out more about the community for technology leaders, visit http://www.computer.org.

IEEE/Wiley Partnership

The IEEE Computer Society and Wiley partnership allows the CS Press authored book program to produce a number of exciting new titles in areas of computer science, computing, and networking with a special focus on software engineering. IEEE Computer Society members receive a 35% discount on Wiley titles by using their member discount code. Please contact IEEE Press for details.

To submit questions about the program or send proposals, please contact Mary Hatcher, Editor, Wiley-IEEE Press: Email: [email protected], John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774.
Computational Models for Cognitive Vision

Hiranmay Ghosh
Ex-Advisor, TCS Research

Copyright © 2020 The IEEE Computer Society, Inc.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our website at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Names: Ghosh, Hiranmay, author.
Title: Computational models for cognitive vision / Hiranmay Ghosh.
Description: Hoboken, New Jersey : Wiley-IEEE Computer Society Press, [2020] | Includes bibliographical references and index.
Identifiers: LCCN 2020003784 (print) | LCCN 2020003785 (ebook) | ISBN 9781119527862 (paperback) | ISBN 9781119527855 (adobe pdf) | ISBN 9781119527893 (epub)
Subjects: LCSH: Computer vision. | Cognitive science. | Visual perception. | Bayesian statistical decision theory.
Classification: LCC TA1634 .G483 2020 (print) | LCC TA1634 (ebook) | DDC 006.3/7–dc23
LC record available at https://lccn.loc.gov/2020003784
LC ebook record available at https://lccn.loc.gov/2020003785

Cover Design: Wiley
Cover Image: © Andriy Onufriyenko/Getty Images

Set in 9.5/12.5pt STIX Two Text by SPi Global, Chennai, India

Printed in the United States of America.
Contents

About the Author
Acknowledgments
Preface
Acronyms

1 Introduction
  1.1 What Is Cognitive Vision
  1.2 Computational Approaches for Cognitive Vision
  1.3 A Brief Review of Human Vision System
  1.4 Perception and Cognition
  1.5 Organization of the Book

2 Early Vision
  2.1 Feature Integration Theory
  2.2 Structure of Human Eye
  2.3 Lateral Inhibition
  2.4 Convolution: Detection of Edges and Orientations
  2.5 Color and Texture Perception
  2.6 Motion Perception
    2.6.1 Intensity-Based Approach
    2.6.2 Token-Based Approach
  2.7 Peripheral Vision
  2.8 Conclusion

3 Bayesian Reasoning for Perception and Cognition
  3.1 Reasoning Paradigms
  3.2 Natural Scene Statistics
  3.3 Bayesian Framework of Reasoning
  3.4 Bayesian Networks
  3.5 Dynamic Bayesian Networks
  3.6 Parameter Estimation
  3.7 On Complexity of Models and Bayesian Inference
  3.8 Hierarchical Bayesian Models
  3.9 Inductive Reasoning with Bayesian Framework
    3.9.1 Inductive Generalization
    3.9.2 Taxonomy Learning
    3.9.3 Feature Selection
  3.10 Conclusion

4 Late Vision
  4.1 Stereopsis and Depth Perception
  4.2 Perception of Visual Quality
  4.3 Perceptual Grouping
  4.4 Foreground–Background Separation
  4.5 Multi-stability
  4.6 Object Recognition
    4.6.1 In-Context Object Recognition
    4.6.2 Synthesis of Bottom-Up and Top-Down Knowledge
    4.6.3 Hierarchical Modeling
    4.6.4 One-Shot Learning
  4.7 Visual Aesthetics
  4.8 Conclusion

5 Visual Attention
  5.1 Modeling of Visual Attention
  5.2 Models for Visual Attention
    5.2.1 Cognitive Models
    5.2.2 Information-Theoretic Models
    5.2.3 Bayesian Models
    5.2.4 Context-Based Models
    5.2.5 Object-Based Models
  5.3 Evaluation
  5.4 Conclusion

6 Cognitive Architectures
  6.1 Cognitive Modeling
    6.1.1 Paradigms for Modeling Cognition
    6.1.2 Levels of Abstraction
  6.2 Desiderata for Cognitive Architectures
  6.3 Memory Architecture
  6.4 Taxonomies of Cognitive Architectures
  6.5 Review of Cognitive Architectures
    6.5.1 STAR: Selective Tuning Attentive Reference
    6.5.2 LIDA: Learning Intelligent Distribution Agent
  6.6 Biologically Inspired Cognitive Architectures
  6.7 Conclusions

7 Knowledge Representation for Cognitive Vision
  7.1 Classicist Approach to Knowledge Representation
    7.1.1 First Order Logic
    7.1.2 Semantic Networks
    7.1.3 Frame-Based Representation
  7.2 Symbol Grounding Problem
  7.3 Perceptual Knowledge
    7.3.1 Representing Perceptual Knowledge
    7.3.2 Structural Description of Scenes
    7.3.3 Qualitative Spatial and Temporal Relations
    7.3.4 Inexact Spatiotemporal Relations
  7.4 Unifying Conceptual and Perceptual Knowledge
  7.5 Knowledge-Based Visual Data Processing
  7.6 Conclusion

8 Deep Learning for Visual Cognition
  8.1 A Brief Introduction to Deep Neural Networks
    8.1.1 Fully Connected Networks
    8.1.2 Convolutional Neural Networks
    8.1.3 Recurrent Neural Networks
    8.1.4 Siamese Networks
    8.1.5 Graph Neural Networks
  8.2 Modes of Learning with DNN
    8.2.1 Supervised Learning
      8.2.1.1 Image Segmentation
      8.2.1.2 Object Detection
    8.2.2 Unsupervised Learning with Generative Networks
    8.2.3 Meta-Learning: Learning to Learn
      8.2.3.1 Reinforcement Learning
      8.2.3.2 One-Shot and Few-Shot Learning
      8.2.3.3 Zero-Shot Learning
      8.2.3.4 Incremental Learning
    8.2.4 Multi-task Learning
  8.3 Visual Attention
    8.3.1 Recurrent Attention Models
    8.3.2 Recurrent Attention Model for Video
  8.4 Bayesian Inferencing with Neural Networks
  8.5 Conclusion

9 Applications of Visual Cognition
  9.1 Computational Photography
    9.1.1 Color Enhancement
    9.1.2 Intelligent Cropping
    9.1.3 Face Beautification
  9.2 Digital Heritage
    9.2.1 Digital Restoration of Images
    9.2.2 Curating Dance Archives
  9.3 Social Robots
    9.3.1 Dynamic and Shared Spaces
    9.3.2 Recognition of Visual Cues
    9.3.3 Attention to Socially Relevant Signals
  9.4 Content Re-purposing
  9.5 Conclusion

10 Conclusion
  10.1 “What Is Cognitive Vision” Revisited
  10.2 Divergence of Approaches
  10.3 Convergence on the Anvil?

References
Index

About the Author

Hiranmay Ghosh is a researcher in Computer Vision, Artificial Intelligence, Machine Learning, and Cognitive Computing. He received his Ph.D. degree from the Electrical Engineering Department of IIT Delhi and his B.Tech. degree in Radiophysics and Electronics from Calcutta University.

Hiranmay was a research adviser with Tata Consultancy Services. He was associated with R&D and engineering activities for more than 40 years in industry and autonomous research laboratories. He was invited to teach at the Indian Institute of Technology Delhi and the National Institute of Technology Karnataka as Adjunct Faculty. He is also a co-author of the book Multimedia Ontology: Representation & Applications.

He is a Senior Member of IEEE, a Life Member of IUPRAI, and a Member of ACM.