Learning Invariant Representations of Actions and Faces

by

Andrea Tacchetti

B.S., Università degli Studi di Genova (2008)
M.S., Università degli Studi di Genova (2011)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology, September 2017.

© Massachusetts Institute of Technology 2017. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, July 13, 2017
Certified by: Professor Tomaso A. Poggio, Eugene McDermott Professor of Brain and Cognitive Sciences, Thesis Supervisor
Accepted by: Professor Leslie A. Kolodziejski, Chair, Department Committee on Graduate Students

Abstract

Recognizing other people and their actions from visual input is a crucial aspect of human perception that allows individuals to respond to social cues. Humans effortlessly identify familiar faces and make fine distinctions between others' behaviors, despite transformations, such as changes in viewpoint, lighting or facial expression, that substantially alter the appearance of a visual scene. The ability to generalize across these complex transformations is a hallmark of human visual intelligence, and the neural mechanisms supporting it have been the subject of wide-ranging investigation in systems and computational neuroscience. However, advances in understanding the neural machinery of visual perception have not always translated into precise accounts of the computational principles dictating which representations of sensory input the human visual system learns to compute, nor of how our visual system acquires the information necessary to support this learning process. Here we present results in support of the hypothesis that invariant discrimination and time continuity might fill these gaps. In particular, we use magnetoencephalography (MEG) decoding and a dataset of well-controlled, naturalistic videos to study invariant action recognition, and find that representations of action sequences that support invariant recognition can be measured in the human brain. Moreover, we establish a direct link between how well artificial video representations support invariant action recognition and the extent to which they match neural correlation patterns. Finally, we show that representations of visual input that are robust to changes in appearance can be learned by exploiting time continuity in video sequences. Taken as a whole, our results suggest that supporting invariant discrimination tasks is the computational principle dictating which representations of sensory input are computed by human visual cortex, and that time continuity in visual scenes is sufficient to learn such representations.

Thesis Supervisor: Professor Tomaso A. Poggio
Title: Eugene McDermott Professor of Brain and Cognitive Sciences
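The abstract above rests on the notion of invariant decoding: a readout is trained on neural responses recorded under one condition (for example, one viewpoint) and tested on responses from a condition it has never seen, so that above-chance test accuracy indicates a representation that generalizes across the transformation. The snippet below is only an illustrative sketch of that logic on synthetic data; the array shapes, the linear classifier and all names are assumptions, not the MEG decoding pipeline used in Chapter 2.

```python
# Illustrative sketch of cross-condition ("invariant") decoding on synthetic data.
# NOT the thesis pipeline: shapes, noise levels and the classifier are assumptions.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_trials, n_sensors, n_actions = 200, 306, 5   # e.g. 306 sensors, 5 action classes

# Action-specific sensor patterns shared across viewing conditions.
action_patterns = rng.standard_normal((n_actions, n_sensors))

def simulate_trials(action_labels, view_offset):
    """Synthetic trials: action pattern + condition-specific offset + noise."""
    noise = 0.5 * rng.standard_normal((len(action_labels), n_sensors))
    return action_patterns[action_labels] + view_offset + noise

labels = rng.integers(0, n_actions, size=n_trials)
view_a = simulate_trials(labels, 0.0)                            # training condition
view_b = simulate_trials(labels, rng.standard_normal(n_sensors)) # held-out condition

clf = make_pipeline(StandardScaler(), LinearSVC())
clf.fit(view_a, labels)            # train on condition A only
acc = clf.score(view_b, labels)    # test on condition B, never seen during training
print(f"cross-condition decoding accuracy: {acc:.2f} (chance = {1.0 / n_actions:.2f})")
```

Above-chance accuracy in this cross-condition setup is the operational signature of an invariant representation; within-condition accuracy alone would not distinguish invariance from condition-specific features.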
Acknowledgments

This thesis is the fruit of years of work in the Poggio lab. However, the pages that follow do not do justice to the impact that graduate school has had on me. The first shortcoming is that the by-line is far too short: it should include all the mentors, collaborators, friends and family who have supported and encouraged me through the ups and downs that come with learning to do research. This section is a timid attempt to rectify this fault. The second limitation is that the work presented here is but a slice of my experience at MIT. The people I met, and the enthusiasm and openness I found here, both inside and outside the lab, cannot be aptly described in a page, but they have been as important to my training as any piece of research I was fortunate enough to work on.

First, I would like to thank my research advisor Tomaso Poggio for the inspiration and mentorship he provided through the years. Tommy, you have inspired me to make it my life's work to understand and replicate human intelligence. Thank you for leading by example, for teaching me how to do research and manage innovation, for being strict at first and for letting me find my own path when it was time. I could not have asked for a better advisor, and I look forward to a long-lasting friendship.

I would like to thank my thesis committee, Leslie P. Kaelbling and Bill Freeman. When I presented my work to you, for 35 minutes in the middle of your busy day, the speed with which you put it into context, and how, in a matter of minutes, you asked questions that had taken me weeks to concoct, provided a concrete example of what I would like to achieve in my professional career. Thank you for being role models as scientists and mentors. Your enthusiasm and insightful feedback have had a significant impact on me and on the work presented here.

I would like to thank all my amazing collaborators, Leyla Isik, Stephen Voinea and Georgios Evangelopoulos, and the rest of the Poggio lab family, present and past, in particular Chiyuan Zhang, Charlie Frogner, Youssef Mroueh, Lorenzo Rosasco, Joel Leibo, Jim Mutch, Yena Han, Qianli Liao, Gemma Roig, Brando Miranda, Xavier Boix, Ethan Meyers, and Owen Lewis. Thank you for all the discussions, about research and otherwise, for your friendship and your support over the years. I would also like to thank Kathleen Sullivan and Gadi Geiger for the coffees, lunches, life lessons and for your constantly and relentlessly open doors.

I would like to thank the MIT Cycling Club, and especially its leadership, for providing the social environment and competitive outlet I needed to be successful at MIT. I have yet to meet a group of people with the same level of enthusiasm and dedication to their community as the Club, and I am grateful that I had the opportunity to be part of it.

I would like to thank my parents, Carlo Tacchetti and Marilisa Nocera, and my sister, Paola Tacchetti, for their support and encouragement and for helping me put into perspective all the frustrations, pitfalls and times of slow progress that are inevitable in research projects.

Lastly, I would like to thank my wife and life partner Rebecca Canter for her constant, unwavering support and for her encouragement to take risks. Becky, you are an inspiration of what it means to be a good person, a scientist and a teammate. This thesis would simply not have been possible without you.

Contents

1 Introduction
  1.1 Introduction and problem statement
  1.2 Related work
    1.2.1 Invariance to transformations and sample complexity
    1.2.2 Neural decoding reveals invariance properties of cortical representations
    1.2.3 Invariant recognition dictates neural representation of visual input
    1.2.4 Biological predictions
    1.2.5 Learning invariant representations and their applications
    1.2.6 Perception of others
  1.3 Organization of this Thesis
  1.4 Main contributions
2 Fast invariant representations of human actions in the visual system
  2.1 Introduction
  2.2 Materials and Methods
    2.2.1 Action recognition dataset
    2.2.2 Participants
    2.2.3 Experimental procedure
    2.2.4 MEG data acquisition and preprocessing
    2.2.5 General MEG decoding methods
    2.2.6 Decoding - feature pre-processing
    2.2.7 Decoding - classification
    2.2.8 Decoding invariant information
    2.2.9 Significance testing
    2.2.10 Temporal Cross Training
  2.3 Results
    2.3.1 Readout of actions from MEG data is early and invariant
    2.3.2 The dynamics of invariant action recognition
    2.3.3 Coarse and fine action discrimination
    2.3.4 The roles of form and motion in invariant action recognition
  2.4 Discussion
  2.5 Supplementary Information
3 Invariant recognition drives neural representations of action sequences
  3.1 Introduction
  3.2 Results
    3.2.1 Action discrimination with Spatiotemporal Convolutional representations
    3.2.2 Comparison of model representations and neural recordings
  3.3 Discussion
  3.4 Materials and methods
    3.4.1 Action Recognition Dataset
    3.4.2 Recognizing actions with spatiotemporal convolutional representations
    3.4.3 Quantifying agreement between model representations and neural recordings
  3.5 Supplementary information
    3.5.1 Recurrent Neural Networks
    3.5.2 RSA over time
4 Learning invariant representations from video sequences and transformation sets
  4.1 Introduction
  4.2 Related work
  4.3 Background
  4.4 Metric learning with orbit loss
    4.4.1 Problem statement
    4.4.2 Orbit sets
    4.4.3 Orbit metric loss
    4.4.4 Orbit triplet loss
    4.4.5 Orbit encoder
    4.4.6 Parameterization of the embedding
  4.5 Informal analysis of Discriminate-and-Rectify Encoders
  4.6 Experiments
    4.6.1 Network and training details
    4.6.2 Affine transformations: MNIST
    4.6.3 Face transformations: Multi-PIE
    4.6.4 Early stopping by cross validation
    4.6.5 Estimating orbits from video sequences
  4.7 Discussion
  4.8 Supplementary Information
    4.8.1 Additional reconstruction examples
5 Conclusions
  5.1 Future directions
    5.1.1 Perception of others
    5.1.2 Doing away with full supervision
    5.1.3 Gradient-based learning
    5.1.4 The role of the architecture and the need for a theory
  5.2 Final remarks
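Chapter 4, outlined above, learns invariance from video by treating temporally adjacent frames as different views of the same content, for instance through the orbit triplet loss listed in Section 4.4.4. The snippet below is a minimal NumPy sketch of that general idea only: a triplet-style hinge loss over a toy linear embedding, in which adjacent frames are pulled together and frames from a different clip pushed apart. It is not the orbit-loss formulation or training code from the thesis; the embedding, margin and feature dimensions are assumptions.

```python
# Minimal sketch of a temporal-continuity triplet loss (assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)
dim_in, dim_out, margin = 256, 32, 0.2
W = 0.1 * rng.standard_normal((dim_out, dim_in))   # toy linear embedding

def embed(x, W):
    """Project a frame feature vector and L2-normalize it."""
    z = W @ x
    return z / (np.linalg.norm(z) + 1e-8)

def temporal_triplet_loss(anchor, positive, negative, W, margin=0.2):
    """Hinge loss: the anchor should be closer to a temporally adjacent frame
    (positive) than to a frame from a different video (negative)."""
    za, zp, zn = embed(anchor, W), embed(positive, W), embed(negative, W)
    d_pos = np.sum((za - zp) ** 2)
    d_neg = np.sum((za - zn) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# Toy example: two consecutive frames of one clip vs. a frame from another clip.
frame_t  = rng.standard_normal(dim_in)
frame_t1 = frame_t + 0.05 * rng.standard_normal(dim_in)   # small temporal change
other    = rng.standard_normal(dim_in)                      # unrelated clip
print(temporal_triplet_loss(frame_t, frame_t1, other, W, margin))
```

Minimizing such a loss over many triplets drives the embedding toward representations that change slowly along a video, which is the sense in which time continuity alone can supply the supervision needed to learn invariance.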

