
Vision by Alignment Adam Davis Kraft PDF

134 Pages·2017·23.22 MB·English

Vision by Alignment

by

Adam Davis Kraft

B.S., Massachusetts Institute of Technology (2005)
M.Eng., Massachusetts Institute of Technology (2008)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, February 2018.

© Adam Davis Kraft, MMXVIII. All rights reserved. The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author: Department of Electrical Engineering and Computer Science, January 18, 2018
Certified by: Patrick H. Winston, Ford Professor of Artificial Intelligence and Computer Science, Thesis Supervisor
Accepted by: Leslie A. Kolodziejski, Professor of Electrical Engineering and Computer Science, Chair of the Department Committee on Graduate Students

Abstract

Submitted to the Department of Electrical Engineering and Computer Science on January 18, 2018, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering and Computer Science.

Human visual intelligence is robust. Vision is versatile in its variety of tasks and operating conditions, it is flexible, adapting facilely to new tasks, and it is introspective, providing compositional explanations for its findings.
Vision is fundamentally underdetermined, but it exists in a world that abounds with constraints and regularities perceived not only through vision but through other senses as well. These observations suggest that the imperative of vision is to exploit all sources of information to resolve ambiguity. I propose an alignment model for vision, in which computational specialists eagerly share state with their neighbors during ongoing computations, availing themselves of neighbors' partial results in order to fill gaps in evolving descriptions. Connections between specialists extend across sensory modalities, so that the computational machinery of many senses may be brought to bear on problems with strictly visual inputs.

I anticipate that this alignment process accounts for vision's robust attributes, and I call this prediction the alignment hypothesis. In this document I lay the groundwork for evaluating the hypothesis. I then demonstrate progress toward that goal, by way of the following contributions:

• I performed an experiment to investigate and characterize the ways that high-performing computer-vision models fall short of robust perception, and evaluated whether alignment models can address the shortcomings. The experiment, which relied on a procedure to remove signal energy from natural images while preserving high classification confidence by a neural network, revealed that the type of object depicted in the original image is a strong predictor of whether humans recognize the reduced-energy image.

• I implemented an alignment model based on a network of propagators. The model can use constraints to infer locations and heights of pedestrians and locations of occluding objects in an outdoor urban scene. I used the results of the effort to refine the requirements of mechanisms to use in building alignment models.

• I implemented an alignment model based on neural networks. Alignment-motivated design empowers the model, trained to estimate depth maps from single images, to perform the additional task of depth super-resolution without retraining. The design thus demonstrates flexibility, a property of robust vision systems.

Thesis Supervisor: Patrick H. Winston
Title: Ford Professor of Artificial Intelligence and Computer Science

Acknowledgments

Foremost I thank my advisor, Patrick Winston. My meandering path through AI research has given me the impression that AI progress will improve dramatically when more people adopt a few of Patrick's habits and ideas. There is no substitute for learning Patrick's habits and ideas from Patrick. If you do not have that opportunity, though, then read the rest of this paragraph because it will make you smarter. To understand human intelligence, you must reject all meager proxies for it. Seek a principled approach, reject mechanistic approaches, and do not compromise. Tell yourself the right story. Explain everything as simply and clearly as possible. Have a vision that is worth dedicating an entire career to, and solve problems that make progress toward your vision. When you tell people about the problems that you solve, make sure you first tell them what your vision is. Tell them in a way that will inspire somebody to dedicate an entire career to it.

Gerald Sussman and Shimon Ullman served on my thesis committee. Gerry's thoughtful critique of this document was invaluable. I have come to appreciate that an hour-long conversation with Gerry easily imparts weeks of ideas to revisit, and I'm grateful to have had the opportunity to work with him. Similarly, Shimon's work has always inspired me, and has helped me to understand which problems in vision are essential to solve.

I am grateful to all of my friends and colleagues at MIT CSAIL and especially to Dylan Holmes and Michael Fleder. I thank Dylan for patient critique, for countless conversations about this work, and for sharing a bright outlook on a future with robust AI. Dylan generously applied his expertise to design several of the best illustrations in this document. I have been very fortunate to benefit from Michael's deep understanding of machine-learning concepts and from his talent for clear explanation.
I owe a debt of gratitude to my friends who read this document at various stages of completion and provided me with feedback: to Avril Kenney for her extremely thorough and perceptive attention to detail and for keeping me honest; to Blake Stacey, whose science writing expertise led to enormously helpful suggestions; and to Brian Neltner, for holding me accountable to high standards. Additionally, Robert McIntyre generously supplied me with a working set of custom tools and templates for compiling this document.

Michael Coen, Jeffrey Siskind, Gadi Geiger, and Sajit Rao provided inspiration for this work, particularly in its early stages. Their work continues to inspire me. Gadi's weekly research meeting was an invaluable resource early on in this work.

Generous support through DARPA and NSF made this work possible. I am especially grateful to James Donlon for his vision and leadership in both the Mind's Eye and Robust Intelligence programs. This work was sponsored by awards FA8750-05-2-0274, D11AP00250, W911NF-10-2-0065, and IIS-1421065.

Contents

1  Vision  9
   1.1  Motivation  11
   1.2  Mechanisms of alignment  13
        1.2.1  Propagator networks  13
        1.2.2  Probabilistic graphical models  16
        1.2.3  Restricted Boltzmann machines  17
        1.2.4  Neural networks  17
   1.3  Testing ground  18
        1.3.1  Problems not to use, and why  18
        1.3.2  Problems to use, and why  20
   1.4  Overview  21
2  Characterizing Neural Net Classification  23
   2.1  Introduction  24
   2.2  Methods  26
        2.2.1  Network models and images  26
        2.2.2  Signal-energy reduction algorithm  26
        2.2.3  Reduction algorithm design issues  28
   2.3  Results  30
   2.4  Discussion  38
        2.4.1  Descriptiveness of DCNN models  38
        2.4.2  Class specificity of model descriptiveness  40
        2.4.3  Ruling out alternatives: a DCNN strategy  41
        2.4.4  Implications  42
   2.5  Contributions  42
3  Foundational Work in Constraint Propagation  44
   3.1  Introduction  44
   3.2  Observations about constraint propagation systems  44
   3.3  Applications of constraint propagation in vision  46
        3.3.1  Shape from shading  46
        3.3.2  Waltz's 3D-labeling procedure  46
        3.3.3  Hinton's work on relaxation  47
   3.4  The propagator architecture  49
   3.5  Summary  51
4  Processing a Scene with Propagators  52
   4.1  Introduction  52
   4.2  Experimental setup  54
   4.3  Implementation details  55
        4.3.1  Abstractions  55
        4.3.2  Locating foreground regions  59
        4.3.3  Tracking objects  61
   4.4  Discussion  79
        4.4.1  Scarcity of strong constraints  82
        4.4.2  Brittleness of logical absolutes  82
        4.4.3  Incorrigibility  83
        4.4.4  Problems of scale  84
        4.4.5  Where to go next  87
   4.5  Contributions  88
5  Building Neural Networks for Alignment  90
   5.1  Introduction  90
   5.2  Problem statement  91
   5.3  Related work  92
   5.4  Approach  94
        5.4.1  Fortification against adversarial examples  95
        5.4.2  Semantically meaningful ports  96
        5.4.3  Empirically successful foundation  97
   5.5  Implementation  97
   5.6  Experiments  99
        5.6.1  Training infrastructure  99
        5.6.2  Training data  100
        5.6.3  Training particulars  101
        5.6.4  Evaluation  102
        5.6.5  External signal introduction  103
   5.7  Discussion  105
        5.7.1  Extending ally networks  107
        5.7.2  Applying immutable differentiable joints  107
   5.8  Contributions  108
6  Summary of Contributions  111
A  Appendix: Neural Network Methods  113
   A.1  Overview  113
   A.2  Bootstrapping fully-connected layers from convolutional layers  113
   A.3  Batch normalization on multi-GPU systems  117
        A.3.1  Overview of batch normalization  117
        A.3.2  Modifications to the batch normalization algorithm  118
   A.4  Improving transfer learning with batch normalization retrofit  119
   A.5  Neural network prototyping design issues  123

List of Figures

1   Visual summary of highlights and developments  10
2   Motivational framework for the work described in this document  14
3   Reduced-energy images that a neural network recognizes  23
4   The Kanizsa Triangle illusion  25
5   Example reduced-signal-energy images  27
6   Signal-energy reduction steps  29
7   Adversarial examples generated by 3 methods  30
8   Energy ratios per Laplacian-pyramid level  31
9   Invariance of signal-energy reduction algorithm to RNG initialization  32
10  Residual and minimal-energy images from iterative energy reduction  33
11  Accumulated minimum-energy images  34
12  Class-specificity of minimum-energy image recognizability  35
13  Experiment user interface  37
14  Most-often and least-often recognized images by study participants  38
15  Dynamic range of pre-softmax activation values  39
16  Statistical reframing of Waltz's procedure  48
17  Example output of propagator system  52
18  Comparison of propagator and pipeline approaches  53
19  Data-collection apparatus  54
20  Stereo images from the camera array  55
21  Inheritance and interface summary for low-level propagator system  56
22  Propagator subclasses  57
23  Flow graph of low-level processing  60
24  Summary of low-level processing  62
25  Raw input to the tracker  68
26  Output of track merging via symmetric cascade  71
27  Edge-present frame-count histogram composite  73
28  Optical flow and color histograms  74
29  Output of least squares estimation of the ground plane  75
30  Output of propagator relaxation of the ground plane  76
31  Locations of occluders  77
32  Top-down propagator network  78
33  Track affected by propagation  79
34  A three-way and multi-way sum propagator  84
35  A three-way sum propagator with a constraint on the total  87
36  Dual-mode estimation neural network  90
37  High-level depth-estimation network design  96
38  Neural network for depth estimation  98
39  Outputs of two depth-estimation networks  104
40  Ally network structure  105
41  Signal-introduction depth up-sampling  106
42  GAN compared to MSE-trained network  109
43  Conversion of a convolutional layer to an FC layer  116

1 Vision

As the remnants of AI winter thaw rapidly, excitement over machine learning's rapid pace of achievement is palpable. Machines now outperform humans on tasks such as object detection, a milestone that was far beyond reach less than a decade ago. Perhaps the greatest opportunity in AI research, though, is the opportunity to gain a deep computational understanding of the way people think. Despite outstanding technical achievements, progress on this front has been slow. The disparity between performance and understanding is especially striking in computer vision.

To understand human visual intelligence is to account for its astonishing versatility, its flexibility, and the coherence and depth of the explanations it offers for its observations. A closer look at the state of the art in computer vision reveals that it has not yet achieved any of these abilities, despite the tremendous technological value of its achievements to date.

This is the story of my first steps toward describing, implementing, and understanding robust visual intelligence. The story revolves around the alignment hypothesis that you learn about in Section 1.1.
The highlights and major developments in the story are depicted in Figure 1.

In Chapter 2, you will find out about an experiment in which I investigated and characterized discrepancies between a neural network's visual ability and human visual intelligence. The implications of the experiment reinforce the need for robust vision architectures. As part of the experiment, I developed a procedure to isolate the features of images that support confident classification by a neural network, and asked whether those features also support classification by humans. A surprising outcome of the experiment is that whether or not the neural network features are human-recognizable depends primarily on the type of object depicted in the image, rather than on other details of the image.

Then, in Chapters 3 and 4, you will learn about my work in implementing alignment-driven vision systems using a propagator architecture. A significant outcome of that work is a vision system that tracks people as they move, and uses the tracking along with constraints like "people must be supported in order to walk" to infer geometric properties of the scene and the actors.

In Chapter 5 you will see my first steps toward applying alignment principles to the design of neural networks. A significant result is that the design empowers a neural network to accomplish the task of depth super-resolution despite being trained only for the task of depth estimation.

Figure 1: Visual summary of highlights and developments. (a) Images and minimal features; (b) pedestrians and scene geometry; (c) images and generated depth maps. Inputs followed by outputs, shown in (a), of a signal-energy reduction algorithm that preserves neural network classification confidence. A propagator system, with output shown in (b), uses the tracks of pedestrians combined with knowledge about average human size to help identify the ground plane and refine heights of pedestrians. A neural network performs both depth estimation from single images and depth up-sampling, with inputs and outputs shown in (c).
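The feature-isolation idea of Chapter 2 can be caricatured in a few lines. The sketch below is my own toy illustration, not the thesis's algorithm (which operates on real networks and natural images): greedily zero out image entries so long as a classifier's confidence for the target class stays above a threshold, leaving only the signal the classifier relies on. The `confidence` function and the 8×8 "image" are hypothetical stand-ins.

```python
# Toy sketch of signal-energy reduction (illustrative only, not the
# thesis's procedure): remove entries the classifier does not need,
# keeping classification confidence above a threshold throughout.
import numpy as np

def confidence(image, weights):
    """Stand-in 'classifier': a linear score squashed into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.sum(image * weights)))

def reduce_energy(image, weights, threshold=0.9):
    reduced = image.copy()
    # Try removing entries in order of how little the classifier uses them.
    for idx in np.argsort(np.abs(weights).ravel()):
        trial = reduced.copy()
        trial.ravel()[idx] = 0.0
        if confidence(trial, weights) >= threshold:
            reduced = trial  # removal keeps confidence high: accept it
    return reduced

rng = np.random.default_rng(0)
img = rng.uniform(0.5, 1.0, size=(8, 8))
w = np.zeros((8, 8))
w[2:4, 2:4] = 2.0  # the stand-in classifier attends to one small patch
out = reduce_energy(img, w)
print(np.count_nonzero(out * (w == 0)) == 0, confidence(out, w) >= 0.9)
# → True True  (only the attended patch can survive; confidence is preserved)
```

Everything the classifier ignores is stripped away while confidence never drops, which is the sense in which the surviving pixels are "the features that support confident classification."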
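The constraint "people must be supported in order to walk" hints at how the propagator systems of Chapters 3 and 4 combine measurements with prior knowledge. Below is a minimal interval-propagator sketch of my own, an illustrative simplification rather than the thesis's implementation: a pedestrian's measured pixel height and a prior on human height jointly narrow an unknown meters-per-pixel scale. All names and numbers here are made up for illustration.

```python
# Minimal propagator sketch (an illustrative simplification, not the
# thesis's implementation). Cells hold interval knowledge; propagators
# eagerly push consequences of a constraint to neighboring cells
# whenever any cell narrows.

class Cell:
    """Stores an interval [lo, hi] of values still considered possible."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.watchers = []  # propagators to re-run when this cell narrows

    def add_content(self, lo, hi):
        new = (max(self.lo, lo), min(self.hi, hi))  # intersect intervals
        if new != (self.lo, self.hi):
            self.lo, self.hi = new
            for run in self.watchers:
                run()

def product_constraint(px, scale, meters):
    """Constraint meters = px * scale (all nonnegative), propagated in
    whichever direction currently has information to give."""
    def run():
        meters.add_content(px.lo * scale.lo, px.hi * scale.hi)
        if px.lo > 0:
            scale.add_content(meters.lo / px.hi, meters.hi / px.lo)
    for cell in (px, scale, meters):
        cell.watchers.append(run)
    run()

# A pedestrian's measured image height plus a prior on human height
# pin down the unknown meters-per-pixel scale of the scene.
px_height = Cell(100.0, 110.0)       # measured: 100-110 pixels tall
scale = Cell(0.0, float("inf"))      # unknown meters per pixel
height_m = Cell(1.5, 2.0)            # prior: people are 1.5-2.0 m tall
product_constraint(px_height, scale, height_m)
print(round(scale.lo, 4), round(scale.hi, 4))
# → 0.0136 0.02
```

Because cells only ever narrow, propagation reaches a fixed point; adding another specialist (say, a ground-plane estimate) is just another constraint watching the same cells, which is the "eagerly share state with neighbors" behavior the alignment hypothesis calls for.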
