Understanding Visual Appearance on the Web Using Large-Scale Crowdsourcing and Deep Learning (PDF)

209 Pages·2016·6.03 MB·English
by Sean Cameron Bell

UNDERSTANDING VISUAL APPEARANCE ON THE WEB USING LARGE-SCALE CROWDSOURCING AND DEEP LEARNING

A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy

by
Sean Cameron Bell
August 2016

© 2016 Sean Cameron Bell
ALL RIGHTS RESERVED

Sean Cameron Bell, Ph.D.
Cornell University 2016

Automatically understanding scenes is the holy grail of computer vision. Real-world scenes have a vast array of interesting objects, materials, textures, and surfaces. With scenes, people want to edit photographs, search by object and material properties, visualize changes to rooms and buildings, browse collections by visual similarity, and explain images to the visually impaired. However, the tools and data that we have for recognizing, editing, and exploring the application of scene properties for everyday problems are still quite limited. We cannot easily understand, search, and aggregate visual concepts in the billions of photos that are uploaded every day to the web.

Recently, large data collections combined with machine learning have opened new frontiers in scene understanding. In this thesis, we introduce new large-scale crowdsourced datasets for material and visual understanding in the wild. Using these new datasets, we develop new state-of-the-art algorithms for scene understanding of materials, objects, shapes, and style.

We propose multiple new large-scale, first-of-their-kind datasets in the wild. OpenSurfaces contains thousands of segmented surfaces annotated with material, texture, and contextual information. MINC (Materials in Context) includes millions of points annotated with materials. Both are at least an order of magnitude larger than prior datasets. The Intrinsic Images in the Wild (IIW) dataset includes millions of crowdsourced annotations of relative comparisons of material properties at pairs of points in each scene.
These datasets all require careful crowdsourcing and use the ability of humans to judge materials despite variations in illumination, viewpoint, imaging conditions, and context. Using these large-scale datasets, we demonstrate state-of-the-art algorithms for material recognition using OpenSurfaces and MINC, and intrinsic image decomposition using IIW. We also develop state-of-the-art algorithms for object detection (Inside-Outside Network, ION) and visual search for style similarity (ProductNet).

In this thesis we have demonstrated how the combination of crowdsourcing at scale and new deep learning architectures can create new tools to let consumers understand and edit images, scenes, materials, and objects.

BIOGRAPHICAL SKETCH

Sean Bell was born in Toronto, Canada. When first asked “what do you want to be when you grow up?”, he would answer “computer programmer!”, not even knowing what they did; he just loved computers. In early high school, he would spend his spare time in the computer lab, working on making graphical effects and animations with Java Applets. From 2007 to 2011, he studied Engineering Science at the University of Toronto. In his second year, his team won first place in the AER201 Engineering Design Project, programming the micro-controller for a robot that dispenses an exact number of candies at high speed (a very useful device). In his last year, he built a ray-tracer that “borrowed” unused computers across campus to render his scenes, one scanline at a time.

During his undergrad, Sean spent five summers working at Hill & Schumacher, a patent law firm, and almost went into patent law as a possible career. However, he was more interested in writing software to help draft patents than in the patents themselves. This turned into his undergraduate thesis, a real-time system to detect inconsistencies in patents as they were being drafted.

Since 2011, Sean has been studying for his doctoral degree in Computer Science at Cornell University. When he first arrived, he wasn't sure whether to work on natural language processing, computer graphics, or machine learning.
He quickly found his place at the boundary between graphics and vision, and has since enjoyed five wonderful years studying at Cornell. Upon graduation in 2016, he is co-founding a deep learning company, GrokStyle, based on his work in visual search and style similarity.

This thesis is dedicated to my parents, for their love and support, and to Stephanie Sang, for putting up with so much.

ACKNOWLEDGEMENTS

This thesis would not have been possible without the support, guidance, and mentorship of my advisor Prof. Kavita Bala. Kavita has always encouraged me to strive for the best possible version of anything that I work on, and has been central to my learning how to do research. I would like to thank Prof. Noah Snavely, my close collaborator and committee member; his ideas and feedback have been invaluable in our research meetings. I would also like to thank collaborators Paul Upchurch, Larry Zitnick, and Ross Girshick, and my other PhD committee member, Prof. Charles Van Loan, as well as Profs. David Bindel and Serge Belongie for serving as proxies on my committee.

I thank my friends and colleagues in the Graphics and Vision Lab for their camaraderie, for their willingness to discuss research ideas and read paper drafts, and for making it such a great place to work: Kevin Matzen, Tim Langlois, Balázs Kovács, and Paul Upchurch, as well as Albert Liu, Pramook Khungurn, Daniel Hauagge, Kyle Wilson, Nicolas Savva, Scott Wehrwein, Eston Schweikart, Jui-hsien Wang, Wenzel Jakob, and Steven An. I would also like to thank computer graphics professors Steve Marschner and Doug James, as well as the entire Cornell Computer Science department, for always attending my talks and giving great feedback.

At the University of Toronto, I would like to thank my colleagues and friends for making undergrad so enjoyable, and for cultivating friendly competition: Trevor Campbell, Konstantine Tsotsos, Jamie Liu, Rick Zhang, Manan Arya, Sanae Rosen, Catherine Chen, Amy Chen, Angela Yoo, Zoya Gavrilov, Mark Harfouche, Adam Pan, and Casey Scott-Songin. I thank Prof. Kyros Kutulakos for inspiring me to consider research in computer graphics.
I would like to thank the collaborators and colleagues that I met at conferences and internships, all those who attended my talks and posters, and those who emailed me about my work with exciting ideas, questions, and discussions about research, in particular Abhinav Shrivastava, Ishan Misra, Jon Barron, Andrej Karpathy, and Peter Gehler.

I owe everything to my family, including my grandparents, cousins, aunts, uncles, my siblings Ian and Robyn, and especially my parents Sally and Graydon, for creating such a wonderful and supportive environment, for helping me at every stage in life, and for putting up with me being away for so long. I am grateful to my grandpa Scotty Bell for funding my undergraduate education.

To my girlfriend Stephanie Sang, who supported me through the crunch times, kept me company, listened to my research struggles, sent me hand-drawn comics, brought me home-cooked meals to the lab, and helped give my life meaning outside the lab: you put up with so much, and I am eternally grateful.

Finally, I would like to thank the Zabs sandwich from Collegetown Bagels, and the latte from Gimme Coffee, for being so delicious.

TABLE OF CONTENTS

Biographical Sketch
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
1 Introduction
2 OpenSurfaces: A Richly Annotated Catalog of Surface Appearance
  2.1 Introduction
  2.2 Related work
  2.3 Overview
    2.3.1 Community photo collections
    2.3.2 Human annotation
    2.3.3 OpenSurfaces data representation
    2.3.4 Annotation stages
  2.4 The OpenSurfaces annotation pipeline
    2.4.1 Stage 1: Filtering images by scene category
    2.4.2 Stage 2: Flag images with improper white balance
    2.4.3 Stage 3: Material segmentation
    2.4.4 Stages 4 and 5: Naming materials and objects
    2.4.5 Stage 6: Planarity voting
    2.4.6 Stage 7: Rectified textures
    2.4.7 Stage 8: Appearance matching
  2.5 Results
    2.5.1 OpenSurfaces statistics
    2.5.2 Task analytics
  2.6 Proof-of-Concept Applications
    2.6.1 Texturing
    2.6.2 Informed scene similarity
    2.6.3 Future applications
  2.7 Conclusions and future work
  2.8 Impact
3 Material Recognition in the Wild with the Materials in Context Database
  3.1 Introduction
  3.2 Prior Work
  3.3 The Materials in Context Database (MINC)
    3.3.1 Sources of data
    3.3.2 Segments, Clicks, and Patches
  3.4 Material recognition in real-world images
    3.4.1 Training procedure
    3.4.2 Full scene material classification
  3.5 Experiments and Results
    3.5.1 Patch material classification
    3.5.2 Full scene material segmentation
    3.5.3 Comparing MINC to FMD
    3.5.4 Comparing CNNs with prior methods
  3.6 Conclusion
  3.7 Impact
4 Intrinsic Images in the Wild
  4.1 Introduction
  4.2 Related work
  4.3 Database
    4.3.1 What judgements should we collect?
    4.3.2 Which images and which pairs of points?
    4.3.3 Annotation interface
    4.3.4 Data verification
    4.3.5 Error metric: WHDR
    4.3.6 Discussion and results
  4.4 Intrinsic Images Algorithm
    4.4.1 Initialization
    4.4.2 Stage 1: Optimize reflectance
    4.4.3 Stage 2: Optimize for shading
  4.5 Results
    4.5.1 Training
    4.5.2 Ranking
    4.5.3 MIT Intrinsic Images dataset
  4.6 Limitations and future work
  4.7 Conclusion
  4.8 Impact
5 Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks
  5.1 Introduction
  5.2 Prior work
  5.3 Architecture: Inside-Outside Net (ION)
    5.3.1 Pooling from multiple layers
    5.3.2 Context features with IRNNs
  5.4 Results
    5.4.1 Experimental setup
    5.4.2 PASCAL VOC 2007
    5.4.3 PASCAL VOC 2012