
NEW DIRECTIONS IN SEMI-SUPERVISED LEARNING by Andrew Brian Goldberg (Ph.D. dissertation)

207 pages · 2010 · 3.22 MB · English
NEW DIRECTIONS IN SEMI-SUPERVISED LEARNING

by Andrew Brian Goldberg

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Sciences) at the UNIVERSITY OF WISCONSIN–MADISON, 2010

© Copyright by Andrew Brian Goldberg 2010. All Rights Reserved.

For my parents, who always taught me to strive for the highest achievements possible.

ABSTRACT

In many real-world learning scenarios, acquiring a large amount of labeled training data is expensive and time-consuming. Semi-supervised learning (SSL) is the machine learning paradigm concerned with utilizing unlabeled data to try to build better classifiers and regressors. Unlabeled data is a powerful resource, yet SSL can be difficult to apply in practice. The objective of this dissertation is to move the field toward more practical and robust SSL. This is accomplished by several key contributions.

First, we introduce the online (and active) semi-supervised learning setting, which considers large amounts of mostly unlabeled data arriving constantly over time. An online SSL classifier must be able to make efficient predictions at any moment and update itself in response to labeled and unlabeled data. Previously, almost all SSL assumed a fixed dataset was available before training began, and receiving new data meant retraining a potentially slow model. We present two families of online semi-supervised learners that reformulate the popular manifold and cluster assumptions into theoretically motivated and efficient online learning algorithms.

We also invent several novel model assumptions and corresponding algorithms for the more common batch SSL setting. Principled in nature, these assumptions are geared toward making SSL easier to apply to a wider variety of situations in the real world. Many SSL algorithms construct a graph over the data to approximate an assumed (single) underlying low-dimensional manifold.
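To make the graph-based family of methods concrete, the sketch below implements standard harmonic label propagation over a Gaussian-weighted similarity graph. This is the classical baseline technique, not one of the dissertation's own algorithms; the toy data, the bandwidth `sigma`, and the iteration count are arbitrary choices for illustration.

```python
import numpy as np

def label_propagation(X, y, sigma=1.0, n_iter=100):
    """Toy graph-based SSL: propagate labels over a similarity graph.

    X: (n, d) data matrix.
    y: length-n labels, 0..K-1 for labeled points, -1 for unlabeled.
    Returns a predicted class for every point.
    """
    n = X.shape[0]
    # Fully connected graph with Gaussian (RBF) edge weights.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)  # row-normalized transition matrix

    labeled = y >= 0
    K = int(y[labeled].max()) + 1
    F = np.zeros((n, K))
    F[labeled, y[labeled]] = 1.0          # one-hot seeds at labeled points

    for _ in range(n_iter):
        F = P @ F                         # diffuse label mass along edges
        F[labeled] = 0.0
        F[labeled, y[labeled]] = 1.0      # clamp labeled points
    return F.argmax(axis=1)

# Two well-separated clusters, one labeled point each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([0, -1, -1, 1, -1, -1])
print(label_propagation(X, y))  # → [0 0 0 1 1 1]
```

Because cross-cluster edge weights are vanishingly small, the unlabeled points in each cluster inherit the label of that cluster's single seed, illustrating how a graph encodes the manifold (or cluster) assumption.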
In contrast, our novel multi-manifold assumption handles data lying on multiple manifolds that may differ in dimensionality, orientation, and density. The work also introduces a novel low-rank assumption, based on recent developments in matrix completion, which enables multi-label transduction with many unobserved features. Other contributions utilize several new forms of weak side information, such as dissimilarity relationships or order preferences over predictions. Finally, SSL is applied to sentiment or opinion analysis, exploring domain-specific assumptions and graphs to extend SSL to this challenging area of natural language processing.

The dissertation provides extensive experimental results demonstrating that these novel SSL learning settings and modeling assumptions lead to algorithms with significant performance benefits in computer vision, text classification, bioinformatics, and other prediction tasks.

ACKNOWLEDGMENTS

First off, none of this would have been possible without the constant support and encouragement of my advisor Jerry. I was quite fortunate to begin my graduate career at exactly the same time that Jerry started his own next chapter as a professor here in Madison. It was his artificial intelligence course in Fall 2005 and especially the elective in advanced natural language processing and machine learning in Spring 2006 that solidified my interest in pursuing this line of research. He almost immediately took me under his wing, getting me excited about semi-supervised learning, as well as the entire research process. Jerry has always been instrumental in helping formulate ideas and master tough mathematical concepts. On numerous occasions, he has pushed me to step out of my comfort zone, and I am especially thankful for this. It was a pleasure to have been invited to be a co-author on our introductory textbook on semi-supervised learning; this experience and exposure to the larger research community is largely responsible for my upcoming job!
Jerry's sustained confidence in me over the past 5 years has left a lasting mark, and I will look back on this period of my life with a great sense of accomplishment.

I must also thank my other committee members and professors who have contributed to my interest in machine learning and desire to do research. Professor Steve Wright was the first professor I ever made contact with at Wisconsin, when he personally notified me of my acceptance into the Ph.D. program. This friendly, welcoming, and unpretentious tone has permeated my entire time here. Whether in the classroom, doing research together, or figuring out how the Perl scripts on optimization-online.org work, I have appreciated getting to work closely with Steve.

Professor Jude Shavlik was also one of the first people I met upon moving to Madison. During the summer before officially enrolling, Jude encouraged me to read Tom Mitchell's classic Machine Learning text, which whetted my appetite for what lay ahead. Jude was also responsible for setting me up to do an independent study with Professor Michael Ferris, in which I learned to use Matlab and code my first support vector machine. It is hard to believe how much has happened since these early experiences.

Professor Mark Craven has also been a lasting influence on my graduate career. He and Professor David Page's sequence of bioinformatics courses really got me excited about the potential applications of machine learning. I have since tried to ensure my research remains practical-minded, though without sacrificing mathematical sophistication or justification. Working with Mark in courses, TREC Genomics, and other projects has always been a pleasure. I am also especially grateful for his frequent presence and helpful feedback at various practice talks and presentations.

I am thankful for the experience of working closely with ECE Professor Rob Nowak over the last three or four years.
Applying ideas from network tomography to the seemingly unrelated task of reassembling texts deconstructed into bags of words provided my first opportunity for interdepartmental collaboration and helped broaden my research perspective. I have enjoyed getting to know Rob and his students, who bring a different set of skills and technical background to the table (or should I say whiteboard?). Rob has been directly involved in several of the projects represented in this thesis, and has often been like a second advisor to me, inviting me to participate in his reading groups and private group workshops.

Grace Wahba served on my preliminary exam committee and has also been an inspiration. Her course in Reproducing Kernel Hilbert Spaces forced me to push my limits, and as a result, I have come to appreciate the long history of research in Statistics that forms the foundation for most of modern machine learning.

Several mentors from outside the university have played an important role in preparing me for completing this dissertation. My summer internships with Peng Xu at Google Research, and with Ariel Fuxman and Anitha Kannan at Microsoft Research Silicon Valley, provided great hands-on exposure trying to tackle real-world problems. I enjoyed open access to vast resources, including large stores of unlabeled data and computing power. Peng, Ariel, and Anitha taught me a lot about how research gets done in the "real world," which has stayed with me as I finished my dissertation research and planned for the future.

In addition to those named above, I could not have made it this far without the help of many co-authors on the work presented here, as well as other projects along the way.
In alphabetical order (with current affiliations in parentheses): Rakesh Agrawal (MSR), David Andrzejewski (UW), Charles R. Dyer (UW), Mohamed Eldawy (Google), Nathanael Fillmore (UW), Alex Furger (UW), Arthur Glenberg (Arizona State), Bryan Gibson (UW), Lijie Heng (UW), Tushar Khot (UW), Ming Li (Nanjing), Michael Rabbat (McGill), Ben Recht (UW), Burr Settles (CMU), John Shafer (MSR), Aarti Singh (CMU), Bradley Strock (UW), Panayiotis Tsaparas (MSR), Jurgen Van Gael (Cambridge), Junming Xu (UW), and Zhiting Xu (UW).

Throughout this process, I have benefited from the support and friendship of many other fellow classmates and colleagues in Computer Sciences and beyond. There are too many specific people to name individually, but I want to thank the members of the AI Reading Group, HAMLET (Human and Machine Learning Experiments and Theory), Graduates Anonymous, the ECE Comm-DSP reading group, and the Wednesday Night Drinking Club.

I could not have done this without the love and support of my family: my dad Steve, brother Jonathan, sister-in-law Jen, and grandparents Hilda, Evelyn, and Selig. I must also thank my aunt Martha Siegel (U. of Rochester, Ph.D. Mathematics '69) for being an inspiration throughout graduate school. Finally, my late mother Susan would have been so proud to see her little boy get his Ph.D. She is part of what keeps me going through the challenges and frustrations of research.

Last but not least, I owe much thanks to my best friend and soon-to-be wife Amy Becker (UW-Madison, Ph.D. Mass Communications '10). She has been a great sounding board for ideas over the last couple years, and I always enjoy our nerdy discussions about statistics and other shared research interests. Amy has helped me retain my sanity through the final stages of this endeavor, encouraging me to get things done so we can move on to the next chapter in our life together. It has been very comforting to navigate the job market and finish our dissertations together as a team; I would probably still be working on mine if it were not for her constant encouragement.
TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
PREFACE

I Background Material

1 Introduction to Semi-Supervised Learning
  1.1 Review of Statistical Machine Learning
  1.2 Learning with Labeled and Unlabeled Data
  1.3 The Practical Value of Semi-Supervised Learning
  1.4 How is Semi-Supervised Learning Possible?
  1.5 Inductive vs. Transductive Semi-Supervised Learning
  1.6 Caveats

2 Popular Semi-Supervised Learning Methods
  2.1 Self-Training
  2.2 Probabilistic Generative Models
  2.3 Cluster-then-Label Methods
  2.4 Co-Training and Multiview Learning
      2.4.1 Co-Training
      2.4.2 Multiview Learning
  2.5 Graph-Based Methods
  2.6 Semi-Supervised Support Vector Machines
