
Predicting Structured Data PDF

361 Pages·2007·2.1 MB·English

Preview Predicting Structured Data

Predicting Structured Data
Advances in Neural Information Processing Systems
Published by The MIT Press

edited by Gökhan Bakır, Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, Ben Taskar, and S. V. N. Vishwanathan
The MIT Press, Cambridge, Massachusetts; London, England

Neural Information Processing Series
Michael I. Jordan and Thomas Dietterich, editors

Advances in Large Margin Classifiers, Alexander J. Smola, Peter L. Bartlett, Bernhard Schölkopf, and Dale Schuurmans, eds., 2000
Advanced Mean Field Methods: Theory and Practice, Manfred Opper and David Saad, eds., 2001
Probabilistic Models of the Brain: Perception and Neural Function, Rajesh P. N. Rao, Bruno A. Olshausen, and Michael S. Lewicki, eds., 2002
Exploratory Analysis and Data Modeling in Functional Neuroimaging, Friedrich T. Sommer and Andrzej Wichert, eds., 2003
Advances in Minimum Description Length: Theory and Applications, Peter D. Grünwald, In Jae Myung, and Mark A. Pitt, eds., 2005
New Directions in Statistical Signal Processing: From Systems to Brain, Simon Haykin, José C. Príncipe, Terrence J. Sejnowski, and John McWhirter, eds., 2006
Nearest-Neighbor Methods in Learning and Vision: Theory and Practice, Gregory Shakhnarovich, Piotr Indyk, and Trevor Darrell, eds., 2006
New Directions in Statistical Signal Processing: From Systems to Brains, Simon Haykin, José C. Príncipe, Terrence J. Sejnowski, and John McWhirter, eds., 2007
Predicting Structured Data, Gökhan Bakır, Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, Ben Taskar, and S. V. N. Vishwanathan, eds., 2007
Towards Brain-Computer Interfacing, Guido Dornhege, José del R. Millán, Thilo Hinterberger, Dennis McFarland, and Klaus-Robert Müller, eds., 2007

© 2007 Massachusetts Institute of Technology. All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data
Predicting Structured Data / edited by Gökhan Bakır ... [et al.].
p. cm.
Collected papers based on talks presented at two Neural Information Processing Systems workshops.
Includes bibliographical references and index.
ISBN 978-0-262-02617-8 (alk. paper)
1. Machine learning. 2. Computer algorithms. 3. Kernel functions. 4. Data structures (Computer science).
I. Bakır, Gökhan. II. Neural Information Processing Systems Foundation.
Q325.5.P74 2007
006.3'1–dc22
2006047001

Contents

Preface  x

I Introduction  1

1 Measuring Similarity with Kernels  3
1.1 Introduction  3
1.2 Kernels  3
1.3 Operating in Reproducing Kernel Hilbert Spaces  11
1.4 Kernels for Structured Data  14
1.5 An Example of a Structured Prediction Algorithm Using Kernels  22
1.6 Conclusion  23

2 Discriminative Models  25
2.1 Introduction  25
2.2 Online Large-Margin Algorithms  26
2.3 Support Vector Estimation  28
2.4 Margin-Based Loss Functions  32
2.5 Margins and Uniform Convergence Bounds  37
2.6 Conclusion  42

3 Modeling Structure via Graphical Models  43
3.1 Introduction  43
3.2 Conditional Independence  43
3.3 Markov Networks  44
3.4 Bayesian Networks  47
3.5 Inference Algorithms  52
3.6 Exponential Families  56
3.7 Probabilistic Context-Free Grammars  57
3.8 Structured Prediction  60
3.9 Conclusion  63

II Structured Prediction Based on Discriminative Models  65

4 Joint Kernel Maps  67
Jason Weston, Gökhan Bakır, Olivier Bousquet, Tobias Mann, William Stafford Noble, and Bernhard Schölkopf
4.1 Introduction  67
4.2 Incorporating Correlations into Linear Regression  68
4.3 Linear Maps and Kernel Methods: Generalizing Support Vector Machines  69
4.4 Joint Kernel Maps  71
4.5 Joint Kernels  76
4.6 Experiments  79
4.7 Conclusions  83

5 Support Vector Machine Learning for Interdependent and Structured Output Spaces  85
Yasemin Altun, Thomas Hofmann, and Ioannis Tsochantaridis
5.1 Introduction  85
5.2 A Framework for Structured/Interdependent Output Learning  86
5.3 A Maximum-Margin Formulation  90
5.4 Cutting-Plane Algorithm  94
5.5 Alternative Margin Formulations  98
5.6 Experiments  99
5.7 Conclusions  101
5.8 Proof of Proposition 37  102

6 Efficient Algorithms for Max-Margin Structured Classification  105
Juho Rousu, Craig Saunders, Sandor Szedmak, and John Shawe-Taylor
6.1 Introduction  105
6.2 Structured Classification Model  107
6.3 Efficient Optimization on the Marginal Dual Polytope  117
6.4 Experiments  123
6.5 Conclusion  126

7 Discriminative Learning of Prediction Suffix Trees with the Perceptron Algorithm  129
Ofer Dekel, Shai Shalev-Shwartz, and Yoram Singer
7.1 Introduction  129
7.2 Suffix Trees for Stream Prediction  131
7.3 PSTs as Separating Hyperplanes and the Perceptron Algorithm  133
7.4 The Self-Bounded Perceptron for PST Learning  136
7.5 Conclusion  142

8 A General Regression Framework for Learning String-to-String Mappings  143
Corinna Cortes, Mehryar Mohri, and Jason Weston
8.1 Introduction  143
8.2 General Formulation  144
8.3 Regression Problems and Algorithms  146
8.4 Pre-Image Solution for Strings  154
8.5 Speeding up Training  159
8.6 Comparison with Other Algorithms  160
8.7 Experiments  163
8.8 Conclusion  168

9 Learning as Search Optimization  169
Hal Daumé III and Daniel Marcu
9.1 Introduction  169
9.2 Previous Work  170
9.3 Search Optimization  171
9.4 Experiments  177
9.5 Summary and Discussion  188

10 Energy-Based Models  191
Yann LeCun, Sumit Chopra, Raia Hadsell, Marc'Aurelio Ranzato, and Fu Jie Huang
10.1 Introduction  191
10.2 Energy-Based Training: Architecture and Loss Function  197
10.3 Simple Architectures  206
10.4 Latent Variable Architectures  211
10.5 Analysis of Loss Functions for Energy-Based Models  214
10.6 Efficient Inference: Nonprobabilistic Factor Graphs  225
10.7 EBMs for Sequence Labeling and Structured Outputs  230
10.8 Conclusion  241

11 Generalization Bounds and Consistency for Structured Labeling  247
David McAllester
11.1 Introduction  247
11.2 PAC-Bayesian Generalization Bounds  249
11.3 Hinge Loss  252
11.4 Consistency  253
11.5 A Generalization of Theorem 62  256
11.6 Proofs of Theorems 61 and 62  258
11.7 Conclusion  261

III Structured Prediction Using Probabilistic Models  263

12 Kernel Conditional Graphical Models  265
Fernando Pérez-Cruz, Zoubin Ghahramani, and Massimiliano Pontil
12.1 Introduction  265
12.2 A Unifying Review  267
12.3 Conditional Graphical Models  274
12.4 Experiments  277
12.5 Conclusions and Further Work  280

13 Density Estimation of Structured Outputs in Reproducing Kernel Hilbert Spaces  283
Yasemin Altun and Alex J. Smola
13.1 Introduction  283
13.2 Estimating Conditional Probability Distributions over Structured Outputs  285
13.3 A Sparse Greedy Optimization  292
13.4 Experiments: Sequence Labeling  295
13.5 Conclusion  299

14 Gaussian Process Belief Propagation  301
Matthias W. Seeger
14.1 Introduction  301
14.2 Data and Model Dimension  303
14.3 Semiparametric Latent Factor Models  306
14.4 Gaussian Process Belief Propagation  308
14.5 Parameter Learning  316
14.6 Conclusions  317

References  319
Contributors  341
Index  345

Series Foreword

The yearly Neural Information Processing Systems (NIPS) workshops bring together scientists with broadly varying backgrounds in statistics, mathematics, computer science, physics, electrical engineering, neuroscience, and cognitive science, unified by a common desire to develop novel computational and statistical strategies for information processing, and to understand the mechanisms for information processing in the brain. As opposed to conferences, these workshops maintain a flexible format that both allows and encourages the presentation and discussion of work in progress, and thus serve as an incubator for the development of important new ideas in this rapidly evolving field.

The series editors, in consultation with workshop organizers and members of the NIPS foundation board, select specific workshop topics on the basis of scientific excellence, intellectual breadth, and technical impact. Collections of papers chosen and edited by the organizers of specific workshops are built around pedagogical introductory chapters, while research monographs provide comprehensive descriptions of workshop-related topics, to create a series of books that provides a timely, authoritative account of the latest developments in the exciting field of neural computation.

Michael I. Jordan and Thomas Dietterich

Description:
Machine learning develops intelligent computer systems that are able to generalize from previously seen examples. A new domain of machine learning, in which the prediction must satisfy the additional constraints found in structured data, poses one of machine learning's greatest challenges: learning…
