Complex Prediction Problems: A novel approach to multiple Structured Output Prediction
Yasemin Altun, Max Planck Institute
ECML HLIE08

Information Extraction

Extract structured information from unstructured data. Typical subtasks:
- Named Entity Recognition: person, location, organization names
- Coreference Identification: noun phrases referring to the same object
- Relation Extraction: e.g. Person works for Organization
Ultimate tasks:
- Document Summarization
- Question Answering

Complex Prediction Problems

- Complex tasks consisting of multiple structured subtasks
- Real-world problems are too complicated to solve at once
- Ubiquitous in many domains: Natural Language Processing, Computational Biology, Computational Vision

Complex Prediction Example: Motion Tracking in Computational Vision

- Subtask: identify the joint angles of the human body
- Overall constraints: computer vision
[Figure: motion-tracking example with the joint angles of a human body annotated]

Complex Prediction Example: 3-D Protein Structure Prediction in Computational Biology

- Subtask: secondary structure prediction from the amino-acid sequence
- x: AAYKSHGSGDYGDHDVGHPTPGDPWVEPDYGINVYHSDTYSGQW
- y: the corresponding secondary-structure label sequence

Standard Approach to Complex Prediction: the Pipeline Approach

- Define intermediate/sub-tasks
- Solve them individually, or in a cascaded manner
- Use the output of subtasks as features (input) for the target task: e.g. a POS → chunking/parsing → NER cascade, where the input for POS tagging is x and the input for NER is x plus the predicted POS tags (see the sketch after this slide)
[Figure: a cascade of sequence labelers (POS, chunking, NER) and a parse tree built over the input x]
Problems:
- Error propagation
- No learning across tasks
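To make the pipeline approach concrete, here is a minimal sketch of a POS → NER cascade. It is not from the slides: the toy `pos_tag` and `ner_tag` rules and all names are hypothetical stand-ins for independently trained models. The point is structural: NER sees x plus the *predicted* POS tags, so a POS mistake silently corrupts the NER input (error propagation), and neither stage can adapt to the other (no learning across tasks).

```python
# Minimal pipeline sketch (hypothetical stand-ins for trained taggers).

def pos_tag(tokens):
    # Stage 1: POS tagging on the raw input x.
    lexicon = {"Smith": "NNP", "works": "VBZ", "for": "IN", "Google": "NNP"}
    return [lexicon.get(tok, "NN") for tok in tokens]

def ner_features(tokens, pos_tags):
    # NER input = x + predicted POS tags, exactly as in the cascade above.
    return [{"word": tok, "pos": tag, "cap": tok[0].isupper()}
            for tok, tag in zip(tokens, pos_tags)]

def ner_tag(features):
    # Stage 2: a toy NER rule over the cascaded features.
    return ["ENT" if f["pos"] == "NNP" and f["cap"] else "O" for f in features]

tokens = ["Smith", "works", "for", "Google"]
pos = pos_tag(tokens)                       # an error here ...
ner = ner_tag(ner_features(tokens, pos))    # ... propagates here
print(list(zip(tokens, pos, ner)))
```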
From Sutton, McCallum and Rohanimanesh (the paper from which Figure 1 is taken):

Figure 1: Graphical representation of (a) linear-chain CRF, and (b) factorial CRF. Although the hidden nodes can depend on observations at any time step, for clarity we have shown links only to observations at the same time step.

... parameter estimation using BP (Section 3.3.2), and cascaded parameter estimation (Section 3.3.3). Then, in Section 3.4, we describe inference and parameter estimation in marginal DCRFs. In Section 4, we present the experimental results, including evaluation of factorial CRFs on noun-phrase chunking (Section 4.1), comparison of BP schedules in FCRFs (Section 4.2), evaluation of marginal DCRFs on both the chunking data and synthetic data (Section 4.3), and cascaded training of DCRFs for transfer learning (Section 4.4). Finally, in Section 5 and Section 6, we present related work and conclude.

2. Conditional Random Fields (CRFs)

Conditional random fields (CRFs) (Lafferty et al., 2001; Sutton and McCallum, 2006) are conditional probability distributions that factorize according to an undirected model. CRFs are defined as follows. Let y be a set of output variables that we wish to predict, and x a set of input variables that are observed. For example, in natural language processing, x may be a sequence of words x = {x_t} for t = 1, ..., T and y = {y_t} a sequence of labels. Let G be a factor graph over y and x with factors C = {Ψ_c(y_c, x_c)}, where x_c is the set of input variables that are arguments to the local function Ψ_c, and similarly for y_c. A conditional random field is a conditional distribution p_Λ that factorizes as

    p_Λ(y | x) = (1 / Z(x)) ∏_{c ∈ C} Ψ_c(y_c, x_c),

where Z(x) = Σ_y ∏_{c ∈ C} Ψ_c(y_c, x_c) is a normalization factor over all state sequences for the sequence x. We assume the potentials factorize according to a set of features {f_k}, as

    Ψ_c(y_c, x_c) = exp( Σ_k λ_k f_k(y_c, x_c) ),

so that the family of distributions {p_Λ} is an exponential family. In this paper, we shall assume that the features are given and fixed. The model parameters are a set of real weights Λ = {λ_k}, one weight for each feature.

Many previous applications use the linear-chain CRF, in which a first-order Markov assumption is made on the hidden variables. A graphical model for this is shown in Figure 1. In this case, the cliques of the conditional model are the nodes and edges, so that there are feature functions ...

New Approach to Complex Prediction

Proposed approach: solve the tasks jointly and discriminatively.
- Decompose multiple structured tasks
- Use methods from multitask learning: good predictors are smooth, so restrict the search space to smooth functions of all tasks
- Devise targeted approximation methods: standard approximation algorithms do not capture the specifics of the problem, namely that dependencies within tasks are stronger than dependencies across tasks
Advantages:
- Less/no error propagation
- Enables learning across tasks

Structured Output (SO) Prediction

Supervised learning:
- Given input/output pairs (x, y) ∈ X × Y, e.g. Y = {0, ..., m} or Y = ℝ
- Data from an unknown but fixed distribution D over X × Y
- Goal: learn a mapping h : X → Y
- State-of-the-art methods are discriminative, e.g. SVMs, Boosting
In structured output prediction:
- The response variable is multivariate with structural dependencies; |Y| is exponential in the number of variables
- Examples: sequences, trees, hierarchical classification, ranking

SO Prediction

- Generative framework: model P(x, y). Advantages: efficient learning and inference algorithms. Disadvantages: solves a harder problem than necessary, questionable independence assumptions, limited representation
- Local approaches: e.g. [Roth, 2001]. Advantages: efficient algorithms. Disadvantages: ignore long-range dependencies, or handle them problematically
- Discriminative learning. Advantages: richer representation via kernels, captures dependencies. Disadvantages: expensive computation (SO prediction involves iteratively computing marginals or the best labeling during training)

Formal Setting

- Given a sample S = ((x_1, y_1), ..., (x_l, y_l))
- Find h : X → Y with h(x) = argmax_y F(x, y)
- Linear discriminant function F : X × Y → ℝ, F(x, y) = ⟨ψ(x, y), w⟩
- Cost function ∆(y, y′) ≥ 0, e.g. 0-1 loss, Hamming loss
- Canonical example: label sequence learning, where both x and y are sequences
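For label sequence learning, the argmax in h(x) = argmax_y F(x, y) ranges over exponentially many label sequences, but it can be computed exactly by dynamic programming when ψ(x, y) decomposes over positions and adjacent label pairs. The sketch below makes that standard linear-chain assumption; `emission` and `transition` are illustrative names for the precomputed scores ⟨w, ψ⟩ of the node and edge features, not notation from the slides.

```python
import numpy as np

def viterbi_decode(emission, transition):
    # h(x) = argmax_y F(x, y) where
    # F(x, y) = sum_t emission[t, y_t] + sum_t transition[y_{t-1}, y_t].
    T, K = emission.shape
    score = emission[0].copy()              # best score of a path ending in each label
    back = np.zeros((T, K), dtype=int)      # backpointers
    for t in range(1, T):
        cand = score[:, None] + transition + emission[t][None, :]   # (prev, cur)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    y = [int(score.argmax())]               # best final label, then walk back
    for t in range(T - 1, 0, -1):
        y.append(int(back[t, y[-1]]))
    return y[::-1]

# toy example: T = 4 positions, K = 3 labels, random scores
rng = np.random.default_rng(0)
print(viterbi_decode(rng.normal(size=(4, 3)), rng.normal(size=(3, 3))))
```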
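The CRF definitions quoted above can likewise be grounded in a few lines. This sketch assumes the linear-chain special case, with the log-potentials log Ψ_t(y_{t-1}, y_t, x) (the Σ_k λ_k f_k sums, assumed precomputed) split into illustrative `transition` and `emission` arrays; it computes log Z(x) by the forward algorithm and then p_Λ(y | x) for a given labeling.

```python
import numpy as np

def log_z(emission, transition):
    # log Z(x) for a linear-chain CRF via the forward algorithm.
    # Log-potential at step t: transition[y_prev, y_t] + emission[t, y_t].
    # (Transition scores are assumed moderate; a fully stable version would
    # logsumexp over alpha and transition jointly.)
    T, K = emission.shape
    alpha = emission[0].copy()      # alpha[j] = log-sum over prefixes ending in j
    for t in range(1, T):
        m = alpha.max()             # subtract the max for numerical stability
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(transition)) + emission[t]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def log_prob(y, emission, transition):
    # log p_Lambda(y | x) = sum of log-potentials along y, minus log Z(x).
    score = emission[np.arange(len(y)), y].sum() + transition[y[:-1], y[1:]].sum()
    return score - log_z(emission, transition)

# toy example: T = 5 positions, K = 3 labels
rng = np.random.default_rng(1)
em, tr = rng.normal(size=(5, 3)), rng.normal(size=(3, 3))
print(np.exp(log_prob(np.array([0, 2, 1, 1, 0]), em, tr)))   # a probability in (0, 1)
```

Summing exp(log_prob(y, ...)) over all K^T labelings returns 1, which is a quick sanity check that Z(x) is computed correctly.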
