
Manifold Alignment Determination: finding correspondences across different data views

Andreas Damianou                Neil D. Lawrence
Amazon.com                      Amazon.com & University of Sheffield
[email protected]             [email protected]

Carl Henrik Ek
University of Bristol, Bristol, UK
[email protected]

Abstract

We present Manifold Alignment Determination (MAD), an algorithm for learning alignments between data points from multiple views or modalities. The approach is capable of learning correspondences between views as well as correspondences between individual data-points. The proposed method requires only a few aligned examples, from which it is capable of recovering a global alignment through a probabilistic model. The strong, yet flexible, regularization provided by the generative model is sufficient to align the views. We provide experiments on both synthetic and real data to highlight the benefit of the proposed approach.

1 Introduction

Multiview learning is a strand of machine learning which aims at consolidating multiple corresponding views in a single model. The underlying idea is to exploit the correspondence between the views to learn a shared representation. A very common scenario is when the views come from different modalities. As an example, assume that we are observing a conversation and separately recording the audio and the video of the interaction. Audio and video are two disparate modalities, meaning that it is non-trivial to compare the audio signal to the video and vice versa. For example, it is nonsensical to compute the Euclidean distance between a video frame and an audio frame. However, by exploiting the fact that the signals are aligned temporally, we can obtain a correspondence that can be exploited to learn a joint representation for both signals. This shared, multiview model now provides a means to compare the views, allowing us to transfer information between the sound- and video-specific modalities and enabling joint regularization when performing inference.

The scenario explained above is based on the assumption that the data-points in each view are aligned, i.e.
that for each video frame there is a single corresponding sound snippet. This is a strong assumption, as different alignments would lead to completely different representations. In many scenarios we do not know the correspondence between the data-points, and this uncertainty should be included in the multiview model. Furthermore, discovering an efficient explicit alignment is in many scenarios the ultimate task, for example when trying to solve correspondence problems. However, discovering an alignment means that we need to search over the space of all possible permutations of the data-points. In practice this is infeasible, as the number of possible alignments grows super-exponentially with the number of data-points.

In this paper we present an algorithmic solution that circumvents the problem of having to rely only on pre-aligned data for the purpose of multiview learning. Rather, we exploit the regularization of a probabilistic factorized multiview latent variable model, which allows us to formulate the search as a bipartite-matching problem. We propose two different approaches: one myopic, which aligns data-points in a sequential (iterative) fashion, and one nonmyopic, which produces the optimal alignment for a batch of data-points. In our current implementation, both methods rely on a small number of initial aligned instances to act as a "prior" of what an alignment means in the particular scenario. We denote this number by N_init. We further denote by N_* the number of data-points to be aligned, and N = N_init + N_*, with N_init ≪ N_*. The myopic method developed here scales with O(N_*), while the nonmyopic one scales with O(N_*^3) (but in practice it can be run in less than a minute for a few hundred data-points).

* Work done while A. Damianou and N. Lawrence were at the University of Sheffield.

2 Methodology

We will now proceed to explain the concept of multiview modelling, and describe the probabilistic model and how it can be used for automatic alignment within our algorithm. Observing two views Y^(1) = {y_i^(1)}_{i=1}^N and Y^(2) = {y_i^(2)}_{i=1}^N, where y_i^(1) ∈ R^{D_1} and y_i^(2) ∈ R^{D_2}, we consider both views to have been generated from a shared latent variable X = {x_i}_{i=1}^N, where x_i ∈ R^Q.
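As a concrete toy illustration of this generative setup, the following sketch draws a shared latent variable and maps it to two views. The sizes and the linear maps are illustrative assumptions only; the paper's actual model uses non-parametric Gaussian process mappings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): N points, latent dim Q, view dims D1, D2.
N, Q, D1, D2 = 50, 4, 20, 20

# Shared latent variable X = {x_i}, x_i in R^Q.
X = rng.normal(size=(N, Q))

# Two view-specific (here simply linear) mappings generate the observations.
W1 = rng.normal(size=(Q, D1))
W2 = rng.normal(size=(Q, D2))
Y1 = X @ W1 + 0.1 * rng.normal(size=(N, D1))  # view 1: y_i^(1) in R^D1
Y2 = X @ W2 + 0.1 * rng.normal(size=(N, D2))  # view 2: y_i^(2) in R^D2

# Row i of Y1 corresponds to row i of Y2, because both were generated from x_i.
```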
The two views are implicitly aligned by a permutation π = {π_i}_{i=1}^N as follows: if y_i^(1) is generated from x_i and y_i^(2) is generated from x_{π_i}, then y_i^(1) and y_i^(2) are in correspondence. Since the generating space and the permutation are unobserved, we wish to integrate them out. That is, we wish to be able to compute the marginal likelihood

$$ p(Y^{(1)}, Y^{(2)}) = \int_{\pi} p(\pi) \prod_{i=1}^{N} \int_{x_i} p(y_i^{(1)} \mid x_i)\, p(y_i^{(2)} \mid x_{\pi_i})\, p(x_i), \qquad (1) $$

assuming that the observations are conditionally independent given the latent space. Recently, Klami [Klami, 2013] formulated a variational approximation of the above model. The key to solving the challenging marginalization over the permutations comes from combining a uniform prior with an interesting experimental observation about permutations. In [Leskovec et al., 2010] the authors sample permutations and show that only a very small number of permutations are associated with a non-zero probability. In [Klami, 2013] this behaviour is exploited by replacing the integral with a sum over a small subset of permutations that are highly probable.

Here we take a different approach. Rather than assuming completely unstructured observations and a uniform prior over alignments, we assume that we are given a small set of aligned pairs. These pairs are used as "anchors" from which we search for the best alignment of the remaining points. To achieve this, we do not include the permutation as a part of the model but rather propose an algorithm able to search for permutations given a model already estimated on this small set of given aligned pairs.

3 Manifold Alignment Determination

We adopt Manifold Relevance Determination (MRD) [Damianou et al., 2012, 2016] as a generative model. MRD is a non-parametric Bayesian model for multiview modelling. It is capable of learning compact representations, and its Bayesian formulation allows for principled propagation of uncertainty through the model. This regularization enables learning from small data-sets with many views.
The learned latent space consolidates the views with a factorized representation which encodes variations that are shared between the views separately from those that are private. This structure is known as Inter-Battery Factor Analysis (IBFA) [Tucker, 1958]. The IBFA structure means that if two views contain pairs of observations that are completely different, it will effectively learn two separate models. At the other end of the spectrum, the two views could be presented in pairs which are in perfect correspondence and contain exactly the same information, corrupted by different kinds of (possibly high-dimensional) noise. In the latter case, IBFA will learn a completely shared model. Importantly, anything between these two extremes is possible, and the model naturally prefers solutions which are as "shared" as possible, as this increases compactness, in accordance with the model's objective function. This is the key insight motivating our work, as a good alignment is one in which the views are maximally shared; i.e. we argue and show that the MRD naturally seeks to align the data. The IBFA structure is also adopted by "Bayesian Canonical Correlation Analysis" (BCCA) [Klami et al., 2013], which was used as a generative model in [Klami, 2013]. The MRD and BCCA models are structurally related, but while BCCA is limited to linear mappings, MRD is formulated in a non-parametric fashion, allowing for very expressive non-linear mappings.

We will now describe the algorithm and the model used to learn alignments in more detail. Inspired by the underlying generative model, we refer to this approach as Manifold Alignment Determination, or simply as MAD. Given are two views Y^(1) and Y^(2) as defined above, split by two index sets A = {i | i ∈ Z, i ≤ N, y_i^(1) ↔ y_i^(2)} and B = {i | i ∈ Z, i ≤ N, i ∉ A}, so that |A| = N_init and |B| = N_*, and where y_i^(1) ↔ y_i^(2) means that the points are in correspondence. The task is now to learn a latent representation that reflects the correspondence in set A and then align the set B accordingly.
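The split into the index sets A and B can be sketched as follows; the sizes are illustrative assumptions, and the anchor indices are chosen at random only for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(1)

N, N_init = 100, 10  # illustrative sizes; the text assumes N_init << N_*

# A: indices of the few pairs known to be in correspondence ("anchors").
# B: the remaining indices, whose alignment is to be recovered.
A = np.sort(rng.choice(N, size=N_init, replace=False))
B = np.setdiff1d(np.arange(N), A)

# A and B partition {0, ..., N-1}, so |A| = N_init and |B| = N_* = N - N_init.
assert len(A) + len(B) == N
```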
In the MRD model there is a single latent space X from which the two observation spaces Y^(1) and Y^(2) are generated. The mappings f^(1) and f^(2) from the latent space are modelled using Gaussian process priors with Automatic Relevance Determination (ARD) [Neal, 1996] covariance functions. The ARD covariance function contains a different weight for each dimension of its input. Learning these weights within a Bayesian framework, such as in MRD and MAD, allows the model to automatically "switch off" dimensions of the latent space for each view independently, thereby factorizing the latent space into shared and private subspaces. In the MAD scenario the model is

$$ p(Y_A^{(1)}, Y_A^{(2)} \mid w^{(1)}, w^{(2)}) = \int_{f^{(1)}, f^{(2)}, X} p(F_A^{(1)} \mid X, w^{(1)})\, p(F_A^{(2)} \mid X, w^{(2)})\, p(X) \prod_{i \in A} p(y_i^{(1)} \mid f_i^{(1)})\, p(y_i^{(2)} \mid f_i^{(2)}), \qquad (2) $$

where F_A = {f_i}_{i ∈ A}, and w^(1) and w^(2) are the aforementioned parameters of the ARD covariance function. Specifically, w_i^(j) determines the importance of dimension i when generating view j. This is what allows the model to learn a factorized latent space: a shared dimension is deemed one that has a non-zero weight for each view, and a private dimension is one for which only one view has a non-zero weight. Marginalising the latent space from the model is intractable for general covariance functions, as the distribution over X is propagated through a non-linear mapping. In [Damianou et al., 2012, 2016] a variational compression scheme is formulated, providing a lower bound on the log-likelihood, and this approach is also followed in our paper. In other words, we aim to maximize the functional F(Q, w^(1), w^(2)) ≤ log p(Y_A^(1), Y_A^(2) | w^(1), w^(2)) with respect to the ARD weights and the introduced variational distribution Q which governs the approximation.

Notice that we can equivalently write equation (2) after the marginalisation of F_A is performed:

$$ p(Y_A^{(1)}, Y_A^{(2)} \mid w^{(1)}, w^{(2)}) = \int_{X} p(X)\, p(Y_A^{(1)} \mid X, w^{(1)})\, p(Y_A^{(2)} \mid X, w^{(2)}). \qquad (3) $$

As can be seen, the marginalization of F_A couples all outputs indexed by A within each view.
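The shared/private factorization induced by the ARD weights can be sketched as a simple thresholding of the learned weights. The weight values and the threshold below are hypothetical, purely to illustrate the classification rule described above.

```python
import numpy as np

# Hypothetical learned ARD weights for a Q = 5 dimensional latent space
# (one weight per latent dimension, per view); the values are made up.
w1 = np.array([0.9, 0.8, 0.0, 0.7, 0.0])   # view 1
w2 = np.array([0.85, 0.0, 0.6, 0.9, 0.0])  # view 2

eps = 1e-3  # threshold below which a dimension counts as "switched off"
shared   = np.where((w1 > eps) & (w2 > eps))[0]   # relevant to both views
private1 = np.where((w1 > eps) & (w2 <= eps))[0]  # relevant only to view 1
private2 = np.where((w1 <= eps) & (w2 > eps))[0]  # relevant only to view 2

print(shared, private1, private2)  # → [0 3] [1] [2]
```

Dimension 4 has zero weight in both views, so the model has switched it off entirely; the matching described in Section 3.1 uses only the `shared` dimensions.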
Furthermore, by comparing to equation (1), we see that in our MAD framework the explicit permutation of one view is replaced with relevance weights for all views. These weights alone cannot play a role equivalent to a permutation, but permutation learning is achieved when we combine these weights with the learning algorithms described below.

3.1 Learning Alignments

Assume that we have trained our multiview model on a small initial set of N_init aligned pairs (Y_A^(1), Y_A^(2)). For the remaining unaligned N_* data-points (Y_B^(1), Y_B^(2)) we wish to find a pairing of the data indexed by set B which corresponds to the alignment of the data indexed by A. As previously stated, learning is performed by optimizing a variational lower bound on the model marginal likelihood. Since we have an approximation to the marginal likelihood, we can obtain an approximate posterior Q over the integrated quantities. In particular, we are interested in the marginal for X. We denote this approximate posterior as q(X), and assign it a Gaussian form (the mean and variance of which we learn variationally). Importantly, as each view is modelled by a different Gaussian process, there are two different posteriors for the same latent space:

$$ q_{(1)}(X_A, X_B) = \prod_{i \in A} q_{(1)}(x_i) \prod_{i \in B} q_{(1)}(x_i) \approx p(X_A, X_B \mid Y_A^{(1)}, Y_B^{(1)}), $$
$$ q_{(2)}(X_A, X_B) = \prod_{i \in A} q_{(2)}(x_i) \prod_{i \in B} q_{(2)}(x_i) \approx p(X_A, X_B \mid Y_A^{(2)}, Y_B^{(2)}), \qquad (4) $$

one for each view. The agreement of the two different posteriors provides us with a means to align the remaining data. There are several different possibilities for aligning the data, and here we have explored two different approaches, both of which give very good results. The first approach is myopic and on-line with respect to one of the views, and uses only a limited "horizon". This approach is appropriate for the scenario where we observe points sequentially in one of the views, e.g. view (2), and wish to match each incoming point to a set in the other view, e.g. view (1). To solve this task we first compute the posterior for all points in view (1), i.e. q_{(1)}(X_A, X_B), using equation (4).
Then, for each incoming point y_i^(2), i ∈ B, we compute the posterior q_{(2)}(x_i, X_A, X_B̂), where B̂ denotes all the points obtained so far from set B. Finally, we find the best matching between the mode of q_{(2)}(x_i, X_A, X_B̂) and the mode of each marginal q_{(1)}(x_i, X_A), i ∈ B. That is, we pair the point with the closest mean of the posterior from the other view. This can be seen as a procedure by which we map incoming points y_i^(2) to a latent point x_i through the posterior and then perform a nearest-neighbour search in the latent space. Importantly, the comparative matching is performed in the shared latent space of the two views, thereby ignoring private variations that are irrelevant with respect to the alignment of the views.

Clearly the above approach is very fast, but it depends on the order in which we observe the points. Therefore we also explore a nonmyopic approach, where we observe all the data at the same time. This allows us to perform a full bipartite matching by using the Hungarian method [Kuhn, 2010]. In this scenario, we compute q_{(1)}(X_A, X_B) and q_{(2)}(X_A, X_B) at the same time, from which we extract the marginals q_{(1)}(X_B) and q_{(2)}(X_B). These distributions factorize with respect to data points (equation (4)), so we can obtain N_* modes for each. These modes are then compared in an N_* × N_* distance matrix D_** (we use the L_2 distance). This distance matrix is given as an objective to the Hungarian method. As before, the distance between the modes is only computed in the dimensions shared by both views. Notice that, under this point of view, MAD is a means of performing a full bipartite match in views which are cleaned from noise and private (irrelevant) variations, are compressed (so the search is more efficient) and, most importantly, are embedded in the same space. Performing the bipartite match in the output space would not be possible, as these spaces are not comparable. There are many other possibilities for performing the nonmyopic matching.
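The two matching strategies can be sketched as follows. The arrays `mu1` and `mu2` stand in for the posterior modes of the N_* unaligned points in the shared latent dimensions (here synthetic, well-separated points rather than actual MRD posteriors); the nonmyopic step uses SciPy's implementation of the Hungarian method, and the myopic step is a greedy nearest-neighbour loop.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(2)

# Stand-ins for posterior modes in the shared latent dimensions:
# mu1 from q_(1)(X_B), mu2 from q_(2)(X_B); mu2 is a noisy permutation of mu1.
N_star = 8
mu1 = np.column_stack([np.arange(N_star, dtype=float), np.zeros(N_star)])
perm_true = rng.permutation(N_star)                 # unknown ground-truth alignment
mu2 = mu1[perm_true] + 0.01 * rng.normal(size=mu1.shape)

D = cdist(mu2, mu1)  # N_* x N_* matrix of L2 distances between the modes

# Nonmyopic: full bipartite matching via the Hungarian method.
rows, cols = linear_sum_assignment(D)
perm_hungarian = cols  # point i of view (2) is paired with point cols[i] of view (1)

# Myopic: greedily pair each "incoming" view-(2) point with its nearest
# still-unmatched view-(1) mode (fast, but order-dependent).
available = set(range(N_star))
perm_myopic = []
for i in range(N_star):
    j = min(available, key=lambda k: D[i, k])
    perm_myopic.append(j)
    available.remove(j)
```

With well-separated modes and small noise, both strategies recover `perm_true`; the Hungarian method is the one used in the paper's experiments.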
For example, we might know that the data have been permuted by a linear transformation and seek to find the alignment by searching for the Procrustes superimposition. In practice we have used the Hungarian method to match the posteriors and have obtained better results compared to the myopic method; the myopic approach is perhaps only suitable when we have thousands or millions of data points to match. Therefore, in our experiments we use the nonmyopic approach.

4 Experiments

In this section we present experiments on simulated and real-world data in order to demonstrate the intuitions behind our method (manifold alignment determination, MAD) as well as its effectiveness.

To start with, we tested the suitability of the MRD backbone as a penalizer of mis-aligned data. We constructed two 20-dimensional signals Y^(1) and Y^(2) such that they contained shared and private information. We then created K versions of Y^(2), each denoted by Ŷ_k^(2). Here, k is proportional to the degree of mis-alignment between Y^(2) and Ŷ_k^(2), which is measured by the Kendall-τ distance. Figure 1 illustrates the results of this experiment. Clearly, the MRD is very sensitive in distinguishing the degree of mis-alignment, making it an ideal backbone for our algorithm.

Next, we evaluate MAD on synthetic data. We constructed an input space X = [X^(1) X^(2) X^(1,2)], where X^(1), X^(2) ∈ R and X^(1,2) ∈ R^2. Notice that X only contains orthogonal columns (dimensions).

[Figure 1 here, four panels: (a) MRD marginal log-likelihood against Kendall-τ distance of mis-alignment; (b) linear mapping, fully shared generating space; (c) linear mapping, shared/private generating space; (d) nonlinear mapping, shared/private generating space.]

Figure 1: Results for the toy experiments. Plot (a) demonstrates the suitability of the MRD objective as a backbone of MAD.
Plots (b,c,d) visualise the matrix of distances D_** of the inferred latent features obtained from Y^(1) and Y^(2). The small values around the diagonal suggest that the optimal alignment found is the correct one, i.e. y_i^(1) is always matched to y_j^(2) with i − j = ε, ε → 0. Each sub-caption in these figures denotes the way in which the corresponding views were generated for this toy example.

The columns of X were created as sinusoids with different frequencies. Two output signals Y^(1) and Y^(2) were then generated by randomly mapping from parts of X to 20 dimensions, with the addition of Gaussian noise (in both X and the mapping) with standard deviation 0.1. We experimented with mappings that create only shared information in the two outputs (by mapping only from X^(1,2)) and with mappings that create private and shared information (by mapping from [X^(1,2) X^(1)] for Y^(1) and from [X^(1,2) X^(2)] for Y^(2)). We also experimented with linear random mappings and non-linear ones. The latter were implemented as random draws from a Gaussian process and were handled by non-linear kernels inside our model. Figure 1(b,c,d) shows the very good results obtained by our method, and a quantification is also given in Table 1. The size of the initial set used as a guide to alignments is denoted by N_init.

We now proceed to illustrate our approach on real data. Firstly, we considered a dataset Y^(1,2) which contained 4 different human walk motions represented as a set of 59 coordinates for the skeleton joints (subject 35, motions 1–4 from the CMU motion capture database). We created Y^(1) and Y^(2) by splitting the dimensions of Y^(1,2) in half. In another experiment, we selected 100 handwritten digits representing the number 0 from the USPS database and similarly created the two views by splitting each image in half horizontally, so that each view contained 16 × 8 pixels. The distance matrix D_** computed in the inferred latent space, as well as a quantification of the alignment mismatch and dataset information, can be found in Figure 2 and Table 1.
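The alignment quality reported in Table 1 is a Kendall-τ distance. A minimal sketch of such an evaluation, assuming the recovered pairing is represented as a permutation array, converts the Kendall-τ correlation from SciPy into a normalized distance (fraction of discordant pairs):

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_tau_distance(perm, target):
    """Normalized Kendall-tau distance: fraction of discordant pairs.
    0 means identical orderings; ~0.5 is expected for a random matching."""
    tau, _ = kendalltau(perm, target)
    return (1.0 - tau) / 2.0

identity = np.arange(100)
assert kendall_tau_distance(identity, identity) == 0.0  # perfect alignment

rng = np.random.default_rng(3)
random_perm = rng.permutation(100)
d = kendall_tau_distance(random_perm, identity)  # close to 0.5 for a random matching
```

This matches the baseline quoted with Table 1: a random matching achieves a distance of roughly 0.5.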
The motion capture data is periodic in nature (due to the multiple walking strands), a pattern that is successfully discovered in D_**, as can be seen in Figure 2.

[Figure 2 here: two distance-matrix heatmaps, (a) motion capture and (b) digits.]

Figure 2: The inferred distance matrices D_** used to perform the alignments in the input space for the motion capture experiment (left) and the digits experiment (right).

Figure 3: Matching the top-half with the (aligned by our method) bottom-half parts of the digits.

For the digits demo we also present a visualisation of the alignment, by matching each top-half image from Y^(1) with its inferred bottom-half from Y^(2). As can be seen, most of the digits are well aligned along the pen strokes. In a few cases (e.g. the last three digits), the matching between the black and white strokes seems to be inverted, which means that although the alignment is not perfect, it still follows a learned continuity constraint. In these cases, the learned constraint is suboptimal but, nevertheless, satisfactory for the majority of the examples.

Data                            N     N_init   Kendall-τ
Toy, linear, fully shared       100   8        0.001
Toy, linear, shared/private     100   12       0.003
Toy, nonlin., shared/private    160   48       2×10⁻⁶
Motion Capture                  350   175      0.21
Digits                          100   50       0.28

Table 1: Kendall-τ distance of the data aligned by our algorithm. A random matching would achieve a distance of ≈ 0.5. Much larger N can easily be handled, but due to time constraints we did not perform such experiments (and likewise for smaller N_init).

5 Conclusion

We have presented an approach for automatic alignment of multiview data. Our key contribution is to show that the combination of a factorized latent space model and Bayesian learning naturally reflects alignments in multiview data. Our algorithm learns correspondences from a small number of examples, allowing additional data to be aligned. In future work we will extend the approach to make use of the uncertainty in the approximate latent posterior rather than only using the first mode, as in this paper. The full distribution provides a means for adding only pairs about which the model is relatively certain and updating the model accordingly.
Learning alignments is an important (and very challenging) task in multiview learning that we believe needs to be highlighted in order to stimulate further research.

Acknowledgements. While pursuing this work, A. Damianou was funded by the European research project EU FP7-ICT (Project Ref 612139 "WYSIWYD").

References

Andreas Damianou, Neil D. Lawrence, and Carl Henrik Ek. Multi-view learning as a nonparametric nonlinear inter-battery factor analysis. arXiv preprint arXiv:1604.04939, 2016.

Andreas C. Damianou, Carl Henrik Ek, Michalis Titsias, and Neil D. Lawrence. Manifold Relevance Determination. In International Conference on Machine Learning, pages 145–152, June 2012.

Arto Klami. Bayesian object matching. Machine Learning, 92(2–3):225–250, 2013.

Arto Klami, Seppo Virtanen, and Samuel Kaski. Bayesian Canonical Correlation Analysis. Journal of Machine Learning Research, 14:965–1003, April 2013.

Harold W. Kuhn. The Hungarian Method for the Assignment Problem. 50 Years of Integer Programming, Chapter 2:29–47, 2010.

Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. Kronecker Graphs: An Approach to Modeling Networks. Journal of Machine Learning Research, 11:985–1042, March 2010.

Radford M. Neal. Bayesian Learning for Neural Networks, volume 8. New York: Springer-Verlag, 1996.

Ledyard R. Tucker. An Inter-Battery Method of Factor Analysis. Psychometrika, 23, June 1958.
