Riemannian-geometry-based modeling and clustering of network-wide non-stationary time series: The brain-network case

Konstantinos Slavakis, Senior Member, IEEE, Shiva Salsabilian, David S. Wack, Sarah F. Muldoon, Henry E. Baidoo-Williams, Jean M. Vettel, Matthew Cieslak, and Scott T. Grafton

K. Slavakis and S. Salsabilian are with the Dept. of Electrical Eng., Univ. at Buffalo (UB), The State University of New York (SUNY), NY 14260-2500, USA; Emails: {kslavaki,shivasal}@buffalo.edu. Tel: +1 (716) 645-1012. D. S. Wack is with the Depts. of Nuclear Medicine and Biomedical Eng., UB (SUNY); Email: [email protected]. S. F. Muldoon is with the Dept. of Mathematics and Computational and Data-Enabled Science and Eng. Program, UB (SUNY); Email: [email protected]. H. E. Baidoo-Williams is with the Dept. of Mathematics, UB (SUNY), and the US Army Research Laboratory, MD, USA; Email: [email protected]. J. M. Vettel is with the US Army Research Laboratory, MD, USA, the Dept. of Psychological and Brain Sciences, Univ. of California, Santa Barbara, USA, and the Dept. of Bioengineering, Univ. of Pennsylvania, USA; Email: [email protected]. M. Cieslak and S. T. Grafton are with the Dept. of Psychological and Brain Sciences, Univ. of California, Santa Barbara, USA; Emails: [email protected], [email protected].
Preliminary parts of this study can be found in [64], [65]. D. S. Wack receives research/grant support from the William E. Mabie, DDS, and Grace S. Mabie Fund. This work is also supported by the NSF awards Eager 1343860 and 1514056, and by the Army Research Laboratory through contract no. W911NF-10-2-0022 from the U.S. Army Research Office. The content is solely the responsibility of the authors and does not necessarily represent the official views of the U.S. Army funding agency.

Abstract—This paper advocates Riemannian multi-manifold modeling in the context of network-wide non-stationary time-series analysis. Time-series data, collected sequentially over time and across a network, yield features which are viewed as points in (or close to) a union of multiple submanifolds of a Riemannian manifold, and distinguishing disparate time series amounts to clustering multiple Riemannian submanifolds. To support the claim that exploiting the latent Riemannian geometry behind many statistical features of time series is beneficial to learning from network data, this paper focuses on brain networks and puts forth two feature-generation schemes for network-wide dynamic time series. The first is motivated by Granger-causality arguments and uses an auto-regressive moving average model to map low-rank linear vector subspaces, spanned by column vectors of appropriately defined observability matrices, to points into the Grassmann manifold. The second utilizes (non-linear) dependencies among network nodes by introducing kernel-based partial correlations to generate points in the manifold of positive-definite matrices. Capitalizing on recently developed research on clustering Riemannian submanifolds, an algorithm is provided for distinguishing time series based on their geometrical properties, revealed within Riemannian feature spaces. Extensive numerical tests demonstrate that the proposed framework outperforms classical and state-of-the-art techniques in clustering brain-network states/structures hidden beneath synthetic fMRI time series and brain-activity signals generated from real brain-network structural connectivity matrices.

Index Terms—Time series, (brain) networks, Riemannian manifold, clustering, ARMA model, partial correlations, kernels.

I. INTRODUCTION

Recent advances in brain science have highlighted the need to view the brain as a complex network of interacting nodes across spatial and temporal scales [12], [16], [55], [69]. The emphasis on understanding the brain as a network has capitalized on concurrent advances in brain-imaging technology, such as electroencephalography (EEG) and functional magnetic resonance imaging (fMRI), which assess brain activity by measuring neuronal time series [12], [58].

Clustering is the unsupervised (no data labels available) learning process of grouping data patterns into clusters based on similarity [73]. Time-series clustering has emerged as a prominent tool in big-data analytics because not only does it enable compression of high-dimensional and voluminous data, e.g., one hour of electrocardiogram data occupies 1 GB of storage [2], but it also leads to discovery of patterns hidden beneath network-wide time-series datasets. Indeed, data-mining and comparison of functional connectivity patterns of the default-mode brain network of human subjects, i.e., brain regions that remain active during resting-state periods in fMRI, has enhanced understanding of brain disorders such as Alzheimer's disease and autism [15], [33], [59], [70], depression [32], anxiety, epilepsy and schizophrenia [14].
To motivate the following discussion, consider the ten-node resting-state brain-network (RSBN) toy example of Fig. 1, with four distinct network states/structures whose evolution over time is shown in Fig. 1a. Those states are associated with the four functional connectivity matrices of Figs. 1b–1e: nodes of the same color are considered to be connected, while no connection is established among nodes with different colors. For each state, connectivity matrices stay fixed. Based on the previous connectivity matrices, blood-oxygen-level dependent (BOLD) time series [51], e.g., Fig. 1f, are simulated via the SimTB MATLAB toolbox [4], [62], under a generation mechanism detailed in Sec. V-A. Examples of features extracted from the BOLD time series are the covariance (Figs. 1g–1j) and partial-correlation matrices (Figs. 1k–1n), computed via correlations of the time series whose time spans are set equal to the time span of a single state; see Sec. III for a detailed description. For patterns to emerge, Figs. 1g–1n suggest that sample averaging of features over many time-series realizations is needed. On the contrary, Figs. 1o–1r demonstrate that partial-correlation matrices, obtained without any sample averaging, do not offer much help in identifying the latent connectivity structure.

[Fig. 1. A motivating example based on synthetically generated data via the SimTB MATLAB toolbox [4], [62]. (a) Time profile of four brain-network resting states (ten nodes). (b)–(e) For each state, the functional connectivity pattern stays fixed. (f) A single realization of a single-node BOLD time series. (g)–(j) Average covariance and (k)–(n) average partial-correlation (PC) matrices, obtained by sample averaging 100 realizations of the covariance and PC matrices, computed from the BOLD time series whose length equals the time span of a network state. (o)–(r) PC matrices computed without any sample averaging.]

Since multiple realizations of BOLD time series are hard to find in practice, rather than associating a single feature with a network state (Figs. 1o–1r), it would be preferable to extract a sequence of features (x_t)_t (t denotes discrete time), e.g., running averages of covariance matrices, to characterize a network state. This is also in accordance with recent evidence showing that brain-network resting states demonstrate dynamic attributes, e.g., [15]. Indeed, the usual presupposition that functional connectivity is static over relatively large periods of time has been challenged in works focusing on time-varying connectivity patterns [4], [13], [47], [60], [85], shifting the fMRI/EEG paradigm to the so-called "chronnectome" setting, where coupling within the brain network is dynamic, and two or more brain regions or sets of regions, all possibly evolving in time, are coupled with connective strengths that are also themselves explicit functions of time [17]. Such an approach has been already utilized to show that sleep states can be predicted via connectivity patterns at given times [72], and that schizophrenia can be correctly identified [22].

The previous discussion brings forth the following pressing questions: (i) Are there features that carve the latent network state/structure out of the observed network-wide time series? Is it possible to extract a sequence of features from a time series to capture a possibly dynamically evolving network state, as Fig. 1 and the related discussion suggest? (ii) Is there any model that injects geometrical arguments in the feature space, and is there any way to exploit that geometry to design a learning (in particular clustering) algorithm which provides state-of-the-art performance?
A. Contributions of this work

This paper provides answers to the previous questions. Although the advocated methods, together with the underlying theory, apply to any network-wide time series, this paper focuses on brain networks. Time-series data are processed sequentially via a finite-size sliding window that moves along the time axis to extract features which monitor the possibly time-varying state/structure of the network (Fig. 2b; Secs. II and III). Two feature-extraction schemes, novel in exploiting latent Riemannian geometry within network-wide time series, are introduced.

First, motivated by Granger-causality arguments, which play a prominent role in time-series analysis [11], [21], [26], [31], an auto-regressive moving average model is proposed to extract low-rank linear vector subspaces from the columns of appropriately defined observability matrices. Such linear subspaces demonstrate a remarkable geometrical property: they are points of the Grassmannian, a well-known Riemannian manifold (Sec. II).

Second, Sec. III generalizes the popular network-analytic tool of "linear" partial correlations (PCs) [39] to "non-linear" PCs, via reproducing kernel functions (cf. Appendix A), to capture the likely non-linear dependencies among network nodes, e.g., [38]. Geometry is also prominent in Sec. III: Prop. 1 demonstrates that matrices generated by kernel-based PCs are points of the celebrated Riemannian manifold of positive-definite matrices.

Capitalizing on the Riemannian-geometry thread that binds the previous feature-extraction schemes, learning, in particular clustering, is performed in a Riemannian manifold M. The key hypothesis, adopted from the very recent [79], [80], is the Riemannian multi-manifold modeling (RMMM) assumption: each cluster constitutes a submanifold of M, and distinguishing disparate time series amounts to clustering multiple Riemannian submanifolds; cf. Figs. 2b and 3a. This is in contrast with the prevailing perception of clusters in the literature as "well-concentrated" data clouds, whose convex hulls can be (approximately) separated by hyperplanes in the feature space, a hypothesis which lies also beneath the success of Kmeans and variants [73]. In contrast, RMMM, as well as the advocated clustering algorithm of Sec. IV, allow for clusters (submanifolds) to intersect. The extensive numerical tests of Sec. V demonstrate that the proposed framework outperforms classical and state-of-the-art techniques in clustering brain-network states/structures.
This is in staticoverrelativelylargeperiodoftimeshasbeenchallenged contrast with the prevailing perception of clusters in literature in works focusing on time-varying connectivity patterns [4], as “well-concentrated” data clouds, whose convex hulls can [13], [47], [60], [85], shifting the fMRI/EEG paradigm to the be (approximately) separated by hyperplanes in the feature so-called “chronnectome” setting, where coupling within the space, a hypothesis which lies also beneath the success of brain network is dynamic, and two or more brain regions or Kmeansandvariants[73].Incontrast,RMMM,aswellasthe setsofregions,allpossiblyevolvingintime,arecoupledwith advocated clustering algorithm of Sec. IV, allow for clusters connectivestrengthsthatarealsothemselvesexplicitfunctions (submanifolds) to intersect. The extensive numerical tests of 3 Sec. V demonstrate that the proposed framework outperforms and actions in video sequences in [52]. Moreover, spectral classical and state-of-the-art techniques in clustering brain- clustering and nonlinear dimensionality reduction techniques network states/structures. wereextendedtoRiemannianmanifoldsin[27].Suchschemes arequitesuccessfulwhentheconvexhullsofclustersarewell- separated; however, they often fail when clusters intersect or B. Prior art are closely located. Clustering data-sets which demonstrate Although the majority of methods on time-series clustering low-dimensionalstructureisrecentlyaccommodatedbyunions follows the “shape-based” approach, where clustering is ap- of affine subspaces or submanifold models. Submanifolds are plied to raw time-series data [2], fewer studies have focused usuallyrestrictedtomanifoldsembeddedineitheraEuclidean onmodel/feature-basedapproaches,suchasthepresentone[2, space or the sphere. Unions of affine subspace models, a.k.a. Table 4]. Study [37] fits an auto-regressive integrated moving hybrid linear modeling (HLM) or subspace clustering, have average (ARIMA) model to non-network-wide time-series beenrecentlyattractinggrowinginterest,e.g.,[20],[42],[67], data, measures dissimilarities of patterns via the (Euclidean) [77]. There are fewer strategies for the union of submanifolds (cid:96) -distanceofcepstrumcoefficients,andappliestheKmedoids model, a.k.a. manifold clustering [5], [6], [19], [24], [29], 2 algorithm to cluster cepstum-coefficient patterns. In [28], [30], [36], [40], [68], [81]. Notwithstanding, only higher- fuzzy Cmeans is applied to vectors comprising the Pearson’s order spectral clustering and spectral local PCA are theoret- correlation coefficients of fMRI time series, under the (cid:96) - and ically guaranteed [5], [6]. Multiscale strategies for data on 2 a hyperbolic-distance metric. In [53], hierarchical clustering Riemannian manifolds were reported in [57]. The following is applied to functional connectivity matrices, comprising discussion is based on [79], [80], where tangent spaces and Pearson’s correlation coefficients of BOLD time series via angular information of submanifolds are utilized in a novel the (cid:96) -distance. Once again, the (cid:96) -distance is used in [41], way. Even of a different context, the basic principles of [57] 2 2 together with Kmeans and its sparsity-cognizant K-SVD vari- share common ground with those in [79], [80]. It is worth ant, in clustering functional connectivity matrices which are noting that a simplified version of the algorithm in Sec. IV formed by Pearson’s correlation coefficients, as well as low- offers theoretical guarantees. 
C. Notation

With R and Z standing for the sets of all real and integer numbers, respectively, let R_{>0} := (0, +∞) and Z_{>0} := {1, 2, ...} ⊂ {0, 1, 2, ...} =: Z_{≥0}. Column vectors and matrices are denoted by upright boldfaced symbols, e.g., y, while row vectors are denoted by slanted boldfaced ones, e.g., y. Vector/matrix transposition is denoted by the superscript ⊤. Notation A ≻ 0 (⪰ 0) characterizes a symmetric positive (semi)definite [P(S)D] matrix. Consider a (brain) network/graph 𝒢 := (𝒩, ℰ), with sets of nodes 𝒩 and edges ℰ. In the case of fMRI data, nodes could be defined as (contiguous) voxels belonging to either anatomically defined or data-driven regions [58]. Each node ν ∈ 𝒩 is annotated by a real-valued random variable (r.v.) Y_ν, whose realizations comprise the time series associated with the νth node. Consider a subgraph G = (V, E) of 𝒢, with cardinality N_G := |V|, e.g., (i) G = 𝒢; and (ii) G is a singleton G = {ν}, for some node ν. Realizations {y_νt}_{ν∈V}, or, a snapshot of G at the tth time instance, are collected into the N_G × 1 vector y_t, and form the N_G × T matrix Y := [y_1, ..., y_T] over the time span t ∈ {1, ..., T}; cf. Fig. 2. For subgraph G, and a τ_w ∈ Z_{>0} which represents the length of a "sliding window" that moves forward along the time axis, snapshots (y_τ)_{τ=t}^{t+τ_w−1} of G are gathered into the data matrix Y_t := [y_t, y_{t+1}, ..., y_{t+τ_w−1}]; cf. Fig. 2b.
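As a concrete illustration of the bookkeeping above, the sliding-window snapshot matrices Y_t are plain column slices of the data matrix Y. A minimal NumPy sketch (with hypothetical toy sizes, not the paper's experimental values):

```python
import numpy as np

rng = np.random.default_rng(0)
N_G, T, tau_w = 10, 200, 50          # toy sizes (assumed for illustration only)
Y = rng.standard_normal((N_G, T))    # network-wide time series, one row per node

def sliding_snapshots(Y, tau_w):
    """Yield Y_t := [y_t, ..., y_{t+tau_w-1}] for t = 0, ..., T - tau_w (0-based)."""
    _, T = Y.shape
    for t in range(T - tau_w + 1):
        yield Y[:, t:t + tau_w]

windows = list(sliding_snapshots(Y, tau_w))
print(len(windows), windows[0].shape)   # (T - tau_w + 1) windows, each of shape (N_G, tau_w)
```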
The following two sections introduce two ways to capture intra-network connectivity patterns and dynamics.

II. ARMA MODELING

[Fig. 2. (a) Subgraphs G and G′ of the potentially different graphs 𝒢 and 𝒢′, respectively. Node ν of G emanates signal y_νt (a realization of a stochastic process at discrete time t). Snapshots observed over the time span t ∈ {1, 2, ..., T} are collected into the matrices Y := [y_1, ..., y_T] and Y′ := [y′_1, ..., y′_T]. (b) Flowchart of the feature-extraction scheme: a sliding window extracts sequences of features (x_t)_t (Secs. II and III) which can be viewed as points on or close to a Riemannian submanifold (the Riemannian multi-manifold modeling (RMMM) hypothesis [79], [80]). The goal is to distinguish G and G′ from their time series.]

Motivated by Granger causality [11], [21], [26], [31], this section provides a scheme for capturing spatio-temporal dependencies among network nodes. Granger causality is built on a linear auto-regressive (AR) model that approximates y_t by a linear combination of the copies {y_{t−j}}_{j=1}^p: y_t := \sum_{j=1}^p D_j y_{t−j} + v_t, for some N_G × N_G matrices {D_j}_{j=1}^p, p ∈ Z_{>0}, where v_t is the r.v. that quantifies noise and modeling inaccuracies. High-quality estimates of the p N_G^2 entries of {D_j}_{j=1}^p require a large number of training data, and thus an abundance of computational resources, especially in cases of large-scale networks. The following discussion provides a way to reduce the number of unknowns in the previous identification task by capitalizing on the low-rank arguments of the more general (linear) auto-regressive moving average (ARMA) model.

ARMA models are powerful parametric tools for spatio-temporal series analysis with numerous applications in signal processing, controls and machine learning [1], [43], [75]. ARMA modeling describes y_τ via the ρ × 1 (ρ ≪ N_G) latent vector z_τ [43, §10.6, p. 340]:

  z_τ = \sum_{j=1}^p A_j z_{τ−j} + w_τ,   (1a)
  y_τ = C z_τ + v_τ,   (1b)

where (i) (1a) is called the state equation and (1b) the observation equation; (ii) ρ is the order of the model; (iii) C ∈ R^{N_G × ρ} is the observation and {A_j}_{j=1}^p ⊂ R^{ρ × ρ} the transition matrices; and (iv) v_τ as well as w_τ are realizations of zero-mean, white-noise random processes, uncorrelated both w.r.t. each other and y_τ. As in AR modeling, matrices {A_j}_{j=1}^p manifest causality throughout the process {z_t}. The system identification problem (1) requires estimation of the N_G ρ + p ρ^2 entries of C and {A_j}_{j=1}^p, which are many less than the p N_G^2 ones in the AR modeling case, provided that ρ ≪ N_G. For example, any 0 < ϖ ≤ [(1 + 4p^2)^{1/2} − 1]/(2p) guarantees that, for ρ := ϖ N_G, N_G ρ + p ρ^2 ≤ p N_G^2.

To simplify (1), re-define z_τ and w_τ as the pρ × 1 vectors [z_τ^⊤, z_{τ−1}^⊤, ..., z_{τ−p+1}^⊤]^⊤ and [w_τ^⊤, 0^⊤, ..., 0^⊤]^⊤, respectively. Then, it can be easily verified that there exist a pρ × pρ matrix A_0 and an N_G × pρ matrix C_0 such that (1) is recast as

  z_τ = A_0 z_{τ−1} + w_τ,   y_τ = C_0 z_τ + v_τ.   (2)

Further, it can be verified by (2) that, for any i ∈ Z_{≥0},

  y_{τ+i} = C_0 A_0^i z_τ + \sum_{j=1}^{i} C_0 A_0^{i−j} w_{τ+j} + v_{τ+i},

where A_0^0 := I_{pρ} and \sum_{j=1}^{0} C_0 A_0^{−j} w_{τ+j} := 0. Fix now an m ∈ Z_{>0} and define the m N_G × 1 vector

  y_{f,τ} := [y_τ^⊤, y_{τ+1}^⊤, ..., y_{τ+m−1}^⊤]^⊤,   (3)

where the sub-script f stresses the fact that one moves forward in time and utilizes data {y_{τ′}}_{τ′=τ}^{τ+m−1} to define y_{f,τ}. It can be verified that y_{f,τ} = O^{(m)} z_τ + e_{f,τ}, where O^{(m)} is the mth-order observability matrix of size m N_G × pρ,

  O^{(m)} := [C_0^⊤, (C_0 A_0)^⊤, ..., (C_0 A_0^{m−1})^⊤]^⊤,

and e_{f,τ} is defined as the vector whose entries from i N_G + 1 till (i + 1) N_G, i ∈ {0, ..., m−1}, are given by \sum_{j=1}^{i} C_0 A_0^{i−j} w_{τ+j} + v_{τ+i}. Since e_{f,τ} contains zero-mean noise terms, it can be also verified that the conditional expectation of y_{f,τ} given z_τ is E{y_{f,τ} | z_τ} = O^{(m)} z_τ.
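For illustration, the mth-order observability matrix O^{(m)} is just a vertical stack of C_0 A_0^i, i = 0, ..., m−1. A small NumPy sketch with hypothetical toy dimensions:

```python
import numpy as np

def observability_matrix(C0, A0, m):
    """Stack C0 @ A0**i, i = 0..m-1, into the (m*N_G) x (p*rho) observability matrix O^(m)."""
    blocks, block = [], C0
    for _ in range(m):
        blocks.append(block)
        block = block @ A0
    return np.vstack(blocks)

# hypothetical toy dimensions: N_G = 10 nodes, state dimension p*rho = 3, order m = 3
rng = np.random.default_rng(1)
C0 = rng.standard_normal((10, 3))
A0 = rng.standard_normal((3, 3)) / 3.0
O_m = observability_matrix(C0, A0, m=3)
print(O_m.shape)   # (30, 3): a "tall" matrix, full column rank with high probability
```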
It is well-known that any change of basis z̃_τ := P^{−1} z_τ in the state space, where P is non-singular, renders

  z̃_τ = P^{−1} A_0 P z̃_{τ−1} + w̃_τ,   y_τ = C_0 P z̃_τ + v_τ,   (4)

with observation and transition matrices C̃_0 := C_0 P and Ã_0 := P^{−1} A_0 P, respectively, equivalent to (2) in the sense of describing the same signal y_τ [43, §10.6]. The observability matrix of (4) satisfies Õ^{(m)} = O^{(m)} P. Remarkably, due to the non-singularity of P, even if Õ^{(m)} ≠ O^{(m)}, their columns span the same linear subspace.

Given the previous ambiguity of ARMA modeling w.r.t. P, to extract features that uniquely characterize (2), it is preferable to record the column space of O^{(m)}, instead of O^{(m)} itself. To this end, notice that for small values of pρ, it is often the case in practice to have m N_G ≫ pρ, which renders the "tall" O^{(m)} full-column rank, with high probability. The "column space" of O^{(m)} becomes a (pρ)-dimensional linear subspace of R^{m N_G}, or equivalently, a point in the Grassmannian Gr(m N_G, pρ) := {all (pρ)-rank linear subspaces of R^{m N_G}}. Apparently, Gr(m N_G, pρ) is a (smooth) Riemannian manifold of dimension pρ(m N_G − pρ) [23], [74]. The Grassmannian formulation removes the previous P-similarity-transform ambiguity in (4): since any linear subspace possesses an orthonormal basis, it can be easily verified that Gr(m N_G, pρ) = {[U] | U ∈ R^{m N_G × pρ}; U^⊤ U = I_{pρ}}, where, given the orthogonal U, point [U] ∈ Gr(m N_G, pρ) stands for [U] := {UP | P ∈ R^{pρ × pρ} is non-singular}, i.e., [U] gathers all bases for the column space of U.

Fix now a τ_f ∈ Z_{>0} and define the m N_G × τ_f matrices

  Y_{f,τ} := [y_{f,τ}, y_{f,τ+1}, ..., y_{f,τ+τ_f−1}],   (5)
  E_{f,τ} := [e_{f,τ}, e_{f,τ+1}, ..., e_{f,τ+τ_f−1}],

as well as the pρ × τ_f matrix Z_τ := [z_τ, ..., z_{τ+τ_f−1}]. Then,

  Y_{f,τ} = O^{(m)} Z_τ + E_{f,τ}.   (6)

To obtain high-quality estimates of O^{(m)} from (6), choose a τ_b ∈ Z_{>0}, and define as in [43, §10.6] the τ_b N_G × 1 vector

  y_{b,τ} := [y_τ^⊤, y_{τ−1}^⊤, ..., y_{τ−τ_b+1}^⊤]^⊤,   (7a)

where, as opposed to (3), one moves τ_b steps backward in time to define y_{b,τ}. Let also the τ_b N_G × τ_f matrix

  Y_{b,τ} := [y_{b,τ}, y_{b,τ+1}, ..., y_{b,τ+τ_f−1}].   (7b)
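The forward/backward stacked vectors and matrices of (3), (5), (7a) and (7b) are simple rearrangements of columns of the data matrix. A NumPy sketch of this bookkeeping (0-based column indices are used, so the argument `tau` below is the paper's τ shifted accordingly):

```python
import numpy as np

def forward_vector(Y, tau, m):
    """y_{f,tau} := [y_tau; y_{tau+1}; ...; y_{tau+m-1}] stacked into one column, cf. (3)."""
    return Y[:, tau:tau + m].T.reshape(-1, 1)

def backward_vector(Y, tau, tau_b):
    """y_{b,tau} := [y_tau; y_{tau-1}; ...; y_{tau-tau_b+1}] stacked into one column, cf. (7a)."""
    return Y[:, tau - tau_b + 1:tau + 1][:, ::-1].T.reshape(-1, 1)

def forward_matrix(Y, tau, m, tau_f):
    """Y_{f,tau} := [y_{f,tau}, ..., y_{f,tau+tau_f-1}], cf. (5)."""
    return np.hstack([forward_vector(Y, s, m) for s in range(tau, tau + tau_f)])

def backward_matrix(Y, tau, tau_b, tau_f):
    """Y_{b,tau} := [y_{b,tau}, ..., y_{b,tau+tau_f-1}], cf. (7b)."""
    return np.hstack([backward_vector(Y, s, tau_b) for s in range(tau, tau + tau_f)])
```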
By (6),

  (1/τ_f) Y_{f,t+τ_b} Y_{b,t+τ_b−1}^⊤ = O^{(m)} (1/τ_f) Z_{t+τ_b} Y_{b,t+τ_b−1}^⊤ + (1/τ_f) \sum_{τ=t+τ_b}^{t+τ_b+τ_f−1} e_{f,τ} y_{b,τ−1}^⊤.   (8)

To avoid any confusion regarding time indices, it is required that τ_w ≥ τ_f + τ_b + m − 1. Notice also that the noise terms comprising e_{f,τ} are uncorrelated with the data gathered in y_{b,τ−1}, so that the ensemble cross-correlations between e_{f,τ} and y_{b,τ−1} are zero; hence, if τ_f is sufficiently large, the sample averages in the last term of (8) approximate well the ensemble ones, which, as previously stated, are zero.

Motivated by (8), the estimation task of the observability matrix becomes as follows:

  (Ô_t^{(m)}, Π̂_t) ∈ arg min_{O ∈ R^{m N_G × pρ}, Π ∈ R^{pρ × τ_b N_G}} ‖ (1/τ_f) Y_{f,t+τ_b} Y_{b,t+τ_b−1}^⊤ − O Π ‖_F^2.   (9)

If r denotes the rank of (1/τ_f) Y_{f,t+τ_b} Y_{b,t+τ_b−1}^⊤, let (1/τ_f) Y_{f,t+τ_b} Y_{b,t+τ_b−1}^⊤ = U Σ V^⊤ be its SVD, where U ∈ R^{m N_G × r} and V ∈ R^{τ_b N_G × r} are orthogonal matrices, i.e., U^⊤ U = I_r = V^⊤ V, and Σ is the r × r diagonal matrix whose diagonal elements gather, in descending order, the non-zero singular values of (1/τ_f) Y_{f,t+τ_b} Y_{b,t+τ_b−1}^⊤. Assuming that pρ ≤ r, the celebrated Schmidt-Mirsky-Eckart-Young theorem [10] suggests that a solution to (9) is given by Ô_t^{(m)} = U_{:,1:pρ}, where U_{:,1:pρ} gathers the first pρ columns of U, and Π̂_t = Σ_{1:pρ,1:pρ} V_{:,1:pρ}^⊤.

Algorithm 1 Extracting features (x_t)_t in Gr(m N_G, pρ).
  Input: Data Y = [y_1, ..., y_T]; window size τ_w; ARMA-model order ρ; observability-matrix order m; parameters τ_f, τ_b s.t. τ_w ≥ τ_f + τ_b + m − 1.
  Output: Sequence (x_t)_{t=1}^{T−τ_w+1} in Gr(m N_G, pρ).
  1: for t = 1, ..., T − τ_w + 1 do
  2:   Consider data Y_t := [y_t, ..., y_{t+τ_w−1}].
  3:   Form Y_{f,t+τ_b} and Y_{b,t+τ_b−1} by (5) and (7b), respectively.
  4:   Compute the SVD (1/τ_f) Y_{f,t+τ_b} Y_{b,t+τ_b−1}^⊤ = U Σ V^⊤.
  5:   Define x_t := [Ô_t^{(m)}] := [U_{:,1:pρ}] in Gr(m N_G, pρ).
  6: end for

The previous procedure of extracting a sequence of features {x_t := [Ô_t^{(m)}]}_t in the Grassmannian Gr(m N_G, pρ) is summarized in Alg. 1. The dependence of the estimate Ô_t^{(m)} on t, as well as its on-the-fly computation, allow also for the application of the previous framework to dynamical ARMA models verbatim, i.e., the case where the matrices A_0 := A_{0t} and C_0 := C_{0t} are not fixed but are functions of time in (2).
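A minimal NumPy sketch of Alg. 1, reusing the forward_matrix/backward_matrix helpers sketched after (7b); per the Schmidt-Mirsky-Eckart-Young solution of (9), the first pρ left singular vectors are kept as an orthonormal representative of the Grassmannian point [Ô_t^{(m)}]:

```python
import numpy as np

def grassmann_feature(Y_f, Y_b, p_rho, tau_f):
    """Steps 4-5 of Alg. 1: SVD of (1/tau_f) Y_f Y_b^T; the leading p*rho left singular
    vectors form an orthonormal basis U, i.e., a representative of [U] in Gr(m*N_G, p*rho)."""
    M = (Y_f @ Y_b.T) / tau_f
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :p_rho]

def algorithm1(Y, tau_w, m, p_rho, tau_f, tau_b):
    """Sliding-window sweep of Alg. 1; returns one orthonormal basis per window."""
    assert tau_w >= tau_f + tau_b + m - 1
    feats = []
    for t in range(Y.shape[1] - tau_w + 1):
        Yt = Y[:, t:t + tau_w]
        Y_f = forward_matrix(Yt, tau_b, m, tau_f)           # forward data, cf. (5)
        Y_b = backward_matrix(Yt, tau_b - 1, tau_b, tau_f)  # backward data, cf. (7b)
        feats.append(grassmann_feature(Y_f, Y_b, p_rho, tau_f))
    return feats
```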
III. KERNEL-BASED PARTIAL CORRELATIONS

Partial correlation (PC) will be used as a measure of similarity among nodes, since it is well suited to the task and has well-documented merits, both intuitively and in network-connectivity studies [38], [39], [66]. Given data Y := [y_1, ..., y_T], form Ỹ := [ỹ_1, ..., ỹ_T] := [y_1 − μ, ..., y_T − μ] to remove from the data the sample averages or offsets μ := (1/T) \sum_{t=1}^T y_t. Along the lines of Sec. II, consider Ỹ_t := [ỹ_t, ỹ_{t+1}, ..., ỹ_{t+τ_w−1}] for t ∈ {1, ..., T − τ_w + 1}. Let ỹ_νt denote the νth row vector of Ỹ_t, or, in other words, the time profile of the νth node of G over times t, t+1, ..., t+τ_w−1. Consider also a pair of nodes (i, j) ∈ V^2, with V_{−ij} := V \ {i, j}. Rows {ỹ_νt}_{ν∈V_{−ij}} form the matrix Ỹ_{−ij,t}, obtained after the ith (ỹ_it) and jth (ỹ_jt) rows are removed from Ỹ_t. Let, now, ŷ̃_it and ŷ̃_jt be the least-squares (LS) estimates of ỹ_it and ỹ_jt, respectively, w.r.t. Ỹ_{−ij,t}, i.e., ŷ̃_lt := ỹ_lt Ỹ_{−ij,t}^† Ỹ_{−ij,t}, l ∈ {i, j}, with † denoting the Moore-Penrose pseudoinverse of a matrix [10], where Ỹ_{−ij,t}^† Ỹ_{−ij,t} stands for the (orthogonal) projection operator onto the linear span of {ỹ_νt}_{ν∈V_{−ij}}. Upon defining the residual r̃_lt := ỹ_lt − ŷ̃_lt, l ∈ {i, j}, and provided that {r̃_it, r̃_jt} are non-zero, the sample PC of the pair of nodes (i, j) w.r.t. V_{−ij} is defined as [39]

  ϱ̂_{ij,t} := r̃_it r̃_jt^⊤ / (‖r̃_it‖_2 · ‖r̃_jt‖_2).   (10)

In the case where one of {r̃_it, r̃_jt} is zero, ϱ̂_{ij,t} is defined to be zero. In other words, ϱ̂_{ij,t} measures the correlation between nodes i and j, after removing the "influence" that nodes V_{−ij} have on (i, j). Notice that the numerator in (10) is a dot-vector product, since r̃_lt, l ∈ {i, j}, are row vectors.

To capture possibly non-linear dependencies among nodes, and motivated by the success of reproducing kernel functions κ in modeling non-linearities (cf. Appendix A), define the N_G × N_G kernel matrix K_t whose (ν, ν′)th entry is

  [K_t]_{νν′} := κ(ỹ_νt, ỹ_ν′t).   (11)

Further, define the following submatrices of K_t:

  k_{−ij,i}: ith row of K_t without its ith and jth entries,
  k_{−ij,j}: jth row of K_t without its ith and jth entries,
  K_{−ij,t}: K_t without its ith and jth rows and columns.   (12)

Moreover, define φ(Ỹ_{−ij,t}) as the (N_G − 2) × dim H matrix whose νth (ν ∈ V_{−ij}) row is the element φ(ỹ_νt) of the feature space H (cf. Appendix A). Then, the LS estimate φ̂(ỹ_it) of φ(ỹ_it) w.r.t. {φ(ỹ_νt) | ν ∈ V_{−ij}} is given by (cf. Appendix C)

  φ̂(ỹ_it) = k_{−ij,i} K_{−ij,t}^† φ(Ỹ_{−ij,t}).   (13)

As in (10), upon defining the LS-residual as κr̃_lt := φ(ỹ_lt) − φ̂(ỹ_lt), l ∈ {i, j}, and provided that both {κr̃_it, κr̃_jt} are non-zero, the kernel (k)PC is defined as

  κϱ̂_{ij,t} := ⟨κr̃_it | κr̃_jt⟩_H / (‖κr̃_it‖_H · ‖κr̃_jt‖_H).   (14)

In the case where one of {κr̃_it, κr̃_jt} is zero, then κϱ̂_{ij,t} is defined to be zero.

Proposition 1. Define the generalized Schur complement K_t/K_{−ij,t} of K_{−ij,t} in K_t as the following 2 × 2 matrix:

  K_t/K_{−ij,t} := [ [K_t]_{ii}  [K_t]_{ij} ; [K_t]_{ji}  [K_t]_{jj} ] − [ k_{−ij,i} ; k_{−ij,j} ] K_{−ij,t}^† [ k_{−ij,i}^⊤  k_{−ij,j}^⊤ ].   (15a)

Then, the (i, j)th kPC is given by

  κϱ̂_{ij,t} = [K_t/K_{−ij,t}]_{12} / ( [K_t/K_{−ij,t}]_{11} · [K_t/K_{−ij,t}]_{22} )^{1/2}.   (15b)

If K_t is non-singular, then

  κϱ̂_{ij,t} = −[K_t^{−1}]_{ij} / ( [K_t^{−1}]_{ii} [K_t^{−1}]_{jj} )^{1/2}.   (15c)

Proof: See Appendix C.

According to (15c), information about PCs is contained in the positive definite (PD) matrix Γ_t := (diag K_t^{−1})^{−1/2} K_t^{−1} (diag K_t^{−1})^{−1/2}, where diag K_t^{−1} is the diagonal matrix whose main diagonal coincides with that of K_t^{−1}. It is well-known that the set of all N_G × N_G PD matrices, denoted by PD(N_G), is a (smooth) Riemannian manifold of dimension N_G(N_G + 1)/2. Assuming that the dynamics of the network vary slowly w.r.t. time, it is conceivable that {x_t := Γ_t} constitute smooth "trajectories" in M := PD(N_G), as in Figs. 2b and 3a. Of course, there are several other choices for points x_t in M, e.g., K_t or K_t^{−1}, or the N_G × N_G matrix R_t, whose (ν, ν′)th entry is defined to be κ(y_νt, y_ν′t), with y_νt being the νth row of the data matrix Y_t. In the case where K_t is only PSD, diagonal loading can be used to render the matrix PD, i.e., K_t is re-defined as K_t + εI_{N_G}, for some ε ∈ R_{>0}. All the previous choices for x_t will be explored in Sec. V.

Algorithm 2 Extracting features (x_t)_t in PD(N_G).
  Input: Data Y = [y_1, ..., y_T]; window size τ_w; ε ∈ R_{>0}.
  Output: Sequence (x_t)_{t=1}^{T−τ_w+1} in PD(N_G).
  1: Form Ỹ := [ỹ_1, ..., ỹ_T] := [y_1 − μ, ..., y_T − μ], where μ := (1/T) \sum_{t=1}^T y_t.
  2: for t = 1, ..., T − τ_w + 1 do
  3:   Consider the rows {ỹ_νt}_{ν=1}^{N_G} of Ỹ_t := [ỹ_t, ..., ỹ_{t+τ_w−1}].
  4:   Construct the kernel matrix K_t by using any of the methods demonstrated in Secs. III-A1, III-A2, or III-A3.
  5:   if K_t is singular then
  6:     Re-define K_t as K_t + εI_{N_G}.
  7:   end if
  8:   Define x_t := (diag K_t^{−1})^{−1/2} K_t^{−1} (diag K_t^{−1})^{−1/2}.
  9: end for
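A minimal NumPy sketch of Alg. 2 with a single Gaussian kernel (one of the designs of Sec. III-A; the common exp(−‖y − y′‖²/(2σ²)) parameterization is assumed here, since the exact form is deferred to Appendix A). The off-diagonal entries of the returned Γ_t relate to the kernel PCs through (15c), up to a sign:

```python
import numpy as np

def gaussian_kernel_matrix(Yt, sigma2):
    """[K_t]_{nu,nu'} = exp(-||y_nu - y_nu'||^2 / (2*sigma2)), rows of Yt as node profiles."""
    sq = np.sum(Yt**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Yt @ Yt.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma2))

def kpc_feature(Kt, eps=1e-6):
    """Step 8 of Alg. 2: Gamma_t = (diag K^-1)^(-1/2) K^-1 (diag K^-1)^(-1/2)."""
    if np.linalg.matrix_rank(Kt) < Kt.shape[0]:
        Kt = Kt + eps * np.eye(Kt.shape[0])   # diagonal loading, cf. steps 5-6 of Alg. 2
    Kinv = np.linalg.inv(Kt)
    d = 1.0 / np.sqrt(np.diag(Kinv))
    return d[:, None] * Kinv * d[None, :]     # off-diagonals = -kPC of (15c)

def algorithm2(Y, tau_w, sigma2=1.0):
    """Sliding-window sweep of Alg. 2 with a single Gaussian kernel."""
    Ytil = Y - Y.mean(axis=1, keepdims=True)  # remove per-node sample averages mu
    feats = []
    for t in range(Y.shape[1] - tau_w + 1):
        Kt = gaussian_kernel_matrix(Ytil[:, t:t + tau_w], sigma2)
        feats.append(kpc_feature(Kt))
    return feats
```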
A. Designing the kernel matrix

1) Single kernel function: There are numerous choices for the reproducing kernel function κ, with the more popular ones being the linear, Gaussian, and polynomial kernels (cf. Appendix A). Since K_t is a Gram matrix, it is non-singular iff the (dim H)-dimensional vectors {φ(ỹ_νt)}_{ν=1}^{N_G} are linearly independent [44]. The larger dim H is, the more likely it is for {φ(ỹ_νt)}_{ν=1}^{N_G} to be linearly independent. This last remark justifies the choice of a Gaussian kernel (it yields an infinite-dimensional RKHS; cf. Appendix A) in the numerical tests of Sec. V.

2) Multiple kernel functions: For any L ∈ Z_{>0}, any set of reproducing kernel functions {κ_l}_{l=1}^L with associated RKHSs {H_l}_{l=1}^L, and any user-defined set of positive weights {α_l}_{l=1}^L, it can be verified that the kernel function κ := \sum_{l=1}^L α_l κ_l is also reproducing, and induces an RKHS H which is a linear subspace of \sum_{l=1}^L H_l. Such a construction is beneficial in cases where prior knowledge on the data does not provide information on choosing adequately a single kernel function that models the data well. For example, whenever an adequate variance σ^2 for a single Gaussian kernel κ_σ cannot be identified, then choosing the kernel κ := (1/L) \sum_{l=1}^L κ_{σ_l}, for a set of variances {σ_l}_{l=1}^L that cover the range of interest, alleviates the problems that a designer faces due to lack of prior information.

3) Semidefinite embedding (SDE): In SDE, the kernel matrix K_t becomes also part of the data-driven learning process [82]. For convenience, the discussion in Appendix D highlights SDE's key points, demonstrating that SDE can be cast as a convex-optimization task over the set of PSD matrices.

IV. CLUSTERING ALGORITHM

After features have been extracted from the network-wide time series and mapped into a Riemannian feature space (cf. Fig. 2b), clustering is performed to distinguish the disparate time series. To this end, a very short introduction on Riemannian geometry will facilitate the following discussion. For more details, the interested reader is referred to [23], [74].

A. Elements of manifold theory

Consider a D-dimensional Riemannian manifold M with metric g. Based on g, the (Riemannian) distance function dist_g(x, y) between points x, y ∈ M is well-defined, and a geodesic is the (locally) distance-minimizing curve in M connecting x and y. Loosely speaking, geodesics generalize the "straight lines" of Euclidean spaces to shortest paths in the "curved" M. The RMMM hypothesis, which this paper advocates, postulates that the acquired data-points x_t are located on or "close" to K submanifolds (clusters) {S_k}_{k=1}^K of M, with possibly different dimensionalities. In contrast to the prevailing hypothesis for Kmeans, clusters in RMMM are allowed to have non-empty intersection. To accommodate noise and mis-modeling errors, data {x_t} are considered to lie within the following γ-width (γ ∈ R_{>0}) tubular neighborhood: {x ∈ M | ∃(s, k) ∈ M × {1, ..., K} s.t. s ∈ S_k and dist_g(x, s) < γ}; see Fig. 3a.

If T_{x_t}M denotes the tangent space of M at x_t (a D-dimensional Euclidean space; see Fig. 3b), and assuming that x_t is located on a submanifold S_k, then T_{x_t}S_k stands for the tangent space of the d_k-dimensional (d_k < D) submanifold S_k at x_t. Loosely speaking, the exponential map exp_{x_t}(·) maps a D-dimensional tangent vector v ∈ T_{x_t}M to a point exp_{x_t}(v) ∈ M. If S_k is geodesic, i.e., it contains the geodesic defined by any two of its points, then S_k becomes the image of T_{x_t}S_k under exp_{x_t}. The functional inverse of exp_{x_t} is the logarithm map log_{x_t}: M → T_{x_t}M, which maps x_t to the origin 0 of T_{x_t}M. Let x_{t′}^{(t)} denote the image of x_{t′} under log_{x_t}, i.e., x_{t′}^{(t)} := log_{x_t}(x_{t′}); an illustration can be found in Fig. 3c. Having at hand the dataset X := {x_t}_{t∈T} (T = {1, ..., T − τ_w + 1} in the context of Secs. II and III), the goal is to cluster the data into K groups {X_k}_{k=1}^K ⊂ M s.t. points in X_k are associated with the submanifold S_k. Note that if M is a Euclidean space, and submanifolds are affine subspaces, then RMMM boils down to the subspace-clustering modeling [77].
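For concreteness, a sketch of the exponential/logarithm maps and the geodesic distance on PD(N_G), assuming the affine-invariant Riemannian metric (the paper defers the exact expressions to its appendices); Grassmannian counterparts can be written analogously via SVDs:

```python
import numpy as np

def _sym_pow(X, p):
    """X**p for a symmetric PD matrix X via eigendecomposition (real symmetric output)."""
    w, V = np.linalg.eigh(X)
    return (V * (w ** p)) @ V.T

def pd_log(X, P):
    """Logarithm map of P at base point X on PD(N), under the (assumed) affine-invariant metric."""
    Xh, Xih = _sym_pow(X, 0.5), _sym_pow(X, -0.5)
    w, V = np.linalg.eigh(Xih @ P @ Xih)
    return Xh @ ((V * np.log(w)) @ V.T) @ Xh

def pd_exp(X, V_tan):
    """Exponential map at X (functional inverse of pd_log); V_tan is a symmetric tangent matrix."""
    Xh, Xih = _sym_pow(X, 0.5), _sym_pow(X, -0.5)
    w, V = np.linalg.eigh(Xih @ V_tan @ Xih)
    return Xh @ ((V * np.exp(w)) @ V.T) @ Xh

def pd_dist(X, P):
    """Geodesic distance dist_g(X, P): l2 norm of log-eigenvalues of X^(-1/2) P X^(-1/2)."""
    Xih = _sym_pow(X, -0.5)
    w = np.linalg.eigvalsh(Xih @ P @ Xih)
    return float(np.sqrt(np.sum(np.log(w) ** 2)))
```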
[Fig. 3. Two (K = 2) submanifolds/clusters S_1, S_2 and their tubular neighborhoods on the Riemannian manifold M, as well as the associated exponential and logarithm maps. In contrast to classical Kmeans, clusters are allowed here to have non-empty intersection. (a) Riemannian multi-manifold modeling (RMMM); (b) logarithm and exponential maps; (c) estimating tangent spaces.]

B. Algorithm

Motivated by a very recent line of research [79], [80], this paper advocates the geodesic clustering by tangent spaces (GCT) algorithm, detailed in Alg. 3, to solve the clustering task at hand. Since the submanifold S_k that point x_t belongs to is unknown, so is T_{x_t}S_k. To this end, an estimate of T_{x_t}S_k, denoted by T̂_{x_t}S, is associated with each point x_t of the dataset. Given a user-defined parameter N_NN^GCT ∈ Z_{>0}, let the neighborhood

  T_NN,t^GCT := { t′ ∈ T | x_{t′} is one of the N_NN^GCT nearest neighbors of x_t },   (16)

where closeness is measured via dist_g(·,·), and define Ĉ_{x_t} as the "local" sample correlation matrix

  Ĉ_{x_t} := (1/(N_NN^GCT − 1)) \sum_{t′ ∈ T_NN,t^GCT} x_{t′}^{(t)} x_{t′}^{(t)⊤}.   (17)

Moreover, let ‖Ĉ_{x_t}‖ = λ_max(Ĉ_{x_t}) denote the spectral norm of Ĉ_{x_t}, i.e., the maximum eigenvalue of the PSD Ĉ_{x_t}. Assuming that x_t lies close (in the Riemannian-distance sense) to submanifold S_k, estimates of the dimension d_k of S_k, or equivalently, of T_{x_t}S_k, can be obtained by identifying a principal eigenspace T̂_{x_t}S of Ĉ_{x_t} via PCA arguments. Any method of estimating a principal eigenspace can be employed here; e.g., define T̂_{x_t}S as the span of the eigenvectors whose eigenvalues are larger than or equal to η λ_max(Ĉ_{x_t}), for a user-defined parameter η ∈ (0, 1) (cf. [6]). If l(x_t, x_{t′}) denotes the (shortest) geodesic connecting x_t and x_{t′} in M, and upon defining the tangent vector v_{tt′} := log_{x_t}(x_{t′}), standing as the "velocity" of l(x_t, x_{t′}) at x_t, let the (empirical geodesic) angle θ_{tt′} be defined as the angle between v_{tt′} and the estimated linear subspace T̂_{x_t}S of T_{x_t}M.

Algorithm 3 Geodesic clustering by tangent spaces (GCT).
  Input: Manifold M; number of clusters K; dataset {x_t}_{t∈T}; number of nearest neighbors N_NN^GCT; distance parameter σ_d (default σ_d = 1); angle parameter σ_a (default σ_a = 1); eigenvalue threshold η ∈ (0, 1).
  Output: Data-cluster associations.
  1: for t = 1, ..., |T| do
  2:   Define neighborhood T_NN,t^GCT [cf. (16)].
  3:   Compute x_{t′}^{(t)} = log_{x_t}(x_{t′}), ∀ t′ ∈ T_NN,t^GCT.
  4:   (Local sparse coding:) Identify weights {α_{tt′}}_{t′∈T} via (19).
  5:   Compute the sample correlation matrix Ĉ_{x_t} by (17).
  6:   (Local PCA:) Identify the eigenvalues which are larger than or equal to η λ_max(Ĉ_{x_t}), and call the eigenspace spanned by the associated eigenvectors T̂_{x_t}S.
  7:   (Angular information:) Compute the empirical geodesic angles {θ_{tt′}}_{t′∈T}.
  8: end for
  9: Form the |T| × |T| affinity matrix W := [w_{tt′}]_{(t,t′)∈T^2}, where

     w_{tt′} := exp(|α_{tt′}| + |α_{t′t}|) · exp( −(θ_{tt′} + θ_{t′t}) / σ_a ).   (18)

  10: Apply spectral clustering [78] to W to identify data-cluster associations.
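A sketch of the per-point computations behind steps 5–7 of Alg. 3 and of the affinity rule (18), assuming the log-mapped neighbors have already been flattened into Euclidean vectors (e.g., via the PD log map sketched earlier):

```python
import numpy as np

def local_tangent_basis(V_nbrs, eta):
    """Local PCA (steps 5-6 of Alg. 3): V_nbrs stacks the log-mapped neighbors x_{t'}^{(t)}
    as rows. Keep eigenvectors of the local correlation matrix (17) whose eigenvalues
    are >= eta * lambda_max; their span estimates T^_xt S."""
    C = V_nbrs.T @ V_nbrs / max(V_nbrs.shape[0] - 1, 1)
    w, U = np.linalg.eigh(C)
    return U[:, w >= eta * w.max()]

def geodesic_angle(v, basis):
    """Empirical geodesic angle (step 7) between v = log_xt(x_t') and the subspace `basis`."""
    v = v / np.linalg.norm(v)
    cosang = np.clip(np.linalg.norm(basis @ (basis.T @ v)), 0.0, 1.0)
    return float(np.arccos(cosang))

def affinity_entry(alpha_tt, alpha_t_t, theta_tt, theta_t_t, sigma_a=1.0):
    """w_{tt'} of (18): large when the points explain each other in the sparse codes and
    when the connecting geodesics stay close to the estimated tangent subspaces."""
    return np.exp(abs(alpha_tt) + abs(alpha_t_t)) * np.exp(-(theta_tt + theta_t_t) / sigma_a)
```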
Key points of GCT are the local sparse coding of step 4, the local PCA of step 6, and the extraction of the angular information at step 7. Regarding the sparse-coding step, after mapping data-points {x_{t′}}_{t′∈T} to vectors {x_{t′}^{(t)}} in the tangent space T_{x_t}M at x_t, and motivated by the affine geometry of T_{x_t}M (cf. Fig. 3b), "relations" between data within the neighborhood T_NN,t^GCT, centered at x_t^{(t)}, are captured by the amount that neighbors {x_{t′}^{(t)}}_{t′ ∈ T_NN,t^GCT \ {t}} (x_{t′}^{(t)} and x_{t″}^{(t)} in Fig. 3b, for example) contribute to the description of x_t^{(t)} via affine combinations:

  min_{{α_{tt′}}_{t′ ∈ T_NN,t^GCT \ {t}}}   ‖ x_t^{(t)} − \sum_{t′ ∈ T_NN,t^GCT \ {t}} α_{tt′} x_{t′}^{(t)} ‖_2^2      [data-fit term]
      + \sum_{t′ ∈ T_NN,t^GCT \ {t}} exp( ‖x_t^{(t)} − x_{t′}^{(t)}‖_2 / σ_d ) |α_{tt′}|      [sparsity-promoting term]
  s.to   \sum_{t′ ∈ T_NN,t^GCT \ {t}} α_{tt′} = 1,   (19)

where the constraint in (19) manifests that neighbors should cooperate affinely to describe x_t^{(t)} in the data-fit term. The regularization term in (19) enforces sparsity in the previous representation by penalizing, thus eliminating, contributions from neighbors which are located far from x_t^{(t)} via the weights exp(‖x_t^{(t)} − x_{t′}^{(t)}‖_2 / σ_d): the larger the distance of x_{t′}^{(t)} from x_t^{(t)} in the tangent space T_{x_t}M, the larger the penalty on the modulus of the affine coefficient α_{tt′}. Moreover, no relations are established between x_t^{(t)} and data points {x_{t′}^{(t)}}_{t′ ∈ T \ T_NN,t^GCT} which do not belong to the neighborhood T_NN,t^GCT, by setting α_{tt′} := 0 for any t′ ∈ T \ T_NN,t^GCT. All information collected in the weights {α_{tt′}} and {θ_{tt′}} is gathered in the affinity matrix W (step 10 of Alg. 3) that is fed to any spectral clustering (SC) algorithm that provides data-cluster associations. The contribution of GCT [79], [80] in clustering on Riemannian surfaces is the novel way of extraction and incorporation of the angular information {θ_{tt′}} in an SC affinity matrix. A performance analysis, with guarantees on the clustering accuracy and the number of mis-classified data-points, has been already provided for a simplified version of GCT, where submanifolds are considered to be "geodesic" (justifying thus the name GCT), the sparse-coding scheme of step 4 in Alg. 3 is not employed, and the affinity matrix of step 10 becomes a binary one, with entries either 1 or 0, depending on whether conditions on the dimensions of the estimated tangent subspaces, the angular information {θ_{tt′}} and the Riemannian distance between data-points are satisfied or not [79].
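The sparse-coding subproblem (19) is a small convex program over the N_NN^GCT affine coefficients; below is a sketch using CVXPY as a stand-in for the off-the-shelf solver mentioned in Sec. IV-C (the solver choice is an assumption, not the paper's implementation):

```python
import numpy as np
import cvxpy as cp

def local_sparse_codes(x_center, X_nbrs, sigma_d=1.0):
    """Step 4 of Alg. 3, cf. (19): affine sparse coding of the log-mapped point x_center
    (the origin of T_xt M when coding x_t itself) by its log-mapped neighbors.
    x_center: (D,) tangent vector; X_nbrs: (N, D) array of neighbors x_{t'}^{(t)}."""
    N = X_nbrs.shape[0]
    w = np.exp(np.linalg.norm(X_nbrs - x_center, axis=1) / sigma_d)   # distance-based l1 weights
    alpha = cp.Variable(N)
    fit = cp.sum_squares(x_center - X_nbrs.T @ alpha)                 # data-fit term
    penalty = cp.sum(cp.multiply(w, cp.abs(alpha)))                   # sparsity-promoting term
    prob = cp.Problem(cp.Minimize(fit + penalty), [cp.sum(alpha) == 1])
    prob.solve()
    return alpha.value
```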
C. Computational complexity

A major part of GCT computations take place within the neighborhood T_NN,t^GCT. The complexity of computing the N_NN^GCT (typically ≤ 100 in all numerical tests) nearest neighbors of x_t is O(|T| C_dist + N_NN^GCT log|T|), where C_dist denotes the cost of computing the Riemannian distance between any two points, |T| C_dist refers to the complexity of computing |T| − 1 distances, and N_NN^GCT log|T| refers to the effort of identifying the N_NN^GCT nearest neighbors of x_t. Notice that once the logarithm map log_{x_t}(x_{t′}) is computed, under complexity C_log, then C_dist = O(dim M). If M is the set PD(N_G), then C_log = O([N_G(N_G + 1)/2]^{3/2}), while C_log = O(p^2 ρ^2 m N_G) if M is the Grassmannian Gr(m N_G, pρ). Step 4 of Alg. 3 requires solving the sparsity-promoting optimization task of (19). Notice that, due to ‖·‖_2, only inner products of Euclidean vectors are necessary to form the loss function in (19), which entails complexity of order O(dim M). Given that only N_NN^GCT vectors are involved, (19) is a small-scale convex-optimization task that can be solved efficiently (let C_sc denote the associated complexity) by any off-the-shelf solver, e.g., [9]. Step 6 of Alg. 3 involves the computation of the eigenvectors of the sample covariance matrix Ĉ_{x_t}, under complexity of O[dim M + (N_NN^GCT)^3]. Finally, to compute the empirical geodesic angles, O(|T| C_log + |T| dim M) operations are necessary. Spectral clustering is invoked in step 10 of Alg. 3 on the |T| × |T| affinity matrix W. Its main computational burden is to identify K eigenvectors (K is the number of clusters) of W, which entails complexity of order O(K |T|^2). To summarize, the complexity of GCT is O[|T|^2 (C_dist + C_log + dim M + K) + N_NN^GCT |T| log|T| + |T| C_sc + |T| dim M + |T| (N_NN^GCT)^3].

V. NUMERICAL TESTS

To assess performance, the proposed GCT algorithm is compared with the following methods:
(i) Sparse manifold clustering (SMC) [19], [24]. SMC was introduced in [24] for clustering submanifolds within Euclidean spaces, and it was later modified in [19] for clustering submanifolds on the sphere. SMC is adapted here, according to our needs, to cluster submanifolds in a Riemannian manifold, and is still referred to as SMC. SMC's basic idea is as follows: per each data-point x, a local neighborhood is mapped to the tangent space T_x M by the logarithm map (cf. step 3 of Alg. 3), and a sparse-coding task (cf. step 4 of Alg. 3) is solved in T_x M to provide weights for an SC similarity matrix.
(ii) Spectral clustering [78] equipped with the Riemannian metric (SCR) of [27]. SCR [27] utilizes SC under the weighted affinity matrix [W]_{tt′} := exp[−dist_g^2(x_t, x_{t′}) / (2σ^2)], where the Riemannian distance metric dist_g(·,·) is used to quantify affinity among data-points [27].
(iii) Kmeans, where data lying in the Riemannian manifold are embedded into a Euclidean space, and then classical Kmeans, under the classical (Euclidean) l2-distance metric, is applied to the embedded dataset. In particular, Grassmannian manifolds are embedded into Euclidean spaces by the isometric embedding [8], [45], and PD(N_G) is embedded into R^{N_G(N_G+1)/2} by vectorizing the upper triangular part of the elements of PD(N_G). This set of tests stands as a representative of all schemes that do not exploit the underlying Riemannian geometry, as detailed in Sec. I-B.

Unlike GCT, none of the previous methods utilizes the underlying submanifold tangential information (Kmeans is even Riemannian-geometry agnostic). In contrast to the prevailing hypothesis of Kmeans and variants, that clusters are not closely located to each other, RMMM allows for clusters with non-empty intersection and exploits the first-order geometry of submanifolds (cf. Fig. 3a).
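For the baselines, a sketch of the Euclidean embedding used for Kmeans on PD(N_G) (upper-triangle vectorization) and of the SCR affinity matrix; whether off-diagonal entries should be rescaled (e.g., by √2) to make the embedding isometric is left open here, since the text only specifies the vectorization:

```python
import numpy as np

def pd_to_vec(X):
    """Embed a PD(N_G) point into R^{N_G(N_G+1)/2} by vectorizing its upper triangle
    (the Euclidean embedding used for the Kmeans baseline)."""
    return X[np.triu_indices(X.shape[0])]

def scr_affinity(points, dist, sigma=1.0):
    """SCR affinity [W]_{tt'} = exp(-dist_g(x_t, x_{t'})^2 / (2 sigma^2)); `dist` is any
    Riemannian distance function (e.g., pd_dist sketched earlier)."""
    n = len(points)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            W[i, j] = W[j, i] = np.exp(-dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
    return W
```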
In contrast to the prevailing 1 hypothesis of Kmeans and variants, that clusters are not 0.9 closely located to each other, RMMM allows for non-empty 0.8 aeinxctpcTeuerhsrreieamccgtyeir,onodntu,esnafidonn-fdetrdsauustabshsems“lsaa(mn#bieeflonosltfdiopssfod(icocnflnt.useFswtviegiirat.sht3hacaerl)ue.nsotaetviroalinalabobefllecslueinqstueearalinctohg Clustering Accuracy0000....4567 0.7328CCoorvr0.7160.58250.65050.6063 0.46430.67340.50510.39840.7829 0.72440.71420.6850.68320.6926 0.7170.55190.7620.85750.867 ICov the ground-truth ones) / (# of total points).” Signal-to-noise 0.3 POCB ratio (SNR) is set to be 10dB for all experiments. Tests are SCR SMC Kmeans GCT run for a number of 50 realizations, and average clustering Fig.4. Linearkernel:τw=50;NNGNCD=12. accuracies, as well as standard deviations, are depicted in the subsequent figures. 1 0.9 A. Synthetically generated time series 0.8 tahcecTroehmisaprlesisehcuetpidonttohrrtoehufregerhes ttthaosektsch/oeeovpesenerttasti/tnmiogondouoflfeFsniogtdh.east1..nEePaeecdrhtsontoatdbeee, Clustering Accuracy000...567 0.9491Corr0.91880.93880.85850.7489 0.72070.71640.71860.57560.9246 0.90530.88290.78060.87260.8092 0.95550.7550.95720.97110.9762 contributestoaspecifictaskbysharingacommonsignalwith 0.4 Cov ICov other nodes assigned to the same task. Nodes that share a 0.3 POCB common task are considered to be connected to each other. SCR SMC Kmeans GCT Per node, the previous common signal is linearly combined Fig.5. Linearkernel:τw=70;NNGNCT=12. with a signal characteristic of the node, and with a first- order auto-regressive (AR) process, with time-varying AR 1 coefficient, contributing to the dynamics of the task-specific 0.9 signal. The AR signal is described by the recursion y := νt,AR 0.8 cuaanlosledsrtθh-utden·eiypfitν-rnve(etva−drioi1pa)una,sArcRaetim+mneoe√trems1resa−rθile0crs.oavisn.s,2dfiaθ∆nlttdeθ·rv.θettTd,:hwb=eyhlθeittnrh−eee1avmrt+coios∆dmeaθlbz,oiefnfroaor[t-6imos2on]emaotonef Clustering Accuracy000...567 0.9443Corr0.95520.89020.94590.8511 0.88730.64450.31440.72870.9513 0.92250.91210.95520.95520.8247 0.98470.87290.81050.98880.9811 0.4 Cov yield the BOLD data {yt}Tt=1. 0.3 IPOCCBov Regarding Alg. 1 of Sec. II, parameters are set as follows: SCR SMC Kmeans GCT NG := 10, m = 3, p = 1, ρ = 3, τf = 20, τb = 20, Fig.6. Linearkernel:τw=80;NNGNCT=8. and τ 50,70,80 . Results pertaining to the observability- w ∈{ } matrix features of Sec. II are denoted by the “OB” tag in the 1 legents of all subsequent figures. 0.9 Regarding the methodology of Sec. III, several features 0.8 ovarerafefleuAreeexslng:pc.l(eoi2)rte,o(ddd(iie1ann1go)tKt,heedp−to1ni)bun−ymt1e/xt2rhtiKeca∈−tlta1tPg(edDsit“(askN.gPMGKC)”o−trt1eai)nk−sep1st/eh2ctehififerscouafmbollslyles,oqtwewupieintng8ht Clustering Accuracy000...567 0.9865Corr0.99650.87730.92870.8321 0.87060.64010.36760.70320.947 0.99110.89460.89780.93790.8689 0.99340.87170.85590.99420.9968 0.4 Cov figures; (ii) K from step 6 of Alg. 2, denoted by tag ICov “Cov”; (iii) K−tt1, denoted by tag “ICov”; and (iv) Λt, where 0.3 POCB SCR SMC Kmeans GCT [Λ ] := κ(y ,y ), with y NG being the rows of Yt:ν=ν(cid:48)[y ,y ,ν.t..,νy(cid:48)t ], a{ndνtd}eνn=o1ted by tag “Corr”. Fig.7. Linearkernel:τw=80;NNGNCT=12. t t t+1 t+τw−1 Constructing a reproducing kernel function κ, or the se- perform. However, GCT exhibits the best performance quence of kernel matrices Kt in step 4 of Alg. 2, plays a evenforsmallvaluesofthoseparameters,particularlyfor { } principal role in the methodology of Sec. III. 
Regarding Alg. 1 of Sec. II, parameters are set as follows: N_G := 10, m = 3, p = 1, ρ = 3, τ_f = 20, τ_b = 20, and τ_w ∈ {50, 70, 80}. Results pertaining to the observability-matrix features of Sec. II are denoted by the "OB" tag in the legends of all subsequent figures.

Regarding the methodology of Sec. III, several features x_t are explored (denoted by the following tags in the legends of the subsequent figures): (i) x_t := (diag K_t^{−1})^{−1/2} K_t^{−1} (diag K_t^{−1})^{−1/2} ∈ PD(N_G) from step 8 of Alg. 2, denoted by the tag "kPC" ("PC" under the linear kernel, where it reduces to the classical PC); (ii) K_t from step 6 of Alg. 2, denoted by the tag "Cov"; (iii) K_t^{−1}, denoted by the tag "ICov"; and (iv) Λ_t, where [Λ_t]_{νν′} := κ(y_νt, y_ν′t), with {y_νt}_{ν=1}^{N_G} being the rows of Y_t := [y_t, y_{t+1}, ..., y_{t+τ_w−1}], denoted by the tag "Corr".

Constructing a reproducing kernel function κ, or the sequence of kernel matrices {K_t} in step 4 of Alg. 2, plays a principal role in the methodology of Sec. III. To this end, and along the lines of Sec. III-A, four ways of designing the kernel matrices are explored:

(i) Linear kernel function: By choosing the linear kernel κ_l of Appendix A as the kernel function, the feature space H becomes nothing but the input Euclidean space R^{τ_w}, with κ_l(y, y′) = y y′^⊤, for any y, y′ ∈ R^{τ_w}. As such, the previously met K_t and Λ_t become the classical covariance and correlation matrices, respectively. As Figs. 4–8 demonstrate, the larger the values of the sliding window τ_w and of the number of nearest neighbors N_NN^GCT are, the better all methods perform. However, GCT exhibits the best performance even for small values of those parameters, particularly for the advocated features of kPC and observability matrices ("OB"). Further, focusing on these two features, it can be seen that "OB" outperforms kPC in almost all scenarios.

[Figs. 4–8. Average clustering accuracies (bar charts; methods SCR, SMC, Kmeans, GCT; features Corr, Cov, ICov, PC, OB) under the linear kernel. Fig. 4: τ_w = 50, N_NN^GCT = 12; Fig. 5: τ_w = 70, N_NN^GCT = 12; Fig. 6: τ_w = 80, N_NN^GCT = 8; Fig. 7: τ_w = 80, N_NN^GCT = 12; Fig. 8: τ_w = 80, N_NN^GCT = 16.]

(ii) Single Gaussian kernel function: The Gaussian (reproducing) kernel function κ_σ of Appendix A is used here, with variance values σ^2 ∈ {0.5, 1, 2}. As Appendix A suggests, the feature space H_σ becomes an infinite-dimensional functional space. Notice that the "OB" tag is not included in Figs. 9–14, since the methodology of Sec. II does not include any kernel-based arguments. As the relevant figures demonstrate, all methods appear to be sensitive to the choice of the kernel's variance value: the smaller the value is, the worse the clustering accuracies become. Still, under such a uniform behavior, GCT exhibits the best performance among the employed methods.

[Figs. 9–14. Average clustering accuracies (bar charts; methods SCR, SMC, Kmeans, GCT; features Corr, Cov, ICov, kPC) under the single Gaussian kernel with N_NN^GCT = 16. Figs. 9–11: τ_w = 50, with σ^2 = 0.5, 1, 2, respectively; Figs. 12–14: τ_w = 80, with σ^2 = 0.5, 1, 2, respectively.]
(iii) Multi-kernel function: As Figs. 9–14 demonstrate, the choice of the variance value of a single Gaussian kernel can hinder the clustering-accuracy performance of all employed techniques. To this end, a multi-kernel-function approach is adopted here to robustify all methods: κ := (1/I) \sum_{i=1}^I κ_{σ_i}, where the variance values cover the wide range σ_i ∈ {0.25 + 0.01(i − 1)}_{i=1}^I, with I := 376, σ_1 = 0.25, and σ_I = 4. It can be easily verified that κ is a reproducing kernel (cf. Appendix A). Moreover, the resulting feature space H is an infinite-dimensional functional space. Needless to say, there are numerous ways of defining similar multi-kernel functions, such as the incorporation of polynomial or linear kernels in κ. Since this study is not meant to be exhaustive, such a path is not pursued. Figs. 15–19 show results for several values of the sliding-window length τ_w and of N_NN^GCT. As expected, multi-kernel functions enhance the performance of all methods, with GCT exhibiting the best performance among the employed techniques.

[Fig. 15. Average clustering accuracies (bar chart; methods SCR, SMC, Kmeans, GCT; features Corr, Cov, ICov, kPC) under the multi-kernel function: τ_w = 50, N_NN^GCT = 12.]

(iv) SDE: Here, the kernel function is designed via the data-driven approach of Sec. III-A3, and results are demonstrated …
