Nonlinear Blind Source Separation Using Sensor-Independent Signal Representations

David N. Levin

D. Levin is with the Department of Radiology and the Committee on Medical Physics, University of Chicago. Mailing address: 1310 N. Ritchie Ct., Unit 26 AD, Chicago, IL 60610. Telephone: 312-753-9274. Fax: 312-482-8624. Email: [email protected]

Abstract—Consider a time series of signal measurements x(t), having components x_k for k = 1,2,...,N. This paper shows how to determine if these signals are equal to linear or nonlinear mixtures of the state variables of two or more statistically-independent subsystems. First, the local distribution of measurement velocities (\dot{x}) is processed in order to derive local vectors at each x. If the data are separable, each of these vectors is directed along a subspace traversed by varying the state variable of one subsystem, while all other subsystems are kept constant. Because of this property, these vectors can be used to determine if the data are separable, and, if they are, x(t) can be transformed into a separable coordinate system in order to recover the time series of the independent subsystems. The method is illustrated by using it to blindly recover the separate utterances of two speakers from nonlinear combinations of their waveforms.

Index Terms—blind source separation, nonlinear signal processing, invariants, sensor

I. INTRODUCTION

The signals from a process of interest are often contaminated by signals from "noise" processes, which are thought to be statistically independent of the process of interest but are otherwise unknown. This raises the question: can one use the signals to determine if two or more independent processes are present, and, if so, can one derive a representation of the evolution of each of them? In other words, if a system is effectively evolving in a closed box, can one "explore" the signals emanating from the box in order to learn the number and nature of the subsystems within it? There is a variety of methods for solving this blind source separation (BSS) problem for the special case in which the signals are linearly related to the system states.
However, some observed signals (e.g., from biological or economic systems) may be nonlinear functions of the underlying system states, and computational methods for performing nonlinear BSS are limited ([1], [2]), even though humans can often perform it quite effortlessly.

Consider an evolving physical system that is being observed by making time-dependent measurements, x_k(t) for k = 1,2,...,N, which are invertibly related to the system's state variables. In the Conclusion, we describe how to choose measurements that have this invertibility property. The objective of BSS is to determine if the measurements are mixtures of the state variables of statistically independent subsystems. Specifically, we want to know if there is an invertible, possibly nonlinear, N-component "mixing" function, f, that transforms the measurement time series into a time series of separable states:

s(t) = f[x(t)],   (1)

where s(t) denotes a set of state components, s_k(t) for k = 1,2,...,N, which can be partitioned into two or more statistically independent groups. Note that the mixing function defines a transformation between two coordinate systems on state space: the coordinate system (x) defined by the choice of measurements and the coordinate system (s) of separable state components.

The method proposed in this paper utilizes a criterion for statistical independence that differs from the conventional one. Let \rho_S(s) be the probability density function (PDF), defined so that \rho_S(s) ds is the fraction of total time that the trajectory s(t) is located within the volume element ds at location s. In the usual formulation of the BSS problem, the mixing function must transform the measurements so that \rho_S(s) is the product of the density functions of individual components (or groups of components),

\rho_S(s) = \prod_{a=1,2,\ldots} \rho_a(s_{(a)}),   (2)

where s_{(a)} is a subsystem state variable, comprised of one or more of the components s_k. In every formulation of BSS, multiple solutions can be created by permuting the subsystem state variables and/or transforming their components. However, the criterion in (2) is so weak that it suffers from a much worse non-uniqueness problem: namely, solutions can be created by mixing the state variables of other solutions (see [3] and references therein).

In this paper, the issue of non-uniqueness is circumvented by considering the data's trajectory in (s,\dot{s})-space (\dot{s} = ds/dt), instead of s-space (i.e., state space). First, let \rho_S(s,\dot{s}) be the PDF in this space, defined so that \rho_S(s,\dot{s}) ds d\dot{s} is the fraction of total time that the location and velocity of s(t) are within the volume element ds d\dot{s} at location (s,\dot{s}). As in an earlier paper [4], the mixing function must transform the measurements so that \rho_S(s,\dot{s}) is the product of the density functions of individual components (or groups of components),

\rho_S(s,\dot{s}) = \prod_{a=1,2,\ldots} \rho_a(s_{(a)},\dot{s}_{(a)}).   (3)

Separability in (s,\dot{s})-space is a stronger requirement than separability in state space. To see this, note that (2) can be recovered by integrating both sides of (3) over all velocities, but the latter equation cannot be deduced from the former one. In fact, it can be shown that (3) is strong enough to guarantee that the BSS problem has a unique solution, up to permutations of subsystem state variables and transformations of their components ([4]), and that is why it is studied here.
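For clarity, the marginalization step mentioned above can be written out explicitly (this display is added here and is not part of the original text):

\int \rho_S(s,\dot{s})\, d\dot{s} \;=\; \prod_{a=1,2,\ldots} \int \rho_a\!\left(s_{(a)},\dot{s}_{(a)}\right) d\dot{s}_{(a)} \;=\; \prod_{a=1,2,\ldots} \rho_a\!\left(s_{(a)}\right),

which is exactly (2). The converse implication fails because (2) places no constraint on how the velocities \dot{s}_{(a)} are jointly distributed.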
Because this paper utilizes the separability criterion in (3), it is fundamentally different from most other BSS techniques, which utilize the weaker criterion in (2). Furthermore, the new method exploits statistical constraints on the measurement velocities in each local region of state space. In contrast, existing methods of using velocity information utilize weaker constraints on the global distribution of measurement velocities ([5]).

The new method should be compared to two earlier techniques of performing BSS according to the criterion in (3). In [4] it was shown that the local second-order correlation matrix of measurement velocity can be taken to define a Riemannian metric on the space of measurements (x). Nonlinear BSS can then be performed by finding the transformation to another (s) coordinate system, in which this metric is block-diagonal everywhere. In order to construct this new coordinate system, it is necessary to compute the metric's first and second derivatives with respect to x. This approach suffers from a practical difficulty: namely, a great deal of data is required to cover the measurement manifold densely enough in order to calculate these derivatives accurately. In contrast, the method proposed in this paper only depends on the computation of the second- and fourth-order correlations of the local \dot{x} distribution. This can be done with much less data.

Reference [6] describes a second way of performing nonlinear BSS according to the criterion in (3). Higher-order local correlations of the data's velocity (typically, at least fifth-order correlations) are used to compute multiple scalar quantities (typically, at least six scalars) at each point x. A necessary consequence of separability is that certain subgroups of these scalars must map x-space onto a subspace of lower dimensions. Typically, this approach requires more data than the proposed technique because it is necessary to compute local velocity correlations of order greater than four.

The next section describes how to determine if the data are separable and, if so, how to recover a representation of the evolution of each independent subsystem. Section III illustrates the method by using it to recover the utterances of two speakers from unknown nonlinear mixtures of their waveforms. The last section discusses the implications of this approach.

II. METHOD

This paragraph and Figure 1 provide a brief description of the proposed BSS procedure. The procedure is initiated by using the local distribution of measurement velocities to construct N local vectors (V_{(i)}(x) for i = 1,2,...,N) at each point x. If the data are separable, there is a special s coordinate system in which (3) is true. In that coordinate system it is evident that each vector is directed along the subspace created by varying one subsystem's state variables, while all other subsystems are held constant. Because of this property, these vectors can be used to construct a finite set of functions on x-space, and one of those functions must transform the data to a separable coordinate system if it exists. Therefore, we can determine whether the data are separable by seeing if any of these functions transforms the data's density function (or correlations) into a factorizable form. If the data are separable, they can be transformed into the separable coordinate system in order to recover the time course of each subsystem (up to arbitrary transformations on the state space of each subsystem).

Fig. 1. The proposed method of nonlinear blind source separation. [Flowchart: from the measured time series x(t), compute C_{kl}(x) and C_{klmn}(x); compute V_{(i)}(x); compute the weights w_i(t). If the w_i(t) cannot be partitioned into two mutually uncorrelated groups, the data are inseparable. Otherwise, use the V_{(i)}(x) to construct a function u associated with each partitioning. If no u transforms the density function into factorizable form, the data are inseparable; otherwise, the data are separable and the components of u[x(t)] describe the evolution of separate subsystems.]

The first step is to construct second-order and fourth-order local correlations of the data's velocity

C_{kl}(x) = \langle (\dot{x}_k - \bar{\dot{x}}_k)(\dot{x}_l - \bar{\dot{x}}_l) \rangle_x   (4)

C_{klmn}(x) = \langle (\dot{x}_k - \bar{\dot{x}}_k)(\dot{x}_l - \bar{\dot{x}}_l)(\dot{x}_m - \bar{\dot{x}}_m)(\dot{x}_n - \bar{\dot{x}}_n) \rangle_x   (5)

where \bar{\dot{x}} = \langle \dot{x} \rangle_x, where the bracket denotes the time average over the trajectory's segments in a small neighborhood of x, and where all indices are integers between 1 and N. Because \dot{x} is a contravariant vector, C_{kl}(x) and C_{klmn}(x) are local contravariant tensors of second and fourth rank, respectively. The definition of the PDF implies that C_{kl}(x) and C_{klmn}(x) are two of its moments; e.g.,

C_{kl\ldots}(x) = \frac{\int \rho(x,\dot{x})\,(\dot{x}_k - \bar{\dot{x}}_k)(\dot{x}_l - \bar{\dot{x}}_l)\cdots\, d\dot{x}}{\int \rho(x,\dot{x})\, d\dot{x}},   (6)

where \rho(x,\dot{x}) is the PDF in the x coordinate system, where "..." denotes possible additional indices on the left side and corresponding factors of \dot{x} - \bar{\dot{x}} on the right side, and where all indices are integers between 1 and N. Although (6) is useful in a formal sense, in practical applications all required correlation functions can be computed directly from local time averages of the data (i.e., (4)-(5)), without explicitly computing the data's PDF. Also, note that velocity "correlations" with a single subscript vanish identically

C_k(x) = 0.   (7)
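As a concrete illustration of (4)-(5), the following minimal Python/NumPy sketch (added here; it is not from the original paper, and the array shapes, the uniform binning scheme, and the minimum bin population are assumptions) estimates the local velocity correlations from a binned time series:

```python
import numpy as np

def local_velocity_correlations(x, x_dot, n_bins=16, min_samples=10):
    """Estimate the local velocity correlations of Eqs. (4)-(5) on a uniform grid of bins.

    x, x_dot : arrays of shape (T, N) holding the measurements and their time
    derivatives (e.g., finite differences of x divided by the sampling interval).
    Returns dicts mapping each bin's index tuple to its local second- and
    fourth-order central moments of x_dot.
    """
    T, N = x.shape
    # Assign every sample to one cell of a uniform n_bins**N grid covering the data.
    edges = [np.linspace(x[:, k].min(), x[:, k].max(), n_bins + 1) for k in range(N)]
    idx = np.stack([np.clip(np.searchsorted(edges[k], x[:, k]) - 1, 0, n_bins - 1)
                    for k in range(N)], axis=1)

    C2, C4 = {}, {}
    for b in {tuple(row) for row in idx}:
        v = x_dot[np.all(idx == np.array(b), axis=1)]   # velocities of samples in bin b
        if len(v) < min_samples:                        # skip sparsely populated bins
            continue
        dv = v - v.mean(axis=0)                         # subtract the local mean velocity
        C2[b] = np.einsum('tk,tl->kl', dv, dv) / len(dv)                   # Eq. (4)
        C4[b] = np.einsum('tk,tl,tm,tn->klmn', dv, dv, dv, dv) / len(dv)   # Eq. (5)
    return C2, C4
```

For the two-dimensional experiment of Section III, a 16 x 16 grid of bins of this kind is used.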
Next, let M(x) be any local N × N matrix, and use it to define M-transformed velocity correlations

I_{kl}(x) = \sum_{1 \le k',l' \le N} M_{kk'}(x)\, M_{ll'}(x)\, C_{k'l'}(x),   (8)

I_{klmn}(x) = \sum_{1 \le k',l',m',n' \le N} M_{kk'}(x)\, M_{ll'}(x)\, M_{mm'}(x)\, M_{nn'}(x)\, C_{k'l'm'n'}(x).   (9)

Because C_{kl}(x) is generically positive definite at any point x, it is possible to find a particular form of M(x) that satisfies

I_{kl}(x) = \delta_{kl}   (10)

\sum_{1 \le m \le N} I_{klmm}(x) = D_{kl}(x),   (11)

where D(x) is a diagonal N × N matrix. Such an M(x) can be constructed from the product of three matrices: 1) a rotation that diagonalizes C_{kl}(x), 2) a diagonal rescaling matrix that transforms this diagonalized correlation into the identity matrix, 3) another rotation that diagonalizes

\sum_{1 \le m \le N} \tilde{C}_{klmm}(x),

where \tilde{C}_{klmn}(x) is the fourth-order velocity correlation (C_{klmn}(x)) after it has been transformed by the first rotation and the rescaling matrix. As long as D is not degenerate, M(x) is unique, up to arbitrary local permutations and/or reflections. In almost all applications of interest, the velocity correlations will be continuous functions of x. Therefore, in any neighborhood of state space, there will always be a continuous solution for M(x), and this solution is unique, up to arbitrary global permutations and/or reflections.
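The three-factor construction just described can be written compactly in code. The following sketch (an illustration added here, not the author's implementation) builds M(x) for a single neighborhood from the local correlations returned by the previous snippet:

```python
import numpy as np

def local_M(C2, C4):
    """Construct a local matrix M(x) satisfying Eqs. (10)-(11) from the local
    correlations C2 (shape (N, N), Eq. (4)) and C4 (shape (N, N, N, N), Eq. (5))."""
    # 1) Rotation that diagonalizes C_kl, and 2) rescaling of the result to the identity.
    eigval, R1 = np.linalg.eigh(C2)
    W = np.diag(1.0 / np.sqrt(eigval)) @ R1.T            # whitening: W C2 W^T = I
    # Transform the fourth-order correlation with the same whitening (Eq. (9) with M = W).
    C4w = np.einsum('ka,lb,mc,nd,abcd->klmn', W, W, W, W, C4)
    # 3) Rotation that diagonalizes the contracted fourth-order correlation.
    Q = np.einsum('klmm->kl', C4w)
    _, R2 = np.linalg.eigh(Q)
    M = R2.T @ W                                         # satisfies (10) and (11)
    V = np.linalg.inv(M)                                 # columns are the local vectors V_(i)
    return M, V
```

As the text notes, M is determined only up to permutations and reflections of its rows, so in practice the rows of M (and the columns of M^{-1}) must be ordered and signed consistently across neighboring bins.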
In any other coordinate system x', the most general solution for M' is given by

M'_{kl}(x') = \sum_{1 \le m,n \le N} P_{km}\, M_{mn}(x)\, \frac{\partial x_n}{\partial x'_l}(x'),   (12)

where M is a matrix that satisfies (10) and (11) in the x coordinate system and where P is a product of permutation, reflection, and identity matrices. This can be proven by substituting this equation into the definitions of I'_{kl}(x') and I'_{klmn}(x') and by noting that these quantities satisfy (10) and (11) in the x' coordinate system because (8)-(9) satisfy them in the x coordinate system. Note that, by construction, M is not singular, and, therefore, it has a non-singular inverse.

Notice that (12) shows that the rows of M transform as local covariant vectors, up to global permutations and/or reflections. Likewise, the same equation implies that the columns of M^{-1} transform as local contravariant vectors (denoted as V_{(i)}(x) for i = 1,2,...,N), up to global permutations and/or reflections. Because these vectors are linearly independent, the measurement velocity at each time, \dot{x}(t), can be represented by a weighted superposition of them

\dot{x}(t) = \sum_{1 \le i \le N} w_i(t)\, V_{(i)},   (13)

where the w_i are time-dependent weights. Because \dot{x} and V_{(i)} transform as contravariant vectors, the weights w_i must transform as scalars; i.e., they are independent of the coordinate system in which they are computed (except for possible permutations and/or reflections). In this sense, the time series of weights comprises an invariant or coordinate-system-independent representation of the system's velocity.
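Because the V_{(i)} are the columns of M^{-1}, Eq. (13) implies that the weight vector at each time is simply w(t) = M(x(t)) \dot{x}(t). A minimal sketch of this step (added here, assuming the per-bin matrices and bin labels produced by the earlier snippets) is:

```python
import numpy as np

def invariant_weights(x_dot, bin_index, M_by_bin):
    """Invert Eq. (13): because the V_(i) are the columns of M^{-1}, the weight
    vector at each time is w(t) = M(x(t)) x_dot(t).

    x_dot     : (T, N) measurement velocities.
    bin_index : length-T sequence of bin labels, one per sample.
    M_by_bin  : dict mapping a bin label to its local matrix M(x).
    """
    w = np.full(x_dot.shape, np.nan)
    for t, b in enumerate(bin_index):
        if b in M_by_bin:                 # samples in sparsely populated bins are skipped
            w[t] = M_by_bin[b] @ x_dot[t]
    return w                              # row t holds the scalar weights w_i(t)
```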
Now, let's imagine the computation of these weights in the separable (s) coordinate system, in which the components of s have been partitioned into disjoint blocks corresponding to (possibly multidimensional) subsystem state variables. In this coordinate system, correlations C_{Skl\ldots}(s), having all indices in block a, are the correlations between the components of state variable s_{(a)}. Next, construct the block diagonal matrix M_S(s)

M_S(s) = \begin{pmatrix} M_{S1}(s_{(1)}) & 0 & \cdots \\ 0 & M_{S2}(s_{(2)}) & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix},   (14)

where submatrix M_{Sa} satisfies (10) and (11) for correlations between components of s_{(a)}. It is not difficult to show that M_S satisfies (10) and (11) in the s coordinate system and that it is unique, up to global permutations and/or reflections. To see this, first note that (6), (3), and (7) imply that velocity correlations in the s coordinate system vanish if their indices contain a solitary index from any one block. It follows that C_{Skl}(s) is block diagonal and that (14) satisfies the constraint (10), because each block of M_S is defined to transform the corresponding block of C_{Skl} into an identity matrix. In order to prove that (14) satisfies (11), substitute it into the definition of

\sum_{1 \le m \le N} I_{Sklmm}.   (15)

Then, note that: 1) when k and l belong to different blocks, each term in this sum vanishes because it factorizes into a product of correlations, one of which has a single index; 2) when k and l belong to the same block and are unequal, each term with m in any other block contains a factor equal to I_{Skl}, which vanishes as proved above; 3) when k and l belong to the same block and are unequal, the sum over m in the same block vanishes, because each block of M_S is defined to satisfy (11) for the corresponding subsystem.

After transforming (13) into the s coordinate system, it has the form

\dot{s}(t) = \sum_{1 \le i,j \le N} w_i(t)\, P_{ij}\, V_{S(j)},   (16)

where V_{S(j)} is V_{(j)} in the s coordinate system and P is a possible permutation and/or reflection. Note that V_{S(i)} is the ith column of M_S^{-1}. In other words, the V_{S(i)} are the local vectors, which are derived from the local distribution of \dot{s} in the same way that the V_{(i)} were derived from the local distribution of \dot{x}. So, (16) shows that the weights that represent \dot{s} are the same (up to a possible permutation and/or reflection) as those that represent \dot{x}.

Observe that each vector V_{S(i)} vanishes except where it passes through one of the blocks of M_S^{-1}. Therefore, equation (16) is equivalent to a group of equations, which are formed by projecting it onto each block corresponding to a subsystem state variable. For example, projecting both sides of (16) onto block a gives the result

\dot{s}_{(a)}(t) = \sum_{1 \le i \le N,\; j \in \mathrm{block}\, a} w_i(t)\, P_{ij}\, V_{S(ja)}.   (17)

Here, V_{S(ja)} is the projection of V_{S(j)} onto block a; i.e., it is the column of M_{Sa}^{-1} that coincides with column j of M_S^{-1}, as it passes through block a. This means that the vectors V_{S(ja)}, for j in block a, are the local vectors on the s_{(a)} manifold, which are derived from the local distribution of \dot{s}_{(a)} in the same way that the V_{(i)} were derived from the local distribution of \dot{x}. Therefore, (17) shows that each weight w_i(t), appearing in the invariant representation of \dot{x}, is one of the weights in the invariant representation of the velocity of an isolated subsystem. Notice that each time-dependent weight, w_i(t), reflects the evolution of just one subsystem; it does not contain information about the evolution of several subsystems. Equation (17) also shows that the time-dependent weights can be partitioned into subsets such as

\sum_{1 \le i \le N} w_i\, P_{ij}   (18)

for j in block a, each of which describes the canonical representation of one subsystem's velocity. Each of these subsets of time-dependent weights is statistically independent of the other subsets, and the weights within any of these subsets will usually be correlated with one another.

Because of the block-diagonality of M_S in the s coordinate system, it is apparent that each vector V_{S(i)} is directed along a subspace traversed by varying the state variables of one subsystem, while all other subsystems are held constant. It follows that the vectors V_{(i)} are directed along the images of those subspaces in the x coordinate system, and this property is heavily exploited below. It is interesting that the V_{(i)} would not have this important property if the definition of M (see (10) and (11)) was changed by replacing \sum_{1 \le m \le N} I_{klmm} with higher order correlations (e.g., \sum_{1 \le m,n \le N} I_{klmmnn}).

If the data are separable, the weight components can be partitioned into two groups corresponding to independent (possibly multidimensional) subsystems, and these groups will be mutually uncorrelated. Therefore, the next step is to determine if the weight components can be so partitioned. In principle, these groups should be required to factorize the density function of weights in w-space or (w,\dot{w})-space. In practice, one could simply look for groups of weight components that factorize lower order weight correlations. If it is found that the weight components cannot be partitioned into two mutually uncorrelated groups, the data are inseparable; i.e., there is only one subsystem (the system itself). On the other hand, if the weight components can be so partitioned, the measurements may or may not be separable. In the following paragraph: 1) it is assumed that there are one or more ways to partition the weights in this manner; 2) the vectors V_{(i)} are used to derive a function corresponding to each partitioning; 3) it is shown that at least one of these must be a transformation to a separable coordinate system, if it exists. In principle, each of these functions can then be used to transform x(t) into the corresponding coordinate system, and the factorizability of the density function of the transformed data can be determined. In practice, it may suffice to determine the factorizability of lower order correlations of the transformed data. In any event, the measurements are separable if and only if at least one of the computed functions maps the data's density function into a factorizable form. If the data are found to be separable into two subsystems, the transformed data consists of two time series, each one describing the evolution of one subsystem. Then, the above methodology can be applied to the time series of each subsystem in order to determine if it is separable into even smaller independent subsystems.
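For the practical test suggested above (looking for groups of weight components that factorize lower order correlations), a simple check of cross-correlations between candidate groups suffices. The following sketch is an added illustration, not the author's code; the correlation threshold is an arbitrary assumption:

```python
import numpy as np
from itertools import combinations

def uncorrelated_bipartitions(w, threshold=0.05):
    """List the bipartitions of the weight components whose two groups are
    approximately mutually uncorrelated.

    w : (T, N) array of weights w_i(t); rows containing NaN are ignored.
    """
    w = w[~np.isnan(w).any(axis=1)]
    corr = np.corrcoef(w, rowvar=False)                # N x N correlation matrix
    N = w.shape[1]
    partitions = []
    for r in range(1, N):
        for group1 in combinations(range(N), r):
            if 0 not in group1:                        # fix component 0 in group 1
                continue                               # so each bipartition appears once
            group2 = tuple(i for i in range(N) if i not in group1)
            cross = np.abs(corr[np.ix_(group1, group2)])
            if cross.max() < threshold:                # all cross-correlations are small
                partitions.append((group1, group2))
    return partitions
```

In the two-speaker experiment of Section III, N = 2 and the only candidate bipartition is {w_1} versus {w_2}; their reported correlation was -0.0016.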
As described above, in this paragraph it is assumed that there are one or more ways to partition the weight components into two uncorrelated groups. Then, this information is used to derive a set of functions, which must include a transformation to a separable coordinate system, if it exists. To do this, suppose that the weights have been partitioned into two groups (having N_1 and N_2 elements), which correspond to two independent subsystems. Because of the block-diagonal structure of M_S (see (14)), the vectors associated with either one of these groups are directed along subspaces created by varying the corresponding subsystem's state variable, while holding all other subsystems constant. These subspaces can be used to construct the mapping between the x coordinate system and a separable coordinate system on state space.

First, choose some point x_0, and use the vectors in group 1 to construct an N_1-dimensional subspace (called the "first subspace of type 1"), which contains points that can be reached by starting at x_0 and then varying s_{(1)}, the state variable of subsystem 1. Impose on this subspace some smoothly varying coordinates u_{(1)} (having components u_{(1)a} for a = 1,2,...,N_1). Next, at each point in this subspace, use the vectors in group 2 to construct an N_2-dimensional subspace, which contains points that can be reached by starting at this point and then varying s_{(2)}, the state variable of subsystem 2. Each point in each one of these subspaces is assigned a constant value of u_{(1)}: namely, the value of u_{(1)} at the "starting point" in the first subspace of type 1. In the same way, construct an N_2-dimensional subspace (called the "first subspace of type 2"), which contains points that can be reached by starting at x_0 and then varying s_{(2)}. Then, impose on this subspace some smoothly varying coordinate system u_{(2)} (having components u_{(2)b} for b = 1,2,...,N_2). Finally, at each point in this subspace, use the vectors in group 1 to construct an N_1-dimensional subspace, which contains points that can be reached by starting at this point and then varying s_{(1)}. Each point in each one of these subspaces is assigned a constant value of u_{(2)}: namely, the value of u_{(2)} at the "starting point" in the first subspace of type 2. In this way, each point in the space is assigned values of both u_{(1)} and u_{(2)}, thereby defining a u = (u_{(1)}, u_{(2)}) coordinate system. Note that the u coordinate system has been constructed so that subspaces having constant u_{(1)} coincide with subspaces having constant s_{(1)}. Likewise, subspaces having constant u_{(2)} coincide with subspaces having constant s_{(2)}. This means that the u coordinates are the same as the s coordinates, except for transformations among the components of s_{(1)} and transformations among the components of s_{(2)}. Because such subspace-wise transformations do not affect separability, u must be a separable coordinate system, and, by construction, we know the transformation u(x) that maps the measurements onto these separable coordinates. Now, consider the group of functions, which can be derived in this manner from each way of partitioning the weight components. The above result implies that the data are separable if and only if this group contains a transformation to a separable coordinate system.
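For the two-dimensional case of the experiments below (N_1 = N_2 = 1), this construction amounts to tracing streamlines of the two local vector fields. The following simplified sketch is an added illustration only: nearest-bin lookup of the local vectors, forward Euler steps, a fixed step size, and an assumed consistent ordering and sign of the vectors across bins are all simplifications, and the grid is built by flowing along the group-1 field first rather than by the symmetric construction described above. It produces a warped grid analogous to the thin black lines of Figure 3a:

```python
import numpy as np

def nearest_bin_vector(point, centers, V_by_bin, column):
    """Look up column `column` of M^{-1} (i.e., a local vector V_(i)) in the bin
    whose center is nearest to `point`."""
    keys = list(V_by_bin.keys())
    dists = [np.linalg.norm(point - centers[k]) for k in keys]
    return V_by_bin[keys[int(np.argmin(dists))]][:, column]

def build_u_grid(x0, centers, V_by_bin, n_steps=20, du=0.1):
    """Assign u = (i*du, j*du) to the point reached from x0 by taking i Euler steps
    along the group-1 vector field and then j steps along the group-2 field.
    Row i of the grid (varying j) is a curve of constant u_1 = i*du; column j
    (varying i) approximates a curve of constant u_2 = j*du."""
    grid = np.zeros((n_steps, n_steps, 2))
    p = np.array(x0, dtype=float)
    for i in range(n_steps):                  # march along the group-1 vector field
        q = p.copy()
        for j in range(n_steps):              # then along the group-2 vector field
            grid[i, j] = q
            q = q + du * nearest_bin_vector(q, centers, V_by_bin, column=1)
        p = p + du * nearest_bin_vector(p, centers, V_by_bin, column=0)
    return grid
```

Here `centers` is assumed to map each bin label to the coordinates of its center, and `V_by_bin` to map each bin label to its matrix M^{-1} of local vectors.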
III. EXPERIMENTS

In this section, the new BSS technique is illustrated by using it to disentangle nonlinear mixtures of the waveforms of two male speakers. Each speaker read an English text, consisting of a thirty-second excerpt from one of two audio book recordings. The waveform of each speaker, s_k(t) (for k = 1,2), was sampled 16,000 times per second with two bytes of depth. The thick gray lines in Figure 2 show the two speakers' waveforms during a short (30 ms) interval. These waveforms were then mixed by the nonlinear functions

f_1(s) = 0.763\, s_1 + (958 - 0.0225\, s_2)^{1.5}
f_2(s) = 0.153\, s_2 + (3.75 \times 10^7 - 763\, s_1 - 229\, s_2)^{0.5},   (19)

where -2^{15} \le s_1, s_2 \le 2^{15}. This is one of a variety of nonlinear mixing functions that were tried with similar results. The measurements, x_k(t), were taken to be the variance-normalized principal components of the sampled waveform mixtures, f_k[s(t)]. Figure 3a shows how this nonlinear mixing maps an evenly-spaced Cartesian grid in the s coordinate system onto a warped grid in the x coordinate system. Figure 3b shows a random subset of the measurements x(t), and Figure 4 shows the time course of the measurements x(t) during the same short time interval depicted in Figure 2. When either measured waveform, x_1(t) or x_2(t), was played as an audio file, it sounded like a confusing superposition of two voices, which were quite difficult to understand.
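A minimal reproduction of this data-generation step might look as follows. This is an added sketch: the mixing coefficients are those of Eq. (19), but the use of NumPy's SVD for the principal components and the finite-difference velocity are assumptions about details the paper does not specify:

```python
import numpy as np

def mix(s1, s2):
    """The nonlinear mixing of Eq. (19); s1 and s2 are speech waveforms in [-2**15, 2**15]."""
    x1 = 0.763 * s1 + (958.0 - 0.0225 * s2) ** 1.5
    x2 = 0.153 * s2 + (3.75e7 - 763.0 * s1 - 229.0 * s2) ** 0.5
    return np.stack([x1, x2], axis=1)

def variance_normalized_pcs(mixtures):
    """Variance-normalized principal components of the mixtures: the measurements x_k(t)."""
    centered = mixtures - mixtures.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    pcs = centered @ Vt.T                    # project onto the principal axes
    return pcs / pcs.std(axis=0)             # rescale each component to unit variance

# Illustrative usage with two 16-kHz speech waveforms s1(t), s2(t):
#   x = variance_normalized_pcs(mix(s1, s2))
#   x_dot = np.gradient(x, 1.0 / 16000.0, axis=0)
```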
The entire set of 500,000 measurements, consisting of x and \dot{x} at each sampled time, was sorted into a 16 × 16 array of bins. Then, the \dot{x} distribution in each bin was used to compute local velocity correlations (see (4) and (5)), and these were used to derive M and V_{(i)} for each bin. Figure 3c shows these local vectors at each point. Finally, (13) was used to transform the measurement velocity (\dot{x}(t)) at each time into its coordinate-system-independent representation, given by the weights, w_i(t). The thin black lines in Figure 5 show the time-dependent weights, which were derived from the waveform mixtures in Figure 4, using (13). The thick gray lines in Figure 5 show the time-dependent weights, which were derived directly from each of the unmixed waveforms in Figure 2, using (17). Notice that the weights, derived from the mixed and unmixed waveforms, are nearly the same, despite the fact that these waveforms differed significantly because of nonlinear mixing (e.g., compare Figures 4 and 2). This illustrates the coordinate-system-independence of the weight time series.

The correlation between the two weight time series, w_1(t) and w_2(t), was quite small (namely, -0.0016). In order to determine if these weights correspond to completely separable subsystems, the vectors associated with these two weights were used to construct the putative coordinate transformation, u(x). As described in Method, this was done by using these vectors to construct a family of lines having constant values of u_1, together with a family of lines having constant values of u_2. The thin black lines in Figure 3a depict these lines for evenly spaced values of u_1 and u_2. The function, u(x), defined by these curves was the only possible transformation to a separable coordinate system. As in Method, the separability of the data could be determined by using this function to transform x(t) into the u coordinate system and by verifying that u[x(t)] has a factorizable density function (or factorizable correlation functions).

In this illustrative example, the separability of the data in the u coordinate system was verified by showing that the u and s coordinate systems differed by component-wise transformations, which do not affect separability. This was done by comparing the lines of constant u_1 and u_2 to lines of constant s_1 and s_2, which are the thin black and thick gray lines, respectively, in Figure 3a. Notice the near coincidence of the families of lines of constant u_1 (or u_2) and the families of lines of constant s_1 (or s_2). This demonstrates that the u and s coordinate systems differ by component-wise transformations of the form u = (g(s_1), h(s_2)), where g and h are monotonic. Because the data are separable in the s coordinate system and because component-wise transformations do not affect separability, the data must also be separable in the u coordinate system. Therefore, we have accomplished the objectives of BSS: namely, by blindly processing the data x(t), we have determined that the system is separable, and we have computed the transformation, u(x), to a separable coordinate system.

The transformation u(x) can be applied to the data x(t) to recover the original unmixed waveforms, up to component-wise transformations. The resulting waveforms, u_1[x(t)] and u_2[x(t)], are depicted by the thin black lines in Figure 2, which also shows the trajectory of the unmixed waveforms in the s coordinate system. Notice that the two trajectories, u[x(t)] and s(t), are similar except for component-wise transformations along the two axes. The component-wise transformation is especially noticeable as a stretching of s(t) with respect to u[x(t)] along the positive s_2 axis. When each of the recovered waveforms, u_1[x(t)] and u_2[x(t)], was played as an audio file, it sounded like a completely intelligible recording of one of the speakers. In each case, the other speaker was not heard, except for a faint "buzzing" sound in the background. Therefore, the component-wise transformations, which related the recovered waveforms to the original unmixed waveforms, did not noticeably reduce intelligibility.

Fig. 2. (a) The thick gray line depicts the trajectory of 30 ms of the two speakers' unmixed speech in the s coordinate system, in which each component is equal to one speaker's speech amplitude. The thin black line depicts the waveforms (u) of the two speakers during the same time interval, recovered by blindly processing their nonlinearly mixed speech. Panels (b) and (c) show the time courses of s_1 and u_1 and of s_2 and u_2, respectively.

Fig. 3. (a) The thick gray lines are a regular Cartesian grid of lines with constant s_1 and constant s_2, after they were nonlinearly mapped into the x coordinate system by the mixing function in (19). The thin black lines depict lines of constant u_1 and u_2, where u is a separable coordinate system derived from the measurements. (b) A random subset of the measurements along the trajectory of the mixed waveforms, x(t). (c) The thick gray and thin black lines show the local vectors, V_{(1)} and V_{(2)}, respectively, after they have been uniformly scaled for the purpose of display.

Fig. 4. (a) The trajectory of measurements, x(t), during the 30 ms time interval depicted in Figure 2. Panels (b) and (c) show the time courses of x_1 and x_2, respectively.

Fig. 5. The thin black line in panel (a) is the trajectory of the weights, w_i(t), which were derived from the measurements in Figure 4 and which form a coordinate-system-independent representation of the velocity (\dot{x}(t)) of those measurements. The thick gray line in (a) is the time course of the weights, which were derived directly from the unmixed waveforms in Figure 2 and which form the coordinate-system-independent representation of \dot{s}(t) during the same time interval. Panels (b) and (c) show the time courses of the first and second components, respectively, of the lines in panel (a).
CONCLUSION thestatevariablesoftheunderlyingsystem.Invertibility This paper describes how to determine if time-dependent can almost be guaranteed by observing the system signal measurements, x(t), are comprised of linear or nonlin- witha sufficientlylargenumberofindependentsensors: ear mixtures of the state variables of statistically-independent specifically, by utilizing at least 2N + 1 independent subsystems. First, the local distribution of measurement ve- sensors, where N is the dimension of the system’s locities, x˙, is used to construct local vectors at each point state space. In this case, the sensors’ output lies in x. If the data are separable, each of these vectors is directed an N-dimensional subspace within a space of at least along a subspace traversed by varying the state variable of 2N +1 dimensions. Dimensional reduction techniques one subsystem, while all other subsystems are kept constant. (e.g., [7]) can be used to find the subspace coordi- Becauseofthisproperty,thesevectorscanbeusedtoderivea nates corresponding to the sensor outputs. Because an finite set of functions, u(x), which must include the transfor- embedding theorem asserts that this subspace is very mationtoaseparablecoordinatesystem,ifitexists.Therefore, unlikely to self-intersect ( [8]), the coordinates on this separability can be determined by testing the separability of subspace constitute synthetic ”measurements” that are the data, after it has been transformed by each of these almost certainly invertibly related to the system’s state mappings. Furthermore, if the data are separable, we can space. In other words, suitable synthetic measurements recover a time series that describes the evolution of each canbefoundbyaddingmoreandmoresensorsuntilthe subsystem. Therefore, nonlinear blind source separation has numberofsensorsismorethantwicethedimensionality been accomplished. (N) of the subspace containing the sensors’ outputs. Some comments on this result: 3) Theoretically, the proposed method can be applied to measurements described by any diffeomorphic mixing 1) The time-dependent weights are independent of the function. However, in practice, more data will have to choice of sensors used to observe the underlyingphysi- be analyzed in order to handle mixing functions with cal process, and in that sense they comprise an intrinsic morepronouncednonlinearities.Thisis because rapidly or “inner” property of that process. To see this, note varying mixing functions may cause the local vectors thatdifferentsets ofsensorsdetectdifferentmixturesof (V ) to vary rapidly in the measurement coordinate signals from that process, and different signal mixtures (i) system, and, therefore, it will be necessary to compute simply describe the system’s state in different coordi- those vectors in numerous small neighborhoods. nate systems. Because the weights are scalars (up to 8 NONLINEARBLINDSOURCESEPARATION 4) More data will also be required to apply this method to state spaces having higher dimensions. In Experiments, thirty seconds of data (500,000 samples) were used to recover two waveforms from measurements of two nonlinearmixtures.Inotherexperiments,approximately six minutes of data (6,000,000 samples) were used to cleanly recover the waveforms of four sound sources (two speakers and two piano performances) from four signal mixtures. As expected, blind separation of the 4D state space did require more data, but it was not a prohibitive amount. 5) The method does not require a lot of computational resources. 
V. ACKNOWLEDGEMENT

The author is grateful to Michael A. Levin for his careful reading of the manuscript and for his suggestions on how to organize the text.

REFERENCES

[1] C. Jutten and J. Karhunen, "Advances in blind source separation (BSS) and independent component analysis (ICA) for nonlinear mixtures," International J. Neural Systems, vol. 14, pp. 267-292, 2004.
[2] L. Almeida, Nonlinear Source Separation, Synthesis Lectures on Signal Processing. Princeton, New Jersey: Morgan and Claypool Publishers, 2005.
[3] A. Hyvärinen and P. Pajunen, "Nonlinear independent component analysis: existence and uniqueness results," Neural Networks, vol. 12, pp. 429-439, 1999.
[4] D. N. Levin, "Using state space differential geometry for nonlinear blind source separation," J. Applied Physics, vol. 103, art. no. 044906, 2008.
[5] S. Lagrange, L. Jaulin, V. Vigneron, and C. Jutten, "Analytic solution of the blind source separation problem using derivatives," in Independent Component Analysis and Blind Signal Separation, LNCS, vol. 3195, C. G. Puntonet and A. G. Prieto (eds.). Heidelberg: Springer, 2004, pp. 81-88.
[6] D. N. Levin, "Performing nonlinear blind source separation with signal invariants," IEEE Trans. Signal Processing, vol. 58, pp. 2131-2140, 2010.
[7] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, pp. 2323-2326, 2000.
[8] T. Sauer, J. A. Yorke, and M. Casdagli, "Embedology," J. Statistical Physics, vol. 65, pp. 579-616, 1991.

David N. Levin received his Ph.D. in theoretical physics from Harvard University in 1970 and did research in quantum field theory until 1977, when he entered medical school at the University of Chicago. He joined the faculty after receiving an M.D. and completing radiology residency at the University. During 1987-1999, he served as Director of Clinical MRI at the University, and during 1999-2005 he co-directed the University's Brain Research Imaging Center. He is currently an emeritus professor in the Department of Radiology. His past research interests have included the search for all renormalizable quantum field theories, multimodality 3D brain imaging, computer-assisted neurosurgery, image segmentation, methodology for mapping the brain with functional MRI, and the use of prior knowledge of sparse support to increase the speed of MR image acquisition. His current research is focused on various aspects of signal processing, including nonlinear blind source separation, speech separation, and sensor-independent signal representations.