arXiv:1601.06777v1 [math.FA] 25 Jan 2016

OPERATOR MEANS OF PROBABILITY MEASURES AND GENERALIZED KARCHER EQUATIONS

MIKLÓS PÁLFIA

Abstract. In this article we consider means of positive bounded linear operators on a Hilbert space. We present a complete theory that provides a framework which extends the theory of the Karcher mean, its approximating matrix power means, and a large part of Kubo-Ando theory to arbitrarily many variables, in fact, to the case of probability measures with bounded support on the cone of positive definite operators. This framework characterizes each operator mean extrinsically as the unique solution of a generalized Karcher equation, which is obtained by exchanging the matrix logarithm function in the Karcher equation for an arbitrary operator monotone function over the positive real half-line. If the underlying Hilbert space is finite dimensional, then these generalized Karcher equations are Riemannian gradients of convex combinations of strictly geodesically convex log-determinant divergence functions, hence these new means are their global minimizers, in analogy with the case of the Karcher mean. Our framework is based on fundamental contraction results with respect to the Thompson metric, which provide nonlinear contraction semigroups in the cone of positive definite operators that form a decreasing net approximating these operator means in the strong topology from above.

1. Introduction

Let $E$ be a Hilbert space and let $S(E)$ denote the Banach space of bounded linear self-adjoint operators on $E$. Let $P \subseteq S(E)$ denote the cone of positive definite operators on $E$. In this article we are concerned with means of members of $P$ that enjoy certain attractive properties which recently became important from the point of view of averaging in the finite dimensional case; see for example [4, 15, 18, 5, 6].
Usually the main difficulties here arise from the required property of operator monotonicity, i.e., our means must be monotone with respect to the positive definite order on $P$. The two-variable theory of such functions is relatively well understood: each such mean is represented by an operator monotone function according to the theory of Kubo and Ando [20]. However, in the several variable case we have no such characterization of operator monotone functions.

Date: January 27, 2016.
2000 Mathematics Subject Classification. Primary 47A64, 46L05; Secondary 53C20, 53C35.
Key words and phrases. operator monotone function, operator mean, Karcher mean.

When $E$ is finite dimensional, there are additional geometrical structures on $P$ that are used to define $n$-variable operator means [3, 11, 26, 29]. In this setting $P$ is just the cone of positive definite $n$-by-$n$ Hermitian matrices, where $n$ is the dimension of $E$. It is a smooth manifold as an open subset of the vector space of $n$-by-$n$ Hermitian matrices (which is just $S(E)$ in this case) and has a Riemannian symmetric space structure $P \cong GL(E)/U$, where $U$ is the unitary group and $GL(E)$ is the general linear group over $E$ [9, 10]. This symmetric space is nonpositively curved, hence a unique minimizing geodesic between any two points exists [10]. The midpoint operation on this space, defined as taking the middle point of the geodesic connecting two points, is the geometric mean of two positive definite matrices [10]. The Riemannian distance on the manifold $P$ is of the form

$d(A,B) = \sqrt{\mathrm{Tr}\,\log^2(A^{-1}B)}$.

Then the (weighted) multivariable geometric mean or Karcher mean of the $k$-tuple $\mathbb{A} := (A_1,\ldots,A_k) \in P^k$ with respect to the positive probability vector $\omega := (w_1,\ldots,w_k)$ is defined as the center of mass

(1) $\Lambda(w_1,\ldots,w_k;A_1,\ldots,A_k) = \operatorname*{argmin}_{X\in P} \sum_{i=1}^{k} w_i\, d^2(X,A_i)$.

The Karcher mean was first considered in this setting in [29, 11].
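Although the paper itself contains no numerics, the distance $d$ and the minimization (1) are easy to experiment with for small matrices. The following Python sketch is my own illustration, not a construction from the paper: the function names and the plain Riemannian gradient-descent scheme (with a conservative fixed step) are choices made here for demonstration only.

```python
import numpy as np
from scipy.linalg import logm, expm, sqrtm

def riem_dist(A, B):
    """Riemannian distance d(A,B) = sqrt(Tr log^2(A^{-1}B)) on the SPD cone.

    Uses the congruence-invariant form with A^{-1/2} B A^{-1/2} for stability."""
    iAh = np.linalg.inv(sqrtm(A))
    L = logm(iAh @ B @ iAh)
    return float(np.sqrt(np.trace(L @ L).real))

def karcher_mean(As, ws, iters=300, step=0.5):
    """Approximate the minimizer in (1) by Riemannian gradient descent:
    X <- X^{1/2} exp(step * sum_i w_i log(X^{-1/2} A_i X^{-1/2})) X^{1/2},
    started from the arithmetic mean."""
    X = sum(w * A for w, A in zip(ws, As))
    for _ in range(iters):
        Xh = sqrtm(X)
        iXh = np.linalg.inv(Xh)
        G = sum(w * logm(iXh @ A @ iXh) for w, A in zip(ws, As))
        X = Xh @ expm(step * G) @ Xh
        X = (X + X.conj().T).real / 2  # re-symmetrize against round-off
    return X
```

For commuting (e.g. diagonal) matrices the iteration reproduces the entrywise weighted geometric mean, and the residual $\sum_i w_i \log(X^{-1/2}A_iX^{-1/2})$ serves as a natural stopping criterion.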
The Karcher mean $\Lambda(\omega;\mathbb{A})$ is also the unique positive definite solution of the corresponding critical point equation, called the Karcher equation,

(2) $\sum_{i=1}^{k} w_i \log(X^{-1}A_i) = 0$,

where the left hand side is the gradient of the function in the minimization problem (1) [29]. The (entry-wise) operator monotonicity of this mean in $\mathbb{A}$ with respect to a fixed $\omega$, that is, $\mathbb{A} \leq \mathbb{B}$ (meaning $A_i \leq B_i$ for $1 \leq i \leq k$) implying $\Lambda(\omega;\mathbb{A}) \leq \Lambda(\omega;\mathbb{B})$ in the positive definite order, was unknown until it was first proved in [26]. Later in [28] a one-parameter family $P_s(\omega,\mathbb{A})$ of operator monotone means, called the matrix power means, was constructed as the unique solution of the nonlinear operator equation

(3) $X = \sum_{i=1}^{k} w_i\, X \#_s A_i$

for fixed $s \in [-1,1]$, $w_i > 0$, $\sum_{i=1}^{k} w_i = 1$ and $A_i \in P$, where

$A \#_s B = A^{1/2}\left(A^{-1/2} B A^{-1/2}\right)^s A^{1/2}$

is the weighted two-variable geometric mean. Then in [28] it was established that

$\lim_{s\to 0+} P_s(\omega,\mathbb{A}) = \Lambda(\omega;\mathbb{A})$,

proving once more monotonicity and many other additional properties of $\Lambda$. Later the same approach based on the matrix power means was adapted in [24] to extend $\Lambda(\omega;\mathbb{A})$ to the infinite dimensional setting as the unique solution of (2), even though there exists no Riemannian metric in this case, hence no geodesically convex potential functions either. These additional properties make the Karcher mean $\Lambda(\omega;\mathbb{A})$ the geometric mean to choose among the numerous different geometric means [11, 25]. From the geometric and analytic point of view, the two characterizations provided by (1) and (2) may be the most significant.

In this paper we extend the theory of the Karcher mean and the approximating matrix power means appearing in [24, 25, 28] in three different ways. Firstly, we generalize the setting of (2) by taking any operator monotone function $f : (0,\infty) \to \mathbb{R}$ instead of just the special one $f(x) = \log(x)$. Here operator monotonicity of $f$ is understood as $A \leq B$ implying $f(A) \leq f(B)$ for any $A,B \in P$.
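The fixed-point characterization (3) lends itself directly to computation. The sketch below is an illustration of mine (plain fixed-point iteration; the paper's own construction proceeds via contractions in the Thompson metric): for $s \in (0,1]$ it iterates $X \mapsto \sum_i w_i\, X \#_s A_i$ from the arithmetic mean.

```python
import numpy as np
from scipy.linalg import sqrtm, fractional_matrix_power

def wgm(A, B, s):
    """Weighted geometric mean A #_s B = A^{1/2} (A^{-1/2} B A^{-1/2})^s A^{1/2}."""
    Ah = sqrtm(A)
    iAh = np.linalg.inv(Ah)
    return Ah @ fractional_matrix_power(iAh @ B @ iAh, s) @ Ah

def power_mean(As, ws, s, iters=200):
    """Matrix power mean P_s(w, A): fixed point of equation (3),
    X = sum_i w_i (X #_s A_i), computed by plain fixed-point iteration."""
    X = sum(w * A for w, A in zip(ws, As))  # arithmetic mean equals P_1
    for _ in range(iters):
        X = sum(w * wgm(X, A, s) for w, A in zip(ws, As))
        X = (X + X.conj().T).real / 2
    return X
```

For commuting matrices the fixed point is the entrywise power mean $\big(\sum_i w_i a_i^s\big)^{1/s}$, and as $s \to 0+$ the iterates approach the Karcher mean, in line with the limit quoted above.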
The second generalization is that instead of sums over $k$-tuples of operators $\mathbb{A}$ in (2), we consider integrals with respect to probability measures supported on $P$. So our generalization of (2) has the form

(4) $\int_P f(X^{-1}A)\, d\mu(A) = 0$

for a probability measure $\mu$ on $P$. Let $L$ denote the set of all operator monotone functions $f : (0,\infty) \to \mathbb{R}$ such that $f(1) = 0$ and $f'(1) = 1$. Let $\mathcal{P}(L\times P)$ denote the set of Borel probability measures with bounded support on the product space $L\times P$. Then our first result is the following.

Theorem 1.1. Let $\mu \in \mathcal{P}(L\times P)$. Then the equation

(5) $\int_{L\times P} f\left(X^{-1/2} A X^{-1/2}\right) d\mu(f,A) = 0$,

which we call the generalized Karcher equation, has a unique solution in $P$ that is also operator monotone in the extended sense of Definition 5.2 below, which provides a positive definite partial order for measures.

We prove this by suitably generalizing the argumentation in [24]. We establish a one-parameter family of generalized matrix power means as fixed points of strict contractions, with suitable properties like operator monotonicity, such that they form a decreasing net in the partial positive definite order approximating the solution of (5) from above in the strong topology. Moreover, these generalized matrix power means, called induced operator means, are unique solutions of (5) as well, hence fitting into the greater picture. By establishing many attractive properties, like operator monotonicity, of these fixed points, we prove that these properties are preserved in the limit, i.e., a solution of (5) has these properties as well. Then we prove that the left hand side of (5) occurs as a Riemannian gradient of certain divergence functions on $P$ when $E$ is finite dimensional, using the integral formula

(6) $f(x) = \int_{[0,1]} l_s(x)\, d\nu(s)$

valid for any $f \in L$, where $l_s(x) := \frac{x-1}{(1-s)x+s}$ and $\nu$ is a uniquely determined probability measure on $[0,1]$.
In particular we obtain a generalized form of (1) by means of strictly geodesically convex divergence functions

(7) $LD^s(X,A) = \frac{1}{s(1-s)} \mathrm{tr}\left\{\log[(1-s)A+sX] - \log X\#_{1-s}A\right\}$

where $s \in [0,1]$ and $X\#_{1-s}A$ is the weighted geometric mean of $X,A \in P$. These divergence functions (7) also appeared recently in [14]. Let $\mathcal{P}([0,1]\times P)$ denote the set of Borel probability measures on $[0,1]\times P$. Then our second main result is the following.

Theorem 1.2. Let $P$ be finite dimensional and $\mu \in \mathcal{P}([0,1]\times P)$. Then the minimization problem

(8) $\operatorname*{argmin}_{X\in P} \int_{[0,1]\times P} LD^s(X,A)\, d\mu(s,A)$

has a unique positive definite solution in $P$. Moreover, it is the unique solution of the corresponding generalized Karcher equation

$\int_{[0,1]\times P} X^{1/2} l_s(X^{-1/2}AX^{-1/2}) X^{1/2}\, d\mu(s,A) = 0$,

which is the critical point equation of (8) with respect to the Riemannian metric of the symmetric space $P$.

According to formula (6), the above theorem exhausts all possible generalized Karcher equations appearing in Theorem 1.1. Also the combination of Theorem 1.1 and Theorem 1.2, as a byproduct, yields a new property of operator monotone functions in $L$: they are characterized as Riemannian gradients of divergence functions.

Theorem 1.1 and Theorem 1.2 are extensions of the results known for the geometric (or Karcher) mean $\Lambda(\omega;\mathbb{A})$ [24, 25, 28, 29], in which case $f(x) = \log(x)$ in Theorem 1.1, and in Theorem 1.2 the measure is $d\mu(s,A) = d(\nu\times\sigma)(s,A)$, where $d\nu(s) = ds$ and $\sigma$ is finitely supported over $A_i \in P$ with corresponding weights $w_i$ for $1 \leq i \leq k$. Also the matrix power means $P_s(\omega,\sigma)$ fit in the picture, in which case $f(x) = \frac{x^s-1}{s}$ in Theorem 1.1. Also our theorems extend the log-determinant means of positive matrices in [14]. Hence the unique solutions appearing in Theorem 1.1 and Theorem 1.2 not only provide us with new operator means, but unify these existing approaches, which, to our knowledge, is the first generally applicable extension of two-variable Kubo-Ando operator means [20] to several (noncommutative) variables and measures.
Many nice properties of these generalized operator means are proved here, including operator monotonicity; see Theorem 6.4 below.

The proof of Theorem 1.2 is based on a contraction principle in the cone $P$ with a complete metric space structure given by Thompson's part metric

$d_\infty(A,B) = \max\{\log M(A/B), \log M(B/A)\}$

for any $A,B \in P$, where $M(A/B) = \inf\{\alpha : A \leq \alpha B\}$. The argument is an elaborate generalization of the one given in [24], which is motivated by the one in [28] based on the approximation of the Karcher mean with the matrix power means, which are themselves fixed points of one-parameter families of strict contractions. Also en route, new auxiliary results are proved for the Thompson metric which extend various earlier ones in [23].

The paper is organized as follows. In section 2 we introduce the divergence functions (7), calculate their gradients and prove their geodesic convexity, along with new integral formulas for members of $L$. These results provide Theorem 2.6, which gives the first half of Theorem 1.1. Then in section 3 we prove new integral formulas for operator means in the sense of Kubo-Ando. In section 4 we prove fundamental contraction results for operator means with respect to the Thompson metric, and in section 5 we use these to build up an analogous theory of generalized matrix power means, called induced operator means, in our setting. Then in section 6 we prove Theorem 1.2 as a combined effort of Theorem 6.3, Theorem 6.9 and Theorem 6.13, which also concludes the second half of Theorem 1.1 and Theorem 1.2. In section 7 we briefly discuss the results obtained.

2. Convex cone of log-determinant divergences over P

Suppose now that the Hilbert space $E$ is finite dimensional, so that $P$ is isomorphic to the cone of positive definite matrices. In this case one can introduce the so-called logarithmic-barrier function

$f(X) = -\log\det X = -\mathrm{tr}\log X$,

which is strictly convex on $P$ [9].
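Thompson's part metric is straightforward to compute for matrices. In the sketch below (an illustration of mine, not from the paper), $M(A/B)$ is the largest generalized eigenvalue of the pencil $A v = \lambda B v$, i.e., the largest eigenvalue of $B^{-1/2}AB^{-1/2}$, so no matrix square roots are needed.

```python
import numpy as np
from scipy.linalg import eigh

def thompson_metric(A, B):
    """Thompson part metric d_inf(A,B) = max{log M(A/B), log M(B/A)},
    where M(A/B) = inf{alpha : A <= alpha B} = lambda_max(B^{-1/2} A B^{-1/2}).

    The generalized eigenvalues of A v = lam B v coincide with the
    eigenvalues of B^{-1/2} A B^{-1/2}."""
    lam = eigh(A, B, eigvals_only=True)
    return max(np.log(lam.max()), -np.log(lam.min()))
```

Since $M(B/A) = 1/\lambda_{\min}(B^{-1/2}AB^{-1/2})$, both terms come from a single generalized eigenvalue computation; the symmetry $d_\infty(A,B) = d_\infty(B,A)$ and the scaling identity $d_\infty(cA,A) = |\log c|$ make convenient sanity checks.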
Using this function we introduce the following one-parameter family of divergence functions on $P$, also studied in [14]:

(9) $LD^s(X,A) = \frac{1}{s(1-s)} \mathrm{tr}\left\{\log[(1-s)A+sX] - \log X\#_{1-s}A\right\}$

for $s \in [0,1]$ and $X,A \in P$, where the cases $s = 0,1$ are defined by a continuity argument and

(10) $X\#_{1-s}A = X^{1/2}\left(X^{-1/2}AX^{-1/2}\right)^{1-s}X^{1/2} = X\left(X^{-1}A\right)^{1-s}$

is the weighted geometric mean. Indeed, one can check that

$LD^1(X,A) = \mathrm{tr}\left\{X^{-1}A - I - \log(X^{-1}A)\right\}$,
$LD^0(X,A) = \mathrm{tr}\left\{A^{-1}X - I - \log(A^{-1}X)\right\}$

by computing the limits $\lim_{s\to 1-} LD^s(X,A)$ and $\lim_{s\to 0+} LD^s(X,A)$. Moreover, a simple calculation shows that

$LD^s(X,A) = \frac{1}{s(1-s)} \log\frac{\det[(1-s)A+sX]}{\det(X\#_{1-s}A)} = \frac{1}{s(1-s)} \mathrm{tr}\left\{\log[(1-s)X^{-1}A+sI] - \log(X^{-1}A)^{1-s}\right\}$.

The formulas defining $LD^s(X,A)$ show that it is indeed a divergence function, that is, it satisfies

(I) $LD^s(X,A) \geq 0$,
(II) $LD^s(X,A) = 0$ if and only if $X = A$,

where (I) essentially follows from the operator inequality between the arithmetic and geometric means

$sX + (1-s)A \geq X\#_{1-s}A$

and the operator monotonicity of $\log$; see for example [10]. Properties (I) and (II) tell us that $LD^s(X,A)$ is a divergence function on $P$. Now clearly any convex combination of the form

$\int_{[0,1]} LD^s(X,A)\, d\nu(s)$

also satisfies (I) and (II) for any probability measure $\nu$ over the closed interval $[0,1]$. Here we could consider arbitrary positive measures over $[0,1]$, but we will see that the normalization condition $\int_{[0,1]} d\nu(s) = 1$ does not restrict our further analysis, since we will consider the point where these functions attain their minimum on $P$.

We are interested in the convexity properties of $\int_{[0,1]} LD^s(X,A)\, d\nu(s)$, more precisely in geodesic convexity with respect to the Riemannian metric

(11) $\langle X,Y\rangle_A = \mathrm{tr}\left\{A^{-1}XA^{-1}Y\right\}$,

where $A \in P$ and $X,Y \in S(E)$, and $S(E)$ in this case is just the vector space of Hermitian matrices. It is well known that $P$ with this Riemannian metric is a Hadamard manifold, i.e., it has nonpositive sectional curvature [10].
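The divergences (9) can be evaluated directly from the determinant form above. The following sketch is an illustration of mine (the helper name and the use of `slogdet` are implementation choices, not from the paper); it lets one verify properties (I) and (II) and the $s \to 1$ limit numerically.

```python
import numpy as np
from scipy.linalg import sqrtm, fractional_matrix_power

def LD(s, X, A):
    """Log-determinant divergence (9) for s in (0,1), via the determinant form
    LD^s(X,A) = log( det((1-s)A + sX) / det(X #_{1-s} A) ) / (s(1-s))."""
    Xh = sqrtm(X)
    iXh = np.linalg.inv(Xh)
    G = Xh @ fractional_matrix_power(iXh @ A @ iXh, 1.0 - s) @ Xh  # X #_{1-s} A
    num = np.linalg.slogdet((1.0 - s) * A + s * X)[1]
    den = np.linalg.slogdet(G)[1]
    return (num - den) / (s * (1.0 - s))
```

In the scalar case $X\#_{1-s}A = x^s a^{1-s}$, so (I) is exactly the arithmetic-geometric mean inequality after taking logarithms.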
Therefore its geodesics connecting any two points $A,B \in P$ are unique and are given by the geometric mean $A\#_t B$. The Levi-Civita connection with respect to the Riemannian metric is of the form

(12) $\nabla_{X_A} Y_A = DY[A][X_A] - \frac{1}{2}\left(X_A A^{-1} Y_A + Y_A A^{-1} X_A\right)$;

see for example [22].

Let us consider the set of operator monotone functions $f : (0,\infty) \to \mathbb{R}$. Operator monotonicity means that for any two Hermitian matrices $X \leq Y$ we have $f(X) \leq f(Y)$. Such functions have strong analytic properties; each such function has a unique integral representation [9] of the form

(13) $f(x) = \alpha + \beta x + \int_0^\infty \left(\frac{\lambda}{\lambda^2+1} - \frac{1}{\lambda+x}\right) d\mu(\lambda)$,

where $\alpha$ is a real number, $\beta \geq 0$, and $\mu$ is a unique positive measure on $[0,\infty)$ such that

(14) $\int_0^\infty \frac{1}{\lambda^2+1}\, d\mu(\lambda) < \infty$.

Notice that the set of all such functions is a convex cone. A subset of this cone will be of interest to us:

$L := \left\{f : (0,\infty) \to \mathbb{R} : f \text{ is operator monotone}, f(1) = 0, f'(1) = 1\right\}$.

The following is a simple corollary of Bendat and Sherman's result [8]; see Theorem 3.7 in [16].

Corollary 2.1. The function $f \in L$ if and only if there exists a unique probability measure $\nu$ on $[0,1]$ such that

(15) $f(x) = \int_{[0,1]} \frac{x-1}{(1-s)x+s}\, d\nu(s)$.

We can also consider

$\int_{[0,1]} \frac{x-1}{(1-s)x+s}\, d\nu(s) = \int_{[0,\infty]} \frac{x-1}{\frac{x}{\lambda+1}+\frac{\lambda}{\lambda+1}}\, d\nu\!\left(\frac{\lambda}{\lambda+1}\right) = \int_{[0,\infty]} \left(\lambda+1-\frac{(\lambda+1)^2}{\lambda+x}\right) d\nu\!\left(\frac{\lambda}{\lambda+1}\right)$

with $d\eta(\lambda) := d\nu\!\left(\frac{\lambda}{\lambda+1}\right)$, so that we obtain the following similar corollary as well:

Corollary 2.2. The function $f \in L$ if and only if

(16) $f(x) = \int_{[0,\infty]} \left(\lambda+1-\frac{(\lambda+1)^2}{\lambda+x}\right) d\eta(\lambda)$,

where $\eta$ is a unique probability measure on the closed, compact interval $[0,\infty]$.

The above integral characterizations will also be useful tools in later sections. One can easily see that

(17) $1-x^{-1} \leq \frac{x-1}{(1-s)x+s} \leq x-1$

holds for all $s \in [0,1]$ and $x > 0$, by comparing the functions on the disjoint intervals $(0,1)$ and $[1,\infty)$. Hence after integration we conclude that

(18) $1-x^{-1} \leq f(x) \leq x-1$

holds for all $f \in L$. Now we give yet another characterization of the set $L$.
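As a quick numerical aside (my own check, not part of the paper): for the Karcher case $f(x) = \log x$, the representing measure in (15) is Lebesgue measure $d\nu(s) = ds$, consistent with the choice $d\nu(s) = ds$ mentioned for Theorem 1.2; indeed $\int_0^1 \frac{x-1}{(1-s)x+s}\,ds = \log x$ by direct integration. The kernels at the endpoints $s = 0, 1$ are exactly the bounds in (18).

```python
import numpy as np
from scipy.integrate import quad

def l_s(x, s):
    """Kernel l_s(x) = (x - 1)/((1 - s)x + s) from the representation (15)."""
    return (x - 1.0) / ((1.0 - s) * x + s)

def f_from_density(x, density):
    """f(x) = int_0^1 l_s(x) density(s) ds, for an absolutely continuous nu."""
    val, _ = quad(lambda s: l_s(x, s) * density(s), 0.0, 1.0)
    return val
```

With `density = lambda s: 1.0` the quadrature reproduces $\log x$; the endpoint kernels are $l_0(x) = 1 - x^{-1}$ and $l_1(x) = x - 1$, so (18) is the trivial estimate $l_0 \leq l_s \leq l_1$ integrated against a probability measure.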
Definition 2.1. Let $D_A$ denote the simplex of divergence functions on $P$ of the form

$F(X) = \int_{[0,1]} LD^s(X,A)\, d\nu(s)$

for any probability measure $\nu$ over the closed interval $[0,1]$ and $A \in P$.

Note that the above integral exists, since for fixed $X,A \in P$ the real function $s \mapsto LD^s(X,A)$ is continuous over the compact interval $[0,1]$, hence bounded and strongly measurable, so it is integrable by the dominated convergence theorem or by Theorem 11.8 in [2]. The same reasoning ensures the existence of the other integrals over $[0,1]$ considered in this section.

Theorem 2.3. Let $F \in D_A$ be represented by a probability measure $\nu$ over the closed interval $[0,1]$. Then the Riemannian gradient is

(19) $\nabla F(X) = -\int_{[0,1]} X(X^{-1}A-I)\left[(1-s)X^{-1}A+sI\right]^{-1} d\nu(s) = -Xf(X^{-1}A) = -X^{1/2}f(X^{-1/2}AX^{-1/2})X^{1/2}$

with respect to the Riemannian metric (11), where $f \in L$ is represented by $\nu$ as in (15).

Proof. The Riemannian gradient $\nabla F(X)$ of $F(X)$ is defined by the relation

$\frac{\partial}{\partial t} F(X+Vt)\Big|_{t=0} = \langle\nabla F(X),V\rangle_X$

for any $V$ in the tangent space at $X$. That is, we compute

$\frac{\partial}{\partial t} LD^s(X+Vt,A)\Big|_{t=0} = \frac{1}{s(1-s)} \mathrm{tr}\left\{D\log\left[(1-s)X^{-1}A+sI\right]\left(-(1-s)X^{-1}VX^{-1}A\right) - (1-s)D\log\left(X^{-1}A\right)\left(-X^{-1}VX^{-1}A\right)\right\}$,

where $D\log : P\times S(E) \to S(E)$ is the Fréchet derivative of $\log$, which is linear in the second variable. Moreover, since $\log$ is an analytic function on $(0,\infty)$, we have by linearity, the cyclic property of the trace and the Riesz-Dunford functional calculus [41] that for any $X,Y$

$\mathrm{tr}\{D\log(X)(Y)\} = \mathrm{tr}\{D\log(X)(I)Y\}$.

Further calculation with this gives that

$\frac{\partial}{\partial t} LD^s(X+Vt,A)\Big|_{t=0} = \frac{1}{s} \mathrm{tr}\left\{D\log(X^{-1}A)(I)X^{-1}VX^{-1}A - D\log\left[(1-s)X^{-1}A+sI\right](I)X^{-1}VX^{-1}A\right\}$
$= \frac{1}{s} \mathrm{tr}\left\{A^{-1}X\,X^{-1}VX^{-1}A - \left[(1-s)X^{-1}A+sI\right]^{-1}X^{-1}VX^{-1}A\right\}$
$= \frac{1}{s} \mathrm{tr}\left\{VX^{-1} - X^{-1}A\left[(1-s)X^{-1}A+sI\right]^{-1}X^{-1}V\right\}$,

hence the gradient of the $s$-component $X \mapsto LD^s(X,A)$ is

$\frac{1}{s} X\left\{I - \left[(1-s)I+s(X^{-1}A)^{-1}\right]^{-1}\right\}$.
Now a simple calculation shows that

$\frac{1}{s}\left(1 - \left[1-s+sx^{-1}\right]^{-1}\right) = -\frac{x-1}{(1-s)x+s}$,

and that the whole calculation also holds even if $s = 0$ or $1$. Since Bochner integration is exchangeable with the linear operator of Fréchet differentiation, we get from the above that (19) holds. $\Box$

Remark 2.1. The proof of Theorem 2.3 also shows that actually every $f \in L$ corresponds to a Riemannian gradient of a divergence function in $D_A$.

We say that a map $f : P \to \mathbb{R}$ is geodesically convex with respect to the metric (11) if it is convex along any geodesic $\gamma : [0,1] \to P$, i.e., the function $f(\gamma(t))$ is convex:

$f(\gamma(t)) \leq (1-t)f(\gamma(0)) + tf(\gamma(1))$ for $t \in [0,1]$.

Similarly, $f$ is strictly geodesically convex if additionally

$f(\gamma(t)) < (1-t)f(\gamma(0)) + tf(\gamma(1))$ for $t \in (0,1)$

whenever $\gamma(0) \neq \gamma(1)$. It is well known that $f$ is geodesically convex if and only if the Riemannian Hessian $\mathrm{Hess} f(X) : S(E)\times S(E) \to \mathbb{R}$ is positive semidefinite; similarly, $f$ is strictly geodesically convex if and only if $\mathrm{Hess} f(X)(\cdot,\cdot)$ is positive definite [19, 35]. Moreover, if one can show that $\mathrm{Hess} f(X) \geq m > 0$ on some bounded geodesically convex set, then it follows that $f$ is uniformly convex, i.e.,

$f(\gamma(t)) \leq (1-t)f(\gamma(0)) + tf(\gamma(1)) - \frac{m}{2}t(1-t)d(\gamma(0),\gamma(1))^2$

for all geodesics $\gamma$ lying entirely in the bounded geodesically convex set. In the above inequality $d$ is the Riemannian distance function.

Theorem 2.4. Every $F \in D_A$ is a strictly geodesically convex function with respect to the metric (11).

Proof. The idea is to show that the Riemannian Hessian of $F$ is positive definite, i.e., $\mathrm{Hess} F(X)(V,V) > 0$ for any nonzero $V \in S(E)$ in the tangent space at $X$. The Riemannian Hessian of $F$ is defined by

$\mathrm{Hess} F(X)(V,W) = \langle\nabla_V(\nabla F(X)),W\rangle_X$

as a bilinear form acting on the tangent space at $X$, where $\nabla_V$ is the covariant derivative given by the Levi-Civita connection (12); see for example [37].
By Theorem 2.3 we have that $\nabla F(X) = -Xf(X^{-1}A)$ with an $f \in L$, so we have

$\nabla_V(\nabla F(X)) = -Vf(X^{-1}A) + XDf(X^{-1}A)(X^{-1}VX^{-1}A) + \frac{1}{2}\left[Xf(X^{-1}A)X^{-1}V + Vf(X^{-1}A)\right]$.

Hence we have

$\mathrm{Hess} F(X)(V,V) = \langle\nabla_V(\nabla F(X)),V\rangle_X = \mathrm{tr}\left\{X^{-1}\left[-Vf(X^{-1}A) + XDf(X^{-1}A)(X^{-1}VX^{-1}A) + \frac{1}{2}\left(Xf(X^{-1}A)X^{-1}V + Vf(X^{-1}A)\right)\right]X^{-1}V\right\}$.

By the cyclic property of the trace the above is equivalent to

$\mathrm{Hess} F(X)(V,V) = \mathrm{tr}\left\{Df(X^{-1}A)(X^{-1}VX^{-1}A)X^{-1}V\right\}$.

Now we make use of the integral representation for $f$ given in Corollary 2.2 in the form

$f(x) = \int_{[0,\infty]} \left(\lambda+1-\frac{(\lambda+1)^2}{\lambda+x}\right) d\eta(\lambda)$.

By exchanging the Fréchet derivative with the integral we obtain that

$Df(X^{-1}A)(X^{-1}VX^{-1}A) = \eta(\{\infty\})X^{-1}VX^{-1}A + \int_{[0,\infty)} (\lambda+1)^2(\lambda I+X^{-1}A)^{-1}X^{-1}VX^{-1}A(\lambda I+X^{-1}A)^{-1}\, d\eta(\lambda)$,

where we separated the term corresponding to $\{\infty\}$. Hence by the linearity of the trace we have

$\mathrm{Hess} F(X)(V,V) = \eta(\{\infty\})\mathrm{tr}\left\{X^{-1}VX^{-1}AX^{-1}V\right\} + \int_{[0,\infty)} (\lambda+1)^2\,\mathrm{tr}\left\{(\lambda I+X^{-1}A)^{-1}X^{-1}VX^{-1}A(\lambda I+X^{-1}A)^{-1}X^{-1}V\right\} d\eta(\lambda)$.

Hence it now suffices to prove that the expressions

$c(V) := \mathrm{tr}\left\{X^{-1}VX^{-1}AX^{-1}V\right\} > 0$,
$p(V) := \mathrm{tr}\left\{(\lambda I+X^{-1}A)^{-1}X^{-1}VX^{-1}A(\lambda I+X^{-1}A)^{-1}X^{-1}V\right\} > 0$

for all nonzero $V \in S(E)$, $X,A \in P$ and $\lambda \geq 0$. Using the notation $P = X^{-1/2}AX^{-1/2}$, $H = X^{-1/2}VX^{-1/2}$, the cyclic property of the trace and the fact that $(\lambda I+P)^{-1/2}$ commutes with $P^{1/2}$, we get that

$p(V) = \mathrm{tr}\left\{(\lambda I+P)^{-1}HP(\lambda I+P)^{-1}H\right\} = \mathrm{tr}\left\{(\lambda I+P)^{-1/2}H(\lambda I+P)^{-1/2}P^{1/2} \cdot P^{1/2}(\lambda I+P)^{-1/2}H(\lambda I+P)^{-1/2}\right\} = \left\|(\lambda I+P)^{-1/2}H(\lambda I+P)^{-1/2}P^{1/2}\right\|_2^2$,

$c(V) = \mathrm{tr}\{HPH\} = \mathrm{tr}\left\{HP^{1/2}P^{1/2}H\right\} = \left\|HP^{1/2}\right\|_2^2$,

both of which are positive for nonzero $V$, since $H \neq 0$ and $P^{1/2}$, $(\lambda I+P)^{-1/2}$ are invertible.
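The positivity of the two trace terms $c(V)$ and $p(V)$ above, and the Hilbert-Schmidt norm identity for $c(V)$, can be checked numerically. The sketch below is my own verification aid (function names and the random test matrices are choices made here, not constructions from the paper).

```python
import numpy as np

def c_term(X, A, V):
    """c(V) = tr{X^{-1} V X^{-1} A X^{-1} V}, the eta({infinity}) Hessian term."""
    iX = np.linalg.inv(X)
    return float(np.trace(iX @ V @ iX @ A @ iX @ V).real)

def p_term(X, A, V, lam):
    """p(V) = tr{(lam I + X^{-1}A)^{-1} X^{-1}V X^{-1}A (lam I + X^{-1}A)^{-1} X^{-1}V}."""
    iX = np.linalg.inv(X)
    K = np.linalg.inv(lam * np.eye(X.shape[0]) + iX @ A)
    return float(np.trace(K @ iX @ V @ iX @ A @ K @ iX @ V).real)
```

For random positive definite $X, A$ and a nonzero symmetric $V$, both terms are strictly positive for all $\lambda \geq 0$, and $c(V)$ matches $\|HP^{1/2}\|_2^2$ computed directly from $P = X^{-1/2}AX^{-1/2}$, $H = X^{-1/2}VX^{-1/2}$.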
