Quantum Information on Spectral sets Peter Harremoës Niels Brock Copenhagen Business College Copenhagen E-mail: [email protected] Abstract—For convex optimization problems Bregman diver- probability distributions on the sample space. In such cases gences appear as regret functions. Such regret functions can thestate spaceisa simplex,butit iswell-knownthatthe state 7 be defined on any convex set but if a sufficiency condition is space is not a simplex in quantum physics. In order to cover 1 added the regret function must be proportional to information 0 applications in physics we need a the more general notion of divergence and the convex set must be spectral. Spectral set 2 are sets where different orthogonal decompositions of a state a state space as defined in [1]. b into pure states have unique mixing coefficients. Only on such e spectral sets it is possible to define well behaved information A. The state space and the positive cone F theoreticquantitieslikeentropyanddivergence.Itisonlypossible Beforewedoanymeasurementwe prepareoursystem.Let 0 itsosppeercftorraml. Tmheeasmuoresmt iemnptsorintanatrsepveecrtsriballesewtsaycainf tbheersetparteessepnatecde P denote the set of preparations. Let p0 and p1 denote two 1 aspositiveelementsof Jordanalgebraswithtrace1.Thismeans preparations.Fort∈[0,1]we define(1−t)·p0+t·p1 as the that Jordan algebras provide a natural framework for studying preparationobtainedbypreparingp0withprobability1−tand T] quantuminformation.WecompareinformationtheoryonHilbert p1 withprobabilityt.Ameasurementmisdefinedasanaffine spaceswithinformationtheoryinmoregeneralJordanalgebras, mapping of the set of preparations into a set of probability I s. aidnedntciofyncsloumdeetihmaptomrtuacnht dofifftehreenfocrems.alism is unchanged but also measures on some measurable space. Let M denote a set of c feasiblemeasurements.The state space S is definedas the set [ I. INTRODUCTION of preparationsmodulo measurements. Thus, if p1 and p2 are 2 Although quantum physics has existed for more than 100 preparations then they represent the same state if m(p1) = v years there are still many simple fundamental questions that m(p2) for any m∈M. 8 remainunanswered.Oneofthemainquestionsiswhycomplex In statistics the state space equals the set of preparations 8 6 Hilbert spaces are used. Although the complex numbers are and has the shape of a simplex. The symmetry group of a 6 extremelyusefulfordoingcomputationswedonothavedirect simplex is simply the group of permutations of the extreme 0 observationsofthesenumbers.Observablesarerepresentedby points. In quantum physics the state space has the shape of 1. selfadjointoperatorsand such operatorscan be multiplied but the density matrices on a complex Hilbert space and the state 0 the multiplication leads to operators that are not selfadjoint space has a lot of symmetries that a finite simplex does not 7 and cannot be observed. Therefore Pascual Jordan introduced have. For simplicity we will assume that the state space is a 1 what is now known as Jordan algebras with a different finite dimensional convex compact space. : v product than the usual matrix product. Still, his product is From now on we consider the situation where our knowl- Xi not well physically motivated. In this paper we give some edge about a system is given by an element in a convex set. general definitions of some information theoretic quantities These elements are called states and convex combinations r a and demonstrate that they are only well behaved on convex are formed by probabilistic mixing. States that cannot be sets that can be representedon Jordan algebras.Thereforethe distinguishedbyanymeasurementareconsideredasbeingthe Jordan algebras appear to be the correct formalism for doing same state. The extreme points in the convex set are called quantuminformation.From a computerscience point of view purestatesandallotherstatesarecalledmixedstates. See[1] the quantum algorithms should be run in Jordan algebras, for details about this definition of a state space. but for building a quantum computer we need a physical If S is a state space it is sometimes convenientto consider implementation of the algorithm. Appearently the physical thepositiveconegeneratedbyS .Thepositiveconeconsistof world prefer Jordan algebras associated with complex Hilbert elements of the form λ·σ where λ≥0 and σ ∈S. Elements spaces so a general algorithm on a Jordan algebra should be of a positive cone can multiplied by positive constants via implemented on complex Jordan algebras. This is possible in λ·(µ·σ)=(λ·µ)·σ and can be added as follows. mostcasesbutmayincreasethenumberofgateoperationsby λ µ a factor. λ·ρ+µ·σ =(λ+µ)· ρ+ σ . λ+µ λ+µ (cid:18) (cid:19) II. STRUCTURE OFTHE STATESPACE The convex set and the positive cone can be embedded in a Our knowledge about a system will be represented by a realvectorspace by takingthe affine hullof the coneand use state space. I many cases the state space is given by a set of the apex of the cone as origin of the vector space. B. Measurements on the state space S that contain both σ and σ . If the the extreme points 0 1 σ ,σ ,...,σ of a decomposition are orthogonal then the Let m denote a measurement on the state space S with 1 2 n decomposition is called an orthogonal decomposition. valuesinthesetA.Thenm(σ)isaprobabilitydistributionon A.LetB ⊆A.Thenm(σ)(B)istheprobabilityofobserving a result in B when the state is σ and the measurement m is φ s performed. Then m(σ)(B) ∈ [0,1] and σ →m(σ)(B)is an 1 1 affine mapping. Definition 1. Let S denote a convex set. A test is an affine 0.8 map from S to [0,1] . The tests are building block for all measurements accoring 0.6 to the following proposition. Proposition 2. Let m denote a measurement on the convex 0.4 s set S with values in the finite set A. Then there exists tests φ :S →[0,1] such that for any B ⊆A we have b 0.2 m(σ)(B)= φ (σ). (1) b bX∈B 0 s If the set A is not finite we may have to replace the sum 0 by an integral. The trace of a positive element is defined by tr(λ·σ)=λ whenσ ∈S so thatstates are positiveelements Fig. 1. In the disc the points σ0 and σ1 are mutually singular. The point with trace equal to 1. We note that the trace restricted to the shasauniquedecomposition intomutually singularpointsbecause itisnot statesdefinesatest.Anytestscanbeidentifiedwithapositive thecenter ofthedisc. functionalon the positive conethat is dominatedby the trace. Note that if a measurement m is given by (1) then the tests Theorem4. LetS denoteaconvexcompactsetofdimensiond satisfies and let x denote some element in the positive cone generated φ =tr. by S. Then there exists an orthogonal decomposition of the b b∈A form as in Equation 2 such that n≤d+1. X C. Improved Caratheodory theorem Proof: See [2, Thm. 2]. Let x be an element in the positive cone such that Definition5. Therankofaconvexsetisthemaximalnumber n oforthogonalstatesneededinanorthogonaldecompositionof x= λi·σi. (2) a state. i=1 X Example 6. In the unit square with (0,0),(1,0),(0,1) and where σ are pure states. If λ ≥ λ ≥ ··· ≥ λ then the i 1 2 n (1,1) as vertices the point, with coordinates (1/2,1/4) has an vector λn is called the spectrum of the decomposition. Note 1 orthogonal decomposition with spectrum (1/2,1/4,1/4). This thattherearenorestrictionsonthe numbern inthe definition spectrum majorizes the spectrum of any other decomposition ofthespectrum,soiftwospectrahavedifferentlengthwewill of this point, and it also majorizes the spectrum of any extend the shorter vector by concatenating zeros at the end. other point in the square. The square has in total four points We note that for a decomposition like (2) the trace is given by tr[x]= n λ . symmetrically arranged with the same spectrum as (1/2,1/4). i=1 i Spectra are ordered by majorization. Let λn and µn be the III. SPECTRAL SETS P 1 1 spectra of two decompositions of the same positive element. Any state in may be decomposed into orthogonal states, Thenλn1 (cid:23)µn1 if ki=1λi ≥ ki=1µi fork ≤n. Note that in butsuchorthogonaldecompositionsareonlyuniquewhenthe a general positive cone the majorization ordering is a partial state space is a simplex. Nevertheless there exists a type of P P ordering.Inspecialcasesliketheconeofpositivesemidefinite convex sets where some weaker type uniqueness holds. matricesonacomplexHilbertspacethedecompositionsofthe matrixformalatticeorderingwithauniquemaximalelement, A. Spectrality conditions but in general the set of decompositions may have several Definition 7 ([3]). An n-frame is a list of n perfectly distin- incompatible maximal elements. guishable pure states. If any state σ ∈S has a decomposition σ = λ ·σ where σ ,σ ...,σ is an n-frame then the Definition3. LetS denoteaconvexset.Letσ ,i=1,2,...,n i i i 1 2, n i state space is said to satisfy weak spectrality. states in the state space S. Then (σi)i are said to be perfectly P distinguishable if there exists tests φ such that φ (σ )=δ . Proposition 8. Let S denote a state space such that i i j ij The states σ and σ are said to be orthogonal if σ and 1. If both ρ⊥σ and ρ⊥σ then ρ⊥1(σ +σ ). 0 1 0 1 2 2 1 2 σ are perfectly distinguishable in the smallest face F of 2. If σ ⊥σ then σ and σ are perfectly distinguishable. 1 1 2 1 2 Then each state satisfies weak spectrality. impliesthatx =0foralli.InanformallyrealJordanalgebra i we write x≥0 if x is a sum of squares. Definition 9. If all orthogonaldecompositionsof a state have For a matrix (M ) the trace tr is defined by tr[M] = the same spectrum then the common spectrum is called it the mn M . The trace vanish on commutators and associators. spectrum of the state and the state is said to be spectral. We n nn For an associative algebra it means that tr[MN]= tr[NM] saythatastatespaceS isspectralifallstatesinS arespectral. P and for a Jordan algebra it means that tr[(a◦b)◦c] = Often we are interested in a stronger condition. tr[a◦(b◦c)] (see [4] for details). For a Jordan algebra one can define an inner product by hx,yi = tr[x◦y]. With this Definition 10. Let σ denote a state with the orthogonal inner product the positive cone becomes self-dual. decompositions σ = s σ = r ρ . We say that the de- i i j j In any finite dimensional formally real Jordan algebra we compositionsarestrictlyspectralif f(s )σ = f(r )ρ P P i i j j maydefinethedensityoperatorsasthepositiveelementswith foranyrealvaluedfunctionf.Astateisstrictlyspectralifany P P trace 1. Then the density operators of a Jordan algebra is a orthogonaldecomposition is strictly spectral. A state space is spectral set. Any formally real Jordan algebra is a sum of strictly sprectral if any state is strictly spectral. simple formally real Jordan algebras. There are 5 types of The important case is when the function f equals 1 for simple Eucledian Jordan algebras leading to the following λ whichwe getthat s σ = r ρ impliesthat σ = convex sets: i i j j si=λ i ρ . An element of the form σ is called an Spin factor A unit ball in d real dimensions. rj=λ j P P si=λ i P idempotent. Strict spectrality means that a state has a unique Real n×n density matrices over the real numbers. P P decomposition σ = λ e where λ are distinct eigenvalues Complex n×n density matrices over complex numbers k k k and e are orthogonal idempotents. We note that if the state Quaternionic n×n density matrices over quaternionians. k P space is strictly spectral then for any element x in the Albert 3×3 density matrices with octonian entries. generated real vector space we have y ≥ 0 if and only if The first four types are called the special types and the there existsx such thaty =x2. Inparticular x2 =0 if and last one is called the exceptional type. In each of the four i only if x =0 for all i. special Jordan algebras the product is defined by (3) from an i P associativeproduct.ThisisnotpossiblefortheAlbertalgebra, B. Jordan algebras which is the reason that it is said to be exceptional. See [5] The notion of a spectral set is related to self-duality of forgeneralresultsonJordanalgebrasand[6]fordetailsabout the cone of positive elements, which leads to the following quantum mechanics based on quatonians. The 2×2 matrices conjecture. have real, complex, quaternionic or octonionic entries can be Conjecture 11. If a finite dimensional convex compact set is identified with spin factors with d = 2, d = 3, d = 5, and spectral and has a transitive symmetry group then the convex d = 9. The most important example of a spin factor is the qubit. set can be represented as positive elements with trace 1 in a A sum of different simple Jordan algebras does not fulfill simple Jordan algebra . the strong symmetry mentioned in [3] although it is spectral The density matrices with complex entries play a crusial and fulfill projectivity. This was left as an open question in role in the mathematical theory of quantum mechanics and it [3]. is well-known that the density matrices is a spectral set. For each density matrix the spectrum equals the usual spectrum C. Separable states calculated as roots of the characteristic polynomium. In the LetH ⊗H denoteatensorproductoftwocomplexHilbert 1 2 1930’tiesP. Jordan generalized the notion of Hermitean com- spaces.ThenB(H ⊗H )=B(H )⊗B(H ). Theseparable 1 2 1 2 plex matrix to the notion of a Jordan algebra in an attempt states are mixtures of states of the form σ ⊗σ where σ 1 2 i to provide an alternative to the complex Hilbert spaces as the is a density operator in B(H ). We will show that the set of i mathematical basis of quantum mechanics. For instance the separable states is spectral. complex Hermitean matrices form a Jordan algebra with the First we note that for A ∈ B(H ⊗H ) and B ∈ B(H ) 1 2 1 quasi-product defined by and C ∈B(H )we have 1 1 x◦y = (x+y)2−x2−y2 . (3) tr(A(B⊗C))=tr1(tr2(A)B)·tr2(tr1(A)C) 2 (cid:16) (cid:17) where tr and tr denote partial traces. Therefore Adirectexpansionofthesquaresleadtox◦y = 1(xy+yx). 1 2 2 tr(A(B⊗C)) is extreme when tr (tr (A)B) and An formally real Jordan algebra is an algebra with compo- 1 2 tr (tr (A)C) are extreme. Now tr (tr (A)B) is minimal sition ◦ that is commutative and satifies the Jordan identity 2 1 1 2 ifB issupportenontheeigenspaceoftheminimaleigenvalue (x◦y)◦(x◦x)=x◦(y◦(x◦x)). of tr (A) and tr (tr (A)B) is maximal if B is supporten 2 1 2 on the eigen space of the maximal eigenvalue of tr (A). In Further it is assumed that 2 particular B ⊗C and B ⊗C are orthogonal as elements n 1 1 2 2 of the set of separable states if B ⊥B or C ⊥C . Since x2 =0 1 2 1 2 i tr((B ⊗C )(B ⊗C )) = tr(B B )tr(C C ) we have 1 1 2 2 1 2 1 2 i=1 X that B ⊗C and B ⊗C are orthogonalas elements of the A state space issaid to be symmetric if anyn-framecan be 1 1 2 2 set of separable states if and only if B ⊗C and B ⊗C transformed into any other n-frame by an affine map of the 1 1 2 2 are orthogonal under the Hilbert-Schmidt inner product on state space into itself. B(H ⊗H ). 1 2 Conjecture 14. A symmetric spectral state space is either Nowanyseparablestatecanbedecomposedinto λ B ⊗ i i i a simplex or it can be represented as density elements of a C . whereB ⊗C areorthogonalpurestatesonB(H ⊗H ) anidsincetheidensiitymatricesonB(H ⊗H )formPa1spectr2al formally real Jordan algebra. 1 2 set the coefficients λi is uniquely determined. The separable IV. QUANTIZATION OF ENERGY states satisfy Bell type inequality and each Bell inequality Often the term quantum theory is used only for systems determines a face of the set of separable states. These faces that have physical implementations. Here we will justify that donothavecomplementaryfacessothesetofseparablestates theterm“quantum”isusedforJordanalgebrasandevenmore does not satisfy the property called projectivity. generalcases.AssumethatS isfinitedimensionalstatespace. Let g : S → S denote the transformation that maps a state t D. Symmetry at time 0 into state at time t. We assume that g = g ◦g s+t s t RecalthataconvexsetC isbalancedabouttheoriginifP ∈ so that we get a representation of R0,+. We shall restrict the discussion to the harmonic oscillator. In classical physics a C implies−P ∈C where−P denotesthepointwithopposite sign of all coordinates. A spectral set of rank 2 is balanced, harmonic oscilator has a period T such that the state at time i.e. it is symmetric around a central point and all boundary t equals the state at time t + T. Therefore we may call a points are extreme. Two states in the set are orthogonal if system with state space S a harmonic oscillator if gt+T =gt and only if they are antipodal. Any state can be decomposed for all t∈R0,+ and this representationcan be extendedto R. Since the representation is periodic it may be considered as a into two antipodal states. If the state is not the center of the balanced set this is the only orthogonal decomposition. The representation of T=R/τ where τ is the circle constant 2π. center can be decomposed into a 1/2 and 1/2 mixture of any Wesaythattherepresentationt→gtofTintotransformations pair of antipodal points. gt :S →S is irreducible if any convex subset S′ ⊆S that is is invariant under g spans S. t Proposition 12. In two dimensionsa simplex and a balanced Proposition 15. If t→g is an irreducible representation of set are the only types of spectral sets. t T on the state space S then S is a point or S has the shape Astatespaceofrank2issaidtohavesymmetrictransission of a disk. If S is a disk then there exists a n∈N such that t probabilities[7, Def.9.2(iii)] ifforanystatesσ1 andσ2there is mapped into a rotation by the angle n·t. exists test φ such that φ (σ )=1 and min φ (σ)=0. i i i σ i In the standard Hilbert space formalism the representation Theorem13. A spectralstate spaceofrank2with symmetric characterizedby n is describedby a Hamiltonian with energy transmissionprobabilitiescanberepresentedbyaspinfactor. ~ω(n+1/2).IfTisrepresentedonafinitedimensionalstate space but the representation is not irreducible then the repre- Proof: Let C denote an intersection between the state sentation can be decomposed into irreducible representations space and a two dimensional hyperplane through the center eachcharacterizedbyaspecificenergy.Inthissenseenergyis c of the state space. Let σ and σ denote states such that 1 2 quantized. In the standard formulation of quantum mechanics φ1(σ2) = φ2(σ1) = 1/2. Define φ˜i = 2 · φi − 1 so that the energy is represented by a operator that is an observable, φ˜ (σ ) = 1 and φ˜ (c) = 0. For any state σ we may define i i i butforotherJordanalgebrasthereexistsconnectedsymmetry the coordinates by x = φ˜ (σ) and y = φ˜ (σ). Let φdenote 1 2 groups that cannot be represented by observables [8]. a test that equals 1 on σand equals 0 on the antipodal of σ. Let φ˜=2·φ−1. If the boundaryof the state space at σ only V. OPTIMIZATION has one supporting hyperplane then φ˜is uniquely determined. Let A denote a subset of the feasible measurements such Then φ˜(σ ) = x and φ˜(σ ) = y. Therefore the projection 1 2 that a ∈ A maps the convex set S into a distribution on of σ along the supportinghyperplanhascoordinates x2,xy 1 the real numbers i.e. the distribution of a random variable. and the projection of σ has coordinates xy,y2 . Therefore 2 (cid:0) (cid:1) The elements of A may represent feasible actions (decisions) the vectors x2−1,xy and xy,y2−1 are parallel so that (cid:0) (cid:1) that lead to a payoff like the score of a statistical decision, (cid:0) x2(cid:1)−1 (cid:0) xy (cid:1) the energy extracted by a certain interaction with the system, =0 (minus) the length of a codeword of the next encoded input xy y2−1 (cid:12) (cid:12) letter using a specific code book, or the revenue of using a (cid:12) 1−x2−y2(cid:12)=0 (cid:12) (cid:12) certain portfolio. For each σ ∈S we define (cid:12) (cid:12) so that σ has coordinates that lie on a unit circle. Almost ha,σi=E[a(σ)]. all points on the boundary of C has a unique supporting and hyperplane. Therefore almost all points lie on a circle so all F (σ)= supha,σi. points must lie on a circle. a∈A Without loss of generality we may assume that the set of Theorem 18. For any set of actions actions A is closed so that we may assume that there exists C =supinf t ·D (σ ,a) a ∈ A such that F(σ) = ha,σi and in this case we say that F i F i a is optimal for σ. We note that F is convex but F need not ~t a Xi be strictly convex. where the supremum is taken over all probability vectors ~t If F (σ) is finite then we define the regret of the action a supported on S. by This result can improved. D (σ,a)=F (σ)−ha,σi. F Theorem 19. If (t ,t ,...,t ) is a probability vector on the 1 2 n statesσ ,σ ,...,σ withσ¯ = t ·σ anda istheoptimal 1 2 n i i opt F action for minimax regret then P s0 s1 C ≥inf t ·D (σ ,a)+D (σ¯,a ). F i F i F opt a X If a is an action and σ is optimal then opt supD (σ ,a)≥C +D (σ ,a). F i F F opt i Proof: See [9, Thm. 2]. D (s ,s ) A. Bregman divergences F 0 1 Ifthestateisρbutoneactsasifthestatewereσonesuffers a regret that equals the difference between what one achieves Fig.2. Theregretequals thevertical betweencurveandtangent. and what could have been achieved. If ai are actions and (ti) is a probability vector then we Definition 20. If F (ρ) is finite then we define the regret of we may define the mixed action ti·ai as the action where the state σ as we do a with probability t . We note that h t ·a ,σi = i i i i ti·hai,σi.WewillassumethatPallsuchmixtPuresoffeasible DF (ρ,σ)=inafDF (ρ,a) actions are also feasible. If a (σ)≥a (σ) almost surely for 1 2 P wheretheinfimumistakenoveractionsathatareoptimalfor all states we say that a dominates a and if a (σ)>a (σ) 1 2 1 2 σ. almostsurelyforallstatesσwesaythata strichtlydominates 1 a2. All actions that are dominated may be removed from A If the state σ has the unique optimal action a then without changing the function F. Let A denote the set of F measurements m such that hm,σi ≤ F (σ). Then F (σ) = F (ρ)=DF (ρ,σ)+ha,ρi (5) sup ha,σi. Thereforewe mayreplaceA by A without a∈AF F sothefunctionF canbereconstructedfromDF exceptforan changing the optimization problem. affine function of ρ. The closure of the convexhull of the set Definition 16. If F(σ) is finite the regret of the action a is offunctionsσ →ha,σi isuniquelydeterminedbythe convex defined by function F. D (σ,a)=F (σ)−ha,σi (4) TheregretiscalledaBregmandivergenceifitcanbewritten F in the following form Proposition 17. The regret D has the following properties: F D (ρ,σ)=F (ρ)−(F (σ)+hρ−σ,∇F (σ)i) (6) F • DF (σ,a)≥0 with equality if a is optimal for σ. • σ →DF (σ,a) is a convex function. where h·,·i denotes some inner product. In the context of • If a¯ is optimal for the state σ¯ = ti · σi where forecasting and statistical scoring rules the use of Bregman (t ,t ,...,t ) is a probability vector then divergences dates back to [10]. 1 2 ℓ P Theorem 21. The following conditions are equivalent. t ·D (σ ,a)= t ·D (σ ,a¯)+D (σ¯,a). i F i i F i F Foreachstatesandallactionsa anda suchthatF(σ)= 1 2 • Xti·DF (σi,a)isminXimalifaisoptimalforσ¯ = ti· ha1,σi=ha2,σi we have E◦a1 =E◦a2. σ . The function F is differentiable. i P P The regret D is a Bregman divergence. If the state is not know exactly but we know that σ is one of F the states σ1,σ2,...,σn thenthe minimaxregret is definedas We note that if DF is a Bregman divergence and σ mini- mizesF then∇F (σ)=0sothattheformulafortheBregman C =infsupD (σ ,a). F F i divergence reduces to a i We have the following result. D (ρ,σ)=F (ρ)−F (σ). F Bregman divergences satisfy the Bregman identity where the infimum is taken over all spectra of x. t ·D (ρ ,σ)= t ·D (ρ ,ρ¯)+D (ρ¯,σ), (7) Since entropy is decreasing under majorization the entropy i F i i F i F ofxisattainedatanorthogonaldecomposition.Thisdefinition butXif F is not differentiXable this identity can be violated. extendsasimilardefinitionoftheentropyofastateasdefined Example 22. Let the state space be the interval [0,1] with by Uhlmann [11]. two actionsa (σ)=1−2σ and a (σ)=2σ−1. Letσ =0 In general this definition of entropy does not provide a 0 1 0 and σ1 =1. Let further t0 =1/3 and t1 =2/3. Then σ¯ =2/3. concavefunctiononthepositivecone.Forinstancetheentropy If σ =1/2 then of points in the square from Example 6 has local maximum in the four points with maximal spectrum in the majorization ti·DF (σi,s)=0 ordering. A characterization of the convex sets with concave entropy functions is lacking. but X Proposition 25. Assume that the entropy function H on a ti·DF (σi,σ¯)= state space is strictly concave. Then any uniform mixture of X 1 2 orthogonal pure states give the maximum entropy state. ·(a (0)−a (0))+ ·(a (1)−a (1)) 0 1 1 1 3 3 Proof: Assume that σ = 1 n σ and ρ = 1 n ρ 1 2 n i=1 i n i=1 i = ·(1−(−1))= . aredecompositionsintoorthogonalpurestates.ThenH(σ)= 3 3 H(ρ) = ln(n) and H σ+ρ ≥Pln(n)with equalityP, which 2 We also have DF (σ¯,σ) = 0. Clearly the Bregman identity implies that σ =ρ (7) is violated. Theentropyisdefined(cid:0)asfo(cid:1)rgeneralconvexsetandwewill Assume that a¯ is optimal for ρ¯. Then prove that H is a concave on the cone of positive elements in a Jordan algebra. The following exposition is inspired by t ·(F(ρ ))= t ·(D (ρ ,ρ¯)+ha¯,ρ i) i i i F i i similar result for complex matrix algebras stated in [12], but X =Xt ·D (ρ ,ρ¯)+ a¯, t ·ρ the proofshavebeenchangedso thattheyare valid on Jordan i F i i i algebras. =Xt ·D (ρ ,ρ¯)+Dha¯,ρ¯Xi E i F i Lemma 26. For elements A and B in a Jordan algebra and =Xti·DF (ρi,ρ¯)+F (ρ¯). any analytic function f we have Therefore X d tr[f(A+tB)] =tr[f′(A)◦B]. dt t ·D (ρ ,ρ¯)= t ·(F (ρ ))−F(ρ¯). (cid:12)t=0 i F i i i (cid:12) Proof: First assume that(cid:12)f(z)=zr. Then X VI. SUFFICIEXNCY CONDITIONS (cid:12) d d In this section we will introduce various conditions on a tr[f(A+tB)] =tr (A+tB)r dt dt Bregman divergence. Under some mild conditions they turn (cid:12)t=0 (cid:20) (cid:12)t=0(cid:21) out to be equivalent. (cid:12)(cid:12) r−1 (cid:12)(cid:12) (cid:12) =tr Ai◦B◦A(cid:12)r−1−i Theorem 23. Let DF denote a regret function defined on an "i=0 # X arbitrary state space. Then we have the implications 3. ⇒ r−1 4.⇒5.⇒6.⇒1.⇒2. =tr Ai◦Ar−1−i◦B 1. The function F equals entropy times a constant plus an "i=0 # X affine function. =tr n·Ar−1◦B 2.TheregretDF isproportionaltoinformationdivergence. =tr[f′(A)◦B]. (cid:2) (cid:3) 3. The regret function satisfies strong sufficiency. 4.Theregretismonotone,i.e.itsatisfiesthedataprocessing As a consequence the theorem holds for any polynomial and inequality. also for any analytic function because such functions can be 5. The regret is sufficiency stable. approximated by polynomials. 6. The regret is local. Lemma 27. In a real Jordan algebra the following formula Theconditions1.,2.,and6.areequivalentonformallyreal holds for any analytic function f Jordan algebras with at least three orthogonal states.. d2 A. Entropy in Jordan algebras dt2tr[f(A+tB)] = ak,ltr[(Ek◦B)◦(Eℓ◦B)] (cid:12)t=0 k,ℓ Definition 24. Let x denote an element in a positive cone. (cid:12) X (cid:12) The entropy of x is be defined as where A= kλkEk(cid:12) is an orthogonal decomposition and n P f′(λk)−f′(λℓ) forλ 6=λ , H(x)=inf − λiln(λi) ak,ℓ = λk−λℓ k ℓ i=1 ! (f′′(λk) forλk =λℓ. X Proof: Let f denote the function zr. Then Assume that the Jordan algebra is special we have d2 tr[f(A+tB)] = dtr[f′(A+tB)◦B] tr[(Ek◦B)◦(Eℓ◦B)] dt2 (cid:12)t=0 dt (cid:12)t=0 1 = dtr (cid:12)(cid:12)r·(A+tB)r−1◦B (cid:12)(cid:12) = 4tr[(EkB+BEk)(EℓB+BEℓ)] dt h(cid:12) r−j−2timie(cid:12)(cid:12)st=0 (cid:12) = 41tr EkBEℓB+EkB2Eℓ+BEkEℓB+BEkBEℓ =r·trr−2(((Aj ◦B)◦A)◦A)◦··(cid:12)(cid:12)·◦A◦B = 41tr(cid:2) (EkBEℓ)(EkBEℓ)∗+(EℓBEk)(EℓBEk)∗ (cid:3) Xj=0 z }| { (cid:2) ≥(cid:3)0. r−2 Assume that the Jordan algebra is exceptional, i.e. the =r·tr Aj ◦B ◦ Ar−2−j ◦B . Albert algebra. There exists an F4 automorphism of the Xj=0(cid:0) (cid:1) (cid:0) (cid:1) Jordan algebra such that Ek is orthogonal and without loss Assume A= λ E where E are orthogonalpure states. of generality we may assume that k k k k Then P 1 0 0 d2 E = 0 0 0 . tr[f(A+tB)] = k dt2 0 0 0 (cid:12)t=0 (cid:12) r−2 (cid:12) The idempotentEℓ has the form vv∗ for some column vector r·trXj=0 Xk λjk·(cid:12)Ek◦B!◦ Xℓ λℓr−2−j ·Eℓ◦B! v wvit1h entries in a quaternionicsub-algebra.Assume thatv = r−2 v2 . Then =r· λjλr−2−jtr[(E ◦B)◦(E ◦B)] v k ℓ k ℓ 2 j=0 k,ℓ XX v v¯ v v¯ v v¯ 1 1 1 2 1 3 r−2 E = v v¯ v v¯ v v¯ . =r· λjλr−2−j tr[(E ◦B)◦(E ◦B)] ℓ 2 1 2 2 2 3 k ℓ k ℓ v3v¯1 v3v¯2 v3v¯3 k,ℓ j=0 X X r−2 Now tr(Ek◦Eℓ) = v1v¯1 = 0 since Ek and Eℓ are orthog- = r· λjλr−2−j tr[(E ◦B)◦(E ◦B)]. onal. Therefore v1 = 0. Without loss of generality we may k ℓ k ℓ assume that k,ℓ j=0 X X Now 0 0 0 E = 0 1 0 , rj=−02λjkλrℓ−2−j =(rr··λ(rk−rλk1−−−rλ1·ℓλ)rℓλ−rk1−2 ffoorrλλkk 6==λλℓℓ, ℓ 0p 0a 0¯b X = f′(λλkk)−−λfℓ′(λℓ) forλk 6=λℓ, B = a¯b mc¯ nc . (f′′(λk) forλk =λℓ. Then Since the formula holds of all powers, it also holds for all 2p a ¯b polynomials and for all analytic functions because these can 1 E ◦B = a¯ 0 0 be approximated by polynomials. k 2 b 0 0 Theorem 28. In a formally real Jordan algebra the entropy 0 a 0 function is a concave function on the positive cone. 1 E ◦B = a¯ 2m c ℓ 2 Proof: Let f denote the holomorphic function 0 c¯ 0 f(z) = −zlnz, z > 0. We have to prove that and tr[f((1−t)A+tX)] = tr[f(A+tB)] is concave where B =X −A. We have (E ◦B)◦(E ◦B)= k ℓ d2 2aa¯ 2pa+2ma+¯bc¯ ac dt2tr[f(A+tB)] = ak,ltr[(Ek◦B)◦(Eℓ◦B)] 1 2pa¯+2ma¯+c¯¯b 2a¯a a¯¯b . (cid:12)(cid:12)t=0 Xk,ℓ 4 c¯¯b ba 0 and the coefficients a(cid:12) are negative because f is concave. (cid:12)k,ℓ We need to prove that tr[(E ◦B)◦(E ◦B)] ≥ 0, but a Therefore k ℓ formallyrealJordanalgebracanbewrittenasasumofsimple tr[(E ◦B)◦(E ◦B)]=|a|2 ≥0. k ℓ Jordanalgebrassoitissufficienttoprovepositivityonsimple algebras. B. Information divergence The minimax-result them implies that C = ln(2). There- −H fore σ¯ = 1σ + 1σ . If ρ and ρ are orthogonal and Definition 29. If the entropy is a concave function then the 2 1 2 2 1 2 ρ¯= 1ρ + 1ρ . Then Bregman divergence D is called information divergence. 2 1 2 2 −H 1 1 The information divergence is also called Kullback-Leibler ln(2)≥ D(ρ1kρ)+ D(ρ2kρ¯)+D(ρ¯kσ¯) 2 2 divergence, relative entropy or quantum relative entropy. In a 1 1 = ln(2)+ ln(2)+D(ρ¯kσ¯) Jordan algebra we get 2 2 =ln(2)+D(ρ¯kσ¯). D (P,Q)= −H Therefore D(ρ¯kσ¯) = 0 so that ρ¯ = σ¯. Therefore the state −H(P)−(−H(Q)+hP −Q,−∇H(Q)i) space is balanced and thereby it is spectral. =H(Q)−H(P)+hP −Q,∇H(Q)i C. Strong monotonicity =tr[f(Q)−tr(f(P))+tr((P −Q)◦f′(Q))] Definition 31. A regret function D is said to satisfy strong =tr[f(Q)−f(P)+(P −Q)◦f′(Q)] F monotonicity if for any transformation Φ of the state space into itself the equation where f(x)=−xln(x). Now f′(x)=−ln(x)−1 so that D (Φ(ρ),Φ(σ))=D (ρ,σ) F F f(Q)−f(P)+(P −Q)f′(Q) implies that there exists a recovery map Ψ, i.e. a map of the =−Q◦ln(Q)+P ◦ln(P)+(P −Q)(−ln(Q)−1) statespaceintoitselfsuchthatΨ(Φ(ρ))=ρandΨ(Φ(σ))= σ. =P ◦(ln(P)−ln(Q))+Q−P. In statistics where the state space is a simplex strong Hence monotonicity is well established. Example 32. Squared Euclidean distance on a spin factor is D (P,Q)=tr[P ◦(ln(P)−ln(Q))+Q−P] −H a Bregman divergence that satisfies strong monotonicity. To and for states P,Q it reduces to see this we note an transformationΦ can be decomposedinto a translation and a linear map. Since the transformationmaps the unit ball into itself the maximal eigenvalue of the linear D (P,Q)=tr[P ◦ln(P)−P ◦ln(Q)]. −H map must be 1. Therefore Proposition30. IfastatespaceS hasrank2andtheentropy D (Φ(ρ),Φ(σ))=D (ρ,σ) F F function H is concave then the state space is spectral. implies that ρ and σ belong to a subspace that has eigenvalue Proof: Let C−H denote the capacity of the state space 1.Theintersectionofthesubspacespannedbyρandσandthe and. We have state space is a disc of radius1 and this disc must be mapped into anotherdisc of radius1 in the state space. Since anydisc of radius 1 can be mapped into any other disc of radius one C = sup q D ρ q ρ −H i j i i (q1,q2...qn)Xj (cid:13)(cid:13)Xi ! there exists a recovery map. (cid:13) (cid:13) Proposition33. IftheregretfunctionDF isstronlymonotone = sup H q ρ (cid:13)− q H(ρ ) i i i j then F is strictly convex. (q1,q2...qn) i ! j X X Proof: Assume that F is not strictly convex. Then there =supH(ρ). ρ∈S exists states ρ and σ such that F ρ+σ = F(ρ)+F(σ). Then 2 2 D (ρ,σ) = 0. Let Φdenote a contraction around ρ+σ. If ρ = q1ρ1 +q2ρ2 then H(ρ) ≤ −q1ln(q1)−q2ln(q2) ≤ ThFen DF (Φ(ρ),Φ(σ)) = 0 but(cid:0)a con(cid:1)traction cannot h2ave ln(2). Therefore supρ∈SH(ρ) ≤ ln(2). Let σ¯denote a a recovery map on a compact set. capacityachievingstate. Assume thatσ¯ =p1σ1+p2σ2 where For density matrices over the complex numbers strong σ1 and σ2 are pure orthogonal states. Then monotonicity has been proved for completely positive maps in [13]. Some new results on this topic can be found in [13]. C =supD(ρkσ¯) −H ρ∈S D. Feasible transformations and monotonicity ≥ max{D(σ1kσ¯),D(σ2kσ¯)} We considerasetT offeasibletransformationsofthestate i=1,2 space.By a feasible transformationwe mean a transformation 1 1 = max ln ,ln thatweareabletoperformonthestatespacebeforewechoose i=1,2(cid:26) (cid:18)p1(cid:19) (cid:18)p2(cid:19)(cid:27) afeasibleaction.LetΦ:S yS denoteafeasibletransforma- ≥ln(2). tionandletadenoteafeasibleaction.Thena◦Φistheaction σ → a(Φ(σ)). Thus the set of feasible transformations acts Proposition39. Ifaregretfunctionisstronglymonotone,then on the set of actions. If Ψ and Φ are feasible transformations the regret function is monotone. then we will assume that Ψ◦Φ is also feasible. Further we Proof: Assume that the regret function D is strongly will assume that the identity is feasible. Led F denote the F monotone.Then there exists states ρ and σ and a transforma- monoid of feasible transformations. Finally we will assume tionΦsuchthatD (Φ(ρ)Φ(σ))≥D (ρ,σ).LetΘ denote that (1−s)·Ψ+s·Φ is feasible for s ∈ [0,1] so that F F F ǫ a contraction around Φ(ρ) with factor ǫ ∈ [0,1]. Then there becomes a convex monoid. exists a value of ǫ such that D (Θ (Φ(ρ)),Θ (Φ(σ))) = F ǫ ǫ Proposition 34 (The principle of lost opportunities). If Φ is D (ρ,σ). Therefore Θ ◦Φ has a reverse Ψ such that F ǫ a feasible transformation then Ψ((Θ ◦Φ)(ρ))=ρ ǫ F (Φ(σ))≤F (ρ). (8) Ψ((Θ ◦Φ)(σ))=σ ǫ Proof: See [9, Prop. 4]. Therefore Since the feasible transformations increase the value of F the set of states with minimal value of F is invariant under (Φ◦Ψ)(Θǫ(Φ(ρ)))=Φ(ρ) feasible transformations. (Φ◦Ψ)(Θ (Φ(σ)))=Φ(σ) ǫ Proposition 35. Let S be a state space. Then the set TF of so that Φ◦Ψ is a recovery map of the contraction Θǫ. This transformations Φ : S y S such that F (Φ(σ)) ≤ F (σ) for is only possible if ǫ = 1 implying that D (Φ(ρ)Φ(σ)) = F all σ ∈S is a convex monoid and for any action a∈AF we DF (ρ,σ). have that a◦Φ∈A . F E. Sufficiency Proof: See [9, Prop. 5]. The present definition of sufficiency is based on [16], but Corollary 36 (Semi-monotonicity). Let Φ denote a feasible there are a number of other equivalent ways of defining this transformation and let σ denote a state that minimizes the concept. We refer to [13] where the notion of sufficiency is function F. If DF is a Bregman divergence then discussed in great detail. DF (Φ(ρ),Φ(σ))≤DF (ρ,σ). (9) Definition 40. Let (σθ)θ denote a family of states and let Φ denote an affine transformation S → T where S and T Proof: See [9, Cor. 1]. denote state spaces. Then Φ is said to be sufficient for (σ ) θ θ Next we introduce the stronger notion of monotonicity. if there exists an affine transformation Ψ : T → S such that Definition37. LetDF denotearegretfunctionontheconvex Ψ(Φ(σθ))=σθ. We say that Φ is reversible if Φ is feasible set C. Then DF is said to be monotone if and there exist a feasible Ψ such that Ψ(Φ(σθ))=σθ. Proposition41. IfD isaregretfunctionandΦisreversible D (Φ(ρ),Φ(σ))≤D (ρ,σ) F F F for ρ and σ then for any affine transformation Φ:C →C. D (Φ(ρ),Φ(σ))=D (ρ,σ). F F In information theory an inequality of this type is often Proof: See [9, Prop. 6]. calledadataprocessinginequality.Ingeneralaregretfunction need notbe monotone[9, Ex. 5]. Recently it has been proved The notion of sufficiency as a property of divergences was that information divergence on a complex Hilbert space is introducedin[17].Thecrucialideaofrestrictingtheattention decreasing under positive trace preserving maps [14], [15]. totransformationsofthestate spaceintoitselfwasintroduced Previouslythiswasonlyknowntoholdifsomeextracondition in [18]. It was shown in [18] that a Bregman divergence on like complete positivity was assumed. the simplex of distributions on an alphabet that is not binary determines the divergence except for a multiplicative factor. Theorem 38. Information divergence is monotone under any Here we generalize the notion of sufficiency from Bregman positive trace preserving map on a special Jordan algebra. divergences to regret functions. Proof: The proof is a step by step repetition of the Definition42. WesaythattheregretD onthestatespaceS F proof by Müller-Hermes and Reep [14], where they proved satisfies sufficiency if D (Φ(ρ),Φ(σ))=D (ρ,σ) for any F F the theorem for density matrices over the complex numbers. affine transformation S →S that is sufficient for (ρ,σ). See also[15]wherethesame prooftechniqueisused.Intheir proof they use the sandwiched Rényi divergence defined by Proposition 43. A monotone regret function DF satisfies Dα(ρkσ)= α−11ln tr σ12−ααρσ12−αα α , and we note that sufficiency. thisquantitycanbed(cid:16)efinhe(cid:12)dandmanipu(cid:12)lait(cid:17)edasintheproofof Proof: See [9, Prop. 7]. (cid:12) (cid:12) ReepandMüller-Hermes(cid:12)aslongasth(cid:12)ealgebraisassociative. Combining the previous results we get that information divergence on a special Jordan algebra satisfies sufficiency. Lemma 44. If a regret function on a state space of rank 2 Anotherway to state the resultis thata regretfunctionthat satisfies sufficiency, then the state space is spectral. is based on a function F that is not differentialbe violates monotonicity. Proof: For i = 1,2 let σ and ρ denote pure states and i i let φ is a test such that φ (σ )=1 and φ (ρ )=0. Let m Theorem46. Ifastatespaceofrank2hasamonotoneregret i i i i i i denotethemidpointm = 1σ +1ρ .Thenthetransformation function, then the state space can be represented by a spin i 2 i 2 i Φ given by factor. Φ(π)=φ (π)σ +(1−φ (π))ρ Proof: First we note that a monotoneregretfunctionis a 1 2 1 2 Bregman divergence. Since a monotone Bregman divergence is sufficient for the pair (σ ,m ) with recovery map Ψ given 1 1 satisfies sufficiency the state space must be balanced. Let c by denotethecenterofthestatespace.Withoutlossofgenerality Ψ(π)=φ (π)σ +(1−φ (π))ρ . 2 1 2 1 we will assume that the state space S is two dimensional.Let Using sufficiency we have D (pσ +(1−p)ρ ,m ) = Φdenote a transformation of the state space that first rotate it F 1 1 1 D (pσ +(1−p)ρ ,m ). Define by an angle of θand then dilate it around c with a factor r< F 2 2 2 f(p) = D (pσ +(1−p)ρ ,m ). Then 1.Thefactorcanbechoseninsuchawaythattheboundaryof F 1 1 1 D (pσ +(1−p)ρ ,m )=f(p)foranypairoforthogonal Φ(S) touches the boundary of S. Assume that the boundary F 2 2 2 states (σ ,ρ ) so this divergence is completely determined ofΦ(S)touchestheboundaryofS inthepurestateσ′ andlet 2 2 by the spectrum (p,1−p). In particular any orthogonal φdenote a test on S such that φ(σ′)=1 and φ(c)=1/2. Let decomposition must have the same spectrum so that the state Ψ denote a map that streches Φ(S) without changing φ. The space is spectral. map Ψ should strech so muchthat the boundaryof Ψ(Φ(S)) Let S denote a state space of rank two with regretfunction touch the boundary of S in a pure state ρ′ different ρ′ that is that satisfies sufficiency. Then S is spectral and therefore not co-linear with c and σ′. Let ρand σ denote the preimages balanced. Let c denote the center of S. Then any state σ has ofρ′ andσ′.Letπ denoteapureoftheshortestpathalongthe a decomposition σ = tρ+(1−t)c where ρ is a pure state. boundaryofS.Thenthereexistamixtureσ¯ =(1−s)·π+s·c We have D (σ,c) = f t+1 . Then there exists an affine thatalsohasadecompositionoftheformσ¯ =(1−t)·ρ+t·σ. F 2 function g such that F(σ) = f t+1 +g(t). For t = 0 we Then (cid:0) (cid:1) 2 get g(0)=F (c). ThereforeF ((cid:0)σ)=(cid:1)f t+21 +F (c)+h(t) σ¯′ =(1−s)·π′+s·c=(1−t)·ρ′+t·σ′. where h is a linear function that that may depend on ρ. (cid:0) (cid:1) Using monotonicity we have Proposition 45. A monotone regret function is a Bregman divergence. (1−t)·D (ρ′,σ¯′)+t·D (σ′,σ¯′) F F Proof: A regret function is given by a function F and ≤(1−t)·DF (ρ,σ¯)+t·DF (σ,σ¯) we have to prove that this function is differentiable. Since F so that is convex it is sufficient to prove that F is differentiable in anydirection.Letρand σ denotetwo states and letΦdenotea (1−t)·D (ρ′,c)+t·D (σ′,c)−D (σ¯′,c) F F F contractionaroundρbyafactort∈[0,1].ThenΦ(ρ)=ρand ≤(1−t)·D (ρ,c)+t·D (σ,c)−D (σ¯,c) Φ(σ) = (1−t)·ρ+t·σ. Let πdenote a state on the line F F F between ρand σ. Monotonicity implies that and D (σ¯′,c)≥D (σ¯,c). Since σ¯ =(1−s)·π+s·c and F F σ¯′ = (1−s)·π′ +s·c and π is a pure state π′ most also DF (Φ(σ),Φ(π))≤DF (σ,π) be a pure state. Therefore Ψ◦Φ maps pure states into pure D ((1−t)·ρ+t·σ,(1−t)·ρ+t·π)≤D (σ,π) states so that Ψ◦Φ mustbe a bijection.Since anymap of the F F state space into itself can be strechedinto a bijectionthe state lim D ((1−t)·ρ+t·σ,(1−t)·ρ+t·π)≤D (σ,π) F F t→1− space must be a disc. lim infF ((1−t)·ρ+t·σ)−ha,(1−t)·ρ+t·πi F. Locality t→1− a (cid:16) ≤infF (σ)−ha,σi (cid:17) Oftenitisrelevanttouse thefollowingweakversionofthe a sufficiency property. F (σ)− lim supha,(1−t)·ρ+t·σi ≤F (σ)−supha,σi Definition 47. The regret function D is said to be local if t→1− a a F (cid:18) (cid:19) where a on the left side should be optimal for Φ(π)and a on DF (ρ,(1−t)ρ+tσ1)=DF (ρ,(1−t)ρ+tσ2) the right hand side should be optimal for π. Then when ρ,σ are perfectly distinguishable for i = 1,2 and t ∈ i ]0,1[. lim supha,(1−t)·ρ+t·σi ≥supha,σi t→1−(cid:18) a (cid:19) a Example 48. On a state space of rank 2 any regret function which implies that the left derivative and the right derivative D is local.Thereasonis thatif σ and σ arestates thatare F 1 2 of F at π are the same so that F is differentiable at π. orthogonal to ρ then σ =σ . 1 2