Relaxation of the EM Algorithm via Quantum Annealing for Gaussian Mixture Models*

Hideyuki Miyahara, Koji Tsumura, and Yuki Sughiyama

arXiv:1701.03268v1 [stat.ML] 12 Jan 2017

*This work was supported in part by Grant-in-Aid for Scientific Research (B) (25289127), Japan Society for the Promotion of Science.
H. Miyahara and K. Tsumura are with the Department of Information Physics and Computing, the Graduate School of Information Science and Technology, The University of Tokyo, Japan.
Y. Sughiyama is with the Institute of Industrial Science, The University of Tokyo, Japan.

Abstract—We propose a modified expectation-maximization algorithm by introducing the concept of quantum annealing, which we call the deterministic quantum annealing expectation-maximization (DQAEM) algorithm. The expectation-maximization (EM) algorithm is an established algorithm for computing maximum likelihood estimates and is applied to many practical problems. However, it is known that EM depends heavily on initial values and that its estimates are sometimes trapped by local optima. To relax this problem, quantum annealing (QA) was proposed as a novel optimization approach motivated by quantum mechanics. Employing QA, we formulate DQAEM and present a theorem that supports its stability. Finally, we demonstrate numerical simulations to confirm its efficiency.

I. Introduction

Combinatorial optimization is a fundamental issue in both science and engineering. Although some problems in such optimization can be solved efficiently by well-known algorithms [1], [2], other problems in the class of NP-hard problems, e.g. the traveling salesman problem, are essentially difficult to solve. One of the effective approaches to NP-hard problems is simulated annealing (SA), which was proposed by Kirkpatrick et al. [3], [4]. SA is a generic approach to optimization in which random numbers that mimic thermal fluctuations are used to go over potential barriers in objective functions. Furthermore, its global convergence is in some sense guaranteed by Geman and Geman [5]. Later, a quantum extension of SA, called quantum annealing (QA), was proposed in physics [6], [7], [8] and has been studied intensively [9]-[17]. In QA, quantum fluctuations, instead of thermal fluctuations, are used to overcome potential barriers in objective functions, and it has been reported that QA is more effective than SA for some problems [12]. In particular, owing to quantum fluctuations, QA exhibits better performance than SA when objective functions have steep multimodality.

Such combinatorial optimization also appears in machine learning, which has attracted much interest recently [18], [19]. For example, some classes of data clustering are known to be NP-hard [20]. One common method for data clustering is as follows. Assuming that data points are generated by Gaussian mixture models (GMMs), we estimate the parameters of the GMM by the expectation-maximization (EM) algorithm [21]. However, parameter estimation sometimes fails, since EM depends on initial values and suffers from the problem of local optima. To relax this problem, Ueda and Nakano proposed the deterministic simulated annealing expectation-maximization (DSAEM) algorithm^1, which succeeds in relaxing the difficulty caused by multimodality in EM. This algorithm is based on deterministic simulated annealing (DSA)^2, which was proposed by Rose et al. [23], [24]. The essence of these approaches is to make objective functions smooth by introducing thermal fluctuations without random numbers, so that the non-convex optimization problem is handled considerably well without an increase in numerical cost.

^1 This algorithm is called the deterministic annealing expectation-maximization algorithm in Ref. [22].
^2 This algorithm is called deterministic annealing in Ref. [23].

As explained above, QA is considered to be more effective than SA under some conditions [12], and thus the quantum version of DSA is expected to be superior to it. In this paper, we propose a deterministic quantum annealing expectation-maximization (DQAEM) algorithm for Gaussian mixture models, because quantum fluctuations are expected to relax the problem of local optima in parameter estimation. In our previous paper [25], we proposed DQAEM for continuous latent variables and obtained the result that DQAEM outperformed EM. However, its applicability is limited, because the latent variables are assumed to be continuous, while most difficulties in parameter estimation come from the optimization of discrete latent variables, as in Gaussian mixture models. Thus, in this paper, we develop DQAEM for discrete latent variables and apply it to GMMs. After formulating the algorithm, we present a theorem that guarantees its stability.
Finally, to illustrate its efficiency compared with EM, we show numerical simulations in which DQAEM is applied to GMMs for data clustering.

This paper is organized as follows. In Sec. II, we review GMMs and EM to prepare for DQAEM. In Sec. III, which is the main section of this paper, we describe the formulation of DQAEM in detail and present a theorem on its convergence. In Sec. IV, we demonstrate numerical simulations and discuss its efficiency. In Sec. V, we conclude the paper.

II. Review of the expectation-maximization (EM) algorithm and Gaussian mixture models (GMMs)

In this section, we review EM to prepare for introducing our DQAEM, and we consider an estimation problem for GMMs in formulating DQAEM, because a GMM is one of the simplest models with discrete latent variables.

A. Maximum likelihood estimation (MLE) and the expectation-maximization (EM) algorithm

The aim of this subsection is to describe EM, on which DQAEM is based. First, we review maximum likelihood estimation (MLE) briefly. Suppose we have $N$ data points $Y_{\mathrm{obs}} = \{y^{(1)}, y^{(2)}, \ldots, y^{(N)}\}$ that are independent and identically distributed according to $p(y^{(i)}; \theta)$, where $\theta$ is a parameter. Moreover, we define $p(y^{(i)}, \sigma^{(i)}; \theta)$ as the probability density function for the complete data with the unobservable variables $\{\sigma^{(1)}, \sigma^{(2)}, \ldots, \sigma^{(N)}\}$; namely, $p(y^{(i)}; \theta) = \sum_{\sigma^{(i)} \in \Omega^{(i)}} p(y^{(i)}, \sigma^{(i)}; \theta)$, where $\Omega^{(i)}$ represents the domain of $\sigma^{(i)}$. Then the log likelihood function is given by

  $L(Y_{\mathrm{obs}}; \theta) = \sum_{i=1}^{N} \log p(y^{(i)}; \theta) = \sum_{i=1}^{N} \log \sum_{\sigma^{(i)} \in \Omega^{(i)}} p(y^{(i)}, \sigma^{(i)}; \theta).$  (1)

Note that $i$ in $y^{(i)}$ and $\sigma^{(i)}$ is the index of each observed data point. MLE is a technique for estimating the parameter $\theta$ of a model distribution by maximizing the log likelihood function $L(Y_{\mathrm{obs}}; \theta)$.

In general, maximizing the log likelihood function $L(Y_{\mathrm{obs}}; \theta)$ with respect to $\theta$ is difficult because it is sometimes a non-convex optimization, so we replace it with a lower bound. Using Jensen's inequality, we have

  $L(Y_{\mathrm{obs}}; \theta) = \sum_{i=1}^{N} \log \sum_{\sigma^{(i)} \in \Omega^{(i)}} P(\sigma^{(i)}|y^{(i)}; \theta') \frac{p(y^{(i)}, \sigma^{(i)}; \theta)}{P(\sigma^{(i)}|y^{(i)}; \theta')}$
  $\qquad \geq \sum_{i=1}^{N} \sum_{\sigma^{(i)} \in \Omega^{(i)}} P(\sigma^{(i)}|y^{(i)}; \theta') \log \frac{p(y^{(i)}, \sigma^{(i)}; \theta)}{P(\sigma^{(i)}|y^{(i)}; \theta')}$
  $\qquad = Q(\theta; \theta') - \sum_{i=1}^{N} \sum_{\sigma^{(i)} \in \Omega^{(i)}} P(\sigma^{(i)}|y^{(i)}; \theta') \log P(\sigma^{(i)}|y^{(i)}; \theta'),$

where

  $Q(\theta; \theta') = \sum_{i=1}^{N} \sum_{\sigma^{(i)} \in \Omega^{(i)}} P(\sigma^{(i)}|y^{(i)}; \theta') \log p(y^{(i)}, \sigma^{(i)}; \theta),$  (2)

$\theta'$ is an arbitrary parameter, and $P(\sigma^{(i)}|y^{(i)}; \theta')$ is the conditional probability. The procedure of EM then consists of the following two steps. The first, called the E step, is to compute the conditional probability $P(\sigma^{(i)}|y^{(i)}; \theta')$ by

  $P(\sigma^{(i)}|y^{(i)}; \theta') = \frac{p(y^{(i)}, \sigma^{(i)}; \theta')}{p(y^{(i)}; \theta')}, \qquad p(y^{(i)}; \theta') = \sum_{\sigma^{(i)} \in \Omega^{(i)}} p(y^{(i)}, \sigma^{(i)}; \theta'),$  (3)

where we have used Bayes' rule. The second, called the M step, is to maximize the Q function (2) with respect to $\theta$ instead of $L(Y_{\mathrm{obs}}; \theta)$. Denoting the tentative estimated parameter at the $t$-th iteration by $\theta^{(t)}$, the estimated parameter is updated by

  $\theta^{(t+1)} = \operatorname*{argmax}_{\theta} Q(\theta; \theta^{(t)}).$

At the end of this subsection, we summarize EM in Algo. 1.

Algorithm 1 Expectation-maximization (EM) algorithm
1: initialize $\theta^{(0)}$ and set $t \leftarrow 0$
2: while convergence criterion is not satisfied do
3:   calculate $P(\sigma^{(i)}|y^{(i)}; \theta^{(t)})$ $(i = 1, \ldots, N)$ with (3) (E step)
4:   calculate $\theta^{(t+1)} = \operatorname*{argmax}_{\theta} Q(\theta; \theta^{(t)})$, where $Q(\theta; \theta^{(t)})$ is (2) (M step)
5: end while
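To make the E and M steps of Algo. 1 concrete, the following is a minimal, self-contained Python sketch for a toy model with a discrete latent variable, a mixture of two biased coins; the toy model, the function name, and all numerical choices are illustrative and not taken from the paper.

    import numpy as np

    def em_two_coin_mixture(counts, n_flips, n_iter=50):
        """counts[i] = number of heads observed in n_flips tosses of a hidden coin."""
        counts = np.asarray(counts)
        rng = np.random.default_rng(0)
        pi = np.array([0.5, 0.5])            # mixing weights P(sigma = 1_k)
        p = rng.uniform(0.2, 0.8, size=2)    # head probabilities of the two coins
        for _ in range(n_iter):
            # E step: responsibilities P(sigma^(i) = 1_k | y^(i); theta^(t)) via Bayes' rule (3)
            log_post = (counts[:, None] * np.log(p)
                        + (n_flips - counts[:, None]) * np.log(1.0 - p)
                        + np.log(pi))
            log_post -= log_post.max(axis=1, keepdims=True)
            resp = np.exp(log_post)
            resp /= resp.sum(axis=1, keepdims=True)
            # M step: closed-form maximizer of the Q function (2)
            pi = resp.mean(axis=0)
            p = (resp * counts[:, None]).sum(axis=0) / (n_flips * resp.sum(axis=0))
        return pi, p

    # Example: 200 sequences of 20 flips from coins with head probabilities 0.3 and 0.8.
    rng = np.random.default_rng(1)
    z = rng.integers(0, 2, size=200)
    data = rng.binomial(20, np.where(z == 0, 0.3, 0.8))
    print(em_two_coin_mixture(data, n_flips=20))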
B. Gaussian mixture models (GMMs)

Here we introduce GMMs and their quantum mechanical representation. We follow the notation of Refs. [18], [19]. Let $y$ and $\sigma$ denote continuous observable and discrete unobservable variables, respectively. We assume that $\Omega$, the domain of $\sigma$, is given by $\{1_k\}_{k=1}^{K}$, where

  $1_k = [\underbrace{0, \ldots, 0}_{k-1}, 1, \underbrace{0, \ldots, 0}_{K-k}]^\top$

for $k = 1, \ldots, K$, so the number of elements in $\Omega$ is $K$. Specifically, $\sigma = 1_k$ when $\sigma$ denotes the $k$-th element of $\Omega$. Using this notation, the probability density function of a GMM is given by

  $p(y; \theta) = \sum_{k=1}^{K} p(y|\sigma = 1_k; \theta) \, P(\sigma = 1_k; \theta),$

where

  $p(y|\sigma = 1_k; \theta) = g(y; \mu_k, \Sigma_k), \qquad P(\sigma = 1_k; \theta) = \pi_k \quad (k = 1, \ldots, K),$

$\{\pi_k\}_{k=1}^{K}$ satisfies $\sum_k \pi_k = 1$, $g(y; \mu_k, \Sigma_k)$ is a Gaussian function with mean $\mu_k$ and covariance $\Sigma_k$ for $k = 1, \ldots, K$, and $\theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$. The joint probability density function of a GMM is therefore given by

  $p(y, \sigma; \theta) = \prod_k \left[ p(y|\sigma = 1_k; \mu_k, \Sigma_k) \, P(\sigma = 1_k; \pi_k) \right]^{\sigma_k} = \prod_k \left[ \pi_k \, g(y; \mu_k, \Sigma_k) \right]^{\sigma_k},$  (4)

where $\sigma_k$ is the $k$-th element of $\sigma$.

To introduce quantum fluctuations, we need to rewrite the above equations in the Hamiltonian formulation. Taking the logarithm of (4), the Hamiltonian for GMMs can be written as

  $H(y, \sigma; \theta) = \sum_k h_k \sigma_k,$  (5)

where $h_k = -\log\{\pi_k \, g(y; \mu_k, \Sigma_k)\}$ for $k = 1, \ldots, K$. Here, we introduce ket vectors, bra vectors, and "spin" operators to rewrite (5) in the manner of quantum mechanics. First, we define the ket vector $|\sigma = 1_k\rangle$ by $1_k$ and the "spin" operator $\hat{\sigma}_k$ by

  $\hat{\sigma}_k = |\sigma = 1_k\rangle\langle\sigma = 1_k| = \mathrm{diag}(\underbrace{0, \ldots, 0}_{k-1}, 1, \underbrace{0, \ldots, 0}_{K-k}),$

where the bra vector $\langle\sigma = 1_j|$ satisfies the orthonormality condition $\langle\sigma = 1_j|\sigma = 1_i\rangle = \delta_{ij}$. Replacing $\sigma$ with $\hat{\sigma}$, we have the Hamiltonian operator

  $H(y, \hat{\sigma}; \theta) = \sum_k h_k \hat{\sigma}_k = \mathrm{diag}(h_1, h_2, \ldots, h_K),$  (6)

which satisfies

  $\langle\sigma = 1_i| H(y, \hat{\sigma}; \theta) |\sigma = 1_j\rangle = h_i \, \delta_{ij},$

where $\delta_{ij}$ is the Kronecker delta. We use this formulation to describe DQAEM in the following section. Note that a similar expression is presented in Ref. [26].
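As a small numerical illustration of (5) and (6): for a single observation $y$, the Hamiltonian operator is the $K \times K$ diagonal matrix with entries $h_k = -\log(\pi_k g(y; \mu_k, \Sigma_k))$. The sketch below assumes SciPy's multivariate normal density as the Gaussian $g$; the function name and example values are illustrative.

    import numpy as np
    from scipy.stats import multivariate_normal

    def gmm_hamiltonian(y, pis, mus, covs):
        """Return diag(h_1, ..., h_K) of (6) for a single data point y,
        with h_k = -log(pi_k * g(y; mu_k, Sigma_k)) as in (5)."""
        h = np.array([-np.log(pi_k * multivariate_normal.pdf(y, mean=mu_k, cov=cov_k))
                      for pi_k, mu_k, cov_k in zip(pis, mus, covs)])
        return np.diag(h)

    # Example with K = 2 mixture components in two dimensions.
    H = gmm_hamiltonian(np.array([0.5, -0.2]),
                        pis=[0.4, 0.6],
                        mus=[np.zeros(2), np.ones(2)],
                        covs=[np.eye(2), 2.0 * np.eye(2)])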
III. Deterministic quantum annealing expectation-maximization algorithm (DQAEM)

First, we formulate DQAEM by using the quantum representation described in the previous section. Then we discuss its stability by showing the monotonicity of the free energy during the algorithm.

A. Formulation

In this subsection, we formulate DQAEM by employing the concept of quantum annealing [8] (also see App. A). First, we rewrite EM in the quantum representation. The log likelihood function (1) is rewritten as

  $L(Y_{\mathrm{obs}}; \theta) = \sum_{i=1}^{N} \log \mathrm{Tr}\left[ p(y^{(i)}, \hat{\sigma}^{(i)}; \theta) \right].$  (7)

Note that $\mathrm{Tr}[\,\cdot\,] = \sum_k \langle\sigma^{(i)} = 1_k| \,\cdot\, |\sigma^{(i)} = 1_k\rangle$. As explained in Sec. II-A, the Q function (2) is maximized in the M step of EM. Similarly to (7), the quantum representation of the Q function (2) is given by

  $Q(\theta; \theta') = \sum_{i=1}^{N} \mathrm{Tr}\left[ P(\hat{\sigma}^{(i)}|y^{(i)}; \theta') \log p(y^{(i)}, \hat{\sigma}^{(i)}; \theta) \right],$

where

  $p(y^{(i)}, \hat{\sigma}^{(i)}; \theta) = \exp\{-H(y^{(i)}, \hat{\sigma}^{(i)}; \theta)\},$  (8)

and $H(y^{(i)}, \hat{\sigma}^{(i)}; \theta)$ is given in Eq. (6). Furthermore, the conditional probability $P(\hat{\sigma}^{(i)}|y^{(i)}; \theta)$ is computed using Bayes' rule; that is,

  $P(\hat{\sigma}^{(i)}|y^{(i)}; \theta) = \frac{p(y^{(i)}, \hat{\sigma}^{(i)}; \theta)}{Z^{(i)}(\theta)}.$

Here, the normalization factor, which is called the partition function in physics, has the form

  $Z^{(i)}(\theta) = \mathrm{Tr}\left[ p(y^{(i)}, \hat{\sigma}^{(i)}; \theta) \right].$

Now we begin to formulate DQAEM. To introduce quantum fluctuations, we add to the original Hamiltonian $H$ a term $H'(\Gamma) = \Gamma \hat{\sigma}'$ whose $\hat{\sigma}'$ satisfies $[\hat{\sigma}_k, \hat{\sigma}'] \neq 0$ for $k = 1, \ldots, K$, and then (8) is converted to

  $p_\Gamma(y^{(i)}, \hat{\sigma}^{(i)}; \theta) = \exp\{-(H(y^{(i)}, \hat{\sigma}^{(i)}; \theta) + H'(\Gamma))\}.$  (9)

In MLE, the log likelihood function (7) is optimized. In DQAEM, on the other hand, the objective function, which is called the free energy, is given by

  $F_\Gamma(\theta) = -\log Z_\Gamma(\theta),$

where

  $Z_\Gamma(\theta) = \prod_{i=1}^{N} Z_\Gamma^{(i)}(\theta), \qquad Z_\Gamma^{(i)}(\theta) = \mathrm{Tr}\left[ p_\Gamma(y^{(i)}, \hat{\sigma}^{(i)}; \theta) \right].$

By taking into account $H'(0) = 0$ and comparing with Eq. (7), we obtain the relation between the free energy and the log likelihood function,

  $F_{\Gamma=0}(\theta) = -L(Y_{\mathrm{obs}}; \theta).$  (10)

Thus the negative free energy at $\Gamma = 0$ is the log likelihood function.

Next, we define the function $U_\Gamma(\theta; \theta')$, which corresponds to the Q function in EM, to formulate DQAEM. Using (9), the function $U_\Gamma(\theta; \theta')$ has the form

  $U_\Gamma(\theta; \theta') = \sum_{i=1}^{N} \mathrm{Tr}\left[ P_\Gamma(\hat{\sigma}^{(i)}|y^{(i)}; \theta') \log p_\Gamma(y^{(i)}, \hat{\sigma}^{(i)}; \theta) \right],$  (11)

where

  $P_\Gamma(\hat{\sigma}^{(i)}|y^{(i)}; \theta) = \frac{p_\Gamma(y^{(i)}, \hat{\sigma}^{(i)}; \theta)}{Z_\Gamma^{(i)}(\theta)}.$  (12)

DQAEM is then composed of the following two steps. The first is to compute the conditional probability (12); this is called the E step of DQAEM. The second is to update the parameter $\theta^{(t)}$ by minimizing the function $U_\Gamma(\theta; \theta')$ in (11), that is,

  $\theta^{(t+1)} = \operatorname*{argmin}_{\theta} U_\Gamma(\theta; \theta^{(t)});$

this is called the M step of DQAEM. Furthermore, we decrease $\Gamma$ during the iterations. We summarize DQAEM in Algo. 2.

Algorithm 2 Deterministic quantum annealing expectation-maximization (DQAEM) algorithm
1: set $\Gamma \leftarrow \Gamma_{\mathrm{init}}$
2: initialize $\theta^{(0)}$ and set $t \leftarrow 0$
3: while convergence criterion is not satisfied do
4:   calculate $P_\Gamma(\hat{\sigma}^{(i)}|y^{(i)}; \theta^{(t)})$ $(i = 1, \ldots, N)$ with (12) (E step)
5:   calculate $\theta^{(t+1)} = \operatorname*{argmin}_{\theta} U_\Gamma(\theta; \theta^{(t)})$ with (11) (M step)
6:   decrease $\Gamma$
7: end while

B. Convergence theorem

Having proposed DQAEM in the previous subsection, we here present the theorem that guarantees its stability over iterations.

Theorem 1: Let $\theta^{(t+1)} = \operatorname*{argmin}_{\theta} U_\Gamma(\theta; \theta^{(t)})$. Then $F_\Gamma(\theta^{(t+1)}) \leq F_\Gamma(\theta^{(t)})$ holds. Moreover, the equality holds if and only if $U_\Gamma(\theta^{(t+1)}; \theta^{(t)}) = U_\Gamma(\theta^{(t)}; \theta^{(t)})$ and $S_\Gamma(\theta^{(t+1)}; \theta^{(t)}) = S_\Gamma(\theta^{(t)}; \theta^{(t)})$, where $S_\Gamma(\theta; \theta') = \sum_{i=1}^{N} \mathrm{Tr}\left[ P_\Gamma(\hat{\sigma}^{(i)}|y^{(i)}; \theta') \log P_\Gamma(\hat{\sigma}^{(i)}|y^{(i)}; \theta) \right]$.

This theorem implies that DQAEM converges to the global optimum or at least a local optimum. We mention that the global convergence of EM is discussed by Dempster et al. [21] and Wu [27], and their discussions apply to DQAEM.
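The quantities in (9)-(12) are straightforward to evaluate numerically, since each $p_\Gamma$ is a $K \times K$ matrix exponential. The following is a minimal sketch under that reading, assuming SciPy is available; the function names are illustrative.

    import numpy as np
    from scipy.linalg import expm

    def p_gamma(H_diag, gamma, sigma_prime):
        """Eq. (9): exp(-(H + Gamma * sigma')) for a single data point."""
        return expm(-(H_diag + gamma * sigma_prime))

    def free_energy(H_list, gamma, sigma_prime):
        """F_Gamma(theta) = -sum_i log Tr[p_Gamma^(i)]; the product over i in
        Z_Gamma becomes a sum of logarithms."""
        return -sum(np.log(np.trace(p_gamma(H, gamma, sigma_prime))) for H in H_list)

    def e_step_operator(H_diag, gamma, sigma_prime):
        """Eq. (12): P_Gamma = p_Gamma / Z_Gamma^(i); its trace is 1 by construction."""
        p = p_gamma(H_diag, gamma, sigma_prime)
        return p / np.trace(p)

    # At Gamma = 0, expm(-H_diag) = diag(pi_k g_k), so -free_energy(..., 0, ...)
    # recovers the log likelihood, consistent with (10), and e_step_operator
    # reduces to the diagonal EM posterior of (3).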
IV. Numerical simulations

In this section, we carry out numerical simulations to confirm the performance of DQAEM. In the first subsection, we present the setup of the numerical simulations, and in the following subsection we provide numerical results.

A. Mathematical setup

We estimate the parameters of GMMs by using both DQAEM and EM. Suppose $N$ data points $Y_{\mathrm{obs}} = \{y^{(1)}, y^{(2)}, \ldots, y^{(N)}\}$ are identically sampled from a GMM with $K = 3$, where the GMM is given by (4). In EM, the updating equations for $\theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$ are determined by the derivative of the Q function (2) with respect to $\theta$. The parameter $\theta^{(t+1)}$ of the GMM at the $(t+1)$-th iteration is then given by

  $\pi_k^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} P(\sigma^{(i)} = 1_k | y^{(i)}; \theta^{(t)}),$  (13)

  $\mu_k^{(t+1)} = \frac{\sum_{i=1}^{N} y^{(i)} P(\sigma^{(i)} = 1_k | y^{(i)}; \theta^{(t)})}{\sum_{i=1}^{N} P(\sigma^{(i)} = 1_k | y^{(i)}; \theta^{(t)})},$  (14)

  $\Sigma_k^{(t+1)} = \frac{\sum_{i=1}^{N} (y^{(i)} - \mu_k^{(t+1)})(y^{(i)} - \mu_k^{(t+1)})^\top P(\sigma^{(i)} = 1_k | y^{(i)}; \theta^{(t)})}{\sum_{i=1}^{N} P(\sigma^{(i)} = 1_k | y^{(i)}; \theta^{(t)})},$  (15)

where $\theta^{(t)}$ is the tentative estimated parameter at the $t$-th iteration.

In DQAEM, the updating equations for $\theta$ are determined by the derivative of the function $U_\Gamma(\theta, \theta')$ in (11) with respect to $\theta$, and then $P(\sigma^{(i)} = 1_k | y^{(i)}; \theta^{(t)})$ in (13), (14), and (15) is replaced by $P_{\mathrm{QA}}(\sigma^{(i)} = 1_k | y^{(i)}; \theta^{(t)}) = \langle\sigma^{(i)} = 1_k | P_\Gamma(\hat{\sigma}^{(i)}|y^{(i)}; \theta^{(t)}) | \sigma^{(i)} = 1_k\rangle$. That is, the updating equations for DQAEM are given by

  $\pi_k^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} P_{\mathrm{QA}}(\sigma^{(i)} = 1_k | y^{(i)}; \theta^{(t)}),$

  $\mu_k^{(t+1)} = \frac{\sum_{i=1}^{N} y^{(i)} P_{\mathrm{QA}}(\sigma^{(i)} = 1_k | y^{(i)}; \theta^{(t)})}{\sum_{i=1}^{N} P_{\mathrm{QA}}(\sigma^{(i)} = 1_k | y^{(i)}; \theta^{(t)})},$

  $\Sigma_k^{(t+1)} = \frac{\sum_{i=1}^{N} (y^{(i)} - \mu_k^{(t+1)})(y^{(i)} - \mu_k^{(t+1)})^\top P_{\mathrm{QA}}(\sigma^{(i)} = 1_k | y^{(i)}; \theta^{(t)})}{\sum_{i=1}^{N} P_{\mathrm{QA}}(\sigma^{(i)} = 1_k | y^{(i)}; \theta^{(t)})}.$

Note that the quantum effects on parameter estimation come from $P_{\mathrm{QA}}(\sigma^{(i)} = 1_k | y^{(i)}; \theta^{(t)})$. The annealing parameter $\Gamma$ is varied from its initial value to 0 over the iterations.

In this section, we assume that, in matrix notation, $\hat{\sigma}'$ is given by

  $\hat{\sigma}' = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}.$

Obviously $[\hat{\sigma}_k, \hat{\sigma}'] \neq 0$ is satisfied. Note that the size of the Hamiltonian is determined by the assumed number of mixture components.
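The following is a sketch of one DQAEM iteration under the formulas above: the responsibilities $P_{\mathrm{QA}}$ are the diagonal elements of the normalized matrix exponential, and the M step is (13)-(15) with those responsibilities. The helper names and the data layout (Y as an N x d array) are illustrative choices.

    import numpy as np
    from scipy.linalg import expm
    from scipy.stats import multivariate_normal

    def responsibilities_qa(Y, pis, mus, covs, gamma, sigma_prime):
        """P_QA(sigma^(i) = 1_k | y^(i); theta) for all i, k (array of shape N x K)."""
        R = np.empty((len(Y), len(pis)))
        for i, y in enumerate(Y):
            h = np.array([-np.log(p * multivariate_normal.pdf(y, mean=m, cov=c))
                          for p, m, c in zip(pis, mus, covs)])
            P = expm(-(np.diag(h) + gamma * sigma_prime))   # p_Gamma of (9)
            R[i] = np.diag(P) / np.trace(P)                 # diagonal of (12)
        return R

    def m_step(Y, R):
        """Updates (13)-(15) with the posterior replaced by the responsibilities R."""
        Y = np.asarray(Y)
        Nk = R.sum(axis=0)
        pis = Nk / len(Y)
        mus = [R[:, k] @ Y / Nk[k] for k in range(R.shape[1])]
        covs = [((Y - mus[k]).T * R[:, k]) @ (Y - mus[k]) / Nk[k]
                for k in range(R.shape[1])]
        return pis, mus, covs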
B. Numerical results

In this subsection, using the data set shown in Fig. 1(a), we compare DQAEM, EM, and DSAEM, which was proposed in Ref. [22]. This data set is generated by a GMM consisting of three two-dimensional Gaussian functions whose means are $(X, Y) = (-3, 0)$, $(0, 0)$, and $(3, 0)$. Here we set $\Gamma_{\mathrm{init}} = 1.0$ in DQAEM in order to discuss the effect of quantum fluctuations simply. We also choose the annealing parameter in DSAEM as $\beta_{\mathrm{init}} = 0.7$; note that in DSAEM the annealing parameter is given by temperature. Furthermore, we vary $\beta$ and $\Gamma$ exponentially to 1 and 0, respectively. We plot typical transitions of the log likelihood function of EM and the negative free energies of DSAEM and DQAEM in Fig. 1(b) by red, orange, and blue lines, respectively. The value of $-712.1$ depicted by the green line in Fig. 1(b) is the optimal value in these numerical simulations. DQAEM, EM, and DSAEM give the optimal estimate or suboptimal estimates depending on initial optimization values.

Fig. 1: (a) Data set generated by three Gaussian functions whose means are $(X, Y) = (-3, 0)$, $(0, 0)$, and $(3, 0)$. (b) Number of iterations (log scale) vs. typical transitions of the log likelihood functions in EM and the negative free energies in DSAEM and DQAEM.

To understand visually how DQAEM and EM behave in parameter estimation, we illustrate the estimated Gaussian functions in the case where the log likelihood function reaches $-712.1$ in Fig. 2(a), and in one of the cases where the log likelihood function is lower than the optimal value in Fig. 2(b). The case shown in Fig. 2(b) clearly fails at data clustering.

Fig. 2: Estimated Gaussian functions (a) in the case where the log likelihood function takes the value $-712.1$ and (b) in one of the cases where the log likelihood function is lower than the optimal value. Green crosses and blue lines represent the estimated means and covariances, respectively.

However, the ratios of success for DQAEM, EM, and DSAEM differ considerably. To see the ratios of success and failure for DQAEM and EM, we performed DQAEM and EM with the same initial optimization values 1000 times and summarized the results in Table I. Here, we define a run of DQAEM or EM as a "success" when the squared errors between the estimated means of the three Gaussian functions and the true means are less than 0.3 times the covariances of the three Gaussian functions. Table I shows that DQAEM succeeds at a ratio of 97.4% while EM succeeds at a ratio of 56.6%; hence DQAEM is superior to EM. In Table II, we show the ratios of success for DQAEM, EM, and DSAEM in parameter estimation. This table also shows that DQAEM is superior to DSAEM.

TABLE I: Ratios of success and failure for DQAEM and EM.

                 DQAEM Success   DQAEM Fail   Total
  EM Success     55.9%           0.7%         56.6%
  EM Fail        41.5%           1.9%         43.4%
  Total          97.4%           2.6%         100.0%

TABLE II: Ratios of success for DQAEM, EM, and DSAEM.

  DQAEM    EM       DSAEM
  97.4%    56.6%    77.8%
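As noted above, $\beta$ and $\Gamma$ are varied exponentially toward 1 and 0, respectively; the text specifies only the initial values ($\Gamma_{\mathrm{init}} = 1.0$, $\beta_{\mathrm{init}} = 0.7$), so the decay rate in the sketch below is an assumed placeholder.

    def gamma_schedule(t, gamma_init=1.0, rate=0.95):
        """Gamma at iteration t: decays exponentially toward 0 (rate is assumed)."""
        return gamma_init * rate ** t

    def beta_schedule(t, beta_init=0.7, rate=0.95):
        """beta at iteration t: approaches 1 exponentially (rate is assumed)."""
        return 1.0 - (1.0 - beta_init) * rate ** t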
V. Conclusion

In this paper, we have proposed the deterministic quantum annealing expectation-maximization (DQAEM) algorithm for Gaussian mixture models (GMMs) to relax the problem of local optima of the expectation-maximization (EM) algorithm by introducing the mechanism of quantum fluctuations into EM. Although we have limited our attention to GMMs in this paper to simplify the discussion, the derivation presented here can be applied straightforwardly to any model with discrete latent variables. After formulating DQAEM, we have presented the theorem that guarantees its convergence. We have then given numerical simulations to show its efficiency compared with EM and DSAEM. It is expected that the combination of DQAEM and DSAEM gives better performance than DQAEM alone. Finally, one of our future works is a Bayesian extension of this work; in other words, we are going to propose a deterministic quantum annealing variational Bayes inference.

Appendix

A. Quantum annealing

Here, we briefly introduce quantum annealing (QA) to prepare for DQAEM. First, we consider the minimization problem of the Ising model, that is,

  $\min_{\{\sigma_z^{(i)}\}} H,$

where

  $H = -\sum_{i<j} J_{ij} \sigma_z^{(i)} \sigma_z^{(j)},$  (16)

$\sigma_z^{(i)} = \pm 1$ for each $i$, and $J_{ij}$ is the coupling constant between the spins at sites $i$ and $j$. Note that this problem can describe many combinatorial problems, such as the traveling salesman problem and the max-cut problem [28].

In QA, we quantize the Ising model (16) by applying magnetic fields along the $x$ axis to the model and solve the Schrödinger equation of this system while decreasing the magnetic fields. The Hamiltonian of this system is given by

  $\hat{H} = -\sum_{i<j} J_{ij} \hat{\sigma}_z^{(i)} \hat{\sigma}_z^{(j)} + \Gamma \sum_i \hat{\sigma}_x^{(i)},$

where

  $\hat{\sigma}_z^{(i)} = \underbrace{\hat{I}_{2\times2} \otimes \cdots \otimes \hat{I}_{2\times2}}_{i-1} \otimes \hat{\sigma}_z \otimes \underbrace{\hat{I}_{2\times2} \otimes \cdots \otimes \hat{I}_{2\times2}}_{N-i},$
  $\hat{\sigma}_x^{(i)} = \underbrace{\hat{I}_{2\times2} \otimes \cdots \otimes \hat{I}_{2\times2}}_{i-1} \otimes \hat{\sigma}_x \otimes \underbrace{\hat{I}_{2\times2} \otimes \cdots \otimes \hat{I}_{2\times2}}_{N-i},$

using

  $\hat{I}_{2\times2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \hat{\sigma}_z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \quad \hat{\sigma}_x = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$

and $\Gamma$ represents the strength of the magnetic fields. This is called the transverse Ising model. The Schrödinger equation that we solve in QA is thus given by

  $i\hbar \frac{\partial}{\partial t} |\psi\rangle = \hat{H} |\psi\rangle,$  (17)

where $i$ is the imaginary unit, $\hbar$ is the Dirac constant, and $|\psi\rangle$ is the ket vector. The magnetic field $\Gamma$ is set to be large at the beginning of QA, so that $|\psi\rangle$ is initially equal or close to the eigenstate of $\sum_i \hat{\sigma}_x^{(i)}$. While solving (17), we gradually decrease $\Gamma$ and finally let $\Gamma$ go to zero. Therefore, $|\psi\rangle$ gives a solution of the original Hamiltonian (16). The efficiency of QA is discussed in Refs. [8], [11], [12].
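For concreteness, the transverse-field Ising Hamiltonian above can be assembled with Kronecker products exactly as in the definitions of $\hat{\sigma}_z^{(i)}$ and $\hat{\sigma}_x^{(i)}$; the couplings $J_{ij}$ in the sketch below are random placeholders.

    import numpy as np

    I2 = np.eye(2)
    sz = np.array([[1.0, 0.0], [0.0, -1.0]])
    sx = np.array([[0.0, 1.0], [1.0, 0.0]])

    def site_operator(op, site, n):
        """I (x) ... (x) op at position `site` (x) ... (x) I on n spins."""
        out = np.array([[1.0]])
        for j in range(n):
            out = np.kron(out, op if j == site else I2)
        return out

    def transverse_ising(J, gamma):
        """H_hat = -sum_{i<j} J_ij sz_i sz_j + gamma * sum_i sx_i."""
        n = J.shape[0]
        H = np.zeros((2 ** n, 2 ** n))
        for i in range(n):
            for j in range(i + 1, n):
                H -= J[i, j] * site_operator(sz, i, n) @ site_operator(sz, j, n)
            H += gamma * site_operator(sx, i, n)
        return H

    # Example with 3 spins and random couplings.
    rng = np.random.default_rng(0)
    J = rng.normal(size=(3, 3))
    H = transverse_ising(J, gamma=1.0)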
References

[1] E. W. Dijkstra, "A note on two problems in connexion with graphs," Numerische Mathematik, vol. 1, no. 1, pp. 269-271, 1959.
[2] J. B. Kruskal, "On the shortest spanning subtree of a graph and the traveling salesman problem," Proceedings of the American Mathematical Society, vol. 7, no. 1, pp. 48-50, 1956.
[3] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671-680, 1983.
[4] S. Kirkpatrick, "Optimization by simulated annealing: Quantitative studies," Journal of Statistical Physics, vol. 34, no. 5-6, pp. 975-986, 1984.
[5] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, no. 6, pp. 721-741, Nov. 1984.
[6] B. Apolloni, C. Carvalho, and D. de Falco, "Quantum stochastic optimization," Stochastic Processes and their Applications, vol. 33, no. 2, pp. 233-244, 1989.
[7] A. Finnila, M. Gomez, C. Sebenik, C. Stenson, and J. Doll, "Quantum annealing: A new method for minimizing multidimensional functions," Chemical Physics Letters, vol. 219, no. 5-6, pp. 343-348, 1994.
[8] T. Kadowaki and H. Nishimori, "Quantum annealing in the transverse Ising model," Phys. Rev. E, vol. 58, pp. 5355-5363, Nov. 1998.
[9] G. E. Santoro, R. Martonak, E. Tosatti, and R. Car, "Theory of quantum annealing of an Ising spin glass," Science, vol. 295, no. 5564, pp. 2427-2430, 2002.
[10] G. E. Santoro and E. Tosatti, "Optimization using quantum mechanics: quantum annealing through adiabatic evolution," Journal of Physics A: Mathematical and General, vol. 39, no. 36, p. R393, 2006.
[11] E. Farhi, J. Goldstone, S. Gutmann, J. Lapan, A. Lundgren, and D. Preda, "A quantum adiabatic evolution algorithm applied to random instances of an NP-complete problem," Science, vol. 292, no. 5516, pp. 472-475, 2001.
[12] S. Morita and H. Nishimori, "Mathematical foundation of quantum annealing," Journal of Mathematical Physics, vol. 49, no. 12, 2008.
[13] J. Brooke, D. Bitko, T. F. Rosenbaum, and G. Aeppli, "Quantum annealing of a disordered magnet," Science, vol. 284, no. 5415, pp. 779-781, 1999.
[14] D. de Falco and D. Tamascelli, "Quantum annealing and the Schrodinger-Langevin-Kostin equation," Phys. Rev. A, vol. 79, p. 012315, Jan. 2009.
[15] D. de Falco and D. Tamascelli, "An introduction to quantum annealing," RAIRO - Theoretical Informatics and Applications, vol. 45, pp. 99-116, 2011.
[16] A. Das and B. K. Chakrabarti, "Colloquium: Quantum annealing and analog quantum computation," Rev. Mod. Phys., vol. 80, pp. 1061-1081, Sep. 2008.
[17] R. Martonak, G. E. Santoro, and E. Tosatti, "Quantum annealing by the path-integral Monte Carlo method: The two-dimensional random Ising model," Phys. Rev. B, vol. 66, p. 094203, Sep. 2002.
[18] C. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 1st ed. 2006, corr. 2nd printing 2007.
[19] K. P. Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[20] D. Aloise, A. Deshpande, P. Hansen, and P. Popat, "NP-hardness of Euclidean sum-of-squares clustering," Machine Learning, vol. 75, no. 2, pp. 245-248, 2009.
[21] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1-38, 1977.
[22] N. Ueda and R. Nakano, "Deterministic annealing EM algorithm," Neural Networks, vol. 11, no. 2, pp. 271-282, 1998.
[23] K. Rose, E. Gurewitz, and G. Fox, "A deterministic annealing approach to clustering," Pattern Recognition Letters, vol. 11, no. 9, pp. 589-594, 1990.
[24] K. Rose, E. Gurewitz, and G. C. Fox, "Statistical mechanics and phase transitions in clustering," Phys. Rev. Lett., vol. 65, pp. 945-948, Aug. 1990.
[25] H. Miyahara and K. Tsumura, "Deterministic quantum annealing for the EM algorithm," in American Control Conference (ACC), July 2016; arXiv:1606.01484.
[26] K. Tanaka and K. Tsuda, "A quantum-statistical-mechanical extension of Gaussian mixture model," in Journal of Physics: Conference Series, vol. 95, no. 1. IOP Publishing, 2008, p. 012023.
[27] C. F. J. Wu, "On the convergence properties of the EM algorithm," Ann. Statist., vol. 11, no. 1, pp. 95-103, 1983.
[28] A. Lucas, "Ising formulations of many NP problems," arXiv preprint arXiv:1302.5843, 2013.
