Simulated annealing and Gibbs sampler
Leonid Zhukov
School of Applied Mathematics and Information Science
National Research University Higher School of Economics
12.12.2013

Simulated annealing

- Global optimization method (vs. greedy strategies, which get stuck in a local minimum)
- Works for both continuous and discrete optimization problems
- Intuition comes from thermodynamics and the physical annealing process

Algorithm:
1. Generate a trial point and evaluate the function at that location.
2. Accept the new location if it reduces the "energy" (improves the solution).
3. Accept some new locations even when they do not improve the solution.
4. The probability of accepting a non-improving location decreases as the "temperature" is lowered.

Simulated annealing

Compute the change of energy at step k: ∆E_k = E_k − E_{k−1}

- If ∆E_k ≤ 0, accept the step
- If ∆E_k > 0, accept the step with probability P(∆E_k) = e^{−∆E_k / T_k}

Cooling schedules:
- T_k = α T_0 / k
- T_k = α T_0 / log k
- T_k = T_0 / α^k

Traveling salesman problem

Travelling salesman problem: given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city? (An NP-hard problem in combinatorial optimization.)

Simulated annealing algorithm:
- Define the energy (cost) function:
  E = \sum_{i=1}^{N} \sqrt{(x_{i+1} − x_i)^2 + (y_{i+1} − y_i)^2}
- Select an initial route (a sequence of city labels)
- Iteratively improve the route by trying local changes: swap a pair of cities and reverse the section between them
- Always accept a lower-energy swap, sometimes accept a higher-energy one
- Reduce the temperature according to the cooling schedule

MCMC connection

- MCMC constructs a Markov chain that generates random samples distributed according to P(x)
- The Metropolis algorithm uses a symmetric "candidate" distribution Q(x, x*) = Q(x*, x)
- Probability of a "forward move" of the chain:
  α(x, x*) = min[1, P(x*) / P(x)]
- Consider the Boltzmann (Gibbs) distribution P(x) = (1/Z) e^{−E(x)/T}
- Probability of the move:
  α(x, x*) = min[1, e^{−(E(x*) − E(x))/T}] = min[1, e^{−∆E/T}]
- ∆E ≤ 0: always move; ∆E > 0: move with probability exp(−∆E/T)

The Gibbs sampler

- Goal: get random samples from a joint multivariate density p(x_1, ..., x_n) when it is not known explicitly or is hard to sample from
- Given the conditional univariate distributions p(x_1 | x_2, ..., x_n), p(x_2 | x_1, x_3, ..., x_n), ..., p(x_n | x_1, ..., x_{n−1})
- Used in Bayesian inference for posterior distributions
- Approximate the joint distribution (histogram), compute averages

Bivariate case

- Joint bivariate distribution: p(x, y)
- Marginal distributions:
  p(x) = ∫ p(x, y) dy
  p(y) = ∫ p(x, y) dx
- Conditional probabilities:
  p(x|y) = p(x, y) / p(y)
  p(y|x) = p(x, y) / p(x)
- Marginal from the conditional distribution:
  p(x) = ∫ p(x|y) p(y) dy = E_{p(y)}[p(x|y)]
  p(y) = ∫ p(y|x) p(x) dx = E_{p(x)}[p(y|x)]

Gibbs sampler: bivariate case

- Given: p(x|y), p(y|x)
- Choose y_0, set t = 0
- Do a "sampler scan":
  x_t ∼ p(x | y = y_t)
  y_{t+1} ∼ p(y | x = x_t)
- Repeat k times to obtain the Gibbs sequence (x_0, y_0), (x_1, y_1), ..., (x_k, y_k)
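The simulated annealing recipe for the travelling salesman problem can be sketched directly from the ingredients on the slides: Euclidean tour length as the energy, reversal of the section between two cities as the local move, Metropolis acceptance with probability exp(−∆E/T), and the cooling schedule T_k = T_0/α^k. This is only a minimal sketch; the function name anneal_tsp and the parameter values T0, alpha and n_steps are illustrative assumptions, not part of the lecture.

```python
import math
import random

def tour_length(route, cities):
    """Energy: total Euclidean length of the closed tour."""
    n = len(route)
    return sum(math.dist(cities[route[i]], cities[route[(i + 1) % n]])
               for i in range(n))

def anneal_tsp(cities, T0=1.0, alpha=1.0001, n_steps=200_000, seed=0):
    rng = random.Random(seed)
    route = list(range(len(cities)))
    rng.shuffle(route)                       # initial route: a random permutation
    energy = tour_length(route, cities)
    for k in range(1, n_steps + 1):
        T = T0 / alpha ** k                  # cooling schedule T_k = T_0 / alpha^k
        # local change: pick two positions and reverse the section between them
        i, j = sorted(rng.sample(range(len(route)), 2))
        candidate = route[:i] + route[i:j + 1][::-1] + route[j + 1:]
        dE = tour_length(candidate, cities) - energy
        # always accept an improving move; accept a worse one with prob exp(-dE/T)
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            route, energy = candidate, energy + dE
    return route, energy

if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.random(), rng.random()) for _ in range(30)]  # 30 random cities
    best_route, best_len = anneal_tsp(pts)
    print(f"best tour length found: {best_len:.3f}")
```

Recomputing the full tour length for every candidate keeps the sketch short; in practice one would evaluate only the change contributed by the two reversed edges.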
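The bivariate "sampler scan" can likewise be sketched in a few lines. Since the lecture does not fix a concrete target density, the example below assumes a standard bivariate normal with correlation rho, for which both full conditionals are known Gaussians, N(rho·y, 1 − rho²) and N(rho·x, 1 − rho²); the burn-in and sample counts are arbitrary illustrative choices.

```python
import math
import random

def gibbs_bivariate_normal(rho=0.8, n_samples=20_000, burn_in=1_000, seed=0):
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)      # std. dev. of each full conditional
    y = 0.0                              # y_0: arbitrary starting value
    samples = []
    for t in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)       # x_t     ~ p(x | y = y_t)
        y = rng.gauss(rho * x, sd)       # y_{t+1} ~ p(y | x = x_t)
        if t >= burn_in:                 # discard the initial transient
            samples.append((x, y))
    return samples

if __name__ == "__main__":
    pairs = gibbs_bivariate_normal()
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    corr = sum(x * y for x, y in pairs) / n  # ~ rho for zero-mean, unit-variance marginals
    print(f"mean x: {mean_x:+.3f}  mean y: {mean_y:+.3f}  corr: {corr:.3f}")
```

The retained pairs approximate the joint distribution (e.g. as a 2-D histogram), and averages over them estimate expectations under p(x, y), as described on the Gibbs sampler slides.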