
pruning algorithms for partially observable markov decision processes a thesis submitted to the ... PDF

140 Pages·2017·0.99 MB·English


PRUNING ALGORITHMS FOR PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES

A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF MIDDLE EAST TECHNICAL UNIVERSITY BY SELİM ÖZGEN IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN ELECTRICAL AND ELECTRONICS ENGINEERING

NOVEMBER 2017

Approval of the thesis: PRUNING ALGORITHMS FOR PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES, submitted by SELİM ÖZGEN in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Electronics Engineering Department, Middle East Technical University by,

Prof. Dr. Gülbin Dural Ünver, Dean, Graduate School of Natural and Applied Sciences
Prof. Dr. Tolga Çiloğlu, Head of Department, Electrical and Electronics Engineering
Prof. Dr. Mübeccel Demirekler, Supervisor, Electrical and Electronics Eng. Dept., METU

Examining Committee Members:
Assoc. Prof. Dr. Umut Orguner, Electrical and Electronics Eng. Dep., METU
Prof. Dr. Mübeccel Demirekler, Electrical and Electronics Eng. Dep., METU
Prof. Dr. Faruk Polat, Computer Eng. Dep., METU
Prof. Dr. Ömer Morgül, Electrical Eng. Dep., Bilkent University
Assist. Prof. Dr. Mehmet Tan, Computer Eng. Dep., TOBB ETU

Date: 30.11.2017

I hereby declare that all information in this document has been obtained and presented in accordance with academic rules and ethical conduct. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.

Name, Last Name: SELİM ÖZGEN
Signature:

ABSTRACT

PRUNING ALGORITHMS FOR PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES

Özgen, Selim
Ph.D., Department of Electrical and Electronics Engineering
Supervisor: Prof. Dr. Mübeccel Demirekler
November 2017, 120 pages

The value function of a partially observable Markov decision process can be represented as a piecewise linear function if the state, action, and observation spaces are discrete. The exact value iteration algorithm searches for this value function by creating an exponential number of linear functions at each step, many of which can be pruned without changing the value of the value function. The pruning procedure is made possible by the use of linear programming.

This study first gives a geometric framework for the pruning procedure. It shows that the linear programming iterations correspond to the selection of different convex regions in the vector space representation of the pruning problem. We also put forward an algebraic framework, which concerns the utilization and maintenance of linear programs. It shows how the problem can be decomposed into small LPs and what the LP iterations correspond to. The relations between these two theoretical frameworks are also exploited.

The exponential increase in the number of vectors in any step of the exact value iteration algorithm is due to an operation called the cross-sum addition of a set of vectors. This operation results in a new set of vectors. It is known that for any of the summand vectors in this new set to be non-dominated, the addend vectors entering the cross-sum addition should have intersecting support sets. The geometric and algebraic frameworks are further extended to exploit this particular property of the cross-sum operation.

Two novel pruning algorithms are offered in this study. The first algorithm, called FastCone, can be used for pruning any given set of vectors. For a given set of clean vectors at any step, the algorithm quickly searches for the convex region that a dirty vector is in, and tries to find a new clean vector only if the given set of clean vectors is not sufficient to make the decision about this dirty vector.

The second algorithm is called Cross-Sum Pruning with Multiple Objective Functions, where the aim is to find the vectors that have non-intersecting support sets with the current active vectors in each simplex iteration. This approach is useful because when two vectors from two different sets with non-intersecting support sets are detected, it is possible to delete all ordered pairs containing these two vectors. This detection amounts to a simple sign check of the coefficients of a row of the simplex tableau.

To show the algorithms' performance, both algorithms have been compared to the conventional algorithms and their revised versions, both analytically and experimentally.

Keywords: decision-theoretic planning, Markov decision processes, partial observability, linear programming

ÖZ

KISMİ GÖZLEMLENEBİLİR MARKOV KARAR SÜREÇLERİ İÇİN BUDAMA ALGORİTMALARI

Özgen, Selim
Ph.D., Department of Electrical and Electronics Engineering
Supervisor: Prof. Dr. Mübeccel Demirekler
November 2017, 120 pages

In partially observable Markov decision processes whose state, action, and observation spaces are discrete, the value function can be represented as a piecewise linear function. While searching for this value function, the exact value iteration algorithm creates an exponential number of linear functions at each step. A significant portion of these functions can be eliminated without changing the value of the value function at all. This pruning procedure is made possible by the use of linear programming.

This study first gives a geometric framework for the pruning procedure. It shows that the linear programming iterations correspond to the selection of different convex regions in the vector space representation of the pruning problem. In addition, an algebraic framework for the pruning problem is presented. This framework is built on the construction and use of linear programs. It explains how the problem can be solved using smaller linear programs and what the iterations of the linear programs mean. A relation between the geometric and algebraic frameworks of the problem is also established.

The exponential increase in the number of vectors at each step of the exact value iteration algorithm is caused by the cross-sum operation performed on the given sets of vectors. This operation produces a new set of vectors. It is known that, to see whether any of the summed vectors in the newly formed set can be eliminated, it suffices to look at the intersection of the support sets of the addend vectors entering the cross-sum operation. The present study uses the given geometric and algebraic frameworks to examine the properties of the cross-sum operation.

Two new pruning algorithms are proposed in this study. The first, FastCone, can be used for any given set of vectors. For a set of clean vectors given at any moment of the algorithm, the convex region that the selected dirty vector falls in is found quickly. If the solution found is not sufficient to eliminate the selected dirty vector, clean vectors that may be useful for this purpose are sought.

The second algorithm is named Cross-Sum Pruning with Multiple Objective Functions. Its aim is to identify the vectors whose support sets have an empty intersection with those of the vectors active at any simplex step. The function of this operation can be summarized as follows. If the intersection of the support sets of two vectors taken from two different sets is empty, it becomes possible to eliminate all ordered pairs containing these two vectors. To determine that this intersection is empty, it suffices to check the signs in one row of the simplex tableau.

To demonstrate the performance of the algorithms, the proposed algorithms are compared analytically and experimentally with the conventional algorithms and their revised versions.

Keywords: decision-theoretic planning, Markov decision processes, partial observability, linear programming

To my grandmother Nezaket Erigür
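As a concrete illustration of the objects the abstract refers to, the following is a minimal sketch of a piecewise linear value function, the cross-sum of vector sets, and a simple pruning step. It is not the thesis's method: the function names `value`, `cross_sum`, and `pointwise_prune` and the example vectors are assumptions made for illustration, and the pruning shown is plain pointwise dominance, which is weaker than the linear programming test the abstract describes.

```python
from itertools import product


def value(vectors, belief):
    """Piecewise linear value: the largest inner product over all vectors."""
    return max(sum(a * b for a, b in zip(v, belief)) for v in vectors)


def cross_sum(*vector_sets):
    """Every sum obtained by picking exactly one vector from each set."""
    return [tuple(map(sum, zip(*choice))) for choice in product(*vector_sets)]


def pointwise_prune(vectors):
    """Drop duplicates and vectors matched or beaten in every state by another.

    Assumption: this is only the cheap pointwise check. It cannot detect a
    vector that is useless only because of a combination of other vectors;
    that case needs the LP-based test.
    """
    unique = list(dict.fromkeys(vectors))  # remove duplicates, keep order
    kept = []
    for v in unique:
        dominated = any(
            w != v and all(wi >= vi for wi, vi in zip(w, v)) for w in unique
        )
        if not dominated:
            kept.append(v)
    return kept


# Hypothetical two-state example with two sets of two vectors each.
set_a = [(4, 0), (0, 4)]
set_b = [(2, 2), (0, 0)]

summed = cross_sum(set_a, set_b)   # 2 * 2 = 4 candidate vectors
pruned = pointwise_prune(summed)   # the dominated candidates are removed

belief = (0.5, 0.5)
print(summed, pruned, value(summed, belief), value(pruned, belief))
```

The size of the cross-sum is the product of the sizes of the input sets, which is the source of the exponential growth mentioned in the abstract. In this example pruning leaves the value at the chosen belief unchanged. A vector such as `(3, 3)` added to `[(6, 2), (2, 6)]` survives the pointwise check even though it is never the maximizer at any belief, which is why an LP-based test is needed for exact pruning.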

