Table of Contents

SPRINGER BRIEFS IN PROBABILITY AND MATHEMATICAL STATISTICS

J. Adolfo Minjárez-Sosa

Zero-Sum Discrete-Time Markov Games with Unknown Disturbance Distribution
Discounted and Average Criteria

SpringerBriefs in Probability and Mathematical Statistics

Editor-in-Chief
Mark Podolskij, University of Aarhus, Aarhus C, Denmark

Series Editors
Nina Gantert, Technische Universität München, Münich, Nordrhein-Westfalen, Germany
Richard Nickl, University of Cambridge, Cambridge, UK
Sandrine Péché, Université Paris Diderot, Paris, France
Gesine Reinert, University of Oxford, Oxford, UK
Mathieu Rosenbaum, Université Pierre et Marie Curie, Paris, France
Wei Biao Wu, University of Chicago, Chicago, IL, USA

SpringerBriefs present concise summaries of cutting-edge research and practical applications across a wide spectrum of fields. Featuring compact volumes of 50 to 125 pages, the series covers a range of content from professional to academic. Briefs are characterized by fast, global electronic dissemination, standard publishing contracts, standardized manuscript preparation and formatting guidelines, and expedited production schedules.

Typical topics might include:
- A timely report of state-of-the-art techniques
- A bridge between new research results, as published in journal articles, and a contextual literature review
- A snapshot of a hot or emerging topic
- Lecture or seminar notes making a specialist topic accessible for non-specialist readers
- SpringerBriefs in Probability and Mathematical Statistics showcase topics of current relevance in the field of probability and mathematical statistics

Manuscripts presenting new results in a classical field, new field, or an emerging topic, or bridges between new results and already published works, are encouraged. This series is intended for mathematicians and other scientists with interest in probability and mathematical statistics. All volumes published in this series undergo a thorough refereeing process.

The SBPMS series is published under the auspices of the Bernoulli Society for Mathematical Statistics and Probability.
More information about this series at http://www.springer.com/series/14353

J. Adolfo Minjárez-Sosa
Department of Mathematics
University of Sonora
Hermosillo, Sonora, Mexico

ISSN 2365-4333 ISSN 2365-4341 (electronic)
SpringerBriefs in Probability and Mathematical Statistics
ISBN 978-3-030-35719-1 ISBN 978-3-030-35720-7 (eBook)
https://doi.org/10.1007/978-3-030-35720-7

Mathematics Subject Classification: 91A15, 90C40, 62G05

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To my two women: Francisca and Camila

Preface

Discrete-time zero-sum Markov games constitute a class of stochastic games introduced by Shapley in [65] whose evolution over time can be described as follows. At each stage, players 1 and 2 observe the current state $x$ of the game and independently choose actions $a$ and $b$, respectively. Then, player 1 receives a payoff $r(x,a,b)$ from player 2 and the game moves to a new state $y$ in accordance with a transition probability or a transition function $F$ as in (1), below. The payoffs are accumulated throughout the evolution of the game in a finite or infinite horizon under a specific optimality criterion.

Even though there are now many studies in this field under multiple variants, it is mostly assumed that all components of the game are completely known by the players. However, the environment itself in which it evolves could make this assumption unrealistic or too strong. Hence, the availability of approximation and estimation algorithms that provide players with some insights on the evolution of the game is important, so that they can select their actions more accurately.

An important feature of this book is that it will deal with a class of Markov games with Borel state and action spaces, and possibly unbounded payoffs, under discounted and average criteria, whose state process $\{x_t\}$ evolves according to a stochastic difference equation of the form

$$x_{t+1} = F(x_t, a_t, b_t, \xi_t), \quad t = 0, 1, \ldots \tag{1}$$

Here, the pair $(a_t, b_t)$ represents the actions chosen by players 1 and 2, respectively, at time $t$, and $\{\xi_t\}$ is the disturbance process, which is an observable sequence of independent and identically distributed random variables with unknown distribution $\theta$ for both players. In this scenario, our concern is in a game played over an infinite horizon evolving as follows.
At stage $t$, once the players have observed the state $x_t$, and before choosing the actions $a_t$ and $b_t$, players 1 and 2 implement a statistical estimation process to obtain estimates $\theta_t^1$ and $\theta_t^2$ of $\theta$, respectively. Then, independently, the players adapt their decisions to such estimators to select actions $a_t = a(\theta_t^1)$ and $b_t = b(\theta_t^2)$. Next the game jumps to a new state according to the transition probability determined by Eq. (1) and the unknown distribution $\theta$, and the process is repeated over and over again.

This book is the first part of a project whose objective is to make a systematic analysis on recent developments in this kind of games. Specifically, in this first part we will provide the theoretical foundations on the procedures combining statistical estimation and control techniques for the construction of strategies of the players. We generically call this combination “estimation and control” procedures. The second part of the project will deal with another class of games models, as well as with approximation and computational aspects.

The statistical estimation process will be studied from two approaches. In the first one, we assume that the distribution $\theta$ has a density $\rho$ on $\Re^k$. In this case, there is a vast literature (see, e.g., [9–11, 27] and references therein) that provides different density estimation methods that might be easily adapted to the conditions imposed by the problem being analyzed. Among these we can mention kernel density estimation, $L_q$ estimation for $q \ge 1$, and projection estimation, through which it is possible to obtain several important properties such as the rate of convergence. The second approach is provided by the empirical distribution $\theta_t$ defined by the random disturbance process $\{\xi_t\}$. This method is very general in the sense that both the random variables $\xi_t$ and the distribution $\theta$ can be arbitrary. The price that must be paid due to this generality is that its applicability is restricted because it is necessary to impose stronger conditions than those of the previous case on the game model.
Anyhow, the use of the empirical distribution has the additional advantage that it provides an approximation method of the value of the game and optimal strategies for players, in cases where the distribution $\theta$ is difficult to handle, by replacing $\theta$ with a simpler distribution given by $\theta_t$.

In general terms, our approach to obtain estimation and control procedures for both discounted and average criteria consists of combining a statistical estimation method suitable for $\theta$ with game theory techniques. Our starting point is, first, to prove the existence of a value of the game as well as measurable minimizers/maximizers in the Shapley equation. To this end, some conditions are imposed on the game model which fall within the weighted-norm approach proposed by Wessels in [76] and then fully studied in [23, 24, 31] for Markov decision processes (MDPs) and recently for zero-sum stochastic games in [32, 40, 41, 44, 48]. Thereby, the estimation method is adapted to these conditions to obtain appropriate convergence properties.

Clearly, the good behavior of the strategies obtained through the estimation and control procedures depends on the accuracy of the estimation method, and even more on the optimality criterion with which their performance is measured. For instance, it is well known that the discounted criterion strongly depends on the decisions selected in the early stages of the game, just where the estimation process yields deficient information about the unknown distribution $\theta$. So, neither player 1 nor player 2 can generally ensure the existence of discounted optimal strategies. Hence the optimality under a discounted criterion is studied in an asymptotic sense. The notion of asymptotic optimality used in this book for Markov games was motivated by Schäl [67], who introduced this concept to study adaptive MDPs. In contrast, in view of the necessary asymptotic analysis in the study of the average criterion, the strategies obtained by means of estimation and control procedures turn out to be average optimal, provided suitable ergodicity conditions hold.
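To make the stage-by-stage evolution described above concrete, the following is a minimal Python sketch, not the procedure developed in the book. It assumes a scalar state, two actions per player, and hypothetical choices of the transition function `F`, the payoff `r`, and a standard normal disturbance law standing in for the unknown $\theta$. The unknown distribution is replaced by the empirical distribution of the disturbances observed so far, and each player uses a myopic one-step plug-in rule with pure actions instead of solving the Shapley equation. Because both players observe the same disturbances, their two estimates coincide in this sketch.

```python
import random


def F(x, a, b, xi):
    """Hypothetical transition function for Eq. (1); not taken from the book."""
    return 0.5 * x + a - b + xi


def r(x, a, b):
    """Hypothetical one-stage payoff that player 2 pays to player 1."""
    return -abs(x) + 0.1 * (a - b)


def plug_in_payoff(x, a, b, samples):
    """Current payoff plus a one-step lookahead term -|next state|, averaged
    over the empirical distribution of the disturbances observed so far."""
    if not samples:
        return r(x, a, b)
    lookahead = sum(-abs(F(x, a, b, xi)) for xi in samples) / len(samples)
    return r(x, a, b) + lookahead


def choose_actions(x, samples, actions=(0, 1)):
    """Myopic rule (an assumption of this sketch): player 1 takes a pure
    maximin action, player 2 a pure minimax action, under the estimate."""
    a = max(actions,
            key=lambda a_: min(plug_in_payoff(x, a_, b_, samples) for b_ in actions))
    b = min(actions,
            key=lambda b_: max(plug_in_payoff(x, a_, b_, samples) for a_ in actions))
    return a, b


def simulate(horizon=200, seed=0):
    """Run the estimate-then-act loop and return the path and the samples."""
    rng = random.Random(seed)
    x, samples, path = 1.0, [], []
    for _ in range(horizon):
        # Estimation step: the empirical distribution is just the sample list.
        a, b = choose_actions(x, samples)
        # The disturbance is drawn from the law the players do not know.
        xi = rng.gauss(0.0, 1.0)
        samples.append(xi)
        path.append((x, a, b))
        x = F(x, a, b, xi)
    return path, samples


if __name__ == "__main__":
    path, samples = simulate()
    print("final state and actions:", path[-1])
    print("number of observed disturbances:", len(samples))
```

Early decisions in this loop rest on very few samples, which mirrors the remark above that the discounted criterion weighs most heavily the stages where the estimate is least reliable.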
According to the historical development of the theories of stochastic control and Markov games, the problem of estimation and control for MDPs, also known as an adaptive Markov control problem, has received considerable attention in recent years (see, e.g., [2, 7, 22, 25, 26, 28, 29, 33–35, 52–55, 67] and references therein). In fact, even though approximation algorithms for stochastic games and games with partial information have been studied from several points of view (see, e.g., [8, 17, 20, 43, 46, 59, 60, 63], and references therein), in the field of statistical estimation and control procedures for Markov games the literature remains scarce; we can cite, for instance, [50, 56–58, 69, 70]. In particular, [56] deals with semi-Markov zero-sum games with unknown sojourn time distribution. The works [69, 70] study repeated games assuming that the transition law depends on an unknown parameter which is estimated by the maximum likelihood method, whereas [50, 56–58] deal with the theory developed in the context of this book.

The book is organized as follows. In Chap. 1 the class of Markov game models we deal with is introduced, together with the main elements necessary to define the game problem. Chapters 2 and 3 are devoted to analyzing the discounted and the average criteria, respectively, where estimation and control procedures are presented under the assumption that the distribution $\theta$ has a density on $\Re^k$. Empirical estimation-approximation methods are given in Chap. 4. In this case, by using the empirical distribution to estimate $\theta$, both discounted and average criteria are analyzed. Finally, several examples of the class of Markov games studied throughout the book are given in Chap. 5. In this part we focus, mainly, on illustrating our assumptions on the game model, as well as on the numerical implementation of the estimation and control algorithm in specific examples.

Acknowledgments. The work of the author has been partially supported by Consejo Nacional de Ciencia y Tecnología (CONACYT-México) grant CB/2015-254306.
Special thanks to Onésimo Hernández-Lerma who has read a draft version; his valuable comments and suggestions helped to improve the book. I also want to thank my colleagues Fernando Luque-Vásquez, Oscar Vega-Amaya, and Carmen Higuera-Chan, with whom I form the “controllers team” of the University of Sonora; undoubtedly, many of the ideas discussed in our long talks are included in this book. Finally, I deeply thank Donna Chernyk, Associate Editor at Springer, for her help.

Hermosillo, Mexico
J. Adolfo Minjárez-Sosa
August 2019

Contents

1 Zero-Sum Markov Games
  1.1 Game Models
    1.1.1 Difference-Equation Games: Estimation and Control
  1.2 Strategies
  1.3 Markov Game State Process
  1.4 Optimality Criteria
2 Discounted Optimality Criterion
  2.1 Minimax-Maximin Optimality Conditions
  2.2 Discounted Optimal Strategies
    2.2.1 Asymptotic Optimality
  2.3 Estimation and Control
    2.3.1 Density Estimation
    2.3.2 Asymptotically Optimal Strategies
    2.3.3 Proof of Theorem 2.11
3 Average Payoff Criterion
  3.1 Continuity and Ergodicity Conditions
  3.2 The Vanishing Discount Factor Approach (VDFA)
    3.2.1 Difference Equation Average Game Models
  3.3 Estimation and Control Under the VDFA
    3.3.1 Average Optimal Pair of Strategies
    3.3.2 Proof of Theorem 3.5
4 Empirical Approximation-Estimation Algorithms in Markov Games
  4.1 Assumptions and Preliminary Results
  4.2 The Discounted Empirical Game
    4.2.1 Empirical Estimation Process
    4.2.2 Discounted Optimal Strategies
  4.3 Empirical Approximation Under Average Criterion

Zero-Sum Discrete-Time Markov Games with Unknown Disturbance Distribution: Discounted and Average Criteria (SpringerBriefs in Probability and Mathematical Statistics) PDF

129 Pages·2020·1.165 MB·English

by J. Adolfo Minjárez-Sosa


Detailed Information

Author:	J. Adolfo Minjárez-Sosa
Publication Year:	2020
ISBN:	9783030357191
Pages:	129
Language:	English
File Size:	1.165 MB
Format:	PDF
Price:	FREE
