Politecnico di Milano
Facoltà di Ingegneria
Scuola di Ingegneria Industriale e dell'Informazione
Dipartimento di Elettronica, Informazione e Bioingegneria

Master of Science in Computer Science and Engineering

Transfer Learning for Actor-Critic Methods in Lipschitz Markov Decision Processes

Supervisor: Prof. Marcello Restelli
Assistant Supervisors: Dott. Matteo Pirotta, Ing. Andrea Tirinzoni

Master Graduation Thesis by: Daniel Felipe Vacca Manrique
Student Id n. 852802
Academic Year 2016-2017

Actor-critic algorithm in the Importance Sampling scenario

ACRONYMS

RL      Reinforcement Learning
MDP     Markov Decision Process
POMDP   Partially Observable Markov Decision Process
MRP     Markov Reward Process
MC      Monte Carlo
FIM     Fisher Information Matrix
TD      Temporal Difference
MSE     Mean Squared Error
MSBE    Mean Squared Bellman Error
MSTDE   Mean Squared Temporal Difference Error
LSTD    Least Squares Temporal Difference
MSPBE   Mean Squared Projected Bellman Error
NEU     Norm of Expected TD Update
OPE     Operator Error
FPE     Fixed-Point Error
SGD     Stochastic Gradient Descent
SVD     Singular Value Decomposition
IS      Importance Sampling
ESS     Effective Sample Size
TL      Transfer Learning
PLC     Pointwise Lipschitz Continuous