DEGREE PROJECT IN ELECTRICAL ENGINEERING,
SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2018
Deep Reinforcement Learning for
Adaptive Resource Allocation in
Virtualized Network Functions
SIMON IGNAT
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
Deep Reinforcement Learning
for Adaptive Resource
Allocation in Virtualized
Network Functions
SIMON IGNAT, SIMON.G.IGNAT@GMAIL.COM
M.Sc. in Electrical Engineering
Systems, Control and Robotics
Date: October 7, 2018
Supervisor: Johannes A. Stork
Examiner: Danica Kragic
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science
Nobody ever figures out what life is all about, and it doesn’t matter.
Explore the world.
Nearly everything is really interesting if you go into it deeply enough.
RICHARD FEYNMAN
Abstract
Network Function Virtualization (NFV) is the transition from proprietary hardware functions to virtualized counterparts of them within the telecommunication industry. These virtualized counterparts are known as Virtualized Network Functions (VNFs) and are the main building blocks of NFV. The transition started in 2012 and is still ongoing, with research and development moving at a high pace. It is believed that when using virtualization, both capital and operating expenses can be lowered as a result of easier deployments, cheaper systems and networks that can operate more autonomously. This thesis examines whether the current state of NFV can lower the operating expenses while maintaining a high quality of service (QoS) by using current state-of-the-art machine learning algorithms. More specifically, the thesis analyzes the problem of adaptive autoscaling of virtual machines (VMs) allocated by the VNFs with deep reinforcement learning (DRL). To analyze the task, the thesis implements a discrete-time model of VNFs with the purpose of capturing the fundamental characteristics of the scaling operation. It also examines the learning and robustness/generalization of six state-of-the-art DRL algorithms. The algorithms are examined since they have fundamental differences in their properties, ranging from off-policy methods such as DQN to on-policy methods such as PPO and Advantage Actor Critic. The policies are compared to a baseline P-controller to evaluate their performance with respect to simpler methods. The results from the model show that DRL needs around 100,000 samples to converge, which in a real setting would represent around 70 days of learning. The thesis also shows that the final policy applied by the agent does not show considerable improvements over a simple control algorithm with respect to reward and performance when multiple experiments with varying loads and configurations are tested. Due to the lack of data and slow real-time systems, with robustness being an important consideration, the time to convergence required by a DRL agent is too long for an autoscaling solution to be deployed in the near future. Therefore, the author cannot recommend DRL for autoscaling in VNFs given the current state of the technology. Instead, the author recommends simpler methods, such as supervised machine learning or classical control theory.
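
To make the baseline mentioned above concrete, the following is a minimal sketch of a proportional (P) control rule for VM autoscaling: the VM count is adjusted in proportion to the deviation of measured utilization from a target. It is written in Python for illustration only; the identifiers and parameter values (target_util, gain, the VM bounds) are assumptions and not the controller used in the thesis.

    def p_controller_scaling(num_vms, utilization, target_util=0.6, gain=2.0,
                             min_vms=1, max_vms=20):
        """Return a new VM count from the deviation between measured and target utilization."""
        error = utilization - target_util        # positive error -> the VNF is overloaded
        delta = round(gain * error * num_vms)    # scale the step with the size of the deployment
        return max(min_vms, min(max_vms, num_vms + delta))

    # Example: 8 VMs at 80 % average utilization with a 60 % target
    # suggests scaling out to 11 VMs under these assumed parameters.
    print(p_controller_scaling(8, 0.80))  # -> 11
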
Sammanfattning
Network Function Virtualization (NFV) is the transition from proprietary hardware functions to virtualized counterparts of them within the telecommunication industry. These virtualized counterparts are known as Virtualized Network Functions (VNFs) and can be seen as the building blocks of NFV. The ideas around virtualization started in 2012 and are still being developed, with research and development proceeding at a high pace. The hope is that virtualization will lower both capital and operating expenses as a result of simpler deployments, cheaper systems and more autonomous solutions. This thesis examines whether the current state of NFV can lower the operating expenses while keeping the quality of service (QoS) high by using machine learning. More specifically, Deep Reinforcement Learning (DRL) is examined together with the problem of adaptive autoscaling of the virtual machines used by the VNFs. To analyze the task, the thesis implements a discrete-time model of VNFs with the purpose of capturing the fundamental characteristics of scaling operations. It also examines the learning and robustness of six DRL algorithms. The algorithms are examined since they have fundamental differences in their properties, from off-policy methods such as DQN to on-policy methods such as PPO and Advantage Actor Critic. The algorithms are then compared to a P-controller to evaluate their performance with respect to simpler methods. The results from the study show that DRL needs around 100,000 interactions with the model to converge, which in a real setting would correspond to around 70 days of learning. The thesis also shows that the converged algorithms do not show considerable improvements over the simple P-controller when multiple experiments with varying loads and configurations are tested. Due to the lack of data and the slow real-time systems, with robustness being an important consideration, the time to convergence required by a DRL agent is seen as a major problem. Therefore, the author cannot recommend DRL for autoscaling in NFV given the current state of the technology. Instead, the author recommends simpler methods, such as supervised machine learning or classical control theory.
Acknowledgements
Firstly I would like to thank my supervisor and domain expert at Ericsson, Herbie Francis, for his support and trust in me. Our daily discussions about AI, life and the fate of humanity have helped me in writing this thesis and growing as a person.
I would also like to express my gratitude to Johannes Stork as my supervisor from KTH; without his support and ideas this thesis would not have been what it is.
In addition to my supervisors I also want to thank Wenfeng and Tobias for their support, Ibrahim for our daily fussball matches, Tord for his genuine interest in the project and Jörgen for his help with identifying and analysing the data from VNFs.
Lastly I would like to thank my boss, Thomas Edwall, for entrusting me with this thesis and giving me the opportunity to explore and learn about the subject of reinforcement learning.
Contents
1 Introduction 1
1.1 Motivation . . . 1
1.2 Research Question . . . 3
1.3 Related Work . . . 3
1.4 Overview . . . 4
2 Network Function Virtualization 5
2.1 Motivation of Virtualization . . . 5
2.2 Management Operations . . . 6
2.2.1 The Scaling Action . . . 6
2.2.2 Ways to Measure Performance . . . 8
2.2.3 Getting Data from Customers . . . 11
3 Reinforcement Learning 13
3.1 Introduction to Reinforcement Learning . . . 14
3.2 Solving a Task using RL . . . 15
3.2.1 State Selection . . . 15
3.2.2 Reward Function . . . 16
3.2.3 Episodic and Continuous Tasks . . . 17
3.3 Mathematical Framework . . . 19
3.3.1 Markov Decision Process . . . 20
3.3.2 Value Functions . . . 21
3.4 Dynamic Programming . . . 22
3.4.1 Bellman Optimality Equations . . . 22
3.4.2 Policy Iteration . . . 24
3.5 Learning from Experience . . . 25
3.5.1 Exploring and Exploiting Experience . . . 25
3.5.2 Monte Carlo Learning . . . 28
3.5.3 Temporal Difference (TD) Learning . . . 29
3.5.4 Unifying Methodology, n-step TD Learning . . . 30
3.5.5 Function Approximation . . . 34
3.5.6 Policy Gradient Methods . . . 35
4 Deep Reinforcement Learning 38
4.1 Artificial Neural Networks (ANN) . . . 39
4.1.1 Feedforward Neural Networks . . . 39
4.1.2 Network Structures . . . 42
4.1.3 Training the Network . . . 43
4.2 Deep Reinforcement Learning Algorithms . . . 46
4.2.1 Neural Fitted Q (NFQ) Iteration . . . 48
4.2.2 Deep Q-Network (DQN) . . . 49
4.2.3 Trust Region Policy Optimization (TRPO) . . . 50
4.2.4 Proximal Policy Optimization (PPO) . . . 52
4.2.5 Actor Critic Algorithms . . . 54
4.2.6 Evolutionary Strategies . . . 55
4.3 Improvements to DRL Algorithms . . . 56
4.3.1 DDQN and Rainbow DQN . . . 56
4.3.2 Advantage Actor Critic (A2C) with GAE(λ) . . . 58
4.3.3 Asynchronous Learning . . . 60
5 Method 64
5.1 Modelling of a VNF During Scaling . . . 64
5.1.1 States in the Model . . . 65
5.1.2 Actions Possible to Perform on the Model . . . 66
5.1.3 External Load, l_t^tr . . . 66
5.1.4 State-transition Dynamics . . . 67
5.2 Modeling the Task . . . 71
5.2.1 Reward Signal . . . 71
5.2.2 Model Parameters . . . 72
5.3 Autoscaling using DRL on Model . . . 74
5.4 Implementation Details . . . 75
6 Experiments and Results 77
6.1 Experiment 1: Gathering Statistics of Training . . . 77
6.1.1 Results . . . 78
6.1.2 Interpretation of Results . . . 82
6.2 Experiment 2: Changing the Model Parameters . . . 83
6.2.1 Results . . . 83
6.2.2 Interpretation of Results . . . 83
7 Discussion 89
7.1 Limitations . . . 89
7.1.1 Uncertainty Due to Different Configurations and Load Patterns . . . 90
7.1.2 Suboptimal Performance due to Exploring . . . 91
7.1.3 Validating Reward Function and Design Considerations . . . 91
7.1.4 Model and the Agents Applied to It . . . 92
7.2 Ethics . . . 95
8 Conclusion 96
8.1 Future Work on Model . . . 96