
Deep Reinforcement Learning for Adaptive Resource Allocation in Virtualized Network Functions PDF

110 Pages·2018·2.48 MB·English
by Simon Ignat

Preview Deep Reinforcement Learning for Adaptive Resource Allocation in Virtualized Network Functions

DEGREE PROJECT IN ELECTRICAL ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2018

Deep Reinforcement Learning for Adaptive Resource Allocation in Virtualized Network Functions
SIMON IGNAT, [email protected]
M.Sc. in Electrical Engineering, Systems, Control and Robotics
Date: October 7, 2018
Supervisor: Johannes A. Stork
Examiner: Danica Kragic
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science

"Nobody ever figures out what life is all about, and it doesn't matter. Explore the world. Nearly everything is really interesting if you go into it deeply enough."
RICHARD FEYNMAN

Abstract

Network Function Virtualization (NFV) is the transition from proprietary hardware functions to virtualized counterparts of them within the telecommunication industry. These virtualized counterparts are known as Virtualized Network Functions (VNFs) and are the main building blocks of NFV. The transition started in 2012 and is still ongoing, with research and development moving at a high pace. It is believed that when using virtualization, both capital and operating expenses can be lowered as a result of easier deployments, cheaper systems and networks that can operate more autonomously. This thesis examines whether the current state of NFV can lower operating expenses while keeping quality of service (QoS) high by using current state-of-the-art machine learning algorithms. More specifically, the thesis analyzes the problem of adaptive autoscaling of the virtual machines (VMs) allocated by the VNFs with deep reinforcement learning (DRL). To analyze the task, the thesis implements a discrete-time model of VNFs with the purpose of capturing the fundamental characteristics of the scaling operation. It also examines the learning and robustness/generalization of six state-of-the-art DRL algorithms. The algorithms are examined since they have fundamental differences in their properties, ranging from off-policy methods such as DQN to on-policy methods such as PPO and Advantage Actor Critic. The policies are compared to a baseline P-controller to evaluate their performance with respect to simpler methods. The results from the model show that DRL needs around 100,000 samples to converge, which in a real setting would represent around 70 days of learning. The thesis also shows that the final policy applied by the agent does not show considerable improvements over a simple control algorithm with respect to reward and performance when multiple experiments with varying loads and configurations are tested. Due to the lack of data and slow real-time systems, with robustness being an important consideration, the time to convergence required by a DRL agent is too long for an autoscaling solution to be deployed in the near future. Therefore, the author cannot recommend DRL for autoscaling in VNFs given the current state of the technology. Instead, the author recommends simpler methods, such as supervised machine learning or classical control theory.
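Two figures in the abstract are easy to make concrete. The reported 100,000 samples correspond to roughly 70 days because 100,000 decisions at about one per minute is 100,000 / 1,440 ≈ 69 days, so the figure is consistent with one scaling decision roughly every minute (the exact interval is not stated in this preview). The baseline is a P-controller, i.e. a controller that adjusts the number of VMs in proportion to a utilization error. Below is a minimal sketch in Python of what such a baseline autoscaler could look like; the target utilization, gain and VM bounds are illustrative assumptions, not values taken from the thesis.

    def p_controller_scaling(current_vms, utilization,
                             target_utilization=0.6, gain=4.0,
                             min_vms=1, max_vms=20):
        """Illustrative P-controller autoscaler (assumed parameters).

        Scales the VM count in proportion to the utilization error and
        clamps the result to the allowed range.
        """
        error = utilization - target_utilization          # > 0 means overloaded
        desired = current_vms + gain * error * current_vms  # proportional correction
        return max(min_vms, min(max_vms, round(desired)))

    # Example: 5 VMs running at 80% utilization against a 60% target
    # gives 5 + 4.0 * 0.2 * 5 = 9 VMs.
    print(p_controller_scaling(current_vms=5, utilization=0.8))  # -> 9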
Sammanfattning

Network Function Virtualization (NFV) is the transition from proprietary hardware functions to virtualized counterparts of them within the telecommunication industry. These virtualized counterparts are known as Virtualized Network Functions (VNFs) and can be seen as the building blocks of NFV. The move towards virtualization started in 2012 and is still ongoing, with research and development progressing at a rapid pace. The hope is that virtualization will lower both capital and operating expenses as a result of simpler deployments, cheaper systems and more autonomous solutions. This thesis examines whether the current state of NFV can lower operating expenses while keeping the quality of service (QoS) high by using machine learning. More specifically, it examines Deep Reinforcement Learning (DRL) and the problem of adaptive autoscaling of the virtual machines used by the VNFs. To analyze the task, the thesis implements a discrete-time model of VNFs with the purpose of capturing the fundamental characteristics of scaling operations. It also examines the learning and robustness of six DRL algorithms. The algorithms are examined because they differ fundamentally in their properties, ranging from off-policy methods such as DQN to on-policy methods such as PPO and Advantage Actor Critic. The algorithms are then compared to a P-controller to evaluate their performance against simpler methods. The results of the study show that DRL needs about 100,000 interactions with the model to converge, which in a real environment would correspond to about 70 days of learning. The thesis also shows that the converged algorithms do not show considerable improvements over the simple P-controller when several experiments with varying loads and configurations are tested. Due to the lack of data and the slow real-time systems, where robustness is an important consideration, the convergence time required by a DRL agent is seen as a major problem. The author can therefore not recommend DRL for autoscaling in NFV given the current state of the technology. Instead, the author recommends simpler methods, such as supervised machine learning or classical control theory.

Acknowledgements

Firstly I would like to thank my supervisor and domain expert at Ericsson, Herbie Francis, for his support and trust in me. Our daily discussions about AI, life and the fate of humanity have helped me write this thesis and grow as a person.

I would also like to express my gratitude to Johannes Stork, my supervisor from KTH; without his support and ideas this thesis would not have been what it is.

In addition to my supervisors I also want to thank Wenfeng and Tobias for their support, Ibrahim for our daily fussball matches, Tord for his genuine interest in the project and Jörgen for his help with identifying and analysing the data from VNFs.

Lastly I would like to thank my boss, Thomas Edwall, for entrusting me with this thesis and giving me the opportunity to explore and learn about the subject of reinforcement learning.

Contents

1 Introduction
  1.1 Motivation
  1.2 Research Question
  1.3 Related Work
  1.4 Overview
2 Network Function Virtualization
  2.1 Motivation of Virtualization
  2.2 Management Operations
    2.2.1 The Scaling Action
    2.2.2 Ways to Measure Performance
    2.2.3 Getting Data from Customers
3 Reinforcement Learning
  3.1 Introduction to Reinforcement Learning
  3.2 Solving a Task using RL
    3.2.1 State Selection
    3.2.2 Reward Function
    3.2.3 Episodic and Continuous Tasks
  3.3 Mathematical Framework
    3.3.1 Markov Decision Process
    3.3.2 Value Functions
  3.4 Dynamic Programming
    3.4.1 Bellman Optimality Equations
    3.4.2 Policy Iteration
  3.5 Learning from Experience
    3.5.1 Exploring and Exploiting Experience
    3.5.2 Monte Carlo Learning
    3.5.3 Temporal Difference (TD) Learning
    3.5.4 Unifying Methodology, n-step TD Learning
    3.5.5 Function Approximation
    3.5.6 Policy Gradient Methods
4 Deep Reinforcement Learning
  4.1 Artificial Neural Networks (ANN)
    4.1.1 Feedforward Neural Networks
    4.1.2 Network Structures
    4.1.3 Training the Network
  4.2 Deep Reinforcement Learning Algorithms
    4.2.1 Neural Fitted Q (NFQ) Iteration
    4.2.2 Deep Q-Network (DQN)
    4.2.3 Trust Region Policy Optimization (TRPO)
    4.2.4 Proximal Policy Optimization (PPO)
    4.2.5 Actor Critic Algorithms
    4.2.6 Evolutionary Strategies
  4.3 Improvements to DRL Algorithms
    4.3.1 DDQN and Rainbow DQN
    4.3.2 Advantage Actor Critic (A2C) with GAE(λ)
    4.3.3 Asynchronous Learning
5 Method
  5.1 Modelling of a VNF During Scaling
    5.1.1 States in the Model
    5.1.2 Actions Possible to Perform on the Model
    5.1.3 External Load, l_t^{tr}
    5.1.4 State-transition Dynamics
  5.2 Modeling the Task
    5.2.1 Reward Signal
    5.2.2 Model Parameters
  5.3 Autoscaling using DRL on Model
  5.4 Implementation Details
6 Experiments and Results
  6.1 Experiment 1: Gathering Statistics of Training
    6.1.1 Results
    6.1.2 Interpretation of Results
  6.2 Experiment 2: Changing the Model Parameters
    6.2.1 Results
    6.2.2 Interpretation of Results
7 Discussion
  7.1 Limitations
    7.1.1 Uncertainty Due to Different Configurations and Load Patterns
    7.1.2 Suboptimal Performance due to Exploring
    7.1.3 Validating Reward Function and Design Considerations
    7.1.4 Model and the Agents Applied to It
  7.2 Ethics
8 Conclusion
  8.1 Future Work on Model
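Chapter 5 of the contents indicates that the thesis models a VNF during scaling as a discrete-time system with a state, a small set of scaling actions, an external load l_t and a reward signal. As a rough illustration of how such a model can be wired together as a simulation loop (the state variables, load pattern, transition dynamics and reward below are placeholders, not the thesis's actual definitions), consider:

    import math
    import random

    class VNFScalingEnv:
        """Toy discrete-time VNF scaling model (placeholder dynamics only)."""

        def __init__(self, max_vms=20, capacity_per_vm=100.0):
            self.max_vms = max_vms
            self.capacity_per_vm = capacity_per_vm
            self.reset()

        def reset(self):
            self.t = 0
            self.vms = 1
            return self._state()

        def _load(self):
            # Placeholder external load l_t: a daily sinusoid plus noise.
            base = 800.0 * (1 + math.sin(2 * math.pi * self.t / 1440)) / 2
            return max(base + random.gauss(0, 20), 0.0)

        def _state(self):
            load = self._load()
            utilization = load / (self.vms * self.capacity_per_vm)
            return (self.vms, load, utilization)

        def step(self, action):
            # action in {-1, 0, +1}: scale in, do nothing, scale out.
            self.vms = min(self.max_vms, max(1, self.vms + action))
            self.t += 1
            vms, load, utilization = self._state()
            # Placeholder reward: penalize overload (lost QoS) and idle VMs (cost).
            reward = -10.0 * max(utilization - 1.0, 0.0) - 0.1 * vms
            return (vms, load, utilization), reward

An agent, or the P-controller sketched earlier, is then run against step(action) one decision at a time; that interaction loop is what the abstract's 100,000-sample figure quantifies.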

Description:
known as Virtualized Network Functions (VNFs) and are the main building blocks ... is modelled as two different types of errors, explained more in detail later ... Policy iteration works by alternating between evaluating and improving.
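The last fragment of the description refers to policy iteration (Section 3.4.2 of the thesis). In the standard dynamic-programming formulation, which the description paraphrases and whose notation may differ from the thesis's own, the two alternating steps for a policy \pi are policy evaluation and policy improvement:

    V^{\pi}(s) \leftarrow \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\,\bigl[ r(s, a, s') + \gamma V^{\pi}(s') \bigr]   (evaluation)

    \pi'(s) \leftarrow \arg\max_{a} \sum_{s'} p(s' \mid s, a)\,\bigl[ r(s, a, s') + \gamma V^{\pi}(s') \bigr]   (improvement)

For a finite MDP, repeating these two steps converges to an optimal policy, with the improvement step acting greedily with respect to the latest value estimate.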
