Task Allocation and Scheduling of Concurrent Applications to Multiprocessor Systems

Kaushik Ravindran

Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2007-149
http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-149.html

December 13, 2007

Copyright © 2007, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Task Allocation and Scheduling of Concurrent Applications to Multiprocessor Systems

by Kaushik Ravindran

B.S. (Georgia Institute of Technology) 2001

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Engineering - Electrical Engineering and Computer Sciences in the Graduate Division of the University of California, Berkeley

Committee in charge:
Professor Kurt Keutzer, Chair
Professor John Wawrzynek
Professor Alper Atamtürk

Fall 2007

Abstract

Programmable multiprocessors are increasingly popular platforms for high performance embedded applications. An important step in deploying applications on multiprocessors is to allocate and schedule concurrent tasks to the processing and communication resources of the platform. When the application workload and execution profiles can be reliably estimated at compile time, it is viable to determine an application mapping statically.
Many applications from the signal processing and network processing domains are statically scheduled on multiprocessor systems. Static scheduling is also relevant to design space exploration for micro-architectures and systems.

Owing to the computational complexity of optimal static scheduling, a number of heuristic methods have been proposed for different scheduling conditions and architecture models. Unfortunately, these methods lack the flexibility necessary to enforce implementation and resource constraints that complicate practical multiprocessor scheduling problems. While it is important to find good solutions quickly, an effective scheduling method must also reliably capture the problem specification and flexibly accommodate diverse constraints and objectives.

This dissertation is an attempt to develop insight into efficient and flexible methods for allocating and scheduling concurrent applications to multiprocessor architectures. We conduct our study in four parts. First, we analyze the nature of the scheduling problems that arise in a realistic exploration framework. Second, we evaluate competitive heuristic, randomized, and exact methods for these scheduling problems. Third, we propose methods based on mathematical and constraint programming for a representative scheduling problem. Though expressiveness and flexibility are advantages of these methods, generic constraint formulations suffer from prohibitive run times even on modestly sized problems. To alleviate this difficulty, we advance several strategies to accelerate constraint programming, such as problem decompositions, search guidance through heuristic methods, and tight lower bound computations. The inherent flexibility, coupled with improved run times from a decomposition strategy, positions constraint programming as a powerful tool for multiprocessor scheduling problems. Finally, we present a toolbox of practical scheduling methods, which provide different trade-offs with respect to computational efficiency, quality of results, and flexibility.
Our toolbox is composed of heuristic methods, constraint programming formulations, and simulated annealing techniques. These methods are part of an exploration framework for deploying network processing applications on two embedded platforms: Intel IXP network processors and Xilinx FPGA based soft multiprocessors.

Professor Kurt Keutzer
Dissertation Committee Chair

"Better to remain silent and be thought a fool than to speak out and remove all doubt."
– Abraham Lincoln

Contents

List of Figures
List of Tables

1 The Trend to Single Chip Multiprocessor Systems
  1.1 Deploying Concurrent Applications on Multiprocessors
    1.1.1 The Implementation Gap
    1.1.2 A Methodology to Bridge the Implementation Gap
  1.2 The Mapping Problem for Multiprocessor Systems
    1.2.1 Static Models, Static Scheduling
    1.2.2 Complexity of Static Scheduling
    1.2.3 Common Methods for Static Scheduling
  1.3 The Quest for Efficient and Flexible Scheduling Methods
  1.4 Contributions of this Dissertation

2 A Framework for Mapping and Design Space Exploration
  2.1 A Framework for Mapping and Exploration
    2.1.1 Domain Specific Language for Application Representation
    2.1.2 The Mapping Step
    2.1.3 Performance Analysis and Feedback
  2.2 The Network Processing Domain: Applications and Platforms
    2.2.1 Network Processing Applications
    2.2.2 Intel IXP Network Processors
    2.2.3 Xilinx FPGA based Soft Multiprocessors
  2.3 Exploration Framework for Network Processing Applications
    2.3.1 Domain Specific Language for Application Representation
    2.3.2 The Mapping Step
    2.3.3 Performance Analysis and Feedback
  2.4 Motivation for an Efficient and Flexible Mapping Approach

3 Models and Methods for the Scheduling Problem
  3.1 Models for Static Scheduling
    3.1.1 The Application Task Graph Model
    3.1.2 The Multiprocessor Architecture Model
    3.1.3 Performance Model for the Task Graph
    3.1.4 Optimization Objective
    3.1.5 Implementation and Resource Constraints
  3.2 Methods for Static Scheduling
    3.2.1 Heuristic Methods
    3.2.2 List Scheduling using Dynamic Levels
    3.2.3 Evolutionary Algorithms
    3.2.4 Simulated Annealing
    3.2.5 Enumerative Branch-and-Bound
    3.2.6 Mathematical and Constraint Programming
  3.3 Scheduling Tools and Frameworks
  3.4 The Right Method for the Job

4 Constraint Programming Methods for Static Scheduling
  4.1 A Representative Static Scheduling Problem
    4.1.1 Multiprocessor Architecture Model
    4.1.2 Application Task Graph
    4.1.3 Execution Time and Communication Delay Models
    4.1.4 Valid Allocation, Valid Schedule
    4.1.5 Optimization Objective
    4.1.6 Example
    4.1.7 Implementation and Resource Constraints
    4.1.8 Complexity of the Scheduling Problem
  4.2 A Mixed Integer Linear Programming Formulation
    4.2.1 Variables
    4.2.2 Constraints
  4.3 Mapping Results for Network Processing Applications
    4.3.1 IPv4 Packet Forwarding on FPGA based Soft Multiprocessors
    4.3.2 Differentiated Services on the IXP1200 Network Processor
  4.4 A Case for an Efficient and Flexible Mapping Approach

5 Techniques to Accelerate Constraint Programming Methods
  5.1 The Concept of Problem Decomposition
    5.1.1 Related Decomposition Approaches for Scheduling Problems
    5.1.2 Overview of the Decomposition Approach
  5.2 A Decomposition Approach for Static Scheduling
    5.2.1 Master Problem Formulation
    5.2.2 Sub Problem Decomposition Constraints
    5.2.3 Algorithmic Extensions to Improve Performance
  5.3 Evaluation of the Decomposition Approach
    5.3.1 Benchmark Set
    5.3.2 Comparisons to Heuristics and Single-Pass MILP Formulations
    5.3.3 Extensibility of Constraint Programming
  5.4 An Efficient and Flexible Mapping Approach

6 A Toolbox of Scheduling Methods
  6.1 The Value of Good Heuristics
    6.1.1 Dynamic Level Scheduling Revisited
    6.1.2 Guidance for Search in Branch-and-Bound Methods
    6.1.3 Evaluation of Heuristic Search Guidance
  6.2 Simulated Annealing for Large Task Graphs
    6.2.1 A Generic Simulated Annealing Algorithm
    6.2.2 Annealing Strategy for the Representative Scheduling Problem
  6.3 Evaluation of Scheduling Methods
  6.4 The Right Method for the Job

7 Conclusions and Further Work
  7.1 Constraint Programming Methods for Scheduling
  7.2 A Toolbox of Practical Scheduling Methods
  7.3 Exploration Framework for Network Processing Applications

Bibliography