ebook img

DTIC ADA439729: Exploiting Application Tunability for Efficient, Predictable Parallel Resource Management PDF

19 Pages·0.23 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview DTIC ADA439729: Exploiting Application Tunability for Efficient, Predictable Parallel Resource Management

Exploiting Application Tunability for Efficient, Predictable Parallel Resource Management FangzheChang,VijayKaramcheti,andZviKedem DepartmentofComputerScience CourantInstituteofMathematicalSciences NewYorkUniversity (cid:0) fangzhe,vijayk,kedem @cs.nyu.edu (cid:1) Abstract Parallelcomputingis becomingincreasingcentraland mainstream, drivenboth by the widespread availabilityofcommoditySMPandhigh-performanceclusterplatforms,aswellasthegrowinguseof parallelismingeneral-purposeapplicationssuchasimagerecognition,virtualreality,andmediaprocess- ing. Inadditiontoperformancerequirements,thelattercomputationsimposesoftreal-timeconstraints, necessitating efficient, predictable parallel resource management. Unfortunately, traditional resource management approachesin both parallel and real-time systems are inadequate for meeting this objec- tive; the parallelapproachesfocus primarilyon improvingapplicationperformanceand/orsystemuti- lizationatthecostofarbitrarilydelayingagivenapplication,whilethereal-timeapproachesareoverly conservativesacrificingsystemutilizationinordertomeetapplicationdeadlines. Inthispaper,weproposeanovelapproachforincreasingparallelsystemutilizationwhilemeeting applicationsoft real-timedeadlines. Our approach exploitsthe applicationtunabilityfound in several general-purposecomputations. Tunabilityreferstoanapplication’sabilitytotradeoffresourcerequire- mentsovertime,whilemaintainingadesiredlevelofoutputquality.Inotherwords,alargeallocationof resourcesinonestageofthecomputation’slifetimemaycompensate,inaparameterizablemanner,for asmallerallocationinanotherstage. Wefirstdescribelanguageextensionstosupporttunabilityinthe Calypsoprogrammingsystem, acomponentofthe MILANmetacomputingproject, and evaluatetheir expressivenessusinganimageprocessingapplication.Wethencharacterizetheperformancebenefitsof tunability,usingasynthetictasksystemtosystematicallyidentifyitsbenefitsandshortcomings.Ourre- sultsareveryencouraging:applicationtunabilityisconvenienttoexpress,andcansignificantlyimprove parallelsystemutilizationforcomputationswithpredictabilityrequirements. 1 Introduction Parallelcomputing has become central and mainstream, driven both by a decreasein cost and a growth in itsgeneral-purposeapplicability. Thecommoditynatureofsmall-scalesymmetricmultiprocessors(SMPs) and high-performance cluster platforms [4] has ensured the widespread availability of relatively inexpen- sive hardware platforms. Moreover, these platforms are increasingly being used to run general-purpose applicationssuchasimagerecognition,virtualreality, and mediaprocessing[6]. However, theselatterap- plications introduce additional demands on the management of resources in a parallel system. On top of the performance requirementstraditionally associated with parallel applications(e.g., floating-pointinten- sive scientific codes), these applications impose additional soft real-time constraints requiring completion ofdifferentportionsoftheapplicationwithinspecificintervals oftime. Forexample,anapplicationthatis trying to analyze a live video feed to recognize important artifacts for archival and classificationpurposes 1 Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. 1. REPORT DATE 3. DATES COVERED 1998 2. REPORT TYPE - 4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER Exploiting Application Tunability for Efficient, Predictable Parallel 5b. GRANT NUMBER Resource Management 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION Defense Advanced Research Projects Agency,3701 North Fairfax REPORT NUMBER Drive,Arlington,VA,22203-1714 9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S ACRONYM(S) 11. SPONSOR/MONITOR’S REPORT NUMBER(S) 12. DISTRIBUTION/AVAILABILITY STATEMENT Approved for public release; distribution unlimited 13. SUPPLEMENTARY NOTES The original document contains color images. 14. ABSTRACT see report 15. SUBJECT TERMS 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF 18. NUMBER 19a. NAME OF ABSTRACT OF PAGES RESPONSIBLE PERSON a. REPORT b. ABSTRACT c. THIS PAGE 18 unclassified unclassified unclassified Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18 needs to complete its processing by the time the next frame arrives. Thus, such applications necessitate efficient, predictable parallel resource management that must simultaneouslyoptimizeresourceutilization andensurethatapplicationsmeettheirdeadlines.1 Unfortunately, traditional resource management approaches in both parallel and real-time systems are inadequateformeetingtheseobjectives. Theparallelapproachesfocusprimarilyonimprovingapplication performanceand/orsystemutilizationatthecostofprovidingonlybesteffortguaranteestotheapplication. A specific application can experience arbitrary delay which may grow with the number of applications contendingfortheresources. Clearly,suchdelaysareunacceptableforsoftreal-timeapplicationsthatwork with continuous media. Real-time systems are better at providing predictable guarantees to applications. However, they do so by being overly conservative, ensuring that enough resources are available for each applicationtomeetitsdeadline. Admissioncontrolisusedtoensurethatthesystemisalwaysunderloaded, providingapplicationpredictabilityatthecostofsystemutilization. Inthispaper, weproposea novelapproachforincreasingparallelsystemutilizationwhilemeetingap- plicationsoftreal-timedeadlines. Ourapproachexploitstheapplicationtunabilityfoundinseveralgeneral- purposecomputationssuchastheonesdescribedabove. Tunabilityreferstoanapplication’s abilitytotrade off resource requirements over time, while maintaining a desired level of output quality. In other words, a large allocationof resourcesin one stage of the computation’s lifetimemaycompensate, in a parameter- izable manner, for a smaller allocation in another stage. For example, the artifact recognition application may first sample different portions of the image to decide on interesting regions, and then run a resource intensive algorithmon these regions; spendingmore resources on the sampling step reduces the work that willneedtobeperformedintheanalysisstep. Applicationtunabilityprovides flexibilitytotheunderlying resourcemanagementsystem, which can now use the choice inresourceallocationprofilesto increasethe numberofapplicationsthatcanbeadmittedintothesystem(improvingutilization),whilestillensuringthat anapplicationmeetsitsreal-timerequirements. Thispaperdescribessupportforpredictable,tunableapplicationsintheMILANparallelanddistributed computing system [17]. We first describe language extensions to the Calypso programming language for expressing application tunability, and evaluate their effectiveness by describing the construction of a tun- able image processing application (junction detection). We then characterize the performance benefits of applicationtunability, byfirstproposingacompetitiveheuristicfortheunderlyingschedulingproblem,and then using a synthetic task system to systematically identify the benefits and shortcomings of tunability. Wefindthattunabilityhelpsimprovebothsystemutilizationandjobthroughput(thenumberofjobswhich meet their deadlines) over a wide range of task parameters, such as arrival rate, deadline laxity, and the shapeoftherequiredresourceprofile. Ourresultsareveryencouraging: applicationtunabilityisconvenient to express, and can significantly improve parallel system utilization for computations with predictability requirements. The rest of this paper is organized as follows. Section 2 provides relevant backgroundon the MILAN system and the Calypso programming language. In Section 3, we present the overall MILAN resource managementarchitectureforpredictablecomputations.Section4describesextensionstotheCalypsosource language for supporting tunability. The performance impact of tunability is characterized in Section 5. Section6placesourworkincontextwithrelatedeffortsandweconcludeinSection7. 1Inthispaperweassumethatprocessorsaretheprimaryresourcethatisbeingmanaged: applicationsrequestnon-preemptive allocationofaspecificnumberofprocessorsforafixedamountoftime. 2 2 Background and Project Context We describe, in turn, the MILAN parallel and distributed computing system, and the Calypso program- ming language. Application tunability is expressed employing extensions to the Calypso language, and is exploitedbytheMILANresourcemanager. 2.1 TheMILANSystem The MILAN system supports robust, predictable parallel computation over a possibly heterogenous col- lection of resources. MILANtakes advantage of two execution techniqueswith strongtheoreticalfounda- tions [5]—two-phase idempotent execution strategy, and eager scheduling—to provide programmers with the view of a fault-free virtual shared memory environment, even when the underlying resources may in- cur faults and exhibit wide variations in processing speeds. This support is exposed to the programmer in the form of several programming systems: Calypso [1] described in further detail below, Chime [15] whichsupportsdistributedexecutionofCC++[3]programs,andCharlotte[2]whichprovidesaweb-based metacomputing infrastructure. In addition, the MILAN system consists of supporting infrastructure such as ResourceBroker (a system for dynamically managing the association and integration of resources into multipleparallelcomputationsaccordingtouser-specifiedpolicies)andKnittingFactory(atoolkitforcon- structionofdistributedapplicationsinanunpredictablemetacomputingenvironment). The MILAN resource manager allocates system resources among competing applications based upon theirresourceandpredictabilityrequirements,andprovidesthecontextforthisresearch. Section3describes theresourcemanageringreaterdetail. 2.2 TheCalypsoProgrammingLanguage The programmingmodel supportedby Calypsois one of parallel tasks insertedinto a sequentialprogram. Theseparalleltasksareresponsibleforperformingthecomputationallyintensivework,whilethesequential code is responsible for the high-level control-flow and I/O. Figure 1 illustrates a fragment of an evolving Calypso execution. Within a parallel step, Calypso supports CREW (concurrent read, exclusive write) semanticstoshareddatastructures,withupdatesvisibleonlyattheendofthecurrentstep. Additionally,the paralleltasks are idempotent, implying thata code segment can be executed multipletimes (with possibly some partial executions), with exactly-once semantics. These multiple executions mask any faults in the underlyingresources. CalypsoaugmentsstandardC++withfourkeywords: shared,parbegin,parend,androutine. Globallysharedvariablesaredeclaredusingthekeywordshared. parbeginandparendhelpdelimit aparallelstepconsistingofasequenceofroutinestatements: parbegin routine [int-exp](int width, int number) routine-body1 routine [int-exp](int width, int number) routine-body2 parend; The routine statements specify the tasks within the parallel step: routine-body1 and routine-body2 are sequential C++ program fragments, int-exp specifies an integer expression indicating the number of copiesofeachroutinethatneedtobecreatedwithintheparallelstep,andwidthandnumberarearguments 3 step 1 : sequential(cid:13) step 2 : parallel(cid:13) step 3 : sequential(cid:13) step 4 : parallel(cid:13) step 5 : sequential(cid:13) Figure1: AfragmentofanevolvingCalypsoexecution provided to each task denoting, respectively, the number of tasks created and the sequence number of the specific task among these tasks. As shown in the code fragment above, each parallel step may consist of multiple routine statements. Concurrency exists both inside one routine, as well as among multiple routineswithinthesameparallelstep. 3 Resource Management Architecture The MILAN resource management architecture, shown in Figure 2, consists of two major components: an application-level QoS agent and the system-level QoS arbitrator.2 The QoS agent communicates the applicationresourceandpredictabilityrequirementstotheQoSArbitrator, whichsatisfiesthisrequest(and thosefromotherapplications)byprovidinganappropriateresourceallocation.Wediscussthesecomponents indetailbelow. 3.1 ComponentsoftheArchitecture QoS agent The QoS agent acts on behalfof the applicationto negotiate an appropriatelevel of resource reservation/allocation with the QoS arbitrator. The QoS agent, automatically generated from the applica- tion’s specificationbyapreprocessingstep(seeSection4fordetails),describestheapplication’s real-time constraints,itsresourcerequirements,and moreimportantlyitstunability. Tunabilityisexpressedbyindi- catingchoiceintheexecutionpathavailablefortheapplication. As Figure 2 shows, from the perspective of the QoS agent, the application is viewed as an execution path(a chain, or moregenerally, a dag)comprisingseveral tasks, eachwith its own resourcerequirements 2Thecomponentnamessignifyourfocusonprovidingpredictablequalityofservice(QoS)forapplications.Here,qualityrefers bothtothedesiredoutputsaswellasassociatedtimelinessguarantees. 4 Original Program(cid:13) Tunable Program(cid:13) QoS Specifications(cid:13) QoS Agent(cid:13) d(cid:13) d(cid:13) d(cid:13) Q(cid:13) 1(cid:13) Q(cid:13) 2(cid:13) Q(cid:13) 3(cid:13) r(cid:13) 1(cid:13) r(cid:13) 2(cid:13) r(cid:13) 3(cid:13) 1(cid:13) 2(cid:13) 3(cid:13) Request(cid:13) Allocation(cid:13) Q(cid:13) Q(cid:13) Q(cid:13) r(cid:13) 1(cid:13) r(cid:13) 2(cid:13) r(cid:13) 3(cid:13) 1(cid:13) 2(cid:13) 3(cid:13) d(cid:13) d(cid:13) d(cid:13) 1(cid:13) 2(cid:13) 3(cid:13) QoS Arbitrator(cid:13) Figure2: MILANresourcemanagementarchitecture. and deadlines. Resource requirements can be thought of as a vector of values, one for each resource in the system. Each task also has an associated output quality, closely related to the requested resources. Thequality value ofthe executionpath is obtainedby composingthe outputqualitiesof each ofthe tasks. Tunabilityisexpressedbyspecifyingmultiplesuchexecutionpaths,eachwithitsownresourcerequirement and deadlineprofiles,representingalternatewaysinwhichthe applicationcanconsumeresourcesinorder toproduceoutputswiththedesiredquality. For the above application model, the Qos agent negotiates a level of resource allocation for each task which maximizes the application output quality. In general, this negotiation can be dynamic possibly in- volvinganinitialallocationthatgetsrevisedasafunctionofchangingapplicationdemandsand/orchanging system conditions. For the results reported in the rest of the paper however, we restrict our attention to a relatively static negotiation model: the QoS agent communicates all the possible application execution paths and their resourcerequirements up front, and receives in return (from the QoS arbitrator)a resource allocationprofileforoneofthesepaths. QoSarbitrator TheQoSarbitratortakesadvantageoftheflexibleprogramspecificationprovidedbyQoS agents to enhance system utilization while satisfying the predictability requirements of each application. In MILAN, this flexibility comes from two aspects. First, application tunability provides the freedom to choose a macro-level resource allocation over time for each application. And second, the adaptiveness of the underlying fault-masking techniques (two-phase idempotent execution and eager scheduling) provide micro-level flexibility, permittingpreemptive allocation,deallocation,andreallocationofresources. Inthis paper, werestrictourfocustotheflexibilityobtainedfromapplicationtunability. 5 Uponjobarrival,theQoSarbitratorfirstperformsadmissioncontroltocheckwhetherornotapplication resource requirements can be satisfied. Application tunability increases the likelihood that an application can be admitted into the system. The QoS arbitrator scheduling algorithms (discussed in Section 5) first choosethebestexecutionpath,andthenmakeanassignmentofwhichprocessorswillexecutewhichappli- cationtasksandforwhattime. ThesedecisionsarecommunicatedbacktotheapplicationQoSagentwhich configurestheapplicationappropriately. Ingeneral,theQoSarbitratoralsomonitorssystemresources,and triggers renegotiation on detecting a significant change in resource levels (e.g., on a fault, or when new resources become available as in a metacomputingenvironment). For the purposes of this paper however, weassumethattheunderlyingsystemisfault-freeandhasafixedamountofavailableresources. 3.2 JunctionDetection: An ExampleTunableApplication Thejunctiondetection[11]applicationdetectsdistinguishedpixelsinanimagewheretheintensityorcolor changes abruptly. Junction detection is a core component of several image-processing applications, often servingasaprecursortoshapeconstructionandclassificationtasks. Ourjunctiondetectionalgorithmcon- sists of three steps. The first step samples a subset of the pixels in parallel and performs a quick test to determinewhetherornotthetestedpixelisofinterest. A pixelisofinterestifthedifferenceamonginten- sities/colorsofitsneighborpixelsisbeyondathreshold. Thesecondstepdrawsaregionofinterestaround a cluster of interesting pixels. The region is essentially a convex hull containing at least a certain number ofinterestingpixelsincloseproximity. Finally, thethirdsteprunsacompute-intensive algorithmforevery pixelintheregionsofinterest. Junction detection is a tunable application because the granularity of sampling in the first step can be parameterized,resultingindifferentresourcerequirements. Thecomputationcancompensate(withrespect ofresultquality)foracoarsersamplinginthefirsttaskbypossiblydrawingadditionaland/orlargerregions ofinterest. Thus,asmallerresourcerequirementinthefirststep(forcoarsersampling)iscompensatedfor byrequiringalargerresourceallocationinthethirdstep. Figure3demonstratesthistunability,showingtwo configurationswithdifferentsamplinggranularities,differentthresholdsfordrawingtheregionsofinterest, andconsequentlydifferentresourcerequirementsforthethirdstep. Parameters(cid:13) Time(cid:13) Parameters(cid:13) Time(cid:13) step 1(cid:13) sampleGranularity = 16(cid:13) 8(cid:13) step 1(cid:13) sampleGranularity = 64(cid:13) 2(cid:13) step 2(cid:13) searchDistance = 24(cid:13) 1(cid:13) step 2(cid:13) searchDistance = 40(cid:13) 2(cid:13) step 3(cid:13) 76(cid:13) step 3(cid:13) 81(cid:13) Figure3: Atunablejunction-detectionalgorithm. TheQoSagentforthisapplicationrepresentstunabilityintermsoftwoparameters: thesamplinggran- ularity, andasearchdistancemetricwhichdetermineshowregionsofinterestareconstructedinthesecond stepofthealgorithm. Thesamplinggranularityparameteraffectsthenumberofpixelssampledinthefirst step,withcoarsesamplingbeingcompensatedforbyahighervalueofthesearchdistance(correspondingly, a larger ormoreregions ofinterest). Theresourcerequirements,deadlines,andoutputqualitiesassociated witheachalternateexecutionpathareassumedtobeavailableapriori(thesecanbeobtainedbyprofilingon atrainingsetofrepresentativeimages). TheQoSagentcommunicatesthistasksystemtotheQoSarbitrator, 6 which chooses the path that will be executed. Note that depending upon system load, different paths may get chosen for junction-detection jobs which arrive at different times. The QoS agent then configures the applicationtoexecutealongthatpath. Inthiscase,applicationconfigurationjustrequiressettingvaluesfor thesamplinggranularityandsearchdistanceparameters. 4 Language Support for Tunability We extend the Calypso programming language to support expression of tunable, predictable applications. Theseextensionsprimarilyassociateresourcerequirementsandoutputqualityvalueswithindividualtasks, specifyhowtasksmaybeconfiguredasafunctionofavailableresources,andfinallyspecifyalternatepaths ofexecutionthroughtheprogram. TheCalypsopreprocessorusestheseextensionstoconstructaQoSagent for the program which embodies the task graph and tunability aspects of the application. As described in Section3,duringjobstartuptime,thisQoSagentnegotiateswiththeQoSarbitratorforanappropriatelevel ofresourceallocation. We first discuss in general the different models of applicationtunability, describe tunabilitysupportin Calypso,andfinallyexaminehowthejunctiondetectionprogramintroducedinSection3canbeexpressed usingtheseextensions. 4.1 Different ModelsofTunability Tunabilitymodifiesthebehaviorofanapplicationinresponsetoavailableresources. Therearetwoorthogo- naldimensionsalongwhichsuchbehaviormodificationcanbeexpressed: granularityofcodemodification, andthegranularityofresourcechangewhichcanbedetectedandacteduponbytheprogram. Thefirstdi- mensionreferstohowapplicationbehaviormodificationiseffected. Applicationbehavioriscontrolledboth atthemacro-levelbythealgorithmsituses(coarsetunability), andatthemicro-levelbycertainparameters thataffectcontrol-flowdecisions(finetunability). Theseconddimensionreferstoanapplication’s abilityto respondtoachangeinavailableresources. Someapplicationsmaybeabletotakeadvantageofevenavery smallchange in resources(continuous tunability), whileother applicationscan only respond to a finite set ofspecificresourcelevels(discretetunability). Applicationstypicallyexhibittunabilitycorrespondingtothreeofthepossiblefourcombinations:coarse- discrete, fine-discrete, and fine-continuous. The first two situations are probably most common where an application can use different algorithms (e.g., compression algorithms in a continuous media application) or modifyits controlparametersinresponseto a changein resourceavailability. However, the application can only react to discrete resource levels, not the entire range. An example of the third situation is the samplingstepofourjunction-detection program: thesamplinggranularityservesasaknobwhichcanvary applicationresourcerequirementsoveracontinuousrange. 4.2 CalypsoExtensionsforTunability TheseextensionstotheCalypsoprogramminglanguageidentifyprogramcontrolparameters,alternateex- ecution paths, and the corresponding resource and timeliness requirements of program tasks. To keep our preprocessorsimple,wefocusonsupportingonlytwomodelsoftunability: coarse-discrete(wheretunabil- ity is achieved by using different algorithms), and fine-discrete (where tunability is effected by modifying 7 parametervalues).3 Moreover,werestrictourattentiontocomputationswherethesequenceofparallelsteps isindependentofprogramcontrolflow. Mostparallelapplicationssatisfythisassumption. The language extensions can broadly be classified into three categories: declaration constructs (for identifyingcontrol parameters), task constructs(forassociatingresourceand timelinessrequirementswith tasks), and structure constructs (for specifying alternate execution paths in the code). Control parameters aredeclared(andoptionallyinitialized)withinthetask control- parametersblock,usedasinthe followingcodefragment: task control parameters (cid:0) ... (cid:1) ; These parameters are used by the QoS agent, after receiving an allocation of resources from the QoS arbitrator, toappropriatelyconfiguretheprogram. Thetaskconstruct,task,actsasa wrapperaroundasequentialorparallelCalypsostep,andspecifies the task deadline, its control parameters, and resource requirements and output quality values for each of theacceptabletaskconfigurations. Thelattercorrespondtotheassignmentofspecificvaluestothecontrol parameters. Thefollowingcodefragmentdemonstratesusageofthetaskconstruct: task name [deadline] [parameter-list] [((cid:0) parameter-values(cid:1) ,(cid:0) resource-request(cid:1) ,quality), ... ] // Calypso code for the task taskend The deadline denotes the time within which this task should complete. The parameter-list is a list of controlparameterswhichwillbeassignedavalueexactlybeforethistaskstartsattheexecutiontime. The acceptable task configurations are shown as a list in the last argument of the task construct. Each con- figuration consists of the parameter-values, which is a list of value assignments to the control parameters identifiedinparameter-list, theresource-request, whichisavectorofvaluesofsizeequaltothenumberof resource types, and the quality which describes the quality value of the task output for the current config- uration. Forthe purposesof thispaper, resource-request isa processor-time tuple, denotingthe numberof processorsrequiredforthetaskandthetimedurationtheyarerequiredfor. Two structuring constructs are provided to enable selection of an execution path through the program. The task select and task loop constructs are used to represent choice of configurations within a parallel step, and overall iterative structure of the program, respectively. The following code fragment demonstratestheiruse: task select when when-expr task, task select, or task loop constructs finally finally-code ... when when-expr task, task select, or task loop constructs finally finally-code task selectend; task loop (loop-expr) 3Supportingfine-continuoustunabilityrequiresthatthepreprocessorbeabletohandlesymbolicexpressionsforresourcere- quirementsanddeadlines.Thisismoreanimplementationlimitationratherthanafundamentalissue. 8 task, task select, or task loop constructs task loopend; task selectpermitsselectionamongmultipletaskswiththesamestepwhosereadinessischeckedusing the when-expr. The finally-codeis executed upon completionof the task and togetherwith the whencon- structpermitsexecution paths tobe defined in the program. Thetask loopconstruct isused to expressan iterativestructuresurroundingthesequentialandparallelsteps. Itsargument,loop-expr,denotesthenumber ofiterations. Bothwhen-expr andloop-expr canonlyincludeconstantsandcontrolparameters,facilitating theirevaluationatschedulingtime. TheCalypsolanguageextensionsareorthogonaltotheparallelismconstructs,enablingtunabilitytobe incrementallyincorporatedintoanapplicationprogram. 4.3 ExpressingTunabilityintheJunctionDetectionProgram Wenext demonstratethe useof theabove languageconstructsinthe context of thejunctiondetectionpro- gramdescribedinSection3. Asmentionedearlier, the programistunablewith respecttotwo parameters, controllingthesamplinggranularityinthefirststep,andthesearchdistanceinthesecondstep. Itisassumed thattheresourcerequirementsanddeadlinesforeachstepareobtainedbyseparatelyprofilingtheprogram. Herewefocusonhowapplicationflexibilityandthesepredictabilityrequirementscanbeexpressed. Figure 4 shows the pseudo code for the original (fixed) and tunable versions of the junction detection program. The sampling granularity and search distance parameters are identified as control parameters whose values will be provided by the QoS agent based upon its negotiation with the QoS arbitrator. The restoftheprogramusestheseparameterstocontroltheapplicationbehavior. Thetunabilityinthefirststep, sampleImage, is expressed using the task construct: the arguments state that the deadline for the step is 10.0 units, the step behavior is controlled by the sampleGranularity parameter, and allowable configurations correspond to sampleGranularity=16 and sampleGranularity=64. The first configurationrequires4processorsfor8.0unitsoftime,whilethesecondrequires4processorsfor2.0units oftime. Thesecondstep,markRegion,admitscoarse-discretetunability, allowinguseofdifferentalgorithms based on the sampling granularity used in the first step. As the code fragment shows, this level of tun- ability is conveniently expressed using the task select construct. The finally code fragments are used to set up another control parameter, c, which is used to describe allowable task configurationsin the third step. Based on the value of c (i.e., the task configuration chosen in the second step), only one of the computeJunctions configurations is allowed. This restriction of configurations based on which configurationswereselectedinanearlierstepmakeexplicittheapplication’s abilitytotradeoffresourcere- quirementsoveritslifetime. Alowerallocationofresourcesinthesamplingsteprequirealargerallocation ofresourcesinthejunctioncomputationstep. 5 Performance Impact of Tunability Tounderstandtheperformanceimpactoftunability, wefirstproposeagreedyheuristicforschedulingtun- ableparallelapplicationswithpredictabilityrequirements,andthenusingasynthetictasksystemsystemat- ically quantify the benefits and shortcomings of tunability. The QoS arbitrator component of the resource managementarchitecturedescribedinSection3willincorporatethisheuristic. 9

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.