
A Statistical Approach to Performance Monitoring in Soft Real-Time Distributed Systems

Danny Bickson, Gidon Gershinsky, Ezra N. Hoch and Konstantin Shagin
IBM Haifa Research Lab, Mount Carmel, Haifa 31905, Israel
{dannybi,gidon,ezrah,konst}@il.ibm.com

arXiv:0901.2865v1 [cs.NI] 18 Jan 2009

Abstract

Soft real-time applications require timely delivery of messages conforming to the soft real-time constraints. Satisfying such requirements is a complex task, both due to the volatile nature of distributed environments and due to numerous domain-specific factors that affect message latency. Prompt detection of the root cause of excessive message delay allows a distributed system to react accordingly, which may significantly improve compliance with the required timeliness constraints.

In this work, we present a novel approach for distributed performance monitoring of soft real-time distributed systems. We propose to employ recent distributed algorithms from the statistical signal processing and learning domains, and to utilize them in a different context of online performance monitoring and root-cause analysis, for pinpointing the reasons for violation of performance requirements. Our approach is general and can be used for monitoring any distributed system; it is not limited to the soft real-time domain.

We have implemented the proposed framework in TransFab, an IBM prototype of a soft real-time messaging fabric. In addition to root-cause analysis, the framework includes facilities to resolve resource allocation problems, such as memory and bandwidth deficiency. The experiments demonstrate that the system can identify and resolve latency problems in a timely fashion.

1. Introduction

The number of distributed systems with latency requirements grows rapidly. In several domains, such as military, industrial automation, and financial markets, message latency plays a critical role. Since it is technically hard and in most cases costly to guarantee that each and every message is delivered within a predefined period of time (hard real-time), many applications impose weaker requirements by allowing a small portion of messages to exceed their deadline (soft real-time). Still, as a consequence of application complexity and the volatile nature of the distributed environment, even compliance with these weaker constraints is a challenging task. Unexpected activity bursts, message loss due to an unreliable communication medium, network buffer overflow, network congestion, resource sharing and many other unpredictable factors may result in a significant increase in the end-to-end message delay.

It is highly desirable that a distributed system adapt to the changing conditions and thus avoid violations of the latency constraints. A crucial step towards achieving this is enabling the system to identify the root cause whenever there is a degradation in performance. This, by itself, is a non-trivial problem, because in this context a symptom may be easily mistaken for the real cause or misinterpreted. For example, packet drop resulting from buffer space deficiency on the receiver side may be attributed to packet loss due to network congestion.

Distributed systems with latency constraints often employ resource reservation to ensure that the more critical components are served more promptly. The reserved resources are commonly memory space, bandwidth and CPU share. In many cases, readjustment of the resource quotas can alleviate the timeliness issues. For instance, if a certain component rapidly generates messages, it may exceed its transmission bandwidth limit and hence may have to queue messages rather than transmitting them immediately. Consequently, the delayed messages may miss their delivery deadline. This may be avoided by temporarily increasing the component's bandwidth, if possible.

The ideas above lead us to devise a framework that monitors distributed system performance, determines the root cause of increased delay, and takes corrective actions in order to avoid violation of the timeliness constraints. We propose a monitoring framework which employs a distributed root-cause analysis.
A significant advantage of the statistical approach is that, in contrast to expert-knowledge methods, it is independent of system characteristics such as the operating system, the transport protocol and the network structure. Moreover, it requires minimal domain-specific knowledge to accurately determine the root cause. Our primary design goals were the following:

• System operation should be distributed, without a centralized computing node.

• The system should adapt to network changes as quickly as possible.

• The system should not rely on software implementation, OS and networking details (a "black-box" approach).

In the current work, we make the novel contribution of borrowing recent algorithms from the field of statistical signal processing [1, 9] and employing them in the different context of a distributed monitoring framework. By utilizing those algorithms, we are able to efficiently and distributively characterize the behavior of the varying network conditions as a stochastic process, and to perform root-cause analysis for detecting the parameters which cause increased latency.

The framework works as follows. Each node monitors a large number of local operating system and application parameters. If degraded performance is observed anywhere in the network, the nodes jointly characterize the performance by regarding it as a linear stochastic process, using statistical signal processing tools. Subsequently, a joint root-cause analysis computation is performed to identify the parameters which affect performance. Once the reasons for the degradation are known, a corrective action is taken (whenever possible) by adjusting the resource quota of one or more nodes.

The root-cause analysis technique is general and can be applied in many other distributed systems; it is not limited to the soft real-time domain. The main performance measures are tunable and can be set, for example, to CPU consumption, bandwidth utilization, etc. One of the appealing properties of our monitoring is that it can be used for debugging as well: detecting anomalous software behavior like bugs and deadlocks. It can further be used for load balancing, minimization of deployed resources, hot-spot detection, etc.

We have implemented the proposed framework in TransFab, a prototype of a soft real-time messaging transport fabric developed in the IBM Research Lab. We have tested our framework in various settings and on different topologies. The experiments show that the proposed scheme accurately identifies the reasons for performance degradation in non-trivial scenarios. Overall, the protocol is light-weight: the message overhead of a single root-cause analysis computation amounts to only several kilobytes per communicating node, which is negligible in most contemporary networks. We have further observed only a minor increase in CPU and memory consumption.

Our technique can scale up to large domains in a hierarchical manner, where each subdomain performs the monitoring locally and filters out the relevant parameters which affect performance, and the algorithm is then run again between the different domains.

This paper is organized as follows. Section 2 describes related previous work. Section 3 outlines the mathematical background required for understanding our construction. Section 4 presents our framework. Experimental results of a real LAN deployment are discussed in Section 5. We conclude in Section 6.

2. Related work

Recently, there has been a lot of research targeting the monitoring and allocation of resources in communication networks using techniques from the statistics, learning and data mining domains (see for example [10, 7, 5, 8, 6, 4]). One possible approach for providing strict performance guarantees for a distributed software application is over-provisioning, where the required host and network resources are allocated ahead of time [6], avoiding cases of resource congestion. In contrast, in the current work we assume a dynamic model, where software behavior and resource requirements are not known in advance.

Other works use a centralized computation for finding the best allocation of resources [10, 8], while we assume a distributed computing model: there is no central server to which all the information is shipped and processed. Rish et al. [7] optimize topology construction to improve the download speed of a peer-to-peer network. In the current work, we assume some given topology of communication flows between the participating nodes, as designed by the application builder, and perform the monitoring on top of that topology. Our monitoring framework can be applied to any given topology, including graphs with cycles.

Resource Bundles [4] is an example of a successful approach for finding and clustering similar available resources over the WAN, and is closely related to this work. In the Resource Bundles system, resources are captured daily to form a resource utilization histogram. Available resources are aggregated to provide users with a smart selection of groups of resources. Not surprisingly, it is shown that adding historical information about resource consumption significantly improves the selection of resources. In the current paper, our focus is on real-time resource allocation: we capture resource usage in a time window of seconds (rather than daily). Furthermore, we characterize the process as a linear noisy stochastic process. This allows greater flexibility in describing the process behavior over time, by characterizing the joint covariance of the measured parameters (relative to the histograms used by Resource Bundles). Furthermore, our framework is also capable of performing root-cause analysis of the parameters which affect system performance.

3. Mathematical Background

The following section briefly overviews the mathematical background needed for describing the algorithms deployed. Section 4 explains how those algorithms are used in the monitoring context.

3.1. The Kalman filter algorithm

The Kalman filter is an efficient iterative algorithm that estimates the state of a discrete-time controlled process x ∈ R^n governed by the linear stochastic difference equation

    x_k = A x_{k-1} + w_{k-1},    (1)

with a measurement z ∈ R^m, z_k = H x_k + v_k. The random variables w_k and v_k represent the process and measurement AWGN noise, respectively: p(w) ~ N(0, Q), p(v) ~ N(0, R). The discrete Kalman filter update equations are given by [11]:

The prediction step:

    x̂_k^- = A x̂_{k-1},    (2a)
    P_k^- = A P_{k-1} A^T + Q.    (2b)

The measurement step:

    K_k = P_k^- H^T (H P_k^- H^T + R)^{-1},    (3a)
    x̂_k = x̂_k^- + K_k (z_k - H x̂_k^-),    (3b)
    P_k = (I - K_k H) P_k^-,    (3c)

where I is the identity matrix.

The algorithm operates in rounds. In round k the estimates K_k, x̂_k, P_k are computed, incorporating the (noisy) measurement z_k obtained in that round. The outputs of the algorithm are the mean vector x̂_k and the covariance matrix P_k.

3.2. Generalized least squares (GLS)

Given an observation matrix A of size n × k and a target vector b of size n × 1, linear regression computes the vector x which is the least squares solution of the quadratic cost function

    min_x ||Ax - b||_2^2.

The algebraic solution is x = (A^T A)^{-1} A^T b. The vector x can be regarded as hidden weight parameters which, given the observation matrix A, explain the target vector b.

The linear regression method has an underlying assumption that the measured parameters are uncorrelated. However, as shown by the experimental results in Section 5, the measured parameters are highly correlated: for example, the numbers of get/put operations on a certain queue in each given second are correlated. In this case, it is better to use the generalized least squares (GLS) method, in which we minimize the quadratic cost function

    min_x (Ax - b)^T P^{-1} (Ax - b),    (4)

where P is the covariance matrix of the observed data. In this case, the optimal solution is

    x = (A^T P^{-1} A)^{-1} A^T P^{-1} b,    (5)

which is the best linear unbiased estimator (BLUE).

3.3. Efficient distributed computation via the Gaussian Belief Propagation algorithm

Recent work [1] shows how to compute the Kalman filter distributively and efficiently over a communication network. Other recent work [2, 9, 3] shows that the GLS computation (Eq. 5) can also be computed efficiently and distributively by the GaBP algorithm.

The Gaussian Belief Propagation (GaBP) algorithm is an efficient iterative algorithm for solving a system of linear equations of the type Ax = b [9]. The input to the algorithm is the matrix A and the vector b; the output is the vector x = A^{-1} b. The algorithm is distributed: each node receives a part of the matrix A and the vector b as input, and outputs a part of the vector x. The algorithm may not converge, but when it does converge, it is known to converge to the correct solution.

Due to space limitations, we do not reproduce any of the previous results here. The interested reader is referred to [1, 9, 3, 2] for a complete description of the algorithms deployed.

4. Our proposed construction

Our monitoring framework is composed of four stages, as depicted in Figure 1. The first stage is the data collection stage, in which the nodes locally monitor their measurable performance parameters and record the relevant parameters in a local data structure. The data collection is done in the background every configured time frame ∆t, and has minimal effect on performance. During normal operation, all messages arrive before their soft real-time deadlines; thus, there is no need to continue and compute the next stages. Whenever one of the nodes detects some deterioration in performance (e.g., a message is almost late), it notifies the other nodes that it wishes to compute the second stage.

Figure 1. Schematic operation of the proposed monitoring framework: (1) local monitoring of performance measures; (2) distributed learning of the parameters' correlation via the Kalman filter; (3) distributed detection of the parameters which influence performance via GLS; (4) taking local corrective measures (when possible).

The second stage performs the Kalman filter computation distributively. The input to the second stage is the local data parameters collected in the first stage, and its output is the mean and the joint covariance matrix which characterize the correlation between the different parameters (possibly collected on different machines). The underlying algorithm used for computing the Kalman filter updates is the GaBP algorithm (described in Section 3.3). The output of the second stage can also be used for reporting performance to the application; for example, we are able to measure the mean and variance of the effective bandwidth.

The third stage computes the GLS method (explained in Section 3.2) for performing regression. The target for the regression can be chosen on the fly; in our experiments we were mainly interested in the total message latency as our most important performance measure. The input to the third stage is the parameter data collected in the first stage and the covariance matrix computed in the second stage. The output of the third stage is a weight parameter vector, which has the intuitive meaning of providing a linear model for the collected data. The computed linear model allows us to identify which parameters influence performance the most (the parameters with the highest absolute weights). An additional benefit is that, using the computed weights, we are able to compute predictions of node behavior; for example, how an increase of 10MB of buffer memory will affect the total latency experienced.

Finally, the fourth stage uses the output of the third stage for taking corrective measures. For example, if the main reason for the increased latency is insufficient memory, the relevant node may request additional memory resources from the operating system. The fourth stage is done locally and is optional, depending on the type of application and the availability of resources. Specifically, in the soft real-time systems we focus on, we can examine the effect of an increase of 10% in transmitter memory by computing the predicted effect on the total message latency. In the current work, we mainly experimented with memory predictions, where the increase was limited to up to 10%. We have found the linear model quite accurate under those settings, whenever memory was the actual performance bottleneck. An area for future work is to investigate the applicability of predictions computed by the linear model in broader settings and for other resources.

Below, we give further details regarding the implementation and computational aspects of the different stages.

4.1. Stage I: local data collection

In this stage, participating nodes locally monitor their performance every ∆t seconds. Nodes record performance parameters such as memory and CPU consumption, bandwidth utilization and other relevant parameters. Depending on the monitored software, information about internal data structures like files, sockets, threads, available buffers, etc. is also monitored. The monitored parameters are stored locally in an internal data structure representing the matrix A, of size n × k, where n is the history size and k is the number of measured parameters. Note that at this stage, the monitoring framework is oblivious to the meaning of the monitored parameters, regarding all of them equally as linear stochastic noisy processes.

4.2. Stage II: Kalman filter

The second stage is performed distributively over the network, where the participating nodes compute the Kalman filter algorithm (outlined in Section 3.1). The input to the computation is the matrix A recorded in the data collection stage and the assumed noise levels Q and R. The output of this computation is the mean vector x̂ and the joint covariance matrix P (Eq. 3b, 3c). The joint covariance matrix characterizes the correlation between measured parameters, possibly spanning different nodes.

We utilize recent results from the field of statistical signal processing for computing the Kalman filter using the GaBP iterative algorithm (explained in Section 3.3). The benefit of using this efficient distributed iterative algorithm is its faster convergence (reduced number of iterations) relative to classical iterative linear algebra methods. This, in turn, allows the monitoring framework to adapt promptly to changes in the network.

The Kalman filter output x̂ is computed in each node locally: each computing node holds the part of the output which is the mean value of its own parameters. To reduce computation cost, we do not compute the full matrix P, but only the rows of P which represent significant performance parameters selected in advance.

4.3. Stage III: GLS regression

The third stage is performed distributively over the network as well, for computing the GLS regression (Eq. 5). The input to this stage is the joint covariance matrix P computed in the second stage, the recorded parameter matrix A, and the performance target b. The output of the GLS computation is a weight vector x which assigns a weight to each of the measured parameters. By selecting the parameters with the highest absolute magnitude in the vector x, we identify which of the recorded parameters significantly influence the performance target. The result of this computation is received locally, which means that each node computes the weights of its own parameters. Additionally, the nodes distributively compute the ten largest values. The GLS method is again computed using the GaBP algorithm (Section 3.3).
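To make the stage III computation concrete, the following minimal NumPy sketch performs the GLS solve of Eq. 5 and the absolute-weight ranking described above. The parameter names, the synthetic data and the diagonal noise covariance are invented for illustration, and the distributed GaBP solve is replaced by a direct dense solve:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-in for stage I data: n history samples of k monitored
# parameters (columns of A) and a latency target b. The parameter names
# and data are invented; only columns 0 and 3 truly drive the target.
n, k = 200, 6
names = ["TBUF_AVAIL", "TBUF_OP_PUT", "TBUF_OP_GET",
         "PROCESS_VSIZE", "RX_PKTS", "TX_PKTS"]
A = rng.normal(size=(n, k))
A[:, 2] = A[:, 1] + 0.3 * rng.normal(size=n)   # correlated get/put counts
noise_sd = np.linspace(0.5, 2.0, n)            # heteroscedastic noise
b = 3.0 * A[:, 0] - 1.5 * A[:, 3] + noise_sd * rng.normal(size=n)

# Observation noise covariance; the GLS cost (Eq. 4) weighs samples by
# its inverse, so noisier samples count for less. Here it is diagonal.
P = np.diag(noise_sd ** 2)

# GLS solution, Eq. (5): x = (A^T P^-1 A)^-1 A^T P^-1 b. A deployment
# would obtain this from the distributed GaBP solver; we solve directly.
PiA = np.linalg.solve(P, A)
Pib = np.linalg.solve(P, b)
x = np.linalg.solve(A.T @ PiA, A.T @ Pib)

# Stage III output: rank parameters by absolute weight.
ranking = sorted(zip(names, x), key=lambda t: -abs(t[1]))
for name, w in ranking[:3]:
    print(f"{name:>13s}  weight={w:+.2f}")
```

In the real framework, each node would hold only the columns of A describing its own parameters and would obtain its entries of x from the GaBP iterations.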
The main benefit of using the GaBP algorithm for both tasks (the Kalman filter and the GLS method computation) is that we need to implement and test the algorithm only once.

4.4. Stage IV: taking corrective measures

Whenever a node detects that a local parameter, computed in stage III, is highly correlated to the target performance measure, it may try to take corrective measures. This step is optional and depends on the application and/or the operating system support. Examples of local system resources are CPU quota, thread priority, memory allocation and bandwidth allocation. Note that resources may be either increased or decreased, based on the regression results.

For implementing this stage, a mapping between the measured parameters and the relevant resources needs to be defined by the system designer. For example, TRANSMITTER_PROCESS_VSIZE, the process virtual memory size, is related to the memory allocated to the process by the operating system. Our monitoring framework (stages I-III) is not aware of the semantic meaning of this parameter. For taking corrective measures, however, the mapping between parameters and resources is essential and requires domain-specific knowledge. Returning to the virtual memory example above, the mapping links TRANSMITTER_PROCESS_VSIZE to the memory quota of the transmitter process. Whenever this parameter is selected by the linear regression done in stage III as a parameter which significantly affects performance, a request to the operating system to increase the virtual memory quota is performed.

A natural question is how much to increase or decrease a certain resource quota. Here, the results of stage III are useful. The regression assigns weights to the examined system parameters to explain the performance target in the linear model. More formally, Ax ≈ b, where x is the weight vector, A holds the recorded parameters and b is the performance target. Now, assume x_i is the most significant parameter selected by the regression, representing resource i. It is possible to increase x_i by, say, 20% (x̂ = x + 0.2 x_i) and examine the effect of the increase on the predicted performance, using the equation b̂ = A x̂.

5. Experimental Results

The TransFab messaging fabric is a high-performance soft real-time messaging middleware. It runs on top of the networking infrastructure and implements a set of publish/subscribe and point-to-point services with explicitly enforced limits on the time for end-to-end data delivery.

We have incorporated our monitoring framework as part of the TransFab overlay, in Java. In our experiments, a TransFab node recorded 190 parameters which characterize the current performance. Among them are memory usage, process information (obtained from the /proc file system in Linux), current bandwidth, the number of incoming/outgoing messages, the state of internal data structures like queues and buffer pools, the number of get/put operations on them, etc.

We have utilized the unreliable UDP transport, whose timeliness properties are more predictable than those of TCP. TransFab incorporates reliability mechanisms that guarantee in-order delivery of messages. A transmitter discards a message only after all receivers have acknowledged its receipt. When a receiver detects a missing message, it requests its retransmission by sending a negative acknowledgement to the transmitter.

5.1. Two nodes experiment

To test our distributed monitoring framework, we have performed the following small experiment, in which our main performance measure is the total packet latency. Transmitter and receiver TransFab nodes run on two idle dual-core AMD Opteron 2.6Ghz Linux machines on the same LAN. The transmitter was configured to send 10,000 messages of size 8Kb per second. The memory allocation of both nodes is 100Mb. The experiment ran for 500 seconds, with the history size n set to 100 seconds. During the experiment, stage I (data collection) was performed every ∆t = 1 second. At time 250 seconds, the Kalman filter algorithm was computed distributively by the nodes.
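For reference, a single round of the Kalman filter recursion just invoked (Eqs. 2a-3c) can be written down directly. The following centralized NumPy sketch uses an invented scalar model (a steady message rate observed in unit-variance noise) rather than TransFab data:

```python
import numpy as np

def kalman_round(x, P, z, A, H, Q, R):
    """One round of the discrete Kalman filter (Eqs. 2a-3c)."""
    x_pred = A @ x                                  # Eq. (2a): prediction
    P_pred = A @ P @ A.T + Q                        # Eq. (2b)
    S = H @ P_pred @ H.T + R                        # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)             # Eq. (3a): Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)           # Eq. (3b): correction
    P_new = (np.eye(len(x_new)) - K @ H) @ P_pred   # Eq. (3c)
    return x_new, P_new

# Illustrative scalar model: a steady hidden message rate observed with
# unit-variance noise; Q and R are diagonal, as in the experiments.
rng = np.random.default_rng(1)
A = H = np.eye(1)
Q, R = 0.01 * np.eye(1), 1.0 * np.eye(1)
x, P = np.zeros(1), np.eye(1)
true_rate = 5.0
for _ in range(200):
    z = np.array([true_rate + rng.normal(scale=1.0)])
    x, P = kalman_round(x, P, z, A, H, Q, R)
print(x[0], P[0, 0])   # the mean tracks the rate; P settles near a steady value
```

In the framework itself, the matrix inversion implied by Eq. 3a is carried out by the distributed GaBP solver rather than a local dense inverse.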
The goal of this experiment is to show that, by performing the Kalman filter computation (stage II) using information collected from two nodes, we are able to identify which of the collected parameters influence the total packet latency. Furthermore, we are able to gain insights about system performance which could not be computed using only local information.

To save bandwidth, nodes locally filter constant parameters out of the matrix A. Thus, the input to the Kalman filter algorithm was reduced to 45 significant parameters. Figure 2 presents a joint covariance matrix calculated by the Kalman filter algorithm (computed in the second stage) in a typical run. Column (and row) 40 represents the total packet latency measured by the receiver. The covariance matrix includes parameters captured by the transmitter (columns 1-23) and parameters recorded by the receiver (columns 24-45).

Figure 2. Joint covariance matrix computed distributively using two TransFab nodes. Column (and row) 40 captures the dependency of the total packet latency on the various measured parameters. Warm colors (yellow to red) represent medium to high correlation of measured parameters with the total latency.

As clearly seen in Figure 2, the total latency of packets, even in a small setup of only two nodes on two idle machines, is strongly correlated with dozens of parameters. Furthermore, the total latency depends on parameters from both the sender and the receiver. The covariance matrix plots the dependence between pairs of parameters, where we are mainly interested in understanding the reasons for message delay.

In practice, we have deployed the following optimization: we do not compute the full covariance matrix, but only the rows which represent correlation with the target parameters (in this example, only row 40).

Figure 3 depicts an additional Kalman filter output, the mean vector x_k; in this example, the mean of the packets-per-second parameter of the same two-node experiment. The assumed error levels Q, R define the level of smoothing; in our experiment we took Q and R to be diagonal matrices with error level σ² = 0.01. The mean, together with the variance, provides the nodes with additional information about performance, which can be used for monitoring and debugging.

Figure 3. Kalman filter smoothing of the packets-per-second parameter.

We have repeated the previous experiment, but this time, at time 150 seconds, the transmitting machine's memory for outgoing message buffers was reduced to 2.4Mb. At time 155, the receiving node locally detected a degradation in performance and computed stages II (Kalman filter) and III (linear regression), with the history parameter set to n = 100 seconds, to find the parameters which affect the total packet latency. Figure 4 presents the output of the distributed linear regression performed by the two TransFab nodes. Clearly, the transmitter's low buffers are the main reasons (3 of the top 5) behind the increased latency. We then deployed the GLS method (shown in Figure 5) to improve the quality of the ranking; this time, seven of the top nine parameters are related to transmitter buffers.

Figure 4. Linear regression results for a transmitter with low memory. Three of the top five reasons for the latency increase are related to the transmitter memory buffers (parameters such as TBUF_AVAIL, TBUF_OP_GET and TBUF_OP_PUT).

Figure 5. Improved regression results for a transmitter with low memory, using the GLS method. Seven of the top nine reasons for the latency increase are related to the transmitter memory buffers.

5.2. Larger experiments

We have performed the following experiment, which demonstrates the applicability of our monitoring framework. The goal of this experiment was to test, given a randomly chosen faulty node, whether the monitoring framework is able to correctly identify the faulty node and the type of the fault.

At runtime, we randomly created an overlay tree topology of ten nodes. A single transmitter (the root node) transmits a flow of 3,000 packets of size 8K per second to its children, at a rate of 24Mbit for each flow. The nodes have no global knowledge of the network topology; specifically, each node is aware only of its direct children. We then randomly selected one of the nodes and one of the faults listed in Table 1, and started the fault on the selected node.

Error | Description
A     | No error
B     | Low CPU receiver
C     | Low CPU transmitter
D     | Channel loss
E     | Low memory receiver
F     | Low memory transmitter

Table 1. Possible host and network faults, selected randomly at runtime.

We ran the experiment for 500 seconds, after which the nodes jointly computed the regression results, where the target was the transmitter latency. (Transmitter latency is defined as the total time messages wait in transmitter buffers, from their submission by the application until they are actually transmitted over the wire.)

This experiment was repeated multiple times; each time a different topology, a different faulty node and a different fault were selected. An example topology generated at random is shown in Figure 6. The faulty node was assigned fault E, a low memory receiver.

Figure 6. An example randomly generated topology of ten nodes: the transmitter (T), a forwarding node (R4), a slow receiver (R7), and receivers R1-R3, R5-R6, R8-R9. The regression target is the transmitter latency.

The matching regression results are presented in Figure 7. The results indicate that the memory of the low-memory node is identified as the first cause affecting the transmitter latency. The forwarding node's memory is identified as the second and third causes, and the transmitter's memory is identified as the fourth cause. Besides detecting the faulty node, the critical path between the transmitter and the low-memory receiver is correctly identified as the congested path.

Figure 7. Regression results for the tree topology with ten nodes. The low memory receiver's (R7) memory is detected as the first cause of transmitter latency, the forwarding node's (R4) memory as the second cause, and the transmitter's (T) memory as the fourth cause.

Figure 8. Regression quality for the ten-node experiment. The blue line represents the actual transmitter latency (in nanoseconds) over time, while the green line represents the latency predicted by the linear model.

In all cases we were able to correctly identify the faulty node, except for the channel-loss fault D. The channel-loss case simulates a lossy channel by dropping packets uniformly at random with a probability of 5%; in this case, the identified node was the receiver.

An additional question we ask is whether the linear regression is able to identify the type of the fault. The easiest faults to detect were the low-memory faults (E and F), where either the transmitter or the receiver had low memory; in those cases both the node and the fault reason were identified correctly. In the case of normal behavior (fault A), we identified the receivers' latency as having the highest correlation to the transmitter latency, which is normal.
Figure 8 depicts the quality of the linear regression for the same experiment, comparing the actual transmitter latency with the latency predicted by the linear model. More formally, we first compute x using Eq. 5 and then plot Ax vs. b. The desired result is that the actual latency and the predicted latency computed by the linear model be as close as possible to each other. However, using real-world data it is hard to get perfect predictions, probably because the linear model is a simplification of the real world. The figure shows that the overall fit between the predicted and the actual latency is quite good, except for some spikes in the actual transmitter latency which were not predicted. Note that prediction quality is closely related to the discussion in Section 4.4 regarding resource allocation in stage IV.

The loss situation (fault D) was much harder to distinguish from a low-CPU receiver (fault B). The reason is that it is much harder to enforce flow control using IP multicast, since different nodes have different capabilities. In the case of fault B, the slow receiver is swamped with packets at a speed higher than its processing capabilities, so packet loss is incurred at the operating system socket buffer level. On the other hand, in fault D, random packets are dropped, creating a situation which is similar to fault B.

We have repeated the ten-node experiment multiple times, each time selecting a different topology at random and a different fault. Table 2 summarizes our findings, including the groups of faults that were not distinguishable using the linear regression.

Error                  | Faulty node identified? | Reason for fault identified? | Reason identified using domain knowledge?
A (no error)           | -                       | -                            |
B (receiver CPU)       | V                       | B, D indistinguishable       | V
C (transmitter CPU)    | V                       | C, E indistinguishable       | V
D (channel loss)       | receiver                | B, D indistinguishable       | V
E (receiver memory)    | V                       | V                            |
F (transmitter memory) | V                       | V                            |

Table 2. Summary of results for the randomly constructed ten-node overlays.

Since we were not able to distinguish between channel loss and socket buffer overflow, we also experimented with a bursty loss model (instead of random loss). Under the bursty loss model, we dropped sequences of 100 packets, each with a probability of 1% chosen uniformly at random. In this case it was much easier to distinguish between faults B and D, by comparing the patterns of negative acknowledgements.

We conclude that the linear regression is a very effective tool for identifying the faulty nodes as well as the critical congested path in a data dissemination overlay. However, in some cases it is harder to detect the reason for performance degradation without deploying additional tools. In those cases, we propose to use domain-specific knowledge together with the data locally collected at the faulty node. One option is to record the normal behavior of the node locally (computed in stage II), so a faulty node can compare the mean and variance to identify locally anomalous behavior. Another option is to add expert knowledge, for example detecting messages which arrive out of sequence in large gaps, to identify bursty loss cases.

5.3. Protocol overhead

Regarding the protocol overhead, stages I and IV are performed locally with minimal computational effort, requiring no network bandwidth. Stages II and III are performed across the network. The GaBP algorithm typically converges in five iterations, where in each iteration each node sends around 2.5Kb of data to each of its neighbors. The GLS method requires only a single execution of the GaBP algorithm, while the Kalman filter requires several executions (depending on the required accuracy); we typically set the number of Kalman filter iterations to 10. The memory and CPU requirements were found to be negligible. The linear regression computation took around 0.1 seconds, while a Kalman filter step computation took up to 1 second. The computations are done by a separate thread, to avoid service interruption.

6. Conclusion and future work

In this work we have proposed an efficient monitoring framework to be used in a distributed system. Using our approach, we regard the nodes' performance parameters as a stochastic process and use statistical signal processing tools for identifying anomalous behaviors and the parameters which affect performance.

Using a prototype implemented in Java, which ran distributively on up to ten computing nodes, we demonstrated that we can correctly identify faulty nodes in a randomly generated data dissemination overlay, and in most cases identify the reasons for performance degradation. However, in some difficult cases it was harder to distinguish between several possible faults; in such cases, we propose to use domain-specific knowledge locally for correctly identifying the type of the fault.

As for future work, we plan to extend the local correction done at stage IV towards creating a self-healing network. Currently we focus on low-memory situations, where the faulty node increases its allocated memory. We further plan to explore changes in local resource quotas such as bandwidth and process priority. Another interesting extension is to deploy a hierarchy in larger networks, where the monitoring is done locally in each domain and the performance results are aggregated using the domain hierarchy.

The GaBP algorithm is an iterative algorithm which may not converge. We are now working on a variant of the algorithm which always converges to the correct answer. This variant will be reported in near future work.

References

[1] D. Bickson, D. Dolev, and O. Shental. Distributed Kalman filter using Gaussian belief propagation. In Proc. 46th Allerton Conf. on Communications, Control and Computing, Monticello, IL, USA, Sept. 2008.
[2] D. Bickson, O. Shental, P. H. Siegel, J. K. Wolf, and D. Dolev. Linear detection via belief propagation. In Proc. 45th Allerton Conf. on Communications, Control and Computing, Monticello, IL, USA, Sept. 2007.
[3] D. Bickson, O. Shental, P. H. Siegel, J. K. Wolf, and D. Dolev. Gaussian belief propagation based multiuser detection. In IEEE Int. Symp. on Inform. Theory (ISIT), Toronto, Canada, July 2008.
[4] M. Cardosa and A. Chandra.
Resourcebundles: Usingag- for characterizing the process behavior. Furthermore, we gregation for statistical wide-area resource discovery and areabletoperformaroot-causeanalysisacrossthenetwork allocation. In The 28th International Conference on Dis- 9 tributedComputingSystems(ICDCS2008),pages760–768, LosAlamitos,CA,USA,2008.IEEEComputerSociety. [5] J. Kim, A. Chandra, and J. B. Weissman. Accessibility- basedresourceselectioninloosely-coupleddistributedsys- tems. InThe28thInternationalConferenceonDistributed Computing Systems (ICDCS 2008), pages 777–784, Los Alamitos,CA,USA,2008.IEEEComputerSociety. [6] C.Lumezanu,S.Bhola,andM.Astley.Onlineoptimization forlatencyassignment indistributedreal-timesystems. In The28thInternationalConferenceonDistributedComput- ingSystems(ICDCS2008), pages752–759, LosAlamitos, CA,USA,2008.IEEEComputerSociety. [7] I.RishandG.Tesauro. Estimatingend-to-endperformance by collaborative prediction with active sampling. In Inte- gratedNetworkManagement,2007.IM’07.10thIFIP/IEEE InternationalSymposiumon,pages294–303,2007. [8] A. Saboori, G. Jiang, and H. Chen. Autotuning config- urations in distributed systems for performance improve- ments using evolutionary strategies. In The 28th Inter- national Conference on Distributed Computing Systems (ICDCS 2008), pages 769–776, Los Alamitos, CA, USA, 2008.IEEEComputerSociety. [9] O.Shental, P.H.S.D.Bickson, J.K.Wolf,andD.Dolev. Gaussian belief propagation solver for systems of linear equations. In IEEE Int. Symp. on Inform. Theory (ISIT), Toronto,Canada,July2008. [10] G.Tesauro, N.Jong, R.Das, andM.Bennani. Ontheuse ofhybridreinforcementlearningforautonomicresourceal- location. InClusterComputing,volume10,pages287–299, 2007. [11] G. Welch and G. Bishop. An introduction to the kalman filter. Technicalreport,2006. 10
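As a concrete illustration of the regression step behind Figures 7 and 8 (compute x from the least-squares problem, then compare Ax with b), the following sketch uses synthetic data. Eq. 5 itself, the measurement matrix, and all variable names here are our own assumptions, and the centralized `lstsq` call merely stands in for the distributed GaBP solver used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the monitoring data: each row of A holds the
# receivers' performance measurements at one time step, b the transmitter latency.
n_samples, n_nodes = 100, 9
A = rng.exponential(scale=1e5, size=(n_samples, n_nodes))
true_w = rng.uniform(0.0, 2.0, size=n_nodes)
b = A @ true_w + rng.normal(scale=1e3, size=n_samples)  # small measurement noise

# Centralized least-squares fit (the paper solves this distributively with GaBP).
x, *_ = np.linalg.lstsq(A, b, rcond=None)

predicted = A @ x                                        # the "predicted latency" curve
fit_error = np.linalg.norm(predicted - b) / np.linalg.norm(b)

# Ranking the coefficients mimics the root-cause ordering of Figure 7:
# the largest |x_i| is reported as the first cause of transmitter latency.
cause_order = np.argsort(-np.abs(x))
```

Plotting `predicted` against `b` over time gives the kind of comparison shown in Figure 8; spikes in the actual latency that the linear model cannot explain appear as gaps between the two curves.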
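The GaBP solver whose overhead Section 5.3 quantifies can be sketched in a few lines. This is a minimal, centrally simulated version of the synchronous update rules for Gaussian belief propagation on a symmetric, diagonally dominant matrix (a sufficient condition for convergence); it follows the standard GaBP message equations, not the authors' Java implementation, and the variable names are ours.

```python
import numpy as np

def gabp_solve(A, b, max_iter=100, tol=1e-10):
    """Solve A x = b by synchronous Gaussian Belief Propagation.

    Assumes A is symmetric and diagonally dominant, in which case the
    marginal means converge to the exact solution of the linear system.
    """
    n = len(b)
    nbrs = [[k for k in range(n) if k != i and A[i, k] != 0] for i in range(n)]
    P = np.zeros((n, n))   # message precisions P[i, j]: node i -> node j
    mu = np.zeros((n, n))  # message means
    for _ in range(max_iter):
        P_new, mu_new = np.zeros((n, n)), np.zeros((n, n))
        for i in range(n):
            for j in nbrs[i]:
                # combine local evidence with all incoming messages except j's
                P_excl = A[i, i] + sum(P[k, i] for k in nbrs[i] if k != j)
                mu_excl = (b[i] + sum(P[k, i] * mu[k, i]
                                      for k in nbrs[i] if k != j)) / P_excl
                P_new[i, j] = -A[i, j] ** 2 / P_excl
                mu_new[i, j] = P_excl * mu_excl / A[i, j]
        converged = (np.abs(P_new - P).max() < tol
                     and np.abs(mu_new - mu).max() < tol)
        P, mu = P_new, mu_new
        if converged:
            break
    # the marginal means are the entries of the solution vector
    x = np.empty(n)
    for i in range(n):
        P_i = A[i, i] + sum(P[k, i] for k in nbrs[i])
        x[i] = (b[i] + sum(P[k, i] * mu[k, i] for k in nbrs[i])) / P_i
    return x
```

In a deployment, each node would run only its own row of the inner loop, exchanging one (precision, mean) message pair with each neighbor per iteration, which is what keeps the per-iteration bandwidth small.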
