An Algorithm for Multipath Computation using Distance-Vectors with Predecessor Information

SRINIVAS VUTUKURY
Computer Sciences Department, University of California, Santa Cruz, CA 95064

J.J. GARCIA-LUNA-ACEVES
Computer Engineering Department, University of California, Santa Cruz, California 95064
Networking and Security Center, Sun Microsystems Laboratories, Palo Alto, California 94303

This work was supported in part by the Defense Advanced Research Projects Agency (DARPA) under grants F30602-97-1-0291 and F19628-96-C-0038.

Report date: 1999. Performing organization: University of California at Santa Cruz, Department of Computer Engineering, Santa Cruz, CA 95064. Distribution statement: approved for public release; distribution unlimited.

Abstract: Routing algorithms in the IP Internet provide a single path between each source-destination pair, and where more than one path is provided, the paths are of equal length. Single-path routing is inherently slow in responding to congestion and temporary traffic bursts; multiple paths are better suited to handle congestion. Also, the paths provided in RIP and OSPF are not free of loops during times of network transition, which can be debilitating to network performance. We present a distributed routing algorithm for computing multiple paths, which need not have equal length, between each source-destination pair in a computer network such that they are loop-free at every instant, in steady state as well as during network transitions. The algorithm is scalable to large networks because it uses only one-hop synchronization, unlike diffusing computations that require internodal synchronization spanning multiple hops. The safety and liveness properties of the algorithm are proven and its complexity is analyzed.

I. INTRODUCTION

The most popular routing protocols used in today's internets are based on the exchange of vectors of distances (e.g., RIP [7] and EIGRP [2]) or topology maps (e.g., OSPF [11]). RIP and many other routing protocols based on the distributed Bellman-Ford algorithm (DBF) for shortest-path computation suffer from the bouncing effect and the counting-to-infinity problems, which limit their applicability to small networks using hop count as the measure of distance. OSPF and algorithms based on topology broadcast (e.g., [15], [12]) incur too much communication overhead, which forces network administrators to partition the network into areas connected by a backbone. This makes OSPF complex in terms of the router configuration required. EIGRP uses a loop-free routing algorithm called DUAL [3], which is based on internodal coordination that can span multiple hops.

In addition to DUAL, several algorithms based on distance vectors have been proposed to overcome the counting-to-infinity problem of DBF [14], [10], [9], [17]. All of these algorithms rely on exchanging queries and replies along multiple hops, a technique that is sometimes called diffusing computations, because it has its origin in Dijkstra and Scholten's basic algorithm [1].

A couple of routing algorithms have been proposed that operate using partial topology information [4], [6] to eliminate the main limitation of topology-broadcast algorithms. Furthermore, several distributed shortest-path algorithms [8], [13], [5] have been proposed that use the distance and second-to-last hop to destinations as the routing information exchanged among nodes. These algorithms are often called path-finding algorithms or source-tracing algorithms. All these algorithms eliminate DBF's counting-to-infinity problem, and some of them [5] are more efficient than any of the routing algorithms based on link-state information proposed to date. Furthermore, LPA [5] is loop-free at every instant.

With the exception of DASM [17], all of the above routing algorithms focus on the provision of a single path to each destination. A drawback of DASM, however, is that it uses multi-hop synchronization, which limits its scalability. Recently, we presented MPDA [16], which is the first routing algorithm based on link states that provides multiple loop-free paths using one-hop synchronization. In this paper, we present a variant of MPDA called MPATH, which is the first routing algorithm based on distance vectors that (a) provides multiple paths of unequal cost to each destination that are free of loops at every instant, in steady state as well as during network transitions, and (b) uses a synchronization mechanism that spans only one hop, which makes it more scalable than routing algorithms based on diffusing computations spanning multiple hops. MPATH is a path-finding algorithm, and differs from prior similar algorithms in the invariants used to ensure multiple loop-free paths of unequal cost. The differences between MPATH and MPDA are a result of the differences in the kind of information that nodes exchange.

Section II describes MPATH. Section III presents the correctness proofs showing that MPATH is loop-free at every instant, safe, and live. Section IV analyzes the complexity of MPATH. Section V provides concluding remarks.

II. DISTRIBUTED MULTIPATH ROUTING ALGORITHM

A. Problem Formulation

A computer network is represented as a graph $G = (N, L)$, where $N$ is the set of nodes (routers) and $L$ is the set of edges (links) connecting the nodes. A cost is associated with each link and can change over time, but is always positive. Two nodes connected by a link are called adjacent nodes or neighbors. The set of all neighbors of a given node $i$ is denoted by $N^i$. Adjacent nodes communicate with each other using messages, and messages transmitted over an operational link are received with no errors, in the proper sequence, and within a finite time. Furthermore, such messages are processed by the receiving node one at a time in the order received. A node detects the failure, recovery and link cost changes of each adjacent link within a finite time.

The goal of our distributed routing algorithm is to determine at each node $i$ the successor set of $i$ for destination $j$, which we denote by $S^i_j(t) \subseteq N^i$, such that the routing graph $SG_j(t)$ consisting of the link set $\{(m, n) \mid n \in S^m_j(t),\ m \in N\}$ is free of loops at every instant $t$, even when link costs are changing with time. The routing graph $SG_j(t)$ for single-path routing is a sink tree rooted at $j$, because the successor sets $S^i_j(t)$ have at most one member. In multipath routing, there can be more than one member in $S^i_j(t)$; therefore, $SG_j(t)$ is a directed acyclic graph with $j$ as the sink node. There are potentially several $SG_j(t)$ for each destination $j$; however, the routing graph we are interested in is defined by the successor sets

$$S^i_j(t) = \{k \mid D^k_j(t) < D^i_j(t),\ k \in N^i\}.$$
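As an illustration of the successor-set definition above, the following Python sketch computes $S^i_j$ from exact shortest distances on a small sample network and checks that the implied routing graph is acyclic. The function names (`shortest_distances`, `successor_sets`, `is_acyclic`) and the sample topology are assumptions made for this illustration and do not come from the paper; the sketch is centralized, whereas MPATH computes the same sets in a distributed manner.

```python
import heapq

def shortest_distances(links, dest):
    """Dijkstra from `dest` over an undirected graph given as {(u, v): cost}."""
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))
    dist = {node: float("inf") for node in adj}
    dist[dest] = 0
    heap = [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in adj[u]:
            if d + cost < dist[v]:
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist, adj

def successor_sets(links, dest):
    """S_j^i = {k in N^i : D_j^k < D_j^i} for every node i (hypothetical helper)."""
    dist, adj = shortest_distances(links, dest)
    return {i: {k for k, _ in adj[i] if dist[k] < dist[i]} for i in adj}

def is_acyclic(succ):
    """Depth-first search with three colours; a grey-to-grey edge is a loop."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in succ}

    def visit(node):
        colour[node] = GREY
        for k in succ[node]:
            if colour[k] == GREY:
                return False
            if colour[k] == WHITE and not visit(k):
                return False
        colour[node] = BLACK
        return True

    return all(visit(node) for node in succ if colour[node] == WHITE)

# Sample network (assumed for illustration), destination "j".
links = {("a", "b"): 1, ("a", "c"): 2, ("b", "j"): 3, ("c", "j"): 1, ("b", "c"): 1}
succ = successor_sets(links, "j")
```

In this sample, node `b` keeps both `c` and `j` as successors even though the path through `j` costs 3 and the path through `c` costs 2, which is the unequal-cost multipath the definition allows.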
In this definition, $D^i_j$ is the shortest distance of node $i$ to destination $j$. We call such a routing graph the shortest multipath for destination $j$.

After a series of link cost changes which leave the network topology in an arbitrary configuration, the distributed routing algorithm should work to modify $SG_j$ in such a way that it eventually converges to the shortest multipath of the new configuration, without ever creating a loop in $SG_j$ during the process.

Because $D^k_j$ is node $k$'s local variable, its value has to be explicitly or implicitly communicated to $i$. If $D^i_{jk}$ is the value of $D^k_j$ as known to node $i$, the problem now becomes one of computing $S^i_j(t) = \{k \mid D^i_{jk}(t) < D^i_j(t),\ k \in N^i\}$. However, because of non-zero propagation delays, during network transitions there can be discrepancies between the value of $D^k_j$ and its copy $D^i_{jk}$ at $i$, which may cause loops to form in $SG_j$. To prevent loops, therefore, additional constraints must be imposed when computing $S^i_j$. We show later that if the successor sets at each node for each destination satisfy certain conditions called loop-free invariant conditions, then the snapshot at time $t$ of the routing graph $SG_j(t)$ implied by $S^i_j(t)$ is free of loops. Our solution to this problem consists of two parts: (1) computing $D^i_j$ using a shortest-path routing algorithm called PATH and (2) extending it to compute $S^i_j$ such that they satisfy the loop-free invariant conditions at every instant.

B. Node Tables and Message Structures

As in DBF, nodes executing MPATH exchange messages containing distances to destinations. In addition to the distance to a destination, nodes also exchange the identity of the second-to-last node, also called the predecessor node, which is the node just before the destination node on the shortest path. In this respect MPATH is akin to several prior algorithms [5], [13], [8], but differs in its specification, verification and analysis and, more importantly, in the multipath operation described in the next section.

The following information is maintained at each node $i$:
1. The Main Distance Table contains $D^i_j$ and $p^i_j$, where $D^i_j$ is the distance of node $i$ to destination $j$ and $p^i_j$ is the predecessor to destination $j$ on the shortest path from $i$ to $j$. The table also stores, for each destination $j$, the successor set $S^i_j$, the feasible distance $FD^i_j$, the reported distance $RD^i_j$ and two flags, changed and report-it.
2. The Main Link Table $T^i$ is the node's view of the network and contains links represented by $(m, n, d)$, where $(m, n)$ is a link with cost $d$.
3. The Neighbor Distance Table for neighbor $k$ contains $D^i_{jk}$ and $p^i_{jk}$, where $D^i_{jk}$ is the distance of neighbor $k$ to $j$ as communicated by $k$ and $p^i_{jk}$ is the predecessor to $j$ on the shortest path from $k$ to $j$ as notified by $k$.
4. The Neighbor Link Table $T^i_k$ is the neighbor $k$'s view of the network as known to $i$ and contains link information derived from the distance and predecessor information in the neighbor distance table.
5. The Adjacent Link Table stores the cost $l^i_k$ of the adjacent link to each neighbor $k$. If a link is down its cost is infinity.

Nodes exchange information using update messages which have the following format.
1. An update message can contain one or more update entries. An update entry is a triplet $[j, d, p]$, where $d$ is the distance of the node sending the message to destination $j$ and $p$ is the predecessor on the path to $j$.
2. Each message carries two flags used for synchronization: query and reply.

C. Computing $D^i_j$

As mentioned earlier, our strategy is to first design a shortest-path routing algorithm and then make the multipath extensions to it. This subsection describes our shortest-path algorithm PATH and the next subsection describes the multipath extensions. Figure 1 shows the pseudocode of PATH. INIT-PATH is called at node startup to initialize the tables; distances are initialized to infinity and node identities to a null value. PATH is executed in response to an event that can be either the receipt of an update message from a neighbor or the detection of an adjacent link cost or link status (up/down) change. PATH invokes procedure NTU, described in Figure 2, which first updates the neighbor distance tables and then updates $T^i_k$ with links $(m, n, d)$, where $d = D^i_{nk} - D^i_{mk}$ and $m = p^i_{nk}$. PATH then invokes procedure MTU, specified in Figure 5, which constructs $T^i$ by merging the topologies $T^i_k$ and the adjacent links $l^i_k$.

The merging process is straightforward if all neighbor topologies $T^i_k$ contain consistent link information, but when two or more neighbor link tables contain conflicting information regarding a particular link, the conflict must be resolved. Two neighbor tables are said to contain conflicting information regarding a link if either both report the link with different costs or one reports the link and the other does not. Conflicts are resolved as follows: if two or more neighbor link tables contain conflicting information about link $(m, n)$, then $T^i$ is updated with the link information reported by the neighbor $k$ that offers the shortest distance from node $i$ to the head node $m$ of the link, i.e., $l^i_k + D^i_{mk} = \min\{l^i_x + D^i_{mx} \mid x \in N^i\}$. Ties are broken in a consistent manner; one way is to break ties always in favor of the neighbor with the lower address. Because $i$ itself is the head of the link for adjacent links, any information about an adjacent link supplied by neighbors will be overridden by the most current information about the link available to node $i$. Figure 4 shows the significance of the tie-breaking rule.

    Procedure INIT-PATH
    {Invoked when the node comes up.}
    1. Initialize all tables.
    2. Run PATH algorithm.
    End INIT-PATH

    Algorithm PATH
    {Invoked when a message M is received from neighbor k, or an adjacent
     link to k has changed, or when a node is initialized.}
    1. Run NTU to update neighbor tables.
    2. Run MTU to update main tables.
    3. For each destination j marked as changed,
         Add update entry [j, $D^i_j$, $p^i_j$] to the new message M'.
    4. Within a finite amount of time, send message M' to each neighbor.
    End PATH

Fig. 1. The PATH Algorithm

    Procedure NTU
    {Called by PATH to process an event.}
    1. If event is a message M from neighbor k,
       a. For each entry [j, d, p] in M   // Note: d = $D^k_j$, p = $p^k_j$.
            Set $D^i_{jk} \leftarrow d$ and $p^i_{jk} \leftarrow p$.
       b. For each destination j with an entry in M,
            Remove existing links (n, j) in $T^i_k$ and add new link (m, j, d)
            to $T^i_k$, where $d = D^i_{jk} - D^i_{mk}$ and $m = p^i_{jk}$.
    2. If the event is an adjacent link-status change, update $l^i_k$ and
       clear neighbor tables of k, if link is down.
    End NTU

Fig. 2. Neighbor Table Update Algorithm
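The way NTU turns distance and predecessor entries into links can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `ntu_apply_message` and the dictionary-based tables are hypothetical, the neighbor's own entry (distance 0, no predecessor) is assumed to be seeded by the caller, and the whole neighbor link table is rebuilt on every message instead of being patched only for the destinations named in the message as Fig. 2 does.

```python
INF = float("inf")

def ntu_apply_message(nbr_dist, nbr_pred, entries):
    """Apply update entries [j, d, p] received from one neighbor k.

    nbr_dist[j] plays the role of D_jk^i and nbr_pred[j] of p_jk^i.
    Returns the neighbor link table T_k^i as {(m, j): cost}, where
    m = p_jk^i and cost = D_jk^i - D_mk^i.
    """
    for j, d, p in entries:
        nbr_dist[j] = d
        nbr_pred[j] = p

    links = {}
    for j, m in nbr_pred.items():
        if m is None:
            continue  # the neighbor itself has no predecessor
        d_j = nbr_dist[j]
        d_m = nbr_dist.get(m, INF)
        if d_j == INF or d_m == INF:
            continue  # unreachable destination or unknown predecessor
        links[(m, j)] = d_j - d_m
    return links

# Tables for neighbor "k", seeded with k's entry for itself (an assumption).
dist_k = {"k": 0}
pred_k = {"k": None}

# k reports a tree k -> x -> y and k -> z.
first = ntu_apply_message(dist_k, pred_k, [("x", 1, "k"), ("y", 3, "x"), ("z", 4, "k")])

# Later k reports a direct, shorter path to y.
second = ntu_apply_message(dist_k, pred_k, [("y", 2, "k")])
```

Because each destination has exactly one predecessor, the derived table always describes a tree rooted at the neighbor, which is what MTU later merges.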
    Procedure MTU
    1. Clear link table $T^i$.
    2. For each node $j \neq i$ occurring in at least one $T^i_k$,
       a. Find MIN $\leftarrow \min\{D^i_{jk} + l^i_k \mid k \in N^i\}$.
       b. Let n be such that MIN $= (D^i_{jn} + l^i_n)$. Ties are broken
          consistently. Neighbor n is the preferred neighbor for
          destination j. For each link (j, v, d) in $T^i_n$,
            Add link (j, v, d) to $T^i$.
    3. Update $T^i$ with each link $l^i_k$.
    4. Run Dijkstra's shortest path algorithm on $T^i$ to find new
       $D^i_j$ and $p^i_j$.
    5. For each destination j, if $D^i_j$ or $p^i_j$ changed from the
       previous value, set changed and report-it flags for j.
    End MTU

Fig. 5. Main Table Update Algorithm

Fig. 3. Example illustrating the main table update procedure. (a) Shows the adjacent links and neighbor tables of node $i$. (b) Shows the main link table after merging the neighbor tables. Table showing the preferred neighbors:

    Destination   p   q   x   y   j
    Distance      2   1   2   3   5
    Pref. Nbr     p   q   q   p   q

Fig. 4. Significance of the tie-breaking rule. (a) An example network with unit link costs. (b) Node $i$ has the costs of its adjacent links and the shortest path trees of its neighbors $p$ and $q$. The distances of nodes $x$ and $y$ from $i$ are identical through both neighbors $p$ and $q$. (c) If MTU breaks ties in an arbitrary manner while constructing $T^i$, it may choose $p$ as the preferred neighbor for node $x$ and choose $q$ as the preferred neighbor for node $y$, resulting in a graph that has no path from $i$ to $j$. Ties, therefore, cannot be broken in an arbitrary manner.

After merging the topologies, MTU runs Dijkstra's shortest path algorithm to find the shortest path tree and deletes all links from $T^i$ that are not in the tree. Because there can be more than one shortest-path tree, ties are again broken in a consistent manner while running Dijkstra's algorithm. The distances $D^i_j$ and predecessors $p^i_j$ can then be obtained from $T^i$. The tree is compared with the previous shortest path tree and only the differences are then reported to the neighbors. If there are no differences, no updates are reported. Eventually all tables converge such that the $D^i_j$ give the shortest distances and all message activity ceases. The proofs are given in Section III.

D. Computing $S^i_j$

In this subsection, the final desired routing algorithm MPATH is derived by making extensions to PATH. MPATH computes the successor sets $S^i_j$ by enforcing the Loop-free Invariant conditions described below and using neighbor-to-neighbor synchronization.

Let $FD^i_j$, called the feasible distance, be an 'estimate' of the distance of node $i$ to node $j$ in the sense that $FD^i_j$ is equal to $D^i_j$ when the network is in a stable state, but, to prevent loops during periods of network transitions, it is allowed to temporarily differ from $D^i_j$.

Loop-free Invariant Conditions (LFI) [16]:

$$FD^i_j(t) \le D^k_{ji}(t), \quad k \in N^i \tag{1}$$

$$S^i_j(t) = \{k \mid D^i_{jk}(t) < FD^i_j(t)\} \tag{2}$$

The invariant conditions (1) and (2) state that, for each destination $j$, a node $i$ can choose a successor whose distance to $j$, as known to $i$, is less than the distance of node $i$ to $j$ that is known to its neighbors.

Theorem 1: [16] If the LFI conditions are satisfied at any time $t$, the $SG_j(t)$ implied by the successor sets $S^i_j(t)$ is loop-free.

Proof: Let $k \in S^i_j(t)$; then from (2) we have

$$D^i_{jk}(t) < FD^i_j(t) \tag{3}$$

At node $k$, because node $i$ is a neighbor, from (1) we have

$$FD^k_j(t) \le D^i_{jk}(t) \tag{4}$$

Combining (3) and (4) we get

$$FD^k_j(t) < FD^i_j(t) \tag{5}$$

Eq. (5) states that, if $k$ is a successor of node $i$ in a path to destination $j$, then $k$'s feasible distance to $j$ is strictly less than the feasible distance of node $i$ to $j$. Now, if the successor sets define a loop at time $t$ with respect to $j$, then for some node $p$ on the loop, we arrive at the absurd relation $FD^p_j(t) < FD^p_j(t)$. Therefore the LFI conditions are sufficient for loop-freedom.

The invariants used in LFI are independent of whether the algorithm uses link states or distance vectors; in link-state algorithms, such as MPDA, the $D^i_{jk}$ are computed locally from the link states communicated by the neighbors, while in distance-vector algorithms, like the MPATH presented here, the $D^i_{jk}$ are directly communicated.
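The LFI conditions and the argument of Theorem 1 can be checked numerically on a toy state. The Python sketch below is an illustration only: the helper names, the three-node state, and the dictionary layout (`known[i][k]` standing for $D^i_{jk}$, the copy of neighbor $k$'s distance held at node $i$) are assumptions, not part of the paper. Note that node `b` holds a stale, larger value for `a`, which is allowed as long as condition (1) holds.

```python
def satisfies_condition_1(fd, known):
    """Condition (1): FD_j^i <= D_ji^k for every neighbor k of i.

    known[k][i] is the distance of i to the destination as known to k.
    """
    return all(fd[i] <= known[k][i] for k in known for i in known[k])

def lfi_successors(fd, known):
    """Condition (2): S_j^i = {k : D_jk^i < FD_j^i} for every node i."""
    return {i: {k for k, d in known[i].items() if d < fd[i]} for i in known}

def feasible_distance_decreases(fd, succ):
    """Eq. (5): FD strictly decreases along every successor link."""
    return all(fd[k] < fd[i] for i in succ for k in succ[i])

# Hypothetical snapshot for one destination j; a, b, c are mutual neighbors.
fd = {"a": 3, "b": 2, "c": 1}
known = {
    "a": {"b": 2, "c": 1},
    "b": {"a": 5, "c": 1},   # b's copy of a's distance is out of date
    "c": {"a": 3, "b": 2},
}
succ = lfi_successors(fd, known)

# A state that violates condition (1): b's FD exceeds what a believes.
bad_fd = {"a": 3, "b": 4, "c": 1}
```

Since the feasible distance strictly decreases along every successor link in `succ`, no sequence of successor links can return to its starting node, which is exactly the contradiction used in the proof.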
The invariants (1) and (2) suggest a technique for computing $S^i_j(t)$ such that the successor graph $SG_j(t)$ for destination $j$ is loop-free at every instant. The key is determining $FD^i_j(t)$ in Eq. (1), which requires node $i$ to know $D^k_{ji}(t)$, the distance from $i$ to node $j$ in the topology table $T^k_i$ that node $i$ communicated to neighbor $k$. Because of non-zero propagation delay, $T^k_i$ is a time-delayed version of $T^i$. We observe that, if node $i$ delays updating $FD^i_j$ with $D^i_j$ until $k$ incorporates the distance $D^i_j$ in its tables, then $FD^i_j$ satisfies the LFI condition.

Pseudocode for MPATH is shown in Figure 6. MPATH enforces the LFI conditions by synchronizing the exchange of update messages among neighbors using query and reply flags. If a node sends a message with the query bit set, then the node must wait until a reply is received from all its neighbors before it is allowed to send the next update message. The node is said to be in the ACTIVE state during this period. The inter-neighbor synchronization used in MPATH spans only one hop, unlike algorithms that use diffusing computations that potentially span the whole network (e.g., DASM [17]).

Assume that all nodes are in the PASSIVE state initially with correct distances to all other nodes and that no messages are in transit or pending to be processed. The behavior of the network where every node runs MPATH is such that when a finite sequence of link cost changes occurs in the network within a finite time interval, some or all nodes go through a series of PASSIVE-to-ACTIVE and ACTIVE-to-PASSIVE state transitions, until eventually all nodes become PASSIVE with correct distances to all destinations.

Let a node in the PASSIVE state receive an event resulting in changes in its distances to some destinations. Before the node sends an update message to report new distances, it checks whether the distance $D^i_j$ to any destination $j$ has increased above the previously reported distance $RD^i_j$. If none of the distances increased, then the node remains in the PASSIVE state. Otherwise, the node sets the query flag in the update message, sends it, and goes into the ACTIVE state. When in the ACTIVE state, a node cannot send any update messages or add neighbors to any successor set. After receiving replies from all its neighbors, the node is allowed to modify the successor sets and report any changes that may have occurred since the time it transitioned to the ACTIVE state, and if none of the distances increased beyond the reported distance, the node transitions to the PASSIVE state. Otherwise, the node sends the next update message with the query bit set and becomes ACTIVE again, and the whole cycle repeats. If a node receives a message with the query bit set when in the PASSIVE state, it modifies its tables and then sends back an update message with the reply flag set. Otherwise, if the node happens to be in the ACTIVE state, it modifies the tables, but because the node is not allowed to send updates when in the ACTIVE state, the node sends back an empty message with no updates but with the reply bit set. If a reply from a neighbor is pending when the link to the neighbor fails, then an implicit reply is assumed, and such a reply is assumed to report an infinite distance to the destination. Because replies are given immediately to queries and replies are assumed to be given upon link failure, deadlocks due to inter-neighbor synchronization cannot occur. Eventually, all nodes become PASSIVE with correct distances to destinations, which we prove in the next section.

    Procedure INIT-MPATH
    {Invoked when the node comes up.}
    1. Initialize tables and run MPATH.
    End INIT-MPATH

    Algorithm MPATH
    {Invoked when a message M is received from neighbor k, or an adjacent
     link to k has changed.}
    1. Run NTU to update neighbor tables.
    2. Run MTU to obtain new $D^i_j$ and $p^i_j$.
    3. If node is PASSIVE or node is ACTIVE $\wedge$ last reply arrived,
         Reset goactive flag.
         For each destination j marked as report-it,
           a. $FD^i_j \leftarrow \min\{D^i_j, RD^i_j\}$
           b. If $D^i_j > RD^i_j$, Set goactive flag.
           c. $RD^i_j \leftarrow D^i_j$
           d. Add [j, $RD^i_j$, $p^i_j$] to message M'.
           e. Clear report-it flag for j.
       Otherwise, the node is ACTIVE and waiting for more replies,
         For each destination j marked as changed,
           f. $FD^i_j \leftarrow \min\{D^i_j, FD^i_j\}$
    4. For each destination j marked as changed,
         a. Clear changed flag for j
         b. $S^i_j \leftarrow \{k \mid D^i_{jk} < FD^i_j\}$
    5. For each neighbor k,
         a. M'' $\leftarrow$ M'.
         b. If event is a query from k, Set reply flag in M''.
         c. If goactive set, Set query flag in M''.
         d. If M'' non-empty, send M'' to k.
    6. If goactive set, become ACTIVE, otherwise become PASSIVE.
    End MPATH

Fig. 6. Multi-path Loop-free Routing Algorithm
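The interplay of $D^i_j$, $FD^i_j$ and $RD^i_j$ in steps 3, 4 and 6 of Fig. 6 can be traced with a small Python sketch for a single destination. This is a simplified illustration under stated assumptions: the class name `DestinationState` and its methods are hypothetical, message handling and the NTU/MTU steps are omitted (the new distance is passed in directly), and the destination is treated as always marked report-it and changed.

```python
class DestinationState:
    """State a node keeps for one destination j (simplified sketch)."""

    def __init__(self, distance):
        self.d = distance    # current distance D_j^i
        self.fd = distance   # feasible distance FD_j^i
        self.rd = distance   # last reported distance RD_j^i
        self.active = False  # PASSIVE initially
        self.goactive = False

    def process(self, new_distance, last_reply_arrived=False):
        """One pass over steps 3 and 6 of Fig. 6 for this destination."""
        self.d = new_distance
        if not self.active or last_reply_arrived:
            self.goactive = False
            self.fd = min(self.d, self.rd)      # step 3a
            if self.d > self.rd:                # step 3b
                self.goactive = True
            self.rd = self.d                    # step 3c
        else:
            # ACTIVE and still waiting for replies: FD may only decrease.
            self.fd = min(self.d, self.fd)      # step 3f
        self.active = self.goactive             # step 6

    def successors(self, nbr_dist):
        """Step 4b: S_j^i = {k : D_jk^i < FD_j^i}."""
        return {k for k, dk in nbr_dist.items() if dk < self.fd}

nbr_dist = {"p": 4, "q": 5}      # distances reported by neighbors (assumed)
state = DestinationState(5)

state.process(8)                 # distance increases: query sent, node goes ACTIVE
after_increase = (state.fd, state.rd, state.active, state.successors(nbr_dist))

state.process(6)                 # change while ACTIVE: FD cannot rise
while_active = (state.fd, state.rd, state.active)

state.process(6, last_reply_arrived=True)   # all replies in: FD may now rise
after_replies = (state.fd, state.rd, state.active, state.successors(nbr_dist))
```

The trace shows the intended delay: the feasible distance stays at the old value 5 while neighbors may still hold the old report, and only rises to 6, admitting `q` as a successor, once the last reply has arrived.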
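III. CORRECTNESS OF MPATH

The following properties of MPATH must be proved: (1) MPATH eventually converges with $D^i_j$ giving the shortest distances and (2) the successor graph $SG_j$ is loop-free at every instant and eventually converges to the shortest multipath. PATH works essentially like PDA [16] except that the kind of update information exchanged is different; PDA exchanges link states while PATH exchanges distance vectors with predecessor information. The correctness proofs of PATH are identical to those of PDA and are reproduced here for completeness. The convergence of MPATH follows directly from the convergence of PATH because the extensions made in MPATH are such that update messages in MPATH are only delayed a finite amount of time.

Definitions: The $n$-hop minimum distance of node $i$ to node $j$ in a network is the minimum distance possible using a path of $n$ hops (links) or less. A path that offers the $n$-hop minimum distance is called an $n$-hop minimum path. If there is no path with $n$ hops or less from node $i$ to $j$, then the $n$-hop minimum distance from $i$ to $j$ is undefined. An $n$-hop minimum tree of a node $i$ is a tree in which node $i$ is the root and every path of $n$ hops or less from the root to any other node is an $n$-hop minimum path.

Let $\mathbf{G}$ denote the final topology of the network, as seen by an omniscient observer, after all link changes have occurred. (We use bold font to refer to quantities in $\mathbf{G}$.) Without loss of generality, assume $\mathbf{G}$ is connected; if $\mathbf{G}$ is disconnected, the proof applies to each connected component independently.

We say that a router $i$ knows at least the $n$-hop minimum tree if the tree contained in its main link table $T^i$ is at least an $n$-hop minimum tree rooted at $i$ in $\mathbf{G}$ and there are at least $n$ nodes in $T^i$ that are reachable from the root $i$. Note that $T^i$ is such that the links with head nodes that are more than $n$ hops away from $i$ may have costs that do not agree with the link costs in $\mathbf{G}$.

Theorem 2: If node $i$ has adjacent link costs that agree with $\mathbf{G}$ and, for each neighbor $k$, $T^i_k$ represents at least an $(n-1)$-hop minimum tree, then after the execution of MTU, the minimum cost tree contained in $T^i$ is at least an $n$-hop minimum tree.

Proof: The proof is identical to the proof of Lemma 1 in [16] and is provided in the appendix for convenient reference.

Theorem 3: A finite time after the last link cost change in the network, the main topology $T^i$ at each node $i$ gives the correct shortest paths to all known destinations.

Proof: The proof is identical to the proof of Theorem 2 in [16] and is provided in the appendix for convenient reference.

A node generates update messages only to report changes in distances and predecessors, so after convergence no messages will be generated. The following theorems show that MPATH provides instantaneous loop-freedom and correctly computes the shortest multipath.

Theorem 4: For the algorithm MPATH executed at node $i$, let $t_n$ be the time when $RD^i_j$ is updated and reported for the $n$-th time. Then, the following conditions always hold.

$$FD^i_j(t_n) \le \min\{RD^i_j(t_{n-1}),\ RD^i_j(t_n)\} \tag{6}$$

$$FD^i_j(t) \le FD^i_j(t_n), \quad t \in [t_n, t_{n+1}) \tag{7}$$

Proof: From the working of MPATH in Fig. 6, we observe that $RD^i_j$ is updated at line 3c when (a) the node goes from PASSIVE to ACTIVE because of one or more distance increases, (b) the node receives the last reply and goes from the ACTIVE to the PASSIVE state, (c) the node is in the PASSIVE state and remains in the PASSIVE state because the distance did not increase for any destination, or (d) the node receives the last reply but immediately goes into the ACTIVE state again. The reported distance $RD^i_j$ remains unchanged during the ACTIVE phase. Because $FD^i_j$ is updated at line 3a each time $RD^i_j$ is updated at line 3c, Eq. (6) follows. When the node is in the ACTIVE phase, $FD^i_j$ may also be modified by the statement on line 3f, which implies Eq. (7).

Theorem 5: (Safety property) At any time $t$, the successor sets $S^i_j(t)$ computed by MPATH are loop-free.

Proof: The proof is based on showing that the $FD^i_j$ and $S^i_j$ computed by MPATH satisfy the LFI conditions. Let $t_n$ be the time when $RD^i_j$ is updated and reported for the $n$-th time. The proof is by induction on the interval $[t_n, t_{n+1}]$. Let the LFI condition be true up to time $t_n$; we show that

$$FD^i_j(t) \le D^k_{ji}(t), \quad t \in [t_n, t_{n+1}] \tag{8}$$

From Theorem 4 we have

$$FD^i_j(t_n) \le \min\{RD^i_j(t_{n-1}),\ RD^i_j(t_n)\} \tag{9}$$

$$FD^i_j(t_{n+1}) \le \min\{RD^i_j(t_n),\ RD^i_j(t_{n+1})\} \tag{10}$$

$$FD^i_j(t) \le FD^i_j(t_n), \quad t \in [t_n, t_{n+1}) \tag{11}$$

Combining the above equations we get

$$FD^i_j(t) \le \min\{RD^i_j(t_{n-1}),\ RD^i_j(t_n)\}, \quad t \in [t_n, t_{n+1}] \tag{12}$$

Let $t'$ be the time when the message sent by $i$ at $t_n$ is received and processed by neighbor $k$. Because of the non-zero propagation delay across any link, $t'$ is such that $t_n < t' < t_{n+1}$, and because $RD^i_j$ is modified at $t_n$ and remains unchanged in $(t_n, t_{n+1})$ we get

$$RD^i_j(t_{n-1}) \le D^k_{ji}(t), \quad t \in [t_n, t') \tag{13}$$

$$RD^i_j(t_n) \le D^k_{ji}(t), \quad t \in [t', t_{n+1}] \tag{14}$$

From Eq. (13) and (14) we get

$$\min\{RD^i_j(t_{n-1}),\ RD^i_j(t_n)\} \le D^k_{ji}(t), \quad t \in [t_n, t_{n+1}] \tag{15}$$

From (12) and (15) the inductive step (8) follows. Because $FD^i_j(t_0) \le D^k_{ji}(t_0)$ at initialization, from induction we have that $FD^i_j(t) \le D^k_{ji}(t)$ for all $t$. Given that the successor sets are computed based on $FD^i_j$, it follows that the LFI conditions are always satisfied. According to Theorem 1 this implies that the successor graph $SG_j$ is always loop-free.

Theorem 6: (Liveness property) A finite time after the last change in the network, the $D^i_j$ give the correct shortest distances and $S^i_j = \{k \mid D^k_j < D^i_j,\ k \in N^i\}$.

Proof: The proof is similar to the proof of Theorem 4 in [16] and is provided in the appendix for convenience.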
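To make the notion of $n$-hop minimum distance used in the definitions of this section concrete, the following Python sketch computes it with a hop-bounded Bellman-Ford iteration. The function name `n_hop_min_distances` and the sample links are assumptions for illustration; nodes whose $n$-hop minimum distance is undefined are simply omitted from the result.

```python
INF = float("inf")

def n_hop_min_distances(links, src, n):
    """Minimum distance from `src` using paths of at most n links.

    links is an undirected graph given as {(u, v): cost}.
    """
    nodes = {node for edge in links for node in edge}
    dist = {node: INF for node in nodes}
    dist[src] = 0
    for _ in range(n):
        # Each round extends every known path by at most one more link.
        nxt = dict(dist)
        for (u, v), cost in links.items():
            for a, b in ((u, v), (v, u)):
                if dist[a] + cost < nxt[b]:
                    nxt[b] = dist[a] + cost
        dist = nxt
    return {node: d for node, d in dist.items() if d < INF}

# Sample network (assumed): the direct link i-a is costlier than i-b-a.
links = {("i", "a"): 5, ("i", "b"): 1, ("b", "a"): 1, ("a", "c"): 1}
one_hop = n_hop_min_distances(links, "i", 1)
two_hop = n_hop_min_distances(links, "i", 2)
three_hop = n_hop_min_distances(links, "i", 3)
```

In the sample, the distance to `a` drops from 5 to 2 once two hops are allowed, and `c` has no 1-hop minimum distance at all; allowing more hops never increases a distance.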
IV. COMPLEXITY ANALYSIS

The main difference between PATH and MPATH is that the update messages sent in MPATH are delayed a finite amount of time in order to enforce the invariants. As a result, the complexities of PATH and MPATH are essentially the same and are therefore analyzed collectively.

The storage complexity is the amount of table space needed at a node. Each one of the $|N^i|$ neighbor tables and the main distance table has size of the order $O(|N|)$, and the main link table $T^i$ can grow, during execution of MTU, to size at most $|N^i|$ times $O(|N|)$. The storage complexity is therefore of the order $O(|N^i||N|)$.

The time complexity is the time it takes for the network to converge after the last link cost change in the network. To determine the time complexity we assume the computation time to be negligible compared to the communication times. If $t_n$ is the time when every node has the $n$-hop minimum tree, then, because every node processes and reports changes in finite time, $|t_{n+1} - t_n|$ is bounded. Let $|t_{n+1} - t_n| \le \theta$ for some finite constant $\theta$. From Theorem 3, the convergence time can be at most $|N|\theta$ and, hence, the time complexity is $O(|N|)$.

The computation complexity is the time taken to build the node's shortest path tree in $T^i$ from the neighbor tables $T^i_k$. Updating the $T^i_k$ with new information and merging them takes on the order of $O(|N^i||N|)$ operations, and running Dijkstra on $T^i$ takes $O(|N^i||N|\log(|N|))$. Therefore the computation complexity is $O(|N^i||N| + |N^i||N|\log(|N|))$.

The communication complexity is the number of update messages required for propagating a set of link-cost changes. The analysis for multiple link-cost changes is complex because of the sensitivity to the timing of the changes. So, we provide the analysis only for the case of a single link-cost change. A node removes a link from its shortest path tree only if a shorter path using two or more links is discovered, and once discovered the path is remembered. Therefore, a removed link will not be added again to the shortest path, which means that a link can be included in and deleted from the shortest path by a node at most one time. Because nodes report each change only once to each neighbor, an update message can travel only once on a link and therefore the number of messages sent by a node can be at most $O(|E|)$. For certain topologies and sensitively timed sequences of link cost changes, the amount of communication required by PATH can be exponential. Humblet [8] provides an example that exhibits such behavior, and though PATH is different from the shortest-path algorithm presented in that paper, we note that PATH is not immune from such exponential behavior. However, we believe such scenarios require sensitively timed link-cost changes which are very unlikely to occur in practice. If necessary, a small hold-down time before sending update messages may be used to prevent such behavior.

V. CONCLUDING REMARKS

We have presented the first routing algorithm based on distance information that provides multiple paths that need not have equal costs and that are loop-free at every instant, without requiring inter-nodal synchronization spanning more than one hop. The loop-free invariant conditions presented here are quite general and can be used with existing internet protocols. The multiple successors that MPATH makes available at each node can be used for traffic load-balancing, which, as we have shown using other algorithms (MPDA [16]), is necessary for minimizing delays in a network. MPATH can therefore be used as an alternative to MPDA to get similar performance. In future work we intend to compare the performance of the three multipath routing algorithms MPATH, MPDA and DASM [17] in terms of control message overhead and convergence times and analyze their relative merits.
REFERENCES

[1] E.W. Dijkstra and C.S. Scholten. Termination Detection for Diffusing Computations. Information Processing Letters, 11:1-4, August 1980.
[2] D. Farinacci. Introduction to enhanced IGRP (EIGRP). Cisco Systems Inc., July 1993.
[3] J.J. Garcia-Luna-Aceves. Loop-Free Routing Using Diffusing Computations. IEEE/ACM Trans. Networking, 1:130-141, February 1993.
[4] J.J. Garcia-Luna-Aceves and J. Behrens. Distributed, scalable routing based on vectors of link states. IEEE Journal on Selected Areas in Communications, October 1995.
[5] J.J. Garcia-Luna-Aceves and S. Murthy. A path-finding algorithm for loop-free routing. IEEE/ACM Trans. Networking, February 1997.
[6] J.J. Garcia-Luna-Aceves and M. Spohn. Scalable link-state internet routing. Proc. International Conference on Network Protocols, October 1998.
[7] C. Hedrick. Routing Information Protocol. RFC 1058, June 1988.
[8] P.A. Humblet. Another Adaptive Distributed Shortest Path Algorithm. IEEE Trans. Commun., 39:995-1003, June 1991.
[9] J.M. Jaffe and F.H. Moss. A Responsive Distributed Routing Algorithm for Computer Networks. IEEE Trans. Commun., 30:1758-1762, July 1982.
[10] P.M. Merlin and A. Segall. A Failsafe Distributed Routing Protocol. IEEE Trans. Commun., 27:1280-1287, September 1979.
[11] J. Moy. OSPF Version 2. RFC 1247, August 1991.
[12] R. Perlman. Fault-tolerant broadcast of routing information. Computer Networks and ISDN, 7, 1983.
[13] B. Rajagopalan and M. Faiman. A Responsive Distributed Shortest-Path Routing Algorithm within Autonomous Systems. Internetworking: Research and Experience, 2:51-69, March 1991.
[14] A. Segall. Optimal distributed routing for virtual line-switched data networks. IEEE Trans. Commun., 27:201-209, January 1979.
[15] J. Spinelli and R. Gallager. Event Driven Topology Broadcast without Sequence Numbers. IEEE Trans. Commun., 37:468-474, 1989.
[16] S. Vutukury and J.J. Garcia-Luna-Aceves. A Simple Approximation to Minimum Delay Routing. Proc. of ACM SIGCOMM, 1999.
[17] W.T. Zaumen and J.J. Garcia-Luna-Aceves. Loop-Free Multipath Routing Using Generalized Diffusing Computations. Proc. IEEE INFOCOM, March 1998.

APPENDIX

Proof of Theorem 2:

Let $H^i_n$ denote an $n$-hop minimum tree rooted at node $i$ in $\mathbf{G}$ and let $M^i_n$ denote the set of nodes that are within $n$ hops from $i$ in $H^i_n$. Let $D^{i,j}_n$ denote the distance of $i$ to $j$ in $H^i_n$. Let $d_{ij}$ be the cost of the link $i \to j$. Node $i$ is called the head of the link $i \to j$. The notation $i \rightsquigarrow j$ indicates a path from $i$ to $j$ of zero or more links; if the path has zero links, then $i = j$. The length of path $i \rightsquigarrow j$ is the sum of the costs of all links in the path.

Property 1: From the principle of optimality (the sub-path of a shortest path between two nodes is also the shortest path between the end nodes of the sub-path), if $H$ and $H'$ are two $n$-hop minimum trees rooted at node $i$ and $M$ and $M'$ are the sets of nodes that are within $n$ hops from $i$ in $H$ and $H'$ respectively, then $M = M' = M^i_n$ and, for each $j \in M^i_n$, the length of path $i \rightsquigarrow j$ in both $H$ and $H'$ is equal to $D^{i,j}_n$. For $h \ge n$, $D^{i,j}_h \le D^{i,j}_n$.

Let $A^i = \bigcup_{k \in N^i} A^i_k$, where $A^i_k$ is the set of nodes in $T^i_k$. Because $T^i_k$ is at least an $(n-1)$-hop minimum tree and node $i$ can appear at most once in each $A^i_k$, each $A^i_k$ has at least $n-1$ unique elements. Therefore, $A^i$ has at least $n-1$ elements.

Let $M^i_n$ be the set of the $n-1$ nearest elements to node $i$ in $A^i$. That is, $M^i_n \subseteq A^i$, $|M^i_n| = n-1$, and for each $j \in M^i_n$ and $v \in A^i - M^i_n$, $\min\{D^i_{jk} + l^i_k \mid k \in N^i\} \le \min\{D^i_{vk} + l^i_k \mid k \in N^i\}$.

To prove the theorem it is sufficient to prove the following:
1. Let $G^i_n$ represent the graph constructed by MTU on lines 2 and 3 (i.e., before applying Dijkstra in line 4). For each $j \in M^i_n$ there is a path $i \rightsquigarrow j$ in $G^i_n$ such that its length is at most $D^{i,j}_n$.
2. After running Dijkstra on $G^i_n$ on line 4 in MTU, the resulting tree is at least an $n$-hop minimum tree.

Let us first assume part 1 is true and prove part 2 because it is simple. From the statement in part 1, for each node $j \in M^i_n$ there is a path [...]

[...] element $j \in M^i_n$. Let $K$ be the highest priority neighbor for which $D^i_{jK} + l^i_K = \min\{D^i_{jk} + l^i_k \mid k \in N^i\}$. At most $m-1$ nodes in $T^i_K$ can have lesser or equal distance than $j$, which implies that a path $K \rightsquigarrow j$ exists with at most $m-1$ hops. Let $v$ be the neighbor of $j$ in $T^i_K$; then the path $K \rightsquigarrow v$ has at most $n-1$ hops. Because $T^i_K$ is at least an $(n-1)$-hop minimum tree, the link $v \to j$ must agree with $\mathbf{G}$. Since $D^i_{vK} + l^i_K < D^i_{jK} + l^i_K$, from the induction hypothesis there is a path $i \rightsquigarrow v$ in $G^i_n$ such that its length is at most $D^{i,v}_n$.

Now we need to show that the preferred neighbor for $v$ is also $K$, so that the link $v \to j$ will be included in the construction of $G^i_n$, thus ensuring the existence of the path $i \rightsquigarrow j$ in $G^i_n$. If some neighbor $K'$ other than $K$ is the preferred neighbor for $v$, then one of the following two conditions should hold: (a) $D^i_{vK'} + l^i_{K'} < D^i_{vK} + l^i_K$, or (b) $D^i_{vK'} + l^i_{K'} = D^i_{vK} + l^i_K$ and the priority of $K'$ is greater than the priority of $K$.

Case (a): Because $D^i_{jK} + l^i_K \le D^i_{jK'} + l^i_{K'}$, it follows that the path $v \rightsquigarrow j$ in $T^i_{K'}$ is greater than the cost of $v \to j$ in $\mathbf{G}$, which implies that $T^i_{K'}$ is not an $(n-1)$-hop minimum tree, a contradiction of the assumption. Therefore $D^i_{vK} + l^i_K = \min\{D^i_{vk} + l^i_k \mid k \in N^i\}$.

Case (b): Let $Q_j$ be the set of neighbors that give the minimum distance for $j$, i.e., for each $k \in Q_j$, $D^i_{jk} + l^i_k = \min\{D^i_{jk} + l^i_k \mid k \in N^i\}$. Similarly, let $Q_v$ be such that for each $k \in Q_v$, $D^i_{vk} + l^i_k = \min\{D^i_{vk} + l^i_k \mid k \in N^i\}$. If $k \in Q_v$ and $k \notin Q_j$, then it follows from the same argument as in case (a) that $v \rightsquigarrow j$ in $T^i_k$ is greater than the cost of $v \to j$ in $\mathbf{G}$, implying that $T^i_k$ is not an $(n-1)$-hop minimum tree, a contradiction of the assumption. Therefore $Q_v \subseteq Q_j$. Also, from the same argument it can be inferred that $K \in Q_v$. Because $K$ has the highest priority among all members of $Q_j$ and $Q_v \subseteq Q_j$, $K$ also has the highest priority among all members of $Q_v$. This proves that $v \to j$ will be included in the construction of $G^i_n$. Because the length of $i \rightsquigarrow v$ in $G^i_n$ is less than or equal to $D^{i,v}_n$ from the induction hypothesis, and $D^{i,v}_n + d_{vj} = D^{i,j}_n$, where $d_{vj}$ is the final cost of link $v \to j$ in $\mathbf{G}$, the length of $i \rightsquigarrow j$ in $G^i_n$ is less than or equal to $D^{i,j}_n$. This proves part 1 of the theorem.

Proof of Theorem 3:

The proof is by induction on $t_n$, the global time when, for each node $i$, $T^i$ is at least an $n$-hop minimum tree. Because the longest loop-free path in the network has at most $N-1$ links, where $N$ is the number of nodes in the network, $t_{N-1}$ is the time when every node has the shortest path to every other node. We need to show that $t_{N-1}$ is finite. The base case is $t_1$, the time when every node has the 1-hop minimum distance, and because adjacent link changes are notified within finite time, $t_1 < \infty$. Let $t_n < \infty$ for some $n < N$. Given that the propagation delays are finite, each node will have each of its neighbors' $n$-hop minimum trees in finite time after $t_n$. From Theorem 2 we can see that the node will have at least the $(n+1)$-hop minimum tree in finite time after $t_n$. Therefore, $t_{n+1} < \infty$. From induction we can see that $t_{N-1} < \infty$.

Proof of Theorem 6:

The convergence of MPATH follows directly from the convergence of PATH because the update messages in MPATH are only delayed a finite time, as allowed at line 4 in algorithm PATH. Therefore, the distances $D^i_j$ in MPATH also converge to the shortest distances.
Because i i iDB;eijckasjutrsiaen,tGwheinerewcaaintrheilnefnegrtthhaentroemdieosssatinDpaint;jh.ii,Int;htehetjrreewjesiuct2hlotnilnsMegtnrgtunrtcehteeadatfhtmearosrsauttnDlneianin;sjgt. crahtaendgebsytatoDfhteeDjrnjiceoaignrehvbearolgwresanyicnse.trheFeprioorrmtteadblilnteoest3ihnaeifinnneMiitgePhAtbioTmrHes,aDwndjiekaorb=eseinrDvceojkrtphfooa-rt nodes includingnn(cid:0)od1e . FromMprnoperty 1, it follows that the tree kwh2enNnoide becomespassive holdstrue.Becauseallnodes nconstructedisatleastan i-hopminimumtree. arepassiveiatconvergenceitFfoDllojiw=stDhajit We now prove part 1.nOrder the nodes in in non-decreasing i i i i . Sj = fkjDjk < FDj;k 2 order. Theproof isbyinductiononthesequenMcenofelementsin . i k i i i N g=fkjDj <Dj;k2N g Thebasecaseistruebecausefor ,thefirstelementof , Mn i i and m1 AsinductionhypothMesis,llmet1th=e mstaitnemflekinjkth2oldNfiogrthefilmirs1t =Di1;mel1e:mentsof . Considerthe -th i m(cid:0)1 Mn m
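As a hedged illustration of the converged state that Theorem 6 describes, the following Python sketch computes, for one destination, the successor sets $S_j^i = \{k \mid D_j^k < D_j^i,\ k \in N^i\}$ from shortest distances and checks that the resulting successor graph has no loops. It is not the MPATH algorithm: the distances are obtained with a centralized Dijkstra run rather than by exchanging distance vectors, and the function names, the example topology, and its link costs are all hypothetical.

```python
import heapq

# Hypothetical 4-node network; keys are undirected links, values are positive costs.
LINKS = {("a", "b"): 1, ("a", "c"): 2, ("b", "c"): 1, ("b", "d"): 3, ("c", "d"): 1}


def shortest_distances(links, dest):
    """Return (adjacency map, shortest distance of every node to dest)."""
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, {})[v] = cost
        adj.setdefault(v, {})[u] = cost
    dist = {node: float("inf") for node in adj}
    dist[dest] = 0
    heap = [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in adj[u].items():
            if d + cost < dist[v]:
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return adj, dist


def successor_sets(adj, dist, dest):
    """Converged successor sets: neighbors strictly closer to dest than the node."""
    return {i: {k for k in adj[i] if dist[k] < dist[i]} for i in adj if i != dest}


def is_loop_free(succ):
    """Depth-first search for a cycle in the directed successor graph."""
    state = {}

    def visit(u):
        if state.get(u) == "done":
            return True
        if state.get(u) == "visiting":
            return False  # back edge, so a loop exists
        state[u] = "visiting"
        ok = all(visit(v) for v in succ.get(u, ()))
        state[u] = "done"
        return ok

    return all(visit(u) for u in succ)


adj, dist = shortest_distances(LINKS, "d")
succ = successor_sets(adj, dist, "d")
print(dist)
print(succ)
print(is_loop_free(succ))
```

In this example node b keeps both c and d as successors even though the direct link to d costs more than the path through c, which is the unequal-cost multipath behavior the paper targets. Every successor edge leads to a node with a strictly smaller distance, so no loop can form.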