ebook img

Suman Banerjee, Seungjoon Lee, Bobby Bhattacharjee PDF

14 Pages·2011·0.25 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Suman Banerjee, Seungjoon Lee, Bobby Bhattacharjee

1 Resilient Multicast using Overlays Suman Banerjee, Seungjoon Lee, Bobby Bhattacharjee, Aravind Srinivasan Abstract(cid:151)We introduce PRM (Probabilistic Resilient [16]. Unlike native multicast where data packets are Multicast):amulticastdatarecoveryschemethatimproves replicated at routers inside the network, in application- data delivery ratios while maintaining low end-to-end layer multicast data packets are replicated at end hosts. latencies. PRM has both a proactive and a reactive com- Logically, the end-hosts form an overlay network, and ponents; in this paper we describe how PRM can be used thegoalofapplication-layermulticastistoconstructand to improve the performance of application-layermulticast maintain an ef(cid:2)cient overlay for data transmission. The protocols, especially when there are high packet losses eventualdatadeliverypathin application-layermulticast and host failures. Through detailed analysis in this paper, is an overlay tree. While network-layer multicast makes we show that this loss recovery technique has ef(cid:2)cient scaling properties (cid:151) the overheads at each overlay node the most ef(cid:2)cient use of network resources, its limited asymptoticallydecreasetozerowithincreasinggroupsizes. deployment in the Internet makes application-layer mul- As a detailed case study, we show how PRM can be ticast a more viable choice for group communication applied to the NICE application-layer multicast protocol. over the wide-area Internet. We present detailed simulations of the PRM-enhanced Akeychallengeinconstructingaresilientapplication- NICE protocol for 10,000 node Internet-like topologies. (cid:0) layer multicast protocol is to provide fast data recovery Simulations show that PRM achieves a high delivery when overlay node failures partition the data delivery ratio ( 97%) with a low latency bound (600 ms) for environments with high end-to-end network losses (1-5%) paths. 
Overlay nodes are processes on regular end-hosts and high topology change rates (5 c(cid:1)hanges per second) which are potentially more susceptible to failures than while incurring very low overheads ( 5%). the routers. Each such failure of a non-leaf overlay node causes a data outage for nodes downstream until the time the data delivery tree is reconstructed. Losses due to overlay node failures are more signi(cid:2)cant than I. INTRODUCTION regular packet losses in the network and may cause data We present a fast multicast data recovery scheme outage in the order of tens of seconds (e.g. the Narada that achieves high delivery ratios with low overheads. application-layermulticastprotocol[8] setsdefaulttime- Our technique, called Probabilistic Resilient Multicast outs between 30-60 seconds). (PRM), is especially useful for applications that can PRM uses two simple techniques: bene(cid:2)t from low data losses without requiring perfect (cid:2) AproactivecomponentcalledRandomizedforward- reliability. Examples of such applications are real-time audio and video streaming applications where the play- ing in which each overlay node chooses a con- back quality at the receivers improves if the delivery stant number (e.g., 1 or 2) of other overlay nodes uniformly at random and forwards data to each ratios can be increased within speci(cid:2)c latency bounds. of them with a low probability (e.g. 0.01-0.03). Usingterminologyde(cid:2)nedinpriorliterature[26]wecall this model of data delivery resilient multicast. This randomized forwarding technique operates in conjunction with the usual data forwarding mecha- In this paper we describe PRM in the context of nisms alongthe tree edges,andmay lead to a small overlay-based multicast [8], [10], [1], [31], [7], [25], number of duplicate packet deliveries. Such dupli- S. Banerjee is with the Department of Computer Sciences, Uni- cates are detected and suppressed using sequence versity of Wisconsin, Madison, WI 53706 USA. S. Lee, B. 
Bhat- numbers. The randomized component incurs very tacharjee, and A. Srinivasan are with the Department of Computer Science, University of Maryland, College Park, MD 20742, USA. low additional overheads and can guarantee high B. Bhattacharjee and A. Srinivasan are also with the Institute for deliveryratiosevenunderhighratesofoverlaynode AdvancedComputerStudies,UniversityofMaryland.S.Banerjee,S. (cid:2) failures. Lee,andB.BhattacharjeeweresupportedinpartbyNSFawardANI A reactive mechanism called Triggered NAKs to 0092806. A. Srinivasan was supported in part by NSFAward CCR- 0208005. B.Bhattacharjee and A.Srinivasan werealsosupported in handle data losses due to link errors and network part by ITR Award CNS-0426683. congestion. This is an extended version of a paper that appeared in ACM SIGMETRICS2003. Through analysis and detailed simulations we show 2 that these relatively simple techniques provide high re- We explain the speci(cid:2)c details of proactive random- silience guarantees, both in theory and practice. PRM ized forwarding using the example shown in Figure 1. can be used to signi(cid:2)cantly augment the data delivery In the original data delivery tree (Panel 0), each overlay ratios of any application-layer multicast protocol (e.g. node forwards data to its children along its tree edges. (cid:3)(cid:5)(cid:4)(cid:7)(cid:6)(cid:9)(cid:8)(cid:11)(cid:10) (cid:3)(cid:5)(cid:12)(cid:13)(cid:6)(cid:9)(cid:14)(cid:15)(cid:10) Narada[8],Yoid[10],NICE[1],HMTP[31],Scribe[7], However, due to network losses on overlay links (e.g. (cid:16) (cid:17) DelaunayTriangulation-based[16],CAN-multicast[25]) and )orfailureofoverlaynodes(e.g. , (cid:18) (cid:8)(cid:19)(cid:6)(cid:9)(cid:14)(cid:20)(cid:6)(cid:22)(cid:21)(cid:15)(cid:6)(cid:9)(cid:23)(cid:19)(cid:6)(cid:25)(cid:24)(cid:26)(cid:6)(cid:9)(cid:27) while maintaining low latency bounds. and ) a subset of existing overlaynodes do not receive (cid:28) The contributions of this paper are three-fold: the packet (e.g. 
and ). We remedy (cid:2) We propose a simple, low-overhead scheme for re- this as follows. When any overlaynode receivesthe (cid:2)rst silient multicast. To the best of our knowledge,this copyofadatapacket,itforwardsthedataalongalloth(cid:29)er workisthe(cid:2)rstproposedresilientmulticastscheme tree edges (Panel 1). It also chooses a small number ( ) ofotheroverlaynodesandforwardsdatatoeachofthem that can be used to augment the performance of (cid:30) (cid:31) (cid:14) withasmallprobability, .Forexamplenode chooses (cid:2) application-layer multicast protocols. We present a full analysis of PRM and derive to fo(cid:28)rward data to two other nodes using cross edges the necessary and suf(cid:2)cient conditions for PRM and . which will allow the control overheadsat the group Note that as a consequence of these additional edges members to asymptotically decrease to zero with some nodes may receive multiple copies of the same (cid:3)(cid:5)(cid:12)!(cid:6) (cid:10) (cid:3)(cid:5)"(cid:20)(cid:6) (cid:10) (cid:2) increasing group sizes. packet(e.g.no de inPanel1receive sthedataalongthe We demonstrate how our proposed scheme can be tree edge and cross edge ). Therefore each used with an existing application-layer multicast overlaynodeneedstodetectandsuppresssuchduplicate protocol (NICE [1]) to provide a low overhead,low packets. Each overlay node maintains a small duplicate latency and high delivery ratio multicast technique suppression cache, which temporarily stores the set of for realistic applications and scenarios. data packets received over a small time window. Data packets that miss the latency deadline are dropped. The rest of this paper is structured as follows. In Hence the size of the cache is limited by the latency the next section we present the details of the PRM deadline desired by the application. In practice, the scheme and analyze its performance in Section III. 
In duplicate suppression cache can be implemented using Section IV we present detailed simulation studies of the playback buffer already maintained by streaming the the PRM-enhanced NICE protocol. In Section V we media applications. describe related work and conclude in Section VI. The It is easy to see that each node on average sends or Appendix gives a brief sketch of the proofs. #%$&(cid:30)’(cid:29) receives up to copies of the same packet. The (cid:30)’(cid:29) (cid:30) overhead of this scheme is , where we choose to (cid:29) # II. PROBABILISTIC RESILIENT MULTICAST (PRM) b( e a small value (e.g. 0.01) and to be between and ThePRMschemeemploystwomechanismstoprovide . In our analysis we show that if the destinations of resilience. We describe each of them in turn. these cross edges are chosen uniformly at random, it is possible to guarantee successful reception of packets at each overlay node with a high probability. A. Randomized Forwarding Each overlay node periodically discovers a set of In randomized forwarding, each overlay node, with a random other nodes on the overlay and evaluates the small probability, proactively sends a few extra trans- numberof lossesthat it shareswith these randomnodes. missions along randomly chosen overlay edges. Such In anoverlayconstructionprotocol like Narada[8], each a construction interconnects the data delivery tree with node maintains state information about all other nodes. some cross edges and is responsible for fast data recov- Therefore, no additional discoveryof nodes is necessary ery in PRM under high failure rates of overlay nodes. in this case.For someother protocols like Yoid [10] and Existing approaches for resilient and reliable multicast NICE [1] overlay nodes maintain information of only a use either reactive retransmissions (e.g. RMTP [24], small subset of other nodes in the topology. Therefore STORM [26] Lorax [18]) or proactive error correction we implement a node discovery mechanism, using a codes (e.g. 
Digital Fountain [4]) and can only recover random-walkontheoverlaytree.Asimilartechniquehas from packet losses on the overlay links. Therefore the beenusedinYoid[10]todiscoverrandomoverlaygroup proactive randomized forwarding is a key difference members. The discovering node transmits a Discover between our approach and other well-known existing message with a time-to-live (TTL) (cid:2)eld to its parent approaches. on the tree. The message is randomly forwarded from 3 0 1 B B M A A A C E F C E F X D T D T N Y Q Q Z P G H J K L M N P G H J K L M N P Overlay subtree with large number of node failures Fig.1. The basicidea of thePRMscheme. The circlesrepresent theoverlay nodes. Thecrosses Fig. 2. Successful delivery with indicate link and node failures. The arrows indicate the direction of data (cid:3)ow. The curved edges high probability even under high indicate the chosen cross overlay links for randomized forwarding of data. node failure rate. neighbor to neighbor, without re-tracing its path along Z W : 118 017 116 115 114 Z the tree and the TTL (cid:2)eld is decremented at each hop. Seq. no: 18 The node at which the TTL reaches zero is chosen as NAK: 16 17 16 15 14 the random node. v Z : 0 1 1 1 18 17 16 15 14 Why is Randomized Forwarding effective? It is in- Y W : 1 0 0 1 1 Y Seq. no: 18 teresting to observe why such a simple, low-overhead 17 16 15 14 NAK: 15, 14 v : 0 0 1 1 randomized forwarding technique is able to increase Y 18 17 16 15 14 packet delivery ratios with a high probability, especially X WX : 1 0 0 0 0 when many overlay nodes fail. Consider the example shown in Figure 2, where a large fraction of the nodes Fig. 3. Triggered NAKs to parent on the overlay tree when data havefailedinthes(cid:4)hadedregion.Inparticular,therootof unit with sequence number 18 is propagated along the overlay. The length of the bit-mask is 4. 
thesub-tree,node ,hasalsofailed.So ifnoforwarding is performed along cross edges, the entire shaded sub- treeispartitionedfromthedatadeliverytree.Nooverlay 2 node in this entire sub-tree would get data packets till sequenceandsendsNAKsto torequesttheappropriate 2 3 the partition is repaired. However, using randomized retransmissions. Note that can either be a parent of forwarding along cross edges, a number of nodes from in the data delivery tree, or a random node forwarding (cid:3) (cid:6)*)+(cid:10)(cid:22)(cid:6),(cid:3)(cid:5)-(cid:19)(cid:6)(cid:9).(cid:13)(cid:10) (cid:3)(cid:5)"/(cid:6)(cid:22)01(cid:10) . the unshaded region will have random edges into the along a cross edge. We illustrate the use of triggered (cid:28) 0 shaded region as shown ( and ). NAKs in Figure 3. Node receives data with sequence . Theoverlaynodesthatreceivedataalongsuchrandomly number 18, which indicates that the parent has data chosencrossedgeswill subsequentlyforward dataalong sequence numbers 14, 15, and 16. Since node does 0 ) . regular tree edges and any chosen random edges. Since not have sequence number 16, it sends a NAK for 16 to ) the cross edges are chosen uniformly at random, a large .Similarly,node sendsaNAKtoitsparent for 14 subtree will have a higher probability of cross edges and 15. Note that does not have data with sequence . being incident on it. Thus as the size of a partition number 16, but does not request retransmission from its increases,so doesits chanceof repair using crossedges. parent. This is because the parent does not currently 0 have this data and has already made an appropriate . ) retransmission request to its parent . On receiving this B. Triggered NAKs data, will automatically forward it to . This is the reactive component of PRM. We assume that the applicationsourceidenti(cid:2)eseachdataunitusing C. Extensions monotonically increasingsequencenumbers. 
An overlay node can detect missing data using gaps in the sequence We describe two extensions to the PRM scheme that numbers.This information is usedto trigger NAK-based further improve the resilience of the data delivery. retransmissions.Thistechniquehasbeenappliedforloss Loss Correlation: This is a technique that can be repair in RMTP [24]. used to improve the randomized forwarding component 2 In our implementation each overlay node, , pig- ofPRM. AsdescribedinSectionII-A eachoverlaynode gybacks a bit-mask with each forwarded data packet chooses a small number of cross edges completely at indicating which of the prior sequence numbers it has random for probabilistic data forwarding on the overlay. 3 correctly received. The recipient of the data packet, , In practice, it is possible to increase the utility of detects missing packets using the gaps in the received these cross edges by choosing them more carefully. In 4 Seq. no: 31 be overly aggressiveand cause false positives leading to 30 29 28 27 v : 0 0 1 1 a higher amount of data duplication on the data delivery P tree. Thus, the improvement in performance is at the Y NAK: 28, 27 P cost of additional state, complexity and packet overhead EGF_Request 31 30 29 28 27 31 30 29 28 27 at nodes. Depending on the reliability requirements, WY : 1 0 0 0 0 WP : 1 0 0 1 1 applications may choose to enable or disable EGF. Fig. 4. Triggered NAKs to source of4random forwarding for data withsequencenumber31.Thevalueof forEphemeralGuaranteed III. EVALUATION OF PRM Forwarding is set to 3. A key component of the PRM scheme is the random- ized forwarding technique which achieves high delivery ratios in spite of a large number of overlay node/link 5 particular if is the root (and source) of the overlay failures. In this section we present our analysis of this tree,w)ewant.tochooseacrossedgebetweentwooverlay scheme. 
nodes and ,ifandonlyifthe(cid:3)c57or6r8:e9lati)+on(cid:10) betwe(cid:3) 5;en68<t9he Recall fro(cid:30)’m(cid:29) Section II that the per-node overhead p.=a(cid:10)cketslost on the overlaypaths and of PRM is . We will now primarily be concerned is low. Clearly if these two overlay paths share no with the case of low overhead; in particular, #the case underlying phys)ical lin.ks, then we expect the losses where the overhead is (much) smaller than . Thus, experienced by and to be uncorrelated. However, u(cid:29)IpG to #Section III-A, we will consider the case where such a condition is dif(cid:2)cult to guarantee for any overlay : here, each node does randoJKmGLfo(cid:30)rwarding to protocol. Therefo)re under the loss correlation extension, just one node, with a probability of (cid:29) . However, w. eleteachnode tochoosearandomedgedestination, all of our results will also hold for being arbitrary. , with which has the minimum number of common In this paper, we provJe that even if the probability losses over a limited time window. of random forwarding is made an arbitrarily (cid:30)’sm(cid:29)+aGll positive constant (i.e., even if the data overhead EphemeralGuaranteedForwarding(EGF): This is J<(cid:29)MGNJ is made negligible), the scheme can be designed anextensiontothetriggeredNAKcomponentandisalso . so that almost all surviving overlay nodes get the data useful in increasing the data delivery ratio. Consider the case when node receives a data packet with "sequence with high probOability. In particular, the system scales: > number along a random edge from node . If o. n as the number of nodes increas# es, our probability of > successful data delivery tends to . receiving the data packet with sequence number , ? . We start with some notation and assumptions. detects a large gap (greater than a threshold ) in the sequence number space.it is likely that" the parent of (A1) All nodes at the same level of the tree havethe has failed. 
In this case, can request to increa(cid:3)(cid:5)s"(cid:20)e(cid:6)(cid:9)t.@he(cid:10) same number of childO ren. The total number of (cid:30) . nodes is denoted by . random forwarding probability, , for the edge P Q " (A2) There are parameters and such that the to one for a short duration of time. To do this sends an EGF Request message to . Note that BthADeCFEEGF state pP robability of any givennode failing is at most is soft and expires within a time period. . This is , and the prQobability of any given linkP failingQ shown by an example in Figure 4. Node receives data is at most . We only# require that and w" ith. sequence number 31 along a random edge from bP (cid:6)eQ(cid:13)bRTouSVnUXdW ed away from : e.g., we could have . immediately requestsretransm?HissGion( sf. or datawith . (Indeed, a multicast tree compoWYS[seZ d sequencenumbers28and27.S" ince , alsose. nds of elements that may fail with more than the EGF Request message to . If the tr.ee path of is probabiPlity isQ in effect useless; in practice, we " expect and to be close to zero.) The failure repaired before the EGF period expires, can also send an EGF Cancel message to to terminate this state. events are all independent. We next present a theorem that deals with the asymp- The EGF mechanism is useful for providing unin- O totic regime where is large. We then discuss a terrupted data service when the overlay construction (cid:147)tree augmentation(cid:148) technique and general optimality protocol is detecting and repairing a partition in the of our results in Section III-A. We complement these data delivery tree. In fact, putting such a mechanism in in Section III-B with simulation results for the (cid:147)non- place allows the overlay nodes to use larger timeouts asymptotic(cid:148) regime of size 10,000. to detect failure of overlay peers. 
This, in turn, reduces the control overheads of the application-layer multicast Theorem 3.1: Letthe probabilityofrandomforward- J protocol.Inpractice,theEGFmechanismcansometimes ing be an arbitrary positive constant (i.e., it can be 5 (cid:16) \]S arbitrarily small). Then, there is a constant Overlay link loss 2%, Overlay node failure 5%, p = 0.05 such that the following holds. Suppose every non-leaf 1 (cid:16)+^‘_bacO 0.95 nodehasatleast children.Then,withprobability tendingto # asO increases,thefollowingtwoclaimshold ered 0.9 simultaneously: deliv 0.85 s 0.8 (i) All the non-leafnodesthatdid notfail success- cket 0.75 a fully get thede#fd8(cid:7)atPhagj.ikde#f8(cid:15)QjgYikde#f8mlFdnODg*g of p 0.7 n (ii) Atleastan fractionof ctio 0.65 a the leaf nodes tl:hadnODtgdid not fail, successfullyget Fr 0.6 110KK nnooddeess ((PPRRMM)) the data; here denotes a negligibly small 0.55 1K nodes (BE) S O 10K nodes (BE) 0.5 quantity, which tends to as increases. 2 5 10 20 Proof: See Appendix. Overlay node degree Theorem 3.1 shows thatO as long as the node-degrees Fig. 5. Variation of data delivery ratio with overlay node degree. are at least logarithmic in , the tree is highly resilient BEstands for (cid:147)Best Effort.(cid:148) to node- and link-failures, even with arbitrarily low data overhead. To take a realistic example, we consider the ^(cid:131)_bacO case when the end-to-end losses on overlay links is 2%, leaves are only some small constant times , then in andsimultaneousnumberofoverlaynodesfailuresis1% fact a large number of the leaves’ parents will fail to get # O of the group (for group of size 10,000 this translates to the data with probability tending to as increases. 100 simultaneous failures). Theorem 3.1 states that with a high probability all surviving non-leaf overlay nodes B. 
Expected Scaling Behavior for Finite Groups and at least 97% of the surviving leaf nodes continue In the analysisof the PRM scheme(described above), to receive data packets using the existing overlay data we have explored its asymptotic behavior with corre- paths and random edges. spondingincreaseinthegroupsize.Throughsimulations It is, in fact, possible to increase the delivery to wenowshowthatPRMisexpectedtoperformverywell leaf-nodes to to increase the delivery to leaf-nodes to in practice even for groups of moderate size. In these an arbitrarily high value using a (cid:147)tree augmentation(cid:148) idealized simulations, we assume that there exists some scheme described next. application-layer multicast protocol which constructs and maintains a data delivery tree. When data packets A. Tree-augmentation extensions to PRM are sent on this tree, a random subset of the overlay o If we desire, for an arbitrary given(small) constant , nodes fail simultaneously. The failed subset is chosen de#m8&o:g that at least a (cid:150)fraction of the surviving leaves independently for each data packet sent. Additionally, get the data with high probability (in addition to item (i) data packets also experience network layer losses on (cid:129) of Theorem 3.1), then it suf(cid:2)ces to augment the tree as overlaylinks. Consider a regular tree, where all non-leaf #p$rq(cid:25)sut*vxwzyk{e|j} follows: each leaf connects to sut*v~w(cid:127)yk{e(cid:128)(cid:5)} randomly nodes have the same degree. From the analysis we can chosennon-leafnodes,andgetsthedatafromanyoneof intuitively expectthat as the degreeof the tree increases, them that has received the packet. For example, for the so does the data delivery ratio. 
1% node failure rate case,the tree augmentation scheme In Figure 5 we illustrate how the data delivery ratio will guarantee that 99.98% of all nodes (including leaf for the non-leaf nodes of an overlay tree improves with (cid:6) nodes) will successfully get the data packets with an increaseindegree.Weconsideredtwodifferenttreesizes #~SbSbS #~S SbSbS overhead bounded by the constant two. (cid:150) nodes and nodes. In this example, we de#[8moFg Infact,ifwerequirea (cid:150)fractionofthesurviving assume that for each overlay link experiences a loss (cid:132)(cid:133)Z W(cid:133)Z (cid:129) leaves to get the data with high probability, this amount rate of and the node failure rate is (which #m$Lq(cid:25)sut*v~wzyk{e|j} WYS #~SbSbS (cid:6) of overhead (i.e., sut*v~w(cid:127)yk{e(cid:128)(cid:130)} ) is necessary for any implies simultaneous failures for a -node tree WYSbS #~S SbSbS protocol.In Appendixweshowthatourprotocol’slower and simultaneous failures on a -node tree). bounds on the fraction of surviving leaves getting the Such failure rates are very high by the usual Internet J data(cid:151)both with and without the random augmentation standards. The randomized forwarding probability is SVU(cid:134)S(cid:133)W W(cid:133)Z for leaves(cid:151)areoptimal. Also in Appendix we show that chosen to be (i.e. the data overhead is ). We the logarithmic degree-requirement of Theorem 3.1 is can see that even under such adverse conditions, the ( bothnecessaryandsuf(cid:2)cientifwedesirealowoverhead. randomized forwarding technique achieves data delivery (cid:135) Z W Speci(cid:2)cally, if the degrees, e.g., of the parents of the ratio of about even with a tree degree of ; the 6 (cid:135)bW(cid:133)Z #~S delivery ratio exceeds when the degree is made , useful for very dynamic scenarios, at the cost of higher # quickly approaching as the degreeis increasedfurther. data overheads. 
The results for the leaf nodes in practice, are close to Choosing FEC parameters: Since the FEC-based (cid:136)(cid:145)$(cid:19)(cid:29) (cid:136) those of the non-leaf nodes. schemes need to send packets instead of packets weuseahigher datarate at the source(i.e. a datarate of d(cid:5)(cid:136)D$(cid:13)(cid:29)[g*(cid:137)j(cid:136) timesthe data rate usedbythe otherschemes). IV. SIMULATION EXPERIMENTS TheresilienceofanFEC-basedschemecanbeincreased (cid:29) The PRM scheme can be applied to any overlay- by increasing the overheads parameter, . Also for the based multicast data delivery protocol. In our detailed same amount of additional data overheads, resilience simulation study we implemented PRM over the NICE against network losses of the FEC-based schemes im- (cid:136) (cid:29) application-layer multicast protocol [1]. We used NICE prove if we choose higher values of and . For becauseofthreereasons:(1)theauthorsin [1]showthat example,FEC-(128,128)hasbetter datarecoveryperfor- the NICE application-layer multicast protocol achieves mance than FEC-(16,16) even though both have 100% good delivery ratios for a best-effort scheme; (2) NICE overhead. This improved reliability comes at the cost isascalableprotocolandthereforewecouldperformde- of increased delivery latencies. Therefore, the maximum (cid:136) (cid:29) tailed packet-level simulations for large overlay topolo- value of and depends on the latency deadline. We gies; (3) the source-code for NICE is publicly available. haveexperimentedwitharangeofsuchchoicesuptothe We have studied the performance of the PRM- maximumpossiblevaluethatwillallowcorrectreception enhanced NICE protocol using detailed simulations. We at receivers within the latency deadline. 
However, we compare the following four schemes, all of which use observed that in presence of failures of overlay nodes (cid:136) (cid:29) NICE as the underlying multicast protocol: increasing and doesnotalwaysimprovetheresilience BE: This is the best-effort multicast scheme and has no properties. This is because choosing higher values of (cid:136) (cid:29) reliability mechanismsandtherefore servesas a baseline and leads to increased latencies in data recovery. for comparison. However when the group change rate is high the data HHR: This is a reliable multicast scheme, where each delivery paths break before the FEC-based recovery can overlaylinkemploysahop-by-hopreliabilitymechanism complete, and the achieved data delivery ratio is low. usingnegativeacknowledgments(NAKs).Adownstream overlay node sends NAKs to its immediate upstream A. Simulation Scenarios overlaynodeindicatingpacketlosses.Onreceivingthese In all these experiments we model the scenario of a NAKs the upstream overlaynoderetransmits the packet. source node multicasting streaming media to a group. This scheme, therefore, hides all packet losses on the The source sends CBR traf(cid:2)c at the rate of 16 packets overlay(cid:6)links. (cid:136) (cid:29) persecond.For all theseexperimentswechosea latency FEC-( ): This is another enhanced version of the deadline of upto 8 seconds. As a consequence the size NICE protocol in which the source uses a forward error of the the packet buffer for NAK-based retransmissions correction mechanism to recover from packet losses. In (cid:136) is 128. The packet buffer will be larger for longer this scheme,the source takes a set of data packetsand (cid:136)%$N(cid:29) deadlines. encodes them into a set of packets and sends this Weperformed detailed experimentswith awide-range encoded data stream to the multicast group. 
A receiving (cid:136) of parameters to study different aspects of the PRM member can recover the data packets if it receives (cid:136) (cid:29)M$&(cid:136) scheme. The network topologies were generated using any of the encoded packets 1. Additional data (cid:29)[(cid:137)j(cid:136) the Transit-Stub graph model, using the GT-ITM topol- overhead(cid:6)s of this scheme are . (cid:6) (cid:29) (cid:30) ogy#~gSenSbeSbrSator [5]. All topologies in these simulations PRM-( ): This is our proposed PRM enhancements h( ad routerswithanaveragenodedegreebetween (cid:146) implemented on the basic NICE(cid:29) applic(cid:30)ation-layer multi- and . End-hosts were attached to a set of routers, cast protocol. The meanings of and are the same as chosen uniformly at random, from among the stub- in Section II. For all these experiments we implemented domainnodes.Thenumberofsuchhostsinthemulticast (cid:147) (cid:147)V#x(cid:135)b(cid:132) the loss correlation extensions to PRM. We enable EGF group were varied between and for different for only one speci(cid:2)c experiment (described later) due to experiments. Each physical link on the topology was its higher overheads. Our results will show that EGF is modeledtohavelosses.Inter-domainlinkshad0.5-0.6% loss rates, while intra-domain links was about 0.1% loss 1The Digital Fountain technique [4](cid:138)(cid:5),(cid:139):u(cid:140)(cid:142)se(cid:141)ks(cid:143)n(cid:144)Tornado codes that rates.Wealsomodelburstylossesasfollows:if apacket requ(cid:144)ire the receiver to corr(cid:141)ectly receive packets to recover the data packets, where is a small constant. is lost ona physicallink we increasethe lossprobability 7 for subsequent packets received within a short time link loss=0.5%, deadline=8 sec, 512 hosts 1 window. The average propagation and queueing latency 0.9 on each physical link was between 2-10 ms. 
In all our 0.8 experimentswe use a HeartBeat period of 5 secondsfor 0.7 o NICE and its extensionsas was describedbythe authors ati 0.6 R in [1]. ery 0.5 We have simulated a wide-range of topologies, group Deliv 0.4 0.3 sizes, member join-leave patterns, and protocol parame- PRM 3,0.01 0.2 HHR ters. Inthe experiments,all departuresof end-hostsfrom FEC(128,128) 0.1 FEC(16,16) BE themulticastgroupweremodeledas(cid:147)ungracefulleaves.(cid:148) 0 0 1 2 3 4 5 6 7 8 9 10 This is equivalent to a host failure, where the departing Group Change Rate (per second) member is unableto senda Leavemessageto the group. Fig.6. Delivery ratio with varying rate of changes to the group. In the experiments reported in these section, we (cid:2)rst let a set of end-hosts join the multicast group and stabilize into an appropriate multicast data delivery tree. Subsequently a traf(cid:2)c source end-host starts sending group change=0.1/sec, deadline=8 sec, 512 hosts data group and end-hosts continuously join and leave 1 0.9 the multicast group. The join and the leave rate for 0.8 members are chosen to be equal so that the average 0.7 size of the group remained nearly constant. The instants atio 0.6 R of group membership changes were drawn from an ery 0.5 exponential distribution with a pre-de(cid:2)ned mean, which Deliv 0.4 0.3 variedbetweenexperiments.We studiedthe variousdata PRM 3,0.01 0.2 HHR delivery properties of our proposed scheme over this FEC (128,128) 0.1 FEC (16,16) dynamically changing phase of the experiment. 0 BE 0 0.2 0.4 0.6 0.8 1 Link Loss (%) B. Simulation Results Fig.7. Delivery ratio with varying network loss rates. We have studied the three metrics of interest: data delivery ratio, delivery latency and data overheads. The dataoverheadsinPRMarebecauseofduplicationdueto randomized forwarding, and due to redundant encoding membership change rate of 10 per second. Additionally in FEC-based schemes. 
We also examine the additional control overheads due to NAKs, random member discovery, etc.

Delivery Ratio: In Figure 6 we show the delivery ratio of the different schemes as the frequency of changes to the group membership is varied. The average size of the group was 512. The average loss rate on physical links for this experiment was 0.5%, which corresponds to between 2-5% end-to-end losses on each overlay link. We plot the data delivery ratios as the group change rate is varied between 0 and 10 changes per second. Note that even 5 changes per second implies that 512 (which is also the size of the group) membership changes happen in less than two minutes! While such a high change rate is drastic, it is not improbable for very large distribution groups in the Internet. The PRM scheme is able to recover from a vast majority of these losses through the use of the randomized forwarding mechanism. The delivery ratio for PRM-(3,0.01) is about 0.97 for a group membership change rate of 5 per second and about 0.80 for a group membership change rate of 10 per second. Additionally, PRM incurs a very low (3%) additional data overhead.

The data delivery ratio for the best-effort protocol falls significantly (to about 0.35 for a change rate of 10 group changes per second) as the change rate to the overlay increases. In [1], the authors had shown that the NICE application-layer multicast protocol achieves good delivery ratios for a best-effort scheme and is comparable to other best-effort application-layer multicast protocols, e.g., Narada [8]. Therefore, we believe that PRM enhancements can significantly augment the data delivery ratios of all such protocols. An FEC-based scheme is typically able to recover from all network losses. However, changes in the overlay data delivery path significantly impact the performance of an FEC-based scheme. Note that the performance of the FEC-based scheme degrades with increasing frequency of group changes and even falls below the simple best-effort scheme for high group change rates.

In Figure 7 we compare the delivery ratio of the different schemes as the average packet loss rate on the physical links of the topology is varied. In this experiment, we use a lower failure rate: changes to the overlay topology (including both joins and leaves) occur with a mean of 0.1 change per second. In contrast to the other schemes, which suffer between 20% to 55% losses, the PRM-(3,0.01) scheme achieves near-perfect data delivery under all data loss rates.

In Figure 8 we show how the delivery ratio achieved by the PRM scheme evolves over time in comparison to the best-effort protocol. In this experiment, the group change rate was one per second, i.e., the entire group can potentially change in less than 10 minutes. In the best-effort protocol, for any particular packet, 20-40% of the group members fail to receive it. In contrast, the losses experienced in the PRM scheme are minimal. For the same experiment, we plot the cumulative distribution of the maximum data outage period of the overlay nodes in Figure 9. Most overlay nodes had no significant data outage period in PRM, and more than 98% of the overlay nodes had a maximum outage period of less than 5 seconds. This is a significant improvement over the best-effort protocol, where more than 20% of the overlay nodes experience a maximum data outage of more than 30 seconds.

Fig. 8. Time evolution of the delivery ratio for a sequence of data packets.

Fig. 9. Cumulative distribution of the maximum time gap over which an overlay node lost all data packets.

Delivery Latency: In Figure 10 we show the distribution of latency experienced by the data packets at the different overlay nodes. In this experiment, the average group membership change rate was 0.1 per second and the average loss probability at the physical links was 0.5%. Note that the latency of data packets is the lowest for the best-effort NICE protocol. This is because the best-effort scheme incurs no additional delay due to timeout-based retransmissions or delivery over alternate, longer overlay paths. FEC-(16,16) incurs a slightly higher latency. This is because, to recover a lost data packet, the FEC-based scheme has to wait for additional (encoded) packets to arrive. FEC-(128,128) can decode data only after sufficiently many packets are received and, as a result, it incurs the highest latency. The HHR scheme also suffers from high latency since it is purely reactive; in our simulations of HHR, packet losses due to node failure are often detected only after the tree is recovered by the underlying multicast protocol. The PRM scheme is a combination of proactive and reactive schemes and therefore incurs significantly lower latency than the HHR scheme. However, data packets delivered using PRM still incur higher latency than simple best-effort delivery. This is because many of the data packets that are lost on the shortest best-effort path are successfully delivered either using triggered NAK-based retransmissions or randomized forwarding along a longer overlay path. More than 90% of the overlay nodes receive data packets within 500 ms, which is 2.5 times the worst-case overlay latency on the topology.

Fig. 10. Cumulative distribution of average latency for packets received at the different nodes.

In Figure 11 we show the effect of imposing a deadline for packet delivery. The deadline specifies a time upper bound within which a packet must reach an overlay node to be useful to the application. For a deadline of t seconds, we allow different overlay nodes to have slightly different additional slacks in the time upper bound within which packets must be delivered. This slack accounts for the minimum latency that data packets encounter on the shortest path from the source to that overlay node. The maximum slack for any overlay node was less than 200 ms. The best-effort NICE protocol makes a single attempt at packet delivery and therefore achieves an almost identical delivery ratio for all deadlines. The performance of the HHR scheme increases gradually with increasing deadline, due to its dependence on timeouts.

Fig. 11. Delivery ratio achieved with varying deadline within which the data packets are to be successfully delivered.

TABLE I
Comparison of additional data overheads (in %) required for PRM and FEC-based schemes to meet different delivery ratios for specific group change rates and latency bounds. We do not report results when the overheads exceed 100% (marked by "-").

                                            Delivery Ratio
Scheme, changes/sec, deadline (sec)   80%      85%      90%      95%      99%
FEC,  0.1,  0.5                       88-100   -        -        -        -
FEC,  0.1,  2.0                       62-75    -        -        -        -
FEC,  0.1,  8.0                       50-62    75-87    -        -        -
FEC,  0.1, 64.0                       37-50    50-62    75-87    87-100   -
PRM,  1,    0.2                       9-12     18-21    21-24    30-60    -
PRM,  1,    0.5                       0-1      1-3      3-6      9-15     30-60
PRM,  1,    2.0                       0-1      0-1      0-1      0-1      3-9
PRM,  1,    8.0                       0-1      0-1      0-1      0-1      1-3

TABLE II
Comparison of best-effort and PRM-enhanced NICE protocols with varying group sizes. We use PRM-(3,0.01) in these experiments.

Group   Control overheads (pkts/sec/node)   Delivery ratio
size    BE        PRM                       BE        PRM
128     1.43      2.03                      0.58      0.99
256     1.67      2.22                      0.57      0.99
512     1.62      2.21                      0.59      0.99
1024    1.73      2.32                      0.49      0.97
2048    2.17      2.70                      0.43      0.97
4096    2.50      3.19                      0.43      0.96
8192    3.65      4.56                      0.40      0.96
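The deadline accounting used for Figure 11 can be sketched as follows: a node's slack equals its minimum source-to-node latency (capped at 200 ms in these experiments), and a packet counts toward the delivery ratio only if it arrives within the deadline plus that node's slack. This is our own minimal sketch with illustrative names, not the paper's measurement code:

```python
def node_slack(min_source_latency, cap=0.2):
    """Per-node slack: the minimum source-to-node path latency, capped (200 ms)."""
    return min(min_source_latency, cap)

def delivery_ratio(arrival_times, deadline, slacks):
    """Fraction of nodes that received the packet within deadline + slack.
    arrival_times maps node -> arrival time in seconds, or None if never received."""
    delivered = sum(
        1 for node, t in arrival_times.items()
        if t is not None and t <= deadline + slacks[node]
    )
    return delivered / len(arrival_times)

# Example: with a 1-second deadline, node "b" just misses its slacked bound.
arrivals = {"a": 0.4, "b": 1.25, "c": None}
slacks = {n: node_slack(l) for n, l in {"a": 0.05, "b": 0.2, "c": 0.3}.items()}
ratio = delivery_ratio(arrivals, 1.0, slacks)   # only "a" counts: 1/3
```

The slack term is what makes the comparison fair across nodes: a node far from the source is not penalized for latency that no scheme could avoid.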
In contrast, for short deadlines, the PRM scheme achieves rapid improvements due to its proactive component, and further gradual improvements for longer deadlines due to its reactive component. For the FEC-based scheme we used FEC-(n, n) and chose the value of n based on the deadline imposed. It achieved between 80-87% delivery ratios in these experiments.

Additional Data Overheads: In Table I we compare the overheads of PRM and FEC-based schemes to achieve different delivery ratios. The table shows the additional data overheads for both schemes under different parameters (e.g., latency bounds, group change rate, etc.). The FEC-based schemes perform poorly when the frequency of changes on the overlay is high. Hence, we used an order of magnitude lower group change rates (0.1 changes/sec) for the FEC-based schemes than what we used for PRM (1 change/sec).

The table shows that PRM incurs very low additional data overheads to achieve relatively high delivery ratios within low latency bounds. For example, for a group change rate of one per second and a data latency bound of 0.5 seconds, the PRM scheme incurs 3-6% additional data overheads to achieve a data delivery ratio of 90%. In fact, for most of the scenarios shown in the table, PRM requires overheads of less than 10% to meet the desired deadline and delivery ratios. PRM requires higher overheads only for the very stringent deadline of 200 ms, and to achieve a 99% delivery ratio for a 500 ms deadline. As is evident from the table, FEC-based schemes require far higher overheads even for much lower group change rates (0.1 per second).

Scalability: In Table II we show the effect of multicast group size on control overheads and delivery ratio. In this set of results, the group change rate was 0.2% of the group size per second. For example, for 512-node groups, the change rate is 1 node per second, while for the 8192-node case, it is 16 nodes per second. In the original best-effort NICE application-layer multicast, the control overheads at different overlay nodes increase logarithmically with the group size. The control overheads for the PRM-enhanced NICE are higher, due to the additional messages such as random node Discover messages and NAKs. However, the amount of increase at each overlay node is less than one control packet per second, which is negligible in comparison to the data rates that will be used by the applications (e.g., media streaming). We also observe that the data delivery ratio of the PRM-enhanced NICE protocol is high across all group sizes.

Loss Correlation and EGF: We briefly describe other experiments that demonstrate the benefits of the two proposed extensions to the basic PRM scheme. We simulated some specific pathological network conditions that led to highly correlated losses between large groups of members; here, use of the loss correlation technique improved data delivery rates by up to 12%.

EGF is beneficial under high frequencies of group changes. For the experiment with 512 overlay nodes and 10 group changes per second, EGF can improve the data delivery ratio from about 80% (see Figure 6) to 93%. Note that under these circumstances 512 changes (i.e., the same as the size of the group) happen in less than a minute. However, EGF also increases duplicate packets on the topology by nearly 10%.

V. RELATED WORK

A large number of research proposals have addressed reliable delivery for multicast data, most notably in the context of network-layer multicast. A comparative survey of these protocols is given in [17] and [28]. In SRM [9], receivers send NAKs to the source to indicate missing data packets. Each such NAK is multicast to the entire group and is used to suppress NAKs from other receivers that did not get the same packet. In this approach, however, a few receivers behind a lossy link can incur a high NAK overhead on the entire multicast group.

Tree-based protocols provide another alternative solution for reliable and resilient multicast. In this approach, the receivers are organized into an acknowledgment tree structure with the source as the root. This structure is scalable because the acknowledgments are aggregated along the tree in a bottom-up fashion, and it also allows local recovery and repair of data losses. Protocols like RMTP [24], TMTP [30], STORM [26], LVMR [19], and Lorax [18] construct this structure using TTL-scoped network-layer multicast as a primitive. In contrast, LMS [23] uses an additional mechanism, called directed subcast, to construct its data recovery structure. Our work differs from all these above approaches in two key aspects. First, unlike all these protocols, which employ a network-layer multicast service for data distribution, our scheme is based upon an application-layer multicast delivery service. To the best of our knowledge, the PRM scheme is the first application-layer multicast based scheme that addresses resilience. Second, all the network-layer multicast based schemes described employ completely reactive mechanisms for providing data reliability and therefore incur moderate or high delivery latencies. As we show in this paper, proactive mechanisms, e.g., randomized forwarding, can be used to significantly improve resilience for applications that require low-latency data delivery.

PRM is not the only proactive approach to provide improved reliability performance for multicast data. There exist some well-known forward error correcting code based approaches that are also proactive in nature. For example, Huitema [13] had proposed the use of packet-level FECs for reliable multicast. Nonnenmacher et al. [22] studied and demonstrated that additional benefits can be achieved when an FEC-based technique is combined with automatic retransmission requests. APES uses a related approach for data recovery [27]. Digital Fountain [4] and RPB/RBS [20] are two other efficient FEC-based approaches that provide significantly improved performance. All these FEC-based approaches can recover from network losses. However, they alone are not sufficient for resilient multicast data delivery when overlays are used.[2] Overlay nodes are processes on regular end-hosts and are more prone to failures than network routers. FEC-based approaches are not sufficient to recover from losses due to temporary outages on the data path, especially when low-latency delivery is required. The PRM scheme differs from all these other schemes by providing a proactive component that allows the receivers to recover from losses due to overlay node failures. In Table III we summarize the characteristics of all these schemes.

A different randomized error recovery scheme, called RRMP, is proposed in [29]. RRMP differs from PRM in that nodes detecting losses send randomized repair requests (reactively), while PRM uses proactive randomized forwarding. Gossiping [11] and PRM both use randomized forwarding. However, randomized forwarding in PRM only provides additional resilience to data delivery along the multicast tree, while gossiping only uses communication with random peers. Also related is the dispersity routing scheme [21], where additional messages are sent using different routes. The difference is that PRM forwards data packets through randomized routes instead of a fixed set of paths between source and destination as in dispersity routing.

Since PRM was first published [2], there have been two interesting pieces of related work. SplitStream [6]

[2] Although FEC-based schemes can be implemented over application-layer multicast, as this paper shows, FEC alone is not sufficient to achieve high delivery ratios even under a moderate frequency of membership changes on the overlay.
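The argument above can be made concrete with a small Monte Carlo sketch (ours, not the paper's): an FEC(n, k) receiver recovers a block iff at least n of the n+k encoded packets arrive, which handles independent packet loss well; but when an upstream overlay node fails, a downstream host loses every packet in the block, so no amount of redundancy helps until the tree is repaired.

```python
import random

def block_recovered(n, k, loss_prob, upstream_failed, rng):
    """n data packets are sent as n+k encoded packets; the receiver recovers
    the block iff at least n encoded packets arrive."""
    if upstream_failed:
        return False        # a failed upstream node drops the whole block
    arrived = sum(1 for _ in range(n + k) if rng.random() > loss_prob)
    return arrived >= n

def recovery_rate(n, k, loss_prob, failure_prob, trials=5000, seed=3):
    """Estimate the fraction of blocks recovered over many independent trials."""
    rng = random.Random(seed)
    ok = sum(
        block_recovered(n, k, loss_prob, rng.random() < failure_prob, rng)
        for _ in range(trials)
    )
    return ok / trials

no_failures = recovery_rate(16, 16, 0.02, 0.0)    # ~1.0: FEC masks random loss
with_failures = recovery_rate(16, 16, 0.02, 0.3)  # ~0.7: capped by failure rate
```

Under independent 2% loss, receiving fewer than 16 of 32 encoded packets is vanishingly unlikely, so FEC(16,16) recovers essentially every block; with upstream failures, the recovery rate is bounded above by the probability that the upstream node is alive, regardless of k.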
