
Selection of network coding nodes for minimal playback delay in streaming overlays

Nicolae Cleju∗, Nikolaos Thomos† and Pascal Frossard†
†Signal Processing Laboratory (LTS4), Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
∗The "Gheorghe Asachi" Technical University of Iasi, Iasi, Romania.
[email protected], {nikolaos.thomos,pascal.frossard}@epfl.ch
arXiv:1101.3979v1 [cs.MM] 20 Jan 2011

This work has been partly supported by the Swiss National Science Foundation, under grant PZ00P2-121906.

Abstract—Network coding makes it possible to deploy distributed packet delivery algorithms that locally adapt to the network availability in media streaming applications. However, it may also increase delay and computational complexity if it is not implemented efficiently. We address here the effective placement of nodes that implement randomized network coding in overlay networks, so that the goodput is kept high while the decoding delay stays small in streaming applications. We first estimate the decoding delay at each client, which depends on the innovative rate in the network. This estimation makes it possible to identify the nodes that have to perform coding for a reduced decoding delay. We then propose two iterative algorithms for selecting the nodes that should perform network coding. The first algorithm relies on knowledge of the full network statistics. The second algorithm uses only local network statistics at each node. Simulation results show that large performance gains can be achieved with the selection of only a few network coding nodes. Moreover, the second algorithm performs very close to the centralized estimation strategy, which demonstrates that the network coding nodes can be selected efficiently in a distributed manner. Our scheme shows large gains in terms of achieved throughput, delay and video quality in realistic overlay networks when compared to methods that employ traditional streaming strategies as well as random network node selection algorithms.

Index Terms—Network coding, delay minimization, throughput maximization, overlay networks.

I. INTRODUCTION

The recent development of overlay networks offers interesting perspectives for multimedia streaming applications, since network diversity can be used advantageously for improved quality of service. Traditional streaming systems based on ARQ or channel coding techniques, however, generally fail to exploit this diversity efficiently. They either suffer from relatively high computational cost, require coordination between network nodes, or lead to suboptimal performance in large-scale networks where local channel conditions are hard to estimate. A different paradigm has recently been initiated with network coding [1], [2], where the network nodes perform some processing in order to improve the packet delivery performance. Specifically, network coding nodes combine buffered packets before forwarding them to next-hop nodes. This coding strategy is particularly appealing in distributed streaming systems, as it removes the need for reconciliation between nodes. It locally adapts to the available bandwidth and packet loss rate, and even makes it possible to approach the max-flow min-cut bound of the network graph. Overall, network coding systems have shown improved resiliency to dynamics, delays, scalability and buffer capacities in networks with diversity [3].

The application of network coding algorithms in multimedia streaming systems is, however, not straightforward. Specifically, multimedia streaming imposes strict timing constraints that impact the design of network coding algorithms. A practical network coding system that addresses the specific characteristics of streaming applications has been presented in [4]. It implements randomized network coding (RNC) techniques [5] in the network nodes and devises a protocol to deal with buffering issues and timing constraints. Moreover, it introduces the concept of generations, which restricts coding operations to packets that share similar decoding deadlines. However, network coding systems still face important issues in practice due to the decoding delays imposed by successive network coding operations. This delay, as well as the computational overhead in the system, grows with the number of network coding nodes. It therefore becomes important to select efficiently the subset of nodes that perform network coding, in order to control delay and complexity while still exploiting the diversity in the network.

In this paper, we discuss solutions for the selective placement of a few network coding nodes in order to reduce the delay for video delivery. The nodes in the network are categorized into network coding (NC) and store and forward (SF) nodes. The network coding nodes use the practical network coding algorithm described in [4], which has been selected for its effectiveness and simplicity. Similarly, we adopt the concepts of coding generation and buffer models [4] for proper handling of the timing constraints in the stream delivery. We first build on our previous work [6] and estimate the rate of non-redundant packets in each network node. This rate is an indication of the goodput of the system, as it measures the number of useful and non-redundant packets that are received at a node. It makes it possible to estimate the decoding delay in the client nodes, i.e., the time necessary for collecting enough useful packets to build a full-rank decoding system. We use the delay estimation to select the subset of nodes that should implement network coding such that a maximal goodput or a minimal delay is attained. We propose two algorithms that iteratively choose the SF nodes to be turned into NC nodes for improving the system performance. The two algorithms differ in their view of the network status.
The first algorithm assumes that a central node has complete knowledge about the status of the overlay network in terms of available bandwidth and packet loss rate. The second algorithm only uses local estimations of the network status at each node and provides a solution for distributed systems. The simulation results show that the proper selection of only a few network coding nodes already leads to throughput gains that come close to the max-flow min-cut bound and greatly decreases the delay necessary for media stream delivery. Moreover, the algorithm that only considers local network statistics performs very competitively with the algorithm that uses full knowledge of the network topology. Both algorithms even select the same nodes for network coding in most cases. Furthermore, they both outperform basic streaming algorithms built on store and forward approaches as well as solutions where network coding nodes are selected randomly. These observations are confirmed in realistic overlay networks, where our method can improve the users' video experience even when only a few nodes implement randomized network coding. This is due to a good balance between decoding delay and efficient use of the network diversity. Finally, minimal network knowledge is often sufficient for determining an efficient positioning of the network coding nodes.

The paper is organized as follows. In Section II, we present the framework under consideration and briefly review network coding principles. In Section III, we describe the model of the SF node buffer that is later used for delay computation. Then we present in Section IV a methodology for estimating the useful flow rate in the network nodes as well as the decoding delay. The centralized and semi-distributed algorithms for selecting the network coding nodes are presented in Section V. Simulation results are given in Section VI, where the benefits of the proposed algorithms are evaluated for video streaming applications in various realistic network cases. Finally, related work is discussed in Section VII and conclusions are drawn in Section VIII.

II. NETWORK CODING FRAMEWORK

We consider a streaming system that consists of servers, clients and intermediate nodes, as illustrated in Fig. 1. The overlay network offers source and path diversity, which can be efficiently exploited with network coding techniques that randomly combine packets in the nodes. This increases the packet diversity in the network and leads to efficient exploitation of the channel resources without the need for complex scheduling or node coordination mechanisms [1]. The network is modeled by a directed acyclic graph $G = (V, E)$, where $V$ is the set of network nodes and $E$ is the set of edges (links) in the network. Each network link between nodes $u$ and $v$ is characterized by a bandwidth $b_{u,v}$ (expressed in packets per second) as well as a packet loss rate $\pi_{u,v}$. We assume that all servers transmit the same multimedia content to clients via intermediate nodes that can be either network coding (NC) or "store and forward" (SF) nodes. We consider that the intermediate nodes are not necessarily interested in the transmitted content, but rather act as helper nodes that assist the packet delivery system. The system implements a push-based packet delivery strategy, which involves lower communication and coordination overhead than a pull-based solution. We consider that the servers can also implement randomized coding on the source packets for improved robustness. The coded packets are then pushed to the clients through the successive intermediate nodes. Finally, the clients perform network decoding after receiving enough packets to build a full-rank decodable system of packets.

Fig. 1: Illustration of a system for streaming on overlay networks. Multiple streaming servers (SS) send information to clients on a lossy packet network via intermediate nodes that can be either network coding (NC) or 'store and forward' (SF) peers.

The SF nodes simply send, at each transmission opportunity, the first packet in their buffer that has not been sent previously. The buffer is managed in a first-in-first-out manner, where the oldest packets are replaced by new ones when the buffer is full. When the outgoing bandwidth is larger than the incoming capacity, an SF node sends random replicates of packets from its buffer. On the other hand, the intermediate nodes that perform network coding randomly combine the buffered packets in order to generate network coded packets that are further transmitted to neighbor nodes. As suggested in [4], the NC nodes first check whether the received packets are innovative, i.e., whether they carry novel information. Non-innovative packets are discarded immediately, as they do not increase the symbol diversity in the network. Then the NC nodes randomly combine the remaining packets with coding operations based on randomized network coding (RNC) [7], a simple and efficient network coding solution for distributed systems. RNC codes work similarly to rateless codes [8], [9] and can generate an arbitrary number of coded packets from a given set of source packets, which provides a means for simple bandwidth adaptation.

Formally, the network coding operations performed in a peer node can be written as follows. An NC node $u$ generates $M$ packets by RNC. The $m$th network coded packet $c_m$ is of the form

$$c_m = \sum_{p_i(u) \in N(u)} f_{m,i} \cdot p_i(u)$$

where $N(u)$ corresponds to the set of packets of the same

Fig. 2: An NC node combines incoming packets $p_i$ and generates network coded packets $c_m$.
A header $f'_m$ is appended to each coded packet and carries the coding coefficients.

generation that are available at node $u$, $p_i(u)$ denotes either a network coded packet or a native (uncoded) packet, and $f_{m,i}$ is a random coefficient over the Galois field of size $q$, $\mathrm{GF}(q)$. The field size is typically set to $q = 256$, as it has been shown in [4] that this guarantees high symbol diversity and a low probability of building duplicate packets. As the packets combined in a node are actually combinations of the original data packets, the encoded packets can be expressed as a function of the native packets

$$c_m = \sum_{i=1}^{N} f'_{m,i} \cdot n_i \tag{1}$$

where $N$ is the total number of native packets; e.g., for video transmission, $N$ can be the number of video packets. The parameters $n_i$ and $f'_{m,i}$ represent respectively the native packets and their corresponding coding coefficients after random network coding operations. It is worth noting that some of the coefficients $f'_{m,i}$ can be zero, which means that $c_m$ does not contain information about the native packet $n_i$. As the coding coefficients are chosen randomly, a header of constant length containing the coefficient information is appended to each packet, so that the receiver can decode the stream and recover the original data packets. A network coded packet is thus augmented with a header containing the vector of coding coefficients. Fortunately, due to Eq. (1), the header does not grow with the number of hop transmissions. The encoding procedure in a peer node is depicted in Fig. 2.

The decoding operations at the client basically consist in solving the system of equations that corresponds to the network coding operations. Upon collecting a network coded packet, the client stores it in a buffer and adds a row to a matrix $F$ that contains the coding coefficients. When a full-rank system is collected, the original packets are reconstructed by solving the following equations

$$c = F \cdot n \;\Longrightarrow\; \begin{bmatrix} c_1 \\ \vdots \\ c_N \end{bmatrix} = \begin{bmatrix} f'_{1,1} & \cdots & f'_{1,N} \\ \vdots & \ddots & \vdots \\ f'_{N,1} & \cdots & f'_{N,N} \end{bmatrix} \cdot \begin{bmatrix} n_1 \\ \vdots \\ n_N \end{bmatrix}$$

where $c$ and $n$ are respectively the vectors of coded and source packets. The solution of this system of equations is typically computed by Gaussian elimination [4].

The proposed streaming system leads to the following observations. First, the network coding nodes act somewhat similarly to sources, in the sense that they refresh the set of packets in the network through coding operations. This is necessary, as there is a non-zero probability of receiving duplicate packets in the network nodes when the network is mostly composed of SF nodes. These duplicates can be generated by a node that does not receive enough diverse packets, or by different nodes that independently transmit identical packets. These duplicate packets significantly lower the stream delivery performance, especially in networks containing bottlenecks. However, the careful placement of a few network coding nodes in the overlay can help to reduce the number of duplicates in the network. If the number of network coding nodes becomes too large, however, the probability that the randomized network coding operations generate duplicate packets becomes non-negligible again. This is especially the case if coding operations are restricted to small generations due to delay constraints. As redundancy, delay and computational overhead might increase with the number of coding nodes, it becomes apparent that efficient systems should not implement network coding in every overlay node. Instead, one has to find an effective placement of network coding nodes in order to fully exploit the network diversity in overlay streaming applications.

Fig. 3: Packet replication procedure followed in an SF node after the reception of the $h$th innovative packet.

III. SF BUFFER MODEL

We provide in this section a buffer model for the SF nodes, which forward and possibly replicate packets if the outgoing bandwidth is sufficient. The buffer model is used to estimate the rate of replicated packets, which is an important parameter in the computation of the decoding delay at the receivers.

As illustrated in Fig. 3, we consider that each SF node has two buffers of capacity $h$ (in packets): the Main Buffer (MB), where the incoming packets are stored, and the Copies Buffer (CB), where copies of the packets that have been recently transmitted are stored. Both buffers follow a FIFO model, as the oldest packets are overwritten by new ones when the node's buffer capacity is exceeded. In addition, since our system works with deadline-constrained data, packets are removed from the buffers when their decoding deadline expires.

The buffering process works as follows. When a packet arrives in an SF node, it is stored in MB. When the SF node has a transmission opportunity, a packet from MB is sent and thus removed from MB, and a copy of this packet is kept in CB. Whenever MB is empty and the node has further transmission opportunities, it randomly selects a packet from CB and transmits it. In other words, the node transmits packet replicates when the outgoing bandwidth is sufficient. The packets in CB are overwritten after some time by newer packets. In our model, if an SF node does not have sufficient outgoing bandwidth for replication, it does not use CB. Conversely, if the outgoing bandwidth is large, CB is used extensively and MB is often empty.

We are now interested in computing the number of packet replicates generated by the SF node $u$ under the proposed buffer model. A priori, the average number of replicates per packet $R(u)$ at node $u$ is given by the ratio of the outgoing and incoming bandwidths $b_o(u)$ and $b_i(u)$, i.e., $R(u) = b_o(u)/b_i(u)$. However, the probability for a packet to be replicated depends on the order of its arrival at the SF node. Typically, a packet that arrives early spends more time in the buffer and thus has a higher probability of being replicated than a packet that arrives late, close to the decoding deadline. We thus consider three cases depending on the order of arrival of the packets. We denote the arrival order by the position $k$ of a packet in a coding generation. The first case includes the earliest packets, which reach the node while CB is not full. Every packet in CB is replicated with uniform probability, but the packets that stay longer in the buffer have a higher chance of being replicated. The number of copies of the $k$th packet is thus given by

$$R_k(u) = 1 + \left(R(u) - 1\right)\left(\sum_{x=k}^{h} \frac{1}{x} + \frac{k-1}{h}\right), \quad k \in [1, h] \tag{2}$$

The second case corresponds to packets that reach the node while CB is full. When CB is full, each new packet overwrites the oldest packet in CB. Each packet has a lifetime in CB that corresponds to the time necessary to collect $h$ new packets in the SF node. The replication probability is equal to $1/h$, and the number of copies is then equal to the average replication rate in the node. In this stationary mode, we have

$$R_k(u) = R(u), \quad k \in [h+1, K] \tag{3}$$

where $K$ is the number of packets that fully traverse CB up to the head of the buffer's queue. Finally, the third case corresponds to the packets that do not spend a full lifecycle in CB due to the expiration of the decoding deadline. When the decoding deadline expires, CB is flushed, and the packets in the buffer at that moment have fewer opportunities to be replicated, since they do not fully traverse the buffer. If $k' = k - K$ denotes the position of a packet in CB when the buffer is flushed, the number of copies of these late packets can be written as

$$R_k(u) = 1 + \left(R(u) - 1\right) \cdot \frac{h - k' + 1}{h}, \quad k \in [K+1, |N(u)|] \tag{4}$$

where $|N(u)|$ is the number of packets of the same coding generation that reach the node $u$.

In summary, two main factors affect the packet replication rate: the FIFO behavior of the CB buffer, and the expiration of the decoding deadline, which causes the deletion of packets in the buffer. Thus, the first $h-1$ packets are replicated more than average and the last packets are replicated less than average, while the intermediate packets have a constant replication rate. Finally, it should be noted that, depending on the bandwidth values and the delay constraints, there are situations where the buffer does not reach the stationary regime of Eq. (3), and the computation of the number of replicates must be adapted accordingly.

We use the above buffering model to compute an equivalent packet replication rate $\hat{R}(u)$ for all packets in an SF node, which is more precise than the average value $R(u) = b_o(u)/b_i(u)$. The equivalent replication rate is estimated so that the expected number of packets received at the client $c$ is the same as when the packet replication rate is computed independently for each packet. We assume that each packet travels independently to the client $c$, and we pose the following equivalence condition:

$$\sum_{k=1}^{|N(u)|} \left\{1 - \left(\epsilon_c(u)\right)^{R_k(u)}\right\} = |N(u)| \cdot \left\{1 - \left(\epsilon_c(u)\right)^{\hat{R}(u)}\right\} \tag{5}$$

where $\epsilon_c(u)$ is the probability of losing a packet between the node $u$ and the client $c$. The number of copies $R_k(u)$ is computed from Eqs. (2), (3) and (4). Rewriting the above equation, we can express the equivalent replication rate as

$$\hat{R}(u) = \frac{\log\left(1 - \frac{1}{|N(u)|}\sum_{k=1}^{|N(u)|}\left\{1 - \left(\epsilon_c(u)\right)^{R_k(u)}\right\}\right)}{\log\left(\epsilon_c(u)\right)} \tag{6}$$

We use this replication rate estimate in the computation of the decoding delay in the next section.

IV. DELAY ANALYSIS

A. Estimation methodology

Our objective is to minimize the decoding delay by the proper placement of NC nodes in the overlay. The decoding delay is the time required to gather a sufficient number of innovative packets for decoding. In the analysis below, we restrict our attention to cases where a full-rank system is built at the decoder, and we estimate the delay necessary for this to happen. We further construct our analysis for one coding generation; the extension to multiple generations is straightforward. The decoding delay depends on the rate of innovative packets at the client.
The innovative rate increases monotonically with the number of useful packets at the client [4], which corresponds to the number of different packets that reach the client. Hence, a higher rate of useful packets leads to a smaller decoding delay.

Fig. 4: Flow decomposition of the network graph by the proposed algorithm. The top (red) node and the bottom (green) node are respectively the source and the client of the considered topology. The SF and NC nodes are represented respectively by the big and small black nodes. Flow from the (a) source, (b) first NC node, (c) second NC node, (d) third NC node, and (e) fourth NC node.

In order to compute the decoding delay, we consider that SF nodes replicate packets in case of large outgoing bandwidth. We further consider that new packets are generated only at sources and NC nodes. We treat these nodes independently in the computation of the delay at the client, as illustrated in Fig. 4. We assume that the probability of generating two identical packets in the sources or NC nodes is negligible due to the large size of the Galois field. In more detail, we first estimate the delay experienced by the client when packets are sent from a given source through all the paths connecting this source to the client, except for the paths that traverse NC nodes (see Fig. 4a). Next, we consider the NC node that is closest to the source and all the paths that connect it to the client, except for those passing through other NC nodes (see Fig. 4b). Similarly, every other NC node is considered only when all its parent NC nodes have been visited. This procedure is repeated for the unprocessed NC nodes, and the corresponding graphs are shown respectively in Figs. 4c, 4d and 4e. The total delay is computed under the assumption that all sources and NC nodes send independent streams. Equivalently, we assume that the total useful flow is equal to the sum of the useful flows generated by the source and all the NC nodes.

As we consider lossy network paths, we have to take into account that some of the packets generated by network nodes do not reach their destination. We thus estimate the probability $\epsilon_c(u)$ that a packet sent by the node $u$ does not reach the client $c$. This probability is computed on the subgraph that contains all the paths connecting the node $u$ with the client $c$, but excludes the paths that traverse other NC nodes, since these typically alter the set of received packets. The corresponding subgraphs are colored in red in Fig. 4. We denote by $D(u)$ the set of children of node $u$ (excluding the NC nodes) in the subgraph, and we define

$$\rho_{u,v} = \frac{b_{u,v}}{\sum_{v \in D(u)} b_{u,v}}$$

as the probability that a packet transmitted by node $u$ is forwarded to a descendant node $v \in D(u)$. In addition, we write the probability that a packet is deleted in a node due to buffer overflow as

$$\beta(u) = \begin{cases} 1 - \dfrac{b_o(u)}{b_i(u)}, & b_o(u) < b_i(u) \\ 0, & b_o(u) \geq b_i(u) \end{cases}$$

Recall that $b_o(u)$ and $b_i(u)$ are respectively the cumulative outgoing and incoming bandwidths of node $u$. Then, a packet sent by node $u$ might not reach the client $c$ due to one of the following three causes. The packet can be lost during its transmission to the child node $v$, or it can be lost at the node $v$ due to buffer overflow. Finally, it can be lost along with all its possible copies during the transmission from the child node $v$ to the client $c$. Overall, the probability $\epsilon_c(u)$ is given as

$$\epsilon_c(u) = \sum_{v \in D(u)} \rho_{u,v} \left\{\pi_{u,v} + (1 - \pi_{u,v}) \cdot \beta(u)\right\} + \sum_{v \in D(u)} \rho_{u,v} (1 - \pi_{u,v}) \cdot (1 - \beta(u)) \cdot \epsilon_c(v)^{\hat{R}(v)} \tag{7}$$

The probability $\epsilon_c(u)$ can be computed recursively backwards, starting from the clients up to the servers or NC nodes. Specifically, we first set the loss probabilities of all the clients to zero. Then, all nodes in the directed acyclic subgraph are visited backwards, and each node is visited only when all its children have already been processed. Once $\epsilon_c(u)$ is known, we can compute the rate of useful packets received by the client $c$ from a source or an NC node by multiplying the respective outgoing rate by the probability of correct reception $1 - \epsilon_c(u)$.

Next, we define $N_c(u)$, the number of packets received by any node $u$ that are potentially useful for the client $c$. These packets correspond to the data that reaches the node $u$ and contains information that is potentially useful for the client $c$. This number depends on the paths connecting the source nodes and the NC nodes to the node $u$, i.e., the subgraphs colored in green in Fig. 4. It also depends on the set of SF nodes on these paths. The rate of useful packets transmitted by the sources corresponds to their outgoing bandwidth, as they are able to generate any number of different packets via network coding. However, the useful rate in NC nodes may be smaller than their outgoing bandwidth, as they may have only part of the source information in their buffer. Finally, the number of useful packets in SF nodes cannot be larger than the number of incoming packets. This number is, however, difficult to estimate directly, which makes it hard to estimate the delay from the useful rates sent by the network nodes. We therefore propose below to compute the delay in a recursive manner.

B. Decoding delay

In this section, we estimate the delay $t_c$ at a client node $c$. The decoding delay depends on the rate of useful packets received from the multiple sources or network coding nodes. We estimate the time necessary to form a system of full rank $G$ at the decoder, where $G$ is the generation size in packets. In practice, the client might need to collect a slightly larger number of packets, $\tilde{G} = G/(1-x)$ [4], to form a system with $G$ innovative packets. This is due to the possibility that useful packets from a source might still be redundant and not completely independent of packets generated by other sources. The exact value of the overhead factor $x$ depends on the coding system (e.g., it can be upper-bounded by $1/q$ for RNC, where $q$ is the size of the Galois field). However, our analysis is relative, and compares different configurations to select the option that leads to the minimal decoding delay. It is therefore equivalent to work with $G$ or $\tilde{G}$, since the solutions that lead to the fastest delivery of $G$ and of $\tilde{G}$ packets, respectively, are identical. We choose to work with $G$ in the rest of this section.

We compute the average decoding delay by first estimating the time necessary to collect enough packets from each source or NC node independently. Under the assumption that each of these collection processes represents a uniform flow of packets, we can then approximate the expected decoding delay as the time necessary to collect a sufficient number of packets from multiple independent flows. The complete algorithm for computing the decoding delays is given in Algorithm 1. Note that the algorithm uses an iterative procedure to compute the decoding delays, since the equivalent packet replication rate in SF nodes (see Section III) cannot be computed exactly at first. The algorithm initializes the replication rate to an average value given by the input and output bandwidths of each node, and refines this value along with the successive decoding delay estimations. The NC nodes are examined in the order of their proximity to the sources, i.e., the nodes that are closer to the sources are processed first. The number of useful packets $N_c(u)$ is computed recursively at all NC nodes, starting from those that are close to the sources. Then the algorithm considers NC nodes that receive packets from NC nodes that have already been visited. This specific procedure is applicable in our framework, as we consider the iterative selection of network coding nodes. Nodes are checked in a greedy way, and at each stage the algorithm improves the current solution by selecting the additional network coding node that brings the largest delay reduction. The overall algorithm typically converges after only a few iterations.

We now describe the delay estimation algorithm in more detail. Steps 7-11 of Algorithm 1 correspond to the estimation of the useful rate $N_c(u)$ in NC nodes, which is necessary for computing the initial replication rate of the node $u$. As this rate is difficult to estimate directly, we choose a differential method, comparing the delay $t_c(u)$ observed by the client when the node $u$ is active and when it is silent, respectively. From the delay difference $\Delta t_c(u)$ we compute the rate difference $\Delta N_c(u)$, which is the useful rate at node $u$. Finally, the rate $N_c(u)$ of the packets at node $u$ that are useful for the client $c$ is computed by solving

$$\Delta N_c(u) = \begin{cases} N_c(u) \cdot \left(1 - \epsilon_c(u)^{\hat{R}(u)}\right), & b_o(u) > b_i(u) \\ N_c(u) \cdot \dfrac{b_o(u)}{b_i(u)} \cdot \left(1 - \epsilon_c(u)\right), & b_o(u) < b_i(u) \end{cases} \tag{8}$$

where the first and second conditions correspond to the cases where the node $u$ in SF mode has a small incoming and a small outgoing bandwidth, respectively. Note that, when the network does not contain any NC nodes, the rate of useful packets that are transmitted is simply equal to the output bandwidth of the sources. In this case, the useful rate received by the client is simply $N_c(s) = b_o(s) \cdot (1 - \epsilon_c(s))$.

Then we compute the delay due to packets sent by the different sources and NC nodes in the network. We consider two cases. First, we consider the NC nodes that have limited incoming bandwidth (i.e., $N_c(u) < b_o(u)$ in line 12 of Alg. 1) and the sources with outgoing bandwidth larger than the source rate. The probability of generating useful packets in such nodes evolves as the buffer fills in. We start by estimating the number of different packets received by the client $c$ when the node $u$ is the only source of information. On average, the node $u$ sends $\nu(u) = b_o(u)/N_c(u)$ packets in the inter-arrival time of two consecutive packets in its buffer, and these packets are combinations of the same set of input packets. Out of these $\nu(u)$ packets, $k$ packets can be considered useful for the client $c$ if the decoding system has a rank deficiency of $k < \nu(u)$. Furthermore, due to packet losses and bandwidth variations in the network, each of the packets generated by the node $u$ arrives at the client $c$ with probability $1 - \epsilon_c(u)$. The probability $A_k(u)$ that $k$ out of the $\nu(u)$ packets arrive at the client $c$ is

$$A_k(u) = \binom{\nu(u)}{k} \left(1 - \epsilon_c(u)\right)^{k} \epsilon_c(u)^{\nu(u) - k} \tag{9}$$

Note that, in general, $\nu(u)$ is not an integer. We therefore interpolate between the values of $A_k(u)$ evaluated at the integers nearest to $\nu(u)$.

We then consider the probability $P_c(u,r,n)$ for the client $c$ to collect $r$ useful packets from data sent by a node $u$ that possesses $n$ useful packets. This probability can be computed recursively as

$$P_c(u,r,n) = \sum_{k=0}^{\nu(u)} P_c(u, r-k, n-1) \cdot A_k(u), \quad \forall r \in [1..n-1] \tag{10}$$

Eq. (10) holds for $r < n$. When $r = n$, it becomes

$$P_c(u,n,n) = \sum_{k=1}^{\nu(u)} P_c(u, n-k, n-1) \cdot \sum_{l=k}^{\nu(u)} A_l(u) \tag{11}$$

We further denote by $\bar{P}_c(u,r,n)$ the probability that the client $c$ reaches $r$ useful packets precisely upon the arrival of the $n$th useful packet in the sending node $u$. It can be written as

$$\bar{P}_c(u,r,n) = \sum_{k=1}^{\nu(u)} P_c(u, r-k, n-1) \cdot \sum_{l=k}^{\nu(u)} A_l(u) \tag{12}$$

where $P_c(u, r-k, n-1)$ includes all possible events that lead the client $c$ to collect packets with rank $r-k$ when the node $u$ receives its $(n-1)$th useful packet for client $c$.

The arrival time of the $n$th useful packet at the sending node $u$ can be computed from the useful packet rate $N_c(u)$. We assume that $N_c(u)$ represents a constant rate and that the arrival times of packets in node $u$ are uniformly distributed. Now, one can compute the expected number of useful packets $E_c(u)$ that are necessary at the sending node $u$ for the client $c$ to receive $G$ useful packets. It is simply expressed as

$$E_c(u) = \sum_{p=G}^{\infty} p \cdot \bar{P}_c(u, G, p) \tag{13}$$

The decoding delay for the client $c$ when the NC node $u$ is the only source of information can be estimated by dividing the expected number of necessary packets by the useful packet rate $N_c(u)$, i.e., by multiplying it by the inter-arrival time between two useful packets. It is written as

$$t_c(u) = E_c(u)/N_c(u) \tag{14}$$

Then, we consider the sources and the NC nodes that are over-provisioned in bandwidth (i.e., the set of nodes $u$ where $N_c(u) > b_o(u)$, line 14 of Alg. 1). We assume that they transmit packets that are all potentially useful for the client $c$. The rate of useful packets from node $u$ that reach the client $c$ is in this case given by $N_c(u) = b_o(u) \cdot (1 - \epsilon_c(u))$. When this rate is uniform, the decoding delay when the node $u$ is the only source of information is given by

$$t_c(u) = \frac{G}{b_o(u) \cdot (1 - \epsilon_c(u))} \tag{15}$$

Finally, the average decoding delay at client $c$ is computed by considering all the sources and NC nodes as independent sources of information with uniform useful rates $1/t_c(u)$. We can write the decoding delay as

$$t_c = \frac{1}{\sum_{u \in S} \frac{1}{t_c(u)}} \tag{16}$$

where $S$ is the set of sources and NC nodes.

Algorithm 1 Delay computation algorithm
1: Initialize replication rates for every node: $\hat{R}(u) = b_o(u)/b_i(u)$.
2: repeat
3:   for each client node $c$ do
4:     Compute $\epsilon_c(u)$ for sending nodes $u$ from Eq. (7).
5:     Compute $N_c(s) = b_o(s) \cdot (1 - \epsilon_c(s))$ for all sources $s$.
6:     for each NC node $u$ do
7:       Compute $t_c(u)$ using Eqs. (10)-(14), setting node $u$ in SF mode.
8:       Compute $t_c(u)'$ using Eqs. (10)-(14), setting node $u$ in silent mode.
9:       Compute $\Delta t_c(u) = t_c(u) - t_c(u)'$.
10:      Compute $\Delta N_c(u) = 1/\Delta t_c(u)$.
11:      Compute $N_c(u)$ using Eq. (8).
12:      if $N_c(u) < b_o(u)$ then
13:        Compute $t_c(u)$ using Eqs. (10)-(14), setting node $u$ in NC mode.
14:      else
15:        Compute the expected decoding delay $t_c(u)$ using Eq. (15).
16:      end if
17:    end for
18:    Compute the average decoding delay $t_c$ considering all sources and NC nodes simultaneously, with Eq. (16).
19:  end for
20:  for each SF node $u$ do
21:    Estimate the total number of packets received by each node, per generation: $|N(u)| = b_i(u) \cdot \max_c t_c$.
22:    Update the replication rate $\hat{R}(u)$ with Eq. (6).
23:  end for
24: until convergence of $t_c$.

V. SELECTIVE PLACEMENT OF NC NODES

Equipped with methods to estimate the decoding delay on the overlay network, we can design algorithms that decide which nodes in the network should perform network coding. We address the problem of placing $A$ network coding nodes in the overlay network such that the average delay observed by the clients is minimized. This is typically achieved by selecting network coding nodes such that the packet replication rate is decreased and the innovative flow rate in the network is increased.

However, the optimal selection of the NC nodes is known to be an NP-hard problem [10]. Hence, we design a greedy approach that iteratively searches for the optimal placement of a new network coding node while all the previously added NC nodes are kept fixed. The candidate nodes for implementing network coding are all the remaining SF nodes in the overlay network. Our node selection algorithm examines all the SF nodes backwards from the clients to the servers. It selects the SF node whose transformation into an NC node brings the highest benefit for the clients. This procedure is repeated until all the $A$ NC nodes have been selected.

Algorithm 3 NC node selection with local information
1: for $i = 1$ to $A$ do
2:   for every SF node $u$ do
3:     Temporarily transform node $u$ into an NC node.

We now propose two variants of the iterative selection algorithm that both use Algorithm 1 for computing the innovative flow rate, but differ in their view of the network resources. The first algorithm assumes that a central node possesses a global
knowledge of the network; it iteratively selects the network 4: Estimate the average delay tˆc(u) at the clients using coding nodes in a centralized manner. When global network Algorithm 1 with local information. knowledge is not a reasonable assumption, the centralized 5: Transform node u back into an SF node. algorithm still serves as a performance benchmark for other 6: Transmit tˆc(u) to a central agent. greedy NC node placement algorithms. The second algorithm 7: end for uses only a local view of the network resources at each node 8: Select the node u∗ that maximizes the innovative rate, forcomputingthegainsininnovativerateanddecodingdelay. u∗ =argmax (cid:80) tˆ(u) u c c This algorithm is probably more realistic in practice and can 9: Turn permanently u∗ into a NC node. be implemented in a distributed way. 10: end for In more details, the centralized algorithm uses the knowl- edgeaboutthefullnetworkandavailableresourcesinorderto determine the number of innovative packets received by each ofNCnodesistypicallydeterminedbytheadmissibledelayor client. It leads to the iterative selection of A NC nodes by tolerable complexity in the network. For example, constraints computing at each stage the benefit of turning any of the SF ondecodingdelayimposealimitonthemaximumnumberof nodes into an NC node. The candidate node that brings the NCnodesinthesystem.However,theproblemofdetermining highest innovative flow rate with its transformation is selected the optimal number of NC nodes is out of the scope of this as a new NC node. The algorithm is described in Algorithm paper. We rather assume that the number of coding nodes or 2. helpersinthestreamingsystemisgivenapriori.Theproposed algorithmsthensolvetheproblemofplacingefficientlytheNC Algorithm 2 Centralized NC node selection nodes in the overlay network. 1: for i = 1 to A do Finally, it has to be noted that the second algorithm is not 2: for each node u in the set of SF nodes. 
do fully distributed, as it still uses a central agent to select the 3: Turn temporarily u into a NC node NC nodes. However, since it uses only local information, the 4: Estimate the average decoding delay at the clients tc proposed solution is certainly amenable to a fully distributed (using Algorithm 1). algorithm. One could imagine that each node decides inde- 5: Turn u back into a SF node. pendently if it should implement network coding or not, by 6: end for comparingthelocalestimationofthegainininnovativerateto 7: Select the node u∗ that minimizes the decoding delay, a pre-defined threshold. Alternatively, a distributed consensus i.e., u∗ =argmin (cid:80) t solution could be deployed for a coordinated selection of the u c c 8: Turn permanently u∗ into a NC node. NC nodes with minimal information exchange between the 9: end for overlay nodes. VI. SIMULATIONRESULTS The second algorithm relaxes the assumption that a central node is aware of the full network status. Instead, the nodes A. Setup only use local network information for the estimation of the Inthissection,weanalyzetheperformanceoftheproposed innovative flow rate. We define a neighborhood around each NC node selection algorithms for the transmission of video node. Then, an algorithm similar to the centralized solution streams in overlay networks. We generate overlay networks above is applied in each neighborhood in order to determine based on realistic network bandwidth values and adjacency the benefits of turning SF nodes into NC nodes. In particular, measurementsfromthePlanetLab[11],asprovidedinasnap- each node uses the estimation of the reception probability shot of their network taken on 24 Nov. 2009 by their Scalable (cid:15)c(u) that is given by all nodes u in the neighborhood and Sensing Service (S3) [12]. 
The networks under consideration computes an estimation of the decoding delay based on local haveonesourcenode,threeclientnodesandavariablenumber information, i.e., the capacities and the loss rates of the sub- of intermediate nodes. We create network topologies in the network around the node u. Note that (cid:15)c(u) is also calculated followingway.First,thesourcenodesarepositioned,thenthe consideringonlythestatisticsofnode’suneighborhood.These nodes are randomly added one-by-one to the topology. For estimations are transmitted periodically to a central agent, every new node, four nodes are randomly chosen as parent which finally makes the decision on the placement of new nodes. However, if the new node is not directly connected NC nodes. The procedure is summarized in Algorithm 3. to any of the selected parents according to the Planet Lab Both algorithms permit to select a few network coding measurement data, the node is removed and a new node is nodes in the system, such that the coding delay and overall selected. After all nodes have been added to the network, the computational complexity in the network is limited. At the nodes thatcannot bereached bythe sourceand the nodesthat same time, the system maintains a high innovative rate for are not connected with any client are removed. The resulting sustained streaming performance. The choice of the number networkgraphsaredirectedacyclicbyconstruction.Theedge 9 capacityofeachlinkissetto1/200ofthePlanetLabcapacity the performance of Algorithm 3 for different sizes of the values, in order to get realistic values for the link bandwidth. neighborhoodinthelocalgainestimations.Weobserveasharp Finally, the packet loss rate of each link is set to 5%. reduction of the delivery times with the addition of the first The network coding operations are performed in a Galois few NC nodes for all the algorithms, but especially for the field of size GF(28) since this field size has been shown to proposed algorithms. 
The gains become less important after result in a good compromise between performance (packets a few NC nodes have been placed in the network. We can are innovative with high probability) and information over- thus see that the NC nodes are well positioned in order to head [4]. The generation size is set to 32 packets, which is improve the delivery performance. The results also highlight reasonable for real time video streaming applications. The the inefficiency of the RANDSEL algorithm, which becomes packet size is 512 bytes. The decoding is performed by competitive with the other methods only for large number of gaussian elimination. Since we are interested in analyzing NC nodes. Finally, we can see that the Algorithm 2 performs the performance in terms of decoding delay, we compute the similartotheALGO2R.Thisconfirmsthattheproposeddelay average delay as the time needed by each client to receive 32 estimation strategy in ALGO2 is accurate as it comes close to linearlyindependentpackets(i.e.,acompletegeneration).This the actual delay values measured by the network simulator. istheminimalnumberofpacketsfortheclientstodecodethe source information. We use a set of 10 random networks that C. Innovative rate consist of 32 to 56 nodes with two different radii of 6 and 8 Wefurtherlookattheaveragenormalizedeffectivethrough- hops(i.e.,themaximumdistancebetweenanypairofnodesin putinthenetwork(i.e.,thenumberofusefulpacketsreceived the network is 6 or 8 hops). We consider the placement of up by the clients), as a function of the number of NC nodes to10NCnodesineachnetwork.Ineachcase,theperformance in the network. Normalization is performed with respect to resultsareaveragedover100simulationsperformedusingthe the effective throughput achieved by the scheme where all NS-3 [13] network simulator. nodes perform network coding. Figs. 6 (a) and (b) show We evaluate the performance of the centralized and semi- the effective throughputs for two different network radii. 
The distributed algorithms (resp. the Algorithm 2 and Algorithm results confirm the earlier observations on the decoding delay 3) denoted as “ALGO2” and “ALGO3” respectively). We performance. A few well selected nodes are able to bring compare them with a greedy search algorithm (ALGO2R) a large throughput gain. Further performance improvements similartoAlgorithm2butthatusestheactualdelaysobtained become less important as the number of NC nodes increase. by NS-3 simulations instead of the delay estimates from Weseealsothatthealgorithmsproposedinthispaperprovide Algorithm 1 for the node selection. We also compare to a the best performance among the schemes under comparison. baseline scheme called in the following as “RANDSEL” that The node selection algorithm with local information improves randomlyplacestheNCnodesinthenetwork.Forthesakeof with the size of the neighborhood but generally stays close to completeness, we finally study the performance of a scheme thecentralizedalgorithmwhentheneighborhoodissufficiently whereallnodesareNCnodes(thisschemeiscalled“Allnodes large. In the case where the neighborhood is limited to one NC”), a Linear NC scheme [4] and a scheme with only SF node, the proposed algorithm still outperforms RANDSEL nodes. The performance of the later is equal to the theoretical since the decisions are not totally blind. The reason for the maximum that can be sustained when only routing is enabled inferiorperformanceofthesemi-distributedschemecompared and can be found by typical maximum flow algorithms). tothecentralizedonesimplycomesfromthefactthatthelocal Note that the minimum delay and maximum effective network statistics are not sufficient for accurately estimating throughput obtained with Linear NC are computed by consid- thedelaywhentheneighborhoodissmall.Inaddition,wecan ering that a high capacity hyper-source node connected with observethatafewNCnodesaresufficientforobtaininghigher all sources. 
In this case, the overall throughput is computed throughputsthantheoneinroutingalgorithmswherethenodes as the sum of the throughputs from the hyper-source to each simply forward packets randomly to their descendants. It con- clientnode.Fortheroutingcase,weconsideraswellahyper- firmsthefactthatourmethodsareappropriatefordeployment sink node linked with all client nodes, and we compute the in low-cost networks, where a few helpers or network coding maximum throughput between the hyper-source and hyper- nodesaresufficientforimprovedthroughputandefficientdata sink with standard graph maximum flow algorithms. We delivery. Finally, we can observe that our algorithms tend to should point out that the links connecting the hyper nodes performbetterinlargernetworks(i.e.,largerradiusvalues),as with the networks sources and clients are error free and have theyareabletoexploitmoreefficientlytheavailableresources infinite capacities, so they do not introduce extra delays. and the diversity in the overlay network. B. Decoding delay D. Video quality Wefirststudythedecodingdelayforeachofthealgorithms. Finally, we study the performance of the video delivery Figs. 5 (a) and (b) illustrate the normalized average decoding schemes from the viewpoint of video quality. We estimate the delays for the network clients as a function of the number average PSNR quality measured at the clients with respect to of NC nodes added in the network, for two different network thenumberofNCnodesinthenetworkforallmethodsunder sizes. The decoding delays are normalized to the performance comparison in the transmission of the Foreman CIF sequence obtained when all nodes perform network coding. 
We show encoded by the the JM12.2 [14] of the H.264/AVC standard 10 1.8 2.5 RANDSEL RANDSEL ALGO3 (range=1) ALGO3 (range=1) 1.7 ALGO3 (range=2) ALGO3 (range=2) ALGO3 (range=3) ALGO3 (range=3) malized average delivery delay1111....3456 AAALLll GGnoOOd22eRs NC malized average delivery delay 1.25 AAALLll GGnoOOd22eRs NC Nor1.2 Nor 1.1 1 1 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Number of NC nodes Number of NC nodes in the network (a) (b) Fig. 5: Normalized average decoding delays versus the number of NC nodes, in networks with maximum distance between any pair of nodes equal to: (a) six and (b) eight. 1 1 0.95 0.9 0.9 w Normalized average total flo00..0078..5578 RRAALLoAGGuNtOODin33Sg E((rrLaannggee==21)) Normalized total flow000...678 RRAALLoAGGuNtOODin33Sg E((rrLaannggee==21)) ALGO3 (range=3) 0.5 ALGO3 (range=3) 0.65 ALGO2 ALGO2 ALGO2R ALGO2R All nodes NC All nodes NC 0.4 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Number of NC nodes Number of NC nodes (a) (b) Fig. 6: Normalized achievable throughput versus the number of NC nodes, in networks with maximum distance between any pair of nodes equal to: (a) six and (b) eight. [15]. The quality is estimated by setting the encoding rate to VII. RELATEDWORK the value of the network throughput in the different schemes. The corresponding results are illustrated in Figs. 7 (a) and (b) The problem of finding a minimal set of network coding fornetworkswitharadiusofsixandeighthops,respectively.It pointsinanetworkhasbeenmostlystudiedfromatheoretical isinterestingthattheimprovedthroughputvaluestranslateinto perspective so far. First, the special case of two source higher PSNR quality which confirms the above observations messages is examined in [16] where it is proved that the about the benefits of proper NC node selection. We can have number of coding nodes is independent of the total number gainsthatexceed1.5dBwithonlytwoNCnodes,whereaswe of network nodes. 
In [17], the minimum number of network reachgainsof3dBforsevenNCnodes.ThePSNRgainssatu- coding nodes is computed through graph coloring techniques. rateasthenumberofNCnodesincreases,butqualitygainscan It is then shown in [18] that the number of coding nodes is still be noticed. As expected, larger gains are observed for the upper bounded by the number of receivers. A unification of centralizedalgorithm,however,thesemi-distributedalgorithm networkcodingandtreepackingtheoremsisfurtherpresented offers significant gains as well. For a neighborhood of three in [19], where network coding is restricted to pre-selected hops, the performance of the semi-distributed even becomes edges. These include only input edges of relay nodes and not identical to that of the centralized scheme. Finally, we see theinputedgesofclientswheresimpleroutingisapplied.This that RANDSEL gives small PSNR gains for a few NC nodes, choice is made in order to achieve the min-cut max-flow limit which confirms the poor performance of a random selection of the network and save both processing and implementation of the network coding nodes and supports the development of complexity. The relation between links capacities and the effective selection algorithms. number of coding nodes is investigated in [20], where it is shown that in directed acyclic networks arbitrary amounts of gaincanbenoticedwhensubsetsofnodesofarbitrarysizeare
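Illustrative sketch of the binomial arrival model of Eq. (9) in the delay-estimation section. It includes the interpolation used when $\nu(u) = b_o(u)/N_c(u)$ is not an integer. The paper does not say which interpolation it uses. Linear interpolation between the distributions for $\lfloor\nu(u)\rfloor$ and $\lceil\nu(u)\rceil$ sent packets is an assumption of this sketch, and the function names are hypothetical.

```python
from math import ceil, comb, floor

def binomial_arrivals(n_sent: int, eps: float) -> list[float]:
    """A_k for k = 0..n_sent when n_sent packets are sent and each one is
    lost independently with probability eps (Eq. (9) with integer nu)."""
    return [comb(n_sent, k) * (1.0 - eps) ** k * eps ** (n_sent - k)
            for k in range(n_sent + 1)]

def arrival_distribution(nu: float, eps: float) -> list[float]:
    """A_k(u) for a possibly non-integer nu(u) = b_o(u) / N_c(u).

    Assumption: linear interpolation between the binomial distributions
    obtained for floor(nu) and ceil(nu) sent packets."""
    lo, hi = floor(nu), ceil(nu)
    a_hi = binomial_arrivals(hi, eps)
    if lo == hi:
        return a_hi
    a_lo = binomial_arrivals(lo, eps) + [0.0]  # pad: at most lo arrivals
    w = nu - lo                                # weight of the upper neighbour
    return [(1.0 - w) * x + w * y for x, y in zip(a_lo, a_hi)]

# Hypothetical node: output bandwidth 30 pkt/s, useful input rate 12 pkt/s,
# 5% loss towards the client, hence nu(u) = 2.5.
A = arrival_distribution(30.0 / 12.0, 0.05)
print([round(a, 4) for a in A])
```

The interpolated values still sum to one, because both neighbouring distributions do. The result can therefore be used directly as $A_k(u)$ in the recursion of Eqs. (10)-(12).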
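The recursion of Eqs. (10)-(13) and the single-source delay of Eq. (14) can be sketched as a dynamic program over the number $n$ of useful packets held by the sending node. This sketch rests on several assumptions, none of them stated in the paper:

- $A_k(u)$ is given as a list, for example from Eq. (9).
- Eq. (10) is also applied to $r = 0$, with the base case $P_c(u,0,0) = 1$.
- The infinite sum of Eq. (13) is truncated once the remaining probability mass is below a tolerance.

All names are hypothetical.

```python
def expected_node_packets(A: list[float], G: int, tol: float = 1e-12,
                          p_max: int = 100_000) -> float:
    """E_c(u) of Eq. (13): expected number of useful packets the sending node u
    must receive before client c has collected G useful packets.

    A[k] is the probability that k of the packets sent per useful input packet
    reach the client (Eq. (9)); len(A) - 1 plays the role of nu(u)."""
    if G < 1:
        raise ValueError("G must be positive")
    K = len(A) - 1
    tail = [sum(A[l:]) for l in range(K + 1)]  # tail[k] = sum_{l >= k} A_l
    prev = [1.0] + [0.0] * (G - 1)  # P_c(u, r, 0): rank 0 (assumed base case)
    expected, mass = 0.0, 0.0
    for n in range(1, p_max + 1):
        if n >= G:
            # Eq. (12) with r = G: the client reaches G exactly at the n-th packet.
            p_bar = sum(prev[G - k] * tail[k] for k in range(1, min(K, G) + 1))
            expected += n * p_bar                # Eq. (13)
            mass += p_bar
            if 1.0 - mass < tol:                 # truncate the infinite sum
                break
        cur = [0.0] * G                          # ranks >= G are absorbed
        for r in range(min(n, G - 1) + 1):
            if r < n:                            # Eq. (10), also used for r = 0
                cur[r] = sum(prev[r - k] * A[k] for k in range(min(K, r) + 1))
            else:                                # r == n: Eq. (11), rank capped at n
                cur[r] = sum(prev[r - k] * tail[k]
                             for k in range(1, min(K, r) + 1))
        prev = cur
    return expected

def decoding_delay(A: list[float], G: int, N_c: float) -> float:
    """Eq. (14): t_c(u) = E_c(u) / N_c(u), N_c(u) in useful packets per unit time."""
    return expected_node_packets(A, G) / N_c

# Hypothetical node with nu(u) = 2 and 5% loss: A = [0.05^2, 2*0.05*0.95, 0.95^2].
A = [0.0025, 0.095, 0.9025]
print(decoding_delay(A, G=32, N_c=12.0))
```

One useful sanity check: with one packet per input ($A = [\epsilon, 1-\epsilon]$), the result reduces to the negative-binomial mean $G/(1-\epsilon)$. This is the behaviour expected when every loss is permanent for the client.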
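Eqs. (15) and (16) combine per-node delays the way parallel rates add. Each independent source or NC node contributes a useful rate of $1/t_c(u)$ generations per unit time, and the client's delay is the inverse of the summed rates. Minimal sketch; the function names are hypothetical and bandwidths are assumed to be in packets per unit time.

```python
def overprovisioned_delay(G: int, b_o: float, eps: float) -> float:
    """Eq. (15): time to receive G useful packets from a node whose packets are
    all useful, sent at b_o packets per unit time, each lost with probability eps."""
    return G / (b_o * (1.0 - eps))

def combined_delay(per_node_delays: list[float]) -> float:
    """Eq. (16): independent uniform flows with rates 1/t_c(u) add up, so the
    combined delay is the inverse of the summed rates."""
    return 1.0 / sum(1.0 / t for t in per_node_delays)

# Hypothetical example: one source and one NC node feeding the same client.
t_source = overprovisioned_delay(G=32, b_o=100.0, eps=0.05)
t_nc = 0.5
print(combined_delay([t_source, t_nc]))
```

Because rates add, the combined delay is always smaller than the smallest individual $t_c(u)$. Each extra independent flow can therefore only help under this model.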
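The greedy loop of Algorithm 2 can be sketched independently of the delay model. The sketch relies on a hypothetical callback, `estimate_delays(nc_nodes)`, that returns the per-client delays $t_c$ for a given set of NC nodes and stands in for Algorithm 1. An estimator restricted to local information would give a variant in the spirit of Algorithm 3. Tie-breaking by candidate order is also an assumption, so callers may pass the SF nodes ordered from the clients towards the sources.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List, Mapping

def greedy_nc_selection(
    sf_nodes: Iterable[Hashable],
    A: int,
    estimate_delays: Callable[[FrozenSet[Hashable]], Mapping[Hashable, float]],
) -> List[Hashable]:
    """Select A NC nodes greedily: at each stage, tentatively convert every
    remaining SF node and keep the one that minimizes the summed client delay."""
    candidates = list(sf_nodes)
    nc: set = set()
    selected: List[Hashable] = []
    for _ in range(min(A, len(candidates))):
        def total_delay(u: Hashable) -> float:
            # Temporarily turn u into an NC node and sum t_c over all clients.
            return sum(estimate_delays(frozenset(nc | {u})).values())
        best = min(candidates, key=total_delay)  # first candidate wins ties
        nc.add(best)              # permanently turn the best node into an NC node
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical toy estimator: each NC node reduces client delays by a fixed gain.
gains = {"a": 3.0, "b": 1.0, "c": 2.0}

def toy_estimator(nc: FrozenSet[Hashable]) -> Mapping[Hashable, float]:
    g = sum(gains[u] for u in nc)
    return {"client1": 10.0 - g, "client2": 12.0 - 0.5 * g}

print(greedy_nc_selection(["a", "b", "c"], 2, toy_estimator))  # ['a', 'c']
```

Each stage calls the estimator once per remaining candidate. Selecting $A$ nodes among $M$ SF nodes therefore costs on the order of $A \cdot M$ runs of the delay estimation.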
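The routing baseline adds a hyper-source linked to all sources and a hyper-sink linked to all clients, then runs a standard maximum-flow computation between them. It can be sketched with the Edmonds-Karp algorithm. Edmonds-Karp is a choice made here; the paper only refers to standard maximum flow algorithms. The hyper-node names and the assumption that sources and clients are disjoint are specific to this sketch.

```python
from collections import defaultdict, deque

HYPER_SOURCE, HYPER_SINK = "__hyper_source__", "__hyper_sink__"

def max_flow(capacity: dict, s, t) -> float:
    """Edmonds-Karp: repeatedly augment along shortest residual paths (BFS)."""
    residual = defaultdict(float)
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse direction, needed in the residual graph
    flow = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck

def routing_throughput(links: dict, sources, clients) -> float:
    """Max flow between a hyper-source linked to all sources and a hyper-sink
    linked to all clients, through infinite-capacity, error-free hyper links."""
    if set(sources) & set(clients):
        raise ValueError("sources and clients are assumed to be disjoint")
    cap = dict(links)
    for s in sources:
        cap[(HYPER_SOURCE, s)] = float("inf")
    for c in clients:
        cap[(c, HYPER_SINK)] = float("inf")
    return max_flow(cap, HYPER_SOURCE, HYPER_SINK)

# Hypothetical overlay with link capacities in packets per unit time.
links = {("s", "a"): 3.0, ("s", "b"): 2.0, ("a", "c1"): 2.0,
         ("b", "c1"): 3.0, ("a", "c2"): 1.0}
print(routing_throughput(links, ["s"], ["c1", "c2"]))
```

Infinite capacities are safe here because every augmenting path must cross at least one finite overlay link between a source and a client.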
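The setup's decodability criterion says a client can decode a generation once it holds 32 linearly independent coded packets, which is checked by Gaussian elimination over GF($2^8$). A small incremental rank tracker illustrates it. The reduction polynomial 0x11D ($x^8+x^4+x^3+x^2+1$) is an assumption, since the paper does not state one. The uniformly random coefficients, which ignore losses and network topology, are also assumed, and all names are hypothetical.

```python
import random

POLY = 0x11D  # assumed reduction polynomial x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) (shift-and-add with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2^m) is XOR
        a <<= 1
        if a & 0x100:
            a ^= POLY    # reduce modulo the field polynomial
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """Inverse of a nonzero element: a^254, since the multiplicative group has order 255."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result

class RankTracker:
    """Incremental Gaussian elimination on coding vectors of length G."""

    def __init__(self, G: int):
        self.G = G
        self.rows = {}  # pivot column -> row with 1 at the pivot and zeros before it

    @property
    def rank(self) -> int:
        return len(self.rows)

    def add(self, coeffs) -> bool:
        """Return True if the coding vector is innovative (increases the rank)."""
        v = list(coeffs)
        assert len(v) == self.G
        for col in range(self.G):
            if v[col] == 0:
                continue
            if col in self.rows:
                f, row = v[col], self.rows[col]
                v = [x ^ gf_mul(f, y) for x, y in zip(v, row)]  # eliminate column
            else:
                inv = gf_inv(v[col])
                self.rows[col] = [gf_mul(inv, x) for x in v]     # normalize pivot
                return True
        return False  # reduced to zero: linearly dependent, not innovative

def packets_until_decodable(G: int, rng: random.Random) -> int:
    """Number of random coded packets a client needs before reaching rank G."""
    tracker, received = RankTracker(G), 0
    while tracker.rank < G:
        tracker.add([rng.randrange(256) for _ in range(G)])
        received += 1
    return received

print(packets_until_decodable(32, random.Random(7)))
```

With uniformly random coefficients, a new packet is non-innovative only if it falls in the span already collected. This is unlikely in a field of 256 elements, so the count typically stays close to the generation size.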
