
A Poisson Hidden Markov Model for Multiview Video Traffic

Lorenzo Rossi, Jacob Chakareski, Pascal Frossard, and Stefania Colonnese

arXiv:1301.0344v1 [cs.MM] 2 Jan 2013

Abstract: Multiview video has recently emerged as a means to improve user experience in novel multimedia services. We propose a new stochastic model to characterize the traffic generated by a Multiview Video Coding (MVC) variable bit rate source. To this aim, we resort to a Poisson Hidden Markov Model (P-HMM), in which the first (hidden) layer represents the evolution of the video activity and the second layer represents the frame sizes of the multiple encoded views. We propose a method for estimating the model parameters in long MVC sequences. We then present extensive numerical simulations assessing the model's ability to produce traffic with realistic characteristics for a general class of MVC sequences. We then extend our framework to network applications, where we show that our model is able to accurately describe the behavior of the sender and receiver buffers in MVC transmission. Finally, we derive a model of user behavior for interactive view selection, which, in conjunction with our traffic model, is able to accurately predict actual network load in interactive multiview services.

Index Terms: Digital video broadcasting, three dimensional TV, hidden Markov models, multiview video.

I. INTRODUCTION

The advent of novel video services with multiple views of the same video scene, e.g., 3-D TV or free-viewpoint video, poses many novel challenges in terms of coding, processing and transmission of the multimedia content. As far as encoding techniques are concerned, the ISO/ITU-T Joint Video Team has recently finalized the H.264 Multiview Video Coding (MVC) standard, which is explicitly devoted to efficient compression of a multiview source [1]. It is expected that multiview video communication services will be traffic intensive, which raises important questions in network dimensioning. In addition, the encoding dependencies between the different views render resource allocation quite challenging in an MVC communication system.

Both problems of network dimensioning and resource allocation are usually addressed with the help of traffic models in classical video delivery services. Such tools have proved to be a valid support for efficient and accurate allocation of network resources by characterizing the compressed video content through statistical models. Video traffic models have been derived for different applications in teleconferencing [2], video broadcasting [3], [4], or streaming [5]. Different stochastic models based on autoregressive processes [2], Transform Expanded Sample (TES) processes [6], and Hidden Markov Models (HMMs) [7] have been considered for network design, resource allocation, buffer dimensioning, and performance evaluation [8].

There is however a lack of traffic models for multiview video communication services. We propose here a new traffic model for MVC content in order to characterize the frame size sequence observed at the output of an MVC variable bit rate (VBR) source. Specifically, building on our preliminary work [9], we design a doubly stochastic source model, namely a Poisson Hidden Markov Model (P-HMM) [10], in which the first (hidden) layer consists of a non-stationary chain modeling the video activity level and the second layer represents the frame sizes of the different MVC encoded views. Besides, we extend the P-HMM parameter estimation algorithm for short observation sequences presented in [10] and adapt it to long sequences such as those encountered in video communication services. We assess the model performance by extensive numerical simulations on classes of MVC sequences sharing common properties.

We apply our model to predict the traffic load generated by two different network services based on a client-server video communication paradigm. In the first one, which we name Multiview TV, the server simultaneously streams all the MVC encoded views to the client. Our model is shown to be able to predict the state of the sender and receiver buffers in Multiview TV. In the second one, named Interactive TV, the client dynamically selects the views during the streaming session by means of a feedback channel. Due to the MVC encoding dependencies, the server transmits a composite stream encompassing all encoded data required to correctly decode the selected view. Finally, we introduce an Interactive TV user service request model, in order to mimic the sequence of requested views selected by the user. In fact, the traffic generated during the Interactive TV session depends both on the MVC encoded video traces and on the user's view selection. We show that the combination of our two models is able to accurately characterize the traffic in interactive multiview applications.

The main contributions of this paper can be summarized as follows:

• a non-stationary traffic model for VBR MVC sequences is introduced, with the ability to characterize different classes of MVC streams at different encoding settings. The model can predict actual network load in network applications;
• a Maximum Likelihood (ML) estimation procedure suitable to derive the traffic model parameters in long sequences is derived;
• a user behavior model for interactive view selection is combined with our traffic model to characterize interactive multiview traffic.

The rest of the paper is organized as follows. In Section II, we introduce the Poisson Hidden Markov Model (P-HMM) and discuss its feasibility; we also describe the P-HMM parameter set estimation procedure. In Section III, we validate the model in different stream settings. Network applications of our model are studied in Section IV, along with the view switching model. Section V concludes the paper.

L. Rossi and S. Colonnese are with DIET Dept., Sapienza Università di Roma, via Eudossiana 18, I-00184 Roma, Italy.
J. Chakareski and P. Frossard are with Signal Processing Laboratory (LTS4), Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland.
II. MVC SOURCE MODELING

A. MVC coding format

An MVC stream jointly encodes different video sequences captured by multiple cameras with overlapping fields of view. Let us denote by $N_{View}$ the number of such sequences. One view, denoted as reference view, is independently encoded using temporal motion compensation and transform coding techniques, similarly to a classical video sequence encoded with the H.264 encoder [1]. The other $N_{View} - 1$ views are encoded using inter-view prediction in addition to temporal prediction, in order to further improve the compression performance. In H.264 MVC [1], inter-view prediction is allowed between frames referring to the same time instant, whereas intra-view encoding dependencies are usually set to permit temporal scalability [11]. The encoding dependencies give rise to a generalized Group Of Pictures (GOP) structure of duration $N_{GOP}$, constituted by $N_f = N_{View} \times N_{GOP}$ frames. Figures 1 and 2 show two examples of such MVC GOPs. Given the complete MVC encoded bitstream, up to $N_{View}$ flows are transmitted and decoded by the client. In most applications, all the views are transmitted together in a simulcast mode.

Fig. 1. GOP structure and encoding hierarchy ($N_{GOP} = 8$).

Fig. 2. GOP structure and encoding hierarchy ($N_{GOP} = 4$).

In order to control the size of the bitstreams, different rate control algorithms [12] can be implemented in MVC encoders. We however focus in this paper on Variable Bit Rate (VBR) streams, where the quantization step sizes are fixed. Their value only depends on the frame type, as commonly employed in most MVC applications [1].

B. Traffic model

We now propose a new model that is able to characterize the frame sizes in GOPs of an MVC compressed stream with a given GOP structure. In VBR operating mode, the bit rate of the MVC encoded views varies according to the video activity level, and the traffic model should match this non-stationary stochastic process. We propose a two-layer stochastic process model inspired by [13]. We build a new Poisson Hidden Markov Model (P-HMM) [10], in which the first (hidden) layer is a discrete time Markov chain whose states represent different video activity levels; the second layer represents the frame size sequence corresponding to a given activity level. We make the following assumptions about the non-stationarity of the sources:

1) the time spent by the video content in a given activity level follows a Poisson distribution;
2) the hidden layer state transitions (i.e., the changes of the activity level) occur at the beginning of a GOP;
3) the activity level is the same in all views, since views are correlated.

With these simplifying assumptions, we approximate the duration of scenes with a given activity level by a simple one-parameter distribution. We discard the possible changes in the average frame size due to activity level changes observed in the middle of a GOP. We will show that, in spite of this approximation, the model closely matches the MVC source characteristics.

Let us denote the number of states (i.e., different video activity levels) in our model as $N_s$. The duration of state $i$ obeys a Poisson distribution denoted by

$$d_i[k] = \frac{e^{-\lambda_i} \lambda_i^k}{k!}.$$

When a state transition occurs, the model is described by a state transition matrix $\Pi$, whose element $\pi_{ij}$ denotes the probability of transition from state $i$ to state $j$. Because of the explicit modeling of the state duration time distribution, $\pi_{ii} = 0$ for all $i$. Finally, $\pi_i$ denotes the initial probability of the model being in state $i$.

The second layer in our model describes the frame sizes in an entire GOP. Formally, let us consider the random vector

$$x[n] = [x_0[n], \ldots, x_{N_f - 1}[n]]$$

representing the set of frame sizes in the $n$-th GOP of the compressed multiview content. The vector $x[n]$ is emitted in accordance with a multivariate probability mass function (pmf) depending on the actual hidden layer state.
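To make the hidden layer concrete, the following minimal Python sketch draws a per-GOP activity state sequence from an initial distribution $\pi_i$, Poisson duration parameters $\lambda_i$ and a zero-diagonal transition matrix $\Pi$. It is an illustration under stated assumptions, not the authors' implementation: the function names and the numerical values are hypothetical, and a drawn duration value $k$ is assumed to mean that the state is held for $k + 1$ consecutive GOPs.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_state_sequence(pi0, Pi, lam, n_gops, rng):
    """Return a list with one hidden activity state per GOP."""
    n_states = len(pi0)
    states = []
    # Initial state drawn from the initial probabilities pi_i.
    state = rng.choices(range(n_states), weights=pi0)[0]
    while len(states) < n_gops:
        k = sample_poisson(lam[state], rng)       # duration value k ~ d_i[k]
        # Assumption: value k means k + 1 GOPs; clip at the sequence end.
        run = min(k + 1, n_gops - len(states))
        states.extend([state] * run)
        # Pi[i][i] == 0, so the chain always moves to a different state.
        state = rng.choices(range(n_states), weights=Pi[state])[0]
    return states


# Hypothetical 3-state example (low, medium, high activity).
PI0 = [0.5, 0.3, 0.2]
PI = [[0.0, 0.7, 0.3],
      [0.5, 0.0, 0.5],
      [0.4, 0.6, 0.0]]
LAM = [4.0, 2.0, 1.0]
states = sample_state_sequence(PI0, PI, LAM, 200, random.Random(0))
```

Only the first layer is sampled here; emitting a frame size vector for each GOP would additionally require the per-state pmfs of the second layer.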
Given the current state in the first layer of the model, say $i$, a random vector $x[n]$ is generated according to the pmf $b_i[x[n]]$. For the sake of compactness, each pmf $b_i[\cdot]$, $i = 1, \ldots, N_s$, has a different number of bins depending on the coding mode (namely I, P, or B-frames) of the compressed picture to be generated. This choice is motivated by not imposing that the frame size pmf follows a fixed probability distribution (e.g., Gaussian, gamma, ...), but instead by adapting the distribution shape to the actual MVC sequences. Moreover, by allowing a different number of bins for different frame types, we can adaptively control the model's complexity (i.e., the parameter set) according to the desired accuracy.

C. Parameter estimation

The estimation of the model parameters is a crucial step in traffic characterization. Since the model belongs to the wide HMM family [14], we can resort to one of the estimation algorithms employed for such models. In particular, an estimation procedure called the Expectation-Maximization (EM) algorithm [15] is widely used for HMMs. A version of the EM algorithm has been proposed for P-HMMs in [10]. However, it exhibits numerical instability when used for long data sequences [16]. We derive here a new EM algorithm for stable parameter estimation in long sequences. Our algorithm extends the method of [16] to the case of non-stationary hidden state durations. We present the parameter estimation in detail below.

Suppose that we observe a video sequence composed of $N$ GOPs. Let $y_0^{N-1} \overset{\text{def}}{=} \{y[n]\}_{n=0}^{N-1}$ denote the observed video traffic and $\Theta \in \boldsymbol{\Theta}$ the parameter set of our model, where $\boldsymbol{\Theta}$ is the parameter space and $\Theta \overset{\text{def}}{=} \{\Pi, \lambda_1, b_1[\cdot], \pi_1, \ldots, \lambda_{N_s}, b_{N_s}[\cdot], \pi_{N_s}\}$. The EM algorithm comprises two iterative computational steps. The first one is an expectation step that computes the auxiliary likelihood function $Q(\Theta|\Theta^{(m)}) = E\{\log(\mathrm{Prob}\{S, x, \Theta\}) \mid y, \Theta^{(m)}\}$, in which $S \in \mathcal{S}$ represents a plausible state sequence and $\Theta^{(m)}$ is the $m$-th estimate of the parameter set. Then, a maximization step maximizes the auxiliary likelihood function, i.e.,

$$\Theta^{(m+1)} = \arg\max_{\Theta} Q(\Theta|\Theta^{(m)}). \tag{1}$$

The algorithm iterates between the two steps until convergence of the parameter set. Before getting into more details, let us define a function $\dot{d}_i[k]$ representing the state duration distribution and taking into account the finite length of the observed sequence, as

$$\dot{d}_i[k] = \begin{cases} 1 - D_i[k-1] & \text{if } k = N - n - 1 \\ d_i[k] & \text{otherwise.} \end{cases} \tag{2}$$

The parameter $D_i[k]$ is the cumulative distribution of the state duration times. Then, the computations in our EM algorithm, as applied to P-HMMs, comprise i) the computation of forward probabilities, ii) the computation of backward probabilities, iii) the estimation of the parameter set. The first two steps calculate three auxiliary variables, namely the probabilities of the state sequence conditioned on the observed sequence, representing the expectation steps of our EM algorithm (see [14] for further details). Then, the parameter set is expressed as a function of these auxiliary variables. We first define the following forward probabilities¹ for $n = 0, \ldots, N-1$:

$$\alpha_n(i,k) \overset{\text{def}}{=} P(s_n = i, \ldots, s_{n+k} = i, s_{n+k+1} \neq i \mid y_0^n, \Theta^{(m)}), \quad k < N - n - 1 \tag{3}$$

$$\alpha_n(i) \overset{\text{def}}{=} P(s_n = i \mid y_0^n, \Theta^{(m)}). \tag{4}$$

For $k = N - n - 1$ the definitions above are slightly different, in order to take into account the finite length of the actual sequence:

$$\alpha_n(i, N-n-1) \overset{\text{def}}{=} P(s_n = i, \ldots, s_{N-1} = i \mid y_0^n, \Theta^{(m)}).$$

Those quantities are calculated by the recursive algorithm illustrated in Algorithm 1². They represent the probability for the system to be in state $i$ at time $n$ and to stay in the same state for the next $k$ instants, given the sequence observed till time $n$.

¹ Our definitions differ from [10] in order to avoid numerical instability.
² The normalization coefficient for $\alpha_n(i,k)$ is calculated by summation over $i$ and $k$.

Algorithm 1 Computing forward probabilities $\alpha_n(i,k)$ and $\alpha_n(i)$.
  for $n = 0 \to N-1$ do
    for $k = 0 \to N-n-1$ do
      if $n = 0$ then
        $\alpha_0(i,k) \propto \pi_i \, \dot{d}_i[k] \, b_i[y[0]]$
      else
        $\alpha_n(i,k) \propto b_i[y[n]] \left( \sum_{j=1, j \neq i}^{N_s} \alpha_{n-1}(j,0) \, \pi_{ji} \, \dot{d}_i[k] + \alpha_{n-1}(i,k+1) \right)$
      end if
    end for
    $\alpha_n(i) = \sum_k \alpha_n(i,k)$
  end for

Then, the following backward probabilities are defined in a similar way:

$$\gamma_n(i,k) \overset{\text{def}}{=} P(s_n = i, \ldots, s_{n+k} = i, s_{n+k+1} \neq i \mid y_0^{N-1}, \Theta^{(m)}), \quad n = 0, \ldots, N-1; \; k < N - n - 1 \tag{5}$$

$$\xi_n(i,j,k) \overset{\text{def}}{=} P(s_{n-1} = i, s_n = j, \ldots, s_{n+k} = j, s_{n+k+1} \neq j \mid y_0^{N-1}, \Theta^{(m)}), \quad n = 1, \ldots, N-1; \; k < N - n - 1. \tag{6}$$

Note that we resort to a different definition for the backward probabilities with respect to the usual $\beta$ notation [14]. In [14], a $\beta$ backward probability is defined representing the likelihood of the observed sequence, and $\gamma$ and $\xi$ are calculated by means of $\alpha$ and $\beta$. Here, we calculate $\gamma$ and $\xi$ directly in a backward iteration in order to avoid numerical issues arising from the sequence length. Algorithm 2 illustrates the computation of the backward probabilities, where $\delta_i^j$ denotes the Kronecker function.

Algorithm 2 Stable estimation of backward probabilities $\gamma_n(i,k)$ and $\xi_n(i,j,k)$, from (5)-(6).
  for $n = N-1 \to 1$ do
    for $k = N-n-1 \to 0$ do
      if $n = N-1$ then
        $\gamma_{N-1}(i,k) = \alpha_{N-1}(i)$
      else
        if $k \neq 0$ then
          $\gamma_n(i,k) = \xi_{n+1}(i,i,k-1)$
        else
          $\gamma_n(i,0) = \sum_{j=1, j \neq i}^{N_s} \sum_{k=0}^{N-n-2} \xi_{n+1}(i,j,k)$
        end if
      end if
      $\xi_n(i,j,k) = \dfrac{\alpha_{n-1}(i,0) \, \pi_{ij} \, \dot{d}_j[k] + \delta_i^j \, \alpha_{n-1}(i,k+1)}{\sum_{l=1}^{N_s} \left( \alpha_{n-1}(l,0) \, \pi_{lj} \, \dot{d}_j[k] + \delta_l^j \, \alpha_{n-1}(l,k+1) \right)} \cdot \gamma_n(j,k)$
    end for
  end for

Finally, the parameter set $\Theta^{(m+1)}$ in the maximization step can be calculated with the help of the forward and the backward probabilities above. We can write the initial probability of being in state $i$ as:

$$\pi_i = \sum_{k=0}^{N-1} \gamma_0(i,k). \tag{7}$$

Then we can express the transition probabilities as:

$$\pi_{ij} = \frac{\sum_{n=1}^{N-1} \sum_{k=0}^{N-n-1} \xi_n(i,j,k)}{\sum_{j=1, j \neq i}^{N_s} \sum_{n=1}^{N-1} \sum_{k=0}^{N-n-1} \xi_n(i,j,k)}. \tag{8}$$

The frame size distribution in each state is given by:

$$b_i[x] = \frac{\sum_{n=0}^{N-1} \sum_{k=0}^{N-n-1} \gamma_n(i,k) \, \delta_x^{y[n]}}{\sum_{n=0}^{N-1} \sum_{k=0}^{N-n-1} \gamma_n(i,k)}. \tag{9}$$

Finally, the state duration parameter is expressed as:

$$\lambda_i = \left( \sum_{n=1}^{N-1} \sum_{k=0}^{N-n-2} \sum_{j=1, j \neq i}^{N_s} k \, \xi_n(j,i,k) + \sum_{k=0}^{N-1} k \, \gamma_0(i,k) \right) \cdot \left( \sum_{n=1}^{N-1} \sum_{k=0}^{N-n-2} \sum_{j=1, j \neq i}^{N_s} \xi_n(j,i,k) + \sum_{k=0}^{N-1} \gamma_0(i,k) \right)^{-1}. \tag{10}$$

Note that a single iteration of the estimation algorithm consists of first calculating the state sequence probabilities (3)-(6), which is the expectation step. Subsequently, we compute the new parameter estimates by averaging the observations weighted with the state probabilities (7)-(10), which corresponds to the maximization step of an iteration of the algorithm. Convergence is assured by Jensen's inequality [16].

III. TRAFFIC MODEL VALIDATION

In this section, we assess our model by comparing statistics evaluated on a pseudo-random synthetic traffic generated according to our P-HMM with the statistics evaluated on a composite MVC test sequence. The P-HMM parameters are estimated by applying the EM algorithm of Section II-C on the observed composite sequence. We also test the case in which the test sequence is different from the sequence used to train the model, in order to show the model's ability to represent not only the training sequence, but also other sequences with similar content. A comparison with a well-known single view VBR model is also carried out.

A. MVC encoder settings

The composite MVC sequences considered in the model assessment are generated by concatenating several test sequences with very different activity levels, as reported in Table I. All the sequences are in CIF format, with a frame rate of 25 fps, and $N_{View} = 4$ views. The resulting MVC composite sequence is approximately 6 minutes long. We encode the sequence using the two different GOP structures reported in Figures 1 and 2. Both structures exhibit motion compensation dependencies among the views for anchor and non-anchor frames [17]. The large number of dependencies accentuates the difference between MVC traffic and simple aggregations of single view traffic. The two GOP structures differ in the number of I and P frames. The bit-rate variability of sequences encoded using the GOP in Figure 1 is mainly due to the residuals of the motion compensation, whereas for sequences encoded using the GOP in Figure 2 the bit-rate variability depends on the large number of intra or P frames. For each GOP structure, different MVC bitstreams have been generated by setting the quantization parameter of the reference view to 10 (high quality), 20 (medium quality), or 40 (low quality), and by adjusting the temporal layers quantization parameter accordingly³. We use JMVC v7.0 to encode the sequences [18], then we use the different MVC bitstreams to build traffic models.

³ Specifically, we have set the quantization parameters according to the default settings of the JMVC [18].

TABLE I
TEST SEQUENCES USED TO GENERATE THE COMPOUND SEQUENCE.

  sequence name      # frames
  Akko & Kayo        290
  Champagne Tower    500
  Uli                250
  Jungle             250
  Balloons           500
  Kendo              400
  Dog                300
  Pantomime          500

As we discussed in the previous section, the number of bins in the pmfs of the model may be different for different frame types in order to have a trade-off between performance and model complexity. In our tests we use the number of bins shown in Table II (a) and (b) for the GOP structures in Figures 1 and 2, respectively. The bins are placed in the interval between the minimum and the maximum observed frame size for each frame type.

TABLE II
NUMBER OF BINS IN THE PMF FOR EACH FRAME OF THE GOP GIVEN IN DISPLAY ORDER.

  (a) GOP structure in Figure 1.
  View #0          50  10  10  10  20  10  10  10
  View #1 to #3    30  10  10  10  20  10  10  10

  (b) GOP structure in Figure 2.
  View #0          50  10  20  10
  View #1 to #3    30  10  20  10

In the simulations, we employ a 3-state model in order to represent low, medium and high activity levels. A first coarse estimation of the model parameters is performed by labeling each GOP of the actual sequence as low, medium or high according to its average frame size. The thresholds are set in order to have the same number of GOPs in the three states. Then, a coarse estimation of the pmfs is obtained by evaluating the frame size histograms related to each state. Zero-valued bins are set to a low fixed value and the histograms are normalized accordingly. The transition probability matrix is initialized with positive random values, and the Poisson mean values are set to 1 for each state. After that, the EM algorithm is performed as described in Section II-C using the coarse estimation as starting point. Estimation ends when the difference between the log likelihood of the two most recent iterates is smaller than 0.01.

Finally, synthetic traffic is produced by first generating a state sequence according to the model and then producing synthetic traffic for a GOP for each state of the sequence by means of the frame size pmfs. The bins are converted to the mean frame size of the interval they represent. The synthetic traffic generation is described in Algorithm 3.

Algorithm 3 Synthetic traffic generation.
  draw the initial state $i$ according to the distribution $\pi_1, \pi_2, \ldots, \pi_{N_s}$
  $n \leftarrow 0$
  while $n < N$ do
    draw $k$ from the Poisson distribution belonging to the current state, i.e., $d_i[k]$
    if $n \geq N - k$ then
      $k \leftarrow N - n - 1$
    end if
    for $j = 0$ to $k$ do
      generate a synthetic GOP $x[n+j]$ according to $b_i[\cdot]$
    end for
    $n \leftarrow n + k + 1$
    perform a state transition according to the transition matrix $\Pi$
  end while

B. Performance evaluation

We now assess the model's accuracy by first comparing the autocorrelation function (acf) and the Q-Q plot computed on a sequence of frame sizes of the actual MVC encoded sequence (comprising all the views) with the respective statistics evaluated on a pseudo-random traffic sequence generated by the P-HMM with parameters estimated from the corresponding composite sequence. Figures 3, 4 and 5 show the acf and the Q-Q plot for different compressed MVC sequences. A Q-Q plot measures how similar two distributions are by comparing their quantiles. The closer the Q-Q plot is to the bisector of the first and third quadrant, the more similar the two distributions are. It is clear that the statistics evaluated on the synthetic traffic (P-HMM-A) closely follow the statistics of the actual MVC sequences, for every GOP structure and quantization parameter. The close match with the actual data is due to the fact that we do not constrain the frame size pmfs to have an explicit probability distribution and that we employ the Poisson distribution for the state durations, thus forcing a non-stationary recurrence of the activity levels. In order to validate these conclusions, we compare our model to the well-known single view VBR model described in [19]. Specifically, we focus on "model A" in [19], whose main differences with our P-HMM model relate to the frame size distribution and the hidden chain describing the activity level. Model A employs three shifted gamma distributions for the sizes of frames (I, P, B) and a stationary 7-state Markov chain for the activity level. Moreover, an ad-hoc parameter estimation procedure is used in [19]. We see in Figures 3, 4 and 5 that model A is not able to describe the actual traffic correctly.

Fig. 3. Comparison of (a) the autocorrelation function (versus lag, in samples) and (b) the Q-Q plot (synthetic sequence versus actual sequence) estimated on the real MVC encoded sequence and on the synthetic P-HMM generated video sequence (GOP structure in Figure 2, high quality stream). Curves: actual sequence, P-HMM-A, P-HMM-B, model A.
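The two comparison statistics used in this evaluation can be sketched as follows. This is a minimal Python illustration rather than the code used to produce the figures: the function names are hypothetical, the acf uses the common biased estimator, and the quantiles are read with a simple floor-index rule, both of which are assumptions.

```python
def autocorrelation(x, max_lag):
    """Biased sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    energy = sum(c * c for c in centered)   # normalizes lag 0 to 1
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def qq_points(actual, synthetic, n_quantiles=100):
    """Pair the empirical quantiles of two frame size samples.

    Points close to the bisector indicate similar distributions.
    """
    a, s = sorted(actual), sorted(synthetic)
    points = []
    for q in range(1, n_quantiles + 1):
        p = q / (n_quantiles + 1)            # probability level in (0, 1)
        points.append((a[int(p * len(a))], s[int(p * len(s))]))
    return points


# Hypothetical frame sizes (in bits) standing in for real and synthetic traces.
actual_sizes = [1200, 300, 280, 900, 310, 295, 1100, 305, 290, 950]
synthetic_sizes = [1150, 310, 285, 920, 300, 290, 1120, 315, 280, 940]
acf_actual = autocorrelation(actual_sizes, max_lag=3)
pairs = qq_points(actual_sizes, synthetic_sizes, n_quantiles=4)
```

Plotting `acf_actual` against the lag, and the pairs returned by `qq_points` against the bisector, would give the two kinds of panels shown in the figures.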
Fig. 4. Comparison of (a) the autocorrelation function (versus lag, in samples) and (b) the Q-Q plot (synthetic sequence versus actual sequence) estimated on the real MVC encoded sequence and on the synthetic P-HMM generated video sequence (GOP structure in Figure 2, medium quality stream). Curves: actual sequence, P-HMM-A, P-HMM-B, model A.

We can see that model A does not accurately depict the shape of the actual sequence, both for the Q-Q plots and the autocorrelation function. We finally consider the case in which our model is trained with a sequence different from the test sequence; we denote the synthetic traffic generated by this model as P-HMM-B. P-HMM-A outperforms the other models in mimicking the overall frame size distribution, while P-HMM-B also achieves good adherence.

The same statistics have been calculated separately on the traffic related to each view in MVC streams. Figures 6 and 7 show these statistics for view #1 and view #3, respectively. Similar results have been obtained for different encoder settings. Again, it is clear that model A [19] is not able to capture the statistics for each single view. Conversely, our models can efficiently characterize the traffic for each view. The characterization of the first and second order statistics for every view makes the model attractive to describe real MVC traffic in network applications, where a subset of the views is transmitted to the receivers. The good results obtained with P-HMM-B show that the model is able to capture features that are not only related to a single MVC stream, but also to a class of sequences sharing similar content characteristics. Finally, we remark that the slight divergence in terms of autocorrelation function is caused by neglecting the intra-GOP correlation in the model design.

Fig. 5. (a) Autocorrelation function and (b) Q-Q plot estimated on the actual MVC sequence and the synthetic sequences (GOP structure in Figure 1, medium quality stream). Curves: actual sequence, synthetic sequence (same train), synthetic sequence (different train), model A.

Fig. 6. (a) Autocorrelation function and (b) Q-Q plot estimated on the actual MVC sequence (View #1) and the synthetic sequences (GOP structure in Figure 1, medium quality stream). Curves: actual sequence, synthetic sequence (same train), synthetic sequence (different train), model A.

Fig. 7. (a) Autocorrelation function and (b) Q-Q plot estimated on the actual MVC sequence (View #3) and the synthetic sequences (GOP structure in Figure 1, medium quality stream). Curves: actual sequence, synthetic sequence (same train), synthetic sequence (different train), model A.

IV. TRAFFIC MODEL IN MULTIVIEW SERVICES

A. Application scenarios

We examine the accuracy of our model in the context of multiview services. We consider two case studies illustrated in Figure 8. In the first one, called "Multiview TV", the server sends all the MVC content to the user. In the second one, denoted as "Interactive TV", the user requests one view at a time and can switch dynamically among the available views during the playout, exploiting an out-of-band feedback control channel. Due to the coding dependencies, the reference views still have to be transmitted along with the target view in the Interactive TV service.

Fig. 8. System description for multiview services (server with sender buffer $B_T$, channel with rate $r$, user with receiver buffer $B_R$). The feedback channel is present only in the Interactive TV case.

From the point of view of the MVC source model, the above services differ in that the traffic generated during the Multiview TV session depends only on the encoded video, whereas the traffic generated during the Interactive TV session depends also on the view selection process. Therefore, in order to fully characterize the latter scenario, we develop a new model of the view switching sequence, inspired by related models in the context of channel switching in IPTV systems [20]-[23]. In the design of our view switching model, we employ the following assumptions:

1) A user watches the reference view most of the time;
2) Other views are occasionally selected by the user because they represent added content with respect to the main view (e.g., in a football game, these views may describe the foreground of the players);
3) A user selects views with preferences that depend on the view the user is currently watching (first order dependency).

We model the view switching sequence as a chain in which the states represent different views. The duration of stay in each view is explicitly modeled with a probability distribution. According to the above assumptions, we select sample values for the view transition probabilities, which are reported in Table III (a). The average and standard deviation of the state durations are also set to sample values corresponding to the above assumptions (see Table III (b)). We have found that the Gamma distribution is suitable and flexible for modeling the duration time, since it takes only non-negative values and its mean and standard deviation can be set independently. Formally, let $d_i^s[k]$ be the density function for the duration time in state $i$:

$$d_i^s[k] \overset{\text{def}}{=} \frac{\beta_i^{\alpha_i}}{\Gamma(\alpha_i)} k^{\alpha_i - 1} e^{-\beta_i k} \quad \text{for } k \geq 0. \tag{11}$$

The parameters $\alpha_i$ and $\beta_i$ are derived from the mean and the standard deviation of the duration time for the corresponding state, respectively $\mu_i$ and $\sigma_i$, according to the following expressions:

$$\alpha_i = \frac{\mu_i^2}{\sigma_i^2}, \qquad \beta_i = \frac{\mu_i}{\sigma_i^2}. \tag{12}$$

Note that the specific model of user behavior employed in the performance analysis is however not critical, and a similar study could be conducted with other behavior models.

TABLE III
VIEW SWITCHING MODEL (VSM) PARAMETERS.

  (a) VSM transition matrix.
  Views    #0     #1     #2     #3
  #0       0      0.4    0.2    0.4
  #1       0.4    0      0.4    0.2
  #2       0.2    0.4    0      0.4
  #3       0.4    0.2    0.4    0

  (b) VSM state duration parameters.
  Views        av. time    st. dev.
  #0           6 min       30 s
  #1 to #3     1 min       10 s

B. Performance analysis

We compare the traffic load due to the H.264 MVC source and the synthetic video traffic trace generated by our P-HMM model in both network scenarios defined above.

We consider that the MVC traffic is fed into the transmission buffer $B_T$, which is characterized by a buffer size $b_T$ and an output rate $r$ (see Figure 8). The transmission buffer adopts a First In First Out (FIFO) scheduling policy. The buffer output is encapsulated into network packets in accordance with network packetization rules and transmitted to the destination through the channel. Each packet might be affected by a different (random) delay during transmission. The delay $d[n]$ is the sum of the channel delay $d_C[n]$ and the transmission buffer delay $d_T[n]$. For modeling the channel delay $d_C[n]$, we resort to the quite general and complete channel model introduced by Miao and Chou⁴ [24]. After an initial prefetch delay $D$ (namely $D = 2$ s in our study) from the arrival time of the first frame, the playout buffer is drained at a rate given by the MVC compressed stream. If frames are not available in the playout buffer at their decoding deadline, they are considered as lost. We consider three different values for the channel rate, namely 1, 1.5, or 2 times the average bit rate of the MVC source. We denote the ratio between the channel rate and the average source rate by the factor $c_r$.

⁴ Specifically, we have adopted the same numerical channel model parameters as in [24], $\alpha = 80$, $n = 4$, $\chi = 0.025$.

We have generated a 25 minute long multiview test sequence by concatenating the streams described in Table I, similarly to the sequences used in Section III-A. A 3-state model is built by running the EM algorithm on the actual sequence using the same procedure as in Section III-A. For each sequence, we generated two streams with high ($Q = 10$) and low ($Q = 40$) quality, respectively. We then study the accuracy of our model by comparing the traffic, and more particularly the loss rate due to late packets, for both the synthetic traffic and the actual MVC sequence. The loss rate is defined as the ratio between the number of lost frames and the number of transmitted frames at both the sender and receiver buffers. The frame loss rate is averaged over 10 Monte Carlo simulations.

Fig. 9. Frame loss rate versus buffer size (kbit) in Multiview TV services (low quality stream, $c_r = 2$): (a) transmission buffer, (b) playout buffer. Curves: actual sequence, P-HMM-A, P-HMM-B, model A.

First, we compare the sender buffer frame loss rate for different values of the buffer size. Figures 9(a), 10(a) and Figures 11(a), 12(a) show these results for the Multiview TV and the Interactive TV cases, respectively, at different channel rates and stream quality values. It can be seen that the actual MVC sequence and the synthetic sequence share a similar frame loss rate for different channel rates and transmission buffer sizes. Note that the close similarity is due to the model's capability to describe higher order statistics by means of the non-stationary activity level chain.

Finally, we compare the overall frame loss rate, i.e., the sum of lost frames at both the sender and receiver buffers, divided by the total number of frames, as a function of the receiver buffer size⁵. Figures 9(b), 10(b) and Figures 11(b), 12(b) show these results for the Multiview TV and Interactive TV cases, respectively. Table V summarizes these results for other test settings, quantifying the model's accuracy as the average absolute difference $e$ of the frame loss rate between the real sequence and the synthetic sequence. The average is taken over the different buffer sizes under consideration and the frame loss rate is expressed in percent. It can be seen that the synthetic sequence closely follows the behavior of the actual MVC source; specifically, the difference between the frame loss rate of the model and that of the actual sequence is smaller than 0.03 for most of the playout buffer sizes under examination. Since the frame arrival time at the playout buffer depends on the size of the previous frames that are transmitted, the close similarity between the synthetic and the actual traffic demonstrates the accuracy of the model in characterizing MVC traffic statistics. In addition, even the P-HMM-B model, which is trained on different sequences than the test sequences, is able to capture relevant traffic features of actual data, thus providing a good frame loss rate estimation for both the transmission and playout buffers, as seen from Table V. Therefore, the P-HMM can replace real MVC sequences in the dimensioning of transmit and receive buffers. It can be employed both for synthetic trace generation and for theoretical network performance analysis.

V. CONCLUSION

In this paper, we have presented a new stochastic model characterizing the frame size sequence for MVC variable bit rate (VBR) sources. The model exploits a Poisson Hidden Markov Model representing the random frame sizes of the different MVC encoded views as a function of the random variations of the real video scene activity. We have also derived a stable EM algorithm that is applicable to long data sequences for the P-HMM parameter estimation. We have shown through extensive simulations that our model accurately predicts the sequence of frame sizes in an MVC stream. We have also applied our model to traffic load prediction in two different network scenarios, namely a Multiview TV service and an Interactive TV service. Simulation results show that the synthetic traffic generated by the proposed model strongly resembles the traffic due to real MVC video traces. The model is able to accurately characterize a class of MVC streams sharing similar content characteristics with the training data. The model is therefore an appropriate tool for different networking problems, such as network dimensioning, resource allocation, and call admission control.

REFERENCES

[1] Y. Chen, Y. Wang, K. Ugur, M. M. Hannuksela, J. Lainema, and M. Gabbouj, "The Emerging MVC Standard for 3D Video Services," EURASIP Journal on Advances in Signal Processing, vol. 2009, Article ID 786015, 13 pages, 2009. doi:10.1155/2009/786015.
[2] S. Xu and Z. Huang, "A Gamma Autoregressive Video Model on ATM Networks," IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 2, pp. 138-142, Apr. 1998.
[3] N. Ansari, H. Liu, Y. Q. Shi, and H. Zhao, "On modeling MPEG video traffics," IEEE Trans. Broadcast., vol. 48, no. 4, pp. 337-347, Dec. 2002.
[4] D. P. Heyman and T. V. Lakshman, "Source Models for VBR Broadcast-Video Traffic," IEEE/ACM Trans. Netw., vol. 4, no. 1, pp. 40-48, Feb. 1996.
5To determine the overall frame loss rate, we set the transmission buffer [5] S. Colonnese, S. Rinauro, L. Rossi, and G. Scarano, “Markov Model sizetobelargeenoughtoguaranteethattheframelossrateatthetransmission of H.264 Video Traffic,” presented at the 4th Int. Sym. on Im./Video sideisnothigherthan5%. Comm.(ISIVC’08),Bilbao-Spain,July9-11th,2008. 10 (a) Highquality sequence c = 1 c = 1.5 c = 2 r r r buffer sender receiver sender receiver sender receiver P-HHM-A 0.37 0.5 0.41 0.39 0.15 0.39 P-HHM-B 0.56 1.47 0.7 1.17 0.63 1.06 Model A 1.58 1.84 1.42 1.23 1.11 0.83 (b) Lowquality sequence c = 1 c = 1.5 c = 2 r r r buffer sender receiver sender receiver sender receiver P-HHM-A 0.18 0.39 0.11 0.87 0.09 1.27 P-HHM-B 0.46 1.13 0.61 1.66 0.62 0.87 Model A 1.47 5.68 1.61 5.52 1.68 4.33 TABLEIV AVERAGEFRAMELOSSRATEDIVERGENCEBETWEENREALSEQUENCEANDSYNTHETICSEQUENCES(MULTIVIEWTVCASE). (a) Highquality sequence c = 1 c = 1.5 c = 2 r r r buffer sender receiver sender receiver sender receiver P-HHM-A 0.29 0.23 0.32 0.18 0.12 0.23 P-HHM-B 1.08 0.88 1.18 0.75 0.65 0.72 Model A 2.43 1.83 1.69 0.55 1.05 0.86 (b) Lowquality sequence c = 1 c = 1.5 c = 2 r r r buffer sender receiver sender receiver sender receiver P-HHM-A 0.39 1.21 0.34 0.91 0.34 0.91 P-HHM-B 0.59 3.53 0.65 1.50 0.67 1.61 Model A 4.02 10.24 4.22 3.85 4.27 3.94 TABLEV AVERAGEFRAMELOSSRATEDIVERGENCEBETWEENREALSEQUENCEANDSYNTHETICSEQUENCES(INTERACTIVETVCASE). 0.7 0.7 Actual sequence Actual sequence P−HMM−A P−HMM−A 0.6 PM−oHdeMl MA−B 0.6 PM−oHdeMl MA−B 0.5 0.5 frame loss rate00..34 frame loss rate00..34 0.2 0.2 0.1 0.1 0 0 100 200 300 400 500 600 700 800 900 1000 10 15 20 25 30 35 40 buffer size (kbit) buffer size (Mbit) (a) Senderbuffer (b) Receiver buffer Fig.10. Framelossrate,Multiview TVservices (highquality stream,cr=1). [6] A. Matrawy, I. Lambadaris, and C. 
Huang, “MPEG4 traffic modeling [8] Z.Zang,J.Kurose,J.D.Salehi,andD.Towsley,“Smoothing,Statistical usingthetransformexpandsamplemethodology,”inProc.of4thIEEE Multiplexing, and Call Admission Control for Stored Video,” IEEE J. IWNA4,2002,pp.249-256. Sel.AreasCommun.,vol.15,no.6,pp.1148-1166, Aug.1997. [7] S. Colonnese, S. Rinauro, L. Rossi, and G. Scarano, “H.264 Video [9] L.Rossi,J.Chakareski,P.Frossard,andS.Colonnese,“Anon-stationary TrafficModeling ViaHiddenMarkovProcess,”inProc.EUSPICO-09, hidden Markovmodelformultiview video traffic,” inProc.17thIEEE 2009,pp.2221-2225, 2009. International Conference on Image Processing (ICIP), 2010, pp.2921-