Focusing on a Probability Element: Parameter Selection of Message Importance Measure in Big Data

Rui She, Shanyun Liu, Yunquan Dong, and Pingyi Fan, Senior Member, IEEE
State Key Laboratory on Microwave and Digital Communications
Tsinghua National Laboratory for Information Science and Technology
Department of Electronic Engineering, Tsinghua University, Beijing, P.R. China
Email: [email protected], [email protected]

Abstract—The message importance measure (MIM) characterizes the importance of information in big data scenarios, much as entropy does in classical information theory. In fact, an MIM with a variable parameter can shape the characterization of a distribution. Furthermore, by choosing an appropriate parameter of MIM, it is possible to emphasize the message importance of a certain probability element in a distribution. Therefore, parametric MIM can play a vital role in anomaly detection in big data by focusing on the probability of an anomalous event. In this paper, we propose a parameter selection method for MIM that focuses on a probability element, and we present its major properties. In addition, we discuss parameter selection with prior probability and investigate the availability of the method in a statistical processing model for the big data anomaly detection problem.

I. INTRODUCTION

Recently, big data has attracted considerable attention in many fields of application. In particular, there are cases where the minority subsets are more important than the majority subsets. For instance, in Intrusion Detection Systems (IDSs), only a few signs of security violations, which give rise to alarms, need to be monitored and analyzed [1], [2]. Moreover, as far as financial crime detection is concerned, the identities that do not conform to regulations, and which can contribute to financial fraud, are certainly in the minority [3]. Since such events are hidden in big data sets [4], minority subset detection, or anomaly detection, is becoming increasingly influential in various fields.

Note that conventional information theoretic measures such as Kolmogorov complexity, Shannon entropy, and relative entropy still apply to the detection of infrequent and exceptional appearances in large data sets. By judging the complexity of classified big data sets with these measures, anomalous subsets can be recognized [5], [6]. Moreover, with an information theoretic approach, an objective function related to a factorization of the distribution was constructed to detect minority subsets [7]. Although these algorithms are suitable for some special applications, it is not difficult to perceive that they actually focus on processing the majority subsets.

As a new information measure, the message importance measure (MIM) has been proposed to target minority subsets in big data, as opposed to conventional information theoretic measures [8], [9]. In particular, this measure is similar to Shannon entropy and Rényi entropy in structure, while focusing more on the significance of anomalies with small occurring probability in big data scenarios. In terms of applications, MIM has proved conducive to minority subset detection, including binary and M-ary minority subset detection [9]. However, in the previous method, only the minimum occurring probability among the anomalous events is used to select the parameter of MIM; that is, all anomalous events except the one with minimum probability are ignored. In addition, the previous parameter selection principle of MIM is too imprecise to set an operational parameter of the measure for a given distribution. Thus, it is necessary to present a practical parameter selection method that can also focus on anomalous events with other small probabilities beyond the one with the minimum probability.

Objectively, MIM provides an efficient instrument to address the problem of anomalous event detection. To make MIM adaptable to anomaly detection in the big data era, we introduce a parameter selection method of MIM focusing on a given probability, which remains applicable to the anomaly detection problem in big data analysis.

A. Parameter Selection of MIM

In this subsection, we propose a parameter selection of MIM that pays close attention to the importance of a probability element in a distribution. Before we proceed, let us briefly review the definition of MIM [9].

In a finite alphabet, for a given probability distribution p = (p_1, p_2, ..., p_n), the MIM with importance coefficient \varpi \ge 0 is defined as

    L(p, \varpi) = \log \left( \sum_{i=1}^{n} p_i e^{\varpi(1-p_i)} \right),    (1)

which measures the information importance of the distribution.

Definition 1. For a finite distribution p = (p_1, p_2, ..., p_n), the parameter selection of MIM focusing on a certain probability element, namely a functional relationship between the importance coefficient \varpi and a given probability element p_j, is defined as

    \varpi_j = F(p_j)    (2)

and the corresponding MIM is

    L_j(p, \varpi_j) = \log \left( \sum_{i=1}^{n} p_i e^{\varpi_j(1-p_i)} \right),    (3)

where the importance coefficient \varpi is set by p_j.
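To make the definitions concrete, the following minimal Python sketch evaluates the MIM of (1) and the parametric MIM of (3). It is our own illustration, not code from the paper; the example distribution is assumed, and the default F(p_j) = 1/p_j anticipates the selection rule introduced in Section II. The log-sum-exp trick is used because \varpi_j can be large when p_j is small.

```python
import numpy as np

def mim(p, varpi):
    # L(p, varpi) = log( sum_i p_i * exp(varpi * (1 - p_i)) ), eq. (1).
    # Evaluated via log-sum-exp, since varpi = 1/p_j can make exponents large.
    p = np.asarray(p, dtype=float)
    a = np.log(p) + varpi * (1.0 - p)
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def parametric_mim(p, j, F=lambda pj: 1.0 / pj):
    # L_j(p, varpi_j) with importance coefficient varpi_j = F(p_j), eqs. (2)-(3).
    return mim(p, F(p[j]))

p = np.array([0.6, 0.3, 0.08, 0.02])
print(parametric_mim(p, 3))  # focus on the rare element p_4 = 0.02
```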
B. Outline of the Paper

The rest of this paper is organized as follows. In Section II, we present a parameter selection method of MIM focusing on a certain probability element and analyze the properties of the resulting message importance measure. In Section III, we perform a detailed discussion of the parameter selection of MIM. The availability of parametric MIM is then discussed in Section IV by applying it to a model of minority subsets, which is closely related to big data applications. We present simulation results to verify our discussions in Section V. Finally, we conclude the paper in Section VI.

II. A MIM WITH PROBABILITY FOCUSING AND ITS PROPERTIES

In this section, we propose a parameter selection method of MIM that can focus on the importance of a certain probability element. In order to gain insight into the message importance measure under this method, its fundamental properties are investigated in depth.

Similar to the self-information in Shannon entropy, the importance coefficient selection of MIM focusing on a given probability p_j is defined as

    \varpi_j = F(p_j) = \frac{1}{p_j}.    (4)

In accordance with this functional relationship between the importance coefficient \varpi and the probability element p_j, the parametric MIM is given by

    L_j(p, \varpi_j) = \log \left( \sum_{i=1}^{n} p_i e^{\frac{1}{p_j}(1-p_i)} \right).    (5)

Moreover, several properties of this MIM can be established to provide theoretical guidance for applications, as follows.

1) Principal Component: For a fixed distribution p = (p_1, p_2, ..., p_n) with p_j > 0 (j = 1, ..., n), when we focus on the importance of a certain probability element p_j, the element p_j becomes the principal component of the message importance measure. That is,

    p_j e^{\varpi_j(1-p_j)} \ge p_i e^{\varpi_j(1-p_i)}, \quad (i = 1, ..., n).    (6)

Proof: Consider the derivative

    \frac{\partial \left( p e^{\varpi(1-p)} \right)}{\partial p} = (1 - \varpi p) e^{\varpi(1-p)},    (7)

where \varpi = 1/p_j. We note that at p = p_j the derivative is zero and the second derivative is negative. Thus the property is proved.

2) Chain Rule for MIM: For a given distribution p = (p_1, p_2, ..., p_n) without zero elements, if p_1 > p_2 > ... > p_n, then we have

    L_1(p, \varpi_1) < L_2(p, \varpi_2) < ... < L_n(p, \varpi_n).    (8)

Proof: Since p_1 > p_2 > ... > p_n and the importance coefficients are \varpi_j = 1/p_j (j = 1, ..., n), we have \varpi_1 < \varpi_2 < ... < \varpi_n. The derivative of L(p, \varpi) with respect to \varpi is

    \frac{\partial L(p, \varpi)}{\partial \varpi} = \frac{\sum_{i=1}^{n} p_i e^{\varpi(1-p_i)} - \sum_{i=1}^{n} p_i^2 e^{\varpi(1-p_i)}}{\sum_{i=1}^{n} p_i e^{\varpi(1-p_i)}} = 1 - \frac{\sum_{i=1}^{n} p_i^2 e^{\varpi(1-p_i)}}{\sum_{i=1}^{n} p_i e^{\varpi(1-p_i)}} > 0,    (9)

which means L(p, \varpi) is monotonically increasing in \varpi, and the property is proved.
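Properties 1 and 2 are easy to spot-check numerically. The sketch below, under a toy distribution of our own choosing, verifies that the focused element dominates the sum as in (6) and that the values L_j are ordered as in (8):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.15, 0.05])           # p_1 > p_2 > p_3 > p_4 (assumed example)

def mim(p, varpi):
    a = np.log(p) + varpi * (1.0 - p)          # log of each term of eq. (5)
    m = a.max()
    return m + np.log(np.exp(a - m).sum())     # log-sum-exp, avoids overflow

for j, pj in enumerate(p):
    w = 1.0 / pj                               # varpi_j = 1/p_j, eq. (4)
    terms = p * np.exp(w * (1.0 - p))
    assert terms.argmax() == j                 # Property 1: p_j is the principal component

L = [mim(p, 1.0 / pj) for pj in p]
assert all(a < b for a, b in zip(L, L[1:]))    # Property 2: L_1 < L_2 < ... < L_n
print(L)
```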
3) The Lower Bound for Finite Distributions: For any finite distribution p = (p_1, p_2, ..., p_n) with positive p_j, denoting p_max as the maximum p_j, we have

    L_j(p, \varpi_j) \ge L\left(p, \frac{1}{p_{\max}}\right) > L(p, 1) = \log \left( \sum_{i=1}^{n} p_i e^{-p_i} \right) + 1.    (10)

This property follows from the chain rule property, and the proof is omitted.

4) Focusing on the Minimum Probability: For any p = (p_1, p_2, ..., p_n) without zero elements, denote p_min as the minimum probability element and set \varpi_0 = 1/p_min according to the parameter selection method. In that case, we have

    L_0(p, \varpi_0) \ge L_0(u, \varpi_0),    (11)

where u is the uniform distribution with the same number of elements as p.

Proof: For the binomial distribution p = (p, 1-p) with 0 < p \le 1/2, we have \varpi_0 = 1/p. In order to compare the two parametric MIMs directly, we define the function G(p) = L_0(p, \varpi_0) - L_0(u, \varpi_0). Then we can derive

    G(p) = \log \left( p e^{\frac{1}{p}-2} + (1-p) \right),    (12)

    \frac{\partial G(p)}{\partial p} = \frac{\left(1 - \frac{1}{p}\right) e^{\frac{1}{p}-2} - 1}{p e^{\frac{1}{p}-2} + (1-p)} < 0.    (13)

It is clear that G(p) \ge G(1/2) = 0, which certifies the property in the binary case.

Next, for an arbitrary distribution p = (p_1, p_2, ..., p_n) without zero elements, we have \varpi_0 = 1/p_min. We define the Lagrange function

    G(p, \lambda) = G(p_1, ..., p_n, \lambda) = \sum_{i=1}^{n} p_i e^{\frac{1-p_i}{p_{\min}} - n + 1} + \lambda \left( \sum_{i=1}^{n} p_i - 1 \right).    (14)

Then the partial derivative of G(p, \lambda) with respect to p_i is obtained as

    \frac{\partial G}{\partial p_i} = \left( 1 - \frac{p_i}{p_{\min}} \right) e^{\frac{1-p_i}{p_{\min}} - n + 1} + \lambda.    (15)

By solving \partial G / \partial p_i = 0 subject to \sum_{i=1}^{n} p_i - 1 = 0, it can readily be demonstrated that p_1 = p_2 = ... = p_n = 1/n is the solution, which means that the extreme value of G is attained by the uniform distribution.

In addition, we can write

    L_0(p, \varpi_0) - L_0(u, \varpi_0) = \log \left( \sum_{i=1}^{n} p_i e^{\frac{1-p_i}{p_{\min}} - n + 1} \right) = \log \left( G(p_1, ..., p_n, \lambda = 0) \right),    (16)

    G(p_1, ..., p_n, \lambda = 0) \ge 1.    (17)

Therefore L_0(p, \varpi_0) - L_0(u, \varpi_0) \ge 0.
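Property 4 can likewise be spot-checked by simulation. In the hedged sketch below, the random distributions come from a Dirichlet sampler of our own choosing, and, consistent with the proof of (11), the uniform distribution is evaluated under its own selection rule, i.e. with coefficient 1/p_min(u) = n:

```python
import numpy as np

def mim(p, varpi):
    a = np.log(p) + varpi * (1.0 - p)          # log-sum-exp evaluation of eq. (1)
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

rng = np.random.default_rng(0)
n = 5
u = np.full(n, 1.0 / n)
for _ in range(1000):
    p = rng.dirichlet(np.ones(n))              # random distribution on n elements
    lhs = mim(p, 1.0 / p.min())                # L_0(p, varpi_0) with varpi_0 = 1/p_min
    rhs = mim(u, float(n))                     # uniform case: 1/p_min(u) = n, value n - 1
    assert lhs >= rhs - 1e-9                   # eq. (11)
print("Property 4 held on all sampled distributions; uniform value:", mim(u, float(n)))
```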
III. PARAMETER SELECTION WITH PRIOR PROBABILITY

In this section, we investigate the parameter selection of the message importance measure when the range of the prior probability can be predicted. In this regard, there are certainly many scenarios where the prior probability of anomalies is too small to be estimated. In order to reduce the complexity of the parameter selection approach when an interesting probability is focused on, we discuss the binary distribution, that is, the scenario with two possible events.

A. Parameter Focusing on Prior Probability

For a given binomial distribution p = (p_1, p_2) = (p, 1-p), with prior probability range p_i^{(1)} < p_i < p_i^{(2)} (i = 1, 2) and 0 < p < 1, we intend to select an appropriate importance coefficient \varpi to achieve the overall maximum MIM, which makes the parametric MIM operational. To be specific, we consider the importance coefficient \varpi^* related to p_j through \varpi_j = F(p_j) and satisfying

    L(p, \varpi^*) \ge L(p, \varpi_j).    (18)

According to the analysis above, we define the objective function T(p, \varpi) = \sum_{i=1}^{n} p_i e^{\varpi(1-p_i)}. Note that L(p, \varpi) is the logarithm of T(p, \varpi); since the logarithm is monotonically increasing, maximizing T also maximizes L. In the light of the Lagrange approach with KKT conditions, we have the optimization problem

    max:  T(p, \varpi) = \sum_{i=1}^{n} p_i e^{\varpi(1-p_i)},    (19)
    s.t.  p_i^{(1)} - p_i < 0,  (i = 1)    (20)
          p_i - p_i^{(2)} < 0,  (i = 1)    (21)
          \sum_{i=1}^{n} p_i = 1,  (n = 2).    (22)

Resorting to the Lagrange method with KKT conditions, we have the Lagrange function Q and its partial derivatives \partial Q / \partial p_i as follows:

    Q(p, a_j^{(1)}, a_j^{(2)}, \lambda) = \sum_{i=1}^{n} p_i e^{\varpi(1-p_i)} + a_j^{(1)}(p_j^{(1)} - p_j) + a_j^{(2)}(p_j - p_j^{(2)}) + \lambda \left( \sum_{i=1}^{n} p_i - 1 \right),
    (n = 2, j = 1, a_j^{(1)} \ge 0, a_j^{(2)} \ge 0),    (23)

    \frac{\partial Q}{\partial p_i} = e^{\varpi(1-p_i)}(1 - p_i \varpi) - a_i^{(1)} + a_i^{(2)} + \lambda,  (i = 1, 2).    (24)

By solving \partial Q / \partial p_i = 0 together with \sum_{i=1}^{n} p_i - 1 = 0, a_1^{(1)}(p_1^{(1)} - p_1) = 0, and a_1^{(2)}(p_1 - p_1^{(2)}) = 0, where (p_1 = p, p_2 = 1-p), it can be shown that the extreme value of Q is obtained when a_1^{(1)} = a_1^{(2)} = 0 and g(p, \varpi) = 0, where

    g(p, \varpi) = (1 - p\varpi) e^{\varpi(1-p)} - (1 - (1-p)\varpi) e^{\varpi p}.    (25)

By setting g(p, \varpi) = 0, we can derive the importance coefficient \varpi^* contributing to the maximum MIM given the prior probability range. We then take \varpi^* as the selected parameter of MIM when we focus on the probability element p in the binomial distribution, regarding the relationship between \varpi^* and p as the mapping \varpi_j = F(p_j). Moreover, g(p, \varpi) for several values of p is displayed in Fig. 1, from which the solutions \varpi^* of g(p, \varpi) = 0 for different p can be read off.

[Fig. 1: curves of g(p, \varpi) versus \varpi for p = 0.15, 0.18, 0.21, 0.24, 0.76, 0.79, 0.82, 0.85; the zero crossings give the solutions \varpi^* of g(p, \varpi) = 0.]

In particular, g(p, \varpi) can be simplified using a Taylor expansion as

    g(p, \varpi) = (2p + 2) + \left( p^2 - p + \frac{1}{2} \right) \varpi + \left( -\frac{1}{2}p + \frac{1}{2}p^2 - p^3 \right) \varpi^2.    (26)

In the light of g(p, \varpi) = 0 and \varpi > 0, we have

    \varpi^* = \frac{p^2 - p + \frac{1}{2} + \sqrt{9p^4 + 2p^3 + 2p^2 + 3p + \frac{1}{4}}}{2p^3 - p^2 + p}.    (27)

Remark 1. Since \partial \varpi^* / \partial p < 0, the importance coefficient \varpi^* is monotonically decreasing in p. Considering p^{(1)} < p < p^{(2)}, we can select p = p^{(1)} and then determine the parameter \varpi^* of the MIM focusing on the probability p in the binary distribution with prior probability.

Furthermore, comparing p e^{\varpi(1-p)} with (1-p) e^{\varpi p}, we define

    z(p) = p e^{\varpi(1-p)} - (1-p) e^{\varpi p},    (28)

    \frac{\partial z(p)}{\partial p} = (1 - \varpi p) e^{\varpi(1-p)} + (1 - (1-p)\varpi) e^{\varpi p}.    (29)

It can be derived that \partial z(p) / \partial p < 0 holds when 0 \le p \le 1/2. As a result, we have

    p e^{\varpi(1-p)} \ge (1-p) e^{\varpi p}.    (30)

Hence, the probability element p remains dominant when \varpi = \varpi^*.
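In practice, \varpi^* can also be obtained by solving g(p, \varpi) = 0 of (25) numerically and compared against the Taylor-based closed form (27). A minimal sketch follows; the scan interval and grid resolution are our own assumptions, and the closed form is only an approximation, so the two values need not coincide exactly:

```python
import numpy as np

def g(p, w):
    # g(p, w) = (1 - p*w) * exp(w*(1-p)) - (1 - (1-p)*w) * exp(w*p), eq. (25).
    return (1 - p * w) * np.exp(w * (1 - p)) - (1 - (1 - p) * w) * np.exp(w * p)

def solve_varpi(p, lo=1e-3, hi=50.0, tol=1e-10):
    # Find the positive root of g(p, .): scan for a sign change, then bisect.
    ws = np.linspace(lo, hi, 5000)
    vals = g(p, ws)
    k = np.where(np.sign(vals[:-1]) != np.sign(vals[1:]))[0][0]
    a, b = ws[k], ws[k + 1]
    while b - a > tol:
        m = 0.5 * (a + b)
        a, b = (m, b) if g(p, a) * g(p, m) > 0 else (a, m)
    return 0.5 * (a + b)

def varpi_taylor(p):
    # Closed-form approximation (27) from the Taylor expansion (26).
    disc = 9 * p**4 + 2 * p**3 + 2 * p**2 + 3 * p + 0.25
    return (p**2 - p + 0.5 + np.sqrt(disc)) / (2 * p**3 - p**2 + p)

for p in (0.15, 0.18, 0.21, 0.24):
    print(p, solve_varpi(p), varpi_taylor(p))
```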
B. Bounds of the Importance Coefficient with Prior Probability

On the one hand, we can investigate the value range of the parameter \varpi^* based on the equation g(p, \varpi) = 0. On the other hand, we can substitute the equation (1 - p\varpi) e^{\varpi(1-p)} = C, where C is a constant and which has more than one solution, for the solution \varpi^* of g(p, \varpi) = 0. It is clear that the function \tilde{g}(p) = (1 - p\varpi) e^{\varpi(1-p)}, the first term of g(p, \varpi), is not monotonic in p. Therefore we explore the \varpi satisfying the equation in the case 0 < p < 1/2. We can acquire the derivative

    \frac{\partial \tilde{g}(p)}{\partial p} = (\varpi^2 p - 2\varpi) e^{\varpi(1-p)}.    (31)

Considering \varpi > 0, a solution \varpi consistent with the convexity of \tilde{g}(p) must make \partial \tilde{g}(p) / \partial p = 0 attainable for 0 < p < 1/2. In consequence, we have

    \frac{2}{p_{\max}} \le \varpi \le \frac{2}{p_{\min}},    (32)

where p_max = max{p} and p_min = min{p}.

Remark 2. We can provide a loose value range of \varpi depending on 0 < p < 1/2. In view of this looser range of p, the parameter \varpi of MIM focusing on the probability p in a binary distribution belongs to {\varpi | \varpi > 4}.

IV. THE AVAILABILITY IN BIG DATA

In this section, we investigate the convergence of MIM focusing on a certain probability element under a stochastic statistical model, so as to present its availability in big data.

A. Model of Minority Subsets

Consider a random variable X with sample space {a_1, a_2, ..., a_K} and corresponding probability distribution {p_1, p_2, ..., p_K}. In a stochastic sequence X_1 X_2 ... X_M, the occurrence number of a_k is m_k. For any \varepsilon, \delta > 0 and sufficiently large M, in the light of the weak law of large numbers [10], we have

    p\left\{ \left| \frac{m_k}{M} - p_k \right| < \varepsilon \right\} > 1 - \delta.    (33)

It is easy to see that, in general, the occurrence frequency m_k / M of a_k converges to the probability p_k as M increases. That is to say, the occurrence number of a_k in the stochastic sequence can be regarded as approximately M p_k.

Meanwhile, we may also observe some m_k (k = 1, 2, ..., K) satisfying

    \left| \frac{m_k}{M} - p_k \right| \ge \varepsilon,    (34)

which can be considered the defining condition of the minority subset \bar{G}. Thus we have

    \bar{G} = \left\{ m_k : \left| \frac{m_k}{M} - p_k \right| \ge \varepsilon, \ k = 1, 2, ..., K \right\},    (35)

    p\{\bar{G}\} \le K \max_k p\left\{ \left| \frac{m_k}{M} - p_k \right| \ge \varepsilon \right\} < \delta, \quad p\{G\} = 1 - p\{\bar{G}\} > 1 - \delta.    (36)

Therefore, the problem of minority subset detection can be processed as a test of the binary distribution p = (p, 1-p) with p(\bar{G}) = p and p(G) = 1 - p.
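This reduction to a binary test can be illustrated by simulation. In the hedged sketch below, the distribution, the sequence length M, the deviation threshold ε, and the trial count are all assumptions of ours; each length-M sequence is flagged as falling in G-bar when any empirical frequency deviates from its probability by at least ε, per (34)-(35):

```python
import numpy as np

rng = np.random.default_rng(1)
probs = np.array([0.70, 0.25, 0.05])   # distribution of a_1, a_2, a_3 (assumed example)
M, eps, trials = 2000, 0.02, 10000

deviations = 0
for _ in range(trials):
    counts = rng.multinomial(M, probs)
    # The trial falls in G-bar if any empirical frequency m_k/M deviates
    # from p_k by at least eps, as in (34)-(35).
    if np.any(np.abs(counts / M - probs) >= eps):
        deviations += 1

p_hat = deviations / trials            # empirical estimate of the small probability p(G-bar)
print(p_hat)
```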
B. The Availability of Parametric MIM in the Model

In order to simplify the model of the minority subset, we focus on a binary stochastic variable, namely K = 2 rather than the multivariate case. In this setting, we can regard the occurrence of {\bar{G}_0} as the small-probability event that attracts our attention, with

    \bar{G}_0 = \{ m : m \ge M p_1 + M\varepsilon, \ \text{or} \ m \le M p_1 - M\varepsilon \},    (37)

where m is the occurrence number of a_1, and the occurrence number of a_2 depends on m.

Furthermore, we denote by n_i the number of occurrences of {\bar{G}_0} within N_i trial events, which indicates that the small-probability event {\bar{G}_0} is focused on. Thus we have the empirical probability

    \hat{p}_{(i)} = \frac{n_i}{N_i} = \frac{n_{i-1} + \Delta n_i}{N_{i-1} + \Delta N_i},    (38)

where n_i = \sum_{j=1}^{i} \Delta n_j, N_i = \sum_{j=1}^{i} \Delta N_j, and \Delta n_j, \Delta N_j denote the increments of n_j and N_j, respectively. We further have

    \min\left\{ \frac{n_{i-1}}{N_{i-1}}, \frac{\Delta n_i}{\Delta N_i} \right\} \le \hat{p}_i \le \max\left\{ \frac{n_{i-1}}{N_{i-1}}, \frac{\Delta n_i}{\Delta N_i} \right\},    (39)

which indicates that the empirical probability \hat{p}_i will be disturbed by the increment \Delta n_i. However, it converges for N_i large enough, based on the law of large numbers.

For the MIM focusing on the probability \hat{p}_i, substituting the empirical probability \hat{p}_i into L_i(p, \varpi_i) with \varpi_i = 1/\hat{p}_i makes the focused element the principal component of the measure. In this case, we have the MIM with empirical probability as

    \hat{L}_i = L(\hat{p}_i) = L_i\left( \hat{p}_i, \varpi_i = \frac{1}{\hat{p}_i} \right) = \log \left( \hat{p}_i e^{\frac{1}{\hat{p}_i}-1} + (1-\hat{p}_i) e \right).    (40)

Note that the derivative satisfies \partial L(\hat{p}_i, \varpi_i = 1/\hat{p}_i) / \partial \hat{p}_i < 0; namely, the empirical MIM \hat{L}_i is monotonically decreasing in \hat{p}_i. Therefore we have

    \min\{\hat{L}_{i-1}, \hat{L}_{\Delta i}\} \le \hat{L}_i \le \max\{\hat{L}_{i-1}, \hat{L}_{\Delta i}\},    (41)

where \hat{L}_{\Delta i} = L(\Delta n_i / \Delta N_i, \varpi_i = \Delta N_i / \Delta n_i). This describes the convergence of the empirical MIM \hat{L}_i related to \hat{p}_i when \hat{p}_i is focused on. According to the law of large numbers, n_i / N_i plays a vital role in the convergence of \hat{L}_i, whereas \Delta n_i / \Delta N_i can disturb the convergence and cause fluctuations.

Taking into account the properties of the binary distribution, we can acquire the mean p and the variance p(1-p). According to the central limit theorem, we derive

    E(\hat{p}_{(i)}) = p, \quad D(\hat{p}_i) = p(1-p)/N_i,    (42)

where E(\hat{p}_{(i)}) and D(\hat{p}_i) denote the mean \mu and the variance \sigma^2 of \hat{p}_i, respectively.

Note that \hat{L}_i is a stochastic function of \hat{p}_i, so we can consider its mean E(\hat{L}_i) and variance D(\hat{L}_i). However, it is quite complicated to acquire the exact results; thus we use the approximations

    E(\hat{L}_i) \doteq L(\mu) + \frac{L''(\mu)}{2} \sigma^2
    \doteq \log \left( p e^{\frac{1}{p}-1} + (1-p)e \right) + \frac{\left(\frac{2}{p}-1\right) e^{\frac{2}{p}-4} + \frac{(1-p)(1-2p^2)}{p^3} e^{\frac{1}{p}-2} - 1}{2\left( p e^{\frac{1}{p}-2} + 1-p \right)^2} \sigma^2,

    D(\hat{L}_i) \doteq [L'(\mu)]^2 \sigma^2 \doteq \left( \frac{\left(1-\frac{1}{p}\right) e^{\frac{1}{p}-2} - 1}{p e^{\frac{1}{p}-2} + 1-p} \right)^2 \sigma^2.    (43)

On the other hand, in accordance with the Chebyshev inequality, we can obtain the relationship between the convergence and its probability as

    P\{ |\hat{L}_i - E(\hat{L}_i)| \ge \epsilon \} \le \frac{D(\hat{L}_i)}{\epsilon^2},    (44)

for any \epsilon > 0.

In consideration of E(\hat{L}_i) and D(\hat{L}_i), we can see that when the probability element p is very small, the mean E(\hat{L}_i) is greatly affected by the term L''(\mu) unless N_i is large enough. Nevertheless, the empirical MIM converges to L(\mu) as N_i increases, especially as N_i \to \infty, since the variance D(\hat{L}_i) = [L'(\mu)]^2 \sigma^2 vanishes with \sigma^2 = p(1-p)/N_i; note also that [L'(\mu)]^2 is monotonically decreasing in \mu for 0 < \mu < 1/2. Furthermore, it is apparent that the more trial events N_i are observed, the more exactly the mean of the empirical MIM, i.e., E(\hat{L}_i), will be estimated. As for the variance, we can ascertain that the empirical MIM becomes more concentrated in that condition. That is to say, it is feasible to take advantage of the MIM focusing on a probability element for minority subset or anomaly detection in empirical statistics.
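The empirical-MIM recursion (38)-(40) and the delta-method approximation (43) can be checked with a small Monte Carlo sketch. The event probability, batch size, and batch count below are assumptions of ours for illustration:

```python
import numpy as np

def L_hat(p):
    # Empirical binary MIM, eq. (40): log( p * e^{1/p - 1} + (1 - p) * e ).
    return np.log(p * np.exp(1.0 / p - 1.0) + (1.0 - p) * np.e)

rng = np.random.default_rng(2)
p_true, batch, n_batches = 0.05, 500, 40      # assumed small-event probability and batching

n = N = 0
for i in range(1, n_batches + 1):
    dn = rng.binomial(batch, p_true)          # Delta n_i: occurrences of G-bar_0 in the batch
    n, N = n + dn, N + batch                  # cumulative counts, eq. (38)
print("p_hat =", n / N, " L_hat =", L_hat(n / N))

# Delta-method approximation of the spread of L_hat, eq. (43):
mu, sigma2 = p_true, p_true * (1 - p_true) / N
Lp = ((1 - 1 / mu) * np.exp(1 / mu - 2) - 1) / (mu * np.exp(1 / mu - 2) + 1 - mu)
print("approx std of L_hat:", abs(Lp) * np.sqrt(sigma2))
```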
V. NUMERICAL RESULTS

In this section, the properties and the availability of MIM focusing on a probability element are illustrated by numerical results. Taking the binomial distribution, Poisson distribution, and geometric distribution as examples of different distributions, the behavior of the operational MIM with importance coefficient \varpi_j set by p_j is presented in Fig. 2 and Fig. 3.

[Fig. 2: message importance measure L_j(p, \varpi_j) with \varpi_j set by p_j, versus the order of the probability elements, for binary, geometric, Poisson, and uniform distributions.]

From Fig. 2, it is shown that the operational MIM L_j(p, \varpi_j = 1/p_j) is decreasing with p_j, which differs from the uniform distribution and certifies Property 1. Besides, it is noted that not every \varpi_j set by p_j in a given distribution yields an L_j(p, \varpi_j = 1/p_j) larger than that of the uniform distribution. However, Fig. 3 shows that L_0(p, \varpi_0 = 1/p_min) in the other distributions is larger than that of the uniform distribution, which verifies Property 4.

[Fig. 3: message importance measure L(p, \varpi = 1/p_min) in different distributions (binary, Poisson, normal, uniform, etc.) versus the order of the probability elements.]

VI. CONCLUSION

In this paper, we investigated the information measure problem and discussed the parameter selection of MIM focusing on a probability element, which characterizes event uncertainty as entropy does while emphasizing the importance of the attractive probability element. Furthermore, the advantage of MIM focusing on a probability element is that the parameter can be chosen more adaptively, as required for statistical big data analysis. We discussed several properties of the MIM with an attractive probability and investigated a practical method to select the coefficient \varpi with prior probability. We also investigated the availability of the parametric MIM, with importance coefficient selected by probability, for more common minority subset detection problems. In the future, we plan to design more efficient algorithms to detect anomalous events or minority subsets.

ACKNOWLEDGMENT

We are grateful for the support of the China Major State Basic Research Development Program (973 Program) No. 2012CB316100(2) and the China Scholarship Council. The work also received extensive comments on the final draft from other members of the Wistlab of Tsinghua University.

REFERENCES

[1] W. Lee and S. J. Stolfo, "Data mining approaches for intrusion detection," in Proc. USENIX Security Symposium, San Antonio, USA, Jan. 1998, pp. 291-300.
[2] K. Julisch and M. Dacier, "Mining intrusion detection alarms for actionable knowledge," in Proc. ACM International Conference on Knowledge Discovery & Data Mining, New York, USA, 2002, pp. 366-375.
[3] S. Wang, "A comprehensive survey of data mining-based accounting-fraud detection research," in Proc. IEEE Intelligent Computation Technology and Automation (ICICTA), Madurai, India, May 2010, pp. 50-53.
[4] S. Ramaswamy, R. Rastogi, and K. Shim, "Efficient algorithms for mining outliers from large data sets," ACM SIGMOD Record, vol. 29, no. 2, pp. 427-438, May 2000.
[5] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Computing Surveys (CSUR), vol. 41, no. 3, p. 15, July 2009.
[6] W. Lee and D. Xiang, "Information-theoretic measures for anomaly detection," in Proc. IEEE Symposium on Security and Privacy, Oakland, USA, May 2001, pp. 130-143.
[7] S. Ando and E. Suzuki, "An information theoretic approach to detection of minority subsets in database," in Proc. IEEE Sixth International Conference on Data Mining, Hong Kong, China, Dec. 2006, pp. 11-20.
[8] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed., Wiley Series in Telecommunications and Signal Processing, Wiley-Interscience, 2006.
[9] P. Y. Fan et al., "Message importance measure and its application to minority subset detection in big data," in Proc. IEEE Global Communications Conference (GLOBECOM), Washington D.C., USA, Dec. 2016.
[10] R. Durrett, Probability: Theory and Examples, Cambridge University Press, 2010.