Table Of Content

DellEMCTechnicalReports,Vol. 1,2017 OriginalResearcharticle A Generalized Multinomial Distribution from Dependent Categorical Random Variables Rachel Traylor , Ph.D. ∗ 7 Abstract 1 Categoricalrandomvariablesareacommonstapleinmachinelearningmethodsandotherapplications 0 2 across disciplines. Many times, correlation within categorical predictors exists, and has been noted to n have an effect on various algorithm effectiveness, such as feature ranking and random forests. We a present a mathematical construction of a sequence of identically distributed but dependent categorical J random variables, and give a generalized multinomial distribution to model the probability of counts of 4 such variables. 2 Keywords ] R categorical variables — correlation — multinomial distribution — probability theory P . h Contents t a m Introduction 1 [ 1 Background 2 1 2 ConstructionofDependentCategoricalVariables 3 v 5 2.1Example construction ............................................................ 5 5 2.2Properties ..................................................................... 5 9 6 IdenticallyDistributedbutDependent•PairwiseCross-CovarianceMatrix 0 3 GeneralizedMultinomialDistribution 9 . 1 4 Properties 10 0 7 4.1Marginal Distributions ........................................................... 10 1 4.2Moment Generating Function ..................................................... 11 : v 4.3Moments of the Generalized Multinomial Distribution .................................. 11 i X 5 GeneratingaSequenceofCorrelatedCategoricalRandomVariables 12 r a 5.1Probability Distribution of a DCRV Sequence ........................................ 12 5.2Algorithm ..................................................................... 12 ExampleDCRVSequenceGeneration 6 Conclusion 14 Acknowledgments 14 References 14 Introduction Bernoullirandomvariablesareinvaluableinstatisticalanalysisofphenomenahavingbinaryoutcomes, however,manyothervariablescannotbemodeledbyonlytwocategories. Manytopicsinstatisticsand OfficeoftheCTO,DellEMC ∗ AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—2/15 b q p 0 1 ǫ b b 1 q+ p− q− p+ ǫ 0 1 0 1 2 b b b b q+ p− q+ p− q− p+ q− p+ 0 1 0 10 1 0 1 ǫ 3 b b b b b b b b b b b Figure1. First-DependenceTreeStructure machinelearningrelyoncategoricalrandomvariables,suchasrandomforestsandvariousclustering algorithms.[6,7]. Manydatasetsexhibitcorrelationordependencyamongpredictorsaswellaswithin predictors,whichcanimpactthemodelused.[6,9]. Thiscanresultinunreliablefeatureranking[9],and inaccuraterandomforests[6]. SomeattemptstoremedytheseeffectsinvolveBayesianmodeling[2]andvariouscomputationaland simulationmethods[8]. Inparticular,simulationofcorrelatedcategoricalvariableshasbeendiscussedin theliteratureforsometime.[1,3,5]. Littleresearchhasbeendonetocreatemathematicalframeworkof correlatedordependentcategoricalvariablesandtheresultingdistributionsoffunctionsofsuchvariables. Korzeniowski [4] studied dependent Bernoulli variables, formalizing the notion of identically dis- tributedbutdependentBernoullivariablesandderivingthedistributionofthesumofsuchdependent variables,yieldingaGeneralizedBinomialDistribution. In this paper, we generalize the work of Korzeniowski [4] and formalize the notion of a sequence of identically distributed but dependent categorical random variables. We then derive a Generalized MultinomialDistributionforsuchvariablesandprovidesomepropertiesofsaiddistribution. Wealso giveanalgorithmtogenerateasequenceofcorrelatedcategoricalrandomvariables. 1. Background Korzeniowskidefinedthenotionofdependenceinawaywewillrefertohereasdependenceofthefirst kind(FKdependence). Suppose(ε ,...,ε )isasequenceofBernoullirandomvariables,and P(ε =1)= p. 1 N 1 Then,forε ,i 2,weweighttheprobabilityofeachbinaryoutcometowardtheoutcomeofε ,adjusting i 1 ≥ theprobabilitiesoftheremainingoutcomesaccordingly. Formally,let0 δ 1,andq=1 p. Thendefinethefollowingquantities ≤ ≤ − p+ :=P(εi =1 ε1 =1)= p+δq p− :=P(εi =0 ε1 =1)=q δq | | − (1) q+ :=P(εi =1 ε1 =0)= p δp q− :=P(εi =0 ε1 =0)=q+δp | − | Giventheoutcomeiofε ,theprobabilityofoutcomeioccurringinthesubsequentBernoullivariables 1 ε ,ε ,...ε is p+,i=1orq+,i=0. Theprobabilityoftheoppositeoutcomeisthendecreasedtoq and p , 2 3 n − − respectively. Figure 1 illustrates the possible outcomes of a sequence of such dependent Bernoulli variables. Korzeniowskishowedthat,despitethisconditionaldependency, P(ε =1)= p i. Thatis,thesequence i ∀ ofBernoullivariablesisidenticallydistributed,withcorrelationshowntobe δ, i=1 Cor(ε ,ε )= i j (cid:40)δ2, i = j,i,j 2 (cid:54) ≥ AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—3/15 TheseidenticallydistributedbutcorrelatedBernoullirandomvariablesyieldaGeneralizedBinomial distribution with a similar form to the standard binomial distribution. In our generalization, we use the same form of FK dependence, but for categorical random variables. We will construct a sequence ofidenticallydistributedbutdependentcategoricalvariablesfromwhichwewillbuildageneralized multinomial distribution. When the number of categories K =2, the distribution reverts back to the generalized binomial distribution of Korzeniowski [4]. When the sequence is fully independent, the distributionrevertsbacktotheindependentcategoricalmodelandthestandardmultinomialdistribution, andwhenthesequenceisindependentandK=2,werecoverthestandardbinomialdistribution. Thus, thisnewdistributionrepresentsamuchlargergeneralizationthanpriormodels. 2. Construction of Dependent Categorical Variables b p p p 1 2 3 ǫ 1 2 3 1 b b b p+1 p−2 p−3 p−1 p+2 p−3 p−1 p−2 p+3 ǫ 1 2 3 1 2 3 1 2 3 2 b b b b b b b b b p+1 p−2 p−3p+1 p−2p−3 p+1 p−2 p−3 p−1 p+2 p−3p−1 p+2p−3 p−1 p+2 p−3p−1 p−2 p+3p−1 p−2p+3 p−1 p−2 p+3 ǫ 3 1b 2b 3b 1b 2b 3b 1b 2b 3b 1b 2b 3b 1b 2b 3b 1b 2b 3b 1b 2b 3b 1b 2b 3b 1b 2b 3b Figure2. Probabilitydistributionat N =3forK=3 Suppose each categorical variable has K possible categories, so the sample space S = 1,...,K 1 The { } construction of the correlated categorical variables is based on a probability mass distribution over K adicpartitionsof[0,1]. Wewillfollowgraphterminologyinourconstruction,asthislendsavisual − representation of the construction. We begin with a parent node and build a K-nary tree, where the endnodesarelabeledbytherepeatingsequence(1,...,K). Thus,after N steps,thegraphhasKN nodes, with each node labeled 1,...,K repeating in the natural order and assigned injectively to the intervals 0, 1 , 1 , 2 ,..., KN 1,1 . Define KN Kn KN K−n (cid:16) (cid:105) (cid:0) (cid:3) (cid:0) (cid:3) ε =i on Kj,Kj+i N KN KN (2) εN =K on(cid:16) K(j+K1N)−1(cid:105),K(Kj+N1) (cid:16) (cid:105) wherei=1,...,K 1,and j=0,1,...,KN 1 1. Analternateexpressionfor(2)is − − − ε =ion l 1, l , i l modK, i=1,...,K 1 N K−N KN ≡ − (3) (cid:16) (cid:105) ε =K on l 1, l , 0 l modK N K−N KN ≡ (cid:16) (cid:105) 1Theseintegersshouldnotbetakenasorderedorsequential,butratherascharactertitlesofcategories. AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—4/15 Toeachofthebranchesofthetree,andbytransitivitytoeachoftheKN partitionsof[0,1],weassigna probabilitymasstoeachnodesuchthatthetotalmassis1ateachlevelofthetreeinasimilarmanner to[4]. Let0< p <1,i=1,...,K suchthat∑K p =1,andlet0 δ 1bethedependencycoefficient. Define i i=1 i ≤ ≤ p+ := p +δ∑ p , p = p δp foiri=1i,...,K.lT(cid:54)=hieslepi−robabii−litiesisatisfytwoimportantcriteria: Lemma1. ∑K p = p++∑ p =1 • i=1 i i l(cid:54)=i −l p p++p ∑ p = p • i i i− l(cid:54)=i l i Proof. Thefirstisobviousfromthedefinitionsof pi+/− above. Thesecondstatementfollowsclearlyfrom algebraicmanipulationofthedefinitions: p p++p ∑ p = p2+p (1 p )= p i i i− l(cid:54)=i l i i − i i WenowgivetheconstructioninstepsdowneachleveloftheK-narytree. LEVEL1: Parentnodehasmass1, withmasssplit1 ∏K p[ε1=i], where [ ] isanIversonbracket. Thislevelcorre- · i=1 i · spondstoasequenceεofdependentcategoricalvariablesoflength1. ε /Branch Path MassatNode Interval 1 1 parent 1 p (0,1/K] 1 → 2 parent 2 p (1/K,2/K] 2 → . . . . . . . . . . . . i parent i p ((i 1)/K,i/K] i → − . . . . . . . . . . . . K parent K Mass p ((K 1)/K,1] K → − Table1. ProbabilitymassdistributionatLevel1 LEVEL2: Level2hasKnodes,withKbranchesstemmingfromeachnode. Thiscorrespondstoasequenceoflength 2: ε=(ε ,ε ). Denotei.1asnodeifromlevel1. Fori=1,...,K, 1 2 Nodei.1hasmass p ,withmasssplit p p+ [ε2=i] ∏K p [ε2=j] i i i −j j=1,j=i (cid:54) (cid:16) (cid:17) (cid:0) (cid:1) ε /Branch Path MassatNode Interval 2 (i 1)K (i 1)K+1 1 i.1→1 pip1− −K2 , −K2 2 i.1→2 pip2− (cid:16)(i−1K)2K+1,(i−1K)2K+(cid:105)2 .. .. .. (cid:16) .. (cid:105) . . . . i i.1→i pipi+ (i−1)KK+2(i−1),(i−1K)2K+i .. .. .. (cid:16) .. (cid:105) . . . . K i.1 K p p iK 1,iK → i −K K−2 K2 Table2. ProbabilitymassdistributionatLevelII,Nodei (cid:0) (cid:3) AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—5/15 In this fashion, we distribute the total mass 1 across level 2. In general, at any level r, there are K streamsofprobabilityflowatLevelr. Forε =i,theprobabilityflowisgivenby 1 r pi∏ pi+ [εj=i]∏ p−l [εj=l] ,i=1,...,K (4) j=2(cid:34) l=i (cid:35) (cid:0) (cid:1) (cid:54) (cid:0) (cid:1) WeusethisflowtoassignmasstotheKr intervalsof[0,1]atlevelrinthesamewayasabove. Wemay alsoverifyviaalgebraicmanipulationthat r 1 − r pi = pi pi++∑pi− = pi ∑ ∏ pi+ [εj=i]∏ p−l [εj=l] (5) (cid:32) l=i (cid:33) ε2,...,εrj=2(cid:34) l=i (cid:35) (cid:54) (cid:0) (cid:1) (cid:54) (cid:0) (cid:1) wherethefirstequalityisduetothefirststatementofLemma1,andthesecondisdueto(4). 2.1 Example construction For an illustration, refer to Figure 2. In this example, we construct a sequence of length N = 3 of categorical variables with K = 3 categories. At each level r, there are 3r nodes corresponding to 3r partitionsoftheinterval [0,1]. Notethateachtimethenodesplitsinto3children,thesumofthesplit probabilitiesis1. Despitetheoutcomeofthepreviousrandomvariable,thenextonealwayshasthree possibilities. Thesamplespaceofcategoricalvariablesequencesoflength3has33=27possibilities. Some example sequences are (1,3,2) with probability p p p , (2,1,2) with probability p p p+, and (3,1,1) 1 3− 2− 2 1− 2 withprobability p p p . TheseprobabilitiescanbedeterminedbytracingdownthetreeinFigure2. 3 1− 1− 2.2 Properties 2.2.1 IdenticallyDistributedbutDependent We now show the most important property of this class of sequences– that they remain identically distributeddespitelosingindependence. Lemma2. P(ε =i)= p ;i=1,...,K,r N. r i ∈ Proof. Theproofproceedsviainduction. r=1isclearbydefinition,soweillustrateanadditionalbase case. Forr=2,fromLemma1,andfori=1,...,K, P(ε2 =i)=PK2−1 KK2j,KjK+2 1 = pipi++pi−∑pj = pi j(cid:91)=1 (cid:18) (cid:21) j(cid:54)=i   Thenatlevelr,inkeepingwiththealternativeexpression(3),weexpressε intermsofthespecific r nodesatlevelr: ε = ∑Kr εl1 , whereεl = l modK, 0(cid:54)≡l modK (6) r l=1 r lK−r1,Klr r (cid:40)K, 0≡l modK (cid:18) (cid:21) Letml betheprobabilitymassforεl. Then,forl =1,...,Kr andi=1,...,K 1, r r − P(εl =i), i l modK ml = r ≡ (7) r (cid:40)P(εr =K), 0 l modK ≡ AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—6/15 Astheinductivehypothesis,assumethatfori=1,...,K 1, − p =P(ε =i)=P Kr−1 εl =i = K∑r−1 P εl =i = K∑r−1 ml i r−1  l(cid:91)=1 { r−1 } l=1 (cid:16) r−1 (cid:17) l=1 r−1 i≡l modK  i≡l modK i≡l modK   Eachmassml issplitintoK piecesinthefollowingway r 1 − ml p++∑K p , l =1,...,K r 1 1 j=2 i− − ml (cid:16)p++∑ p (cid:17), l =K+1,...,2K  r 1 2 j=2 i−  − (cid:54) mrl−1 =mrl−1(cid:16)(cid:16)p3++∑j(cid:54)=3pi−(cid:17)(cid:17), ..l =2K+1,...,3K (8) . mrl−1 p+K +∑j(cid:54)=Kpi− , l =Kr−1−K+1,...,Kr−1  (cid:16) (cid:17)   whichmaybewrittenas P(ε =1)+P(εl =1)=ml +P(εl =1), l =1,...,K r r (cid:54) r r (cid:54) P(εr =1)+P(εlr (cid:54)=1)=mrl +P(εlr (cid:54)=1), l =K+1,...,2K mrl 1 =P(εr =1)+P(εlr (cid:54)=1)=mrl +P(εlr (cid:54)=1), l =2K+1,...,3K (9) −  ... P(εr =1)+P(εlr (cid:54)=1)=mrl +P(εlr (cid:54)=1), l =Kr−1−K+1,...,Kr−1      Then P(εr =1)= ∑Kr mrξ = ∑K mrl 1p1++ K∑r−1 mrl 1p1− (10) ξ=1 l=1 − l=K+1 − 1 ξ modK ≡ When1 l modK, ≡ K 1 K p1+ ∑− mrl+ξ1 =mrl 1p1++mrl 1 ∑p−j =mrl 1 (11) (cid:32)ξ=0 − (cid:33) − − (cid:32)j=2 (cid:33) − Equation11holdsbecauseml+ξ = p−l+ξ,ξ =0,...,K 1,andbyofLemma1. Thus, r 1 p+ − − 1 P(εr =1)= ∑Kr mrξ = ∑K mrl 1p1++ K∑r−1 mrl 1p1− = ∑K mrl 1+ K∑r−1 mrl 1 =K∑r−1mrl 1 = p1 ξ=1 l=1 − l=K+1 − l=1 − l=K+1 − l=1 − 1 ξ modK 1 l modK ≡ ≡ (12) Asimilarprocedurefori=2,...,K followsandtheproofiscomplete. AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—7/15 2.2.2 PairwiseCross-CovarianceMatrix Wenowgivethepairwisecross-covariancematrixfordependentcategoricalrandomvariables. Theorem 1 (Cross-Covariance of Dependent Categorical Random Variables). Denote Λι,τ be the K K × cross-covariancematrixofε andε ,ι,τ=1,...,n,definedasΛι,τ =E[(ε E[ε ])(ε E[ε ])]. Thentheentries ι τ ι ι τ τ − − δp (1 p ), i= j δ2p (1 p ), i= j ofthematrixaregivenbyΛ1,τ = i − i ,τ 2,andΛι,τ = i − i ,τ>ι,ι=1. ij (cid:40) δpipj, i = j ≥ ij (cid:40) δ2pipj, i = j (cid:54) − (cid:54) − (cid:54) Proof. TheijthentryofΛι,τ isgivenby Cov([ε =i],[ε =j])=E[[ε =i][ε =j]] E[[ε =i]]E[[ε =j]]=P(ε =i,ε =j) P(ε =i)P(ε =j) (13) ι τ ι τ ι τ ι τ ι τ − − Letι=1. For j=i,andτ=2, Cov([ε =i],[ε =i])=P(ε =i,ε =i) p2 = p p+ p2 =δp (1 p ) (14) 1 2 1 2 − i i i − i i − i For j=iandτ=2, (cid:54) Cov([ε1 =i],[ε2 = j])=P(ε1 =i,ε2 = j)−pipj = pip−j −pipj =−δpipj (15) p p+, j=i Forτ =2,itsufficestoshowthat P(ε =i,ε = j)= i i . 1 τ (cid:54) (cid:40)pip−j , j(cid:54)=i Startingfromε ,thetreeissplitintoK largebranchesgovernedbytheresultsofε . Thenatlevelr, 1 1 thereareKr nodesindexedbyl. EachoftheKlargebranchescontainsKr 1 ofthesenodes. Thatis, − ε1 =1branchcontainsnodesl =1,...,Kr−1 ε1 =2branchcontainsnodesl =Kr−1+1,...,2Kr−1 . . . ε =K branchcontainsnodesl =Kr K+1,...,Kr 1 − Then P(ε =1,εl =i)=ml; i l modK,l =1,...,Kr 1. Wehavealreadyshownthebasecaseforr=2. 1 r r ≡ − So,asaninductivehypothesis,assume p p+, i=1 P(ε =1,ε =i)= 1 1 1 r 1 − (cid:40)p1pi−, i(cid:54)=1 Thenwehavethatfori=1, p p+ =P(ε =1,ε =1)= K∑r−2 ml 1 1 1 r 1 r 1 − l=1 − 1 l modK ≡ andfori =1, (cid:54) p1pi− =P(ε1 =1,εr 1 =i)= K∑r−2 mrl 1 − l=1 − i l modK ≡ Movingonestepdownthetree(stillnotingthatε =1),wehaveseenthattheeachmassml splitsas 1 r 1 − K K mrl 1 = p1+mrl 1+mrl 1∑p−j =P(ε1 =1,εr =1)+mrl 1∑p−j − − − j=2 − j=2 AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—8/15 Therefore, P(ε =1,ε =1)= K∑r−1 ml 1 r r l=1 1 l modK ≡ = K∑r−2 ml p++ ∑K K∑r−2 ml p r 1 1 r 1 −j l=1 − j=2 l=1 − 1 l modK 1 l modK ≡ (cid:54)≡ K = p p+p++p p+∑p 1 1 1 1 1 −j j=2 = p p+ 1 1 p p+, j=i Asimilarstrategyshowsthat P(ε =1,ε =i)= p p andthat P(ε =i,ε = j)= i i . 1 r 1 i− 1 r (cid:40)pip−j , j(cid:54)=i Next,wewillcomputetheijthentryofΛι,τ,ι>1,τ>ι. Forι=2,τ=3, p p+ 2+(p )2∑ p , i= j i i i− j=i j P(ε2 =i,ε3 = j)=pip(cid:0)i+p−(cid:1)j +pjpi−p+j +(cid:54) ∑ll(cid:54)==ijplpi−p−j , i(cid:54)= j (16) (cid:54) This can be seen bytracing the tree construction from level 2 to level 3, with an example given in Figure2. Next,wewillshowthat(16)holdsforτ>3. First, note that P(ε =1,εl =1) = ml for 1 l modK, and l = (ξ 1)Lτ 1+ j, where ξ =1,...,K, 2 τ τ ≡ − − j=1,...,Kτ 2. Wehaveshownthebasecasewhereτ=3. Fortheinductivehypothesis,assumethat − K P(ε2 =1,ετ 1 =1)= p1 p1+ 2+(p1−)2∑p−j − j=2 (cid:0) (cid:1) Thenwehavethat p 2∑K p =∑ ml forthel definedabove. Then,ateachml ,wehave 1− j=2 −j 1 l modK τ 1 τ 1 ≡ − − thefollowingmasssplits: (cid:0) (cid:1) K mlτ 1 =mlτ 1p1++mlτ 1∑p−j , ξ =1 − − − j=2 (17) mlτ 1 =mlτ 1pi++mlτ 1∑mlτ 1p−j , ξ =i, i=2,...,K − − − j=i − (cid:54) Then K P(ε =1,ε =1)= ∑ ∑ ml (18) 2 τ τ ξ=1 l,ξ 1 l modK ≡ Usingthesametacticas(11),weseethatusing(17),addingthecomponents,andcombiningtheterms correctly,theproofiscompleteforanyτ. Theprooffor P(ε =i,ε = j)foranyi,jfollowssimilarly,and 2 τ theprooffor P(ε =i,ε = j)followsfromreducingtotheaboveprovenclaims. ι τ Inthenextsection,weexploitthedesirableidenticaldistributionofthecategoricalsequenceinorder toprovideageneralizedmultinomialdistributionforthecountsineachcategory. AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—9/15 3. Generalized Multinomial Distribution FromtheconstructioninSection2,wederiveageneralizedmultinomialdistributioninwhichallcategor- icalvariablesareidenticallydistributedbutnolongerindependent. Theorem2(GeneralizedMultinomialDistribution). Let ε ,...,ε becategoricalrandomvariableswithcat- 1 N egories 1,...,K, constructed as in Section 2. Let X = ∑N [ε =i],i =1,...,K, where [ ] is the Iverson bracket. i j=1 j · DenoteX=(X ,...,X ),withobservedvaluesx=(x ,...,x ). Then 1 K 1 K P(X=x)= ∑K pi(x (N1)−!∏1)! x ! pi+ xi−1∏ p−j xj i=1 i− j(cid:54)=i j (cid:0) (cid:1) j(cid:54)=i(cid:16) (cid:17) Proof. Forbrevity,letε =(ε ,...,ε )denotethesequenceofncategoricalrandomvariableswiththe ( 1) 2 n − firstvariableremoved. Conditioningonε , 1 K ∑ P(X=x)= P(X=x ε =i) 1 | i=1 K N K N = ∑p ∑ ∏ p+ [εj=i]∏∏ p [εl=k] i i −l i=1 ε (1)j=1 k=1l=1 X−=x (cid:0) (cid:1) k=i (cid:0) (cid:1) (cid:54) K = ∑p ∑ p+ ∑Nj=1[εj=i]∏ p ∑Nj=1[εj=l] i i −l  i=1 ε2,...,εN l=i X=x (cid:0) (cid:1) (cid:54) (cid:0) (cid:1)   Ntoorwes,iwdehienncεa1te=go1r,yth1e,r(eNa−r1e−x2(xx1N−1−−111))wcoaymsbtihneatrieomnsaionfinthgeNre−m1ai−nixn1g−N1−ca1tecgaotergicoarlicvaalrviaabrlieasblceasn{rεeis}iiNd=e2 incategory2,andsoforth. Finally,thereare(N−1−x1x−K1∑iK=−21xi)waysthefinalunallocated{εi}canbein categoryK. Thus, P(X=x ε =1) 1 | = p1 xN−11 N−1−x x1−1 ··· N−1−x1x−1−∑Kj=−21xj p1+ x1−1∏K p−j xj (cid:18) 1− (cid:19)(cid:18) 2 (cid:19) (cid:18) K (cid:19) j=2(cid:16) (cid:17) (cid:0) (cid:1) = p1(x (N1)−!∏1)K! x ! p1+ x1−1∏K p−j xj (19) 1− j=2 j j=2(cid:16) (cid:17) (cid:0) (cid:1) Similarly,fori=2,...,K P(X=x ε =i) 1 | = pi Nx−1 ··· N−1x−∑ij−=11 ··· N−1−xi−x1−∑Kj(cid:54)=−i,1j=1xj p1+ x1−1∏K p−j xj (cid:18) 1 (cid:19) (cid:18) i (cid:19) (cid:18) K (cid:19) j=2 (cid:16) (cid:17) (cid:0) (cid:1) = pi(x (N1)−!∏1)! x ! pi+ xi−1∏ p−j xj (20) i− j(cid:54)=i j (cid:0) (cid:1) j(cid:54)=i(cid:16) (cid:17) Summingcompletestheproof. AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—10/15 4. Properties ThissectiondetailssomeusefulpropertiesoftheGeneralizedMultinomialDistributionandthedependent categoricalrandomvariables. 4.1 Marginal Distributions Theorem 3 (Univariate Marginal Distribution). The univariate marginal distribution of the Generalized MultinomialDistributionistheGeneralizedBinomialDistribution. Thatis, P(Xi =xi)=q Nx−1 pi− xi q− N−1−xi +pi Nx −11 pi+ xi−1 q+ N−1−(xi−1) (21) (cid:18) i (cid:19) (cid:18) i− (cid:19) (cid:0) (cid:1) (cid:0) (cid:1) (cid:0) (cid:1) (cid:0) (cid:1) whereq=∑ p ,q+ =q+δp ,andq =q δq j=i j i − (cid:54) − Proof. First,weclaimthefollowing: q+ =q+δp = p++∑ p ,l =2,...,K. Thismaybejustifiedviaa i l j=l −j (cid:54) j=i (cid:54) simplemanipulationofdefinitions: p++∑p = p +δ∑(p δp )+∑(p δp )= p +∑p +δp =q+δp l −j l j− j j− j l j i i j=l j=l j=l j=l (cid:54) (cid:54) (cid:54) (cid:54) j=i j=i (cid:54) (cid:54) Similarly,q =q δq. Thus,wemaycollapsethenumberofcategoriesto2: Categoryi,andeverything − else. Now,notice−thatforl (cid:54)=i, p+l xl−1 ∏ p−j xj =(q+)N−xi−1 forl =1,...,K andl (cid:54)=i. Fixk(cid:54)=i. Then j=l (cid:54) (cid:16) (cid:17) (cid:0) (cid:1) j i ≤ pk(x (N1)−!∏1)! x !(pk)xk−1∏ p−j xj = pk pi− xi q+ N−xi−1 x !(x (N1−)!1∏)! x ! k− j(cid:54)=k j j(cid:54)=k(cid:16) (cid:17) (cid:0) (cid:1) (cid:0) (cid:1) i k− jj(cid:54)==ki j (cid:54) N 1 = pk x− pi− xi q+ N−xi−1 (22) (cid:18) i (cid:19) (cid:0) (cid:1) (cid:0) (cid:1) Then ∑K pi(x (N1)−!∏1)! x ! pi+ xi−1∏ p−j xj = pi(x (N1)−!∏1)! x ! pi+ xi−1 q− N−1−(xi−1) i=1 i− j(cid:54)=i j (cid:0) (cid:1) j(cid:54)=i(cid:16) (cid:17) i− j(cid:54)=i j (cid:0) (cid:1) (cid:0) (cid:1) +∑pk(x (N1)−!∏1)! x ! p+k xk−1∏ p−j xj k(cid:54)=i k− j(cid:54)=k j (cid:0) (cid:1) j(cid:54)=k(cid:16) (cid:17) = pi Nx −11 pi+ xi−1 q− N−1−(xi−1) (cid:18) i− (cid:19) +∑pk N(cid:0) x−(cid:1)1 p(cid:0)i− (cid:1)xi q+ N−xi−1 k=i (cid:18) i (cid:19) (cid:54) (cid:0) (cid:1) (cid:0) (cid:1) = pi Nx −11 pi+ xi−1 q− N−1−(xi−1) (cid:18) i− (cid:19) N (cid:0)1 (cid:1) (cid:0) (cid:1) +q x− pi− xi q+ N−xi−1 (cid:18) i (cid:19) (cid:0) (cid:1) (cid:0) (cid:1) Theabovetheoremshowsanotherwaythegeneralizedmultinomialdistributionisanextensionof thegeneralizedbinomialdistribution.

A Generalized Multinomial Distribution from Dependent Categorical Random Variables PDF

0.33 MB·

by Rachel Traylor

#journals #arxiv

Checking for file health...

Save to my drive

Quick download

Download

Download A Generalized Multinomial Distribution from Dependent Categorical Random Variables PDF Free - Full Version

by Rachel Traylor| 0.33

Download A Generalized Multinomial Distribution from Dependent Categorical Random Variables by Rachel Traylor in PDF format completely FREE. No registration required, no payment needed. Get instant access to this valuable resource on PDFdrive.to!

Free Download PDF

About A Generalized Multinomial Distribution from Dependent Categorical Random Variables

No description available for this book.

Detailed Information

Author:	Rachel Traylor
File Size:	0.33
Format:	PDF
Price:	FREE

Download Free PDF

Safe & Secure Download - No registration required

Why Choose PDFdrive for Your Free A Generalized Multinomial Distribution from Dependent Categorical Random Variables Download?

100% Free: No hidden fees or subscriptions required for one book every day.
No Registration: Immediate access is available without creating accounts for one book every day.
Safe and Secure: Clean downloads without malware or viruses
Multiple Formats: PDF, MOBI, Mpub,... optimized for all devices
Educational Resource: Supporting knowledge sharing and learning

Frequently Asked Questions

Is it really free to download A Generalized Multinomial Distribution from Dependent Categorical Random Variables PDF?

Yes, on https://PDFdrive.to you can download A Generalized Multinomial Distribution from Dependent Categorical Random Variables by Rachel Traylor completely free. We don't require any payment, subscription, or registration to access this PDF file. For 3 books every day.

How can I read A Generalized Multinomial Distribution from Dependent Categorical Random Variables on my mobile device?

After downloading A Generalized Multinomial Distribution from Dependent Categorical Random Variables PDF, you can open it with any PDF reader app on your phone or tablet. We recommend using Adobe Acrobat Reader, Apple Books, or Google Play Books for the best reading experience.

Is this the full version of A Generalized Multinomial Distribution from Dependent Categorical Random Variables?

Yes, this is the complete PDF version of A Generalized Multinomial Distribution from Dependent Categorical Random Variables by Rachel Traylor. You will be able to read the entire content as in the printed version without missing any pages.

Is it legal to download A Generalized Multinomial Distribution from Dependent Categorical Random Variables PDF for free?

https://PDFdrive.to provides links to free educational resources available online. We do not store any files on our servers. Please be aware of copyright laws in your country before downloading.

The materials shared are intended for research, educational, and personal use in accordance with fair use principles.