Table Of ContentDellEMCTechnicalReports,Vol. 1,2017
OriginalResearcharticle
A Generalized Multinomial Distribution from
Dependent Categorical Random Variables
Rachel Traylor , Ph.D.
∗
7 Abstract
1
Categoricalrandomvariablesareacommonstapleinmachinelearningmethodsandotherapplications
0
2 across disciplines. Many times, correlation within categorical predictors exists, and has been noted to
n have an effect on various algorithm effectiveness, such as feature ranking and random forests. We
a present a mathematical construction of a sequence of identically distributed but dependent categorical
J
random variables, and give a generalized multinomial distribution to model the probability of counts of
4
such variables.
2
Keywords
]
R categorical variables — correlation — multinomial distribution — probability theory
P
.
h
Contents
t
a
m
Introduction 1
[
1 Background 2
1
2 ConstructionofDependentCategoricalVariables 3
v
5 2.1Example construction ............................................................ 5
5
2.2Properties ..................................................................... 5
9
6 IdenticallyDistributedbutDependent•PairwiseCross-CovarianceMatrix
0 3 GeneralizedMultinomialDistribution 9
.
1
4 Properties 10
0
7 4.1Marginal Distributions ........................................................... 10
1
4.2Moment Generating Function ..................................................... 11
:
v
4.3Moments of the Generalized Multinomial Distribution .................................. 11
i
X
5 GeneratingaSequenceofCorrelatedCategoricalRandomVariables 12
r
a 5.1Probability Distribution of a DCRV Sequence ........................................ 12
5.2Algorithm ..................................................................... 12
ExampleDCRVSequenceGeneration
6 Conclusion 14
Acknowledgments 14
References 14
Introduction
Bernoullirandomvariablesareinvaluableinstatisticalanalysisofphenomenahavingbinaryoutcomes,
however,manyothervariablescannotbemodeledbyonlytwocategories. Manytopicsinstatisticsand
OfficeoftheCTO,DellEMC
∗
AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—2/15
b
q p
0 1 ǫ
b b 1
q+ p− q− p+
ǫ
0 1 0 1 2
b b b b
q+ p− q+ p− q− p+ q− p+
0 1 0 10 1 0 1 ǫ
3
b b b b b b b b
b
b
b
Figure1. First-DependenceTreeStructure
machinelearningrelyoncategoricalrandomvariables,suchasrandomforestsandvariousclustering
algorithms.[6,7]. Manydatasetsexhibitcorrelationordependencyamongpredictorsaswellaswithin
predictors,whichcanimpactthemodelused.[6,9]. Thiscanresultinunreliablefeatureranking[9],and
inaccuraterandomforests[6].
SomeattemptstoremedytheseeffectsinvolveBayesianmodeling[2]andvariouscomputationaland
simulationmethods[8]. Inparticular,simulationofcorrelatedcategoricalvariableshasbeendiscussedin
theliteratureforsometime.[1,3,5]. Littleresearchhasbeendonetocreatemathematicalframeworkof
correlatedordependentcategoricalvariablesandtheresultingdistributionsoffunctionsofsuchvariables.
Korzeniowski [4] studied dependent Bernoulli variables, formalizing the notion of identically dis-
tributedbutdependentBernoullivariablesandderivingthedistributionofthesumofsuchdependent
variables,yieldingaGeneralizedBinomialDistribution.
In this paper, we generalize the work of Korzeniowski [4] and formalize the notion of a sequence
of identically distributed but dependent categorical random variables. We then derive a Generalized
MultinomialDistributionforsuchvariablesandprovidesomepropertiesofsaiddistribution. Wealso
giveanalgorithmtogenerateasequenceofcorrelatedcategoricalrandomvariables.
1. Background
Korzeniowskidefinedthenotionofdependenceinawaywewillrefertohereasdependenceofthefirst
kind(FKdependence). Suppose(ε ,...,ε )isasequenceofBernoullirandomvariables,and P(ε =1)= p.
1 N 1
Then,forε ,i 2,weweighttheprobabilityofeachbinaryoutcometowardtheoutcomeofε ,adjusting
i 1
≥
theprobabilitiesoftheremainingoutcomesaccordingly.
Formally,let0 δ 1,andq=1 p. Thendefinethefollowingquantities
≤ ≤ −
p+ :=P(εi =1 ε1 =1)= p+δq p− :=P(εi =0 ε1 =1)=q δq
| | − (1)
q+ :=P(εi =1 ε1 =0)= p δp q− :=P(εi =0 ε1 =0)=q+δp
| − |
Giventheoutcomeiofε ,theprobabilityofoutcomeioccurringinthesubsequentBernoullivariables
1
ε ,ε ,...ε is p+,i=1orq+,i=0. Theprobabilityoftheoppositeoutcomeisthendecreasedtoq and p ,
2 3 n − −
respectively.
Figure 1 illustrates the possible outcomes of a sequence of such dependent Bernoulli variables.
Korzeniowskishowedthat,despitethisconditionaldependency, P(ε =1)= p i. Thatis,thesequence
i
∀
ofBernoullivariablesisidenticallydistributed,withcorrelationshowntobe
δ, i=1
Cor(ε ,ε )=
i j (cid:40)δ2, i = j,i,j 2
(cid:54) ≥
AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—3/15
TheseidenticallydistributedbutcorrelatedBernoullirandomvariablesyieldaGeneralizedBinomial
distribution with a similar form to the standard binomial distribution. In our generalization, we use
the same form of FK dependence, but for categorical random variables. We will construct a sequence
ofidenticallydistributedbutdependentcategoricalvariablesfromwhichwewillbuildageneralized
multinomial distribution. When the number of categories K =2, the distribution reverts back to the
generalized binomial distribution of Korzeniowski [4]. When the sequence is fully independent, the
distributionrevertsbacktotheindependentcategoricalmodelandthestandardmultinomialdistribution,
andwhenthesequenceisindependentandK=2,werecoverthestandardbinomialdistribution. Thus,
thisnewdistributionrepresentsamuchlargergeneralizationthanpriormodels.
2. Construction of Dependent Categorical Variables
b
p p p
1 2 3
ǫ
1 2 3 1
b b b
p+1 p−2 p−3 p−1 p+2 p−3 p−1 p−2 p+3
ǫ
1 2 3 1 2 3 1 2 3 2
b b b b b b b b b
p+1 p−2 p−3p+1 p−2p−3 p+1 p−2 p−3 p−1 p+2 p−3p−1 p+2p−3 p−1 p+2 p−3p−1 p−2 p+3p−1 p−2p+3 p−1 p−2 p+3
ǫ
3
1b 2b 3b 1b 2b 3b 1b 2b 3b 1b 2b 3b 1b 2b 3b 1b 2b 3b 1b 2b 3b 1b 2b 3b 1b 2b 3b
Figure2. Probabilitydistributionat N =3forK=3
Suppose each categorical variable has K possible categories, so the sample space S = 1,...,K 1 The
{ }
construction of the correlated categorical variables is based on a probability mass distribution over
K adicpartitionsof[0,1]. Wewillfollowgraphterminologyinourconstruction,asthislendsavisual
−
representation of the construction. We begin with a parent node and build a K-nary tree, where the
endnodesarelabeledbytherepeatingsequence(1,...,K). Thus,after N steps,thegraphhasKN nodes,
with each node labeled 1,...,K repeating in the natural order and assigned injectively to the intervals
0, 1 , 1 , 2 ,..., KN 1,1 . Define
KN Kn KN K−n
(cid:16) (cid:105)
(cid:0) (cid:3) (cid:0) (cid:3)
ε =i on Kj,Kj+i
N KN KN
(2)
εN =K on(cid:16) K(j+K1N)−1(cid:105),K(Kj+N1)
(cid:16) (cid:105)
wherei=1,...,K 1,and j=0,1,...,KN 1 1. Analternateexpressionfor(2)is
−
− −
ε =ion l 1, l , i l modK, i=1,...,K 1
N K−N KN ≡ −
(3)
(cid:16) (cid:105)
ε =K on l 1, l , 0 l modK
N K−N KN ≡
(cid:16) (cid:105)
1Theseintegersshouldnotbetakenasorderedorsequential,butratherascharactertitlesofcategories.
AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—4/15
Toeachofthebranchesofthetree,andbytransitivitytoeachoftheKN partitionsof[0,1],weassigna
probabilitymasstoeachnodesuchthatthetotalmassis1ateachlevelofthetreeinasimilarmanner
to[4].
Let0< p <1,i=1,...,K suchthat∑K p =1,andlet0 δ 1bethedependencycoefficient. Define
i i=1 i ≤ ≤
p+ := p +δ∑ p , p = p δp
foiri=1i,...,K.lT(cid:54)=hieslepi−robabii−litiesisatisfytwoimportantcriteria:
Lemma1.
∑K p = p++∑ p =1
• i=1 i i l(cid:54)=i −l
p p++p ∑ p = p
• i i i− l(cid:54)=i l i
Proof. Thefirstisobviousfromthedefinitionsof pi+/− above. Thesecondstatementfollowsclearlyfrom
algebraicmanipulationofthedefinitions: p p++p ∑ p = p2+p (1 p )= p
i i i− l(cid:54)=i l i i − i i
WenowgivetheconstructioninstepsdowneachleveloftheK-narytree.
LEVEL1:
Parentnodehasmass1, withmasssplit1 ∏K p[ε1=i], where [ ] isanIversonbracket. Thislevelcorre-
· i=1 i ·
spondstoasequenceεofdependentcategoricalvariablesoflength1.
ε /Branch Path MassatNode Interval
1
1 parent 1 p (0,1/K]
1
→
2 parent 2 p (1/K,2/K]
2
→
. . . .
. . . .
. . . .
i parent i p ((i 1)/K,i/K]
i
→ −
. . . .
. . . .
. . . .
K parent K Mass p ((K 1)/K,1]
K
→ −
Table1. ProbabilitymassdistributionatLevel1
LEVEL2:
Level2hasKnodes,withKbranchesstemmingfromeachnode. Thiscorrespondstoasequenceoflength
2: ε=(ε ,ε ). Denotei.1asnodeifromlevel1. Fori=1,...,K,
1 2
Nodei.1hasmass p ,withmasssplit p p+ [ε2=i] ∏K p [ε2=j]
i i i −j
j=1,j=i
(cid:54) (cid:16) (cid:17)
(cid:0) (cid:1)
ε /Branch Path MassatNode Interval
2
(i 1)K (i 1)K+1
1 i.1→1 pip1− −K2 , −K2
2 i.1→2 pip2− (cid:16)(i−1K)2K+1,(i−1K)2K+(cid:105)2
.. .. .. (cid:16) .. (cid:105)
. . . .
i i.1→i pipi+ (i−1)KK+2(i−1),(i−1K)2K+i
.. .. .. (cid:16) .. (cid:105)
. . . .
K i.1 K p p iK 1,iK
→ i −K K−2 K2
Table2. ProbabilitymassdistributionatLevelII,Nodei
(cid:0) (cid:3)
AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—5/15
In this fashion, we distribute the total mass 1 across level 2. In general, at any level r, there are K
streamsofprobabilityflowatLevelr. Forε =i,theprobabilityflowisgivenby
1
r
pi∏ pi+ [εj=i]∏ p−l [εj=l] ,i=1,...,K (4)
j=2(cid:34) l=i (cid:35)
(cid:0) (cid:1) (cid:54) (cid:0) (cid:1)
WeusethisflowtoassignmasstotheKr intervalsof[0,1]atlevelrinthesamewayasabove. Wemay
alsoverifyviaalgebraicmanipulationthat
r 1
− r
pi = pi pi++∑pi− = pi ∑ ∏ pi+ [εj=i]∏ p−l [εj=l] (5)
(cid:32) l=i (cid:33) ε2,...,εrj=2(cid:34) l=i (cid:35)
(cid:54) (cid:0) (cid:1) (cid:54) (cid:0) (cid:1)
wherethefirstequalityisduetothefirststatementofLemma1,andthesecondisdueto(4).
2.1 Example construction
For an illustration, refer to Figure 2. In this example, we construct a sequence of length N = 3 of
categorical variables with K = 3 categories. At each level r, there are 3r nodes corresponding to 3r
partitionsoftheinterval [0,1]. Notethateachtimethenodesplitsinto3children,thesumofthesplit
probabilitiesis1. Despitetheoutcomeofthepreviousrandomvariable,thenextonealwayshasthree
possibilities. Thesamplespaceofcategoricalvariablesequencesoflength3has33=27possibilities. Some
example sequences are (1,3,2) with probability p p p , (2,1,2) with probability p p p+, and (3,1,1)
1 3− 2− 2 1− 2
withprobability p p p . TheseprobabilitiescanbedeterminedbytracingdownthetreeinFigure2.
3 1− 1−
2.2 Properties
2.2.1 IdenticallyDistributedbutDependent
We now show the most important property of this class of sequences– that they remain identically
distributeddespitelosingindependence.
Lemma2. P(ε =i)= p ;i=1,...,K,r N.
r i
∈
Proof. Theproofproceedsviainduction. r=1isclearbydefinition,soweillustrateanadditionalbase
case. Forr=2,fromLemma1,andfori=1,...,K,
P(ε2 =i)=PK2−1 KK2j,KjK+2 1 = pipi++pi−∑pj = pi
j(cid:91)=1 (cid:18) (cid:21) j(cid:54)=i
Thenatlevelr,inkeepingwiththealternativeexpression(3),weexpressε intermsofthespecific
r
nodesatlevelr:
ε = ∑Kr εl1 , whereεl = l modK, 0(cid:54)≡l modK (6)
r l=1 r lK−r1,Klr r (cid:40)K, 0≡l modK
(cid:18) (cid:21)
Letml betheprobabilitymassforεl. Then,forl =1,...,Kr andi=1,...,K 1,
r r −
P(εl =i), i l modK
ml = r ≡ (7)
r
(cid:40)P(εr =K), 0 l modK
≡
AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—6/15
Astheinductivehypothesis,assumethatfori=1,...,K 1,
−
p =P(ε =i)=P Kr−1 εl =i = K∑r−1 P εl =i = K∑r−1 ml
i r−1 l(cid:91)=1 { r−1 } l=1 (cid:16) r−1 (cid:17) l=1 r−1
i≡l modK i≡l modK i≡l modK
Eachmassml issplitintoK piecesinthefollowingway
r 1
−
ml p++∑K p , l =1,...,K
r 1 1 j=2 i−
−
ml (cid:16)p++∑ p (cid:17), l =K+1,...,2K
r 1 2 j=2 i−
− (cid:54)
mrl−1 =mrl−1(cid:16)(cid:16)p3++∑j(cid:54)=3pi−(cid:17)(cid:17), ..l =2K+1,...,3K (8)
.
mrl−1 p+K +∑j(cid:54)=Kpi− , l =Kr−1−K+1,...,Kr−1
(cid:16) (cid:17)
whichmaybewrittenas
P(ε =1)+P(εl =1)=ml +P(εl =1), l =1,...,K
r r (cid:54) r r (cid:54)
P(εr =1)+P(εlr (cid:54)=1)=mrl +P(εlr (cid:54)=1), l =K+1,...,2K
mrl 1 =P(εr =1)+P(εlr (cid:54)=1)=mrl +P(εlr (cid:54)=1), l =2K+1,...,3K (9)
− ...
P(εr =1)+P(εlr (cid:54)=1)=mrl +P(εlr (cid:54)=1), l =Kr−1−K+1,...,Kr−1
Then
P(εr =1)= ∑Kr mrξ = ∑K mrl 1p1++ K∑r−1 mrl 1p1− (10)
ξ=1 l=1 − l=K+1 −
1 ξ modK
≡
When1 l modK,
≡
K 1 K
p1+ ∑− mrl+ξ1 =mrl 1p1++mrl 1 ∑p−j =mrl 1 (11)
(cid:32)ξ=0 − (cid:33) − − (cid:32)j=2 (cid:33) −
Equation11holdsbecauseml+ξ = p−l+ξ,ξ =0,...,K 1,andbyofLemma1. Thus,
r 1 p+ −
− 1
P(εr =1)= ∑Kr mrξ = ∑K mrl 1p1++ K∑r−1 mrl 1p1− = ∑K mrl 1+ K∑r−1 mrl 1 =K∑r−1mrl 1 = p1
ξ=1 l=1 − l=K+1 − l=1 − l=K+1 − l=1 −
1 ξ modK 1 l modK
≡ ≡
(12)
Asimilarprocedurefori=2,...,K followsandtheproofiscomplete.
AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—7/15
2.2.2 PairwiseCross-CovarianceMatrix
Wenowgivethepairwisecross-covariancematrixfordependentcategoricalrandomvariables.
Theorem 1 (Cross-Covariance of Dependent Categorical Random Variables). Denote Λι,τ be the K K
×
cross-covariancematrixofε andε ,ι,τ=1,...,n,definedasΛι,τ =E[(ε E[ε ])(ε E[ε ])]. Thentheentries
ι τ ι ι τ τ
− −
δp (1 p ), i= j δ2p (1 p ), i= j
ofthematrixaregivenbyΛ1,τ = i − i ,τ 2,andΛι,τ = i − i ,τ>ι,ι=1.
ij (cid:40) δpipj, i = j ≥ ij (cid:40) δ2pipj, i = j (cid:54)
− (cid:54) − (cid:54)
Proof. TheijthentryofΛι,τ isgivenby
Cov([ε =i],[ε =j])=E[[ε =i][ε =j]] E[[ε =i]]E[[ε =j]]=P(ε =i,ε =j) P(ε =i)P(ε =j) (13)
ι τ ι τ ι τ ι τ ι τ
− −
Letι=1. For j=i,andτ=2,
Cov([ε =i],[ε =i])=P(ε =i,ε =i) p2 = p p+ p2 =δp (1 p ) (14)
1 2 1 2 − i i i − i i − i
For j=iandτ=2,
(cid:54)
Cov([ε1 =i],[ε2 = j])=P(ε1 =i,ε2 = j)−pipj = pip−j −pipj =−δpipj (15)
p p+, j=i
Forτ =2,itsufficestoshowthat P(ε =i,ε = j)= i i .
1 τ
(cid:54) (cid:40)pip−j , j(cid:54)=i
Startingfromε ,thetreeissplitintoK largebranchesgovernedbytheresultsofε . Thenatlevelr,
1 1
thereareKr nodesindexedbyl. EachoftheKlargebranchescontainsKr 1 ofthesenodes. Thatis,
−
ε1 =1branchcontainsnodesl =1,...,Kr−1
ε1 =2branchcontainsnodesl =Kr−1+1,...,2Kr−1
.
.
.
ε =K branchcontainsnodesl =Kr K+1,...,Kr
1
−
Then P(ε =1,εl =i)=ml; i l modK,l =1,...,Kr 1. Wehavealreadyshownthebasecaseforr=2.
1 r r ≡ −
So,asaninductivehypothesis,assume
p p+, i=1
P(ε =1,ε =i)= 1 1
1 r 1
− (cid:40)p1pi−, i(cid:54)=1
Thenwehavethatfori=1,
p p+ =P(ε =1,ε =1)= K∑r−2 ml
1 1 1 r 1 r 1
− l=1 −
1 l modK
≡
andfori =1,
(cid:54)
p1pi− =P(ε1 =1,εr 1 =i)= K∑r−2 mrl 1
− l=1 −
i l modK
≡
Movingonestepdownthetree(stillnotingthatε =1),wehaveseenthattheeachmassml splitsas
1 r 1
−
K K
mrl 1 = p1+mrl 1+mrl 1∑p−j =P(ε1 =1,εr =1)+mrl 1∑p−j
− − − j=2 − j=2
AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—8/15
Therefore,
P(ε =1,ε =1)= K∑r−1 ml
1 r r
l=1
1 l modK
≡
= K∑r−2 ml p++ ∑K K∑r−2 ml p
r 1 1 r 1 −j
l=1 − j=2 l=1 −
1 l modK 1 l modK
≡ (cid:54)≡
K
= p p+p++p p+∑p
1 1 1 1 1 −j
j=2
= p p+
1 1
p p+, j=i
Asimilarstrategyshowsthat P(ε =1,ε =i)= p p andthat P(ε =i,ε = j)= i i .
1 r 1 i− 1 r (cid:40)pip−j , j(cid:54)=i
Next,wewillcomputetheijthentryofΛι,τ,ι>1,τ>ι. Forι=2,τ=3,
p p+ 2+(p )2∑ p , i= j
i i i− j=i j
P(ε2 =i,ε3 = j)=pip(cid:0)i+p−(cid:1)j +pjpi−p+j +(cid:54) ∑ll(cid:54)==ijplpi−p−j , i(cid:54)= j (16)
(cid:54)
This can be seen bytracing the tree construction from level 2 to level 3, with an example given in
Figure2. Next,wewillshowthat(16)holdsforτ>3.
First, note that P(ε =1,εl =1) = ml for 1 l modK, and l = (ξ 1)Lτ 1+ j, where ξ =1,...,K,
2 τ τ ≡ − −
j=1,...,Kτ 2. Wehaveshownthebasecasewhereτ=3. Fortheinductivehypothesis,assumethat
−
K
P(ε2 =1,ετ 1 =1)= p1 p1+ 2+(p1−)2∑p−j
−
j=2
(cid:0) (cid:1)
Thenwehavethat p 2∑K p =∑ ml forthel definedabove. Then,ateachml ,wehave
1− j=2 −j 1 l modK τ 1 τ 1
≡ − −
thefollowingmasssplits:
(cid:0) (cid:1)
K
mlτ 1 =mlτ 1p1++mlτ 1∑p−j , ξ =1
− − − j=2 (17)
mlτ 1 =mlτ 1pi++mlτ 1∑mlτ 1p−j , ξ =i, i=2,...,K
− − − j=i −
(cid:54)
Then
K
P(ε =1,ε =1)= ∑ ∑ ml (18)
2 τ τ
ξ=1 l,ξ
1 l modK
≡
Usingthesametacticas(11),weseethatusing(17),addingthecomponents,andcombiningtheterms
correctly,theproofiscompleteforanyτ. Theprooffor P(ε =i,ε = j)foranyi,jfollowssimilarly,and
2 τ
theprooffor P(ε =i,ε = j)followsfromreducingtotheaboveprovenclaims.
ι τ
Inthenextsection,weexploitthedesirableidenticaldistributionofthecategoricalsequenceinorder
toprovideageneralizedmultinomialdistributionforthecountsineachcategory.
AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—9/15
3. Generalized Multinomial Distribution
FromtheconstructioninSection2,wederiveageneralizedmultinomialdistributioninwhichallcategor-
icalvariablesareidenticallydistributedbutnolongerindependent.
Theorem2(GeneralizedMultinomialDistribution). Let ε ,...,ε becategoricalrandomvariableswithcat-
1 N
egories 1,...,K, constructed as in Section 2. Let X = ∑N [ε =i],i =1,...,K, where [ ] is the Iverson bracket.
i j=1 j ·
DenoteX=(X ,...,X ),withobservedvaluesx=(x ,...,x ). Then
1 K 1 K
P(X=x)= ∑K pi(x (N1)−!∏1)! x ! pi+ xi−1∏ p−j xj
i=1 i− j(cid:54)=i j (cid:0) (cid:1) j(cid:54)=i(cid:16) (cid:17)
Proof. Forbrevity,letε =(ε ,...,ε )denotethesequenceofncategoricalrandomvariableswiththe
( 1) 2 n
−
firstvariableremoved. Conditioningonε ,
1
K
∑
P(X=x)= P(X=x ε =i)
1
|
i=1
K N K N
= ∑p ∑ ∏ p+ [εj=i]∏∏ p [εl=k]
i i −l
i=1 ε (1)j=1 k=1l=1
X−=x (cid:0) (cid:1) k=i (cid:0) (cid:1)
(cid:54)
K
= ∑p ∑ p+ ∑Nj=1[εj=i]∏ p ∑Nj=1[εj=l]
i i −l
i=1 ε2,...,εN l=i
X=x (cid:0) (cid:1) (cid:54) (cid:0) (cid:1)
Ntoorwes,iwdehienncεa1te=go1r,yth1e,r(eNa−r1e−x2(xx1N−1−−111))wcoaymsbtihneatrieomnsaionfinthgeNre−m1ai−nixn1g−N1−ca1tecgaotergicoarlicvaalrviaabrlieasblceasn{rεeis}iiNd=e2
incategory2,andsoforth. Finally,thereare(N−1−x1x−K1∑iK=−21xi)waysthefinalunallocated{εi}canbein
categoryK. Thus,
P(X=x ε =1)
1
|
= p1 xN−11 N−1−x x1−1 ··· N−1−x1x−1−∑Kj=−21xj p1+ x1−1∏K p−j xj
(cid:18) 1− (cid:19)(cid:18) 2 (cid:19) (cid:18) K (cid:19) j=2(cid:16) (cid:17)
(cid:0) (cid:1)
= p1(x (N1)−!∏1)K! x ! p1+ x1−1∏K p−j xj (19)
1− j=2 j j=2(cid:16) (cid:17)
(cid:0) (cid:1)
Similarly,fori=2,...,K
P(X=x ε =i)
1
|
= pi Nx−1 ··· N−1x−∑ij−=11 ··· N−1−xi−x1−∑Kj(cid:54)=−i,1j=1xj p1+ x1−1∏K p−j xj
(cid:18) 1 (cid:19) (cid:18) i (cid:19) (cid:18) K (cid:19) j=2
(cid:16) (cid:17)
(cid:0) (cid:1)
= pi(x (N1)−!∏1)! x ! pi+ xi−1∏ p−j xj (20)
i− j(cid:54)=i j (cid:0) (cid:1) j(cid:54)=i(cid:16) (cid:17)
Summingcompletestheproof.
AGeneralizedMultinomialDistributionfromDependentCategoricalRandomVariables—10/15
4. Properties
ThissectiondetailssomeusefulpropertiesoftheGeneralizedMultinomialDistributionandthedependent
categoricalrandomvariables.
4.1 Marginal Distributions
Theorem 3 (Univariate Marginal Distribution). The univariate marginal distribution of the Generalized
MultinomialDistributionistheGeneralizedBinomialDistribution. Thatis,
P(Xi =xi)=q Nx−1 pi− xi q− N−1−xi +pi Nx −11 pi+ xi−1 q+ N−1−(xi−1) (21)
(cid:18) i (cid:19) (cid:18) i− (cid:19)
(cid:0) (cid:1) (cid:0) (cid:1) (cid:0) (cid:1) (cid:0) (cid:1)
whereq=∑ p ,q+ =q+δp ,andq =q δq
j=i j i −
(cid:54) −
Proof. First,weclaimthefollowing: q+ =q+δp = p++∑ p ,l =2,...,K. Thismaybejustifiedviaa
i l j=l −j
(cid:54)
j=i
(cid:54)
simplemanipulationofdefinitions:
p++∑p = p +δ∑(p δp )+∑(p δp )= p +∑p +δp =q+δp
l −j l j− j j− j l j i i
j=l j=l j=l j=l
(cid:54) (cid:54) (cid:54) (cid:54)
j=i j=i
(cid:54) (cid:54)
Similarly,q =q δq. Thus,wemaycollapsethenumberofcategoriesto2: Categoryi,andeverything
−
else. Now,notice−thatforl (cid:54)=i, p+l xl−1 ∏ p−j xj =(q+)N−xi−1 forl =1,...,K andl (cid:54)=i. Fixk(cid:54)=i. Then
j=l
(cid:54) (cid:16) (cid:17)
(cid:0) (cid:1) j i
≤
pk(x (N1)−!∏1)! x !(pk)xk−1∏ p−j xj = pk pi− xi q+ N−xi−1 x !(x (N1−)!1∏)! x !
k− j(cid:54)=k j j(cid:54)=k(cid:16) (cid:17) (cid:0) (cid:1) (cid:0) (cid:1) i k− jj(cid:54)==ki j
(cid:54)
N 1
= pk x− pi− xi q+ N−xi−1 (22)
(cid:18) i (cid:19)
(cid:0) (cid:1) (cid:0) (cid:1)
Then
∑K pi(x (N1)−!∏1)! x ! pi+ xi−1∏ p−j xj = pi(x (N1)−!∏1)! x ! pi+ xi−1 q− N−1−(xi−1)
i=1 i− j(cid:54)=i j (cid:0) (cid:1) j(cid:54)=i(cid:16) (cid:17) i− j(cid:54)=i j (cid:0) (cid:1) (cid:0) (cid:1)
+∑pk(x (N1)−!∏1)! x ! p+k xk−1∏ p−j xj
k(cid:54)=i k− j(cid:54)=k j (cid:0) (cid:1) j(cid:54)=k(cid:16) (cid:17)
= pi Nx −11 pi+ xi−1 q− N−1−(xi−1)
(cid:18) i− (cid:19)
+∑pk N(cid:0) x−(cid:1)1 p(cid:0)i− (cid:1)xi q+ N−xi−1
k=i (cid:18) i (cid:19)
(cid:54) (cid:0) (cid:1) (cid:0) (cid:1)
= pi Nx −11 pi+ xi−1 q− N−1−(xi−1)
(cid:18) i− (cid:19)
N (cid:0)1 (cid:1) (cid:0) (cid:1)
+q x− pi− xi q+ N−xi−1
(cid:18) i (cid:19)
(cid:0) (cid:1) (cid:0) (cid:1)
Theabovetheoremshowsanotherwaythegeneralizedmultinomialdistributionisanextensionof
thegeneralizedbinomialdistribution.