Convergence of the k-Means Minimization Problem using Γ-Convergence

Matthew Thorpe¹, Florian Theil¹, Adam M. Johansen¹, and Neil Cade²

¹University of Warwick, Coventry, CV4 7AL, United Kingdom
²Selex-ES, Luton, LU1 3PG, United Kingdom
Abstract
The k-means method is an iterative clustering algorithm which associates each observation with one of k clusters. It traditionally employs cluster centers in the same space as the observed data. By relaxing this requirement, it is possible to apply the k-means method to infinite dimensional problems, for example multiple target tracking and smoothing problems in the presence of unknown data association. Via a Γ-convergence argument, the associated optimization problem is shown to converge in the sense that both the k-means minimum and minimizers converge in the large data limit to quantities which depend upon the observed data only through its distribution. The theory is supplemented with two examples to demonstrate the range of problems now accessible by the k-means method. The first example combines a non-parametric smoothing problem with unknown data association. The second addresses tracking using sparse data from a network of passive sensors.
1 Introduction
The k-means algorithm [23] is a technique for assigning each of a collection of observed data to exactly one of k clusters, each of which has a unique center, in such a way that each observation is assigned to the cluster whose center is closest to that observation in an appropriate sense.

The k-means method has traditionally been used with limited scope. Its usual application has been in Euclidean spaces, which restricts its application to finite dimensional problems. There are relatively few theoretical results using the k-means methodology in infinite dimensions, of which [5,8,12,19–21,28] are the only papers known to the authors. In the right framework, post-hoc track estimation in multiple target scenarios with unknown data association can be viewed as a clustering problem and is therefore accessible to the k-means method. In such problems one typically has finite-dimensional data, but would wish to estimate infinite dimensional tracks with the added complication of unresolved data association. It is our aim to propose and characterize a framework for the k-means method which can deal with this problem.

A natural question to ask of any clustering technique is whether the estimated clustering stabilizes as more data becomes available. More precisely, we ask whether certain estimates converge, in an appropriate sense, in the large data limit. In order to answer this question in our particular context we first establish a related optimization problem and make precise the notion of convergence.

Consistency of estimators for ill-posed inverse problems has been well studied, for example [14,24], but without the data association problem. In contrast to standard statistical consistency results, we do not assume that there exists a structural relationship between the optimization problem and the data-generating process in order to establish convergence to true parameter values in the large data limit; rather, we demonstrate convergence to the solution of a related limiting problem.
This paper shows the convergence of the minimization problem associated with the k-means method in a framework that is general enough to include examples where the cluster centers are not necessarily in the same space as the data points. In particular we are motivated by the application to infinite dimensional problems, e.g. the smoothing-data association problem. The smoothing-data association problem is the problem of associating data points $\{(t_i,z_i)\}_{i=1}^n \subset [0,1] \times \mathbb{R}^\kappa$ to unknown trajectories $\mu_j : [0,1] \to \mathbb{R}^\kappa$ for $j = 1,2,\dots,k$. By treating the trajectories $\mu_j$ as the cluster centers one may approach this problem using the k-means methodology. The comparison of data points to cluster centers is a pointwise distance: $d((t_i,z_i),\mu_j) = |\mu_j(t_i) - z_i|^2$ (where $|\cdot|$ is the Euclidean norm on $\mathbb{R}^\kappa$). To ensure the problem is well-posed some regularization is also necessary. For $k = 1$ the problem reduces to smoothing and coincides with the limiting problem studied in [17]. We will discuss the smoothing-data association problem further in Section 4.3.
Let us now introduce the notation for our variational approach. The k-means method is a strategy for partitioning a data set $\Psi_n = \{\xi_i\}_{i=1}^n \subset X$ into $k$ clusters, where each cluster has center $\mu_j$ for $j = 1,2,\dots,k$. First let us consider the special case when $\mu_j \in X$. The data partition is defined by associating each data point with the cluster center closest to it, where closeness is measured by a cost function $d : X \times X \to [0,\infty)$. Traditionally the k-means method considers Euclidean spaces $X = \mathbb{R}^\kappa$, where typically we choose $d(x,y) = |x-y|^2 = \sum_{i=1}^\kappa (x_i - y_i)^2$. We define the energy for a choice of cluster centers given data by
\[
f_n : X^k \to \mathbb{R}, \qquad f_n(\mu|\Psi_n) = \frac{1}{n} \sum_{i=1}^n \bigwedge_{j=1}^k d(\xi_i,\mu_j),
\]
where for any $k$ variables $a_1,a_2,\dots,a_k$, $\bigwedge_{j=1}^k a_j := \min\{a_1,\dots,a_k\}$. The optimal choice of $\mu$ is that which minimizes $f_n(\cdot|\Psi_n)$. We define
\[
\hat{\theta}_n = \min_{\mu \in X^k} f_n(\mu|\Psi_n) \in \mathbb{R}.
\]
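Concretely, in the classical Euclidean setting $X = \mathbb{R}^\kappa$ with $d(x,y) = |x-y|^2$, the energy $f_n$ and the empirical minimum $\hat{\theta}_n$ can be sketched as follows; the synthetic data set, the Lloyd-style minimization, and all variable names are illustrative choices, not part of the paper's framework:

```python
import numpy as np

def kmeans_energy(centers, data):
    """Empirical k-means energy f_n(mu | Psi_n): the mean over data points
    of the cost to the nearest center, with d(x, y) = |x - y|^2."""
    # pairwise squared distances, shape (n, k)
    d = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def lloyd(data, k, iters=50, seed=0):
    """Plain Lloyd iteration as one way to (locally) minimize f_n."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        labels = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(-2.0, 0.5, size=(100, 1)),
                  rng.normal(+2.0, 0.5, size=(100, 1))])
centers = lloyd(data, k=2)
theta_hat = kmeans_energy(centers, data)
```

Lloyd iteration only finds a local minimizer, so `theta_hat` is in general an upper bound for $\hat{\theta}_n$.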
An associated "limiting problem" can be defined by
\[
\theta = \min_{\mu \in X^k} f_\infty(\mu)
\]
where we assume, in a sense which will be made precise later, that $\xi_i \overset{iid}{\sim} P$ for some suitable probability distribution $P$, and define
\[
f_\infty(\mu) = \int \bigwedge_{j=1}^k d(x,\mu_j)\, P(dx).
\]
In Section 3 we validate the formulation by first showing that, under regularity conditions and with probability one, the minimum energy converges: $\hat{\theta}_n \to \theta$. And secondly by showing that (up to a subsequence) the minimizers converge: $\mu^n \to \mu^\infty$, where $\mu^n$ minimizes $f_n$ and $\mu^\infty$ minimizes $f_\infty$ (again with probability one).

In a more sophisticated version of the k-means method the requirement that $\mu_j \in X$ can be relaxed. We instead allow $\mu = (\mu_1,\mu_2,\dots,\mu_k) \in Y^k$ for some other Banach space $Y$, with $d$ defined appropriately. This leads to interesting statistical questions. When $Y$ is infinite dimensional, even establishing whether or not a minimizer exists is non-trivial.
When the cluster center is in a different space to the data, bounding the set of minimizers becomes less natural. For example, consider the smoothing problem in which one wishes to fit a continuous function to a set of data points. The natural choice of cost function is a pointwise distance of the data to the curve. The optimal solution is for the cluster center to interpolate the data points: in the limit the cluster center may no longer be well defined. In particular we cannot hope to have converging sequences of minimizers.

In the smoothing literature this problem is prevented by using a regularization term $r : Y^k \to \mathbb{R}$. For a cost function $d : X \times Y \to [0,\infty)$ the energies $f_n(\cdot|\Psi_n), f_\infty(\cdot) : Y^k \to \mathbb{R}$ are redefined as
\[
f_n(\mu|\Psi_n) = \frac{1}{n} \sum_{i=1}^n \bigwedge_{j=1}^k d(\xi_i,\mu_j) + \lambda_n r(\mu),
\]
\[
f_\infty(\mu) = \int \bigwedge_{j=1}^k d(x,\mu_j)\, P(dx) + \lambda r(\mu).
\]
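For the smoothing-data association cost $d((t_i,z_i),\mu_j) = |\mu_j(t_i) - z_i|^2$, the regularized energy $f_n(\mu|\Psi_n)$ can be sketched with each center discretized on a grid; the grid, the piecewise-linear evaluation, and the finite-difference curvature penalty $r(\mu) = \sum_j \|\partial^2 \mu_j\|_{L^2}^2$ are illustrative assumptions rather than the paper's prescription:

```python
import numpy as np

def regularized_energy(centers, times, obs, lam, grid):
    """Sketch of f_n(mu | Psi_n): pointwise data-fit cost with the minimum
    taken over centers, plus lam times a curvature penalty on each center."""
    # evaluate each center (piecewise-linear interpolant of grid values) at t_i
    vals = np.array([np.interp(times, grid, c) for c in centers])   # (k, n)
    fit = ((vals - obs[None, :]) ** 2).min(axis=0).mean()
    h = grid[1] - grid[0]
    # second differences approximate the second derivative of each center
    curv = sum(((np.diff(c, 2) / h ** 2) ** 2).sum() * h for c in centers)
    return fit + lam * curv

grid = np.linspace(0.0, 1.0, 51)
times = np.linspace(0.0, 1.0, 20)
obs = np.sin(2 * np.pi * times)                    # data near one smooth track
centers = [np.sin(2 * np.pi * grid), np.zeros_like(grid)]  # k = 2 candidates
e = regularized_energy(centers, times, obs, lam=1e-5, grid=grid)
```

Increasing `lam` trades fidelity to the data for smoothness of the centers, which is the balance the scaling of $\lambda_n$ is meant to control.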
Adding regularization changes the nature of the problem, so we commit time in Section 4 to justifying our approach. In particular we motivate treating $\lambda_n = \lambda$ as a constant independent of $n$. We are able to repeat the analysis of Section 3; that is, to establish that the minimum and a subsequence of minimizers still converge.

Early results assumed $Y = X$ were Euclidean spaces and showed the convergence of minimizers to the appropriate limit [18,25]. The motivation for the early work in this area was to show consistency of the methodology. In particular this requires there to be an underlying 'truth', and hence the assumption that there exists a unique minimizer to the limiting energy. These results do not hold when the limiting energy has more than one minimizer [4]. In this paper we discuss only the convergence of the method, and as such require no assumption as to the existence or uniqueness of a minimizer to the limiting problem. Consistency has been strengthened to a central limit theorem in [26], again assuming a unique minimizer to the limiting energy. Other rates of convergence have been shown in [2,3,9,22]. In Hilbert spaces there exist convergence results and rates of convergence for the minimum. In [5] the authors show that $|f_n(\mu^n) - f_\infty(\mu^\infty)|$ is of order $\frac{1}{\sqrt{n}}$; however, there are no results for the convergence of minimizers. Results exist for $k \to \infty$, see for example [8] (which are also valid for $Y \neq X$).
Assuming that $Y = X$, the convergence of the minimization problem in a reflexive and separable Banach space has been proved in [21], and a similar result in metric spaces in [20]. In [19], the existence of a weakly converging subsequence was inferred using the results of [21].
In the following section we introduce the notation and preliminary material used in this paper.

We then, in Section 3, consider convergence in the special case when the cluster centers are in the same space as the data points, i.e. $Y = X$. In this case we do not have an issue with well-posedness as the data has the same dimension as the cluster centers. For this reason we use energies defined without regularization. Theorem 3.5 shows that the minimum converges, i.e. $\hat{\theta}_n \to \theta$ as $n \to \infty$, for almost every sequence of observations, and furthermore we have a subsequence $\mu^{n_m}$ of minimizers of $f_{n_m}$ which weakly converges to some $\mu^\infty$ which minimizes $f_\infty$.

This result is generalized in Section 4 to arbitrary $X$ and $Y$. The analogous result to Theorem 3.5 is Theorem 4.6. We first motivate the problem, and in particular our choice of scaling in the regularization, in Section 4.1, before proceeding to the results in Section 4.2. Verifying the conditions on the cost function $d$ and regularization term $r$ is non-trivial, and so we show an application to the smoothing-data association problem in Section 4.3.

To demonstrate the generality of the results in this paper, two applications are considered in Section 5. The first is the data association and smoothing problem. We show the minimum converging as the data size increases. We also numerically investigate the use of the k-means energy to determine whether two targets have crossed tracks. The second example uses measured times of arrival and amplitudes of signals from moving sources that are received across a network of three sensors. The cluster centers are the source trajectories in $\mathbb{R}^2$.
2 Preliminaries
In this section we introduce some notation and background theory which will be used in Sections 3 and 4 to establish our convergence results. In those sections we show the existence of optimal cluster centers using the direct method. By imposing conditions such that our energies are weakly lower semi-continuous, we can deduce the existence of minimizers. Further conditions ensure the minimizers are uniformly bounded. The Γ-convergence framework (e.g. [6,13]) allows us to establish the convergence of the minimum and also the convergence of minimizers.

We have the following definition of Γ-convergence with respect to weak convergence.
Definition 2.1 (Γ-convergence). A sequence $f_n : A \to \mathbb{R} \cup \{\pm\infty\}$ on a Banach space $(A, \|\cdot\|_A)$ is said to Γ-converge on the domain $A$ to $f_\infty : A \to \mathbb{R} \cup \{\pm\infty\}$ with respect to weak convergence on $A$, and we write $f_\infty = \Gamma\text{-}\lim_n f_n$, if for all $x \in A$ we have

(i) (lim inf inequality) for every sequence $(x_n)$ weakly converging to $x$
\[
f_\infty(x) \leq \liminf_n f_n(x_n);
\]

(ii) (recovery sequence) there exists a sequence $(x_n)$ weakly converging to $x$ such that
\[
f_\infty(x) \geq \limsup_n f_n(x_n).
\]
When it exists, the Γ-limit is always weakly lower semi-continuous, and thus admits minimizers. An important property of Γ-convergence is that it implies the convergence of minimizers. In particular, we will make extensive use of the following well-known result.
Theorem 2.1 (Convergence of Minimizers). Let $f_n : A \to \mathbb{R}$ be a sequence of functionals on a Banach space $(A, \|\cdot\|_A)$ and assume that there exist $N > 0$ and a weakly compact subset $K \subset A$ with
\[
\inf_A f_n = \inf_K f_n \quad \forall n > N.
\]
If $f_\infty = \Gamma\text{-}\lim_n f_n$ and $f_\infty$ is not identically $\pm\infty$, then
\[
\min_A f_\infty = \lim_n \inf_A f_n.
\]
Furthermore, if each $f_n$ is weakly lower semi-continuous then each $f_n$ has a minimizer $x_n \in K$, and any weak limit point of $(x_n)$ minimizes $f_\infty$. Since $K$ is weakly compact there exists at least one weak limit point.

A proof of the theorem can be found in [6, Theorem 1.21].
The problems which we address involve random observations. We assume throughout the existence of a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, rich enough to support a countably infinite sequence of such observations, $\xi_1(\omega), \dots$. All random elements are defined upon this common probability space and all stochastic quantifiers are to be understood as acting with respect to $\mathbb{P}$ unless otherwise stated. Where appropriate, to emphasize the randomness of the functionals $f_n$, we will write $f_n^{(\omega)}$ to indicate the functional associated with the particular observation sequence $\xi_1(\omega), \dots, \xi_n(\omega)$, and we allow $P_n^{(\omega)}$ to denote the associated empirical measure.
We define the support of a (probability) measure to be the smallest closed set such that the complement is null.

For clarity we often write integrals using operator notation, i.e. for a measure $P$, which is usually a probability distribution, we write
\[
Ph = \int h(x)\, P(dx).
\]
For a sequence of probability distributions $P_n$, we say that $P_n$ converges weakly to $P$ if
\[
P_n h \to Ph \quad \text{for all bounded and continuous } h,
\]
and we write $P_n \Rightarrow P$. With a slight abuse of notation we will sometimes write $P(U) := P I_U$ for a measurable set $U$.
For a Banach space $A$ one can define the dual space $A^*$ to be the space of all bounded and linear maps over $A$ into $\mathbb{R}$, equipped with the norm $\|F\|_{A^*} = \sup_{x \in A} |F(x)|$. Similarly one can define the second dual $A^{**}$ as the space of all bounded and linear maps over $A^*$ into $\mathbb{R}$. Reflexive spaces are defined to be spaces $A$ such that $A$ is isometrically isomorphic to $A^{**}$. These have the useful property that closed and bounded sets are weakly compact. For example any $L^p$ space (with $1 < p < \infty$) is reflexive, as is any Hilbert space (by the Riesz Representation Theorem: if $A$ is a Hilbert space then $A^*$ is isometrically isomorphic to $A$).
A sequence $x_n \in A$ is said to weakly converge to $x \in A$ if $F(x_n) \to F(x)$ for all $F \in A^*$. We write $x_n \rightharpoonup x$. We say a functional $G : A \to \mathbb{R}$ is weakly continuous if $G(x_n) \to G(x)$ whenever $x_n \rightharpoonup x$, and strongly continuous if $G(x_n) \to G(x)$ whenever $\|x_n - x\|_A \to 0$. Note that weak continuity implies strong continuity. Similarly, a functional $G$ is weakly lower semi-continuous if $\liminf_{n\to\infty} G(x_n) \geq G(x)$ whenever $x_n \rightharpoonup x$.
We define the Sobolev spaces $W^{s,p}(I)$ on $I \subseteq \mathbb{R}$ by
\[
W^{s,p} = W^{s,p}(I) = \big\{ f : I \to \mathbb{R} \ \text{s.t.} \ \partial^i f \in L^p(I) \ \text{for} \ i = 0, \dots, s \big\}
\]
where we use $\partial$ for the weak derivative, i.e. $g = \partial f$ if for all $\phi \in C_c^\infty(I)$ (the space of smooth functions with compact support)
\[
\int_I f(x) \frac{d\phi}{dx}(x)\, dx = -\int_I g(x) \phi(x)\, dx.
\]
In particular, we will use the special case when $p = 2$ and we write $H^s = W^{s,2}$. This is a Hilbert space with norm
\[
\|f\|_{H^s}^2 = \sum_{i=0}^s \|\partial^i f\|_{L^2}^2.
\]
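On a uniform grid the $H^s$ norm can be approximated by replacing each weak derivative with a finite difference; the grid, the forward-difference scheme, and the rectangle-rule integration below are illustrative numerical choices, not part of the paper's framework:

```python
import numpy as np

def hs_norm_sq(f_vals, h, s):
    """Approximate ||f||_{H^s}^2 = sum_{i=0}^s ||d^i f||_{L2}^2 from samples
    f_vals on a uniform grid with spacing h, using forward differences."""
    total = 0.0
    g = np.asarray(f_vals, dtype=float)
    for _ in range(s + 1):
        total += (g ** 2).sum() * h    # rectangle rule for the squared L2 norm
        g = np.diff(g) / h             # next discrete derivative
    return total

x = np.linspace(0.0, 1.0, 1001)
val = hs_norm_sq(np.sin(2 * np.pi * x), h=x[1] - x[0], s=1)
# exact H^1 value for sin(2 pi x) on [0,1] is 1/2 + 2*pi^2, roughly 20.24
```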
For two real-valued and positive sequences $a_n$ and $b_n$ we write $a_n \lesssim b_n$ if $\frac{a_n}{b_n}$ is bounded. For a space $A$ and a set $K \subset A$ we write $K^c$ for the complement of $K$ in $A$, i.e. $K^c = A \setminus K$.
3 Convergence when Y = X
We assume we are given data points $\xi_i \in X$ for $i = 1,2,\dots$, where $X$ is a reflexive and separable Banach space with norm $\|\cdot\|_X$ and Borel σ-algebra $\mathcal{X}$. These data points realize a sequence of $\mathcal{X}$-measurable random elements on $(\Omega, \mathcal{F}, \mathbb{P})$ which will also be denoted, with a slight abuse of notation, by $\xi_i$.
We define
\[
f_n^{(\omega)} : X^k \to \mathbb{R}, \qquad f_n^{(\omega)}(\mu) = P_n^{(\omega)} g_\mu = \frac{1}{n} \sum_{i=1}^n \bigwedge_{j=1}^k d(\xi_i^{(\omega)}, \mu_j) \quad (1)
\]
\[
f_\infty : X^k \to \mathbb{R}, \qquad f_\infty(\mu) = P g_\mu = \int_X \bigwedge_{j=1}^k d(x, \mu_j)\, P(dx) \quad (2)
\]
where
\[
g_\mu(x) = \bigwedge_{j=1}^k d(x, \mu_j),
\]
$P$ is a probability measure on $(X, \mathcal{X})$, and the empirical measure $P_n^{(\omega)}$ associated with $\xi_1^{(\omega)}, \dots, \xi_n^{(\omega)}$ is defined by
\[
P_n^{(\omega)} h = \frac{1}{n} \sum_{i=1}^n h(\xi_i^{(\omega)})
\]
for any $\mathcal{X}$-measurable function $h : X \to \mathbb{R}$. We assume the $\xi_i$ are iid according to $P$ with $P = \mathbb{P} \circ \xi_i^{-1}$.
We wish to show that
\[
\hat{\theta}_n^{(\omega)} \to \theta \quad \text{for almost every } \omega \text{ as } n \to \infty \quad (3)
\]
where
\[
\hat{\theta}_n^{(\omega)} = \inf_{\mu \in X^k} f_n^{(\omega)}(\mu), \qquad \theta = \inf_{\mu \in X^k} f_\infty(\mu).
\]
We define $\|\cdot\|_k : X^k \to [0,\infty)$ by
\[
\|\mu\|_k := \max_j \|\mu_j\|_X \quad \text{for } \mu = (\mu_1, \mu_2, \dots, \mu_k) \in X^k. \quad (4)
\]
The reflexivity of $(X, \|\cdot\|_X)$ carries through to $(X^k, \|\cdot\|_k)$.
Our strategy is similar to that of [25], but we embed the methodology into the Γ-convergence framework. We show that (2) is the Γ-limit in Theorem 3.2 and that minimizers are bounded in Proposition 3.3. We may then apply Theorem 2.1 to infer (3) and the existence of a weakly converging subsequence of minimizers.
The key assumptions on $d$ and $P$ are given in Assumptions 1. The first assumption can be understood as a 'closeness' condition for the space $X$ with respect to $d$. If we let $d(x,y) = 1$ for $x \neq y$ and $d(x,x) = 0$ then our cost function $d$ does not carry any information on how far apart two points are. Assume there exists a probability density for $P$ which has unbounded support. Then $f_n^{(\omega)}(\mu) \geq \frac{n-k}{n}$ (for almost every $\omega$), with equality when we choose $\mu_j \in \{\xi_i^{(\omega)}\}_{i=1}^n$, i.e. any set of $k$ unique data points will minimize $f_n^{(\omega)}$. Since our data points are unbounded we may find a sequence $\|\xi_{i_n}^{(\omega)}\|_X \to \infty$. Now we choose $\mu_1^n = \xi_{i_n}^{(\omega)}$ and clearly our cluster center is unbounded. We see that this choice of $d$ violates the first assumption. We also add a moment condition to the upper bound to ensure integrability. Note that this also implies that $P d(\cdot, 0) \leq \int_X M(\|x\|_X)\, P(dx) < \infty$, so $f_\infty(0) < \infty$ and, in particular, $f_\infty$ is not identically infinity.
The second assumption is a slightly stronger condition on $d$ than a weak lower semi-continuity condition in the first variable and strong continuity in the second variable. The condition allows the application of Fatou's lemma for weakly converging probabilities, see [15].

The third assumption allows us to view $d(\xi_i, y)$ as a collection of random variables. The fourth implies that we have at least $k$ open balls with positive probability and therefore we are not overfitting clusters to data.
Assumptions 1. We have the following assumptions on $d : X \times X \to [0,\infty)$ and $P$.

1.1. There exist continuous, strictly increasing functions $m, M : [0,\infty) \to [0,\infty)$ such that
\[
m(\|x-y\|_X) \leq d(x,y) \leq M(\|x-y\|_X) \quad \text{for all } x, y \in X
\]
with $\lim_{r\to\infty} m(r) = \infty$ and $M(0) = 0$; there exists $\gamma < \infty$ such that $M(\|x\|_X + \|y\|_X) \leq \gamma M(\|x\|_X) + \gamma M(\|y\|_X)$; and finally $\int_X M(\|x\|_X)\, P(dx) < \infty$ (and $M$ is measurable).

1.2. For each $x, y \in X$ we have that if $x_m \to x$ and $y_n \rightharpoonup y$ as $n, m \to \infty$ then
\[
\liminf_{n,m\to\infty} d(x_m, y_n) \geq d(x,y) \quad \text{and} \quad \lim_{m\to\infty} d(x_m, y) = d(x,y).
\]

1.3. For each $y \in X$ we have that $d(\cdot, y)$ is $\mathcal{X}$-measurable.

1.4. There exist $k$ different centers $\mu_j^\dagger \in X$, $j = 1,2,\dots,k$, such that for all $\delta > 0$
\[
P(B(\mu_j^\dagger, \delta)) > 0 \quad \forall j = 1,2,\dots,k
\]
where $B(\mu, \delta) := \{ x \in X : \|\mu - x\|_X < \delta \}$.
We now show that, for a particular common choice of cost function $d$, Assumptions 1.1 to 1.3 hold.

Remark 3.1. For any $p > 0$ let $d(x,y) = \|x - y\|_X^p$; then $d$ satisfies Assumptions 1.1 to 1.3.

Proof. Taking $m(r) = M(r) = r^p$ we can bound $m(\|x-y\|_X) \leq d(x,y) \leq M(\|x-y\|_X)$, and $m, M$ are clearly continuous, strictly increasing, and satisfy $m(r) \to \infty$, $M(0) = 0$. One can also show that
\[
M(\|x+y\|_X) \leq 2^{p-1}\big( \|x\|_X^p + \|y\|_X^p \big),
\]
hence Assumption 1.1 is satisfied.
Let $x_m \to x$ and $y_n \rightharpoonup y$. Then
\[
\liminf_{n,m\to\infty} d(x_m, y_n)^{1/p} = \liminf_{n,m\to\infty} \|x_m - y_n\|_X \geq \liminf_{n,m\to\infty} \big( \|y_n - x\|_X - \|x_m - x\|_X \big) = \liminf_{n\to\infty} \|y_n - x\|_X \geq \|y - x\|_X,
\]
where the third step uses $x_m \to x$, and the last inequality follows as a consequence of the Hahn-Banach Theorem and the fact that $y_n - x \rightharpoonup y - x$, which implies $\liminf_{n\to\infty} \|y_n - x\|_X \geq \|y - x\|_X$. Clearly $d(x_m, y) \to d(x,y)$ and so Assumption 1.2 holds.

The third assumption holds by the Borel measurability of metrics on complete separable metric spaces. ∎
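The bound $M(\|x+y\|_X) \leq 2^{p-1}(\|x\|_X^p + \|y\|_X^p)$ used in the proof above can be checked numerically for $p \geq 1$, where it follows from the triangle inequality and convexity of $t \mapsto t^p$; the Euclidean norm on $\mathbb{R}^3$ and the random samples below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))
y = rng.normal(size=(1000, 3))
for p in (1.0, 2.0, 3.5):
    lhs = np.linalg.norm(x + y, axis=1) ** p
    rhs = 2 ** (p - 1) * (np.linalg.norm(x, axis=1) ** p
                          + np.linalg.norm(y, axis=1) ** p)
    # triangle inequality plus convexity: ((a+b)/2)^p <= (a^p + b^p)/2
    assert np.all(lhs <= rhs + 1e-9)
```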
We now state the first result of the paper, which formalizes the understanding that $f_\infty$ is the limit of $f_n^{(\omega)}$.

Theorem 3.2. Let $(X, \|\cdot\|_X)$ be a reflexive and separable Banach space with Borel σ-algebra $\mathcal{X}$; let $\{\xi_i\}_{i\in\mathbb{N}}$ be a sequence of independent $X$-valued random elements with common law $P$. Assume $d : X \times X \to [0,\infty)$ and $P$ satisfy the conditions in Assumptions 1. Define $f_n^{(\omega)} : X^k \to \mathbb{R}$ and $f_\infty : X^k \to \mathbb{R}$ by (1) and (2) respectively. Then
\[
f_\infty = \Gamma\text{-}\lim_n f_n^{(\omega)}
\]
for $\mathbb{P}$-almost every $\omega$.
Proof. Define $\Omega'$ as the intersection of three events:
\[
\Omega' = \big\{ \omega \in \Omega : P_n^{(\omega)} \Rightarrow P \big\} \cap \big\{ \omega \in \Omega : P_n^{(\omega)}(B(0,q)^c) \to P(B(0,q)^c) \ \forall q \in \mathbb{N} \big\}
\]
\[
\cap \Big\{ \omega \in \Omega : \int_X I_{B(0,q)^c}(x) M(\|x\|_X)\, P_n^{(\omega)}(dx) \to \int_X I_{B(0,q)^c}(x) M(\|x\|_X)\, P(dx) \ \forall q \in \mathbb{N} \Big\}.
\]
By the almost sure weak convergence of the empirical measure the first of these events has probability one; the second and third are characterized by the convergence of a countable collection of empirical averages to their population averages and, by the strong law of large numbers, each has probability one. Hence $\mathbb{P}(\Omega') = 1$.
Fix $\omega \in \Omega'$: we will show that the lim inf inequality holds and a recovery sequence exists for this $\omega$, and hence for every $\omega \in \Omega'$. We start by showing the lim inf inequality, allowing $\{\mu^n\}_{n=1}^\infty \subset X^k$ to denote any sequence which converges weakly to $\mu \in X^k$. We are required to show:
\[
\liminf_{n\to\infty} f_n^{(\omega)}(\mu^n) \geq f_\infty(\mu).
\]
By Theorem 1.1 in [15] we have
\[
\int_X \liminf_{n\to\infty,\, x'\to x} g_{\mu^n}(x')\, P(dx) \leq \liminf_{n\to\infty} \int_X g_{\mu^n}(x)\, P_n^{(\omega)}(dx) = \liminf_{n\to\infty} P_n^{(\omega)} g_{\mu^n}.
\]
For each $x \in X$, we have by Assumption 1.2 that
\[
\liminf_{x'\to x,\, n\to\infty} d(x', \mu_j^n) \geq d(x, \mu_j).
\]
By taking the minimum over $j$ we have
\[
\liminf_{x'\to x,\, n\to\infty} g_{\mu^n}(x') = \liminf_{x'\to x,\, n\to\infty} \bigwedge_{j=1}^k d(x', \mu_j^n) \geq \bigwedge_{j=1}^k d(x, \mu_j) = g_\mu(x).
\]
Hence
\[
\liminf_{n\to\infty} f_n^{(\omega)}(\mu^n) = \liminf_{n\to\infty} P_n^{(\omega)} g_{\mu^n} \geq \int_X g_\mu(x)\, P(dx) = f_\infty(\mu)
\]
as required.
We now establish the existence of a recovery sequence for every $\omega \in \Omega'$ and every $\mu \in X^k$. Let $\mu^n = \mu \in X^k$. Let $\zeta_q$ be a sequence of $C^\infty(X)$ functions such that $0 \leq \zeta_q(x) \leq 1$ for all $x \in X$, $\zeta_q(x) = 1$ for $x \in B(0, q-1)$, and $\zeta_q(x) = 0$ for $x \notin B(0,q)$. Then the function $\zeta_q(x) g_\mu(x)$ is continuous in $x$ (and with respect to convergence in $\|\cdot\|_X$) for all $q$. We also have
\[
\zeta_q(x) g_\mu(x) \leq \zeta_q(x) d(x, \mu_1) \leq \zeta_q(x) M(\|x - \mu_1\|_X) \leq \zeta_q(x) M(\|x\|_X + \|\mu_1\|_X) \leq M(q + \|\mu_1\|_X)
\]
so $\zeta_q g_\mu$ is a continuous and bounded function; hence by the weak convergence of $P_n^{(\omega)}$ to $P$ we have
\[
P_n^{(\omega)} \zeta_q g_\mu \to P \zeta_q g_\mu
\]
as $n \to \infty$ for all $q \in \mathbb{N}$. For all $q \in \mathbb{N}$ we have
\[
\limsup_{n\to\infty} |P_n^{(\omega)} g_\mu - P g_\mu| \leq \limsup_{n\to\infty} |P_n^{(\omega)} g_\mu - P_n^{(\omega)} \zeta_q g_\mu| + \limsup_{n\to\infty} |P_n^{(\omega)} \zeta_q g_\mu - P \zeta_q g_\mu| + \limsup_{n\to\infty} |P \zeta_q g_\mu - P g_\mu|
\]
\[
= \limsup_{n\to\infty} |P_n^{(\omega)} g_\mu - P_n^{(\omega)} \zeta_q g_\mu| + |P \zeta_q g_\mu - P g_\mu|.
\]
Therefore,
\[
\limsup_{n\to\infty} |P_n^{(\omega)} g_\mu - P g_\mu| \leq \liminf_{q\to\infty} \limsup_{n\to\infty} |P_n^{(\omega)} g_\mu - P_n^{(\omega)} \zeta_q g_\mu|
\]
by the dominated convergence theorem. We now show that the right hand side of the above expression is equal to zero. We have
\[
|P_n^{(\omega)} g_\mu - P_n^{(\omega)} \zeta_q g_\mu| \leq P_n^{(\omega)} I_{(B(0,q-1))^c} g_\mu \leq P_n^{(\omega)} I_{(B(0,q-1))^c} d(\cdot, \mu_1) \leq P_n^{(\omega)} I_{(B(0,q-1))^c} M(\|\cdot - \mu_1\|_X)
\]
\[
\leq \gamma \Big( P_n^{(\omega)} I_{(B(0,q-1))^c} M(\|\cdot\|_X) + M(\|\mu_1\|_X) P_n^{(\omega)} I_{(B(0,q-1))^c} \Big)
\]
\[
\to \gamma \Big( P I_{(B(0,q-1))^c} M(\|\cdot\|_X) + M(\|\mu_1\|_X) P I_{(B(0,q-1))^c} \Big) \quad \text{as } n \to \infty
\]
\[
\to 0 \quad \text{as } q \to \infty,
\]
where the last limit follows by the monotone convergence theorem. We have shown
\[
\lim_{n\to\infty} |P_n^{(\omega)} g_\mu - P g_\mu| = 0.
\]
Hence
\[
f_n^{(\omega)}(\mu) \to f_\infty(\mu)
\]
as required. ∎
Now that we have established almost sure Γ-convergence, we establish the boundedness condition in Proposition 3.3 so that we can apply Theorem 2.1.
Proposition 3.3. Assume the conditions of Theorem 3.2 and define $\|\cdot\|_k$ by (4). Then there exists $R > 0$ such that
\[
\inf_{\mu \in X^k} f_n^{(\omega)}(\mu) = \inf_{\|\mu\|_k \leq R} f_n^{(\omega)}(\mu) \quad \text{for all } n \text{ sufficiently large}
\]
for $\mathbb{P}$-almost every $\omega$. In particular, $R$ is independent of $n$.
Proof. The structure of the proof is similar to [20, Lemma 2.1]. We argue by contradiction. In particular we argue that if a cluster center is unbounded then in the limit the minimum is achieved over the remaining $k-1$ cluster centers. We then use Assumption 1.4 to imply that adding an extra cluster center will strictly decrease the minimum, and hence we have a contradiction.

We define $\Omega''$ to be
\[
\Omega'' = \bigcap_{\delta \in \mathbb{Q} \cap (0,\infty),\; l=1,2,\dots,k} \Big\{ \omega \in \Omega' : P_n^{(\omega)}(B(\mu_l^\dagger, \delta)) \to P(B(\mu_l^\dagger, \delta)) \Big\}.
\]
As $\Omega''$ is the countable intersection of sets of probability one, we have $\mathbb{P}(\Omega'') = 1$. Fix $\omega \in \Omega''$ and assume that the cluster centers $\mu^n \in X^k$ are almost minimizers, i.e.
\[
f_n^{(\omega)}(\mu^n) \leq \inf_{\mu \in X^k} f_n^{(\omega)}(\mu) + \varepsilon_n
\]
for some sequence $\varepsilon_n > 0$ such that
\[
\lim_{n\to\infty} \varepsilon_n = 0. \quad (5)
\]
Assume that $\lim_{n\to\infty} \|\mu^n\|_k = \infty$. Then there exists $l_n \in \{1,\dots,k\}$ such that $\lim_{n\to\infty} \|\mu_{l_n}^n\|_X = \infty$. Fix $x \in X$; then
\[
d(x, \mu_{l_n}^n) \geq m(\|\mu_{l_n}^n - x\|_X) \to \infty.
\]
Therefore, for each $x \in X$,
\[
\lim_{n\to\infty} \Big( \bigwedge_{j=1}^k d(x, \mu_j^n) - \bigwedge_{j \neq l_n} d(x, \mu_j^n) \Big) = 0.
\]
Let $\delta > 0$; then there exists $N$ such that for $n \geq N$
\[
\bigwedge_{j=1}^k d(x, \mu_j^n) - \bigwedge_{j \neq l_n} d(x, \mu_j^n) \geq -\delta.
\]
Hence
\[
\liminf_{n\to\infty} \int \Big( \bigwedge_{j=1}^k d(x, \mu_j^n) - \bigwedge_{j \neq l_n} d(x, \mu_j^n) \Big) P_n^{(\omega)}(dx) \geq -\delta.
\]
Letting $\delta \to 0$ we have
\[
\liminf_{n\to\infty} \int \Big( \bigwedge_{j=1}^k d(x, \mu_j^n) - \bigwedge_{j \neq l_n} d(x, \mu_j^n) \Big) P_n^{(\omega)}(dx) \geq 0
\]
and moreover
\[
\liminf_{n\to\infty} \Big( f_n^{(\omega)}(\mu^n) - f_n^{(\omega)}\big((\mu_j^n)_{j \neq l_n}\big) \Big) \geq 0, \quad (6)
\]
where we interpret $f_n^{(\omega)}$ accordingly. It suffices to demonstrate that
\[
\liminf_{n\to\infty} \Big( \inf_{\mu \in X^k} f_n^{(\omega)}(\mu) - \inf_{\mu \in X^{k-1}} f_n^{(\omega)}(\mu) \Big) < 0. \quad (7)
\]
Indeed, if (7) holds, then
\[
\liminf_{n\to\infty} \Big( f_n^{(\omega)}(\mu^n) - f_n^{(\omega)}\big((\mu_j^n)_{j \neq l_n}\big) \Big) = \underbrace{\lim_{n\to\infty} \Big( f_n^{(\omega)}(\mu^n) - \inf_{\mu \in X^k} f_n^{(\omega)}(\mu) \Big)}_{\leq\, \varepsilon_n} + \liminf_{n\to\infty} \Big( \inf_{\mu \in X^k} f_n^{(\omega)}(\mu) - f_n^{(\omega)}\big((\mu_j^n)_{j \neq l_n}\big) \Big) < 0
\]
by (5) and (7), but this contradicts (6).
We now establish (7). By Assumption 1.4 there exist $k$ centers $\mu_j^\dagger \in X$ and $\delta_1 > 0$ such that $\min_{j \neq l} \|\mu_j^\dagger - \mu_l^\dagger\|_X \geq \delta_1$. Hence for any $\mu \in X^{k-1}$ there exists $l \in \{1,2,\dots,k\}$ such that
\[
\|\mu_l^\dagger - \mu_j\|_X \geq \frac{\delta_1}{2} \quad \text{for } j = 1,2,\dots,k-1.
\]
Proceeding with this choice of $l$, for $x \in B(\mu_l^\dagger, \delta_2)$ (for any $\delta_2 \in (0, \delta_1/2)$) we have
\[
\|\mu_j - x\|_X \geq \frac{\delta_1}{2} - \delta_2
\]
and therefore $d(\mu_j, x) \geq m(\frac{\delta_1}{2} - \delta_2)$ for all $j = 1,2,\dots,k-1$. Also
\[
D_l(\mu) := \min_{j=1,2,\dots,k-1} d(x, \mu_j) - d(x, \mu_l^\dagger) \geq m\Big(\frac{\delta_1}{2} - \delta_2\Big) - M(\delta_2). \quad (8)
\]
So for $\delta_2$ sufficiently small there exists $\epsilon > 0$ such that
\[
D_l(\mu) \geq \epsilon.
\]
Since the right hand side is independent of $\mu \in X^{k-1}$,
\[
\inf_{\mu \in X^{k-1}} \max_l D_l(\mu) \geq \epsilon.
\]
Define the characteristic function
\[
\chi_\mu(\xi) = \begin{cases} 1 & \text{if } \|\xi - \mu_{l(\mu)}^\dagger\|_X < \delta_2 \\ 0 & \text{otherwise,} \end{cases}
\]
where $l(\mu)$ is the maximizer in (8). For each $\omega \in \Omega''$ one obtains
\[
\inf_{\mu \in X^{k-1}} f_n^{(\omega)}(\mu) = \inf_{\mu \in X^{k-1}} \frac{1}{n} \sum_{i=1}^n \bigwedge_{j=1}^{k-1} d(\xi_i, \mu_j)
\]
\[
\geq \inf_{\mu \in X^{k-1}} \frac{1}{n} \sum_{i=1}^n \Big[ \bigwedge_{j=1}^{k-1} d(\xi_i, \mu_j) \big(1 - \chi_\mu(\xi_i)\big) + \big( d(\xi_i, \mu_{l(\mu)}^\dagger) + \epsilon \big) \chi_\mu(\xi_i) \Big]
\]
\[
\geq \inf_{\mu \in X^k} f_n^{(\omega)}(\mu) + \epsilon \min_{l=1,2,\dots,k} P_n^{(\omega)}(B(\mu_l^\dagger, \delta_2)).
\]
Then since $P_n^{(\omega)}(B(\mu_l^\dagger, \delta_2)) \to P(B(\mu_l^\dagger, \delta_2)) > 0$ by Assumption 1.4 (for $\delta_2 \in \mathbb{Q} \cap (0,\infty)$) we can conclude that (7) holds. ∎
Remark 3.4. One can easily show that Assumption 1.2 implies that $d$ is weakly lower semi-continuous in its second argument, which carries through to $f_n^{(\omega)}$. It follows that on any bounded (or, equivalently, as $X$ is reflexive, weakly compact) set the infimum of $f_n^{(\omega)}$ is achieved. Hence the infimum in Proposition 3.3 is actually a minimum.
We now easily prove convergence by application of Theorem 2.1.

Theorem 3.5. Assume the conditions of Theorem 3.2 and Proposition 3.3. Then the minimization problem associated with the k-means method converges, i.e. for $\mathbb{P}$-almost every $\omega$:
\[
\min_{\mu \in X^k} f_\infty(\mu) = \lim_{n\to\infty} \min_{\mu \in X^k} f_n^{(\omega)}(\mu).
\]
Furthermore, any sequence of minimizers $\mu^n$ of $f_n^{(\omega)}$ is almost surely weakly precompact and any weak limit point minimizes $f_\infty$.
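As a numerical illustration of Theorem 3.5 in the simplest setting ($X = \mathbb{R}$, $d(x,y) = |x-y|^2$, $k = 2$), one can watch the empirical minimum $\hat{\theta}_n$ stabilize as $n$ grows; the two-component Gaussian mixture and the Lloyd-style minimizer below are illustrative choices, not taken from the paper:

```python
import numpy as np

def empirical_min(data, k=2, iters=100):
    """Approximate theta_hat_n = min_mu f_n(mu | Psi_n) by Lloyd iteration
    (a local minimizer, so this is an upper bound on the true minimum)."""
    centers = np.quantile(data, np.linspace(0.1, 0.9, k))  # spread-out start
    for _ in range(iters):
        labels = np.abs(data[:, None] - centers[None, :]).argmin(axis=1)
        centers = np.array([data[labels == j].mean() if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return (np.abs(data[:, None] - centers[None, :]) ** 2).min(axis=1).mean()

rng = np.random.default_rng(0)
pool = np.where(rng.random(100_000) < 0.5,
                rng.normal(-1.0, 0.3, 100_000),
                rng.normal(+1.0, 0.3, 100_000))
mins = [empirical_min(pool[:n]) for n in (100, 1_000, 10_000, 100_000)]
# mins should settle near the population minimum (about 0.09 for this mixture)
```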
4 The Case of General Y
In the previous section the data $\xi_i$ and cluster centers $\mu_j$ took their values in a common space $X$. We now remove this restriction and let $\xi_i : \Omega \to X$ and $\mu_j \in Y$. We may want to use this framework to deal with finite dimensional data and infinite dimensional cluster centers, which can lead to the variational problem having uninformative minimizers.

In the previous section the cost function $d$ was assumed to scale with the underlying norm. This is no longer appropriate when $d : X \times Y \to [0,\infty)$. In particular, if we consider the smoothing-data association problem, then the natural choice of $d$ is a pointwise distance, which will lead to the optimal cluster centers interpolating the data points. Hence, in any $H^s$ norm with $s \geq 1$, the optimal cluster centers "blow up".

One possible solution would be to weaken the space to $L^2$ and allow this type of behavior. This is undesirable from both modeling and mathematical perspectives. If we first consider the modeling point of view, then we do not expect our estimate to perfectly fit the data, which is observed in the presence of noise. It is natural that the cluster centers are smoother than the data alone would suggest; it is desirable that the optimal clusters should reflect reality. From the mathematical point of view, restricting ourselves to only very weak spaces gives no hope of obtaining a strongly convergent subsequence.

An alternative approach is, as is common in the smoothing literature, to use a regularization term. This approach is also standard when dealing with ill-posed inverse problems. It changes the nature of the problem and so requires some justification. In particular, the scaling of the regularization with the data is of fundamental importance. In the following section we argue that scaling motivated by a simple Bayesian interpretation of the problem is not strong enough (unsurprisingly, countable collections of finite dimensional observations do not carry enough information to provide consistency when dealing with infinite dimensional parameters). In the form of a simple example we show that the optimal cluster center is unbounded in the large data limit when the regularization goes to zero sufficiently quickly. The natural scaling in this example is for the regularization to vary with the number of observations as $n^p$ for $p \in [-\frac{4}{5}, 0]$. We consider the case $p = 0$ in Section 4.2. This type of regularization is understood as penalized likelihood estimation [16].
Although it may seem undesirable for the limiting problem to depend upon the regularization, it is unavoidable in ill-posed problems such as this one: there is not sufficient information, even in countably infinite collections of observations, to recover the unknown cluster centers, and exploiting known (or expected) regularity in these solutions provides one way to combine observations with qualitative prior beliefs about the cluster centers in a principled manner. There are many precedents for this approach, including [17], in which the consistency of penalized splines is studied using what in this paper we call the Γ-limit. In that paper a fixed regularization was used to define the limiting problem in order to derive an estimator. Naturally, regularization strong enough to alter the limiting problem influences the solution, and we cannot hope to obtain consistent estimation in this setting, even when the cost function can be interpreted as the log likelihood of the data generating process. In the setting of [17], the regularization is finally scaled to zero, whereupon, under assumptions, the estimator converges to the truth; but such a step is not feasible in the more complicated settings considered here.

When more structure is available it may be desirable to further investigate the regularization. For example, with $k = 1$ the non-parametric regression model is equivalent to the white noise model [7], for which optimal scaling of the regularization is known [1,30]. It is the subject of further work to extend these results to $k > 1$.
With our redefined k-means type problem we can replicate the results of the previous section, and do so in Theorem 4.6. That is, we prove that the k-means method converges where $Y$ is a general separable and reflexive Banach space and, in particular, need not be equal to $X$.

This section is split into three subsections. In the first we motivate the regularization term. The second contains the convergence theory in a general setting. Establishing that the assumptions of this subsection hold is non-trivial, and so, in the third subsection, we show an application to the smoothing-data association problem.
4.1 Regularization
In this section we use a toy, $k = 1$, smoothing problem to motivate the approach to regularization which is adopted in what follows. We assume that the cluster centers are periodic with equally spaced observations so we may use a Fourier argument. In particular we work on the space of 1-periodic functions in $H^2$,
\[
Y = \big\{ \mu : [0,1] \to \mathbb{R} \ \text{s.t.} \ \mu(0) = \mu(1) \ \text{and} \ \mu \in H^2 \big\}. \quad (9)
\]
For arbitrary sequences $(a_n), (b_n)$ and data $\Psi_n = \{(t_j, z_j)\}_{j=1}^n \subset [0,1] \times \mathbb{R}^d$ we define the functional
\[
f_n^{(\omega)}(\mu) = a_n \sum_{j=0}^{n-1} |\mu(t_j) - z_j|^2 + b_n \|\partial^2 \mu\|_{L^2}^2. \quad (10)
\]
Data are points in space-time: $[0,1] \times \mathbb{R}$. The regularization is chosen so that it penalizes the $L^2$ norm of the second derivative. For simplicity, we employ deterministic measurement times $t_j$ in the following proposition, although this lies outside the formal framework which we consider subsequently. Another simplification we make is to use convergence in expectation rather than almost sure convergence, which simplifies our arguments. We stress that this section is the motivation for the problem studied in Section 4.2. We will give conditions on the scaling of $a_n$ and $b_n$ that determine whether $\mathbb{E} \min f_n^{(\omega)}$ and $\mathbb{E}\|\mu^n\|$ stay bounded, where $\mu^n$ is the minimizer of $f_n^{(\omega)}$.
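The toy functional (10) can be evaluated directly for a discretized periodic center; the uniform grid, the periodic second differences, and the particular choices $a_n = 1/n$, $b_n = 10^{-4}$ below are illustrative assumptions used only to exercise the formula, not the Fourier argument of the proof:

```python
import numpy as np

def toy_energy(mu_grid, t, z, a_n, b_n):
    """Evaluate (10): a_n * sum_j |mu(t_j) - z_j|^2 + b_n * ||mu''||_{L2}^2
    for a 1-periodic mu given by its values on a uniform grid of [0, 1)."""
    m = len(mu_grid)
    h = 1.0 / m
    grid = np.arange(m) * h
    fit = a_n * np.sum((np.interp(t, grid, mu_grid, period=1.0) - z) ** 2)
    # periodic second difference approximates the second weak derivative
    d2 = (np.roll(mu_grid, -1) - 2 * mu_grid + np.roll(mu_grid, 1)) / h ** 2
    return fit + b_n * np.sum(d2 ** 2) * h

n, m = 64, 256
t = np.arange(n) / n
rng = np.random.default_rng(0)
z = np.cos(2 * np.pi * t) + 0.1 * rng.normal(size=n)   # truth plus noise
grid = np.arange(m) / m
e_true = toy_energy(np.cos(2 * np.pi * grid), t, z, a_n=1.0 / n, b_n=1e-4)
e_flat = toy_energy(np.zeros(m), t, z, a_n=1.0 / n, b_n=1e-4)
```

With this weak regularization the smooth truth beats the flat center; increasing $b_n$ trades data fit for smoothness, which is the trade-off the scaling of $(a_n, b_n)$ controls.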
Proposition 4.1. Let data be given by $\Psi_n = \{(t_j, z_j)\}_{j=1}^n$ with $t_j = \frac{j}{n}$, under the assumption $z_j = \mu^\dagger(t_j) + \epsilon_j$ for $\epsilon_j$ iid noise with finite variance and $\mu^\dagger \in L^2$, and define $Y$ by (9). Then $\inf_{\mu \in Y} f_n^{(\omega)}(\mu)$ defined by (10) stays bounded (in expectation) if $a_n = O(\frac{1}{n})$, for any positive sequence $b_n$.

Proof. Assume $n$ is odd. Both $\mu$ and $z$ are 1-periodic so we can write
\[
\mu(t) = \frac{1}{n} \sum_{l=-\frac{n-1}{2}}^{\frac{n-1}{2}} \hat{\mu}_l e^{2\pi i l t} \quad \text{and} \quad z_j = \frac{1}{n} \sum_{l=-\frac{n-1}{2}}^{\frac{n-1}{2}} \hat{z}_l e^{\frac{2\pi i l j}{n}}
\]
with
\[
\hat{\mu}_l = \sum_{j=0}^{n-1} \mu(t_j) e^{-\frac{2\pi i l j}{n}} \quad \text{and} \quad \hat{z}_l = \sum_{j=0}^{n-1} z_j e^{-\frac{2\pi i l j}{n}}.
\]