Convergence of the $k$-Means Minimization Problem using $\Gamma$-Convergence

Matthew Thorpe(1), Florian Theil(1), Adam M. Johansen(1), and Neil Cade(2)

(1) University of Warwick, Coventry, CV4 7AL, United Kingdom
(2) Selex-ES, Luton, LU1 3PG, United Kingdom

arXiv:1501.01320v2 [math.ST] 3 Apr 2015

Abstract

The k-means method is an iterative clustering algorithm which associates each observation with one of k clusters. It traditionally employs cluster centers in the same space as the observed data. By relaxing this requirement, it is possible to apply the k-means method to infinite dimensional problems, for example multiple target tracking and smoothing problems in the presence of unknown data association. Via a $\Gamma$-convergence argument, the associated optimization problem is shown to converge in the sense that both the k-means minimum and minimizers converge in the large data limit to quantities which depend upon the observed data only through its distribution. The theory is supplemented with two examples to demonstrate the range of problems now accessible by the k-means method. The first example combines a non-parametric smoothing problem with unknown data association. The second addresses tracking using sparse data from a network of passive sensors.

1 Introduction

The k-means algorithm [23] is a technique for assigning each of a collection of observed data to exactly one of k clusters, each of which has a unique center, in such a way that each observation is assigned to the cluster whose center is closest to that observation in an appropriate sense.

The k-means method has traditionally been used with limited scope. Its usual application has been in Euclidean spaces, which restricts it to finite dimensional problems. There are relatively few theoretical results using the k-means methodology in infinite dimensions, of which [5,8,12,19-21,28] are the only papers known to the authors. In the right framework, post-hoc track estimation in multiple target scenarios with unknown data association can be viewed as a clustering problem and is therefore accessible to the k-means method. In such problems one typically has finite-dimensional data, but would wish to estimate infinite dimensional tracks, with the added complication of unresolved data association. It is our aim to propose and characterize a framework for the k-means method which can deal with this problem.

A natural question to ask of any clustering technique is whether the estimated clustering stabilizes as more data becomes available. More precisely, we ask whether certain estimates converge, in an appropriate sense, in the large data limit. In order to answer this question in our particular context we first establish a related optimization problem and make precise the notion of convergence.

Consistency of estimators for ill-posed inverse problems has been well studied, for example [14,24], but without the data association problem. In contrast to standard statistical consistency results, we do not assume that there exists a structural relationship between the optimization problem and the data-generating process in order to establish convergence to true parameter values in the large data limit; rather, we demonstrate convergence to the solution of a related limiting problem.

This paper shows the convergence of the minimization problem associated with the k-means method in a framework that is general enough to include examples where the cluster centers are not necessarily in the same space as the data points. In particular we are motivated by the application to infinite dimensional problems, e.g. the smoothing-data association problem. The smoothing-data association problem is the problem of associating data points $\{(t_i, z_i)\}_{i=1}^n \subset [0,1] \times \mathbb{R}^\kappa$ with unknown trajectories $\mu_j : [0,1] \to \mathbb{R}^\kappa$ for $j = 1,2,\dots,k$.
By treating the trajectories $\mu_j$ as the cluster centers one may approach this problem using the k-means methodology. The comparison of data points to cluster centers is a pointwise distance: $d((t_i,z_i),\mu_j) = |\mu_j(t_i) - z_i|^2$ (where $|\cdot|$ is the Euclidean norm on $\mathbb{R}^\kappa$). To ensure the problem is well-posed some regularization is also necessary. For $k = 1$ the problem reduces to smoothing and coincides with the limiting problem studied in [17]. We will discuss the smoothing-data association problem further in Section 4.3.

Let us now introduce the notation for our variational approach. The k-means method is a strategy for partitioning a data set $\Psi_n = \{\xi_i\}_{i=1}^n \subset X$ into k clusters, where each cluster has center $\mu_j$ for $j = 1,2,\dots,k$. First let us consider the special case when $\mu_j \in X$. The data partition is defined by associating each data point with the cluster center closest to it, where closeness is measured by a cost function $d : X \times X \to [0,\infty)$. Traditionally the k-means method considers Euclidean spaces $X = \mathbb{R}^\kappa$, where typically we choose $d(x,y) = |x-y|^2 = \sum_{i=1}^{\kappa}(x_i - y_i)^2$. We define the energy for a choice of cluster centers given data by

$$f_n : X^k \to \mathbb{R}, \qquad f_n(\mu|\Psi_n) = \frac{1}{n}\sum_{i=1}^{n} \bigwedge_{j=1}^{k} d(\xi_i, \mu_j),$$

where for any $k$ variables $a_1, a_2, \dots, a_k$, $\bigwedge_{j=1}^{k} a_j := \min\{a_1,\dots,a_k\}$. The optimal choice of $\mu$ is that which minimizes $f_n(\cdot|\Psi_n)$. We define

$$\hat\theta_n = \min_{\mu \in X^k} f_n(\mu|\Psi_n) \in \mathbb{R}.$$

An associated "limiting problem" can be defined,

$$\theta = \min_{\mu \in X^k} f_\infty(\mu),$$

where we assume, in a sense which will be made precise later, that $\xi_i \overset{iid}{\sim} P$ for some suitable probability distribution $P$, and define

$$f_\infty(\mu) = \int \bigwedge_{j=1}^{k} d(x,\mu_j) \, P(dx).$$

In Section 3 we validate the formulation by first showing that, under regularity conditions and with probability one, the minimum energy converges: $\hat\theta_n \to \theta$; and secondly by showing that (up to a subsequence) the minimizers converge: $\mu^n \to \mu^\infty$, where $\mu^n$ minimizes $f_n$ and $\mu^\infty$ minimizes $f_\infty$ (again with probability one).

In a more sophisticated version of the k-means method the requirement that $\mu_j \in X$ can be relaxed. We instead allow $\mu = (\mu_1, \mu_2, \dots, \mu_k) \in Y^k$ for some other Banach space $Y$, with $d$ defined appropriately. This leads to interesting statistical questions. When $Y$ is infinite dimensional, even establishing whether or not a minimizer exists is non-trivial.

When the cluster center is in a different space to the data, bounding the set of minimizers becomes less natural. For example, consider the smoothing problem in which one wishes to fit a continuous function to a set of data points. The natural choice of cost function is a pointwise distance of the data to the curve. The optimal solution is for the cluster center to interpolate the data points: in the limit the cluster center may no longer be well defined. In particular we cannot hope to have converging sequences of minimizers.

In the smoothing literature this problem is prevented by using a regularization term $r : Y^k \to \mathbb{R}$. For a cost function $d : X \times Y \to [0,\infty)$ the energies $f_n(\cdot|\Psi_n), f_\infty(\cdot) : Y^k \to \mathbb{R}$ are redefined:

$$f_n(\mu|\Psi_n) = \frac{1}{n}\sum_{i=1}^{n} \bigwedge_{j=1}^{k} d(\xi_i,\mu_j) + \lambda_n r(\mu),$$

$$f_\infty(\mu) = \int \bigwedge_{j=1}^{k} d(x,\mu_j) \, P(dx) + \lambda r(\mu).$$

Adding regularization changes the nature of the problem, so we commit time in Section 4 to justifying our approach. In particular we motivate treating $\lambda_n = \lambda$ as a constant independent of $n$. We are able to repeat the analysis from Section 3; that is, to establish that the minimum and a subsequence of minimizers still converge.
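Before reviewing the literature, it may help to see the basic (unregularized) objects above in code. The following is a minimal sketch (our illustration; the paper itself contains no code) of the energy $f_n$ and the classical Lloyd iteration for the Euclidean case $X = \mathbb{R}^\kappa$ with $d(x,y) = |x-y|^2$. Lloyd's algorithm only guarantees a local minimum, so the printed energies are upper bounds for $\hat\theta_n$; still, one can observe them stabilizing as $n$ grows, consistent with the convergence $\hat\theta_n \to \theta$ established below.

    import numpy as np

    def kmeans_energy(data, centers):
        # f_n(mu | Psi_n) = (1/n) sum_i min_j d(xi_i, mu_j), with d(x, y) = |x - y|^2
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return dists.min(axis=1).mean()

    def lloyd(data, k, iters=50, seed=0):
        # Alternate nearest-center assignment and re-centering; each sweep cannot
        # increase f_n, so the energy converges (to a local, not global, minimum).
        rng = np.random.default_rng(seed)
        centers = data[rng.choice(len(data), size=k, replace=False)].copy()
        for _ in range(iters):
            dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            for j in range(k):
                if (labels == j).any():
                    centers[j] = data[labels == j].mean(axis=0)
        return centers

    rng = np.random.default_rng(1)
    for n in [100, 1000, 10000]:
        # samples from a fixed two-component Gaussian mixture P
        data = np.vstack([rng.normal(-2.0, 1.0, (n // 2, 2)),
                          rng.normal(2.0, 1.0, (n // 2, 2))])
        print(n, kmeans_energy(data, lloyd(data, k=2)))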
Early results assumed $Y = X$ were Euclidean spaces and showed the convergence of minimizers to the appropriate limit [18,25]. The motivation for the early work in this area was to show consistency of the methodology. In particular this requires there to be an underlying 'truth', and hence the assumption that there exists a unique minimizer to the limiting energy. These results do not hold when the limiting energy has more than one minimizer [4]. In this paper we discuss only the convergence of the method, and as such require no assumption as to the existence or uniqueness of a minimizer to the limiting problem. Consistency has been strengthened to a central limit theorem in [26], again assuming a unique minimizer to the limiting energy. Other rates of convergence have been shown in [2,3,9,22]. In Hilbert spaces there exist convergence results and rates of convergence for the minimum. In [5] the authors show that $|f_n(\mu^n) - f_\infty(\mu^\infty)|$ is of order $\frac{1}{\sqrt{n}}$; however, there are no results for the convergence of minimizers. Results exist for $k \to \infty$, see for example [8] (which are also valid for $Y \neq X$).

Assuming that $Y = X$, the convergence of the minimization problem in a reflexive and separable Banach space has been proved in [21], and a similar result in metric spaces in [20]. In [19], the existence of a weakly converging subsequence was inferred using the results of [21].

In the following section we introduce the notation and preliminary material used in this paper.

We then, in Section 3, consider convergence in the special case when the cluster centers are in the same space as the data points, i.e. $Y = X$. In this case we don't have an issue with well-posedness, as the data has the same dimension as the cluster centers. For this reason we use energies defined without regularization. Theorem 3.5 shows that the minimum converges, i.e. $\hat\theta_n \to \theta$ as $n \to \infty$, for almost every sequence of observations, and furthermore that we have a subsequence $\mu^{n_m}$ of minimizers of $f_{n_m}$ which weakly converges to some $\mu^\infty$ which minimizes $f_\infty$.

This result is generalized in Section 4 to arbitrary $X$ and $Y$. The analogous result to Theorem 3.5 is Theorem 4.6. We first motivate the problem, and in particular our choice of scaling in the regularization, in Section 4.1, before proceeding to the results in Section 4.2. Verifying the conditions on the cost function $d$ and regularization term $r$ is non-trivial, and so we show an application to the smoothing-data association problem in Section 4.3.

To demonstrate the generality of the results in this paper, two applications are considered in Section 5. The first is the data association and smoothing problem. We show the minimum converging as the data size increases. We also numerically investigate the use of the k-means energy to determine whether two targets have crossed tracks. The second example uses measured times of arrival and amplitudes of signals from moving sources that are received across a network of three sensors. The cluster centers are the source trajectories in $\mathbb{R}^2$.

2 Preliminaries

In this section we introduce some notation and background theory which will be used in Sections 3 and 4 to establish our convergence results. In these sections we show the existence of optimal cluster centers using the direct method. By imposing conditions such that our energies are weakly lower semi-continuous, we can deduce the existence of minimizers. Further conditions ensure the minimizers are uniformly bounded. The $\Gamma$-convergence framework (e.g. [6,13]) allows us to establish the convergence of the minimum and also the convergence of minimizers.

We have the following definition of $\Gamma$-convergence with respect to weak convergence.

Definition 2.1 ($\Gamma$-convergence). A sequence $f_n : A \to \mathbb{R} \cup \{\pm\infty\}$ on a Banach space $(A, \|\cdot\|_A)$ is said to $\Gamma$-converge on the domain $A$ to $f_\infty : A \to \mathbb{R} \cup \{\pm\infty\}$ with respect to weak convergence on $A$, and we write $f_\infty = \Gamma\text{-}\lim_n f_n$, if for all $x \in A$ we have

(i) (lim inf inequality) for every sequence $(x_n)$ weakly converging to $x$,
$$f_\infty(x) \leq \liminf_n f_n(x_n);$$

(ii) (recovery sequence) there exists a sequence $(x_n)$ weakly converging to $x$ such that
$$f_\infty(x) \geq \limsup_n f_n(x_n).$$

When it exists, the $\Gamma$-limit is always weakly lower semi-continuous, and thus admits minimizers. An important property of $\Gamma$-convergence is that it implies the convergence of minimizers. In particular, we will make extensive use of the following well-known result.
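To fix intuition before stating the theorem, here is a standard one-dimensional illustration (a textbook example, not taken from this paper). On $A = \mathbb{R}$, where weak and strong convergence coincide, take
$$f_n(x) = x^2 + \sin(nx).$$
The sequence has no pointwise limit, but $\Gamma\text{-}\lim_n f_n = f_\infty$ with $f_\infty(x) = x^2 - 1$. The lim inf inequality holds since $f_n(x_n) \geq x_n^2 - 1 \to x^2 - 1$ whenever $x_n \to x$; a recovery sequence is obtained by picking $x_n \to x$ with $\sin(n x_n) = -1$ (such points are spaced $2\pi/n$ apart), so that $f_n(x_n) = x_n^2 - 1 \to f_\infty(x)$. Since $f_n(x) \geq x^2 - 1$ and $f_n(0) = 0$, the infima are attained on the fixed compact set $K = [-2,2]$, and, exactly as the theorem below asserts, $\min f_n \to -1 = \min f_\infty$ while the minimizers of $f_n$ converge to $x = 0$, the unique minimizer of $f_\infty$.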
Theorem 2.1 (Convergence of Minimizers). Let $f_n : A \to \mathbb{R}$ be a sequence of functionals on a Banach space $(A, \|\cdot\|_A)$ and assume that there exist $N > 0$ and a weakly compact subset $K \subset A$ with
$$\inf_A f_n = \inf_K f_n \qquad \forall n > N.$$
If $f_\infty = \Gamma\text{-}\lim_n f_n$ and $f_\infty$ is not identically $\pm\infty$, then
$$\min_A f_\infty = \lim_{n} \inf_A f_n.$$
Furthermore, if each $f_n$ is weakly lower semi-continuous, then each $f_n$ has a minimizer $x_n \in K$, and any weak limit point of $(x_n)$ minimizes $f_\infty$. Since $K$ is weakly compact, there exists at least one weak limit point.

A proof of the theorem can be found in [6, Theorem 1.21].

The problems which we address involve random observations. We assume throughout the existence of a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, rich enough to support a countably infinite sequence of such observations, $\xi_1(\omega), \dots$. All random elements are defined upon this common probability space and all stochastic quantifiers are to be understood as acting with respect to $\mathbb{P}$ unless otherwise stated. Where appropriate, to emphasize the randomness of the functionals $f_n$, we will write $f_n^{(\omega)}$ to indicate the functional associated with the particular observation sequence $\xi_1(\omega), \dots, \xi_n(\omega)$, and we allow $P_n^{(\omega)}$ to denote the associated empirical measure.

We define the support of a (probability) measure to be the smallest closed set such that its complement is null.

For clarity we often write integrals using operator notation, i.e. for a measure $P$, which is usually a probability distribution, we write
$$Ph = \int h(x) \, P(dx).$$
For a sequence of probability distributions $P_n$, we say that $P_n$ converges weakly to $P$ if
$$P_n h \to P h \quad \text{for all bounded and continuous } h,$$
and we write $P_n \Rightarrow P$. With a slight abuse of notation we will sometimes write $P(U) := P I_U$ for a measurable set $U$.

For a Banach space $A$ one can define the dual space $A^*$ to be the space of all bounded and linear maps over $A$ into $\mathbb{R}$, equipped with the norm $\|F\|_{A^*} = \sup_{\|x\|_A \leq 1} |F(x)|$. Similarly one can define the second dual $A^{**}$ as the space of all bounded and linear maps over $A^*$ into $\mathbb{R}$. Reflexive spaces are defined to be spaces $A$ such that $A$ is isometrically isomorphic to $A^{**}$. These have the useful property that closed and bounded sets are weakly compact. For example any $L^p$ space (with $1 < p < \infty$) is reflexive, as is any Hilbert space (by the Riesz Representation Theorem: if $A$ is a Hilbert space then $A^*$ is isometrically isomorphic to $A$).

A sequence $x_n \in A$ is said to converge weakly to $x \in A$ if $F(x_n) \to F(x)$ for all $F \in A^*$; we write $x_n \rightharpoonup x$. We say a functional $G : A \to \mathbb{R}$ is weakly continuous if $G(x_n) \to G(x)$ whenever $x_n \rightharpoonup x$, and strongly continuous if $G(x_n) \to G(x)$ whenever $\|x_n - x\|_A \to 0$. Note that weak continuity implies strong continuity. Similarly, a functional $G$ is weakly lower semi-continuous if $\liminf_{n\to\infty} G(x_n) \geq G(x)$ whenever $x_n \rightharpoonup x$.

We define the Sobolev spaces $W^{s,p}(I)$ on $I \subseteq \mathbb{R}$ by
$$W^{s,p} = W^{s,p}(I) = \left\{ f : I \to \mathbb{R} \ \text{s.t.} \ \partial^i f \in L^p(I) \ \text{for} \ i = 0,\dots,s \right\},$$
where we use $\partial$ for the weak derivative, i.e. $g = \partial f$ if for all $\phi \in C_c^\infty(I)$ (the space of smooth functions with compact support)
$$\int_I f(x) \frac{d\phi}{dx}(x) \, dx = -\int_I g(x)\phi(x) \, dx.$$
In particular, we will use the special case when $p = 2$, and we write $H^s = W^{s,2}$. This is a Hilbert space with norm
$$\|f\|_{H^s}^2 = \sum_{i=0}^{s} \|\partial^i f\|_{L^2}^2.$$

For two real-valued and positive sequences $a_n$ and $b_n$ we write $a_n \lesssim b_n$ if $\frac{a_n}{b_n}$ is bounded. For a space $A$ and a set $K \subset A$ we write $K^c$ for the complement of $K$ in $A$, i.e. $K^c = A \setminus K$.
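As a concrete companion to the $H^s$ definition, the norm of a smooth 1-periodic function can be approximated from equispaced samples, using the fact that the weak derivative acts on the Fourier coefficient $c_l$ as multiplication by $2\pi i l$. A minimal sketch (our illustration; the function name and the FFT-based discretization are assumptions, not from the paper):

    import numpy as np

    def hs_norm_periodic(samples, s):
        # Approximate ||f||_{H^s} for a smooth 1-periodic f from n equispaced
        # samples on [0, 1): ||f||_{H^s}^2 = sum_l sum_{i<=s} (2*pi*l)^(2i) |c_l|^2.
        n = len(samples)
        c = np.fft.fft(samples) / n          # Fourier coefficients c_l
        l = np.fft.fftfreq(n, d=1.0 / n)     # integer frequencies l
        weights = sum((2 * np.pi * l) ** (2 * i) for i in range(s + 1))
        return np.sqrt((weights * np.abs(c) ** 2).sum())

    # sanity check: f(t) = sin(2*pi*t) has ||f||_{L2}^2 = 1/2 and ||f'||_{L2}^2 = 2*pi^2,
    # so the H^1 norm should be close to sqrt(0.5 + 2*pi**2) ~ 4.499
    t = np.arange(512) / 512
    print(hs_norm_periodic(np.sin(2 * np.pi * t), s=1))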
3 Convergence when Y = X

We assume we are given data points $\xi_i \in X$ for $i = 1, 2, \dots$, where $X$ is a reflexive and separable Banach space with norm $\|\cdot\|_X$ and Borel $\sigma$-algebra $\mathcal{X}$. These data points realize a sequence of $\mathcal{X}$-measurable random elements on $(\Omega, \mathcal{F}, \mathbb{P})$ which will also be denoted, with a slight abuse of notation, $\xi_i$.

We define

$$f_n^{(\omega)} : X^k \to \mathbb{R}, \qquad f_n^{(\omega)}(\mu) = P_n^{(\omega)} g_\mu = \frac{1}{n}\sum_{i=1}^{n} \bigwedge_{j=1}^{k} d(\xi_i(\omega), \mu_j) \qquad (1)$$

$$f_\infty : X^k \to \mathbb{R}, \qquad f_\infty(\mu) = P g_\mu = \int_X \bigwedge_{j=1}^{k} d(x, \mu_j) \, P(dx) \qquad (2)$$

where

$$g_\mu(x) = \bigwedge_{j=1}^{k} d(x, \mu_j),$$

$P$ is a probability measure on $(X, \mathcal{X})$, and the empirical measure $P_n^{(\omega)}$ associated with $\xi_1(\omega), \dots, \xi_n(\omega)$ is defined by

$$P_n^{(\omega)} h = \frac{1}{n}\sum_{i=1}^{n} h(\xi_i(\omega))$$

for any $\mathcal{X}$-measurable function $h : X \to \mathbb{R}$. We assume the $\xi_i$ are iid according to $P$, with $P = \mathbb{P} \circ \xi_i^{-1}$.

We wish to show

$$\hat\theta_n^{(\omega)} \to \theta \quad \text{for almost every } \omega \text{ as } n \to \infty \qquad (3)$$

where

$$\hat\theta_n^{(\omega)} = \inf_{\mu \in X^k} f_n^{(\omega)}(\mu), \qquad \theta = \inf_{\mu \in X^k} f_\infty(\mu).$$

We define $\|\cdot\|_k : X^k \to [0,\infty)$ by

$$\|\mu\|_k := \max_j \|\mu_j\|_X \quad \text{for } \mu = (\mu_1, \mu_2, \dots, \mu_k) \in X^k. \qquad (4)$$

The reflexivity of $(X, \|\cdot\|_X)$ carries through to $(X^k, \|\cdot\|_k)$.

Our strategy is similar to that of [25], but we embed the methodology into the $\Gamma$-convergence framework. We show that (2) is the $\Gamma$-limit in Theorem 3.2 and that minimizers are bounded in Proposition 3.3. We may then apply Theorem 2.1 to infer (3) and the existence of a weakly converging subsequence of minimizers.

The key assumptions on $d$ and $P$ are given in Assumptions 1. The first assumption can be understood as a 'closeness' condition for the space $X$ with respect to $d$. If we let $d(x,y) = 1$ for $x \neq y$ and $d(x,x) = 0$, then our cost function $d$ does not carry any information on how far apart two points are. Assume there exists a probability density for $P$ which has unbounded support. Then $f_n^{(\omega)}(\mu) \geq \frac{n-k}{n}$ (for almost every $\omega$), with equality when we choose $\mu_j \in \{\xi_i(\omega)\}_{i=1}^n$. I.e. any set of $k$ unique data points will minimize $f_n^{(\omega)}$. Since our data points are unbounded we may find a sequence $\|\xi_{i_n}(\omega)\|_X \to \infty$. Now we choose $\mu_1^n = \xi_{i_n}(\omega)$, and clearly our cluster center is unbounded. We see that this choice of $d$ violates the first assumption. We also add a moment condition to the upper bound to ensure integrability. Note that this also implies that $P d(\cdot,0) \leq \int_X M(\|x\|_X) P(dx) < \infty$, so $f_\infty(0) < \infty$ and, in particular, $f_\infty$ is not identically infinity.

The second assumption is a slightly stronger condition on $d$ than a weak lower semi-continuity condition in the first variable and strong continuity in the second variable. The condition allows the application of Fatou's lemma for weakly converging probabilities, see [15].

The third assumption allows us to view $d(\xi_i, y)$ as a collection of random variables. The fourth implies that we have at least $k$ open balls with positive probability, and therefore we are not overfitting clusters to data.

Assumptions 1. We have the following assumptions on $d : X \times X \to [0,\infty)$ and $P$.

1.1. There exist continuous, strictly increasing functions $m, M : [0,\infty) \to [0,\infty)$ such that
$$m(\|x-y\|_X) \leq d(x,y) \leq M(\|x-y\|_X) \quad \text{for all } x,y \in X,$$
with $\lim_{r\to\infty} m(r) = \infty$ and $M(0) = 0$; there exists $\gamma < \infty$ such that $M(\|x\|_X + \|y\|_X) \leq \gamma M(\|x\|_X) + \gamma M(\|y\|_X)$; and finally $\int_X M(\|x\|_X) \, P(dx) < \infty$ (and $M$ is measurable).

1.2. For each $x, y \in X$ we have that if $x_m \to x$ and $y_n \rightharpoonup y$ as $n, m \to \infty$ then
$$\liminf_{n,m\to\infty} d(x_m, y_n) \geq d(x,y) \quad \text{and} \quad \lim_{m\to\infty} d(x_m, y) = d(x,y).$$

1.3. For each $y \in X$ we have that $d(\cdot, y)$ is $\mathcal{X}$-measurable.

1.4. There exist $k$ different centers $\mu_j^\dagger \in X$, $j = 1,2,\dots,k$, such that for all $\delta > 0$
$$P(B(\mu_j^\dagger, \delta)) > 0 \quad \forall j = 1,2,\dots,k,$$
where $B(\mu, \delta) := \{x \in X : \|\mu - x\|_X < \delta\}$.
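Before verifying these assumptions for a concrete cost function, note that the relationship between (1) and (2) can be seen numerically: for fixed centers $\mu$, the convergence $f_n^{(\omega)}(\mu) \to f_\infty(\mu)$ is just the strong law of large numbers applied to $g_\mu$. A minimal sketch (our illustration, with the toy choices $X = \mathbb{R}$, $P = \mathrm{Uniform}[0,1]$, $k = 2$ and $d(x,y) = |x-y|^2$):

    import numpy as np

    rng = np.random.default_rng(0)
    mu = (0.25, 0.75)  # fixed cluster centers

    def f_n(xi, mu):
        # empirical energy (1): average of g_mu(xi_i) = min_j |xi_i - mu_j|^2
        return np.minimum((xi - mu[0]) ** 2, (xi - mu[1]) ** 2).mean()

    # population energy (2) in closed form: mu_1 is the nearer center on [0, 1/2]
    # and mu_2 on [1/2, 1]; each piece integrates to 1/96, so f_infty(mu) = 1/48.
    f_infty = 1.0 / 48.0
    for n in [10 ** 2, 10 ** 4, 10 ** 6]:
        xi = rng.uniform(0.0, 1.0, n)
        print(n, f_n(xi, mu), "->", f_infty)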
We now show that, for a particular common choice of cost function $d$, Assumptions 1.1 to 1.3 hold.

Remark 3.1. For any $p > 0$ let $d(x,y) = \|x-y\|_X^p$; then $d$ satisfies Assumptions 1.1 to 1.3.

Proof. Taking $m(r) = M(r) = r^p$ we can bound $m(\|x-y\|_X) \leq d(x,y) \leq M(\|x-y\|_X)$, and $m, M$ clearly satisfy $m(r) \to \infty$, $M(0) = 0$, and are strictly increasing and continuous. One can also show that
$$M(\|x+y\|_X) \leq 2^{p-1}\left( \|x\|_X^p + \|y\|_X^p \right),$$
hence Assumption 1.1 is satisfied.

Let $x_m \to x$ and $y_n \rightharpoonup y$. Then
$$\liminf_{n,m\to\infty} d(x_m,y_n)^{1/p} = \liminf_{n,m\to\infty} \|x_m - y_n\|_X \geq \liminf_{n,m\to\infty} \big( \|y_n - x\|_X - \|x_m - x\|_X \big) = \liminf_{n\to\infty} \|y_n - x\|_X \geq \|y - x\|_X,$$
where the third step uses $x_m \to x$, and the last inequality follows as a consequence of the Hahn-Banach Theorem and the fact that $y_n - x \rightharpoonup y - x$, which implies $\liminf_{n\to\infty} \|y_n - x\|_X \geq \|y - x\|_X$. Clearly $d(x_m, y) \to d(x,y)$, and so Assumption 1.2 holds.

The third assumption holds by the Borel measurability of metrics on complete separable metric spaces.

We now state the first result of the paper, which formalizes the understanding that $f_\infty$ is the limit of $f_n^{(\omega)}$.

Theorem 3.2. Let $(X, \|\cdot\|_X)$ be a reflexive and separable Banach space with Borel $\sigma$-algebra $\mathcal{X}$; let $\{\xi_i\}_{i\in\mathbb{N}}$ be a sequence of independent $X$-valued random elements with common law $P$. Assume $d : X \times X \to [0,\infty)$ and $P$ satisfy the conditions in Assumptions 1. Define $f_n^{(\omega)} : X^k \to \mathbb{R}$ and $f_\infty : X^k \to \mathbb{R}$ by (1) and (2) respectively. Then
$$f_\infty = \Gamma\text{-}\lim_n f_n^{(\omega)}$$
for $\mathbb{P}$-almost every $\omega$.

Proof. Define $\Omega'$ as the intersection of three events:
$$\Omega' = \left\{ \omega \in \Omega : P_n^{(\omega)} \Rightarrow P \right\} \cap \left\{ \omega \in \Omega : P_n^{(\omega)}(B(0,q)^c) \to P(B(0,q)^c) \ \forall q \in \mathbb{N} \right\} \cap \left\{ \omega \in \Omega : \int_X I_{B(0,q)^c}(x) M(\|x\|_X) \, P_n^{(\omega)}(dx) \to \int_X I_{B(0,q)^c}(x) M(\|x\|_X) \, P(dx) \ \forall q \in \mathbb{N} \right\}.$$
By the almost sure weak convergence of the empirical measure, the first of these events has probability one; the second and third are characterized by the convergence of a countable collection of empirical averages to their population averages and, by the strong law of large numbers, each has probability one. Hence $\mathbb{P}(\Omega') = 1$.

Fix $\omega \in \Omega'$: we will show that the lim inf inequality holds and a recovery sequence exists for this $\omega$, and hence for every $\omega \in \Omega'$. We start by showing the lim inf inequality, allowing $\{\mu^n\}_{n=1}^\infty \subset X^k$ to denote any sequence which converges weakly to $\mu \in X^k$. We are required to show:
$$\liminf_{n\to\infty} f_n^{(\omega)}(\mu^n) \geq f_\infty(\mu).$$
By Theorem 1.1 in [15] we have
$$\int_X \liminf_{n\to\infty,\, x'\to x} g_{\mu^n}(x') \, P(dx) \leq \liminf_{n\to\infty} \int_X g_{\mu^n}(x) \, P_n^{(\omega)}(dx) = \liminf_{n\to\infty} P_n^{(\omega)} g_{\mu^n}.$$
For each $x \in X$, we have by Assumption 1.2 that
$$\liminf_{x'\to x,\, n\to\infty} d(x', \mu_j^n) \geq d(x, \mu_j).$$
By taking the minimum over $j$ we have
$$\liminf_{x'\to x,\, n\to\infty} g_{\mu^n}(x') = \liminf_{x'\to x,\, n\to\infty} \bigwedge_{j=1}^{k} d(x', \mu_j^n) \geq \bigwedge_{j=1}^{k} d(x, \mu_j) = g_\mu(x).$$
Hence
$$\liminf_{n\to\infty} f_n^{(\omega)}(\mu^n) = \liminf_{n\to\infty} P_n^{(\omega)} g_{\mu^n} \geq \int_X g_\mu(x) \, P(dx) = f_\infty(\mu),$$
as required.

We now establish the existence of a recovery sequence for every $\omega \in \Omega'$ and every $\mu \in X^k$. Let $\mu^n = \mu \in X^k$. Let $\zeta_q$ be a sequence of $C^\infty(X)$ functions such that $0 \leq \zeta_q(x) \leq 1$ for all $x \in X$, $\zeta_q(x) = 1$ for $x \in B(0,q-1)$ and $\zeta_q(x) = 0$ for $x \notin B(0,q)$. Then the function $\zeta_q(x) g_\mu(x)$ is continuous in $x$ (and with respect to convergence in $\|\cdot\|_X$) for all $q$. We also have
$$\zeta_q(x) g_\mu(x) \leq \zeta_q(x) d(x,\mu_1) \leq \zeta_q(x) M(\|x - \mu_1\|_X) \leq \zeta_q(x) M(\|x\|_X + \|\mu_1\|_X) \leq M(q + \|\mu_1\|_X),$$
so $\zeta_q g_\mu$ is a continuous and bounded function; hence by the weak convergence of $P_n^{(\omega)}$ to $P$ we have
$$P_n^{(\omega)} \zeta_q g_\mu \to P \zeta_q g_\mu$$
as $n \to \infty$, for all $q \in \mathbb{N}$. For all $q \in \mathbb{N}$ we have
$$\limsup_{n\to\infty} |P_n^{(\omega)} g_\mu - P g_\mu| \leq \limsup_{n\to\infty} |P_n^{(\omega)} g_\mu - P_n^{(\omega)} \zeta_q g_\mu| + \limsup_{n\to\infty} |P_n^{(\omega)} \zeta_q g_\mu - P \zeta_q g_\mu| + |P \zeta_q g_\mu - P g_\mu| = \limsup_{n\to\infty} |P_n^{(\omega)} g_\mu - P_n^{(\omega)} \zeta_q g_\mu| + |P \zeta_q g_\mu - P g_\mu|.$$
Therefore,
$$\limsup_{n\to\infty} |P_n^{(\omega)} g_\mu - P g_\mu| \leq \liminf_{q\to\infty} \limsup_{n\to\infty} |P_n^{(\omega)} g_\mu - P_n^{(\omega)} \zeta_q g_\mu|,$$
by the dominated convergence theorem (which gives $|P \zeta_q g_\mu - P g_\mu| \to 0$ as $q \to \infty$). We now show that the right hand side of the above expression is equal to zero. We have
$$|P_n^{(\omega)} g_\mu - P_n^{(\omega)} \zeta_q g_\mu| \leq P_n^{(\omega)} I_{(B(0,q-1))^c} g_\mu \leq P_n^{(\omega)} I_{(B(0,q-1))^c} d(\cdot,\mu_1) \leq P_n^{(\omega)} I_{(B(0,q-1))^c} M(\|\cdot - \mu_1\|_X) \leq \gamma \left( P_n^{(\omega)} I_{(B(0,q-1))^c} M(\|\cdot\|_X) + M(\|\mu_1\|_X) P_n^{(\omega)} I_{(B(0,q-1))^c} \right),$$
which converges, as $n \to \infty$, to $\gamma \left( P I_{(B(0,q-1))^c} M(\|\cdot\|_X) + M(\|\mu_1\|_X) P I_{(B(0,q-1))^c} \right)$, and this in turn converges to $0$ as $q \to \infty$, where the last limit follows by the monotone convergence theorem. We have shown
$$\lim_{n\to\infty} |P_n^{(\omega)} g_\mu - P g_\mu| = 0.$$
Hence $f_n^{(\omega)}(\mu) \to f_\infty(\mu)$, as required.
Now we have established almost sure $\Gamma$-convergence, we establish the boundedness condition in Proposition 3.3 so we can apply Theorem 2.1.

Proposition 3.3. Assume the conditions of Theorem 3.2 and define $\|\cdot\|_k$ by (4). Then there exists $R > 0$ such that
$$\inf_{\mu \in X^k} f_n^{(\omega)}(\mu) = \inf_{\|\mu\|_k \leq R} f_n^{(\omega)}(\mu) \quad \text{for all } n \text{ sufficiently large},$$
for $\mathbb{P}$-almost every $\omega$. In particular $R$ is independent of $n$.

Proof. The structure of the proof is similar to [20, Lemma 2.1]. We argue by contradiction. In particular, we argue that if a cluster center is unbounded then in the limit the minimum is achieved over the remaining $k-1$ cluster centers. We then use Assumption 1.4 to imply that adding an extra cluster center will strictly decrease the minimum, and hence we have a contradiction.

We define $\Omega''$ to be
$$\Omega'' = \bigcap_{\delta \in \mathbb{Q} \cap (0,\infty),\ l=1,2,\dots,k} \left\{ \omega \in \Omega' : P_n^{(\omega)}(B(\mu_l^\dagger, \delta)) \to P(B(\mu_l^\dagger, \delta)) \right\}.$$
As $\Omega''$ is the countable intersection of sets of probability one, we have $\mathbb{P}(\Omega'') = 1$. Fix $\omega \in \Omega''$ and assume that the cluster centers $\mu^n \in X^k$ are almost minimizers, i.e.
$$f_n^{(\omega)}(\mu^n) \leq \inf_{\mu \in X^k} f_n^{(\omega)}(\mu) + \varepsilon_n$$
for some sequence $\varepsilon_n > 0$ such that
$$\lim_{n\to\infty} \varepsilon_n = 0. \qquad (5)$$
Assume that $\lim_{n\to\infty} \|\mu^n\|_k = \infty$. Then there exists $l_n \in \{1,\dots,k\}$ such that $\lim_{n\to\infty} \|\mu_{l_n}^n\|_X = \infty$. Fix $x \in X$; then
$$d(x, \mu_{l_n}^n) \geq m(\|\mu_{l_n}^n - x\|_X) \to \infty.$$
Therefore, for each $x \in X$,
$$\lim_{n\to\infty} \left( \bigwedge_{j=1}^{k} d(x,\mu_j^n) - \bigwedge_{j \neq l_n} d(x,\mu_j^n) \right) = 0.$$
Let $\delta > 0$; then there exists $N$ such that for $n \geq N$
$$\bigwedge_{j=1}^{k} d(x,\mu_j^n) - \bigwedge_{j \neq l_n} d(x,\mu_j^n) \geq -\delta.$$
Hence
$$\liminf_{n\to\infty} \int \left( \bigwedge_{j=1}^{k} d(x,\mu_j^n) - \bigwedge_{j \neq l_n} d(x,\mu_j^n) \right) P_n^{(\omega)}(dx) \geq -\delta.$$
Letting $\delta \to 0$ we have
$$\liminf_{n\to\infty} \int \left( \bigwedge_{j=1}^{k} d(x,\mu_j^n) - \bigwedge_{j \neq l_n} d(x,\mu_j^n) \right) P_n^{(\omega)}(dx) \geq 0,$$
and moreover
$$\liminf_{n\to\infty} \left( f_n^{(\omega)}(\mu^n) - f_n^{(\omega)}\big((\mu_j^n)_{j \neq l_n}\big) \right) \geq 0, \qquad (6)$$
where we interpret $f_n^{(\omega)}$ accordingly. It suffices to demonstrate that
$$\liminf_{n\to\infty} \left( \inf_{\mu \in X^k} f_n^{(\omega)}(\mu) - \inf_{\mu \in X^{k-1}} f_n^{(\omega)}(\mu) \right) < 0. \qquad (7)$$
Indeed, if (7) holds, then
$$\liminf_{n\to\infty} \left( f_n^{(\omega)}(\mu^n) - f_n^{(\omega)}\big((\mu_j^n)_{j \neq l_n}\big) \right) \leq \lim_{n\to\infty} \left( f_n^{(\omega)}(\mu^n) - \inf_{\mu \in X^k} f_n^{(\omega)}(\mu) \right) + \liminf_{n\to\infty} \left( \inf_{\mu \in X^k} f_n^{(\omega)}(\mu) - \inf_{\mu \in X^{k-1}} f_n^{(\omega)}(\mu) \right) < 0,$$
where the first bracket is at most $\varepsilon_n$, the second is eventually negative by (5) and (7), and we have used $f_n^{(\omega)}\big((\mu_j^n)_{j\neq l_n}\big) \geq \inf_{\mu \in X^{k-1}} f_n^{(\omega)}(\mu)$; but this contradicts (6).

We now establish (7). By Assumption 1.4 there exist $k$ centers $\mu_j^\dagger \in X$ and $\delta_1 > 0$ such that $\min_{j \neq l} \|\mu_j^\dagger - \mu_l^\dagger\|_X \geq \delta_1$. Hence for any $\mu \in X^{k-1}$ there exists $l \in \{1,2,\dots,k\}$ such that
$$\|\mu_l^\dagger - \mu_j\|_X \geq \frac{\delta_1}{2} \quad \text{for } j = 1,2,\dots,k-1.$$
Proceeding with this choice of $l$, for $x \in B(\mu_l^\dagger, \delta_2)$ (for any $\delta_2 \in (0, \delta_1/2)$) we have
$$\|\mu_j - x\|_X \geq \frac{\delta_1}{2} - \delta_2,$$
and therefore $d(\mu_j, x) \geq m(\frac{\delta_1}{2} - \delta_2)$ for all $j = 1,2,\dots,k-1$. Also
$$D_l(\mu) := \min_{j=1,2,\dots,k-1} d(x,\mu_j) - d(x,\mu_l^\dagger) \geq m\left(\frac{\delta_1}{2} - \delta_2\right) - M(\delta_2). \qquad (8)$$
So for $\delta_2$ sufficiently small there exists $\epsilon > 0$ such that $D_l(\mu) \geq \epsilon$. Since the right hand side is independent of $\mu \in X^{k-1}$,
$$\inf_{\mu \in X^{k-1}} \max_l D_l(\mu) \geq \epsilon.$$
Define the characteristic function
$$\chi_\mu(\xi) = \begin{cases} 1 & \text{if } \|\xi - \mu_{l(\mu)}^\dagger\|_X < \delta_2 \\ 0 & \text{otherwise,} \end{cases}$$
where $l(\mu)$ is the maximizer in (8). For each $\omega \in \Omega''$ one obtains
$$\inf_{\mu \in X^{k-1}} f_n^{(\omega)}(\mu) = \inf_{\mu \in X^{k-1}} \frac{1}{n}\sum_{i=1}^{n} \bigwedge_{j=1}^{k-1} d(\xi_i,\mu_j) \geq \inf_{\mu \in X^{k-1}} \frac{1}{n}\sum_{i=1}^{n} \left( \bigwedge_{j=1}^{k-1} d(\xi_i,\mu_j)\,(1 - \chi_\mu(\xi_i)) + \left( d(\xi_i, \mu_{l(\mu)}^\dagger) + \epsilon \right) \chi_\mu(\xi_i) \right) \geq \inf_{\mu \in X^k} f_n^{(\omega)}(\mu) + \epsilon \min_{l=1,2,\dots,k} P_n^{(\omega)}(B(\mu_l^\dagger, \delta_2)).$$
Then since $P_n^{(\omega)}(B(\mu_l^\dagger, \delta_2)) \to P(B(\mu_l^\dagger, \delta_2)) > 0$ by Assumption 1.4 (for $\delta_2 \in \mathbb{Q} \cap (0,\infty)$), we can conclude that (7) holds.

Remark 3.4. One can easily show that Assumption 1.2 implies that $d$ is weakly lower semi-continuous in its second argument, which carries through to $f_n^{(\omega)}$. It follows that on any bounded (or equivalently, as $X$ is reflexive, weakly compact) set the infimum of $f_n^{(\omega)}$ is achieved. Hence the infimum in Proposition 3.3 is actually a minimum.

We now easily prove convergence by application of Theorem 2.1.
Theorem 3.5. Assume the conditions of Theorem 3.2 and Proposition 3.3. Then the minimization problem associated with the k-means method converges; that is, for $\mathbb{P}$-almost every $\omega$:
$$\min_{\mu \in X^k} f_\infty(\mu) = \lim_{n\to\infty} \min_{\mu \in X^k} f_n^{(\omega)}(\mu).$$
Furthermore, any sequence of minimizers $\mu^n$ of $f_n^{(\omega)}$ is almost surely weakly precompact, and any weak limit point minimizes $f_\infty$.

4 The Case of General Y

In the previous section the data, $\xi_i$, and cluster centers, $\mu_j$, took their values in a common space, $X$. We now remove this restriction and let $\xi_i : \Omega \to X$ and $\mu_j \in Y$. We may want to use this framework to deal with finite dimensional data and infinite dimensional cluster centers, which can lead to the variational problem having uninformative minimizers.

In the previous section the cost function $d$ was assumed to scale with the underlying norm. This is no longer appropriate when $d : X \times Y \to [0,\infty)$. In particular, if we consider the smoothing-data association problem then the natural choice of $d$ is a pointwise distance, which will lead to the optimal cluster centers interpolating the data points. Hence, in any $H^s$ norm with $s \geq 1$, the optimal cluster centers "blow up".

One possible solution would be to weaken the space to $L^2$ and allow this type of behavior. This is undesirable from both modeling and mathematical perspectives. If we first consider the modeling point of view, then we do not expect our estimate to perfectly fit the data, which is observed in the presence of noise. It is natural that the cluster centers are smoother than the data alone would suggest; it is desirable that the optimal clusters should reflect reality. From the mathematical point of view, restricting ourselves to only very weak spaces gives no hope of obtaining a strongly convergent subsequence.

An alternative approach is, as is common in the smoothing literature, to use a regularization term. This approach is also standard when dealing with ill-posed inverse problems. This changes the nature of the problem and so requires some justification. In particular, the scaling of the regularization with the data is of fundamental importance. In the following section we argue that scaling motivated by a simple Bayesian interpretation of the problem is not strong enough (unsurprisingly, countable collections of finite dimensional observations do not carry enough information to provide consistency when dealing with infinite dimensional parameters). In the form of a simple example we show that the optimal cluster center is unbounded in the large data limit when the regularization goes to zero sufficiently quickly. The natural scaling in this example is for the regularization to vary with the number of observations as $n^p$ for $p \in [-\frac{4}{5}, 0]$. We consider the case $p = 0$ in Section 4.2. This type of regularization is understood as penalized likelihood estimation [16].

Although it may seem undesirable for the limiting problem to depend upon the regularization, it is unavoidable in ill-posed problems such as this one: there is not sufficient information, in even countably infinite collections of observations, to recover the unknown cluster centers, and exploiting known (or expected) regularity in these solutions provides one way to combine observations with qualitative prior beliefs about the cluster centers in a principled manner. There are many precedents for this approach, including [17], in which the consistency of penalized splines is studied using what in this paper we call the $\Gamma$-limit. In that paper a fixed regularization was used to define the limiting problem in order to derive an estimator. Naturally, regularization strong enough to alter the limiting problem influences the solution, and we cannot hope to obtain consistent estimation in this setting, even in settings in which the cost function can be interpreted as the log likelihood of the data generating process.
In the setting of [17], the regularization is finally scaled to zero, whereupon, under assumptions, the estimator converges to the truth; but such a step is not feasible in the more complicated settings considered here.

When more structure is available it may be desirable to further investigate the regularization. For example, with $k = 1$ the non-parametric regression model is equivalent to the white noise model [7], for which optimal scaling of the regularization is known [1,30]. It is the subject of further work to extend these results to $k > 1$.

With our redefined k-means type problem we can replicate the results of the previous section, and do so in Theorem 4.6. That is, we prove that the k-means method converges where $Y$ is a general separable and reflexive Banach space and in particular need not be equal to $X$.

This section is split into three subsections. In the first we motivate the regularization term. The second contains the convergence theory in a general setting. Establishing that the assumptions of this subsection hold is non-trivial and so, in the third subsection, we show an application to the smoothing-data association problem.

4.1 Regularization

In this section we use a toy, $k = 1$, smoothing problem to motivate an approach to regularization which is adopted in what follows. We assume that the cluster centers are periodic with equally spaced observations, so we may use a Fourier argument. In particular we work on the space of 1-periodic functions in $H^2$,
$$Y = \left\{ \mu : [0,1] \to \mathbb{R} \ \text{s.t.} \ \mu(0) = \mu(1) \ \text{and} \ \mu \in H^2 \right\}. \qquad (9)$$
For arbitrary sequences $(a_n), (b_n)$ and data $\Psi_n = \{(t_j, z_j)\}_{j=1}^{n} \subset [0,1] \times \mathbb{R}$ we define the functional
$$f_n^{(\omega)}(\mu) = a_n \sum_{j=0}^{n-1} |\mu(t_j) - z_j|^2 + b_n \|\partial^2 \mu\|_{L^2}^2. \qquad (10)$$
Data are points in space-time: $[0,1] \times \mathbb{R}$. The regularization is chosen so that it penalizes the $L^2$ norm of the second derivative. For simplicity, we employ deterministic measurement times $t_j$ in the following proposition, although this lies outside the formal framework which we consider subsequently. Another simplification we make is to use convergence in expectation rather than almost sure convergence. This simplifies our arguments. We stress that this section is the motivation for the problem studied in Section 4.2. We will give conditions on the scaling of $a_n$ and $b_n$ that determine whether $\mathbb{E}\min f_n^{(\omega)}$ and $\mathbb{E}\mu^n$ stay bounded, where $\mu^n$ is the minimizer of $f_n^{(\omega)}$.

Proposition 4.1. Let data be given by $\Psi_n = \{(t_j, z_j)\}_{j=1}^{n}$ with $t_j = \frac{j}{n}$, under the assumption $z_j = \mu^\dagger(t_j) + \epsilon_j$ for $\epsilon_j$ iid noise with finite variance and $\mu^\dagger \in L^2$, and define $Y$ by (9). Then $\inf_{\mu \in Y} f_n^{(\omega)}(\mu)$ defined by (10) stays bounded (in expectation) if $a_n = O(\frac{1}{n})$, for any positive sequence $b_n$.

Proof. Assume $n$ is odd. Both $\mu$ and $z$ are 1-periodic so we can write
$$\mu(t) = \frac{1}{n} \sum_{l=-\frac{n-1}{2}}^{\frac{n-1}{2}} \hat\mu_l e^{2\pi i l t} \quad \text{and} \quad z_j = \frac{1}{n} \sum_{l=-\frac{n-1}{2}}^{\frac{n-1}{2}} \hat z_l e^{\frac{2\pi i l j}{n}}$$
with
$$\hat\mu_l = \sum_{j=0}^{n-1} \mu(t_j) e^{-\frac{2\pi i l j}{n}} \quad \text{and} \quad \hat z_l = \sum_{j=0}^{n-1} z_j e^{-\frac{2\pi i l j}{n}}.$$
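Although the preview ends mid-proof, the Fourier expansion above already indicates what the minimizer of (10) looks like in practice: restricted to band-limited 1-periodic $\mu$, the functional decouples over frequencies (by Parseval's identity), and minimizing coefficient-wise gives $\hat\mu_l = \hat z_l / \big(1 + \frac{b_n}{a_n n}(2\pi l)^4\big)$, i.e. a low-pass filter whose strength is governed by $b_n / a_n$. A minimal numerical sketch under these assumptions (our illustration, not code from the paper):

    import numpy as np

    def smooth_periodic(z, a_n, b_n):
        # Minimize (10) over band-limited 1-periodic curves: in the Fourier basis
        # each coefficient is shrunk by 1 / (1 + (b_n / (a_n * n)) * (2*pi*l)^4).
        n = len(z)
        l = np.fft.fftfreq(n, d=1.0 / n)     # integer frequencies
        z_hat = np.fft.fft(z)
        mu_hat = z_hat / (1.0 + (b_n / (a_n * n)) * (2 * np.pi * l) ** 4)
        return np.fft.ifft(mu_hat).real

    # z_j = mu_dagger(t_j) + noise, with mu_dagger(t) = sin(2*pi*t) and t_j = j/n
    rng = np.random.default_rng(0)
    n = 101                                  # odd, matching the proof
    t = np.arange(n) / n
    z = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(n)
    mu = smooth_periodic(z, a_n=1.0 / n, b_n=1e-4)  # a_n = O(1/n), as in Prop. 4.1
    print(np.abs(mu - np.sin(2 * np.pi * t)).max())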
