Scenario Reduction Revisited: Fundamental Limits and Guarantees

Napat Rujeerapaiboon, Kilian Schindler, Daniel Kuhn, Wolfram Wiesemann

arXiv:1701.04072v1 [math.OC] 15 Jan 2017

Abstract  The goal of scenario reduction is to approximate a given discrete distribution with another discrete distribution that has fewer atoms. We distinguish continuous scenario reduction, where the new atoms may be chosen freely, and discrete scenario reduction, where the new atoms must be chosen from among the existing ones. Using the Wasserstein distance as measure of proximity between distributions, we identify those $n$-point distributions on the unit ball that are least susceptible to scenario reduction, i.e., that have maximum Wasserstein distance to their closest $m$-point distributions for some prescribed $m < n$. We also provide sharp bounds on the added benefit of continuous over discrete scenario reduction. Finally, to our best knowledge, we propose the first polynomial-time constant-factor approximations for both discrete and continuous scenario reduction as well as the first exact exponential-time algorithms for continuous scenario reduction.

Keywords  scenario reduction, Wasserstein distance, constant-factor approximation algorithm, k-median clustering, k-means clustering

Napat Rujeerapaiboon, Kilian Schindler, Daniel Kuhn
Risk Analytics and Optimization Chair, École Polytechnique Fédérale de Lausanne, Switzerland
Tel.: +41 (0)21 693 0036, Fax: +41 (0)21 693 2489
E-mail: napat.rujeerapaiboon@epfl.ch, kilian.schindler@epfl.ch, daniel.kuhn@epfl.ch

Wolfram Wiesemann
Imperial College Business School, Imperial College London, United Kingdom
Tel.: +44 (0)20 7594 9150
E-mail: [email protected]

1 Introduction

The vast majority of numerical solution schemes in stochastic programming rely on a discrete approximation of the true (typically continuous) probability distribution governing the uncertain problem parameters. This discrete approximation is often generated by sampling from the true distribution. Alternatively, it could be constructed directly from real historical observations of the uncertain parameters. To obtain a faithful approximation of the true distribution, however, the discrete distribution must have a large number $n$ of support points or scenarios, which may render the underlying stochastic program computationally excruciating.

An effective means to ease the computational burden is to rely on scenario reduction, pioneered by Dupačová et al (2003), which aims to approximate the initial $n$-point distribution with a simpler $m$-point distribution ($m < n$) that is as close as possible to the initial distribution with respect to a probability metric; see also Heitsch and Römisch (2003). The modern stability theory of stochastic programming surveyed by Dupačová (1990) and Römisch (2003) indicates that the Wasserstein distance may serve as a natural candidate for this probability metric.

Our interest in Wasserstein distance-based scenario reduction is also fuelled by recent progress in data-driven distributionally robust optimization, where it has been shown that the worst-case expectation of an uncertain cost over all distributions in a Wasserstein ball can often be computed efficiently via convex optimization (Mohajerin Esfahani and Kuhn 2015, Zhao and Guan 2015, Gao and Kleywegt 2016). A Wasserstein ball is defined as the family of all distributions that are within a certain Wasserstein distance from a discrete reference distribution.
As distributionally robust optimization problems over Wasserstein balls are harder to solve than their stochastic counterparts, we expect significant computational savings from replacing the initial $n$-point reference distribution with a new $m$-point reference distribution. The benefits of scenario reduction may be particularly striking for two-stage distributionally robust linear programs, which admit tight approximations as semidefinite programs (Hanasusanto and Kuhn 2016).

Suppose now that the initial distribution is given by $P = \sum_{i \in I} p_i \delta_{\xi_i}$, where $\xi_i \in \mathbb{R}^d$ and $p_i \in [0,1]$ represent the location and probability of the $i$-th scenario of $P$ for $i \in I = \{1, \ldots, n\}$. Similarly, assume that the reduced target distribution is representable as $Q = \sum_{j \in J} q_j \delta_{\zeta_j}$, where $\zeta_j \in \mathbb{R}^d$ and $q_j \in [0,1]$ stand for the location and probability of the $j$-th scenario of $Q$ for $j \in J = \{1, \ldots, m\}$. Then, the type-$l$ Wasserstein distance between $P$ and $Q$ is defined through

$$d_l(P, Q) \;=\; \min_{\Pi \in \mathbb{R}_+^{n \times m}} \left\{ \left( \sum_{i \in I} \sum_{j \in J} \pi_{ij} \, \|\xi_i - \zeta_j\|^l \right)^{1/l} \;:\; \sum_{j \in J} \pi_{ij} = p_i \ \forall i \in I, \;\; \sum_{i \in I} \pi_{ij} = q_j \ \forall j \in J \right\},$$

where $l \geq 1$ and $\|\cdot\|$ denotes some norm on $\mathbb{R}^d$; see, e.g., Heitsch and Römisch (2007) or Pflug and Pichler (2011). The linear program in the definition of the Wasserstein distance can be viewed as a minimum-cost transportation problem, where $\pi_{ij}$ represents the amount of probability mass shipped from $\xi_i$ to $\zeta_j$ at unit transportation cost $\|\xi_i - \zeta_j\|^l$. Thus, $d_l^l(P, Q)$ quantifies the minimum cost of moving the initial distribution $P$ to the target distribution $Q$.

For any $\Xi \subseteq \mathbb{R}^d$, we denote by $\mathcal{P}_E(\Xi, n)$ the set of all uniform discrete distributions on $\Xi$ with exactly $n$ distinct scenarios and by $\mathcal{P}(\Xi, m)$ the set of all (not necessarily uniform) discrete distributions on $\Xi$ with at most $m$ scenarios. We henceforth assume that $P \in \mathcal{P}_E(\mathbb{R}^d, n)$. This assumption is crucial for the simplicity of the results in Sections 2 and 3, and it is almost surely satisfied whenever $P$ is obtained via sampling from a continuous probability distribution. Hence, we can think of $P$ as an empirical distribution. To remind us of this interpretation, we will henceforth denote the initial distribution by $\hat{P}_n$. Note that the pairwise distinctness of the scenarios can always be enforced by slightly perturbing their locations, while the uniformity of their probabilities can be enforced by decomposing the scenarios into clusters of close but mutually distinct sub-scenarios with (smaller) uniform probabilities.

We are now ready to introduce the continuous scenario reduction problem

$$C_l(\hat{P}_n, m) \;=\; \min_Q \left\{ d_l(\hat{P}_n, Q) \;:\; Q \in \mathcal{P}(\mathbb{R}^d, m) \right\},$$

where the new scenarios $\zeta_j$, $j \in J$, of the target distribution $Q$ may be chosen freely from within $\mathbb{R}^d$, as well as the discrete scenario reduction problem

$$D_l(\hat{P}_n, m) \;=\; \min_Q \left\{ d_l(\hat{P}_n, Q) \;:\; Q \in \mathcal{P}(\mathrm{supp}(\hat{P}_n), m) \right\},$$

where the new scenarios must be chosen from within the support of the empirical distribution, which is given by the finite set $\mathrm{supp}(\hat{P}_n) = \{\xi_i : i \in I\}$. Even though the continuous scenario reduction problem offers more flexibility and is therefore guaranteed to find (weakly) better approximations to the initial empirical distribution, to our best knowledge, the existing stochastic programming literature has exclusively focused on the discrete scenario reduction problem.
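Viewed this way, $d_l(P, Q)$ is just a small transportation linear program. The following minimal sketch (ours, not from the paper) assembles and solves it with `scipy.optimize.linprog`; the distribution data and the function name `wasserstein` are illustrative assumptions.

```python
# Sketch: type-l Wasserstein distance between two discrete distributions,
# solved as the minimum-cost transportation LP from the definition above.
import numpy as np
from scipy.optimize import linprog

def wasserstein(xi, p, zeta, q, l=2, norm_ord=2):
    """d_l(P, Q) for P = sum_i p_i * delta_{xi_i}, Q = sum_j q_j * delta_{zeta_j}."""
    n, m = len(p), len(q)
    # Unit cost c_{ij} = ||xi_i - zeta_j||^l, flattened row-major.
    cost = np.array([[np.linalg.norm(xi[i] - zeta[j], ord=norm_ord) ** l
                      for j in range(m)] for i in range(n)]).ravel()
    # Row sums of the transport plan must equal p, column sums must equal q.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0        # sum_j pi_ij = p_i
    for j in range(m):
        A_eq[n + j, j::m] = 1.0                 # sum_i pi_ij = q_j
    b_eq = np.concatenate([p, q])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun ** (1.0 / l)

# Illustrative data: a uniform 4-point distribution and a 2-point candidate.
rng = np.random.default_rng(0)
xi = rng.standard_normal((4, 2))
print(wasserstein(xi, np.full(4, 0.25), xi[:2], np.array([0.5, 0.5])))
```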
Note that if the support points $\zeta_j$, $j \in J$, are fixed, then both scenario reduction problems simplify to a linear program over the probabilities $q_j$, $j \in J$, which admits an explicit solution (Dupačová et al 2003, Theorem 2). Otherwise, however, both problems are intractable. Indeed, if $l = 1$, then the discrete scenario reduction problem represents a metric $k$-median problem with $k = m$, which was shown to be NP-hard by Kariv and Hakimi (1979). If $l = 2$ and distances in $\mathbb{R}^d$ are measured by the 2-norm, on the other hand, then the continuous scenario reduction problem constitutes a $k$-means clustering problem with $k = m$, which is NP-hard even if $d = 2$ or $m = 2$; see Mahajan et al (2009) and Aloise et al (2009).

Heitsch and Römisch (2003) have shown that the discrete scenario reduction problem admits a reformulation as a mixed-integer linear program (MILP), which can be solved to global optimality for $n \lesssim 10^3$ using off-the-shelf solvers. For larger instances, however, one must resort to approximation algorithms. Most large-scale discrete scenario reduction problems are nowadays solved with a greedy heuristic that was originally devised by Dupačová et al (2003) and further refined by Heitsch and Römisch (2003); a schematic sketch of this approach follows below. For example, this heuristic is routinely used for scenario (tree) reduction in the context of power systems operations, see, e.g., Römisch and Vigerske (2010) or Morales et al (2009) and the references therein. Despite its practical success, we will show in Section 4 that this heuristic fails to provide a constant-factor approximation for the discrete scenario reduction problem.
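For concreteness, the sketch below combines two ingredients from this discussion: the explicit redistribution rule for fixed support points (each scenario's probability mass travels to its nearest retained scenario, cf. Dupačová et al 2003, Theorem 2) and a forward-selection-style greedy step that grows the retained set one scenario at a time. This is a schematic rendering in the spirit of the heuristic, not the authors' exact algorithm; all names and structure are ours.

```python
# Sketch: greedy forward selection for discrete scenario reduction
# (in the spirit of the Dupacova et al. heuristic; schematic, not verbatim).
import numpy as np

def reduction_cost(xi, keep, l=2):
    """d_l(P_n, Q)^l for the best Q supported on xi[keep]: each scenario's
    mass is shipped to its nearest retained scenario (explicit redistribution)."""
    d = np.min(np.linalg.norm(xi[:, None, :] - xi[None, keep, :], axis=2), axis=1)
    return np.mean(d ** l)

def greedy_select(xi, m, l=2):
    keep = []
    for _ in range(m):
        # Add the scenario whose inclusion reduces the cost the most.
        best = min((i for i in range(len(xi)) if i not in keep),
                   key=lambda i: reduction_cost(xi, keep + [i], l))
        keep.append(best)
    return keep

rng = np.random.default_rng(1)
xi = rng.standard_normal((30, 2))
keep = greedy_select(xi, m=5)
print(keep, reduction_cost(xi, keep) ** 0.5)   # retained indices, d_2 value
```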
This paper extends the theory of scenario reduction along several dimensions.

(i) We establish fundamental performance guarantees for continuous scenario reduction when $l \in \{1, 2\}$, i.e., we show that the Wasserstein distance of the initial $n$-point distribution to its nearest $m$-point distribution is bounded by $\sqrt{\frac{n-m}{n-1}}$ across all initial distributions on the unit ball in $\mathbb{R}^d$. We show that for $l = 2$ this worst-case performance is attained by some initial distribution, which we construct explicitly. We also provide evidence indicating that this worst-case performance reflects the norm rather than the exception in high dimensions $d$. Finally, we provide a lower bound on the worst-case performance for $l = 1$.

(ii) We analyze the loss of optimality incurred by solving the discrete scenario reduction problem instead of its continuous counterpart. Specifically, we demonstrate that the ratio $D_l(\hat{P}_n, m) / C_l(\hat{P}_n, m)$ is bounded by $\sqrt{2}$ for $l = 2$ and by $2$ for $l = 1$. We also show that these bounds are essentially tight.

(iii) We showcase the intimate relation between scenario reduction and $k$-means clustering. By leveraging existing constant-factor approximation algorithms for $k$-median clustering problems due to Arya et al (2004) and the new performance bounds from (ii), we develop the first polynomial-time constant-factor approximation algorithms for both continuous and discrete scenario reduction. We also show that these algorithms can be warmstarted using the greedy heuristic by Dupačová et al (2003) to improve practical performance.

(iv) We present exact mixed-integer programming reformulations for the continuous scenario reduction problem.

Continuous scenario reduction is intimately related to the optimal quantization of probability distributions, where one seeks an $m$-point distribution approximating a non-discrete initial distribution. Research efforts in this domain have mainly focused on the asymptotic behavior of the quantization problem as $m$ tends to infinity, see Graf and Luschgy (2000). The ramifications of this stream of literature for stochastic programming are discussed by Pflug and Pichler (2011). Techniques familiar from scenario reduction lend themselves also to scenario generation, where one aims to construct a scenario tree with a prescribed branching structure that approximates a given stochastic process with respect to a probability metric, see, e.g., Pflug (2001) and Hochreiter and Pflug (2007).

The rest of this paper unfolds as follows. Section 2 seeks to identify $n$-point distributions on the unit ball that are least susceptible to scenario reduction, i.e., that have maximum Wasserstein distance to their closest $m$-point distributions, and Section 3 discusses sharp bounds on the added benefit of continuous over discrete scenario reduction. Section 4 presents exact exponential-time algorithms as well as polynomial-time constant-factor approximations for scenario reduction. Section 5 reports on numerical results for a color quantization experiment. Unless otherwise specified, we will always work with the 2-norm on $\mathbb{R}^d$ below.

Notation: We let $I$ be the identity matrix, $e$ the vector of all ones and $e_i$ the $i$-th standard basis vector, each of appropriate dimension. The $ij$-th element of a matrix $A$ is denoted by $a_{ij}$. For $A$ and $B$ in the space $\mathbb{S}^n$ of symmetric $n \times n$ matrices, the relation $A \succeq B$ means that $A - B$ is positive semidefinite. Generic norms are denoted by $\|\cdot\|$, while $\|\cdot\|_p$ stands for the $p$-norm, $p \geq 1$. For $\Xi \subseteq \mathbb{R}^d$, we define $\mathcal{P}(\Xi, m)$ as the set of all probability distributions supported on at most $m$ points in $\Xi$ and $\mathcal{P}_E(\Xi, n)$ as the set of all uniform distributions supported on exactly $n$ distinct points in $\Xi$. The support of a probability distribution $P$ is denoted by $\mathrm{supp}(P)$, and the Dirac distribution concentrating unit mass at $\xi$ is denoted by $\delta_\xi$.

2 Fundamental Limits of Scenario Reduction

In this section we characterize the Wasserstein distance $C_l(\hat{P}_n, m)$ between an $n$-point empirical distribution $\hat{P}_n = \frac{1}{n} \sum_{i=1}^n \delta_{\xi_i}$ and its continuously reduced optimal $m$-point distribution $Q \in \mathcal{P}(\mathbb{R}^d, m)$. Since the positive homogeneity of the Wasserstein distance $d_l$ implies that $C_l(\hat{P}'_n, m) = \lambda \cdot C_l(\hat{P}_n, m)$ for the scaled distribution $\hat{P}'_n = \frac{1}{n} \sum_{i=1}^n \delta_{\lambda \xi_i}$, $\lambda \in \mathbb{R}_+$, we restrict ourselves to empirical distributions $\hat{P}_n$ whose scenarios satisfy $\|\xi_i\|_2 \leq 1$, $i = 1, \ldots, n$. We thus want to quantify

$$C_l(n, m) \;=\; \max_{\hat{P}_n \in \mathcal{P}_E(\mathbb{R}^d, n)} \left\{ C_l(\hat{P}_n, m) \;:\; \|\xi\|_2 \leq 1 \ \forall \xi \in \mathrm{supp}(\hat{P}_n) \right\}, \qquad (1)$$

which amounts to the worst-case (i.e., largest) Wasserstein distance between any $n$-point empirical distribution $\hat{P}_n$ over the unit ball and its optimally selected continuous $m$-point scenario reduction. By construction, this worst-case distance satisfies $C_l(n, m) \geq 0$, and the lower bound is attained whenever $n = m$. One also verifies that $C_l(n, m) \leq C_l(n, 1) \leq 1$ since the Wasserstein distance to the Dirac distribution $\delta_0$ is bounded above by 1. Our goal is to derive possibly tight upper bounds on $C_l(n, m)$ for the Wasserstein distances of type $l \in \{1, 2\}$.

In the following, we denote by $\mathcal{P}(I, m)$ the family of all $m$-set partitions of the index set $I$, i.e.,

$$\mathcal{P}(I, m) \;=\; \left\{ \{I_1, \ldots, I_m\} \;:\; \emptyset \neq I_1, \ldots, I_m \subseteq I, \; \cup_j I_j = I, \; I_i \cap I_j = \emptyset \ \forall i \neq j \right\},$$

and an element of this set (i.e., a specific $m$-set partition) as $\{I_j\} \in \mathcal{P}(I, m)$.
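For small $n$, the family $\mathcal{P}(I, m)$ can be enumerated outright, which is useful for checking the results below by brute force. A minimal helper (our own; feasible only for small $n$, since $|\mathcal{P}(I, m)|$ is the Stirling number of the second kind $S(n, m)$):

```python
# Sketch: enumerate all m-set partitions P(I, m) of I = {0, ..., n-1}.
# Feasible only for small n; the number of partitions is the Stirling
# number of the second kind S(n, m).
from itertools import product

def m_set_partitions(n, m):
    seen = set()
    for labels in product(range(m), repeat=n):
        if len(set(labels)) != m:        # every cell must be nonempty
            continue
        cells = tuple(sorted(tuple(i for i in range(n) if labels[i] == j)
                             for j in range(m)))
        if cells not in seen:            # deduplicate relabelings
            seen.add(cells)
            yield [list(c) for c in cells]

print(sum(1 for _ in m_set_partitions(5, 2)))   # S(5, 2) = 15
```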
Our derivations will make extensive use of the following theorem.

Theorem 1  For any type-$l$ Wasserstein distance induced by any norm $\|\cdot\|$, the continuous scenario reduction problem can be reformulated as

$$C_l(\hat{P}_n, m) \;=\; \min_{\{I_j\} \in \mathcal{P}(I, m)} \left[ \frac{1}{n} \sum_{j \in J} \min_{\zeta_j \in \mathbb{R}^d} \sum_{i \in I_j} \|\xi_i - \zeta_j\|^l \right]^{1/l}. \qquad (2)$$

Problem (2) can be interpreted as a Voronoi partitioning problem that asks for a Voronoi decomposition of $\mathbb{R}^d$ into $m$ cells whose Voronoi centroids $\zeta_1, \ldots, \zeta_m$ minimize the cumulative $l$-th powers of the distances to $n$ prespecified points $\xi_1, \ldots, \xi_n$.

Proof of Theorem 1  Theorem 2 of Dupačová et al (2003) implies that the smallest Wasserstein distance between the empirical distribution $\hat{P}_n \in \mathcal{P}_E(\mathbb{R}^d, n)$ and any distribution $Q$ supported on a finite set $\Xi \subset \mathbb{R}^d$ amounts to

$$\min_{Q \in \mathcal{P}(\Xi, \infty)} d_l(\hat{P}_n, Q) \;=\; \left[ \frac{1}{n} \sum_{i \in I} \min_{\zeta \in \Xi} \|\xi_i - \zeta\|^l \right]^{1/l},$$

where $\mathcal{P}(\Xi, \infty)$ denotes the set of all probability distributions supported on the finite set $\Xi$. The continuous scenario reduction problem $C_l(\hat{P}_n, m)$ selects the set $\Xi^\star$ that minimizes this quantity over all sets $\Xi \subset \mathbb{R}^d$ with $|\Xi| = m$ elements:

$$C_l(\hat{P}_n, m) \;=\; \min_{\{\zeta_j\} \subseteq \mathbb{R}^d} \left[ \frac{1}{n} \sum_{i \in I} \min_{j \in J} \|\xi_i - \zeta_j\|^l \right]^{1/l}. \qquad (3)$$

One readily verifies that any optimal solution $\{\zeta_1^\star, \ldots, \zeta_m^\star\}$ to problem (3) corresponds to an optimal solution $\{I_1^\star, \ldots, I_m^\star\}$ to problem (2) with the same objective value if we identify the set $I_j^\star$ with all observations $\xi_i$ that are closer to $\zeta_j^\star$ than to any other $\zeta_{j'}^\star$ (ties may be broken arbitrarily). Likewise, any optimal solution $\{I_1^\star, \ldots, I_m^\star\}$ to problem (2) with inner minimizers $\{\zeta_1^\star, \ldots, \zeta_m^\star\}$ translates into an optimal solution $\{\zeta_1^\star, \ldots, \zeta_m^\star\}$ to problem (3) with the same objective value. ⊓⊔

Remark 1 (Minimizers of (2))  For $l = 2$, the inner minimum corresponding to the set $I_j$ is attained by the mean $\zeta_j^\star = \mathrm{mean}(I_j) = \frac{1}{|I_j|} \sum_{i \in I_j} \xi_i$. Likewise, for $l = 1$, the inner minimum corresponding to the set $I_j$ is attained by any geometric median

$$\zeta_j^\star \;=\; \mathrm{gmed}(I_j) \;\in\; \arg\min_{\zeta_j \in \mathbb{R}^d} \sum_{i \in I_j} \|\xi_i - \zeta_j\|,$$

which can be determined efficiently by solving a second-order cone program whenever a $p$-norm with rational $p \geq 1$ is considered (Alizadeh and Goldfarb 2003).
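Both inner minimizers of Remark 1 are cheap to compute in practice: the mean is available in closed form, and for the 2-norm the geometric median can be approximated by Weiszfeld's classical fixed-point iteration, a lightweight alternative to the second-order cone program mentioned above. A hedged sketch (the iteration safeguards and stopping rules are our choices):

```python
# Sketch: inner minimizers of (2). For l = 2 the cluster mean is optimal;
# for l = 1 (with the 2-norm) the geometric median can be approximated by
# Weiszfeld's fixed-point iteration.
import numpy as np

def geometric_median(points, iters=200, tol=1e-9):
    z = points.mean(axis=0)                      # start from the mean
    for _ in range(iters):
        d = np.linalg.norm(points - z, axis=1)
        if np.any(d < tol):                      # z coincides with a data point
            return z
        w = 1.0 / d
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
print(pts.mean(axis=0), geometric_median(pts))   # l = 2 vs. l = 1 minimizer
```

Note how the outlier at $(5, 5)$ drags the mean far more than the geometric median, which is exactly why the two minimizers (and hence the two reduction problems) can behave differently.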
The rest of this section derives tight upper bounds on $C_l(n, m)$ for Wasserstein distances of type $l = 2$ (Section 2.1) as well as upper and lower bounds for Wasserstein distances of type $l = 1$ (Section 2.2). We summarize and discuss our findings in Section 2.3.

2.1 Fundamental Limits for the Type-2 Wasserstein Distance

We now derive a revised upper bound on $C_l(n, m)$ for the type-2 Wasserstein distance. The result relies on auxiliary lemmas that are relegated to the appendix.

Theorem 2  The worst-case type-2 Wasserstein distance satisfies $C_2(n, m) \leq \sqrt{\frac{n-m}{n-1}}$.

Note that whenever the reduced distribution satisfies $m > 1$, the bound of Theorem 2 is strictly tighter than the naïve bound of 1 from the previous section.

Proof of Theorem 2  From Theorem 1 and Remark 1 we observe that

$$\begin{aligned} C_2(n, m) \;=\; \max_{\{\xi_i\} \subseteq \mathbb{R}^d} \;\; & \min_{\{I_j\} \in \mathcal{P}(I, m)} \left[ \frac{1}{n} \sum_{j \in J} \sum_{i \in I_j} \left\| \xi_i - \mathrm{mean}(I_j) \right\|_2^2 \right]^{1/2} \\ \text{s.t.} \;\; & \|\xi_i\|_2 \leq 1 \quad \forall i \in I. \end{aligned}$$

Introducing the epigraphical variable $\tau$, this problem can be expressed as

$$\begin{aligned} C_2^2(n, m) \;=\; \max_{\tau \in \mathbb{R},\, \{\xi_i\} \subseteq \mathbb{R}^d} \;\; & \frac{1}{n} \tau \\ \text{s.t.} \;\; & \tau \leq \sum_{j \in J} \sum_{i \in I_j} \left\| \xi_i - \mathrm{mean}(I_j) \right\|_2^2 \quad \forall \{I_j\} \in \mathcal{P}(I, m) \\ & \xi_i^\top \xi_i \leq 1 \quad \forall i \in I. \end{aligned} \qquad (4)$$

For each $j \in J$ and $i \in I_j$, the squared norm in the first constraint of (4) can be expressed in terms of the inner products between pairs of empirical observations:

$$\begin{aligned} \left\| \xi_i - \mathrm{mean}(I_j) \right\|_2^2 &= \frac{1}{|I_j|^2} \Big\| |I_j| \xi_i - \sum_{k \in I_j} \xi_k \Big\|_2^2 \\ &= \frac{1}{|I_j|^2} \Big( |I_j|^2 \xi_i^\top \xi_i - 2 |I_j| \sum_{k \in I_j} \xi_i^\top \xi_k + \sum_{k \in I_j} \xi_k^\top \xi_k + \sum_{\substack{k, k' \in I_j \\ k \neq k'}} \xi_k^\top \xi_{k'} \Big). \end{aligned}$$

Introducing the Gram matrix

$$S = [\xi_1, \ldots, \xi_n]^\top [\xi_1, \ldots, \xi_n] \in \mathbb{S}^n, \qquad S \succeq 0 \;\text{ and }\; \mathrm{rank}(S) \leq \min\{n, d\}, \qquad (5)$$

then allows us to simplify the first constraint in (4) to

$$\tau \;\leq\; \sum_{j \in J} \sum_{i \in I_j} \frac{1}{|I_j|^2} \Big( |I_j|^2 s_{ii} - 2 |I_j| \sum_{k \in I_j} s_{ik} + \sum_{k \in I_j} s_{kk} + \sum_{\substack{k, k' \in I_j \\ k \neq k'}} s_{kk'} \Big).$$

Note that the second constraint in problem (4) can now be expressed as $s_{ii} \leq 1$, and hence all constraints in (4) are linear in the Gram matrix $S$. Our discussion implies that we obtain an upper bound on $C_2(n, m)$ by reformulating problem (4) as a semidefinite program in terms of the Gram matrix $S$:

$$\begin{aligned} \max_{\tau \in \mathbb{R},\, S \in \mathbb{S}^n} \;\; & \frac{1}{n} \tau \\ \text{s.t.} \;\; & \tau \leq \sum_{j \in J} \sum_{i \in I_j} \frac{1}{|I_j|^2} \Big( |I_j|^2 s_{ii} - 2 |I_j| \sum_{k \in I_j} s_{ik} + \sum_{k \in I_j} s_{kk} + \sum_{\substack{k, k' \in I_j \\ k \neq k'}} s_{kk'} \Big) \quad \forall \{I_j\} \in \mathcal{P}(I, m) \\ & S \succeq 0, \;\; s_{ii} \leq 1 \quad \forall i \in I, \end{aligned} \qquad (6)$$

where we have relaxed the rank condition in the definition of the Gram matrix (5). Lemma 1 in the appendix shows that (6) has an optimal solution $(\tau^\star, S^\star)$ that satisfies $S^\star = \alpha I + \beta e e^\top$ for some $\alpha, \beta \in \mathbb{R}$. Moreover, Lemma 2 in the appendix shows that any matrix of the form $S = \alpha I + \beta e e^\top$ is positive semidefinite if and only if $\alpha \geq 0$ and $\alpha + n\beta \geq 0$. We thus conclude that (6) can be reformulated as

$$\begin{aligned} \max_{\tau, \alpha, \beta \in \mathbb{R}} \;\; & \frac{1}{n} \tau \\ \text{s.t.} \;\; & \tau \leq (n - m)\alpha, \;\; \alpha + \beta \leq 1 \\ & \alpha \geq 0, \;\; \alpha + n\beta \geq 0, \end{aligned} \qquad (7)$$

where the first constraint follows from the fact that for any set $I_j$ in (6), we have

$$\frac{1}{|I_j|^2} \sum_{i \in I_j} \Big( |I_j|^2 (\alpha + \beta) - 2 |I_j| (\alpha + |I_j| \beta) + \sum_{k \in I_j} (\alpha + \beta) + \sum_{\substack{k, k' \in I_j \\ k \neq k'}} \beta \Big) \;=\; (|I_j| - 1)\, \alpha,$$

and $\sum_{j \in J} (|I_j| - 1)\, \alpha = (n - m)\, \alpha$ since $|I| = n$ and $|J| = m$. The statement of the theorem now follows since problem (7) is optimized by $\tau^\star = \frac{n(n-m)}{n-1}$, $\alpha^\star = \frac{n}{n-1}$ and $\beta^\star = \frac{-1}{n-1}$. ⊓⊔

The proof of Theorem 2 shows that the upper bound $\sqrt{\frac{n-m}{n-1}}$ on the worst-case type-2 Wasserstein distance $C_2(n, m)$ is tight whenever there is an empirical distribution $\hat{P}_n \in \mathcal{P}_E(\mathbb{R}^d, n)$ whose scenarios $\xi_1, \ldots, \xi_n$ correspond to a Gram matrix $S = [\xi_1, \ldots, \xi_n]^\top [\xi_1, \ldots, \xi_n] = \frac{n}{n-1} I - \frac{1}{n-1} e e^\top$, which implies $\|\xi_i\|_2 = \sqrt{s_{ii}} = 1$ for all $i \in I$. We now show that such an empirical distribution exists when $d \geq n - 1$.
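Before constructing the extremal distribution analytically, here is a quick numerical sanity check (ours): factor the claimed optimal Gram matrix $S^\star = \frac{n}{n-1} I - \frac{1}{n-1} e e^\top$ into unit-norm points and verify by brute force over all $m$-set partitions that $C_2(\hat{P}_n, m)$ indeed equals $\sqrt{(n-m)/(n-1)}$ on a small instance.

```python
# Sketch: numerical check of the extremal Gram matrix. We factor
# S = n/(n-1) I - 1/(n-1) ee^T into points and verify by brute force
# over all m-set partitions that C_2 equals sqrt((n-m)/(n-1)).
import numpy as np
from itertools import product

n, m = 5, 2
S = n / (n - 1) * np.eye(n) - 1 / (n - 1) * np.ones((n, n))
w, V = np.linalg.eigh(S)                        # S is PSD with rank n-1
F = np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
xi = F.T                                        # rows: unit-norm points in R^n

best = np.inf
for labels in product(range(m), repeat=n):      # brute force over partitions
    if len(set(labels)) != m:
        continue
    lab = np.array(labels)
    sse = sum(np.sum((xi[lab == j] - xi[lab == j].mean(axis=0)) ** 2)
              for j in range(m))
    best = min(best, sse / n)                   # C_2^2 for this instance

print(np.sqrt(best), np.sqrt((n - m) / (n - 1)))   # both ~0.8660
```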
Proposition 1  For $d \geq n - 1$, there is $\hat{P}_n \in \mathcal{P}_E(\mathbb{R}^d, n)$ with $\|\xi\|_2 \leq 1$ for all $\xi \in \mathrm{supp}(\hat{P}_n)$ such that $C_2(\hat{P}_n, m) = \sqrt{\frac{n-m}{n-1}}$.

Proof  Assume first that $d = n$ and consider the empirical distribution $\hat{P}_n = \frac{1}{n} \sum_{i=1}^n \delta_{\xi_i}$ defined through

$$\xi_i \;=\; y e + (x - y) e_i \;\in\; \mathbb{R}^n \quad \text{with} \quad x = \sqrt{\frac{n-1}{n}} \;\text{ and }\; y = \frac{-1}{\sqrt{n(n-1)}}. \qquad (8)$$

A direct calculation reveals that $S = [\xi_1, \ldots, \xi_n]^\top [\xi_1, \ldots, \xi_n] = \frac{n}{n-1} I - \frac{1}{n-1} e e^\top$. To prove the statement for $d = n - 1$, we note that the $n$ scenarios in (8) lie on the $(n-1)$-dimensional subspace $H$ orthogonal to $e \in \mathbb{R}^n$. Thus, there exists a rotation that maps $H$ to $\mathbb{R}^{n-1} \times \{0\}$. As the Gram matrix is invariant under rotations, the rotated scenarios give rise to an empirical distribution $\hat{P}_n \in \mathcal{P}_E(\mathbb{R}^{n-1}, n)$ satisfying the statement of the proposition. Likewise, for $d > n$ the linear transformation $\xi_i \mapsto (I, 0)^\top \xi_i$, $I \in \mathbb{R}^{n \times n}$ and $0 \in \mathbb{R}^{n \times (d-n)}$, generates an empirical distribution $\hat{P}_n \in \mathcal{P}_E(\mathbb{R}^d, n)$ that satisfies the statement of the proposition. ⊓⊔

Proposition 1 requires that $d \geq n - 1$, which appears to be restrictive. We note, however, that this condition is only sufficient (and not necessary) to guarantee the tightness of the bound from Theorem 2. Moreover, we will observe in Section 2.3 that the bound of Theorem 2 provides surprisingly accurate guidance for the Wasserstein distance between practice-relevant empirical distributions $\hat{P}_n$ and their continuously reduced optimal distributions.

2.2 Fundamental Limits for the Type-1 Wasserstein Distance

In analogy to the previous section, we now derive a revised upper bound on $C_l(n, m)$ for the type-1 Wasserstein distance.

Theorem 3  The worst-case type-1 Wasserstein distance satisfies $C_1(n, m) \leq \sqrt{\frac{n-m}{n-1}}$.

Note that this bound is identical to the bound of Theorem 2 for $l = 2$.

Proof of Theorem 3  Leveraging again Theorem 1 and Remark 1, we obtain

$$\begin{aligned} C_1(n, m) \;=\; \max_{\{\xi_i\} \subseteq \mathbb{R}^d} \;\; & \min_{\{I_j\} \in \mathcal{P}(I, m)} \frac{1}{n} \sum_{j \in J} \sum_{i \in I_j} \left\| \xi_i - \mathrm{gmed}(I_j) \right\|_2 \\ \text{s.t.} \;\; & \|\xi_i\|_2 \leq 1 \quad \forall i \in I. \end{aligned}$$

We show that $C_1(n, m) \leq C_2(n, m)$ for all $n$ and $m = 1, \ldots, n$, which in turn proves the statement of the theorem by virtue of Theorem 2. To this end, we observe that

$$\begin{aligned} C_1(n, m) \;\leq\; \max_{\{\xi_i\} \subseteq \mathbb{R}^d} \;\; & \min_{\{I_j\} \in \mathcal{P}(I, m)} \frac{1}{n} \sum_{j \in J} \sum_{i \in I_j} \left\| \xi_i - \mathrm{mean}(I_j) \right\|_2 \\ \text{s.t.} \;\; & \|\xi_i\|_2 \leq 1 \quad \forall i \in I \end{aligned} \qquad (9)$$

$$\begin{aligned} \phantom{C_1(n, m)} \;\leq\; \max_{\{\xi_i\} \subseteq \mathbb{R}^d} \;\; & \min_{\{I_j\} \in \mathcal{P}(I, m)} \left[ \frac{1}{n} \sum_{j \in J} \sum_{i \in I_j} \left\| \xi_i - \mathrm{mean}(I_j) \right\|_2^2 \right]^{1/2} \\ \text{s.t.} \;\; & \|\xi_i\|_2 \leq 1 \quad \forall i \in I, \end{aligned} \qquad (10)$$

where the first inequality follows from the definition of the geometric median, which ensures that

$$\sum_{i \in I_j} \left\| \xi_i - \mathrm{gmed}(I_j) \right\|_2 \;\leq\; \sum_{i \in I_j} \left\| \xi_i - \mathrm{mean}(I_j) \right\|_2 \quad \forall j \in J,$$

and the second inequality is due to the arithmetic-mean quadratic-mean inequality (Steele 2004, Exercise 2.14). The statement of the theorem now follows from the observation that the optimal value of (10) is identical to $C_2(n, m)$. ⊓⊔
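The chain (9)-(10) also holds instance-wise, i.e., $C_1(\hat{P}_n, m) \leq C_2(\hat{P}_n, m)$ for every fixed empirical distribution. A small brute-force check on random data (ours; the geometric median is approximated by a compact Weiszfeld iteration):

```python
# Sketch: instance-wise check that C_1(P_n, m) <= C_2(P_n, m),
# by brute force over all m-set partitions of a small random sample.
import numpy as np
from itertools import product

def gmed(pts, iters=100):
    z = pts.mean(axis=0)                     # Weiszfeld iteration for the
    for _ in range(iters):                   # geometric median (l = 1)
        d = np.maximum(np.linalg.norm(pts - z, axis=1), 1e-12)
        z = (pts / d[:, None]).sum(axis=0) / (1.0 / d).sum()
    return z

rng = np.random.default_rng(2)
n, m = 6, 2
xi = rng.standard_normal((n, 3))
c1 = c2 = np.inf
for labels in product(range(m), repeat=n):
    if len(set(labels)) != m:
        continue
    cells = [xi[np.array(labels) == j] for j in range(m)]
    c1 = min(c1, sum(np.linalg.norm(c - gmed(c), axis=1).sum() for c in cells) / n)
    c2 = min(c2, np.sqrt(sum(np.sum((c - c.mean(axis=0)) ** 2) for c in cells) / n))
print(c1, c2, c1 <= c2 + 1e-9)               # expect True
```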
In the next proposition we derive a lower bound on $C_1(n, m)$.

Proposition 2  For $d \geq n - 1$, the worst-case type-1 Wasserstein distance satisfies $C_1(n, m) \geq \sqrt{\frac{(n-m)(n-m+1)}{n(n-1)}}$.

Proof  Assume first that $d = n$, and consider the empirical distribution $\hat{P}_n$ with scenarios defined as in (8). Let $\{I_j\} \in \mathcal{P}(I, m)$ be an arbitrary $m$-set partition of $I$ and note that $\mathrm{gmed}(I_j) = \mathrm{mean}(I_j)$ for every $j \in J$ due to the permutation symmetry of the $\xi_i$. This is indeed the case because $0 \in \partial f_j(\mathrm{mean}(I_j))$ for each $f_j(\zeta) = \sum_{i \in I_j} \|y e + (x - y) e_i - \zeta\|_2$, $j \in J$. Thus, we have

$$\begin{aligned} \left\| \xi_i - \mathrm{mean}(I_j) \right\|_2 &= \left[ \left( x - \frac{x + (|I_j| - 1) y}{|I_j|} \right)^2 + (|I_j| - 1) \left( y - \frac{x + (|I_j| - 1) y}{|I_j|} \right)^2 \right]^{1/2} \\ &= \left[ \frac{|I_j| - 1}{|I_j|} \right]^{1/2} (x - y) \;=\; \left[ \frac{n (|I_j| - 1)}{(n-1) |I_j|} \right]^{1/2} \quad \forall i \in I_j, \end{aligned}$$

where the last equality follows from the definitions of $x$ and $y$ in (8). By Theorem 1 and Remark 1 we therefore obtain

$$C_1(\hat{P}_n, m) \;=\; \min_{\{I_j\} \in \mathcal{P}(I, m)} \frac{1}{n} \sum_{j \in J} \sum_{i \in I_j} \left\| \xi_i - \mathrm{mean}(I_j) \right\|_2 \;=\; \min_{\{I_j\} \in \mathcal{P}(I, m)} \frac{1}{\sqrt{n(n-1)}} \sum_{j \in J} \sqrt{|I_j| (|I_j| - 1)}.$$

By introducing auxiliary variables $z_j = |I_j| - 1 \in \mathbb{N}_0$, $j \in J$, we find that determining $C_1(\hat{P}_n, m)$ is tantamount to solving

$$C_1(\hat{P}_n, m) \;=\; \frac{1}{\sqrt{n(n-1)}} \; \min_{\{z_j\} \subseteq \mathbb{N}_0} \left\{ \sum_{j \in J} \sqrt{z_j (z_j + 1)} \;:\; \sum_{j \in J} z_j = n - m \right\}.$$

Observe that the objective function at $z_1 = n - m$ and $z_2 = \ldots = z_m = 0$ evaluates to $\sqrt{(n-m)(n-m+1)}$, which implies that $C_1(\hat{P}_n, m) \leq \sqrt{\frac{(n-m)(n-m+1)}{n(n-1)}}$. Hence, it remains to establish the reverse inequality. To this end, we note that

$$\begin{aligned} \sum_{j \in J} \sqrt{z_j (z_j + 1)} &= \left[ \sum_{j \in J} z_j (z_j + 1) + \sum_{\substack{j, j' \in J \\ j \neq j'}} \sqrt{z_j z_{j'} (1 + z_j)(1 + z_{j'})} \right]^{1/2} \\ &\geq \left[ \sum_{j \in J} z_j (z_j + 1) + \sum_{\substack{j, j' \in J \\ j \neq j'}} z_j z_{j'} \right]^{1/2} \\ &= \left[ \Big( \sum_{j \in J} z_j \Big)^2 + \sum_{j \in J} z_j \right]^{1/2} \;=\; \sqrt{(n-m)(n-m+1)}, \end{aligned}$$

and thus the claim follows for $d = n$. The cases $d = n - 1$ and $d > n$ can be reduced to the case $d = n$ as in Proposition 1. Details are omitted for brevity. ⊓⊔

Proposition 2 asserts that $C_1(n, m) \gtrsim \frac{n-m}{n-1} = C_2^2(n, m)$ whenever $d \geq n - 1$. Together with Theorem 3, we thus obtain the following relation between the worst-case Wasserstein distances of types $l = 1$ and $l = 2$:

$$C_2^2(n, m) \;\leq\; C_1(n, m) \;\leq\; C_2(n, m).$$

We conjecture that the lower bound is tighter, but we were not able to prove this.

2.3 Discussion

Theorems 2 and 3 imply that $C_l(n, m) \lesssim \sqrt{1 - p}$ for large $n$ and for $l \in \{1, 2\}$, where $p = \frac{m}{n}$ represents the desired reduction factor. The significance of this result is that it offers a priori guidelines for selecting the number $m$ of support points in the reduced distribution. To see this, consider any empirical distribution $\hat{P}_n = \frac{1}{n} \sum_{i \in I} \delta_{\xi_i}$, and denote by $r \geq 0$ and $\mu \in \mathbb{R}^d$ the radius and the center of any (ideally the smallest) ball enclosing $\xi_1, \ldots, \xi_n$, respectively. In this case, we have

$$C_l(\hat{P}_n, m) \;=\; r \cdot C_l\!\left( \frac{1}{n} \sum_{i \in I} \delta_{(\xi_i - \mu)/r}, \; m \right) \;\leq\; r \cdot C_l(n, m) \;\lesssim\; r \cdot \sqrt{1 - p}, \qquad (11)$$

where the inequality holds because $\|(\xi_i - \mu)/r\|_2 \leq 1$ for every $i \in I$. Note that (11) enables us to find an upper bound on the smallest $m$ guaranteeing that $C_l(\hat{P}_n, m)$ falls below a prescribed threshold (i.e., guaranteeing that the reduced $m$-point distribution remains within some prescribed distance from $\hat{P}_n$).

Even though the inequality in (11) can be tight, which has been established in Proposition 1, one might suspect that typically $C_l(\hat{P}_n, m)$ is significantly smaller than $r \cdot \sqrt{1 - p}$ when the points $\xi_i \in \mathbb{R}^d$, $i \in I$, are sampled randomly from a standard distribution, e.g., a multivariate uniform or normal distribution. However, while the upper bound (11) can be loose for low-dimensional data, Proposition 3 below suggests that it is surprisingly tight in high dimensions, at least for $l = 2$.
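The guideline embedded in (11) is straightforward to operationalize: to guarantee $C_l(\hat{P}_n, m) \lesssim \varepsilon$ it suffices that $r \sqrt{1 - m/n} \leq \varepsilon$, i.e., $m \geq n (1 - (\varepsilon/r)^2)$. A back-of-the-envelope sketch (helper names are ours; the enclosing ball is computed crudely around the sample mean rather than as the smallest enclosing ball):

```python
# Sketch: a priori choice of m from the bound (11),
# C_l(P_n, m) <~ r * sqrt(1 - m/n), with r the radius of a ball
# enclosing the scenarios (here: centered at the sample mean).
import math
import numpy as np

def suggest_m(xi, eps):
    mu = xi.mean(axis=0)                               # enclosing-ball center (crude)
    r = np.linalg.norm(xi - mu, axis=1).max()          # its radius
    n = len(xi)
    if eps >= r:
        return 1
    return min(n, math.ceil(n * (1.0 - (eps / r) ** 2)))

rng = np.random.default_rng(3)
xi = rng.standard_normal((1000, 10))
r = np.linalg.norm(xi - xi.mean(axis=0), axis=1).max()
print(suggest_m(xi, eps=0.5 * r))
# eps = r/2 gives m >= 0.75 n, i.e., a reduction factor p = m/n of ~0.75
```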
