A Framework for Wasserstein-1-Type Metrics

Bernhard Schmitzer, Benedikt Wirth

arXiv:1701.01945v1 [math.OC] 8 Jan 2017

Abstract. We propose a unifying framework for generalizing the Wasserstein-1 metric to a discrepancy measure between nonnegative measures of different mass. This generalization inherits the convexity and computational efficiency of the Wasserstein-1 metric, and it includes several previous approaches from the literature as special cases. For various specific instances of the generalized Wasserstein-1 metric we furthermore demonstrate their usefulness in applications by numerical experiments.

Contents

1 Introduction 2
  1.1 Motivation 2
  1.2 Overview: Old and new Wasserstein-1 metrics 2
  1.3 Setting and notation 4
2 Static formulations of Wasserstein-1-Type discrepancies 5
  2.1 Wasserstein-1-Type discrepancies 5
  2.2 Infimal convolution-type extensions of W_1 10
  2.3 Model equivalence 14
  2.4 The inhomogeneous case 20
3 Examples 22
  3.1 Different discrepancy measures 22
  3.2 Unbalanced optimal transport 22
4 Discretization and Optimization 25
  4.1 Variable splitting 25
  4.2 Discretization 26
  4.3 Proximal operators 27
  4.4 Primal-dual iteration 28
  4.5 Including further terms 29
5 Numerical Experiments 29
  5.1 Discretization 29
  5.2 Robustness of unbalanced transport and material flow from videos 30
  5.3 Cartoon-texture-noise decomposition 33
  5.4 Enhancement of thin structures 35
6 Conclusion 36

1 Introduction

1.1 Motivation

Optimal transport and Wasserstein metrics are becoming increasingly popular tools in many applied fields. In particular, they provide a robust and intuitive measure of discrepancy for histograms and mass distributions, which can for instance be exploited in image processing for image interpolation [2], deformation analysis [35], colour transfer [29], cartoon-texture decomposition [23], or shape averaging [33].

A significant practical limitation of Wasserstein metrics is their high naive computational complexity. Various algorithms and methods for computational acceleration and efficient approximation have been proposed: a flow formulation [6], entropic regularization [18, 7], and adaptive sparse solvers [32], among others. The Wasserstein-1 metric W_1 represents an exception, since it can be reformulated as a minimal cost flow problem with local constraints, leading to a significant reduction of the problem size.

Plain Wasserstein metrics can only compare nonnegative measures of equal mass, which often does not reflect the requirements of applications. Therefore, early on, ad hoc extensions of optimal transport to unbalanced measures have been proposed (e.g. [30, 5, 27]). An extension of W_1 that retains its computational efficiency can be found in [23]. More recently, unbalanced optimal transport has been studied from a dynamic [19] and geometric perspective ([22, 17, 24], see also [16]).
This article provides a unified framework for unbalanced extensions of W_1, thereby covering several previous approaches as special cases and providing a new family of efficient transport-based discrepancy measures.

1.2 Overview: Old and new Wasserstein-1 metrics

The Wasserstein-1 metric between two nonnegative measures ρ_0 and ρ_1 on a domain Ω (which has to fulfill certain properties but can simply be thought of as a subset of R^n for the time being) is given by

  W_1(ρ_0,ρ_1) = inf_{π ∈ Γ(ρ_0,ρ_1)} ∫_{Ω×Ω} d(x,y) dπ(x,y),    (1.1)

where d denotes the distance measure between points in Ω and Γ(ρ_0,ρ_1) is the set of so-called couplings, that is, nonnegative measures π on Ω×Ω satisfying

  π(B×Ω) = ρ_0(B) and π(Ω×B) = ρ_1(B)

for all measurable B ⊂ Ω. By convention, the infimum is infinite for empty Γ(ρ_0,ρ_1), which is equivalent to ρ_0(Ω) ≠ ρ_1(Ω). The metric W_1(ρ_0,ρ_1) can be interpreted as the cost of transporting the mass ρ_0 to the final distribution ρ_1, where the cost contribution of each mass particle is its transport distance, and π(x,y) can be interpreted as the mass transported from x to y. Other optimal transport problems are obtained when replacing the distance d by other nonnegative cost functions.

The above formulation is based on a variable π that lives on the high-dimensional space Ω×Ω, thus making a direct implementation infeasible except for special cases. However, W_1 has the particular property that it can also be expressed by the Kantorovich–Rubinstein formula [34, Rk. 5.16]

  W_1(ρ_0,ρ_1) = sup { ∫_Ω α dρ_0 + ∫_Ω β dρ_1 | α,β Lipschitz with constant 1, α(x)+β(x) ≤ 0 ∀x ∈ Ω }.    (1.2)

Typically one function is eliminated by setting β = −α.
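On finite point sets, problem (1.1) is a plain linear program and can be solved directly for small instances. The following sketch is our own illustration (it assumes NumPy and SciPy are available and is not the algorithm developed later in this article); it sets up the coupling π as an LP variable subject to the two marginal constraints:

```python
import numpy as np
from scipy.optimize import linprog

def w1_discrete(x, y, rho0, rho1):
    """Wasserstein-1 distance via the primal coupling formulation (1.1).

    x, y: point locations on the line; rho0, rho1: weights of equal total mass.
    """
    n, m = len(x), len(y)
    C = np.abs(x[:, None] - y[None, :])  # ground distance d(x_i, y_j)
    # Equality constraints: row sums of pi equal rho0, column sums equal rho1.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0  # sum_j pi_ij = rho0_i
    for j in range(m):
        A_eq[n + j, j::m] = 1.0           # sum_i pi_ij = rho1_j
    b_eq = np.concatenate([rho0, rho1])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

x = np.array([0.0, 1.0])
y = np.array([0.0, 3.0])
rho0 = np.array([0.5, 0.5])
rho1 = np.array([0.5, 0.5])
# The optimal plan keeps the mass at 0 in place and moves 0.5 units from 1 to 3.
print(w1_discrete(x, y, rho0, rho1))
```

Since π carries one unknown per point pair, this direct approach grows quadratically with the number of points; the Kantorovich–Rubinstein formula (1.2), whose variables live only on Ω, avoids exactly this blow-up.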
Note that this represents a (computationally feasible) convex optimization problem in the variables α and β, which live only on Ω and satisfy only local constraints (in particular, the Lipschitz constraint just amounts to ensuring |∇α|, |∇β| ≤ 1 almost everywhere).

A natural question to ask is how to generalize the above problem while maintaining convexity and locality of the constraints, the two requirements for an efficient implementation. We suggest the following simple extension:

  W_{h_0,h_1,B}(ρ_0,ρ_1) = sup { ∫_Ω h_0(α) dρ_0 + ∫_Ω h_1(β) dρ_1 | α,β Lipschitz with constant 1, (α(x),β(x)) ∈ B ∀x ∈ Ω },    (1.3)

where h_0, h_1 : R → [−∞,∞] are concave functions and B ⊂ R² is a convex set (a few more natural conditions on h_0, h_1, B will be stated in detail later).

An alternative way to generalize W_1 is motivated by the desire to extend it to unbalanced measures. To this end one can introduce pointwise discrepancy functionals that locally penalize growth or shrinkage of mass and are of the form

  D(ρ_0,ρ_1) = ∫_Ω c(dρ_0/dµ, dρ_1/dµ) dµ,

where µ ≫ ρ_0, ρ_1 is an arbitrary measure, dρ_i/dµ is the Radon–Nikodym derivative, and c(m_0,m_1) ≥ 0 is the cost of changing the mass of a particle from m_0 to m_1 (assumed one-homogeneous in (m_0,m_1) so that D is independent of the choice of µ).
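The role of the one-homogeneity of c can be checked numerically: rescaling the reference measure µ rescales both Radon–Nikodym densities inversely, and the two effects cancel in D. A minimal discrete sketch (our own illustration, using the simple cost c(m_0,m_1) = |m_1 − m_0|):

```python
import numpy as np

def D(rho0, rho1, mu, c):
    """Discrete analogue of D(rho0, rho1) = integral of c(drho0/dmu, drho1/dmu) dmu."""
    return float(np.sum(c(rho0 / mu, rho1 / mu) * mu))

# A one-homogeneous local cost: c(m0, m1) = |m1 - m0| (total-variation type).
c = lambda m0, m1: np.abs(m1 - m0)

rho0 = np.array([0.2, 0.5, 0.3])
rho1 = np.array([0.4, 0.1, 0.3])
mu_a = rho0 + rho1          # any mu with mu >> rho0, rho1 works
mu_b = 5.0 * (rho0 + rho1)  # a rescaled reference measure
# Both evaluations agree, illustrating the independence of the choice of mu.
print(D(rho0, rho1, mu_a, c), D(rho0, rho1, mu_b, c))
```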
Using such a discrepancy D one can extend W_1 to allow for unbalanced measures ρ_0 and ρ_1, for example via inf_{ρ′,ρ′′} W_1(ρ_0,ρ′) + D(ρ′,ρ′′) + W_1(ρ′′,ρ_1), or via inf_{ρ′,ρ′′} D(ρ_0,ρ′) + W_1(ρ′,ρ′′) + D(ρ′′,ρ_1), or in the more general case via

  W_{D_0,D_01,D_1}(ρ_0,ρ_1) = inf_{ρ_0′, ρ_0′′, ρ_1′′, ρ_1′} D_0(ρ_0,ρ_0′) + W_1(ρ_0′,ρ_0′′) + D_01(ρ_0′′,ρ_1′′) + W_1(ρ_1′′,ρ_1′) + D_1(ρ_1′,ρ_1)    (1.4)

with potentially different discrepancy functionals D_0, D_01, and D_1.

Such ‘inf-convolutions’ can also be constructed for more general transport distances beyond W_1. The models used in [30, 27] can be interpreted as instances of this framework, where the discrepancies D_i were chosen sufficiently simple such that (1.4) remains a linear program. In [24] entropy functionals were considered as local discrepancies, and alternative dynamic and lifted reformulations of (1.4) were given for particular choices of discrepancies and transport costs.

In this article we exploit the particular structure of W_1 to show that the two approaches (1.3) and (1.4) are actually equivalent in the sense that for any admissible (h_0,h_1,B) one can find (D_0,D_01,D_1), and vice versa, such that

  W_{h_0,h_1,B} = W_{D_0,D_01,D_1}.

The article is organized as follows. The mathematical setting and notation are fixed in Section 1.3. In Section 2 we introduce in detail the W_1 generalizations (1.3) and (1.4) and prove their equivalence via convex duality in Section 2.3. Examples and special cases are described in Section 3. In particular, different existing approaches fit into our framework, such as the Kantorovich–Rubinstein norm from [23] and a W_1-type variant of the Hellinger–Kantorovich distance from [24].
Finally, a numerical discretization and optimization algorithm are provided in Section 4, followed by a number of numerical experiments in Section 5. We conclude in Section 6.

1.3 Setting and notation

Throughout the article we will use the following conventions.

• Let (Ω,d) be a compact path metric space, that is, the metric d(x,y) represents the infimal length of continuous paths γ : [0,1] → Ω connecting x and y. For instance, Ω could be a closed bounded convex subset of a normed vector space such as R^n or a compact geodesically convex subset of a Riemannian manifold.

• For a topological space X we denote by C(X) the set of continuous bounded real functions over X, equipped with the sup-norm.

• For a metric space X we write Lip(X) ⊂ C(X) for the set of 1-Lipschitz functions on X.

• For a compact space X we identify the topological dual of C(X)^m with the space of R^m-valued Radon measures on X, denoted by M(X)^m, and we equip it with the weak-* topology arising from this pairing (see e.g. [1, Def. 1.58]).

• We write M_+(X) for the nonnegative Radon measures on X.

• The Dirac measure δ_x ∈ M_+(Ω) at a point x ∈ Ω is defined for all measurable A ⊂ Ω via δ_x(A) = 1 if x ∈ A and δ_x(A) = 0 else.

• For two measures µ,ν ∈ M(Ω), µ ≪ ν expresses that µ is absolutely continuous with respect to ν, and its Radon–Nikodym derivative will be denoted dµ/dν.

• For two measurable spaces X, Y, some µ ∈ M(X), and a measurable map f : X → Y we denote by f_♯µ ∈ M(Y) the push-forward of µ under f, defined by f_♯µ(A) = µ(f^{−1}(A)) for any measurable A ⊂ Y.

• For i = 0,1 we define the canonical projections pr_i : Ω×Ω → Ω, (x_0,x_1) ↦ x_i.

• For two ρ_0, ρ_1 ∈ M_+(Ω) the set of couplings is given by

  Γ(ρ_0,ρ_1) = { π ∈ M_+(Ω×Ω) | pr_{0♯} π = ρ_0 and pr_{1♯} π = ρ_1 }.

Note that Γ(ρ_0,ρ_1) is empty when ρ_0(Ω) ≠ ρ_1(Ω).

• Let X be a normed vector space with dual space X′. For a function f : X → R∪{∞} the Legendre–Fenchel conjugate is given by

  f*(x′) = sup_{x ∈ X} ⟨x′,x⟩ − f(x),

where ⟨·,·⟩ denotes the dual pairing. The preconjugate of a function f : X′ → R∪{∞} will be denoted

  f_*(x) = sup_{x′ ∈ X′} ⟨x′,x⟩ − f(x′).

• The indicator function ι_C : X → R∪{∞} of a set C ⊂ X is defined as ι_C(x) = 0 if x ∈ C and ι_C(x) = ∞ else.

Remark 1.1 (Choice of metric space). The property α ∈ Lip(Ω) of a function α : Ω → R is usually a global one; to verify it one needs to assert α(x) − α(y) ≤ d(x,y) for all pairs (x,y) ∈ Ω×Ω. However, in path metric spaces Ω the 1-Lipschitz property reduces to a local condition on α that only needs to be checked for all x ∈ Ω (on R^n, for instance, α ∈ Lip(R^n) is equivalent to |∇α(x)| ≤ 1 for almost every x ∈ R^n). This reduction of a pointwise constraint on Ω×Ω to a pointwise constraint only on Ω underlies the computational efficiency of W_1, and for this reason we here restrict to path metric spaces Ω.

On the other hand, the choice of a compact space is rather technically motivated. Indeed, for compact Ω, a predual space to M(Ω) is simply C(Ω) and thus contains Lip(Ω). Therefore, formulations as in (1.2) can be obtained via standard duality arguments. On noncompact Ω, by contrast, considering the space C_0(Ω) of continuous functions that vanish on the boundary and at infinity as the predual to M(Ω), we have Lip(Ω) ⊄ C_0(Ω), so that additional approximation arguments are required (see e.g. the discussion in [34, p. 99] or the more direct application of the Hahn–Banach Theorem [20]).

2 Static formulations of Wasserstein-1-Type discrepancies

In this section we consider two different extensions of the classical W_1 metric and subsequently demonstrate their equivalence.
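Before introducing the generalizations, the locality emphasized in Remark 1.1 can be made concrete: on a uniform one-dimensional grid the 1-Lipschitz constraint reduces to a pointwise bound on finite differences. A minimal sketch (the grid setup and function name are our own):

```python
import numpy as np

def is_one_lipschitz(alpha, h):
    """Local check of the 1-Lipschitz condition on a uniform 1-D grid with
    spacing h: |alpha[i+1] - alpha[i]| <= h for all i, i.e. |grad alpha| <= 1."""
    return bool(np.all(np.abs(np.diff(alpha)) <= h + 1e-12))

h = 0.1
x = np.arange(0.0, 1.0 + h / 2, h)
print(is_one_lipschitz(np.minimum(x, 0.5), h))  # slopes 1 and 0: True
print(is_one_lipschitz(2.0 * x, h))             # slope 2 violates the bound: False
```

Each grid point contributes one constraint, in contrast to the global definition, which couples all pairs of points.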
2.1 Wasserstein-1-Type discrepancies

One may ask how the W_1 metric (1.2) can be generalized without giving up its efficiency, particularly the convexity of the problem and its low-dimensional variables and constraints. This motivates the following definition of a discrepancy between two nonnegative measures ρ_0 and ρ_1.

Definition 2.1 (W_{h_0,h_1,B}-discrepancy). Consider a convex set B ⊂ R² and concave, upper semi-continuous functions h_0, h_1 : R → R∪{−∞}. For ρ_0, ρ_1 ∈ M_+(Ω) we define

  E^{ρ_0,ρ_1}_{h_0,h_1,B}(α,β) = { ∫_Ω h_0(α(x)) dρ_0(x) + ∫_Ω h_1(β(x)) dρ_1(x)   if α,β ∈ Lip(Ω) with (α(x),β(x)) ∈ B ∀x ∈ Ω,
                                 { −∞                                               else,

  W_{h_0,h_1,B}(ρ_0,ρ_1) = sup_{α,β ∈ C(Ω)} E^{ρ_0,ρ_1}_{h_0,h_1,B}(α,β).    (2.1)

Remark 2.2 (Inhomogeneous versions). In principle one could also allow h_0, h_1, and B to have spatially varying, inhomogeneous forms h_0, h_1 : R×Ω → R∪{−∞} and B : Ω → 2^{R×R}. To avoid technicalities we shall for now only consider the spatially homogeneous case. The generalization to the inhomogeneous case will be discussed in Section 2.4.

Remark 2.3 (Complexity). Just like (1.2), definition (2.1) represents a convex optimization problem whose variables α and β are functions on the low-dimensional domain Ω and satisfy three local constraints everywhere on Ω: two Lipschitz constraints as well as (α(x),β(x)) ∈ B.

Remark 2.4 (Convexity). As the pointwise supremum of linear functionals in ρ_0 and ρ_1, the discrepancy W_{h_0,h_1,B}(ρ_0,ρ_1) is jointly convex in ρ_0 and ρ_1.

Remark 2.5 (Reduction to W_1 case). The Wasserstein-1 distance is obviously retrieved for the choice

  h_0(α) = α,  h_1(β) = β,  B = {(α,β) ∈ R² | α+β ≤ 0}.

Definition 2.6 (Admissible (h_0,h_1,B)). We call (h_0,h_1,B) admissible if there exist functions h_01, h_10 : R → R∪{−∞} such that

  B = B_01 ∩ (B_0 × B_1)   with    (2.2)
  B_01 = {(α,β) ∈ R² | α ≤ h_01(−β)} = {(α,β) ∈ R² | β ≤ h_10(−α)},    (2.3)
  B_0 = [α_min, +∞),  α_min = inf{α ∈ R : h_0(α) > −∞},    (2.4)
  B_1 = [β_min, +∞),  β_min = inf{β ∈ R : h_1(β) > −∞},    (2.5)

and h_0, h_1, and h_01 (or equivalently h_10) satisfy the following conditions (where we drop the indices):

1. h is concave,
2. h is upper semi-continuous,
3. h(s) ≤ s for all s ∈ R and h(0) = 0,
4. h is differentiable at 0 and h′(0) = 1,
5. h is monotonically increasing.

Note that on their respective domains, h_01 = −h_10^{−1}(−·) and h_10 = −h_01^{−1}(−·).

Remark 2.7 (On the conditions). The admissibility conditions are chosen so as to make W_{h_0,h_1,B} a reasonable discrepancy on M_+(Ω), especially if two of h_0, h_1, and B are taken as in Remark 2.5. In particular, we ask for the following properties.

a. E^{ρ_0,ρ_1}_{h_0,h_1,B} should be upper semi-continuous (a natural requirement for well-posedness of optimization problem (2.1)),
b. W_{h_0,h_1,B}(ρ_0,ρ_1) ≥ 0 for all ρ_0, ρ_1 ∈ M_+(Ω) and W_{h_0,h_1,B}(ρ_0,ρ_1) = 0 if ρ_0 = ρ_1,
c. W_{h_0,h_1,B}(ρ_0,ρ_1) > 0 for ρ_0 ≠ ρ_1,
d. W_{h_0,h_1,B}(ρ_0,ρ_1) = ∞ whenever ρ_0 or ρ_1 are negative,
e. W_{h_0,h_1,B}(ρ_0,ρ_1) should be sequentially weakly-* lower semi-continuous in (ρ_0,ρ_1).

Now to obtain corresponding conditions on B we first consider the case h_0 = h_1 = id (then B = B_01). Property a requires closedness of B, while property d implies (−∞,a]² ⊂ B for some finite a ∈ R. Together with the convexity of B it follows that B_01 can be expressed in the form (2.3) for an upper semi-continuous, concave, monotonically increasing h_01.

Next, we set any two of h_0, h_1, and h_01 to the identity. For the remaining one it is not difficult to see that condition 1 is equivalent to the convexity of optimization problem (2.1). Likewise, condition 2 is necessary for property a (it is also needed to make sense of the integrals in E^{ρ_0,ρ_1}_{h_0,h_1,B}) and will in the proof of Proposition 2.10 turn out to be also sufficient. It is furthermore a simple exercise to show the equivalence between condition 3 and property b. Indeed, assume h_1 = h_01 = id (the other cases follow analogously); then condition 3 implies W_{h_0,h_1,B}(ρ_0,ρ_1) ≥ E^{ρ_0,ρ_1}_{h_0,h_1,B}(0,0) = 0 as well as W_{h_0,h_1,B}(ρ,ρ) = sup_{α ∈ C(Ω)} ∫_Ω h_0(α) − α dρ ≤ 0 for ρ ∈ M_+(Ω). Vice versa, taking ρ = δ_x for some x ∈ Ω, W_{h_0,h_1,B}(ρ,ρ) = 0 implies sup_{α ∈ R} h_0(α) − α = 0 and thus in particular h_0 ≤ id. Furthermore, for a contradiction assume h_0(0) < 0; then by virtue of the hyperplane separation theorem and the concavity and upper semi-continuity of h_0 there exist s ∈ R and ε > 0 with h_0(α) < sα − ε for all α ∈ R. Due to h_0 ≤ id we may assume s ≥ 0 and thus obtain 0 ≤ W_{h_0,h_1,B}(ρ,sρ) = sup_{α ∈ R} h_0(α) − sα < 0.

Assuming now conditions 1 to 3, one can easily derive the equivalence of condition 4 and property c. Indeed, taking again h_1 = h_01 = id, condition 4 implies that E^{ρ_0,ρ_1}_{h_0,h_1,B} is differentiable at (α,β) = (0,0) in any direction (ϕ,−ϕ) ∈ Lip(Ω)² with ∂_{(α,β)} E^{ρ_0,ρ_1}_{h_0,h_1,B}(α,β)(ϕ,−ϕ) = ∫_Ω ϕ d(ρ_0 − ρ_1). Thus, E^{ρ_0,ρ_1}_{h_0,h_1,B}(0,0) = 0 can only be a maximum if ρ_0 = ρ_1. On the other hand, −h_0 is convex with subgradient ∂(−h_0)(0) ⊃ {−1} due to condition 3. Now taking ρ_0 ∈ M_+(Ω) and ρ_1 = sρ_0 for some s ≥ 0 with s ≠ 1, property c implies the existence of some α ∈ Lip(Ω) with 0 < E^{ρ_0,ρ_1}_{h_0,h_1,B}(α,−α) ≤ −∫_Ω (∂(−h_0)(0) + s)α dρ_0. Thus, −s ∉ ∂(−h_0)(0) and therefore ∂(−h_0)(0) = {−1}, from which condition 4 follows. Note further that condition 3 automatically implies property d, since h_0 and h_1 are unbounded from below, while condition 5 may simply be assumed for h_0 and h_1 without loss of generality: Indeed, suppose for instance h_0 to be nonmonotone; then conditions 1 and 3 imply the existence of a unique maximum value h_0(ᾱ) ≥ 0. Therefore, E^{ρ_0,ρ_1}_{h_0,h_1,B}(α,β) ≤ E^{ρ_0,ρ_1}_{h_0,h_1,B}(min(α,ᾱ),β) = E^{ρ_0,ρ_1}_{h,h_1,B}(α,β) and thus W_{h_0,h_1,B} = W_{h,h_1,B} for the monotonically increasing h(α) = h_0(min(α,ᾱ)).

Finally, the structure (2.2) of the set B is necessary for property e (that construction (2.2) actually implies property e will later follow from Corollary 2.32). For instance, take h_1 = id and let ρ_1 = δ_x and ρ_0^n = (1/n)ρ_1 → 0 as n → ∞. If h_10(−ᾱ) > h_10(−α_min) for some ᾱ < α_min, then

  W_{h_0,h_1,B_01}(0,ρ_1) ≥ E^{0,ρ_1}_{h_0,h_1,B_01}(ᾱ, h_10(−ᾱ)) = h_10(−ᾱ)
    > h_10(−α_min) ≥ liminf_{n→∞} sup_{α ≥ α_min} [h_0(α)/n + h_10(−α)] = liminf_{n→∞} W_{h_0,h_1,B_01}(ρ_0^n, ρ_1).

Without explicit mention we will in the following always assume (h_0,h_1,B) to be admissible.

The class of W_{h_0,h_1,B}-discrepancies is natural to consider and allows us to extend the classical W_1 distance to unbalanced measures, as we will see. Several previously introduced extensions of the W_1-distance (as well as W_1 itself) can be shown to fall into this category. In Section 3 some examples, both already well-known and new variants, will be discussed in more detail.

Remark 2.8 (Discrepancy bounds). The conditions on h_0, h_1, and B imply W_{h_0,h_1,B}(ρ_0,ρ_1) ≤ W_1(ρ_0,ρ_1) for all ρ_0, ρ_1 ∈ M_+(Ω).

Remark 2.9 (Non-existence of optimizers). Unfortunately, maximizers of E^{ρ_0,ρ_1}_{h_0,h_1,B} do not exist in general. For instance, consider the relevant special case of h_0(α) = α/(1+α) for α > −1 and h_0(α) = −∞ else (see the Hellinger distance in Section 3), and set h_1 = h_01 = id for simplicity. For ρ_0 = δ_x and ρ_1 = 0 it is easily seen that W_{h_0,h_1,B}(ρ_0,ρ_1) = 1 but that E^{ρ_0,ρ_1}_{h_0,h_1,B}(α,β) < 1 for all α,β ∈ C(Ω).

Due to the potential non-existence of maximizers we will later also examine a dual problem formulation. However, non-existence occurs only in rather special cases. As shown below, those cases can be characterized by conditions which are simple to check.

Proposition 2.10 (Upper semi-continuity).
The energy E^{ρ_0,ρ_1}_{h_0,h_1,B} is upper semi-continuous on C(Ω)².

Proof. Let (α_n,β_n) → (α,β) in C(Ω)² with E^{ρ_0,ρ_1}_{h_0,h_1,B}(α_n,β_n) > −∞ for all n sufficiently large (else there is nothing to show). Due to the closedness of Lip(Ω) and B we have (α,β) ∈ Lip(Ω)² as well as (α(x),β(x)) ∈ B for all x ∈ Ω. Finally,

  limsup_{n→∞} ∫_Ω h_0(α_n) dρ_0 = limsup_{n→∞} [∫_Ω h_0(α_n) − α_n dρ_0 + ∫_Ω α_n dρ_0]
    ≤ ∫_Ω limsup_{n→∞} [h_0(α_n) − α_n] dρ_0 + lim_{n→∞} ∫_Ω α_n dρ_0 ≤ ∫_Ω h_0(α) − α dρ_0 + ∫_Ω α dρ_0 = ∫_Ω h_0(α) dρ_0,

where we have used Fatou's lemma (noting that h_0(α_n) − α_n ≤ 0), the upper semi-continuity of h_0, as well as the continuity of the dual pairing between M_+(Ω) and C(Ω). Analogously, limsup_{n→∞} ∫_Ω h_1(β_n) dρ_1 ≤ ∫_Ω h_1(β) dρ_1, concluding the proof.

Proposition 2.11 (Existence of optimizers). Let ρ_0, ρ_1 ∈ M_+(Ω).

• W_{h_0,h_1,B}(ρ_0,ρ_1) = ∞ if and only if sup_{(α,β) ∈ B} ρ_0(Ω)h_0(α) + ρ_1(Ω)h_1(β) = ∞ (in which case there are no maximizers of E^{ρ_0,ρ_1}_{h_0,h_1,B}).

• A maximizer of E^{ρ_0,ρ_1}_{h_0,h_1,B} exists if and only if there exist α_*, β_* ∈ (−∞,0] with

    ρ_0(Ω) (−h_0∘h_01)′(−β_*) ≥ ρ_1(Ω) (−h_1)′(β_*)   or   ρ_1(Ω) (−h_1∘h_10)′(−α_*) ≥ ρ_0(Ω) (−h_0)′(α_*),

  where the prime refers to a favourably chosen element of the subgradient.

• A maximizer of E^{ρ_0,ρ_1}_{h_0,h_1,B} exists if and only if sup_{(α,β) ∈ B} h_0(α)ρ_0(Ω) + h_1(β)ρ_1(Ω) has a maximizer.

Proof. For a function f ∈ C(Ω) we abbreviate f̂ = max_{x∈Ω} f(x) and f̌ = min_{x∈Ω} f(x). Throughout the following, let α_n, β_n ∈ C(Ω) denote a maximizing sequence and assume without loss of generality that α̂_n ≥ β̂_n for n large enough (else we may simply swap the roles of h_0,α and h_1,β).

As for the first statement, let us show that W_{h_0,h_1,B}(ρ_0,ρ_1) = ∞ implies the divergence of sup_{α,β constant} E^{ρ_0,ρ_1}_{h_0,h_1,B}(α,β) (the converse implication is trivial). Indeed, we must have h_0(α̂_n) → ∞ and thus α_n(x) → ∞ for all x ∈ Ω due to the Lipschitz constraint. Then for n sufficiently large,

  ∫_Ω h_0(α_n(x)) dρ_0(x) + ∫_Ω h_1(β_n(x)) dρ_1(x) ≤ h_0(α̌_n)ρ_0(Ω) + h_1(β̂_n)ρ_1(Ω) + (h_0(α̂_n) − h_0(α̌_n))ρ_0(Ω).

Using (α̌_n, β̂_n) ∈ B and h_0(α̂_n) − h_0(α̌_n) ≤ α̂_n − α̌_n ≤ diam Ω (note that h_0 is a contraction on [0,∞)) we indeed obtain E^{ρ_0,ρ_1}_{h_0,h_1,B}(α̌_n, β̂_n) ≥ E^{ρ_0,ρ_1}_{h_0,h_1,B}(α_n, β_n) − diam Ω · ρ_0(Ω) → ∞.

As for the second statement, note first that −h_0∘h_01 is convex and that −(−h_0)′(h_01(β)) · (−h_01)′(β) ∈ ∂(−h_0∘h_01)(β), where the prime indicates an arbitrary subgradient element. Assume now the existence of a suitable β_*. For a contradiction, suppose α̌_n → ∞ and β̂_n → −∞ (both are equivalent due to α+β ≤ 0 for all (α,β) ∈ B). Then for any ∆ > 0 and n large enough, β̂_n < β_* − ∆. Now define β̃_n(x) = β_n(x) + ∆ and α̃_n(x) = h_01(−β̃_n(x)). Note that α̃_n ∈ Lip(Ω) (since β̃_n < β_* ≤ 0 and h_01 is a contraction on [0,∞)) with α̃_n ≥ h_01(−β_n) + ∆(−h_01)′(−β̃_n) ≥ α_n + ∆(−h_01)′(−β_*). We have

  E^{ρ_0,ρ_1}_{h_0,h_1,B}(α_n,β_n) = ∫_Ω h_0(α_n(x)) dρ_0(x) + ∫_Ω h_1(β_n(x)) dρ_1(x)
    ≤ ∫_Ω h_0(α̃_n) + (−h_0)′(α̃_n) ∆ (−h_01)′(−β_*) dρ_0 + ∫_Ω h_1(β̃_n) + (−h_1)′(β̃_n) ∆ dρ_1
    ≤ ∫_Ω h_0(α̃_n) dρ_0 + (−h_0)′(h_01(−β_*)) ∆ (−h_01)′(−β_*) ρ_0(Ω) + ∫_Ω h_1(β̃_n) dρ_1 + (−h_1)′(β_*) ∆ ρ_1(Ω)
    = ∫_Ω h_0(α̃_n) dρ_0 + ∫_Ω h_1(β̃_n) dρ_1 − ∆ [(−h_0∘h_01)′(−β_*) ρ_0(Ω) − (−h_1)′(β_*) ρ_1(Ω)]
    ≤ E^{ρ_0,ρ_1}_{h_0,h_1,B}(α̃_n, β̃_n).

Thus, (α̃_n, β̃_n) is an even better maximizing sequence, so that we may assume the maximizing sequence α_n, β_n ∈ C(Ω) to be uniformly bounded with β_n ≥ β̂_n − diam Ω ≥ β_* − diam Ω and α_n ≤ −β_n. Since α_n, β_n ∈ Lip(Ω), the sequence is equicontinuous and converges (up to a subsequence) to some (α,β) ∈ C(Ω)². By the upper semi-continuity of the energy, this must be a maximizer. The argument for a suitable α_* is analogous.

For the converse implication assume ρ_0(Ω)(−h_0∘h_01)′(−β_*) < ρ_1(Ω)(−h_1)′(β_*) for all β_* ∈ (−∞,0] (the proof is analogous if the other condition is violated). Taking β_* = 0, this implies ρ_0(Ω) > ρ_1(Ω). Let (α,β) ∈ C(Ω)² be a maximizer and choose ∆ > max{β̂, 0}. We now set β̃ = β − ∆ and α̃(x) = h_01(−β̃(x)) (note that again α̃ ∈ Lip(Ω)) and obtain as before

  E^{ρ_0,ρ_1}_{h_0,h_1,B}(α,β) = ∫_Ω h_0(α) dρ_0 + ∫_Ω h_1(β) dρ_1
    ≤ ∫_Ω h_0(α̃) dρ_0 − (−h_0)′(h_01(−β̃̌)) ∆ (−h_01)′(−β̃̌) ρ_0(Ω) + ∫_Ω h_1(β̃) dρ_1 − (−h_1)′(β̃̌) ∆ ρ_1(Ω)
    = ∫_Ω h_0(α̃) dρ_0 + ∫_Ω h_1(β̃) dρ_1 + ∆ [(−h_0∘h_01)′(−β̃̌) ρ_0(Ω) − (−h_1)′(β̃̌) ρ_1(Ω)] < E^{ρ_0,ρ_1}_{h_0,h_1,B}(α̃, β̃),

contradicting the optimality of (α,β).

By repeating the arguments for the second statement under restriction to spatially constant (α,β) ∈ Lip(Ω)², we find that the existence conditions of the second statement are equivalent to the existence of maximizers for sup_{(α,β)∈B} h_0(α)ρ_0(Ω) + h_1(β)ρ_1(Ω).

There may be some redundancy in the choice of h_0, h_1, and B. In detail, it turns out that in particular cases the model can be simplified by eliminating the constraint set B and the variable β. Later, this will also allow us to simplify some infimal convolution-type discrepancy measures (cf. Corollary 2.34 and Section 3.2).

Proposition 2.12 (Model reduction). Let γ > 0 (possibly γ = +∞), ρ_0, ρ_1 ∈ M_+(Ω), and abbreviate

  B(a,b) = {(α,β) ∈ R² | α+β ≤ 0} ∩ ([a,+∞)×[b,+∞)).

• If h_10(α) = min{α,γ} for all α > 0 (or equivalently h_01(β) = β − ι_{[−γ,∞)}(β) for β < 0), then W_{h_0,h_1,B} = W_{h_0∘h_01, h_1, B(α̃_min, β_min)} with α̃_min = −h_10(−α_min) = max{α_min, −γ}.
min 10 min min • If h (β) = min{β,γ} for all β > 0 (or equivalently h (α) = α−ι (α) for α < 0), 01 10 [ γ, ) then W = W with − ∞ h0,h1,B h0,h1◦h10,B(αmin,β˜min) β˜ = −h (−β ) = max{β ,−γ}. min 01 min min (cid:8)(cid:82) (cid:82) (cid:12) (cid:9) • Wh0,h1,B(a,b) = sup Ωh0(α)dρ0+ Ωh1(−α)dρ1(cid:12) α ∈ Lip(Ω), a ≤ α ≤ −b . Proof. In the first case, notice that for any β ∈ Lip(Ω) with β ≤ γ we also have α˜ = h ◦(−β) ∈ 01 Lip(Ω), since h (−·) is a contraction on (−∞,γ]. Note that if β(x) > γ for some x ∈ Ω, both 01 energies are −∞ for any α. Moreover, for α˜ to be feasible, we need α˜(x) = h (−β(x)) ≥ α 01 min for all x ∈ Ω (see (2.4)-(2.5)), which is equivalent to −β(x) ≥ −h (−α ) = α˜ (see (2.3)). 10 min min Thus, sup Eρ0,ρ1 (α,β) = Eρ0,ρ1 (α˜,β) h0,h1,B h0,h1,B α C(Ω) ∈ = Eρ0,ρ1 (−β,β) = sup Eρ0,ρ1 (α,β) h0◦h01,h1,B(α˜min,βmin) α C(Ω) h0◦h01,h1,B(α˜min,βmin) ∈ from which the statement follows. The second case follows analogously. Finally, (cid:26)(cid:90) (cid:90) (cid:12) (cid:27) (cid:12) Wh0,h1,B(a,b) = sup h0(α)dρ0+ h1(β)dρ1(cid:12)(cid:12) α,β ∈ Lip(Ω), α+β ≤ 0, a ≤ α, b ≤ β Ω Ω (cid:26)(cid:90) (cid:90) (cid:12) (cid:27) (cid:12) = sup h0(α)dρ0+ h1(−α)dρ1(cid:12) α ∈ Lip(Ω), a ≤ α ≤ −b . (cid:12) Ω Ω Remark 2.13. For standard W , where h = h = h = id, one has α = β = −∞ and 1 0 1 01 min min B = B(α ,β ) = R2. Consequently, by virtue of Proposition 2.12, one can eliminate one min min dual variable by setting α = −β, as is common practice (cf. Section 1.2). 2.2 Infimal convolution-type extensions of W 1 In the literature typically a different approach is taken to achieve convex and efficient generaliza- tions of the W metric, namely an infimal convolution-type combination of non-transport-type 1 metrics with the Wasserstein metric. To introduce a general class of such discrepancies we now fix a suitable family of local, non-transport-type discrepancies. Definition 2.14 (Local discrepancy). Let c : R×R (cid:55)→ [0,∞] satisfy the following assumptions, a. 
cisconvex,positively1-homogeneous,andlowersemi-continuousjointlyinbotharguments, b. c(m,m) = 0 for all m ≥ 0 and c(m ,m ) > 0 if m (cid:54)= m , 0 1 0 1 c. c(m ,m ) = ∞ whenever m < 0 or m < 0. 0 1 0 1 10
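As a quick sanity check of conditions a–c, consider the total-variation-type cost c(m_0,m_1) = |m_1 − m_0|, extended by +∞ to negative masses (a simple candidate of the kind discussed in Section 3; the code below is our own illustration):

```python
import numpy as np

def c(m0, m1):
    """Total-variation-type local cost: |m1 - m0|, infinite for negative mass."""
    if m0 < 0 or m1 < 0:
        return np.inf  # condition c: negative masses are forbidden
    return abs(m1 - m0)

# Condition a (positive 1-homogeneity): c(t*m0, t*m1) = t*c(m0, m1) for t > 0.
assert abs(c(2 * 0.3, 2 * 0.7) - 2 * c(0.3, 0.7)) < 1e-12
# Condition b: zero exactly on the diagonal of nonnegative masses.
assert c(0.5, 0.5) == 0 and c(0.2, 0.4) > 0
# Condition c: infinite whenever one of the masses is negative.
assert c(-1.0, 0.5) == np.inf
print("c passes the spot checks of Definition 2.14")
```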
