
Estimating mean dimensionality of ANOVA decompositions



Ruixue Liu¹ and Art B. Owen²
Department of Statistics
Stanford University
Stanford, CA 94305

Orig: June 2003. Revised: March 2005

Abstract

The analysis of variance is now often applied to functions defined on the unit cube, where it serves as a tool for the exploratory analysis of functions. The mean dimension of a function, defined as a natural weighted combination of its ANOVA mean squares, provides one measure of how hard or easy the function is to integrate by quasi-Monte Carlo sampling. This paper presents some new identities relating the mean dimension, and some analogously defined higher moments, to the variable importance measures of Sobol’ (1993). As a result we are able to measure the mean dimension of certain functions arising in computational finance. We produce an unbiased and non-negative estimate of the variance contribution of the highest order interaction, which avoids the cancellation problems of previous estimates. In an application to extreme value theory, we find, among other things, that the minimum of $d$ independent $U[0,1]$ random variables has a mean dimension of $2(d+1)/(d+3)$.

Keywords: Effective Dimension, Extreme Value Theory, Functional ANOVA, Global Sensitivity Analysis, Quasi-Monte Carlo

¹ Ruixue Liu is a doctoral candidate in Statistics at Stanford University.
² Art B. Owen is a professor of Statistics at Stanford University.
This work was supported by the U.S. NSF under grants DMS-0072445 and DMS-0306612. We thank an associate editor and two anonymous referees for comments that have improved this article.

1 INTRODUCTION

The analysis of variance (ANOVA) for square integrable functions on $[0,1]^d$ is becoming a widely used tool for the exploratory analysis of functions. The ANOVA allows us to quantify the notion that some variables and interactions are much more important than others. The result is a form of global sensitivity analysis, distinct from local methods based on partial derivatives.
Saltelli, Chan, and Scott (2000) provide a survey of global sensitivity analysis with numerous applications in the physical sciences.

Within the ANOVA formulation, we may answer questions about variable importance via numerical integration. Sobol’ and his co-authors (Sobol’ 1990; Sobol’ 1993; Archer, Saltelli, and Sobol’ 1997; Sobol’ 2001) have developed unbiased Monte Carlo methods for estimating global sensitivity indices expressed through variances of ANOVA component functions.

The ANOVA of $[0,1]^d$ involves $2^d-1$ effects and corresponding mean squares. For moderately large $d$ it becomes difficult to estimate them all. It is much less difficult to estimate certain interpretable weighted sums of these mean squares.

This paper develops some new theory and algorithms for the ANOVA of $[0,1]^d$. Then it uses these techniques to investigate some functions from financial valuation and extreme value theory. In addition to sensitivity analysis, finance, and extreme value theory, the ideas presented here also have useful applications to the bootstrap, which we omit due to space considerations.

Section 2 introduces our notation, presents the ANOVA of $[0,1]^d$, some variable importance measures from global sensitivity analysis, and the dimension distribution. Section 3 presents new results in the ANOVA of $[0,1]^d$. Included are a measure of variable importance motivated by a problem in machine learning and a Monte Carlo algorithm for estimating it, some new identities relating global sensitivity measures to some weighted combinations of ANOVA mean squares, and an inequality suitable for bounding effective dimension from dimension moments. Section 4 uses the estimation method from Section 3 to explain why some quasi-Monte Carlo rules can perform well, even for functions where the discrepancy bound on error is infinite. It also shows an example from finance in which lower mean dimension corresponds to better gains from quasi-Monte Carlo sampling.
Section 5 develops the ANOVA for the minimum of $d$ random variables, as studied in extreme value theory. Section 6 describes the Monte Carlo and quasi-Monte Carlo methods that we used for our numerical answers. The proofs for the extreme value material appear in an appendix.

2 BACKGROUND

Let $f \in L^2[0,1]^d$ and for $x \in [0,1]^d$ write $x = (x_1,\dots,x_d)$. Here we present a brief outline of the ANOVA decomposition for $[0,1]^d$, Sobol’s global sensitivity indices, the notion of effective dimension, and the dimension distribution. For more details, the reader may turn to the cited references.

The ANOVA decomposition of $L^2[0,1]^d$ was first presented in Hoeffding (1948) for his analysis of U-statistics. Sobol’ (1969) uses it in quadrature problems, and Efron and Stein (1981) use it to study the jackknife. Takemura (1983) gives a survey of applications in statistics.

The functional ANOVA is well known for the analysis of statistical functionals. Let $T(Y_1,\dots,Y_d)$ be a function of $d$ independent and identically distributed random variables $Y_j$. Suppose that $Y_j$ has cumulative distribution function $G(y) = \Pr(Y_j \le y)$. We may write $Y_j = G^{-1}(x_j)$ for independent $x_j \sim U[0,1]$ and $G^{-1}(u) = \inf\{y \mid G(y) \ge u\}$. Then $f(x) = T(G^{-1}(x_1),\dots,G^{-1}(x_d))$ represents the statistic $T$ as a function on the unit cube. Under some smoothness conditions in von Mises (1947), the function $f$ becomes dominated by an additive approximation in the large $d$ limit. This result underlies central limit theorems for $T(Y_1,\dots,Y_d)$.

Sobol’s sensitivity indices describe the relative importance to $f$ of the $d$ input variables $x_j$ considered individually and in subsets, as presented below. The sensitivity indices are based in turn on the analysis of variance (ANOVA) decomposition.

2.1 Notation

For subsets $u \subseteq D = \{1,\dots,d\}$, let $|u|$ denote the cardinality of $u$, $v-u$ denote the set difference $\{j \mid j \in v, j \notin u\}$, and $-u$ denote the complement $D-u$. By $x_u$ we denote the $|u|$-tuple of components $x_j$ for $j \in u$. The domain of $x_u$ is a copy of $[0,1]^{|u|}$ written as $[0,1]^u$.
For $x$ and $z$ in $[0,1]^d$ the expression $f(x_u, z_{-u})$ means $f$ evaluated at the point $p \in [0,1]^d$ with $p_j = x_j$ for $j \in u$ and $p_j = z_j$ for $j \notin u$. When $g(x_u, x_{-u}) = g(x_u, z_{-u})$ for all $x, z \in [0,1]^d$ then we say that the function $g$ depends on $x$ only through $x_u$, or equivalently, that $g$ does not depend on $x_{-u}$.

Let $\int f(x)\,dx = I$ and suppose $\sigma^2 = \int (f(x)-I)^2\,dx < \infty$. To rule out trivialities, we also assume that $\sigma^2 > 0$. Integrals of the form $\int g(x)\,dx$ are taken to be over $[0,1]^d$ and produce scalar values. Integrals of the form $\int g(x)\,dx_v$ represent integration with respect to $x_v \in [0,1]^v$ with the result viewed as a function of $x$ that does not depend on $x_v$.

2.2 ANOVA decomposition

In the ANOVA decomposition we write

$$ f(x) = \sum_{u \subseteq \{1,\dots,d\}} f_u(x) \tag{1} $$

where $f_u(x)$ depends on $x$ only through $x_u$. The term $f_u$ is obtained by subtracting from $f$ all terms for strict subsets of $u$, and then averaging over $x_{-u}$ to give a function not depending on $x_{-u}$:

$$ f_u(x) = \int \Bigl( f(x) - \sum_{v \subsetneq u} f_v(x) \Bigr)\,dx_{-u} = \int f(x)\,dx_{-u} - \sum_{v \subsetneq u} f_v(x). \tag{2} $$

Using usual conventions, $f_\emptyset$ is the constant function equal to $I$ for all $x \in [0,1]^d$.

It follows by induction on $|u|$ that when $j \in u$ then $\int_0^1 f_u(x)\,dx_j = 0$, so that $\int f_u(x) f_v(x)\,dx = 0$ when $u \ne v$. More generally $\int f_u(x) g_v(x)\,dx = 0$ when $u \ne v$ and $f, g \in L^2[0,1]^d$. The variance of $f_u$ is written as $\sigma_u^2$. Clearly $\sigma_\emptyset^2 = 0$, while if $u \ne \emptyset$ then $\sigma_u^2 = \int f_u(x)^2\,dx$. The ANOVA is named for the following easily proved property

$$ \sigma^2 = \sum_{u \subseteq \{1,\dots,d\}} \sigma_u^2. \tag{3} $$

2.3 Sensitivity and variable importance

Sobol’ (1993) gives two measures of the importance of a subset $u$ of the variables, which we label

$$ \underline{\tau}_u^2 = \sum_{v \subseteq u} \sigma_v^2, \quad \text{and,} \tag{4} $$

$$ \overline{\tau}_u^2 = \sum_{v \cap u \ne \emptyset} \sigma_v^2. \tag{5} $$

These can be thought of as lower and upper limits, respectively, on the importance of the subset $u$. It is easy to show that $0 \le \underline{\tau}_u^2 \le \overline{\tau}_u^2 \le \sigma^2$ and that $\underline{\tau}_u^2 + \overline{\tau}_{-u}^2 = \sigma^2$. Normalized versions, $\underline{\tau}_u^2/\sigma^2$ and $\overline{\tau}_u^2/\sigma^2$, are known as global sensitivity indices.
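As a small numerical illustration of definitions (4) and (5), the sketch below works with the toy function $f(x) = x_1 + x_2 x_3$ on $[0,1]^3$. The function, its hand-derived variance components, and every name in the code are assumptions made for illustration; none of them come from the paper. The last helper compares the exact upper measure with a plain Monte Carlo average based on the identity $\overline{\tau}_u^2 = \frac12 \int (f(x_u,x_{-u}) - f(z_u,x_{-u}))^2\,dx\,dz_u$ that Sobol’ (1993) gives, quoted later in this section.

```python
import random
from fractions import Fraction

# Hand-derived ANOVA variances for the toy function f(x) = x1 + x2*x3 on [0,1]^3.
# The function and all names here are illustrative assumptions, not from the paper.
SIGMA2 = {
    frozenset({1}): Fraction(1, 12),
    frozenset({2}): Fraction(1, 48),
    frozenset({3}): Fraction(1, 48),
    frozenset({2, 3}): Fraction(1, 144),
}
D = frozenset({1, 2, 3})
TOTAL = sum(SIGMA2.values())  # sigma^2 = 19/144


def lower_tau2(u):
    """Equation (4): add sigma_v^2 over the non-empty v contained in u."""
    return sum((s for v, s in SIGMA2.items() if v <= u), Fraction(0))


def upper_tau2(u):
    """Equation (5): add sigma_v^2 over the v that share an index with u."""
    return sum((s for v, s in SIGMA2.items() if v & u), Fraction(0))


def f(x):
    return x[0] + x[1] * x[2]


def mc_upper_tau2(u, n=100_000, seed=0):
    """Plain Monte Carlo average of (1/2) * (f(x) - f(z_u, x_{-u}))^2."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(3)]
        y = list(x)
        for j in u:  # redraw only the coordinates indexed by u (1-based)
            y[j - 1] = rng.random()
        acc += 0.5 * (f(x) - f(y)) ** 2
    return acc / n


u = frozenset({2})
print(lower_tau2(u), upper_tau2(u))                # exact values from the table
print(lower_tau2(u) + upper_tau2(D - u) == TOTAL)  # identity stated above
print(mc_upper_tau2(u))                            # should land near upper_tau2(u)
```

Exact rational arithmetic is used for the table so that the identity $\underline{\tau}_u^2 + \overline{\tau}_{-u}^2 = \sigma^2$ can be checked without rounding; the Monte Carlo value is only approximate and depends on the seed and sample size chosen here.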
Let $g^* \in L^2[0,1]^d$ be the minimizer of $\int (f(x)-g(x))^2\,dx$ among functions $g$ that depend only on $x_u$. Then $g^* = \sum_{v \subseteq u} f_v$ and $\underline{\tau}_u^2 = \int g^*(x)^2\,dx - I^2$, because $g^*$ includes the constant term $f_\emptyset = I$. If $\underline{\tau}_u^2/\sigma^2$ is close to one then $f$ is close to a function that depends only on $x_u$. If $\overline{\tau}_u^2$ is small, then as Sobol’ (1993) describes, the variables $x_u$ may be considered unessential, and in some applications we might choose to fix them at default values.

Equation (4) expresses $2^d$ values $\underline{\tau}_u^2$ as linear combinations of $2^d$ values $\sigma_u^2$. The inverse linear relation is

$$ \sigma_u^2 = \sum_{v \subseteq u} (-1)^{|u-v|} \underline{\tau}_v^2. \tag{6} $$

To compute $\sigma_u^2$ from $\overline{\tau}_v^2$ we can combine equation (6) with the identity $\sigma^2 = \underline{\tau}_v^2 + \overline{\tau}_{-v}^2$. Sobol’ (1993) gives the identities:

$$ I^2 + \underline{\tau}_u^2 = \int f(x_u, x_{-u})\, f(x_u, z_{-u})\,dx\,dz_{-u}, \quad \text{and,} $$

$$ \overline{\tau}_u^2 = \frac{1}{2} \int \bigl( f(x_u, x_{-u}) - f(z_u, x_{-u}) \bigr)^2\,dx\,dz_u. $$

The integrals in these identities provide the basis for Monte Carlo or quasi-Monte Carlo estimation of sensitivity indices. For small $d$ it is feasible to estimate $I^2$ and all $2^d-1$ non-degenerate $\underline{\tau}_u^2$ values. Then the ANOVA components $\sigma_u^2$ can be estimated from (6).

For large $|u|$ it often happens that $\sigma_u^2$ is small compared to the numerical error in estimates of some $\underline{\tau}_v^2$ for $v \subset u$. Then the subtractions in (6) may yield estimates of $\sigma_u^2$ with large relative errors.

2.4 Effective dimension

A function may be thought to have an effective dimension smaller than $d$ if it can be closely approximated by certain sums of functions that involve fewer than $d$ components of $x$. Caflisch, Morokoff, and Owen (1997) define the effective dimension of a function in two senses. The function $f$ has effective dimension $s$ in the superposition sense if $\sum_{|u| \le s} \sigma_u^2 \ge 0.99\sigma^2$ and it has effective dimension $s$ in the truncation sense if $\sum_{u \subseteq \{1,\dots,s\}} \sigma_u^2 \ge 0.99\sigma^2$.

The choice of 99th percentile is arbitrary, but reasonable in the context of quasi-Monte Carlo sampling. Hickernell (1998) makes the threshold quantile a parameter in the definition. We will emphasize superposition.

The extreme of low effective dimension is attained by additive functions.
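The two senses of effective dimension can be sketched for a function whose variance components $\sigma_u^2$ are already known. The code below is a hedged illustration rather than anything from the paper: it assumes the effective dimension is the smallest $s$ meeting the stated inequality, takes the threshold as a parameter in the spirit of Hickernell (1998), and uses hypothetical helper names and toy variance tables.

```python
from fractions import Fraction


def _smallest_s(sigma2, d, keep, threshold):
    """Return the smallest s whose retained variance reaches threshold * sigma^2."""
    total = sum(sigma2.values())
    for s in range(1, d + 1):
        captured = sum(var for u, var in sigma2.items() if keep(u, s))
        if captured >= threshold * total:
            return s
    return d


def superposition_dimension(sigma2, d, threshold=Fraction(99, 100)):
    """Retain every component with |u| <= s."""
    return _smallest_s(sigma2, d, lambda u, s: len(u) <= s, threshold)


def truncation_dimension(sigma2, d, threshold=Fraction(99, 100)):
    """Retain every component with u contained in {1, ..., s}."""
    return _smallest_s(sigma2, d, lambda u, s: max(u) <= s, threshold)


# Additive toy function x1 + x2 + x3: only main effects, each with variance 1/12.
additive = {frozenset({j}): Fraction(1, 12) for j in (1, 2, 3)}

# Toy function x1 + x2*x3 with hand-derived components (an assumption of this sketch).
mixed = {
    frozenset({1}): Fraction(1, 12),
    frozenset({2}): Fraction(1, 48),
    frozenset({3}): Fraction(1, 48),
    frozenset({2, 3}): Fraction(1, 144),
}

print(superposition_dimension(additive, 3), truncation_dimension(additive, 3))
print(superposition_dimension(mixed, 3), truncation_dimension(mixed, 3))
```

For the additive table the superposition dimension is 1 even though all three inputs matter equally, so its truncation dimension is 3. That contrast is one reason the two senses are kept separate.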
An additive function can be integrated very effectively by Latin hypercube sampling (McKay, Beckman, and Conover 1979), and it can be optimized by optimizing separately over each input variable. For instance, one can optimize $f(1/2,\dots,x_j,\dots,1/2)$ for $j = 1,\dots,d$ to obtain the global optimum $(x_1,\dots,x_d)$ of $f$. Stein (1987) shows that Latin hypercube sampling remains an effective integration tool for functions that are nearly additive. For nearly-additive $f$, separate optimization remains a useful heuristic, though we can construct functions for which it will fail badly.

2.5 Dimension distribution

The dimension distribution (in the superposition sense) is a discrete probability distribution with mass function

$$ \nu(j) = \frac{1}{\sigma^2} \sum_{|u|=j} \sigma_u^2, \quad j = 1,\dots,d. $$

If one chooses a non-empty set $U \subseteq \{1,\dots,d\}$ at random, such that the probability that $U = u$ is $\sigma_u^2/\sigma^2$, then $\Pr(|U| = j) = \nu(j)$.

Owen (2003) computes the dimension distribution for some test functions used in quasi-Monte Carlo. Some widely used test functions for numerical integration have very low effective dimension, making them relatively easy. Other test functions are more intrinsically of high dimension. Wang and Fang (2002) give a recursive algorithm for the dimension distribution of functions of product form.

The effective dimension is defined through a quantile of the dimension distribution. Such quantiles can be hard to estimate directly. Moments are easier to estimate, and in some cases, simple bounds such as those of Chebychev or Markov yield usable quantile bounds from moment bounds. When comparing functions of low effective dimension, it can happen that all of the functions being compared have the same low effective dimension. See Wang and Fang (2002) for examples. Then the mean dimension may serve as an easy to compute tie breaker.

3 NEW ANOVA RESULTS

Here we present new results for the ANOVA of $[0,1]^d$. Section 3.1 presents a notion of the importance of $x_u$ based on supersets of $u$.
Section 3.2 presents some new identities relating dimension moments to global sensitivity indices. Section 3.3 proves an inequality bounding tail probabilities of the dimension distribution in terms of dimension moments.

3.1 Superset importance

The quantity

$$ \Upsilon_u^2 = \sum_{w \supseteq u} \sigma_w^2 \tag{7} $$

is used in the study of black box functions $f$ in machine learning. The interpretability of $f$ can be improved by ignoring certain collections of high order interactions among the variables. Then $\Upsilon_u^2$ represents the cost, in lost model fidelity, of ignoring $f_u$ and $f_v$ for all $v \supseteq u$ (Hooker 2004). Clearly $\Upsilon_u^2 \le \overline{\tau}_u^2$, so that ignoring the supereffects of $u$ is a less severe simplification than freezing $x_u$. The inverse of (7) is

$$ \sigma_u^2 = \sum_{w \supseteq u} (-1)^{|w-u|} \Upsilon_w^2. \tag{8} $$
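Equations (7) and (8) can be checked against each other on a small table of variance components. The sketch below is illustrative only: the helper names and the toy table (hand-derived for $x_1 + x_2 x_3$ on $[0,1]^3$) are assumptions, not part of the paper. It builds $\Upsilon_u^2$ for every non-empty $u$ and then applies the alternating sum in (8) to recover the original $\sigma_u^2$.

```python
from fractions import Fraction
from itertools import combinations


def nonempty_subsets(d):
    """All non-empty subsets of {1, ..., d} as frozensets."""
    idx = range(1, d + 1)
    return [frozenset(c) for k in range(1, d + 1) for c in combinations(idx, k)]


def superset_importance(sigma2, d):
    """Equation (7): Upsilon_u^2 is the sum of sigma_w^2 over w containing u."""
    return {
        u: sum((s for w, s in sigma2.items() if w >= u), Fraction(0))
        for u in nonempty_subsets(d)
    }


def invert_superset_importance(upsilon2, d):
    """Equation (8): alternating sum of Upsilon_w^2 over w containing u."""
    return {
        u: sum(
            ((-1) ** len(w - u) * ups for w, ups in upsilon2.items() if w >= u),
            Fraction(0),
        )
        for u in nonempty_subsets(d)
    }


# Hand-derived variance components of the toy function x1 + x2*x3 (an assumption).
sigma2 = {
    frozenset({1}): Fraction(1, 12),
    frozenset({2}): Fraction(1, 48),
    frozenset({3}): Fraction(1, 48),
    frozenset({2, 3}): Fraction(1, 144),
}

upsilon2 = superset_importance(sigma2, 3)
recovered = invert_superset_importance(upsilon2, 3)
for u in nonempty_subsets(3):
    print(sorted(u), upsilon2[u], recovered[u])
```

With exact fractions the round trip reproduces the table exactly. With Monte Carlo estimates of $\Upsilon_w^2$ in place of exact values, the alternating signs in (8) would be subject to the same kind of cancellation noted for (6).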
