

ON TRACTABILITY OF APPROXIMATION FOR A SPECIAL SPACE OF FUNCTIONS

M. HEGLAND AND G. W. WASILKOWSKI

Abstract. We consider approximation problems for a special space of $d$-variate functions. We show that the problems have a small number of active variables, as has been postulated in the past using concentration of measure arguments. We also show that, depending on the norm used for measuring the error, the problems are strongly polynomially or quasi-polynomially tractable even in the model of computation where functional evaluations have cost exponential in the number of active variables.

1. Introduction

This paper is inspired by [4], where the importance of a special class of multivariate functions was advocated, and by recent results on tractability of problems dealing with infinite-variate functions, see [1, 2, 5, 6, 9, 10, 13, 15, 16, 17], where the cost of an algorithm depends on the number of active variables that it uses.

The selection of functions in [4] was based on a particular choice of the metric used in the space of the variables $x_i$ of the functions and on the smoothness of the functions. Here we consider the case where the $x_i$ denote features of some objects. Adding new features will in general increase the distance, and this increase can grow substantially with the dimension. For example, if $x_i \in [0,1]$ for $i = 1,\dots,d$, then the average squared Euclidean distance of two points grows proportionally to the dimension $d$:

$$\int_{[0,1]^d}\int_{[0,1]^d} \sum_{i=1}^d (x_i - y_i)^2 \, dx\, dy \;=\; O(d).$$

This unbounded growth shows that the Euclidean distance cannot approximate any distance function between two objects for large $d$. This is why it was suggested in [4] to use a scaled Euclidean distance to characterize the dissimilarity of two objects based on features $x_1,\dots,x_d$:

$$\mathrm{dist}(x,y) \;=\; \sqrt{\frac{1}{d}\sum_{i=1}^d (x_i - y_i)^2}.$$

The continuity of functions considered in [4] was Lipschitz continuity based on the scaled Euclidean distance.
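The two growth rates above are easy to see numerically. The following sketch (an illustration of ours, not part of the paper; the function name and parameters are our own) estimates both quantities by Monte Carlo: for independent uniforms, $E(x_i-y_i)^2 = 1/6$, so the unscaled average squared distance grows like $d/6$ while the scaled version stays near $1/6$ for every $d$.

```python
import random

def avg_sq_dist(d, scaled=False, trials=4000, seed=0):
    """Monte Carlo estimate of the average squared Euclidean distance
    between two uniform random points of [0,1]^d, optionally scaled by
    1/d as in dist(x, y) above."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = sum((rng.random() - rng.random()) ** 2 for _ in range(d))
        total += s / d if scaled else s
    return total / trials

# E (x_i - y_i)^2 = 1/6 for independent uniforms, so the unscaled
# average grows like d/6 while the scaled one stays near 1/6.
for d in (10, 100, 1000):
    print(d, avg_sq_dist(d), avg_sq_dist(d, scaled=True))
```

The unscaled estimates grow without bound, while the scaled ones are essentially dimension-independent, which is exactly the motivation for the scaled metric.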
For differentiable functions, this leads to the boundedness condition

$$d \cdot \sum_{i=1}^d \left(\frac{\partial f}{\partial x_i}\right)^2 \;\le\; L_1^2,$$

where $L_1$ is the Lipschitz constant of $f$ with respect to the scaled Euclidean distance. A model example is the mean function

$$f(x) \;=\; \frac{1}{d}\sum_{i=1}^d x_i.$$

This function has Lipschitz constant $L_1 = 1$. Consequently the gradient satisfies

$$\|\nabla f\|_{L_2} \;\le\; \frac{1}{d^{1/2}}.$$

It follows that $f$ is approximated with an $O(d^{-1/2})$ error by the constant $0.5$, i.e., the values of $f$ are concentrated around $0.5$. This concentration phenomenon for general Lipschitz-continuous functions was established by Lévy in [8].

Higher order approximations can be derived in the case when higher order Lipschitz constants are finite, i.e., when for some $m > 0$ one has

$$d^m \cdot \sum_{i_1\le\cdots\le i_m} \left(\frac{\partial^m f}{\partial x_{i_1}\cdots\partial x_{i_m}}\right)^2 \;\le\; L_m^2.$$

Using the example of the mean function one has

$$\left(\frac{1}{d}\sum_{i=1}^d x_i \;-\; \frac{1}{2}\right)^2 \;=\; O(1/d).$$

From this one gets the first order (additive function) approximation

$$\frac{2}{d^2}\sum_{i<j} x_i x_j \;=\; \frac{1}{d}\sum_{i=1}^d (1 - x_i/d)\, x_i \;-\; \frac14 \;+\; O(1/d).$$

A similar approximation is obtained for the average squared distance $\frac{2}{d(d-1)}\sum_{i<j}(x_i-x_j)^2$. Both these functions do satisfy a higher order Lipschitz condition with respect to the scaled norm introduced earlier.

Classes of such functions and the particular scaling by $1/d^m$, where $m$ is equal to the number of involved variables, are related to the weighted reproducing kernel Hilbert space $H_d$ of multivariate functions on $[0,1]^d$ with the reproducing kernel given by

$$\mathcal{K}_d(x,y) \;=\; 1 \;+\; \sum_{u\ne\emptyset} d^{-|u|} \prod_{j\in u} \min(x_j, y_j).$$

Here the sum is over all nonempty subsets $u$ of $\{1,\dots,d\}$. This is why we consider such spaces in the current paper. It is well known, see, e.g., [7], that functions from that space have an ANOVA-like representation of the form

$$f(x) \;=\; f_\emptyset \;+\; \sum_{u\ne\emptyset} f_u(x),$$

where each component $f_u$ depends on exactly the variables listed in $u$.

Date: January 25, 2012.
Hence $u$ is the list of active variables in $f_u$, and the scaling parameter $m$ is equal to $|u|$. The corresponding norm of $f$ is given by

$$\|f\|^2_{H_d} \;=\; |f_\emptyset|^2 \;+\; \sum_{u\ne\emptyset} d^{|u|} \cdot \left\|\frac{\partial^{|u|} f_u}{\prod_{j\in u}\partial x_j}\right\|^2_{L_2}.$$

As already mentioned, it was also postulated in [4] that functions of this form are well approximated by sums of those components $f_u$ that depend on small numbers of variables, i.e., with $u$ of small cardinality, or just by a constant function. We show, in a more quantified way, that this is true for approximation problems with errors measured in a norm of another Hilbert space $G_d$ that also has a tensor product form. That is, we show that to approximate $f$ with an error not exceeding $\varepsilon\cdot\|f\|_{H_d}$, it is enough to consider only those terms $f_u$ that depend on at most $|u| \le m(\varepsilon,d)$ variables, where $m(\varepsilon,d)$ grows with $1/\varepsilon$ very slowly and/or decreases to zero when $d$ tends to infinity. More precisely, for general tensor product spaces (including the $L_2$ space), we have

$$m(\varepsilon,d) \;\le\; \min\left(d,\; \frac{c\cdot\ln(1/\varepsilon)}{\ln(\ln(1/\varepsilon))}\right)$$

for a known constant $c > 0$ that does not depend on $\varepsilon$ and $d$. For instance, for any $d \in \mathbb{N}_+$ and the error demand $\varepsilon = 10^{-q}$, we have

$$m\left(10^{-2},d\right)\le 5,\quad m\left(10^{-4},d\right)\le 8,\quad\text{and}\quad m\left(10^{-8},d\right)\le 14.$$

Suppose next that the spaces $H_d$ and $G_d$ satisfy the following assumption: there exists $C < \infty$ such that

(1) $$\Big\|\sum_u f_u\Big\|^2_{G_d} \;\le\; C\cdot\sum_u \|f_u\|^2_{G_d}\quad\text{for all } f = \sum_u f_u \in H_d.$$

Then $m(\varepsilon,d)$ has an even smaller upper bound,

$$m(\varepsilon,d) \;\le\; \min\left(d,\; \frac{2\cdot\ln(1/\varepsilon)}{\ln(d/c)}\right).$$

Hence, for a fixed error demand $\varepsilon$, $m(\varepsilon,d) = O(1/\ln(d))$ as $d \to \infty$.

Actually, we prove these results for reproducing kernels of the form

$$\mathcal{K}_d(x,y) \;=\; 1 \;+\; \sum_{u\ne\emptyset} d^{-|u|}\prod_{j\in u} K(x_j,y_j)$$

for a general class of univariate kernels $K : D\times D \to \mathbb{R}$, including of course $K(x,y) = \min(x,y)$ and $D = [0,1]$.
We also study the tractability of approximation problems for algorithms that can use arbitrary linear functional evaluations. However, as has been done in the recent study of infinite-variate problems, we assume that the cost of each such evaluation depends on the number $k$ of active variables and is given by $\$(k)$. Under the general tensor product assumption, approximation is quasi-polynomially tractable, and it is strongly polynomially tractable if (1) is satisfied. These results hold even when the cost function $\$$ is exponential. We also find a sharp upper bound on the exponent of strong tractability. Approximation is weakly tractable even when $\$$ is doubly exponential.

2. Basic Definitions

2.1. Space of d-Variate Functions. Let $D \subseteq \mathbb{R}$ be a Borel measurable set and let $H = H(K)$ be a reproducing kernel Hilbert space (RKH space for short) of functions $f : D \to \mathbb{R}$ whose kernel is denoted by $K$. We assume that

$$1 \notin H,$$

where $1$ denotes the constant function $f(x) = 1$ for all $x$. In what follows we write $[1..d]$ to denote the set of positive integers not exceeding $d$,

$$[1..d] := \{n\in\mathbb{N}_+ : n \le d\},$$

and use $u, v$ to denote subsets of $[1..d]$. Consider now the weights

(2) $$\gamma_{d,u} := d^{-|u|}\quad\text{for } u\subseteq[1..d].$$

Clearly $\gamma_{d,\emptyset} = 1$. The weighted space of $d$-variate functions $f : D^d \to \mathbb{R}$ under consideration is the RKH space $H_d$ whose kernel is given by

$$\mathcal{K}_d(x,y) := \sum_{u\subseteq[1..d]} \gamma_{d,u}\cdot K_u(x,y)\quad\text{and}\quad K_u(x,y) = \prod_{j\in u} K(x_j,y_j),$$

with the convention that $K_\emptyset \equiv 1$. For each $u$, by $H_u$ we denote the RKH space whose kernel is equal to $K_u$. Clearly $H_\emptyset = \mathrm{span}\{1\}$ and $H_u \simeq H^{\otimes|u|}$ for $u \ne \emptyset$. It is well known that the spaces $H_u$, as subspaces of $H_d$, are mutually orthogonal, and any $f \in H_d$ has the unique representation

$$f(x) \;=\; \sum_{u\subseteq[1..d]} f_u(x)\quad\text{with } f_u\in H_u$$

and

$$\|f\|^2_{H_d} \;=\; \sum_{u\subseteq[1..d]} \|f_u\|^2_{H_d} \;=\; \sum_{u\subseteq[1..d]} \gamma_{d,u}^{-1}\cdot\|f_u\|^2_{H_u}.$$

This representation is similar to the ANOVA decomposition since each term $f_u$ depends only on the variables listed in $u$.
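Because the weights $\gamma_{d,u} = d^{-|u|}$ are product weights, the $2^d$-term subset sum defining $\mathcal{K}_d$ collapses to a product of $d$ univariate factors, $\mathcal{K}_d(x,y) = \prod_{j=1}^d \bigl(1 + K(x_j,y_j)/d\bigr)$. This identity is not spelled out above, but it follows by expanding the product and collecting one term per subset $u$; the following sketch (function names are ours) checks it numerically for $d = 3$ with the kernel $K(x,y)=\min(x,y)$ mentioned in the Introduction.

```python
from itertools import combinations
from math import prod

def K(x, y):
    """Univariate kernel; K(x, y) = min(x, y) is the Wiener case of the paper."""
    return min(x, y)

def kernel_subset_sum(x, y):
    """K_d(x,y) = sum over all subsets u of [1..d] of d^{-|u|} prod_{j in u} K(x_j, y_j)."""
    d = len(x)
    return sum(
        d ** (-len(u)) * prod(K(x[j], y[j]) for j in u)
        for k in range(d + 1)
        for u in combinations(range(d), k)
    )

def kernel_product(x, y):
    """Equivalent product form: prod_{j=1}^d (1 + K(x_j, y_j)/d)."""
    d = len(x)
    return prod(1 + K(x[j], y[j]) / d for j in range(d))

x, y = (0.2, 0.7, 0.5), (0.9, 0.3, 0.5)
print(kernel_subset_sum(x, y), kernel_product(x, y))  # the two forms agree
```

The product form evaluates $\mathcal{K}_d$ in $O(d)$ operations instead of $O(2^d)$, which is why such product weights are convenient in practice.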
The space considered in [4] and mentioned in the Introduction is related to the space $H_d$ with the classical Wiener kernel discussed in the following example.

Example. Consider $D = [0,1]$ and $K(x,y) = \min(x,y)$. Then $H$ is the space of functions $f : [0,1] \to \mathbb{R}$ that vanish at zero, are absolutely continuous, and have $f' \in L_2([0,1])$. The norm in $H$ is given by

$$\|f\|^2_H \;=\; \int_0^1 |f'(x)|^2\,dx.$$

For $u \ne \emptyset$, $H_u$ consists of functions that depend only on the variables $x_j$ with $j \in u$, are zero if at least one of those variables is zero, and have the mixed first-order partial derivatives bounded in the $L_2$ norm, with

$$\|f\|^2_{H_d} \;=\; |f(0)|^2 \;+\; \sum_{u\ne\emptyset} d^{|u|} \int_{D^d}\Big|\prod_{j\in u}\frac{\partial}{\partial x_j}\, f([x;u])\Big|^2 dx\quad\text{for } f\in H_d,$$

where $[x;u]$ is given by

$$[x;u] = [y_1,\dots,y_d]\quad\text{with}\quad y_j := \begin{cases} x_j & \text{if } j\in u,\\ 0 & \text{otherwise.}\end{cases}$$

2.2. Function Approximation Problems. For every $d \ge 1$, let $G_d$ be a separable Hilbert space of functions on $D^d$ such that $H_d$ is continuously embedded in it. We denote the corresponding embedding operator by $S_d$, i.e.,

$$S_d : H_d \to G_d\quad\text{and}\quad S_d(f) = f.$$

We assume that $S_d$ and $G_d$ have tensor product forms, i.e., for every $u$ and every $f(x) = \prod_{j\in u} f_j(x_j)$ with $f_j \in H$, we have

(3) $$\|f\|_{G_d} \;=\; \prod_{j\in u}\|f_j\|_{G_1}.$$

For simplicity of presentation we also assume that $\|1\|_{G_1} = 1$, so that $\|1\|_{G_d} = 1$. The continuity of $S_d$ is equivalent to the continuity of $S_1$. Indeed, let

(4) $$C_0 := \sup_{\|f\|_H\le 1}\|f\|_{G_1} \;<\; \infty.$$

Then for every $u$ we have

$$\sup_{\|f\|_{H_u}\le 1}\|f\|_{G_d} \;=\; C_0^{|u|}$$

and

$$\|S_d\|^2 \;\le\; \sum_{u\subseteq[1..d]}\gamma_{d,u}\cdot C_0^{2|u|} \;=\; \sum_{k=0}^d \binom{d}{k}\cdot d^{-k}\cdot C_0^{2k} \;=\; \left(1+\frac{C_0^2}{d}\right)^d,$$

since

$$\|f\|^2_{G_d} \;\le\; \left(\sum_{u\subseteq[1..d]} C_0^{|u|}\cdot\|f_u\|_{H_u}\right)^2 \;\le\; \left(\sum_{u\subseteq[1..d]}\gamma_{d,u}\cdot C_0^{2|u|}\right)\cdot\|f\|^2_{H_d}.$$

Clearly

$$1 \;\le\; \|S_d\| \;\le\; e^{C_0^2/2}\quad\text{for every } d,$$

which means that the corresponding approximation problem is properly scaled. Note also that condition (1) holds if

(5) $$\langle 1, f\rangle_{G_1} \;=\; 0\quad\text{for all } f\in H.$$
Actually, under (5) we have

$$\|f\|^2_{G_d} \;=\; \sum_{u\subseteq[1..d]}\|f_u\|^2_{G_d}\quad\text{for all } f\in H_d.$$

Then we can get a better estimate of the norm of $S_d$:

$$\|f\|^2_{G_d} \;\le\; \sum_{u} d^{|u|}\cdot\|f_u\|^2_{H_u}\cdot C_0^{2|u|}\cdot d^{-|u|} \;\le\; \|f\|^2_{H_d}\cdot\max_{k\le d} C_0^{2k}\cdot d^{-k}.$$

Since the estimate above is sharp, we conclude that

$$\|S_d\| \;=\; \max_{k\le d} C_0^{k}\cdot d^{-k/2}.$$

The class of such approximation problems contains the following weighted $L_2$ approximation.

2.2.1. Weighted $L_2$ Approximation. Let $\rho$ be a given probability density function (p.d.f. for short) on $D$. Without loss of generality, suppose that $\rho$ is positive (a.e.) on $D$. Then the space $L_2(\rho_d, D^d)$ of functions with finite

$$\|f\|^2_{L_2(\rho_d,D^d)} \;=\; \int_{D^d} |f(x)|^2\cdot\rho_d(x)\,dx$$

is a well defined Hilbert space. Here by $\rho_d$ we mean

$$\rho_d(x) \;=\; \prod_{j=1}^d \rho(x_j).$$

We then take $G_d = L_2(\rho_d, D^d)$. It is well known that the continuity of $S_1$ is equivalent to the continuity of the following integral operator

$$W_1 := S_1^*\circ S_1 : H\to H,\qquad (W_1 f)(x) = \int_D f(y)\cdot K(x,y)\cdot\rho(y)\,dy,$$

since then $\|S_1\|^2$ is equal to the largest eigenvalue of $W_1$, i.e.,

$$C_0^2 \;=\; \max\{\lambda : \lambda\in\mathrm{spect}(W_1)\}.$$

Then

$$1 \;\le\; \|S_d\|^2 \;\le\; \left(1+\frac{C_0^2}{d}\right)^d.$$

Condition (5) is now equivalent to

$$\int_D f(x)\cdot\rho(x)\,dx \;=\; 0\quad\text{for all } f\in H,$$

which is satisfied by various spaces of periodic functions.

2.3. Algorithms, Errors and Cost. Since the problems considered in this paper are defined over Hilbert spaces, we can restrict attention to linear algorithms only, see, e.g., [14], of the form

$$\mathcal{A}_n(f) \;=\; \sum_{j=1}^n L_j(f)\cdot a_j,$$

where the $L_j$ are continuous linear functionals and $a_j \in G_d$. In the worst case setting considered in this paper, the error of an algorithm $\mathcal{A}_n$ is defined by

$$\mathrm{error}(\mathcal{A}_n; H_d, G_d) \;:=\; \sup_{f\in H_d}\frac{\|f-\mathcal{A}_n(f)\|_{G_d}}{\|f\|_{H_d}}.$$

So far, in the complexity study of problems with finitely many variables, it has been assumed that the cost of an algorithm is given by the number $n$ of functional evaluations.
We believe that, similarly to problems with infinitely many variables, the cost of computing $L(f)$ should depend on the number of active variables of $L$. More precisely, for given $L \in H_d^*$, let $h_L \in H_d$ be its generator, i.e.,

$$L(f) \;=\; \langle f, h_L\rangle_{H_d}\quad\text{for all } f\in H_d.$$

Then, writing $h_L = \sum_{u\subseteq[1..d]} h_u$,

$$\mathrm{Act}(L) \;:=\; \Big|\bigcup\,\{v : h_v\ne 0\}\Big|$$

is the number of active variables in $L$, and the cost of evaluating $L(f)$ is equal to $\$(\mathrm{Act}(L))$, where $\$ : \mathbb{N} \to \mathbb{R}_+$ is a given cost function. The only assumptions that we make at this point are

$$\$(0)\ge 1\quad\text{and}\quad \$(k)\le \$(k+1)\ \text{for all } k\in\mathbb{N}.$$

This includes $\$(k) = (k+1)^q$, $\$(k) = e^{q\cdot k}$, and $\$(k) = e^{e^{q\cdot k}}$ for some $q \ge 0$. Then the (information) cost of $\mathcal{A}_n = \sum_{j=1}^n L_j(f)\cdot a_j$ is given by

$$\mathrm{cost}(\mathcal{A}_n) \;:=\; \sum_{j=1}^n \$(\mathrm{Act}(L_j)).$$

The tractability results obtained so far for functions with finite numbers of variables correspond to $\$ \equiv 1$. In our opinion, it makes sense to assume that the cost function is at least linear, i.e., $\$(k) \ge c\cdot(k+1)$ for $k\in\mathbb{N}$.

2.4. Information Complexity and Tractability. By (information) complexity we mean the minimal information cost among all algorithms with errors not exceeding a given error demand. That is, for $\varepsilon \in (0,1)$,

$$\mathrm{comp}(\varepsilon; H_d, G_d) \;:=\; \inf\{\mathrm{cost}(\mathcal{A}) : \mathrm{error}(\mathcal{A}; H_d, G_d)\le\varepsilon\}.$$

We now recall the definitions of three kinds of tractability. For a detailed discussion of tractability concepts and results, we refer to the excellent monographs [11, 12]. We stress, however, that those results pertain to the constant cost function, $\$ \equiv 1$.

We say that the problem (or, more precisely, the sequence of problems $S_d$) is polynomially tractable if there exist $c, p, q \ge 0$ such that

$$\mathrm{comp}(\varepsilon; H_d, G_d) \;\le\; c\cdot\frac{d^q}{\varepsilon^p}\quad\text{for all }\varepsilon\in(0,1)\text{ and } d\in\mathbb{N}_+.$$

It is strongly polynomially tractable iff the above inequality holds with $q = 0$, and weakly tractable iff

$$\limsup_{d+1/\varepsilon\to\infty}\frac{\ln(\mathrm{comp}(\varepsilon; H_d, G_d))}{d+1/\varepsilon} \;=\; 0.$$
When the problem is strongly polynomially tractable,

$$p^{\mathrm{str}} \;:=\; \inf\left\{p : \sup_{\varepsilon,d}\,\varepsilon^p\cdot\mathrm{comp}(\varepsilon; H_d, G_d) < \infty\right\}$$

is called the exponent of strong tractability. There is also the concept of quasi-polynomial tractability, introduced recently in [3]. It is weaker than polynomial tractability and stronger than weak tractability. More precisely, the problem is quasi-polynomially tractable if there exist $c, t \ge 0$ such that

$$\mathrm{comp}(\varepsilon; H_d, G_d) \;\le\; c\cdot\exp\bigl(t\cdot(1+\ln(d))\cdot(1+\ln(1/\varepsilon))\bigr)\quad\text{for all }\varepsilon\in(0,1)\text{ and } d\in\mathbb{N}_+.$$

This means that $\mathrm{comp}(\varepsilon; H_d, G_d) \le c\cdot(e\cdot d)^{t\cdot(1+\ln(1/\varepsilon))}$. The significance of quasi-polynomial tractability is that for some applications $d$ can be very large while $\varepsilon$ need not be very small, say $\varepsilon = 10^{-2}$. Then the complexity of the problem is bounded by a polynomial in $d$. As we shall prove in the next sections, the problems considered in this paper are quasi-polynomially tractable even when the cost function $\$$ is exponential in $d$.

3. Results

3.1. Number of Active Variables. We are interested in a number $m = m(\varepsilon,d)$ such that, for any $f \in H_d$, the terms $f_u$ with $|u| > m$ can be neglected, i.e.,

(6) $$\Big\|\sum_{|u|>m(\varepsilon,d)} f_u\Big\|_{G_d} \;\le\; \varepsilon\cdot\Big\|\sum_{|u|>m(\varepsilon,d)} f_u\Big\|_{H_d}.$$

Hence, to approximate $S_d(f)$ with error bounded by $\varepsilon\sqrt{2}$, it is enough to use algorithms with functionals $L_j$ that have $\mathrm{Act}(L_j) \le m(\varepsilon,d)$. We first find $m(\varepsilon,d)$ for the general tensor product space $G_d$ and next for the special case (1). To distinguish between the two cases, we will write respectively $m_1 = m_1(\varepsilon,d)$ and $m_2 = m_2(\varepsilon,d)$ instead of $m = m(\varepsilon,d)$.

3.1.1. General Case. For given $\varepsilon \in (0,1)$ and $d \in \mathbb{N}_+$, define

(7) $$m_1 = m_1(\varepsilon,d) \;:=\; \min\left\{m : \sum_{k=m+1}^d \binom{d}{k}\cdot\left(\frac{C_0^2}{d}\right)^k \le \varepsilon^2\right\}.$$

Of course, $m_1(\varepsilon,d)$ is well defined and is bounded by $d$.
Proposition 1. For every $d$, $\varepsilon \in (0,1)$, and $f \in H_d$, (6) holds with $m = m_1(\varepsilon,d)$ given by (7). Moreover, $m_1(\varepsilon,d)$ is bounded from above by $\min(d, M)$, where $M = M(\varepsilon)$ is the solution of

$$\frac{(M+1)!}{C_0^{2(M+1)}} \;=\; \frac{e^{C_0^2}}{\varepsilon^2}.$$

In particular, there exists a constant $C_1$ such that

$$m_1(\varepsilon,d) \;\le\; C_1\cdot\frac{\ln(1/\varepsilon)}{\ln(\ln(1/\varepsilon))}\quad\text{for all }\varepsilon < e^{-e}.$$

Proof. Of course, (6) holds if $m_1(\varepsilon,d) = d$. Therefore we consider only the case when $m_1 = m_1(\varepsilon,d) < d$. We have

$$\Big\|\sum_{|u|>m_1} f_u\Big\|_{G_d} \;\le\; \sum_{|u|>m_1}\|f_u\|_{G_d} \;\le\; \sum_{|u|>m_1}\|f_u\|_{H_u}\cdot C_0^{|u|}$$
$$\le\; \left[\sum_{|u|>m_1}\gamma_{d,u}^{-1}\cdot\|f_u\|^2_{H_u}\right]^{1/2}\cdot\left[\sum_{|u|>m_1}\gamma_{d,u}\cdot C_0^{2|u|}\right]^{1/2}$$
$$=\; \Big\|\sum_{|u|>m_1} f_u\Big\|_{H_d}\cdot\left[\sum_{k=m_1+1}^d \binom{d}{k}\cdot d^{-k}\cdot C_0^{2k}\right]^{1/2} \;\le\; \Big\|\sum_{|u|>m_1} f_u\Big\|_{H_d}\cdot\varepsilon.$$

This completes the proof of the first part. We now estimate the number $m_1(\varepsilon,d)$. Observe that, for any $m < d$, we have

$$\sum_{k=m+1}^d \binom{d}{k}\left(\frac{C_0^2}{d}\right)^k \;=\; \sum_{k=m+1}^d C_0^{2k}\cdot\frac{d(d-1)\cdots(d-k+1)}{d^k\cdot k!} \;\le\; \sum_{k=m+1}^d \frac{C_0^{2k}}{k!} \;\le\; \frac{C_0^{2(m+1)}}{(m+1)!}\sum_{j=0}^\infty \frac{C_0^{2j}(m+1)!}{(m+1+j)!}$$
$$=\; \frac{C_0^{2(m+1)}}{(m+1)!}\sum_{j=0}^\infty \frac{C_0^{2j}}{j!}\Big/\binom{m+1+j}{j} \;\le\; \frac{C_0^{2(m+1)}}{(m+1)!}\cdot e^{C_0^2}.$$

This completes the proof. $\square$

Remark 1. One can slightly improve the estimate of $m_1(\varepsilon,d)$ by letting $M = M(\varepsilon)$ be the minimal integer such that $C_0^2/(M+1) < 1$ and

$$\frac{(M+1)!}{C_0^{2(M+1)}} \;\ge\; \frac{1}{\varepsilon^2\cdot(1 - C_0^2/(M+1))}.$$

This is because the last sum in the proof above can be bounded as follows:
$$\sum_{j=0}^\infty \frac{C_0^{2j}}{j!}\Big/\binom{m+1+j}{j} \;\le\; \sum_{j=0}^\infty \left(\frac{C_0^2}{m+1}\right)^j \;=\; \frac{1}{1-C_0^2/(m+1)}.$$

We calculated the values of $\lceil M(\varepsilon)\rceil$ for $\varepsilon = 10^{-q}$ with $q = 1,\dots,10$ for the function approximation problem with the Wiener kernel on $[0,1]$ and $\rho(x) \equiv 1$. Recall that then $C_0^2 = 1/2$. These values are listed in the following table.

$$\begin{array}{c|cccccccccc}
q & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\\ \hline
\lceil M(10^{-q})\rceil & 3 & 5 & 7 & 8 & 10 & 11 & 13 & 14 & 15 & 17
\end{array}$$

3.1.2. Special Case (1). We now investigate the number of active variables under assumption (1). Then, for any $k < d$,

$$\Big\|\sum_{|u|>k} f_u\Big\|^2_{G_d} \;\le\; C\cdot\sum_{|u|>k}\|f_u\|^2_{G_d} \;\le\; C\cdot\sum_{|u|>k} C_0^{2|u|}\cdot\gamma_{d,u}\cdot\gamma_{d,u}^{-1}\cdot\|f_u\|^2_{H_u} \;\le\; C\cdot\max_{\ell>k}\left(C_0^{2\ell}\cdot d^{-\ell}\right)\cdot\Big\|\sum_{|u|>k} f_u\Big\|^2_{H_d}.$$

Therefore, for $m_2 = m_2(\varepsilon,d)$ given by

(8) $$m_2 := \begin{cases} 0 & \text{if } d < C_0^2 \text{ and } (C_0^2/d)^d \le \varepsilon^2/C,\\ d & \text{if } d < C_0^2 \text{ and } (C_0^2/d)^d > \varepsilon^2/C,\\ \min\{k : (C_0^2/d)^{k+1}\le\varepsilon^2/C\} & \text{otherwise,}\end{cases}$$

we have the following proposition.

Proposition 2. Suppose that (1) is satisfied. For every $d$, $\varepsilon \in (0,1)$, and $f \in H_d$, (6) holds with $m = m_2(\varepsilon,d)$ given by (8). Moreover, for $d \ge C_0^2$,

$$m_2(\varepsilon,d) \;\le\; \min\left(d,\; \left\lceil\frac{\ln(C/\varepsilon^2)}{\ln(d/C_0^2)}\right\rceil - 1\right)\quad\text{and}\quad m_2(\varepsilon,d) = O\left(\ln^{-1}(d)\right)\ \text{as } d\to\infty.$$

3.2. Changing Dimension Algorithm. We consider in this section very special algorithms from the family of changing dimension algorithms introduced in [6] for integration and in [16, 17] for approximation of functions with infinitely many variables. As shown recently in [15], these algorithms yield polynomial tractability for weighted $L_2$ approximation problems with infinitely many variables and general weights that have decay greater than one. Those results are not applicable in this paper since the weights $\gamma_{d,u} = d^{-|u|}$ have decay exactly one. However, these weights still allow for quasi-polynomial tractability, and strong polynomial tractability if (1) holds.
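The table of $\lceil M(10^{-q})\rceil$ values in Section 3.1.1 can be reproduced directly from Remark 1. The sketch below (ours; the function name is an assumption, and $C_0^2 = 1/2$ is the Wiener-kernel case stated there) searches for the minimal integer satisfying the two conditions of Remark 1.

```python
from math import factorial

def M(eps, C0sq=0.5):
    """Minimal integer M with C0^2/(M+1) < 1 and
    (M+1)! / C0^(2(M+1)) >= 1 / (eps^2 (1 - C0^2/(M+1))),
    following Remark 1; C0^2 = 1/2 is the Wiener-kernel case."""
    m = 0
    while True:
        r = C0sq / (m + 1)
        if r < 1 and factorial(m + 1) / C0sq ** (m + 1) >= 1 / (eps ** 2 * (1 - r)):
            return m
        m += 1

print([M(10.0 ** -q) for q in range(1, 11)])
# reproduces the table: [3, 5, 7, 8, 10, 11, 13, 14, 15, 17]
```

Note that `C0sq ** (m + 1)` computes $C_0^{2(m+1)} = (C_0^2)^{m+1}$; the slow factorial growth against the geometric factor is exactly why $M(\varepsilon)$ grows only like $\ln(1/\varepsilon)/\ln\ln(1/\varepsilon)$.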
