Adaptive global thresholding on the sphere

Claudio Durastanti$^{a,1}$

$^{a}$Fakultät für Mathematik, Ruhr Universität, Bochum

arXiv:1601.02844v2 [math.ST] 26 Jul 2016

Abstract

This work is concerned with the study of the adaptivity properties of nonparametric regression estimators over the d-dimensional sphere within the global thresholding framework. The estimators are constructed by means of a form of spherical wavelets, the so-called needlets, which enjoy strong concentration properties in both harmonic and real domains. The author establishes the convergence rates of the $L^p$-risks of these estimators, focussing on their minimax properties and proving their optimality over a scale of nonparametric regularity function spaces, namely, the Besov spaces.

Keywords: Global thresholding, needlets, spherical data, nonparametric regression, U-statistics, Besov spaces, adaptivity.
2010 MSC: 62G08, 62G20, 65T60

1. Introduction

The purpose of this paper is to establish adaptivity for the $L^p$-risk of regression function estimators in the nonparametric setting over the d-dimensional sphere $\mathbb{S}^d$. The optimality of the $L^p$-risk is established by means of global thresholding techniques and spherical wavelets known as needlets.

Let $(X_1,Y_1),\ldots,(X_n,Y_n)$ be independent pairs of random variables such that, for each $i \in \{1,\ldots,n\}$, $X_i \in \mathbb{S}^d$ and $Y_i \in \mathbb{R}$. The random variables $X_1,\ldots,X_n$ are assumed to be mutually independent and uniformly distributed locations on the sphere. It is further assumed that, for each $i \in \{1,\ldots,n\}$,

$$Y_i = f(X_i) + \varepsilon_i, \tag{1}$$

where $f : \mathbb{S}^d \to \mathbb{R}$ is an unknown bounded function, i.e., there exists $M > 0$ such that

$$\sup_{x \in \mathbb{S}^d} |f(x)| \le M < \infty. \tag{2}$$

Moreover, the random variables $\varepsilon_1,\ldots,\varepsilon_n$ in Eq. (1) are assumed to be mutually independent and identically distributed with zero mean.
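As an illustration of the sampling scheme in Eq. (1), the following minimal sketch draws uniform locations on $\mathbb{S}^d$ and noisy responses. The function names, the test function `f_example` and the choice of Gaussian noise (one particular zero-mean error law) are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def sample_uniform_sphere(n, d, rng):
    """Draw n points uniformly on S^d, embedded in R^(d+1)."""
    # Normalizing i.i.d. standard Gaussian vectors yields the uniform law on the sphere.
    g = rng.standard_normal((n, d + 1))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def simulate_regression(n, d, f, sigma, rng):
    """Return (X, Y) with Y_i = f(X_i) + eps_i as in Eq. (1)."""
    X = sample_uniform_sphere(n, d, rng)
    # Zero-mean i.i.d. errors; the Gaussian law is an assumption of this sketch.
    eps = sigma * rng.standard_normal(n)
    return X, f(X) + eps

def f_example(X):
    """Hypothetical bounded regression function: |f(x)| <= 1 on the sphere."""
    return X[:, 0] * X[:, -1]

rng = np.random.default_rng(0)
X, Y = simulate_regression(n=500, d=2, f=f_example, sigma=0.1, rng=rng)
```

Here `X` has one row per location in $\mathbb{R}^{d+1}$ and `Y` collects the corresponding noisy observations; the bound $M$ of Eq. (2) equals 1 for this particular test function.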
Roughly speaking, the variables $\varepsilon_1,\ldots,\varepsilon_n$ can be viewed as the observational errors and, in what follows, they will be assumed to be sub-Gaussian.

In this paper, we study the properties of nonlinear global hard thresholding estimators, in order to establish the optimal rates of convergence of $L^p$-risks for functions belonging to the so-called Besov spaces.

Email address: [email protected] (Claudio Durastanti)
$^{1}$The author is supported by Deutsche Forschungsgemeinschaft (DFG) - GRK 2131, "High-dimensional Phenomena in Probability – Fluctuations and Discontinuity".
Preprint submitted to Journal of Multivariate Analysis, July 27, 2016

1.1. An overview of the literature

In recent years, the issue of minimax estimation in nonparametric settings has received considerable attention in the statistical inference literature. The seminal contribution in this area is due to Donoho et al. [7]. In that paper, the authors provide nonlinear wavelet estimators for density functions on $\mathbb{R}$, lying over a wide nonparametric regularity function class, which attain optimal rates of convergence up to a logarithmic factor. Following this work, the interaction between wavelet systems and nonparametric function estimation has led to a considerable number of developments, mainly in the standard Euclidean framework; see, e.g., [3, 5, 24, 26, 27, 28, 30] and the textbooks [22, 44] for further details and discussions.

More recently, thresholding methods have been applied to broader settings. In particular, nonparametric estimation results have been achieved on $\mathbb{S}^d$ by using a second generation wavelet system, namely, the spherical needlets. Needlets were introduced by Narcowich et al. [39, 40], while their stochastic properties dealing with various applications to spherical random fields were examined in [2, 6, 34, 35, 36]. Needlet-like constructions were also established over more general manifolds by Geller and Mayeli [18, 19, 20, 21], Kerkyacharian et al. [25] and Pesenson [41] among others, and over spin fiber bundles by Geller and Marinucci [16, 17].
In the nonparametric setting, needlets have found various applications in directional statistics. Baldi et al. [1] established minimax rates of convergence for the $L^p$-risk of nonlinear needlet density estimators within the hard local thresholding paradigm, while analogous results concerning regression function estimation were established by Monnier [38]. The block thresholding framework was investigated in Durastanti [9]. Furthermore, the adaptivity of nonparametric regression estimators of spin functions was studied in Durastanti et al. [10]. In this case, the regression function takes as its values algebraic curves lying on the tangent plane for each point on $\mathbb{S}^2$ and the wavelets used are the so-called spin (pure and mixed) needlets; see Geller and Marinucci [16, 17].

The asymptotic properties of other estimators for spherical data, not concerning the needlet framework, were investigated by Kim and Koo [31, 32, 33], while needlet-like nearly-tight frames were used in Durastanti [8] to establish the asymptotic properties of density function estimators on the circle. Finally, in Gautier and Le Pennec [15], the adaptive estimation by needlet thresholding was introduced in the nonparametric random coefficients binary choice model. Regarding the applications of these methods in practical scenarios, see, e.g., [13, 14, 23], where they were fruitfully applied to some astrophysical problems, concerning, for instance, high-energy cosmic rays and Gamma rays.

1.2. Main results

Consider the regression model given in Eq. (1) and let $\{\psi_{j,k} : j \ge 0, k = 1,\ldots,K_j\}$ be the set of d-dimensional spherical needlets. Roughly speaking, $j$ and $K_j$ denote the resolution level and the cardinality of needlets at the resolution level $j$, respectively. The regression function $f$ can be rewritten in terms of its needlet expansion. Namely, for all $x \in \mathbb{S}^d$, one has

$$f(x) = \sum_{j \ge 0} \sum_{k=1}^{K_j} \beta_{j,k}\, \psi_{j,k}(x),$$

where $\{\beta_{j,k} : j \ge 0, k = 1,\ldots,K_j\}$ is the set of needlet coefficients.
For each $j \ge 0$ and $k \in \{1,\ldots,K_j\}$, a natural unbiased estimator for $\beta_{j,k}$ is given by the corresponding empirical needlet coefficient, viz.

$$\hat{\beta}_{j,k} = \frac{1}{n} \sum_{i=1}^{n} Y_i\, \psi_{j,k}(X_i); \tag{3}$$

see, e.g., Baldi et al. [1] and Härdle et al. [22]. Therefore, the global thresholding needlet estimator of $f$ is given, for each $x \in \mathbb{S}^d$, by

$$\hat{f}_n(x) = \sum_{j=0}^{J_n} \tau_j \sum_{k=1}^{K_j} \hat{\beta}_{j,k}\, \psi_{j,k}(x), \tag{4}$$

where $\tau_j$ is a nonlinear threshold function comparing the given $j$-dependent statistic $\hat{\Theta}_j(p)$, built on a subsample of $p < n$ observations, to a threshold based on the observational sample size. If $\hat{\Theta}_j(p)$ is above the threshold, the whole $j$-level is kept; otherwise it is discarded.

Loosely speaking, this procedure allows one to delete the coefficients corresponding to a resolution level $j$ whose contribution to the reconstruction of the regression function $f$ is not clearly distinguishable from the noise. Following Kerkyacharian et al. [30], we consider the so-called hard thresholding framework, defined as

$$\tau_j = \tau_j(p) = \mathbb{1}\left\{\hat{\Theta}_j(p) \ge B^{dj} n^{-p/2}\right\},$$

where $p \in \mathbb{N}$ is even. Further details regarding the statistic $\hat{\Theta}_j(p)$ will be discussed in Section 3.4, where the choice of the threshold $B^{dj} n^{-p/2}$ will also be motivated. For the rest of this section, we consider $\hat{\Theta}_j(p)$ as an unbiased statistic of $|\beta_{j,1}|^p + \cdots + |\beta_{j,K_j}|^p$. The so-called truncation bandwidth $J_n$, on the other hand, is the highest resolution level at which the empirical coefficients $\hat{\beta}_{j,1},\ldots,\hat{\beta}_{j,K_j}$ are computed. The optimal choice of the truncation level is $J_n = \ln_B(n^{1/d})$; for details, see Section 3. This allows the error due to the approximation of $f$, which is an infinite sum with respect to $j$, to be controlled by a finite sum, such as the estimator $\hat{f}_n$.

Our objective is to estimate the global error measure for the regression estimator $\hat{f}_n$.
For this reason, we study the worst possible performance of the $L^p$-risk over a so-called nonparametric regularity class $\{F_\alpha : \alpha \in A\}$ of function spaces, i.e.,

$$R_n\big(\hat{f}_n; F_\alpha\big) = \sup_{f \in F_\alpha} E\Big(\|\hat{f}_n - f\|^p_{L^p(\mathbb{S}^d)}\Big).$$

Recall that an estimator $\hat{f}_n$ is said to be adaptive for the $L^p$-risk and for the scale of classes $\{F_\alpha : \alpha \in A\}$ if, for every $\alpha \in A$, there exists a constant $c_\alpha > 0$ such that

$$E\Big(\|\hat{f}_n - f\|^p_{L^p(\mathbb{S}^d)}\Big) \le c_\alpha R_n\big(\hat{f}_n; F_\alpha\big);$$

see, e.g., [1, 22, 30].

For $r > 0$ and for $p \in [1,r]$, we will establish that the regression estimator $\hat{f}_n$ is adaptive for the class of Besov spaces $B^s_{p,q}$, where $1 \le q \le \infty$ and $d/p \le s < r+1$. Finally, let $R \in (0,\infty)$ be the radius of the Besov ball on which $f$ is defined. The proper choice of $r$ will be motivated in Section 2.1. Our main result is described by the following theorem.

Theorem 1.1. Given $r \in (1,\infty)$, let $p \in [1,r]$. Also, let $\hat{f}_n$ be given by Eq. (4), with $J_n = \ln_B(n^{1/d})$. Then, for $1 \le q \le \infty$, $d/p \le s < r+1$ and $0 < R < \infty$, there exists $C > 0$ such that

$$\sup_{f \in B^s_{r,q}(R)} E\Big(\|\hat{f}_n - f\|^p_{L^p(\mathbb{S}^d)}\Big) \le C n^{-\frac{sp}{2s+d}}.$$

The behavior of the $L^\infty$-risk function will be studied separately in Section 3 and the analogous result is described in Theorem 3.2. Moreover, the details concerning the choice of $r$ will be presented in Remark 3.1 and other properties of $L^p$-risk functions, such as optimality, will be discussed in Remark 3.3.

1.3. Comparison with other results

The bound given in Eq. (12) is consistent with the results of Kerkyacharian et al. [30], where global thresholding techniques were introduced on $\mathbb{R}$. As far as nonparametric inference over spherical datasets is concerned, our results can be viewed as an alternative proposal to the existing nonparametric regression methods (see, e.g., [1, 9, 10, 38]), related to the local and block thresholding procedures.
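To get a feeling for the rate $n^{-sp/(2s+d)}$ appearing in Theorem 1.1, the short sketch below evaluates it for a few sample sizes. The helper name `global_rate` and the parameter values are illustrative assumptions; the unknown constant $C$ is omitted, so only the order of magnitude is meaningful.

```python
import math

def global_rate(n, s, p, d):
    """Evaluate n^(-s p / (2 s + d)), the rate in Theorem 1.1 without the constant C."""
    return n ** (-s * p / (2 * s + d))

# Illustrative values (assumed): smoothness s = 2, p = 2, on the sphere S^2 (d = 2).
s, p, d = 2.0, 2, 2
for n in (10**2, 10**3, 10**4):
    print(n, global_rate(n, s, p, d))
```

For these values the exponent is $sp/(2s+d) = 2/3$, so multiplying the sample size by 1000 reduces the bound by a factor of 100.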
Recall that in the local thresholding paradigm, each empirical estimator $\hat{\beta}_{j,k}$ is compared to a threshold $\tau_{j,k}$ and it is kept if its absolute value is above $\tau_{j,k}$ and discarded otherwise, i.e., the threshold function is given by $\mathbb{1}\{|\hat{\beta}_{j,k}| \ge \tau_{j,k}\}$. Typically, the threshold is chosen such that $\tau_{j,k} = \kappa \sqrt{\ln n / n}$, where $\kappa$ depends explicitly on two parameters, namely, the radius $R$ of the Besov ball on which the function $f$ is defined and its supremum $M$; see, e.g., Baldi et al. [1]. An alternative and partially data-driven choice for $\kappa$ is proposed by Monnier [38], i.e.,

$$\kappa = \frac{\kappa_0}{n} \sum_{i=1}^{n} \psi_{j,k}(X_i)^2.$$

Even if this stochastic approach is proved to outperform the deterministic one, the threshold still depends on both $R$ and $M$, which control $\kappa_0$. In agreement with the results established on $\mathbb{R}$ (see Härdle et al. [22]), local techniques entail nearly optimal rates for the $L^p$-risks over a wide variety of regularity function spaces. In this case, the regression function $f$ belongs to $B^s_{p,q}(R)$, where $s \ge d/r$, $p \in \{1,\infty\}$, $q \in \{1,\infty\}$ and $0 < R < \infty$ (cf. [1, 10, 22]). However, these adaptive rates of convergence are achieved at the expense of having an extra logarithmic term and of requiring explicit knowledge of the radius of the Besov balls on which $f$ is defined, in order to establish an optimal threshold.

As far as block thresholding is concerned, for any fixed resolution level this procedure collects the coefficients $\hat{\beta}_{j,1},\ldots,\hat{\beta}_{j,K_j}$ into $\ell = \ell(n)$ blocks, denoted $B_{j,1},\ldots,B_{j,\ell}$, of dimension depending on the sample size. Each block is then compared to a threshold and is retained or discarded accordingly. This method has an exact convergence rate (i.e., without the logarithmic extra term), although it requires explicit knowledge of the Besov radius $R$. Furthermore, the estimator is adaptive only over a narrower subset of the scale of Besov spaces, the so-called regular zone; see Härdle et al. [22].
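The coefficient-wise rule recalled above can be sketched as follows, assuming the threshold form $\tau_{j,k} = \kappa\sqrt{\ln n / n}$ that is standard in the local thresholding literature. The function names and the synthetic arrays standing in for $\hat{\beta}_{j,k}$ and $\psi_{j,k}(X_i)$ are hypothetical; no needlet is actually evaluated here.

```python
import numpy as np

def local_hard_threshold(beta_hat, n, kappa):
    """Keep beta_hat[k] only where |beta_hat[k]| >= kappa * sqrt(ln n / n).

    kappa may be a scalar (deterministic choice) or an array with one
    entry per coefficient (data-driven choice).
    """
    tau = kappa * np.sqrt(np.log(n) / n)
    return np.where(np.abs(beta_hat) >= tau, beta_hat, 0.0)

def data_driven_kappa(psi_at_X, kappa0):
    """Compute kappa_{j,k} = (kappa0 / n) * sum_i psi_{j,k}(X_i)^2.

    psi_at_X has shape (n, K_j): entry (i, k) stands for psi_{j,k}(X_i).
    """
    n = psi_at_X.shape[0]
    return kappa0 / n * np.sum(psi_at_X ** 2, axis=0)

# Synthetic placeholders (assumed values, for illustration only).
n = 1000
beta_hat = np.array([0.5, 0.01, -0.3, 0.02])
kept_deterministic = local_hard_threshold(beta_hat, n, kappa=1.0)

rng = np.random.default_rng(1)
psi_at_X = rng.standard_normal((n, beta_hat.size))
kept_data_driven = local_hard_threshold(beta_hat, n, data_driven_kappa(psi_at_X, kappa0=1.0))
```

In both variants the decision is taken one coefficient at a time, and the constants `kappa` or `kappa0` have to be supplied by the user, which mirrors the dependence on $R$ and $M$ discussed in the text.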
The construction of blocks on $\mathbb{S}^d$ can also be a difficult procedure, as it requires a precise knowledge of the pixelization of the sphere, namely, the structure of the subregions into which the sphere is partitioned, in order to build spherical wavelets.

On the other hand, the global techniques presented in this paper do not require any knowledge regarding the radius of the Besov ball and have exact optimal convergence rates even over the narrowest scale of regularity function spaces.

1.4. Plan of the paper

This paper is organized as follows. Section 2 presents some preliminary results, such as the construction of spherical needlet frames on the sphere, Besov spaces and their properties. In Section 3, we describe the statistical methods we apply within the global thresholding paradigm. This section also includes an introduction to the properties of the sub-Gaussian random variables and of the U-statistic $\hat{\Theta}_j(p)$, which are key for establishing the thresholding procedure. Section 4 provides some numerical evidence. Finally, the proofs of all of our results are collected in Section 5.

2. Preliminaries

This section presents details concerning the construction of needlet frames, the definition of spherical Besov spaces and their properties. In what follows, the main bibliographical references are [1, 2, 7, 21, 22, 24, 37, 39, 40].

2.1. Harmonic analysis on $\mathbb{S}^d$ and spherical needlets

Consider the simplified notation $L^2(\mathbb{S}^d) = L^2(\mathbb{S}^d, dx)$, where $dx$ is the uniform Lebesgue measure over $\mathbb{S}^d$. Also, let $H_\ell$ be the restriction to $\mathbb{S}^d$ of the harmonic homogeneous polynomials of degree $\ell$; see, e.g., Stein and Weiss [43]. Thus, the following decomposition holds:

$$L^2(\mathbb{S}^d) = \bigoplus_{\ell=0}^{\infty} H_\ell.$$

An orthonormal basis for $H_\ell$ is provided by the set of spherical harmonics $\{Y_{\ell,m} : m = 1,\ldots,g_{\ell,d}\}$ of dimension $g_{\ell,d}$ given by

$$g_{\ell,d} = \frac{\ell + \eta_d}{\eta_d} \binom{\ell + 2\eta_d - 1}{\ell}, \qquad \eta_d = \frac{d-1}{2}.$$

For any function $f \in L^2(\mathbb{S}^d)$, we define the Fourier coefficients as

$$a_{\ell,m} := \int_{\mathbb{S}^d} \overline{Y}_{\ell,m}(x) f(x)\, dx,$$

such that the kernel operator denoting the orthogonal projection over $H_\ell$ is given, for all $x \in \mathbb{S}^d$, by

$$P_{\ell,d} f(x) = \sum_{m=1}^{g_{\ell,d}} a_{\ell,m} Y_{\ell,m}(x).$$

Also, let the measure of the surface of $\mathbb{S}^d$ be given by

$$\omega_d = 2\pi^{(d+1)/2} \Big/ \Gamma\left(\frac{d+1}{2}\right).$$

The kernel associated to the projector $P_{\ell,d}$ links spherical harmonics to the Gegenbauer polynomial of parameter $\eta_d$ and order $\ell$, labelled by $C_\ell^{(\eta_d)}$. Indeed, the following summation formula holds:

$$P_{\ell,d}(x_1,x_2) = \sum_{m=1}^{g_{\ell,d}} Y_{\ell,m}(x_1) \overline{Y}_{\ell,m}(x_2) = \frac{\ell + \eta_d}{\eta_d\, \omega_d}\, C_\ell^{(\eta_d)}(\langle x_1, x_2 \rangle),$$

where $\langle \cdot,\cdot \rangle$ is the standard scalar product on $\mathbb{R}^{d+1}$; see, e.g., Marinucci and Peccati [37].

Following Narcowich et al. [40], $K_\ell = \bigoplus_{i=0}^{\ell} H_i$ is the linear space of homogeneous polynomials on $\mathbb{S}^d$ of degree smaller than or equal to $\ell$; see also [1, 37, 39]. Thus, there exist a set of cubature points $Q_\ell \subset \mathbb{S}^d$ and a set of positive cubature weights $\{\lambda_\xi\}$, indexed by $\xi \in Q_\ell$, such that, for any $f \in K_\ell$,

$$\int_{\mathbb{S}^d} f(x)\, dx = \sum_{\xi \in Q_\ell} \lambda_\xi f(\xi).$$

In the following, the notation $a \approx b$ denotes that there exist $c_1, c_2 > 0$ such that $c_1 b \le a \le c_2 b$. For a fixed resolution level $j$ and a scale parameter $B$, let $K_j = \mathrm{card}\big(Q_{[2B^{j+1}]}\big)$. Therefore, $\{\xi_{j,k} : k = 1,\ldots,K_j\}$ is the set of cubature points associated to the resolution level $j$, while $\{\lambda_{j,k} : k = 1,\ldots,K_j\}$ contains the corresponding cubature weights. These are typically chosen such that

$$K_j \approx B^{dj} \quad \text{and} \quad \forall k \in \{1,\ldots,K_j\} \quad \lambda_{j,k} \approx B^{-dj}.$$

Define the real-valued weight (or window) function $b$ on $(0,\infty)$ so that

(i) $b$ lies on a compact support $[B^{-1}, B]$;
(ii) the partition of unity property holds, namely, $\sum_{j \ge 0} b^2(\ell/B^j) = 1$, for $\ell \ge B$;
(iii) $b \in C^\rho(0,\infty)$ for some $\rho \ge 1$.

Remark 2.1. Note that $\rho$ can be either a positive integer or equal to $\infty$. In the first case, the function $b(\cdot)$ can be built by means of a standard B-spline approach, using linear combinations of the so-called Bernstein polynomials, while in the other case, it is constructed by means of integration of scaled exponential functions (see also Section 4). Further details can be found in the textbook by Marinucci and Peccati [37].

For any $j \ge 0$ and $k \in \{1,\ldots,K_j\}$, spherical needlets are defined as

$$\psi_{j,k}(x) = \sqrt{\lambda_{j,k}} \sum_{\ell \ge 0} b\left(\frac{\ell}{B^j}\right) P_{\ell,d}(x, \xi_{j,k}).$$

Spherical needlets feature some important properties descending from the structure of the window function $b$. Since $b$ has compact support in the frequency domain, it follows that $\psi_{j,k}$ is different from zero only on a finite set of frequencies $\ell$, so that we can rewrite the spherical needlets as

$$\psi_{j,k}(x) = \sqrt{\lambda_{j,k}} \sum_{\ell \in \Lambda_j} b\left(\frac{\ell}{B^j}\right) P_{\ell,d}(x, \xi_{j,k}),$$

where $\Lambda_j = \big\{u : u \in \big([B^{j-1}], [B^{j+1}]\big)\big\}$ and $[u]$, $u \in \mathbb{R}$, denotes the integer part of $u$. From the partition of unity property, the spherical needlets form a tight frame over $\mathbb{S}^d$ with unitary tightness constant. For $f \in L^2(\mathbb{S}^d)$,

$$\|f\|^2_{L^2(\mathbb{S}^d)} = \sum_{j \ge 0} \sum_{k=1}^{K_j} |\beta_{j,k}|^2,$$

where

$$\beta_{j,k} = \int_{\mathbb{S}^d} f(x) \psi_{j,k}(x)\, dx \tag{5}$$

are the so-called needlet coefficients. Therefore, we can define the following reconstruction formula (holding in the $L^2$-sense): for all $x \in \mathbb{S}^d$,

$$f(x) = \sum_{j \ge 0} \sum_{k=1}^{K_j} \beta_{j,k}\, \psi_{j,k}(x).$$

From the differentiability of $b$, we obtain the following quasi-exponential localization property: for $x \in \mathbb{S}^d$ and any $\eta \in \mathbb{N}$ such that $\eta \le \rho$, there exists $c_\eta > 0$ such that

$$|\psi_{j,k}(x)| \le \frac{c_\eta B^{jd/2}}{\{1 + B^{jd/2} d(x, \xi_{j,k})\}^{\eta}}, \tag{6}$$

where $d(\cdot,\cdot)$ denotes the geodesic distance over $\mathbb{S}^d$.

Roughly speaking, $|\psi_{j,k}(x)| \approx B^{jd/2}$ if $x$ belongs to the pixel of area $B^{-dj}$ surrounding the cubature point $\xi_{j,k}$; otherwise, it is almost negligible. The localization result yields a similar boundedness property for the $L^p$-norm, which is crucial for our purposes. In particular, for any $p \in [1,\infty)$ there exist two constants $c_p, C_p > 0$ such that

$$c_p B^{jd\left(\frac{1}{2} - \frac{1}{p}\right)} \le \|\psi_{j,k}\|_{L^p(\mathbb{S}^d)} \le C_p B^{jd\left(\frac{1}{2} - \frac{1}{p}\right)}, \tag{7}$$

and there exist two constants $c_\infty, C_\infty > 0$ such that

$$c_\infty B^{jd/2} \le \|\psi_{j,k}\|_{L^\infty(\mathbb{S}^d)} \le C_\infty B^{jd/2}.$$

According to Lemma 2 in Baldi et al. [1], the following two inequalities hold. For every $0 < p \le \infty$,

$$\Bigg\| \sum_{k=1}^{K_j} \beta_{j,k} \psi_{j,k} \Bigg\|_{L^p(\mathbb{S}^d)} \le c B^{jd\left(\frac{1}{2} - \frac{1}{p}\right)} \|\beta_{j,k}\|_{\ell_p}, \tag{8}$$

and for every $1 \le p \le \infty$,

$$\|\beta_{j,k}\|_{\ell_p} B^{jd\left(\frac{1}{2} - \frac{1}{p}\right)} \le c \|f\|_{L^p(\mathbb{S}^d)},$$

where $\ell_p$ denotes the space of p-summable sequences. The generalization for the case $p = \infty$ is trivial.

The following lemma presents a result based on the localization property.

Lemma 2.1. For $x \in \mathbb{S}^d$, let $\psi_{j,k}(x)$ be given by Eq. (2.1). Then, for $q \ge 2$, $k_{i_1} \ne k_{i_2}$ for $i_1 \ne i_2 = 1,\ldots,q$, and for any $\eta \ge 2$, there exists $C_\eta > 0$ such that

$$\int_{\mathbb{S}^d} \prod_{i=1}^{q} \psi_{j,k_i}(x)\, dx \le \frac{C_\eta B^{dj(q-1)}}{(1 + B^{dj} \Delta)^{\eta(q-1)}},$$

where

$$\Delta = \min_{i_1,i_2 \in \{1,\ldots,q\},\, i_1 \ne i_2} d(\xi_{j,k_{i_1}}, \xi_{j,k_{i_2}}).$$

Remark 2.2. As discussed in Geller and Pesenson [21] and Kerkyacharian et al. [25], needlet-like wavelets can be built over more general spaces, namely, over compact manifolds. In particular, let $\{M, g\}$ be a smooth compact homogeneous manifold of dimension $d$, with no boundaries.
For the sake of simplicity, we assume that there exists a Laplace–Beltrami operator on $M$ with respect to the action $g$, labelled by $\Delta_M$. The set $\{\gamma_q : q \ge 0\}$ contains the eigenvalues of $\Delta_M$ associated to the eigenfunctions $\{u_q : q \ge 0\}$, which are orthonormal with respect to the Lebesgue measure over $M$ and form an orthonormal basis in $L^2(M)$; see [20, 21]. Every function $f \in L^2(M)$ can be described in terms of its harmonic coefficients, given by $a_q = \langle f, u_q \rangle_{L^2(M)}$, so that, for all $x \in M$,

$$f(x) = \sum_{q \ge 1} a_q u_q(x).$$

Therefore, it is possible to define a wavelet system over $\{M, g\}$ describing a tight frame over $M$ along the same lines as in Narcowich et al. [40] for $\mathbb{S}^d$; see also [21, 25, 41] and the references therein, such as Geller and Mayeli [19, 20]. Here we just provide the definition of the needlet (scaling) function on $M$, given by

$$\psi_{j,k}(x) = \sqrt{\lambda_{j,k}} \sum_{q=B^{j-1}}^{B^{j+1}} b\left(\frac{\sqrt{-\gamma_q}}{B^j}\right) u_q(x)\, \bar{u}_q(\xi_{j,k}),$$

where in this case the set $\{\xi_{j,k}, \lambda_{j,k}\}$ characterizes a suitable partition of $M$, given by an $\varepsilon$-lattice on $M$, with $\varepsilon = \sqrt{\lambda_{j,k}}$. Further details and technicalities concerning $\varepsilon$-lattices can be found in Pesenson [41]. Analogously to the spherical case, for $f \in L^2(M)$ and arbitrary $j \ge 0$ and $k \in \{1,\ldots,K_j\}$, the needlet coefficient corresponding to $\psi_{j,k}$ is given by

$$\beta_{j,k} = \langle f, \psi_{j,k} \rangle_{L^2(M)} = \sqrt{\lambda_{j,k}} \sum_{q=B^{j-1}}^{B^{j+1}} b\left(\frac{\sqrt{-\gamma_q}}{B^j}\right) a_q u_q(\xi_{j,k}).$$

These wavelets preserve all the properties featured by needlets on the sphere. As a consequence, and as shown in the following sections, the main results presented here do not depend strictly on the underlying manifold (namely, the sphere); rather, they can be easily extended to more general frameworks such as compact manifolds, where the concentration properties of the wavelets and the smooth approximation properties of Besov spaces still hold.

2.2. Besov space on the sphere

Here we will recall the definition of spherical Besov spaces and their main approximation properties for wavelet coefficients. We refer to [1, 10, 22, 39] for more details and further technicalities.

Suppose that one has a scale of functional classes $G_t$, depending on the q-dimensional set of parameters $t \in T \subseteq \mathbb{R}^q$. The approximation error $G_t(f;p)$ concerning the replacement of $f$ by an element $g \in G_t$ is given by

$$G_t(f;p) = \inf_{g \in G_t} \|f - g\|_{L^p(\mathbb{S}^d)}.$$

Therefore, the Besov space $B^s_{p,q}$ is the space of functions such that $f \in L^p(\mathbb{S}^d)$ and

$$\sum_{t \ge 0} \frac{1}{t} \{t^s G_t(f;p)\}^q < \infty,$$

which is equivalent to

$$\sum_{j \ge 0} \{B^{js} G_{B^j}(f;p)\}^q < \infty.$$

The function $f$ belongs to the Besov space $B^s_{p,q}$ if and only if

$$\Bigg[ \sum_{k=1}^{K_j} \{|\beta_{j,k}| \|\psi_{j,k}\|_{L^p(\mathbb{S}^d)}\}^p \Bigg]^{1/p} = B^{-js} w_j, \tag{9}$$

where $w_j \in \ell_q$, the standard space of q-power summable infinite sequences.

Loosely speaking, the parameters $s \ge 0$, $1 \le p \le \infty$ and $1 \le q \le \infty$ of the Besov space $B^s_{p,q}$ can be viewed as follows: given $B > 1$, the parameter $p$ denotes the p-norm of the wavelet coefficients taken at a fixed resolution $j$, the parameter $q$ describes the weighted q-norm taken across the scale $j$, and the parameter $s$ controls the smoothness of the rate of decay across the scale $j$.

In view of Eq. (7), the Besov norm is defined as

$$\|f\|_{B^s_{p,q}} = \|f\|_{L^p(\mathbb{S}^d)} + \Bigg[ \sum_{j \ge 0} B^{jq\{s + d(1/2 - 1/p)\}} \Bigg( \sum_{k=1}^{K_j} |\beta_{j,k}|^p \Bigg)^{q/p} \Bigg]^{1/q} = \|f\|_{L^p(\mathbb{S}^d)} + \Big\| B^{j\{s + d(1/2 - 1/p)\}} \|\beta_{j,k}\|_{\ell_p} \Big\|_{\ell_q},$$

for $q \ge 1$. The extension to the case $q = \infty$ is trivial.

We conclude this section by introducing the Besov embedding, discussed in [1, 29, 30] among others. For $p < r$, one has

$$B^s_{r,q} \subset B^s_{p,q} \quad \text{and} \quad B^s_{p,q} \subset B^{s - d(1/p - 1/r)}_{r,q},$$

or, equivalently,

$$\sum_{k=1}^{K_j} |\beta_{j,k}|^p \le \Bigg( \sum_{k=1}^{K_j} |\beta_{j,k}|^r \Bigg)^{p/r} K_j^{1 - p/r}; \tag{10}$$

$$\sum_{k=1}^{K_j} |\beta_{j,k}|^r \le \Bigg( \sum_{k=1}^{K_j} |\beta_{j,k}|^p \Bigg)^{r/p}. \tag{11}$$

Proofs and further details can be found, for instance, in [1, 10].

3. Global thresholding with spherical needlets

This section provides a detailed description of the global thresholding technique applied to the nonparametric regression problem on the d-dimensional sphere. We refer to [12, 22, 30] for an extensive description of global thresholding methods and to [1, 10] for further details on nonparametric estimation in the spherical framework.
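As a preview of the procedure, the level-wise selection rule of Eq. (4), with threshold $B^{dj} n^{-p/2}$ and truncation level $J_n = \ln_B(n^{1/d})$, can be sketched as below. This is only a sketch under stated assumptions: the plug-in quantity $\sum_k |\hat{\beta}_{j,k}|^p$ is used as a simple stand-in for the U-statistic $\hat{\Theta}_j(p)$ of Section 3.4 (it is not the paper's statistic), rounding $J_n$ down to an integer is an assumption, and all function names and input arrays are hypothetical.

```python
import math
import numpy as np

def truncation_level(n, d, B):
    """Return floor(log_B(n^(1/d))); rounding down is an assumption of this sketch."""
    return int(math.floor(math.log(n) / (d * math.log(B))))

def plug_in_statistic(beta_j, p):
    """Stand-in for Theta_hat_j(p): sum_k |beta_hat_{j,k}|^p (not the U-statistic)."""
    return float(np.sum(np.abs(beta_j) ** p))

def global_threshold(beta_hat_levels, theta_hat, n, d, B, p):
    """Apply tau_j = 1{theta_hat[j] >= B^(d j) n^(-p/2)} to each level j.

    beta_hat_levels[j] holds the empirical coefficients of level j;
    a level is either kept as a whole or set to zero as a whole.
    """
    kept = []
    for j, beta_j in enumerate(beta_hat_levels):
        tau_j = theta_hat[j] >= B ** (d * j) * n ** (-p / 2)
        kept.append(beta_j if tau_j else np.zeros_like(beta_j))
    return kept

# Synthetic coefficients (assumed values): a strong level 0 and a weak level 1.
n, d, B, p = 1000, 2, 2, 2
levels = [np.array([0.5]), np.full(4, 0.01)]
theta = [plug_in_statistic(b, p) for b in levels]
selected = global_threshold(levels, theta, n, d, B, p)
J_n = truncation_level(n, d, B)
```

With these inputs, level 0 passes the threshold and is kept, whereas level 1 falls below it and is discarded entirely. Unlike the coefficient-wise rules, no constant depending on the Besov radius enters the decision.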