ebook img

On the Complexity of Binary Samples PDF

0.2 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview On the Complexity of Binary Samples

On the Complexity of Binary Samples Joel Ratsaby Electrical and Electronics Engineering Department Ariel University Center of Samaria ISRAEL [email protected] 8 September 24, 2007 0 0 2 n Abstract a J Consider a class H of binary functions h : X → {−1,+1} on a finite interval X = 0 [0,B]⊂IR. Definethesamplewidthofhonafinitesubset(asample)S ⊂X asω (h)≡ 3 S minx∈S|ωh(x)|whereωh(x)=h(x)max{a≥0:h(z)=h(x),x−a≤z ≤x+a}. LetS(cid:96) ] be the space of all samples in X of cardinality (cid:96) and consider sets of wide samples, i.e., M hypersets which are defined as A = {S ∈ S : ω (h) ≥ β}. Through an application β,h (cid:96) S D of the Sauer-Shelah result on the density of sets an upper estimate is obtained on the growth function (or trace) of the class {A : h ∈ H}, β > 0, i.e., on the number . β,h s of possible dichotomies obtained by intersecting all hypersets with a fixed collection of c [ samples S ∈S of cardinality m. The estimate is 2(cid:80)2(cid:98)B/(2β)(cid:99)(cid:0)m−(cid:96)(cid:1). (cid:96) i=0 i 1 Keywords: Binary functions, density of sets, VC-dimension v 4 AMS Subject Classification: 06E30, 68Q32, 68Q25, 03C13, 68R05 9 7 4 1 Overview . 1 0 Let B > 0 and define the domain as X = [0,B]. In this paper we consider the class H 8 of all binary functions h : X → {−1,+1} which have only simple discontinuities, i.e., at 0 any point x the limits h(x+) ≡ lim h(z) from the right and similarly from the left : z→x+ v h(x−) exist (but are not necessarily equal). A main theme of our recent work has been i X to characterize binary functions based on their behavior on a finite subset of X. In ? r we showed that the problem of learning binary functions from a finite labeled sample can a improvethegeneralizationerror-boundsifthelearnerobtainsahypothesiswhichinaddition to minimizing the empirical sample-error is also ‘smooth’ around elements of the sample. This notion of smoothness (used also in ??) is based on the simple notion of width of h at x which is defined as ω (x) = h(x)max{a ≥ 0 : h(z) = h(x),x−a ≤ z ≤ x+a}. h For a finite subset (also called sample) S ⊂ X the sample width of h denoted ω (h) is S defined as ω (h) ≡ min|ω (x)|. S h x∈S 1 This definition of width resembles the notion of sample margin of a real-valued function f (see for instance ?). We say that a sample S is wide for h if the width ω (h) is large. Wide S samplesimplicitlycontainmoresideinformationforinstanceaboutalearningproblem. The currentpaperaimsatestimatingthecomplexityoftheclassofwidesamplesforfunctionsin H. This complexity is related to a notion of description complexity and knowing it enables to compute the efficiency of information that is implicit in samples for learning (see ?). 2 Introduction For any logical expression A denote by I{A} the indicator function which takes the value 1 or 0 whenever the statement A is true or false, respectively. Let (cid:96) be any fixed positive integer and define the space S of all samples S ⊂ X of size (cid:96). On S consider sets of wide (cid:96) (cid:96) samples, i.e., A = {S ∈ S : ω (h) ≥ β}, β > 0. β,h (cid:96) S We refer to such sets as hypersets. It will be convenient to associate with these sets the indicator functions on S which are denoted as (cid:96) h(cid:48) (S) = I (S). β,h Aβ,h These are referred to as hyperconcepts and we may write h(cid:48) for brevity. For any fixed width parameter γ > 0 define the hyperclass H(cid:48) = (cid:8)h(cid:48) : h ∈ H(cid:9). (1) γ γ,h Inwords,H(cid:48) consistsofallsetsofsubsetsS ⊂ X ofcardinality(cid:96)onwhichthecorresponding γ binary functions h are wide by at least γ. TheaimofthepaperistocomputethecomplexityofthehyperclassH(cid:48) thatcorresponds γ totheclassH. SincethedomainX isinfinitethensoisH(cid:48) henceonecannotsimplymeasure γ its cardinality. Instead we apply a standard combinatorial measure of the complexity of a family of sets as follows: suppose Y is a general domain and G is an infinite class of subsets of Y. For any subset S = {y ,...,y } ⊂ Y let 1 n Γ (S) ≡ |G | (2) G |S where G = {[I (y ),...,I (y )] : G ∈ G}. The growth function (see for instance ?) is |S G 1 G n defined as Γ (n) = max Γ (S). G G {S:S⊂Y,|S|=n} It measures the rate in which the number of dichotomies obtained by intersecting subsets G of G with a finite set S increases as a function of the cardinality n of S in the maximal case (it is also called the trace of G in ?). Since we are interested in hypersets as opposed to simple sets G (as above) then we consider the trace on a finite collection ζ ⊂ S of samples (instead of a finite sample S as (cid:96) above). It will be convenient to define the cardinality of such a collection as the cardinality of the union of its component sets, i.e., for any given finite collection ζ ⊂ S let (cid:96) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:91) (cid:12) |ζ| = (cid:12) S(cid:12) (3) (cid:12) (cid:12) (cid:12)S:S∈ζ (cid:12) 2 and we use m to denote a possible value of |ζ|. As a measure of complexity of H(cid:48) we γ compute the growth as a function of m, i.e. Γ (m) = max Γ (ζ). H(cid:48) H(cid:48) γ ζ:ζ⊂S ,|ζ|=m γ (cid:96) 3 Main result Let us state the main result of the paper. Theorem 1 Let (cid:96),m > 0 be finite integers and B > 0 a finite real number. Let H be the class of binary functions on [0,B] (with only simple discontinuities). For a given width parameter value γ > 0, the corresponding hyperclass H(cid:48) on the space S has a growth which γ (cid:96) is bounded as 2(cid:98)B/(2γ)(cid:99)(cid:18) (cid:19) (cid:88) m−(cid:96) Γ (m) ≤ 2 . H(cid:48) γ i i=0 Remark 1 For m > (cid:96)+B/γ, the following simpler bound holds (cid:18) (cid:19)B eγ(m−(cid:96)) γ Γ (m) ≤ 2 . H(cid:48) γ B Before proving this result we need some additional notation. We denote by (cid:104)a,b(cid:105) a generalized interval set of the form [a,b], (a,b), [a,b) or (a,b]. For a set R we write I (x) to R represent the indicator function of the statement x ∈ R. In case of an interval set R = (cid:104)a,b(cid:105) we write I(cid:104)a,b(cid:105). Proof: Any binary function h may be represented by thresholding a real-valued function f on X, i.e., h(x) = sgn(f(x)) where for any a ∈ IR, sgn(a) = +1 or −1 if a > 0 or a ≤ 0, respectively. The idea is to choose a class F of real-valued functions f which is rich enough (it has to be infinite since there are infinitely many binary functions on X) but is as simple as we can find. This is important since, as we will show, the growth function of H(cid:48) is γ bounded from above by the complexity of a class that is a variant of F. We start by constructing such an F. For a binary function h on X consider the cor- responding set sequence {R } which satisfies the following properties: (a) [0,B] = i i=1,2,... (cid:83) R and for any i (cid:54)= j, R ∩R = ∅, (b) h alternates in sign over consecutive sets i=1,2,... i i j R ,R , (c) R is an interval set (cid:104)a,b(cid:105) with possibly a = b (in which case R = {a}). Hence i i+1 i i h has the following general form (cid:88) h(x) = ± (−1)iI (x). (4) Ri i=1,2,..., Thus there are exactly two functions h corresponding uniquely to each sequence of sets R , i i = 1,2,..... Unless explicitly specified, the end points of X = [0,B] are not considered roots of h, i.e., the default behavior is that outside X, i.e., x < 0 or x > B, the function ‘continues’ with the same value it takes at the endpoint h(0) or h(B), respectively. Now, 3 associate with the set sequence R ,R ,... the unique non-decreasing sequence of right- 1 2 endpoints a ,a ,... which define these sets (the sequence may have up to two consecutive 1 2 repetitions except for 0 and B) according to R = (cid:104)a ,a (cid:105), i = 1,2,.... (5) i i−1 i with the first left end point being a = 0. Note that different choices for (cid:104) and (cid:105) (see 0 earlier definition of a generalized interval (cid:104)a,b(cid:105)) give different sets R and hence different i functions h. For instance, suppose X = [0,7] then the following set sequence R = [0,2.4), 1 R = [2.4,3.6),R = [3.6,3.6] = {3.6},R = (3.6,7]hasacorrespondingend-pointsequence 2 3 4 a = 2.4,a = 3.6,a = 3.6,a = 7. Note that a singleton set introduces a repeated value in 1 2 3 4 this sequence. As another example consider R = [0,0] = {0}, R = (0,4.1), R = [4.1,7] 1 2 3 with a = 0, a = 4.1, a = 7. 1 2 3 Next, define the corresponding sequence of midpoints a +a i i+1 µ = , i = 1,2,.... i 2 Define the continuous real-valued function f : X → [−B,B] that corresponds to h (via the end-point sequence) as follows: (cid:88) f(x) = ± (−1)i+1(x−a )I[µ ,µ ] (6) i i−1 i i=1,2,... where we take µ = 0 (see for instance, Figure 1). Clearly, the value f(x) equals the width 0 200 2 100 1 0 0 100 1 200 2 100 200 300 400 500 600 700 800 Figure 1: h (solid) and its corresponding f (dashed) on X =[0,B] with B =800 ω (x). Note that for a fixed sequence of endpoints a , i = 1,2,... the function f is invariant h i to the type of intervals R = (cid:104)a ,a (cid:105) that h has, for instance, the set sequence [0,a ), i i−1 i 1 [a ,a ), [a ,a ], (a ,B] and the sequence [0,a ], (a ,a ], (a ,a ], (a ,B] yield different 1 2 2 3 3 1 1 2 2 3 3 binary functions h but the same width function f. For convenience, when h has a finite 4 number n of interval sets R , then the sum in (4) has an upper limit of n and we define i a = B. Similarly, the sum in (6) goes up to n−1 and we define µ = B. Let us denote n n−1 by F = {|f| : f ∈ F}. (7) + It follows that the hyperclass H(cid:48) may be represented in terms of the class F as follows: γ + define the hypersets A = {S ∈ S : f(x) ≥ β,x ∈ S}, β > 0,f ∈ F β,f (cid:96) + with corresponding hyperconcepts f(cid:48) = I (S), let γ,f Aβ,f F(cid:48) = {f(cid:48) : f ∈ F } γ γ,f + and H(cid:48) = F(cid:48). (8) γ γ Hence, it suffices to compute the growth function Γ (m). F(cid:48) γ Let us now begin to analyze the hyperclass F(cid:48). By definition, F(cid:48) is a class of indicator γ γ functions of subsets of S . Denote by ζ ⊂ S a collection of N such subsets. By a (cid:96) N (cid:96) generalized collection we will mean a collection of subsets S ⊂ X with cardinality |S| ≤ (cid:96). Henceforth we fix a value m and consider only collections ζ , such that |ζ | = m (9) N N where recall the definition of cardinality is according to (3). Let us denote the individual components of ζ by S(j) ∈ S , 1 ≤ j ≤ N hence N (cid:96) ζ = {S(1),...,S(N)}. N The growth function may be expressed as (cid:12)(cid:110) (cid:111)(cid:12) Γ (m) ≡ max Γ (ζ ) ≡ max (cid:12) [f(cid:48)(S(1)),...,f(cid:48)(S(N))] : f(cid:48) ∈ F(cid:48) (cid:12). (10) Fγ(cid:48) ζN⊂S(cid:96),|ζN|=m Fγ(cid:48) N ζN⊂S(cid:96),|ζN|=m(cid:12) γ (cid:12) Denote by S(j) the ith element of the sample S(j) based on the ordering of the elements of i S(j) (which is induced by the ordering on X). Then Γ (ζ ) F(cid:48) N γ (cid:12)(cid:26)(cid:20) (cid:18) (cid:19) (cid:18) (cid:19)(cid:21) (cid:27)(cid:12) = (cid:12)(cid:12) I min f(x) > γ ,...,I min f(x) > γ : f ∈ F+ (cid:12)(cid:12) (cid:12) x∈S(1) x∈S(N) (cid:12) (cid:12)  (cid:12) (cid:12) (cid:96) (cid:96) (cid:12) = (cid:12)(cid:12)(cid:12)(cid:89)I(cid:16)f(Sj(1)) > γ(cid:17),...,(cid:89)I(cid:16)f(Sj(N)) > γ(cid:17) : f ∈ F+(cid:12)(cid:12)(cid:12). (11)   (cid:12) j=1 j=1 (cid:12) Ordertheelementsineachcomponentofζ bytheunderlyingorderingonX. Thenputthe N sets in lexical ordering starting with the first up to the (cid:96)th element. For instance, suppose m = 7, N = 3, (cid:96) = 4 and ζ = { {2,8,9,10},{2,5,8,9},{3,8,10,13}} 3 5 then the ordered version is {{2,5,8,9},{2,8,9,10},{3,8,10,13}}. For any x ∈ X let θγ(x) ≡ I(f(x) > γ) (12) f (we will sometimes write θ (x) for short). For any sample S(i) of cardinality |S(i)| ≥ 1 let f |S(i)| (cid:89) (i) e (f) = θ (S ). S(i) f j j=1 Then for ζ we denote by N v (f) ≡ [e (f),...,e (f)] ζN S(1) S(N) where for brevity we sometimes write v(f). Let V (ζ ) = {v (f) : f ∈ F } F+ N ζN + or simply V(ζ ). Then from (11) we have N (cid:12) (cid:12) ΓFγ(cid:48)(ζN) = (cid:12)VF+(ζN)(cid:12). (13) Denote by X(cid:48) the union N (cid:91) S(j) = X(cid:48) = {x }m ⊂ X (14) i i=1 j=1 and take the elements to be ordered as x < x , 1 ≤ i ≤ m−1. The dependence of X(cid:48) on i i+1 ζ is left implicit. We will need the following procedure which maps ζ to a generalized N N collection. Procedure G: Given ζ construct ζ as follows: Let Sˆ(1) = S(1). For any 2 ≤ i ≤ N, let N Nˆ i−1 (cid:91) Sˆ(i) = S(i)\ Sˆ(k). k=1 Let Nˆ be the number of non-empty sets Sˆ(i). Note that Nˆ may be smaller than N since there may be an element of ζ which is N contained in the union of other elements of ζ . It is easy to verify by induction that the N sets of ζ are mutually exclusive and their union equals that of the original sets in ζ . We Nˆ N have the following: (cid:12) (cid:12) (cid:12) (cid:12) Claim 1 (cid:12)VF+(ζN)(cid:12) ≤ (cid:12)VF+(G(ζN))(cid:12). 6 Proof: We make repetitive use of the following: let A,B ⊂ X(cid:48) be two non-empty sets and let C = B\A. Then for any f, any b ∈ {0,1}, if [e (f),e (f)] = [b,0], then [e (f),e (f)] A B A C may be either [b,0] or [b,1] since the elements in B which caused the product e (f) to B be zero may or may not also be in C. In the other case if [e (f),e (f)] = [b,1] then A B [e (f),e (f)] = [b,1]. Hence A C |{[e (f),e (f)] : f ∈ F }| ≤ |{[e (f),e (f)] : f ∈ F }|. A B + A C + The same argument holds also for multiple A ,...,A , B and C = B \ (cid:83)k A . Let 1 k i=1 i ζ = G(ζ ). We now apply this to the following: Nˆ N |{[e (f),e (f),e (f),...,e (f)] : f ∈ F }| S(1) S(2) S(3) S(N) + (cid:12)(cid:8)(cid:2) (cid:3) (cid:9)(cid:12) = (cid:12) eSˆ(1)(f),eS(2)(f),eS(3)(f),...,eS(N)(f) : f ∈ F+ (cid:12) (15) (cid:12)(cid:8)(cid:2) (cid:3) (cid:9)(cid:12) ≤ (cid:12) eSˆ(1)(f),eSˆ(2)(f),eS(3)(f),...,eS(N)(f) : f ∈ F+ (cid:12) (16) (cid:12)(cid:8)(cid:2) (cid:3) (cid:9)(cid:12) ≤ (cid:12) eSˆ(1)(f),eSˆ(2)(f),eSˆ(3)(f),eS(4)(f)...,eS(N)(f) : f ∈ F+ (cid:12) (17) ≤ ··· (cid:12)(cid:8)(cid:2) (cid:3) (cid:9)(cid:12) ≤ (cid:12) eSˆ(1)(h),eSˆ(2)(h),eSˆ(3)(h),eSˆ(4)(h),...,eSˆ(N)(h) : f ∈ F+ (cid:12) (18) where (15) follows since using G we have Sˆ(1) ≡ S(1), (16) follows by applying the above with A = Sˆ(1), B = S(2) and C = Sˆ(2), (17) follows by letting A = Sˆ(1), A = 1 2 Sˆ(2), B = S(3), and C = Sˆ(3). Finally, removing those sets Sˆ(i) which are possibly empty leaves Nˆ-dimensional vectors consisting only of the non-empty sets so (18) becomes (cid:12)(cid:8)(cid:2) (cid:3) (cid:9)(cid:12) (cid:12) eSˆ(1)(f),...,eSˆ(Nˆ)(f) : f ∈ F+ (cid:12). (cid:116)(cid:117) Hence (11) is bounded from above as (cid:12) (cid:12) ΓFγ(cid:48)(ζN) ≤ (cid:12)VF+(G(ζN))(cid:12). (19) Denote by N∗ ≡ m−(cid:96)+1 and define the following procedure which maps a generalized collection of sets in X to another. Procedure Q: Given a generalized collection ζ = {S(i)}N , S(i) ⊂ X. Construct ζ as N i=1 N∗ follows: let Y = (cid:83)N S(i) and let the elements in Y be ordered according to their ordering i=2 on X(cid:48) (we will refer to them as y , y , ...). Let S∗(1) = S(1). For 2 ≤ i ≤ m−(cid:96)+1, let 1 2 S∗(i) = {y }. i−1 We now have the following: Claim 2 For any ζ ⊂ S with |ζ | = m, then N (cid:96) N (cid:12) (cid:12) (cid:12) (cid:12) (cid:12)VF+(G(ζN))(cid:12) ≤ (cid:12)VF+(Q(G(ζN)))(cid:12). Proof: Let ζ ≡ Q(G(ζ )) and as before ζ = G(ζ ). Note that by definition of Proce- N˜ N Nˆ N dure Q, it follows that ζ consists of N˜ = N∗ non-overlapping sets, the first S˜(1) having N˜ cardinality (cid:96) and S˜(i), 2 ≤ i ≤ N˜, each having a single distinct element of X(cid:48). Their union satisfies (cid:83)N˜ S˜(i) = X(cid:48). i=1 7 Consider the sets V (ζ ), V (ζ ) and denote them simply by Vˆ and V˜. For any F+ Nˆ F+ N˜ vˆ∈ Vˆ consider the following subset of F , + B(vˆ) = {f ∈ F : vˆ(f) = vˆ}. + We consider two types of vˆ∈ Vˆ. The first does not have the following property: there exist functions f , f ∈ B(vˆ) with θγ (x) (cid:54)= θγ (x) for at least one element x ∈ X(cid:48). Denote α β fα fβ by θγ ≡ [θγ(x ),...,θγ(x )]. Then in this case all f ∈ B(vˆ) have the same θγ = θˆ, where f f 1 f m f θˆ∈ {0,1}m. This implies that e (f) = e (f) = vˆ S˜(1) Sˆ(1) 1 while for 2 ≤ j ≤ N˜ we have e (f) = θˆ S˜(j) k(j) where k : [N∗] → [m] maps from the index of a (singleton) set S˜(j) to the index of an element of X(cid:48) and θˆ denotes the k(j)th component of θˆ. Hence it follows that k(j) |V (ζ )| = |V (ζ )|. B(vˆ) N˜ B(vˆ) Nˆ Let the second type of vˆ satisfy the complement condition, namely, there exist functions f , f ∈ B(vˆ) with θγ (x) (cid:54)= θγ (x) for at least one point x ∈ X(cid:48). If such x is an element of α β fα fβ Sˆ(1) then the first part of the argument above holds and we still have |V (ζ )| = |V (ζ )|. B(vˆ) N˜ B(vˆ) Nˆ If however there is also such an x in some set Sˆ(j), 2 ≤ j ≤ Nˆ then since the sets S˜(i), 2 ≤ i ≤ N˜ are singletons then there exists some S˜(i) ⊆ Sˆ(j) with e (f ) (cid:54)= e (f ). S˜(i) α S˜(i) β Hence for this second type of vˆ we have |V (ζ )| ≥ |V (ζ )|. (20) B(vˆ) N˜ B(vˆ) Nˆ Combining the above, then (20) holds for any vˆ∈ Vˆ. Now, consider any two distinct vˆ , vˆ ∈ Vˆ. Clearly, B(vˆ )(cid:84)B(vˆ ) = ∅ since every f α β α β has a unique vˆ(f). Moreover, for any f ∈ B(vˆ ) and f ∈ B(vˆ ) we have v˜(f ) (cid:54)= v˜(f ) for a α b β a b the following reason: there must exist some set Sˆ(i) and a point x ∈ Sˆ(i) such that θγ (x) (cid:54)= fa θγ (x) (since vˆ (cid:54)= vˆ ). If i = 1 then they must differ on S˜(1), i.e., e (f ) (cid:54)= e (f ). fb α β S˜(1) α S˜(1) β If 2 ≤ i ≤ Nˆ, then such an x is in some set S˜(j) ⊆ Sˆ(i) where 2 ≤ j ≤ N˜ and therefore e (f ) (cid:54)= e (f ). Hence no two distinct vˆ , vˆ map to the same v˜. We therefore have S˜(j) α S˜(j) β α β (cid:12) (cid:12) (cid:88) (cid:12)VF+(ζNˆ)(cid:12) = |VB(vˆ)(ζNˆ)| vˆ∈Vˆ (cid:88) ≤ |V (ζ )| (21) B(vˆ) N˜ vˆ∈Vˆ = |V (ζ )| F+ N˜ 8 where (21) follows from (20) which proves the claim. (cid:116)(cid:117) NotethatbyconstructionofProcedureQ,thedimensionalityoftheelementsofV (Q(G(ζ ))) F+ N isN∗, i.e., m−(cid:96)+1, whichholdsforanyζ (evenmaximallyoverlapping)andX(cid:48) asdefined N in (9) and (14). Let us denote by ζ any set obtained by applying Procedure G on any N∗ collection ζ followed by Procedure Q, i.e., N (cid:110) (cid:111) ζ ≡ S∗(1),S∗(2),...,S∗(N∗) N∗ with a set S∗(1) ⊂ X(cid:48) of cardinality (cid:96) and S∗(k) = {x }, where x ∈ X(cid:48)\S∗(1), k = 2,...,N∗. i i k k Hence we have (cid:12) (cid:12) ζN⊂Sm(cid:96),a|ζxN|=mΓFγ(cid:48)(ζN) ≤ ζN⊂Sm(cid:96),a|ζxN|=m(cid:12)VF+(Q(G(ζN)))(cid:12) (22) (cid:12) (cid:12) ≤ max (cid:12)VF+(ζN∗)(cid:12) (23) ζN∗:|ζN∗|=m where (22) follows from (11), (13) and Claims 1 and 2 while (23) follows by definition of ζ . Now, N∗ (cid:12) (cid:12) (cid:12)VF+(ζN∗)(cid:12) = |{[eS∗(1)(f),...,eS∗(N∗)(f)] : f ∈ F+}| ≤ 2|{[eS∗(2)(f),...,eS∗(N∗)(f)] : f ∈ F+}| (24) where (24) follows trivially since e (f) is binary. So from (23) we have S∗(1) ζN⊂mS,|aζNx|=mΓFγ(cid:48)(ζN) ≤ 2ζN∗:m|ζNa∗x|=m|{[eS∗(2)(f),...,eS∗(N∗)(f)] : f ∈ F+}| (cid:12)(cid:110) (cid:111)(cid:12) ≤ 2 max (cid:12) [θγ(x ),...,θγ(x )] : f ∈ F (cid:12) (25) (cid:12) f 1 f m−(cid:96) + (cid:12) x1,...,xm−(cid:96)∈X where x ,...,x run over any m−(cid:96) points in X. Define the following infinite class of 1 m−(cid:96) binary functions on X by Θγ = {θγ(x) : f ∈ F } F+ f + and for any finite subset X(cid:48)(cid:48) = {x ,...,x } ⊂ X 1 m−(cid:96) let (cid:104) (cid:105) θγ(X(cid:48)(cid:48)) = θγ(x ),...,θγ(x ) f f 1 f m−(cid:96) and Θγ (X(cid:48)(cid:48)) = {θγ(X(cid:48)(cid:48)) : f ∈ F }. F+ f + We proceed to bound |Θγ (X(cid:48)(cid:48))|. F+ The class Θγ is in one-to-one correspondence with a class Cγ of sets C ⊂ X which F+ F+ f are defined as C = {x : θγ(x) = 1}, f ∈ F . f f + 9 We claim that any such set C equals the union of at most K ≡ (cid:98)B/(2γ)(cid:99) intervals. To see f this, note that based on the general form of f ∈ F (see (6) and (7)) in order for f(x) > γ + for every x in an interval set I ⊂ X then I must be contained in an interval set of the form (5) and of length at least 2γ. Hence for any f ∈ F the corresponding set C is comprised + f of no more than K distinct intervals as I. Hence the class Cγ is a subset of the class C F+ K of all sets that are comprised of the union of at most K subsets of X. A class H is said (cid:12) (cid:12) to shatter A if (cid:12){h|A : h ∈ H}(cid:12) = 2k. The Vapnik-Chervonenkis dimension of H, denoted as VC(H), is defined as the cardinality of the largest set shattered by H. It is easy to show that the VC-dimension of C is VC(C ) = 2K. Hence it follows from the Sauer-Shelah K K lemma (see ?) that the growth of Cγ on any finite set X(cid:48)(cid:48) ⊂ X of cardinality m−(cid:96) (see F+ (2)) satisfies 2K (cid:18) (cid:19) (cid:88) m−(cid:96) ΓCFγ+(X(cid:48)(cid:48)) ≤ i . i=0 Since |ΘFγ+(X(cid:48)(cid:48))| = ΓCFγ+(X(cid:48)(cid:48)) then from (8) and (25) it follows that 2(cid:98)B/(2γ)(cid:99)(cid:18) (cid:19) (cid:88) m−(cid:96) |Γ (m)| ≤ 2 H(cid:48) γ i i=0 which proves the statement of the theorem. (cid:3) 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.