ebook img

Multiclass Learnability and the ERM Principle PDF

74 Pages·2011·0.66 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Multiclass Learnability and the ERM Principle

Multiclass Learnability and the ERM Principle Amit Daniely, Sivan Sabato, Shai Ben-David, Shai Shalev-Shwartz TheHebrewUniversityofJerusalem COLT 2011 Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 1/24 Surprise: The fundamental theorem is not true in the multiclass case. Then how can we learn in the multiclass setting? The Fundamental Theorem of Learning Theory Uniform ERM works Learnability Convergence Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 2/24 Then how can we learn in the multiclass setting? The Fundamental Theorem of Learning Theory Uniform ERM works Learnability Convergence Surprise: The fundamental theorem is not true in the multiclass case. Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 2/24 The Fundamental Theorem of Learning Theory Uniform ERM works Learnability Convergence Surprise: The fundamental theorem is not true in the multiclass case. Then how can we learn in the multiclass setting? Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 2/24 Given a distribution D, err(h) = P [h(x) (cid:54)= y] (x,y)∼D S = {(X ,Y ),...,(X ,Y )} – an i.i.d. sample. Define 1 1 m m #{i : h(X ) (cid:54)= Y } i i err (h) = S m PAC Algorithm: For m ≥ m ((cid:15),δ), w.p. ≥ 1−δ, A err(A(S)) ≤ minerr(h)+(cid:15) h∈H ERM Algorithm: err (A(S)) is minimal. S For simplicity: assume the realizable case: ∃h∗ ∈ H, err(h∗) = 0 Setting X - instances, Y - labels, H ⊂ YX - hypotheses. Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 3/24 S = {(X ,Y ),...,(X ,Y )} – an i.i.d. sample. Define 1 1 m m #{i : h(X ) (cid:54)= Y } i i err (h) = S m PAC Algorithm: For m ≥ m ((cid:15),δ), w.p. ≥ 1−δ, A err(A(S)) ≤ minerr(h)+(cid:15) h∈H ERM Algorithm: err (A(S)) is minimal. S For simplicity: assume the realizable case: ∃h∗ ∈ H, err(h∗) = 0 Setting X - instances, Y - labels, H ⊂ YX - hypotheses. Given a distribution D, err(h) = P [h(x) (cid:54)= y] (x,y)∼D Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 3/24 PAC Algorithm: For m ≥ m ((cid:15),δ), w.p. ≥ 1−δ, A err(A(S)) ≤ minerr(h)+(cid:15) h∈H ERM Algorithm: err (A(S)) is minimal. S For simplicity: assume the realizable case: ∃h∗ ∈ H, err(h∗) = 0 Setting X - instances, Y - labels, H ⊂ YX - hypotheses. Given a distribution D, err(h) = P [h(x) (cid:54)= y] (x,y)∼D S = {(X ,Y ),...,(X ,Y )} – an i.i.d. sample. Define 1 1 m m #{i : h(X ) (cid:54)= Y } i i err (h) = S m Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 3/24 ERM Algorithm: err (A(S)) is minimal. S For simplicity: assume the realizable case: ∃h∗ ∈ H, err(h∗) = 0 Setting X - instances, Y - labels, H ⊂ YX - hypotheses. Given a distribution D, err(h) = P [h(x) (cid:54)= y] (x,y)∼D S = {(X ,Y ),...,(X ,Y )} – an i.i.d. sample. Define 1 1 m m #{i : h(X ) (cid:54)= Y } i i err (h) = S m PAC Algorithm: For m ≥ m ((cid:15),δ), w.p. ≥ 1−δ, A err(A(S)) ≤ minerr(h)+(cid:15) h∈H Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 3/24 For simplicity: assume the realizable case: ∃h∗ ∈ H, err(h∗) = 0 Setting X - instances, Y - labels, H ⊂ YX - hypotheses. Given a distribution D, err(h) = P [h(x) (cid:54)= y] (x,y)∼D S = {(X ,Y ),...,(X ,Y )} – an i.i.d. sample. Define 1 1 m m #{i : h(X ) (cid:54)= Y } i i err (h) = S m PAC Algorithm: For m ≥ m ((cid:15),δ), w.p. ≥ 1−δ, A err(A(S)) ≤ minerr(h)+(cid:15) h∈H ERM Algorithm: err (A(S)) is minimal. S Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 3/24 Setting X - instances, Y - labels, H ⊂ YX - hypotheses. Given a distribution D, err(h) = P [h(x) (cid:54)= y] (x,y)∼D S = {(X ,Y ),...,(X ,Y )} – an i.i.d. sample. Define 1 1 m m #{i : h(X ) (cid:54)= Y } i i err (h) = S m PAC Algorithm: For m ≥ m ((cid:15),δ), w.p. ≥ 1−δ, A err(A(S)) ≤ minerr(h)+(cid:15) h∈H ERM Algorithm: err (A(S)) is minimal. S For simplicity: assume the realizable case: ∃h∗ ∈ H, err(h∗) = 0 Danielyetal (HebrewU) MulticlassLearnabilityandtheERMPrinciple COLT’11 3/24

Description:
Multiclass Learnability and the ERM Principle. Amit Daniely, Sivan Sabato, Shai Ben-David, Shai Shalev-Shwartz. The Hebrew University of
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.