Coresets for k-Means and k-Median Clustering and their Applications
Sariel Har-Peled, Soham Mazumdar (UIUC)

Problem: k-median clustering
Input: $P$, a set of $n$ points in $\mathbb{R}^d$; $k$, the number of clusters.
Compute: $C = \{c_1, \ldots, c_k\} \subseteq \mathbb{R}^d$, the centers.
Minimize price: $\nu_{\mathrm{opt}}(P, k) = \min_{C \subseteq \mathbb{R}^d,\ |C| = k} \sum_{p \in P} \mathrm{dist}(p, C)$, where $\mathrm{dist}(p, C) = \min_i \| p c_i \|$ and $\| p c_i \|$ is the Euclidean distance between $p$ and $c_i$.
Advantages: Less sensitive to noise. Theoretically nice.

(1 + ε)-approx k-Median
Motivated by [Arora (1998)] (approx. TSP).
[Arora et al. (1998)]: $O\left(n^{O(1/\varepsilon)+1}\right)$
[Kolliopoulos and Rao (1999)]: $O(\varrho \cdot n \log n \log k)$ (discrete), where $\varrho = \exp\left[O((1 + \log 1/\varepsilon)/\varepsilon)^{d-1}\right]$
Our result: $O\left(n + \varrho\, k^{O(1)} \log^{O(1)} n\right)$

(1 + ε)-approx k-Median, High Dimension
[Bădoiu et al. (2002)]: $2^{(k/\varepsilon)^{O(1)}} d^{O(1)} n \log^{O(k)} n$

k-means clustering
Input: $P$, a set of $n$ points in $\mathbb{R}^d$; $k$, the number of clusters.
Compute: $C = \{c_1, \ldots, c_k\} \subseteq \mathbb{R}^d$, the centers.
Price: $\nu_{\mathrm{opt}}(P, k) = \min_{C \subseteq \mathbb{R}^d,\ |C| = k} \sum_{p \in P} (\mathrm{dist}(p, C))^2$, where $\mathrm{dist}(p, C) = \min_i \| p c_i \|$.
Advantages: Less sensitive to noise. Efficient heuristic: Lloyd's method.
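The two price functions on these slides differ only in whether the distance to the nearest center is squared. The sketch below evaluates both prices for a fixed set of centers (it does not search for the optimal centers), and adds one iteration of the textbook form of Lloyd's method, the heuristic the k-means slide mentions. The function names, the sample points, and the choice to leave an empty cluster's center in place are illustrative assumptions, not taken from the talk.

```python
import math


def dist_to_centers(p, centers):
    """dist(p, C): Euclidean distance from p to its nearest center."""
    return min(math.dist(p, c) for c in centers)


def kmedian_price(points, centers):
    """k-median price of a fixed center set: sum of nearest-center distances."""
    return sum(dist_to_centers(p, centers) for p in points)


def kmeans_price(points, centers):
    """k-means price of a fixed center set: sum of squared nearest-center distances."""
    return sum(dist_to_centers(p, centers) ** 2 for p in points)


def lloyd_step(points, centers):
    """One textbook Lloyd iteration (assumed variant, not the authors' algorithm).

    Assign every point to its nearest center, then move each center to the
    mean of the points assigned to it. A center with no points stays put.
    """
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        clusters[nearest].append(p)

    new_centers = []
    for center, cluster in zip(centers, clusters):
        if not cluster:
            new_centers.append(center)
            continue
        dim = len(center)
        mean = tuple(sum(q[t] for q in cluster) / len(cluster) for t in range(dim))
        new_centers.append(mean)
    return new_centers


# Hypothetical sample data: two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]

print(kmedian_price(points, centers))
print(kmeans_price(points, centers))
print(lloyd_step(points, centers))
```

On this sample, the Lloyd step moves each center to the midpoint of its pair, which lowers the k-means price while leaving the k-median price unchanged. That illustrates why the mean is the natural update for the squared objective but not necessarily for the k-median one.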
Description: