Nearest Neighbors Algorithms in Euclidean and Metric Spaces
[email protected]

- Introduction
- kd-trees and basic search algorithms
- kd-trees and random projection trees: improved search algorithms
- kd-trees and random projection trees: diameter reduction
- Metric trees and variants
- Distance concentration phenomena: an introduction
- A metric from optimal transportation: the earth mover distance

Applications
- Nearest neighbor search is a core problem in the following applications:
  - clustering, k-means algorithms
  - information retrieval in databases
  - information theory: vector quantization encoding
  - classification in learning theory
  - ...

Nearest Neighbors: Getting Started
- Input: a set of points (aka sites) $P$ in $\mathbb{R}^d$, and a query point $q$
- Output: $\mathrm{nn}(q, P)$, the point of $P$ nearest to $q$:

  $d(q, P) = d(q, \mathrm{nn}(q, P))$.   (1)

[Figure: a query point q and its nearest neighbor nn(q)]

The Euclidean Voronoi Diagram and its Dual, the Delaunay Triangulation
- Voronoi and Delaunay diagrams
- Key properties:
  - Voronoi cells of all dimensions
  - Voronoi - Delaunay via the nerve construction
  - Duality: cells of dimension $d-k$ vs cells of dimension $k$
  - The empty ball property

Nearest Neighbors Using Voronoi Diagrams
- Nearest neighbor by walking:
  - start from any point $p \in P$
  - while there exists a neighbor $n(p)$ of $p$ in $\mathrm{Vor}(P)$ closer to $q$ than $p$, step to it: $p = n(p)$
  - done: $\mathrm{nn}(q) = p$
- Argument: the Delaunay neighborhood of a point is complete. With
  - $\mathrm{Vor}(p, P)$ = cell of $p$ in $\mathrm{Vor}(P)$
  - $N(p)$ = set of neighbors of $p$ in $\mathrm{Vor}(P)$
  - $N'(p) = \{p\} \cup N(p)$

  one has $\mathrm{Vor}(p, N'(p)) = \mathrm{Vor}(p, P)$.

[Figure: walk in Vor(P) from a starting point p towards nn(q), the nearest neighbor of the query point q]

The Nearest Neighbors Problem: Overview
- Strategy: preprocess a set $P$ of $n$ points in $\mathbb{R}^d$ into a data structure (DS) that answers nearest neighbor queries fast.
- Ideal wish list:
  - The DS should have linear size
  - A query should have sub-linear complexity, i.e. $o(n)$
  - When $d = 1$: balanced binary search trees yield $O(\log n)$
- Core difficulties:
  - Curse of dimensionality: typically the space $\mathbb{R}^d$ has a high dimension $d$, and $n \gg d$.
  - Interpretation (meaningfulness) of distances in high dimensional spaces.

The Nearest Neighbors Problem: Elementary Options
- The trivial solution: $O(dn)$ space, $O(dn)$ query time
- Voronoi diagram:
  - $d = 2$: $O(n)$ space, $O(\log n)$ query time
  - $d > 2$: $O(n^{\lceil d/2 \rceil})$ space
  - Under a locally uniform condition on the point distribution, the 1-skeleton Delaunay hierarchy achieves $O(n)$ space and $O(c^d \log n)$ expected query time.
- Spatial partitions based on trees
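As a point of reference, the trivial solution can be sketched as a linear scan over the sites. This is only an illustrative sketch: the function name `nearest_neighbor` and the representation of points as tuples of floats are assumptions made here, not part of the slides.

```python
import math


def nearest_neighbor(q, points):
    """Return (nn, dist) where nn is the point of `points` nearest to q.

    Linear scan: one distance evaluation per site, each costing O(d),
    hence O(d * n) query time and no preprocessing.
    """
    if not points:
        raise ValueError("the point set P must be non-empty")
    # min() keeps the first minimizer in case of ties.
    best = min(points, key=lambda p: math.dist(p, q))
    return best, math.dist(best, q)


# Hypothetical example data: three sites in the plane and one query.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
nn, dist = nearest_neighbor((0.9, 0.1), sites)
print(nn, dist)
```

The returned distance is the quantity $d(q, P)$ of Equation (1). Every data structure discussed afterwards can be checked against this scan on small inputs.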
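The walk on Voronoi neighbors can be illustrated in the plane with a deliberately naive sketch. The assumptions are loud ones: the Delaunay neighbors are found by brute force with the empty ball property (about $O(n^4)$ time, only sensible for tiny inputs), the sites are assumed not to be all collinear, and the names `circumcircle`, `delaunay_neighbors` and `walk_nearest_neighbor` are hypothetical helpers introduced for this example. A real implementation would rely on a proper Delaunay triangulation library.

```python
import itertools
import math
import random


def circumcircle(a, b, c):
    """Center and squared radius of the circle through a, b, c (None if collinear)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_neighbors(points):
    """Brute-force Delaunay neighbors in 2D via the empty ball property.

    A triangle is kept when no other site lies strictly inside its
    circumcircle; its three edges are then Delaunay edges.
    """
    n = len(points)
    neighbors = [set() for _ in range(n)]
    for i, j, k in itertools.combinations(range(n), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        (ux, uy), r2 = circle
        # Small relative tolerance: sites (almost) on the circle do not count as inside.
        empty = all(
            (points[m][0] - ux) ** 2 + (points[m][1] - uy) ** 2 >= r2 * (1 - 1e-9)
            for m in range(n) if m not in (i, j, k)
        )
        if empty:
            for u, v in ((i, j), (j, k), (i, k)):
                neighbors[u].add(v)
                neighbors[v].add(u)
    return neighbors


def walk_nearest_neighbor(q, points, neighbors, start=0):
    """Greedy walk: step to a neighbor closer to q until none exists."""
    p = start
    while True:
        closer = [m for m in neighbors[p]
                  if math.dist(points[m], q) < math.dist(points[p], q)]
        if not closer:
            return p  # no neighbor is closer, so q lies in the cell of p
        # Any closer neighbor would do; the closest one is picked here.
        p = min(closer, key=lambda m: math.dist(points[m], q))


# Hypothetical example data: random sites in the unit square.
rng = random.Random(0)
sites = [(rng.random(), rng.random()) for _ in range(20)]
nbrs = delaunay_neighbors(sites)
print(sites[walk_nearest_neighbor((0.5, 0.5), sites, nbrs)])
```

The walk terminates because the distance to $q$ strictly decreases at each step, and it stops at the right site because of the completeness argument: if no site of $N(p)$ is closer to $q$ than $p$, then $q \in \mathrm{Vor}(p, N'(p)) = \mathrm{Vor}(p, P)$.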
Description: