Spatial Data Mining: Accomplishments and Research Needs
Shashi Shekhar
Department of Computer Science and Engineering
University of Minnesota

[Figure: Sea Surface Temperature (SST) in March, 1982]

Why Data Mining?
* Holy Grail: Informed Decision Making
* Lots of Data are Being Collected
  • Business Applications:
    – Transactions: retail, bank ATM, air travel, etc.
    – Web logs, e-commerce, GPS-track
  • Scientific Applications:
    – Remote sensing: e.g., NASA's Earth Observing System
    – Sky survey
    – Microarrays generating gene expression data
* Challenges:
  • Volume (data) ≫ number of human analysts
  • Some automation needed
* Data Mining may help!
  • Provide better and customized insights for business
  • Help scientists with hypothesis generation

Spatial Data
* Location-based Services
  • Ex: MapQuest, Yahoo Maps, Google Maps, MapPoint
  [Figure 1: Google Local Search (http://maps.google.com)]
* In-car Navigation Device
  [Figure 2: Emerson In-Car Navigation System (Courtesy of Amazon.com)]

Spatial Data Mining (SDM)
* The process of discovering
  • interesting, useful, non-trivial patterns
    – patterns: non-specialist
    – exception to patterns: specialist
  • from large spatial datasets
* Spatial patterns
  • Spatial outlier, discontinuities
    – bad traffic sensors on highways (DOT)
  • Location prediction models
    – model to identify habitat of endangered species
  • Spatial clusters
    – crime hot-spots (NIJ), cancer clusters (CDC)
  • Co-location patterns
    – predator-prey species, symbiosis
    – Dental health and fluoride

Location As Attribute
* Location as attribute in spatial data mining
* What value is location
  as an explanatory variable?
  • most events are associated with space and time
  • surrogate variable
  • critical to data analyses for many application domains
    – physical science
    – social science
* Location helps bring rich contexts
  • Physical: e.g., rainfall, temperature, and wind
  • Demographical: e.g., age group, gender, and income type
  • Problem-specific
* Location helps bring relationships
  • e.g., distance to open water

Example Spatial Pattern: Spatial Cluster
* The 1854 Asiatic Cholera in London

Example Spatial Pattern: Spatial Outliers
* Spatial Outliers
  • Traffic Data in Twin Cities
  • Abnormal Sensor Detections
  • Spatial and Temporal Outliers
  [Figure: Average Traffic Volume (Time vs. Station); x-axis: Time; y-axis: I-35W Station ID (South Bound)]

Example Spatial Pattern: Predictive Models
* Location Prediction: Bird Habitat Prediction
  • Given training data
  • Predictive model building
  • Predict new data
  [Figure: Nest sites for 1995 Darr location; legend: Marsh land, Nest sites; nz = 85]

Example Spatial Pattern: Co-locations (backup)
* Given:
  • A collection of different types of spatial events
* Illustration
* Find: Co-located subsets of event types

What's NOT Spatial Data Mining
* Simple Querying of Spatial Data
  • Find neighbors of Canada given names and boundaries of all countries
  • Find shortest path from Boston to Houston in a freeway map
  • Search
    space is not large (not exponential)
* Testing a hypothesis via a primary data analysis
  • Ex. Female chimpanzee territories are smaller than male territories
  • Search space is not large!
  • SDM: secondary data analysis to generate multiple plausible hypotheses
* Uninteresting or obvious patterns in spatial data
  • Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul, given that the two cities are 10 miles apart.
  • Common knowledge: Nearby places have similar rainfall
* Mining of non-spatial data
  • Diaper sales and beer sales are correlated in the evening
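
The contrast drawn above between simple queries and mining rests on search-space size. As a rough illustration only (not a method from these slides), the sketch below counts the candidate subsets that a co-location search over k event types could in principle have to consider: every subset with at least two types, which is 2^k - k - 1. The function names and the event-type labels "A" to "D" are hypothetical placeholders chosen for this example.

```python
from itertools import combinations


def candidate_colocation_sets(event_types):
    """Yield every subset of event types that has at least two members."""
    types = sorted(event_types)
    for size in range(2, len(types) + 1):
        yield from combinations(types, size)


def count_candidates(k):
    """Closed form: all 2**k subsets minus the empty set and the k singletons."""
    return 2 ** k - k - 1


# Hypothetical event-type labels, used purely for illustration.
toy_types = ["A", "B", "C", "D"]
candidates = list(candidate_colocation_sets(toy_types))

# The explicit enumeration and the closed form agree on the toy input.
print(len(candidates), count_candidates(len(toy_types)))

# The candidate count grows exponentially with the number of event types.
for k in (5, 10, 20):
    print(k, count_candidates(k))
```

By comparison, a neighbor lookup or a shortest-path query examines a number of states that is polynomial in the size of the map, which is why the slide classifies them as querying rather than mining.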
Description: