
Data Streams: Models and Algorithms PDF

373 Pages·2010·10.84 MB·English

Preview Data Streams: Models and Algorithms

Data Streams
Models and Algorithms

ADVANCES IN DATABASE SYSTEMS
Series Editor
Ahmed K. Elmagarmid
Purdue University
West Lafayette, IN 47907

Other books in the Series:
SIMILARITY SEARCH: The Metric Space Approach, P. Zezula, G. Amato, V. Dohnal, M. Batko; ISBN: 0-387-29146-6
STREAM DATA MANAGEMENT, Nauman Chaudhry, Kevin Shaw, Mahdi Abdelguerfi; ISBN: 0-387-24393-3
FUZZY DATABASE MODELING WITH XML, Zongmin Ma; ISBN: 0-387-24248-1
MINING SEQUENTIAL PATTERNS FROM LARGE DATA SETS, Wei Wang and Jiong Yang; ISBN: 0-387-24246-5
ADVANCED SIGNATURE INDEXING FOR MULTIMEDIA AND WEB APPLICATIONS, Yannis Manolopoulos, Alexandros Nanopoulos, Eleni Tousidou; ISBN: 1-4020-7425-5
ADVANCES IN DIGITAL GOVERNMENT: Technology, Human Factors, and Policy, edited by William J. McIver, Jr. and Ahmed K. Elmagarmid; ISBN: 1-4020-7067-5
INFORMATION AND DATABASE QUALITY, Mario Piattini, Coral Calero and Marcela Genero; ISBN: 0-7923-7599-8
DATA QUALITY, Richard Y. Wang, Mostapha Ziad, Yang W. Lee; ISBN: 0-7923-7215-8
THE FRACTAL STRUCTURE OF DATA REFERENCE: Applications to the Memory Hierarchy, Bruce McNutt; ISBN: 0-7923-7945-4
SEMANTIC MODELS FOR MULTIMEDIA DATABASE SEARCHING AND BROWSING, Shu-Ching Chen, R.L. Kashyap, and Arif Ghafoor; ISBN: 0-7923-7888-1
INFORMATION BROKERING ACROSS HETEROGENEOUS DIGITAL DATA: A Metadata-based Approach, Vipul Kashyap, Amit Sheth; ISBN: 0-7923-7883-0
DATA DISSEMINATION IN WIRELESS COMPUTING ENVIRONMENTS, Kian-Lee Tan and Beng Chin Ooi; ISBN: 0-7923-7866-0
MIDDLEWARE NETWORKS: Concept, Design and Deployment of Internet Infrastructure, Michah Lerner, George Vanecek, Nino Vidovic, Dado Vrsalovic; ISBN: 0-7923-7840-7
ADVANCED DATABASE INDEXING, Yannis Manolopoulos, Yannis Theodoridis, Vassilis J. Tsotras; ISBN: 0-7923-7716-8
MULTILEVEL SECURE TRANSACTION PROCESSING, Vijay Atluri, Sushil Jajodia, Binto George; ISBN: 0-7923-7702-8
FUZZY LOGIC IN DATA MODELING, Guoqing Chen; ISBN: 0-7923-8253-6

For a complete listing of books in this series, go to http://www.springer.com

Data Streams
Models and Algorithms
edited by
Charu C. Aggarwal
IBM, T.J. Watson Research Center
Yorktown Heights, NY, USA

Springer

Charu C. Aggarwal
IBM Thomas J. Watson Research Center
19 Skyline Drive
Hawthorne NY 10532

Library of Congress Control Number: 2006934111

DATA STREAMS: Models and Algorithms
edited by Charu C. Aggarwal
ISBN-10: 0-387-28759-0
ISBN-13: 978-0-387-28759-1
e-ISBN-10: 0-387-47534-6
e-ISBN-13: 978-0-387-47534-9

Cover by Will Ladd, NRL Mapping, Charting and Geodesy Branch utilizing NRL's GIDB Portal System that can be utilized at http://dmap.nrlssc.navy.mil

Printed on acid-free paper.

© 2007 Springer Science+Business Media, LLC. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Contents

List of Figures
List of Tables
Preface

1 An Introduction to Data Streams
Charu C. Aggarwal
  1. Introduction
  2. Stream Mining Algorithms
  3. Conclusions and Summary
  References

2 On Clustering Massive Data Streams: A Summarization Paradigm
Charu C. Aggarwal, Jiawei Han, Jianyong Wang and Philip S. Yu
  1. Introduction
  2. The Micro-clustering Based Stream Mining Framework
  3. Clustering Evolving Data Streams: A Micro-clustering Approach
    3.1 Micro-clustering Challenges
    3.2 Online Micro-cluster Maintenance: The CluStream Algorithm
    3.3 High Dimensional Projected Stream Clustering
  4. Classification of Data Streams: A Micro-clustering Approach
    4.1 On-Demand Stream Classification
  5. Other Applications of Micro-clustering and Research Directions
  6. Performance Study and Experimental Results
  7. Discussion
  References

3 A Survey of Classification Methods in Data Streams
Mohamed Medhat Gaber, Arkady Zaslavsky and Shonali Krishnaswamy
  1. Introduction
  2. Research Issues
  3. Solution Approaches
  4. Classification Techniques
    4.1 Ensemble Based Classification
    4.2 Very Fast Decision Trees (VFDT)
    4.3 On Demand Classification
    4.4 Online Information Network (OLIN)
    4.5 LWClass Algorithm
    4.6 ANNCAD Algorithm
    4.7 SCALLOP Algorithm
  5. Summary
  References

4 Frequent Pattern Mining in Data Streams
Ruoming Jin and Gagan Agrawal
  1. Introduction
  2. Overview
  3. New Algorithm
  4. Work on Other Related Problems
  5. Conclusions and Future Directions
  References

5 A Survey of Change Diagnosis Algorithms in Evolving Data Streams
Charu C. Aggarwal
  1. Introduction
  2. The Velocity Density Method
    2.1 Spatial Velocity Profiles
    2.2 Evolution Computations in High Dimensional Case
    2.3 On the use of clustering for characterizing stream evolution
  3. On the Effect of Evolution in Data Mining Algorithms
  4. Conclusions
  References

6 Multi-Dimensional Analysis of Data Streams Using Stream Cubes
Jiawei Han, Y. Dora Cai, Yixin Chen, Guozhu Dong, Jian Pei, Benjamin W. Wah, and Jianyong Wang
  1. Introduction
  2. Problem Definition
  3. Architecture for On-line Analysis of Data Streams
    3.1 Tilted time frame
    3.2 Critical layers
    3.3 Partial materialization of stream cube
  4. Stream Data Cube Computation
    4.1 Algorithms for cube computation
  5. Performance Study
  6. Related Work
  7. Possible Extensions
  8. Conclusions
  References

7 Load Shedding in Data Stream Systems
Brian Babcock, Mayur Datar and Rajeev Motwani
  1. Load Shedding for Aggregation Queries
    1.1 Problem Formulation
    1.2 Load Shedding Algorithm
    1.3 Extensions
  2. Load Shedding in Aurora
  3. Load Shedding for Sliding Window Joins
  4. Load Shedding for Classification Queries
  5. Summary
  References

8 The Sliding-Window Computation Model and Results
Mayur Datar and Rajeev Motwani
    0.1 Motivation and Road Map
  1. A Solution to the BASICCOUNTING Problem
    1.1 The Approximation Scheme
  2. Space Lower Bound for BASICCOUNTING Problem
  3. Beyond 0's and 1's
  4. References and Related Work
  5. Conclusion
  References

9 A Survey of Synopsis Construction in Data Streams
Charu C. Aggarwal, Philip S. Yu
  1. Introduction
  2. Sampling Methods
    2.1 Random Sampling with a Reservoir
    2.2 Concise Sampling
  3. Wavelets
    3.1 Recent Research on Wavelet Decomposition in Data Streams
  4. Sketches
    4.1 Fixed Window Sketches for Massive Time Series
    4.2 Variable Window Sketches of Massive Time Series
    4.3 Sketches and their applications in Data Streams
    4.4 Sketches with p-stable distributions
    4.5 The Count-Min Sketch
    4.6 Related Counting Methods: Hash Functions for Determining Distinct Elements
    4.7 Advantages and Limitations of Sketch Based Methods
  5. Histograms
    5.1 One Pass Construction of Equi-depth Histograms
    5.2 Constructing V-Optimal Histograms
    5.3 Wavelet Based Histograms for Query Answering
    5.4 Sketch Based Methods for Multi-dimensional Histograms
  6. Discussion and Challenges
  References

10 A Survey of Join Processing in Data Streams
Junyi Xie and Jun Yang
  1. Introduction
  2. Model and Semantics
  3. State Management for Stream Joins
    3.1 Exploiting Constraints
    3.2 Exploiting Statistical Properties
  4. Fundamental Algorithms for Stream Join Processing
  5. Optimizing Stream Joins
  6. Conclusion
  Acknowledgments
  References

11 Indexing and Querying Data Streams
Ahmet Bulut, Ambuj K. Singh
  1. Introduction
  2. Indexing Streams
    2.1 Preliminaries and definitions
    2.2 Feature extraction
    2.3 Index maintenance
    2.4 Discrete Wavelet Transform
  3. Querying Streams
    3.1 Monitoring an aggregate query
    3.2 Monitoring a pattern query
    3.3 Monitoring a correlation query
  4. Related Work
  5. Future Directions
    5.1 Distributed monitoring systems
    5.2 Probabilistic modeling of sensor networks
    5.3 Content distribution networks
  6. Chapter Summary
  References

12 Dimensionality Reduction and Forecasting on Streams
Spiros Papadimitriou, Jimeng Sun, and Christos Faloutsos
  1. Related work
  2. Principal component analysis (PCA)
  3. Auto-regressive models and recursive least squares
  4. MUSCLES
  5. Tracking correlations and hidden variables: SPIRIT
  6. Putting SPIRIT to work
  7. Experimental case studies
  8. Performance and accuracy
  9. Conclusion
  Acknowledgments
  References

13 A Survey of Distributed Mining of Data Streams
Srinivasan Parthasarathy, Amol Ghoting and Matthew Eric Otey
  1. Introduction
  2. Outlier and Anomaly Detection
  3. Clustering
  4. Frequent itemset mining
  5. Classification
  6. Summarization
  7. Mining Distributed Data Streams in Resource Constrained Environments
  8. Systems Support
  References

14 Algorithms for Distributed Data Stream Mining
Kanishka Bhaduri, Kamalika Das, Krishnamoorthy Sivakumar, Hillol Kargupta, Ran Wolff and Rong Chen
  1. Introduction
  2. Motivation: Why Distributed Data Stream Mining?
  3. Existing Distributed Data Stream Mining Algorithms
  4. A local algorithm for distributed data stream mining
    4.1 Local Algorithms: definition
    4.2 Algorithm details
    4.3 Experimental results
    4.4 Modifications and extensions
  5. Bayesian Network Learning from Distributed Data Streams
    5.1 Distributed Bayesian Network Learning Algorithm
    5.2 Selection of samples for transmission to global site
    5.3 Online Distributed Bayesian Network Learning
    5.4 Experimental Results
  6. Conclusion
  References

15 A Survey of Stream Processing Problems and Techniques in Sensor Networks
Sharmila Subramaniam, Dimitrios Gunopulos
  1. Challenges

Description:
DATA STREAMS: Models and Algorithms edited by Charu C. Aggarwal analysis. Use in connection with any form of information storage and retrieval ... clustering, classification, outlier detection, frequent pattern mining, and sum-