
Scientific Data Mining: A Practical Perspective PDF

304 Pages · 2009 · 3.69 MB · English


Scientific Data Mining: A Practical Perspective
Chandrika Kamath
Lawrence Livermore National Laboratory
Livermore, California

Society for Industrial and Applied Mathematics
Philadelphia

Copyright © 2009 by the Society for Industrial and Applied Mathematics (SIAM).

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA, 19104-2688 USA.

Trademarked names may be used in this book without the inclusion of a trademark symbol. These names are used in an editorial context only; no infringement of trademark is intended.

The galaxy images on the cover are from the Faint Images of the Radio Sky at Twenty centimeters (FIRST) survey, available at sundog.stsci.edu.

Library of Congress Cataloging-in-Publication Data
Kamath, Chandrika.
Scientific data mining : a practical perspective / Chandrika Kamath.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-898716-75-7
1. Data mining. 2. Science--Databases. 3. Engineering--Databases. I. Title.
QA76.9.D343K356 2009
502.85'6312--dc22
2008056149

SIAM is a registered trademark.

To my parents and Sisira

Contents

Preface

1 Introduction
1.1 Defining “data mining”
1.2 Mining science and engineering data
1.3 Summary

2 Data Mining in Science and Engineering
2.1 Astronomy and astrophysics
2.1.1 Characteristics of astronomy data
2.2 Remote sensing
2.2.1 Characteristics of remote sensing data
2.3 Biological sciences
2.3.1 Bioinformatics
2.3.2 Medicine
2.3.3 Characteristics of biological data
2.4 Security and surveillance
2.4.1 Biometrics
2.4.2 Surveillance
2.4.3 Network intrusion detection
2.4.4 Automated target recognition
2.4.5 Characteristics of security and surveillance data
2.5 Computer simulations
2.5.1 Characteristics of simulation data
2.6 Experimental physics
2.6.1 Characteristics of experimental physics data
2.7 Information retrieval
2.7.1 Characteristics of retrieval problems
2.8 Other applications
2.8.1 Nondestructive testing
2.8.2 Earth, environmental, and atmospheric sciences
2.8.3 Chemistry and cheminformatics
2.8.4 Materials science and materials informatics
2.8.5 Manufacturing
2.8.6 Scientific and information visualization
2.9 Summary
2.10 Suggestions for further reading

3 Common Themes in Mining Scientific Data
3.1 Types of scientific data
3.1.1 Table data
3.1.2 Image data
3.1.3 Mesh data
3.2 Characteristics of scientific data
3.2.1 Multispectral, multisensor, multimodal data
3.2.2 Spatiotemporal data
3.2.3 Compressed data
3.2.4 Streaming data
3.2.5 Massive data
3.2.6 Distributed data
3.2.7 Different data formats
3.2.8 Different output schemes
3.2.9 Noisy, missing, and uncertain data
3.2.10 Low-level data, higher-level objects
3.2.11 Representation of objects in the data
3.2.12 High-dimensional data
3.2.13 Size and quality of labeled data
3.3 Characteristics of scientific data analysis
3.4 Summary
3.5 Suggestions for further reading

4 The Scientific Data Mining Process
4.1 The tasks in the scientific data mining process
4.1.1 Transforming raw data into target data
4.1.2 Transforming target data into preprocessed data
4.1.3 Converting preprocessed data into transformed data
4.1.4 Converting transformed data into patterns
4.1.5 Converting patterns into knowledge
4.2 General observations about the scientific data mining process
4.3 Defining scientific data mining: The rationale
4.4 Summary

5 Reducing the Size of the Data
5.1 Sampling
5.2 Multiresolution techniques
5.3 Summary
5.4 Suggestions for further reading

6 Fusing Different Data Modalities
6.1 The need for data fusion
6.2 Levels of data fusion
6.3 Sensor-level data fusion
6.3.1 Multiple target tracking
6.3.2 Image registration
6.4 Feature-level data fusion
6.5 Decision-level data fusion
6.6 Summary
6.7 Suggestions for further reading

7 Enhancing Image Data
7.1 The need for image enhancement
7.2 Image denoising
7.2.1 Filter-based approaches
7.2.2 Wavelet-based approaches
7.2.3 Partial differential equation–based approaches
7.2.4 Removing multiplicative noise
7.2.5 Problem-specific denoising
7.3 Contrast enhancement
7.4 Morphological techniques
7.5 Summary
7.6 Suggestions for further reading

8 Finding Objects in the Data
8.1 Edge-based techniques
8.1.1 The Canny edge detector
8.1.2 Active contours
8.1.3 The USAN approach
8.2 Region-based methods
8.2.1 Region splitting
8.2.2 Region merging
8.2.3 Region splitting and merging
8.2.4 Clustering and classification
8.2.5 Watershed segmentation
8.3 Salient regions
8.3.1 Corners
8.3.2 Scale saliency regions
8.3.3 Scale-invariant feature transforms
8.4 Detecting moving objects
8.4.1 Background subtraction
8.4.2 Block matching
8.5 Domain-specific approaches
8.6 Identifying unique objects
8.7 Postprocessing for object identification
8.8 Representation of the objects
8.9 Summary
8.10 Suggestions for further reading

9 Extracting Features Describing the Objects
9.1 General requirements for a feature
9.2 Simple features
9.3 Shape features
9.4 Texture features
9.5 Problem-specific features
9.6 Postprocessing the features
9.7 Summary
9.8 Suggestions for further reading

10 Reducing the Dimension of the Data
10.1 The need for dimension reduction
10.2 Feature transform methods
10.2.1 Principal component analysis
10.2.2 Extensions of principal component analysis
10.2.3 Random projections
10.2.4 Multidimensional scaling
10.2.5 FastMap
10.2.6 Self-organizing maps
10.3 Feature subset selection methods
10.3.1 Filters for feature selection
10.3.2 Wrapper methods
10.3.3 Feature selection for regression
10.4 Domain-specific methods
10.5 Representation of high-dimensional data
10.6 Summary
10.7 Suggestions for further reading

11 Finding Patterns in the Data
11.1 Clustering
11.1.1 Partitional algorithms
11.1.2 Hierarchical clustering
11.1.3 Graph-based clustering
11.1.4 Observations and further reading
11.2 Classification
11.2.1 k-nearest neighbor classifier
11.2.2 Naïve Bayes classifier
11.2.3 Decision trees
11.2.4 Neural networks
11.2.5 Support vector machines
11.2.6 Ensembles of classifiers
11.2.7 Observations and further reading
11.3 Regression
11.3.1 Statistical techniques
11.3.2 Machine learning techniques
11.4 Association rules
11.5 Tracking
11.6 Outlier or anomaly detection
11.7 Related topics
11.7.1 Distance metrics
