New Trends in Data Warehousing and Data Analysis (Annals of Information Systems)

Annals of Information Systems

Volume 1: Managing in the Information Economy: Current Research Issues. Uday Apte, Uday Karmarkar, eds.
Volume 2: Decision Support for Global Enterprises. Uday Kulkarni, Daniel J. Power, Ramesh Sharda, eds.

New Trends in Data Warehousing and Data Analysis

Editors
Stanisław Kozielski, Silesian University of Technology, Gliwice, Poland, [email protected]
Robert Wrembel, Poznań University of Technology, Poznań, Poland, [email protected]

ISSN: 1934-3221
ISBN-13: 978-0-387-87430-2
e-ISBN-13: 978-0-387-87431-9
DOI: 10.1007/978-0-387-87430-2
Library of Congress Control Number: 2008934456

© 2009 by Springer Science+Business Media, LLC. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper
springer.com

Preface

A modern way of managing enterprises, institutions, and organizations is based on knowledge, which in turn is gained from data analysis. In practice, business decisions are taken based on the analysis of past and current data, continuously collected during the lifetime of an enterprise.
A data analysis technology, widely accepted by research and industry, is based on data warehouse architecture. In this architecture, data coming from multiple distributed and heterogeneous storage systems are integrated in a central repository, called a data warehouse (DW). Such integrated data are analyzed by the so-called On-Line Analytical Processing (OLAP) queries and data mining applications for the purpose of: analyzing business performance measures, discovering and analyzing trends, discovering anomalies and patterns of behavior, finding hidden dependencies between data as well as predicting business trends and simulating business solutions. The DW and OLAP technologies are applied in multiple areas including among others: sales business, stock market, banking, insurance, energy management, telecommunication, medicine, and science.

From a technical point of view, a data warehouse is a large database, whose size ranges from several hundreds of GB to several dozens of TB or even several PB. The size of a DW, high complexity of OLAP queries and data mining algorithms as well as the heterogeneous nature of data being integrated in a DW cause serious research and technological challenges. Intensive research has been conducted in multiple fields including among others: (1) conceptual modeling of DWs and logical data models, (2) DW loading (refreshing), (3) assuring efficient execution of OLAP queries and data mining algorithms, (4) managing materialized views, (5) data analysis techniques, (6) metadata management, (7) managing the evolution of DWs, (8) stream, real-time, and active data warehouses, and (9) warehousing complex data (e.g., XML, multimedia, object, spatial).

Typically, a DW applies the so-called multidimensional data model. In this model, analyzed data, called facts, are referenced by multiple dimensions that set up the context of an analysis. Such multidimensional spaces are called data cubes. In practice, a data cube can be implemented either in relational OLAP (ROLAP) servers or in multidimensional OLAP (MOLAP) servers.
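
To make the multidimensional model concrete, the following is a minimal Python sketch, not taken from the book. It assumes a hypothetical fact table with one measure (amount) and three dimensions (product, city, month); the data values and the helper name roll_up are illustrative only. Aggregating the measure over a chosen subset of dimensions yields one view of the data cube.

```python
from collections import defaultdict

# Hypothetical fact table: each fact carries one measure ("amount") and is
# referenced by three dimensions ("product", "city", "month").
# All names and values are illustrative assumptions, not data from the book.
FACTS = [
    {"product": "laptop", "city": "Gliwice", "month": "2008-05", "amount": 1200.0},
    {"product": "laptop", "city": "Poznan", "month": "2008-05", "amount": 900.0},
    {"product": "phone", "city": "Gliwice", "month": "2008-06", "amount": 300.0},
    {"product": "phone", "city": "Poznan", "month": "2008-06", "amount": 450.0},
]


def roll_up(facts, dimensions, measure="amount"):
    """Sum the measure over all facts, grouped by the chosen dimensions."""
    cells = defaultdict(float)
    for fact in facts:
        # A cube cell is identified by the values of the chosen dimensions.
        key = tuple(fact[d] for d in dimensions)
        cells[key] += fact[measure]
    return dict(cells)


by_product = roll_up(FACTS, ["product"])
by_city_month = roll_up(FACTS, ["city", "month"])
grand_total = roll_up(FACTS, [])  # no dimensions: a single cell for everything
```

Grouping by fewer dimensions gives a coarser view of the same facts, which is what an OLAP roll-up does. A ROLAP server would typically express this as a GROUP BY over relational tables, while a MOLAP server would keep the cells in a multidimensional array.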

A DW is filled with data by means of the so-called Extraction-Transformation-Loading (ETL) processes. They are responsible among others for: extracting and filtering data from data sources, transforming data into a common data model, cleansing data in order to remove inconsistencies, duplicates, and null values, integrating cleansed data into one common data set, computing summaries, and loading data into a DW.

OLAP queries typically join multiple tables, filter and sort data, compute aggregates and use complex groupings. Since these queries are very complex and they often read terabytes of data, their execution may take dozens of minutes, hours, or even days. Therefore, one of the most important research and technological problems concerns assuring an acceptable performance of OLAP queries. Well developed solutions to this problem are based on materialized views and query rewriting, advanced index structures as well as on parallel processing and data partitioning techniques.

Materialized views applied to storing precomputed results of OLAP queries are one of the basic DW objects. Multiple solutions have been proposed for selecting optimal sets of materialized views for a given query workload, for efficient refreshing of materialized views as well as for assuring materialized view consistency during a DW refreshing.

In order to support OLAP processing, standard SQL has been extended with multiple clauses as well as with predefined specialized analytical and forecasting functions. Moreover, multiple data mining algorithms have been integrated into DW management systems.

In order to work properly and efficiently, all of the above-mentioned techniques need to use metadata. Managing various types of metadata in DWs has also received a lot of attention, resulting in the widely accepted industry standard CWM, supported by major DW software vendors.

An inherent feature of data sources is their evolution in time that concerns not only their content but also their structures.
The evolution of structures of data sources has an impact on deployed ETL processes as well as on the structure of a DW. The most advanced approaches to handling the evolution of data sources are based on temporal extensions and versioning mechanisms.

Traditional DWs are refreshed periodically overnight and users execute their analytical applications after a DW refreshing. For some DW applications, however, such a DW usage is inappropriate. For example, monitoring the intensity of traffic, controlling physical parameters of technological processes, and monitoring patients' vital signs by means of various kinds of sensors require an on-the-fly analysis. Monitoring credit card usage in order to detect unauthorized usage requires frequent DW refreshing and mechanisms of automatic data analysis. Such requirements led to the development of stream data analysis systems, active and real-time data warehouses.

Modern information systems often store huge amounts of data in various Web systems. Typically, data are represented in these systems as XML and HTML documents, images, maps, and data of other complex structures. These data are as important as traditional text and numerical data, and therefore, there is a need to analyze them in a similar way as in traditional DWs. This requirement motivates researchers to design and build data warehouses from Web data sources and to provide OLAP functionality for XML documents, multimedia data, and spatial data.

Despite substantial achievements in the DW and OLAP technologies that have been made over the past 30 years, DW and OLAP technologies still are and will be very active areas of research and technological development.

The aim of this special issue of the Annals of Information Systems is to present current advances in the data warehouse and OLAP technologies as well as to point to the areas for further research. The issue is composed of 15 chapters that in general address the following research and technological areas: advanced technologies for building XML, spatial, temporal, and real-time data warehouses, novel approaches to DW modeling, data storage and data access structures, as well as advanced data analysis techniques.

Chapter 1 overviews open research problems concerning building data warehouses for integrating and analyzing various complex types of data, dealing with temporal aspects of data, handling imprecise data, and ensuring privacy in DWs.

Chapter 2 discusses challenges in designing ETL processes for real-time (or near real-time) data warehouses and proposes an architecture of a real-time DW system.

Chapter 3 discusses data warehouse modeling techniques, based on multidimensional modeling. In particular, the chapter covers conceptual, logical, and physical modeling.

Chapter 4 proposes an approach to personalizing a multidimensional database. The authors present a model and a language for defining user preferences on schema elements. User preferences are expressed by means of weights associated with schema elements and they express user interest in data.

Chapter 5 covers designing spatial (geographical) data warehouses. It proposes a metamodel for the support of the design of spatial dimensional schemas.

Chapter 6 presents a technique for approximately answering range-sum queries on data cubes. To this end, the authors propose tree-based data structures for separately storing sampled data and outlier data. The proposed technique assures a good quality of approximate answers.

Chapter 7 addresses a problem of summarizing multidimensional search spaces, called data cubes. The authors propose the concept of the so-called closed cube, which is a cover for a data cube. The authors show that the closed cube is smaller than its competitor, i.e., a quotient cube, and it can be used for deriving a quotient cube.

Chapter 8 analyzes multiple index structures for multiversion data warehouses. In particular, the chapter describes how to extend index structures designed for data with linear evolution in order to handle data with branched evolution and it provides an analytical model for comparing various index structures for multiversion DWs.

Chapter 9 discusses the application of WAH compressed bitmap indexes to indexing text data for full-text search.
The chapter also presents performance characteristics of the proposed indexing technique in three systems, namely MySQL, FastBit, and MonetDB.

Chapter 10 proposes the optimization of OLAP queries by means of applying horizontal partitioning of tables and bitmap join indexes. The partitioning schema and the set of bitmap indexes are selected by means of genetic and greedy algorithms. The proposed optimization techniques are validated experimentally.

Chapter 11 discusses the application of the x-BR-tree index to spatial data. The chapter provides an analytical cost model of spatial queries with the support of the x-BR-tree index, followed by its experimental evaluation.

Chapter 12 proposes a formal model for representing spatio-temporal and non-spatial data about moving objects. Based on the model, the authors propose a query language for querying such data. The main idea is based on replacing object trajectories by sequences of object stops and moves.

Chapter 13 addresses the problem of data mining in a multidimensional space in a data warehouse. The authors propose a compact representation of sequential patterns, called closed multidimensional sequential patterns, which reduces the search space. The proposed representation and mining algorithms are followed by experimental evaluation.

Chapter 14 presents issues on modeling and querying temporal semistructured data warehouses. Such a DW is modeled as a graph with labeled nodes and edges. The temporal aspect is added to the graph by means of labels denoting validity times. The model is supported with a query language based on path expressions.

Chapter 15 contributes a multidimensional data model of a data warehouse, called "galaxy", that supports the analysis of XML documents. The authors also propose a technique and a tool for integrating XML documents and loading them into a DW.

Acknowledgements

The editors would like to acknowledge the help of all involved in the review process of this special issue of the Annals of Information Systems.
The reviewers provided comprehensive, critical, and constructive comments. Without their support the project could not have been satisfactorily completed.

The alphabetically ordered list of reviewers includes:

• Carlo Combi, University of Verona, Italy
• Karen Davis, University of Cincinnati, USA
• Pedro Furtado, University of Coimbra, Portugal
• Marcin Gorawski, Silesian University of Technology, Poland
• Carlos Hurtado, Universidad de Chile, Chile
• Krzysztof Jankiewicz, Poznań University of Technology, Poland
• Rokia Missaoui, Universite du Quebec en Outaouais, Canada
• Tadeusz Morzy, Poznań University of Technology, Poland
• Mikolaj Morzy, Poznań University of Technology, Poland
• Stefano Rizzi, University of Bologna, Italy
• Alkis Simitsis, IBM Almaden Research Center, USA
• Jerzy Stefanowski, Poznań University of Technology, Poland
• Alejandro Vaisman, University of Buenos Aires, Argentina
• Kesheng Wu, University of California, USA

The editors would also like to thank Professor Ramesh Sharda for his invitation to guest edit this special issue.

Gliwice and Poznań, Poland
June 2008

Stanisław Kozielski
Robert Wrembel
