
Managing Data in Motion: Data Integration Best Practice Techniques and Technologies PDF

203 Pages·2013·5.27 MB·English

Preview Managing Data in Motion: Data Integration Best Practice Techniques and Technologies

Managing Data in Motion
Data Integration Best Practice Techniques and Technologies

April Reeve

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Morgan Kaufmann is an imprint of Elsevier

Acquiring Editor: Andrea Dierna
Development Editor: Heather Scherer
Project Manager: Mohanambal Natarajan
Designer: Russell Purdy

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

Copyright © 2013 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices, may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
Application submitted

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-397167-8

For information on all MK publications visit our website at www.mkp.com

Printed in the USA
13 14 15 16 17 10 9 8 7 6 5 4 3 2 1

For my sons Henry who knows everything and, although he hasn’t figured out exactly what I do for a living, advised me to “put words on paper” and David who is so talented, so much fun to be with, and always willing to go with me to Disney.

Contents

Foreword
Acknowledgements
Biography
Introduction

PART 1 INTRODUCTION TO DATA INTEGRATION

Chapter 1 The Importance of Data Integration
    The natural complexity of data interfaces
    The rise of purchased vendor packages
    Key enablement of big data and virtualization

Chapter 2 What Is Data Integration?
    Data in motion
    Integrating into a common format: transforming data
    Migrating data from one system to another
    Moving data around the organization
    Pulling information from unstructured data
    Moving process to data

Chapter 3 Types and Complexity of Data Integration
    The differences and similarities in managing data in motion and persistent data
    Batch data integration
    Real-time data integration
    Big data integration
    Data virtualization

Chapter 4 The Process of Data Integration Development
    The data integration development life cycle
    Inclusion of business knowledge and expertise

PART 2 BATCH DATA INTEGRATION

Chapter 5 Introduction to Batch Data Integration
    What is batch data integration?
    Batch data integration life cycle

Chapter 6 Extract, Transform, and Load
    What is ETL?
    Profiling
    Extract
    Staging
    Access layers
    Transform
        Simple mapping
        Lookups
        Aggregation and normalization
        Calculation
    Load

Chapter 7 Data Warehousing
    What is data warehousing?
    Layers in an enterprise data warehouse architecture
        Operational application layer
        External data
        Data staging areas coming into a data warehouse
        Data warehouse data structure
        Staging from data warehouse to data mart or business intelligence
        Business Intelligence Layer
    Types of data to load in a data warehouse
        Master data in a data warehouse
        Balance and snapshot data in a data warehouse
        Transactional data in a data warehouse
        Events
        Reconciliation
    Interview with an expert: Krish Krishnan on data warehousing and data integration

Chapter 8 Data Conversion
    What is data conversion?
    Data conversion life cycle
    Data conversion analysis
    Best practice data loading
    Improving source data quality
    Mapping to target
    Configuration data
    Testing and dependencies
    Private data
    Proving
    Environments

Chapter 9 Data Archiving
    What is data archiving?
    Selecting data to archive
    Can the archived data be retrieved?
    Conforming data structures in the archiving environment
    Flexible data structures
    Interview with an expert: John Anderson on data archiving and data integration

Chapter 10 Batch Data Integration Architecture and Metadata
    What is batch data integration architecture?
    Profiling tool
    Modeling tool
    Metadata repository
    Data movement
    Transformation
    Scheduling
    Interview with an expert: Adrienne Tannenbaum on metadata and data integration

PART 3 REAL TIME DATA INTEGRATION

Chapter 11 Introduction to Real-Time Data Integration
    Why real-time data integration?
    Why two sets of technologies?

Chapter 12 Data Integration Patterns
    Interaction patterns
    Loose coupling
    Hub and spoke
    Synchronous and asynchronous interaction
