
Big Data A Tutorial-Based Approach PDF

203 Pages·2019·4.228 MB·English

Preview Big Data A Tutorial-Based Approach

Big Data A Tutorial-Based Approach

Nasir Raheem

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2019 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-0-367-18345-5 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
Names: Raheem, Nasir, author.
Title: Big data : a tutorial-based approach / Nasir Raheem.
Description: First edition. | Boca Raton, FL : Taylor & Francis Group, [2019] | Includes bibliographical references and index.
Identifiers: LCCN 2018060975 | ISBN 9780367183455 (hardback : acid-free paper) | ISBN 9780429060939 (ebook)
Subjects: LCSH: Big data--Programmed instruction.
Classification: LCC QA76.9.B45 R34 2019 | DDC 005.7--dc23
LC record available at https://lccn.loc.gov/2018060975

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Dedicated to my wife, Saba, for her support and endurance.

Contents

List of Tutorials
List of Figures/Illustrations
Foreword
Preface
Acknowledgements
Author

Chapter 1 ◾ Introduction to Big Data
    OVERVIEW
    RAPID GROWTH OF BIG DATA
    BIG DATA DEFINITION
    BIG DATA PROJECTS
    BUSINESS VALUE OF BIG DATA

Chapter 2 ◾ Big Data Implementation
    OVERVIEW
    HIGH-LEVEL TASKS TO IMPLEMENT INFORMATICA BDM, CLOUDERA HIVE, AND TABLEAU
    BIG DATA TRIGGERS DIGITAL TRANSFORMATION OF THE PRODUCTION MODEL
    BIG DATA CHALLENGES AND ASSOCIATED USE CASES
    HADOOP INFRASTRUCTURE: OVERVIEW
    HADOOP INFRASTRUCTURE: DEFINED
        Hyperconverged Hadoop Infrastructure
        Compute Hardware Components
        Network Hardware Components
        Storage Hardware Architecture and Components
    HADOOP ECO SYSTEM
    HADOOP: JVM FRAMEWORK
    HADOOP DISTRIBUTED FILE PROCESSING
    MAPREDUCE SOFTWARE
    MAPREDUCE SOFTWARE INSTALLATION
    MAPREDUCE PROCESSING

Chapter 3 ◾ Big Data Use Cases
    OVERVIEW
    BIG DATA USE CASE: HEALTH
    BIG DATA USE CASE: MANUFACTURING
    BIG DATA USE CASE: INSURANCE

Chapter 4 ◾ Big Data Migration
    OVERVIEW
    CHALLENGES IN MIGRATING ORACLE DATA USING SQOOP
    WHERE IS SQOOP USED?
    SQOOP COMMANDS
    HIVE ARGUMENTS USED BY SQOOP
    APACHE SQOOP ARCHITECTURE
    APACHE SQOOP COMMAND LINE INTERFACE

Chapter 5 ◾ Big Data Ingestion, Integration, and Management
    OVERVIEW
    INFORMATICA: MATURE AND COMPREHENSIVE BIG DATA SOLUTION
    INFORMATICA DATA INTEGRATION

Chapter 6 ◾ Big Data Repository
    OVERVIEW
    DATA REPOSITORY LAYER
    HIVE BIG DATA WAREHOUSE
    SLOWLY CHANGING DIMENSION IN HIVE
    HIVE METADATA: DEFINITIONS
    INTEGRATED USE OF DATA INTEGRATION, DATA MANAGEMENT, AND DATA VISUALIZATION TOOLS

Chapter 7 ◾ Big Data Visualization
    OVERVIEW
    VARIABLE TYPES
        Numbers
        Strings
        Factors
    SUCCESS FACTORS FOR TABLEAU
    TABLEAU: STEP FORWARD IN DATA ANALYTICS
    TABLEAU CONNECTORS FOR DATA SOURCES
    TABLEAU DATA ENGINE TUNING
    TABLEAU TUNING FEATURES
        Fast Interactive Query Engine
        Strategically Utilize Live Connections versus Extracts
        Curate Data from the Data Lake
