
Data Lake for Enterprises: Leveraging Lambda Architecture for building Enterprise Data Lake PDF

585 Pages·2017·36.552 MB·English

Preview Data Lake for Enterprises: Leveraging Lambda Architecture for building Enterprise Data Lake

WOW! eBook www.wowebook.org

Data Lake for Enterprises
Leveraging Lambda Architecture for building Enterprise Data Lake
Tomcy John
Pankaj Misra
BIRMINGHAM - MUMBAI

Data Lake for Enterprises
Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: May 2017
Production reference: 1300517
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78728-134-9
www.packtpub.com

Credits

Authors: Tomcy John, Pankaj Misra
Copy Editors: Shaila Kusanale, Vikrant Phadkay
Reviewers: Wei Di, Vivek Mishra, Ruben Oliva Ramos
Project Coordinator: Nidhi Joshi
Commissioning Editor: Amey Varangaonkar
Proofreader: Safis Editing
Acquisition Editor: Chaitanya Nair
Indexer: Mariammal Chettiyar
Content Development Editor: Aishwarya Pandere
Production Coordinator: Aparna Bhagat
Technical Editor: Karan Thakkar

Foreword

As organizations have evolved over the last 40 to 50 years, they have slowly but steadily found ways and means to improve their operations by adding IT/software systems across their operating areas. It would not be surprising today to see more than 250 applications in each of our Fortune 200 companies. This has also slowly caused another creeping problem as we evolve from one level of maturity to another: silos of systems that don’t interface well with each other.

As enterprises move from local optimization to enterprise optimization, they have been leveraging emerging technologies like Big Data systems to bring data together from their disparate IT systems and fuse it to find better means of driving efficiency and effectiveness improvements that could go a long way in helping enterprises save money.

Tomcy and Pankaj, with their vast experience in different functional and technical domains, have been working on finding better ways to fuse information from a variety of applications within the organization. They have lived through the challenging journey of finding ways to bring about changes (technological and cultural).

This book has been put together from the perspective of software engineers, architects and managers, so it’s very practical in nature, as both of them have lived through various enterprise-grade implementations that add value to the enterprise.

Using future-proof patterns and contemporary technology concepts like Data Lake helps enterprises prepare themselves well for the future and, even more, gives them the ability to look across the data they hold in different organizational silos and derive wisdom that’s typically lost in the blind spots.

Thomas Benjamin
CTO, GE Aviation Digital.

About the Authors

Tomcy John lives in Dubai (United Arab Emirates), hailing from Kerala (India), and is an enterprise Java specialist with a degree in engineering (B Tech) and over 14 years of experience in several industries. He's currently working as principal architect at Emirates Group IT, in their core architecture team. Prior to this, he worked with Oracle Corporation and Ernst & Young. His main specialization is in building enterprise-grade applications and he acts as chief mentor and evangelist to facilitate incorporating new technologies as corporate standards in the organization. Outside of his work, Tomcy works very closely with young developers and engineers as a mentor and speaks at various forums as a technical evangelist on many topics ranging from web and middleware all the way to various persistence stores. He writes on various topics in his blog and at www.javacodebook.com.

First and foremost, I would like to thank my savior and lord, Jesus Christ, for giving me strength and courage to pursue this project. It was a dream come true. I would like to dedicate this book to my father (Appachan), the late C.O. John, and my dearest mom (Ammachi), Leela John, for helping me reach where I am today. I would also like to take this opportunity to thank my dearest wife, Serene, and our two lovely children, Neil (son) and Anaya (daughter), for all their support throughout this project and also for allowing me to pursue my dream and tolerating my not being with them after my busy day job.

It was my privilege working with my co-author, Pankaj. I take this opportunity to thank him for supporting me when I first shared my dream of writing this book and then staying with me at all stages in completing it.

It wouldn't have been possible to reach this stage in my career without mentors at various stages of my career.
I would like to thank Thomas Benjamin (CTO, GE Aviation Digital), Rajesh R.V (chief architect, Emirates Group IT) and Martin Campbell (chief architect) for supporting me at various stages, with words of encouragement and a wealth of knowledge.

Pankaj Misra has been a technology evangelist, holding a bachelor’s degree in engineering, with over 16 years of experience across multiple business domains and technologies. He has been working with Emirates Group IT since 2015, and has worked with various other organizations in the past. He specializes in architecting and building multi-stack solutions and implementations. He has also been a speaker at technology forums in India and has built products with scale-out architecture that support high-volume, near-real-time data processing and near-real-time analytics.

This book has been a great opportunity for me and will always be an exceptional example of collaboration and knowledge sharing with my co-author Tomcy. I am extremely thankful to him for entrusting me with this responsibility and standing by me at all times.

I would like to dedicate this book to my father B. Misra and my mother Geeta Misra, who have always been among the most special people to me. I am extremely grateful to my wife Priti and my kids, daughter Eva and son Siddhant, for their understanding, support and helping me out in every possible way to complete the book.

This book is a medium to give back the knowledge that I have gained by working with many amazing people throughout the years. I would like to thank Rajesh R.V. (chief architect, Emirates Group IT) and Thomas Benjamin (CTO, GE Aviation) for always motivating, helping and supporting us.

About the Reviewers

Wei Di is currently a staff member in a business analytics data mining team.
As a data scientist, she is passionate about creating smart and scalable analytics and data mining solutions that can impact millions of individuals and empower successful businesses. Her interests also cover wide areas, including artificial intelligence, machine learning, and computer vision. She was previously associated with the eBay human language technology team and eBay research labs, with a focus on image understanding for large-scale applications and joint learning from both visual and text information. Prior to that, she was with Ancestry.com, working on large-scale data mining and machine learning models in the areas of record linkage, search relevance, and ranking. She received her PhD from Purdue University in 2011 with a focus on data mining and image classification.

Vivek Mishra is an IT professional with more than 9 years of experience in various technologies like Java, J2EE, Hibernate, SCA4J, Mule, Spring, Cassandra, HBase, MongoDB, Redis, Hive, and Hadoop. He has been a contributor to open source software such as Apache Cassandra and lead committer for Kundera (a JPA 2.0-compliant object-datastore mapping library for NoSQL datastores such as Cassandra, HBase, MongoDB, and Redis). Vivek, in his previous experience, has enjoyed long-lasting partnerships with the most recognizable names in the SCM, banking and finance industries, employing industry-standard, full software life cycle methodologies such as Agile and Scrum. He is currently employed with Impetus Infotech. He has undertaken speaking engagements at cloud camp and Nasscom big data seminars and is an active blogger at mevivs.wordpress.com.

Rubén Oliva Ramos is a computer systems engineer with a master's degree in computer and electronic systems engineering, teleinformatics, and networking specialization from University of Salle Bajio in Leon, Guanajuato, Mexico.
He has more than 5 years of experience in developing web applications to control and monitor devices connected with Arduino and Raspberry Pi, using web frameworks and cloud services to build Internet of Things applications. He is a mechatronics teacher at the University of Salle Bajio and teaches students in the master's program in design and engineering of mechatronics systems. He also works at Centro de Bachillerato Tecnologico Industrial 225 in Leon, Guanajuato, Mexico, teaching electronics, robotics and control, automation, and microcontrollers in the Mechatronics Technician Career. He has worked on consultant and developer projects in areas such as monitoring systems and datalogger data using technologies such as Android, iOS, Windows Phone, Visual Studio .NET, HTML5, PHP, CSS, Ajax, JavaScript, Angular, ASP.NET, databases (SQLite, MongoDB, and MySQL), and web servers (Node.js and IIS). Ruben has done hardware programming on Arduino, Raspberry Pi, Ethernet Shield, GPS and GSM/GPRS, ESP8266, and control and monitor systems for data acquisition and programming. He has written the book Internet of Things Programming with JavaScript (Packt). His current job involves monitoring, controlling, and acquisition of data with Arduino and Visual Basic .NET for Alfaomega Editor Group.

"I want to thank God for helping me review this book, my wife, Mayte, and my sons, Ruben and Dario, for their support, and my parents, my brother and sister, whom I love, and all my beautiful family."

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
