Hadoop Backup and Recovery Solutions

Learn the best strategies for data recovery from Hadoop backup clusters and troubleshoot problems

Gaurav Barot
Chintan Mehta
Amij Patel

BIRMINGHAM - MUMBAI

Hadoop Backup and Recovery Solutions

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2015
Production reference: 1220715

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-904-2
www.packtpub.com

Credits

Authors: Gaurav Barot, Chintan Mehta, Amij Patel
Reviewers: Skanda Bhargav, Venkat Krishnan, Stefan Matheis, Manjeet Singh Sawhney
Commissioning Editor: Anthony Lowe
Acquisition Editor: Harsha Bharwani
Content Development Editor: Akashdeep Kundu
Technical Editor: Shiny Poojary
Copy Editors: Tani Kothari, Kausambhi Majumdar, Vikrant Phadke
Project Coordinator: Milton Dsouza
Proofreader: Safis Editing
Indexer: Tejal Soni
Graphics: Jason Monteiro
Production Coordinator: Aparna Bhagat
Cover Work: Aparna Bhagat

About the Authors

Gaurav Barot is an experienced software architect and PMP-certified project manager with more than 12 years of experience. He has a unique combination of experience in enterprise resource planning, sales, education, and technology. He has served as an enterprise architect and project leader on projects in various domains, including healthcare, risk, insurance, and media, for customers in the UK, USA, Singapore, and India. Gaurav holds a bachelor's degree in IT engineering from Sardar Patel University, and completed his postgraduate studies in IT at Deakin University, Melbourne.

I would like to thank all my team members and fellow coauthors at KNOWARTH Technologies. I sincerely thank and appreciate the entire team at Packt Publishing for providing this opportunity. Thanks a lot to Akashdeep Kundu for his continuous support and patience throughout this project. Last but not least, I would like to thank my parents and my two younger sisters, Kinjal and Yogini, for their love and encouragement. A special thanks to my wife, Kruti, and my lovely daughter, Twisha. Both of them have been tolerant and understanding during all the time I spent on my computer while working on this book.

Chintan Mehta is a cofounder of KNOWARTH Technologies (www.knowarth.com) and heads the cloud/RIMS department. He has rich, progressive experience in the AWS cloud, DevOps, RIMS, and server administration on open source technologies.
Chintan's vital roles during his career in infrastructure and operations have included requirement analysis, architecture design, security design, high availability and disaster recovery planning, automated monitoring, automated deployment, build processes to help customers, performance tuning, infrastructure setup and deployment, and application setup and deployment. He has done all of this while also setting up offices in various locations, taking sole ownership of achieving operational readiness for the organizations he has been associated with. He headed and managed cloud service practices with his previous employer, and received multiple awards in recognition of the valuable contribution he made to the business. He was involved in creating solutions and consulting for building SaaS, IaaS, and PaaS services on the cloud. Chintan also led the ISO 27001:2005 implementation team as a joint management representative, and has reviewed Liferay Portal Performance Best Practices, Packt Publishing. He completed his diploma in computer hardware and network certification from a reputed institute in India.

I have relied on many people, both directly and indirectly, in writing this book. First of all, I would like to thank my coauthors and the great team at Packt Publishing for this effort. I would especially like to thank my wonderful wife, Mittal, and my sweet son, Devam, for putting up with the long days, nights, and weekends when I was camping in front of my laptop. Many people have inspired me, made contributions to this book, and provided comments, edits, insights, and ideas, specifically Parth Ghiya, Chintan Gajjar, and Nilesh Jain. Special thanks go to Samir Bhatt, who got me started on writing this book; there were several things that could have interfered with my book. I also want to thank all the reviewers of this book.
Last but not least, I want to thank my parents and all my friends, family, and colleagues for supporting me throughout the writing of this book.

Amij Patel is a cofounder of KNOWARTH Technologies (www.knowarth.com) and leads the mobile, UI/UX, and e-commerce verticals. He is an out-of-the-box thinker with a proven track record of designing and delivering the best design solutions for enterprise applications and products. He has a lot of experience in the Web, portals, e-commerce, rich Internet applications, user interfaces, big data, and open source technologies. His passion is to make applications and products interactive and user friendly using the latest technologies. Amij has a unique ability: he can deliver or execute on any layer and technology from the stack. Throughout his career, he has been honored with awards for making valuable contributions to businesses and delivering excellence in different roles, such as practice leader, architect, and team leader. He is a cofounder of various community groups, such as Ahmedabad JS and the Liferay UI developers' group. These are focused on sharing knowledge of UI technologies and upcoming trends with the broader community. Amij is respected as a motivator who leads by example, a change agent, and a proponent of empowerment and accountability.

I would like to thank my coauthors, reviewers, and the Packt Publishing team for helping me at all the stages of this project. I would also like to thank my parents and family, especially my wife, Nehal Patel. She supported me when we were expecting a baby and I was writing this book. Then, I would like to thank my children, Urv and Urja, for adjusting to my late nights and weekends when I was busy writing this book. My special thanks to Parth Ghiya, who helped me a lot with brainstorming sessions, practical examples, and ideas to make the chapters more interactive.
Also, I would like to thank Chintan Gajjar and Nilesh Jain for helping with the graphics and images as and when required.

About the Reviewers

Skanda Bhargav is an engineering graduate from Visvesvaraya Technological University (VTU) in Belgaum, Karnataka, India. He majored in computer science engineering. He is a Cloudera-certified developer in Apache Hadoop. His interests are big data and Hadoop. He has been a reviewer of the following books and videos, all by Packt Publishing:

• Building Hadoop Clusters [Video]
• Hadoop Cluster Deployment
• Instant MapReduce Patterns – Hadoop Essentials How-to
• Cloudera Administration Handbook
• Hadoop MapReduce v2 Cookbook – Second Edition

I would like to thank my family for their immense support and faith in me throughout my learning stage. My friends have brought the confidence in me to a level that makes me bring the best out of myself. I am happy that God has blessed me with such wonderful people around me, without whom my success as it is today would not have been possible.

Venkat Krishnan is a programming expert who has spent 18 years in IT training and consulting in the areas of Java, Enterprise J2EE, Spring, Hibernate, web services, and Android. He spent 5 years in Hadoop training, consulting, and assisting organizations such as JPMC and the Microsoft India Development Center in the inception, incubation, and growth of big data innovations. Venkat has an MBA degree. He has mentored more than 5,000 participants in the area of big data on the Hadoop platform in Linux and Windows, and more than 10,000 participants in Java, J2EE, Spring, and Android. He has also mentored participants in Amdocs, Fidelity, Wells Fargo, TCS, HCL, Accenture, and other organizations in Hadoop with its ecosystem components, such as Hive, HBase, Pig, and Sqoop. Venkat has provided training for associates across the globe, in locations such as Japan, Australia, Europe, South Africa, USA, Mexico, Dubai, Oman, and others.
I offer my humble thanks to my parents and all my teachers who have helped me learn and be open to new upcoming technologies. Also, I want to thank my wife, Raji, and my children, Disha and Dhruv, for supporting me in all of my new endeavors. In the ever-evolving world of technology, I consider myself lucky to be able to share my knowledge with the brilliant kids of tomorrow. Thanks to the Packt Publishing team for giving me the opportunity to review the work of some eminent people.

Stefan Matheis is the CTO of Kellerkinder GmbH, based near Mannheim, Germany. Kellerkinder offers technical support for various projects based on PHP, as well as consulting and workshops for Apache Solr. His passions include API development, natural language processing, graph databases, and infrastructure management. He has been an Apache Lucene/Solr committer since 2012, as well as a member of the project management committee. Stefan is also a speaker at various conferences, the first of which was the Lucene/Solr Revolution in Dublin, Ireland. He is known in the Solr community and among its users for the admin UI that has shipped with all releases since Solr 4.0. He has reviewed a few books, one of which is Solr Cookbook – Third Edition, Packt Publishing. He can be contacted at [email protected].

Manjeet Singh Sawhney currently works for a large IT consultancy in London, UK, as a Principal Consultant - Information / Data Architect. Previously, he worked for global organizations in various roles, including Java development, technical solutions consulting, and data management consulting. During his postgraduate studies, he also worked as a Student Tutor for one of the top 100 universities in the world, where he taught Java to undergraduate students and was involved in marking exams and evaluating project assignments.
Manjeet acquired his professional experience by working on several mission-critical projects, serving clients in the financial services, telecommunications, manufacturing, retail, and public sectors.

I am very thankful to my parents; my wife, Jaspal; my son, Kohinoor; and my daughter, Prabhnoor, for their encouragement and patience, as reviewing this book took some of the evenings and weekends that I would have spent with my family.