BIG DATA WITH
HADOOP MAPREDUCE
A Classroom Approach
Rathinaraja Jeyaraj
Ganeshkumar Pugalendhi
Anand Paul
Apple Academic Press Inc.
4164 Lakeshore Road
Burlington ON L7L 1A4
Canada

Apple Academic Press, Inc.
1265 Goldenrod Circle NE
Palm Bay, Florida 32905
USA
© 2021 by Apple Academic Press, Inc.
Exclusive worldwide distribution by CRC Press, a member of Taylor & Francis Group
No claim to original U.S. Government works
International Standard Book Number-13: 978-1-77188-834-9 (Hardcover)
International Standard Book Number-13: 978-0-42932-173-3 (eBook)
All rights reserved. No part of this work may be reprinted or reproduced or utilized in any form or by
any electric, mechanical or other means, now known or hereafter invented, including photocopying and
recording, or in any information storage or retrieval system, without permission in writing from the
publisher or its distributor, except in the case of brief excerpts or quotations for use in reviews or critical
articles.
This book contains information obtained from authentic and highly regarded sources. Reprinted material
is quoted with permission and sources are indicated. Copyright for individual articles remains with the
authors as indicated. A wide variety of references are listed. Reasonable efforts have been made to publish
reliable data and information, but the authors, editors, and the publisher cannot assume responsibility
for the validity of all materials or the consequences of their use. The authors, editors, and the publisher
have attempted to trace the copyright holders of all material reproduced in this publication and apologize
to copyright holders if permission to publish in this form has not been obtained. If any copyright material
has not been acknowledged, please write and let us know so we may rectify in any future reprint.
Trademark Notice: Registered trademark of products or corporate names are used only for explanation
and identification without intent to infringe.
Library and Archives Canada Cataloguing in Publication
Title: Big data with Hadoop MapReduce : a classroom approach / Rathinaraja Jeyaraj,
Ganeshkumar Pugalendhi, Anand Paul.
Names: Jeyaraj, Rathinaraja, author. | Pugalendhi, Ganeshkumar, author. | Paul, Anand, author.
Description: Includes bibliographical references and index.
Identifiers: Canadiana (print) 20200185195 | Canadiana (ebook) 20200185241 | ISBN 9781771888349
(hardcover) | ISBN 9780429321733 (electronic bk.)
Subjects: LCSH: Apache Hadoop. | LCSH: MapReduce (Computer file) | LCSH: Big data. |
LCSH: File organization (Computer science)
Classification: LCC QA76.9.D5 .J49 2020 | DDC 005.74—dc23
CIP data on file with US Library of Congress
Apple Academic Press also publishes its books in a variety of electronic formats. Some content that appears
in print may not be available in electronic format. For information about Apple Academic Press products,
visit our website at www.appleacademicpress.com and the CRC Press website at www.crcpress.com
About the Authors
Rathinaraja Jeyaraj
Post-Doctoral Researcher, University of Macau, Macau
Rathinaraja Jeyaraj obtained his PhD from the National Institute of Technology
Karnataka, India. He recently worked as a visiting researcher at the Connected
Computing and Media Processing Lab, Kyungpook National University, South
Korea, supervised by Prof. Anand Paul. His research interests include
big data processing tools, cloud computing, IoT, and machine learning. He
completed his BTech and MTech at Anna University, Tamil Nadu, India.
He has also earned an MBA in Information Systems and Management at
Bharathiar University, Coimbatore, India.
Ganeshkumar Pugalendhi, PhD
Assistant Professor, Department of Information Technology,
Anna University Regional Campus, Coimbatore, India
Ganeshkumar Pugalendhi, PhD, is an Assistant Professor in the Department
of Information Technology, Anna University Regional Campus,
Coimbatore, India. He received his BTech from the University of Madras and his MS
(by research) and PhD degrees from Anna University, India, and did his
postdoctoral work at Kyungpook National University, South Korea. He is
the recipient of a Student Scientist Award from the TNSCST, India; best
paper awards from IEEE, the IET, and the Korean Institute of Industrial and
Systems Engineers; travel grants from Indian Government funding agencies
such as DST-SERB (as a Young Scientist), DBT, and CSIR; and a workshop
grant from DBT. He has visited many countries (Singapore, South Korea,
USA, Serbia, Japan, and France) for research interaction and collaboration.
He has served as a resource person, delivering technical talks and seminars
sponsored by Indian Government organizations such as UGC, AICTE, TEQIP,
ICMR, and DST. His research has been published in reputed
Scopus/SCIE/SCI journals and leading conferences. He has written
two research-oriented textbooks: Data Classification Using Soft Computing
and Soft Computing for Microarray Data Analysis. He is a Track Chair for
Human Computer Interface Track in ACM SAC (Symposium on Applied
Computing) for 2016 in Italy, 2017 in Morocco, 2018 in France and 2019 in
Cyprus. He was a guest editor for a Taylor & Francis journal and an Inderscience
journal in 2017, a Hindawi journal in 2018, and the MDPI Journal of Sensor and
Actuator Networks in 2019. His citation count and h-index were (260, 8), (218, 7),
and (117, 6) in Google Scholar, Scopus, and Publons, respectively, as of 2020.
His research interests are in Data Analytics and Machine Learning.
Anand Paul, PhD
Associate Professor, School of Computer Science and Engineering,
Kyungpook National University, South Korea
Anand Paul, PhD, is currently working in the School of Computer Science and
Engineering at Kyungpook National University, South Korea, as Associate
Professor. He earned his PhD in Electrical Engineering from the National
Cheng Kung University, Taiwan, R.O.C. His research interests include big
data analytics, IoT, and machine learning. He has done extensive work in
big data and IoT-based smart cities. He was a delegate representing South
Korea for the M2M focus group in 2010–2012 and has been an IEEE senior
member since 2015. He is serving as associate editor for the journals IEEE
Access, IET Wireless Sensor Systems, ACM Applied Computing Reviews,
Cyber Physical Systems (Taylor & Francis), Human Behaviour and Emerging
Technology (Wiley), and the Journal of Platform Technology. He has also
guest edited various international journals. He was the track chair for Smart
Human Computer Interaction at the Association for Computing Machinery
Symposium on Applied Computing from 2014 to 2019, and general chair for the 8th
International Conference on Orange Technology (ICOT 2020). He is also an
MPEG delegate representing South Korea.
A Message from Kaniyan
From Purananuru written in Tamil
English Translation by Reverend G.U. Pope (in 1906)
To us all towns are one, all men our kin.
Life’s good comes not from others’ gift, nor ill
Man’s pains and pains’ relief are from within.
Death’s no new thing; nor do our bosoms thrill
When Joyous life seems like a luscious draught.
When grieved, we patient suffer; for, we deem
This much-praised life of ours a fragile raft
Borne down the waters of some mountain stream
That o’er huge boulders roaring seeks the plain
Tho’ storms with lightnings’ flash from darken’d skies
Descend, the raft goes on as fates ordain.
Thus have we seen in visions of the wise! (Puram: 192)
—Kaniyan Pungundran
Kaniyan Pungundran was an influential Tamil philosopher from the Sangam
age (3000 years ago). His name Kaniyan implies that he was an astronomer
as it is a Tamil word referring to mathematics. He was born and brought up
in Mahibalanpatti, a village panchayat in the Thiruppatur taluk of Sivaganga
district in the state of Tamil Nadu, India. He composed two poems during the
Sangam period, which are preserved in the Purananuru and Natrinai anthologies.
Contents
Abbreviations
Preface
Dedication and Acknowledgment
Introduction
1. Big Data
2. Hadoop Framework
3. Hadoop 1.2.1 Installation
4. Hadoop Ecosystem
5. Hadoop 2.7.0
6. Hadoop 2.7.0 Installation
7. Data Science
APPENDIX A: Public Datasets
APPENDIX B: MapReduce Exercise
APPENDIX C: Case Study: Application Development for NYSE Dataset
Web References
Index