Table Of ContentStudies in Big Data 26
S. Srinivasan E ditor
Guide to
Big Data
Applications
Studies in Big Data
Volume 26
SeriesEditor
JanuszKacprzyk,PolishAcademyofSciences,Warsaw,Poland
e-mail:kacprzyk@ibspan.waw.pl
AboutthisSeries
Theseries“StudiesinBigData”(SBD)publishesnewdevelopmentsandadvances
in the various areas of Big Data – quickly and with a high quality. The intent
is to cover the theory, research, development, and applications of Big Data, as
embedded in the fields of engineering, computer science, physics, economics and
life sciences. The books of the series refer to the analysis and understanding of
large, complex, and/or distributed data sets generated from recent digital sources
coming from sensors or other physical instruments as well as simulations, crowd
sourcing, social networks or other internet transactions, such as emails or video
click streams and other. The series contains monographs, lecture notes and edited
volumes in Big Data spanning the areas of computational intelligence including
neuralnetworks,evolutionarycomputation,softcomputing,fuzzysystems,aswell
asartificialintelligence,datamining,modernstatisticsandOperationsresearch,as
well as self-organizing systems. Of particular value to both the contributors and
thereadershiparetheshortpublicationtimeframeandtheworld-widedistribution,
whichenablebothwideandrapiddisseminationofresearchoutput.
Moreinformationaboutthisseriesathttp://www.springer.com/series/11970
S. Srinivasan
Editor
Guide to Big Data
Applications
123
Editor
S.Srinivasan
JesseH.JonesSchoolofBusiness
TexasSouthernUniversity
Houston,TX,USA
ISSN2197-6503 ISSN2197-6511 (electronic)
StudiesinBigData
ISBN978-3-319-53816-7 ISBN978-3-319-53817-4 (eBook)
DOI10.1007/978-3-319-53817-4
LibraryofCongressControlNumber:2017936371
©SpringerInternationalPublishingAG2018
Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpartof
thematerialisconcerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation,
broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,andtransmissionorinformation
storageandretrieval,electronicadaptation,computersoftware,orbysimilarordissimilarmethodology
nowknownorhereafterdeveloped.
Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication
doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant
protectivelawsandregulationsandthereforefreeforgeneraluse.
Thepublisher,theauthorsandtheeditorsaresafetoassumethattheadviceandinformationinthisbook
arebelievedtobetrueandaccurateatthedateofpublication.Neitherthepublishernortheauthorsor
theeditorsgiveawarranty,expressorimplied,withrespecttothematerialcontainedhereinorforany
errorsoromissionsthatmayhavebeenmade.Thepublisherremainsneutralwithregardtojurisdictional
claimsinpublishedmapsandinstitutionalaffiliations.
Printedonacid-freepaper
ThisSpringerimprintispublishedbySpringerNature
TheregisteredcompanyisSpringerInternationalPublishingAG
Theregisteredcompanyaddressis:Gewerbestrasse11,6330Cham,Switzerland
TomywifeLakshmiandgrandsonSahaas
Foreword
ItgivesmegreatpleasuretowritethisForewordforthistimelypublicationonthe
topicoftheever-growinglistofBigDataapplications.Thepotentialforleveraging
existing data from multiple sources has been articulated over and over, in an
almostinfinitelandscape,yetitisimportanttorememberthatindoingso,domain
knowledge is key to success. Naïve attempts to process data are bound to lead to
errors such as accidentally regressing on noncausal variables. As Michael Jordan
at Berkeley has pointed out, in Big Data applications the number of combinations
of the features grows exponentially with the number of features, and so, for any
particular database, you are likely to find some combination of columns that will
predict perfectly any outcome, just by chance alone. It is therefore important that
we do not process data in a hypothesis-free manner and skip sanity checks on our
data.
In this collection titled “Guide to Big Data Applications,” the editor has
assembledasetofapplicationsinscience,medicine,andbusinesswheretheauthors
have attempted to do just this—apply Big Data techniques together with a deep
understanding of the source data. The applications covered give a flavor of the
benefitsofBigDatainmanydisciplines.Thisbookhas19chaptersbroadlydivided
into four parts. In Part I, there are four chapters that cover the basics of Big
Data, aspects of privacy, and how one could use Big Data in natural language
processing (a particular concern for privacy). Part II covers eight chapters that
vii
viii Foreword
lookatvariousapplicationsofBigDatainenvironmentalscience,oilandgas,and
civilinfrastructure,coveringtopicssuchasdeduplication,encryptedsearch,andthe
friendshipparadox.
Part III covers Big Data applications in medicine, covering topics ranging
from “The Impact of Big Data on the Physician,” written from a purely clinical
perspective, to the often discussed deep dives on electronic medical records.
PerhapsmostexcitingintermsoffuturelandscapingistheapplicationofBigData
applicationinhealthcarefromadevelopingcountryperspective.Thisisoneofthe
most promising growth areas in healthcare, due to the current paucity of current
services and the explosion of mobile phone usage. The tabula rasa that exists in
manycountriesholdsthepotentialtoleapfrogmanyofthemistakeswehavemade
in the west with stagnant silos of information, arbitrary barriers to entry, and the
lackofanystandardizedschemaornondegenerateontologies.
InPartIV,thebook covers BigDataapplications inbusiness,which isperhaps
theunifyingsubjecthere,giventhatnoneoftheaboveapplicationareasarelikely
to succeed without a good business model. The potential to leverage Big Data
approachesinbusinessisenormous,frombankingpracticestotargetedadvertising.
Theneedforinnovationinthisspaceisasimportantastheunderlyingtechnologies
themselves. As Clayton Christensen points out in The Innovator’s Prescription,
threerevolutionsareneededforasuccessfuldisruptiveinnovation:
1. Atechnologyenablerwhich“routinizes”previouslycomplicatedtask
2. Abusinessmodelinnovationwhichisaffordableandconvenient
3. A value network whereby companies with disruptive mutually reinforcing
economicmodelssustaineachotherinastrongecosystem
We see this happening with Big Data almost every week, and the future is
exciting.
In this book, the reader will encounter inspiration in each of the above topic
areasandbeabletoacquireinsightsintoapplicationsthatprovidetheflavorofthis
fast-growinganddynamicfield.
Atlanta,GA,USA GariClifford
December10,2016
Preface
BigDataapplicationsaregrowingveryrapidlyaroundtheglobe.Thisnewapproach
todecisionmakingtakesintoaccountdatagatheredfrommultiplesources.Heremy
goalistoshowhowthesediversesourcesofdataareusefulinarrivingatactionable
information. In this collection of articles the publisher and I are trying to bring
in one place several diverse applications of Big Data. The goal is for users to see
how a Big Data application in another field could be replicated in their discipline.
With this in mind I have assembled in the “Guide to Big Data Applications” a
collection of19chapters writtenbyacademics andindustrypractitionersglobally.
These chapters reflect what Big Data is, how privacy can be protected with Big
DataandsomeoftheimportantapplicationsofBigDatainscience,medicineand
business. These applications are intended to be representative and not exhaustive.
For nearly two years I spoke with major researchers around the world and the
publisher. These discussions led to this project. The initial Call for Chapters was
senttoseveralhundredresearchersgloballyviaemail.Approximately40proposals
weresubmitted.Outofthesecamecommitmentsforcompletioninatimelymanner
from 20 people. Most of these chapters are written by researchers while some are
written by industry practitioners. One of the submissions was not included as it
could not provide evidence of use of Big Data. This collection brings together in
one place several important applications of Big Data. All chapters were reviewed
using a double-blind process and comments provided to the authors. The chapters
includedreflectthefinalversionsofthesechapters.
Ihavearrangedthechaptersinfourparts.PartIincludesfourchaptersthatdeal
with basic aspects of Big Data and how privacy is an integral component. In this
part I include an introductory chapter that lays the foundation for using Big Data
inavarietyofapplications.Thisisthenfollowedwithachapterontheimportance
ofincludingprivacyaspectsatthedesignstageitself.Thischapterbytwoleading
researchers in the field shows the importance of Big Data in dealing with privacy
issues and how they could be better addressed by incorporating privacy aspects at
thedesignstageitself.Theteamofresearchersfromamajorresearchuniversityin
the USA addresses the importance of federated Big Data. They are looking at the
use of distributed data in applications. This part is concluded with a chapter that
ix
x Preface
shows the importance of word embedding and natural language processing using
BigDataanalysis.
In Part II, there are eight chapters on the applications of Big Data in science.
Science is an important area where decision making could be enhanced on the
way to approach a problem using data analysis. The applications selected here
dealwithEnvironmentalScience,HighPerformanceComputing(HPC),friendship
paradox in noting which friend’s influence will be significant, significance of
using encrypted search with Big Data, importance of deduplication in Big Data
especially when data is collected from multiple sources, applications in Oil &
Gas and how decision making can be enhanced in identifying bridges that need
to be replaced as part of meeting safety requirements. All these application areas
selected for inclusion in this collection show the diversity of fields in which Big
Data is used today. The Environmental Science application shows how the data
published by the National Oceanic and Atmospheric Administration (NOAA) is
usedtostudytheenvironment.Sincesuchdatasetsareverylarge,specializedtools
are needed to benefit from them. In this chapter the authors show how Big Data
tools help in this effort. The team of industry practitioners discuss how there is
greatsimilarityinthewayHPCdealswithlow-latency,massivelyparallelsystems
and distributed systems. These are all typical of how Big Data is used using tools
suchasMapReduce,HadoopandSpark.Quoraisaleadingproviderofanswersto
user queries and in this context one of their data scientists is addressing how the
FriendshipparadoxisplayingasignificantpartinQuoraanswers.Thisisaclassic
illustrationofaBigDataapplicationusingsocialmedia.
Big Data applications in science exist in many branches and it is very heavily
used in the Oil and Gas industry. Two chapters that address the Oil and Gas
application are written by two sets of people with extensive industry experience.
TwospecificchaptersaredevotedtohowBigDataisusedindeduplicationpractices
involving multimedia data in the cloud and how privacy-aware searches are done
over encrypted data. Today, people are very concerned about the security of data
storedwithanapplicationprovider.Encryptionisthepreferredtooltoprotectsuch
data and so having an efficient way to search such encrypted data is important.
This chapter’s contribution in this regard will be of great benefit for many users.
We conclude Part II with a chapter that shows how Big Data is used in noting the
structuralsafetyofnation’sbridges.ThispracticalapplicationshowshowBigData
isusedinmanydifferentways.
Part III considers applications in medicine. A group of expert doctors from
leading medical institutions in the Bay Area discuss how Big Data is used in the
practiceofmedicine.Thisisoneareawheremanymoreapplicationsaboundandthe
interestedreaderisencouragedtolookatsuchapplications.Anotherchapterlooks
athowdatascientistsareimportantinanalyzingmedicaldata.Thischapterreflects
a view from Asia and discusses the roadmap for data science use in medicine.
Smokinghasbeennotedasoneoftheleadingcausesofhumansuffering.Thispart
includes a chapter on comorbidity aspects related to smokers based on a Big Data
analysis.Thedetailspresentedinthischapterwouldhelpthereadertofocusonother
possibleapplicationsofBigDatainmedicine,especiallycancer.Finally,achapteris