
Java: Data Science Made Easy (1) PDF

715 Pages·2017·8.633 MB·English
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Java: Data Science Made Easy (1)

Richard M. Reese, Jennifer L. Reese, Alexey Grigorev

Java: Data Science Made Easy
Data collection, processing, analysis, and more
A course in two modules

BIRMINGHAM - MUMBAI

Copyright © 2017 Packt Publishing

All rights reserved. No part of this course may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this course is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this course.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Published on: July 2017
Production reference: 1040717

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78847-565-5
www.packtpub.com

Credits

Authors: Richard M. Reese, Jennifer L. Reese, Alexey Grigorev
Reviewers: Walter Molina, Shilpi Saxena, Stanislav Bashkyrtsev, Luca Massaron, Prashant Verma
Content Development Editor: Mayur Pawanikar
Production Coordinator: Arvindkumar Gupta

Table of Contents

Module 1: Java for Data Science

Chapter 1: Getting Started with Data Science
  Problems solved using data science
  Understanding the data science problem-solving approach
  Using Java to support data science
  Acquiring data for an application
  The importance and process of cleaning data
  Visualizing data to enhance understanding
  The use of statistical methods in data science
  Machine learning applied to data science
  Using neural networks in data science
  Deep learning approaches
  Performing text analysis
  Visual and audio analysis
  Improving application performance using parallel techniques
  Assembling the pieces
  Summary

Chapter 2: Data Acquisition
  Understanding the data formats used in data science applications
  Overview of CSV data
  Overview of spreadsheets
  Overview of databases
  Overview of PDF files
  Overview of JSON
  Overview of XML
  Overview of streaming data
  Overview of audio/video/images in Java
  Data acquisition techniques
  Using the HttpUrlConnection class
  Web crawlers in Java
  Creating your own web crawler
  Using the crawler4j web crawler
  Web scraping in Java
  Using API calls to access common social media sites
  Using OAuth to authenticate users
  Handling Twitter
  Handling Wikipedia
  Handling Flickr
  Handling YouTube
  Searching by keyword
  Summary

Chapter 3: Data Cleaning
  Handling data formats
  Handling CSV data
  Handling spreadsheets
  Handling Excel spreadsheets
  Handling PDF files
  Handling JSON
  Using JSON streaming API
  Using the JSON tree API
  The nitty gritty of cleaning text
  Using Java tokenizers to extract words
  Java core tokenizers
  Third-party tokenizers and libraries
  Transforming data into a usable form
  Simple text cleaning
  Removing stop words
  Finding words in text
  Finding and replacing text
  Data imputation
  Subsetting data
  Sorting text
  Data validation
  Validating data types
  Validating dates
  Validating e-mail addresses
  Validating ZIP codes
  Validating names
  Cleaning images
  Changing the contrast of an image
  Smoothing an image
  Brightening an image
  Resizing an image
  Converting images to different formats
  Summary

Chapter 4: Data Visualization
  Understanding plots and graphs
  Visual analysis goals
  Creating index charts
  Creating bar charts
  Using country as the category
  Using decade as the category
  Creating stacked graphs
  Creating pie charts
  Creating scatter charts
  Creating histograms
  Creating donut charts
  Creating bubble charts
  Summary

Chapter 5: Statistical Data Analysis Techniques
  Working with mean, mode, and median
  Calculating the mean
  Using simple Java techniques to find mean
  Using Java 8 techniques to find mean
  Using Google Guava to find mean
  Using Apache Commons to find mean
  Calculating the median
  Using simple Java techniques to find median
  Using Apache Commons to find the median
  Calculating the mode
  Using ArrayLists to find multiple modes
  Using a HashMap to find multiple modes
  Using Apache Commons to find multiple modes
  Standard deviation
  Sample size determination
  Hypothesis testing
  Regression analysis
  Using simple linear regression
  Using multiple regression
  Summary

Chapter 6: Machine Learning
  Supervised learning techniques
  Decision trees
  Decision tree types
  Decision tree libraries
  Using a decision tree with a book dataset
  Testing the book decision tree
  Support vector machines
  Using an SVM for camping data
  Testing individual instances
  Bayesian networks
  Using a Bayesian network
  Unsupervised machine learning
  Association rule learning
  Using association rule learning to find buying relationships
  Reinforcement learning
  Summary

Chapter 7: Neural Networks
  Training a neural network
  Getting started with neural network architectures
  Understanding static neural networks
  A basic Java example
  Understanding dynamic neural networks
  Multilayer perceptron networks
  Building the model
  Evaluating the model
  Predicting other values
  Saving and retrieving the model
  Learning vector quantization
  Self-Organizing Maps
  Using a SOM
  Displaying the SOM results
  Additional network architectures and algorithms
  The k-Nearest Neighbors algorithm
  Instantaneously trained networks
  Spiking neural networks
  Cascading neural networks
  Holographic associative memory
  Backpropagation and neural networks
  Summary

Chapter 8: Deep Learning
  Deeplearning4j architecture
  Acquiring and manipulating data
  Reading in a CSV file
  Configuring and building a model
  Using hyperparameters in ND4J
  Instantiating the network model
  Training a model
  Testing a model
  Deep learning and regression analysis
  Preparing the data
  Setting up the class
  Reading and preparing the data
  Building the model
  Evaluating the model
  Restricted Boltzmann Machines
  Reconstruction in an RBM
  Configuring an RBM
  Deep autoencoders
  Building an autoencoder in DL4J
  Configuring the network
  Building and training the network
  Saving and retrieving a network
  Specialized autoencoders
  Convolutional networks
  Building the model
  Evaluating the model
  Recurrent Neural Networks
  Summary

Chapter 9: Text Analysis
  Implementing named entity recognition
  Using OpenNLP to perform NER
  Identifying location entities
  Classifying text
  Word2Vec and Doc2Vec
  Classifying text by labels
  Classifying text by similarity
  Understanding tagging and POS
  Using OpenNLP to identify POS
  Understanding POS tags
  Extracting relationships from sentences
  Using OpenNLP to extract relationships
  Sentiment analysis
  Downloading and extracting the Word2Vec model
  Building our model and classifying text
  Summary

Chapter 10: Visual and Audio Analysis
  Text-to-speech
  Using FreeTTS
  Getting information about voices
  Gathering voice information
  Understanding speech recognition
  Using CMUSphinx to convert speech to text
  Obtaining more detail about the words
  Extracting text from an image
  Using Tess4j to extract text
  Identifying faces
  Using OpenCV to detect faces
  Classifying visual data
  Creating a Neuroph Studio project for classifying visual images
  Training the model
  Summary

Chapter 11: Mathematical and Parallel Techniques for Data Analysis
  Implementing basic matrix operations
  Using GPUs with DeepLearning4j
  Using map-reduce
  Using Apache's Hadoop to perform map-reduce
  Writing the map method
  Writing the reduce method
  Creating and executing a new Hadoop job
  Various mathematical libraries
  Using the jblas API
  Using the Apache Commons math API
  Using the ND4J API
  Using OpenCL
  Using Aparapi
  Creating an Aparapi application
  Using Aparapi for matrix multiplication
  Using Java 8 streams
  Understanding Java 8 lambda expressions and streams
  Using Java 8 to perform matrix multiplication
  Using Java 8 to perform map-reduce
  Summary

Chapter 12: Bringing It All Together
