
Practical Predictive Analytics PDF

651 Pages·2017·17.067 MB·English

Preview Practical Predictive Analytics

Contents

1: Getting Started with Predictive Analytics
    Predictive analytics are in so many industries
    Skills and roles that are important in Predictive Analytics
    Predictive analytics software
    Other helpful tools
    R
    How is a predictive analytics project organized?
    GUIs
    Getting started with RStudio
    The R console
    The source window
    Our first predictive model
    Your second script
    R packages
    References
    Summary

2: The Modeling Process
    Advantages of a structured approach
    Analytic process methodologies
    An analytics methodology outline – specific steps
    Step 2 data understanding
    Step 3 data preparation
    Step 4 modeling
    Step 5 evaluation
    Step 6 deployment
    References
    Summary

3: Inputting and Exploring Data
    Data input
    Joining data
    Exploring the hospital dataset
    Transposing a dataframe
    Missing values
    Imputing categorical variables
    Outliers
    Data transformations
    Variable reduction/variable importance
    References
    Summary

4: Introduction to Regression Algorithms
    Supervised versus unsupervised learning models
    Regression techniques
    Generalized linear models
    Logistic regression
    Summary

5: Introduction to Decision Trees, Clustering, and SVM
    Decision tree algorithms
    Cluster analysis
    Support vector machines
    References
    Summary

6: Using Survival Analysis to Predict and Analyze Customer Churn
    What is survival analysis?
    Our customer satisfaction dataset
    Partitioning into training and test data
    Setting the stage by creating survival objects
    Examining survival curves
    Cox regression modeling
    Time-based variables
    Comparing the models
    Variable selection
    Summary

7: Using Market Basket Analysis as a Recommender Engine
    What is market basket analysis?
    Examining the groceries transaction file
    The sample market basket
    Association rule algorithms
    Antecedents and descendants
    Evaluating the accuracy of a rule
    Preparing the raw data file for analysis
    Analyzing the input file
    Scrubbing and cleaning the data
    Removing colors automatically
    Filtering out single item transactions
    Merging the results back into the original data
    Compressing descriptions using camelcase
    Creating the test and training datasets
    Creating the market basket transaction file
    Method two – Creating a physical transactions file
    Converting to a document term matrix
    K-means clustering of terms
    Predicting cluster assignments
    Running the apriori algorithm on the clusters
    Summarizing the metrics
    References
    Summary

8: Exploring Health Care Enrollment Data as a Time Series
    Time series data
    Health insurance coverage dataset
    Housekeeping
    Read the data in
    Subsetting the columns
    Description of the data
    Target time series variable
    Saving the data
    Determining all of the subset groups
    Merging the aggregate data back into the original data
    Checking the time intervals
    Picking out the top groups in terms of average population size
    Plotting the data using lattice
    Plotting the data using ggplot
    Sending output to an external file
    Examining the output
    Detecting linear trends
    Automating the regressions
    Ranking the coefficients
    Merging scores back into the original dataframe
    Plotting the data with the trend lines
    Plotting all the categories on one graph
    Performing some automated forecasting using the ets function
    Smoothing the data using moving averages
    Simple moving average
    Verifying the SMA calculation
    Exponential moving average
    Using the ets function
    Forecasting using ALL AGES
    Plotting the predicted and actual values
    The forecast (fit) method
    Plotting future values with confidence bands
    Modifying the model to include a trend component
    Running the ets function iteratively over all of the categories
    Accuracy measures produced by onestep
    Comparing the Test and Training for the "UNDER 18 YEARS" group
    Accuracy measures
    References
    Summary

9: Introduction to Spark Using R
    About Spark
    Spark environments
    SparkR
    Building our first Spark dataframe
    Importing the sample notebook
    Creating a new notebook
    Becoming large by starting small
    Running the code
    Running the initialization code
    Extracting the Pima Indians diabetes dataset
    Simulating the data
    Simulating the negative cases
    Running summary statistics
    Saving your work
    Summary

10: Exploring Large Datasets Using Spark
    Performing some exploratory analysis on positives
    Cleaning up and caching the table in memory
    Some useful Spark functions to explore your data
    Creating new columns
    Constructing a cross-tab
    Contrasting histograms
    Plotting using ggplot
    Spark SQL
    Exporting data from Spark back into R
    Running local R packages
    Some tips for using Spark
    Summary

11: Spark Machine Learning - Regression and Cluster Models
    About this chapter/what you will learn
    Splitting the data into train and test datasets
    Spark machine learning using logistic regression
    Running predictions for the test data
    Combining the training and test dataset
    Exposing the three tables to SQL
    Validating the regression results
    Calculating goodness of fit measures
    Confusion matrix for test group
    Plotting outside of Spark
    Creating some global views
    Normalizing the data
    Characterizing the clusters by their mean values
    Summary

12: Spark Models – Rule-Based Learning
    Loading the stop and frisk dataset
    Reading the table
    Discovering the important features
    Running the OneR model
    Another OneR example
    Constructing a decision tree using Rpart
    Running an alternative model in Python
    Indexing the classification features
    Summary

Chapter 1. Getting Started with Predictive Analytics

"In God we trust, all others must bring Data" - Deming

I enjoy working with and explaining predictive analytics to people because it is based upon a simple concept: predicting the probability of future events based upon historical data. Its history may date back as far as 650 BC. Some early examples include the Babylonians, who tried to predict short-term weather changes based on cloud appearances and halos (Weather Forecasting through the Ages, NASA).

Medicine also has a long history of needing to classify diseases. The Babylonian king Adad-apla-iddina decreed that medical records be collected to form the Diagnostic Handbook. Some predictions in this corpus list treatments based on the number of days the patient had been sick, and their pulse rate (Linda Miner et al., 2014). One of the first instances of bioinformatics!

In later times, specialized predictive analytics was developed at the onset of the insurance underwriting industries. This was used as a way to predict the risk associated with insuring marine vessels (https://www.lloyds.com/lloyds/about-us/history/corporate-history).
At about the same time, life insurance companies began predicting the age that a person would live to in order to set the most appropriate premium rates.

Although prediction has long been rooted in the human need to understand and classify, it was not until the 20th century, and the advent of modern computing, that it really took hold.

In addition to helping the British government break codes in the 1940s, Alan Turing also worked on the initial computer chess algorithms that pitted man against machine. Monte Carlo simulation methods originated as part of the Manhattan Project, where mainframe computers crunched numbers for days in order to determine the probability of nuclear attacks (Computing and the Manhattan Project, n.d.). In the 1950s, Operations Research (OR) theory developed, in which one could find the shortest route between two points. To this day, these techniques are used in logistics by companies such as UPS and Amazon.

Non-mathematicians have also gotten in on the act. In the 1970s, cardiologist Lee Goldman (who worked aboard a submarine) spent years developing a decision tree to assess chest pain efficiently. This helped the staff determine whether or not the submarine needed to resurface in order to help the chest pain sufferers (Gladwell, 2005)!

What many of these examples had in common was that people first made observations about events that had already occurred, and then used this information to generalize and make decisions about what might occur in the future. Along with prediction came further understanding of cause and effect and of how the various parts of the problem were interrelated. Discovery and insight came about through methodology and adherence to the scientific method.

Most importantly, they came about in order to find solutions to important, and often practical, problems of the times. That is what made them unique.

Predictive analytics are in so many industries

We have come a long way since then, and practical analytics solutions have furthered growth in many different industries. The internet has had a profound effect on this; it has enabled every click to be stored and analyzed. More data is being collected and stored, some with very little effort, than ever before. That in itself has enabled more industries to enter predictive analytics.

Predictive Analytics in marketing

One industry that has embraced predictive analytics (PA) for quite a long time is marketing. Marketing has always been concerned with customer acquisition and retention, and has developed predictive models involving various promotional offers and customer touch points, all geared to keeping customers and acquiring new ones. This is very pronounced in certain segments of marketing, such as wireless and online shopping carts, in which customers are always searching for the best deal.

Specifically, advanced analytics can help answer questions such as, "If I offer a customer 10% off with free shipping, will that yield more revenue than 15% off with no free shipping?" The 360-degree view of the customer has expanded the number of ways one can engage with the customer, making marketing mix and attribution modeling increasingly important. Location-based devices have enabled predictive marketing applications to incorporate real-time data and issue recommendations to customers while they are in the store.

Predictive Analytics in healthcare

Predictive analytics in healthcare has its roots in clinical trials, which use carefully selected samples to test the efficacy of drugs and treatments.
