
Practical Business Analytics Using SAS PDF

565 Pages·2015·19.34 MB·English

BOOKS FOR PROFESSIONALS BY PROFESSIONALS®
Konasani, Kadre

Practical Business Analytics Using SAS

Practical Business Analytics Using SAS: A Hands-on Guide shows SAS users and businesspeople how to analyze data effectively in real-life business scenarios. The book begins with an introduction to analytics, analytical tools, and SAS programming. The authors, both experts in SAS, statistics, analytics, and big data, first show how SAS is used in business and then how to get started programming in SAS by importing data and learning how to manipulate it. Besides illustrating basic SAS functions, the book shows how each function can be used to get the information you need to improve business performance. Each chapter offers hands-on exercises drawn from real business situations.

The book then provides an overview of statistics, as well as instruction on exploring data, preparing it for analysis, and testing hypotheses. You will learn how to use SAS to perform analytics and build models using both basic and advanced techniques such as multiple regression, logistic regression, and time-series analysis, among other topics. The book concludes with a chapter on analyzing big data. Illustrations from banking and other industries bring the principles and methods to life. Readers will find just enough theory to understand the practical examples and case studies, which cover all industries.

Written for a corporate IT and programming audience that wants to upgrade its skills or enter the analytics field, this book includes:
• More than 200 examples and exercises, including code and datasets for practice
• Relevant examples for all industries
• Case studies that show how to use SAS analytics to identify opportunities, solve complicated problems, and chart a course

Practical Business Analytics Using SAS: A Hands-on Guide gives you the tools you need to gain insight into the data at your fingertips, predict business conditions for better planning, and make excellent decisions.
Whether you are in retail, finance, healthcare, manufacturing, government, or any other industry, this book will help your organization increase revenue, drive down costs, improve marketing, and satisfy customers better than ever before.

Shelve in: Databases/General
User level: Beginning–Advanced
ISBN 978-1-4842-0044-5
Source code online
www.apress.com

Contents at a Glance
About the Authors
Acknowledgments
Preface
■ Part 1: Basics of SAS Programming for Analytics
■ Chapter 1: Introduction to Business Analytics and Data Analysis Tools
■ Chapter 2: SAS Introduction
■ Chapter 3: Data Handling Using SAS
■ Chapter 4: Important SAS Functions and Procs
■ Part 2: Using SAS for Business Analytics
■ Chapter 5: Introduction to Statistical Analysis
■ Chapter 6: Basic Descriptive Statistics and Reporting in SAS
■ Chapter 7: Data Exploration, Validation, and Data Sanitization
■ Chapter 8: Testing of Hypothesis
■ Chapter 9: Correlation and Linear Regression
■ Chapter 10: Multiple Regression Analysis
■ Chapter 11: Logistic Regression
■ Chapter 12: Time-Series Analysis and Forecasting
■ Chapter 13: Introducing Big Data Analytics
Index

Part 1
Basics of SAS Programming for Analytics

Chapter 1
Introduction to Business Analytics and Data Analysis Tools

There is an ever-increasing need for advanced information and decision support systems in today’s fiercely competitive global environment. Profitability, and the business as a whole, can be managed better with access to predictive tools. An example is a tool that predicts, even approximately, the market prices of the raw materials used in production. Business analytics involves, among other things, quantitative techniques, statistics, information technology (IT), data and analysis tools, and econometric models. It can push business performance beyond what executive experience or plain intuition alone can achieve. Business analytics (or advanced analytics, for that matter) can also include nonfinancial variables, rather than only the traditional parameters based on financial performance.

Business analytics can help businesses in many ways. It can detect credit card fraud, identify potential customers, and analyze or predict profitability per customer. It can help telecom companies launch the most profitable mobile phone plans and help insurers offer policies targeted at a designated segment of customers. In fact, advanced analytical techniques are already being used effectively in all these fields and many more. This chapter covers the basics required to understand the analytical techniques used in this book.
Business Analytics, the Science of Data-Driven Decision Making

Many analytical techniques are data intensive and require business decision makers to understand statistical and various other analytical tools. These techniques invariably require some level of IT and database knowledge. Organizations that use business analytics in decision making also need to develop and implement a data-driven approach in their day-to-day operations, planning, and strategy making. In many cases, in fact, businesses have no choice but to adopt a data-driven decision-making approach because of fierce competition and cost-cutting pressures. This makes business analytics a lucrative and rewarding career choice. Now may be the right time for you to enter the field. The business analytics culture is still in its nascent stage in most organizations around the world, and the opportunities it offers are on the verge of exploding.

Business Analytics Defined

Business analytics encompasses the data, methodologies, IT, applications, mathematical and statistical techniques, and skills required to gain new business insights and understand business performance. It uses iterative, methodical exploration of past data to support business decisions.

Business analytics aims to do many things, among them:
• increase profitability
• reduce warranty expenditures
• acquire and retain customers
• upsell or cross-sell
• monitor the supply chain
• improve operations
• simply reduce the response time to customer complaints

Its applications are numerous and span industry verticals, including manufacturing, finance, telecom, and retail. The global banking and financial industry has traditionally been one of the most active users of analytics techniques.
Typical applications in the finance vertical include detecting credit card fraud, identifying loan defaulters, acquiring new customers, and identifying responders to e-mail campaigns. Others are predicting the relationship value or profitability of customers and designing new financial and insurance products. All these processes use huge amounts of data and fairly involved statistical calculations and interpretations.

Any application of business analytics takes considerable effort. The work covers defining the problem and the methodology to solve it, collecting and cleansing the data, building and validating models, and interpreting the results. It is an iterative process, and models might need to be built several times before they are finally accepted. Even an established model needs to be revisited or rebuilt periodically. The input data may change, and so may the business conditions (and assumptions) on which the original model was built. Any meaningful decision support system that uses data analytics thus requires a strong data-driven culture. That culture must be developed and implemented within the organization and across all the external entities that support it.

Consider a popular retail web site that wants to promote an upmarket product. It wants to know which customer segment to target to maximize product sales with minimum promotional dollars. To find out, it needs to collect and analyze customer data. The web site may also want to know:
• how many customers visited it, and at what times
• their gender, income bracket, and other demographic data
• which sections of the site they visited, and how often
• their buying and browsing patterns
• the web browsers they used
• the search strings that brought them to the site
• other such information
If analyzed properly, this data offers an enormous opportunity to gain useful insights about customers, and with it a chance to cut promotional costs and improve overall sales. Business analytics techniques can work with many different data sources. They can build models that derive rich business insights that were not possible before. The resulting fact base can be used to improve customer experiences, streamline operations, and thereby improve overall profitability. In the previous example, business analytics techniques make it possible to target the product at the customer segment most likely to buy it, thereby minimizing promotional costs.

Conventional business performance parameters are based mainly on financial indicators such as top-line revenue and bottom-line profit. But there is more to a company’s performance than financial parameters. Measures such as operational efficiency, employee motivation, average employee salary, and working conditions may be equally important. Including them increases the number of parameters used to measure or predict a company’s performance. It also increases the amount of data and the complexity of analyzing it.

This is just one example. To analyze consumer behavior on a social media web site, for instance, the sheer volume of data and the number of variables to be handled are immense. In such situations, conventional wisdom and reporting tools may fail, and advanced predictive modeling techniques can help. The subsequent chapters of this book deal with these data analytics techniques. Statistical and quantitative techniques from advanced analytics, combined with IT, can extract business insights from vast amounts of data. Until a few years ago, this was not possible.
Today’s powerful computers and software (such as SAS) take care of the laborious task of coding analytics algorithms. This frees the analyst to focus on the important tasks of interpreting the results and applying them to gain business insights.

Is Advanced Analytics the Solution for You?

You may be a potential candidate for advanced analytics if you operate in a competitive business environment and face challenges such as the following, or almost any problem for which data is available:
• Consumer buying pattern analysis
• Improving overall customer satisfaction
• Predicting supply chain lead times
• Warranty cost optimization
• Right-sizing, or optimizing, the sales force
• Price and promotion modeling
• Predicting customer response to sales promotions
• Credit risk analysis
• Fraud identification
• Identifying potential loan defaulters
• Drug discovery
• Clinical data analysis
• Web site analytics
• Text analysis (for instance, on Twitter)
• Social media analytics
• Identifying genes responsible for a particular disease

As discussed, business analytics is a culture. It needs to be developed, implemented, and finally integrated into the way an organization makes decisions. Many organizations around the world have already realized the potential of this culture. They are successfully optimizing their resources by applying these techniques.

EXAMPLE

Managers face trade-offs every day. Sales volume must be weighed against price points, and the cost of carrying inventory against the chance of stock not being available on demand. Many of these business decisions are highly subjective or based on available data that is not particularly relevant.
In one such case, a company analyzed customer sentiment on key social media sites. It found that sentiment was driven not by its TV commercials but by customers’ interactions with its call centers. The quality of the company’s service, and of its call center interactions in particular, strongly affected the brand’s impact. Based on these insights, the company diverted part of its TV commercial spending toward raising call center satisfaction levels. The results were clearly visible. Customer satisfaction survey results improved considerably, and the customer base and revenues grew significantly.

Simulation, Modeling, and Optimization

This section (and the chapter, by and large) explains the terminology and basics needed as background for the coming chapters, which are more focused and technical.

Simulation

There are various types of simulations. In the context of analytics, the most relevant is computer simulation, an oft-used term. Some real-world systems or scenarios are complex and difficult to comprehend or predict. Predicting a snowstorm or predicting stock prices are classic examples. Both depend on many variables or factors that are themselves practically impossible to predict. Daily stock prices, for instance, may be affected by current political conditions, major events during the day, or the international business environment. They may also move with dollar exchange rates or simply the overall mood of the market. Simulations range from simple programs of a few hundred lines of code to complex systems with millions of lines. Weather prediction and forecasting in the atmospheric sciences is another classic example, relying on complex computer systems and software. In analytics, computer simulations use various statistical models.
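The stock-price case is a natural fit for a small Monte Carlo simulation. The sketch below rests on one loudly stated assumption: each day’s return is an independent draw from a normal distribution. This simple random-walk model was chosen for brevity. It is not taken from the book and is not a realistic market model. The function name `simulate_price_paths` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_price_paths(start_price, daily_mean, daily_vol, days, n_paths, seed=42):
    """Return the final price of each simulated price path.

    Assumption (illustrative only): daily returns are independent draws
    from a normal distribution with the given mean and volatility.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    finals = []
    for _ in range(n_paths):
        price = start_price
        for _ in range(days):
            price *= 1 + rng.gauss(daily_mean, daily_vol)
        finals.append(price)
    return finals


# Hypothetical inputs: a $100 stock, 20 trading days, 10,000 simulated paths.
finals = simulate_price_paths(100.0, 0.0005, 0.02, days=20, n_paths=10_000)
ranked = sorted(finals)
print(f"mean final price: {statistics.mean(finals):.2f}")
print(f"5th percentile:   {ranked[int(0.05 * len(ranked))]:.2f}")
print(f"95th percentile:  {ranked[int(0.95 * len(ranked))]:.2f}")
```

Running the model many times turns a single unpredictable outcome into a distribution of outcomes, and that distribution is the practical payoff of simulation. A real analysis would replace the normal draw with a model fitted to historical data.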
Modeling

A model is essentially the mathematical logic and concepts that go into a computer program. Models, together with the associated data, represent real-world systems. They can be used to study the effect of different components and to predict system behavior. The accuracy with which a model represents a real-world system may vary; it depends on the business needs and the resources available. For instance, 90 percent prediction accuracy might be acceptable in banking applications such as identifying loan defaulters. Systems that involve human life are different. In reliability models for aerospace applications, for example, 100 percent accuracy, or as close to it as possible, is desired.

Optimization

Optimization is a term related to computer simulations. The sole objective of some computer simulations may simply be optimization. In simple terms, that means minimizing or maximizing a mathematical function subject to a given set of constraints. In an optimization problem, values for a set of variables might need to be selected from a range of available alternatives. The goal is to minimize or maximize a mathematical function while satisfying the constraints. Optimization is discussed here only in its simplest form; there is much more to it.

A simple example of an optimization problem is maximizing the working time of a machine while keeping maintenance costs below a certain level. If enough data is available, this kind of problem may be solved using advanced analytics techniques. Another practical example comes from chemical process plants. There, an engineer may need to adjust a given set of process parameters to get the maximum output from the plant while keeping costs within budget. Advanced analytical techniques can be an alternative here as well.
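The chemical-plant example can be sketched as a tiny constrained optimization. The output and cost functions below are invented for illustration and do not come from the book. The units (degrees, bar) are hypothetical as well. The search is plain exhaustive grid search, the simplest possible method; realistic problems would use a dedicated solver. `plant_output`, `operating_cost`, and `best_settings` are hypothetical names.

```python
from itertools import product


def plant_output(temp, pressure):
    """Hypothetical yield curve that peaks at 350 degrees and 8 bar."""
    return 1000 - 0.02 * (temp - 350) ** 2 - 15 * (pressure - 8) ** 2


def operating_cost(temp, pressure):
    """Hypothetical cost model: hotter, higher-pressure runs cost more."""
    return 2.0 * temp + 40.0 * pressure


def best_settings(budget, temps, pressures):
    """Exhaustive search: maximize output subject to cost <= budget.

    Returns (output, temp, pressure), or None if no setting fits the budget.
    """
    best = None
    for temp, pressure in product(temps, pressures):
        if operating_cost(temp, pressure) > budget:
            continue  # infeasible: violates the budget constraint
        output = plant_output(temp, pressure)
        if best is None or output > best[0]:
            best = (output, temp, pressure)
    return best


temps = range(300, 401, 10)   # candidate temperatures (degrees)
pressures = range(4, 13)      # candidate pressures (bar)
print(best_settings(900, temps, pressures))
```

Take a budget of 900. The unconstrained peak (350 degrees, 8 bar) would cost 1,020, so it is rejected. The search instead settles on 310 degrees and 7 bar, for an output of 953. Raising the budget relaxes the constraint, and the answer moves back toward the unconstrained peak.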
Data Warehousing and Data Mining

Creating a data warehouse can be considered one of the most important basics; it can give any business analytics project a jump start. Consider a multilocation business organization with sales offices and manufacturing plants spread across the country. Today, almost all large establishments are expected to have automated some of their business processes. They do this with homegrown or packaged applications such as SAP. Some processes may be local, with their transaction data maintained at the branch level. The head office may then not get quarterly sales reports across products and locations unless all the relevant data is readily available to the reporting engine. The task is easier if the company links its branch-level data sources and makes them available in a central database. The data may need cleansing and transformation before being loaded into the central database, to make the raw data more meaningful for further analysis and reporting. This central database is often called a data warehouse.

That was just one example of a data warehouse. We live in a data age. Terabytes and petabytes of data are continuously generated by many sources:
• the World Wide Web
• sales transactions
• product literature
• hospital records
• population surveys
• satellite remote-sensing data
• engineering analysis results
• multimedia and videos
• voice and data communications networks

The list is endless. A data warehouse can draw on multiple, heterogeneous sources. Interesting patterns and useful knowledge can be discovered by analyzing this vast amount of available data. This process of knowledge discovery is termed data mining (Figure 1-1).
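The head-office reporting scenario described earlier in this section can be sketched with Python’s built-in `sqlite3` module. In that scenario, branch-level data is cleansed, transformed, loaded into a central database, and then summarized by quarter, product, and location. In the sketch, an in-memory SQLite database stands in for the data warehouse. The branch records, field names, and date formats are hypothetical. A production warehouse would use dedicated ETL tooling and a server database.

```python
import sqlite3
from datetime import date

# Hypothetical branch extracts: each branch formats its data differently.
branch_pune = [
    {"branch": "Pune ", "product": " widget", "sold_on": "2015-01-14", "amount": "1200.50"},
    {"branch": "Pune", "product": "GADGET", "sold_on": "2015-04-02", "amount": "800"},
]
branch_delhi = [
    {"branch": "Delhi", "product": "Widget", "sold_on": "14/02/2015", "amount": "950"},
]


def parse_date(text):
    """Accept ISO dates (YYYY-MM-DD) or day-first dates (DD/MM/YYYY)."""
    if "/" in text:
        day, month, year = text.split("/")
        return date(int(year), int(month), int(day))
    return date.fromisoformat(text)


def cleanse(row):
    """Transform one raw branch record into the warehouse's standard shape."""
    sold = parse_date(row["sold_on"])
    return (
        row["branch"].strip(),
        row["product"].strip().upper(),  # unify product codes across branches
        sold.year,
        (sold.month - 1) // 3 + 1,       # calendar quarter, 1-4
        float(row["amount"]),
    )


# An in-memory database stands in for the central data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (branch TEXT, product TEXT, year INTEGER, quarter INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [cleanse(r) for r in branch_pune + branch_delhi],
)

# Quarterly sales report across products and locations.
report = conn.execute(
    """
    SELECT year, quarter, product, branch, SUM(amount)
    FROM sales
    GROUP BY year, quarter, product, branch
    ORDER BY year, quarter, product, branch
    """
).fetchall()
for row in report:
    print(row)
```

The design point is that cleansing happens once, at load time. After that, every report runs against consistent product codes, branch names, and dates, whatever format each branch originally used.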
The data for a data mining project may come from a single large company-wide data warehouse. It may also come from a combination of data warehouses, flat files, the Internet, commercial information repositories, social media web sites, and several other such sources.

Figure 1-1. Data mining

What Can Be Discovered Using Data Mining?

Data mining can discover a few well-defined types of patterns. Consider the familiar example of a bank and its credit card customers. Bank managers are sometimes interested only in summaries of a few general features of a target class. For instance, a bank manager might want a summary of the credit card defaulters who regularly miss payment deadlines by 90 days or more. This kind of abstraction is called characterization. In the same credit card example, the bank manager might want to compare clients who pay on time with clients who regularly default beyond 90 days. This comparative study between two target groups is termed discrimination.

In yet another type of abstraction, called association analysis, the same bank manager might want to know how many new credit card customers also took personal loans.

The bank may also want to build a model to serve as a support tool for accepting or rejecting new credit card applications. As one step toward this, the bank might classify its clients as “very safe,” “medium risky,” and “highly risky.” It would do so after a thorough analysis of a large number of client attributes. This kind of abstraction is called classification. The bank might also want to predict which customers are likely to default on loans. Again, this relies on an established model that consumes a large number of client attributes. This is predictive analysis. When deciding where to open new branches or ATMs, the bank might want to know the concentration of customers by geographical location.
This abstraction, called clustering, is similar to classification, but the names of the classes and subclasses are not known when the analysis begins. The class names are known only after the analysis is complete; here, they are the names of geographic areas with sizable customer concentrations.

A cluster analysis or classification on a given client attribute may turn up some values that do not fit any class or cluster. These exceptions, or surprises, are outliers. Outliers might not be allowed in some model-building activities. They tend to bias the results in a particular direction that may not truly represent the given data set. Such outliers are common in data sets containing dollar values.

Deviation analysis deals with finding the differences between expected and actual values. For example, it might be interesting to know how far a model’s predictions of credit card loan defaulters deviate from the actual outcomes. Such an analysis is possible only when both the model-predicted values and the actual values are available. It is, in fact, done periodically to ascertain the effectiveness of models. If the deviation is not acceptable, the model may need to be rebuilt. Deviation analysis also attempts to find the causes of observed deviations between predicted and actual values.

This is by no means a complete list of the patterns that can be discovered using data mining techniques; the scope is much wider.

Business Intelligence, Reporting, and Business Analytics

Business intelligence (BI) and business analytics are two different but interconnected disciplines. As reported in one of SAS’s blogs, most business intelligence systems aim to provide comprehensive reporting capabilities and dashboards to a target group of users. Business analytics tools can also produce reports and dashboards. In addition, they can perform statistical analysis for forecasting, regression, and modeling.
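Several of the data mining pattern types described earlier in this chapter can be illustrated on a handful of credit card records. These are characterization, discrimination, association analysis, outliers, and deviation analysis. The sketch uses only Python’s standard `statistics` module. Several pieces are illustrative assumptions rather than the book’s method:
• the client records
• the default rule encoded as `max_days_late >= 90`
• the model predictions
• the use of Tukey’s 1.5 × IQR fences to flag outliers

```python
import statistics

# Hypothetical credit card clients; every field and value is illustrative.
clients = [
    {"age": 24, "balance": 480, "max_days_late": 120, "personal_loan": True},
    {"age": 41, "balance": 515, "max_days_late": 0, "personal_loan": False},
    {"age": 29, "balance": 390, "max_days_late": 95, "personal_loan": True},
    {"age": 52, "balance": 610, "max_days_late": 10, "personal_loan": True},
    {"age": 35, "balance": 420, "max_days_late": 0, "personal_loan": False},
    {"age": 47, "balance": 455, "max_days_late": 0, "personal_loan": False},
    {"age": 37, "balance": 530, "max_days_late": 0, "personal_loan": True},
    {"age": 60, "balance": 25000, "max_days_late": 0, "personal_loan": False},
]


def profile(group):
    """Characterization: summarize a few general features of one class."""
    return {
        "count": len(group),
        "mean_age": statistics.mean(c["age"] for c in group),
        "mean_balance": statistics.mean(c["balance"] for c in group),
    }


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


def deviation_report(predicted, actual):
    """Deviation analysis: compare predicted default flags (1/0) with outcomes."""
    pairs = list(zip(predicted, actual))
    return {
        "accuracy": sum(p == a for p, a in pairs) / len(pairs),
        "missed": sum(p == 0 and a == 1 for p, a in pairs),        # defaulters not flagged
        "false_alarms": sum(p == 1 and a == 0 for p, a in pairs),  # good clients flagged
    }


defaulters = [c for c in clients if c["max_days_late"] >= 90]
on_time = [c for c in clients if c["max_days_late"] == 0]
print("characterization (90+ days late):", profile(defaulters))
print("discrimination, on-time clients: ", profile(on_time))

with_loan = sum(c["personal_loan"] for c in clients)
print(f"association: {with_loan} of {len(clients)} card holders also took a personal loan")

print("outliers in balances:", iqr_outliers([c["balance"] for c in clients]))

actual = [int(c["max_days_late"] >= 90) for c in clients]
predicted = [1, 0, 0, 1, 0, 0, 0, 0]  # hypothetical model output
print("deviation:", deviation_report(predicted, actual))
```

Note how the single $25,000 balance pulls the on-time group’s mean balance far above that of the defaulters. This is exactly the kind of bias the outlier discussion warns about.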
SAS business analytics equips users with everything needed for data-driven decision making, including information and data management, statistical tools, and presentation tools.

Analytics Techniques Used in the Industry

The previous few sections introduced the uses of data mining and business analytics. This section examines the terminology in more detail, covering only the terms most frequently used in the industry. The chapter then introduces, with examples, many of these analytics techniques and applications. Some of the more frequently used techniques are covered in detail in later chapters.

Regression Modeling and Analysis

To understand regression and predictive modeling, consider again the example of a bank trying to aggressively grow the customer base for some of its credit card offerings. The credit card manager wants to attract new customers who will not default on credit card loans. The bank manager might want to build a model from a set of past customer data that closely resembles the set of target customers. This model
