Blue Martini Software Data Mining Tutorial for Data Analysts PDF

106 Pages·2003·4.82 MB·English

Blue Martini Software Data Mining Tutorial for Data Analysts

For Blue Martini Analytics versions 1.2 and above

By Michael Berry and Brij Masand
Data Miners, Inc.

For feedback, please send e-mail to [email protected] with the title “Mining Tutorial Feedback”

Data Mining Tutorial

Copyright © 1999-2003 Blue Martini Software, Inc. All rights reserved.

The information in this document is confidential and proprietary to Blue Martini Software, Inc. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior written consent of Blue Martini Software, Inc.

Restricted Rights Legend
Use, duplication, or disclosure by the United States Government is subject to the restrictions set forth in DFARS 252.227-7013 (c)(1)(ii) and FAR 52.227-19.

Patents
Patents pending.

Trademarks
Blue Martini, Blue Martini Software, and the Blue Martini Software logo are trademarks of Blue Martini Software, Inc., 2600 Campus Drive, San Mateo, CA 94403. All other company names and their associated products may be the trademarks of their respective owners.

Statement of Conditions
This document, as well as the product(s) described herein, is subject to change without notice, and should not be construed as a commitment by Blue Martini Software, Inc. The statements, configurations, technical data, and recommendations in this document are believed to be accurate and reliable, but are presented without warranty. Blue Martini Software, Inc. assumes no liability for any errors or inaccuracies that may appear in this document, makes no warranty of any kind (express, implied, or statutory) with respect to this publication, and expressly disclaims any and all warranties of merchantability, fitness for particular purposes, and non-infringement of third-party right. Blue Martini Software, Inc. does not assume any liability that may occur due to the use or application of the product(s) described herein.
Users must take full responsibility for their applications of any products specified in this document. The software described in this document is furnished under a license agreement and may only be used in accordance with the terms of that license.

Acknowledgements
The Blue Martini system contains technology from the following vendors: Acxiom Corporation, Advanced Visual Systems Inc., Akamai Technologies, BEA Systems, Inc., Blaze Software, Inc., Borland Software Corporation, Crystal Decisions, Inc., CyberSource Corporation, Fujitsu Software Corporation, Hummingbird USA Inc., IBM Corporation, RuleQuest Research Pty. Ltd., Silicon Graphics, Inc. ("Powered By MineSet": copyright 2000, Silicon Graphics, Inc. Used by permission. All rights reserved.), Taxware International, Inc., and Tom Sawyer Software.

Contributors
Authors: Michael Berry and Brij Masand
Technical Contributors and Reviewers: Ronny Kohavi, Llew Mason, Rajesh Parekh, Zijian Zheng.

October 30, 2002. Updated 4/29/2003

Contents

1 What is Data Mining and How is it Useful?
  1.1 Data Mining Tasks
    1.1.1 Hypothesis Testing
    1.1.2 Profiling
    1.1.3 Making Recommendations
    1.1.4 Predicting the Future
2 Overview of the Tutorial
  2.1 What You Will Learn
  2.2 What You Won’t Learn
3 The Data Mining Process
  3.1 Formulating the Business Goal
    3.1.1 Defining Customer Segments
    3.1.2 Offering the Right Product
    3.1.3 Defining Product Assortments
  3.2 Transforming Data into Actionable Information
    3.2.1 Exercise: Getting Started
    3.2.2 Data Sources
    3.2.3 Transformations
    3.2.4 Destinations
  3.3 Selecting Appropriate Data for your Business Problem
    3.3.1 Choosing the Right Level of Detail
    3.3.2 Data Stars and Decision Support Databases
4 Investigating Business Questions using Visualization, Statistics, and Reports
  4.1 Exercise: Exploring Customer Profiles
  4.2 Exercise: Using Targeted Statistics to Investigate Customer Attitude
  4.3 Exercise: How does Purchase Behavior Differ by Week, Month, and Time of Day?
  4.4 Sharing Investigation Results
    4.4.1 Creating a Blue Martini Report Template
    4.4.2 Applying the Custom Template to Build a New Report in the Analysis Center
    4.4.3 Running and Viewing the Report in the Analysis Center
    4.4.4 Running Visualizations using Java Web Start
5 Preparing Data to Better Address your Business Questions
  5.1 Reasons to Transform Data
    5.1.1 Bringing Information to the Surface
    5.1.2 Taking Advantage of Domain Knowledge
    5.1.3 Capturing Trends
    5.1.4 Dealing with Rare Events
    5.1.5 Fixing Problems
  5.2 Data Transformations and Modifications
    5.2.1 Transformations
    5.2.2 Exercise: What are the Top 10 Products?
    5.2.3 Exercise: Summarizing Total Revenue and Total Quantity Sold
    5.2.4 Modifications
    5.2.5 Assigning an Attitude to Customers Who Leave it Blank
    5.2.6 Exercise: Finding Anomalies
  5.3 Discovering Decision Rules that Predict Customer Behavior
    5.3.1 Exercise: Who Buys Tents?
    5.3.2 Building Models that Produce Scores
    5.3.3 Building Stable, Reliable Models
    5.3.4 What Should Each Customer be Shown First?
  5.4 Generating Association Rules
6 Closing the Loop
  6.1 Shipping Rules to the Store
  6.2 Shipping Scores to the Store
  6.3 Generating Targeted Lists for Campaigns
  6.4 Measuring the Effects of an Action
  6.5 Mining the Results of Your Actions
7 Summary and Next Steps

Introduction
This tutorial is written for people who want to analyze their data using Blue Martini Software’s Integrated Analytics, derive insights from the results, share the results with business users using Blue Martini Software’s Analysis Center, and score data for personalization and campaigns. The tutorial focuses on data transformations (which is usually the most time-consuming process), reporting, visualization, statistics, classification, and anomaly detection. While the examples used in this tutorial are taken from a fictional web site (Blue Planet), the processes and the product are not restricted to web data and can be used against any data source.

1 What is Data Mining and How is it Useful?
Data mining is an investigative process that uses a variety of analysis tools to discover meaningful patterns and relationships in data that can be used to reliably predict future behavior. These relationships can be used to gain insight about what has happened in the past and to make valid predictions about what will happen in the future. In short, data mining is an approach to business decision making that puts data at the heart of the process.

Data mining is especially useful for businesses that interact with a large number of customers and potential customers through a variety of channels, such as web sites. These channels capture copious amounts of data about every customer interaction and can reveal insightful information.

Unlike other more familiar data sources, the electronic customer interaction data captured by the Blue Martini web store has several advantages. In particular, the data collected from electronic channels is:
• Clean: the captured data better reflects user behavior and eliminates sessions created by web robots (often called “bots”) and spiders.
• Comprehensive: data is available for every customer and every prospect covering all of their web store interactions. In contrast, market research surveys, focus groups, and customer interviews can only hope to tell you what a subset of customers claim to be thinking and doing.
• Behavioral: the collected data reflects customers’ actual actions rather than expectations based on the customers’ ages, incomes, genders, addresses, or stated interests. For example, it might seem unusual for a sixty-year-old woman to purchase Magic: The Gathering™ cards or Anime videos. However, she might have a twelve-year-old grandson, or perhaps her tastes simply run counter to stereotype. In either case, her past behavior can guide future interactions.
• Actionable: much of the data collected pertains to things that your business controls: your products, your campaigns, your up-sell and cross-sell rules, your pricing decisions. Collected behavioral data can identify actionable customer segments. If you identify a segment of customers whose behavior shows that they are not sensitive to price, you can stop sending them discounts. If you identify a segment of customers who tend to respond well to new products, you can target new product introductions at them.

Used properly, this clean, comprehensive, behavioral, actionable data can greatly improve the quality of the customer experience by making your store seem more responsive and appropriate. This, in turn, leads to satisfied, loyal customers who come back more often and buy more products. A great goal, but data alone cannot accomplish it; only clever marketing and merchandising can do that. Data mining is the process that clever marketers and merchandisers use to turn customer data into marketing actions. This tutorial introduces you to that process using Blue Martini Software’s Integrated Analytics.

Note: This tutorial was written using Integrated Analytics, version M1.2.1.
Version M1.3 of Integrated Analytics has slight differences, which are footnoted in this document.

1.1 Data Mining Tasks
Data mining is sometimes defined as “the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules.” [2] This broad definition covers a variety of tasks and activities. This tutorial explores data mining tasks that are particularly relevant to electronic commerce: hypothesis testing, profiling, recommendation, and prediction.

[2] Data Mining Techniques for Marketing, Sales, and Customer Support by Michael J. A. Berry and Gordon S. Linoff, 1997, John Wiley & Sons.

1.1.1 Hypothesis Testing
Hypothesis testing is the simplest form of data mining. Hypothesis testing uses data to confirm or deny the validity of things that might be true about customers, products, or the interaction of the two.

Are people who buy products through an online channel also more likely to use that channel for customer support? That is the hypothesis of one major American computer company that sells its products through a web store and by telephone, but not through retail stores. This company is convinced that one of the key competitive advantages it gets from its emphasis on web-based sales is that while the competition’s customers are tying up expensive customer-support phone lines, their own customers are looking up the answers to technical questions on the web. This is a good example of a hypothesis that can be tested using readily available data.

To test this hypothesis, the company mined its customer support data to compare the behavior of customers who ordered by phone with the behavior of customers who ordered on the web. Sure enough, there was a strong relationship between purchase behavior and customer support cost.
Customers who ordered on the web were not only less costly to acquire, they were less costly to support. Further study confirmed another hypothesis. The least expensive computers tend to appeal to the least experienced users who, in turn, account for the largest volume of calls to customer support. By making sure that their computers are not the lowest-cost product in each niche, the company can avoid these low-revenue, high-cost customers.

Every web-based business can and should be asking the same kinds of questions that this company asks. Are men more likely than women to buy backpacking equipment? Do people who spend a long time reading product information buy more of these products? Do repeat buyers buy more in subsequent orders than they do in the first order? Do users of some web browsers have trouble with some areas of the web store?

Often, the quickest way to test a hypothesis is to look at a chart or visualization. In this tutorial you will create tables, reports, charts, and heat map visualizations to test various hypotheses about Blue Planet customers. Blue Planet is an imaginary outdoor equipment web store that ships with the Blue Martini software.

1.1.2 Profiling
Profiling is a data mining technique that discovers rules about customers who buy a particular product or behave in a particular way. Profiling can be used to build customer segments, to find prospects for a particular product, or to choose the right message for a particular customer. In the Blue Martini system, profiles based on rules can be derived from customer data and used to improve personalization and better target campaigns.

Profiles can also be used to generate scores for anything from creditworthiness to propensity to redeem coupons. A score is a way of expressing how closely a customer matches the profile of specific customer types (for example, customers who fail to pay bills or who like to use coupons). Scores are one of the most useful outputs from the data mining process.
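As a rough illustration of what a score can look like, the sketch below treats a profile as a list of rules and reports the fraction of rules a customer satisfies. This is an assumption made purely for illustration and is not how Blue Martini computes scores; the field names (`coupons_redeemed`, `orders`, `opened_last_email`), the three rules, and the `profile_score` function are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Customer = Dict[str, Any]
Rule = Callable[[Customer], bool]

# Hypothetical profile of a "coupon user": each rule is one condition.
# These rules are invented for illustration, not mined from real data.
coupon_profile: List[Rule] = [
    lambda c: c["coupons_redeemed"] >= 2,
    lambda c: c["orders"] >= 3,
    lambda c: c["opened_last_email"] is True,
]


def profile_score(customer: Customer, profile: List[Rule]) -> float:
    """Return the fraction of profile rules the customer satisfies (0.0 to 1.0)."""
    if not profile:
        return 0.0
    matched = sum(1 for rule in profile if rule(customer))
    return matched / len(profile)


# A customer who matches two of the three rules.
customer = {"coupons_redeemed": 3, "orders": 1, "opened_last_email": True}
score = profile_score(customer, coupon_profile)
print(f"coupon-user score: {score:.2f}")
```

A higher value means the customer looks more like the profiled customer type. Real scoring models usually weight rules differently rather than counting them equally.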
In this tutorial you will build a profile of customers who buy a specific product and use it to produce scores. You will also see how to ship both rules and scores to the web store.

1.1.3 Making Recommendations
There are two styles of shopping: searching and browsing. The same customer adopts different styles of shopping on different visits to the store; one day searching for “that great Oregon Pinot Noir he had at the restaurant last night” or “the Bonzo Dog album with Mickey’s Son And Daughter” and another day browsing for “something to serve with couscous” or “something to read on the airplane.”

A well-designed web site makes searching easy, but even the best sites are difficult to browse. With today’s technology, there is simply no practical way to wander through a web store the way one might wander through a bricks-and-mortar bookstore or wine shop glancing at hundreds of items while waiting for an impulse to buy. To avoid losing the business of customers who are browsing rather than searching, you must configure your web store to make good recommendations. The Blue Martini System makes it easy to mine customer data for recommendations that are likely to be effective.

In this tutorial you will learn two different ways to use data mining to make recommendations. One approach, illustrated by the live Blue Martini web store in Figure 1, uses scores to pick particular categories or items that are likely to appeal to a customer based on his or her stated preferences and past purchases.

Figure 1. A Blue Martini web store recommends wines based on a customer’s ratings

The second approach, illustrated in Figure 2, is to issue recommendations based on what the customer is currently viewing. In this case, a customer looking at jeans is shown a shirt that would look good with them.
This sort of recommendation, known as a cross-sell, can be based on the merchandiser’s intuition alone, or on association rules found through data mining. Essentially, association rules capture the likelihood that a customer who has already selected product A will select product B. Association rules found through data mining can also be used to define product assortments. After completing the tutorial, you will understand how to use the Blue Martini data mining manager to implement both score-based and association-rule-based recommendations.

Figure 2. A Blue Martini-powered cross-sell recommendation

1.1.4 Predicting the Future
Frequently, the goal of data mining is to use data from the past to make predictions about the future. How long will this customer remain active? How will a promotion on girls’ shorts affect sales of tank tops? What response rate can we expect for the upcoming e-mail campaign? All of these future-oriented questions can be answered by finding the drivers for similar outcomes in the past. The trick is to find patterns in data from the more distant past to explain events observed in the recent past. When data from the recent past is fed into such models, the result is predictions about the future.

To build predictive models, you must pay attention to certain things that are less important when building simpler profiles. In particular, since all the data used to create the models is from the past (when the outcomes are already known), it is possible for information that will not actually be available when the model is deployed to “leak” into the model. This tutorial teaches you how to recognize and eliminate such leaks in order to build more useful predictive models.

2 Overview of the Tutorial
The goal of this tutorial is to introduce you to important data mining concepts while introducing some of the most important features of Blue Martini Integrated Analytics.
The tutorial uses data from the fictional Blue Planet web store to show how the Data Mining Manager and Analysis Center are used to address actual business needs that arise in real companies.

2.1 What You Will Learn
This tutorial covers the entire data mining process from selecting the right data to mine to publishing the final results as interactive reports for business users. Along the way, you will be exposed to the tools you need to:
• Create tables, charts, and visualizations
• Create new variables
• Transform existing variables in various useful ways
• Summarize transaction data to the appropriate level of aggregation
• Publish results to the web-based Analysis Center using JSP
• Create profiles, rules, and scores
• Detect and correct data anomalies
• Generate cross-sell rules from product associations
• Build predictive models
• Evaluate data mining results
• Take action based on data mining results

Of course, in a brief tutorial, none of these topics are treated in depth. The most fundamental skills are introduced first in a detailed, click-by-click manner. As the tutorial progresses, the instructions become less detailed
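Section 1.1.3 describes association rules as capturing the likelihood that a customer who has already selected product A will select product B. One common way to express that likelihood is the rule's confidence: of the baskets that contain A, the fraction that also contain B. The sketch below is a minimal illustration under that assumption; the `confidence` function and the sample baskets are hypothetical and are not taken from the Blue Martini product.

```python
from typing import List, Set


def confidence(baskets: List[Set[str]], antecedent: str, consequent: str) -> float:
    """Estimate the confidence of the rule antecedent -> consequent.

    Confidence is the share of baskets containing the antecedent
    that also contain the consequent.
    """
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0  # rule cannot be evaluated without any matching basket
    with_both = sum(1 for b in with_antecedent if consequent in b)
    return with_both / len(with_antecedent)


# Invented sample data: each set is one customer's basket.
baskets = [
    {"jeans", "shirt"},
    {"jeans", "shirt", "belt"},
    {"jeans"},
    {"shirt"},
    {"jeans", "belt"},
]

jeans_to_shirt = confidence(baskets, "jeans", "shirt")
print(f"confidence(jeans -> shirt) = {jeans_to_shirt:.2f}")
```

Here four baskets contain jeans and two of those also contain a shirt, so the rule "jeans, then shirt" has a confidence of 0.5. A merchandiser could use a threshold on this value to decide which cross-sell rules are worth shipping to the store.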

Description:
An analysis chain or data mining chain starts with a data source and ends with a … rules can also be shipped from the data mining manager to the web store, …