MASTER'S THESIS Sales Prediction for Pharmaceutical Distribution Companies A Data Mining Based Approach Neda Khalilzadeh Master (120 credits) Master of Science in Electronic Commerce Luleå University of Technology Department of Business Administration, Technology and Social Sciences MASTER'S THESIS Sales Prediction for Pharmaceutical Distribution Companies- A Data Mining Based Approach Supervisors: Prof. Mohammad Mehdi Sepehri Prof. Peter Naude Referees: Dr. Amir Albadvi Dr. Lars Backstrom Prepared by: Neda Khalilzadeh Tarbiat Modares UniversityFaculty of Engineering Department of Industrial Engineering Lulea University of Technology Department of Business Administration and Social Sciences Division of Industrial Marketing and E-Commerce JOINTMSc PROGRAM IN MARKETING AND ELECTRONIC COMMERCE 2008 ABSTRACT Due to the tough competitions that exist today, most pharmaceutical distribution companies are in acontinuous effort to increasetheir profits and reduce their costs. Actually, both shortage and surplus of goods can lead to loss of income for these companies. One of the problems in pharmaceutical distribution organizations which deal with public health and pharmaceutical products ishow tocontrol inventory levels by means of accurate sales prediction in order to prevent costs of excessive inventory also prevent losing their customers because of drug shortage.Accurate sales prediction is certainly a valuable management tool to meet the mentioned goals since this leads to improved customer service, also, reduced lost sales and costs. However, most pharmaceutical distribution companies in Iran are still using heuristic or traditional statistical techniques to make sales prediction for their products. Thus, the purpose of this research is to apply an innovative and reliablesales prediction method for pharmaceutical distribution companies. To make sales prediction for a pharmaceutical distribution company, we needed to have past sales records of each drug. Accordingly, we gathered sales data of three years from Pakhsh Hejrat Co. which is one of the leading pharmaceutical distributors in Iran. We choseneural networks as our basic tools for sales prediction since most traditional methods like ARIMA are incapable of modeling nonlinearities that exist in most real data; also, they need forecaster’s supervision fortheparameter estimation phase. In fact, neuralnetworks are versatile tools for sale prediction sinceestimation with neural networks can be automatized,and theyhave proved very effective in order to make prediction by handling non-linearinput and output variables. Due to the fact that we did not have enough past sales records of drug items, we came up to a new idea of grouping drugs to find group members and make use of co-members’ sales data for each other. Thus, we did a comprehensive network based analysis in order to find clique-setsand group members. Afterwards, we built sales forecasting models with three different approaches: a) ARIMA methodology for time series forecasting, b) Hybrid neural network approach for time series forecasting by means of each drug’s past recodes,and c) Hybrid neural network approach for time series forecasting by means of each drug’s past records and its group members’past records. Ourevaluations and results indicated that our new methodology (number 3 above)was the best methodology, and the weakest one was ARIMA model. Keywords: Pharmaceutical Distribution Companies, Data Mining, Time Series Sales Forecasting,Network Based Analysis,Neural Networks 1 ACKNOWLEDGMENTS Actually, this researchwas carried out from spring of 2007 till spring of 2008 atTarbiat Modares University (TMU) in fulfillment of the MSc Joint program in marketing and e- commerce with Lulea University of Technology (LTU). While working on this thesis, I have learned manyunknown issues, which are really of great value to me. Here, I wouldlike to express my gratitude to all who helped me with their worthwhile supports during all stepsof this thesis. First of all, I would like to express my appreciation to my supervisors Prof. Mohammad Mehdi Sepehriand Prof. Peter Naudefor all their patience, kind supports, encouragements, and helpful and valuable advicewhile performing this research. Without their encouragements and valuable comments,I was not able to fulfill this project research. I also would like to thank Prof. Amir Albadvi, the head of Industrial Engineering Department of TMU,and Prof. Esmail Salehi-Sangari,the chairman of Industrial Marketing and E-commerce Division of LTU,for theirstrong and continuous supports in arranging this joint master program. Furthermore, I wantto express my thanks and regards to Dr. Hamid Farvaresh, the PHD candidatein Industrial Engineering Department in TMU, for his valuable and great helps and supports, in all steps of this research. In addition, I would like to extend my special thanks to the experts and sales managers of pharmaceutical distribution companies especially to Dr. Mostafavi and Prof. Sarkandi who guided me with their valuable comments and ideas. I also want to thank Mr. Haj Sadeghi for providing me with Pakhsh Hejrat’s past sales records. And last but not least, I want to thank my families and friends, specially my parentsand my dear brother, for their encouragements, patience, and kind supports while accomplishing this research. I would like to dedicate my thesis tomy lovely parents in order to express my deep appreciation towardsthem for their unending and notable supports in all aspects of my life. Neda Khalilzadeh 24thMarch 2008 2 TABLE OF CONTENTS CHAPTER 1...................................................................................................................................9 INTRODUCTION..........................................................................................................................9 1. Introduction.............................................................................................................................9 1.1.Background ofthe Study....................................................................................................10 1.2. Motivation of the Study......................................................................................................15 1.3. Pharmaceutical Distribution Companies in Iran................................................................16 1.4. Problem Discussion............................................................................................................19 1.4.1. Problem Definition......................................................................................................20 1.5. Research Objectives and Question.....................................................................................23 1.6. Research Framework..........................................................................................................24 1.7. Contributions of the Research............................................................................................25 1.8. Outline of the Thesis..........................................................................................................26 CHAPTER 2.................................................................................................................................28 DEFINITIONS AND LITERATURE REVIEW..........................................................................28 2. Definitions and Literature Review........................................................................................28 2.1. Data Mining and Knowledge Discovery............................................................................29 2.1.1. An Overview of Data Mining......................................................................................29 2.1.2. Difference between knowledge Discovery and DataMining.........................................30 2.1.3. Descriptions of the Knowledge Discovery and Data Mining in Literatures...............31 2.1.4. The Process of Data Mining and Knowledge Discovery............................................32 2.1.5. Data Mining Machine Learning and Statistics............................................................34 2.1.6. Data Mining Tasks and Techniques............................................................................35 2.1.7. Choose Appropriate Data Mining Techniques............................................................38 2.1.8. Data Mining Applications...........................................................................................38 2.2. Neural Networks................................................................................................................41 2.2.1. Neural Networks Architectures...................................................................................43 2.2.2. Learning Process..........................................................................................................47 2.2.3. Back Propagation Algorithm.......................................................................................49 3 2.2.4. Single-layer Perceptron...............................................................................................51 2.2.5. Multilayer Perceptron (MLP)......................................................................................52 2.2.6. Radial basis function network (RBF)..........................................................................54 2.2.7. Time Delay Neural Networks (TDNN).......................................................................55 2.2.8. Recurrent neural networks...........................................................................................56 2.2.9. Benefits and Limitations of Neural Networks.............................................................56 2.2.10. Applications of Neural Networks..............................................................................58 2.3. Supply Chain......................................................................................................................63 2.3.1. Inventory and Inventory Management........................................................................65 2.3.2. Participants in the Supply Chain.................................................................................66 2.3.3. Logistics.......................................................................................................................67 2.3.4. Why Demand Forecasting is Essential in a Supply Chain?.........................................68 2.4. Forecasting.........................................................................................................................69 2.4.1. Sales Forecasting.........................................................................................................71 2.4.2. Time Series Forecasting..............................................................................................71 2.4.3. Time Series Forecasting Models.................................................................................79 2.4.4. Comparisons of Different Time Series Forecasting Models.......................................81 CHAPTER 3.................................................................................................................................85 RESEARCH METHODOLOGY..................................................................................................85 3. Research Methodology..........................................................................................................85 3.1. Research Design.................................................................................................................86 3.1.1. Research Purpose.........................................................................................................87 3.1.2. Research Approach......................................................................................................88 3.1.3. Research Strategy........................................................................................................89 3.2. Research Process................................................................................................................89 3.2.1. Data Collection and Description..................................................................................92 3.2.2. Data Preparing.............................................................................................................93 3.3. Exploratory Analysis..........................................................................................................98 3.4. NetworkBased Analysis....................................................................................................99 3.5. Methods Used for Modeling and Time Series Forecasting...............................................99 CHAPTER 4...............................................................................................................................104 4 RESULTS AND ANALYSIS.....................................................................................................104 4. Results and Analysis............................................................................................................104 4.1. Data preprocessing...........................................................................................................105 4.2. Exploratory Analysis........................................................................................................107 4.2.1. Data Visualization.....................................................................................................107 4.2.2.Descriptive Statistics.................................................................................................111 4.3. Statistical Analysis of Drugs’ Networks..........................................................................113 4.3.1. Network Identification...............................................................................................114 4.4. Building Sales Forecasting Model...................................................................................125 4.4.1. ARIMA Methodology for Time Series Forecasting..................................................128 4.4.2. A Hybrid Neural Network Approach for Time Series Forecasting...........................138 4.4.3. Model Evaluation......................................................................................................144 4.5. Summary..........................................................................................................................147 CHAPTER 5...............................................................................................................................148 CONCLUSION...........................................................................................................................148 5. Conclusion...........................................................................................................................148 5.1. Conclusion........................................................................................................................148 5.2. Contributions....................................................................................................................152 5.2.1. Methodological Contribution....................................................................................152 5.2.2. Empirical Contribution..............................................................................................152 5.3. Managerial Implications...................................................................................................153 5.4. Research Limitations........................................................................................................156 5.5. Further Research..............................................................................................................157 REFERENCES...........................................................................................................................159 Ho, T.B., (2002), “Introductionto Knowledge Discovery and Data Mining”, Available at: ttp://www.netnam.vn/unescocourse/knowlegde/know_frm.htm................................................164 APPENDIX 1..............................................................................................................................172 Tables of Prediction Results with Neural Networks...................................................................172 APPENDIX 2..............................................................................................................................184 Definitions of Codesin Result Tables of Forecasting................................................................184 5 LIST OF TABLES Table 1: Applications of data mining in marketing and Business................................................41 Table 2: Applications of Neuronal Networks in Business Forecasting........................................62 Table 3: Related Works on Applications of Neural Networks in Sales and Demand Forecasting77 Table 4: Sales Data Information Collected from Pakhsh Hejrat Company..................................93 Table 5: Amount of Indices for Different өs................................................................................97 Table 6: Descriptive Measures of 21 Drugs...............................................................................112 Table 7: Cliques Sets..................................................................................................................116 Table 8: Descriptive Statistics of Degrees..................................................................................120 Table 9:Frequency and Cumulative Frequency of Degrees.......................................................121 Table 10: Group Members..........................................................................................................128 Table 11: Parameter Estimation..................................................................................................135 Table 12: Four Months’ Forecasting Results for the Original Data...........................................135 Table 13: Sensitivity Analysis of Drug 71 for the best 5 models of 20 tried models.................144 Table 14: Forecasting Results and Accuracy Measures of Test Data for Drug 24 (with Its Own Past Records)..............................................................................................................................146 Table 15: Forecasting Results and Accuracy Measures of Test Data for Drug 24 (with Its Partner’s Past Records)...............................................................................................................146 Table 16: Forecasting Results and Accuracy Measures of Test Data for Drug 71 (with Its Own Past Records)..............................................................................................................................146 Table 17: Forecasting Results and Accuracy Measures of Test Data for Drug 71 (with Its Partner’s Past Records)...............................................................................................................146 6 LIST OF FIGURES Figure 1: Pharmaceutical Distribution Channel............................................................................18 Figure 3: A Schematic Representation of the Research Framework............................................25 Figure 4: Outline of the Thesis....................................................................................................27 Figure 5: A Typical knowledge Discovery Process......................................................................30 Figure 6: Combining cognitive psychology with artificial intelligence, databases, and statistics31 Figure 7: Application of Data Mining for Marketing...................................................................40 Figure 8: A Simple Neural Network.............................................................................................44 Figure 9: A Neural Network with one Hidden Layer...................................................................44 Figure 10: A Neural Network with a Wide Hidden Layer............................................................45 Figure 11: A Neural Network with Multiple Outputs...................................................................45 Figure 12: The Unit of an Artificial Neural Network...................................................................46 Figure 13: Three Common Transfer Functions............................................................................47 Figure 14: A Back-Propagation Network Model..........................................................................51 Figure 15: Structure of the MLP...................................................................................................53 Figure 16: The RBF Neural Network Topology...........................................................................55 Figure 17: TDNN Architecture.....................................................................................................55 Figure 18: Recurrent neural network............................................................................................56 Figure 19: A time-delay neural network.......................................................................................60 Figure 20: An Example of a Non-Stationary Time Series............................................................73 Figure 21: An Example of a Stationary Time Series....................................................................74 Figure 22: Research Design of this Study.....................................................................................86 Figure 23: Research Process.........................................................................................................91 Figure 24: Amount of Indices for Different өs.............................................................................97 Figure 25: Time Series Plot of Sales Data for Drug 211............................................................108 Figure 26: Time Series Plot of Sales Data for Drug 174............................................................108 Figure 27: Time Series Plot of Sales Data for Drug 24..............................................................109 Figure 28: Multiple Time Series Plot of 3 Drugs.......................................................................109 Figure 29: 3D Surface Plot for Drugs 24, 78, 84........................................................................110 Figure 30: 3D Surface Plot for Drugs 24, 31, 37........................................................................110 Figure 31: 3D Surface Plot for Drugs 24, 174, 86......................................................................111 Figure 32: A Part of Output of UCINET for Clique-sets ...........................................................118 Figure 33: Cluster (Tree) Diagram.............................................................................................119 Figure 34: A Part of Output of Degree Centrality Measures......................................................120 Figure 35: Degree Distribution of the market graph...................................................................121 Figure 36: Networks of Drugs....................................................................................................123 Figure 37: Network of Drugs after Deleting Isolated Nodes and Nodes with Degree 1............123 Figure 38: Network of Clique Indicators....................................................................................125 Figure 39: Time Series Plot of Drug 24 Sales Data....................................................................130 7 Figure 40: Autocorrelation Diagram...........................................................................................130 Figure 41: Partial Autocorrelation Diagram...............................................................................131 Figure 42: Time series plot of drug 24 sales data after differentiation and log transforming....131 Figure 43: Auto-correlation Function of Drug 24 Sales Data.....................................................133 Figure 44: Partial Auto-correlation Function of Drug 24 Sales Data.........................................134 Figure 45: Plot of Forecasting.....................................................................................................136 Figure 46: Normal Probability Plot of Residuals for ARMA (1, 1, 0).......................................137 Figure 47: Autocorrelation Function of Residuals......................................................................137 Figure 48: Partial Autocorrelation Function of Residuals.........................................................138 8
Description: