
Ali Dag_Dissertation_Final_Submission PDF

126 Pages·2016·1.95 MB·English

A Data Driven Framework to Identify the Critical Variables, Visualize Their Conditional Relations and Predict the Outcomes of U.S. Heart Transplants

by

Ali Dag

A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
August 6, 2016

Keywords: Data Mining, Bayesian Belief Networks, Healthcare Analytics, Medical Decision Making, Transplantation, United Network for Organ Sharing (UNOS)

Copyright 2016 by Ali Dag

Approved by

Fadel M. Megahed, Chair, Assistant Professor of Industrial and Systems Engineering
Jorge Valenzuela, Professor of Industrial and Systems Engineering
Richard Sesek, Professor of Industrial and Systems Engineering
Mark Carpenter, Professor of Mathematics and Statistics

Abstract

Predicting the survival of heart transplant patients is an important yet challenging problem, since it plays a crucial role in understanding the donor-recipient matching procedure. Recent studies have shown that data mining models can effectively analyze and extract novel information from large, complex transplantation datasets. The objective of this dissertation is to extract hidden, novel, and useful information from these heart transplant datasets using data mining techniques, helping decision makers gain a better understanding of transplant survival. Specifically, this work: 1) identifies the predictive factors for short-, mid-, and long-term survival after heart transplantation, as well as their time-dependent effects at each follow-up time point, which allows us to differentiate the factors whose effects change over time; 2) develops a decision support system (DSS) tool that provides a patient-specific failure risk score based on the values of the relevant preoperative predictors, and investigates the conditional relations among the important predictors of long-term survival after heart transplantation; and 3) presents an exploratory, still-in-progress study that evaluates the effect of newly added variables on the predictability of the survival outcome. Overall, the research goal is to develop mathematical models and tools that present important retrospective findings, which can serve as the basis for prospective medical studies.

Acknowledgments

Firstly, I would like to express my sincere thanks to my advisor, Dr. Fadel Megahed, for his continuous support throughout my PhD study. From him, I have learned how to conduct good research that has a high impact on society. His help, patience, and endless support guided me during the research and writing process. Without him, I could not have completed this process. I would also like to thank Dr. Richard Sesek, Dr. Carpenter, and Dr. Valenzuela for their encouragement and support throughout my PhD.

I would also like to thank my wife, Zeynep, for her patience and love, which enabled me to complete this challenging process. I would also like to express my thanks to my father (Hasan Dag), mother (Secil Dag), mother-in-law (Nesrin Taspinar), father-in-law (Tahir Taspinar), and my brothers and sisters for their valuable sacrifices and support. Last but not least, I would like to express my appreciation to my primary school teacher (Selma Yilmaz) for her tremendous efforts to support me. Without her during those difficult years, I would not have earned a PhD in the U.S.

This work was supported in part by Health Resources and Services Administration contract 234-2005-370011C.
The content is the responsibility of the authors alone and does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.

Table of Contents

Abstract
Acknowledgments
Preface
1 Introduction
  1.1 Problem Description and Significance
  1.2 Research Objectives
  1.3 Dissertation Layout
  1.4 References
2 Predicting Heart Transplantation Outcomes through Data Analytics
  2.1 Abstract
  2.2 Introduction
  2.3 Methodology
    2.3.1 Data Preparation
      2.3.1.1 Data Cleaning
      2.3.1.2 Data Inclusion Criteria
      2.3.1.3 Sampling Methods
    2.3.2 Data Analytics Models
      2.3.2.1 Support Vector Machine
      2.3.2.2 Artificial Neural Networks
      2.3.2.3 Decision Trees
      2.3.2.4 Logistic Regression
    2.3.3 Sensitivity Analysis of Predictor Variables
    2.3.4 Information Fusion
  2.4 Results and Discussion
    2.4.1 Data Analytic Model Results
    2.4.2 Information Fusion-based Sensitivity Analysis Results
  2.5 Conclusions and Future Recommendations
  2.6 References
3 A Preoperative Recipient-Donor Heart Transplant Survival Score
  3.1 Abstract
  3.2 Introduction
  3.3 Proposed Method
    3.3.1 Data Acquisition and Preparation
    3.3.2 Variable Selection Methods
      3.3.2.1 Data Mining-based Variable Selection Models
      3.3.2.2 Genetic Algorithms (GA)
      3.3.2.3 Ridge Regression
      3.3.2.4 Variable Selection through Cox Survival Analysis Regression Model and Literature Review
      3.3.2.5 Creating Possible Predictor Sets
    3.3.3 Use of Bayesian Belief Networks
  3.4 Results and Discussion
    3.4.1 Variable Selection Results
      3.4.1.1 Data Mining-based Variable Selection Results
      3.4.1.2 Variable Selection Results based on the Cox Model
      3.4.1.3 Variable Selection based on the Literature
    3.4.2 The Union Set of Data Mining, Cox Regression and Domain-Experts Predictors
    3.4.3 BBN Model Results
  3.5 A Decision Support Tool for Providing Insights to Medical Practitioners
  3.6 Conclusions and Future Recommendations
  3.7 References
4 An Exploratory Study to Evaluate the Effect of the Newly Added Variables to the Predictability of the Heart Transplant Outcomes
  4.1 Abstract
  4.2 Introduction
  4.3 Methodology
    4.3.1 Data Acquisition and Preparation
      4.3.1.1 Data Cleaning and Differentiating the Newly Added Variables
      4.3.1.2 Data Inclusion Criteria
    4.3.2 Variable Selection
      4.3.2.1 Fast Feature Selection (FFS) via Information Gain Analysis
      4.3.2.2 Random Forests
    4.3.3 Prediction Models
      4.3.3.1 Tree Augmented Naïve (TAN) Bayesian Belief Network
      4.3.3.2 Logistic Regression
  4.4 Results and Discussion
    4.4.1 Variable Selection Results
    4.4.2 Prediction Results
    4.4.3 Sensitivity Analysis Results
  4.5 Conclusions and Future Recommendations
  4.6 References
5 Conclusion and Summary of Dissertation Contributions

List of Tables

Table 2.1: Number of Survivals, Failures, and Excluded Observations over Three Time Points
Table 2.2: The List of the Data Analytic Models Used for Each Time Period
Table 2.3: Classification Results of Models for 1-, 5-, and 9-year Time Points
Table 2.4: The Agreement of Four Models on the Important Variables for Each Time Point
Table 2.5: A Numeric Comparison of the Number of Important vs. Unimportant Variables for Each IF Model
Table 3.1: Results of the Six Evaluation Metrics for the C&RT and ANN for 10-fold Samples
Table 3.2: Data Mining Models Variable Set (DMVS)
Table 3.3: Cox Model Variable Set
Table 3.4: BBN Variables
Table 3.5: BBN Classification Results
Table 3.6: Performance of the BBN with Different Cutoffs for the cscores
Table 4.1: Variables that are Added to UNOS Heart Transplant Databases after 2004
Table 4.2: Number of Survivals, Failures and Excluded Observations over the Time Points
Table 4.3: The Number of the Features Selected through Variable Selection Methods and Literature Review
Table 4.4: Variables that are Selected through Different Time Points
Table 4.5: The List of the Prediction Models Used for Each Time Period
Table 4.6: Prediction Results Obtained through Including and Excluding the Newly Added Variables

List of Figures

Figure 2.1 An Overview of the Proposed Hybrid Data Analytic Approach
Figure 2.2 The Importance of the Variables through Three Time Periods
Figure 3.1 An Overview of the Proposed Methodology
Figure 3.2 Tree-Augmented Naïve Bayes Structure
Figure 3.3 Sensitivity Analysis for ML-based Variable Selection Models
Figure 3.4 The (Fused) Importance of the Union Set of Predictors based on the IF Model
Figure 3.5 TAN Structure of the Proposed Method
Figure 3.6 The Interface of the Decision Support Tool
Figure 4.1 An Overview of the Proposed Methodology
Figure 4.2 The Most Important Contributory Predictors for 1-month Survival Prediction
Figure 4.3 The Most Important Contributory Predictors for 1-year Survival Prediction
Figure 4.4 The Most Important Contributory Predictors for 5-year Survival Prediction

Preface

This dissertation is submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Industrial and Systems Engineering. The completion of this dissertation involved many steps. In the fall semester of 2013, I met with Dr. Megahed to discuss a research concept I was interested in: applying data mining models to extract useful information from heart transplant datasets. After several consecutive meetings, we developed a stream of research ideas involving data analytics applications for survival prediction after heart transplantation. The first paper in the stream used several data mining models to predict short-, mid-, and long-term survival after heart transplantation. Dr. Megahed recommended this idea because it allowed us to differentiate the factors whose importance for survival varies over time.
After completing the analyses, we shared our findings with Dr. Serkan Bulur and Dr. Hussam Farhoud, cardiologists (MDs) who have been doing research in this area. Both contributed their expertise in the field, which significantly improved the discussion section of the paper. I then presented this study at the INFORMS 2014 DMA (Data Mining & Analytics) Workshop in San Francisco, California. Afterwards, we wrote it up as a journal article and submitted it to the Decision Support Systems (DSS) journal. After revising the study, we resubmitted it to the same journal, where it is currently under second review. This study is presented in Chapter 2.
