Detecting Adverse Drug Reactions in the General Practice Healthcare Database

Jenna Marie Reps

School of Computer Science
University of Nottingham

A thesis submitted for the degree of Doctor of Philosophy

2014

Acknowledgements

I would like to acknowledge and thank my supervisors, Professor Jonathan Garibaldi and Professor Uwe Aickelin, whose guidance made this thesis possible, my family for their continued support during my PhD, and my friends for helping to keep me motivated.

Abstract

The novel contribution of this research is the development of a supervised algorithm that extracts relevant attributes from The Health Improvement Network database to detect prescription side effects. Prescription drug side effects are a common cause of morbidity throughout the world. Methods that aim to detect side effects have historically been limited by the data available, but some of these limitations may be overcome by incorporating longitudinal observational databases into pharmacovigilance. Existing side effect detection methods using longitudinal observational databases have shown promise at becoming a fundamental component of post-marketing surveillance, but unfortunately have high false positive rates. Because of these high false positive rates, an extra step is required to further analyse and filter the potential side effects they detect, and this reduces their efficiency. In this thesis a novel methodology, the supervised adverse drug reaction predictor (SAP) framework, is presented that learns from known side effects and identifies patterns that can be utilised to detect unknown side effects. The Bradford-Hill causality considerations are used to derive suitable attributes as inputs into a learning algorithm. Both supervised and semi-supervised techniques are investigated due to the limited number of definitively known side effects. The results showed that the SAP framework implementing a random forest classifier outperformed the existing methods on The Health Improvement Network longitudinal observational database, with AUCs ranging between 0.812 and 0.937, an overall MAP of 0.667, precision values between 0.733 and 1, and a false positive rate ≤ 0.013. When applied to the standard reference, the SAP framework implementing a support vector machine obtained a MAP score of 0.490, an average AUC of 0.703 and a false positive rate of 0.16. This false positive rate is lower than that obtained by existing methods on the standard reference.
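The abstract outlines the SAP framework at a high level: attributes derived for each drug-event pair (guided by the Bradford-Hill considerations) are used to train a classifier on pairs whose side-effect status is already known, and the trained model then scores candidate pairs. The snippet below is only a minimal sketch of that idea; the random placeholder data, the five unnamed attribute columns and the use of scikit-learn are assumptions for illustration, not the implementation evaluated in the thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder attribute matrix: one row per drug-event pair, with columns
# standing in for Bradford-Hill derived attributes (association strength,
# temporality, specificity, ...). Random values are used purely as a stand-in.
rng = np.random.default_rng(0)
X = rng.random((400, 5))
y = rng.integers(0, 2, 400)  # 1 = known ADR, 0 = known non-ADR

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Supervised variant: fit a random forest on pairs with known labels.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Score held-out pairs; higher scores suggest a stronger ADR signal, and the
# resulting ranking can be summarised with AUC, one of the measures reported
# in the abstract.
scores = clf.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, scores))
```

A random forest is a natural choice for this kind of sketch because it handles mixed, possibly correlated attributes without heavy preprocessing; the abstract reports it as the best-performing classifier within the SAP framework on the THIN database.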
Contents

Contents
List of Figures
Nomenclature
1 Introduction
  1.1 Background & Motivation
  1.2 Aims & Objectives
    1.2.1 Hypotheses
    1.2.2 Objectives
  1.3 Thesis Organisation
  1.4 Contribution to Knowledge
2 Literature Review
  2.1 Current Pharmacovigilance
    2.1.1 Introduction
    2.1.2 Causality
    2.1.3 Spontaneous Reporting Databases
      2.1.3.1 Overview
      2.1.3.2 Causality
      2.1.3.3 Limitations
      2.1.3.4 Summary
    2.1.4 Longitudinal Observational Databases
      2.1.4.1 Introduction
      2.1.4.2 Methods
      2.1.4.3 Causality
      2.1.4.4 Limitations
      2.1.4.5 Summary
    2.1.5 Combining Multiple Databases
      2.1.5.1 Overview
      2.1.5.2 Summary
    2.1.6 Pharmacovigilance Summary
  2.2 Pattern Recognition
    2.2.1 Supervised Learning
      2.2.1.1 Introduction
      2.2.1.2 Classifiers
      2.2.1.3 Ensemble Methods
      2.2.1.4 Supervised Learning Summary
    2.2.2 Semi-Supervised Learning
      2.2.2.1 Introduction
      2.2.2.2 Semi-Supervised Clustering
      2.2.2.3 Metric Learning
      2.2.2.4 Semi-Supervised Learning Summary
    2.2.3 Pattern Recognition Summary
  2.3 Literature Review Summary
3 Existing Methods Comparison
  3.1 Introduction
  3.2 Motivation
  3.3 Existing Methods
    3.3.1 TPD
    3.3.2 MUTARA & HUNT
    3.3.3 ROR
  3.4 Determining Labels
    3.4.1 ADR Labels
      3.4.1.1 Online
      3.4.1.2 SIDER
    3.4.2 Noise Labels
  3.5 Measures
    3.5.1 Natural Thresholds
    3.5.2 Ranking Ability
  3.6 General Comparison
    3.6.1 Method
    3.6.2 Results
    3.6.3 Discussion
  3.7 Specific Comparison
    3.7.1 Method
    3.7.2 Results and Discussion
  3.8 Summary
4 Incorporating Causation
  4.1 Introduction
  4.2 Motivation
    4.2.1 Data Cleansing
    4.2.2 Data Extraction
      4.2.2.1 Formulation
      4.2.2.2 Extraction
    4.2.3 Data Derivation
      4.2.3.1 Association Strength
      4.2.3.2 Temporality
      4.2.3.3 Specificity
      4.2.3.4 Biological Gradient
      4.2.3.5 Experimentation
      4.2.3.6 Other Criteria
      4.2.3.7 THIN Specific
      4.2.3.8 A Note on Dependency
    4.2.4 Data Description
    4.2.5 Data Transformation
      4.2.5.1 Continuous Attributes
      4.2.5.2 Discrete Attributes
    4.2.6 Feature Selection
  4.3 Summary
5 Developing The ADR Learning Framework
  5.1 Introduction
  5.2 Motivation
  5.3 Algorithms
    5.3.1 Supervised ADR Predictor
      5.3.1.1 Training Stage
      5.3.1.2 Prediction Stage
      5.3.1.3 Results and Analysis
      5.3.1.4 Summary
    5.3.2 Semi-Supervised ADR Predictor
      5.3.2.1 Self Training Random Forest
      5.3.2.2 Semi-supervised Clustering
      5.3.2.3 Results and Analysis
      5.3.2.4 Summary
  5.4 Summary
6 Evaluating The ADR Learning Framework
  6.1 Introduction
  6.2 Motivation
  6.3 Evaluation using the Standard Reference
    6.3.1 Method
    6.3.2 Results
    6.3.3 Discussion
  6.4 Specific Comparison
    6.4.1 Method
    6.4.2 Results
      6.4.2.1 Nifedipine
      6.4.2.2 Ciprofloxacin
      6.4.2.3 Ibuprofen
      6.4.2.4 Budesonide
      6.4.2.5 Naproxen
    6.4.3 Discussion
  6.5 Summary
7 Conclusions
  7.1 Contributions
  7.2 Future Work
  7.3 Dissemination
    7.3.1 Journal Papers
    7.3.2 Conference Papers
A The THIN Database
B Drugs
C Software Details and Preliminary Work
  C.1 Software Details
  C.2 Wrapper Feature Selection
  C.3 Preliminary Work
D SAP Result Tables
References

List of Figures

2.1 Illustration of data contained in SRS databases.
2.2 An example entity relationship diagram for an SRS database based on the FAERS database.
2.3 The online form for submitting suspected ADRs via the Yellow Card Scheme in the UK.
2.4 Illustration of patients’ longitudinal data contained in the THIN database.
2.5 Illustration of the counterfactual theory of causation.
2.6 Illustration of the disproportionality methods.
2.7 Illustration of the TPD method.
2.8 Illustration of the MUTARA and HUNT methods.
2.9 Illustration of a classifier partitioning the attribute space. Using the training data (blue dots are labelled as ADR and red as non-ADR) a function is trained to partition the space into ADR sections and non-ADR sections. This can then be used to predict whether a new data point is an ADR or non-ADR based on where the data point lies in the attribute space (a minimal sketch of this idea follows this list).
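Figure 2.9, the last entry above, describes supervised classification geometrically: labelled training points are used to partition the attribute space, and a new point is classified by the region it falls into. The following minimal sketch illustrates that picture in two dimensions; the hand-made data points and the choice of a decision tree are assumptions for illustration only, not the classifiers compared in the thesis.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Two hypothetical attributes per drug-event pair; label 1 = ADR, 0 = non-ADR.
X_train = np.array([[0.9, 0.8], [0.8, 0.9], [0.7, 0.8],   # ADR examples
                    [0.1, 0.2], [0.2, 0.1], [0.3, 0.2]])  # non-ADR examples
y_train = np.array([1, 1, 1, 0, 0, 0])

# Fitting the classifier partitions the attribute space into ADR and non-ADR
# regions, as depicted in Figure 2.9.
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A new drug-event pair is classified by the region its attributes fall into.
new_point = np.array([[0.75, 0.85]])
print("Predicted label:", clf.predict(new_point)[0])  # expected: 1 (ADR)
```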
