Record linkage study to discriminate between community-acquired, healthcare-associated and hospital-acquired bloodstream infection in children in England Thesis presented for the degree of Doctor of Philosophy Katherine Louise Henderson Institute of Child Health, University College London 2017 Declaration I, Katherine Louise Henderson, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis. Signature: Date: 2 Abstract Current national guidance for empiric antibiotic therapy of children with bloodstream infections (BSIs) indicates that treatment options are dependent on whether a BSI is community-acquired (CA) or hospital-acquired (HA). Recent changes in healthcare delivery have seen an emergence of patients from the community with healthcare-associated (HCA) BSIs who require different antibiotics compared to ‘classic’ CA BSIs; research in this field has predominantly concentrated on adults. This thesis aims to predict the relative proportions of CA, HCA and HA BSIs in children aged 1 month to 5 years by identifying demographic and clinical characteristics associated with these three groups using two datasets. This thesis introduces the structure of the microbiology laboratory surveillance (LabBase2) and clinical (Hospital Episode Statistics) datasets and describes the probabilistic methodology used to link them, encompassing the match-weight calculation, threshold selection and evaluation of the final linked dataset. Three statistical approaches were used to predict CA, HCA and HA BSI using indicators of child susceptibility to infection, hospital exposure and timing of BSI. The combination of CA and HCA BSI accounted for 74% or more of all BSIs, but differences in the ranges of individual CA and HCA BSI estimates from each of the three statistical approaches were too wide (difference of 48%) to discern a consistent proportion of either CA or HCA BSI; conversely, the estimated proportion of HA BSI was consistently predicted (26% or less of all BSI). The cross-tabulation of child characteristics and invasive pathogens by predicted CA, HCA and HA BSI group identified only a few low prevalence risk factors (more than two chronic conditions, indwelling devices, hospital discharge in the month prior to the BSI) that had a high predictive value for HCA as opposed to CA BSI. Pathogens historically associated with causing HA BSI, particularly Gram-negative pathogens that are harder to treat with antibiotics, were frequently isolated on or before the day of hospital admission. This thesis demonstrated the wide range of pathogens isolated before and on admission to hospital, reflecting the growing mix of children acquiring CA and HCA BSIs in England in the community setting. A few, low-frequency clinical characteristics were predictive of CA, HCA or HA BSI, however, additional data from other sources (e.g. outpatient, primary care), may help to increase the accuracy of the prediction models. The mix of pathogens isolated from children is likely to become more heterogeneous as healthcare provision outside of hospital increases, suggesting that the concept of using clinical characteristics to identify the source of BSI is becoming less relevant. This highlights the importance of developing rapid indicators, such as early bedside testing, to inform appropriate treatment strategies. 3 Acknowledgements My very sincere thanks to my supervisors Ruth Gilbert, Angie Wade, Berit Muller-Pebody and Alan Johnson for their continued support, guidance and inspiration throughout my PhD. It has been a privilege and an incredible experience to have been supervised by you. My thanks also to Mike Sharland (St. George’s Hospital), for both his clinical advice and contagious enthusiasm in advancing the field of paediatric epidemiology, and to Mehdi Minaji, for helping with the technical IT aspects of the data linkage. Thank you also to Katie Harron for introducing me to probabilistic linkage. Thank you to the members of the Improving Child Antimicrobial Prescribing group, the Institute of Child Health Early Career Research group and the St. George’s Hospital Paediatric Department, who have listened, challenged and provided useful suggestions in response to various practice presentations of my PhD work over the past five years. I would also like to thank my colleagues in the HCAI & AMR Department at PHE who have been very supportive throughout my PhD, making it possible for me to balance my job with my thesis, and for that I am extremely grateful. Thank you as well to my fellow PhD students and colleagues at ICH, it has been a pleasure to meet and work amongst such a talented and friendly team. Finally, my immeasurable thanks to my family and friends for their endless support, humour, encouragement and provision of tea; to Harry for his musical inspiration; and to Antonio for his patience and faith in me, thank you. This thesis would not have been possible without you all. Funding This work was partially-funded by PHE’s Education and Training Committee and the HCAI & AMR Department, and partially self-funded. The views and opinions expressed in this thesis are those of the author and do not necessarily reflect those of PHE or UCL ICH. 4 Contents Declaration ................................................................................................................................. 2 Abstract ................................................................................................................................. 3 Acknowledgements ........................................................................................................................ 4 Funding ................................................................................................................................. 4 Contents ................................................................................................................................. 5 Figures ................................................................................................................................. 9 Tables ............................................................................................................................... 13 Appendices ............................................................................................................................... 15 Abbreviations ............................................................................................................................... 16 Chapter 1. Background, aims and objectives .......................................................................... 18 1.1 Introduction ......................................................................................................................... 18 1.1.1 Structure and content of Chapter 1 ............................................................................ 19 1.2 Bloodstream infections (BSIs) .............................................................................................. 19 1.2.1 Community-acquired (CA) and hospital-acquired (HA) BSI ......................................... 20 1.2.2 Healthcare-associated (HCA) BSIs ............................................................................... 21 1.2.3 BSI in children .............................................................................................................. 22 1.2.4 Previous Public Health England work on children with BSI ......................................... 24 1.2.5 Existing Evidence ......................................................................................................... 24 1.3 Systematic literature review ................................................................................................ 25 1.3.1 Methods ...................................................................................................................... 25 1.3.2 Results ......................................................................................................................... 26 1.3.3 Key findings ................................................................................................................. 33 1.4 Professional opinion survey: what pathogens causing paediatric BSI are judged to be (or not be) HA BSI in the UK? ............................................................................................... 35 1.4.1 Methods ...................................................................................................................... 35 1.4.2 Results ......................................................................................................................... 36 1.4.3 Key findings ................................................................................................................. 40 1.5 Discussion ............................................................................................................................. 40 1.6 Rationale for PhD ................................................................................................................. 41 1.6.1 Aim ............................................................................................................................... 42 1.6.2 Objectives .................................................................................................................... 42 1.6.3 Thesis structure ........................................................................................................... 43 Chapter 2. Defining the cohort of children with BSIs for analysis .......................................... 44 2.1 Introduction: data sources ................................................................................................... 44 2.2 Structure and Content .......................................................................................................... 44 2.3 LabBase2: Public Health England’s surveillance system ...................................................... 46 2.3.1 Purpose ........................................................................................................................ 46 2.3.2 Background .................................................................................................................. 46 2.3.3 What data are reported to LabBase2? ........................................................................ 46 2.3.4 Capturing data on positive blood cultures .................................................................. 47 2.3.5 Coverage ...................................................................................................................... 48 2.3.6 Data structure .............................................................................................................. 49 2.3.7 Identification of infection episodes ............................................................................. 49 2.3.8 Matching BSI episode reports to the same patient episode ....................................... 50 5 2.3.9 Query design for LabBase2 data extraction ................................................................ 52 2.3.10 Patient identifiers in LabBase2 .................................................................................... 53 2.4 Hospital Episode Statistics (HES) .......................................................................................... 56 2.4.1 Purpose ........................................................................................................................ 56 2.4.2 Background .................................................................................................................. 56 2.4.3 Data format ................................................................................................................. 57 2.4.4 Patient identifiers in HES ............................................................................................. 58 2.4.5 HESid – identifying patients over time ........................................................................ 59 2.4.6 HES three-step search and matching method ............................................................. 59 2.4.7 Episodes and Spells ...................................................................................................... 60 2.4.8 Data structure .............................................................................................................. 61 2.4.9 Query design for HES data extraction ......................................................................... 61 2.5 Data sources: considerations for linkage and analysis ......................................................... 62 2.6 Information governance: permissions to access data ......................................................... 63 2.7 Cohort selection ................................................................................................................... 63 2.7.1 Cohort time period ...................................................................................................... 63 2.7.2 Exposure period ........................................................................................................... 64 2.7.3 Definition of a child with a BSI..................................................................................... 64 2.7.4 Definition of a BSI index case ...................................................................................... 64 2.7.5 Treatment of discharge and re-admission on the same day ....................................... 65 2.7.6 Age ............................................................................................................................... 66 2.8 Summary .............................................................................................................................. 66 2.9 Conclusion ............................................................................................................................ 67 Chapter 3. Calculating match-weights to probabilistically link LabBase2 and HES datasets .. 68 3.1 Introduction ......................................................................................................................... 68 3.2 Data linkage .......................................................................................................................... 68 3.2.1 Growing demand ......................................................................................................... 68 3.2.2 Methods of data linkage .............................................................................................. 70 3.2.3 Data linkage in the UK ................................................................................................. 74 3.2.4 Chosen linkage method ............................................................................................... 74 3.3 Methods: determining probabilistic match-weights............................................................ 75 3.3.1 Patient identifiers ........................................................................................................ 75 3.3.2 PHE match-weights for all ages (MW-All) ................................................................... 76 3.3.3 Calculation of child match-weights (MW-Child) .......................................................... 77 3.4 Results: determining probabilistic match-weights............................................................... 79 3.5 Methods: probabilistic linkage of LabBase2 and HES .......................................................... 83 3.5.1 Threshold selection ..................................................................................................... 85 3.6 Results: probabilistic linkage of LabBase2 and HES ............................................................. 85 3.6.1 Comparison of linked and unlinked OPIEs ................................................................... 88 3.6.2 Sensitivity analysis ....................................................................................................... 89 3.7 Results: Association between MW-All and MW-Child ......................................................... 91 3.8 Final match-weight set selection ......................................................................................... 93 3.8.1 Data management and preparation of linked cohort for analysis .............................. 93 3.8.2 Final distribution of links by MW-Child ....................................................................... 97 3.9 Discussion ............................................................................................................................. 97 3.9.1 Limitations ................................................................................................................... 97 3.9.2 Summary ...................................................................................................................... 98 Chapter 4. Clinical and demographic characteristics of children with a BSI ........................... 99 4.1 Introduction ......................................................................................................................... 99 4.1.1 Mechanisms for acquisition of a BSI.......................................................................... 101 4.2 Methods ............................................................................................................................. 105 6 4.2.1 Child susceptibility ..................................................................................................... 105 4.2.2 Direct exposure to invasive BSI from inpatient healthcare ....................................... 107 4.2.3 Timing of positive blood samples in relation to hospital admission ......................... 110 4.3 Results ................................................................................................................................ 111 4.3.1 Child susceptibility ..................................................................................................... 111 4.3.2 Direct inpatient exposure to infection ...................................................................... 115 4.3.3 Timing of positive blood samples in relation to hospital admission ......................... 117 4.4 Discussion ........................................................................................................................... 120 4.4.1 Child susceptibility to BSI........................................................................................... 120 4.4.2 Direct exposure to invasive BSI from inpatient healthcare ....................................... 120 4.4.3 Timing of positive blood samples in relation to hospital admission ......................... 121 4.4.4 Limitations of cohort ................................................................................................. 121 Chapter 5. Methods and Results: Predicting the proportion of CA, HCA and HA BSI ........... 123 5.1 Introduction ....................................................................................................................... 123 5.2 Definitions and assumptions about the linked cohort ....................................................... 125 5.2.1 Defining children with HA BSI .................................................................................... 125 5.2.2 Defining children with a priori CA BSI........................................................................ 126 5.2.3 Results ....................................................................................................................... 128 5.2.4 Summary and discussion ........................................................................................... 133 5.3 Methods of the statistical approaches A, B and C ............................................................. 134 5.3.1 Methods: approach A ................................................................................................ 137 5.3.2 Methods: approach B ................................................................................................ 139 5.3.3 Methods: approach C ................................................................................................ 141 5.3.4 Sensitivity analyses .................................................................................................... 144 5.4 Results of approaches A, B and C ....................................................................................... 144 5.4.1 Results: approach A ................................................................................................... 144 5.4.2 Results: approach B ................................................................................................... 157 5.4.3 Results: approach C ................................................................................................... 162 5.5 Chapter summary ............................................................................................................... 171 5.5.1 Discussion and limitations of the methodology ........................................................ 171 Chapter 6. Distribution of risk factors and pathogens by predicted CA, HCA and HA BSI category ............................................................................................................................. 174 6.1 Introduction ....................................................................................................................... 174 6.1.1 Chapter structure ...................................................................................................... 175 6.2 Comparing the estimated proportion of CA, HCA and HA BSI according to each method (A, B and C) ......................................................................................................................... 177 6.2.1 Methods .................................................................................................................... 177 6.2.2 Results ....................................................................................................................... 177 6.2.3 Comparing the estimated proportions of CA, HCA and HA BSI with the commonly used CA/HA BSI threshold ......................................................................................... 180 6.2.4 Section summary ....................................................................................................... 181 6.3 Distribution of risk factors by CA, HCA and HA BSI category ............................................. 182 6.3.1 Methods .................................................................................................................... 182 6.3.2 Results ....................................................................................................................... 183 6.3.3 Section summary ....................................................................................................... 187 6.4 Distribution of bacterial species by time and BSI category ................................................ 187 6.4.1 Methods .................................................................................................................... 188 6.4.2 Results ....................................................................................................................... 188 6.4.3 Section summary ....................................................................................................... 194 6.5 Pathogen distribution by predicted CA, HCA and HA BSI category ................................... 194 6.5.1 Methods .................................................................................................................... 194 7 6.5.2 Results ....................................................................................................................... 194 6.5.3 Section summary ....................................................................................................... 202 6.6 Discussion ........................................................................................................................... 202 Chapter 7. Discussion ............................................................................................................ 204 7.1 Aim of the thesis ................................................................................................................ 204 7.1.1 Objectives .................................................................................................................. 204 7.2 Key findings ........................................................................................................................ 204 7.3 Strengths and limitations ................................................................................................... 205 7.3.1 Objective 1: Link microbiology surveillance and clinical data and evaluate the effect of match-weight choice on the final linked dataset ........................................ 205 7.3.2 Objective 2: Develop statistical methods to identify patient and clinical characteristics associated with CA, HCA and HA BSI in children ............................... 207 7.3.3 Objective 3: Predict the proportion of CA, HCA and HA BSI occurring in children and the distribution of causative pathogens in each group ...................................... 208 7.4 What this thesis adds ......................................................................................................... 210 7.4.1 Implications for clinical practice ................................................................................ 210 7.4.2 Implications for Public Health ................................................................................... 210 7.4.3 Implications for Research .......................................................................................... 211 7.5 Closing remarks .................................................................................................................. 213 References ............................................................................................................................. 214 Appendices ............................................................................................................................. 234 8 Figures Figure 1.1 Flow diagram to illustrate the process of study selection using the inclusion criteria ..................................................................................................................... 23 Figure 1.2. Flow diagram to illustrate the process of study selection using the inclusion criteria ..................................................................................................................... 28 Figure 1.3. Screenshot example of the online professional opinion survey ................................ 38 Figure 1.4a-m. Professional opinion survey results from UK Paediatrician and Microbiologists for 12 pathogens for the percentage of BSIs caused by each pathogen, in their opinion, is likely to be HA .......................................................... 39 Figure 1.5. Flow diagram of steps to select appropriate antibiotic therapy for children with BSIs .......................................................................................................................... 42 Figure 2.1. Flow diagram of the steps already addressed in previous chapters (grey), those that will be addressed in this Chapter (red) and the following chapters ................ 45 Figure 2.2. Flow diagram showing the path of the specimen results data from the local laboratories to the national LabBase2 database. .................................................... 48 Figure 2.3. Example demonstrating that OPIEs are assigned based on the organism species .... 50 Figure 2.4. Example of a LabBase2 query (fictitious data) ........................................................... 52 Figure 2.5. Proportion of identifier records with a data entry for each of the six patient identifiers in LabBase2 between 2007/08 and 2010/11 for children aged 0-5 years and patients with no date of birth ................................................................. 54 Figure 2.6. Proportion of identifier records with a data entry for each of the five patient identifiers in HES between 2007/08 and 2010/11 for children aged 0-5 year ....... 58 Figure 2.7. Episodes and spells in HES ......................................................................................... 60 Figure 2.8. Example of the first restricted, patient identifier-only HES query (fictitious data) ... 61 Figure 2.9. Example of the second more comprehensive HES query (fictitious data) ................ 62 Figure 2.10. Diagram indicating cohort time-period with exposure prior to the BSI .................. 64 Figure 2.11. Examples of cohort inclusion for OPIE (1, 2, and 3) based on date of the positive blood isolate ............................................................................................................ 65 Figure 2.12. Treatment of a discharge and a re-admission episodes occurring on the same day ........................................................................................................................... 65 Figure 3.1. Flow diagram of the steps already addressed in previous chapters (grey), those that will be addressed in this Chapter (red) and the following chapters ................ 69 Figure 3.2. Data linkage error ...................................................................................................... 70 Figure 3.3. Diagram illustrating that when full patient identifiers are known and complete in both datasets, there is very high confidence in the match being true ................... 71 Figure 3.4. Diagram illustrating that when patient identifiers are missing or there are data entry errors, there is less confidence in the match being true ............................... 72 Figure 3.5. Range of possible probabilistic linkage thresholds .................................................... 73 Figure 3.6. Example dataset containing true-matches ................................................................ 77 Figure 3.7. Dataset containing true-matches (grey) and false-matches (white) ......................... 78 Figure 3.8. Agree match-weights: Radar plot overlaying the sets of match-weights and comparing the proportion that each of the eight individual patient identifiers contributes towards the sum of the total match-weight (100%). DD = 2-digit day of birth; MM – 2-digit month of birth; YYYY = 4-digit year of birth; MW = match-weight .......................................................................................................... 81 Figure 3.9. Disagree match-weights: Radar plot overlaying the sets of match-weights and comparing the proportion that each of the eight individual patient identifiers contributes towards the sum of the total match-weight (100%); DD = 2-digit day of birth; MM – 2-digit month of birth; YYYY = 4-digit year of birth; MW = match-weight .......................................................................................................... 81 9 Figure 3.10. Missing data match-weights: Radar plot overlaying the sets of match-weights and comparing the proportion that each of the eight individual patient identifiers contributes towards the sum of the total match-weight (100%); DD = 2-digit day of birth; MM – 2-digit month of birth; YYYY = 4-digit year of birth; MW = match-weight ................................................................................................ 82 Figure 3.11. Linkage of all the LabBase2 data to each individual year of HES data ..................... 83 Figure 3.12. Computational challenges involved with probabilistic linkage of LabBase2 and HES data................................................................................................................... 84 Figure 3.13. Distribution of the total number of linked pairs per match-weight considered as true, uncertain or false matches using the MW-All in relation to the chosen threshold (selected via manual review) .................................................................. 86 Figure 3.14. Distribution of the total number of linked pairs per match-weight considered as true, uncertain or false matches using the MW-Child in relation to the chosen threshold (selected via manual review) .................................................................. 87 Figure 3.15. Number of unique and mutual OPIEs and HESid linked pairs for the MW-All and MW-Child ................................................................................................................. 88 Figure 3.16. The number of linked pairs (denoted by the size of each circle) for MW-All compared to MW-Child and their relative assigned match-weights before the selection of the highest match-weight per OPIE ..................................................... 92 Figure 3.17. Flow diagram of LabBase2, good-links patient identifiers and HES data linkage, processing, manipulation and cleaning. .................................................................. 94 Figure 3.17 (cont.) Flow diagram of LabBase2, good-links patient identifiers and HES data linkage, processing, manipulation and cleaning. .................................................... 95 Figure 3.17 (cont.) Flow diagram of LabBase2, good-links patient identifiers and HES data linkage, processing, manipulation and cleaning. .................................................... 96 Figure 3.18. MW-Child distribution of the highest match-weight assigned to each linked LabBase2 and HES pair of records for children aged 1 month – 5 years, April 2007 – March 2011 .................................................................................................. 97 Figure 4.1. Flow diagram of the steps already addressed in previous chapters (grey), those that will be addressed in this Chapter (red) and the following chapters .............. 100 Figure 4.2. Summary of mechanisms of BSI epidemiology in children admitted to hospital .... 104 Figure 4.3. Frequency distribution of children with a BSI by quarter year of age ..................... 111 Figure 4.4. Frequency distribution of inpatient postnatal length of stay (no. of nights) for children aged 1 month < 1 year (0.5 symbolises day admissions only) ................ 115 Figure 4.5. Frequency and timing when positive bacterial blood specimens were taken in relation to hospital admission (day 0) for children aged 1 month < 1 year, 2007- 2011. Use of log-scale in main graph, and normal-scale in the excerpt ............... 118 Figure 4.6. Frequency and timing when positive bacterial blood specimens were taken in relation to hospital admission (day 0) for children aged 1 - 5 years, 2007-2011. Use of log-scale in main graph, and normal-scale in the excerpt ......................... 119 Figure 5.1. Flow diagram of the steps taken (grey) and those that will be addressed in this chapter (red) to predict the proportion of CA, HCA and HA BSI in children ......... 124 Figure 5.2. Expected timing of specimen date sample in relation to hospital admission: Varying HA1-HA9 cut-off dates for a priori HA BSI selection ................................ 126 Figure 5.3. Expected timing of specimen date sample in relation to hospital admission for a priori CA BSI ........................................................................................................... 127 Figure 5.4. Proportional distribution of a priori CA pathogens in children aged 1 month < 1 year in relation to hospital admission ................................................................... 130 Figure 5.5 Proportional distribution of a priori CA pathogens in children aged 1-5 years in relation to hospital admission ............................................................................... 131 Figure 5.6. Expected timing of specimen date sample in relation to hospital admission in approach A: Mixed CA, HCA and HA BSI ................................................................ 137 10
Description: