DEVELOPMENT AND APPLICATION OF STATISTICAL METHODS FOR PROGNOSIS RESEARCH

By

KYM IRIS ERIKA SNELL

A thesis submitted to the University of Birmingham for the degree of DOCTOR OF PHILOSOPHY

School of Health and Population Sciences
University of Birmingham
May 2015

University of Birmingham Research Archive
e-theses repository

This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder.

ABSTRACT

A pivotal component of prognosis research is the prediction of future outcome risk. This thesis applies, develops and evaluates novel statistical methods for the development and validation of risk prediction (prognostic) models.

In the first part, a literature review of published prediction models shows that the Cox model remains the most common approach for developing a model using survival data; however, it does not model the baseline hazard, which restricts individualised predictions. Flexible parametric survival models are shown to address this by flexibly modelling the baseline hazard, thereby enabling individualised risk predictions over time. A clinical application reveals discrepant mortality rates for different hip replacement procedures and identifies common issues when developing models using clinical trial data.

In the second part, univariate and multivariate random-effects meta-analyses are proposed to summarise a model's performance across multiple validation studies. The multivariate approach accounts for correlation between multiple statistics (e.g. C-statistic and calibration slope) and allows joint predictions about expected model performance in applied settings. This enables competing implementation strategies (e.g. regarding the choice of baseline hazard) to be compared and ranked. A simulation study also provides recommendations for the scales on which to combine performance statistics to best satisfy the between-study normality assumption in random-effects meta-analysis.

ACKNOWLEDGEMENTS

This PhD was funded by the MRC Midland Hub for Trials Methodology Research, without which I would not have been able to complete this research.

I would like to express my gratitude to my supervisors, Prof. Richard Riley and Prof. Lucinda Billingham. Richard, I consider myself so lucky to have had the opportunity to work with you. You have always believed in me, built up my confidence, sent opportunities my way and helped me find my passion and shape my career. I cannot truly thank you enough for the endless support and guidance you have offered.

I would also like to thank the following people I have been lucky enough to work with: Thomas Debray for input and feedback on several chapters, Joie Ensor for all the discussions and feedback on chapters, Jon Deeks for his support in the last few months leading up to submission, and, not forgetting, Karen Biddle and Anne Walker for helping with anything and everything they could. Thanks also go to my colleagues in Health and Population Sciences for all their encouragement.

To my family, thank you for all your love and support over the years. Mum, Dad, Matt, Jay and Dawn, thanks for always believing in me and supporting me in everything I do. I am so lucky to have the family I do, and words cannot express how much I love you all.

Last but certainly not least, thanks go to my friends, old and new: Lozz, Hannah, Ruby, Elena, and, the biggest thanks of all, to a friend who has been there every single day of this journey, Dani.
Since we started our PhDs together, you have been the best friend and my biggest support. We've been through it all together; you've been there to celebrate the highs and help me through the lows. You have made the last three and a half years an amazing experience. I am so thankful to have you in my life and proud of all we have achieved.

TABLE OF CONTENTS

Chapter 1: Introduction
1.1 Introduction to research area
1.2 What is prognosis research?
  1.2.1 Framework for prognosis research
1.3 Logistic regression
  1.3.1 Example prognostic model developed using logistic regression
1.4 Survival analysis
  1.4.1 Functions in survival data
  1.4.2 Cox proportional hazard model
  1.4.3 Parametric models
  1.4.4 Flexible parametric models
  1.4.5 Non-proportional hazards
  1.4.6 Example prognostic model developed using a flexible parametric survival model
1.5 Model development considerations
1.6 Validating a prognostic model
  1.6.1 Internal validation
  1.6.2 External validation
  1.6.3 Validation statistics
1.7 Presentation of prognostic models for clinical decision making
1.8 Importance of improving methodology in prognosis research
1.9 Aims and overview of thesis

Chapter 2: Hip replacement surgery in osteoarthritis patients
2.1 Introduction
2.2 Background to hip replacement procedures
  2.2.1 Cemented procedures
  2.2.2 Uncemented procedures
  2.2.3 Hybrid procedures
  2.2.4 Birmingham Hip Resurfacing
2.3 Data
2.4 Objectives
  2.4.1 Clinical objectives
  2.4.2 Statistical objectives
2.5 Methods
  2.5.1 Data cleaning, inclusion and exclusion criteria
  2.5.2 Summary of data
  2.5.3 Analysis of primary outcomes
  2.5.4 Assessing the proportional hazards assumption
  2.5.5 Number of knots for the baseline hazard function
  2.5.6 Analysis of secondary outcomes
2.6 Results
  2.6.1 Summary of data for cemented and uncemented THRs
  2.6.2 Proportional hazards assumption
  2.6.3 Number of knots for the baseline hazard function
  2.6.4 Primary outcome analyses
  2.6.5 Secondary analyses
2.7 Discussion
  2.7.1 Summary of clinical findings
  2.7.2 Statistical advantages of flexible parametric models in this dataset
  2.7.3 Potential pitfalls and situations when Royston-Parmar models are not required
  2.7.4 Further work
2.8 Conclusion

Chapter 3: Estimating the baseline hazard and absolute risk in multivariable prediction models: a review of current practice
3.1 Introduction and objectives
3.2 Method
  3.2.1 Identifying a set of articles for review
  3.2.2 Inclusion/exclusion criteria
  3.2.3 Review process
  3.2.4 Evaluation of relevant articles
3.3 Results
  3.3.1 Identification of relevant articles
  3.3.2 Summary of articles included in the review
  3.3.3 Development data description and size
  3.3.4 Model development methods
  3.3.5 Reporting of results
  3.3.6 Modelling the baseline hazard and reporting absolute risk predictions
  3.3.7 Validation