
Statistical methods to evaluate disease outcome diagnostic accuracy of multiple biomarkers with application to HIV and TB research

231 Pages·2016·1.09 MB·English


Statistical methods to evaluate disease outcome diagnostic accuracy of multiple biomarkers with application to HIV and TB research

By Muna Balla Elshareef Mohammed [email protected]
Supervisor: Professor Henry G. Mwambi [email protected]

School of Mathematics, Statistics and Computer Science
University of KwaZulu-Natal
Pietermaritzburg, South Africa

A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy at the School of Mathematics, Statistics and Computer Sciences, University of KwaZulu-Natal, Pietermaritzburg

October 2015

Abstract

One challenge in clinical medicine is the correct diagnosis of disease. Medical researchers invest considerable time and effort in improving the accuracy of disease diagnosis, and diagnostic tests are consequently important components of modern medical practice. The receiver operating characteristic (ROC) curve is a statistical tool commonly used for describing the discriminatory accuracy and performance of a diagnostic test. A popular summary index of discriminatory accuracy is the area under the ROC curve (AUC). In medical research, scientists often evaluate hundreds of biomarkers simultaneously. A critical challenge is the combination of biomarkers into models that give insight into disease. In infectious disease, biomarkers are often evaluated in the host as well as in the microorganism or virus causing infection, adding more complexity to the analysis. In addition to providing an improved understanding of factors associated with infection and disease development, combinations of relevant markers are important to the diagnosis and treatment of disease. Taken together, this extends the role of the statistical analyst and presents many novel and major challenges. This thesis discusses some of the strategies and issues in using statistical data analysis to address the diagnosis problem of selecting and combining multiple markers to estimate the predictive accuracy of test results.
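As a small illustration of the AUC described above, the following sketch computes the empirical AUC of a single marker as the proportion of diseased/non-diseased pairs in which the diseased subject has the higher marker value, counting ties as one half. The function name `empirical_auc` and the marker values are hypothetical and chosen only for illustration; this is not the thesis's own implementation.

```python
def empirical_auc(diseased, healthy):
    """Empirical AUC: P(marker_diseased > marker_healthy) + 0.5 * P(tie)."""
    if not diseased or not healthy:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                score += 1.0      # correctly ordered pair
            elif d == h:
                score += 0.5      # tied pair counts as one half
    return score / (len(diseased) * len(healthy))


# Hypothetical marker values, for illustration only.
cases = [2.1, 3.4, 1.9, 4.0]
controls = [1.2, 2.0, 1.5, 2.5]
auc = empirical_auc(cases, controls)
print(auc)
```

An AUC of 0.5 corresponds to a marker that discriminates no better than chance, and 1.0 to perfect separation of the two groups.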
We also consider different methodologies to address missing data and to improve the predictive accuracy in the presence of incomplete data.

The thesis is divided into five parts. The first part is an introduction to the theory behind the methods used in this work.

The second part places emphasis on the so-called classic ROC analysis, which is applied to cross-sectional data. The main aim of this chapter is to address the problem of how to select and combine multiple markers and to evaluate the appropriateness of certain techniques used in estimating the area under the ROC curve (AUC). Logistic regression models offer a simple method for combining markers. We applied resampling methods to adjust for the over-fitting associated with model selection. We simulated several multivariate models to evaluate the performance of the resampling approaches in this setting. We applied these methods to data collected from a study of tuberculosis immune reconstitution inflammatory syndrome (TB-IRIS) in Cape Town, South Africa. Baseline levels of five biomarkers were evaluated, and we used this dataset to evaluate whether a combination of these biomarkers could accurately discriminate between TB-IRIS and non-TB-IRIS patients by applying AUC analysis and resampling methods.

The third part is concerned with time-dependent ROC analysis with an event-time outcome and a comparative analysis of the techniques applied to incomplete covariates. Three different methods are assessed and investigated, namely mean imputation, nearest neighbor hot deck imputation and multivariate imputation by chained equations (MICE). These methods were used together with bootstrap and cross-validation to estimate the time-dependent AUC using a non-parametric approach and a Cox model. We simulated several models to evaluate the performance of the resampling approaches and imputation methods. We applied the above methods to a real data set.
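To make the two simplest imputation methods named above concrete, here is a minimal sketch of mean imputation and nearest neighbor hot deck imputation for one incomplete marker. It assumes a single fully observed auxiliary covariate is used to find the nearest donor; the function names, the choice of auxiliary variable and the data are hypothetical, and the thesis may define the donor distance differently.

```python
def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def hot_deck_impute(target, auxiliary):
    """Nearest neighbor hot deck: copy the target value of the donor whose
    auxiliary covariate is closest to that of the incomplete record."""
    donors = [(a, t) for a, t in zip(auxiliary, target) if t is not None]
    if not donors:
        raise ValueError("no complete donors available")
    filled = []
    for aux_value, value in zip(auxiliary, target):
        if value is None:
            # Donor with the smallest absolute distance on the auxiliary covariate.
            _, value = min(donors, key=lambda donor: abs(donor[0] - aux_value))
        filled.append(value)
    return filled


# Hypothetical data: a marker with two missing values and a complete covariate.
marker = [10.0, None, 14.0, 30.0, None]
age = [25, 27, 40, 60, 58]
mean_filled = mean_impute(marker)
hot_filled = hot_deck_impute(marker, age)
print(mean_filled)
print(hot_filled)
```

Mean imputation gives every incomplete record the same value, which shrinks the marker's variance, whereas hot deck imputation borrows an actually observed value from a similar subject.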
The fourth part is concerned with applying more advanced variable selection methods to predict the survival of patients using time-dependent ROC analysis. The least absolute shrinkage and selection operator (LASSO) Cox model is applied to estimate the bootstrap cross-validated, .632 and .632+ bootstrap AUCs for a TBM/HIV data set from KwaZulu-Natal in South Africa. We also suggest the use of ridge-Cox regression to estimate the AUC and two-level bootstrapping to estimate the variances of the AUC, and we evaluate these suggested methods.

The last part of the research is an application study using genetic HIV data from rural KwaZulu-Natal to evaluate the sequence of ambiguities as a biomarker to predict recent infection in HIV patients.

Preface

The work described in this thesis was carried out from March 2013 to October 2015, under the supervision and direction of Professor Henry G. Mwambi, School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg. The thesis represents the original work of the author and has not otherwise been submitted in any form for any degree or diploma to any University. Where use has been made of the work of others, it is duly acknowledged in the text.

Signature (Student) Date: 31st of March 2016
Signature (Supervisor) Date: 31st of March 2016

Dedication

TO MY LOVELY PARENTS DR BALLA AND HANAN, MY DEAR HUSBAND AYOUB, MY LOVELY DAUGHTER FATIMA, MY BROTHERS MUGTABA, AHMED AND TO THE SOUL OF MY SISTER FATIMA (TOTA), I DEDICATE THIS WORK.

Acknowledgements

First of all, I thank ALLAH for his Grace and Mercy showered upon me. I heartily express my profound gratitude to my supervisor, Professor Henry G. Mwambi, for the invaluable learned guidance, advice, encouragement, understanding and continued support he has provided me throughout the duration of my studies, which led to the compilation of this thesis.
I will always be indebted to him for introducing me to this fascinating area of application in health research and creating my interest in Biostatistics. I lovingly thank my dear husband Ayoub, who supported me each step of the way; without his help and encouragement it simply would never have been possible to finish this work. I would also like to thank my lovely parents Hanan and Dr Balla for their continuous support and best wishes. I would also like to thank Mr Rob Ettershank for his kindness and for his valuable corrections, comments and suggestions throughout the editing and proofreading process. I am grateful for the facilities made available to me by the School of Mathematics, Statistics and Computer Science of the University of KwaZulu-Natal (UKZN), Pietermaritzburg. I am also grateful for the financial support that I have received from UKZN and the South African Center for Epidemiological Modelling and Analysis (SACEMA). My thanks extend to Professor Robert Wilkinson, Dr Suzaan Marais and Professor Tulio de Oliveira for supporting us with real datasets. Finally, I sincerely thank my entire extended family, represented by Balla, Hanan, Mohammed Elmojutaba, Ahmed, Fatima (tota), Basheer, Suaad, Eihab, Adeeb and Nada.

Table of contents

Abstract
Preface
Dedication
Acknowledgements
Table of contents
List of notations
List of figures
List of tables
1 Introduction

Description:
DAUGHTER FATIMA, MY BROTHERS MUGTABA, AHMED AND TO THE SOUL OF MY SISTER . 5.3 Resampling methods in the context of combining multiple biomarkers and esti- .. in Cape Town - a secondary-level hospital. of TBM patients who had died was 48% while 10% were lost to the