Applied Biostatistical Principles and Concepts

The past three decades have witnessed modern advances in statistical modeling and evidence discovery in biomedical, clinical, and population-based research. With these advances come challenges in accurate model stipulation and in the application of models to scientific evidence discovery regarding patient care and public health improvement.

Applied Biostatistical Principles and Concepts provides practical knowledge on evidence discovery in clinical, biomedical, and translational science research. Biostatistics is conceived as an information science aimed at assessing data variation, which may arise from natural phenomena such as sex, age, race, and genetic variation, or from measurement or observation errors. Quantifying sample variation requires a random variable, implying a probability or unbiased sample; exploratory or descriptive statistics; and inferential statistics, which quantify uncertainty through estimation, the confidence interval method, and hypothesis testing via the p value method. Since reliable and valid data are required for setting clinical guidelines that enhance therapeutics and improve patient and public health, clinicians and healthcare providers, who play a fundamental role in task forces for clinical and public health guideline development, require basic knowledge of research methodology, namely design, conduct, analysis, and interpretation. The concepts and techniques provided in this text will help researchers and clinicians design and conduct studies, then translate data from bench to clinic in an attempt to improve the health of patients and populations.
Suitable for both clinicians and students of the health or biological sciences, this book presents the reality in the statistical modeling of clinical, biomedical, and translational data, with emphasis on clinically meaningful difference as effect size prior to random error quantification through the p value, thus enhancing the generalizability of findings. The p value, no matter how small, does not rule out uncertainty in our findings, and is not the measure of evidence but remains in large part a function of sample size.

Laurens Holmes Jr. was trained in internal medicine, specializing in immunology and infectious diseases prior to his expertise in epidemiology-with-biostatistics. Over the past two decades, Dr. Holmes has been working in cancer epidemiology, control, and prevention. His involvement in evidence discovery emphasizes reality in statistical modeling of clinical, biomedical, and translational research data. With his concentration in survival data modeling, he is committed to the clinical and biologic relevance of data prior to statistical significance, which he regards as evidence against the null hypothesis and not the measure of evidence. In survival modeling, he advocates and stresses the importance of treatment effect heterogeneity and its application in drug development and therapeutics.

http://taylorandfrancis.com

Applied Biostatistical Principles and Concepts
Clinicians’ Guide to Data Analysis and Interpretation

Laurens Holmes Jr., MD, DrPH

London and New York

First published 2018
by Routledge
711 Third Avenue, New York, NY 10017
and by Routledge
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

Routledge is an imprint of the Taylor & Francis Group, an Informa business

© 2018 Taylor & Francis

The right of Laurens Holmes Jr. to be identified as author of this work has been asserted by him/her/them in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved.
No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data
Names: Holmes, Larry, Jr., 1960- author.
Title: Applied biostatistical principles and concepts : clinicians’ guide to data analysis and interpretation / Laurens Holmes Jr.
Description: Abingdon, Oxon ; New York, NY : Routledge, 2017. | Includes bibliographical references and index.
Identifiers: LCCN 2017023417 | ISBN 9781498741194 (hardback) | ISBN 9781315369204 (ebook)
Subjects: | MESH: Biometry | Research Design | Biostatistics | Data Interpretation, Statistical
Classification: LCC R853.S7 | NLM WA 950 | DDC 610.1/5195--dc23
LC record available at https://lccn.loc.gov/2017023417

ISBN: 9781498741194 (hbk)
ISBN: 9781315369204 (ebk)

Dedicated to Jay Kumar, MD (Distinguished Orthopedic Surgeon and Scientist, in memoriam); and Richard Bowen, MD (Distinguished Orthopedic Surgeon and Scientist, Former Chair, Orthopedic Department, Nemours Alfred I. duPont Hospital for Children, Wilmington, Delaware)

Contents

Foreword
Preface
Acknowledgments
Author
Introduction

SECTION I Design process

1 Basics of biomedical and clinical research
  1.1 Introduction
  1.2 Why conduct clinical research?
  1.3 Study subjects
  1.4 Subject selection
  1.5 Sampling
  1.6 Generalization
  1.7 Sample size and power estimations
  1.8 Screening (detection) and diagnostic (confirmation) tests
  1.9 Balancing benefits and harmful effects in medicine
  1.10 Summary
  Questions for discussion
  References

2 Research design: Experimental and nonexperimental studies
  2.1 Introduction
  2.2 Epidemiologic study designs
  2.3 Nonexperimental designs
  2.4 Experimental designs (clinical trials)
  2.5 Nonexperimental versus experimental design
  2.6 Measures of disease association or effect
  2.7 Precision, random error, and bias
  2.8 Confounding, covariates, effect measure modifier, interaction
  2.9 Summary
  Questions for discussion
  References

3 Population, sample, probability, and biostatistical reasoning
  3.1 Introduction
  3.2 Populations
  3.3 Sample and sampling strategies
  3.4 Biostatistical reasoning
  3.5 Measures of central tendency and dispersion
  3.6 Standardized distribution: z score statistic
  3.7 Basic probability notion
  3.8 Simple and unconditional probability
  3.9 Conditional probability
  3.10 Independence and conditional probability
  3.11 Probability distribution
  3.12 Summary
  Questions for discussion
  References

SECTION II Biostatistical modeling

4 Statistical considerations in clinical research
  4.1 Introduction
  4.2 Types of variables
  4.3 Variables and sources of variation (variability)
  4.4 Sampling, sample size, and power
  4.5 Research questions, hypothesis testing, and statistical inference
  4.6 Summary
  Questions for discussion
  References

5 Study size and statistical power estimations
  5.1 Introduction
  5.2 Sample size characterization
  5.3 Purpose of sample size
  5.4 Sample size computation
  5.5 Sample size estimation for single- or one-sample proportion hypothesis testing
  5.6 One-sample estimation of sample size with outcome mean
  5.7 Two independent samples: Proportions
  5.8 Two independent group means
  5.9 Prospective cohort or two-group comparison in clinical trials
  5.10 Case–control study
  5.11 Summary
  Questions for discussion
  References

6 Single sample statistical inference
  6.1 Introduction
  6.2 One-sample group design
  6.3 Hypothesis statement
  6.4 Test statistic
  6.5 Inference from a nonnormal population: One-sample t test
  6.6 Other types of t tests
  6.7 Summary
  Questions for discussion
  References

7 Two independent samples statistical inference
  7.1 Introduction
  7.2 Independent (two-sample) t test and nonparametric alternative (Mann–Whitney u test)
  7.3 z Test for two independent proportions
  7.4 Chi-square test of proportions in two groups
  7.5 Summary
  Questions for discussion
  References

8 Statistical inference in three or more samples
  8.1 Introduction
  8.2 Analysis of variance (ANOVA)?
  8.3 Other hypothesis tests based on ANOVA
  8.4 Summary
  Questions for discussion
  References

9 Statistical inference involving relationships or associations
  9.1 Introduction
  9.2 Correlation and correlation coefficients
  9.3 Simple linear regression
  9.4 Multiple/multivariable linear regression
  9.5 Logistic regression technique
  9.6 Model building and interpretation
  9.7 Survival analysis: Time-to-event method