ebook img

Introduction to Data Analysis and Graphical Presentation in Biostatistics with R: Statistics in the Large PDF

172 Pages·2014·1.82 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Introduction to Data Analysis and Graphical Presentation in Biostatistics with R: Statistics in the Large

SPRINGER BRIEFS IN STATISTICS Thomas W. MacFarland Introduction to Data Analysis and Graphical Presentation in Biostatistics with R Statistics in the Large SpringerBriefs in Statistics Forfurthervolumes: http://www.springer.com/series/8921 Thomas W. MacFarland Introduction to Data Analysis and Graphical Presentation in Biostatistics with R Statistics in the Large 123 ThomasW.MacFarland OfficeforInstitutional Effectiveness NovaSoutheasternUniversity FortLauderdale,FL,USA ISSN2191-544X ISSN2191-5458(electronic) ISBN978-3-319-02531-5 ISBN978-3-319-02532-2(eBook) DOI10.1007/978-3-319-02532-2 SpringerChamHeidelbergNewYorkDordrechtLondon LibraryofCongressControlNumber:2013953880 ©TheAuthor(s)2014 Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpartof thematerialisconcerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation, broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,andtransmissionorinformation storageandretrieval,electronicadaptation,computersoftware,orbysimilarordissimilarmethodology nowknownorhereafterdeveloped.Exemptedfromthislegalreservationarebriefexcerptsinconnection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’slocation,initscurrentversion,andpermissionforusemustalwaysbeobtainedfromSpringer. PermissionsforusemaybeobtainedthroughRightsLinkattheCopyrightClearanceCenter.Violations areliabletoprosecutionundertherespectiveCopyrightLaw. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant protectivelawsandregulationsandthereforefreeforgeneraluse. While the advice and information in this book are believed to be true and accurate at the date of publication,neithertheauthorsnortheeditorsnorthepublishercanacceptanylegalresponsibilityfor anyerrorsoromissionsthatmaybemade.Thepublishermakesnowarranty,expressorimplied,with respecttothematerialcontainedherein. Printedonacid-freepaper SpringerispartofSpringerScience+BusinessMedia(www.springer.com) Contents 1 Introduction:BiostatisticsandR........................................... 1 1.1 PurposeofThisText................................................... 1 1.2 DevelopmentofBiostatistics.......................................... 2 1.3 DevelopmentofR ..................................................... 3 1.4 HowRisUsedinThisText........................................... 4 2 DataExploration,DescriptiveStatistics,andMeasures ofCentralTendency ......................................................... 5 2.1 BackgroundonThisLesson........................................... 5 2.1.1 DescriptionoftheData........................................ 5 2.1.2 NullHypothesis(Ho) ......................................... 7 2.2 DataImportofa.csvSpreadsheet-TypeDataFileintoR ........... 7 2.3 OrganizetheDataandDisplaytheCodeBook ...................... 9 2.4 ConductaVisualDataCheck......................................... 9 2.5 DescriptiveAnalysisoftheData...................................... 10 2.6 Summary............................................................... 13 2.7 Addendum:SpecializedExternalPackagesandFunctions.......... 13 2.8 PreparetoExit,Save,andLaterRetrieveThisRSession ........... 15 3 Student’st-TestforIndependentSamples................................. 17 3.1 BackgroundonThisLesson........................................... 17 3.1.1 DescriptionoftheData........................................ 17 3.1.2 NullHypothesis(Ho) ......................................... 18 3.2 DataImportofa.csvSpreadsheet-TypeDataFileintoR ........... 19 3.3 OrganizetheDataandDisplaytheCodeBook ...................... 20 3.4 ConductaVisualDataCheck......................................... 23 3.5 DescriptiveAnalysisoftheData...................................... 34 3.6 ConducttheStatisticalAnalysis ...................................... 40 3.7 Summary............................................................... 42 3.8 Addendum:t-Statisticvz-Statistic ................................... 43 3.8.1 CreatetheEnumeratedDataset............................... 44 v vi Contents 3.8.2 Calculatethet-Statistic........................................ 44 3.8.3 Calculatethez-Statistic....................................... 45 3.9 PreparetoExit,Save,andLaterRetrieveThisRSession ........... 45 4 Student’st-TestforMatchedPairs......................................... 47 4.1 BackgroundonThisLesson........................................... 47 4.1.1 DescriptionoftheData........................................ 47 4.1.2 NullHypothesis(Ho) ......................................... 49 4.1.3 UnstackedDataandStackedData............................ 49 4.2 DataImportofa.csvSpreadsheet-TypeDataFileintoR ........... 51 4.3 OrganizetheDataandDisplaytheCodeBook ...................... 52 4.4 ConductaVisualDataCheck......................................... 54 4.5 DescriptiveAnalysisoftheData...................................... 60 4.6 ConducttheStatisticalAnalysis ...................................... 63 4.7 Summary............................................................... 65 4.8 Addendum1: Stacked Data and Student’s t-Test forMatchedPairs...................................................... 66 4.9 Addendum2:TheImpactofNonStudent’st-Test.................. 70 4.10 PreparetoExit,Save,andLaterRetrieveThisRSession ........... 72 5 OnewayAnalysisofVariance(ANOVA)................................... 73 5.1 BackgroundonThisLesson........................................... 73 5.1.1 DescriptionoftheData........................................ 73 5.1.2 NullHypothesis(Ho) ......................................... 75 5.2 DataImportofa.csvSpreadsheet-TypeDataFileintoR ........... 75 5.3 OrganizetheDataandDisplaytheCodeBook ...................... 77 5.4 ConductaVisualDataCheck......................................... 82 5.5 DescriptiveAnalysisoftheData...................................... 87 5.6 ConducttheStatisticalAnalysis ...................................... 89 5.6.1 ExploratoryOnewayANOVA ................................ 90 5.6.2 OnewayANOVAMethod1:lm()andanova()Functions... 91 5.6.3 Oneway ANOVA Method 2: aov() and TukeyHSD()Functions........................................ 92 5.7 Summary............................................................... 93 5.8 Addendum:OtherPackagesforDisplayofOnewayANOVA ...... 96 5.9 PreparetoExit,Save,andLaterRetrieveThisRSession ........... 97 6 TwowayAnalysisofVariance(ANOVA)................................... 99 6.1 BackgroundonThisLesson........................................... 99 6.1.1 DescriptionoftheData........................................ 99 6.1.2 NullHypothesis(Ho) ......................................... 100 6.2 DataImportofa.csvSpreadsheet-TypeDataFileintoR ........... 100 6.3 OrganizetheDataandDisplaytheCodeBook ...................... 101 6.4 ConductaVisualDataCheck......................................... 104 6.5 DescriptiveAnalysisoftheData...................................... 111 6.6 ConducttheStatisticalAnalysis ...................................... 117 Contents vii 6.7 Summary............................................................... 122 6.8 Addendum:OtherPackagesforDisplayofTwowayANOVA ...... 124 6.9 PreparetoExit,Save,andLaterRetrieveThisRSession ........... 126 7 CorrelationandLinearRegression........................................ 129 7.1 BackgroundonThisLesson........................................... 129 7.1.1 DescriptionoftheData........................................ 129 7.1.2 NullHypothesis(Ho) ......................................... 130 7.2 DataImportofa.csvSpreadsheet-TypeDataFileintoR ........... 131 7.3 OrganizetheDataandDisplaytheCodeBook ...................... 132 7.4 ConductaVisualDataCheck......................................... 135 7.5 DescriptiveAnalysisoftheData...................................... 140 7.6 ConducttheStatisticalAnalysis ...................................... 142 7.6.1 CorrelationUsingPearson’sr................................. 142 7.6.2 LinearRegression ............................................. 150 7.7 Summary............................................................... 154 7.8 Addendum:MultipleRegression ..................................... 155 7.8.1 Hand-CalculateMultipleRegression......................... 156 7.8.2 MinimalAdequateModel(MAM)forRegression .......... 158 7.8.3 StepwiseRegression .......................................... 160 7.9 PreparetoExit,Save,andLaterRetrieveThisRSession ........... 163 8 FutureActionsandNextSteps ............................................. 165 8.1 UseofThisText ....................................................... 165 8.2 FutureUseofRforBiostatistics...................................... 166 8.3 ExternalResources .................................................... 167 8.4 ContacttheAuthor..................................................... 167 Chapter 1 Introduction: Biostatistics and R Abstract The purpose of this lesson is to provide context for the science of biostatistics and to highlight a few of the major contributors. Emphasis is given to the role of data analysis for the various disciplines in the biological sciences (e.g., agriculture, biology, clinical trials, ecology, environmental health, epidemi- ology, genetics, health sciences, nutrition, public health, etc.). The practice of biostatistics is then linked to the use of R, a free and open source software environment. As explained, each problem in this text is associated with a .csv (comma-separated values) ASCII file, a Code Book detailing data organization, qualityassurancethroughgraphicalpresentationsanddescriptivestatistics,selected statisticalanalyses,summaryofoutcomes,andanaddendumofferingideasonhow Rcanbeusedforadditionalinsightintobiostatistics. Keywords Agriculture • Biology • Biostatistics • Census • Clinical trials • Code Book • Comma-separated values ASCII file • Command Line Interface (CLI) • ComprehensiveRArchiveNetwork(CRAN) • CRAN ContributedPack- ages • Data analysis • Descriptive statistics • Ecology • Environmental health • Epidemiology • Genetics • GraphicalUserInterface(GUI) • Healthsciences • Nutrition • Opensourcesoftware • Publichealth • R • S • Scheme 1.1 PurposeofThisText Scientists use empiricism to guide and validate decisions. Precision, orderliness, analysis,andasoundbackgroundinstatisticsaredirectlyassociatedwithinformed judgment,decision-making,andthesubsequentallocationofhuman,physical,and fiscalresources–alltoimprovethehumancondition.Thepurposeofthistextisto provideanintroductiontotheuseofRsoftwareasaplatformforproblemsrelated to biostatistics. Data identification, data organization, graphical and descriptive portrayalof phenomena,and statistical tests through the use of R are all inherent tothistext. T.W.MacFarland,IntroductiontoDataAnalysisandGraphicalPresentation 1 inBiostatisticswithR,SpringerBriefsinStatistics,DOI10.1007/978-3-319-02532-2__1, ©TheAuthor(s)2014 2 1 Introduction:BiostatisticsandR RsupportsaGraphicalUserInterface(GUI),theRCommander.Thisresourceis availableasanexternalRpackage,Rcmdr.Rcmdrisfairlyeasytousebuteventually therearelimitsontheuseofRCommander. R also supports a far more robust and useful syntax-based Command Line Interface (CLI) approach to statistics. This text is focused on the use of R-based syntax, working at the command line, to address data organization, statistical analyses, and graphical presentations as they relate to biostatistics. A series of smallconfidence-buildingactivitiesarepresentedatthebeginningofthistext,with moredetailgraduallyintroducedasthetextisfollowedfrombeginningtoend.All examples are for biostatistics. The many examples presented in this text can be easilyappliedtoallareasofbiostatistics,regardlessofmajorareaofstudy. 1.2 Development of Biostatistics Thetermstatisticsisderivedfromstatus,theLatintermforstate.Thus,thescience and practice of statistics, as we think of it today, was first associated with data relatingtothestate,suchascensuscountsandhealthrecords.Giventheimportance of statistics as a part of state governance, there are more than a few accounts of census-takingandhealthrecordsfromtheearliestdaysofrecordedhistory. Going beyond mere record-keeping, an interest in the mathematics of chance (e.g., probability) began to develop in the 1500s and 1600s, especially among those who engaged in European court life. The early interest in probability may nothavebeenaltruisticbutwasinsteadfocusedongainingadvantageincardgames andotherformsofgambling.Theuseofprobabilitytosolveproblemsforsocietal gain may not have been the first interest but instead attention was focused on the question, Given that there a X cards in the deck, if I discard the Y card from my hand,whatisthechancethatIwilldrawtheZcardfromthedeckandimprovemy chanceofwinningthisgameofcards? Thisearlyinterestinprobabilityandeventuallytheevolvingscienceofstatistics as a vehicle for social improvement eventually grew into what we think of as biostatistics. It is far beyond the purpose of this introductory text on the use of Rinbiostatisticstogointotoomuchdetail,butataminimumitwouldbehelpful tolookintothebiographyandcontributionsofthefollowingfoundersofwhatwe nowconsiderbiostatistics: • BlaisePascal(1623–1662),preparedearlywritingsonprobabilityanddeveloped thePascaline(e.g.,mechanicalcalculator). • JohnGraunt(1620–1674),publishedNaturalandPoliticalObservationsMade UpontheBillsofMortality,perhapsthefirstwidely-readtextondemographics, publichealth,andepidemiology. • JohnSnow(1813–1858),advocatedforepidemiologyandthe1854BroadStreet (London)CholeraOutbreak.

Description:
Through real-world datasets, this book shows the reader how to work with material in biostatistics using the open source software R. These include tools that are critical to dealing with missing data, which is a pressing scientific issue for those engaged in biostatistics. Readers will be equipped t
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.