BUSINESS ANALYTICS AND STATISTICS
BLACK | ASAFU-ADJAYE | BURKE | KHAN | KING | PERERA | PAPADIMOS | SHERWOOD | WASIMI

Business analytics and statistics
FIRST EDITION

Ken Black
John Asafu-Adjaye
Paul Burke
Nazim Khan
Gerard King
Nelson Perera
Andrew Papadimos
Carl Sherwood
Saleh Wasimi

First edition published 2019 by
John Wiley & Sons Australia, Ltd
42 McDougall Street, Milton Qld 4064

Typeset in 10/12pt Times LT Std

© John Wiley & Sons Australia, Ltd 2019

Authorised adaptation of Australasian Business Statistics, 4th edn (ISBN 9780730312932), published by John Wiley & Sons, Brisbane Australia. © 2010, 2013, 2016. All rights reserved.

The moral rights of the authors have been asserted.

A catalogue record for this book is available from the National Library of Australia.

Reproduction and Communication for educational purposes
The Australian Copyright Act 1968 (the Act) allows a maximum of one chapter or 10% of the pages of this work or, where this work is divided into chapters, one chapter, whichever is the greater, to be reproduced and/or communicated by any educational institution for its educational purposes provided that the educational institution (or the body that administers it) has given a remuneration notice to Copyright Agency Limited (CAL).

Reproduction and Communication for other purposes
Except as permitted under the Act (for example, a fair dealing for the purposes of study, research, criticism or review), no part of this book may be reproduced, stored in a retrieval system, communicated or transmitted in any form or by any means without prior written permission. All inquiries should be made to the publisher.

The authors and publisher would like to thank the copyright holders, organisations and individuals for the permission to reproduce copyright material in this book.

Every effort has been made to trace the ownership of copyright material. Information that will enable the publisher to rectify any error or omission in subsequent editions will be welcome. In such cases, please contact the Permissions Section of John Wiley & Sons Australia, Ltd.
Cover design image: sdecoret / Shutterstock.com

Typeset in India by Aptara

Printed in Singapore by Markono Print Media Pte Ltd

BRIEF CONTENTS

Preface
Key features
About the authors
1. Data and business analytics
2. Data visualisation
3. Descriptive summary measures
4. Probability
5. Discrete distributions
6. The normal distribution and other continuous distributions
7. Sampling and sampling distributions
8. Statistical inference: estimation for single populations
9. Statistical inference: hypothesis testing for single populations
10. Statistical inferences about two populations
11. Analysis of variance and design of experiments
12. Chi-square tests
13. Simple regression analysis
14. Multiple regression analysis
15. Time-series forecasting and index numbers
Appendix A: Tables
Appendix B: Fundamental symbols and abbreviations
Index

CONTENTS

Preface
Key features
About the authors

CHAPTER 1
Data and business analytics
Introduction
1.1 Informing business strategy
1.2 Business analytics
1.3 Basic statistical concepts
  Types of data
1.4 Big data
1.5 Data mining
  Machine learning
Summary
Key terms
Review problems
References
Acknowledgements

CHAPTER 2
Data visualisation
Introduction
2.1 Frequency distributions
  Class midpoint
  Relative frequency
  Cumulative frequency
2.2 Basic graphical displays of data
  Histograms
  Frequency polygons
  Ogives
  Pie charts
  Stem-and-leaf plots
  Pareto charts
  Scatterplots
2.3 Multidimensional visualisation
  Representations
  Manipulations
2.4 Data visualisation tools
  Interactive visualisations
  Visualisation software
Summary
Key terms
Review problems
Acknowledgements

CHAPTER 3
Descriptive summary measures
Introduction
3.1 Measures of central tendency
  Mode
  Median
  Mean
3.2 Measures of location
  Percentiles
  Quartiles
3.3 Measures of variability
  Range
  Interquartile range
  Variance and standard deviation
  Population versus sample variance and standard deviation
  Computational formulas for variance and standard deviation
  z-scores
  Coefficient of variation
3.4 Measures of shape
  Skewness
  Skewness and the relationship of the mean, median and mode
  Coefficient of skewness
  Kurtosis
  Box-and-whisker plots
3.5 Measures of association
  Correlation
Summary
Key terms
Key equations
Review problems
Maths appendix
Acknowledgements

CHAPTER 4
Probability
Introduction
4.1 Methods of determining probabilities
  Classical method
  Relative frequency of occurrence method
  Subjective probability method
4.2 Structure of probability
  Experiment
  Event
  Elementary events
  Sample space
  Set notation, unions and intersections
  Mutually exclusive events
  Independent events
  Collectively exhaustive events
  Complementary events
4.3 Contingency tables and probability matrices
  Marginal, union, joint and conditional probabilities
  Probability matrices
4.4 Addition laws
  General law of addition
  Special law of addition
4.5 Multiplication laws
  General law of multiplication
  Special law of multiplication
4.6 Conditional probability
  Assessing independence
  Tree diagrams
  Revising probabilities and Bayes’ rule
Summary
Key terms
Key equations
Review problems
Acknowledgements

CHAPTER 5
Discrete distributions
Introduction
5.1 Discrete versus continuous distributions
5.2 Describing a discrete distribution
  Mean, variance and standard deviation of discrete distributions
5.3 Binomial distribution
  Assumptions about the binomial distribution
  Solving a binomial problem
  Using the binomial table
  Mean and standard deviation of a binomial distribution
  Graphing binomial distributions
5.4 Poisson distribution
  Solving Poisson problems by formula
  Mean and standard deviation of a Poisson distribution
  Graphing Poisson distributions
  Poisson approximation of the binomial distribution
Summary
Key terms
Key equations
Review problems
Acknowledgements

CHAPTER 6
The normal distribution and other continuous distributions
Introduction
6.1 The normal distribution
  History and characteristics of the normal distribution
6.2 The standardised normal distribution
6.3 Solving normal distribution problems
6.4 The normal distribution approximation to the binomial distribution
6.5 The uniform distribution
6.6 The exponential distribution
  Probabilities for the exponential distribution
Summary
Key terms
Key equations
Review problems
Acknowledgements

CHAPTER 7
Sampling and sampling distributions
Introduction
7.1 Sampling
  Reasons for sampling
  Reasons for taking a census
  Sampling frame
7.2 Random versus nonrandom sampling
  Random sampling techniques
  Nonrandom sampling
7.3 Types of errors from collecting sample data
  Sampling error
  Nonsampling errors
7.4 Sampling distribution of the sample mean, x̄
  Central limit theorem
  Sampling from a finite population
7.5 Sampling distribution of the sample proportion, p̂
Summary
Key terms
Key equations
Review problems
Acknowledgements

CHAPTER 8
Statistical inference: estimation for single populations
Introduction
8.1 Estimating the population mean using the z statistic (σ known)
  Finite population correction factor
  Estimating the population mean using the z statistic when the sample size is small
8.2 Estimating the population mean using the t statistic (σ unknown)
  The t distribution
  Robustness
  Characteristics of the t distribution
  Reading the t distribution table
  Confidence intervals to estimate the population mean using the t statistic
8.3 Estimating the population proportion
8.4 Estimating the population variance
8.5 Estimating sample size
  Sample size when estimating μ
  Determining sample size when estimating p
Summary
Key terms
Key equations
Review problems
Acknowledgements

CHAPTER 9
Statistical inference: hypothesis testing for single populations
Introduction
9.1 Hypothesis-testing fundamentals
  Rejection and nonrejection regions
  Type I and Type II errors
  How are alpha and beta related?
9.2 The six-step approach to hypothesis testing
  Step 1: Set up H0 and Ha
  Step 2: Decide on the type and direction of the test
  Step 3: Decide on the level of significance (α), determine the critical value(s) and region(s), and draw a diagram
  Step 4: Write down the decision rule
  Step 5: Select a random sample and do relevant calculations
  Step 6: Draw a conclusion
9.3 Hypothesis tests for a population mean: large sample case (z statistic, σ known)
  Step 1: Set up H0 and Ha
  Step 2: Decide on the type and direction of the test
  Step 3: Decide on the level of significance (α), determine the critical value(s) and region(s), and draw a diagram
  Step 4: Write down the decision rule
  Step 5: Select a random sample and do relevant calculations
  Step 6: Draw a conclusion
  Testing the mean with a finite population
  The critical value method
  The p-value method
9.4 Hypothesis tests about a population mean: small sample case (t statistic, σ unknown)
9.5 Testing hypotheses about a proportion
9.6 Testing hypotheses about a variance
9.7 Solving for Type II errors
  Some observations about Type II errors
  Operating characteristic and power curves
  Effect of increasing sample size on the rejection limits
Summary
Key terms
Key equations
Review problems
Acknowledgements

CHAPTER 10
Statistical inferences about two populations
Introduction
10.1 Hypothesis testing and confidence intervals for the difference between two means (z statistic, population variances known)
  Hypothesis testing
  Confidence intervals
10.2 Hypothesis testing and confidence intervals for the difference between two means (t statistic, population variances unknown)
  Hypothesis testing
  Confidence intervals
10.3 Statistical inferences about two populations with paired observations
  Hypothesis testing
  Confidence intervals
10.4 Statistical inferences about two population proportions
  Hypothesis testing
  Confidence intervals
10.5 Statistical inferences about two population variances
  Hypothesis testing
  Confidence intervals
Summary
Key terms
Key equations
Review problems
Maths appendix
Acknowledgements

CHAPTER 11
Analysis of variance and design of experiments
Introduction
11.1 Introduction to design of experiments
11.2 The completely randomised design (one-way ANOVA)
  Reading the F distribution table
11.3 Multiple comparison tests
  Tukey’s honestly significant difference (HSD) test: The case of equal sample sizes
  Tukey–Kramer procedure: The case of unequal sample sizes
11.4 The randomised block design
11.5 A factorial design (two-way ANOVA)
  Advantages of factorial design
  Factorial designs with two treatments
  Statistically testing a factorial design
  Interaction
Summary
Key terms
Key equations
Review problems
Maths appendix
Acknowledgements

CHAPTER 12
Chi-square tests
Introduction
12.1 Chi-square goodness-of-fit test
  Step 1: Set up H0 and Ha
  Step 2: Decide on the type of test
  Step 3: Decide on the level of significance α and determine the critical value(s) and region(s)
  Step 4: Write down the decision rule
  Step 5: Select a random sample and do relevant calculations
  Step 6: Draw a conclusion
12.2 Contingency analysis: chi-square test of independence
  Step 1: Set up H0 and Ha
  Step 2: Decide on the type of test
  Step 3: Decide on the level of significance α and determine the critical value(s) and region(s)
  Step 4: Write down the decision rule
  Step 5: Select a random sample and do relevant calculations
  Step 6: Draw a conclusion
Summary
Key terms
Key equations
Review problems
Acknowledgements

CHAPTER 13
Simple regression analysis
Introduction
13.1 Examining the relationship between two variables
13.2 Determining the equation of the regression line
13.3 Residual analysis
  Using residuals to test the assumptions of the regression model
13.4 Standard error of the estimate
13.5 Coefficient of determination
  Relationship between r and r²
13.6 Hypothesis tests for the slope of the regression model and testing the overall model
  Testing the slope
13.7 Estimation and prediction
  Confidence (prediction) intervals to estimate the conditional mean of y: μy/x
  Prediction intervals to estimate a single value of y
  Interpreting the output
Summary
Key terms
Key equations
Review problems
Acknowledgements

CHAPTER 14
Multiple regression analysis
Introduction
14.1 The multiple regression model
  Multiple regression model with two independent variables (first-order)
  Determining the multiple regression equation
14.2 Significance tests of the regression model and its coefficients
  Testing the overall model
  Significance tests of the regression coefficients
14.3 Residuals, standard error of the estimate and R²
  Residuals
  SSE and standard error of the estimate
  Coefficient of multiple determination (R²)
  Adjusted R²
14.4 Interpreting multiple regression computer output
  A re-examination of multiple regression output
Summary
Key terms
Key equations
Review problems
Acknowledgements

CHAPTER 15
Time-series forecasting and index numbers
Introduction
15.1 Components of a time series
  Trend component
  Seasonal component
  Cyclical component
  Irregular (or random) component
15.2 Time-series smoothing methods
  The moving average method
  The exponential smoothing method
  Seasonal indices
  Deseasonalising time series
15.3 Least squares trend-based forecasting models
  The linear trend model
  The quadratic trend model
  The exponential trend model
15.4 Autoregressive trend-based forecasting models
  Testing for autocorrelation
  Ways to overcome the autocorrelation problem
15.5 Evaluating alternative forecasting models
15.6 Index numbers
  Simple price index
  Aggregate price indices
  Changing the base period
  Applications of price indices
Summary
Key terms
Key equations
Review problems
Acknowledgements

Appendix A: Tables
Appendix B: Fundamental symbols and abbreviations
Index