HAESE MATHEMATICS
Specialists in mathematics publishing

Mathematics for the international student
Mathematics HL (Option): Statistics and Probability
HL Topic 7, FM Topic 3

Catherine Quinn B.Sc.(Hons), Grad.Dip.Ed., Ph.D.
Peter Blythe B.Sc.
Robert Haese B.Sc.
Michael Haese B.Sc.(Hons.), Ph.D.

for use with IB Diploma Programme

Haese Mathematics
152 Richmond Road, Marleston, SA 5033, AUSTRALIA
Telephone: +61 8 8210 4666, Fax: +61 8 8354 1238
Email: [email protected]
Web: www.haesemathematics.com.au

National Library of Australia Card Number & ISBN 978-1-921972-31-7

© Haese & Harris Publications 2013

Published by Haese Mathematics.
First Edition 2013

Artwork by Brian Houston.
Cover design by Piotr Poturaj.
Typeset in Australia by Deanne Gallasch. Typeset in Times Roman 10½.
Printed in Malaysia through Bookpac Production Services, Singapore.

The textbook and its accompanying CD have been developed independently of the International Baccalaureate Organization (IBO). The textbook and CD are in no way connected with, or endorsed by, the IBO.

This book is copyright. Except as permitted by the Copyright Act (any fair dealing for the purposes of private study, research, criticism or review), no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher. Enquiries to be made to Haese Mathematics.
Copying for educational purposes: Where copies of part or the whole of the book are made under Part VB of the Copyright Act, the law requires that the educational institution or the body that administers it has given a remuneration notice to Copyright Agency Limited (CAL). For information, contact the Copyright Agency Limited.

Acknowledgements: While every attempt has been made to trace and acknowledge copyright, the authors and publishers apologise for any accidental infringement where copyright has proved untraceable. They would be pleased to come to a suitable agreement with the rightful owner.

Disclaimer: All the internet addresses (URLs) given in this book were valid at the time of printing. While the authors and publisher regret any inconvenience that changes of address may cause readers, no responsibility for any such changes can be accepted by either the authors or the publisher.

FOREWORD

Mathematics HL (Option): Statistics and Probability has been written as a companion book to the Mathematics HL (Core) textbook. Together, they aim to provide students and teachers with appropriate coverage of the two-year Mathematics HL Course, to be first examined in 2014.

This book covers all sub-topics set out in Mathematics HL Option Topic 7 and Further Mathematics HL Topic 3, Statistics and Probability.

The aim of this topic is to introduce students to the basic concepts and techniques of statistics and probability and their applications.

Detailed explanations and key facts are highlighted throughout the text. Each sub-topic contains numerous Worked Examples, highlighting each step necessary to reach the answer for that example.

Theory of Knowledge is a core requirement in the International Baccalaureate Diploma Programme, whereby students are encouraged to think critically and challenge the assumptions of knowledge.
Discussion topics for Theory of Knowledge have been included on pages 157 to 159. These aim to help students discover and express their views on knowledge issues.

The accompanying student CD includes a PDF of the full text and access to specially designed graphing software.

Graphics calculator instructions for Casio fx-9860G Plus, Casio fx-CG20, TI-84 Plus and TI-nspire are available from icons located throughout the book.

Fully worked solutions are provided at the back of the text; however, students are encouraged to attempt each question before referring to the solution.

It is not our intention to define the course. Teachers are encouraged to use other resources. We have developed this book independently of the International Baccalaureate Organization (IBO) in consultation with experienced teachers of IB Mathematics. The Text is not endorsed by the IBO.

In this changing world of mathematics education, we believe that the contextual approach shown in this book, with associated use of technology, will enhance the students' understanding, knowledge and appreciation of mathematics and its universal applications.

We welcome your feedback.
Email: [email protected]
Web: www.haesemathematics.com.au

CTQ PJB RCH PMH

ACKNOWLEDGEMENTS

The authors and publishers would like to thank all those teachers who offered advice and encouragement on this book.

USING THE INTERACTIVE STUDENT CD

The interactive CD is ideal for independent study.

Students can revisit concepts taught in class and undertake their own revision and practice. The CD also has the text of the book, allowing students to leave the textbook at school and keep the CD at home.
By clicking on the relevant icon, a range of interactive features can be accessed:

• Graphics calculator instructions for the Casio fx-9860G Plus, Casio fx-CG20, TI-84 Plus and the TI-nspire
• Interactive links to graphing software

[Icons: INTERACTIVE LINK; GRAPHICS CALCULATOR INSTRUCTIONS]

TABLE OF CONTENTS

SYMBOLS AND NOTATION USED IN THIS BOOK
A  Expectation algebra
B  Discrete random variables
C  Continuous random variables
D  Probability generating functions
E  Distributions of the sample mean and the Central Limit Theorem
F  Point estimation (unbiased estimators and estimates)
G  Confidence intervals for means
H  Significance and hypothesis testing
I  Bivariate Statistics
Review set A
Review set B
Review set C
Review set D
THEORY OF KNOWLEDGE (The Central Limit Theorem)
THEORY OF KNOWLEDGE (Population Parameters)
WORKED SOLUTIONS
INDEX

SYMBOLS AND NOTATION USED IN THIS BOOK

$\approx$    is approximately equal to
$>$    is greater than
$\geq$    is greater than or equal to
$<$    is less than
$\leq$    is less than or equal to
$\{......\}$    the set of all elements ......
$\in$    is an element of
$\notin$    is not an element of
$\mathbb{N}$    the set of all natural numbers $\{0, 1, 2, 3, ....\}$
$\mathbb{Z}$    the set of integers $\{0, \pm 1, \pm 2, \pm 3, ....\}$
$\mathbb{Q}$    the set of rational numbers
$\mathbb{R}$    the set of real numbers
$\mathbb{Z}^+$    the set of positive integers $\{1, 2, 3, ....\}$
$\subseteq$    is a subset of
$\subset$    is a proper subset of
$\Rightarrow$    implies that
$\nRightarrow$    does not imply that
$f : A \to B$    $f$ is a function under which each element of set $A$ has an image in set $B$
$f : x \mapsto y$    $f$ is a function under which $x$ is mapped to $y$
$f(x)$    the image of $x$ under the function $f$
$f \circ g$ or $f(g(x))$    the composite function of $f$ and $g$
$|x|$    the modulus or absolute value of $x$
$[a, b]$    the closed interval $a \leq x \leq b$
$]a, b[$    the open interval $a < x < b$
$u_n$    the $n$th term of a sequence or series with first term $u_1$
$\{u_n\}$    the sequence with $n$th term $u_n$, if first term is $u_1$
$S_n$    the sum of the first $n$ terms of a sequence
$S_\infty$    the sum to infinity of a convergent series
$\sum_{i=1}^{n} u_i$    $u_1 + u_2 + u_3 + .... + u_n$
$\prod_{i=1}^{n} u_i$    $u_1 \times u_2 \times u_3 \times .... \times u_n$
$\lim_{x \to a} f(x)$    the limit of $f(x)$ as $x$ tends to $a$
$\lim_{x \to a^+} f(x)$    the limit of $f(x)$ as $x$ tends to $a$ from the positive side of $a$
$\lim_{x \to a^-} f(x)$    the limit of $f(x)$ as $x$ tends to $a$ from the negative side of $a$
$\max\{a, b\}$    the maximum value of $a$ or $b$
$\sum_{n=0}^{\infty} c_n x^n$    the power series whose terms have form $c_n x^n$
$\frac{dy}{dx}$    the derivative of $y$ with respect to $x$
$f'(x)$    the derivative of $f(x)$ with respect to $x$
$\frac{d^2 y}{dx^2}$    the second derivative of $y$ with respect to $x$
$f''(x)$    the second derivative of $f(x)$ with respect to $x$
$\frac{d^n y}{dx^n}$    the $n$th derivative of $y$ with respect to $x$
$f^{(n)}(x)$    the $n$th derivative of $f(x)$ with respect to $x$
$\int y \, dx$    the indefinite integral of $y$ with respect to $x$
$\int_a^b y \, dx$    the definite integral of $y$ with respect to $x$ between the limits $x = a$ and $x = b$
$e^x$    exponential function of $x$
$\ln x$    the natural logarithm of $x$
$\sin, \cos, \tan$    the circular functions
$\csc, \sec, \cot$    the reciprocal circular functions
$\arcsin, \arccos, \arctan$    the inverse circular functions
$\binom{n}{r}$    $\frac{n!}{r!(n-r)!}$
$P(A)$    probability of event $A$
$P(A')$    probability of the event "not $A$"
$P(A \mid B)$    probability of the event $A$ given $B$
$x_1, x_2, ....$    observations
$P(x)$    probability distribution function $P_x = P(X = x)$ of the discrete random variable $X$
$f(x)$    probability density function of the continuous random variable $X$
$F(x)$    cumulative distribution function of the continuous random variable $X$
$E(X)$    the expected value of the random variable $X$
$\mathrm{Var}(X)$    the variance of the random variable $X$
$\mu$    population mean
$\sigma^2$    population variance, the value $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$, for a population of size $n$
$\sigma$    population standard deviation
$\bar{x}$    sample mean
$s_n^2$    sample variance, the value $s_n^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$, from a sample of size $n$
$s_n$    standard deviation of the sample of size $n$
$s_{n-1}^2$    unbiased estimate of the population variance, the value $s_{n-1}^2 = \frac{n}{n-1} s_n^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$, from a sample of size $n$
$\bar{X}$    the estimator of $\mu$, that is the function $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$, where $X_i$, $i = 1, ...., n$ are identically distributed independent random variables each with mean $\mu$
$S_n^2$    the biased estimator of $\sigma^2$, that is the function $S_n^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$, where $X_i$, $i = 1, ...., n$ are identically distributed independent random variables each with variance $\sigma^2$
$S_{n-1}^2$    the unbiased estimator of $\sigma^2$, that is the function $S_{n-1}^2 = \frac{n}{n-1} S_n^2$
$\mathrm{DU}(n)$    discrete uniform distribution with parameter $n$
$\mathrm{B}(1, p)$    Bernoulli distribution with parameter $p$
$\mathrm{B}(n, p)$    binomial distribution with parameters $n$ and $p$
$\mathrm{Geo}(p)$    geometric distribution with parameter $p$
$\mathrm{NB}(r, p)$    negative binomial distribution with parameters $r$ and $p$
$\mathrm{Po}(m)$    Poisson distribution with mean $m$
$X \sim \mathrm{DU}(n)$    the random variable $X$ has a discrete uniform distribution with parameter $n$
$X \sim \mathrm{B}(1, p)$    the random variable $X$ has a Bernoulli distribution with parameter $p$
$X \sim \mathrm{B}(n, p)$    the random variable $X$ has a binomial distribution with parameters $n$ and $p$
$X \sim \mathrm{Geo}(p)$    the random variable $X$ has a geometric distribution with parameter $p$
$X \sim \mathrm{NB}(r, p)$    the random variable $X$ has a negative binomial distribution with parameters $r$ and $p$
$X \sim \mathrm{Po}(m)$    the random variable $X$ has a Poisson distribution with mean $m$
$\mathrm{U}(a, b)$    continuous uniform distribution with parameters $a$ and $b$
$\mathrm{Exp}(\lambda)$    exponential distribution with mean $\frac{1}{\lambda}$
$\mathrm{N}(\mu, \sigma^2)$    normal distribution with mean $\mu$ and variance $\sigma^2$
$\nu$    number of degrees of freedom
$t(\nu)$    Student's $t$-distribution with $\nu$ degrees of freedom
$X \sim \mathrm{U}(a, b)$    the random variable $X$ has a continuous uniform distribution with parameters $a$ and $b$
$X \sim \mathrm{Exp}(\lambda)$    the random variable $X$ has an exponential distribution with mean $\frac{1}{\lambda}$
$X \sim \mathrm{N}(\mu, \sigma^2)$    the random variable $X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$
$T \sim t(\nu)$    the random variable $T$ has the Student's $t$-distribution with $\nu$ degrees of freedom
$G(t)$    the probability generating function $E(t^X)$ for a discrete random variable $X$ which takes values in $\mathbb{N}$
$p$    depending on the context, a parameter of a distribution, a population proportion, or a $p$-value in a hypothesis test
$\hat{p}$    a sample proportion
$H_0$    null hypothesis
$H_1$    alternative hypothesis
$\alpha$    significance level or probability of a Type I error
$\beta$    probability of a Type II error
$1 - \beta$    power of a hypothesis test
$\mathrm{Cov}(X, Y)$    covariance of random variables $X$ and $Y$
$\rho$    product moment correlation coefficient between two random variables
$R$    the sample product moment correlation coefficient; an estimator of $\rho$
$r$    the observed value of $R$ for a given sample of bivariate data; an estimate of $\rho$

STATISTICS AND PROBABILITY

A  EXPECTATION ALGEBRA

A random variable can take any one of a set of values from a given
domain, according to given probabilities. The domain may be discrete or continuous.

DISCRETE RANDOM VARIABLES

If $X$ is a discrete random variable, then:

1 $X$ has possible values $x_1, x_2, x_3, ....$. To determine the value of $X$ we usually count.

2 $X$ takes value $x_i$ with probability $p_i$, where $0 \leq p_i \leq 1$, $i = 1, 2, 3, ....$, and $\sum_i p_i = 1$.

3 $X$ has a probability distribution function (or probability mass function) $P(x)$, where $P(x_i) = P(X = x_i) = p_i$, $i = 1, 2, 3, ....$.

4 $X$ has a cumulative distribution function (CDF) $F(x)$, where
$$F(x) = P(X \leq x) = \sum_{x_i \leq x} P(X = x_i).$$
Here $\sum_{x_i \leq x}$ reads "the sum for all values of $x_i$ less than or equal to $x$". $F(x)$ is the probability that $X$ takes a value less than or equal to $x$.

Examples of discrete probability distributions and random variables covered in the Core course include Bernoulli, Discrete Uniform, Binomial, and Poisson.

CONTINUOUS RANDOM VARIABLES

If $X$ is a continuous random variable, then:

1 The possible values of $X$ may be all $x \in \mathbb{R}$, or all real $x$ in some domain $[a, b]$. To determine the value of $X$ we usually measure.

2 $X$ has a continuous probability density function (PDF) $f(x)$, where:
• $f(x) \geq 0$ for all $x$ in the domain of $f$.
• $\int_{-\infty}^{\infty} f(x) \, dx = 1$ if the domain of $f$ is $\mathbb{R}$, or $\int_a^b f(x) \, dx = 1$ if the domain of $f$ is $[a, b]$.
[Figure: graph of $y = f(x)$; the area under the curve is 1.]

3 Suppose $f$, the PDF for $X$, has domain $[a, b]$. $X$ has a cumulative distribution function (CDF) $F(x)$, where
$$F(x) = P(X \leq x) = \int_a^x f(t) \, dt \quad \text{for } x \in [a, b].$$
• $F(x)$ is the probability that $X$ takes a value less than or equal to $x$.
• $F(b) = \int_a^b f(t) \, dt = 1$
[Figure: graph of $y = f(t)$ on $[a, b]$; the area under the curve from $a$ to $x$ is $F(x)$.]
• The probability that $X$ takes a value in the interval $[c, d] \subseteq [a, b]$ is given by
$$P(c \leq X \leq d) = \int_c^d f(t) \, dt = F(d) - F(c).$$
[Figure: graph of $y = f(t)$ on $[a, b]$; the area under the curve from $c$ to $d$ is $P(c \leq X \leq d)$.]

4 Since $X$ has infinitely many possible values, the probability that $X$ takes a single value $X = x$ is 0.
However, when a value of the continuous random variable $X$ is rounded to the nearest integer, every value with $x - 0.5 \leq X < x + 0.5$, $x \in \mathbb{Z}$, is rounded to the integer $x$. Thus, for $x \in \mathbb{Z}$, we define
$$P(X = x) = P(x - 0.5 \leq X < x + 0.5) = \int_{x - 0.5}^{x + 0.5} f(t) \, dt = F(x + 0.5) - F(x - 0.5).$$

Example 1

Given a random variable $X \sim \mathrm{N}(7.2, 28)$, find $P(X = 10)$.

(You should recognise the Normal distribution from the Core course.)

$P(X = 10) = P(9.5 \leq X < 10.5) \approx 0.0655$

For a continuous random variable $X$,
$$P(c \leq X < d) = P(c \leq X \leq d) = P(c < X \leq d) = P(c < X < d)$$
since the corresponding integrals all define the same area under the curve $y = f(t)$ between $t = c$ and $t = d$.

Examples of continuous probability distributions and random variables covered in the Core course include Continuous Uniform, Exponential, and the Normal distribution.

EXPECTATION

The mean or expected value or expectation $E(X)$ of a random variable $X$ is defined as follows:

• If $X$ is a discrete random variable with set of possible values $x_1, x_2, ....$ and probability mass function $P(X = x_i) = p_i$, $i = 1, 2, ....$,
$$E(X) = \mu = \sum_i x_i P(X = x_i) = \sum_i x_i p_i, \quad i = 1, 2, ....$$

• If $X$ is a continuous random variable with probability density function $f(x)$ with domain $[a, b]$,
$$E(X) = \mu = \int_a^b x f(x) \, dx.$$
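The rounding rule from Example 1 and the two definitions of $E(X)$ can be checked numerically. The following is a minimal Python sketch, not part of the textbook: the function names are hypothetical, the normal CDF is written with the standard-library error function, and the fair die and the uniform density on $[1, 5]$ are illustrative assumptions chosen only because their means are easy to verify by hand.

```python
import math


def normal_cdf(x, mu, variance):
    """F(x) for X ~ N(mu, variance), written with the error function."""
    sigma = math.sqrt(variance)
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_rounds_to(x, mu, variance):
    """P(X = x) for integer x, defined as F(x + 0.5) - F(x - 0.5)."""
    return normal_cdf(x + 0.5, mu, variance) - normal_cdf(x - 0.5, mu, variance)


def expectation_discrete(values, probs):
    """E(X) = sum of x_i * p_i for a discrete random variable."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))


def expectation_continuous(pdf, a, b, n=10_000):
    """E(X) = integral of x f(x) over [a, b], using the midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h  # midpoint of the i-th strip
        total += t * pdf(t)
    return total * h


# Example 1: X ~ N(7.2, 28), so P(X = 10) = P(9.5 <= X < 10.5).
p10 = prob_rounds_to(10, 7.2, 28)

# Assumed illustration: a fair six-sided die.
die_mean = expectation_discrete([1, 2, 3, 4, 5, 6], [1 / 6] * 6)

# Assumed illustration: uniform density f(x) = 1/4 on [1, 5].
uniform_mean = expectation_continuous(lambda x: 0.25, 1.0, 5.0)

print(round(p10, 4), die_mean, uniform_mean)
```

The value of `p10` agrees with the answer 0.0655 given in Example 1 to four decimal places. The midpoint rule is used for the continuous case only to keep the sketch free of third-party libraries; a graphics calculator or a statistics package would normally be used instead.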