DATA HANDLING IN SCIENCE AND TECHNOLOGY, VOLUME 25
Statistical design - Chemometrics

Advisory Editors: S. Rutan and B. Walczak

Other volumes in this series:

Volume 1 Microprocessor Programming and Applications for Scientists and Engineers, by R.R. Smardzewski
Volume 2 Chemometrics: A Textbook, by D.L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte and L. Kaufman
Volume 3 Experimental Design: A Chemometric Approach, by S.N. Deming and S.L. Morgan
Volume 4 Advanced Scientific Computing in BASIC with Applications in Chemistry, Biology and Pharmacology, by P. Valkó and S. Vajda
Volume 5 PCs for Chemists, edited by J. Zupan
Volume 6 Scientific Computing and Automation (Europe) 1990, Proceedings of the Scientific Computing and Automation (Europe) Conference, 12–15 June, 1990, Maastricht, The Netherlands, edited by E.J. Karjalainen
Volume 7 Receptor Modeling for Air Quality Management, edited by P.K. Hopke
Volume 8 Design and Optimization in Organic Synthesis, by R. Carlson
Volume 9 Multivariate Pattern Recognition in Chemometrics, illustrated by case studies, edited by R.G. Brereton
Volume 10 Sampling of Heterogeneous and Dynamic Material Systems: Theories of Heterogeneity, Sampling and Homogenizing, by P.M. Gy
Volume 11 Experimental Design: A Chemometric Approach (Second, Revised and Expanded Edition), by S.N. Deming and S.L. Morgan
Volume 12 Methods for Experimental Design: Principles and Applications for Physicists and Chemists, by J.L. Goupy
Volume 13 Intelligent Software for Chemical Analysis, edited by L.M.C. Buydens and P.J. Schoenmakers
Volume 14 The Data Analysis Handbook, by I.E. Frank and R. Todeschini
Volume 15 Adaption of Simulated Annealing to Chemical Optimization Problems, edited by J. Kalivas
Volume 16 Multivariate Analysis of Data in Sensory Science, edited by T. Næs and E. Risvik
Volume 17 Data Analysis for Hyphenated Techniques, by E.J. Karjalainen and U.P. Karjalainen
Volume 18 Signal Treatment and Signal Analysis in NMR, edited by D.N. Rutledge
Volume 19 Robustness of Analytical Chemical Methods and Pharmaceutical Technological Products, edited by M.W.B. Hendriks, J.H. de Boer, and A.K. Smilde
Volume 20A Handbook of Chemometrics and Qualimetrics: Part A, by D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. de Jong, P.J. Lewi, and J. Smeyers-Verbeke
Volume 20B Handbook of Chemometrics and Qualimetrics: Part B, by B.G.M. Vandeginste, D.L. Massart, L.M.C. Buydens, S. de Jong, P.J. Lewi, and J. Smeyers-Verbeke
Volume 21 Data Analysis and Signal Processing in Chromatography, by A. Felinger
Volume 22 Wavelets in Chemistry, edited by B. Walczak
Volume 23 Nature-inspired Methods in Chemometrics: Genetic Algorithms and Artificial Neural Networks, edited by R. Leardi
Volume 24 Handbook of Chemometrics and Qualimetrics, by D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. de Jong, P.J. Lewi, and J. Smeyers-Verbeke

R.E. BRUNS
Instituto de Quimica, Universidade Estadual de Campinas, Brazil

I.S. SCARMINIO
Departamento de Quimica, Universidade Estadual de Londrina, Brazil

B. de BARROS NETO
Departamento de Quimica Fundamental, Universidade Federal de Pernambuco, Brazil

Amsterdam – Boston – Heidelberg – London – New York – Oxford – Paris – San Diego – San Francisco – Singapore – Sydney – Tokyo

ELSEVIER B.V., Radarweg 29, P.O. Box 211, 1000 AE Amsterdam, The Netherlands
ELSEVIER Inc., 525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
ELSEVIER Ltd, The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
ELSEVIER Ltd, 84 Theobalds Road, London WC1X 8RR, UK

© 2006 Elsevier B.V. All rights reserved.

This work is protected under copyright by Elsevier B.V., and the following terms and conditions apply to its use:

Photocopying
Single photocopies of single chapters may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for non-profit educational classroom use.
Permissions may be sought directly from Elsevier's Rights Department in Oxford, UK: phone (+44) 1865 843830, fax (+44) 1865 853333, e-mail: [email protected], or via the Elsevier homepage (http://www.elsevier.com/locate/permissions).

In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (+1) (978) 7508400, fax: (+1) (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 20 7631 5555; fax: (+44) 20 7631 5500. Other countries may have a local reprographic rights agency for payments.

Derivative Works
Tables of contents may be reproduced for internal circulation, but permission of the Publisher is required for external resale or distribution of such material. Permission of the Publisher is required for all other derivative works, including compilations and translations.

Electronic Storage or Usage
Permission of the Publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter.

Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher.

Address permissions requests to: Elsevier's Rights Department, at the fax and e-mail addresses noted above.

Notice
No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

First edition 2006

Library of Congress Cataloging in Publication Data
A catalog record is available from the Library of Congress.

British Library Cataloguing in Publication Data
A catalogue record is available from the British Library.

ISBN-13: 978-0-444-52181-1
ISBN-10: 0-444-52181-x
ISSN: 0922-3487 (Series)

∞ The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper).

Printed in The Netherlands.

Working together to grow libraries in developing countries
www.elsevier.com | www.bookaid.org | www.sabre.org

Preface

Utility ought to be the principal intention of every publication. Wherever this intention does not plainly appear, neither the books nor their authors have the smallest claim to the approbation of mankind.

Thus wrote William Smellie in the preface to the first edition of Encyclopaedia Britannica, published in 1768. Our book has the modest intention of being useful to readers who wish, or need, to do experiments. The edition you are reading is a translation of a much revised, corrected and expanded version of our original text, Como Fazer Experimentos, published in Portuguese. To prepare this edition, every sentence was reconsidered, with the objective of clarifying the text. All the errors that we were able to discover, or that readers were kind enough to point out, have been corrected.

During the last 20 years or so we have spent considerable time teaching chemometrics (the use of statistical, mathematical and graphical techniques to solve chemical problems) to hundreds of students in our own universities, as well as in over 30 different industries. These students came principally from the exact sciences and engineering, but other professional categories were also represented, such as management, medicine, biology, pharmacy and food technology. This diversity leads us to believe that the methods described here can be learned and applied, with varying degrees of effort, by any professional who has to do experiments.

Statistics does not perform miracles and can in no way substitute for specialized technical knowledge.
What we hope to demonstrate is that a professional who combines knowledge of statistical experimental design and data analysis with solid technical and scientific training in his own area of interest will become more competent, and therefore even more competitive.

We are chemists, not statisticians, and perhaps this differentiates our book from most others with similar content. Although we do not believe it is possible to learn the techniques of experimental design and data analysis without some knowledge of basic statistics, in this book we try to keep its discussion at the minimum necessary and soon go on to what really interests the experimenter: research and development problems. On the other hand, recognizing that statistics is not very dear to the heart of many scientists and engineers, we assume that the reader has no knowledge of it. In spite of this, we arrive earlier at treating experimental problems with many variables than do more traditional texts.

Many people have contributed to making this book a reality. When the first edition came out, the list was already too extensive to cite everyone by name. We have been fortunate that this list has grown considerably since that time, and our gratitude to all has increased proportionately. We do, however, wish to thank especially those whose work has allowed us to include so many applications in this edition. These people are cited with specific references when their results are discussed. We are also grateful to the Fapesp, CNPq and Faep-Unicamp research granting agencies for partial financial support.

Of course, we remain solely responsible for the defects we have not been able to correct. We count on the readers to help us solve this optimization problem. Our electronic addresses are below. If you know of places where we could have done better, we will be most interested in hearing from you.

Campinas, July 2005.

B. de Barros Neto
Fundamental Chemistry Department
Federal University of Pernambuco
E-mail: [email protected]

I. S. Scarminio
Chemistry Department
State University of Londrina
E-mail: [email protected]

R. E. Bruns
Chemistry Institute
State University of Campinas
E-mail: [email protected]

Contents

Preface

1 How statistics can help
  1.1 Statistics can help
  1.2 Empirical models
  1.3 Experimental design and optimization

2 When the situation is normal
  2.1 Errors
    2.1.1 Types of error
  2.2 Populations, samples and distributions
    2.2.1 How to describe the characteristics of the sample
  2.3 The normal distribution
    2.3.1 Calculating probabilities of occurrence
    2.3.2 Using the tails of the standard normal distribution
    2.3.3 Why is the normal distribution so important?
    2.3.4 Calculating confidence intervals for the mean
    2.3.5 Interpreting confidence intervals
  2.4 Covariance and correlation
  2.5 Linear combinations of random variables
  2.6 Random sampling in normal populations
  2.7 Applying the normal distribution
    2.7.1 Making comparisons with a reference value
    2.7.2 Determining sample size
    2.7.3 Statistically controlling processes
    2.7.4 Comparing two treatments
      Comparing two averages
      Making paired comparisons
      Comparing two variances
  2A Applications
    2A.1 From home to work
    2A.2 Bioequivalence of brand-name and generic medicines
    2A.3 Still more beans?
    2A.4 Marine algae productivity

3 Changing everything at the same time
  3.1 A $2^2$ factorial design
    3.1.1 Calculating the effects
    3.1.2 Geometrical interpretation of the effects
    3.1.3 Estimating the error of an effect
    3.1.4 Interpreting the results
    3.1.5 An algorithm for calculating the effects
    3.1.6 The statistical model
  3.2 A $2^3$ factorial design
    3.2.1 Calculating the effects
    3.2.2 Estimating the error of an effect
    3.2.3 Interpreting the results
    3.2.4 The statistical model
  3.3 A $2^4$ factorial design
    3.3.1 Calculating the effects
    3.3.2 Estimating the error of an effect
  3.4 Normal probability plots
  3.5 Evolutionary operation with two-level designs
  3.6 Blocking factorial designs
  3A Applications
    3A.1 Resin hydrolysis
    3A.2 Cyclic voltammetry of methylene blue
    3A.3 Retention time in liquid chromatography
    3A.4 Gas separation by adsorption
    3A.5 Improving wave functions
    3A.6 Performance of Ti/TiO$_2$ electrodes
    3A.7 Controlling detergent froth
    3A.8 Development of a detergent
    3A.9 A blocked design for producing earplugs

4 When there are many variables
  4.1 Half-fractions of factorial designs
    4.1.1 How to construct a half-fraction
    4.1.2 Generators of fractional factorial designs
  4.2 The concept of resolution
    4.2.1 Resolution IV fractional factorial designs
    4.2.2 Resolution V fractional factorial designs
    4.2.3 Inert variables and factorials embedded in fractions
    4.2.4 Half-fractions of maximum resolution
  4.3 Screening variables
    4.3.1 Resolution III fractional factorial designs
    4.3.2 Saturated designs
    4.3.3 How to construct resolution III fractional factorial designs
    4.3.4 How to construct a $2^{8-4}_{\mathrm{IV}}$ fraction from a $2^{7-4}_{\mathrm{III}}$ fraction
    4.3.5 Saturated Plackett–Burman designs
    4.3.6 Taguchi techniques of quality engineering
  4A Applications
    4A.1 Adsorption on organofunctionalized silicas
    4A.2 Calcium oxalate thermogravimetry
    4A.3 Chromatographic analysis of gases
    4A.4 Mn-porphyrin catalytic response
    4A.5 Oxide drainage in the steel industry
    4A.6 Violacein production by bacteria
    4A.7 Polyester resin cure
    4A.8 Screening design for earplug production
    4A.9 Plackett–Burman designs for screening factors

5 Empirical model-building
  5.1 A model for $y = f(X)$
  5.2 The analysis of variance
  5.3 Confidence intervals
  5.4 Statistical significance of the regression model
  5.5 A new model for $y = f(X)$
  5.6 Lack of fit and pure error
  5.7 Correlation and regression
  5A Applications
    5A.1 The spring of air
    5A.2 Chromatographic calibration
    5A.3 Multivariate calibration
    5A.4 Forbidden energy gaps in semiconductors
    5A.5 Heat of vaporization determination
    5A.6 Another calibration

6 Exploring the response surface
  6.1 Response surface methodology
    6.1.1 Initial modeling
    6.1.2 Determining the path of steepest ascent
    6.1.3 Finding the optimum point
  6.2 The importance of the initial design
  6.3 An experiment with three factors and two responses
  6.4 Treating problems with many variables
  6.5 Central composite designs
  6.6 Box–Behnken designs
  6.7 Doehlert designs