
Handling Missing Data in Ranked Set Sampling PDF

124 Pages·2013·3.942 MB·English

Preview Handling Missing Data in Ranked Set Sampling

SPRINGER BRIEFS IN STATISTICS

Carlos N. Bouza-Herrera

Handling Missing Data in Ranked Set Sampling

SpringerBriefs in Statistics
For further volumes: http://www.springer.com/series/8921

Carlos N. Bouza-Herrera
Facultad de Matemática y Computación
Universidad de La Habana
Havana, Cuba

ISSN 2191-544X
ISSN 2191-5458 (electronic)
ISBN 978-3-642-39898-8
ISBN 978-3-642-39899-5 (eBook)
DOI 10.1007/978-3-642-39899-5
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013944885
Mathematics Subject Classification (2010): 62D05, 62F05, 62F10, 62Pxx, 62F40

© The Author(s) 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

To my jewels:
Gemayqzel, for being the brilliant apple of my eyes
Sira María, the constant and shining support of my heart
Bernarda Dulce, the precious diamond of our lives

Preface

Usage is the best language teacher.
Quintilianus

The use of random sampling sustains the development of current statistical theory. In many cases it is necessary to have some control of the units to be selected. The solution in classic sampling is to use stratification, clustering, unequal probabilities of selection, etc. Ranked Set Sampling (RSS) is a new method of selection of samples. RSS allows controlling the selection procedure, as the sample will contain units in which the values of the variables of interest are spread throughout the interval of possible values. The sample consists of units with different ranks. Ranks are assigned using some auxiliary information; the judgment of experts is a particular case.

RSS is a kind of stratification. Hence, using this design instead of simple random sampling with replacement means that a gain in accuracy is straightforwardly achieved. This is sustained by the fact that each stratum consists of population units with the same rank. The statistical properties of order statistics allow deriving the properties of RSS-based estimators. One of the main consequences of the study of RSS methods is determining formulas for evaluating the gains in accuracy as well as relative precision measures. From the proposal of McIntyre (1952) to the book of Chen et al. (2004), the study of particular RSS strategies has produced a large number of papers, and a body of models has formed an alternative theory of sampling.
The number of theoretical papers and applications of RSS is growing. One of the usual problems in sampling applications is the presence of non-sampling errors. The effects of ranking errors have been studied. This work deals with the problems derived from non-responses. RSS models after subsampling the non-respondents and imputation procedures are studied.

The aim of this book is quite modest; it attempts to be a systematic exposition of all that is contained in the literature on RSS in the area of missing observations. In writing this book, I tried to produce a text that is as simple as possible. My aim is to spread awareness of the potentialities of RSS. I am hopeful that this oeuvre will trigger additional theoretical research, as well as provide tools for practical applications, when non-sampling errors are present and an RSS model is used. Hence my torch was Quintilianus' maxim: consuetudo certissima est loquendi magistra.

I am deeply indebted to my family for their invaluable and kind support. I give my hearty thanks to the staff of SpringerBriefs, Evelyn Best, Eva Hiripi, and Veronika Rosteck, for their responsiveness through the entire Editorial Process, and, last but not least, to the specialists whose comments and suggestions allowed improving the initial version of this book.

La Habana, Cuba, October 29, 2012
Carlos N. Bouza-Herrera

References

McIntyre, G. A. (1952). A method of unbiased selective sampling using ranked sets. Australian Journal of Agricultural Research, 3, 385–390.
Chen, Z., Bai, Z., & Sinha, B. K. (2004). Ranked set sampling: Theory and applications. Lecture Notes in Statistics (Vol. 176). New York: Springer.

Contents

1 Missing Observations and Data Quality Improvement
  1.1 Missing Observations and Data Quality
  1.2 Ranked Set Sample in the Presence of Missing Data
  References

2 Sampling Using Ranked Sets: Basic Concepts
  2.1 Introduction
  2.2 The Basic RSS Strategy
    2.2.1 The Sampling Procedure
    2.2.2 Estimation of μ
    2.2.3 The Estimation of the Variance
    2.2.4 Confidence Intervals
  2.3 Some Other RSS Strategies
    2.3.1 Median RSS Sampling (MRSS)
    2.3.2 Extreme RSS Sampling (ERSS)
    2.3.3 L-RSS Sampling
  2.4 Some Particular Estimators
    2.4.1 The Ratio Estimator
    2.4.2 The Difference of Means
  References

3 The Non-response Problem: Subsampling Among the Non-respondents
  3.1 Some Aspects of Non-response
  3.2 Estimation of the Mean
  3.3 RSS Designs and the Non-responses
    3.3.1 Managing with NR
    3.3.2 The Use of RSS for Subsampling s_2
    3.3.3 The Use of the Extreme RSS for Subsampling s_2
    3.3.4 The Use of Median RSS for Subsampling s_2
  3.4 The Estimation of a Difference
  3.5 Product-Type Estimators
  3.6 The Non-response Problem: Double Sampling
  References

4 Imputation of the Missing Data
  4.1 Introduction
  4.2 Ignoring the Non-responses in the Estimation of the Mean
  4.3 Ratio Imputation Procedures
    4.3.1 SRWR Designs
    4.3.2 The RSS Design
  4.4 Imputation in Median RSS
    4.4.1 Some Quotations
    4.4.2 Mean Imputation
    4.4.3 Ratio Imputation
  4.5 Imputation Using Product-Type Estimators
  4.6 Imputation in LRSS
    4.6.1 Modeling the Non-responses
    4.6.2 The Mean Substitution
    4.6.3 Ratio Imputation
  References

5 Some Numerical Studies of the Behavior of RSS
  5.1 Introduction
  5.2 Studies of Some Estimators in RSS
    5.2.1 Antecedents
    5.2.2 Analysis of a Monte Carlo Experiment of the Behavior of the Estimator of the Difference of Means
    5.2.3 Numerical Study of Ratio Type Estimators
    5.2.4 Numerical Study of Other Estimators
  5.3 Analysis of Non-response Models
    5.3.1 Introduction
    5.3.2 A Monte Carlo Comparison of the Accuracy for Estimating the Population Mean
    5.3.3 Estimation of the Difference of Means: Analysis of a Monte Carlo Experiment
  5.4 Numerical Studies of Imputation Methods
    5.4.1 Some General Remarks
    5.4.2 The Median Estimator
    5.4.3 Imputation in LRSS
  References

Index

Chapter 1
Missing Observations and Data Quality Improvement

Abstract
Missing data is a well-recognized problem which arises in statistical inference and data analysis. We address different possible ways to handle missing data, to ameliorate its effect on the reliability and accuracy of survey-based inferences. Subsampling the non-respondents and imputation of missing values are considered as methods for dealing with non-responses. This book presents the work developed on Ranked Set Sampling (RSS) in dealing with missing data. RSS is a relatively new sampling design. This chapter may be considered as an introduction to the rest of the oeuvre.

Keywords
Non respondent · Imputation · Randomized responses · Simple random sampling · Ranked set sampling

What the caterpillar calls the end of the world, the rest of the world calls a butterfly.
Lao Tse

1.1 Missing Observations and Data Quality

Consider a finite population U of size N from which a simple random sample s, of size n, is drawn with replacement. Full-response surveys are rare. In sample surveys it is common that some units are missing at the first measurement attempt. Let the characteristic under study determine a variable Y. For each $i \in U$ we can determine the value of $Y_i$. When some units do not provide information, the sample is divided into two subsets:

$$s_r = \{\, i \in U \mid \text{the response } Y_i \text{ is obtained} \,\}, \qquad s_{rn} = \{\, i \in U \mid \text{the response } Y_i \text{ is not obtained} \,\}$$

An estimate obtained from $s_r$ only is biased and may be misleading.

C. N. Bouza-Herrera, Handling Missing Data in Ranked Set Sampling, SpringerBriefs in Statistics, DOI: 10.1007/978-3-642-39899-5_1, © The Author(s) 2013
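
The statement in Section 1.1 that an estimate computed from the respondents alone is biased can be illustrated with a small simulation. The sketch below is not taken from the book: the normal population, the response mechanism (units with larger Y respond less often), and the function name `simulate_respondent_bias` are all assumptions chosen only to make the effect visible. It draws simple random samples with replacement, splits each into respondents and non-respondents, and compares the average of the full-sample means with the average of the respondent-only means.

```python
import random
import statistics


def simulate_respondent_bias(pop_size=10_000, n=200, reps=500, seed=0):
    """Compare full-sample and respondent-only means under an ASSUMED
    response mechanism (hypothetical, not the book's model)."""
    rng = random.Random(seed)

    # Hypothetical finite population U of size N with values Y_i.
    population = [rng.gauss(50, 10) for _ in range(pop_size)]
    true_mean = statistics.fmean(population)
    lo, hi = min(population), max(population)

    def responds(y):
        # Assumption: response probability falls linearly from 0.9 to 0.3
        # as Y goes from the population minimum to the maximum.
        p = 0.9 - 0.6 * (y - lo) / (hi - lo)
        return rng.random() < p

    full_means, respondent_means = [], []
    for _ in range(reps):
        # Simple random sample of size n, drawn with replacement.
        sample = rng.choices(population, k=n)
        # Respondent subset: values that are actually observed.
        observed = [y for y in sample if responds(y)]
        full_means.append(statistics.fmean(sample))
        if observed:  # guard against an empty respondent set
            respondent_means.append(statistics.fmean(observed))

    return (true_mean,
            statistics.fmean(full_means),
            statistics.fmean(respondent_means))


if __name__ == "__main__":
    true_mean, full_mean, resp_mean = simulate_respondent_bias()
    print(f"population mean:        {true_mean:.2f}")
    print(f"mean of full samples:   {full_mean:.2f}")
    print(f"mean of respondents:    {resp_mean:.2f}")
```

Under this assumed mechanism the respondent-only average falls below the population mean, while the full-sample average stays close to it. If the response probability did not depend on Y, the two averages would be expected to agree, which is why the form of the non-response matters for the methods the book goes on to discuss.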
