Statistics Canada / Statistique Canada

NATIONAL CENSUS TEST

Report No. 28

Evaluation of Processing Operations

Phil Stevens
Neelam Prakash
Lorie Shinder

Special Surveys Division
July 1994

Introduction

This report discusses problems resolved in the processing (including capture) of the 1993 National Census Test (NCT) survey. It may help in the planning and development of any future Census Test, assuming the methodology of the Census continues as a self-completion, mail-back, paper-and-pencil form. On the other hand, the NCT was adapted to 1993 Labour Force Survey (LFS) procedures; as the LFS moves to CAI (Computer-Assisted Interviewing), some of our problems may be unique to the 1993 test.

Field Operations - Background

The NCT used a November 8, 1993 reference date. Questionnaire drop-off started on Saturday, October 30 and finished on Friday, November 5, 1993.

One component of the test was selection of a subsample of NCT households¹ whose responses were captured as the "Edit Failure Survey" (EFS) component of the survey as soon as received in the ROs, then sent back to interviewers for field edits and follow-up. These questionnaires were re-captured in order to test the field edits and follow-up.

¹ Sample selection was completed on the Mainframe by Mike Egan. The main sample was selected as a subset of the LFS sample from April, May and June of 1991, which "rotated out" of (i.e., finished with) the LFS in September - November 1991. It was judged that, two years after LFS participation, the households would not have retained any bias in their attitudes from the LFS experience that would significantly affect their NCT responses. A total of 17,109 dwellings was selected. The EFS sample was 1/2 of the NCT sample, i.e., 8500 dwellings. A further 3985 dwellings were picked for "special population" samples, primarily from 1991 Census files.

Start-up of data capture of EFS responses was delayed a day or two because of the November 11th holiday (Thursday) and problems encountered with the capture program written in the DC2 software. The first transmission was received from the Edmonton RO on Saturday, November 13, 1993. Data collection was completed for EFS responses by November 30, 1993. The raw EFS file consisted of 10,145 person-records.

Data capture of NCT responses started on December 10, 1993 and the final transmission was received on January 26, 1994. The raw NCT file (combined LFS-based and special populations) consisted of 47,057 person-records.

Print Requirements

Labels: Labels and interviewer-assignment control-lists were generated through the LFS and had to fit in with the LFS production facilities. The household identification code on labels and control lists was 15 bytes long, consisting of the LFS-file fields PSU, GROUP, CLUSTER, ROTATION, LISTING and MULTIPLE. Only fourteen bytes of identifying information were generated in the label program instead of fifteen, with 'multiple' not being printed. To resolve this problem regional offices were instructed to code '0' in 'multiple' on the labels and control lists. In spite of the instructions, 'multiple' was still blank for many cases on the assignment control lists. This was problematic because they were to be linked to household-related responses from the main questionnaire to create a Household file. To recover the multiple, a match to the sample file to pick up this field was performed. All multiple codes of 1 or higher were selected and assigned manually on-line.

All linkage specifications used a 20-byte id consisting of the Interviewer Assignment Number (IAN), PSU, group, cluster, rotation, listing and multiple. With hindsight, it would have been better to have used a shorter household-id code similar to the LFS RO-DOCKET. Linking on a household identifier of twenty bytes was long, cumbersome and error-prone.

During assignment planning, ROs were asked to assign "7" as the second digit of the IAN if the dwelling was picked for the LFS-based sample, and "9" for special-population samples². In general, this worked and made it readily possible to separate and order the assignments for printing labels and assignment control sheets.
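The identifier layout described above can be illustrated with a minimal sketch. Only the totals come from the report: a 15-byte household code made of PSU, GROUP, CLUSTER, ROTATION, LISTING and MULTIPLE, prefixed by the IAN to give 20 bytes, which implies a 5-byte IAN. The individual widths in `FIELD_WIDTHS` are assumptions chosen only so that they add up to 15, and the function names are hypothetical.

```python
# Field widths are ASSUMED for illustration; the report gives only the totals
# (15-byte household code, 20-byte linkage id including the IAN).
IAN_WIDTH = 5
FIELD_WIDTHS = {
    "psu": 5,
    "group": 2,
    "cluster": 3,
    "rotation": 1,
    "listing": 3,
    "multiple": 1,
}

# Second digit of the IAN, as assigned by the ROs during assignment planning.
SAMPLE_TYPES = {"7": "LFS-based", "9": "special population"}


def build_household_id(ian, fields):
    """Return the 20-byte linkage id: IAN followed by the six household fields."""
    if len(ian) != IAN_WIDTH or not ian.isdigit():
        raise ValueError(f"IAN must be {IAN_WIDTH} digits: {ian!r}")
    parts = [ian]
    for name, width in FIELD_WIDTHS.items():
        value = str(fields.get(name, "")).strip()
        if name == "multiple" and value == "":
            value = "0"  # a blank 'multiple' is coded '0', as the ROs were instructed
        if not value.isdigit() or len(value) > width:
            raise ValueError(f"bad {name}: {value!r}")
        parts.append(value.zfill(width))
    return "".join(parts)


def sample_type(ian):
    """Classify an assignment by the second digit of its IAN."""
    return SAMPLE_TYPES.get(ian[1], "unknown")
```

Normalizing a blank 'multiple' to '0' before the key is built mirrors the instruction given to the regional offices, and an IAN whose second digit is neither "7" nor "9" is reported as "unknown" so that it can be cleaned up early.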
However, there were a few errors, which suggests that clean-ups must be done early in the initial stages of processing in order to verify the IAN. Furthermore, to accommodate the IAN planning (R21 files) there needs to be adequate lead time to allocate IANs. These are allocated by PSU, GROUP and CLUSTER for the LFS rotate-out sample. We allowed three months for the completion of assignment planning in the regional offices.

² A total sample of 3985 households was selected using 1991 Census Visitation Records to locate EAs with high concentrations of the special populations of interest. The Winnipeg and Saskatoon Metis samples were obtained from provincial membership association lists. These households were in addition to the LFS-based main NCT sample.

A problem arose in the Special Population sample. Some EAs exceeded 75 households, which was the maximum allocation for one assignment. In order for the ROs to determine how to break the EA down into meaningful geographical delineations at the household level, the household id was carried on the LO file. A hard copy listing with address information was provided for each RO to cross-reference with the household id given on the LO file. For EAs that were too large for one assignment, '01' was put in the yield³. Fortunately, none of these procedures had an impact on the instruction manuals written by Survey Operations.

The Special Population samples were not weighted because they are not representative samples of the population in general,⁴ and national estimates will not be calculated from the information obtained. The major purpose of these samples is to evaluate the Census test questionnaire for specific groups that may experience particular problems with the proposed questions.

³ Yield is a required field according to Labour Force procedures for updating assignments in the ROs.

⁴ Blacks in Halifax, Asians in Montréal, Blacks in Montréal, Latin Americans in Montréal, Asians in Toronto, Blacks in Toronto, Aboriginals in Winnipeg, Metis in Winnipeg, Metis in Saskatoon, Aboriginals in Regina, Aboriginals in Edmonton and Asians in Vancouver.

If there is to be a Special Population component in future Census tests then careful attention is required in the preparation of the "F03" files needed by LPS (John Rowland) as input to the "S03" print files needed by ISD (Kathy Reid/Dave Bowman). [...] processing was required to reformat Special Population records to an "F03" Extract response file. This file was required by the LPS to create an "S03" print file for the production of NCT labels and control lists.

For the two Metis samples selected from membership association lists we had to make up dummy Prov-FED-EA-HHld numbers. The proper Province code was assigned, '000000' for FED-EA, and then the Hhld number was sequentially assigned commencing at '001'. Group and Rotation were coded to zeros. The LPS "F03 Short" record length is 280 bytes but the Special Population mock F03 was created as 131 bytes. The LPSP03 file expects a record length of 372, so the Special Population file was zero-filled to meet this requirement.

Two problems arose with the Special Populations. The F03 interviewer assignment numbers (IANs) were not moved over correctly and consequently did not match Dave Bowman's file. This was resolved in the program that merged the F03 and UR21 (assignment planning) files. The F03 file required a current IAN as well as a previous IAN to be placed on the LPS system in two consecutive places. Secondly, it was decided that Edmonton was to handle the assignment planning for Winnipeg RO. This complicated the print file preparation so that Edmonton could print Winnipeg's labels by requiring a matching on PSU, Listing number and last byte of IAN from RO 16 and RO 17 data.

A few days were lost during production to produce a print file. Some more fields on the 372-byte F03 were identified that should have specific values⁵ (see footnote below for future reference).

Data Capture

The start of EFS data capture was delayed three or four days, mainly because of bugs in the DC2 software being used for the NCT. Due to time constraints Special Surveys did not have an opportunity to review keyer instructions. One instruction called for household data (Steps 1-7 and Questions 47 onwards) to be captured only once as a part of the person-1 record.
In several cases, household data were captured for somebody other than person 1⁶. An ambiguity in the instructions may have been the reason. It would have been desirable to have reviewed and commented on the instructions.

⁵ surveyid (pos. 25,1) = 1; preprinted code (pos. 52,1) = 1 (= 0 if special populations); pos. 54,8 = blank (flags based on a previous month - make special populations look like births); pos. 280,1 = (there are no notes); pos. 266 = 1 (if flagged for EFS or blank on special populations).

⁶ This occurred primarily in the Montreal RO.

In using DC2, there was no means of control to guarantee all household members had been captured. DC2 was capable of controlling for the number of forms but there was no way of knowing whether a key operator had missed out an entire person from the form. A higher level of verification than was used would be helpful.

The design of the questionnaire was set up for vertical capture to reflect the capturing of person records within the household. No colour distinction was made on the form to assist the key operator visually to stay in the correct column on each page. A recommendation for shading should be made if the budget can handle the additional cost for printing.

Another limitation of the DC2 software was that it lacked verification flexibility. For example, 100% of a given field had to be verified by re-keying for all forms. It was not possible to verify a sample of forms. It would have been preferable to have taken a subsample of documents and verified on several or all fields. If this option had been available, a better picture of the error rate incurred by the key operator would have been evident. For budget reasons we chose to verify the 20-byte Household identifier and questions 2-5 for every form. We would advise, with hindsight, a higher level of verification.

Processing

Since a major purpose of the test was to measure errors, the capture program was written and costed to allow for multiple entries for all precoded questions, including those with instructions "mark one only". It is questionable whether subject matter people were interested enough in multiple-response errors to make this worthwhile.
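The trade-off between capturing every entry and keeping only the first response can be illustrated with a minimal sketch. The record layout (a dictionary mapping a question id to the list of precoded answers keyed, in form order) and the question ids in `MARK_ONE_ONLY` are hypothetical; the DC2 capture layout is not described in this report.

```python
# Hypothetical layout: each captured record maps a question id to the list of
# precoded answers that were keyed, in the order they appear on the form.
MARK_ONE_ONLY = {"Q2", "Q3", "Q5"}  # illustrative question ids, not the NCT list


def multiple_response_errors(record):
    """Return the 'mark one only' questions that have more than one entry."""
    return sorted(q for q in MARK_ONE_ONLY if len(record.get(q, [])) > 1)


def first_response_only(record):
    """Keep only the first entry for 'mark one only' questions."""
    return {
        q: (marks[:1] if q in MARK_ONE_ONLY else list(marks))
        for q, marks in record.items()
    }


def error_rate(records):
    """Share of records with at least one multiple-response error."""
    if not records:
        return 0.0
    flagged = sum(1 for r in records if multiple_response_errors(r))
    return flagged / len(records)
```

Capturing all entries is what makes `error_rate` measurable at all; applying `first_response_only` at capture time would be cheaper but would discard the information needed to count multiple-response errors.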
Developing and testing the data capture entry system took many person-days more than if Special Surveys' standard processing practice had been used, of capturing the first response where only one is expected. Future tests might also use RO facilities for grooming before capture.

A shortage of resources during the processing was experienced. Planners and managers of any future Census test will want to be assured of having adequate, qualified programmers to handle complex programming requirements⁷. To complete the testing and production runs for the NCT, three experienced programmers were found at short notice in January 1994, each available for a few weeks only. Each programmer was assigned his own tasks. What was being tested or produced and by whom required hour-to-hour attention and direction from a coordinator⁸.

Attention needs to be paid to the creation and availability of detailed test files. During the phase of program development for derived variables, programmers wanted good test files. Because of the nature of the DV specifications, test files are best supplied by subject matter persons who are responsible for the specifications for the DVs. Early advisement should be given to subject matter in order that they can prepare for this part of the processing.

Working on the same platform would have been advantageous in terms of location and management of file creation. For example, all of the processing could have been handled on our own LAN, or on the Census LAN (UNIX) or on the Mainframe. In actuality, there was a lot of uploading and downloading of production files in order to accommodate the two working environments used for the 1993 NCT⁹. From the creation of the RAW files to the PREDIT files the processing was handled on the Mainframe. Afterwards the processing was all done on the UNIX, except for one of the short-term borrowed programmers who preferred to work on the mainframe. At times the Unix was a bit slow due to maximum user capacity or space limitations. Presumably there are cost savings in using a LAN such as the Census Unix, although it might be difficult to determine just how much.
⁷ The NCT Operational team in Special Surveys consisted of Phil Stevens - Manager, Neelam Prakash - Programmer (replacement for Mike Egan in September, 1993), and Lorie Shinder - Processing Rep.

⁸ Evelyn Ryan of Census Operations controlled and monitored the assignment of tasks of the three programmers.

⁹ Processing was developed and run on both the Mainframe as well as the UNIX.

Other than for income data (Q.46 in the test) there was no editing of results by Special Surveys after capture. For future tests, from a capture point of view, consideration should be given to omitting the income-question cents boxes, or to retaining the background colour in them (i.e., not dropping them out). They added to the incidence of error despite key operators being instructed not to capture the cents. Perhaps income-question instructions could be tested: "enter the amount to the nearest dollar". If future tests include editing the income data, verification on this field should be included, and there should be more edits included in the capture system. About four programmer-weeks were used in the test just to edit Q46, which became an expensive and time-consuming aspect of overall programming.

To satisfy LFS weight-correction procedures, every record in the main LFS-sample-based results file had to have an age value. Year of birth was imputed where missing, then age derived from it. Age was then copied [...].

However, some records in the EFS file [...] the NCT, LFS-based sample [...] field edits and follow-up in the EFS [...] not returned from [...] for NCT capture, or lost in the mail. In the EFS records not [...] the EFS file in the NCT file, an "unknown" value was given to [the] step of copying age to the NCT file. Some of these EFS records nevertheless had valid year-of-birth data. It might have been desirable to allow for separate derivation of age for these records. If future tests follow the same methodology for an EFS component, this step should be taken into consideration.
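The derivation of age from year of birth described above can be illustrated with a minimal sketch. The reference date is the one given in this report; everything else is an assumption: with only a year of birth available, age is approximated as reference year minus year of birth, the `UNKNOWN_AGE` code and the plausibility limit are hypothetical, and the imputation rule itself is left to a caller-supplied function because the report does not state it.

```python
from datetime import date

REFERENCE_DATE = date(1993, 11, 8)  # NCT reference date given in the report
UNKNOWN_AGE = 999                   # hypothetical code for an "unknown" age
MAX_PLAUSIBLE_AGE = 120             # assumed plausibility limit


def derive_age(year_of_birth, impute=None):
    """Derive age at the reference date from a year of birth.

    With only the year known, age is approximated as reference year minus
    year of birth. `impute` is an optional callable supplying a year of birth
    when it is missing; the imputation rule itself is not specified here.
    """
    if year_of_birth is None and impute is not None:
        year_of_birth = impute()
    if year_of_birth is None:
        return UNKNOWN_AGE
    age = REFERENCE_DATE.year - year_of_birth
    if age < 0 or age > MAX_PLAUSIBLE_AGE:
        return UNKNOWN_AGE  # implausible year of birth
    return age
```

Calling `derive_age` separately on each file, rather than copying a derived age from one file to the other, is one way the separate derivation suggested above for unmatched EFS records could be arranged.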
A detailed plan of processing steps needs to be drafted as well as documented to provide an overview of the tasks at hand and in the proper sequence. For example, after the creation of many of the derived variables the NCT team became aware that the temporary and foreign residents (i.e., those checking Step 4 or Step 6 of the questionnaire) should have been dropped from the NCT and EFS files at an earlier stage of processing, when duplicate and empty records were dropped. This also impacted on the coding, because the write-ins had been split off and sent for coding before temporary and foreign residents were dropped. At the time of code linkage we had more codes than we could initially account for, until we recalled that the extras were from the dropped temporary and foreign residents. Up-to-date documentation is important¹⁰.

¹⁰ Attached are the NCT and EFS processing flowcharts.

Autocoding

An unanticipated component was the request by subject matter to allow for more than one code to be returned for multiple responses. One characteristic of the ACTR system is that it can only provide one code per write-in. Multiple responses for Ethnic Origin