Hindawi Publishing Corporation
International Journal of Vehicular Technology
Volume 2013, Article ID 705086, 12 pages
http://dx.doi.org/10.1155/2013/705086

Research Article

Evaluation of a Navigation Radio Using the Think-Aloud Method

Paul A. Green (1) and Jin-Seop Park (2)

(1) Driver Interface Group, University of Michigan Transportation Research Institute, 2901 Baxter Road, Ann Arbor, MI 48109-2150, USA
(2) Advanced Vehicle Safety and Dynamics Research Office, Korea Automobile Testing and Research Institute, 200 Samjon-ro, Songsan-myeon, Hwaseong-si, Gyeonggi-do 445-871, Republic of Korea

Correspondence should be addressed to Paul A. Green; [email protected]

Received 10 October 2012; Accepted 26 December 2012

Academic Editor: Motoyuki Akamatsu

Copyright © 2013 P. A. Green and J.-S. Park. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In this experiment, 13 licensed drivers performed 20 tasks with a prototype navigation radio. Subjects completed such tasks as entering a street address, selecting a preset radio station, and tuning to an XM station while “thinking aloud” to identify problems with operating the prototype interface. Overall, subjects identified 64 unique problems with the interface; 17 specific problems were encountered by more than half of the subjects. Problems are related to inconsistent music interfaces, limitations to destination entry methods, icons that were not understood, the lack of functional grouping, and similar looking buttons and displays, among others. An important project focus was getting the findings to the developers quickly. Having a scribe to code interactions in real time helped as well as directed observations of test sessions by representatives of the developers. Other researchers are encouraged to use this method to examine automotive interfaces as a complement to traditional usability testing.

1. Introduction

People want products that are easy to use, and that is particularly true of motor vehicles. Numerous methods have been developed to assess the ease of use of driver interfaces, both traditionally, and more recently from the human-computer interaction literature [1–3]. The three most prominent methods are (1) usability testing [4–8], (2) expert reviews [9–11], and (3) the think-aloud method [12–15]. Methods vary in terms of their value for formative evaluation (while development is in progress) and summative evaluation (at the end of development). See [16] for an extensive overview of how various methods are conducted and where they should be applied.

Usability testing is the gold standard of usability test methods, as it involves real users performing real tasks, though often in a laboratory setting, and can be part of either formative or summative testing. The purpose is to determine task completion times and errors. Generally, usability testing occurs in the latter stages of design, when a fully functioning interface is available. Usability tests are time-consuming to plan and analyze and can be costly.

Consequently, there has been considerable interest in predicting user performance, in particular task time [17–22]. Task times for experienced users can be predicted in a fraction of the time to plan, conduct, and analyze a usability test. If the method used by subjects to perform a task is known, the predictions should be as accurate as the usability test data [23].

Expert reviews can be an efficient alternative to usability testing, especially early in design, though they may be used for summative testing as well. In an expert review, each step of each task is examined to determine how the interface should be designed according to established usability heuristics and guidelines. Expert reviews are often criticized as being “just someone’s opinion.” Therefore, reviewers should be professionally certified in human factors or usability. (See http://www.bcpe.org/).

In the think-aloud method, users describe their logic as they try to use an interface. For example, a subject might say, “I selected the city name but cannot figure out how to get to the next step,” or “Sometimes, there is an OK button in the lower right corner, but there is not one here.
I am stuck and frustrated.” The think-aloud method helps evaluators identify what is confusing or misleading and how those problems can be resolved. If subjects fall silent, then the experimenter prompts them to speak but needs to do so without interfering with the subject’s thinking process or influencing them. When to prompt and what to say is much more difficult to do than it may seem. In fact, considerable experimenter training and practice are required. See [16] for a discussion.

Think-aloud evaluations are most useful during the early stages of design while the design is still being formulated, identifying problems users experience more readily than other methods. Unfortunately, as is described later, data reduction in think-aloud evaluations is very time-consuming.

Recently, the authors conducted an evaluation of an early prototype of the Mobis Generation 3 navigation radio for Hyundai-Kia vehicles [24]. The complete experiment included four parts: (1) a think-aloud evaluation from a human factors expert, (2) a think-aloud evaluation involving 13 ordinary drivers, (3) a follow-up survey of those drivers primarily concerning their understanding of icons, and (4) estimates of task times per SAE Recommended Practice J2365 [20]. Because this experiment was conducted during the early stage of interface design, and the interface designers needed to know what problems users would encounter, the focus of the experiment and this paper concerns only part 2, the think-aloud evaluation by ordinary drivers.

There are many other ways this data could have been collected. For example, questions concerning what subjects did and why could have been asked retrospectively. In selecting methods to utilize, the authors considered the specifics of the request for quote from the sponsor, verifying conformance to accepted industry practice (e.g., SAE J2365), what information was believed to be most useful to the sponsor, the experience of the research team, the funding for the project, the schedule, and other factors. There was extreme pressure to complete this project very quickly to meet the production schedule set by Hyundai-Kia. Therefore, considerable thought was given as to how to complete this project quickly, which meant that less time was spent on certain activities than is ideal, and methods to accelerate data collection and analysis were explored.

2. Method

2.1. Navigation Device Examined. The device examined was an early working prototype of a Mobis generation 3 navigation radio. As shown in Figure 1, the navigation radio consisted of an LCD display surrounded by 10 hard buttons (e.g., select satellite radio, seek), two CD related buttons, a volume knob, and a tuning knob. These hard buttons as well as soft buttons on the touch screen allowed access to hundreds of screens. Figures 2, 3, and 4 show example screens.

Figure 1: Most common screen.
Figure 2: One of two destination screens.
Figure 3: XM radio screen.
Figure 4: List of previous destinations screen.

2.2. Test Facility. The experiment was performed using the third generation UMTRI driving simulator while “parked.” The navigation system was mounted into the center stack of the simulator cab. To enable signal reception and use of the GPS and XM functions, an antenna was installed, connecting the simulator lab with an outside room. Figures 5 and 6 show a hypothetical subject being recorded, the equipment used, and the recorded image from an actual subject. Although the cameras were in plain sight, subjects ignored them, in part because the camera in front of them was small.

Figure 5: Experimenter and a hypothetical subject interacting. (Labels in the figure: Panasonic SV-BP314 camera aimed at navigation radio; Supercircuits PC502XP camera aimed at the subject’s face; Experimenter; Subject.) Note: not shown are the Panasonic WJ-420 quad splitter, the Sharp LC10-A3U-B 10 in LCD monitor, the Panasonic DMR-EZ47V DVD recorder, the Audio Technica microphone, or the Mackie DX8 digital audio mixer.

Figure 6: Sample screen grab from a typical subject.

2.3. Sequence of Tasks in the Experiment. In each session, one session per subject, subjects (1) completed biographical and consent forms and had their vision checked, (2) practiced the think-aloud method, (3) completed 20 test tasks in a fixed order while thinking aloud, and (4) completed a 19-page survey requested by the sponsor. Practice involved counting the number of chairs in the place where they lived, going room by room. As intended, the subjects did not simply list how many chairs were in each room, but said something about the kind of chairs present, where the chairs were located in each room, or other information. If they did not provide some of those details, then a question was asked to encourage them. Often, the process of recalling chairs was a virtual journey (“The front door takes me into the living room. In that room … Next to that room, around the table …”). The practice was quick and helped subjects understand what was meant by “think aloud.”

Some 20 tasks were examined (Table 1). These tasks were selected because of their importance, frequency of occurrence, and to provide data on a variety of entry types. Only manual entry was allowed. For ease of administration and analysis, the order of tasks was fixed.

Table 1: Task list.

Category         #   Task                                                 Data to use
Navigation       1   Enter destination via street address method          100 North 5th Avenue; Ann Arbor, MI
Navigation       2   Enter waypoint via POI (Point Of Interest) method    Nearest McDonalds from current position
Navigation       3   Enter destination via POI method                     Nearest gas station from current position
Navigation       4   Cancel the route                                     Current route
Navigation       5   Enter destination via POI method                     Detroit Metro Airport
Navigation       6   Change the map scale                                 Zoom in
Navigation       7   Enter destination via street address method          1600 Pennsylvania Avenue; Washington, DC
Navigation       8   Enter destination via intersection method            US-23 and I-94
Navigation       9   Enter destination via POI method                     Cobo Hall in Detroit, MI
Navigation       10  Enter destination via intersection method            Broadway St. and Plymouth Rd.
Navigation       11  Store address                                        Subject’s home address
CD               12  Insert CD and select track                           CD provided, third track
FM/AM radio      13  Tune radio                                           FM 91.7
FM/AM radio      14  Adjust volume                                        To a comfortable level
FM/AM radio      15  Set preset                                           FM 91.7 as the Preset 1
FM/AM radio      16  Adjust volume                                        To a comfortable level
Satellite radio  17  Tune XM radio                                        Elvis Station
Satellite radio  18  Turn off XM radio                                    Turn off
Clock            19  Change and set time                                  11:45 pm
iPod             20  Connect iPod and select song                         “Wonderful Tonight”

Subjects were not given any documentation or instruction as to how the interface functioned, as the interface design was intended to be intuitive.

The second author served as the experimenter for the main experiment. He had reviewed the literature on think-aloud studies and also served as the experimenter for pilot testing of the first author, which led to discussions of when to prompt and what to say. The second author also observed and provided feedback on test sessions of the first few subjects after each session was complete. Ideally, more time would have been desired for training, but that was not feasible given the sponsor’s product development schedule.

During the think-aloud test, the experimenter, seated in the front passenger seat, presented a sheet of paper, one per task, describing the task to perform and the data to use, and asked the subject to think aloud while doing the task. When subjects fell silent for an extended period of time (typically 30 s, but sometimes longer), the experimenter prompted them. (“What are you looking for? What are you expecting to see? Are you confused?”) Generally, few prompts were needed. There were no specific rules about which prompt to use when. Although these prompts seem leading, it was often apparent from what subjects said and did that the subject’s state was consistent with these prompts. For example, if a subject repeatedly switched between screens but did not select anything else, then “What are you looking for” was an appropriate prompt.

If a subject was not making any progress after three to five minutes on a task (depending on the time remaining), they were given a hint. Hints identified what to do next (e.g., press this button), without any explanation of why an action was appropriate. If subjects continued to struggle, they were told to stop and move on to the next task, as the intent was to reveal as many problems as possible in the time available.

A scribe (a very fast typist) sat outside the simulator cab and attempted to record verbatim what the experimenter and subject said during the experiment. The only and very general instructions to the scribe were to record everything said and to use the video and audio recordings to fill in any gaps.

In part, a verbatim transcription was feasible because the experimenter and subject were not talking continuously, so the scribe was able to catch up during pauses. Anything that the scribe missed (e.g., if the subject spoke quickly) was filled in immediately after each test session (in the 30 minutes or so before the next subject) from the audio and video recordings. Having a complete transcription essentially immediately after each session was completed shortened the time to reduce the data. Furthermore, the audio recordings, collected using a system that was hastily assembled and not optimized for recording quality, were sometimes inaudible on after-the-fact review, especially when subjects mumbled. Although requiring a scribe in addition to an experimenter made the experiment more difficult to schedule, given when this experiment was conducted, a scribe was always available.

2.4. Subjects. Thirteen licensed drivers volunteered to serve as subjects: 6 younger people (ages 19–26, 3 men and 3 women) and 7 older people (ages 65–83, 3 men and 4 women). They were recruited via an advertisement on Craigslist. All older subjects were retired. Five of the younger subjects were students. Younger subjects were the most likely users of the navigation radio, especially of the audio functions. Older subjects were those most likely to be challenged by the interface and would be the first to encounter problems this study was to reveal. All subjects had corrected visual acuity adequate to drive. They drove a mean of 8,000 miles per year, somewhat less than is typical in the United States.

Other than being a licensed driver, in good health, in specific age categories, a native English speaker, and experience with XM/Sirius radio, there were no other requirements to participate. Thus, there was no control over experience with technology in this experiment; such a restriction was not imposed so that subjects could be recruited in the time frame available. Experience with relevant technology was mixed. Five subjects owned an iPod, and one owned another brand of MP3 player. Only two of the subjects had vehicles with XM/Sirius radios. Five subjects owned GPS systems: 3 TomToms and 2 Garmins.

Subjects were paid $40 for their time if they completed the experiment in two hours. Subjects who took longer (some of the older subjects) were paid an extra $10.

The sample, seemingly small, was more than adequate for identifying problems, of which 64 were identified. “A problem was defined as a situation where a task took too long, subjects struggled to make progress, or they otherwise expressed doubt (“I am not sure what button to press”), confusion, irritation (“I would like to shoot the engineer that designed this”), or other undesired feelings” ([24] page 48). Problems invariably involved deviations from the intended sequence of steps to complete the task expeditiously. Research shows that after about six subjects or so, the number of new problems found with each additional subject is small, with the specific number varying with the problem severity and other factors [25, 26]. Specifically, the six subject value comes from assuming that each subject has about a one in three chance of discovering a problem, that problems are independent, and the goal is to discover 90% of the problems. Furthermore, this initial analysis assumes all problems are of a similar severity, and one may have different goals for different levels of severity. (See [27] for the most recent of a long series of papers, reports, and now a book chapter on sample size.)

Although one can quibble on the specifics of the calculations, the surprise to many non-human-factors, non-usability professionals is that most of the problems can be found with just a few more than a handful of properly selected subjects. The data from this experiment confirmed that conclusion (Figure 7), with most problems being found by the first few subjects. Testing more subjects would provide better statistical evidence for the frequency of occurrence of each problem and would identify more problems, but not many. Most importantly, testing more subjects would delay producing a complete report informing the designers of interface problems to be corrected. In this case, boosting the confidence of the sponsor was partly why more than six subjects or so were tested. Further, in these instances, when deciding which aspects of an interface to modify, the percentage of subjects who encounter a problem may be secondary. Rather, if just one subject encounters a problem, and the problem seems reasonable, then changes to eliminate or reduce the impact of that problem should be considered.

Figure 7: Cumulative usability problems found as a function of number of subjects (y-axis: found problems, %; x-axis: number of subjects, 1 to 13). Note: this figure determines the % of problems (out of 64 total) found utilizing data from subjects in the order they were tested. So, combining the data of subjects 1 and 2, 38/64 = 59% of the problems were found.

3. Results

3.1. Data Reduction for Problem Identification. Data reduction consists of (1) listening to each session to verify each transcription, correcting them as needed, (2) identifying each problem that users experienced, and (3) identifying the frequency, severity, and persistence of each problem. A problem was indicated when a task took too long, subjects struggled to make progress, or they otherwise expressed doubt (“I am not sure what button to press”), confusion, or irritation (“I would like to shoot the engineer that designed this”).

Subjects indicated problems in several ways. Indications included (1) questions (“What is this? Where is the button? How do I get the map?”), (2) statements of uncertainty (“I’m not sure if I can have the map when I click on this button. It does not allow me to save this radio frequency. I am wondering why it’s not accepting this.”), and (3) exclamations with filler words (“Oh, man…,” “Umm,…”). About 13% of the problems for younger subjects were associated with questions, whereas for older subjects, questions were linked to a third of the problems. In contrast, about 81% of the problems for younger subjects were associated with statements of uncertainty, versus 61% of the problems for older subjects. There was no difference in the use of filler words (about 7% for both age groups). Specific examples of how particular problems were identified appear later in this paper.

There were instances during the experiment where the subject was silent for an extended period of time, where there were no probes from the experimenter, and where it was uncertain from the transcript and video recording what the subject was thinking. This often occurs with novice experimenters as they focus on observing what subjects do. One solution would be a timing device that prompts the experimenter (unobtrusively to the subject) to probe the subject to think aloud.

Frequency is the number of times the problem occurred, usually across subjects, but sometimes within subject groups. Persistence can be the number of times a problem reoccurred within each subject.

There were three primary severity categories: critical (“showstopper”), major, and minor (often cosmetic). A critical problem prevents subjects from completing a task, such as not finding a power button or an enter key. A major problem substantially delays the subject but can be overcome. A minor problem has minimal impact on performance, but a change is nonetheless desired, such as making a label a different color or choosing a different font. For this study, task time was used to determine severity. Task times greater than 300 s (5 minutes) were critical. Greater than 30 s but less than or equal to 300 s was major. Minor was 30 s or less.

In the literature are a number of formal methods to analyze verbal protocols, some of which the authors were unaware of at the time this experiment was conducted (e.g., [28]). However, for the purpose of this applied analysis, given the degree to which the subjects were expressive, the experience of the experimenter, and the time available, a custom categorization scheme seemed more appropriate. In brief, problems were categorized as device domain or subject domain problems. Device domain problems were (1) visual or auditory interface related (layout, label, text, sound, and action time), (2) logic and organization (controls, search, system, and information architecture), or (3) nonusability software issues (stability, database, and response time). Subject problems included data knowledge, procedure knowledge, and preference. In this case, the scheme was to aid Korean designers, many of whom were not fluent in English and had no human factors background.

Finally, given the focus on what the sponsor needed to fix, the limited time schedule, and limited funding, no effort was made to examine the effect of subject differences (young versus old, those who had navigation systems versus those who did not, etc.) on the particular problems encountered. Although very interesting, they were secondary issues.

3.2. How Often Were Subjects Able to Complete the Tasks? Table 2 shows how well subjects did when they were given hints. One could interpret this data to suggest that the interface was relatively easy to use, but that was not the case. Many subjects required hints to complete many of the tasks. Had hints not been provided, the success rate would have been half the values shown, or typically less than 50%, quite poor. For perspective, subjects took 2 hours on average to complete these 20 tasks, which is about 6 minutes/task, a long time. Several of the tasks, such as setting a preset radio frequency, should only take a few seconds to complete.

Table 2: Successful completion with hints.

#   Task                                                                             Success #  Success %  # Not do
1   Set destination to 100 north 5th avenue, Ann Arbor, MI.                          9          69
2   Find the nearest McDonalds and set it as a waypoint.                             8          62
3   Find the nearest gas station from the current position.                          11         85
4   Cancel the route that you have set up.                                           10         77
5   Enter and set a destination to the main Detroit Airport.                         12         92
6   Change map scale to zoom in slightly.                                            10         77
7   Enter and set a destination to 1600 Pennsylvania Ave, Washington DC.             10         77
8   Set the destination to the intersection of US-23 and I-94.                       1          8
9   Set the destination to Cobo Hall in Detroit, MI.                                 9          100        4
10  Set the destination to intersection of Broadway St and Plymouth Rd, Ann Arbor.   9          100        4
11  Enter your home address and store it.                                            3          50         7
12  Insert the CD and select the third track.                                        13         100
13  Adjust volume to a comfortable level.                                            13         100
14  Tune FM radio to 91.7.                                                           13         100
15  Adjust volume to a comfortable level.                                            13         100
16  Save the frequency as first preset.                                              6          46
17  Tune XM radio to Elvis station.                                                  10         77
18  Turn off XM radio.                                                               12         100        1
19  Change the time to 11:45 pm.                                                     13         100
20  Connect the iPod and select the song “Wonderful Tonight.”                        12         92

Note: under the heading success, # refers to the number of subjects out of 13 who successfully completed the task with hints. Thus, for task 1, 100 × 9/13 = 69%. The # Not Do refers to the number of subjects who were not asked to do up to four tasks to keep the experiment from taking too long. Thus, for task 9, only 9 of the 13 subjects performed the task, but all 9 of them (100%) completed the task.

3.3. How Often Did Problems Occur and How Severe Were They? Table 3 lists the 64 problems, a rather large number, from the most frequent to the least frequent. About two-thirds of all problems were experienced by at least 2 subjects. Among those, 16 problems (25 percent of the total) were encountered by more than half of all subjects. In the report summarizing this project [24] the frequency of problems was reported many ways. For the table that follows, they are reported as the number of subjects (out of 13) experiencing a problem because this was the format most readily understood by the sponsor and its interface designers. Furthermore, the experiment was deliberately designed to minimize repetition of tasks and task elements, and thereby more widely explore the interface, which tends to limit repeated encounters with problems.

Table 3: List of problems by frequency for all 13 subjects.

Frequency  Problem #  Severity  Description
12         9          Major     State name cannot be abbreviated (DC, Washington DC)
11         10         Minor     System freeze
10         14         Minor     No way to enter highway intersection as destination
10         42         Major     No intuitive method to save radio frequency as preset
9          2          Major     Difficult to switch keyboard to number keys
9          20         Major     Could not search for an XM channel by name, so they browsed the numeric listing, which was slow
9          24         Critical  Typed building number in the street name (north/no)
9          35         Major     Searching for an iPod song by title was very time-consuming
8          4          Minor     Guidance voice quality was inadequate
8          6          Critical  No intuitive method to cancel route/destination on map screen
8          7          Major     Buttons are easily confused with displays
8          8          Major     Expected this interface to support multitouch, in particular 2-finger zooming
8          17         Minor     Correct address rejected (wrong house number range)
7          28         Major     Did not realize what the search button did (also used for enter)
7          29         Major     Incomplete and insufficient text information
7          37         Critical  No alphabetic input method for numeric street name (Fifth Ave)
6          3          Minor     Negative response to autosuggest keyboard (autocomplete function)
6          13         Major     Did not realize there were multiple pages for destination screen
6          32         Major     Confusing intersection search guidance (“X”, “space”)
6          36         Major     No intuitive method to store the entered address/home address
6          45         Major     Did not understand icons on the map window (Z, compass, flag, current position, scale)
5          23         Major     Typed building number in the street name
5          26         Major     Did not understand the acronym POI
4          5          Major     Did not understand POI symbol/function on the map
4          16         Minor     Search displayed multiple entries for the same destination (3 Ann Arbor)
4          22         Major     Difficult to switch to alphabet keys
4          25         Critical  No intuitive method to search for a waypoint and add it as a waypoint on the map screen.
3          11         Minor     Too slow/fast system response time
3          12         Minor     Address/text information was not legible
3          27         Major     Did not recognize scale button on the map screen
3          30         Major     No way to back up (e.g., main mode screen, simulation mode, and local POI)
3          33         Minor     No intuitive method to bring up CD control after inserting CD
3          34         Major     Did not understand label SAT
3          38         Major     Did not understand label Zagat
3          39         Major     Did not recognize back button
3          40         Minor     Unable to get to CD mode after inserting CD
3          41         Major     Did not understand Autostore
3          44         Major     Did not understand icons on the status bar
3          57         Major     No intuitive graphic to select the searched result on the list (city/street)
2          46         Minor     Scale window disappears too quickly on the screen
2          48         Major     Narrow search coverage by POI Name (no support for McDonalds)
2          50         Major     Did not recognize the delete button on keyboard
2          64         Major     Did not understand label Map key
1          1          Minor     State name is highlighted and shown on Search by Address screen.
1          15         Minor     Confusing search result (Ann Arbor/Ann Arbor Twp)
1          18         Minor     Misunderstood seek/track button label
1          19         Minor     Did not notice volume label
1          21         Major     No intuitive distinction between Route and Dest key
1          31         Minor     Wrong screen instruction (“search by POI” in intersection)
1          43         Minor     Did not understand label Dest
1          47         Minor     Select button location was inconsistent
1          49         Minor     Did not understand “Alternative” menu
1          51         Minor     Misunderstood Asian, Korean, and Chinese on POI list (as language)
1          52         Minor     No way to control volume on the screen
1          53         Minor     Did not understand preset scan
1          54         Minor     Misunderstood ATM and Bank on POI category list
1          55         Minor     Wrong unit information/temperature
1          56         Major     Did not understand play/stop icon
1          58         Major     Did not understand numbers next to the searched street names
1          59         Major     Not able to set clock in expected manner
1          60         Major     No intuitive layout for house number input
1          61         Minor     Did not understand orange line on the map
1          62         Minor     No autosave feature in entering city
1          63         Minor     Confusing categorization on POI list (gas station-travel/automotive)

Note: Problem #: The 64 problems were numbered 1–64.

The linkage between what was observed and these problems can best be described by example. Following is a description of some of the most frequent and critical problems. Problems were identified by a combination of how long it took to complete each task and step (if they were completed at all), what subjects did, what subjects said, and although not described here, their facial expressions. The ultimate indication of a problem was when they said, “I give up.” Quite frankly, identifying when a problem had occurred was fairly obvious.

The most frequent problem (9) was that the system did not accept “DC” as a state name when subjects searched for an address in Washington, D.C. (task 7). All subjects except one (92% of the sample) tried to type “DC.” They would get to the state field and type in “D.” Immediately, the C key would gray out because the system was expecting the subject to type “District.” “Why is the C key gray? I want to type it.” In contrast to most state names, the District of Columbia is invariably abbreviated as “DC,” but the software accepted only “District of Columbia.” Allowing the intelligent speller to accept both complete names and two-letter abbreviations would eliminate this problem, and speed the entry of other state names as well.

The system often froze (problem 10) while the experiment was in progress (for 11 of the 13 subjects, and for some subjects, multiple times). Given that a prototype was being evaluated, some problems were expected. The workaround for the freeze was to unplug the system, plug it back in again, and wait for the system to restart. This process took a minute or so to complete and was mildly annoying to subjects and experimenters. Although testing could have been conducted when the interface had fewer bugs that froze the interface, that testing would have occurred later in design when there was less time and fewer resources to correct problems identified.

For task 16, 10 of the 13 subjects could not figure out how to set the radio presets (push and hold the soft button), problem 42. When they reached the preset screen, they would say something such as “Where is the set button?” Often, they would push “autostore,” which reset all the presets. In fact, one subject did this three times, exclaiming, “Why did it do that, again!” when the desired frequency setting appeared on a different button than it was prior to pressing autostore. Subjects did not realize that the method used to preset a radio frequency that works for radios with mechanical buttons (push and hold) also worked here.

Some 9 of the 13 subjects had trouble understanding the label to change the keyboard from alphabetic mode (the default) to numeric mode (problem 2), a step required to
complete task 1. They did not realize there was a key on the alphabetic keyboard screen to change modes, so they would search for other screens using the back key, proclaiming something like “where is the number screen?” At other times, they would say, “I do not know where it is, so I will try everything,” and they selected each key on the screen one by one to learn what each did. Some subjects who were methodical in their efforts found the alphabetic keyboard mode key in this manner. The source of the problem was that the mode keys/buttons looked like other buttons on the keyboard, and there was no spacing or graphics to group them apart. There were other buttons for which functional grouping, indicated by spacing and graphics, would have helped as well.

In fact, subjects had numerous problems with soft buttons. First, there was no graphic distinction between buttons and displays, so that when subjects were not sure what to do, they pressed everything, including displays. Providing buttons with a drop shadow or other unique graphical characteristic as well as auditory feedback (a click sound) when buttons are operated should reduce confusion. When driving, drivers should not be looking at the display. Many interfaces have a beep to confirm that a switch has been pressed, but that beep is often the same as the beep for an error, which confuses drivers.

Also noteworthy were two problems related to street address entry. Many American street names have a direction as part of the name (e.g., North Main) and subjects were uncertain if they should enter the street name as North Main, Main and select North, or abbreviate North as N or NO. Only one option was provided, so only 9 of the 13 subjects were able to complete entry of a street name containing a direction, even with hints. Often they tried using exactly the same steps multiple times, saying something such as, “I think this is the way to do it, but I must have not done it quite right.” They did not realize the system would not accept the data as they entered it.

Subjects also struggled with street names that contained numbers (e.g., 5th street) as only one method of entry was supported, even though subjects may choose to enter those streets as numbers or alphabetically (Fifth). Numbered streets are quite common in the United States.

Related to this were problems associated with subjects not realizing what had been set or was a default. This was particularly true with setting the state. Keep in mind that many cities are located on rivers, because rivers provide both water and transportation. However, rivers also serve as a geographical boundary, so going to a nearby place may require changing the state.

Problems in search for songs and radio stations were common. In part, this was because the interface for each mode (AM/FM, XM, etc.) was unique. The criteria on which one could search were unique to the mode, and most importantly, so was the organization. What could be saved or preset varied with the mode, including what saves or presets were named. As a consequence, subjects needed to browse through pages of screens to find an XM radio station or song on a storage device, which was a very time-consuming process. Manually scrolling through a list of 100 or more items and reading them to find a desired item should not be done while driving, especially when the lists are in an order the subject cannot use to speed the search.

In summary, from most to least, problems included unclear labels (20 instances, such as that for the number key and the name autostore), problems with search (10, such as a lack of consistency in method names and methods available, especially in finding songs and XM stations), poor graphics (9, many icons were meaningless), disorganized system (9, information such as destination modes being split across 2 screens), illegible text (5, mostly text that was too small, especially on maps), poor layout (4, such as inconsistent location of the “ok” and “done” buttons), other organizational issues, unreliable software, and database errors (2 each, including missing addresses), and problems associated with unrecognizable sounds, slow system response (2 types), and disorganized controls (all 1 each).

3.4. Persistence. Using persistence to identify problems was less useful here than has been identified in the literature. Examples of the most persistent problems included not understanding the acronym POI, not understanding icons in the map window, system freeze, not knowing what the search button did, expecting multitouch to be supported, and not understanding the label Zagat. If anything, the persistence data reinforced the need for better icons and graphics, and potentially eliminating icons in some cases. As an example, there were two screens from which subjects could select a method to enter a destination. Often, they did not realize there was a second screen, so they were stuck. The icons were of no help. Had the icons been removed, leaving only text, all of the entry methods would be on one screen and user performance would likely have improved.

Although there are numerous navigation systems in use today, and many of them use icons, there are no standard icons for navigation functions in ISO Standard 2575 [29]. Although it may not be possible to develop well-understood icons for many of the navigation functions of interest, whatever is developed could be better than the current situation, where icons vary from system to system.

3.5. Combined Analyses. When presented with a huge list of problems, such as those in Table 3, the designers’ immediate reaction is often to ignore the overwhelming user feedback. First, there is disbelief that subjects experienced all of the problems listed. Therefore, two representatives from the sponsor responsible for the interface design observed every subject, so they saw that the problems were real. Unfortunately, they were not native English speakers, so for the first few subjects they struggled to understand what was occurring. That was overcome by impromptu discussions

To help designers prioritize what they should do, tables were created using pairs of the dimensions of interest (frequency, severity, and persistence). Ideally, designers should consider those dimensions, the effort required to fix each problem, and the implications of fixing each problem on other problems, as well as other factors in making decisions about what to fix.

The frequency-severity table (Table 4) contains the most useful of the dimension combinations. The problems listed in the upper left area of the table (e.g., 9, 42, 24, 2, 20, and 35) are the most frequent and severe, and therefore, of the highest priority.

Table 4: Problem numbers summarized by frequency and severity.

Frequency  Critical (t > 300 s)  Major (300 ≥ t > 30 s)          Minor (30 ≥ t)
12                               9
11                                                               10
10                               42                              14
9          24                    2, 20, 35
8          6                     7, 8                            4, 17
7          37                    28, 29
6                                13, 32, 36, 45                  3
5                                23, 26
4          25                    5, 22                           16
3                                27, 30, 34, 38, 39, 41, 44, 57  11, 12, 33, 40
2                                48, 50, 64                      46
1                                21, 56, 58, 59, 60              1, 15, 18, 19, 31, 43, 47, 49, 51, 52, 53, 54, 55, 61, 62, 63

Note: for the descriptions corresponding to the problem numbers listed here, see Table 3.

4. Conclusions and Discussion

There were many problems with this interface, some of which were expected early in design when this interface was examined. Subjects consistently needed hints to complete tasks, which was not indicative of an intuitive interface.

One might criticize the sponsor of the research for the existence of these problems. However, the more important point is that they supported qualified experts to examine their interface, identify problems, and suggest improvements.
with them between test sessions or at the end of the day to Keepinmindthatitisnottheresearchers’roletomakethe explainwhatwasobserved.Alsoprovidedwereafewvideo changes desired, and what is changed represents a tradeoff outtakes for others not present. In a subsequent project, a between user impact, cost, schedule, and hardware and secure web camera in the test room was provided so those sowarelimitations. not present could observe the experiment. In this case, the 13-hourtimedifferencebetweenthetestfacility(AnnArbor, 4.1. Lack of Style Guide. ere were inconsistencies in the Michigan,USA)andthesponsor’smainengineeringcenter interface,forexample,wherethe“done”or“ok”buttonwas (Yongin-Shi, Gyunngi-Do, Korea) makes remote viewing located(andhowitwaslabeled).Accordingtothesponsor’s inconvenient. representatives,therewasnostyleguideorotherspeci�cset 10 InternationalJournalofVehicularTechnology ofguidelinesgoverningtheinterface,asituationthatgreatly but there were instances where spacing and graphics could increasesthelikelihoodofproblemsduetoinconsistencyin havebeenutilizedforthispurpose. theuserinterface.Creatingastyleguide,especiallyonebased onresearch,isamajortask,butwellworththeeffort.Most 4.7.ButtonsandDisplaysLookedAlike. erewerenocom- computer manufacturers have style guides to help ensure mongraphicalelementstocontrols(primarilysobuttons) theirinterfacesareconsistent(e.g.,[30]). and displays (mostly icons), so that when subjects were lost,theypressedeverything(problem7).Insuchinstances, 4.2.InconsistentMusicInterfaces. esearchmethodsavail- havingdifferentauditoryfeedbackforoperationofabutton ableandwhatcouldbestoredaspresetsvariedwiththemedia anderroneousoperationwouldhavebeenhelpful. andarere�ectedinproblems16,20,28,35,andothers.ese Manyoftheseproblemsarenotnew(See[31]). 
inconsistencies led to interfaces that were unique to each Beyond this speci�c interface, which does this exper- media,makinginterfacenavigationdifficult.Admittedly,the iment say about how think-aloud experiments should be underlying databases have different structures, but a more conducted? commonformatandmoresimilarsearchfeatureswouldhave beenbene�cial,sothatsubjectswouldonlyneedtolearnone 4.8. More Experimenter Training Needed. More time was set of search methods that were consistently named, not a neededtotraintheexperimenterinthethink-aloudmethod, unique set of methods for each data set. Interestingly, this inparticulartimespentontestingpilotsubjectsandreview- system (and most others) did not allow for aggregation of ing video recordings of them. In this project, there were all favored music presets (AM/FM, XM, etc.) on a single two days between when the interface actually worked and screen. when testing had to begin to deliver results to meet the sponsor’s production schedule, far too short. A minimum 4.3.DestinationEntryProblems. erewerenumerousissues of two to three weeks is recommended. Training is par- withdestinationentry,mostduetolimitingthewaysinwhich ticularly important for nonnative speakers of the language informationwastobeenteredandnotmakingapparentwhat of subjects or those who do not have extensive experience hadalreadybeenset.eseissuesarere�ectedinproblems in testing human subjects as experimenters. Typically, they 14,16,17,37,andothers. do not prompt subjects enough, or more generally, just have problems in reading subjects, not knowing when to 4.4.IconsWereNotUnderstood. erewasadesiretoprovide engagethem.isneedwasre�ectedinsilentperiods,where icons so the interface would be language independent and neitherthesubjectortheexperimenterspoke,andoenthe usable by a wider user group. at assumes the icons are experimenter just starred at the screen. In other situations, understandable, which was not true here. 
Although better when designers without human factors/usability expertise icons could be developed and included in ISO Standard conductthetesting,excessiveleadingofsubjectsisobserved, 2575, how well they will be understood is uncertain. at typicallyinvolvingtellingsubjectshowtocompleteatask. suggests for some parts of the interface, icons may not be A list of probe questions and criteria for when those provided.Inthecaseofselectingadestinationentrymethod, questions should be asked would be a useful addition to all of the methods will �t on one screen instead of being thetrainingmaterials.Beingrepeatedlyasked“whatareyou distributedacrosstwoscreens,makingiteasierforusersto thinking”isannoying.Formalrulesaboutwhentointervene �nd the desired method. In other cases (e.g., maps), icons and what to do (just press this button, but not saying why) must be provided, as there is insufficient space for text (let canbehelpful. aloneiconsplustext).However,inthisinstance,mosticons Another idea is a device that would look for periods of werenotunderstood(problems44,45,64,etc.).Infact,ina silence and subject inaction. When those periods occurred verylengthyexaminationoftheiconsusedinthisinterface, forsometime,thedevicewouldvibratesomethingonwhich conductedaeralltaskswerecompleted,onaverageonly2of the experimenter was sitting as a reminder to ask a probe the13subjectswereabletocorrectlyidentifywhatthevarious question. iconsmeantwhenshownincontext.Easytounderstandmap iconsneedtobedeveloped. 4.9. Real-Time Session Recording Helped. Secretaries or stu- dentswhowerefasttypistssatintheroomwiththesubject 4.5.LabelsWereNotUnderstood. isincludedPOI(prob- andtheexperimenterandservedasthescribe,creatingthe lem 5), SAT (problem 34), Zagat (problem 38), and others. session transcripts. Immediately aer the session, when the Eachofthelabelsusedshouldbeconsideredandalternatives session was fresh in their mind, the scribe checked their proposed. 
is needs to be done in conjunction with the transcriptagainsttheaudioandvideorecording.Forthisto efforttodevelopnewiconsastheyareanalternativetotext occur,morethanafewminutesisrequiredbetweensubjects. labels. Nonetheless, generating transcripts in real time rather than aer the fact from a video recording reduced the time to 4.6. Lack of Functional Grouping. ere were several provide the results to the sponsor, which was important instances where information on screens was not grouped where a real evolving product with a rigid development by function, increasing the time for users to �nd particular schedulewasconcerned. information(andincreasingerrorsaswell).ebestexample ere was no evidence that the presence of a scribe ofthisisproblem2.Admittedly,spaceisextremelylimited, was disruptive, which was a concern. For example, subjects
Description: