UiRef: Analysis of Sensitive User Inputs in Android Applications BenjaminAndow AkhilAcharya DengfengLi NorthCarolinaStateUniversity NorthCarolinaStateUniversity UniversityofIllinoisat [email protected] [email protected] Urbana-Champaign [email protected] WilliamEnck KapilSingh TaoXie NorthCarolinaStateUniversity IBMT.J.WatsonResearchCenter UniversityofIllinoisat [email protected] [email protected] Urbana-Champaign [email protected] ABSTRACT fromwell-defined,systemAPIcalls,whichexplicitlyrepresentthe Mobileapplicationsfrequentlyrequestsensitivedata.Whileprior semanticsofthedatareturnedbytheAPIinvocation.Suchfocus workhasfocusedonanalyzingsensitive-datausesoriginatingfrom addressesonlyapartofthechallenge:applicationsalsoobtaina well-definedAPIcallsinthesystem,thesecurityandprivacyim- widerangeofsensitivedata,suchaspasswordsandcredit-card plicationsofinputsrequestedviaapplicationuserinterfaceshave numbers,asinputwithintheirgraphicaluserinterfaces(GUIs). beenwidelyunexplored.Inthispaper,ourgoalistounderstandthe Thesedatarequestsarelargelyignoredbyprioranalyses. broadimplicationsofsuchrequestsintermsofthetypeofsensitive Inthiswork,wefocusonprotectingsensitivedatacollectedby databeingrequestedbyapplications. third-partyapplicationsviatheirGUIs.Resolvingthesemanticsof Tothisend,weproposeUiRef(UserInputREsolutionFrame- suchdataischallenging,astheAPIcallsthatretrieveauserinput work),anautomatedapproachforresolvingthesemanticsofuser aregenericanddonotindicatethedata’ssemantics.Forexample, inputsrequestedbymobileapplications.UiRef’sdesignincludes auser’shomeaddressandcredit-cardnumberareretrievedbythe anumberofnoveltechniquesforextractingandresolvinguser samegetText()APIcallifenteredintoEditTextwidgets.Instead, interfacelabelsandaddressingambiguityinsemantics,resultingin thesemanticsofauserinputisdenotedbynatural-languagetext significantimprovementsoverpriorwork.WeapplyUiRefto50,162 intheGUI,whichprompttheusertoenterspecifictypesofdata. AndroidapplicationsfromGooglePlayanduseoutlieranalysis Semanticsresolutionofuserinputshasbeenexploredintwo totriageapplicationswithquestionableinputrequests.Weiden- recenttechniques:SUPOR[18]andUIPicker[22].However,aswe tifyconcerningdeveloperpractices,includinginsecureexposureof discussinSection3,bothtechniqueshavesignificantlimitations accountpasswordsandnon-consensualinputdisclosurestothird thatleadtoinaccuracies.Further,bothtechniquesfailtotakeinto parties.Thesefindingsdemonstratetheimportanceofuser-input accounttheproblemofwordambiguity,whichisthecoexistence semanticswhenprotectingendusers. ofmultiplemeaningsforawordorphrase.Forexample,theword “address”canrefertoapostaladdressoranIPaddressdependingon thecontextthatthewordisused,andhencecancauseconfusionfor 1 INTRODUCTION theexistingtechniques.InSection5.3,wemeasuretheprevalenceof Mobileapplicationscontinuetoconsumeanincreasingamount ambiguityduringsemanticsresolutionandshowthat20.9%ofinput ofsensitivedatatoperformavarietyoftasks,suchaspersonal resolutionscontainanambiguousterm.Althoughwordambiguity banking,healthtracking,andonlineshopping.Vettingtheseappli- isaknownandactivelyresearchedareainNLP,thelimitedamount cationstoidentifyabnormalbehaviorsisparamounttoensurethat oftextwithinGUIsmakesdisambiguationespeciallychallenging. privacyandsecurityrequirementsofusersareadequatelysatisfied. Inthispaper,weproposeUiRef,aUserInputREsolutionFrame- Priorresearch[6,10,11,13,14,32]hasdemonstratedtheutility workthatautomaticallyresolvesthesemanticsofuser-inputwid- ofanalyzingsensitive-datarequeststoidentifysecurityandprivacy getsbyanalyzingtheGUIsofAndroidapplications.UiRefadvances problemswithinapplications,includinginsecure-datatransmission, thestateoftheartusingnoveltechniquesinitsthreemainstages: private-dataleakage,andmalwareidentification.Suchapproaches layoutextraction,labelresolution,andsemanticsresolution.First, oftenrelyonthesemanticsoftheuserdatatoenableautomated UiRef’slayoutextractionprovidesanewhybridanalysisthataccu- analyses,e.g.,fortaintsourcesandruntimeprivacynotices.How- ratelyrenderscustomwidgets.Priortechniquesdonotfullysupport ever, these approaches focus on only sensitive data originating customwidgets,whichareusedby48.7%oftheapplicationsunder ourstudy.Second,UiRef’slabelresolutionmodelspatternswithin Permissiontomakedigitalorhardcopiesofallorpartofthisworkforpersonalor thespatialarrangementofwidgets,achievinga20.8%improvement classroomuseisgrantedwithoutfeeprovidedthatcopiesarenotmadeordistributed overpriorwork[18].Finally,UiRef’ssemanticsresolutiondisam- forprofitorcommercialadvantageandthatcopiesbearthisnoticeandthefullcitation onthefirstpage.Copyrightsforcomponentsofthisworkownedbyothersthanthe biguateswordswithininputlabels.Existingword-disambiguation author(s)mustbehonored.Abstractingwithcreditispermitted.Tocopyotherwise,or techniques[20]cannotbeapplieddirectly,asGUIsrarelydisplay republish,topostonserversortoredistributetolists,requirespriorspecificpermission and/[email protected]. textinfullsentences.Instead,UiReflearnsdifferentwordsenses WiSec’17,Boston,MA,USA basedontextthatappearsinotherwidgetswithinlayouts. ©2017Copyrightheldbytheowner/author(s).PublicationrightslicensedtoACM. WeuseUiReftoperformalarge-scaleanalysisof50,162appli- 978-1-4503-5084-6/17/07...$15.00 DOI:10.1145/3098243.3098247 cationsfromGooglePlay.UiRefresolvestheuser-inputsemantics WiSec’17,July18-20,2017,Boston,MA,USA BenjaminAndow,AkhilAcharya,DengfengLi,WilliamEnck,KapilSingh,andTaoXie 1 <LinearLayoutwidth="match_parent"height="wrap_content"orientation="vertical"> First Name 2 <LinearLayoutwidth="match_parent"height="wrap_content"> 3 <TextViewwidth="wrap_content"height="wrap_content"text="FirstName"/> 4 <EditTextwidth="match_parent"height="wrap_content"id="@+id/first"/> 5 </LinearLayout> Last Name 6 <LinearLayoutwidth="match_parent"height="wrap_content"> 7 <TextViewwidth="wrap_content"height="wrap_content"text="LastName"/> Address 8 <EditTextwidth="match_parent"height="wrap_content"id="@+id/last"/> 9 </LinearLayout> 10 <TextViewwidth="wrap_content"height="wrap_content"text="Address"/> 11 <EditTextwidth="match_parent"height="wrap_content"id="@+id/address"/> 12 </LinearLayout> Figure1:ExampleAndroidlayoutandcorrespondingXMLspecificationwithstrippednamespaces foreachapplication.Wethenuseoutlierdetectiontoidentifyques- Theremainderofthispaperdescribesthedesignandimplemen- tionable,andpotentiallymalicious,inputrequests.Thegoalofour tationofUiRef,followedbyitsapplicationtowardssecurityand analysisistoprovideanunderstandingofwhatsensitiveinfor- privacythreatsinuserinputs. mationapplicationsareaskingforandbroadlyidentifyvarious 2 BACKGROUND questionablepracticesincollectinguserinputsthatcanpotentially compromiseusers’securityandprivacy.Notethatforthiswork, WhenusingAndroidapplications,usersinteractwithlayouts(i.e., wedonotdifferentiateaquestionablepracticefrommaliciousin- GUIs)toenterinputsandperformactions.Layoutsconsistofwid- tent.Suchdifferentiationcanbedonebyfurtheranalysisusing gets, such as push buttons, text fields, and check boxes. These existingtechniques[6,10,11,13,14,32].Rather,UiRefcanfilter widgetsarearrangedandgroupedwithinlayoutsthroughtheuse theapplicationdatasetforfocuseddeeperinspection. ofviewgroups,whicharecontainersforholdingotherviews(i.e., Ouranalysisleadstothreemainsecurityandprivacyfindings viewgroupsorwidgets).Duetoviewgroups,layoutsarestructured (seeSection6).First,wefindthatapplicationsrequestawiderange ashierarchicaltreesnamedasviewhierarchies. ofsensitivedata,includingSSNs,passportnumbers,andhealthcare Layoutsaretypicallydefinedbythedeveloperatcompiletimeus- information.Whilelegitimaterequestsforthisdatamayexist,we ingXML.Figure1showsasimplifiedversionofalayout’sXMLfile find that many such requests do not align with the purpose of (i.e.,main_layout.xml)andacorrespondingscreenshot.Torendera the requesting application. Second, we identify 66 applications layout,applicationspasstheresourceidentifierofthelayout’sXML thatdirectlyrequesttheuser’sthird-partyaccountpasswords(e.g., filetothesetContentView()orLayoutInflater.inflate()methods.To Gmail).Theserequestsexposetheuser’sthird-partyaccountsto accessaresource,developersusetheJavaRclassgeneratedduring compromise,aswellasviolatingtheprimarytenetbehindOAuth- compilation,whilethecompiledapplicationusesintegerconstants likesolutions.Finally,weidentify6popularapplicationsthatsend tointernallyrefertoresources.Forexample,torenderthelayout sensitive-inputdatatoadvertiserswithoutdisclosingthebehavior. inFigure1,anappinvokessetContentView(R.id.main_layout).The Duringouranalysis,wealsoidentifyapplicationdesignflaws public.xmlfileintheapplicationpackage(APK)providesamapping thatmayresultinlossofprivacy.First,applicationsallowthird- oftheintegerconstantstothestringsusedintheXMLlayout. partylibrariestodirectlyrequestsensitiveinputsfromusers,likely Android’sSDKprovidesawiderangeofpre-definedviewgroups resultinginuserconfusion.Inotherwords,theusercannotidentify andwidgets.However,developersmayalsodefinecustomviewsin whetherthelibraryortheapplicationisrequestingdata.Thisflaw theirJavacodebyextendingthesepre-definedclasses,whichcan canbeexploitedbymaliciouslibrariestocompromisesensitivedata. thenbereferencedinthelayout’sXMLfiles.Theimplementationof Second,applicationsdisplayuntrustedthird-partycomponentsin thesecustomviewsmaycustomizerenderingandalsodynamically thesameGUIsassensitive-inputrequests,leavingtheapplications insertviewgroupsandwidgetswithinitsviewhierarchy. vulnerabletoprivate-datatheftduetolayout-traversalattacks[29]. Insummary,thispapermakesthefollowingmaincontributions: 3 PROBLEMANDCHALLENGES • WedevelopnoveltechniquesforsemanticsanalysisofGUI Unlike information retrieved from Android platform APIs (e.g., inputstoconsiderablyimproveoverthecurrentstateof GPS),thesemanticsoftextinputsisoftenpoorlydefined.Assuch, the art (5.5%-25.6% more accurate at extracting layouts itreceivedlimitedinvestigationbypriorresearch[18,22]incontrast and20.8%moreaccurateatresolvinglabels).Weperform tothewealthofliteraturestudyingtheabuseofprivacy-sensitive adirectcomparisonwithpriorworkbyimplementinga platformAPIs[11,13,14,32]. representativesolutioninSUPOR[18]. ProblemStatement:Thisworkseekstounderstandwhatinforma- • Weproposeanovelapproachtoaddressambiguityofde- tionAndroidappsarerequestingthroughtheirGUIsbyautomatically scriptivetextinmobileapplications.Ambiguityisakey resolvingthesemanticsoftheuserinputsandtofurtheranalyzetheir challengeunaddressedbypriorwork,anditisparticularly potentialsecurityandprivacyrepercussions. challengingforGUIsduetolimitedtext.Wedemonstrate Semanticsresolutionisthefirststepinensuringproperhan- thatwordembeddingscaneffectivelyperformthistask. dlingofsecurityandprivacysensitiveinput.Therearemanyways • WedemonstratetheutilityofUiRefinperformingsecurity inwhichtheresultingmetainformationcanbeusedtoenhance andprivacyanalysisthroughalarge-scalestudyon50,162 securityanalysis(e.g.,semanticsfortaintsources).Inthispaper, Androidapplications.Ourstudyhighlightsvariousforms weusetheresolvedsemanticstoachievetwomaingoals.First,we of questionable, and potentially malicious, practices by useittounderstandthegenerallandscapeofthetypesofsecurity developersandemphasizestheneedtoconsideruserinputs andprivacysensitiveinformationthatapplicationsarerequesting inthesecurityandprivacyanalysisofmobileapps. fromtheuser.Second,weleverageittoidentifyquestionable,and UiRef:AnalysisofSensitiveUserInputsinAndroidApplications WiSec’17,July18-20,2017,Boston,MA,USA Layout Extraction Module Label Resolution Semantics Module Resolution Module Private APK IdLeanytoifiuet r RLeanydoeurtesd Pba!aseerdn- SReemsaonlvteicrs ReDquateasts Label Layout IDs Resolution Algorithm Descriptive Text ReAwPrKit er Rewri!en APK RLeasyoolvuetsd Spo!er SeTnesrimtisve Figure2:UiRefArchitecture potentiallymalicious,practicesusedbyappsbyperformingoutlier pixelintheinputwidgettothenearestpixelsinlabels,andmapping analysis.Thisanalysisidentifiesapplicationsthatrequestdataun- eachinputwidgettothelabelwidgetwiththelowestweighted commonfortheitscategory(e.g.,agameaskingforanSSN)and average.However,SUPORsuffersfrommultiplelimitationsthat allowsustotriageappsformanualanalysis. aresourcesforincorrectlabelresolutionsasshowninSection5. Ourworkisnotthefirsttoconsiderthesemanticsofuserinputs First,italwaysresolveslabelstoaninputwidgetevenifthelabelis inAndroidapplications.Priortechniques(SUPOR[18],UIPicker[22], toofarawayfromtheinputwidget.Second,itdependsontheinput andAppsPlayground[28])havesignificantlimitationsthatimpact widget’sareasizeandpredefinedweightsthatbiasittowardslabels theaccurateandunambiguousresolutionofthesemanticsofuser totheleftandabove,thusaffectingthealgorithm’sgenerality. inputs.Weidentifythreeareasofchallenges:layoutextraction, SemanticsResolution:Mobileapplicationsuseshorttextphrases descriptive-text(label)resolution,andsemanticsresolution. in descriptive text, making semantics resolution of user inputs LayoutExtraction:Thespatialarrangementofwidgetsinfluences challenging.Wordpolysemylimitstheuseofsimplekey-phrase thelayout’ssemantics.Whiletheparent-childandsiblingrelation- matchingtechniques,andaffects20.9%ofinput-fieldresolutions shipsofalayout’sviewhierarchyinfluencethespatialarrangement, (Section5.3).Therefore,disambiguationisrequired. itisapoorapproximationofproximityduetotheflexibilityallowed Noneofthepriortechniquesresolvetheambiguityofwords. bywidgets.Further,codedefiningcustomviewsmaycontrolthe SUPOR,UIPicker,andAppsPlaygroundusekey-phrasematching typesofwidgetsdisplayedaswellastheirpositioninthelayout. ontheinputwidget’sassociateddescriptivetext,butcannotdiffer- Around48.7%ofappsinourstudy(Section6)usecustomviews. entiatethemultiplesemanticsofthatkeyphrase.SUPORacknowl- UIPickeroperatesdirectlyontheXMLspecificationoflayouts; edgesthislimitation,andexcludestheword“address”fromtheir therefore,itcannotaccuratelyinterpretthespatialarrangementof key-phraselist.Additionally,UIPickerleveragesdeveloper-defined widgetsandtheirproximitiestooneanother.Incontrast,SUPOR variablenamesandinput-widgettypeattributes(e.g.,password)in extractslayoutsbymodifyingthestaticrenderingengineofthe theirresolution.However,suchdeveloper-definednamesarenot AndroidDeveloperTool(ADT),whichprovidesthispositioning. toleranttoname-basedobfuscationorpoorcodingpractices. However,neitherUIPickerorSUPORsufficientlyhandlescustom views.TheUIPickerpaperdoesnotmentioncustomviews.The 4 SYSTEMAPPROACHANDDESIGN staticrenderingengineofADTusedbySUPORcannotexecute TheUiRefresolvesthedatasemanticsofuser-inputwidgetsby thebytecodeofcustomviews.Therefore,SUPORrenderscustom analyzingAndroidlayouts.Figure2showsUiRef’sarchitecture. viewsasthenearestsuperclassavailableintheAndroidframework, Thelayout-extractionmoduleinstrumentsAPKstoforcetheren- causinginaccurateandincompletelayoutrendering.Finally,App- deringoftheirlayoutsandexportstherenderedlayoutsforfurther sPlaygroundcanhandlecustomviews,butitmustdynamically analysis.Thelabel-resolutionmoduleidentifiespatternswithinthe navigatetheGUItoreachlayouts. placementandorientationoflabelswithrespecttoinputwidgets Descriptive-TextResolution:Toprompttheuserforinputs,de- in the layout and geometrically resolves labels. The semantics- veloperseitherusetextwidgetswithalabelinthenearbyproximity resolutionmoduleappliestext-analyticstechniquesontheinput totheinputfields(Figure1),orembeddescriptivetextasattributes widget’sassociateddescriptivetexttodeterminetheexpectedtype ontheinputwidgets(e.g.,thehintandtextattributesontheEdit- ofinformationacceptedbytheinputwidgets. Textwidget).Whileembeddeddescriptivetextisstraightforwardto resolve,identifyingthecorrectlabelforaninputwidgetisnontriv- 4.1 LayoutExtraction ial.Evenifembeddeddescriptivetextexistsforaninputwidget,the Thegoalsofthelayout-extractionmodulearethree-fold:(1)identify correspondinglabelwidgetmustberesolved.Someapplicationsuse alllayoutsinanapplication;(2)identifythespatialrelationships labelstodenotethetypeofinformationandembeddeddescriptive betweenUIwidgetsforeachlayout;and(3)identifythedescriptive texttoprovideinstructionstotheuser(e.g.,“required”). textdisplayedtousers.Thespatialrelationshipsareneededtomap Ageneralapproachtomatchaninputwidgetwithalabelwidget labelwidgetstoinputwidgets,asdiscussedinSection4.2. istodefineadistancemetric.BothUIPickerandAppsPlayground UiRef’s layout-extraction module uses a hybrid technique to usesiblingrelationshipsinlayoutstoresolvetheassociatedde- extractrenderedlayouts.UiRef’stechniqueusesstaticanalysisto scriptivetext.However,inpractice,siblingrelationshipsdonot identifythelayoutsusedbytheapplication,anddynamicon-device accuratelygaugeproximity.Incontrast,SUPORperformslabelres- renderingtoextracteachrenderedlayoutthatuserseventually olutionbypartitioningthespacesurroundinganinputwidgetinto interactwith.Sinceon-devicerenderingallowsfortheexecutionof nineareas,calculatingaposition-basedweightedaverageforeach developercode,itallowsforthecorrectrenderingofcustomviews. WiSec’17,July18-20,2017,Boston,MA,USA BenjaminAndow,AkhilAcharya,DengfengLi,WilliamEnck,KapilSingh,andTaoXie UiRefdisassemblesanAPKusingApkTool[4],andcollectsthe Algorithm1ResolutionofLabeltoInputWidget resourceidentifiersforeachlayoutpresentintheAPK’spublic.xml 1: procedureResolveLabels(uifs,labels) file.Itsavesthelayoutresourceidentifierstotheapplication’sassets 2: resolved←initemptylist 3: do foldertobeusedlaterduringon-devicerendering.Subsequently,it 4: pairs←GenCandidateSets(uifs,labels) injectsacustomactivityintotheAPK,andrewritestheapplication’s 5: labels.remove(pairs.labels) manifestfiletoregistertheinjectedactivityasanentrypoint.UiRef 6: uifs.remove(pairs.uifs) 7: resolved.append(pairs) thenreassemblestheAPK,andsideloadsitontoalivedevice. 8: whilesize(pairs)>0 Whentheinjectedactivityisexecuted,ititeratesthroughthe 9: returnresolved resourceidentifierssavedintheassetsfolder,andinvokestheset- UiRef’salgorithmforlabelresolution(Algorithm1)isiterative, ContentView()methodforeachidentifiertocausetheapplication soitcorrectlyresolveslabelsevenifinconsistenciesexist.Asan torenderthelayout.Ittheniteratesthroughtherenderedlayout’s example,the“Address”labelinthelayoutofFigure1isinconsistent viewhierarchytoextractassociatedmetadata,suchasthecoor- asitisplacedabovetheinputwidgetwhileotherlabelsareplaced dinates of each view, visibility attributes, and text strings. This totheleftoftheinputwidgets,andisstillsuccessfullyresolvedby informationalongwiththeviewhierarchyisexportedasXML. thealgorithm.Duringthefirstiteration,UiRefresolvesthelabel AsdiscussedinSection4.2,UiRef’slabel-resolutionmodulere- withthetext“FirstName”totheEditTextwiththeidentifier@+id/- quiresalistofpotentiallabelsandalistofinputwidgets.Tocon- first,and“LastName”totheEditTextwiththeidentifier@+id/last. structtheselists,UiRefusesclass-inheritanceinformationtoiden- Duringtheseconditeration,UiRefresolvesthelabelwiththetext tifythetypesofwidgetsthatcommonlyacceptuserinputs(i.e., “Address”totheEditTextwiththeidentifier@+id/address. EditText,AbsSpinner,CheckedTextView,RadioButton,CheckBox, Thelabel-resolutionmoduleacceptstherenderedlayoutsfrom ToggleButton,Switch,SwitchCompat,RatingBar),andwidgetsthat thelayout-extractionmoduleasinput,andresolveslabelsassociated displaystatictext(i.e.,TextView).Forexample,anEditText,orwid- withuser-inputwidgets. getsthatextendEditText,arelikelytoacceptuserinputs.Notethat Label-ResolutionAlgorithm:UiRefmapslabelstoinputwidgets thistechniquediffersfromSUPOR’suseofinheritanceinformation, byidentifyingpatternsintheirrelativeplacements.Algorithm1 asSUPORusesitduringtherenderingprocesstodeterminehowto describes the algorithm. The input is the sets of input widgets rendercustomviews.Incontrast,UiRefdoesnotrelyoninheritance (uifs)andpotentiallabels(labels)identifiedusingclass-inheritance informationduringrendering,asUiRefcanrendercustomviews. informationwithinthelayout-extractionmodule(Section4.1). UiRefextractsinheritanceinformationduringlayoutextractionto Algorithm1starts(Line4)byinvokingGenCandidateSets(Algo- avoidperformingtheClassHierarchyAnalysisoffline. rithm2)togeneratealistofcandidatesetsoflabelandinput-widget Todeterminewhetherawidgetacceptsauserinput,UiRefre- pairs.Foreachinputwidget,GenCandidateSetscreatesasetof solvestheclosestancestorinawidgetobject’sclasshierarchythat vectorsfromtheinputwidgettoallpotentiallabelsinthelayout isinAndroid’sframework,andincludesthenearestSDKclassinthe (Lines3–8,Algorithm2).Thevectorsrepresenttheeuclideandis- metadataoutput.Forexample,ifaclassCustomTextView1extends tance(i.e.,magnitude)anddirection(i.e.,angle)betweentheinput android.view.Textview,UiRefwouldmarkitsancestorasTextView. widgetandlabel.calcSmallestVector(notshown)createsuptothree Similarly,ifCustomTextView2extendsCustomTextView1,UiRef vectorsforeachinput-widgetandlabelpair.Twovectorsgofrom wouldalsomarkitsancestorasTextView,becauseCustomTextView2 thetwoclosestcornersoftheinputwidgettothecorresponding extendsCustomTextView1,whichextendsTextView.Weidentify cornersofthelabel.Thethirdvectoriscreatedifalabelisdirectly 14basewidgettypesfromAndroid’sdocumentation,anduseJava’s above,belowortothesidesoftheinputwidget,beinganchoredat instanceofoperatortofindtheviewobject’snearestSDKclass. theclosestpointbetweentheinputwidgetandlabel. NotethatUiRefexecutesonlythecodeneededtorenderlay- Ifavector’smagnitudeisgreaterthanapredeterminedthresh- outs,andhencedoesnotsufferfromcode-coveragelimitations. old,itisnotconsideredasacandidate(Lines6–7,Algorithm2). Thistechniquedoeshavetwolimitations.First,UiRefislimitedto Thealgorithmthenappendstheinput-widgetandlabelpairtothe staticallydefinedlayouts.However,staticallydefininglayoutsisa candidatesetundertheentryforthecorrespondingvectorifeither commonpractice,asittypicallyrequireslesseffortthandynamic ofthewidgetsdoesnotalreadyexist(Line8,Algorithm2).During generation.Second,UiRefcannotextractdynamicallygenerated implementation,weempiricallydeterminethethresholdbytaking text(e.g.,fromnetworkconnections).However,sincewefocuson theaveragedistancesoflabelstoinputwidgetsfor100randomly textlabelsforuserinputsandnotdisplaycontent,webelieveitis sampledapplications.Thedistancethresholdcanbemadeindepen- reasonabletoassumethatlabelsstaticallydefinethetext. dentfromthedevice’sscreensizebyrepresentingthethresholdas 4.2 LabelResolution aproportionofthescreensize.Notethattheappsinthevalidation setwereexcludedfromthedatasetsusedinourevaluation. ThegoalofUiRef’slabel-resolutionmoduleistoidentifythelabel Afterthecandidatesetsareconstructed,GetOptimalSet(Algo- associatedwitheachuser-inputwidget.Itoperatesontheintu- rithm3)isinvoked(Line9,Algorithm2)toextracttheoptimallabel itionthatdevelopersareconsistentwiththephysicalarrangement toinput-widgetmapping.Thealgorithmfirstretrievesthecandi- andorientationoflabelstouser-inputwidgets.Forexample,ifa datesetswiththelargestsetsize(Line2,Algorithm3).Ifthereare developerpositionslabelstotheleftofaninputwidget,thenitisex- multiplesuchsets,thealgorithmprioritizessetswhoselabelsare pectedthatotherlabelsinthelayoutwillalsobepositionedonthe eitheraboveortotheleftofinputwidgets(i.e.,thevectordirection left.Therefore,thelabel-resolutionmoduleworksbyidentifying is90degreesor180degrees,respectfully)(Lines5–8,Algorithm3). patternswithintheplacementoflabelsrelativetoinputwidgets. UiRef:AnalysisofSensitiveUserInputsinAndroidApplications WiSec’17,July18-20,2017,Boston,MA,USA Algorithm2CreateCandidateMap Algorithm3FindOptimalMapping 1: procedureGenCandidateSets(uifs,labels) 1: procedureGetOptimalSet(candidates) 2: candidates←initemptycandidatemapobject 2: maxSets←GetLarдestSets(candidates) 3: foreach i ∈uifs do 3: opt←maxSets[0] 4: foreach l ∈labels do 4: foreachs ∈maxSets[1:]do 5: v←calcSmallestVector(i,l) 5: optLorT ←IsLeftOrTop(opt.vec) 6: if v.distance >threshold then 6: sLorT ←IsLeftOrTop(s.vec) 7: continue 7: if!optLorT&sLorTthen 8: candidates[v].appendNE({i,l}) 8: opt←s 9: returnGetOptimalSet(candidates) 9: elseif optLorT ==sLorTANDopt.vec.dist >s.vec.distthen 10: opt←s Ifmultiplesetsareofequalsizeandbothvectordirectionsareto 11: returnopt theleftortopoftheinputwidgets,thenGetOptimalSetselectsthe processusesaslidingwindowacrossthewordsinthetexttofind setwiththesmallestvectordistance(Lines9–10,Algorithm3).The themostappropriategroupingsofwords. algorithmreturnstheoptimallabeltoinput-widgetmapping. Oncethesetofn-gramsiscreated,weheuristicallyrefinetheset OnceAlgorithm1receivestheoptimalmappingoflabeltoinput- asfollows.First,weremoven-gramsthatappearinlessthantwo widgetpair,itremovestheresolvedpairsfromtheinitiallists,and separatelayouts.Second,weremoven-gramsthatareasubstrings appendstheresolvedpairstotheresolvedset(Lines5–7,Algo- oflargern-gramsandappearinonlythesamelayouts.Forexample, rithm1).Thisprocessisrepeateduntilnonewlabelsareresolved. we remove the bigram “social security” if the only time that it appearsisinthelongerterm“socialsecuritynumber”.Third,we 4.3 SemanticsResolutionofInputWidgets removen-gramsthatstartorendinstopwords(e.g.,“a”,“the”,“and”). Thegoalofsemanticsresolutionistoresolvethetypesofinputdata Fourth,weremoven-gramswithcommonverbsoccurringinmiddle promptedbytheassociateddescriptivetext.Thesetypesofdata positionsofn-grams(e.g.,“could”,“have”,“would”).Notethatthe correspondtoconceptsinourterminology,whichareexpressedby goalofmultitermextractionisnottoextracttextualprompts,but terms(i.e.,singlewordorphrase)inlayouts.However,themapping rathertoextracttermsthatcorrespondtoconcepts.Forexample, oftermstoconceptsisnotnecessarilydisjoint.Differenttermsmay we seek to extract the term first name from the following text, representthesameconcept(synonymy),andthesametermmay “Whatisyourfirstname?”Finally,ifamatchingunigramisfound representmultipleconcepts(polysemy).Duetopolysemy,strategies whenspacesareremovedfromn-grams,wemarktheunigramasa ofkeyword-basedresolutioncannotbeusedtoresolveconcepts. potentialsynonymofthen-gram(e.g.,usernameandusername). Thesemantics-resolutionmoduleusesthecontextsurrounding Oncethecandidatelistsarecreated,weperformtwofinalmanual a term to resolve its concept. The module has two main tasks: post-processingsteps.First,wereadthroughthelistsandfilter (1)terminologyextraction,and(2)conceptresolution. outn-gramsrepresentingverbphrasesnotcaughtwithourfilters, andphrasesthatdonotclearlyrepresentconcepts.Second,we 4.3.1 TerminologyExtraction. Terminologyextractionisused scanthroughthelistsofn-gramsanduniterms,andmarkdown todeterminecandidatetermsthatrepresentconceptsinthedo- potentiallysensitivetermsalongwiththeirsynonyms.Forexample, main.UiRefrequiresterminologyofsecurityandprivacysensitive inthisstep,wemark“lastname”asasensitiveterm,and“surname” conceptsthatapplicationsusetopromptforinputs.Termscanbea asitssynonym.Wealsocreatealistofpotentiallyambiguousterms singleword(uniterm)oraphrase(multiterm),suchasgenderand (e.g.,name,address,number),whichisusedlaterduringsemantics dateofbirth.UiRefusesadata-driventechniquefordefiningalist resolutiontodeterminewhichtermsrequiredisambiguation. ofsecurityandprivacysensitiveterms.Thelistisderivedfromthe textdisplayedwithinlayoutsfromthedatasetinSection6. 4.3.2 ConceptResolution. Thegoalofconceptresolutionisto Ourterminology-extractionprocessusesalltextdisplayedin determinethesemanticsofaninputwidget.Forexample,UiRef thelayoutsextractedbythelayout-extractionmodule.Webegin aimstolinktheconceptsfirstname,lastname,andpostaladdressto byusingregularexpressionstoreplaceemailaddresses,URLs,and thecorrespondinginputwidgetsinFigure1.RecallfromSection3 commonphone-numberformats(e.g.,[email protected],http://www.b.com, thatkey-phrasematchingaloneisnotsufficientduetopolysemy. 123-456-7890)bytheirrespectiveterms.Forexample,[email protected] Beforewecandisambiguateterms,wemustdeterminethediffer- replacedbyemail_addr_example.Wealsosubstitutethe#symbol entmeanings(i.e.,senses)inwhichatermappears.Notethatan withtheword“number”,anduseregularexpressionstoremove automatedtechniqueisrequired,asallpossiblesensesoftheterm prompt text, such as “enter your”. We then lemmatize the text, mustbeconsideredwhenperformingdisambiguation. beingacommonpreprocessingsteptonormalizetext.Forexample, Theprocessofdeterminingthesedifferentmeaningsisknown lemmatizingtheword“ethnicities”wouldchangeitto“ethnicity.” asword-senseinduction.Theprocessofresolvingthemeaningofa Afterpreprocessingthetext,wemarktextthatcontainsonlya specificinstanceofatermisknownasword-sensedisambiguation. singlewordasapotentialuniterm.Toheuristicallyremovenoise UiRefperformsthesetwotasksusingtheAdaptiveSkipGram(Ada- andreducethemanualpost-processingeffortsdescribedbelow,we Gram)model[8].AdaGramextendsMikolov’sSkipGrammodel[21] removeunitermsthatappearonlywithinasinglelayout.Notethat byusinganon-parametricBayesianapproachoverDirichletpro- removingunitermsmaycauseustomissafewimportantterms; cessestolearnmultiplewordvectorsperword.Forasingleword, however,missedtermscanbeaddedmanually. awordvectorrepresentsasenseinwhichthewordappears. Next,wecreateacandidatesetofsensitivemultitermsbyex- Word-SenseInduction:TraininganAdaGrammodeltoperform tractingn-gramsofsizes2-4foreachpieceoftext.Notethatthis word-senseinductionrequiresflattextdocuments.Toflattentext WiSec’17,July18-20,2017,Boston,MA,USA BenjaminAndow,AkhilAcharya,DengfengLi,WilliamEnck,KapilSingh,andTaoXie withinlayouts,UiRefgeneratesastringfromthetextwithinlay- Table1:PerformanceEvaluationResults outs,scanningfromthelayout’stop-leftcornertothebottom-right LabelResolution SemanticsResolution Disambig. corner.Foreachpieceoftext,UiRefpreprocessesthetextbyper- UiRef SUPOR∗ UiRef SUPOR∗ UiRef Acc. 84.0% 63.2% 95.0% 90.2% 82.1% forminglemmatization,strippingstopwords,andtransformingthe Raw 630/750 474/750 708/745 672/745 275/335 textintolowercase.Forinputwidgetswithbothlabelsandhints, ∗UiRef’sLayoutExtraction,andourreimplementationofSUPOR UiRefoutputsthelabeltextbeforethehint.Forexample,Figure1 emulatorsandasingleNexus4,bothrunningAndroid5.1.1.Werun producesthefollowingdocument:“firstnamelastnameaddress”. applicationsthatcontainARM-basednativelibrariesonthereal To allow for the disambiguation of multiterms, UiRef gener- device,asthex86emulatorsdonotincludethetranslationlibrary. atesthesecond document forthesamelayout withmultiterms OurdatasetisbasedontheOctober31,2014PlayDrone[33] treated as one lexical unit. UiRef collapses adjacent words that snapshot(1.4millionapplications).Weperformproportionatestrat- formmultitermsbyreplacingspaceswithunderscores(e.g.,“first ifiedrandomsamplingacrossthedatasettoensurearepresentative name”becomes“first_name”)usingthelistofmultitermsfromthe samplebyusingtheGooglePlaycategoriesandnumberofdown- previousstep.Forexample,Figure1alsoproducesthefollowing loadstoformourstrata.Forverificationpurposes,weusePython’s document:“first_namelast_nameaddress”. langdetectmoduletoensurethedatasetcontainsonlyapplications Afterthesetwodocumentsarecreatedforeachlayout,UiRef withEnglishdescriptions.Ourfinaldatasetconsistsof50,162apps. trainsanAdaGrammodel,whichisusedtoperformword-sense WeuseSUPOR[18]asarepresentativebaselineforcomparison. disambiguation.NotethatUiReftrainstheAdaGrammodelusing SinceSUPOR’ssourceorbinarycodewasnotavailable,wereimple- amaximumof25potentialwordsensesbeinglearnedperword. menttheirapproach.Whenspecificimplementationdetailsarenot Thisnumberischosenthroughexperimentalexplorationtoavoid clearorambiguous,wecontacttheauthorsforclarifications.Wedo underestimatingthenumberofwordsenses,asdoingsowould notcompareourresultstoUIPickerduetothelimitationsdiscussed causeconceptstooverlapinthesamewordsense. inSection3,andtheunavailabilityoftheirtrainedclassifiers. Oncethemodelistrained,wemanuallyresolvetheconceptto 5.1 Layout-ExtractionPerformance whichthewordvectorrefersforeachpotentiallyambiguousword byanalyzingthetermsthathavetheclosestrelationtotheword ToevaluatetheeffectivenessofUiRef’slayout-extractiontechnique, vector. For example, the closest related words for one meaning weestimateitsimprovementoverSUPOR’stechniqueforrender- discoveredfor“address”arecity,street,first,zip,postal.Therefore, ingstaticlayouts.AsdiscussedinSection3,oneimprovementthat wemarkthesemanticsforthatmeaningof“address’asapostal UiRefprovidesoverSUPORistherenderingofcustomviews.To address.Theclosestrelatedwordsforanothermeaningof“address” evaluatetheimprovementforhandingcustomviews,weestimate areport,hostusername,ip,andpswd,whichwemarkasIPaddress. a lower bound and upper bound: (1) lower bound: the number SemanticsResolutionandDisambiguation:Toresolvethese- oflayoutsandapplicationsthatincludecustomviewsthatpro- manticsofeachinputwidget,UiReffirstlemmatizesthehinttext, grammaticallyaddotherviews;and(2)upperbound:thenumber combinesmultitermsintoonelexicalunit,andremovesstopwords. oflayoutsandapplicationsthatincludecustomviews.Notethat Next,UiRefusesthespottertoscanthehint’spreprocessesedtext wedonotevaluateotherpotentiallimitationsofSUPOR’slayout tosearchforthesensitivetermsconstructedinSection4.3.1.Ifthe extraction,e.g.,theaccuracyofADT’srendering. spotterlocatesasensitivetermthatisunambiguous,UiRefmarks Ourdatasetconsistsofthe50,162applicationsdescribedearlier. theinputwidgetwiththeterm’sassociatedconcept.However,ifthe Toidentifycustomviewsthataddotherviewsprogrammatically,we spotterfindsapotentiallyambiguousterm,UiRefusesthetrained disassembletheAPKsusingApkToolandperformclass-hierarchy AdaGrammodeltodisambiguatethetermusingthesurrounding analysistoidentifyclasseswhoseinheritancehierarchycontains contextwithawindowsizeof5(i.e.,5wordsbeforeand5words theViewclassoranyotherSDKclassthatextendstheViewclass after).Todisambiguateaword,UiRefpredictstheprobabilityof (e.g.,EditText,LinearLayout).Next,weperformasignature-based thetargettermgiventhecontext,andreturnstheconceptwiththe searchonthecustomview’sunderlyingcodeformethodinvoca- highestprobability.Ifthehinttextdoesnotcontainanysensitive tionsusedtoaddviews,suchasViewGroup→addView(Viewchild). terms,UiRefrepeatsthisprocesswiththeassociatedlabel’stext. Sinceourgoalistoprovideaconservativeestimationratherthan Forexample,UiRefresolvestheinputwidgetwiththewidget anexhaustiveanalysis,wechooseasignature-basedtechniqueover identifier@+id/addressinFigure1asfollows.Sincethewidgetdoes reachabilityanalysistoreducecomputationaloverhead. notcontainembeddedtext(e.g.,hintortextattributes),UiRefbegins Results:Intotal,around25.6%ofthestaticlayouts(479,337/1,873,737) byanalyzingthetextofthelabelthatitresolvesforthiswidget(i.e., and48.7%oftheapplications(24,436/50,162)containatleastone “address”).UiRef’sspotterfindstheword“address”inthetextand customview.Further,35.1%ofapplicationsinthedatasetcontain tagsitasambiguous.UiRefdisambiguatesaddressbyextractingthe atleastonecustomviewwhoseunderlyingcodedynamicallyadds surroundingcontext(first_namelast_name),andusingthemodel views(17,597/50,162),whichimpactsaround5.5%ofstaticlayouts topredictthemeaningoftheword.UiRefpredictsthat“address” (102,671/1,873,737).Therefore,UiRefprovidesbetweena5.5%to referstothesensethatcorrespondstoapostaladdress. 25.6%improvementoverSUPORforextractinglayoutswithan averageextractiontimeof53.7millisecondsperlayout. 5 EVALUATION 5.2 Label-ResolutionPerformance Inthissection,weevaluatetheeffectivenessofUiRefwithrespect toitsmajormodules.ToevaluateUiRef’slayoutextraction,weuse ToevaluateUiRef’slabel-resolutiontechnique,wemanuallyanno- acombinationofemulatorsandrealdevices.Weuse12x86Android tatethelabelthatcorrespondstoeachinputwidget.Wecompare UiRef:AnalysisofSensitiveUserInputsinAndroidApplications WiSec’17,July18-20,2017,Boston,MA,USA ourmanualannotationswiththeresultsfrombothUiRefandSU- hasa90.2%accuracywiththisdataset,suchresultisanoverapprox- POR’slabelresolutions.NotethatweuseUiRef’slayout-extraction imationduetodatasetselection,asthedatasethasalownumber techniquewhenreportingSUPOR’slabel-resolutionresultstoal- ofinputwidgetswithambiguousterms,andthemajorityofinput lowforadirectcomparisonbetweenthealgorithms,andtoreduce widgetswithSUPOR’sincorrectlabelresolutionsareresolvedusing potentialerrorscarriedoverfromSUPOR’slayoutextraction. theembeddedtextattributes(171/192).InSection5.3.3,wemeasure Werandomlyselect12applicationsforeachofGooglePlay’s42 theprevalenceofambiguityduringsemanticsresolution. categoriesfromthe50kdataset.WerunUiRefonthe504applica- 5.3.2 PrevalenceofAmbiguity. Todemonstratetheimportance tionstoextractlayouts.NotethatSUPORhandlesonlyEditText ofdisambiguationwhenresolvingsemanticsofinputwidgets,we widgets.Therefore,weremovelayoutsthatdonotcontainatleast measuretheimpactofambiguityonsemanticsresolution.Werun oneEditTextwidgettoprovideafaircomparisontoSUPOR(al- UiRef’ssemanticsresolutiononthe50,162applicationsdescribedin thoughUiRefresolves9baseinputwidgettypes),reducingour Section5andoutputinputrequestswherethedescriptivetext(i.e., datasetto280applications.Wethenremovelayoutswhosescreen- hint,label,ortext)usedforsemanticsresolutioncontainsoneofthe shotfailsordoesnotcapturetheentirelayout,layoutsthatare 19ambiguoustermsshowninTable2.Fromthe50,162applications, duplicatesofanotherlayout,containnon-Englishtext,areafrag- there are 175,101 input requests requiring semantics resolution mentsuchasasearchbar,ordonotcontainanydescriptivetext across71,291layoutsand15,642applications. oricon.Ourfinaldatasetconsistsof349layouts(109applications) Results:Table2showsthattheresolutionof20.9%(36,600/175,101) with750inputwidgetswhere420widgetshaveassociatedlabels, ofinputrequestscontainanambiguoustermwithinthedescriptive and472aremanuallytaggedascontainingsensitivedata. textusedwhenresolvingsemantics.Further,ambiguityaffectsthe Results:Table1showsthatUiRefprovidesasignificant20.8%im- semanticsresolutionof36.1%(25,720/71,291)ofthelayoutsand provementinaccuracyoverSUPORwhenresolvinglabels.Intotal, 63.9%(10,003/15,642)ofapplications.Theprevalenceofambiguity UiRefcorrectlyresolvesthelabelsfor630/750inputwidgets(84.0%) clearlydemonstrateslimitationsofkey-phrase-basedtechniques withanaverageruntimeof16millisecondsperlayout.Themajority for resolving semantics. For example, SUPOR ignores the reso- ofUiRef’sincorrectlabelresolutionsareduetoapplicationsusing lutionoftheterm“name”,whichimpactstheresolutionof5.6% TextViewstodisplaynon-modifiabledataalongsideuser-inputwid- (9,753/175,101)ofinputfieldsbeingsemanticallyresolved.Similarly, gets,causingUiRef’spattern-basedresolutionalgorithmtochoose theterm“address”affects1.8%(3,228/175,101)ofsemanticsresolu- incorrectcandidatesets.Infuturework,weplantoresolvethis tions.Thepervasivenessofambiguitywhenresolvingsemantics problembyapplyingprogramanalysistodifferentiatebetweenla- demonstratesthenecessityofadisambiguationtechnique. belsandsuchTextViews.Therestoftheincorrectlabelresolutions areduetothemultiplelabelsperinputwidget(e.g.,measurement 5.3.3 DisambiguationPerformance. ToevaluateUiRef’sdisam- units),ormultipleinputwidgetsperlabel(tabulararrangements). biguationtechnique,weevaluateUiRef’saccuracyon19ambiguous SUPOR’serrorsoriginatefromthelackofmaximumdistancemetric termsshowninTable2.Foreachambiguousterm,werandomly alwaysresultinginaresolvedlabelforaninputwidget,andfrom select25inputrequeststhatcontaintheambiguoustermasahint, itspreferenceforlabelslocatedtotheleftoraboveinputwidgets label,ortextattributeofaninputwidgetfromthedatasetinSec- whilethecorrectlabelislocatedbelowtheinputwidget. tion5.Notethatfortermsthatdonotappearinatleast25input requests(e.g.,“cc”),weselectthemaximumnumberofsamples 5.3 Semantics-ResolutionPerformance available.Wesubstitutelayoutswithanotherrandomlyselected layoutifwecannotmanuallyresolveambiguitywhenviewingthe Wenextevaluate(1)UiRef’saccuracyofresolvingsemantics;(2) screenshot.Further,sincecertainlayoutsareduplicatedacrossap- theprevalenceofambiguityduringsemanticsresolution;and(3) plicationsandcanskewtheevaluation(e.g.,over-approximating theeffectivenessofUiRef’sdisambiguation. accuracy),wesubstitutelayoutswithanotherrandomlyselected 5.3.1 OverallPerformance. ToevaluateUiRef’ssemanticsreso- layoutifitisaduplicateofapreviouslyannotatedlayoutforthat lution,wemanuallyannotatethe349layoutsinSection5.2with term.Ourdatasetconsistsof362inputrequestsfrom296layouts asemanticslabelforeachinput.Weremoveinputwidgetsfrom (250applications).Foreachlayout,wemanuallyresolveambiguity ourresultswhoseambiguitycouldnotberesolvedfromviewing byviewingthescreenshotandXMLfileandthencompareour thescreenshotalone(5inputwidgetsintotal).Wecompareour manualannotationtoUiRef’sprediction. manualannotationswiththeresultsproducedbyUiRef’ssemantics NotethatUiRefdoesnotresolvethesemanticsofaninputwidget resolutionalgorithmforinputwidgets,andSUPOR’skey-phrase requestingnon-sensitivedata(e.g.,cementagefortheterm“age”), matchingusingourtermlist.NotethatSUPORignorestheresolu- occurring with 26 widgets in our dataset. Such result is due to tionofambiguousterms,suchas“name”,soourre-implementation word-senseinductionnotlearningasensefortheconceptdueto ofSUPORalsoignoresambiguousterms. under-representationinthedataset.However,unresolvedsemantics Results:Table1showsthatUiRefachieves95.0%accuracywhen fornon-sensitiverequestsshouldnotaffecttheresults,asUiRef’s resolvingthesemanticsofinputwidgets(4.8%increaseinaccuracy goalistoidentifysensitiverequests.Thus,weremovethese26 overSUPOR).UiRef’sincorrectresolutionsareduetoUiRefnot widgetsfromourdataset,resultingin335inputrequests. havingenoughcontexttodisambiguatetheterm(14/37),thespotter Results:Table2showsthatUiRefachievesan82.1%accuracyfor missingsensitivekeywords(13/37),insufficientparsingofthetext resolvingtheambiguityofsensitiveterms(275/335).Theresults (5/37),incorrectlabelresolutions(4/37),andincorrectlymarkinga demonstrate that UiRef’s disambiguation provides a significant non-sensitiveinputrequestassensitive(1/37).AlthoughSUPOR advantageoverkey-phrase-basedtechniques,suchasSUPOR.In WiSec’17,July18-20,2017,Boston,MA,USA BenjaminAndow,AkhilAcharya,DengfengLi,WilliamEnck,KapilSingh,andTaoXie Table2:DisambiguationEvaluationResults AmbiguousTerm PrevalenceofAmbiguityDuringSemanticsResolution(175,101widgets) DisambiguationPerformance(335widgets) (<=25widgets) ExampleConcepts Requests Layouts Apps Correct Incorrect account_number bankaccount,utilityaccount 292(0.2%) 274(3.8%) 181(1.2%) 20 3 address IPaddress,postaladdress 3,228(1.8%) 2,905(4.1%) 2,310(14.8%) 22 2 age personage,productage 334(0.2%) 297(0.4%) 198(1.3%) 21 3 cc carboncopy,creditcard 61(0.03%) 55(0.1%) 48(0.3%) 3 0 cm personheight,productheight 1,599(0.9%) 1,5692.1%) 1,553(9.9%) 4 2 day birthday,currentday 7,297(4.2%) 5,502(7.7%) 2,095(13.4%) 1 5 destination physicallocation,filelocation 165(0.1%) 158(0.2%) 141(0.9%) 16 7 first firstname,firstplace 325(0.1%) 307(0.4%) 254(1.6%) 22 3 ft personheight,productheight 1,537(0.9%) 1,530(2.1%) 1,521(9.7%) 1 2 height personheight,productheight 246(0.1%) 233(0.3%) 148(0.9%) 19 2 last lastname,lastplace 201(0.1%) 176(0.2%) 144(0.9%) 20 2 lb personweight,productweight 3,056(1.7%) 1,548(2.2%) 1,533(9.8%) 2 4 location physicallocation,filelocation 4,577(2.6%) 4,545(6.4%) 2,653(17.0%) 23 2 month birthmonth,expirationmonth 386(0.2%) 313(0.4%) 235(1.5%) 3 6 name personname,filename 9,753(5.6%) 9,497(13.3%) 7,403(47.3%) 19 5 number phonenumber,cardnumber 1,641(0.9%) 1,330(1.9%) 995(6.4%) 20 3 security_code CCVcode,verificationcode 145(0.1%) 136(0.2%) 109(0.7%) 22 2 weight personweight,productweight 285(0.2%) 273(0.4%) 181(1.2%) 19 3 year birthyear,expirationyear 1,472(0.8%) 1,394(1.9%) 1,174(7.5%) 18 4 Total 36,600/175,101(20.9%) 25,720/71,291(36.1%) 10,003/15,642(63.9%) 275/335(82.1%) 60/335(17.9%) particular,UiRefresolvesthesemanticsoftheterm“address”with requestsarerelatedtoaccountorcontactinformation(e.g.,user- 88.0%accuracy(22/25)whereSUPORignoresthetermduetoambi- names/emailaddresses,passwords),whichisduetoapplications guity.UiRef’sdisambiguationalsoshowsa91.6%accuracy(22/24) requiringaccountloginorregistration.Applicationsalsorequesta whendistinguishingbetweencredit-cardverification(CCV)codes substantialamountofpersonalinformation(e.g.,theperson’sname, andpassphrasesdenotedbytheterm“securitycode.”Inthiscase, dateofbirth,gender,age,race,maritalstatus,religion,andpolitical UiRef correctly disambiguates “security code” as CCV code (15 affiliation).Otherrequestsincludesensitivepersonalidentifiers widgets),andasapassphrase(7widgets). (e.g.,SSN-52applications,driver’slicensenumber-51applica- tions).Finally,applicationsrequestarangeoffinancialdata(e.g., 6 SECURITYANDPRIVACYANALYSIS credit card numbers), vehicular data (e.g., vehicle identification numbers—VIN),locationdata(e.g.,postaladdresses),anddevice OurprimarymotivationforcreatingUiRefwastoclassifythesecu- data (e.g., IMEI). IMEIrepresentsan interesting usecase as the rityandprivacysensitiveinputsprovidedbyusers.Inthissection, applicationsareaskingtheusertoinputtheIMEIratherthanre- weuseUiReftoperformalarge-scalestudyof50,162GooglePlay trievingtheIMEIinthebackground.Notethatforthisstudy,we applicationsfromSection5tounderstandthescopeofquestionable determinethesensitivityofthetermsbasedonownexperiences, securityandprivacypracticesWeranUiRefonthedatasetand however,UiRef’sapproachisgenericandcanbeeasilyadaptedfor triagedapplicationsusingoutlierdetectionformanualinspection. alternativetermswithdifferentsensitivities(seeSection7). Weaimtoexploretwomainresearchquestions: Finding2:Applicationsdirectlyrequestthird-partypasswordsde- RQ1:Whattypesofsecurityandprivacysensitiveinformationare featingthepurposeofOAuth-likesolutions.Knowledgeofapassword mobileapplicationsaskingfor? leadstofullcontrolofusers’accounts.AprimarygoalofOAuthis RQ2:Whatarethesecurityandprivacyimplicationsofsensitive toallowthird-partiestoaccessauser’sresourceswithouthaving inputrequests? directaccesstotheirpassword.Further,theusercanrevokeaccess Note that our analysis is not intended to be comprehensive. withoutchangingthepassword.Iftheuserentersthepassword Rather,weintendtobroadlyhighlightvariousformsofquestion- directlyintotheapplication,theapplicationcanstillrequestan ablepracticesbyapplicationdevelopers.Ourhopeisthatourinitial OAuthtoken;however,iteliminatesmuchofOAuth’sbenefits. findingswilldrivefurtherresearchinunderstandingthesecurity Wefound68applicationsdirectlyrequestingtheuser’sTwitter andprivacychallengesassociatedwithuserinputrequests.Ex- (55apps),Gmail(10apps),Facebook(2apps),andAdobe(1app) tendedresultsareavailableontheprojectwebsite[5]. credentialswithininternally-definedlayouts.MostTwitterpass- wordrequests(51/55)comefromtheWinterWellJTwitterlibrary, 6.1 SensitiveInformationRequests whichrequeststheuser’spasswordwithinastaticAndroidlayout. Theremaining4applicationsrequestingtheuser’sTwitterpass- ToanswerRQ1,weconsolidatedthesensitiveinputrequestsiden- worduseotherthird-partylibrariestorequestanOAuthtoken(e.g., tifiedbyUiRefforeveryapplication.Wegroupedthisinformation Twitter4J).Duringouranalysis,wefoundthattheJTwitterlibrary into9categoriesbasedonhigh-levelsemantics(Table3).Tofur- alsohasanoptiontorequestOAuthtokenswithinanembedded therunderstandthesensitiveinputrequestsmadebyapplications, WebView.Thisisstillabadpractice,astheapplicationthatembeds weusethefollowingmethodology:(1)Wemanuallyidentifyrele- theWebViewcanaccessalloftheWebView’sdata.Infact,Google vantsensitiveconceptsextractedbyUiRef,(2)forinterestinginput deprecated support for OAuth requests to Google in embedded requests,weviewthelayout’sscreenshot,and(3)iffurtherclarifi- WebViewsforsecurityconcerns[1].Theapplicationsrequesting cationisrequired,weanalyzetheapplication’sdisassembledcode. Gmail(10),Facebook(2),andAdobe(1)passwordsdirectlyrequest Ouranalysisresultsinseveralinterestingfindingsasfollows. thedataforloginpurposesandaresubjecttothesameproblem. Finding1:Applicationsrequestawide-rangeofsecurityandprivacy Finding3:Applicationsfrequentlyrequestsensitivefinancialinfor- sensitiveinformation.Table3showsthetypesofsensitiveinfor- mation.Wefoundthataround8%ofapplications(4,433/50,162)are mationrequestedbyapplications.Themostfrequentinformation UiRef:AnalysisofSensitiveUserInputsinAndroidApplications WiSec’17,July18-20,2017,Boston,MA,USA Table3:ConsolidatedSensitiveInformationRequests Category SensitiveInformationRequests(#Applications) Account/Contact username_or_email(11047),passwd(10251),phone_num(6307),twitter_passwd(55),wifi_passwd(17),gmail_passwd(10),social_media_url(8),wifi_ssid(7),ftp_passwd(6), smtp_passwd(5),facebook_passwd(2),nq_account_passwd(1),mint_passwd(1),gritsafe_passwd(1),exchange_passwd(1),adobe_passwd(1) Personal person_name(8057),date_of_birth(2285),gender(1237),company_name(499),person_age(178),person_weight(149),job_title(121),person_height(109),school_name(75), education_info(38),marital_status(37),demographic_info(27),native_language(8),citizenship(7),birth_place(7),marriage_date(5),religion(4),political_affiliation(3), Financial credit_card_info(4433),loan_info(1974),bank_account_info(156),salary(116),bank_info(78),house_financial_info(57) Vehicle/Driver vehicle_info(173),license_plate(66),vehicle_vin(61),insurance_policy_num(52),vehicle_registration(11),license_expiry_date(3) Device device_id(106),mac_address(27),serial_num(26),service_provider(19),device_manufac_info(6),sim_card(6), Health medication_name(70),prescription_num(27),drug_dosage(23),blood_pressure(22),blood_type(16),heart_rate(14),body_mass_index(11),blood_glucose(6),doctor_email_id(2) PersonalId# SSN(52),driver_license_num(51),id_num(20),tax_id(5),passport_num(3),student_id(1),medicaid_num(1) FamilyMember family_member_name(30),family_member_phone(9),guardian_email(3),mother_birth_place(2) Location/Travel location_info(6761),flight_num(28) requestingtheuser’screditcardinformation(e.g.,creditcardnum- datadoesnotalignwiththeapp’scategory.Forexample,inthe ber,securitycode,andexpirationdate).Exposingcreditcarddetails Communicationcategory,outliersrequesttheuser’sreligion,mari- andotherfinancialinformationtountrustedthird-partyapplica- talstatus,anddemographics.Weinspectedtheseappsandfound tionsishighrisk,asdemonstratedbythenumerousdatabreaches thatsomerequestthedatatosendtoadvertisersanddisclosethis inrecentyears[3].Further,applicationsthatarenotfullycompliant purposetousers,whileothersrequestittosearchforfriends. tothePaymentCardIndustry’sDataSecurityStandard[2](PCI InthePersonalizationcategory,outlierapplicationsaskfordata DSS)placetheuseratpotentialriskforfinanciallossandfraud. suchastheuser’sbirthday,birthplace,andbankaccountinforma- PCIDSSoutlinesstandardsformerchantsthathandlecreditcard tion.Throughmanualinspection,wefoundtheapp(com.Chinese_I_ information,suchasencryptedstorageandtransmission.Basedon Ching_Horoscope_20706)requeststheuser’sbirthdayandbirth- ourfindings,webelievethatthereisaneedfordeeperanalysisto placetoprovidehoroscopereadings,butsendthedatatoathird- identifynon-compliance,andexplorealternatepaymentsolutions, partywebsitetoprovidetheservicewithoutnotifyingtheuser.The suchasoptingforcentralizedandwell-testedbillingprocessors app(psmainapp-ui)requestingbankaccountinformation(bank (e.g.,Google’sIn-appBillingservice),toreduceassociatedrisks. nameandcredentials)providesavaultforencrypteddatastorage. Uponmanualanalysis,wefoundthattheappusesaconstantseed 6.2 IdentifyingAnomalousInputRequests forkeygeneration(’sknpyvvn’),whichputstheusers’dataatrisk. Similarly,intheGamecategory,outliersaskforawide-rangeof To explore RQ2, we used outlier analysis to triage applications personalinformation,suchastheuser’sname,salary,age,marital formanualanalysis.Weapplyanunsupervisedtechnique,called status,gender,andSSN,whichwediscussfurtherinFindings5-7. theRanking-basedOutlierAnalysisandDetection(ROAD)algo- Finding5:Applicationsdisclosesensitiveinputrequeststoadvertis- rithm[31]atthegranularityofGooglePlay’s25applicationcate- ers.Wefound6gameapplicationsrequestingtheuser’szipcode, gories(gamessubcategoriescombinedintoonecategory),togener- age,salary,gender,maritalstatus,educationinformation,ethnicity, atearankedlistbasedonthelikelihoodthatarecordisanoutlier. andpoliticalaffiliation.Onfurtheranalysis,wefoundthatthey NotethatwemodifythedistancefunctionoftheROADalgorithm weredisclosingthisinformationtotheMillennialMediaadvertising byusingaprobability-basedvalueweightingfunction,sothatrare networkfortargetedadvertisements.Theappswerealldeveloped attributeshavealargerinfluenceonthedistancefunction.Anap- bythesamedeveloper(BrettPlummer),and3ofthemarerelatively plicationisconsideredtobeanoutlierbasedonitsdistancefrom popular1,astheyhave100-500kdownloads.Theappscollectthe thelargestcluster(orsecondlargestclusterifthelargestcluster sensitivedatarequestsinlayoutsclaimingtobeusedforprofile requestsnosensitiveinput),whereclustersareformedbasedon information,whichismisleadingasitisonlyusedtodiscloseto thetypesofsensitiveinformationtheyrequest.Interestingly,only advertisers.Further,theprivacypoliciesandlayoutsdonotspecify theWeathercategoryhadsensitiveinputrequestsassociatedwith thatthesensitivedataprovidedisbeingdisclosedtoadvertisers. thelargestcluster.SectionA(Appendix)providesadetailedexpla- Finding6:Applicationsincludethird-partylibrariesthatdirectly nationonROADandhowweuseittodetectoutliersinourdataset. requestsensitiveinformation.Wefound2gameapps2thatincludea Table4(Appendix)showsanexceptoftheresultsofouroutlier third-partylibrary(i.e.,SkillzeSports)thatdisplayslayoutstore- analysis,wherethecountsrepresentthenumberofapplications. questawide-rangeofsensitiveinformation,suchasSSNs,passport Ourapproachistouseoutlieranalysisasafilteringmechanism numbers,names,addresses,creditcarddata,andphonenumbers. toidentifyapplicationsthatmeritfurtherdeeperanalysis.Wefocus Althoughwecouldnotsuccessfullyruntheappstoverifythere- ourmanualanalysisonthetop-10outlierapplications(basedon questsduetocrashing,weviewedtheSkillzeSports’swebsiteand thedistancefromthelargestcluster)foreachofthe25Google foundthatitrequeststhisdatatoallowuserstowithdrawtheir Playcategories(250appsintotal).Ourdeepermanualanalysis winnings.Themainconcerninsuchcasesisthattheusermaynot involvesreviewingthescreenshotsoftheapplication’slayoutsand understandtowhichentitytheyaredisclosinginformation.Forex- itsdisassembledcodetoexplorehowtheapplicationisusingthe ample,considerthecasewheretheusertruststhemainapplication data.Thefollowingdiscussionreportsourhigh-levelfindings. andiswillingtodisclosetheirSSNtothatapplication,butdoesnot Finding4:Applicationsaremakingquestionablerequestsasking trustSkillzeSportswiththeirinformation.Sincethelayoutsdonot fortoomuchsensitivedata.Table4(Appendix)showsthecontrast denotewho(libraryorapplication)iscollectingtheinformation, betweenexpectedandunexpectedinputrequestsbyoutlierapps. usersmaypotentiallyexposedatatothird-partiestheydonottrust. Whenthelargestclusterrequestsaconcept(e.g.,username)itisnot unexpectedfortheoutliertoalsorequestit.However,outlieranal- 1com.alaskajim.{football,bible2,rockmusic} ysisidentifiesanumberofinterestingcaseswheretherequested 2com.binarypumpkin.bingo.{easter,usa} WiSec’17,July18-20,2017,Boston,MA,USA BenjaminAndow,AkhilAcharya,DengfengLi,WilliamEnck,KapilSingh,andTaoXie Finding7:Applicationsaredisplayinguntrustedadsinthesame that leak private information, such as GPS location and device windowsassensitivedatarequests.Wefoundthesame6applica- identifiers.PiOS[10]andAndroidLeaks[14]aretheinitialstatic tionsdiscussedinFinding5displayingadsinlayoutsthatrequests analysis duals of TaintDroid for iOS and Android, respectively. sensitiveinput.Priorwork[29]showedthatuntrustedcodecan Duetothevariouschallengesassociatedwithstaticallyanalyz- traverseUIstostealprivatedatainlayouts.Forexample,anun- ingAndroidapplications,anumberofadditionalstatic-analysis trustedwidgetloadedwithinthesameviewhierarchycanobtain frameworks[12,13,15,24,25]havebeenproposed.Otherprogram therootview,andthentraversebackdownthehierarchytosearch informationthaninformationflowsisalsousedtostudyprivacyin forinputwidgetswithsensitivedata.Ourfindingdemonstratesthe mobileapplications.Hanetal.[17]compareAPIcallswithiniOS needformechanismssuchasLayerCake[29]. andAndroidapplications,andshowiOSapplicationscallsignifi- Takeaway:Thefindingsfromourstudymotivatetheneedtoana- cantlymoreprivacy-sensitiveAPIsthantheirAndroidcounterparts. lyzehowappsusesensitivedatarequestedthroughinputwidgets. Inpriorwork,theprivacysemanticsofinformationisunambiguous, Thesefindingswerepossibleduetotheavailabilityofreliableuser asitcomesfromawell-definedAPIs.Incontrast,UiRefresolves inputsemanticsfromUiRef.Althoughouranalysisisnotcompre- privacysemanticsfromambiguoususer-inputAPIs. hensive,wefoundseveralconcerningpracticesandsensitiveinput The act of sending privacy-sensitive information to the net- requestswithlimitedmanualanalysis,furthermotivatingtheneed workdoesnotconstituteaviolation.AppIntent[34]pairsscreen- forautomatedanalysistofocusontheappstriagedbyUiRef. shotswithpotentialprivacyleaksforhumanreview.Recentre- searchhassoughttolessentheneedforhumanreviewbyusing 7 DISCUSSION naturallanguageprocessing.WHYPER[26],AutoCog[27],and ThreatstoExternalValidity:Thedatasetusedinthestudymay CHABADA[16]considertheapplication’stextdescriptiontohelp notberepresentativeoftheentiremarket.Tomitigatethisthreat, infertheuser’sexpectationofsecurityandprivacyrelevantac- weuseproportionatestratifiedsamplingtoselectourdataset,using tions.AsDroid[19]checksthecoherencebetweenUItextandevent theapplicationcategoriesandpopularitytoformthestrata.Further, callbackstriggeringsensitivebehavior.UiRefcomplementsthese UiRef’soutlierdetectionassumesthatthemajorityofapplications approachesbyprovidingcomprehensivecontextualsemanticsof arewellbehavedandnotrequestinganextraneousamountofdata. theUI.Pluto[9]assessesuserdataexposurebyresolvingthese- AlthoughthisassumptionisacceptableforGooglePlayapplications, manticsofdataoriginatingfromwell-structuredfiles(e.g.,SQL, thisassumptionmaynotbetransferabletoothermarkets. XML,JSON).ClosesttoourworkareSUPOR[18]andUIPicker[22] ThreatstoInternalValidity:UiRefextractsthestaticallydefined forwhichweprovideadetailedcomparisoninSection3. layoutsfromapplications.Applicationsmayalsodynamicallygen- Otherworkhasfocusedonusingprogramanalysistoextract erateandmodifylayoutsatruntime,anddisplaycontentinWeb- thestructureandsequenceofGUIsasintermediaterepresentations. Views,whichmayresultinfalsenegatives.Reconstructingdynamic GATOR[30]extractsGUI-layouthierarchiesandtransitiongraphs layoutsusingstaticanalysisisachallengingproblem,leftasfu- fortest-inputgenerationbyperformingstaticreferenceanalysis turework.Applicationsmayalsoincludeunreachableorunused andcontrol-flowanalysis.A3E[7]constructsactivity-transition layoutsintheAPKs,suchaslayoutsbundledwithlibrariesorasar- graphstodriveautomatedapplicationexplorationbyusingstatic tifactsfromtesting.Asthischaracteristiccanleadtofalsepositives, tainttrackingtoresolveintentsthatflowtomethodinvocationsthat identifyingreachablelayoutsfromtheapplications’entrypoints launchactivities.Test-inputgenerationandautomatedexploration maymitigatethisproblem.Althoughtextuallabelsarecommonly toolswouldbenefitfromUiRefbyusingittoresolvethesemantics usedforinternationalization,iconsandimagesmayalsodenote ofinputwidgets,andusesuchsemanticstogenerateinputdata. semantics,leadingtofalsenegatives.Integratingopticalcharacter 9 CONCLUSION andimagerecognitioncanbeconductedinfuturework. Automaticallyextractingacomprehensivelexiconofsecurity WhilepriorstudiesofAndroidsecurityandprivacyhavefocused andprivacyrelatedtermsisanopenproblemandwarrantsfuture oninformationfromwell-definedAPIs,theyhavelargelyignored investigation.Althoughanincompletelexiconmayresultinfalse userinputsasasourceofsensitiveinformation.Inthispaper,we negatives,asUiRefusesittodeterminetypesofinformationcon- havepresentedUiRefforresolvingthesecurityandprivacyseman- sideredprivate,UiRef’stechniquesaregeneralandthelistcanbe ticsofdataenteredintoinputwidgets.Wehaveintroducednovel substitutedonceamorecomprehensivelexiconbecomesavailable. techniquesthatachieveanoverallaccuraciesof95.0%atsemantics Further,userscommonlyprogressthroughsequencesoflayoutsto resolutionand82.1%atdisambiguation.WehaveusedUiRefto accomplishtaskswithinapplications,suchthatpriorlayoutsmay performalarge-scalestudyof50,162apps,andidentifiedconcern- providecontextintothesemanticsofthecurrentlayout.Thelack ingpractices,includinginsecureexposureofaccountpasswords ofpriorcontextmayleadtoimprecisionwhenresolvingcertain andnon-consensualinputdisclosurestothirdparties.Ourfind- layouts.Infuturework,weplantoexploreleveragingcontextfrom ingsdemonstratethatuser-inputsemanticscouldprovideaunique priorlayoutsinworkflowstoassistinresolvingsemantics. perspectiveintoimprovingmobile-appsecurityandprivacy. 8 RELATEDWORK 10 ACKNOWLEDGMENTS Althoughtherehasbeenaconsiderableamountofworkthatfo- ThismaterialisbaseduponworksupportedbytheMarylandPro- cusesonmobilemalwareanalysis,welimitourdiscussiontoprivacy curementOfficeunderContractNo.H98230-14-C-0141.Thiswork analysisofmobileapplications.TaintDroid[11]pioneeredthearea isalsosupportedinpartbyNSFgrantsCNS-1513690,CNS-1513939, byusingdynamictaintanalysistoidentifyAndroidapplications CNS-1434582,andaGoogleFacultyResearchAward.
Description: