What is keeping my phone awake? Characterizing and Detecting No-Sleep Energy Bugs in Smartphone Apps Abhinav Pathak Abhilash Jindal PurdueUniversity PurdueUniversity [email protected] [email protected] Y. Charlie Hu Samuel P. Midkiff PurdueUniversity PurdueUniversity [email protected] [email protected] ABSTRACT Keywords Despitetheirimmensepopularityinrecentyears,smartphonesare Smartphones,Mobile,Energy,Energy-Bug,No-Sleep-Bug. and will remain severely limited by their battery life. Preserv- ing this critical resource has driven smartphone OSesto undergo 1. INTRODUCTION aparadigmshiftinpower management: bydefault everycompo- nent,includingtheCPU,staysofforinanidlestate,unlesstheapp 1.1 Motivation explicitlyinstructstheOStokeepiton! Suchapolicyencumbers app developers to explicitly juggle power control APIs exported Smartphoneshavesurpasseddesktopmachinesinsalesin2011 by the OS tokeep thecomponents on, during their activeuse by tobecomethemostprevalentcomputingplatforms[1]. Toenrich theapp and off otherwise. Theresultingpower-encumbered pro- theuserexperience,moderndaysmartphonescomewithahostof grammingunavoidablygivesrisetoanewclassofsoftwareenergy hardwareI/Ocomponents embeddedinthem. Thelistofcompo- bugsonsmartphonescalledno-sleepbugs, whicharisefrommis- nentsbroadlyfallintotwocategories:traditionalcomponentssuch handlingpowercontrolAPIsbyappsortheframeworkandresult asCPU,WiFiNIC,3Gradio,memory,screenandstoragethatare insignificantandunexpectedbatterydrainage. alsofoundindesktopandlaptopmachines,andexoticcomponents Thispapermakesthefirstadvancestowardsunderstandingand such as GPS, camera and various sensors. And they differ from automatically detecting software energy bugs on smartphones. It theirdesktop/laptopcounterpartsinthatpowerconsumedbyindi- makes the following three contributions: (1) we present the first vidualI/Ocomponentsisoftencomparableto,orhigherthan,the comprehensivestudyofrealworldno-sleepenergybugcharacter- powerconsumedbytheCPU. istics; (2) we propose the first automatic solution to detect these This, alongwiththefactthatsmartphones havelimitedbattery bugsbasedontheclassicreachingdefinitionsdataflowanalysisal- life, dictatesthatenergyhasbecome themostcriticalresourceof gorithm; (3) we provide experimental data showing that our tool smartphones. Preserving this crucial resource has driven smart- accurately detected all 14 known instances of no-sleep bugs and phoneOSestoresorttoaparadigmshiftincomponentpowerman- found30newbugsinthe86appsexamined. agement. On desktop machines, where the CPU accounts for a majority of the energy consumption, the default energy manage- mentpolicyisthattheCPUstayson(orrunsatahighfrequency) unlessanextendedperiodoflowloadhasbeenobserved.Thepol- Categories andSubject Descriptors icyisconsistentwiththehistoricalnotionthatenergymanagement D.2.5[SoftwareEngineering]:TestingandDebugging isasecondclasscitizensincemachinesarepluggedintoapower source[2]. Smartphones,insharpcontrast,makepowermanage- ment policy a first class citizen. In fact, the power management General Terms policyonsmartphones hasgonetotheotherextreme: thedefault powermanagement policyisthateverycomponent, includingthe Design,Experimentation,Measurement CPU,staysofforinanidlestate,unlesstheappexplicitlyinstructs theOStokeepiton! Inparticular,allsmartphoneOSes,e.g.,Android,IOS,andWin- dowsMobile,employanaggressivesleepingpolicywhichputthe componentsofthephonetosleep,i.e.,putsthemintoanidlestate Permission tomake digital orhardcopies ofall orpartofthis workfor personalorclassroomuseisgrantedwithoutfeeprovidedthatcopiesare immediatelyfollowingabriefperiodofuserinactivity. Intheidle notmadeordistributedforprofitorcommercialadvantageandthatcopies state, the smartphone as a whole draws near zero power, since bearthisnoticeandthefullcitationonthefirstpage.Tocopyotherwise,to nearlyallthecomponents, includingCPU,areputtosleep. Such republish,topostonserversortoredistributetolists,requirespriorspecific asleepingpolicyislargelyresponsibleforprolonged smartphone permissionand/orafee. standbytimes–smartphonescanlastdozensofhourswhenidle. MobiSys’12,June25–29,2012,LowWoodBay,LakeDistrict,UK. Theaggressivesleepingpolicy,however,severelyimpactssmart- Copyright2012ACM978-1-4503-1301-8/12/06...$10.00. 267 phoneapps,sinceanappmaybeperformingcriticaltasksbyinter- work, including popular apps (e.g., Facebook) and built-in (i.e., mittentlyinteractingwiththeexternalworldusingvarioussensors. shippedwith)appsandservices(e.g.,theAndroidemailapp).The Forexample,anappsyncingwitharemoteserveroverthenetwork bugsarecollectedbycrawlingInternetmobileforums,bugrepos- mayappear toperformnoactivitywhenwaitingfor theserver to itories,commitlogsofopensourceAndroidappsandbyrunning send itsreply, andthesystemmaybeput tosleepbytheaggres- our no-sleep bug detector developed inthispaper. Foreach bug, sivesleepingpolicy,leavingtheremoteserverwithaviewoflost wecarefullyexamineitsreportedsymptoms,correspondingsource connectivity. codeandrelatedpatches(whenavailable),anddeveloper’sdiscus- Toavoidsuchdisruptionsduetotheaggressivesleepingpolicy, sions (when available), or the analysis performed by our bug de- smartphoneOSesprovideasetofmechanismsforappdevelopers tector. Ourstudyrevealsataxonomyofthreemajorcausesofno- toexplicitlynotifytheOSoftheirintentiontocontinueusingeach sleep energy bugs, which provide useful guidelines and hints to component. Inparticular, theOSexportsexplicitpower manage- designingeffectivedetectiontechniques. Ourstudyalsoconfirms menthandlesandAPIs,typicallyintheformofpowerwakelocks thesignificantburdenpower-encumbered programmingplaceson andacquireandreleaseAPIs[3],forusebytheappdeveloper to appdevelopers. Forexample,makingasingleoutgoingorincom- specify when aparticular component needs tostay on, or awake, ingphonecallinAndroidinvolvesabout40invocationsofpower untilitisexplicitlyreleasedfromduty. control APIs in the Dialer app and the framework’s Radio Inter- Wearguesuchexplicitmanagementofsmartphonecomponents face Layer services to dynamically manage the power control of by appdevelopers haspresented totheapp developer aprofound theCPU,screen,andothersensors! paradigmshiftinsmartphoneprogrammingthatwecallaspower- (2) The first solution to automatically detect no-sleep energy encumbered programming. This new programming paradigm bugs: We make the key observation that power control APIsare placesasignificantburdenondeveloperstoexplicitlymanipulate explicitlyembeddedintheappsourcecodebytheappdevelopers, thepowercontrolAPIs(§4.1detailsonesuchexampleofthebur- andtwooutofthethreecausesforno-sleepenergybugsfromour den placed on developer due to power encumbrance). This ma- characterizationstudyarebecauseaturn-onAPIcallismissinga nipulation isrequired toensurethecorrect operation oftheapps. matching turn-off API call before the end of the program execu- Consequently,power-encumberedprogrammingunavoidablygives tion.Wethusproposeacompile-timesolutionbasedontheclassic risetoanewclassofsoftwareenergybugsonsmartphones,called reachingdefinitionsdataflowanalysisproblem[6]toautomatically no-sleepbugs. No-sleepbugsaredefinedasenergybugsresulting inferthepossibilityofano-sleepbuginagivenapp. Oursolution from mis-handling power control APIs in an app or framework,1 detectsno-sleepbugsinsingle-threaded andmulti-threadedapps, resultinginthesmartphonecomponentsstayingonforanunneces- aswellasevent-basedappswhichhavemultipleentrypoints.Like sarilylongperiodoftime. No-sleepbugsformoneimportantcat- all static analysis based tools, our detection tool can suffer false egoryofthefamilyofsmartphoneenergybugswhicharedefined positivesbuthasthetremendousadvantageofnoruntimeoverhead in[4]aserrorsinthesmartphonesystem(anapportheframework, and no falsenegatives (to the best of our abilitiesof establishing the OS,or the hardware) that cause an unexpectedly high energy thegroundtruth). consumptionbythesystemasawhole. We further present the complete implementation of our static Discussions of energy bugs on numerous Internet forums have analysis detection tool for apps written for Android. The tool is narrowedthecausestomis-handlingofpowercontrolAPIsbyapps capableofrunningdirectlyontheappinstallers(.apkfiles)and andtheframeworkonsmartphonesOSes,includingAndroid,IOS, hence source code is not required. Our implementation handles and WindowsMobile. Ourrecent survey[4] hasfound that 70% thespecificsofevent-drivenmobileprogrammingandoftheJava ofallenergyproblemsinappsandframeworksreportedbymobile languagesuchasruntimenullpointerexceptionsandobjectrefer- userswereduetono-sleepenergybugs. Theseandothertypesof ences. energybugshavecausedagreatdealofuserfrustrations. Despite (3)Detectingnewno-sleepbugsinAndroidappsandframework: theirseverity,i.e.,highbatterydrain,tothebestofourknowledge Wehaverunour no-sleepbugdetection toolon86Android apps therehasbeen nostudyof anykindof smartphone energy bugs, andtheframeworkcollectedfromtheAndroidmarket.Experimen- muchlessno-sleepenergybugs. Drawingparallelswithresearch talevaluation showsthatour toolaccurately detectedallreported ontraditionalsoftwarebugs(e.g.,concurrencybugsinconcurrent instances of no-sleep bugs, as well as 30 instances of new previ- programs[5]),acomprehensivetreatmentofenergybugsonsmart- ously unreported no-sleep bugs. These include no-sleep bugs in phoneswillrequire(1)agoodunderstandingofrealworldenergy manypopular apps, e.g.,thedefault AndroidEmailapp. Ourno- bug characteristics, learned fromcommon mistakesprogrammers sleepbugdetectionincurredfalsepositivesin13outofthe55apps make in writing smartphone apps, to lead to effective debugging itreportedtocontainabug. techniques; and (2) developing multi-faceted approaches toelim- inating energy bugs, including avoiding energy bugs during app 2. POWER ENCUMBERED development(e.g.,byprovidingbetterprogramminglanguagesup- PROGRAMMING portforpowermanagement)andcompileandruntimedetection. In this section, we first describe the energy management APIs 1.2 Our Contributions and theirsemanticsthat areexposed tothedevelopers intheAn- Thispapertakesthefirststepstowardsunderstandingandauto- droidsmartphone OS,anddiscusstheburdentheyimposeonthe maticallydetectingsoftwareenergybugsonsmartphones. Specifi- appdevelopers. WefirstdiscussprogrammingAPIsfortraditional cally,ourpapermakesthreeconcretecontributions: components (e.g., screen, CPU) and then for exotic components (1) The first characterization study of no-sleep energy bugs in (e.g.,GPS,Camera). Wealsodiscusstheissuesarisingfromevent smartphone apps: We present the first comprehensive real world basedprogrammingmodelofsmartphoneapps.Wethenintroduce no-sleepenergybugcharacterizationstudy. Ourstudyisbasedon theprominentclassofenergybugsstudiedinthispaper: no-sleep no-sleep energy bugs in real world apps and the Android frame- bugs. 1Inthispaper,weusethetermframeworktorefertoboththeser- vicesinandtheappsthatarebundledwith,theAndroidframework. 268 Table1:SummaryofpoweroperationsexportedbyAndroidAPIs. Componentlock/managername Component(s) BatteryDrain Comments (APItostart/stop) upto(%/hr) TraditionalComponents PARTIAL_WAKE_LOCK(acquire/release) CPU 5% CPUrunsdespiteanytimers SCREEN_DIM_WAKE_LOCK(acquire/release) CPUandScreen(DIM) 12% Noilluminationifshutdown,else SCREEN_BRIGHT_WAKE_LOCK(acquire/release) CPUandScreen(bright) 25% illuminatestilllockrelease(Flag FULL_WAKE_LOCK(acquire/release) CPU, Screen (bright) and Key- 25% ACQUIRE_CAUSES_WAKEUP boardbacklight forcesilluminationinallcases) ExoticComponents PROXIMITY_SCREEN_OFF_WAKE_LOCK(acquire/release) Screen,ProximitySensor 25% Screenshutsifsensoractivates LocationManager(requestLocationUpdate/removeUpdates) GPS 15% Tracksuserlocation SensorManager(registerListener/unregisterListener) Accelerometer,Gyro,Proximity Sensormanagerclasscontrols MagneticField,etc.[8] 10% varioussensorsonphone MediaRecorder(start/stop) Mic/Camera(forvideo) 20% Usuallystoresmediaonsdcard Camera(startPreview/stopPreview) Camera(forstillpictures) 20% Oneappatatimeregisterscamera thecorrespondingcomponent(theACQUIRE_CAUSES_WAKEUPflag Listing1:Anexamplepowerwakelockusage. wakesupthescreen),orkeepsthecomponentawakeifitisalready 1 PowerManager.WakeLock wl = pm.newWakeLock(PowerManager. so. Inotherwords,anacquire( )calledonanalreadyacquired PARTIAL_WAKE_LOCK); wakelock istreatedasa nop. Similarly, arelease()called on 2 wl.acquire(); //CPU should not go to sleep 3 net_sync(); //Perform critical task here anacquiredwakelocksetsthecomponentfreetosleepasfarasthe 4 wl.release(); //CPU is free to sleep perspectiveofthiswakelockisconcerned,irrespectiveofthenum- ber of times an acquire() has been called on the lock. In this sense, anon-reference counted wakelock islikeacondition vari- 2.1 Managing Traditional Components able. TheAndroidframeworkexportswakelockfunctionalitythrough In contrast, reference counted wakelocks are like semaphores. PowerManager.Wakelock 2 class, with4 different options and Eachacquire()ofawakelockincrementstheinternalcounteras- associatedAPIsformanagingseveraltraditionalcomponents:CPU, sociatedwiththe(instanceof)wakelock,andarelease()decre- screen, and the keyboard backlight. A wakelock is an instance mentstheinternalcounter. Itonlyletsthecomponentsleep(from (object in Java) of the wakelock class, instantiated using one of theperspectiveofthiswakelock)iftheinternalcountervaluereaches 4 options, and each option has adifferent effect on the hardware 0.Areleaseisanop3inothercases. component, assummarizedinTable1. Forexample,optionFULL Tomakemattersevenmorecomplicated, anacquirecanalso _WAKE_LOCKinstantiatesalockthatwhenacquiredbothkeepsthe becalledwithatimer[10],whichinstructsthesystemtoreleaseit CPU and screen on at full brightness and turns on the keyboard automaticallyoncethetimeoutintervalexpires. backlight. Third,theabovepowercontrolsemanticsarefromtheperspec- Listing1illustratesabasicwakelockusage: howtoensurethat tive of one wakelock. Unlike traditional mutual exclusion locks, the CPU does not sleep during some critical phase. The app de- differentwakelocks(eveninstantiatedwithdifferentoptions)onthe claresawakelock (pmisaninstanceofPowerManager) andthen same component can be held by multipleentities(e.g., processes acquires it, which instructs the OS not to put the CPU to sleep, andthreads) inthesystematthesametime. Evenasingleentity irrespectiveofuseractivity,sinceitintendstoperformsomecriti- may hold multiple (instances of) wakelocks on the same compo- caltask. Oncethecriticaltask(inthiscasearemotenetworksync nent. Thepowercontroleffectonacomponentmusttakeintoac- net_sync())iscompletedtheappreleasesthewakelock, indi- count the state of all wakelocks. The component is switched on catingtotheOSthatCPUcannowsleepaccordingtoitssleeping whenthefirstwakelockisheld.Onlywhenallthewakelocksfrom policies. alltheentitiesforthecomponentarereleased,takingintoaccount Semantics: The above simple usage of a wakelock is just like a thereferencecounting semanticsforeach, canthecomponent go conventionalmutualexclusionlock,i.e.,anappexplicitlyacquires tosleep,subjecttohigherlevelsleepingpolicies(administeredby andreleasesittoinstructtheOStoswitchthecomponent onand frameworkprocesses),e.g.,sleepafter5secondsofuserinactivity. off,respectively. Likeanobjectawakelockcanbesharedamong Thisdemonstratesthenewprogrammingburdeninflictedonapp several threads of a process. The semantics of wakelocks, how- developers: powermanagementisnolongerjustatransparentOS ever, arequitedifferentfromthoseofconventional mutualexclu- ordrivertask;thedevelopersnowneedtoperformexplicitpower sionlocks. managementintheapplayer. First,asshowninTable1,asinglewakelock(instantiatedwith one of the four options) controls one or more components. Sec- 2.2 Managing ExoticComponents ond,thepowercontroleffectofasinglewakelockdependsonthe Inadditiontotraditionalcomponents,modernsmartphonescome configurationofthatwakelock:awakelockcanbeconfiguredtobe with several “exotic” components embedded in them. These in- referencecounted[9]. Inanalmostpeculiarsense, itismorelike cludeGPS,camera,severalsensorssuchasanaccelerometer,prox- aconditionvariablewhenconfigurednottobereferencecounted, imity sensor, and gyroscope. Some of these components are the andasemaphorewhenconfiguredtobereferencecounted. biggest energyconsumers insmartphones, e.g., GPSandcamera, We first consider non-reference counted wakelocks. An anddrainthebatteryatahighrate[11,12,13]. acquire()onareleasedorfreshinstanceofwakelockwakesup 3NewerAndroidAPIsthrowanexceptionwhenrelease()iscalled 2Apps need android.permission.WAKE_LOCK permission from onanunacquiredwakelock.Hence,areleaseisusuallycalledafter userstousethisclass. testingifthelockiscurrentlyheld(usingAPIwakelock.isHeld()). 269 Unlikesomeofthetraditionalcomponents, e.g.,WiFiNIC,the outgoingorincomingcallinAndroidresultedin30-40distinctin- newexoticcomponentsareusedinanexpliciton-offfashion. For stancesofwakelockacquiresandreleases! example, the GPS is explicitly turned on, using the OS exported API, to acquire the smartphone location and in this state it con- 2.4 No-Sleep Bugs sumesbatteryatahighrate. Oncethelocationisdetermined, the Thenewburdenofexplicitcomponentpowermanipulationfrom componentisexplicitlyturnedoff,triggeringthecomponenttore- power-encumbered programming, combined with the complexity turntoalowpowerstate. ofhandling eventsintheappbehavior, caneasilyoverwhelmde- Likewakelocks,theexplicitpowermanagementofexoticcom- velopers and lead to programming mistakes in manipulating the ponentsplacesasignificantprogrammingburdenonappdevelop- powercontrolAPIs.IncorrectorinefficientusageofsuchAPIscan ers. An incorrect or inefficient use of these APIscan easilylead leadtoanunexpecteddrainofthephonebattery,knownasno-sleep topoorutilizationofthesecomponents,wastingsignificantbattery bugs[4]. energy(see§2.4). A“no-sleepbug”isaconditionwhereatleastonecomponentof Semantics:Table1liststheAPIsexportedbyAndroidforaccess- thephoneiswokenupandisnotputtosleepduetoamistakein ing the exotic components. Their semantics of power control of manipulating power control APIsinanapp. Thecomponent that thesecomponentsissimilartotheplainwakelocksin§2.1,i.e.,no iswoken upcontinuestodrainthebatteryforaprolonged period referencecountingortimer-basedrelease. oftime,resultinginsevereandunexpectedbatterydrain.Typically Insummary,thedevelopersareburdenedwithexplicitlymanip- thebatterydraincontinuesuntiltheappisforcefullykilled4orthe ulating power control APIsfor both traditional and exotic smart- systemisrebooted. phone components to ensure the correct operations of the apps. No-sleepbugsformoneofthemostimportantcategoriesofsoft- We call this new smartphone programming paradigm as power- wareenergybugsinsmartphoneapps[4]. Unlikeregularsoftware encumberedprogramming. bugsinapps,energybugsdonotleadtoanappcrashorOSblue screenofdeath[4]. Anapphitbyanenergybugcontinuestopro- 2.3 Issues from Event-based Programming videtheintendedfunctionality,withasingledifference: thephone The complexity of power-encumbered programming is exacer- suffersasevere,unexpectedbatterydrain. Theseverityoftheen- bated by the event-based nature of smartphone apps. Compared ergy drain due to the bug depends on the component that is not toprogrammingindesktop/server environments, smartphonepro- put tosleep. Asshown inTable1, for eachof the3components grammingisevent-orientedbecauseoftheinheritinteractivenature (GPS,Screenwithfullorlowbrightnessandcamera),theimpact ofano-sleepbugcanbeseverewiththebatterydrainingatarate ofphoneapps. Atypicaluser-facingsmartphoneappiswrittenas asetofeventhandlerswitheventsbeinguserorexternalactivities. of10-25%everyhour5withoutanyuserinteraction. Thedeveloperneedstokeeptrackofeachpossibleeventandwhen Forothercomponents, e.g., theCPUandproximitysensor, the itmaybetriggered,andmanipulatethewakelocksaccordingly. batterydrainsatarelativelylowrate-upto5%everyhour. When Weillustratehowthelevelofcomplexityintroducedbypower- a CPU wakelock (PARTIAL_WAKE_LOCK)is held, it prevents the encumberedprogrammingisexacerbatedbyevent-basedprogram- CPUfrom‘freezing’,astatewhereitwouldconsumezeropower ming through a concrete example from the Dialer app [14] that (IDLE state). In a wakelock held state, the CPU draws minimal comeswiththeAndroidframework. power(dependingontheCPUspecificationsofthehandset).How- The Dialer app implements the dialing functionality of the ever,astheCPUremainson,otheractivitiescontinuetorun,e.g., phone. The app is triggered when the user receives an incoming WiFi NIC chatters, background periodic OS processes, hardware call or when the user clicks the phone icon to make an outgo- interruptshandlingbyOS,etc. Theseactivitiestogetherconsume, ing call. To implement itsfunctionality, the app explicitly main- asmeasuredonGoogleNexusOne,about5%ofthebatteryevery tainsthreewakelocks:FULL_WAKE_LOCKforkeepingthescreenon hour. Anyadditionaluseractivityisnotaccountedforinthis. As (e.g.,insituationslikewhentheuserisdialingthenumberstocall), a result, over a long period of time, say 12 hours, an only-CPU PARTIAL_WAKE_LOCKforkeepingtheCPUon(e.g.,incaseofan wakelock bugcandrainabout 50-60% ofthebatterywithout any incoming call when the phone is switched off), and PROXIMITY userinteractionorperformingnecessaryactivities. _SCREEN_OFF_WAKE_LOCK which switches the proximity sensor onandoff(todetectuser’sproximitytothephone). 3. METHODOLOGY To manage the three wakelocks, the app explicitly maintains a Tocharacterizetherootcauseofno-sleepbugsobservedincur- state machine where the states represent the lock behavior, i.e., rent mobileapps, wecollectedno-sleepbugsinsmartphone apps which lockneeds tobeacquired and whichneeds tobereleased, infourways. (a) Mobileforums: Wecrawled4popular mobile and the “condition” of the phone represents the state transitions. Internetforums(thesameasin[4]): onegeneral forumwithdis- The conditions are diverse and include events such as (a) if the cussionscoveringallmobiledevicesandOSes,andthreeOS/com- phonegetsacall,(b)ifthephoneispressedagainsttheuser’sear panyspecificmobileforums.Intotalwecrawled1.2Mposts,from in which case the proximity sensor triggers the screen to go off, whichwefilteredoutpostsrelatedtono-sleepbugsinsmartphone (c)ifthecallends, (d)ifawiredorbluetoothheadset isplugged apps. Foreachappreportedbytheusertocontainano-sleepbug, in(e.g.,inthemiddleofacall),(e)ifthephonespeakeristurned wedownloadedthebinaryinstallersoftheversionoftheappthat on,(f)ifthephonesliderisopenedinbetweencalls,and(g)ifthe wasreportedtocontainthebugandthefirstversionwhichhadthe userclickedhomebuttoninthemiddleofacall. Foreachofthese problemsolved.Wethendecompiledtheappfrombinaryinstallers triggering events, the phone changes the state of wakelock state machine,acquiringoneandreleasinganother. 4InAndroid,apps(esp. backgroundservices)arenotkilled; they Inadditiontowakelocks intheDialerapp, theRadioInterface runinbackground onceauser stopsinteractingwiththem. They Layer(RIL)inAndroidmaintainsadditional5wakelockstohan- areusuallykilledbythesystemonlyincaseofmemorypressure. dleincomingandoutgoingcalls. Usingexplicitcomponentaccess 5TheserateswerecalculatedusingHTCnexusandmagichandsets tracingthroughaninstrumentedAndroidframeworkrunningona runningAndroid.Afullychargedbatteryholdsbetween1100mAH GoogleNexusOnehandset,weobservedthatperformingasingle to1500mAHofcharge. 270 to Java source code using ded6 [16]. For apps that were suc- Listing2:No-sleepbug:differentcodepaths. cessfully decompiled (e.g., FaceBook), westudied the root cause of no-sleep bugs. (b) Bug lists: We crawled mobile bug repos- 1 try{ 2 wl.acquire(); //CPU should not sleep itories of open source mobile frameworks like Android [17] and 3 net_sync(); //Throws Exception(s) Maemo[18]. Weextractedbugsreportedwithno-sleepconditions 4 wl.release(); //CPU is free to sleep and extracted the source code (open source) of the versions that 5 } catch(Exception e) { actuallycontainedthebugs(e.g.,no-sleepbuginAndroidSIPSer- 6 System.out.println(e); //Print the error vice[19])anditspatch(ifavailable).(c)Opensourcecoderepos- 7 } finally { 8 } //End try-catch block itories: Wescrapedthecommitlogsofopen-sourceAndroidapps hostedononlinecoderepositorieslikegithub[20]. Weextracted thecommitlogsofno-sleepbugfixesanddownloadedtheversions thatrepresentsatypicaltemplateofano-sleepbugwhereanapp bothbeforeandafterthefix. (d)Runningourno-sleepbugde- takes adifferent, somewhat unanticipated code pathafter waking tectiontool:Finally,weranoursolutionofautomaticallydetecting upacomponentandthereforedoesnotputitbacktosleep. Asin no-sleepbugsdevelopedinthispaperon86Androidappsandthe Listing1,thecriticaltaskintheapp,net_sync(),isprotected stockframeworkanddiscovered42appswithno-sleepcode-paths byacquiringandreleasingtheCPUwakelockinstructingtheCPU asdetailedin§6,§7and§9(labeledwith“*”inTable2). These not to sleep during the remote syncing phase. However, routine appsareusedinthecharacterizationstudyin§4. net_sync()maythrowexceptions[41],aJavalanguagemech- anismfornotifyingappsofsomefailureconditions,suchasacon- 4. CHARACTERIZINGNO-SLEEP BUGS nect to a remote end host failed, a string could not be parsed to Usingthebug-collectionmethodologydescribedabove,wenow integer, or a specific file to be opened does not exist. A thrown presentacase-studyofno-sleepbugsobservedinsmartphoneapps. exception is explicitly caught by the try’s catch block which We characterize the root cause and impact of the bugs. Table 2 simplyprintstheexceptionfordebuggingpurposes. Nowtheno- gives asummaryof thethreegeneral categoriesof no-sleep bugs sleepbugcanmanifestitselfinthefollowingcodepath. Firstthe wehaveidentifiedandtheirimpact. Draintimeshowstheamount tryblockexecutesandacquiresthewakelock. Nextacallisis- of time it will take to drain a fully charged battery under typical suedtothecriticaltask, net_sync(). Ifanexception israised usage.7 Withoutthebugs,ittakesabout15hourstodrainafully insidenet_sync(),thecontroldirectlyjumpstothecatchblock, charged battery. The bug references in Table 2 refer to both the thedebugoutputisprinted,andthecodeexitsthetrycatchblock. bug fix commit logs and user complaints about specific apps on Consequently, the code-path followed does not release the wake- Androidbugrepositories,allofwhichindicatetherealimpactand lock, keeping the CPU on indefinitely. To fix this problem, the user frustration caused by the no-sleep bugs. The first two cate- wakelock should be released in the finally block so that it is gories,No-SleepCodePathsandNo-SleepRaceCondition,exhibit alwaysexecuted. typicalsymptomsofno-sleepbugswhereacomponentisnotputto Alargenumberofno-sleepbugsarecausedbythissecondrea- sleepatall,whereasthethirdcategory,no-sleepdilation,represents son. Theseincludepopular appssuchasFaceBook[22],Agenda thescenariowhereacomponentwasheldonmuchlongerthanthe widget [21] (another version), MyTrack [25] (no sleep of GPS), programmer’sintention(ontheorderofhours). Belowwepresent BabbleSink [26], CommonsWare and apps [27], as well as ser- anin-depthanalysisofthesethreecategoriesofno-sleepbugs. vicesthatcomewiththeAndroidframework,suchasAndroidTele- phony[28],AndroidExchange[29,30],andWifiService[31].For 4.1 No-Sleep Code Path example,inSIPservice[19],thewakelockwasnotreleasedsince Therootcauseformostoftheobservedno-sleepbugsinasin- theobjectscontainingthewakelocksweredeletedandsothelock gle threaded activity was the existence of a code path in the app handlersweredeletedalongwiththem. that wakes up a component, e.g., by acquiring the wakelock for The third cause for a no-sleep code path is that a higher level thecomponent,butdoesnotputthecomponentbacktosleep,e.g., condition(likeanappleveldeadlock)preventedtheexecutionfrom thereisnoreleaseofthelock.Thiscategorycapturesamajorityof reachingthepointwherethewakelockwastobereleased. Thisis theno-sleepbugswehaveobservedinourbugcollection. likelytohappeninsmartphoneprogrammingbecauseevent-based Weobservedthreecausesfortheexistenceofacodepathwhere programmingofsmartphonescanleadtomanypossiblecodebranches thecomponentwasswitchedonbutnotputtosleep. (asintheexample of the Android Emailapp) that makes itdiffi- Thefirstcauseisthat theprogrammer simplyforgot torelease cult for the programmer to anticipate all the possible code paths thewakelockthroughoutthecode,ortheprogrammerreleasedthe andkeeptrackofthewakelockstate. LocationListener[33, lockintheifbranchofaconditionbutnotintheelsebranch. 34]intheAndroidframeworkcontainedsuchano-sleepbug. The Althoughitseemslikeasimplemistake, thisdoeshappeninreal developer did release the wakelock, however, a higher level app apps. Forexample,aversionoftheAgendawidget[21]contained deadlockpreventedthecodefromenteringthereleasephaseofthe suchano-sleepbug. app. The second cause isthat the programmer did put code that re- Finally, themostcommonpatternofno-sleepcode-pathbugis leasesthecomponent wakelockonmanycodepaths,butthecode aresultofthefactthatdevelopersdonotproperlyunderstandthe tookanunanticipatedcodepathduringexecutionalongwhichthe life-cycleofAndroidprocesses. InAndroid, anappactivityonce component was not put to sleep. Listing 2 shows a code-snippet startedisalwaysalive.Whentheuserexitsanyapp,Androidsaves thestateoftheappandpassesitbacktotheappiftheuserreturns 6Not all app binaries could be transformed into (meaningful) toit.Theappisonlycompletelykilledwhenthephoneiscritically source code, especially the ones that have been obfuscated dur- lowonRAMorwhentheappkillsitself.Thismethodologyisused ingcompilation(usingtoolslikeproguard[15]),e.g.,theNYTimes toreducethestartuptimeoftheappandtomaintainitsstate. Androidapp. 7Typicalusageassumesthatthephoneisusedactivelybyauserfor Thisessentiallymeansthattheappmaynotactuallybedestroyed 20%ofthetotaltime. Draintimesarecalculatedusingtheenergy for very longperiods of time. But many appdevelopers only re- drainrateinthebugstatemeasuredontheHTCNexusphone. leasethewakelockintheonDestroy()call-back,insteadofin 271 Table2:No-sleepbugcasestudy.Entrieswith(F)representbugsintheAndroidframework,andwith(*)representnewbugsfoundbyourtechnique. App Description BugDescription Drain Ref. No-SleepCodePaths AgendaWidget Popular Android widget Twobugswerereportedindifferenceversions(a)notallbranchesreleasewakelocksin 9hrs [21] managingnews/calendar AlarmService;(b)programmerforgottocallwakelockreleaseafteracquiringit. FaceBook The default FaceBook facebook.katana.HomeActivity, the central Activity, acquires wakelock to 9hrs [22] Appv 1.3.0 runFaceBookService.Notallpossiblebranchesintheservicereleasewakelocks. k9mail Oneofthemostpopular Per-threadwakelockmaintained. WakelockacquiredwhenIMAPDONEwassent,but 9hrs [23] emailclientforAndroid wasnotreleasedinMessagingControllerPushReceiverduringIDLEstat CheckinMaps Visualstoriesonmaps GPSremainson,evenwhenuserclosestheapp(callingtheonPause()handler). 5hrs [24] (*)MyTrack TrackUserPathonline GPSremainson,evenafterusernavigatesawayfromtheappdrainingbattery. 5hrs [25] BabbleSink Findphone’slocation ANullPointerExceptioncausesthethreadtoexitwithoutreleasingwakelock. 9hrs [26] CommonsWare AndroidTrainingBook Wakelockreleasedwithoutfinalize. 9hrs [27] SipService(F) Std. voice protocol im- ASiphandler(object)wasdeletedwhichhadwakelocksacquiredbeforereleasingthe 9hrs [19] plementationinAndroid wakelock.Thedeletedhandlerscannotperformrelease()toreleasethewakelocks. Telephony(F) TelephonyHandler: RIL Androidtelephonydoesnotreleasethepartialwakelockrightawayifthereisanerrorin 9hrs [28] serviceinAndroidcode sendingtheRILrequest,preventingthephonetogoinpowercollapsedrainingbattery. (*) Android Ex- Thedefaultemailappin During background syncing of mailboxes in an exchange account, the app acquires 9hrs [29] change(F) Androidframework wakelockanddoesnotreleaseinallfailureconditions,specificallyinIOExceptions. [30] WifiService(F) AndroidWiFiHandlers CPUdoesnotgotosleepduringamessageremovalandwakelockwasheldforever. 9hrs [31] PowerManager(F) PowerServiceAndroid TwoinstancesofwakelocksarenotreleasedinPowerManagerServiceinAndroid. 9hrs [32] LocationListener GPS handling library in AdeadlockinLocationManagerServiceforreleasingwakelocksafterclientno- 9hrs [33] (F) Androidframework tificationshavebeenreceivedpreventedthereleaseofthewakelocksdrainingbattery. [34] No-SleepRaceCondition (*)AndroidEmail Default Android email Raceconditionbetweenemailsynchronizingthreadandthemainthreadwhichkillsthe 9hrs [35] app(F) appperformingsync synchronizingthreadresultedinasharedwakelocktoremaininacquiredstateafterexit. [36] No-SleepDilation MyTrack TrackUserPathonline Wakelockacquiredandreleasedmuchbeforeandaftertherequiredfunctionalityinapp. 9hrs [25] GoogleBackup(F) Cloudbackup[37] Wakelockreportedtobeheldforalongdurationoftime(uptoanhour)inpoornetwork. 11hrs [38] GPSDriver(F) AndroidGPShandler WakelocksarebeingheldforlongerthanneededinlowlevelGPSdrivercode. 15hrs [39] GoogleMaps AndroidGooglemaps Appwasreportedtoholdwakelockforseveralhoursevenwhenitwasnotused. 10hrs [40] droid framework’s class CdmaConnection. This class uses a Listing3:WakelockComplexity PARTIAL_WAKE_LOCK for managing the connection and re- 1 @Override protected void finalize(){ leases the wakelock when the connection is disconnected. How- 2 /** ever, there are many different possible program paths arising 3 * It is understood that This finializer is not 4 * guaranteed to be called and the release lock from different patterns of user interactions, hardware states de- 5 * call is here just in case there is some path pendentontheexternalenvironment,etc. Thedeveloperincluded 6 * that doesn’t call onDisconnect and or releaseWakeLockinfinalizeasanadditionalsafetymea- 7 * onConnectedInOrOut. sure,eventhoughfinalizeisnotguaranteedtobecalled. This 8 */ 9 if (mPartialWakeLock.isHeld()) { exampleshowstheneedforanautomatedtoolthatcanaiddevel- 10 Log.e(LOG_TAG, "[CdmaConn] UNEXPECTED; mPartialWakeLock opersincheckingallpossibleprogrampathsforno-sleepbugs. is held when finalizing."); 11 } 12 releaseWakeLock(); 13 } 4.2 No-Sleep RaceCondition Thesecondcategoryofno-sleepbugsweobservedwascauseby raceconditionsinmulti-threadedapps. Specifically,weobserved onPause(). onDestroy()isonlycalledwhentheappcom- thatthepowermanagementofaparticularcomponentwascarried ponentisabouttobedestroyed. Asaresult,onceanappwiththis out(i.e.,switchedonandoff)bydifferentthreadsintheapp.Inthe bugisstarted,thephonewillonlysleepwhenitisrunningcritically common case, one thread switches the component on, and some lowonmemory(whichmaytakealongperiodoftime). timelateranotherthreadswitchesthecomponent off, resultingin Manually tracking all possible code paths for wakelock ac- thenormalbehaviorofcomponentutilization.However,inacorner quire/release appears to be a daunting task for app developers. case condition, it canhappen that the threadthat switches onthe Listing3showsanexampleofthecomplexityinvolvedinpower- componentgetsscheduledtorunafterthethreadthatswitchesthe encumbered programming. This is a code-snippet from the An- componentoff,resultinginano-sleepbugwiththecomponentleft 272 foranunexpectedlengthoftimebeforeitreturns,thethebatteryis Listing4:No-sleepbug:racecondition. drainedforthatprolongedperiodoftime. 1 public void Main_Thread(){ Whileitisarguablethatkeepingthesystemonduringtheexecu- 2 mKill = false; //Unset kill flag tionofacriticaltask,nomatterhowlongittakes,wasindeedthe 3 wl.acquire(); //CPU should not sleep 4 start(worker_thread); //Start worker intentionbytheappdeveloper, wecharacterizesuchsituationsas 5 //....Do Something thethirdcategoryofno-sleepenergybugs,no-sleepdilation.These 6 mKill = true; //Set kill flag areconsideredno-sleepenergybugsforthefollowingreasons: (a) 7 stop(worker_thread); //Signal worker suchinstancesofprolongedcomponentwakeupareusuallyunex- 8 wl.release(); //CPU can sleep now 9 } //End Main_Thread(); pected,evenbytheappdeveloper,aswefoundbyreadingthelog 10 public void Worker_Thread(){ ofcodecommits;(b)themobileprogrammingAPIdocumentation 11 while(true){ strictlywarndevelopersnottokeepthecomponentsawakeforpro- 12 if(mKill) break; //Break if flagged longed periods of time unless it is actually required, e.g., in the 13 net_sync(); //Critical task Skypeapp,whereauserperformingavideocallrequiresthecom- 14 wl.release(); //Rel. wl before sleep 15 sleep(180000);//Sleep for 3 minutes ponents(screen,CPU)tobeswitchedonfromthestarttilltheend 16 wl.acquire(); //CPU should not sleep ofthecallirrespectiveofhowlongthecallpersists;(c)instancesof 17 } //End while loop; no-sleepbugsinthiscategorywereobservedtocauseseverefrus- 18 } //End Worker_Thread(); tration among smartphone users since the energy drain was both severe and unexpected; and (d) the root cause of such prolonged on. Effectivelythereisaraceconditionbetweenthemanipulation completion timeof critical taskswas usually because of ahigher ofthewakelockbythetwothreads. level bug (i.e., programming mistake) in the code, which signifi- Listing4showsacodesnippetofano-sleepbugcausedbyarace cantlyinflatedtherunningtimeofcriticaltask. condition. Main_Threadrunsfirst,acquiringawakelock(waking Wefoundtwocausesforno-sleepdilationinsmartphoneapps: up a component, e.g., the CPU), and then fires Worker_Thread app delay and app optimizations. We first discuss the dilation whichperiodicallyexecutesacriticaltask,e.g.,syncingstockup- causedbyappdelayintheGPSdriver[39]inAndroid. Thedriver dates.AftereverysynchWorker_Threadgivesupthelock,sleeps8 heldwakelocksforlongerthanneeded.Insomecircumstances,af- for3minutes(allowingtheCPUtosleep),andre-acquiresthelock ter holding thewakelock, thedriver issuedawait, waitingfor an after waking up. This process is repeated in an infinite loop un- event. However, after being signaled, a second wait was issued tilWorker_ThreadisnotifiedbyMain_ThreadusingthemKill causing another wait untilthedriver wassignaledagain. Allthis flagtobreakoutoftheloop. Toinitiatetheterminationoftheapp, was done while holding a wakelock. As a result, a higher level Main_ThreadsetsthemKillflag,andcallstheAPIstoptosig- buginhandlingsignalsextendedthetimeperiodthewakelockwas nalWorker_ThreadtoinitiatethehaltwhichwakesMain_Thread beingheld. upifitisinsleepstate. Main_Threadreleasesthewakelockafter Anothercauseofno-sleepdilationobservedintheappswestud- callingstop. iedresultsfrompoorplacementofcomponentwakeupcodeinthe In the normal scheme of things, the code in Listing 4 exe- app. For example, consider the code in Listing 1. The dilation cutes without any energy bug. However, consider the follow- could happen if the app developer, instead of just protecting the ing sequence of events. Main_Threadsetsthe mKillflag, sig- critical part net_sync(), wrapped a large piece of code in wake- nals Worker_Thread to stop, and releases the wakelock. Then locks. We observed such a bug in the MyTrack app [25] where Worker_Thread wakes up, acquires the lock and exits the loop thedeveloperacquiredtheCPUwakelockthemomenttheappwas becauseofthemKillflag.Asaresult,thewakelockremainsheld turned onand releasedit whentheapp completed. However, the bytheappandisneverreleased. Akeypointtonotehereisthat criticalpartofthecodewasonlytheperiodwheretheuserclicked the semantics of the stop() API called by Main_Thread does thetrackbuttonforlocationtracking. not guarantee that the return from the call will be synchronized, i.e.,onlyafterWorker_Threadexits.Hadthatbeenthecase,there 5. DEBUGGINGNO-SLEEP BUGS would havebeen noracecondition andhencenono-sleep bugin Two general approaches exist to understanding program be- theapp. havior: those done at compile-time and those done at run-time. Tracingno-sleepbugsinappsourcecodecausedbyracecondi- Compile-time approaches incur no run-time overhead. While a tionsisparticularlyhardsinceitrequiresenumeratingallthepos- run-timeapproachcangather perfect, or nearperfect information sibleexecutionorderingsofthethreads. However,usingourauto- about a given run, a compile-time approach will (conservatively) matic techniques for detecting no-sleep bugs presented in §7, we determinefactsthatmaybetrueonanyrun. Becauseoftherun- wereabletodetectaninstanceofano-sleepbugcausedbyarace timeoverhead, compile-time approaches arepreferred when they conditionintheAndroidEmailApp[35,36,42],whichhadasim- are sufficientlyaccurate, asisthe case withour problem. In this ilarpatternasshowninListing4. paper,wepresentastatic,compile-timesolutionfordetectingno- 4.3 No-Sleep Dilation sleepenergybugsinsmartphoneapps. Our solution treats the acquire and release of a wakelock l as This category of no-sleep bugs differs from the first two cate- adefinitionof (assignment to)thevariable vl corresponding tol. gories ina singleaspect: the component woken up bythe app is A definition d of a variable vl issaid to reach some point p ina ultimately put to sleep by the app, but only after a substantially program if there exists a path from d to p that does not redefine longer period of time than expected or necessary. For example, vl. Therefore, if a definition of vl corresponding to acquiring a considerthecodeinListing1.Supposeroutinenet_sync()usu- wakelock reaches the end of some code region thereexists ano- allyfinishesinafewseconds, butduringaparticularrunithangs sleepcodepathintheregion. Thus,detectingno-sleepcodepaths 8WeassumethatthesleepAPIcallusedinListing4registersa correspondsexactlytothereachingdefinitions(RD)dataflowprob- wakeuptimerwiththeCPU,whichincasethephonefreezesduring lem[6],whichcanbesolvedbyastandardcompile-timedataflow theappsleep,wakesuptheCPUattheendofthetimeout. analysis. 273 Wefirst present, in§6, our solution when only a single thread 6.1.1 TheReachingDefinitionsDataflowProblem isbeinganalyzed,andthen,in§7,showhowtoapplytheproblem ThefirsttaskinapplyingtheseconceptstotheRDproblemisto tomulti-threadedsmartphoneappstodetectno-sleepbugsarising constructaCFG.Next,wedefinetheGENandKILLsetsforeach from races. We leave detecting and debugging no-sleep dilation blockB. Thelastassignmenttoavariablevintheblockcreatesa bugsasfuturework. definitiondofvthatcanreachotherstatementsoutsidetheblock, and therefore the definition is placed in the GEN set. Thus for 6. NO-SLEEP CODEPATHS blockB1ofFigure1,thedefinitionsatd1,d2,andd3canreach Wefirstgiveanoverviewofdataflowanalysis,andthendescribe the other blocks, and therefore are added to B1’s GEN set. The oursolutionasadataflowanalysisproblem. definition of the KILLset comes fromthe following observation: anassignmenttosomevariablevinablockpreventsanydefinition 6.1 Dataflow Analysis: AnOverview ofthevariablevoutsidefromflowingthroughtheblock.Thusany Dataflowanalysisreferstoasetoftechniquesthatascertainfacts definitionoutsidetheblockbecomemembersoftheKILLset.Thus about program properties by analyzing the effects of statements inblockB5,definitiond8causesdefinitiond4tobeinthekillset. alongdifferentpathsofacontrolflowgraph(CFG)onthoseprop- Wenow define theIN set. Consider theset of predecessors of erties. There exist many useful dataflow analysis, e.g., RD (dis- someblockB.AnydefinitionthatisintheOUTsetofoneofthese cussed above), live variable analysis (which variable values are predecessorscanreachB,andthusisinIN[B]setofB.Therefore, used after a block), and available expressions (which subexpres- IN[B]istheunionoftheOUTsetsofallofitspredecessors, e.g., sionshavealreadybeencomputedandareunchangedyet). IN[B2] is OUT[B5]∪OUT[B1]. Finally, the OUT set for a EachnodeinaCFGisabasicblockofstatements,i.e.,theblock block B must be computed. The OUT set is simply the IN set has exactly one entry point and one exit point. There exist a di- with the effects of flowing through the block applied to it. The rectededge(Bi,Bj)intheCFGconnecting everypairof blocks expressionIN[B]−KILL[B]givesthosedefinitionsthatreached Bi,BjsuchthatblockBjcanexecuteimmediatelyafterblockBi. the block and can reach later blocks, and unioning this with the There is also an edge from every exception toevery catch that GENsetgivesalldefinitionsthatcanpassthroughthisblockand might catch it. Figure 1 shows an example of a dataflow graph. reachotherblocks. ThusfB :OUT[B]=GEN[B]∪(IN[B]− TwospecialblocksareaddedtotheCFG:ENTERandEXIT.There KILL[B])isthetransferequationforblockB. existsanedgefromENTERtoeveryblockBi (cid:2)=ENTERwithno Interprocedural analysis, whichincorporates theeffectsofrou- predecessor, andanedgefromEXITtoeveryblockBj (cid:2)= EXIT tinecallsandroutinearguments,isbeyondthescopeofthisdiscus- with no successor. A forward dataflow analysis propagates facts sionbutiscoveredindetailin[44]. abouttheprogramfromtheENTERtotheEXITnodewhileaback- 6.2 No-Sleep CodePathDataflow Analysis wardanalysispropagatesinformationbackwardsthroughthegraph fromtheEXITtotheENTERnode. We now formulate the single-thread no-sleep code path prob- EachnodeintheCFGisannotatedwithtwosets:GENandKILL. lem as an RD problem, and show how to solve it using standard TheKILLsetcontainsfactsintheanalysisthatbecomefalseinthis dataflowanalysis[6]. Weonlyanalyzenon-referencecounted,no- node,andtheGENsetcontainsfactsthatbecometrue. Eachnode timerwakelocksandexoticcomponentpowerAPIs. Weleavethe alsohasanINand anOUTset. Thesetsassociatedwithablock studyofothercategoriesasfuturework. B canbedenotedasIN[B],OUT[B],GEN[B],andKILL[B].For 6.2.1 No-SleepCodePathtoReachingDefinitions aforward(backward)analysistheIN(OUT)setwillcontainfacts thataretrueimmediatelybeforethenodeisvisited,andtheOUT For no-sleep code path analysis, we are only interested in the (IN)setwillcontainfactsthataretrueimmediatelyafterthenode pointsinthecodepathwherethesmartphonecomponentpoweris isvisited. ThetransferfunctiondescribeshowtheOUT(IN)setis managed,e.g.,thepointsintheCFGwherewakelocksfortheCPU computedfromtheOUT(IN),KILLandGENsets. Forsimplicity or screen are acquired or released, or points where the camera is weonlyconsiderforwardanalysisfromthispointon. turnedonandoff. Asaresult,thedomainofthedataflowproblem CFGs with branches contain join points where multiple paths isasetconsistingofcomponentwakelocksfortraditionalcompo- cometogether,e.g.,theblockcontainingstatementsd8andd9in nents and component power management assignments for exotic Figure1. A meet operation decideshow thevalues coming from components. Forbrevity, fromnowonweusewakelockstorefer thepredecessornodearecombinedtoformthevalueoftheINset. tothepowercontrolhandlesforbothtraditionalandexoticcompo- IftheCFGcontainscycles, aniterativealgorithmthatvisitnodes nents. repeatedlyisuseduntiltheanalysisconvergestoafixedpointsuch Once the transformation is completed, the no-sleep code path thatrevisitingallofthenodesdoesnotchangethevaluesofanyIN problemisreducedtofindingtheRDinthetransformedCFG,i.e., orOUTset. ThealgorithmworksbyaddingtheENTERnodetoa findingwhichdefinitionofawakelockreachestheEXITnodeof worklist. Asnodesareprocessed,iftheirOUTsetchanges,their theCFG.Ifonlythosedefinitionsthatdeclareallthevariablesas successorsareaddedtotheworklist. Whentheworklistisempty 0(i.e.,thecomponentcansleep)reachtheEXITnode,thecodeis thealgorithmhasconvergedatafixed-point. saidtobefreeofno-sleepbugs,sinceallofthepossiblecodepaths We note that all dataflow schemas compute approximations to putallaccessedcomponentstosleepbeforereachingtheendofthe the actual ground truth. The actual problem being solved is un- CFG,andthereforetheendofthecode. decidable (e.g., constant propagation [43]). This is because it is Solvingthecodepathproblem. Wenowshowhowtoapplythe undecidable, in general, if a particular path along a CFG will be standarditerativealgorithmfordataflowanalysistosolveourno- taken during a program’s execution. As a result, dataflow solu- sleepcodepathproblem. tionsreturnconservativeorsafeestimatestotheactualproblem.A Forourno-sleepcodepathproblem,thesetofnon-zerovariable conservative approach guarantees that the resultsobtained by the definitions reaching the EXIT node represents the no-sleep code analysiswillerronthesideofsafety. ThuswhileanRDanalysis pathbugsintheapp. Table3shows,foreachblockB,theIN[B] maysaymoredefinitionsreachsomepointthanactuallydo,itwill andOUT[B]setsattheendofthreeiterations.ItshowstheIN[B] neverfailtofindalldefinitionsthatdoreachaprogrampoint. andOUT[B]setsarethesameattheendofthesecondandthird 274 Listing5:No-sleepcodepathduetoruntimeexceptions. 1 wake_lock_.acquire();//CPU should not sleep 2 Object b = xyz.getObject(); //b is a reference to an object 3 b.net_sync(); //Perform critical task here 4 wake_lock_.release();//CPU is free to sleep Listing6:Fixingno-sleepcodepathduetoruntimeexceptions. 1 wake_lock_.acquire(); 2 - client = new AppengineClient(this); 3 Log.d(TAG, "onHandleIntent"); 4 + try { 5 + client = new AppengineClient(this); 6 //.... 7 + } finally { 8 wake_lock_.release(); 9 + } Figure 1: Transforming no-sleep code path into reaching definition dataflowproblem,andtheresultingINandOUTsets. gramming typically consists of several functions which are event handlers,onecorrespondingtoeacheventtheapphandles. These iteration, and hence the algorithm has reached a fixed-point and eventscouldbeabuttonclick,anincomingcall,anotificationfrom converged in three iterations. The value of OUT[EXIT] in the server,aresponsefromacomponent (e.g.,GPS),etc. Eachevent lastiterationcontainsthereachingdefinitionsattheendofthecode: handlerisinvokedwhentheeventisfiredandthehandlermayin alldefinitionsbutd1andd4canreachtheendofthecode,including turninvokeatreeofroutinesunderneathitbeforeexiting. d2,d5andd8whichindicatetheexistenceofano-sleepbug.Their Handlingmultipleentrypointsofanappcreatesanewchallenge: presence indicatesno-sleep code paths along which acomponent each handler has itsown CFG,9 and a component may be turned (GPS, camera, and CPU wakelock_2, respectively) is woken up oninoneeventhandler andputtosleepunder another(e.g.,start butnotputtosleep. camera when start button is clicked and stop camera when stop buttonisclicked). However,theorderofexecutionofthedifferent 6.2.2 HandlingUncaughtRuntimeExceptions events, which is needed to stitch together the CFGs of different Java runtime exceptions (RTE) [45] (e.g., null pointer and ar- handlers, may be unknown at compile time and depends on user rayindexoutofboundsexceptions)canbethrownduringnormal interactions. JavaVirtualMachine(JVM)operations.RTEsthatarehandledex- Wehandlethiscomplicationasfollows. (1)Forcommonevent plicitlybyatry-catchblockincodearehandledasbeforeby handlers (e.g., onCreate, onPause) which have known in- addingapathfromthesourceblocktothehandlerblock.However, vocation orders, we simplyperform theRD analysis across them RTEsareoftennothandledbyaprogramandthethreadraisingthe onthecombinedCFGobtainedfromstitchingtogetherindividual exceptionisterminatedbytheJVMwhentheexceptionisthrown. CFGsfollowing those invocation orders. For example, if a com- UncaughtRTEsareasourceofno-sleepbugsandmustbehan- ponent is not put to sleep when the app is paused after being dledbyouranalysis.ConsiderthecodeinListing5.ACPUwake- firstcreated,itusuallyisasignofano-sleepbug. (2)Forthe lockisacquired,followedbyacalltothecriticalroutinefromin- remaininghandlers,weaskthedeveloperstospecifyallexpected stanceb.Thewakelockisreleasedafterthecall.IfaRTEisraised invocationorders, andthenperformno-sleepbugRDanalysison (e.g.,anullpointerexceptiononline3causedbybbeingnull), thecombinedCFGfortheseorders. thethreadishalted. Thisresultsinano-sleepenergybugsincethe threadterminatesbeforethewakelockisreleased.Weidentifiedan 7. NO-SLEEP RACECONDITIONS instance of this bug [26] in our characterization study. Listing 6 detailsthepatchappliedbythedevelopertofixanullpointerRTE Tostaticallydetectpossibleno-sleepraceconditionsformulti- (codelinesappendedwith“-”or“+”indicatethattheselineswere threadapps, weadapt theRDdataflowanalysispreviouslydevel- removedfromoraddedtothenewversion,respectively). Thede- opedforparallelprograms[49,50]. veloperaddedhandlersforthenullpointerRTEandmovedthelock A multi-threaded program typically has a repeating pattern of releaseintoafinallyblocktoensurethatitisrunregardlessof sequentialsectionsendingwithathreadfork,interleavedexecution anyexceptions. ofparallelthreadsfollowedbyathreadjoin,followedbythenext OurtechniqueplacesanedgefromeachRTEthatisnothandled sequential section in the pattern. Execution is sequential within withinaroutinetotheEXITnodeforthatroutine. Thiscreatesa each thread and so a CFG can be built for the thread. CFGs for pathforalockacquiredefinitiontoreachtheexit,andcouldlead different threads can be stitched together by connecting the fork tomorefalsepositives(althoughwehavenotseenthatinourtest spawning the thread withthe ENTRYnode for the thread’s CFG, cases). Techniquessuchasnullpointeranalysis[46],ABCD[47] andtheEXITnodeoftheCFGwiththejoinnodeusingaparallel forarrayboundscheck,andRTEanalysistechniques,e.g.,[48]can controledge[50]. beusedtomaketheanalysismorepreciseandgeneratefewerfalse TheRDanalysisisnowmodifiedforthisnewCFG.Threeobser- positives. vations[50]motivatethesemodifications.(1)Allthreadsinaparal- lelsectionareexecuted;(2)Anyofthedefinitionsdti,dtj,...,dtk 6.2.3 HandlingEventBasedEntryPoints tosomevariablev executingindifferentthreads,andnotordered Android app programming is primarily event-based program- 9We extract the entry point of an event handler, i.e., the root ming.Unliketraditionalcode,wherethemain()routinestartsthe of its CFG, from various .xml files in the build tree (e.g., appwiththeappexitingwhenmain()returns,Androidapppro- Manifest.xml,main.xml). 275 Table3:ComputingINandOUTforno-sleepcodepaths. BlockB OUT[B]0 IN[B]1 OUT[B]1 IN[B]2(=IN[B]3) OUT[B]2(=OUT[B]3) B {} {} {d ,d ,d } {} {d ,d ,d } 1 1 2 3 1 2 3 B {} {d ,d ,d } {d ,d ,d ,d ,d } {d ,d ,d ,d ,d ,d ,d ,d } {d ,d ,d ,d ,d ,d ,d } 2 1 2 3 1 2 3 4 5 1 2 3 5 6 7 8 9 1 2 3 4 5 7 9 B {} {d ,d ,d ,d ,d } {d ,d ,d ,d ,d } {d ,d ,d ,d ,d ,d ,d } {d ,d ,d ,d ,d ,d ,d } 3 1 2 3 4 5 1 2 3 4 6 1 2 3 4 5 7 9 1 2 3 4 6 7 9 B {} {d ,d ,d ,d ,d } {d ,d ,d ,d ,d } {d ,d ,d ,d ,d ,d ,d } {d ,d ,d ,d ,d ,d } 4 1 2 3 4 6 1 3 4 6 7 1 2 3 4 6 7 9 1 3 4 6 7 9 B {} {d ,d ,d ,d ,d ,d ,d } {d ,d ,d ,d ,d ,d ,d } {d ,d ,d ,d ,d ,d ,d ,d } {d ,d ,d ,d ,d ,d ,d } 5 1 2 3 4 5 6 7 2 3 5 6 7 8 9 1 2 3 4 5 6 7 9 2 3 5 6 7 8 9 EXIT {} {d ,d ,d ,d ,d ,d ,d } {d ,d ,d ,d ,d ,d ,d } {d ,d ,d ,d ,d ,d ,d } {d ,d ,d ,d ,d ,d ,d } 2 3 5 6 7 8 9 2 3 5 6 7 8 9 2 3 5 6 7 8 9 2 3 5 6 7 8 9 Listing7:Handlingsimplecodepaths. 1 wl1.acquire(); wl2.acquire(); //wakeup 2 if(wl1 != null) //if object is not null 3 wl1.release(); //release the wakelock 4 if(wl2.isHeld()) //if wl2 is acquired 5 wl2.release(); //release the wakelock controltransfermechanismposesproblemsforstaticanalysissince itisdifficulttodetermineatcompiletimewhichclasstheobjectis aninstanceof,orwhichhandlerwillservicetheintent,andhence which particular method (routine) will be called. We use a con- servativeapproachbyanalyzingallroutines’referencesthatcould possiblybereferredtoatruntime. Figure2: Tracingno-sleepracebug: ParallelFlowGraph(PFG)for HandlingSpecialCodePaths:Toreducethenumberoffalsepos- codeinListing4. itives, we handle the two special cases shown in Listing 7. Two wakelocks(wl1, wl2)areacquiredbeforeifconditionsandare bysynchronization,maybethelastdefinitionofvtoexecute,there- releasedundertheirrespectiveif. Wefoundthesetwousagesto forenosuchdefinitionttpcanbesaidtokillanotherdefinitiondtq, becommoninmobileapps.TheRDno-sleepbuganalysisforcode p,q∈{i,j,...,k};(3)Anykillperformedunconditionallybyany inListing7wouldflagbothlockacquiresasreachingtheendofthe blocksincetheyarenotreleasedintheelsebranches. However, threadinaparallelsection(i.e.,alongallsequentialpathsthrough inbothcases,itisevidentthatthereisnobugsinceifthewakelock the thread) kills all definitions that occur before the parallel sec- tion’sfork; (4)Anydefinitiondperformedconditionallywithina iseithernullornotheld,itneednotbereleased.Wehandlethese threaddoesnotkilldefinitionsd(cid:2) beforethethread’sparallelsec- twocommonusagesspecially,byinsertinganelsebranchtothe tion’sforksinceeitherdord(cid:2)mayreachtheparallelsection’sjoin, ifconditionwhichcontainsadefinitionofrelease(). Runtime Exceptions: In our characterization study, we did not and following statements. The details of modifying the dataflow observeanyoccurrencesofuncaughtRTEsotherthannullpointer analysis to account for these additional constraints and ordering exceptions(NPEs).Hence,inourcurrentimplementation,weonly synchronizationarediscussedindetailin[50]andarenotrepeated handle NPEs. Weleavehandling other RTEsasfuture work. To here. handleNPEs,wetracethenullreachingdefinitionsateachaccess Figure2showsthattheRDanalysisonthePFGfindsthatdefi- nitionsd3,d4andd7canreachtheEXITnode. Sinced7turnson point of theobject forevery object declared intheprogram. Ifa nulldefinitionreachesanobjectaccesspoint,weaddapathfrom thecomponent,thisisano-sleepracebug. thatpointtotheEXITnodeintheCFG. Race Conditions: Our current implementation only implements 8. IMPLEMENTATION analysisforno-sleepraceconditionforprogramswithoutsynchro- ProGuardExtension: Weimplementedno-sleepbugtracingasa nizationpoints. Weleaveasfutureworkextensionstohandlesyn- chronization points, which can be implemented by adopting the 1K-LOC extension toProGuard[15]. TheProGuardtool isused techniquesproposedin[50]. toshrink,optimizeandobfuscateAndroidcodeandhelpstomake smaller.apkinstallerfiles. ItbuildsanintermediateRepresenta- tion of the input source containing CFGsthat weuse. We chose 9. EVALUATION ProGuard since it is integrated into the Android build system, it automaticallyrunswhenanAndroidapportheframeworkiscom- Wenow present experimental resultsof no-sleepbugdetection piled and does not require a separate, manual invocation. How- usingourdataflowanalysisbasedno-sleepbugdetectiontool. We ever,sourcecodeisnotrequiredtoperformtheanalysissincePro- firstpresentasummaryofthedetectionresultson500appsrunning GuardcanrundirectlyonthebytecodegeneratedbytheJavacom- onAndroidandthendiscussfalsepositivesandtheruntimeofthe piler. If weonly have the .apk installer for an app we first use scheme. useded[16]todecompiletheembedded.dexfiles(DalvikEx- Methodology: We collected app installers (.apk files) for 500 ecutable[51])andconvertthemtoJavabytecode(.classfiles). apps, includingpopular appslikeFacebook, Googleappssuchas WethenrunProGuardandtheno-sleep bugdataflow analysison gtalk and stock apps in the Android framework including Email the.classbytecodes. andDialer. TheseincludealltheappslistedinTable2. Automatic HandlingObjectReferencesandIntentResolution:Javaobject analysis of themanifest.xmlfilefor permissions reveals that referencesandintentresolutionsaretheonlyindirectcontroltrans- 187appsexplicitlymanipulatecomponentwake/sleepcycles. We fermechanisms intheAndroidframework andapps. Anindirect thendecompiledthe.apkinstallersusingded[16]andobtained 276
Description: