Leveraging graphical models to improve accuracy and reduce privacy risks of mobile sensing PDF

13 Pages·2013·0.95 MB·English
Leveraging Graphical Models to Improve Accuracy and Reduce Privacy Risks of Mobile Sensing

Abhinav Parate, Meng-Chieh Chiu, Deepak Ganesan, Benjamin M. Marlin
Department of Computer Science
University of Massachusetts, Amherst
Amherst, MA 01003-9264
{aparate,joechiu,dganesan,marlin}@cs.umass.edu

ABSTRACT

The proliferation of sensors on mobile phones and wearables has led to a plethora of context classifiers designed to sense the individual's context. We argue that a key missing piece in mobile inference is a layer that fuses the outputs of several classifiers to learn deeper insights into an individual's habitual patterns and associated correlations between contexts, thereby enabling new systems optimizations and opportunities. In this paper, we design CQue, a dynamic bayesian network that operates over classifiers for individual contexts, observes relations across these outputs across time, and identifies opportunities for improving energy-efficiency and accuracy by taking advantage of relations. In addition, such a layer provides insights into privacy leakage that might occur when seemingly innocuous user context revealed to different applications on a phone may be combined to reveal more information than originally intended.

In terms of system architecture, our key contribution is a clean separation between the detection layer and the fusion layer, enabling classifiers to solely focus on detecting the context, and leverage temporal smoothing and fusion mechanisms to further boost performance by just connecting to our higher-level inference engine. To applications and users, CQue provides a query interface, allowing a) applications to obtain more accurate context results while remaining agnostic of what classifiers/sensors are used and when, and b) users to specify what contexts they wish to keep private, and only allow information that has low leakage with the private context to be revealed. We implemented CQue in Android, and our results show that CQue can i) improve activity classification accuracy up to 42%, ii) reduce energy consumption in classifying social, location and activity contexts with high accuracy (>90%) by reducing the number of required classifiers by at least 33%, and iii) effectively detect and suppress contexts that reveal private information.

Keywords

Continuous context-sensing; Energy-Accuracy-Privacy optimizations; Mobile computing

Categories and Subject Descriptors

C.5.3 [Computer System Implementation]: Microcomputers - Portable devices; D.4.8 [Operating Systems]: [Performance Modeling and prediction]; K.8 [Personal Computing]: [General]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MobiSys'13, June 25-28, 2013, Taipei, Taiwan
Copyright 2013 ACM 978-1-4503-1672-9/13/06 ...$15.00.

1. INTRODUCTION

The past decade has seen unprecedented growth in sensor-rich mobile phones and wearable accessories such as fitness monitors, sleep monitors, heart monitors and others. With the proliferation of such devices, there has been significant emphasis on techniques to draw higher-level inferences from continuous sensor data as we move around in our day-to-day lives. Research has shown that several aspects of our behavior can be inferred including physical activity, sleep behavior, social context, movement patterns, emotional or affective context, and mental disorders, with varying degrees of accuracy.

The growing landscape of high-level inferences presents an interesting opportunity: can we combine these inferences to obtain deeper insights into individual behavior? Intuition suggests that the outputs of individual inference algorithms must be correlated across space and time; after all, they sense various dimensions of an individual's habits, behaviors and physiology, all of which are inter-linked. As a simple example, take the case of an individual's mobility patterns and how much of it can be inferred without using any location sensor such as GPS, cell tower or WiFi. Location is often correlated to social context: the fact that phones of colleagues are in proximity (detected via bluetooth) can indicate that the likely location is the workplace, whereas the fact that a family member's phone is nearby means that one is likely at home. Similarly, location relates to activity context: a user may be more sedentary at work than at other times during the day, therefore one could infer that the most likely location is the workplace by observing the level of activity. More broadly, almost all inferences are related in one way or another: studies have shown that sleep (or lack of it) affects our mood and productivity, social interactions influence our emotional context, addictive behavior is correlated to specific locations and social interactions, and so on.

The existence of a rich interaction graph between inferences presents an opportunity and a challenge. On one hand, a model of the relations across the different inferences can be leveraged to gain insights into individual behavior, thereby enabling new systems optimization opportunities. For example, a high level inference framework might observe relations between semantic location and activity levels, and learn that an individual has mostly sedentary behavior at the workplace. This provides two opportunities to optimize inference of the location context, "work": a) energy-efficiency can be improved by relying primarily on activity inference rather than expensive GPS, and b) accuracy can be improved when there is significant error (e.g. indoor settings) by fusing it with activity levels.

On the other hand, the ability to leverage relations across inferences comes with a steep price tag: loss of privacy. A mobile application that purports to merely be using the accelerometer to detect activity levels may indeed be inferring your location. More disturbing is the possibility that a single application developer may have several applications, one that monitors sensors for activity level, and perhaps another that uses bluetooth, which may be leveraged in conjunction to reveal much more than from the seemingly innocuous individual applications. Compounding this issue is the fact that we do not have tools that allow us to reason about how much of privacy exposure occurs from revealing seemingly innocuous sensor data if the adversary were to have a good model of the relations across contexts.

In this paper, we present a mobile "big data" inference framework, CQue, that takes as input streams of inferences from diverse on-body and smartphone-based sensors, and provides a unified way for exploiting and exploring relations across these inferences. CQue can be used in several ways: a) an inference algorithm can leverage CQue to improve accuracy and/or energy-efficiency, while remaining agnostic of how this is achieved, b) a user can issue "what if" queries that explore the extent of privacy leakage of sensitive information that might be possible if certain sensor data were revealed to one or more applications, and c) a user can specify privacy policies to provide a first order protection against privacy leakage of sensitive information to the untrusted applications.

In terms of system architecture, the key benefit of CQue is that it separates detection from fusion. Existing classifiers are largely designed in a stovepipe manner to address a specific context sensing goal as best as possible. While they start with detection of the specific activity (e.g. conversation, walking, etc), real-world vagaries often result in spurious state transitions due to several confounding factors. To address these issues, classifiers often rely on other sensor sources that can identify confounders and reduce errors. CQue provides a clean separation between the detection layer and the fusion layer. In this way, a classifier can be designed to just focus on detecting the phenomena of interest, and leverage fusion mechanisms to further boost performance by just hooking into CQue.

To users and applications, CQue offers a simple context querying interface with support for several types of queries. A "context query" can request different contexts, while optionally specifying constraints such as confidence requirements and delay bounds. For example, a query might request sedentary activity context with 90% certainty and a maximum notification delay of two minutes. CQue uses the query constraints to reason about how to duty-cycle inference algorithms, and how much temporal history to leverage to improve accuracy and confidence. A "what if" query provides a measure of potential leakage of sensitive information if applications were allowed access to specific sensor sources. For example, a query can request the expected leakage of "home" location context if an application had access to accelerometer and bluetooth data. Internally, CQue would execute such a query by providing a measure of the correlation between the outputs of inference algorithms that operate on accelerometer and bluetooth data, and the specific location context that the user does not wish to reveal. Finally, through privacy policies, a user can specify a context as private that should not be revealed, and suppress the non-private contexts in real-time that can be used to infer private context using correlations among these contexts. A user can specify this policy specific to an application or a group of applications that can potentially collude.

Our results show that:

• CQue can answer context queries with high confidence and improve accuracy up to 42% by performing fusion of information from multiple context-inference algorithms.

• When energy is limited, CQue can lower execution costs for multiple context queries by exploiting context relations to run fewer inference algorithms. If energy is plentiful, CQue can decide what context algorithms in addition to the query set should be executed to improve accuracy.

• CQue is effective in assessing privacy risks and provides privacy while ensuring high utility to the applications.

The rest of the paper is organized as follows. §2 describes the prior work done in the related research areas. In §3, we provide a high-level overview of the CQue query engine along with the description of how a context query can be specified in our framework. In §4.1, we describe the relationship model used by CQue. The rest of §4 describes in detail the components of the CQue query engine that are responsible for optimizing multiple context queries and addressing privacy. We describe the implementation details and the experimental evaluation of CQue in §5 and §6 respectively. We discuss possible extensions of this work in §7. Finally, the conclusions are provided in §8.

2. RELATED WORK

In this section, we describe three areas of related work: model-driven sensor data acquisition, context inference for mobile phones, and privacy in temporal data.

Graphical models for sensor data acquisition: There has been substantial work on leveraging spatial and temporal models of correlations between distributed sensor sources to optimize sampling and communication in a sensor network [3, 5, 6, 12, 14]. For example, BBQ [6] uses a Dynamic Bayesian Network to select the minimum number of sensor nodes for data acquisition such that it can answer range queries within query-desired confidence bounds. Meliou et al [14] extend this work to sensor network routing where a query can be answered with desired confidence bounds while traversing the minimum number of nodes in a network spanning tree. Graphical models are also used in [5], which explored the problem of answering range queries while minimizing the energy cost of sampling sensors, and in [3], where sensors use models to reduce the communication costs by transmitting samples to a base station only when the ground truth is significantly different from the prediction made by the model. The similarities with prior efforts are only in that they leverage DBNs; we use DBNs in a novel application context, which is real-time inference on a mobile phone to enable energy-accuracy-delay tradeoffs and protect privacy.

Context-Sensing: Our work is closest to ACE [15], which proposes a context sensing engine that exploits relationships among contexts to infer an energy-expensive context from a cheaper context. The central difference between ACE and CQue is that the former uses Association Rule Mining to learn rules among contexts (e.g. Driving ⇒ ¬AtHome), but such mining approaches cannot take a context classifier's uncertainty into account. The ability to take uncertainty into account is critical when using classifiers that operate on raw sensor data: for example, a classifier may detect Driving with a low confidence, say 0.7, GPS may detect AtHome with an error radius of 100m, and so on. In these cases, the rules would assume perfect context information whereas CQue would use the uncertainty in inferring the relations. Thus, a probabilistic approach is strictly superior to a rule based one.

Also related to CQue are efforts to develop a context-sensing engine for phones that can be used by applications to request contexts [4, 13]. The Jigsaw context sensing engine [13] comprises a set of sensing pipelines for accelerometer, microphone and GPS. More recently, Kobe [4] proposed a context querying engine for mobile phones that can be used to plug in different classifiers, and that balances energy consumption, latency and accuracy of the classifiers by offloading computation to the cloud. Unlike these approaches, CQue can leverage probabilistic relations across contexts.

CQue is complementary to prior work on context sensing engines, which have largely explored optimizations of individual classifiers (e.g. [2, 11, 13, 20]). CQue allows individual classifiers to be easily integrated without worrying about how to leverage other contexts to improve performance and efficiency.

Privacy in temporal data: There has been some work in addressing privacy where the adversary is aware of the temporal correlations [8, 17, 19]. MaskIT [8] presents a privacy-preserving mechanism to filter a stream of contexts that can be used to answer context queries requested by the phone applications. It presents two privacy checks to decide when a user context can be released while providing privacy guarantees against an adversary knowing temporal correlations modeled as a Hidden Markov model (HMM) of user contexts.

Our goal in this work is not to prove privacy guarantees or design new privacy metrics; rather, our argument is that the use of a DBN enables considerably more flexible privacy policies in comparison with an HMM-based approach. When MaskIT identifies that a privacy breach might occur, it suppresses all contexts at that time without considering which subset of them might contribute to the breach. In contrast, CQue enables more fine-grained reasoning about which specific context(s) lead to a privacy breach such that only that information can be suppressed.

3. CQUE OVERVIEW

The CQue framework has been designed with the goal of providing an easy-to-use abstraction to application developers and end users, both for exploiting correlations across inferences to improve energy-efficiency and accuracy, as well as to use these correlations to ascertain privacy leakage or contextual insights. This goal is accomplished using a high level query language and an underlying execution environment running as a service on the phone. In the following, we provide a high-level overview of the CQue framework.

Context Query Interface. At the top level of CQue are applications that issue context queries using a high-level declarative query language. CQue provides a simple query language for the applications to request contexts. Consider the following three queries:

Q1. (drive | accel+gyro+gps, 240s, 0.8)
Q2. (home? | accel+gyro)
Q3. (home? | with friends)

The first query requests the context drive, and includes three other optional fields. The keywords accel, gyro and gps suggest that any combination of classifiers that use those sensors may be used in answering the query (to ensure that the app only uses sensors for which it has approval from the user). These keywords could also refer to other inferences, if the application needs more explicit control over other contexts that should be fused. Finally, the query can also replace this with a '*' which would let CQue automatically choose sensors and classifiers to optimize the system. The value '240s' in the query is the delay tolerance and denotes that the application can tolerate the maximum delay of 240 seconds for each record in the query response. The query processor uses this delay window to improve the confidence in the output context value. The query can also provide a confidence threshold. In the above example, the query processor will output context as soon as it reaches confidence of '0.8', provided it is within the delay tolerance period. If confidence values associated with the context output are lower than the query-desired confidence threshold, the context is reported as 'unknown'.

The next two queries marked by '?' are privacy queries and provide a measure of information gain in queried context "home" if any inference algorithm operating over accelerometer and gyroscope sensors were used (Q2) or if the with friends inference algorithm was used (Q3). In turn, this measure provides a privacy score, an indicator of privacy leakage. Note that in providing a privacy score, CQue is limited to using inference algorithms that are available in its library.

CQue Architecture. The central tenet of CQue is an architectural separation between the mechanisms that inference algorithms use to improve performance. Inference algorithms are often designed in a stovepipe manner and integrate a combination of the following components: a) feature extraction, which involves extracting time-domain or frequency-domain features from the raw sensor data, b) classification, where they look at a short temporal sequence and detect a particular event or state, for example, walking, running, cycling, smoking, etc, c) temporal consistency mechanisms such as Hidden Markov Models (HMMs) to correct mis-classifications that occur in the classification, and d) context fusion where they leverage other information such as location or time to provide additional input to correct classification errors.

In contrast to the stovepipe approach, our design separates these components into a cleaner, layered architecture where any classifier can simply hook into CQue, and leverage temporal consistency and context fusion mechanisms to improve performance. CQue models temporal consistency with a time-series of previous observations, and uses active learning mechanisms to automatically determine when user input is needed to improve this model. CQue also automatically determines the relationships between a classifier output and all other observations from other classifiers/sensors that it has access to, and determines when and how to use additional observations to improve accuracy. In short, it can make the process of designing a new inference algorithm a lot easier for a designer.

Execution Environment. The execution environment of CQue is shown in Figure 1 and consists of the following core components: (i) a graphical model, namely, a Dynamic Bayesian Network, and (ii) a query processor. The Dynamic Bayesian Network (DBN) is at the core of CQue and plays an important role in making various decisions during query processing. The DBN provides a model of the relationships across multiple contexts for the phone user and keeps track of the time-series of observations for various contexts. It then uses context relationships to boost confidence in observations made by individual classifiers or to correct them. The second component is a query processor which is responsible for interaction with applications including receiving context queries and generating the responses to the queries. In addition, depending on the current state of the DBN and the time-series of observations, the query processor needs to determine which context-inference algorithms should be executed such that it can provide best answers for the queries within the specified constraints. These decisions are made by the query processor at every time step during execution. As shown in Figure 1, the result of these decisions are the signals $S_1, S_2, \ldots, S_n$ indicating which context-algorithms should be executed. Finally, upon observing the output of selected context-inference algorithms, the query processor enforces user-specified privacy policies (defined in §4.2) by deciding a maximal set of queried contexts whose values can be released to the applications without violating the privacy policies.

In addition to the query execution, the query processor is responsible for sampling human-provided context values during the learning phase of the DBN. Such human input can be minimized through active learning to only obtain input at appropriate times to improve the structure of the DBN. These human inputs, shown as signals $H_1, H_2, \ldots, H_n$ in Figure 1, are provided to the context-inference algorithms that can use them to retrain themselves to personalize for the phone user.

Figure 1: High-level overview of the framework describing the workflow. [Diagram: three applications (Fitness App, Movie Recommender App, Do not Disturb App) submit the query sets Q1: (drive | accel+gyro), (walk | accel+gyro), (stationary | accel+gyro); Q2: (with friends | bluetooth), (with spouse | bluetooth); Q3: (at-office | wifi+gps, 120s, 0.9) through the Query Interface. Below it sit the Dynamic Query Execution Engine (cost-accuracy-privacy optimization, with user privacy settings and offload of dynamic plan evaluation to the cloud), the Dynamic Bayesian Network (with offload of DBN learning to the cloud), the classifiers Walking, Driving and AtHome with signals S1 to S3 and H1 to H3, and the Sensors.]

4. THE EXECUTION ENVIRONMENT

The execution environment of CQue is responsible for i) modeling and learning the relationships across contexts, ii) multi-query optimization such that uncertainty in queried contexts is minimized while operating within the budget and delay constraints of a query, and iii) executing privacy queries and enforcing privacy policies. In the following, we describe each component of the execution environment in detail.

4.1 Inference Framework

The core learning framework in CQue is a Dynamic Bayesian network (DBN), which is a class of Bayesian networks that can represent a time-series of random variables to model temporal consistency. A DBN can describe both temporal and static relationships among random variables at different time instances. This model is represented as a directed acyclic graph consisting of nodes corresponding to each random variable at each time instance. The static dependencies across random variables at time instance $t$ are represented by edges connecting nodes corresponding to these random variables. The temporal dependencies are represented by transition edges connecting nodes at time instance $t-1$ with nodes at time instance $t$. Each node in this graph has a conditional probability table (CPT) describing dependencies on its parent nodes. Figure 2 shows an example of a 2-step DBN for two time slices which can be unrolled to accommodate a time-series of any length by duplicating the time-slices and transition edges. We use notation $X$ to denote the set of random variables. We use $X_t$ to denote the set of nodes representing random variables $X$ at time instance $t$ and use $X_{1 \ldots t}$ as short notation to describe the time series of nodes $X_1, \ldots, X_t$. With this notation, the joint probability distribution for a time series of nodes is given by

$$\Pr(X_{1 \ldots t}) = \Pr(X_1) \prod_{\tau=2}^{t} \Pr(X_\tau \mid X_{\tau-1})$$

In the above equation, probability $\Pr(X_\tau \mid X_{\tau-1})$ is computed using the DBN.

Figure 2: An example of 2-step Dynamic Bayesian Network. The nodes in the graph model the set of random variables $X = \{X_1, X_2, X_3\}$. [Diagram: nodes $X_{1,t-1}, X_{2,t-1}, X_{3,t-1}$ at time $t-1$ and $X_{1,t}, X_{2,t}, X_{3,t}$ at time $t$.]

DBN Model for CQue. In CQue, we have three types of DBN nodes: sensors, classifier-provided context, and real-world context. The goal of the DBN is to model relationships across real-world contexts while taking into account the uncertainty associated with the classifier-provided contexts. We considered two criteria in constructing the DBN model: a) classifiers are blackboxes, and we do not assume knowledge of what sensors they utilize, what sampling rates they use, how they duty-cycle sensors to save energy, and what features they extract, and b) we wish to keep the model simple and low complexity so that DBN inference can be performed in real-time on the mobile phone.

To address these considerations, classifiers and sensors are separated from the DBN into a different layer. Figure 3 shows an example where classifiers provide the probability of a context through observation nodes called virtual evidences, $O$, to the DBN. The uncertainty in evidence is accounted for in these observation nodes using Pearl's method [18]. In this method, for some context $X_i$ and corresponding observation node $O_i$, if $L(x)$ gives the likelihood of a classifier stating that $X_i = x$ if $X_i$ is actually in state $x$ and if $L(\bar{x})$ gives the likelihood of a classifier stating that $X_i = x$ if $X_i$ is not in state $x$, then the conditional probabilities for observation node $O_i$ given the current state for real world context $X_i$ satisfy:

$$\Pr(O_i = x \mid X_i = x) : \Pr(O_i = x \mid X_i \neq x) = L(x) : L(\bar{x})$$

With these conditional probabilities and given a series of virtual evidences $O_{1 \ldots t}$, we can compute the joint probabilities for the time series of real world contexts $X_{1 \ldots t}$ as follows:

$$\Pr(X_{1 \ldots t}, O_{1 \ldots t}) = \Pr(X_1 \mid O_1) \prod_{\tau=2}^{t} \Pr(X_\tau \mid X_{\tau-1}) \Pr(O_\tau \mid X_\tau)$$

This model has several benefits: a) it is computationally cheap both when classifiers are turned on or off since it avoids the complexity of modeling sensor and classifier variables in the DBN itself, b) it requires no additional human labels for modeling these variables while training the DBN, and c) it can easily be extended to handle multiple classifier implementations for the same context without changing the DBN structure.

Figure 3: A model separating DBN from the layer of sensors and classifiers, where classifiers provide virtual evidences to the DBN. [Diagram: context nodes $X_1$ and $X_2$ across time slices $t-1$, $t$ and $t+1$; virtual evidence nodes $O_1$, $O_2$; classifiers $C_1$, $C_2$; sensors $S_1$, $S_2$, $S_3$.]

DBN usage in CQue. The output of the DBN is used in several ways by CQue.

1. Filtering/Smoothing. Most of the contexts that we observe in the real world have some temporal consistency, i.e. they last for some time period. Such temporal relationships can be used to correct intermittent misclassifications made by context-inference algorithms. Since a DBN models temporal consistency with the time-series of previous observations, a DBN can correct the current output and reduce the probability of a context taking an incorrect value. Apart from the temporal consistency, a misclassification can be corrected using relationships with other contexts if the observations are available for them.

2. HindSight. The previous case described using history to correct output at the current time. Similarly, a DBN can be used to improve the confidences in historical observations, which perhaps had low confidence. Future observations with high confidence can be used to improve the confidence of previous observations. This capability can be exploited when queries specify a higher delay tolerance threshold.

3. Value of Information. The DBN can also be used to assess the value of a context-inference algorithm. Value of information (VOI) is defined as the expected gain in certainty for the random variables in the DBN if an additional observation is made. This is useful for the query processor, which can decide what context observation can provide highest utility.

4.2 Query Processor

The query processor is responsible for i) multi-query optimization, i.e. achieving the confidence requirements of context queries while operating within query constraints, ii) execution of privacy leakage queries, and iii) enforcement of user-defined privacy policies. Multi-query optimization in CQue raises the following challenges: a) how to choose the inference algorithms to execute given the budget?, and b) how to handle different delay tolerance requirements of queries? In addition, privacy queries and policies require us to answer: a) how to execute privacy queries?, and b) how to enforce privacy policies? We address these questions in the rest of this section.

Which inferences to execute given a budget?

The CQue query processor needs to execute the set of inference algorithms that provides maximum value of information (VOI) for the queried contexts within the budget constraints. The budget is typically in the form of energy since inference algorithms consume energy, but in cases where the user is interrupted to provide labels, the budget may be the number of interruptions allowed per day.

Due to the relationships across contexts, the set providing maximum VOI may be larger than the query set as some contexts that are not part of the query set may be useful to improve the inference accuracy for contexts that are in the query set. Similarly, this set can be smaller than the query set if some of the queried contexts are strongly implied by other contexts in the query. In addition, the set of inferences that provides the maximum VOI can change over time, depending on the user's current context as well as dynamics in the user's behavior patterns. In this section, we describe an adaptive approach to deciding the set of inferences that need to be executed to satisfy query demands while remaining within the budget.

The problem of selecting the optimal set of context-algorithms to execute given a budget and the set of queries can be formalized as follows. Let $Q$ be the set of context queries, let $E_{t-1}$ be the set of context classifiers undergoing execution currently and $v$ be the corresponding classifier outputs. Let $F(Q, E, E_{t-1}, v)$ be the utility function giving the VOI of the contexts, $E$, for the query set, where $E$ is the subset of observation nodes $O$ in the DBN. Let $C(E)$ give the cost of executing algorithms for contexts in $E$. If $B$ is the maximum budget, then our goal can be defined as identifying a set $E_t$, i.e. the set of context classifiers to be executed next, such that:

$$E_t = \underset{E \subseteq O_t \,:\, C(E) \le B}{\arg\max}\; F(Q, E, E_{t-1}, v) \qquad (1)$$

Since our goal is to have high certainty for query set $Q$, we consider the information gain for query set $Q$ (also called VOI) as the utility function, which is given by:

$$F(Q, E, E_{t-1}, v) = H(Q \mid E_{t-1} = v) - H(Q \mid E, E_{t-1} = v) \qquad (2)$$

where $H(\cdot)$ is the entropy function. We must note that computing the optimal set of classifiers to be executed at the next iteration is an NP-hard problem [10]. We solve this optimization problem using a greedy approach as described in Algorithm 1. The results by Nemhauser et al. [16] and Krause et al. [10] have shown that a greedy algorithm provides a solution within the constant factor $(1 - \frac{1}{e})$ of the optimal solution for a general bayesian network.

Our extension of the optimization problem to the DBN does not change the results as long as we select nodes $E_t$ from a set $S$ such that the nodes corresponding to $S$ in the DBN are independent of each other given the query set $Q$. In a DBN, we select $E_t$ from a set of observation nodes $O_t$ where each observation node is connected to the DBN only through a parent hidden node. Hence, any path between any two observation nodes is always blocked by one of the parent hidden nodes. Thus, any two observation nodes in the DBN are d-separated [21] and hence, independent of each other.

We note that Algorithm 1 gives a near-optimal solution when each classifier is assigned a unit cost. In a specific scenario, there may be several considerations in determining the cost, which depends on the cost of sensing, processing, or obtaining human input. If the resulting costs are non-uniform in nature, we refer the readers to an algorithm as described in [10].

Algorithm 1 Compute Optimal Set
Input: Budget $k$; query set $Q$; set $O_t$ from DBN; previous observed set $E_{t-1}$ and corresponding set of classifier outputs $v$; utility function $F$; cost function $C$.
Output: Set of observations $E \subseteq O_t$
    Let $E = \emptyset$
    for $i = 1$ to $k$ do
        $o^* = \arg\max_{o \in O_t \setminus E} F(Q, E \cup \{o\}, E_{t-1}, v)$
        $E = E \cup \{o^*\}$
    end for
    return $E$

How to handle delay requirements of queries?

In CQue, we unroll the DBN such that it can accommodate a series of observations of length $W$. At each time-step, we slide the DBN window over observations such that the oldest observations are dropped and the latest observations are added to the DBN. As a result of this sliding window, any context observation resides in the DBN for exactly $W$ time-steps. Hence, we can compute the probability for queried contexts $W$ times, corresponding to having $0, 1, 2, \ldots, W-1$ observations from the future. This allows our framework to answer a query with a delay up to $W-1$ time-steps. Our framework can use this delay period in three cases: i) if the sequence of historical observations has low confidence resulting in low confidence for the latest observation, ii) if the sequence of historical observations has fluctuating values for the contexts indicating low confidence in the output of context inference algorithms, and iii) if the latest observation is different from the historical observations. In the first two cases, we wait for the future observations and hope that these observations are consistent and have high confidence. If this happens, we boost the probability of the queried context in hindsight. In the third case, where the latest observation is different from the historical observations, it can happen either because of intermittent misclassification by the classifier or because the context value has actually changed. Any future observation can help in distinguishing these cases and improve the certainty in the context.

How to execute privacy queries?

Unlike queries that request the current state of an individual, privacy queries look for aggregate information regarding the mutual information shared between a context and sensor data or other contexts available in CQue. In our framework, we provide two types of privacy queries: i) the user specifies a query context $Q$ and wants to know the information revealed about the queried context from a specified list of sensors ($S$) and contexts ($C$); and ii) the user specifies a query context $Q$ and wants to know a ranked list of contexts along with a privacy score that indicates how much information is revealed by a ranked context about the queried context. Thus, these queries help the user understand how much information can be revealed about a certain context from other contexts. Together, these two query types enable users to make a decision regarding their own privacy.

For the first query type, given a user specified sensor list $S$ we identify context classifiers $C_S$ that can execute given sensors $S$. In addition to $S$, the user can provide a set of contexts $C$. Now, we want to have an uncertainty measure in queried context $Q$ if contexts in $C' = C \cup C_S$ are observed. We use a normalized variant of mutual information based on information theory that gives a distance metric between $Q$ and $C'$ and is given as follows:

$$D(Q \mid C') = 1 - \frac{H(Q) - H(Q \mid C')}{H(Q)}$$

This metric takes value 1 if $Q$ is independent of $C'$ and takes value 0 if $Q$ is fully determined by $C'$. We use this measure as a privacy score where a higher score indicates less information leakage.

For the second query type, a user is interested in identifying an ordered list of contexts that reveal most information about queried context $Q$. For each context $c$ supported in CQue, we use privacy score $D(Q \mid c)$ as described above to measure information leaked about queried context $Q$. We rank contexts based on this metric.

How to enforce privacy policies?

As described in §2, one of the benefits of CQue is that the DBN provides in-depth and real-time information about correlations between different context outputs in contrast with HMMs and other techniques that have been leveraged in prior work. Our focus is not on identifying the best privacy exposure policy or proving its privacy properties; rather it is to demonstrate that CQue can be used to develop such methods.

To demonstrate these benefits, CQue supports a privacy policy that can suppress context values that lead to a change in confidence or the probabilistic belief about a private context greater than threshold $\delta$. Such a policy can provide greater privacy control than a simpler policy that just blocks a specific private context from being revealed. In CQue, a user can define these policies specific to a potential adversarial entity where this entity can be an application or a group of applications or all the applications on the user's phone. A policy for a group of applications can be useful to protect against information collusion among these applications in a group.

In CQue, we assume a strong adversary having access to the user's DBN. Now, we enforce user policy by using the DBN to calculate the confidence or the posterior probability for the private context using the output for the set of contexts, requested by the adversarial entity, as observations in the DBN. Also, we compute the prior probability for the private context without using any observations in the DBN. If the difference between the posterior probability and the prior probability is greater than $\delta$, then we suppress the value of the context that causes the maximum change in probabilities. If the change in probabilities is still high with the remaining context outputs, we repeat the process of removing the context value that causes maximum change until we reach the threshold limit $\delta$. We release the remaining contexts to the adversarial entity.

4.3 Learning the Graphical Model

Now, we describe the challenges involved in learning the DBN model for the phone user. There are two challenges in learning the DBN: a) how to learn structure without executing all sensors and context inference algorithms continuously on the phone since this consumes significant energy, and b) how to minimize interruptions of the phone user to obtain ground truth labels for learning.

Context relationship hints: The learning process can degrade user experience due to the energy cost of running several context inference algorithms concurrently on the phone. While it is possible to randomly sample a few contexts at a time, this can slow down the learning process as the random approach may not always sample related contexts together. To perform efficient sampling and facilitate faster learning of personalized relationships, we maintain, for each context, a list of contexts to be sampled together that may possibly have soft relationships. The soft relationships may or may not hold for individual users and do not necessarily result in edges in the DBN connecting softly-related contexts. As it may not be possible to envision all the soft relationships, we still perform random sampling and bias the sampling for contexts that seem to be

is beneficial as it provides higher classification accuracy and better uncertainty estimates.
Theuseofpersonalizedclassifiersalso related. benefitstheDBN—withpersonalizedclassifiers,thecorrectDBN Whiletheuseofsoftrelationshipscanaddresstheefficientsam- parameterscanbelearntusingfewerfuturehumaninputresulting plingproblem,wecanspeedupthelearningprocessandparameter inshortertrainingperiod.CQueprovidesanAPItothedevelopers estimationinDBNbytheuseofhintsthatcanbeprovidedbythe ofcontextclassifierstoreceivehuman-providedlabelsthatcanbe context-inference algorithm developers. A context developer can leveragedforpersonalization. includehintsforeachcontextthatidentifypositivecausesandneg- ativecausesforit.Asanexample,drivingisadirectpositivecause 5. IMPLEMENTATION foruser’slocationbeingonstreetwhereasuser’slocationbeingat WenowdescribetheimplementationofCQueonAndroidsmart- homeisanegativecauseforonstreet. Thesehintscanbedirectly phonesrunningAndroidOSversion2.2orhigher. utilizedinsettingtheparametersforrelationshipsthataretruefor Inference Engine: The CQue inference engine is designed to amajorityofusers. run in real-time on a mobile phone to avoid any delays incurred Minimizinghuman-providedlabels: Oneofthechallengesin in accessing the cloud. The query processor maintains a model learningthedynamicbayesiannetworkisthatwedonothaveac- of a DBN that is personalized to the phone user. This model is cess to ground truth for contexts. While ground truth can be ob- storedasafileinanXMLBIFformatwhichisaninterchangefor- tainedbyhavingtheuserprovidegroundtruthforaninitialtraining matthatisrecognizedbymostbayesianinferencesoftwares.Inour period,wewishtominimizeinterruptionsandhenceusethisoption implementation,weusethewellknownvariable-eliminationalgo- sparingly.Instead,wecansamplethecontextsinferredbythealgo- rithm’simplementationforprobabilisticinferenceprovidedbythe rithmsalongwiththeconfidencevalueassociatedwiththeoutput. JavaBayespackage[1]. 
Onedownsideisthataccurateinference Theinferredcontextvaluesprovideuswiththepartialobservations usingvariable-eliminationcantakeexponentialtimeinthenumber oftheunderlyingMarkovianprocess.WeusetheStructuralEMal- ofvariablesinthebayesiannetwork. Whilethiswasnotanissue gorithmdescribedin[7]tolearnthestructureoftheDBNfromthe forthenumberofcontextsinourcurrentimplementation,thismay partialobservations. Inthisalgorithm,thestructureoftheDBNis beabottleneckasthenumberofcontextsincrease. Futureimple- improvediterativelyuntiltheMDLscoreofthestructureB given mentationsofCQuewillusefastapproximateinferencealgorithms atrainingdatasetDconverges. LetussupposeastructureBcon- such as importance sampling and MCMC simulation to optimize sistingofnrandomvariablesX1,...,Xn. Weusethenotationxi inferenceperformanceonthephone[21]. ratrenasdipneΠicnxtgiivdetoalytad.eDAnloswtoeh,aeNnre(axXssiii,gΠ=nmxxie)indatenfnodorPtXeasi(nXaunmid)bi=tesrpΠoafxriein.nstLtaseenttcNePsabi(neXtthhiee) QpcelaurtneariwnyhePenlreaerngth:yebduAedcsgiseditiosincsuprseesgreiadorddiiincnag§l4lwy.2hr,aeCt-vcQiosunietteeudxsttesostcoaaepdxtyuenrcaeumtdeyicgniaqvmueenircyas totalnumberofinstancesintrainingdata.Then,theMDLscorefor aDBNstructureBgivendatasetDisgivenas: intheincomingcontextstream. Suchdynamicadaptationcanbe expensivetoperformonthephonesinceitdependsonthenumber X˘ X ¯ of contexts in the DBN. So, for infrequent plan evaluation or if E[N(xi,Πxi)|D]logθ(xi,Πxi)−log2N#(Xi,Pa(Xi) the size of the DBN is small, our implementation performs such i xi,Πxi planningonthephone, butotherwiseoffloadsthecomputationto thecloud. where θ(xi,Πxi) = E[EN[(Nxi(,xΠi)x|iD)|]D] and #(Xi,Pa(Xi) is the DBN Learning: While the execution engine can execute on a number of parameters needed to represent Pr(Xi|Pa(Xi)). We mobile phone, learning the DBN is more computationally inten- needtotakeexpectedcountsaswehaveaccesstoonlypartialob- siveandrequirescloudsupport. WeimplementedtheDBNlearn- servations. 
Wesimplifythelearningprocessbyfirstlearningthe ingalgorithmusingpartialobservationsbymodifyingtheWEKA staticnetworkoftheDBNfollowedbylearningthetransitionedges package[9].CQueoffloadstheprocessoflearningtheDBNtothe ofthenetwork.Thisprocessissuboptimalinnaturebutithasbeen cloudbysendingtheappropriatecontextinstancestotheserverat showninpracticetoyieldparametersestimatesclosetooptimal. theendofeachday. TheDBNislearntattheendofeverydayin WhilelearningtheDBNwithoutanyhumaninputisideal,this thecloudandissentbacktotheCQueframeworkinXMLBIFfor- canleadtoerrorsparticularlyifthetrainingandtestcontextdistri- mat. Overtime,thefrequencyofupdatingtheDBNreducesasits butionsareverydifferent. Thus,thereisaneedforatleastsome structureandparametersstabilize.Also,tofacilitatefasterlearning corrective human input. We randomly sample human input for a ofDBNparameters,wemaintainaconfigurationfilethatprovides smallfractionofcontextsin-ordertocorrectDBNparameters,and initialparameterestimatesfortheDBN,basedonaveragenumbers assign these human-provided contexts higher weight over the in- from a general population. The configuration file has entries for stancesobtainedusingcontext-inferencealgorithms. each context containing a list of positive and negative causes for it along with the conditional probability. For example, the nega- 4.4 ClassifierPersonalization tivecauselistforthecontextatofficelookslike: athome,0.99;at So far, our discussion has assumed that the underlying context store,0.85;atrestaurant,0.93;driving,1.0. classifiersareblack-boxesi.e.theydonotexposetheirinternalbe- QueryProcessor: Weimplementedthecontextengineasaback- haviororallowchanges. Thus,theDBNislimitedtothecontext groundservicerunningonandroidphones. Thisserviceisrespon- outputanduncertaintyprovidedbytheclassifiers. 
Anaturalques- sible for activating context-inference algorithms and appropriate tioniswhetherwecandobetterifwewereableprovidefeedback sensors, and providing the context values generated by the algo- fromtheDBNtotheclassifierin-ordertoimproveitsperformance rithmstothequeryprocessor. Communicationbetweenthequery further. processorandthecontextengineusesAndroidIPC.Ourimplemen- In CQue, any human input provided to the DBN can not only tationcurrentlysupportssemanticlocationcontexts,socialcontexts beusedtocorrectDBNparameters,butalsobeusedtopersonal- andactivitycontexts: a)Activitycontexts: walking, drivingand izetheclassifiers. Aclassifierpersonalizedforanindividualuser stationary, b) Social contexts: with friends, with colleagues and 89 alone,andc)Locationcontexts: atrestaurant,athome,atoffice, ActivityDataset: Whiletherealityminingdatasethasrichmulti- atstoreandonstreet. sensordata,itreliesonself-reportsforactivityclassification,hence Ourimplementationusesdecision-treeclassifiersforactivityrecog- itdoesnotallowustounderstandhowaccuracyofactivityclassi- nition. Weusedwell-knownsetoffeaturesextractedfromthe3- fierscanbeimprovedthroughtheuseofCQue.Toaddressthis,we axisaccelerometerandgyroscopereadings.Thefollowingfeatures collectedatraceofusercontextsfortwoweeksfromsevenusers. werecomputedforaccelerometerreadingsalongeachofthethree Eachofthesevenusersprovidedabout50contextlabelsperday. axis: mean, standard deviation, mean-crossing rate, energy given Sincetherecouldbelabelingerror,wecorrectedactivitycontextin- asnormalizedsumofsquareddiscreteFFTcomponent,peakfre- formationusingGPSinformationandmanualcorrectionattheend quency in FFT and peak energy. In addition, we computed cor- ofeachday. Allusersprovidingdataweregraduatestudents. We relationbetweeneachpairofaxes. Thegyroscopereadingswere useddatafromfouruserstotraintheactivityclassifiers. 
Fortest- usedtocomputemeanandstandarddeviationofangularvelocity ing,weusedtheseclassifierstoclassifyactivitiesfortheremaining aroundeachofthethreeaxis. Forclassifyingsocialandlocation threeusers. Foreachuserinthetestset,wedividedtheirdatainto contexts,weusedauser-providedmappingfrombluetoothdevices twoweeks—inthefirstweek,apersonalizeduser-specificDBN to the social context and a mapping from WiFi access points to istrainedusingtheoutputsoftheclassifieralgorithms,andafew thecorrespondingsemanticlocation(eitheruserprovidedorusing humanlabelsobtainedfromtheground-truthset(dependingonthe publiclyavailabledatabases). Unlikeactivitycontexts,theclassi- interruptionlimits,andlabelselectionscheme),andthedatainthe fiedsocialandlocationcontextshavehigherconfidenceassociated secondweekisusedtoevaluateCQue. withthem.Whilewerestrictedourselvestotheabovecontextsdue tothenatureofourdatasets,ourarchitectureisgeneralisdesigned SetofContexts. toaccommodateothercontext-enginesthatmaybeaddedbyprac- Inboththedatasets,wehaveasetofcontextsthatincludeslo- titioners. cation contexts like at home, at office, on street and three activ- ity contexts, namely, walking, driving and stationary. For social UserInterface: TheCQueframeworkprovidesaninterfaceto contexts,weusethegroupsofbluetoothdevicesinthecloseprox- theusertospecifyanenergybudgetandpreferencesforinterrup- imityofauserasthesocialcontext. InActivitydataset,wehave tion(duringtraining)andprivacy.Theenergybudgetcanbespeci- thesegroupsexplicitlylabeledasfriends,colleagues,roommates, fiedasafractionofafullbatterythatmaybeusedforrunningCQue spousewhereasinRealityminingdataset,wedonothaveexplicit andabatterythresholdbelowwhichCQueshouldbestopped. For labelsbutweidentifythesegroupsasgroup-1,group-2andsoon. 
interruptionpreferences,ausercanspecifythetotalinterruptsper- Foreachcontext,thecorrespondingclassifieroutputstrueorfalse mittedperdaywhiletrainingtheDBN.Forprivacypreferences,a valuealongwiththeconfidencevalueassociatedwiththeclassifi- usercanselectprivatecontexts,selectalevelofprotectionagainst cationoutput.WeusethesevaluesasinputinDBN. adversaryfromlow,mediumandhighwherethesevaluesmapto valuesforthresholdforchangeinadversarialconfidenceδ = 0.4, δ = 0.25andδ = 0.05respectively. Additionally,usercanselect Evaluationmetrics. theapplicationsforwhichthesepreferencesapply. We use three performance metrics in our evaluation: 1) Accu- racy,whichiscomputedasthefractionofthesecontextsthatare correctlyclassified,2)Confidence,ortheprobabilityofthemost 6. EXPERIMENTALRESULTS likelyvalueforthecontextprovidedbytheDBN,and3)F-Measure which gives the harmonic mean of recall and precision (higher Inthissection,wedescribethesetofexperimentsperformedto scoreisbetter) evaluate our framework. We first describe the data sets and the evaluationmetricsusedfortheexperimentalevaluation. Next,we 6.2 Cost-AccuracyTradeoffsusingDBN presentouranalysisofprivacyforvariouscontexts.Wethenlookat theenergy-accuracytradeoffsanddemonstratethebenefitsofusing OneofthebenefitsofaDBNisthatitallowsatradeoffcostfor DBNovercontext-classifiersusingasetofexperiments,andcon- accuracy — by understanding the relations across different con- cludewithanevaluationofhowvariousparameterssuchasdelay, texts,aninferencealgorithmcanbeturnedofftoreduceoverhead. andinterruptionbudgetimpactresults. While the “cost" can be different depending on the sensing and communicationneedsoftheinferencealgorithm,ortheburdenof 6.1 DataSetsandEvaluationMetrics user input, we use a simplistic model where we assume that all classifiershaveequalcost.Thismodelprovidesanintuitiveunder- standingofthecost-accuracytradeoffs. Datasets. Inordertoconductourexperiments,weusedfollowingtwodatasets: Classifiers<Queries. 
Wefirstlookatthecasewherethesys- Reality Mining Dataset: This data set contains data collected temrunsfewerclassifiersthanthenumberofqueriesandexploits continuouslyfor100studentsandstaffatMIToveracademicyear thecontext-relationshipstoanswercontextqueriesforwhichthere 2004-2005. This data provides various contexts like user’s loca- arenoobservationscomingfromtheclassifier. Thiscasedemon- tion(work, home, other)basedoncell-towerobservations, social stratesthattheDBNcanbeusedtoanswerthecontextqueriesin contextsbasedonproximityofbluetoothdevicesandphysicalac- expectationwhichisnotpossibleotherwiseusingclassifiersalone. tivities like stationary, walking and driving based on self-reports. WeevaluatethismodelusingtheRealityMiningdataset. The duration of user traces varied from 30 days to 269 days. In Inthisexperiment,weconsiderallthecontextsthatwesupport our evaluation, we used data from 37 users who had data for at to be in the query set. We then vary the budget such that it can least10weeks. Thetracesforeachoftheseuserscontains3loca- executebetween1to5classifiers. Basedonthebudget,thesetof tioncontextsand3activitycontexts. Inaddition,wederivesocial classifiersarechosenthatcanprovidemaximuminformationgain contextsusingclusteringofbluetoothdevicesthatappearcloserin asdescribedin§4.2. time, thusthenumberofsocialcontextsvariedfromusertouser Figure 4 shows the accuracy and F-measure averaged over 37 andtherewereatleast2suchcontexts. usersasthenumberofclassifiersincreases.Weseethatboththese 90 1 1 0.9 0.9 0.8 0.8 0.7 0.7 cy 0.6 a 0.6 ur 0.5 cc 0.4 0.5 A 0.3 0.4 0.2 0.1 0.3 0 0.2 Accuracy(Static) U1U2U3 U1U2U3 U1U2U3 0.1 AFcc-Mureaacsyu(Drey(nSatmatiicc)) Walk Stationary Drive F-Measure(Dynamic) - 0 1 2 3 4 5 Classifier Number of Classifiers DBN-Gain (a) Figure 4: Effect of varying number of executing context classifiers whennumberofqueries(≥6)ishigherthanthenumberofexecuting 1 0.9 contexts. 
ThefigureshowsaggregateaccuracyandF-measureover37 0.8 usersforstaticanddynamicplanexecution. 0.7 e ur 0.6 s a 0.5 e M 0.4 metricsimproveasweincreasethenumberofclassifiers,andeven F- 0.3 withtwoclassifiersexecuting,wegetfairlygoodaccuracyandF- 0.2 measure. Thus, significant benefits can be obtained with only a 0.1 smallnumberofclassifiersbyleveragingaDBNmodeloftherela- 0 U1U2U3 U1U2U3 U1U2U3 tionsacrosscontexts. Walk Stationary Drive Itmightseemsurprisingthatveryhighaccuracycanbeachieved - evenwithasingleclassifier. Infact,thisisbecausemostcontexts Classifier havebiaseddistribution(e.g. thestationarystateistrue(> 90%) DBN-Gain ofthetime),hencehighaccuracycanbeachievedbyjustusingthe (b) modelinexpectationwithnoclassifiersexecuting! Butthisgives poorrecall,precisionandconsequentlytheF-measureislow,and Figure5: AccuracyandF-Measureforvariousactivityclasses morecontextinputisrequiredtoimprovethesemetrics. Figure4 forClassifieronlyandgainprovidedbyClassifier+DBNmech- showsthatDBNachieves75%F-measureand94%accuracyusing anismondatafrom3users.ResultsshowthatusingtheDBNis only4classifiersresultinginatleast33%costreduction. beneficialinmostcases,exceptinthecaseofrarelyoccurring Figure 4 also compares the use of a dynamic plan where the contexts. queryplanisre-evaluatedevery20minutesvsastaticplanwhere thequeryplanisevaluatedonce.Thedynamicplanprovidesbetter accuracyandF-measurethanstaticplanacrosstheboard. Thisis correcttheoutputsprovidedbytheclassifiersandhence,improve becausethedynamicplanselectsappropriatesetofclassifiersusing recall and precision. Since the DBN performs corrections using valueofinformationprovidedbytheobservedclassifieroutputsin thetemporalconsistency,therecallvalueforacertaincontextmay real-time whereas static plan selects the set of classifiers without dropifthecontextisrarelyseenandlastsforaveryshortduration. consideringthereal-timeclassifieroutputs. Inourtraces,weobservedthatthewalkingactivitywasinfrequent. 
Insuchacase,iftheclassifierdoesnotgenerateoutputwithhigh Classifiers>Queries. Wenowlookatthebenefitsofusinga confidencetheDBNcansmoothoutthewalkingcontextresulting DBNwhenthenumberofclassifersthatexecutearegreaterthanthe inreducedrecallbuthigherprecision. Asaresult,F-measuredoes numberofqueries. WeusetheActivitydatasetinthisstudy,since not see significant gain. In contrast, the driving context though we have raw data for activity inferences, which are typically the rarelyseenascomparedtostationarylastslongerandhence,sees contextswithhighestinaccuracy. Welookattheperformancefor improvementinrecallandhence,improvesF-measure. thethreeactivitycontexts(walking,drivingandstationary),which 6.3 EvaluatingPrivacy havehighestuncertainty. Wecomparetwomechanisms: a)using justtheactivityrecognitionclassifiers,andb)usingallthecontext In this section, we evaluate two aspects of how CQue can be classifierswithDBN.Weshowresultsforthreerepresentativeusers usefulindealingwithprivacybreachesduetocorrelationsacross inthedataset. contexts. Figure5(a)showsthataccuracyisworstwhenweuseonlyclas- Location-SocialRelations: Inourfirstexperiment,welookata sifiersalone,andtheuseofDBNimprovesimproveaccuracysig- twosensorscenarioandunderstandthecorrelationsbetweenthese nificantly. TheimprovementinaccuracybyusingDBNforusers contextsacrossdifferentdatasets. Recallthatlocationisobtained U1andU2isabove24%forbothstationaryanddrivingcontexts. through GPS and social interactions through bluetooth. We use Improvementsaresmallforwalkingactivity(2%)sinceDBNnei- CQuetounderstandhowaprivacybreachcanoccurforaprivacy- ther observed any context strongly related with walking nor the sensitivecontextthroughindirectobservations. Here, weassume temporal consistency. This was because walking was a rare con- astrongadversarythathasaccesstothepersonalizedDBNfora textandrarelylastedforlongerthanafewminutes. user. Potentially, this is possible if various apps on a phone that Figure5(b)showsF-measureforvariousclasses. 
Weseethat observedifferentcontextsdecidetocolludeandcombinetheirdata thismetricgenerallyimprovessincethemainroleoftheDBNisto togenerateacompleteDBN. 91 (a) PrivacyleakageinAllLocationContexts 1 Social | Location-RM 0.9 Location | Social-RM Observation Social | Location-AD Metric None Allsocial 0.8 Location | Social-AD contexts 0.7 Accuracy(RM) 73.9±3.0 74.81±2.73 0.6 Accuracy(AD) 61.11±5.55 88.9±6.26 F D 0.5 C (b) PrivacyleakageinAllSocialContexts 0.4 Observation 0.3 Metric None Alllocation 0.2 contexts 0.1 Accuracy(RM) 73.86±2.9 74.18±2.82 0 Accuracy(AD) 64.01±15.29 87.3±5.94 0 0.2 0.4 0.6 0.8 1 Privacy Score Table 1: Cross-context privacy leakage in location and so- Figure6: Cumulativedistributionofprivacyscoresforcross- cialcontextsfori)RealityMiningdataset(RM),andii)Activity contextpairsini)RealityMiningdataset(RM),andii)Activity dataset(AD).Foractivitydataset,weseesignificantincreasein dataset(AD). From the distribution of privacy scores, we can accuracy for all the location contexts when all the social con- concludethatthereissignificantcross-contextprivacyleakage textsareobservedandviceversa. Forrealityminingdataset, inActivitydatasetwhereasnosuchleakageisobservedinRe- accuraciesdonotchangewithcross-contextobservations. alityMiningdataset. Thisemphasizestheimportanceofper- sonalizedDBNtoevaluateprivacyforauser. 1 0.8 Givensuchanadversary,aprivacy-sensitivecontextcanbein- d e ferredbyobservingoneormoreothercontexts. Usingourprivacy as 0.6 e score,wecanrankcontextsinthedecreasingorderofitscapabil- el R itytoinferprivacy-sensitivecontext. Inthisexperiment,welook a 0.4 twoscenarios: a)allcaseswhereweuseonelocationcontextasa Dat privatecontext(p)andasocialcontextoasobservation,andb)the 0.2 reversescenariowhereasocialcontextareprivateandlocationis observed. We evaluate privacy score D(p|o) for each pair (cid:5)p,o(cid:6) 0 ofcontexts. Figure6givesdistributionofprivacyscoresforthese 0.4 0.2 0.1 0.05 scenariosfortheRealityMiningandActivityDatasets. 
δ Theresultsareinteresting—weseethatthereisconsiderably highercorrelationacrosslocationandsocialcontextinthecaseof Figure 7: Fraction of data that can be released to prevent the activity dataset than in the reality mining data. This is also change in adversarial confidence greater than δ for a private reflected in Table 1, which shows the accuracy of inferring loca- contextathome. tion/social only based on the prior distribution of these contexts v.s.observingtheothercontext.Theresultsshowthatthereisonly releasedasaresultofthissuppressionforvariousvaluesofδwhere asmallchangeinthecaseoftheRMdataset,buttheaccuracyin- wechoseathomeastheprivatecontextandrestofthecontextsas creasesbymorethan25%inthecaseoftheADdataset. Inother queries.Asmallervalueofδresultsinmuchtighterprivacycontrol words,anadversarywouldbeabletoinferanindividual’slocation butreleasestoofewcontexts. withsubstantialaccuracyiftheyonlyhadaccesstothebluetooth Table2showsthefractionofdatasuppressedincontextofeach informationontheADdataset. Ourexplanationfortheseresults category: social,location,andactivityforvalueofthresholdδ ∈ isthatbluetoothusageisfarmoreprevalentinrecenttimes,there- {0.1,0.4}. In this case, we choose at friend’s place as a private forethecorrelationshaveincreased. Overall,ourresultsshowthat context available in Activity dataset. We can see that CQue sup- CQuecanbeusedtoprovideintuitionaboutthecorrelationsacross presses in an intelligent manner where it suppresses highly cor- contexts, therebyenablingmoreinformeddecisionaboutwhatto relatedsocialcontextmorefrequentlythanthelesscorrelatedlo- expose. cation or activity contexts. If we were to use a mechanism like SuppressionPolicyforPrivacy: Inoursecondexperiment,we MaskIT [8], it would suppress every context at the same level. lookathowCQuecanbeusedtoimplementareal-timesuppression SinceCQuereleasesmorecontexts, itresultsinhigherutilityfor policy for protecting privacy. 
Our policy is intended to be repre- theapplicationsandtheuserswhousetheseapplications. sentativeandillustratehowtheDBNmaybeused,andwemakeno Inconjunction,theseexperimentsdemonstratethepotentialuse formalclaimsregardingitsprivacyproperties. Inourexperiment, ofCQuebothforunderstandingprivacyimplicationsofreleasing theusercandefineapolicythatpermitsreleasingamaximalsetof a context, as well as to implement privacy policies that leverage contextobservationssuchthatthechangeinadversarialconfidence relationsacrosscontexts. forprivatecontextuponobservingthissetislessthanthresholdδ. 6.4 DBNwithPersonalizedClassifiers Asaresult,someofthenon-privatecontextscanbesuppressedin real-timetocontrolthechangeinadversarialconfidence. Thus,it While all our previous results have assumed classifiers to be resultsinlowerutilityfortheapplicationsseekingcontexts.Figure black-boxcodethatcannotbemodified, wenowlookatthecase 7givesanexampleshowingthefractionofcontextdatathatcanbe wherewecanpersonalizetheclassifiersusingthehumaninputob- 92
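
As an illustration of the privacy score from Section 4.2, D(Q|C') = 1 − (H(Q) − H(Q|C'))/H(Q), the following minimal Python sketch computes the score from a joint probability table. The function names and the dictionary layout of the joint distribution are assumptions made for this example; CQue itself would obtain these quantities from the DBN.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy (in bits) of a distribution given as {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def privacy_score(joint):
    """Return D(Q|C) = 1 - (H(Q) - H(Q|C)) / H(Q).

    `joint` maps (q, c) pairs to probabilities. This table layout is an
    illustrative assumption, not the representation used in CQue.
    """
    p_q = defaultdict(float)
    p_c = defaultdict(float)
    for (q, c), p in joint.items():
        p_q[q] += p
        p_c[c] += p

    h_q = entropy(p_q)
    if h_q == 0:
        raise ValueError("H(Q) is zero, so the score is undefined")

    # Chain rule: H(Q|C) = H(Q, C) - H(C)
    h_q_given_c = entropy(joint) - entropy(p_c)
    return 1.0 - (h_q - h_q_given_c) / h_q


# Q independent of C: observing C reveals nothing about Q.
independent = {(q, c): 0.25 for q in (0, 1) for c in (0, 1)}
# Q fully determined by C: observing C reveals Q completely.
determined = {(0, 0): 0.5, (1, 1): 0.5}
```

Here `privacy_score(independent)` evaluates to 1 and `privacy_score(determined)` to 0, matching the two boundary cases described for the metric. Ranking contexts for the second query type amounts to sorting them by this score in increasing order.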
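
The greedy loop of Algorithm 1 (ComputeOptimalSet) can be sketched in a few lines of Python. In this sketch the utility function is passed as a closure that already has the query set, the previous evidence and the classifier outputs bound to it. That signature, the `INFORMS` table and the coverage-style utility are hypothetical stand-ins chosen for illustration and are not the value-of-information utility used by CQue.

```python
def compute_optimal_set(k, candidates, utility):
    """Greedily pick up to k observations that maximise `utility`.

    `utility(selected)` stands in for F(Q, E ∪ {o}, E_{t-1}, v); binding the
    other arguments in a closure is an assumption made for brevity.
    """
    selected = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        # Choose the observation with the largest utility when added to E.
        best = max(remaining, key=lambda o: utility(selected + [o]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical example: which query contexts each classifier is informative about.
INFORMS = {
    "gps": {"at_home", "at_office", "on_street"},
    "accelerometer": {"walking", "driving"},
    "bluetooth": {"with_friends", "at_home"},
    "wifi": {"at_home", "at_office"},
}


def coverage(selected):
    """Toy utility: number of distinct query contexts the selection informs."""
    covered = set()
    for name in selected:
        covered |= INFORMS[name]
    return len(covered)


chosen = compute_optimal_set(2, INFORMS, coverage)
```

With a budget of two unit-cost classifiers, the sketch selects `gps` first and then `accelerometer`, because together they cover the most query contexts. The toy utility is monotone and submodular, the setting in which greedy selection under a cardinality budget is known to be within a factor of (1 − 1/e) of the optimum.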
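
The suppression policy of Section 4.2 (drop context values until the adversary's belief about the private context moves by at most δ from the prior) can be sketched as follows. Two assumptions are made loudly here: the posterior is computed with a small naive-Bayes model that stands in for inference on the DBN, and "the context that causes the maximum change" is interpreted as the context whose removal leaves the smallest remaining change. All names and probabilities are hypothetical.

```python
def posterior(prior, likelihoods, observed):
    """P(private = True | observed) under a naive-Bayes stand-in for the DBN.

    `likelihoods[c]` is (P(c = True | private), P(c = True | not private)).
    `observed` maps a context name to its observed boolean value.
    """
    p_true, p_false = prior, 1.0 - prior
    for name, value in observed.items():
        l_true, l_false = likelihoods[name]
        p_true *= l_true if value else 1.0 - l_true
        p_false *= l_false if value else 1.0 - l_false
    return p_true / (p_true + p_false)


def suppress(prior, likelihoods, requested, delta):
    """Return the subset of `requested` that may be released under threshold delta."""
    released = dict(requested)

    def change_without(name):
        # Belief change that would remain if `name` were suppressed.
        rest = {k: v for k, v in released.items() if k != name}
        return abs(posterior(prior, likelihoods, rest) - prior)

    while released and abs(posterior(prior, likelihoods, released) - prior) > delta:
        # Suppress the context responsible for the largest share of the change.
        del released[min(released, key=change_without)]
    return released


# Hypothetical numbers: private context "at home" with prior 0.3.
LIKELIHOODS = {
    "with_family": (0.9, 0.1),
    "stationary": (0.8, 0.6),
    "walking": (0.1, 0.2),
}
REQUESTED = {"with_family": True, "stationary": True, "walking": False}

medium = suppress(0.3, LIKELIHOODS, REQUESTED, delta=0.25)
high = suppress(0.3, LIKELIHOODS, REQUESTED, delta=0.05)
```

In this example the medium setting (δ = 0.25) suppresses only `with_family`, the context most correlated with the private one, while the high setting (δ = 0.05) additionally suppresses `stationary`. This mirrors the tradeoff described in the text: a smaller δ gives tighter control but releases fewer contexts.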
