Contextual Consent: Ethical Mining of Social Media for Health Research Chris Norval Tristan Henderson SchoolofComputerScience SchoolofComputerScience UniversityofStAndrews UniversityofStAndrews StAndrews,Fife,UK StAndrews,Fife,UK [email protected] [email protected] 7 1 ABSTRACT college students [24]. Such findings can help researchers and 0 practitionersidentifyimportantmarkersinhealthandinsociety. 2 Socialmediaarearichsourceofinsightfordatamininganduser- Despite these advantages, those who created the social data in centredresearch,butthequestionofconsentariseswhenstudying n questionarenotalwaysinformedabout theprocess. Whilesome such data without the express knowledge of the creator. Case a researchers outline clear methods of obtaining informed consent studiesthatminesocialdatafromusersofonlineservicessuchas J when conducting research on social media data (e.g. [24]), only Facebook and Twitter are becoming increasingly common. This 6 around5%oftheliteraturethatusesuchdatadescribetheirconsent has led to calls for an open discussion into how researchers can 2 procedures[12]. bestusethesevastresourcestomakeinnovativefindingswhilestill Some have argued that social data posted online are freely respecting fundamental ethical principles. In this position paper ] availableforresearch[9]; however, othersarguethatjustbecause C wehighlightsomekeyconsiderationsforthistopicandarguethat personal information is made available online does not mean it the conditions of informed consent are often not being met, and H is appropriate to capture and analyse [2, 4, 11, 35]. Conway that using social media data that some deem free to access and . and O’Connor argue that the potential challenge to privacy s analysemayresultinundesirableconsequences,particularlywithin occurs “not in the reading or accessing of individual materials c thedomainofhealthresearchandothersensitivetopics. Weposit [ that successful exploitation of online personal data, particularly (publicly available as they are), but rather in the processing and for health and other sensitive research, requires new and usable disseminationofthosematerialsinawayunintended"[4]. 1 Whileitisimportantforsuchresearchusingsocialmediadata methodsofobtainingconsentfromtheuser. v tocontinue,theinformedconsentofallparticipantsisanimportant 5 milestonetostrivefor. Firstly,consentdecisionsarestronglytied 6 Keywords to the expected audience [19, 22]. Research has suggested that 7 ethics;privacy;health;consent;socialmedia users have differing online behaviour based on their perceived 7 audience, including more self-censorship [5, 32], and sharing 0 1. INTRODUCTION smaller proportions of both positive and negative emotions [3] . 1 whentheperceivedaudienceismoresparse,andlesscontrolled. Socialmediaservices,suchasFacebookandTwitter,areanear 0 Secondly, research into privacy settings on social media have ubiquitous part of people’s lives. Facebook, for example, boasts 7 suggested that most users may be significantly oversharing what over 1.7 billion monthly active users [7]. As a result we are 1 isintended[18,20].Whenanalysingsocialmediadataonline,data increasingly sharing more and more data about our lives online: v: 68%ofUK-basedonlineadultslookatsocialmediasitesorapps canbetakenfromindividualswhoareunawarethatitisaccessible, i each week, 35% upload photos or videos, 28% share links to andprivacyviolationsmaybeoccurring.Thelikelihoodofprivacy X violationsincreaseswiththenumberofpeopleinvolved,withmany websites or online articles, and 24% contribute comments to a r websiteorblog[29]. suchstudiesindiscriminatelyscrapingthedataofhugenumbersof a individuals[2],ornotevenreportinghowtheygatherdata[12]. The vast quantities of personal data shared through such And finally, participants may find it intrusive to discover that platforms can offer new insights into important areas of theirdataisusedinawaywhichisoutsidetheboundariesofwhat research [2]. Academics from various fields have capitalised wasoriginallyexpected[4,19]. AnexampleofthisisSamaritans on these data, leading to the comprehension of complex social Radar, a well-intentioned Twitter app designed to monitor the behaviours and trends in society. Within a health context tweetsoftheuser’scontactsforpotentiallysuicidalmessages[16]. alone, such social data has been found to act as a predictor This app received criticism from the Twitter community over for depression [25, 30], suicide risk factors [6, 13], mood privacyconcerns,andwasshutdowndaysafterlaunching. changes [15], flu outbreaks [17] and problem drinking in US Therefore,withallpreviouspointsconsidered,participantsmay besharing moresensitiveandemotional content withafarwider audience than intended, they may not have otherwise provided consentfortheirdatatobestudied,andwouldlikelyfindaviolation of this to be intrusive. This suggests a disconnect between the researcher andtheparticipant; asituationinwhichtheautonomy of the participant should be respected. Safer and more explicit methodsofconsentarethereforenecessarytobestprotecttheneeds ofpotentialparticipants,whileensuringthatwecanmakethebest useofthevastquantitiesofsocialmediadataforhealthresearch. neededtomeettheethicalandlegalrequirementsforconsentwhile Inthispaper, wetakethepositionthatnew formsof obtaining accomodatingthedynamicnatureofmodernresearch[14]. meaningful consent are necessary for successful online health Some of these concerns have been echoed by Morrison et research.Weoutlinecurrentmethodsofobtainingconsent,discuss al., who found that few participants were aware of participating new potential methods that seek to mitigate these problems, and in an academic trial, as specified in the Terms and Conditions outlinesomeofthechallengesthatneedtobeovercomeforsucha document of a mobile application. Further means of informing methodtobedeveloped. usersofresearchers’intentionstocollectandanalyseuserdataare necessarytobehaveinanethicalmanner[26]. Suchfindingscall intoquestionthesuitabilityofbroad,one-offmethodsofcollecting 2. BACKGROUND consent when dealing with online research. As Steinsbekk et Informedconsent,adeclarationthattheparticipantunderstands al. articulate, at the core of the debate is “what it means to the consequences of participating in the study, is widely seen be ‘adequately informed’ and whether giving consent based on as fundamental to conducting research with human participants broaderpremisesisvalidornot"[34]. [21]. From the point of view of the researcher, it is seen as fulfilling ethical responsibilities with regards to the protection of 2.3 Dynamicconsent data, privacy, and autonomy of the participant [26]. While there Researchershaveinvestigateddifferent mechanisms forcoping appears to be broad agreement when it comes to interventional with the above issues, such as using dynamic consent [14, 27, research,thesubjectofwhetherinformedconsentisnecessaryina 34]. Dynamic consent is a framework for allowing participants studyofonlinecommunicationsisthesubjectofdebate[9,33].As togrant access totheir personal datatoresearchersinaway that aresult,newmodelsforconsenthavebeenproposed[11,14]. theycancontrol. Forexample,aparticipantmaydecidetorevoke access to their data, or customise their opt-in/opt-out preferences 2.1 User attitudes and theneed forconsent forparticipatinginresearch. Studies into user attitudes are often subject to selection bias, Kayeetal. arguethat,forparticipants,someoftheadvantages given that participants have opted in and are therefore not of dynamic consent include the ability to easily consent to new necessarilyrepresentativeofthewiderpopulation. Despitethis,it projects, alterconsentpreferencesinrealtime,findouthowtheir isworthconsultingtheviewpointsofthosewhomaybepotentially datahasbeenused,andtosetpreferencesabouthowtheyarekept affected by such research, and how they approach the trade-off informed. Researchersaresaidtogainmoreengagedparticipants, between societal benefit and personal intrusion. Mikal et al. streamlinedrecruitment,improvedpublictrust,andtheknowledge investigateduserattitudestowardtheanalysisofsocialmediadata thattheirresearchconformstohighlegalstandards[14]. forresearch,discovering“equivocalfindings"intheliterature[23]. While researchers have outlined the potential benefits of Mikal et al. asked participants about their expectations of dynamic consent, Steinsbekk et al. dispute these, arguing that privacy with regard to monitoring depression at the population a convincing case has not been made. Their criticisms of level, finding that most were accepting of it [23]. This was, dynamicconsentincludemorefrequent(andthereforemoretrivial) however, largely conditional on the analysis being conducted in requests for re-consent, and an increased risk of the relationship an aggregated and anonymised way, with no way of targeting a between researcher and participant breaking down due to unmet particular individual. This is echoed by Conway and O’Connor, expectations and a lack of reciprocity [34]. The authors argue who argue the need for a differentiation between automatic that “broad consent combined with competent ethics review and identificationattheindividual levelandatthepopulationlevelin an active information strategy is a more sustainable solution". ordertoprotecttheprivacyofthoseinvolved[4]. Hutton and Henderson have also raised issues with dynamic Mikal et al. found that “many respondents felt as though a consent, arguing that frequent requests for consent can lead to a failuretoprotectonlinedataconstitutedconsenttohavethatdata significantly greater burden on the participant [11], which could systematizedandanalyzed",however,theauthorsgoontoconsider frustrateparticipantsandleadthemtowithdrawfromtheresearch. whatkindsofindividualswouldbelesslikelytoprotecttheirdata [23].SuchamodelwouldleavethosewithlimitedInternetliteracy 2.4 Contextual integrity skills or those mistakenly oversharing their social data at risk of Nissenbaum’s model of contextual integrity is a theoretical unwittinglyprovidingimpliedconsent. framework that attempts to prescribe “specific restrictions on collection, use, and dissemination of information about people” 2.2 Unsuitability ofinformedconsent dependingonpresidingnormsofinformationappropriatenessand Research has questioned thesuitability of informed consent as distribution[28].Contextualintegrityfocusesonwhetheraflowof it currentlystands [11, 14, 19, 26, 27]. Luger and Rodden argue informationisappropriatewithinaparticularcontext,orwhethera that consent, as currently conceived, cannot hope to meet the violationofprivacyhasoccurred[31]. Ithaspreviouslybeenused challenges posed by ubiquitous computing systems, and tensions to explore privacy implications of social networking sites [1, 10, existbetweenwhatisexpectedofconsentandwhatisexpectedof 11,31,35]. a ubiquitous computer system. They describe consent as having HuttonandHendersonhaveusedcontextualintegritytoexplore been“stretchedthintothepointofbreaking"[19]. a new method of obtaining consent for social media research, Kayeetal. arguethatcurrentproposals fortheEuropeanData described as ‘contextual integrity consent’ [11]. This middle- Protection Regulations1 require explicit consent, questioning the groundapproachseekstheflexibilityofdynamicconsent,allowing legality of broad consent methods, and that static systems for userstochoosewhatdataisaccessibleandwhen, whilereducing obtainingconsentarenolongerfitforpurposeduetothechanging the burden of such data management. It works by inferring natureofresearchandtechnology. Newapproaches aretherefore context-specific norms; a ruleset determining the appropriate flow of information. Individuals are asked explicitly about their 1since the writingof that paper, the GDPR has been ratified and willingnesstoshareunlesstheyclearlyconformtoordeviatefrom willcomeintoeffectin2018. such a norm. In situations where a violation is found to have occurred,theframeworkrejectsthepracticeinquestionandfurther the concerns and interests of all parties. Due to the potential for actionscanbetaken,suchasre-requestingconsentfromtheuser. sensitive topics and the extent in which a sharing violation may These norms depend on a number of factors, such as what the impact individuals, it is important that steps are put in place to dataare,towhomthedataareflowing,andforwhatpurposethey greatlyreducerisk. Suchapredictivemodelwilllikelyserveasa are being requested. For example, McNeilly et al. found that recommender systemuntil thetechnique can beproven, therisks participants of a location sharing study were more likely to give are understood, and confidence in such a method is developed. the researchers access to their location data than a health study, Working closely with all relevant stakeholders will help us to and in both studies, photos were much less likely to be shared documentandunderstandtherisksinvolvedandthechallengesthat than ‘liked’ pages [22]. These norms can change over time, and willneedtobeovercome. individuals, organisations, and sections of society can each have theirownexpectationsofwhatisappropriate. 4. CHALLENGES Oneofthelargesttechnicalchallengeswefaceisthecontextual 3. TOWARDS CONTEXTUAL CONSENT nature of consent over items of data; One shared status update FORHEALTH or photo may be seen as appropriate for researchers to access and utilise, whereas another may not be, making it difficult to Much of the work in dynamic consent has studied mobile or designbroadrulesforwhatisacceptableandwhatisnot[19,20]. socialnetworkapplications.Anopenquestionishowtoapplythese Furthermore, the given rules for any one individual may change techniquestominingonlinehealthdata. Ouraimistoinvestigate overtime,associalrelationshipsandopinionsevolve[18]. contextualconsentforhealth;canwecapture,interpret,andacton Participants may have concerns about sharing sensitive data context-specific norms within a healthcare domain? Further, can with any autonomous system, given that it is an extension of the weimproveoncontextualconsentbyblendingcontextualintegrity researcher. Thisraisesnewchallengesabouthowsuchaclassifier with machine learning techniques? Finally, such techniques are couldbeused,oreventrained,ifsomeparticipantsarenotwilling only useful if available to practitioners, and feedback from such to share certain data with researchers. Approaching this project, deploymentswillinformfurtherdevelopmentoftoolsandmodels. wemuststrikeabalancebetweenaccuracyandappropriateness,as By predicting when social media users find it appropriate to participantsmayormaynotbecomfortablewithhavingtheactual share different types of data with different stakeholders, such as contentoftheirsocialdatabeingmined,however,itisunclearhow researchers and clinicians, theconsent process will, firstly, better effectivepredictiontechniquescanbeusingmetadataalone. reflect the context in which data were created, and secondly, Asignificantethicalconsiderationisthepotentialforerroneous respect users’ preferences about which data should be made and unacceptable norm violations taking place as a result of this availableandwithwhom. Forinstance,someoneseekingsupport research. In short, the sharing of a particular bit of data against from their peers because they are anxious about an upcoming the will of the owner may erode trust and call into question the medical procedure might not want this shared with medical entireproject. Suchasystemmaynot begivenasecond chance. researchers, while others may want reports about side-effects of Ourinterimsolutiontothisistofocusonasystemwhichsuggests theirmedicationtobeviewedbyclinicians. appropriate sharing levels to the user for approval, rather than We are focusing specifically within a health context for three explicitlysharingthedataautonomously. Whilethisincreasesthe reasons. Firstly, we believe that health data is an area where burden of sharing content, it would stillbelessburdensome than presiding norms exist. Secondly, getting these expectations of asking theuser to specifically choose the acceptable audience on appropriate data flow correct is important due to the potential eachbitofcontentshared,suchaswithgroups,lists,orcircles.We sensitivity of the data. Finally, given the vast number of studies believethatthislimitationexistsinanysystemthataimstoreduce usingsocialmediadatawithinamedicaldomain, webelievethis theburdenofconsentbyautomatingtheuser’sdecision. workistimely,andthereisanopportunitytoprovideguidanceand Thecollectionofdataforsuchresearchwillalsolikelyproveto tools for the appropriate handling of social media data in health beabarriertoovercome. Collectinganydatainanethicallyaware researchtoacademicsandpractitionersalike. way will introduce selection bias. Obtaining large sample sizes We are currently planning two studies to further this mayalsobedifficult.Webelievethatthischallengeisalsoinherent investigation. The first will involve collecting a large corpus of in such research, although there may be ways to minimise the data to use in a training and model evaluation phase. We will impactwithoutcompromisingtheprincipleofinformedconsent. workwithparticipantstodeterminewhatthemainpredictorsareof appropriatedataflowtodifferentmedicalstakeholders. Withthis 5. CONCLUSION data, wewill then derive machine-learning models for predicting consentwiththesedifferentstakeholders. Finally,wewillevaluate Peopleareincreasinglysharingmoreandmoreinformationvia ourmodel’seffectivenessinafollow-upstudywithparticipantsin social media, and as a result, such platforms are seen as rich asocialmediamedicalsupportcommunity. datasourcesforresearchersfromvariousfieldsandbackgrounds. Gomer et al. propose a similar semi-autonomous method This can lead to interesting research and discoveries, however, it of obtaining consent. Their proposed system trains a consent alsoraisessignificant questions over informed consent intheage agent, usesthismodeltoreceiveandacceptorrejectrequestsfor of social media and data mining. We argue that the successful participation, and allows users to review past decisions with the exploitationofonlinepersonalhealthdatarequiresnewandusable resultsre-trainingthemodel[8]. Thismethoddiffersfromoursas methods of obtaining consent from the content creators. By ittrainsandbuildsamodelonaperson-by-personbasis,ratherthan deriving a new and usable method of obtaining consent in such buildingonthecollectivenormswhichshapecontextualintegrity. circumstances, we hope that researchers can continue safe in the Assuch,itwouldrequireusertrainingbeforeitcouldbeused. At knowledgethattheresearchistransparentandtheparticipantsare thetimeofwriting,nofollow-upworkhassincebeenpublished. informedabouthowtheirdataisbeingusedandwhy. Throughout this research, we aim to engage with clinicians, We are currently planning studies to explore the contextual members of support communities and researchers to understand normsofdatasharingwithinthecontextofhealthresearchinorder to investigate whether or not this process can be automated. To [17] J.LiandC.Cardie.EarlyStageInfluenzaDetectionfrom developanewmethodofobtainingconsent,weneedabroadrange Twitter.arXivpreprintarXiv:1309.7340,2013. ofinputintheareasofsocialmedia,privacy, ethics,andlaw. We [18] Y.Liu,K.P.Gummadi,B.Krishnamurthy,andA.Mislove. wouldliketoinviteexperts,researchersandpractitionersinthese AnalyzingFacebookPrivacySettings:UserExpectationsvs. fields, as well as users of such services, to provide thoughts and Reality.InProc.IMC2011,pages61–70.ACM,2011. contributetodiscussionsabouthowsuchmethodsmaywork. [19] E.LugerandT.Rodden.Aninformedviewonconsentfor UbiComp.InProc.UbiComp2013,pages529–538.ACM, Acknowledgements 2013. [20] M.Madejski,M.Johnson,andS.M.Bellovin.Astudyof ThisworkwassupportedbytheWellcomeTrust[UNS19427]. privacysettingserrorsinanonlinesocialnetwork.InProc. SESOC2012,pages340–345.IEEE,2012. 6. REFERENCES [21] N.C.MansonandO.O’Neill.RethinkingInformedConsent inBioethics.CambridgeUniversityPress,2007. [1] L.Barkhuus.TheMismeasurementofPrivacy:Using [22] S.McNeilly,L.Hutton,andT.Henderson.Understanding ContextualIntegritytoReconsiderPrivacyinHCI.InProc. ethicalconcernsinsocialmediaprivacystudies.InProc. CHI2012,pages367–376.ACM,2012. CSCW2013WorkshoponMeasuringNetworkedSocial [2] d.boydandK.Crawford.Criticalquestionsforbigdata. Privacy:Qualitative&QuantitativeApproaches,2013. Info.,Comm.&Soc.,15(5):662–679,2012. [23] J.Mikal,S.Hurst,andM.Conway.Ethicalissuesinusing [3] M.BurkeandM.Develin.Oncemorewithfeeling: Twitterforpopulation-leveldepressionmonitoring:a SupportiveresponsestosocialsharingonFacebook.InProc. qualitativestudy.BMCMedicalEthics,17(1),2016. CSCW2016,pages1462–1474.ACM,2016. [24] M.A.Moreno.AssociationsBetweenDisplayedAlcohol [4] M.ConwayandD.O’Connor.Socialmedia,bigdata,and ReferencesonFacebookandProblemDrinkingAmong mentalhealth:currentadvancesandethicalimplications. CollegeStudents.ArchivesofPediatrics&Adolescent CurrentOpinioninPsychology,9:77–82,2016. Medicine,166(2):157–163,2012. [5] S.DasandA.D.I.Kramer.Self-CensorshiponFacebook.In [25] M.A.Moreno,L.A.Jelenchick,K.G.Egan,E.Cox, Proc.ICWSM2013,2013. H.Young,K.E.Gannon,andT.Becker.Feelingbadon [6] M.DeChoudhury,E.Kiciman,M.Dredze,G.Coppersmith, Facebook:depressiondisclosuresbycollegestudentsona andM.Kumar.DiscoveringShiftstoSuicidalIdeationfrom socialnetworkingsite.Depressionandanxiety, MentalHealthContentinSocialMedia.InProc.CHI2016, 28(6):447–455,2011. pages2098–2110.ACM,2016. [26] A.Morrison,D.McMillan,andM.Chalmers.Improving [7] CompanyInfo|FacebookNewsroom.http://newsroom.fb. consentinlargescalemobileHCIthroughpersonalised com/company-info.Accessed:2016-11-08. representationsofdata.InProc.NordiCHI2014,pages [8] R.Gomer,M.C.Schraefel,andE.Gerding.Consenting 471–480.ACM,2014. agents:Semi-autonomousinteractionsforubiquitous [27] C.Munteanu,H.Molyneaux,W.Moncur,M.Romero, consent.InProc.UbiComp2014Adjunct,pages653–658. S.O’Donnell,andJ.Vines.Situationalethics:Re-thinking ACM,2014. approachestoformalethicsrequirementsfor [9] J.M.HudsonandA.Bruckman. “GoAway":Participant human-computerinteraction.InProc.CHI2015,pages ObjectionstoBeingStudiedandtheEthicsofChatroom 105–114.ACM,2015. Research.TheInformationSociety,20(2):127–139,2004. [28] H.F.Nissenbaum.Privacyascontextualintegrity. [10] G.Hull,H.R.Lipford,andC.Latulipe.Contextualgaps: WashingtonLawReview,79(1):119–157,2004. Privacyissuesonfacebook.EthicsandInf.Technol., [29] Ofcom.Adults’mediauseandattitudesReport2016,2016. 13(4):289–302,2011. [30] A.G.ReeceandC.M.Danforth.Instagramphotosreveal [11] L.HuttonandT.Henderson.“Ididn’tsignupforthis!”: predictivemarkersofdepression.CoRR,abs/1608.03282, Informedconsentinsocialnetworkresearch.InProc. 2016. ICWSM2015,pages178–187,2015. [31] R.SarandY.Al-Saggaf.Contextualintegrity’sdecision [12] L.HuttonandT.Henderson.Towardsreproducibilityin heuristicandthetrackingbysocialnetworksites.Ethicsand onlinesocialnetworkresearch.IEEETransactionson InformationTechnology,2013. EmergingTopicsinComputing,page1,2015. [32] M.Sleeper,R.Balebako,S.Das,A.L.McConahy,J.Wiese, [13] J.Jashinsky,S.H.Burton,C.L.Hanson,J.West, andL.F.Cranor.Thepostthatwasn’t:Exploring C.Giraud-Carrier,M.D.Barnes,andT.Argyle.Tracking self-censorshiponFacebook.InProc.CSCW2013,pages suicideriskfactorsthroughTwitterintheUS.Crisis, 793–802.ACM,2013. 35(1):51–59,2014. [33] L.Solberg.DataminingonFacebook:Afreespacefor [14] J.Kaye,E.A.Whitley,D.Lund,M.Morrison,H.Teare,and researchersoranIRBnightmare?UniversityofIllinois K.Melham.Dynamicconsent: apatientinterfacefor JournalofLaw,Technology&Policy,2010(2),2010. twenty-firstcenturyresearchnetworks.EuropeanJournalof [34] K.S.Steinsbekk,B.KareMyskja,andB.Solberg.Broad HumanGenetics,23(2):141–146,2014. consentversusdynamicconsentinbiobankresearch:Is [15] J.A.Lee,C.Efstratiou,andL.Bai.OSNMoodTracking: passiveparticipationanethicalproblem?;.EuropeanJournal ExploringtheUseofOnlineSocialNetworkActivityAsan ofHumanGenetics,21(9):897–902,2013. IndicatorofMoodChanges.InProc.UbiComp2016,pages [35] M.Zimmer.“Butthedataisalreadypublic”:ontheethicsof 1171–1179.ACM,2016. researchinFacebook.EthicsandInformationTechnology, [16] N.Lee.Troubleontheradar.TheLancet,384(9958):1917, 12(4):313–325,2010. 2014.