Table Of ContentUser-Generated Free-Form Gestures for Authentication:
Security and Memorability
Michael Sherman†, Gradeigh Clark†, Yulong Yang†, Shridatt Sugrim†, Arttu Modig∗,
Janne Lindqvist†, Antti Oulasvirta‡, Teemu Roos∗
†Rutgers University, ‡Max-Planck Institute for Informatics, ∗University of Helsinki
Corresponding author: janne@winlab.rutgers.edu
ABSTRACT
4
1 This paper studies the security and memorability of free-
0 formmultitouchgesturesformobileauthentication.Towards
2 this end, we collected a dataset with a generate-test-retest
n paradigmwhereparticipants(N=63)generatedfree-formges-
a tures, repeated them, and were later retested for memory.
J
Half of the participants decided to generate one-finger ges-
2
tures,andtheotherhalfgeneratedmulti-fingergestures. Al-
though there has been recent work on template-based ges-
]
R tures, there are yet no metrics to analyze security of either
C templateorfree-formgestures. Forexample,entropy-based
. metrics used for text-based passwords are not suitable for
s
c capturing the security and memorability of free-form ges-
[ tures. Hence,wemodifyarecentlyproposedmetricforan-
alyzinginformationcapacityofcontinuousfull-bodymove-
1
v mentsforthispurpose. Ourmetriccomputedestimatedmu-
1 tual information in repeated sets of gestures. Surprisingly,
Figure 1: This paper studies continuous free-form mul-
6 one-fingergestureshadhigheraveragemutualinformation.
titouch gestures as means of authentication on touch-
5
Gestures with many hard angles and turns had the highest
0 screen devices. Authentication on touchscreens is nor-
mutualinformation.Thebest-rememberedgesturesincluded
. mallydonewithagrid-basedmethod. Free-formgesture
1 signaturesandsimpleangularshapes. Wealsoimplemented
0 passwordshavealargerpasswordspaceandarepossibly
a multitouch recognizer to evaluate the practicality of free-
4 less vulnerable to shoulder surfing. We note that there
form gestures in a real authentication system and how they
1 arenovisualcuesforthegestures,thegesturetracesare
: perform against shoulder surfing attacks. We conclude the
v shownonlyaftercreatingthegesture.
paper with strategies for generating secure and memorable
i
X free-form gestures, which present a robust method for mo-
r bileauthentication.
a require less accuracy. Although grid-based gestures better
utilizethecapabilitiesoftouchscreensasinputdevices,they
1. INTRODUCTION
arelimitedasanauthenticationmethod. Forexample,avi-
Smartphones and tablets today are important for secure sualpatterndrawnonagridispronetoattackssuchasshoul-
dailytransactions. Theyarepartofmulti-factorauthentica- dersurfing[37]andsmudgeattacks[1].
tionforenterprises[20],allowustoaccessouremail,make This paper studies free-form multitouch gestures without
1-click payments on Amazon, allow mobile payments [31] visualreference;thatis,gesturesthatallowallfingersdraw
andevenaccesstoourhouses[13].Therefore,itisimportant atrajectoryonablankscreenwithnogridorothertemplate.
toensurethesecurityofmobiledevices. AnexampleofthecreationprocessisdepictedinFigure1,
Recently, mobile devices with touchscreens have made where the gesture traces are shown only after the gesture
gesture-basedauthenticationcommon.Forexample,theAn- wascreated. Thismethodbearspotential,becauseitrelaxes
droidplatformincludesa3x3gridthatisusedasastandard someoftheassumptionsthatmakethegrid-basedmethods
authentication method, which allows users to unlock their vulnerable. In particular, arbitrary shapes can be created.
devicesbyconnectingdotsinthegrid. Comparedwithtext- Moreover,asmorefingerscanbeused,inprinciplemorein-
based passwords, gestures could be performed faster while formation can be expressed. Technically such gestures can
1
bescaleandpositioninvariant,allowingtheusertogesture tended features of the gesture and that of uncontrolled and
on the surface without visually attending the display. Con- unreproduciblefeatures.
sider,forexample,drawingacircleasyourpassword. This Ourresultsshowthatseveralparticipantswereabletocre-
maybebeneficialformobileuserswhoneedtoattendtheir ate secure and memorable gestures without guidance and
environment. Nevertheless, although no visual reference is prior practice. However, many participants used multiple
provided, mnemonic cues referring to shapes and patterns fingersinatrivialway,justrepeatingthesamegesture. Our
canstillbeutilizedforgeneratingthegestures.Finally,when implementation of a practical multitouch recognizer shows
multiple fingers are allowed to move on the surface and no that the free-form gestures can used as a secure authenti-
visual reference is provided, observational attacks may be cation mechanism, and are resistant to shoulder-surfing at-
moredifficult. tacks.
Previous work on gestures as an authentication method Ourcontributionsareasfollows:
has focused on two major directions: one was whether the
1. Report on patterns in user-generated free-form multi-
samegesturecanbecorrectlyrecognizedingeneral[12,28]
touch gestures generated from 63 participants with a
orinaspecificenvironmentsuchashandwritingmotionde-
typicaltablet;
tectedbyKinect-cameras[32], predefinedwhole-bodyges-
turesdetectedfromwirelesssignals[25],andmobiledevice 2. Adaptationofarecentinformationtheoreticmetricfor
movement detected by built-in sensors [27]. Studies of the measuringthesecurityandmemorabilityofgestures;
securityofgestureslookateithertheprotectionofgestures
from specific scenarios [37, 29, 32, 8], or an indirect mea- 3. Adesignandimplementationofapracticalmultitouch
surementofsecurity[14,22]. Further,theseworkshavefo- gesture recognizer to evaluate free-form gestures ap-
cusedonunderstandingperformanceoftemplategesturesre- plicabilityforauthentication;
peatedbyparticipants,notuser-generatedfree-formgestures
4. A preliminary study on a shoulder-surfing attack that
asthepresentwork.
indicatesthepotentialoffree-formgesturesagainstsuch
Our goal is to understand the security of this method by
attacks.
measuringinformationcapacityandstudyingmemorability
in a dataset that allowed users to freely choose the kind of 2. RELATEDWORK
multitouch passwords they deemed best. We conducted a
Inthissection,wediscussrelatedworkonbiometric-rich
controlledexperimentwith63participantsinagenerate-test-
authenticationschemes,graphicalpasswords,andpassword
retestdesign. Atfirst,participantscreatedandrepeatedges-
memorability.
tures (generate), then trying to recall it after a short break
2Dgestureauthenticationschemes.Similartofree-form
(test) and recalling it again after a period of time at least
gestures,biometric-richauthenticationschemesarebasedon
10 days (retest). With this paradigm we were able to ex-
thepremisethatwhenauserperformsagestureonatouch-
aminetheeffectoftimeonhowparticipantsmemorizetheir
screentheywilldothisinsuchawaythatfeaturescanbeex-
gestures. To the best of our knowledge, we are the first to
tractedthatwilluniquelyidentifythemlateron[12,28,39,
presentastudyhowpeopleactuallyrecalltheirgesturesaf-
3]. Similarideashavebeenappliedtorecognizingmotions
teradelay.
withKinect[32]. Specifically,Sae-Baeetal.[28]hasshown
To analyze the security of the gestures, we use a novel
that there is a uniqueness to the way users perform identi-
information metric of mutual information in repeated mul-
calsetoftemplate2Dgesturesbasedonbiometricfeatures
tifinger trajectories. We base our metric on a recent metric
(e.g.handsizeandfingerlength). Franketal.[12]demon-
thatwasusedforaverydifferentpurpose,specificallythees-
stratedthatthewayauserinteractswithasmartphoneforms
timation of throughput (bits/s) in continuous full-body mo-
a unique identifier for that user; they show that the way a
tion[24],andithasnotbeenusedpreviouslyforauthentica-
user performs simple tasks (e.g. scrolling to read or swip-
tion. Becausemultitouchgesturesarecontinuousbynature
ingtothenextpage)isperformedinauniquewaysuchthat
thestandardinformationmetricscannotbedirectlyapplied.
the coordinates of a stroke, time, finger pressure, and the
Whatisuniquetogesturingoverdiscreteaimedmovements
screenareacoveredbyafingeraremeasurementsthatcould
(physical and virtual buttons) is that every repetition of a
be used to classify said user. Zheng et al. [39], operating
trajectory is inherently somewhat different [17]. However,
onsimilarprinciples,havestudiedbehavioralauthentication
when this variability grows too large, the password is use-
using the way a user touches the phone – the features ex-
less, because it is both not repeatable by the user and not
tractedincludedacceleration,pressure,size,andtime. Boet
discriminable from other passwords. The information met-
al.[39]performedrecognitionbyminingcoordinates,dura-
ric should capture this variability. In our metric, a secure
tion, pressure, vibration, and rotation. Cai et al. [5] exam-
gesture should contain a certain amount of “surprise", i.e.
ined six different features (e.g. sliding) and compared data
some turns or changes, while still being able to be repro-
such as the speed, sliding offset, and variance between fin-
ducedbytheuseritself. Wealsoincludemutualinformation
ger pressures. Deluca et al. [8] developed a system for au-
calculationtoseparatethecomplexityofcontrolledandin-
thentication by drawing a template 2D gesture on the back
2
of a device using two phones connected back to back. The size of the password space for a gesture is based on three
securityofthegestureisanalyzedthroughvariousmethods spaces: designfeatures(howtheuserinteractswiththede-
byanattackertoreplicatetheoriginalbiometricorgraphical vice), smartphone capabilities (screen size, etc.), and pass-
password–thereisnoanalysisperformedastothesecurity word characteristics (existing metrics of security, usability,
contentofthegesture,justitsdifficultytobereproduced. etc). Securityinthiscontextreferstoameasuredresistance
3D gesture authentication schemes. 3D gesture recog- toshouldersurfing.
nitioncanbeperformed,mostrecently,usingcamera-based Continuing with security analysis, brute force attacks on
systems (e.g. Kinect) [32] or using wireless signals [25]. gestures have been examined in some studies [32, 38, 2].
With the camera-based systems, a user would trace a ges- Zhao et al. [38] have examined the security of 2D gestures
tureoutinspaceandtheimagegetscompressedintoatwo againstbruteforceattacks(assistedorotherwise)whenus-
dimensionalimageandprocessedforrecognition[32].Puet inganauthenticationsystemwhereauserwilldrawagesture
al. [25] have shown that three-dimensional gestures can be onapicture. Ameasureofthepasswordspaceisdeveloped
recognized by measuring the Doppler shifts between trans- and an algorithm under which a gesture in that space can
mittedandreceivedWi-Fisignals. beattacked. Theattackiscapableofguessingthepassword
Graphicalandtext-basedpasswordssecurityandmem- basedonareasofthescreenthatauserwouldbedrawnto-
orability. Therehasbeenconsiderableworkoncuedgraph- wards. This study does not concern itself with the security
ical passwords, a survey is offered by Biddle et al. [2] for of the gesture drawn, instead it is focused on where a user
the past twelve years. In particular, there has been anal- would target in a picture-based authentication schema – it
ysis on how Draw a Secret (DAS) [16] type of graphical doesnotaddressfree-formgestureauthentication.Serwadda
passwordsmeasuresuptotext-basedpasswordsintermsof etal.[30]showedthatauthenticationschemabasedonbio-
dictionary attacks [23]. Oorschot et al. [23] go on to de- metric analysis (including one by Frank et al. [12]) can be
scribe a set of complexity properties based for DAS pass- crackedusingarobottobruteforcetheinputsusinganalgo-
wordsandconcludethatsymmetryandstroke-countarekey rithmthatissuppliedswipeinputstatisticsfromthegeneral
in how complicated a DAS-password can be. They do not population.
provideadirectmeasurementofthisforDAS-password;the Finally, on non-security related work, Oulasvirta et al.
analysis is restricted to constructing a model to perform a [24]studiedtheinformationcapacityofcontinuousfull-body
dictionary attack and show that there are weak password movements. Ourmetricismotivatedbytheirwork. Specif-
subspaces based on DAS symmetry. For text-based pass- ically, they did not study 2D gestures or their security and
words Florencio et al. [11] studied people’s web password memorability or use for an authentication system. When
habits, and found that people’s passwords were generally askedtocreategesturesfornon-securitypurposes,previous
of poor quality, they are re-used and forgotten a lot. Yan work [14, 22] indicates that people tend to repeat gestures
et al. [36] were among the first to study empirically how thatareseenonadailybasisandarecontext-dependent(e.g.
differentpasswordpoliciesaffectsecurityandmemorability that the gestures people perform are dependent on whether
of the text-based passwords. Chiasson et al. [6] conducted they are directing someone to perform a task or receiving
laboratorystudiesonhowpeoplerecallmultipletext-based directionsonatask).
passwordscomparedtomultipleclick-basedgraphicalpass-
words (PassPoints [34]). They found that the recall rates 3. SECURITYOFGESTURES
after two weeks were not statistically significant from each
Inthissection,wepresentournovelinformation-theoretic
other.Everittetal.[9]analyzedthememorabilityofmultiple
metric for evaluating the security and memorability of ges-
graphical passwords (PassFaces [2]) through a longitudinal
tures. We briefly discuss why existing entropy-based met-
study and found that users who authenticate with multiple
rics used to evaluate discrete text-based passwords [4] are
differentgraphicalpasswordsperweekweremorelikelyto
notsuitableforgestures,andmovetopresentourmetricfor
fail authentication than users who dealt with just one pass-
securityandmemorabilityofcontinuousgestures. Wehave
word.
modifiedarecentmetriconanalyzinginformationcapacity
Security Analysis of Graphical Passwords and Ges-
of full-body movements [24] to estimate the security of a
tures. Most security analysis focus on preventing shoulder
multitouchgesture.
surfingattacksfromhijackingagraphicalpasswordorges-
Multitouchgesturesonatouchscreensurfaceproducetra-
ture[37,29,8]. Themethodsdependonimplementingtech-
jectorydatawherethepositionsofoneormoreend-effectors
niquestomaketheinputmoredifficulttoattack(e.g.making
(fingertips)aretrackedovertime.Thecontinuousandmulti-
thegraphicalpassworddisappearasitisbeingdrawn[37]).
dimensional nature of multitouch gesture data poses some
Another team designed an algorithm based on Rubine [26]
challenges for defining the information content compared
thattolduserswhetherornottheirgesturesaretoosimilar,
toregularkeyboard-basedpasswordsthatonlygaugeinfor-
although the metric for this is inherently based on the rec-
mation in discrete movements corresponding to key events
ognizer’s scoring capabilities and not on a measure of the
(pressdown) caused by a single end-effector (e.g., finger,
gesture by itself [19]. Schaub et al. [29] suggest that the
cursor)atatime. Multitouchgesturesinvolvemultipleend-
3
effectorsandcontinuousmovement. Toourknowledge, no doso,wefitasecondorderautoregressivemodelforbothof
informationtheorybasedsecuritymetrichasbeenproposed thesequencesseparately. Forsequencex,themodelis:
formultitouchgesturesaspasswords.
x =β +β x +β x +ε(x), (1)
Thecoreideaistodemonstratethatthereisanassociation t 0 1 t−1 2 t−2 t
betweenthesecurityofagesturepasswordandtheinforma-
where β ,β and β are parameters that we estimate using
0 1 2
tion content of the gesture. Intuitively, information content
thestandardleast-squaresmethod. Thebenefitofasecond-
is a property of a message or a signal (such as a recorded
order model is its interpretability: it captures the physical
gesture):itmeasurestheamountofsurprisingness,orunpre-
principle that once the movement vector (direction and ve-
dictability, of the signal with the important additional con-
locity)isdetermined,constantmovementcontainsnoinfor-
straintthatanysurprisingnessduetorandom(uncontrolled)
mation.
componentinthesignalisexcluded.Information-theoretically, After parameterfitting, weobtain residuals r(x) for each
t
thesurprisingnessofamessage,ormoreprecisely,ofasource
framet:
generating messages according to a certain probability dis-
tribution, can be measured by the entropy H(x) associated rt(x) =xt−xˆt =(βˆ0+βˆ1xˆt−1+βˆ2xˆt−2) (2)
withtherandomvariable,x,whosevaluesarethemessages.
The residuals r(x) correspond to deviations from constant
Forinstance, thesurprisingnessofakeystrokechosenuni- t
movementandarehencethepartofthesequenceunexplained
formly at random among 32 alternatives is log (32) = 5
2
bytheautoregressivemodel. Theycanbeusedtogaugethe
bits;fivetimesthatofananswertoasingleyes–noquestion.
surprisingnessofthetrajectory. Thesameprocedureiscar-
Asimilarmeasureofsurprisingnesscanbealsoassociatedto
riedoutforsequencey. Wecouldnowcomputethediffer-
continuousrandomvariables,calledthedifferentialentropy,
entialentropyofaresidualsequences,butasstatedabove,it
but it lacks a similar meaning in terms of yes–no questions
alonehaslittlemeaningandweareinfactonlyinterestedin
as it can, for instance, take negative values. For more de-
themutualinformationbetweenthetwosequences.
taileddefinitionsoftheusedinformation-theoreticconcepts,
Beforewecomputethemutualinformation,dimensionre-
pleaseseee.g. CoverandThomas[7].
ductionisperformedwherebymultitouchgesturesarerepre-
Fordiscreteaswellascontinuousmessages,theinforma-
sentedusingonlyasmanyfeaturespermeasurementasthe
tioncontentcanbedefinedasthemutualinformationI(x;y)
datarequires. Themotivationforthisstepisthatonecannot
betweentworandomvariables,xandy,anditgivesthere-
simply add information in the movement features in multi-
ductionintheentropy(surprisingness)ofonerandomvari-
touchgesturestogether. Instead,anydependenciesbetween
able (y) when another one (x) becomes known. Note that
thefingersshouldberemoved. Intuitively,amultitouchges-
a message can have high entropy (complexity) without the
ture with all fingers in a fixed constellation contains essen-
mutualinformation(informationcontent)beinghighbutnot
viceversa.1 tially the same amount of information as the same gesture
performedusingasinglefinger. Dimensionreductionisper-
Themetricweusefortheinformationcontentinrepeated
formed using principal component analysis (PCA), which
gestures is defined as the mutual information between two
removesanylineardependencies. Followingcommonprac-
realizationsofthesamegesture. Theunderlyingassumption
tice,thenumberofretaineddimensionsissetbyfindingthe
is that any dependencies between the two gestures must be
lowestnumberofdimensionsthatyieldsanacceptablylow
intended,andconversely,anyaspectsofthegesturesthatare
reprojectionerror(e.g.,meansquareerror;see[24]).
notpresentinbothrealizationsmustbeunintended.
Oncethemovementfeatureshavebeenprocessedbyadi-
Theinputtothemetriccomprisesoftwomultitouchmove-
mensionreductiontechnique(PCA),wetreatthemindepen-
mentsequencesproducedbyaskingausertoproduceages-
dently which amounts to simply adding up the information
ture and repeat it. The two gesture trajectories are denoted
contentineachfeatureintheend. Hence,thefollowingdis-
byxandy. Thetrajectoriesrecordthelocationsofeachof
cussiononlyconsiderstheone-dimensionalcasewhereboth
the used fingers over duration of the gesture. The informa-
xandyareunivariatesequences.
tion content is then the amount of information, as outlined
Anotherissueingesturedataisthatthetwogesturesxand
aboveintheinformation-theoreticsense,thatiscontainedin
yareoftennotofequallengthduetodifferentspeedatwhich
bothxandy,measuredbythemutualinformation.
thegesturesareperformed. Thiscanbecorrectedbytempo-
Computation. Computation of the mutual information
rally aligning the sequences using, for instance, Canonical
involvesasequenceofsteps. First,weneedtoremovefrom
Time Warping [40]. The result is a pairwise alignment of
thesequencestheirpredictableaspects,asfaraspossible.To
eachoftheframesinxandy achievedbyduplicatingsome
1Infact,tobeprecise,theinequalityI(x;y)≤min{H(x),H(y)} oftheframesineachsequence. Theseduplicateframesare
holdsonlyfordiscretesignals,suchassymbolicpasswords,butnot skippedwhencomputingmutualinformationtoavoidanin-
forcontinuoussignals,suchasgestures,becausecontinuoussignals
flatingtheireffect.
canhavenegativeentropy[7]. However, eventhoughthereisno
Finally, we form pairs of residual values (x ,y ) corre-
theoreticalguaranteeofit,theintuitionthatatrivialgesturesuchas t t
astraightlinecannotcontainhighinformationcontentholdstrue sponding to each of the frames in the aligned residual se-
inourexperiments;seebelow. quencesandevaluatethemutualinformation. Sincethemu-
4
tualinformationisdefinedforajointdistributionoftworan- ipantsfilledaquestionnaireonworkload(NASA-TLX[15])
dom variables, we model the pairs (x ,y ) of residuals in aftereachtaskandashortsurveyintheendofsecondses-
t t
each frame 1 ≤ t ≤ n using a bivariate Gaussian model, sion.
under which the mutual information is given by the simple We note that a somewhat similar generate-test-retest de-
formula signhasbeenusedbeforebyChiassonetal.[6]tocompare
n multiplepasswordinferencetorecallbetweentext-basedand
I(x;y)=− log (1−ρ2 ), (3)
2 2 x,y graphical passwords (PassPoints [34]). However, our work
whereρ isthePearsoncorrelationcoefficientbetweenx uses TLX forms, and is focused on free-form multitouch
x,y
andy. gestures, there are more repetitions and recalls, does not
Bysubstitutingthesamplecorrelationcoefficientestimated have a separate login phase, and the questions are asked at
fromthedatainplaceofρ andsubtractingatermdueto theend.
x,y
theknownstatisticalbiasoftheestimator(see[24]),weob- Next,wedescribeourvolunteerparticipants,ourappara-
tainthemutualinformationestimate tus,datapreprocessing,experimentdesignandprocedure.
n Participants. Werecruitedparticipantswithfliers,email
Iˆ(x;y)=− log (1−r2)−log (e)/2, (4)
2 2 2 lists,andinpersonincafeterias.Werequiredtheparticipants
tobe18yearsoldoroverandfamiliarwithtouchscreende-
wherelog (e) ≈ 1.443isthebase-2logarithmoftheEuler
2 vices.Werecruited63participantsinall,fromtheagesof18
constant.
to65(M=27.2,SD=9.9),24aremaleand39arefemale.
Thetotalinformationcontentinthegesture,basedontwo
Their educational background varies: 22 have high school
repetitions,isestimatedasthesumofthemutualinformation
diploma, 23 have a Bachelor’s degree, 16 have a graduate
estimatesineachofthemovementfeaturesafterdimension
degreeandtwohaveotherdegrees.
reduction.
All63participantscompletedsession1ofourstudy,and
Summary. The metric has some promising properties
57ofthemreturnedandparticipatedinsession2. Ascom-
to serve as an index of security. First, a distinctive feature
pensation,participantsreceived$30forcompletingthewhole
of our framework that sets it apart from the work on sym-
study.Theyalsoparticipatedinaraffleofthree$75giftcards.
bolic password security is that in dealing with continuous
We have recruited our participants in two batches: first
gestures,itisimperativetobeabletoseparatethecomplex-
in May 2013 (33) and second in June 2013 (30). Further,
ityduetointendedaspectsofthegesturefromthatduetoits
inordertoanalyzeeffectofvaryingtimeonrecall, thegap
unintended,andhencenon-reproducible,aspects.Thisisthe
betweenthetwosessionsofthestudyvaries. Themeantime
mainmotivationtousemutualinformationasabasisforthe
gap for the first participants is 14.53 (SD = 5.81) days and
metric. Second, as mutual information under the bivariate
29.52(SD=7.57)daysforthesecond.
Gaussianmodelisdeterminedbythecorrelationbetweenthe
OurstudywasapprovedbyourInstitutionalReviewBoard.
movementsequences(residuals),itisinvariantunderlinear
Apparatus. The gesture data was recorded on a Google
transformationssuchaschangeofscale,translation,orrota-
Nexus10tabletwithAndroid4.2.2astheoperatingsystem,
tion. Hence,theuserneednotremembertheexactscale,po-
atanaverageof200framespersecond(FPS).
sition,ororientationofthegestureonthescreen.Themetric
Preprocessing. Therawdatafileswerepreprocessedus-
isalsoindependentofthesizeandtheresolutionoftheused
ing MATLAB.In thepreprocessing, eachgesture file isre-
screenunless,ofcourse,theresolutionissolowthatimpor-
sampledto60FPS,toreducetheeffectoftheunevensample
tant details of the gesture are not recorded. Third, the time
rate.
warpingstepensuresthatvariationinthetimingwithinthe
The resampling was done by cubic spline interpolation,
gesturehasonlyslighteffectonthemetric. Fourth,themet-
viaMATLAB’sbuiltinfunctioninterp1d.m. Forthexandy
ricenablescomparisonbetweengesturesofunequallengths
coordinatesforeachfinger,ittakestherecordedtimestamps
andbetweensingle-fingerandmulti-fingergestures,aswell
and resamples to a constant rate. The reduction from 200
asacrossdifferentscreensizesandresolutions,onaunified
to 60 FPS takes into account a large amount of duplicate
scale(bits).
datacreatedbythetouchscreen,andisnecessarytoprevent
4. METHOD artifactsintheresampling.
FFT analysis showed that the frequency reduction com-
Ourstudydesignbuildsonagenerate-test-retestparadigm
binedwiththecubicsplinemethodresultsinlowpassfilter-
whereparticipantswerefirstaskedtocreateagesture,recall
ingofthedata,removinghighfrequencyjitterintroducedby
it,andrecallagainduringthesecondsession,aminimumof
thetouchscreenhardware,butpreservingthelowfrequency
10dayslater. Participantsweretoldthattheyshouldgener-
contentofthegesturedata. Todealwiththeunevensample
atesecuregestureastheywoulddoinrealsituationandthat
rateinthenon-interpolateddata,theLomb-Scarglemethod
theirabilitytorecallthemwillbetestedlater.Theywerenot
wasused.
given any hints about what a secure gesture might be. For
At this stage, artifacts in the raw data were detected and
understanding the generation and recall process, we used a
correctedaswell. Theprimarysuchartifactiswhenasub-
mixedmethodapproach: aftergeneratinggesture,allpartic-
5
jectfailstoplacetheirfingersonthetouchscreeninthesame 5. RESULTS
order between consecutive trials. As fingers are numbered
Foreachparticipantwhocompletedbothsessions,atotal
sequentially upon being detected by the touchscreen, this
of17gesturerepetitionswererecorded. Inall,1038record-
placesthemoutoforder, resultinginartificiallylowscores
ingsweregenerated,as6participantsdidnotattendsession
in the later analysis. This was corrected by comparing the
2, and 3 additional traces were not completed. The groups
startingcoordinatesofeachfinger, andcorrectingtheorder
of repetitions are summarized in Table 1. Because the es-
tobeconsistent. timated Mutual Information (Iˆ), is computed in pairs, Iˆis
Experiment Design. The experiment followed a 17 x 2
reportedasthemeanfortherelevantrepetitions.
mixedfactordesignwitharepeatedmeasurementvariableof
Session Trial# Group
gesturerepetition(17levels)andabetween-subjectvariable
1 1-10 Generate
of time gap between sessions (2 levels). In the 17 gesture
1 11-12 Recall1
repetitions, 10wereperformedduringthecreationprocess,
2 13-17 Recall2
followed by another 2 repetitions after a short distraction,
and5wereperformedinthesecondsession. Table 1: Repetition Groups by Session # and Trial #. Iˆ
Procedure. Weconductedatwo-sessionstudy. Thesec- wascomputedforallpairsofrepetitionseachgroup.
ondsessionwasheldafteraminimum10daysafterthefirst
session. Thedetailsoftheprocedurewereasfollows. 5.1 FactorsAffectingSecurity
First session: First, the participants were introduced to
the study, which included reading and signing the consent 3.5
SingleFinger
form,discussionoftheirrightsasparticipantsandhowthey S) 3 MultiFinger
willbecompensated. GestureCreation(Generate). Each (
e2.5
participantwasgiventhesametabletandwasaskedtocre- m
atewhattheythoughtwouldbeasecuregesturebydrawing Ti 2
onit. Theparticipantswereaskedtogenerateagesturethat 1.5 Session1 Session2
1 2 3 4 5 6 7 8 9 1011121314151617
they think others could not guess, but they could also re-
Repetition
memberlater.Theparticipantscouldretryuntiltheyfeltsat-
isfied with their gesture. Then the participant repeated the )40
s35
same gesture for additional nine times on the same tablet. Bit30
Participants were presented with a blank screen for draw- (25
ˆI 20
ingtheirgestures. Theapplicationdidnotlimitthenumber
n15
of fingers participants use to create the gesture. However, ea10
thenumberoffingersusedcouldnotbechangedduringthe M 5 Session1 Session2
1 2 3 4 5 6 7 8 9 1011121314151617
drawing process, that is, the gesture has to be drawn con-
Repetition
tinuouslywithoutliftinganyfingersfromthescreen. Once
it was completed, the gesture is displayed on the tablet’s
screen as a colored curve line, as shown in Figure 1. The Figure 2: Mean Iˆand mean gesture duration vs Rep-
displaywascheckedvisually, toverifythatthegesturewas etition. Top: Within Generate, mean gesture duration
recorded properly. Subjective Workload Assessment The trendeddownwardsasgesturesincreasedinspeed. Bot-
participantwasaskedtofilloutNASA-TLXformregarding tom: Over the same repetitions, Iˆtrended upwards be-
thecreationprocess.Distraction.Theparticipantwasasked forelevelingout. ItthendroppedquicklybetweenGen-
toperformamentalrotationtaskandcountdownfrom20to erateandthetwoRecalls.
0inmind. GestureRecall1. Theparticipantwasaskedto
Figure 2 shows the mean Iˆof each repetition across all
recallthegesturebyrepeatingittwiceonthesametabletus-
gestures versus the repetition number. During gesture cre-
ingthesameapplication.Demographicquestions.Thepar-
ation in Generate, Iˆtrended upwards from repetitions 1-4,
ticipant was asked usual demographics questions that were
andthenleveledofffromrepetitions5-10. Thisshowsthat
aggregatedandreportedabove.
ittakesatleastthreerepetitionsforaparticipant’sgestureto
Second session: Gesture Recall 2. The participant was
becomestable.
asked to recall his/her gesture by repeating it for five times
A second major feature is that by Recall1, repetition 11
on the same tablet using the same application. Subjective
and12,theIˆhasdroppedsuddenly,despiteadelayofonly
Workload Assessment. The participant was asked to fill
afewminutes. Surprisingly, morethan10dayslater, Iˆdid
out NASA-TLX form regarding the recall process. Short
notdropmuchfurtherinRecall2. ThedropbetweenGener-
Survey. Theparticipantswereaskedsomequestionsabout
ate and Recall1 was more severe for single finger gestures,
therecallprocessandotherthoughtsaboutthestudy.
and the drop between Recalls 1 and 2 was more severe for
multifinger gestures. In both cases, Iˆstabilized, at around
27bitsforsinglefingerand15bitsformultifingergestures,
6
having dropped from an initial value of around 35 and 20 15
bitsrespectively.
Figure 2 also shows the amount of time taken to record nt10
u
eachrepetitionofeachgesture. Thisdurationalsochanged o
C 5
withrepetition. Asthenumberofrepetitionsincreased,the
mean duration of each repetition trended downwards from 0
around 3 seconds to around 2. Unlike Iˆ, it remained sta- 0 10 20 30 40 50 60 70 80 90 100110120130140
blefromtherethroughthetworecalls. Interestingly,during Mean Iˆ(Bits)
Recall2 the multifinger gestures sped up from 2.5 seconds
tounder2seconds,whereasthesinglefingergesturesstabi- Figure4: HistogramofmeanIˆofGenerate,pergesture,
lized. showinglowscoring,biaseddistribution.
AplotofthemeanIˆofeachgestureversusmeanduration
appears in Figure 3. This shows that many gestures with a
gesture by its Iˆin each of the five categories, and evalu-
shortdurationalsohadalowIˆ. ThehighestIˆgestureshada
ated the top five in each category. There was high correla-
durationofbetween2and5seconds. Iˆincreasedwithdura-
tion between the categories, and as such, the top five ges-
tion,buthadapoorfit,explainingonly5%ofthevariation,
turesforeachcategoryoverlappedsignificantly,havingonly
asthehighestdurationgestureswereeitherveryhighorvery
nineuniquegestures. Thebestgesturesfellintotwogroups,
lowIˆ. Longdurationcouldthusindicateeitheracomplex,
angular paths with many hard turns, and signatures. This
carefulgesture,orarelativelackofpractice.
matched our expectations, as the algorithm looks for both
consistency between trials and for large deviations from a
150
s) MeanIˆvsDuration straightline. Thedefiningvisualfeatureofthelowestscor-
Bit100 LinearFit,r2=0.053 ing gestures was having only a few, gentle curves. Many
(
ˆI weremultifinger,withtheadditionalfingersmerelycopying
n 50 themotionofthefirst. AgalleryappearsinFigure5.
a
e
M
0
0 1 2 3 4 5 6 2500 Finger1
Gesture Duration (S) 2000 Finger2
Figure 3: Mean Iˆvs. mean gesture duration. The r2 1500
value indicates a poor linear fit, showing little correla- 1000
tion.However,thehighestIˆgestureswerealllongerthan
2qusierceodn.ds,suggestingthatsomedegreeofprecisionisre- Pixels) 5000 Iˆ=142.53 Iˆ=110.54 Iˆ=94.51
(
2500
Figure 4 shows that majority of the user-generated ges- on
tures had relatively low Iˆwith a only a small tail of high siti2000
Iˆ. For Generate, the distribution had a mean of 27.72 bits, Po
1500
andastandarddeviationof26.30bits. However,takingonly Y
the second, stable half of Generate, the mean rose to 33.42 1000
bits, with a standard deviation of 31.13 bits. In both cases,
500
thestandarddeviationwasaboutthesamesizeasthemean,
Iˆ=1.69 Iˆ=4.26 Iˆ=4.57
indicatingalargevariability. Thehistogramshiftedaswell, 0
0 500 10001500 0 500 10001500 0 500 10001500
becomingslightlymoreuniform. Althoughmanyusercho-
sengesturesscorepoorly, somescoredhighlyaswell, sug- X Position (Pixels)
gesting that it is possible create guidelines to emulate the
characteristicsofhighscoringgestures. Figure 5: Gestures ranked by Iˆfor Generate and Re-
We compared mean Iˆ with participant age and gender. call2. Top: Best three gestures, showing the many tight
There was a mild negative relationship between Iˆand age, turns characteristic of high scoring gestures. Bottom:
withar2 of0.08. Sevenofthetopeightgestureswerecre- Worstthreegestures,Lowscoringgestureshadfew,gen-
ated by participants under the age of 25, with the remain- tleturns.
ing one under the age of 30. Mean Iˆfor male participants
(N=23) was 29.82 bits and mean Iˆfor female participants
(N=40)was25.00bits.
Finally, we looked for defining visual characteristics of
the highest and lowest scoring gestures. We ranked each
7
5.2 SecurityofMultitouchGestures 2500 Finger1
As seen in Figure 2, the mean Iˆof multifinger gestures 2000 Finger2
is lower than that of single finger ones. We compared Iˆof
1500
thesegesturestoestimatehowmuchadditionalinformation
is added by additional fingers. Figure 6 shows the higher 1000
mean Iˆof single finger gestures, and the rarity of gestures
usingmorethantwofingers. Recall2showedagreaterdif- els) 500
ferenceinIˆthanGenerate,asanumberofparticipantsfailed Pix 0 Ratio =1.34 Ratio =1.12 Ratio =1.09
tousethesamenumberoffingerswhentheyreturnedinses- (
2500
n
sion2. Ofthe63participants,32decidedtocreatemultifin- o
ger gestures, and 31 chose to create single finger gestures, siti2000
o
with only three participants using more than two fingers. P
1500
Participantswerepromptedthattheycoulduseasmanyfin- Y
gers as they liked, but were not instructed on how many to 1000
use.
500
Weperformedregressionanalysisontheeffectofthenum-
beroffingersonIˆ. Theresultshowsthattheeffectissignif- 0
icantfortheIˆofGenerate,b = 17.948,t(57) = 2.763,p = 0 500 10001500 0 500 10001500 0 500 10001500
.0077, while not for the Iˆof Recall2, b = 11.898,t(57) = X Position (Pixels)
1.841,p = .07. However, the regression model only ex-
plained11.8%ofvarianceintheIˆforthesignificanteffect. Figure7: Top: Best3Gesturesbymemorability. These
Inshort, thenumberoffingersisnotthemostmajorfactor have a shorter length and decreased complexity com-
effectingIˆ. paredtohighIˆgestures. Bottom: These3gestureshad
the greatest difference in path between fingers. Two of
40 40 thethreeareasimplemirroringofthepath(Left, Mid-
) Generate
Bits30 t30 Recall2 dle),whileonly(Right)addsalargeamountofIˆ.
( n
ˆI 20 ou20
n C
ea10 10 ferenceinIˆbetweensessions.Wecomparedthememorabil-
M
0 0
ityratiotothetimeintervalbetweenGenerateandRecall2,
1 2 3 4 1 2 3 4
Fingers Used Fingers Used asseeninFigure8. Thelinearfithowever, hadar2 ofless
than 0.03, showing minimal dependence on the delay be-
Figure6: Iˆandnumberoffingersusedtoperformges- tweensessions.
ture, and change between sessions 1 and 2. This shows
1.5
theboththehigherperformanceofsinglefingergestures,
ˆI
aswellastherarityofusingmorethantwofingers. f 1
o
o
ti0.5
5.3 FactorsAffectingMemorability a
R
Figure 7 shows the best remembered gestures. To eval- 0
uatememorability, wecomputedthecrossgroupIˆofGen- 5 10 15 20 25 30 35 40 45
Interval between sessions (Days)
erate and Recall2, with pairs consisting of one from each
group, instead of both from within a group. However, the
largedifferencesinIˆfromgesturecomplexityobscuredthe Figure 8: Memorability vs interval between sessions.
differences from repetition accuracy, as the cross group Iˆ This plot compares the relative quality of gesture recall
hasalinearfitwithanr2of0.65withtheIˆofGenerate. We versus time between Generate and Recall2. There was
compensated by dividing the mean cross group Iˆfor each nosignificantcorrelation,witha(r2<0.03).
gesturebyitsIˆfromGenerate.
Gestures that scored highly on the resulting ratio have
5.4 IndividualDifferences
thebestconsistencybetweenGenerateandRecall2,ascom-
pared to the consistency within Generate. The top gestures Giventhesurprisingdeficiencyofthemultifingergestures,
for memorability are shorter and simpler than the top ges- welookedatspecificexamples.Onlythreeparticipantsused
turesforsecurity. multitouchgestureswithsignificantlydifferentmotionsbe-
Once we had a way of comparing how well gestures are tweenfingers,andintwoofthethreecasestheyweresimple
remembered,weinvestigatedwhatmightcausethelargedif- mirrorings of the motion. These three gestures appear in
8
Figure7. Allothercaseswerejusttranslationsofthesame
trace,asifthegestureweremadewitharigidhand.Gestures
withminimaladditionalinformationperfingerwereinpart 30
scoredlowbecauseallgestureswererunthroughaPCAal-
gorithm to remove redundant information, prior to analysis
ofIˆ.
e
Participantsalsocommonlyperformedseveralcategories cor20 Sessions
of error. Despite instructions, 19 of the 32 multifinger us- n s 1
ingparticipantsplacedtheirfingersdowninaninconsistent ea 2
M
order between sequential trials. This was fixed in prepro-
10
cessing to ensure that computation of Iˆused the same fin-
gers. Inaddition,someparticipantsrotatedthetableteither
duringasessionorbetweensessions. Thisstandsoutvisu-
allywhencomparingtraces, butthealgorithmisnegligibly 0
affected by starting angle, as it islooking at the residual as
tphaerintrgacteheturernsusltfrwomithiittssosreisgsiino.nTonheisowrtawsovecroiufinetderbpyarctothma-t M e ntaPlh ysicTale mPpeorfroalrm a n c e EffFrorutstratio n
didnothavesaidrotatedtrace. Someofthelowestscoring Items of TLX form
gesturesfeaturedarotationbetweenrepetitions,andallges-
tureswithrotationbetweensessionsscoredpoorlyoncross Figure 9: Mean score and corresponding 95% confi-
sessionIˆ. However,thegestureswithcrosssessionrotation dence interval for each item on the TLX form, for the
also scored poorly within a session. Only one gesture with twosessionsofourstudy. Themeanscoresofeachitem
rotation between sessions scored significantly higher on its insessiontwowerelowerthanthoseofsessionone.
withinsessionIˆthanonitscrosssession.
Given that error containing gestures scored poorly, even sions, given the fact that the data does not follow a normal
whenexcludingthoserepetitionpairscontainingtheerrors, distribution. TheresultisshowninTable2.
wetaketheseerrorsasanindicatorofpoorrecall. Extrafin- From Table 2 we can see that except for Frustration, all
gers or complex shapes are no guarantees of a high score, items show significant differences in scores between ses-
without consistent execution. Attention when creating the sions. Itissafetosaythatgiventhedatawehave,itisvery
gestureisthusimportant,practicingaccuraterepetitionrather likelythatparticipantsfelttherecalltaskwaseasierthanthe
thanjustgoingthroughthemotions. creationtaskintermsofworkload. Thiscouldbebecauseof
increasedpracticeorfamiliarity,andintuitivelyagreeswith
5.5 SubjectiveTaskLoadAssessment
theincreaseingesturespeedseeninFigure2.
Analysis
Thissectioncontainstheanalysisofthestudy’sTLXforms. 6. PRACTICALAUTHENTICATION
Foreachsessionofourstudyweaskedparticipantstofillout SYSTEMIMPLEMENTATION
oneTLXform,whichisusedtoassesssubjectiveworkload
In this section, we describe our extension to a practical
of given task. Figure 9 shows the mean score and corre-
singletouchgesturerecognizerformultitouchgestures,and
sponding95%confidenceintervalforeachitemintheTLX
the results of how the participants’ gestures would perform
form,forbothsessions.
inarealauthenticationsystembasedonourmultitouchrec-
Item Mental Physical Temporal ognizer. We also present our trial on shoulder surfing at-
pvalue <0.001 0.025 <0.001 tacksthatindicateshowfree-formgesturesarerobustagainst
shouldersurfing.
effectsize -.038 -0.21 -0.34
Arecognizerworksbytakingauser’sgesture, passingit
Item Performance Effort Frustration
througharecognitionalgorithm,andcomputingwhetheror
pvalue 0.029 0.0013 0.072
not the gesture is a successful match for a stored template.
effectsize -0.20 -0.30 -0.17
Thedevicewillstoreaseriesoftemplatesofwhichtheges-
Table2: Wilcoxonsigned-ranktestonthescoresofeach tures are compared to for authentication; the best score is
itemoftheTLXresultforthetwosessions,showingsig- used and compared to a threshold value. We have the fol-
nificantdifferencesinallexceptFrustration. lowing assumptions for the recognizer: 1) location invari-
ance: No matter where the correct gesture is drawn on the
Figure 9 suggests that the mean scores of session two screen, itshouldbeauthenticatedcorrectly. 2)scaleinvari-
are lower than those of session one. To prove the point ance: No matter what size the correct gesture is drawn to
weconductedanon-parametricrepeated-measureWilcoxon on the screen, it should be authenticated correctly. 3) rota-
signed-ranktestonthescoresofeachitemfromthetwoses- tioninvariance: Nomatterwhatanglethecorrectgestureis
9
drawnatonthescreen,itshouldbeauthenticatedcorrectly. 1
Location and scale invariance are important when dealing
withcross-platformauthentication;thescreendimensionin-
n=2
herently limits what size the gesture can be drawn to and R
P 0.5 n=4
theareaoverwhichagesturecanbeperformedwouldcause T
n=6
wild variations in where it would be drawn depending on
n=8
theuser. Rotationinvarianceisusefulforreducingcompu- n=10
tational complexity when dealing with individualized free- 0
0 0.2 0.4 0.6 0.8 1
form gestures as we have in our data set. We note that FPR
authentication system designers can opt to restrict or relax
1
theseassumptions.
We elected to implement and extend the Protractor [18]
recognitionalgorithm,apopularnearestneighborapproach. n=2
R
Giventhegesturetemplatesobtainedandthetworecallsets, P 0.5 n=4
T
we would like to measure how well the gestures perform. n=6
Protractorisanimprovementuponthe$1Recognizer[35], n=8
n=10
having both a lower error rate [18] and an effectively con-
0
stantcomputationaltimepertrainingsampleascomparedto 0 0.2 0.4 0.6 0.8 1
FPR
$1’sgrowingcostpertrainingsample.Protractorpresentsit-
selffurtherasanattractivealgorithmforthedataundercon-
Figure10: ThisfigureshowstheROCcurvefortherec-
sideration since it has low computational complexity com-
ognizer across the two data sets with variable template
paredtoothertechniques,forexample,DynamicTimeWarp-
numbers – ’n’ corresponds to the number of templates.
ing (DTW) [35] and Hidden Markov Models (HMM) [10,
The top plot is the first data set and the second plot is
21]. Ingeneral,Protractor’serrorratefallswithanincreas-
the second data set. The ROC curve is a measure of
ingnumberoftrainingsamplesandat9-10,theerrorrateis
theperformanceofabinaryclassifier;thecloserthetop
lessthan0.5%[18].
left corner of the plot moves towards the vertical axis,
Protractor is only a single touch recognition algorithm;
the better the performance. The first data set is closer
other projects considering gestures have used more general
to that corner than the second, showing that the second
techniquesforgesturerecognitioninsteadofanearestneigh-
datasetperformedworseandmusthaveahigherequiva-
borapproach.Forexample,Sae-baeetal.[28]appliedDTW
lenterrorrate.Asthenumberoftemplatesincreases,the
to deal with their multitouch gesture set. For the authenti-
closer the curve shifts up and the better the recognizer
cator to remain practical, it needs low computational com-
performs. In general, the recognizer classified the first
plexity, high speed, and low error rate per template to be
data set with a lower EER than the second set despite
implementedonamobiledevice–Protractorcanmeetthis
those being the same gesture types, indicating a weak-
demand; DTW cannot. As we are dealing with multitouch
ness on the parts of participants to accurately replicate
gestures, itisnecessarytomodifyProtractor. Theaccount-
theirgestures. TheEERvaluescanbereadinTable3.
ingprocedureisasfollows.
OurMultitouchExtensiontoProtractor itisconsideredapositiveauthentication;otherwise,it
isnegative.
1. Eachfingerissplitintoitsownsetofpointsandpassed
through the algorithm and compared to templates of It is important to note that the threshold should be set high
similar fingers and the score is computed for each in- enoughsuchthatauthenticationfailureisallbutguaranteed
dividually. for gestures that are being matched to templates other than
theirown.
2. Theyarethenaveragedtogether.
Asareminder,whentheparticipantsbeganthestudy,they
were asked to repeat their gestures ten times; each of these
3. Thereareprovisionsbuiltintoplacetoensuretheau-
tentrialsisusedbyProtractorastemplatesforthatgesture.
thentication failure for the wrong number of fingers.
There are two authentication data sets under consideration
Inthecaseofnfingersversusm,thenumberoffingers
here: the first is where participants were asked to replicate
is compared. If n is equal to m, then the recognizer
their gesture after a mental task and the second where they
continuestothenextstep. Ifnisnotequaltom,then
wereaskedtoreplicatetheirgestureafteratleast10days.
therecognizerimmediatelystopsthecomputationand
registersthescoreas0;afailure. 6.1 RecognizerPerformance
4. Thisscoreistheniscomparedtothethresholdvalue.If Wewantedtoknowhowaccuratelytherecognizerisper-
thescoreisgreaterthanorequaltothethresholdthen forming across the different gestures in our data sets. To
10