Mobile phone identification through the built-in magnetometers

Gianmarco Baldini∗, Gary Steri∗, Raimondo Giuliani∗, Vladimir Kyovtorov∗
∗European Commission, Joint Research Centre, Ispra, Italy

arXiv:1701.07676v1 [cs.CR] 26 Jan 2017

Abstract: Mobile phone identification through built-in components has been demonstrated in the literature for various types of sensors, including cameras, microphones and accelerometers. The identification exploits the small but significant differences in the electronic circuits generated during the production process. These differences become an intrinsic property of the electronic components, which can be detected and become a unique fingerprint of the component and of the mobile phone. In this paper, we investigate the identification of mobile phones through their built-in magnetometers, which has not been reported in the literature yet. Magnetometers are stimulated with different waveforms using a solenoid connected to a computer's audio board. The identification is performed by analyzing the digital output of the magnetometer through statistical features and the Support Vector Machine (SVM) machine learning algorithm. We show that this technique can distinguish different models and brands with very high accuracy, but it can only distinguish phones of the same model with limited accuracy.

I. INTRODUCTION

Mobile phone identification based on the physical properties of its components is related to the capability to distinguish between phones of the same model but different serial numbers (intra-model identification) and between phones of different models and brands (inter-model identification). Intra-model identification is usually more difficult to achieve than inter-model identification because mobile phone manufacturers use the same materials in the same model, while different models are usually built using different materials and components.

The identification can be performed by exploiting the tiny but significant differences in the materials or the differences introduced in the manufacturing process. These differences result in small disturbances in the digital output generated by the mobile phone. They are also called fingerprints of the component, by analogy with the fingerprints of a human being. For example, the digital camera of a mobile phone introduces a specific feature, called Sensor Pattern Noise (SPN), in every image processed and stored in the memory of the phone. The SPN can be extracted from a sufficient number of images, as demonstrated in the pioneering work by [1]. In a similar way, researchers have demonstrated the capability to identify mobile phones with different degrees of accuracy for the built-in accelerometers, radio frequency components, microphones and so on. Section II gives an overview of the research work for the different types of components and the different adopted techniques.

The identification of mobile phones through their components has various applications in security, forensics and the fight against the counterfeiting of electronic products. In security, identification proofs based on physical characteristics are much more difficult to fake and reproduce, and they are intrinsically related to the component and the mobile phone itself. Fingerprints can be used to perform multi-factor authentication where physical identification is combined with cryptographic authentication (see [2]). In this case, both inter-model and intra-model identification would be applicable. Many forensics applications can exploit the knowledge of the identity of the mobile phone used in a crime scene. For example, camera identification techniques based on SPN can be applied to the images uploaded to the web by a criminal to identify the mobile phone and the criminal identity [3]. In the fight against the distribution of counterfeit products, the identification of a mobile phone through its components can be used to detect counterfeit phones, which are often built with cheaper components compared to the original ones. Cheaper components would have different and distinct signatures from the ones used in original phones. In this case, inter-model identification would be appropriate.

In the fingerprinting literature, a distinction is made between Classification and Verification. In this paper, we adopt definitions similar to those presented in [4], where Verification is the process of verifying the claimed identity of a mobile phone, while Classification considers authorized mobile phone fingerprints (e.g., a reference library of known mobile phones) to create a model that best discriminates the mobile phones on the basis of the built-in magnetometers. In this paper, both processes are implemented.

As described in section II, there is a considerable body of research on various components of mobile phones, but there are no reported research findings for the fingerprints of the magnetometers in a mobile phone. This paper aims to address this gap.

The structure of this paper is the following: section II provides an overview of the literature on electronic device fingerprinting in mobile phones. Section III describes the methodology. In particular, it describes the approach used to stimulate the magnetometers of the mobile phone in order to generate repeatable digital outputs, then the extraction of the statistical features from the digital output and the application of the SVM machine learning algorithm to perform the classification and verification process. Section IV presents the results of the application of the classification algorithm on a set of nine phones of different models and brands, including phones of the same model to test the intra-model identification. Finally, section V concludes the paper.

II. RELATED WORK

The fingerprint concept has been applied to different components of mobile phones. There is an extensive literature on the possibility of fingerprinting the digital camera of a mobile phone thanks to the imperfections present in the various components of the camera: lenses, Color Filter Array (CFA), the sensor or even the software that processes the images before storing them (e.g., compression algorithms). The SPN presented in the pioneering work by [1] is the most widely adopted and referenced approach because of its high accuracy, even for intra-model identification, and its persistence in time. The SPN is based on the non-uniformity of each sensor pixel's sensitivity to light. In comparison to other types of noise or imperfections with a random distribution, the sensor pattern noise is a deterministic component, which stays the same across different images (in fact, it is strengthened with an increasing number of images). Even a limited set of 30-50 images can provide an SPN with good accuracy [3].

Fingerprinting of the radio frequency components of a mobile phone has also been performed by various authors for different wireless standards. Researchers have applied Radio Frequency (RF) fingerprinting to 802.11a in [2], to Global System for Mobile Communications (GSM) in [5] and to ZigBee devices in [4]. In these cases, the fingerprinting technique is based on the selection of statistical features of the collected and processed radio frequency signals emitted by the mobile phone. The statistical features are variance, standard deviation, skewness and kurtosis (e.g., [4]), which are also used in this paper. Other statistical features based on the Hilbert-Huang Transform are used in [6] for the GSM standard. RF fingerprinting usually provides very high accuracy (more than 90%) for both intra-model and inter-model classification.

The microphone of a mobile phone is another component that can be used for identification and classification. In [7], the authors use the Mel-Frequency Cepstrum Coefficient (MFCC) as a fingerprinting feature because it is commonly employed to characterize human speakers in audio recordings. They apply MFCC to the identification of the brand and model of a mobile phone on an experimental set of 14 different mobile phones. Apart from two mobile phones, which are of the same model and brand, all the other phones have different brands and models, so the analysis is mostly for inter-model rather than intra-model verification and identification. The authors used the SVM classifier for identification and verification.

Another set of features for microphone fingerprinting was used in [8], where large-size raw feature vectors are obtained by averaging the log-spectrogram of a speech recording along the time axis for each mobile phone. The authors focused on inter-model identification and verification, as they used mobile phones of various models from different brands. The features were then fed to three distinct classifiers: the Sparse Representations Classifier (SRC), the SVM and Nearest Neighbour (NN). SVM provided the best performance in many configurations.

Finally, accelerometers and gyroscopes can also be used for fingerprinting, as demonstrated in [9], [10] and [11]. The mobile phones are subjected to repeatable motion patterns, which stimulate the accelerometers and gyroscopes to produce specific responses. The small variations in the responses are used to create the fingerprints and distinguish between the mobile phones. Both inter-model and intra-model identification and verification were performed with very good accuracy. SVM and other classifiers were used in the cited references.

While there is an extensive literature on the identification and verification of mobile phones for different types of components, there is no reported study on the identification through magnetometers. The goal of this paper is to address this gap. We note from this survey that statistical features in combination with SVM have been widely used in the literature, which supports our choice to use a similar approach.

III. METHODOLOGY

A. Workflow

The goal of this section is to describe the overall methodology to perform the classification of the magnetometers built into the mobile phones, the test setup to collect the digital output (i.e., observables) of the magnetometers, the set of statistical features used to generate the fingerprints and the SVM machine learning algorithm used to perform the classification. The overall workflow is presented in figure 1 and the single steps are described in the following subsections.

Fig. 1: Workflow for the classification of mobile phones based on magnetometers fingerprints. Blocks shown: magnetometers stimulation; mobile phone; magnetometer digital outputs collection; synchronization and power normalization; generation of statistical features; features selection; SVM machine learning algorithm; identification/verification.

B. Stimulation of the magnetometers

The magnetometers in the phones are stimulated using a cost-effective solenoid, which is connected to the audio board of the computer. Since the goal of the experiment is to show the feasibility of mobile phone identification using low cost equipment, a consumer mass market audio board is used to stimulate the magnetometers. The set-up used to stimulate the magnetometers of the mobile phone is shown in figure 2. The audio board plays a set of three synthetic waveforms based on square waves. The waveforms are shown in figure 3. The central waveform is the reference waveform used to conduct most of the measurements; the other two are used to show the impact on accuracy if the density of the square waves increases or decreases. In the rest of the paper, we identify the first waveform (figure 3a) as waveform A, the second waveform (figure 3b) as waveform B and the third waveform (figure 3c) as waveform C.

Figure 3 panels: (a) Waveform A (short); (b) Waveform B (medium); (c) Waveform C (long). Axes: Time, Amplitude.

The waveforms were defined on the basis of the following considerations: a) a sharp impulse is needed to stimulate the magnetometers to generate adequate fingerprints, b) the distance between the square waves is defined on the average hysteresis values of common mass market magnetometers and then empirically tested, c) a sequence of square waves generates response shapes in the mobile phones that are appropriate for the use of the statistical features (e.g., skewness and kurtosis) defined in a subsequent section of this paper. The generated audio waveform is then played on the audio card connected to the solenoid. Note that the sharp rise of the square wave is what generates the impulse to the magnetometer.

Fig. 2: Test set-up to stimulate the magnetometers of a mobile phone. Elements shown: computer; audio card; solenoid; stimulating waveform; collected response.

Each phone is stimulated with 260 repetitions of the square waveforms shown in figure 3. The value was chosen as a trade-off between the need to have a statistically significant number of samples (a sample is the response of a mobile phone to a single waveform) and the time required to stimulate the magnetometers (roughly one hour for waveform B).
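The stimulation signal described in this subsection can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the authors' generator: the pulse length, gap length and number of pulses are hypothetical values in samples (the paper does not state them), and the helper names `square_pulse_train` and `rising_edges` are invented for this example. The second helper simply locates the sharp rises that deliver the impulse to the magnetometer.

```python
def square_pulse_train(n_pulses, pulse_len, gap_len, amplitude=1.0):
    """Build n_pulses rectangular pulses, each followed by gap_len zero samples."""
    one_period = [amplitude] * pulse_len + [0.0] * gap_len
    return one_period * n_pulses


def rising_edges(samples):
    """Return the indices where the signal steps up from its previous value."""
    previous = 0.0  # the line is assumed silent before the first sample
    edges = []
    for index, value in enumerate(samples):
        if value > previous:
            edges.append(index)
        previous = value
    return edges


# Hypothetical sizes in samples; the paper does not state the pulse or gap lengths.
waveform = square_pulse_train(n_pulses=3, pulse_len=4, gap_len=2)
print(rising_edges(waveform))  # one sharp rise per pulse: [0, 6, 12]
```

Changing `gap_len` while keeping `pulse_len` fixed is one way to mimic the denser and sparser variants of the reference waveform.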
Digital outputs collection Fig.3:Waveformpatternsusedtostimulatethemobilephones The digital output from the magnetometers is collected using a free available application called AndroSensor that we installed on the phones. In this experiment, we use only phones supporting the android operating system but a similar after synchronization and normalization is shown in figure study can be performed on iOS based phones. The responses 4. We note that the response is quite manifold for different from two different mobile phones to the three waveform models of phones, which provides the unique fingerprint of 4 Mobilephonemodel Numberofdevices the magnetometer (and the mobile phone) as described in HTCOne 3 section IV. The collection of the data from the mobile phones HuaweiAscendMate 1 was set to a frequency of 20 Hz: samples were taken by the SamsungGalaxyS5Androidversion4.4 1 SamsungGalaxyS5Andriodversion6.0 2 magnetometers every 50 ms. This value was chosen because SonyExperia 2 it was the lower common limit for all the mobile phones’ sensors. TABLE I: List of Mobile Phones used in the experiment The list of the 9 mobile phones used in the experiments is shown in table I. Phones are from 4 different brands and in fingerprinting (see [2], [9] and [5]). The synchronization 4 models. The models for which more than one phone was is performed using the Variance Trajectory technique. This usedareHTCOne(threedevices),SamsungS5(threephones) technique is based on the calculation of the variance on a and Sony Experia (two devices). Although the three Samsung sliding window of samples, which moves along the response. devices are of the same model, one of them runs a different The variance will increase substantially when the sliding version of the Android operating system. The selection of the windows meet a sharp rise or fall of the response. 
The rise Samsung mobile phones with different software version was of the variance identifies the beginning and the end of the done on purpose to evaluate the impact of different software. response. This process is applied to all the 260 responses Inmostcases,mobilephoneswereproducedindifferentyears obtained in collection phase. The application of the variance even if they are from the same model. trajectorywasinspiredbyitsuseinRFfingerprintingtodetect the start and end of the wireless communication bursts (see 8 [2]). 6 ThenormalizationisperformedbyapplyingtheRootMean 4 Square (RMS) to each single response for each single mobile 2 mplitude0 phone. A -2 -4 E. Statistical features -6 Inthissection,wedefinethestatisticalfeaturesusedinthis -8 0 5 10 15 20Time25 30 35 40 45 paper to generate the fingerprints. As described in [12], the (a)ResponsetoWaveformA classification of time series can be based on the representa- tion of the time series using a set of derived properties, or 8 6 features (i.e., feature-based classification). The advantage of 4 the feature-based approach is that it transforms a temporal Amplitude-202 pcaronblbeemtihneraefsotarteicmonoereancdomthpeutvaetrioifincaalltyionefafincdieindtenotinficceatitohne -4 statistical features have been generated. Another advantage -6 is the presence of many statistical features, which could be -80 10 20 30 40 50 60 used for classification. This provides a larger set of tools for Time identification even if there is the risk that some statistical (b)ResponsetoWaveformB features are similar or correlated and increasing the number 8 offeaturesdoesnotprovideasignificantincreaseinaccuracy. 6 Thus a features extraction or features selection process is 4 needed. The disadvantage of the feature based approach is 2 mplitude0 that it usually requires a statistical significant number of A-2 observables from the mobile phone to perform an accurate -4 identification and verification. 
This is the reason why the -6 magnetometers must be stimulated a significant number of -8 0 20 40 60 80 100 120 140 160 180 times (260 in our case) to support the subsequent application Time (c)ResponsetoWaveformC of the machine learning algorithm. Since this is the first attempt in literature to fingerprint Fig.4:Responsescollectedbythemobilephonesfordifferent magnetometers, there is no previous research work that can waveforms recommend a set of statistical features. Then, we use the references presented in the related work section to collect features,whichareusedbyotherauthors.Fromthefingerprint D. Synchronization and Normalization work on RF frequency components [4], [13], we selected The responses must be synchronized and normalized, oth- variance, standard deviation, skewness and kurtosis; from erwisethestatisticalfeatureswillgeneratefalsevalues,which the work on accelerometers [11], we selected entropy based are not dependent on the mobile phone itself but on other features. factorslikethedistanceofthemobilephonefromthesolenoid. The definition of the features is provided in equations (1) Synchronization and normalization is a common approach to (6): 5 features, which results in 18564 features). In the brute force N approach, all the possible combinations of 6 features were (cid:88) H {S }=− (S2 ∗Ln(S2 )) (1) calculated. The process was repeated for all the folds used in ShannonEntropy TD TD TD i=1 themachinelearningclassification(seeIII-F).Anexamplefor onefoldisprovidedinthenormalizedgraphoffigure5where N H {S }=(cid:88)Ln(S2 ) (2) the presence of peaks (maximum values) is evident. From LogEnergy TD TD these optimum values, the SFS algorithm calculated which i=1 features should be added. 
The final result of the example of a (cid:118) single fold is that the best combination of features is [1 2 3 (cid:117) N Standard Deviation{S }=(cid:117)(cid:116) 1 (cid:88)(S −µ)2 (3) 6 7 10 15 17] (where two additional features were added to TD N −1 TD one of the best sets identified from the brute force approach). i=1 N 1 (cid:88) Variance{STD}= N −1 (STD−µ)2 (4) 1 i=1 0.8 SKkuerwtonseisss{{SSTTDD}}==σ1σ413(cid:88)N(cid:88)i=N1(S(ST4T3DD−−µµ43),) ((56)) Normalized acuracy metric000...246 i=1 where the term S represent the instantaneous amplitude TD 0 2000 4000 6000 8000 10000 12000 14000 16000 18000 of the signal. Index of the feature set The response was also converted in the frequency domain Fig.5:Accuracymetricagainstthesetoffeaturesforonefold with a Fast Fourier Transform (FFT). Then, the statistical features identified in equations (1) to (6) were applied respec- tivelytothePhaseandtheAmplitudeofthefrequencydomain representation of the response. This transformation provides a F. MachineLearningalgorithmsandparametersoptimization set of additional 12 features (6 for the phase of the FFT and The SVM is a very well know technique in supervised 6 for the amplitude of the FFT) for a total of 18 features. The machine learning and it has been used in this paper since it FFT transformation was implemented because it is a common SVM has demonstrated its effectiveness for RF fingerprinting practiceappliedinthesurveyedpapers[6]butalsobecausethe in the related research literature (see [14], [7] and [11]. empiricalevidencefromtheFFThasshownsubtledifferences, On the basis of a set of training samples (in the case study which could be exploited for classification. presented in this paper, the training samples are the features In the rest of this paper and especially in the results of the magnetometers responses), the SVM algorithm assigns section IV, we identify the features with the numbering of each sample to one of two categories in the training phase. 
the equations: 1-6 in the time domain, 7-12 for the phase This makes SVM a non-probabilistic binary linear classifier. in the frequency domain and 13-18 for the amplitude in the The resulting SVM model is a representation of the samples frequency domain. as points in space, mapped so that the samples of the separate Combinations of statistical features extracted from the out- categoriesaredividedbyacleargapthatisaswideaspossible. put of magnetometers represent their fingerprints. The goal is In the testing phase (e.g., the verification of the identify of a to select the best combination of features that can provide the mobile phone) new samples are then mapped into that same optimal identification and verification accuracy. The process space and predicted to belong to a category based on which to achieve this goal is called features selection. Various ap- side of the gap they fall on. proacheshavebeenproposedinliteratureforfeaturesselection Here, we provide a brief formal description of the SVM (see [12]). In this paper, we adopt a combination of a brute algorithm to highlight the key points, which will be used in force approach with the Sequential Feature Selection (SFS) our study. Additional details on the SVM algorithm can be algorithm.Thisalgorithmstartswithasinglefeatureorasmall found in [15]. setoffeaturesandincrementallyaddsafeatureatthetimeand We define the n labeled examples (x ,y ),...,(x ,y ) 1 1 n n measurestheresultingvalueofmetric.Ifthemetricimproves, withlabelsy ∈{1,−1}.WewanttofindthehyperplaneH(in i thefeatureisadded,otherwiseanotherfeatureischecked.The aproperd-dimensionalspaceK)definedby<w,x>+b=0 process continues until the maximum (i.e., optimal) value of (i.e. with parameters (w,b) ), which satisfies the following the metric is reached. In this paper, a metric based on the conditions: accuracy was used for the SFS algorithm. The accuracy is 1) The scale of (w,b) is fixed so that the H plane is in described in section III-F. 
To avoid local maximum values, canonical position, a brute force approach is also performed to select one or few sets of combinations of 6 features among all the possible min|<w,x >+b|=1 i permutations of the features (groups of 6 on a total set of 18 i≤n 6 2) The H plane with parameters (w,b) separates the +1’s is held out for classification. The training and classification from the the −1’s. as described from the following process is repeated ten times until each of the ten blocks equation, has been held out and classified. In this way each block of statistical fingerprints is used once for classification and y (<w,x >+b)≥0 for all i≤n i i ninetimesfortraining.Thefinalcross-validationperformance 3) The H plane has a maximum margin λ = 1/|w|. i.e., statistics are calculated by averaging the results of all folds. minimum |w|2. In this way, the presence of bias in a specific training set or We can redefine < w,x > +b = 0 to the following in portions of the training set are averaged or mitigated. equation: SVM is a binary classifier, and to perform classification of more than two systems as in our case (i.e., 9 mobile phones), w•φ(x)+b=0 (7) weneedtouseamulti-classifierbasedonSVM.Twocommon approaches are the One Against One (OAO) and One Against whereφ(x)representsapropermappingofxintothespace All (OAA) techniques [15]. OAA involves the division of an K;w =[w ;w ;:::;w ]representsad-dimensionalrealvector 1 2 d N (i.e., 9 in our case) class data-sets into N two-class cases, normal to H and b is a real parameter such that |b|/||w|| is while OAO involves the creation of a classification machine theperpendiculardistanceoftheoriginfromH.•isthescalar composed by N(N-1)/2 machines for each pair of systems. product between two vectors. 
While OAO is more computationally intensive than OAA TheproblemoffindingthehyperplaneHisanoptimization (N(N-1)/2againstN),OAAhassomedisadvantagesespecially problem, which can be defined on the basis of the following withunbalancedtrainingdata-sets.Sincewehavealimitedset equation: of systems (i.e., nine) and the observables are also in limited 1 (cid:88) number(i.e.,260),performanceisnotanissueandwedecided min WTW +C υ (8) (w,b,υ)2 i to select OAO. Finally, we adopted the Sequential minimal optimization Where υ are the slack variables and i is in the range of i (SMO) for SVM. SMO is an algorithm for solving the 1 to N , which is the number of training vectors. The Train quadraticprogramming(QP)problemanditiscommonlyused slack variables are subject to υ >0 and they account for the i in machine learning. presence of classification errors. The parameter C (which we will call Box Constraint in the rest of this paper) allows the The Sequential forward selection algorithm must be based SVM user to control the weight of the classification errors in on a criterion against which the optimum value is identified. the previous equation and it is one of the two parameters to In machine learning, the following parameters are defined: be tuned in the training process. • Tp is the number of true positive matches where the The second parameter to be tuned is related to the Kernel machine learning algorithm has correctly identified a function, which is used to define the shape and format of the sample (e.g., a collected RF signal in our context) as hyperplane.Variouskernelfunctionsareavailableinliterature belonging to the correct class. includinglinear,polynomialandRadialBasisFunction(RBF). • Tn is the number of true negative matches where the In this paper, we use SVM with RBF as a kernel function machine learning algorithm has correctly identified a becauseithasdemonstrateditseffectivenessforfingerprinting sample as not belonging to the correct class. 
classification in [14] and other references. In addition, this • Fp is the number of false positive matches where the kernel has a number of good features, since it can properly machine learning algorithm has identified a sample as handle the cases in which the relation between class labels belonging to a class while it is not true. and features is nonlinear in classification problems (which is • Fn is the number of false negative matches where the indeed our case). machine learning algorithm has identified a sample as The definition of the RBF is the following: not belonging to the class while this is not true. K(x ,x )=e−γ(cid:107)xi−xj(cid:107)2, (9) The combination of the different parameters can define i j different metrics to evaluate the effectiveness of a machine where the γ scaling factor is the second parameter to be learning algorithm. In this case, we use the verification accu- tuned together with C Box Constraint parameter. racy as a criterion, which is calculated as: To summarize, the application of SVM in this context requires the optimization of the statistical features and the pa- T +T Accuracy = p n , (10) rametersoftheSVMandtheRBF,whicharethescalingfactor TotalPopulation γ fromequation(10)andtheBoxConstraintCparameterfrom equation (9). whereT isthenumberoftruepositivesandT isthenumber p n To avoid the problem of over-fitting in the training phase, of true negatives resulting from the application of the SVM whereasinglesetoftrainingdatacancontainbiasnotpresent machine learning algorithm to the problem of verifying that in other data sets, a 10-fold partition is used for training the collected fingerprints are representative of the same mag- and classification. In the 10-fold method, each collection of netometerevaluatedinthetrainingphase(i.e.,forverification). 
statistical fingerprints (one for each mobile phone) is divided into ten blocks. Nine blocks are used for training and one block

The total population is the total number of samples (the sum of Tp, Tn, Fp and Fn).

Beyond accuracy, in this paper we also use the Receiver Operating Characteristic (ROC) and the Equal Error Rate (EER) metrics to evaluate the verification accuracy. The ROC is calculated as a ROC-like performance curve, and it is generated by plotting Fp vs. Fn as the verification threshold changes.

The EER corresponds to the point on the ROC curve where Fp and Fn are equal. This metric is frequently used as a summary statistic to compare the performance of various classification systems. In general, a lower EER indicates better classification performance.

Finally, the confusion matrix is also used to show the results of the identification process. In the confusion matrix, each column represents the instances of a predicted class while each row represents the instances of the actual class (or vice versa). As we used 9 phones in our experiments, the confusion matrix shown in the results section IV has a dimension of 9×9. In the confusion matrix, the correct guesses (i.e., true positives or negatives) are located on the diagonal of the table, so it is easy to inspect the table for errors, as they are represented by values outside the diagonal. The confusion matrix is also used in this paper to define the metric employed in the SFS and for the optimization of the scaling factor and box constraint parameters. The metric is the sum of the diagonal values of the confusion matrix divided by the sum of all the values of the confusion matrix. It can be considered an extension of the accuracy metric. In general, the confusion matrix is used for classification purposes while the ROCs are used for verification.

Using the accuracy metric derived from the confusion matrix, we calculated the best set of features, [1 2 3 6 7 10 15 17], as described before, and then we identified the optimum values of the scaling factor and the box constraint. This was achieved by creating a two-dimensional array based on a set of values of the scaling factor ranging from 2^-8 to 2^8 and of the box constraint ranging from 2^-8 to 2^28. Similar ranges of values are suggested by various references like [15].

The result is shown in figure 6, which features a peak value in correspondence of a Scaling Factor equal to 2^7 (128) and a Box Constraint of 2^22 (4194304) for a specific fold as an example. The identified set of features and the optimal values of the Scaling Factor and Box Constraint for each fold are used to produce all the graphs and results in section IV.

Fig. 6: Optimization of the Scaling Factor and Box Constraint values for one fold. [Plot: vertical axis Scaling Factor, 2^-8 to 2^8; horizontal axis Box Constraint, 2^-8 to 2^28]

IV. RESULTS

A. Results based on one day of measurements

In this section, we describe the experimental results for identification and accuracy.

Table II is the confusion matrix of all the mobile phones stimulated with Waveform B using the SVM algorithm. From the table, we can immediately see that intra-model classification is very difficult to achieve with this approach, while inter-model classification is possible with a very good level of accuracy. More specifically, inter-brand classification is easy to achieve, because the algorithms do not confuse observables from different brands. The cross-values between HTC, Samsung and Sony are basically empty. Intra-model classification is difficult to achieve for the HTC One and SAMSUNG models, while we basically have random choice for the SONY Xperia model. We also notice that the different versions of the operating system (version 4 against version 6) for the SAMSUNG models do not have a significant impact on the classification accuracy (i.e., no effects on data collection from magnetometers, which have the same characteristics).

                      HTC      HTC      HTC           SAMSUNG  SAMSUNG  SAMSUNG
                    One 1    One 2    One 3   Huawei   1 (v4)   2 (v6)   3 (v6)   SONY 1   SONY 2
HTC One 1             101       74       79        6        0        0        0        0        0
HTC One 2              36      157       33       34        0        0        0        0        0
HTC One 3              32       38      185        5        0        0        0        0        0
Huawei                  0       29        0      231        0        0        0        0        0
SAMSUNG 1 (v4)          1        0        0        1      190       16       49        2        1
SAMSUNG 2 (v6)          0        0        0        0       25      197       38        0        0
SAMSUNG 3 (v6)          0        0        0        0       29       33      195        3        0
SONY 1                  0        0        0        0        3        0        0      137      120
SONY 2                  0        0        0        1        3        2        0      123      131

TABLE II: Confusion matrix for all the mobile phones using Support Vector Machine

We also evaluated the performance of the SVM algorithm against the well-known k-nearest neighbors algorithm (KNN). The resulting confusion matrix is shown in Table III, where it can be seen that the results are worse than those obtained with SVM. This can also be evaluated by calculating the ratio of the sum of the diagonal elements between the two confusion matrices (the overall sum of the elements of the matrix is the same in both cases). The result is 1545 (SVM) ÷ 1361 (KNN) = 1.135, which shows a clear advantage for SVM (around 13% better accuracy).

The results from the confusion matrix are confirmed by the ROC curves among the different phones of different brands, models or serial numbers of the different models. Figure 7 shows the ROC curves and the related EER values for verification among brands and models. ROCs are the results of a binary classification and they can be used for the verification process between a mobile phone A and a mobile phone B. In other words, if a mobile phone B falsely claims that it is actually mobile phone A, the algorithm should be able to verify the authenticity. ROC curves that lean more towards the center of the graph indicate a lower verification accuracy.
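As an illustration of how an EER can be derived from verification scores, the following is a minimal sketch in Python; it is not the implementation used for the experiments. It assumes that each verification attempt yields a similarity score where a higher value means "more likely the claimed phone". The function names `error_rates` and `equal_error_rate` are hypothetical. The sketch sweeps the verification threshold over the observed scores, computes the Fp and Fn rates at each threshold, and reports the point where the two rates are closest to equal.

```python
def error_rates(genuine, impostor, threshold):
    """Return (Fp rate, Fn rate) when a score >= threshold is accepted.

    Assumption: `genuine` holds scores of true claims (phone A claiming A),
    `impostor` holds scores of false claims (phone B claiming A).
    """
    fp = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
    fn = sum(s < threshold for s in genuine) / len(genuine)     # genuine rejected
    return fp, fn


def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) at the point where Fp and Fn are closest."""
    # Candidate thresholds: every score that was actually observed.
    thresholds = sorted(set(genuine) | set(impostor))

    def gap(t):
        fp, fn = error_rates(genuine, impostor, t)
        return abs(fp - fn)

    best = min(thresholds, key=gap)
    fp, fn = error_rates(genuine, impostor, best)
    # With a finite number of scores the two rates may not be exactly equal,
    # so the EER is approximated by their mean at the closest point.
    return (fp + fn) / 2, best
```

Under these assumptions, two perfectly separated score sets give an EER of 0, while two identical score sets give an EER of 0.5, which is the random-choice case discussed in the text.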
In fact,theEERparameteristheintersectionofthediagonalline 8 4) 6) 6) Mobilephonemodel AverageEER v v v ( ( ( HTCOne 0.47 1 2 3 1 2 3 G G G HuaweiAscendMate 0.44 TCOne TCOne TCOne uawei AMSUN AMSUN AMSUN ONY1 ONY2 SSaomnysuEnxgpeGraialaxyS5 0.500.57 H H H H S S S S S TABLE IV: Averaged EER among data from same phones in HTCOne1 92 71 76 19 0 0 0 0 2 HTCOne2 47 124 29 60 0 0 0 0 0 different days HTCOne3 52 51 143 14 0 0 0 0 0 Huawei 4 21 1 234 0 0 0 0 0 SAMSUNG1(v4) 1 0 1 152 37 65 0 4 SAMSUNG2(v6) 0 0 0 0 37 174 49 0 0 B. Results based on different days of measurements SAMSUNG3(v6) 0 0 0 0 31 49 177 1 2 SONY1 0 0 0 0 5 1 1 142 111 Toensurethattheresultsobtainedintheprevioussectionare SONY2 0 0 3 0 4 2 2 126 123 consistent in time, we repeated the stimulation and collection TABLE III: Confusion matrix for all the mobile phones using ofdataindifferentdays.Wecollecteddataforothertwodays the k-nearest neighbors algorithm forallthemobilephones.Notethatthemeasurementwerenot taken in consecutive days but in days separated by weeks to evaluate the stability of the algorithm for a long time-frame. with the ROC curve. The higher is an EER, the lower is the We measured the stability of the algorithm on the basis of verification accuracy. two approaches: a) we evaluated how different are the ROCs Wenoticethatitisveryeasytodistinguishbetweenmobile of the verification of a mobile phone against the data sets phones of different brands (HTC against Huawei, Samsung taken in three different days for another mobile phone and or Sony) as expected from the confusion matrix. The EER b) we evaluated how similar are the data sets of the same valuesareverylow,whichdemonstrateaveryhighverification mobile phone across the three different days. In the case a) accuracy. 
the ROC curves and the related EER values should be quite similar,whileinthecaseb)theEERcalculatedfromtheROCs comparingthedifferentdatasetsofthesamephoneshouldbe 1 0.9 almostrandomchoice(EER=0.5)becausetheyareobservables 0.8 taken from the same mobile phone. 0.7 EER=0.054 HTC ONE 1 vs HUAWEI The results are shown in figure 9 for the ROCs between True positive rate000...456 EEEEEEEEEERRRRR=====00000... .000H00001U3332A888 SW HHHAETTUMICCA Sv WsUOO NSENNOIGEE vN 11s1Y Svvv sss2A SSMSOAOSMNNUSYYNU G22N 1G 1 ocfounrrevtehHseTaoCrtehOeqrnueimteaonbndielaetrh.peVhHoenruyeasws.iemiiplahrornees.uWltsehnaovteebteheant tohbetaRinOeCd 0.3 0.2 0.1 1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 False positive rate 0.8 valIunFesistge.afod7r,:fiIvngetureirrfie-cm8aotsidhoeonlwabsnetdthweberReanOndCmvcoeubrriivlfieecsaptahinoodnnetfhsoerordfealaythteeodnseEaEmRe True positive rate00..46 EEEEEERRR===000...000568495 HHHTTTCCC OOOnnneee 111 DDD111 vvvsss HHHUUUAAAWWWEEEIII DDD123 model. In our case, this set includes only the HTC One (three 0.2 phones), SAMSUNG (three phones) and SONY (two phones) 0 models. We noticed that the algorithm is able to distinguish 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 False positive rate the mobile phones only with great difficulty (EER around 0.3 Fig. 9: ROCs of HTC One against Huawei in different days and0.2 )for theHTCOne andSAMSUNGphones, whilewe get random-choice between the SONY phones. The table IV shows the average EER for all the models used in our experiments. 
The three values of the EER among 1 the combinations of days (day one against day two, day two 0.9 against day three and day one against day three) have been 0.8 averagedforeachphone.Inaddition,theaverageamongallthe 0.7 True positive rate000...456 EEEEEERRR===000...331 17H HHTTTCCC O OONNNEEE 1 12v svv ssH HHTTTCCC O OONNNEEE 2 33 pivsshaholoouwnneelssystohoofafnttethhtepheheEsoaEsnmtReae)baiwmlrietaoysdneoeclfaarl(tcahrupealanaradttpeofdpmr.roo-AmcashcthoheiexcinpeHet(uciEmtaeEwdeRe.,i=thw0e.h5re)er,sewutlhhtiiencrghe 0.3 EER=0.2 SAMSUNG 1 vs SAMSUNG 2 EER=0.2 SAMSUNG 1 vs SAMSUNG 3 0.2 EER=0.28 SAMSUNG 2 vs SAMSUNG 3 EER=0.5 SONY 1 vs SONY 2 0.1 C. Results in presence of Noise 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 False positive rate In this section, we evaluate the impact of Gaussian Noise Fig. 8: Intra model verification for day one on the verification accuracy. Additive white Gaussian noise (AWGN) was added to the digital output collected by the 9 magnetometers to simulate disturbances or attenuation in the 1 propagation path between the solenoid and the mobile phone. 0.9 This may be possible in a realistic application of magne- 0.8 ttltohohmesesessitoomeflreusnvloafietirnidiogfinecm,rapatAriyoiWnnntGionaNtgc,cbuwweriahtichdeyreedaiilfnt.fheTeprehrpeneostsegvintoaicaoleulneoisosffotafothttSeeeinvgpuanhlaauotlianotNeeno.atinhsIdnee True positive rate0000....4567 WWWaaavvveeefffooorrrmmm ABC (((smlohenodgrti)u),, m EEE)E, RER=E=00R..2=30085.2314 0.3 Ratio (SNR) was added to the observables already collected. 0.2 The process of adding AWGN was repeated 50 times and 0.1 the results were averaged to ensure the randomness of the 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 simulation. The results are shown in figure 10. As expected, False positive rate theEERincreaseswiththedecreaseoftheSNR.Foravalueof Fig. 12: ROCs between HTC One 1 and HTC One 3 for SNRequalto0,weobtainalmostrandom-choice(EER=0.43). 
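The noise injection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the experiments: the SNR is assumed to be expressed in dB, the signal power is assumed to be the mean squared value of the collected samples, and the names `add_awgn`, `measured_snr_db` and `average_metric` are hypothetical. The sketch scales the noise variance so that the requested SNR is obtained, and averages an arbitrary metric over 50 independent noise realizations.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Return a copy of `signal` with white Gaussian noise at `snr_db` (dB).

    Assumption: signal power is the mean of the squared samples.
    """
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 * log10(Ps / Pn)  =>  Pn = Ps / 10^(SNR/10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR (dB) actually obtained for one noisy realization."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


def average_metric(signal, snr_db, metric, trials=50, seed=0):
    """Average `metric(clean, noisy)` over `trials` noise realizations.

    In the paper the averaged quantity is the EER; here `metric` is any
    callable, so that the sketch stays self-contained.
    """
    rng = random.Random(seed)
    values = [metric(signal, add_awgn(signal, snr_db, rng)) for _ in range(trials)]
    return sum(values) / trials
```

For example, calling `average_metric(samples, 10.0, measured_snr_db)` on a list of magnetometer samples should return a value close to 10 dB, which is a quick check that the noise level is scaled as intended before the verification step is run on the noisy observables.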
Fig. 10: Verification accuracy for different SNRs of AWGN between HTC One and Huawei. [ROC plot: true positive rate vs. false positive rate; legible legend entries: SNR=30, EER=0.05; SNR=40, EER=0.054]

D. Results for different stimulating patterns

Finally, we evaluate the change in verification accuracy for different stimulating patterns: the waveforms from A to C described in figure 3. The goal is to evaluate if longer or more complex waveforms provide better results than shorter waveforms or vice versa. We note that there is a trade-off because longer waveforms require longer times for the creation of the training set or for the evaluation process. In addition, the hysteresis cycle of the magnetometers requires a minimal time separation between the stimuli (i.e., the pulses in the waveforms of figure 3). The ROCs and the average values of the EERs have been calculated for the different waveforms, and the results are provided in figures 11 and 12 for the ROCs and in table V for the EERs.

Fig. 11: ROCs between HTC One 1 and Huawei for different waveforms. [ROC plot: true positive rate vs. false positive rate; legend: Waveform A (short), Waveform B (medium), Waveform C (long); EER values not recoverable from the source]

Fig. 12: ROCs between HTC One 1 and HTC One 3 for different waveforms. [ROC plot: true positive rate vs. false positive rate; legend: Waveform A (short), Waveform B (medium), Waveform C (long); EER values not recoverable from the source]

Inter model     Average EER
Waveform A      0.0159
Waveform B      0.0136
Waveform C      0.00303

Intra model     Average EER
Waveform A, Waveform B, Waveform C (values not recoverable from the source)

TABLE V: Averaged EER for intermodel and intramodel classification

The ROC figures (11 and 12) show the verification between the mobile phones HTC One 1 and HTC One 3 for the intra-model case and between HTC One 1 and the Huawei mobile phone for the inter-model case. We notice that waveform C has a slightly better performance in both cases. This result is confirmed by averaging the EER values for all the combinations of mobile phones in the inter-model and intra-model cases, as shown in table V, where Waveform C has a clear advantage over the other two waveforms, which have similar accuracy scores.

From these results, it is clear that a longer waveform can provide a better verification accuracy at the expense of a longer training and evaluation time. Indeed, the intra-model scores for waveform C show that it would be possible to distinguish with a good level of accuracy even mobile phones of the same model and brand (i.e., intra-model).

V. CONCLUSIONS AND FUTURE DEVELOPMENTS

In this paper we have investigated the identification of mobile phones through their built-in magnetometers on a set of nine mobile phones, when they are stimulated by a solenoid with specific waveforms. We have shown that inter-model verification can be achieved with very high accuracy, while intra-model verification can be achieved only with limited accuracy. We have evaluated the impact of Gaussian noise, and we have also shown the stability of the verification results across different days. Regarding the application of different waveforms, the results show that accuracy can be improved with longer and more complex waveforms. Future developments will extend the analysis shown in this paper with more complex waveforms and with the application of different sets of features, with the goal of obtaining a better intra-model accuracy.

Disclosure Policy

The authors declare that there is no conflict of interest regarding the publication of this paper.

REFERENCES

[1] J. Lukas, J. Fridrich, and M. Goljan, “Digital camera identification from sensor pattern noise,” IEEE Transactions on Information Forensics and Security, vol. 1, no. 2, pp. 205–214, June 2006.
[2] W. C. Suski II, M. A. Temple, M. J. Mendenhall, and R. F. Mills, “Using spectral fingerprints to improve wireless network security,” in IEEE GLOBECOM 2008 - 2008 IEEE Global Telecommunications Conference. IEEE, 2008, pp. 1–5.
[3] J. Fridrich, “Digital image forensics,” IEEE Signal Processing Magazine, vol. 26, no. 2, pp. 26–37, March 2009.
[4] T. J. Bihl, K. W. Bauer, and M. A. Temple, “Feature selection for RF fingerprinting with multiple discriminant analysis and using ZigBee device emissions,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 8, pp. 1862–1874, Aug 2016.
[5] D. R. Reising, M. A. Temple, and M. J. Mendenhall, “Improved wireless security for GMSK-based devices using RF fingerprinting,” International Journal of Electronic Security and Digital Forensics, vol. 3, no. 1, pp. 41–59, 2010.
[6] J. Zhang, F. Wang, O. A. Dobre, and Z. Zhong, “Specific emitter identification via Hilbert-Huang transform in single-hop and relaying scenarios,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 6, pp. 1192–1205, June 2016.
[7] C. Hanilci, F. Ertas, T. Ertas, and Ö. Eskidere, “Recognition of brand and models of cell-phones from recorded speech signals,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 2, pp. 625–634, April 2012.
[8] C. L. Kotropoulos, “Source phone identification using sketches of features,” IET Biometrics, vol. 3, no. 2, pp. 75–83, June 2014.
[9] S. Dey, N. Roy, W. Xu, R. R. Choudhury, and S. Nelakuditi, “AccelPrint: Imperfections of Accelerometers Make Smartphones Trackable,” in 2014 Network and Distributed System Security (NDSS) Symposium, Feb 2014.
[10] H. Bojinov, Y. Michalevsky, G. Nakibly, and D. Boneh, “Mobile device identification via sensor fingerprinting,” CoRR, vol. abs/1408.1416, 2014. [Online]. Available: http://arxiv.org/abs/1408.1416
[11] G. Baldini, G. Steri, F. Dimc, R. Giuliani, and R. Kamnik, “Experimental identification of smartphones using fingerprints of built-in micro-electro mechanical systems (MEMS),” Sensors, vol. 16, no. 6, p. 818, 2016.
[12] B. D. Fulcher and N. S. Jones, “Highly comparative feature-based time-series classification,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 12, pp. 3026–3037, Dec 2014.
[13] D. R. Reising, M. A. Temple, and J. A. Jackson, “Authorized and rogue device discrimination using dimensionally reduced RF-DNA fingerprints,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 6, pp. 1180–1192, June 2015.
[14] J. Hasse, T. Gloe, and M. Beck, “Forensic identification of GSM mobile phones,” in Proceedings of the first ACM workshop on Information hiding and multimedia security. ACM, 2013, pp. 131–140.
[15] N. Cristianini and J. Shawe-Taylor, An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, 2000.
