Comparative study on classifying human activities with miniature inertial and magnetic sensors


Pattern Recognition 43 (2010) 3605–3620
journal homepage: www.elsevier.com/locate/pr
doi:10.1016/j.patcog.2010.04.019

Kerem Altun, Billur Barshan*, Orkun Tunçel
Department of Electrical and Electronics Engineering, Bilkent University, Bilkent, TR-06800 Ankara, Turkey
* Corresponding author. E-mail address: [email protected] (B. Barshan).

Article history: Received 6 October 2009; received in revised form 30 March 2010; accepted 22 April 2010.

Keywords: Inertial sensors; Gyroscope; Accelerometer; Magnetometer; Activity recognition and classification; Feature extraction; Feature reduction; Bayesian decision making; Rule-based algorithm; Decision tree; Least-squares method; k-Nearest neighbor; Dynamic time warping; Support vector machines; Artificial neural networks

Abstract

This paper provides a comparative study on the different techniques of classifying human activities that are performed using body-worn miniature inertial and magnetic sensors. The classification techniques implemented and compared in this study are: Bayesian decision making (BDM), a rule-based algorithm (RBA) or decision tree, the least-squares method (LSM), the k-nearest neighbor algorithm (k-NN), dynamic time warping (DTW), support vector machines (SVM), and artificial neural networks (ANN). Human activities are classified using five sensor units worn on the chest, the arms, and the legs. Each sensor unit comprises a tri-axial gyroscope, a tri-axial accelerometer, and a tri-axial magnetometer. A feature set extracted from the raw sensor data using principal component analysis (PCA) is used in the classification process. A performance comparison of the classification techniques is provided in terms of their correct differentiation rates, confusion matrices, and computational cost, as well as their pre-processing, training, and storage requirements. Three different cross-validation techniques are employed to validate the classifiers. The results indicate that in general, BDM results in the highest correct classification rate with relatively small computational cost.

© 2010 Elsevier Ltd. All rights reserved. 0031-3203/$ - see front matter.

1. Introduction

Inertial sensors are self-contained, nonradiating, nonjammable, dead-reckoning devices that provide dynamic motion information through direct measurements. Gyroscopes provide angular rate information around an axis of sensitivity, whereas accelerometers provide linear or angular velocity rate information.

For several decades, inertial sensors have been used for navigation of aircraft [1,2], ships, land vehicles, and robots [3–5], for state estimation and dynamic modeling of legged robots [6,7], for shock and vibration analysis in the automotive industry, and in telesurgery [8,9]. Recently, the size, weight, and cost of commercially available inertial sensors have decreased considerably with the rapid development of micro electro-mechanical systems (MEMS) [10]. Some of these devices are sensitive around a single axis; others are multi-axial (usually two- or three-axial). The availability of such MEMS sensors has opened up new possibilities for the use of inertial sensors, one of them being human activity monitoring, recognition, and classification through body-worn sensors [11–15]. This in turn has a broad range of potential applications in biomechanics [15,16], ergonomics [17], remote monitoring of the physically or mentally disabled, the elderly, and children [18], detecting and classifying falls [19–21], medical diagnosis and treatment [22], home-based rehabilitation and physical therapy [23], sports science [24], ballet and other forms of dance [25], animation and film making, computer games [26,27], professional simulators, virtual reality, and stabilization of equipment through motion compensation.

Early studies in activity recognition employed vision-based systems with single or multiple video cameras, and this remains the most common approach to date [28–31]. For example, although the gesture recognition problem has been well studied in computer vision [32], much less research has been done in this area with body-worn inertial sensors [33,34]. The use of camera systems may be acceptable and practical when activities are confined to a limited area such as certain parts of a house or office environment and when the environment is well lit. However, when the activity involves going from place to place, camera systems are much less convenient. Furthermore, camera systems interfere considerably with privacy, may supply additional, unneeded information, and cause the subjects to act unnaturally.

Miniature inertial sensors can be flexibly used inside or behind objects without occlusion effects. This is a major advantage over visual motion-capture systems that require a free line of sight. When a single camera is used, the 3-D scene is projected onto a 2-D one, with significant information loss. Points of interest are frequently pre-identified by placing special, visible markers such as light-emitting diodes (LEDs) on the human body. Occlusion or shadowing of points of interest (by human body parts or objects in the surroundings) is circumvented by positioning multiple camera systems in the environment and using several 2-D projections to reconstruct the 3-D scene. This requires each camera to be separately calibrated. Another major disadvantage of using camera systems is that the cost of processing and storing images and video recordings is much higher than that of 1-D signals. 1-D signals acquired from multiple axes of inertial sensors can directly provide the required information in 3-D. Unlike high-end commercial inertial sensors that are calibrated by the manufacturer, in low-cost applications that utilize these devices, calibration is still a necessary procedure. Accelerometer-based systems are more commonly adopted than gyros because accelerometers are easily calibrated by gravity, whereas gyro calibration requires an accurate variable-speed turntable and is more complicated.

Camera systems and inertial sensors are two inherently different approaches that are by no means mutually exclusive and can be used in a complementary fashion in many situations. In a number of studies, video cameras are used only as a reference for comparison with inertial sensor data [35–40]. In other studies, data from these two sensing modalities are integrated or fused [41,42]. The fusion of visual and inertial data has attracted considerable attention recently because of its robust performance and potentially wide applications [43,44]. Fusing the data of inertial sensors and magnetometers is also reported in the literature [38,46,47].

Previous work on activity recognition based on body-worn inertial sensors is fragmented, of limited scope, and mostly unsystematic in nature. Due to the lack of a common ground among different researchers, results published so far are difficult to compare, synthesize, and build upon in a manner that allows broad conclusions to be reached. A unified and systematic treatment of the subject is desirable; theoretical models need to be developed that will enable studies designed such that the obtained results can be synthesized into a larger whole.

Most previous studies distinguish between sitting, lying, and standing [18,35–37,39,45,48–50], as these postures are relatively easy to detect using the static component of acceleration. Distinguishing between walking, and ascending and descending stairs has also been accomplished [45,48,50], although not as successfully as detecting postures. The signal processing and motion detection techniques employed, and the configuration, number, and type of sensors differ widely among the studies, from using a single accelerometer [18,51,52] to as many as 12 [53] on different parts of the body. Although gyroscopes can provide valuable rotational information in 3-D, in most studies, accelerometers are preferred to gyroscopes due to their ease of calibration. To the best of our knowledge, guidance on finding a suitable configuration, number, and type of sensors does not exist [45]. Usually, some configuration and some modality of sensors is chosen without strong justification, and empirical results are presented. Processing the acquired signals is also often done ad hoc and with relatively unsophisticated techniques.

In this work, we use miniature inertial sensors and magnetometers positioned on different parts of the body to classify human activities. The motivation behind investigating activity classification is its potential applications in the many different areas mentioned above. The main contribution of this paper is that unlike previous studies, we use many redundant sensors to begin with and extract a variety of features from the sensor signals. Then, we use an unsupervised feature transformation technique that allows considerable feature reduction through automatic selection of the most informative features. We provide an extensive and systematic comparison between various classification techniques used for human activity recognition based on the same data set. We compare the successful differentiation rates, confusion matrices, and computational requirements of the techniques.

The paper is organized as follows: In Section 2, we introduce the activities classified in this study and outline the experimental methodology. Describing the feature vectors and the feature reduction process is the topic of Section 3. In Section 4, we briefly review the classification methods used in this study. In Section 5, we present the experimental results and compare the methods' computational requirements. We also provide a brief discussion on selecting classification techniques and their advantages and disadvantages. Section 6 addresses the potential application areas of miniature inertial sensors in activity recognition. In Section 7, we draw conclusions and provide possible directions for future work.

2. Classified activities and experimental methodology

The 19 activities that are classified using body-worn miniature inertial sensor units are: sitting (A1), standing (A2), lying on back and on right side (A3 and A4), ascending and descending stairs (A5 and A6), standing in an elevator still (A7) and moving around (A8), walking in a parking lot (A9), walking on a treadmill with a speed of 4 km/h (in flat and 15° inclined positions) (A10 and A11), running on a treadmill with a speed of 8 km/h (A12), exercising on a stepper (A13), exercising on a cross trainer (A14), cycling on an exercise bike in horizontal and vertical positions (A15 and A16), rowing (A17), jumping (A18), and playing basketball (A19).

Five MTx 3-DOF orientation trackers (Fig. 1) are used, manufactured by Xsens Technologies [54]. Each MTx unit has a tri-axial accelerometer, a tri-axial gyroscope, and a tri-axial magnetometer, so the sensor units acquire 3-D acceleration, rate of turn, and the strength of Earth's magnetic field. Each motion tracker is programmed via an interface program called MT Manager to capture the raw or calibrated data with a sampling frequency of up to 512 Hz.

Accelerometers of two of the MTx trackers can sense up to ±5g and the other three can sense in the range of ±18g, where g = 9.80665 m/s² is the standard acceleration of gravity. All gyroscopes in the MTx unit can sense in the range of ±1200°/s angular velocities; magnetometers can sense magnetic fields in the range of ±75 μT. We use all three types of sensor data in all three dimensions.

Fig. 1. MTx 3-DOF orientation tracker (reprinted from http://www.xsens.com/en/general/mtx).
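The sensor setup described above can be summarized in a few lines of code. The following is only a minimal sketch: the class name `SensorUnit` and its fields are hypothetical and are not part of any Xsens software; only the numbers (five units, three tri-axial devices per unit, ±5g and ±18g accelerometer ranges, g = 9.80665 m/s²) are taken from the text.

```python
from dataclasses import dataclass

G_MS2 = 9.80665  # value of g quoted in the text, in m/s^2


@dataclass(frozen=True)
class SensorUnit:
    """Hypothetical description of one sensor unit (not an Xsens API)."""
    accel_range_g: float      # accelerometer full-scale range, in multiples of g
    devices: int = 3          # accelerometer, gyroscope, magnetometer
    axes_per_device: int = 3  # each device is tri-axial

    @property
    def n_signals(self) -> int:
        # Number of 1-D signals recorded by this unit.
        return self.devices * self.axes_per_device

    @property
    def accel_range_ms2(self) -> float:
        # Accelerometer full-scale range converted to m/s^2.
        return self.accel_range_g * G_MS2


# Two units with +/-5g accelerometers and three with +/-18g, as stated in the text.
units = [SensorUnit(5.0), SensorUnit(5.0),
         SensorUnit(18.0), SensorUnit(18.0), SensorUnit(18.0)]

total_signals = sum(u.n_signals for u in units)
print(total_signals)  # 45 signals in total (5 units x 9 signals each)
print([round(u.accel_range_ms2, 2) for u in units])
```

The count of 45 signals is the number of separate 1-D signals available when all five units and all nine axes per unit are used.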
The sensors are placed on five different places on the subject's body as depicted in Fig. 2. Since leg motions in general may produce larger accelerations, two of the ±18g sensor units are placed on the sides of the knees (right side of the right knee and left side of the left knee), the remaining ±18g unit is placed on the subject's chest (Fig. 2(b)), and the two ±5g units on the wrists (Fig. 2(c)).

The five MTx units are connected with 1 m cables to a device called the Xbus Master, which is attached to the subject's belt. The Xbus Master transmits data from the five MTx units to the receiver using a Bluetooth™ connection. The Xbus Master, which is connected to three MTx orientation trackers, can be seen in Fig. 3(a). The receiver is connected to a laptop computer via a USB port. Two of the five MTx units are directly connected to the Xbus Master and the remaining three units are indirectly connected to the Xbus Master by wires to the other two. Fig. 3(b) illustrates the connection configuration of the five MTx units and the Xbus Master.

Each activity listed above is performed by eight different subjects (4 female, 4 male, between the ages 20 and 30) for 5 min. The subjects are asked to perform the activities in their own style and are not restricted on how the activities should be performed. For this reason, there are inter-subject variations in the speeds and amplitudes of some activities. The activities are performed at the Bilkent University Sports Hall, in the Electrical and Electronics Engineering Building, and in a flat outdoor area on campus. Sensor units are calibrated to acquire data at 25 Hz sampling frequency. The 5-min signals are divided into 5-s segments, from which certain features are extracted. In this way, 480 (= 60 × 8) signal segments are obtained for each activity.

Fig. 2. Positioning of Xsens sensor modules on the body.

Fig. 3. (a) MTx blocks and Xbus Master (reprinted from http://www.xsens.com/en/movement-science/xbus-kit), (b) connection diagram of MTx sensor blocks (body part of the figure is from http://www.answers.com/bodybreadths).

3. Feature extraction and reduction

After acquiring the signals as described above, we obtain a discrete-time sequence of $N_s$ elements that can be represented as an $N_s \times 1$ vector $\mathbf{s} = [s_1, s_2, \ldots, s_{N_s}]^T$. For the 5-s time windows and the 25-Hz sampling rate, $N_s = 125$. The initial set of features we use before feature reduction are the minimum and maximum values, the mean value, variance, skewness, kurtosis, autocorrelation sequence, and the peaks of the discrete Fourier transform (DFT) of $\mathbf{s}$ with the corresponding frequencies. These are calculated as follows:

\[ \text{mean}(\mathbf{s}) = \mu_s = E\{s\} = \frac{1}{N_s} \sum_{i=1}^{N_s} s_i \]

\[ \text{variance}(\mathbf{s}) = \sigma^2 = E\{(s - \mu_s)^2\} = \frac{1}{N_s} \sum_{i=1}^{N_s} (s_i - \mu_s)^2 \]

\[ \text{skewness}(\mathbf{s}) = \frac{E\{(s - \mu_s)^3\}}{\sigma^3} = \frac{1}{N_s \sigma^3} \sum_{i=1}^{N_s} (s_i - \mu_s)^3 \]

\[ \text{kurtosis}(\mathbf{s}) = \frac{E\{(s - \mu_s)^4\}}{\sigma^4} = \frac{1}{N_s \sigma^4} \sum_{i=1}^{N_s} (s_i - \mu_s)^4 \]

\[ \text{autocorrelation:} \quad R_{ss}(\Delta) = \frac{1}{N_s - \Delta} \sum_{i=0}^{N_s - \Delta - 1} (s_i - \mu_s)(s_{i-\Delta} - \mu_s), \qquad \Delta = 0, 1, \ldots, N_s - 1 \]

\[ \text{DFT:} \quad S_{\mathrm{DFT}}(k) = \sum_{i=0}^{N_s - 1} s_i \, e^{-j 2 \pi k i / N_s}, \qquad k = 0, 1, \ldots, N_s - 1 \]

In these equations, $s_i$ is the $i$th element of the discrete-time sequence $\mathbf{s}$, $E\{\cdot\}$ denotes the expectation operator, $\mu_s$ and $\sigma$ are the mean and the standard deviation of $\mathbf{s}$, $R_{ss}(\Delta)$ is the unbiased autocorrelation sequence of $\mathbf{s}$, and $S_{\mathrm{DFT}}(k)$ is the $k$th element of the 1-D $N_s$-point DFT. In calculating the first five features above, it is assumed that the signal segments are the realizations of an ergodic process so that ensemble averages are replaced with time averages. Apart from those listed above, we have also considered using features such as the total energy of the signal, cross-correlation coefficients of two signals, and the discrete cosine transform coefficients of the signal.

Since there are five sensor units (MTx), each with three tri-axial devices, a total of nine signals are recorded from every sensor unit. Different signal representations, such as the time-domain signal, its autocorrelation function, and its DFT for two selected activities are given in Fig. 4. In parts (a) and (c) of the figure, the quasi-periodic nature of the walking signal can be observed.

When a feature such as the mean value of a signal is calculated, 45 (= 9 axes × 5 units) different values are available. These values from the five sensor units are placed in the feature vectors in the order of right arm, left arm, right leg, torso, and left leg. For each one of these sensor locations, nine values for each feature are calculated and recorded in the following order: the x, y, z axes' acceleration, the x, y, z axes' rate of turn, and the x, y, z axes' Earth's magnetic field. In constructing the feature vectors, the above procedure is followed for the minimum and maximum values, the mean, skewness, and kurtosis. Thus, 225 (= 45 axes × 5 features) elements of the feature vectors are obtained by using the above procedure.

After taking the DFT of each 5-s signal, the maximum five Fourier peaks are selected so that a total of 225 (= 9 axes × 5 units × 5 peaks) Fourier peaks are obtained for each segment. Each group of 45 peaks is placed in the order of right arm, left arm, right leg, torso, and left leg, as above. The 225 frequency values that correspond to these Fourier peaks are placed after the Fourier peaks in the same order.

Eleven autocorrelation samples are placed in the feature vectors for each axis of each sensor, following the order given above. Since there are 45 distinct sensor signals, 495 (= 45 axes × 11 samples) autocorrelation samples are placed in each feature vector. The first sample of the autocorrelation function (the variance) and every fifth sample up to the fiftieth are placed in the feature vectors for each signal.

As a result of the above feature extraction process, a total of 1170 (= 225 + 225 + 225 + 495) features are obtained for each of the 5-s signal segments so that the dimensions of the resulting feature vectors are 1170 × 1. All features are normalized to the interval [0,1] so as to be used for classification.

Because the initial set of features was quite large (1170) and not all features were equally useful in discriminating between the activities, we investigated different feature selection and reduction methods [55]. In this work, we reduced the number of features from 1170 to 30 through principal component analysis (PCA) [56], which is a transformation that finds the optimal linear combinations of the features, in the sense that they represent the data with the highest variance in a feature subspace, without taking the intra-class and inter-class variances into consideration separately. The reduced dimension of the feature vectors is determined by observing the eigenvalues of the covariance matrix of the 1170 × 1 feature vectors, sorted in Fig. 5(a) in descending order. The 30 eigenvectors corresponding to the largest 30 eigenvalues (Fig. 5(b)) are used to form the transformation matrix, resulting in 30 × 1 feature vectors. Although the initial set of 1170 features do have physical meaning, because of the matrix transformation involved, the transformed feature vectors cannot be assigned any physical meaning. Scatter plots of the first five transformed features are given in Fig. 6 pairwise. As expected, in the first two plots or so (parts (a) and (b) of the figure), the features for different classes are better clustered and more distinct.

We assume that after feature reduction, the resulting feature vector is an $N \times 1$ vector $\mathbf{x} = [x_1, \ldots, x_N]^T$.

4. Classification techniques

The classification techniques used in this study are briefly reviewed in this section. More detailed descriptions can be found in [14,57] and in the given references.

We associate a class $\omega_i$ with each activity type ($i = 1, \ldots, c$). An unknown activity is assigned to class $\omega_i$ if its feature vector $\mathbf{x} = [x_1, \ldots, x_N]^T$ falls in the region $\Omega_i$. A rule that partitions the decision space into regions $\Omega_i$, $i = 1, \ldots, c$ is called a decision rule. In our work, each one of these regions corresponds to a different activity type. Boundaries between these regions are called decision surfaces. The training set contains a total of $I = I_1 + I_2 + \cdots + I_c$ sample feature vectors where $I_i$ sample feature vectors belong to class $\omega_i$, and $i = 1, \ldots, c$. The test set is then used to evaluate the performance of the decision rule.

4.1. Bayesian decision making (BDM)

In BDM, class conditional probability density functions (CCPDFs) are estimated for each class. In this study, the CCPDFs are assumed to have a multi-variate Gaussian parametric form, and the mean vector and the covariance matrix of the CCPDF for each class are estimated using maximum likelihood estimators on the training vectors. For a given test vector $\mathbf{x}$, the maximum a posteriori (MAP) decision rule is used for classification [56].

[Figure 4: six panels. (a), (b): $a_z$ (m/s²) versus t (sec); (c), (d): $R_{ss}(\Delta)$ versus Δ; (e), (f): $S_{\mathrm{DFT}}(k)$ versus k.]

Fig. 4. (Color online) (a) and (b): Time-domain signals for walking and basketball, respectively; z-axis acceleration of the right (solid lines) and left arm (dashed lines) are given; (c) and (d): autocorrelation functions of the signals in (a) and (b); (e) and (f): 125-point DFT of the signals in (a) and (b), respectively.

4.2.
Rule-based algorithm (RBA)

A rule-based algorithm or a decision tree can be considered a sequential procedure that classifies given inputs. An RBA follows predefined rules at each node of the tree and makes binary decisions based on these rules. Rules correspond to conditions such as "is feature $x_i \le \tau_i$?", where $\tau$ is the threshold value for a given feature and $i = 1, 2, \ldots, T$, with $T$ being the total number of features used [58].

As the information necessary to differentiate between the activities is completely embodied in the decision rules, the RBA has the advantage of not requiring storage of any reference feature vectors. The main difficulty is in designing the rules and making them independent of absolute quantities so that they will be more robust and generally applicable.

In this study, we automatically generate a binary decision tree based on the training data using the CART algorithm [59]. Given a set of training vectors along with their class labels, a binary tree, and a decision rule for each node of the tree, each node corresponds to a particular subset of the training vectors where each element of that subset satisfies the conditions imposed by the ancestors of that node. Thus, a decision at a node splits the corresponding subset into two: those that satisfy the condition and those that do not. Naturally, the ideal split is expected to isolate a class from others at each decision node. Since this is not the case in practice, a decision rule is found by searching among all possible decisions that minimize the impurity of that node. We use entropy as a measure of impurity, and the class frequencies at each node to estimate the entropy [59]. Test vectors are then used to evaluate the classification performance of the decision tree.

[Figure 5: two panels titled "eigenvalues in descending order" and "first 50 eigenvalues in descending order".]

Fig. 5. (a) All eigenvalues (1170) and (b) the first 50 eigenvalues of the covariance matrix sorted in descending order.

[Figure 6: four scatter plots (feature 2 versus feature 1, feature 3 versus feature 2, feature 4 versus feature 3, feature 5 versus feature 4), with a legend for activities A1–A19.]

Fig. 6. (Color online) Scatter plots of the first five features selected by PCA.

4.3. Least-squares method (LSM)

In LSM, the average reference vector for each class is calculated as a representative for that particular class. Each test vector is compared with the average reference vector (instead of each individual reference vector) as follows:

\[ D_i^2 = \sum_{n=1}^{N} (x_n - r_{in})^2 = (x_1 - r_{i1})^2 + \cdots + (x_N - r_{iN})^2, \qquad i = 1, \ldots, c \qquad (1) \]

The test vector is assigned to the same class as the nearest average reference vector. In this equation, $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$ represents a test feature vector, $\mathbf{r}_i = [r_{i1}, r_{i2}, \ldots, r_{iN}]^T$ represents the average of the reference feature vectors for each distinct class, and $D_i^2$ is the square of the distance between these two vectors.

4.4. k-Nearest neighbor (k-NN) algorithm

In k-NN, the k nearest neighbors of the vector $\mathbf{x}$ in the training set are considered and the vector $\mathbf{x}$ is classified into the same class as the majority of its k nearest neighbors [56]. The Euclidean distance measure is used. The k-NN algorithm is sensitive to the local structure of the data. The selection of the parameter k, the number of neighbors considered, is a very important issue that can affect the decision made by the k-NN classifier. Unfortunately, a pre-defined rule for the selection of the value of k does not exist. In this study, the number of nearest neighbors k is determined experimentally by maximizing the correct classification rate over different k values.

4.5. Dynamic time warping (DTW)

Dynamic time warping is an algorithm for measuring the similarity between two sequences that may vary in time or speed. An optimal match between two given sequences (e.g. a time series) is found under certain restrictions. The sequences are "warped" nonlinearly in the time dimension to determine a measure of their similarity independent of certain nonlinear variations in the time dimension. In DTW, the aim is to find the least-cost warping path for the tested feature vector among the stored reference feature vectors [60] where the cost measure is typically taken as the Euclidean distance between the elements of the feature vectors. DTW is used mostly in automatic speech recognition to handle different speaking speeds [60,61]. Besides speech recognition, DTW has been used in signature and gait recognition, for ECG signal classification, for fingerprint verification, for word spotting in handwritten historical documents on electronic media and machine-printed documents, and for face localization in color images [62,63]. In this study, DTW is used for classifying feature vectors of different activities extracted from the signals of miniature inertial sensors.

4.6. Support vector machines (SVMs)

The support vector machine classifier is a machine learning technique proposed early in the 1980s [64–66]. It has been mostly used in applications such as object, voice, and handwritten character recognition, and in text classification.

If the feature vectors in the original feature space are not linearly separable, SVMs pre-process and represent them in a higher-dimensional space where they can become linearly separable. The dimension of the transformed space may sometimes be much higher than the original feature space. With a suitable nonlinear mapping $\phi(\cdot)$ to a sufficiently high dimension, data from two different classes can always be made linearly separable, and separated by a hyperplane. The choice of the nonlinear mapping method depends on the prior information available to the designer. If information is not available, one might choose to use polynomials, Gaussians, or other types of basis functions. The dimensionality of the mapped space can be arbitrarily high; however, in practice, it may be limited by computational resources. The complexity of SVMs is related to the number of resulting support vectors rather than the high dimensionality of the transformed space.

In this study, the SVM method is applied to differentiate feature vectors that belong to more than two classes (19 classes). Following the one-versus-the-rest method, c different binary classifiers are trained, where each classifier recognizes one of c activity types. A nonlinear classifier with a radial basis function kernel $K(\mathbf{x}, \mathbf{x}_i) = e^{-\gamma |\mathbf{x} - \mathbf{x}_i|^2}$ is used with $\gamma = 4$. A library for SVMs (LIBSVM toolbox) is used in the MATLAB environment [67].

4.7. Artificial neural networks (ANN)

Multi-layer ANNs consist of an input layer, one or more hidden layers to extract progressively more meaningful features, and a single output layer, each composed of a number of units called neurons. The model of each neuron includes a smooth nonlinearity, called the activation function. Due to the presence of distributed nonlinearity and a high degree of connectivity, theoretical analysis of ANNs is difficult. These networks are trained to compute the boundaries of decision regions in the form of connection weights and biases by using training algorithms. The performance of ANNs is affected by the choice of parameters related to the network structure, training algorithm, and input signals, as well as by parameter initialization [68,69].

In this work, a three-layer ANN is used for classifying human activities. The input layer has N neurons, equal to the dimension of the feature vectors (30). The hidden layer has 12 neurons, and the output layer has c neurons, equal to the number of classes. In the input and hidden layers each, there is an additional neuron with a bias value of 1. For an input feature vector $\mathbf{x} \in \mathbb{R}^N$, the target output is 1 for the class that the vector belongs to, and 0 for all other output neurons. The sigmoid function used as the activation function in the hidden and output layers is given by $g(x) = (1 + e^{-x})^{-1}$.

The output neurons can take continuous values between 0 and 1. Fully connected ANNs are trained with the back-propagation algorithm [68] by presenting a set of training patterns to the network. The aim is to minimize the average of the sum of squared errors over all training vectors:

\[ E_{av}(\mathbf{w}) = \frac{1}{2I} \sum_{i=1}^{I} \sum_{k=1}^{c} \left[ t_{ik} - o_{ik}(\mathbf{w}) \right]^2 \qquad (2) \]

Here, $\mathbf{w}$ is the weight vector, $t_{ik}$ and $o_{ik}$ are the desired and actual output values for the $i$th training pattern and the $k$th output neuron, and $I$ is the total number of training patterns. When the entire training set is covered, an epoch is completed. The error between the desired and actual outputs is computed at the end of each iteration and these errors are averaged at the end of each epoch (Eq. (2)). The training process is terminated when a certain precision goal on the average error is reached or if the specified maximum number of epochs (5000) is exceeded, whichever occurs earlier. The latter case occurs very rarely. The acceptable average error level is set to a value of 0.03. The weights are initialized randomly with a uniform distribution in the interval [0, 0.2], and the learning rate is chosen as 0.2.

In the test phase, the test feature vectors are fed forward to the network, the outputs are compared with the desired outputs, and the error between them is calculated. The test vector is said to be correctly classified if this error is below a threshold value of 0.25.

5. Experimental results

The classification techniques described in Section 4 are employed to classify the 19 different activities using the 30 features selected by PCA. A total of 9120 (= 60 feature vectors × 19 activities × 8 subjects) feature vectors are available, each containing the 30 reduced features of the 5-s signal segments. In the training and testing phases of the classification methods, we use the repeated random sub-sampling (RRSS), P-fold, and leave-one-out (L1O) cross-validation techniques. In RRSS, we divide the 480 feature vectors from each activity type randomly into two sets so that the first set contains 320 feature vectors (40 from each subject) and the second set contains 160 (20 from each subject). Therefore, two-thirds (6080) of the 9120 feature vectors are used for training and one-third (3040) for testing. This is repeated 10 times and the resulting correct differentiation percentages are averaged. The disadvantage of this method is that some observations may never be selected in the testing or the validation phase, whereas others may be selected more than once. In other words, validation subsets may overlap.

In P-fold cross validation, the 9120 feature vectors are divided into P = 10 partitions, where the 912 feature vectors in each partition are selected completely randomly, regardless of the subject or the class they belong to. One of the P partitions is retained as the validation set for testing, and the remaining P − 1 partitions are used for training. The cross-validation process is then repeated P times (the folds), where each of the P partitions is used exactly once for validation. The P results from the folds are then averaged to produce a single estimation. The random partitioning is repeated 10 times and the average correct differentiation percentage is reported. The advantage of this validation method over RRSS is that all feature vectors are used for both training and testing, and each feature vector is used for testing exactly once in each of the 10 runs.

Finally, we also used subject-based L1O cross validation, where the 7980 (= 60 vectors × 19 activities × 7 subjects) feature vectors of seven of the subjects are used for training and the 1140 feature vectors of the remaining subject are used in turn for validation. This is repeated eight times such that the feature vector set of each subject is used once as the validation data. The eight correct classification rates are averaged to produce a single estimate. This is similar to P-fold cross validation with P being equal to the number of subjects (P = 8), and where all the feature vectors in the same partition are associated with the same subject.

Correct differentiation rates of the classification techniques over 10 runs and their standard deviations are tabulated in Table 1 for the three cross-validation techniques we considered. With RRSS and P-fold cross-validation, all of the correct differentiation rates are above 80%, with standard deviations usually lower than 0.5% with a few exceptions. From the table, it can be observed that there is not a significant difference between the results of RRSS and P-fold cross-validation techniques. The results of subject-based L1O are always lower than the two. In terms of reliability and repeatability, the P-fold cross-validation technique results in smaller standard deviations than RRSS. Because L1O cross validation would give the same classification percentage if the complete cycle over the subject-based partitions is repeated, its standard deviation is zero.

Table 1
Correct differentiation rates for all classification methods and three cross-validation techniques.

Method        Correct differentiation rate (%) ± one standard deviation
              RRSS           P-fold         L1O
BDM           99.1 ± 0.12    99.2 ± 0.02    75.8
RBA           81.0 ± 1.52    84.5 ± 0.44    53.6
LSM           89.4 ± 0.75    89.6 ± 0.10    85.3
k-NN (k = 7)  98.2 ± 0.12    98.7 ± 0.07    86.9
DTW1          82.6 ± 1.36    83.2 ± 0.26    80.4
DTW2          98.5 ± 0.18    98.5 ± 0.08    85.2
SVM           98.6 ± 0.12    98.8 ± 0.03    87.6
ANN           86.9 ± 3.31    96.2 ± 0.19    74.3

The results of the RRSS and P-fold cross-validation techniques are calculated over 10 runs, whereas those of L1O are over a single run.

Among the classification techniques we considered and implemented, when RRSS and P-fold cross-validation is used, BDM gives the highest classification rate, followed by SVM and k-NN. RBA and DTW1 perform the worst in general. In subject-based L1O cross validation, SVM is the best, followed by k-NN. The correct classification rates reported for L1O cross validation can be interpreted as the expected correct classification rates when data from a new subject are acquired and given as input to the classifiers. The most significant difference in the performances of the different validation methods is observed for the BDM method (Table 1). The RRSS and P-fold cross validation result in 99% correct classification rate, suggesting that the data are well represented by a multi-variate Gaussian distribution. However, the 76% correct classification rate of L1O cross validation implies that the parameters of the Gaussian, when calculated by excluding one of the subjects, cannot represent the data of the excluded subject sufficiently well. Thus, if one is to classify the activities of a new test subject whose training data are not available to the classifiers, SVM, k-NN, or LSM methods could be used.

We chose to employ the P-fold cross-validation technique in reporting the results presented in Tables 2–8. Looking at the confusion matrices of the different techniques, it can be observed that A7 and A8 are the activities most confused with each other. This is because both of these activities are performed in the elevator and the signals recorded from these activities have similar segments. Therefore, confusion at the classification stage becomes inevitable. A2 and A7, A13 and A14, as well as A9, A10, A11, are also confused from time to time for similar reasons. Two activities that are almost never confused are A12 and A17.

The confusion matrices for BDM and RBA are provided in Tables 2 and 3. With these methods, correct differentiation rates of 99.2% and 84.5% are, respectively, achieved. The features used in the RBA correspond to the 30 features selected by PCA and the rules change at every training cycle.

In the LSM approach, test vectors are compared with the average of the reference vectors calculated for each of the 19 activities. The confusion matrix for this method is provided in Table 4. The overall successful differentiation rate of LSM is 89.6%.

Table 2
Confusion matrix for BDM (P-fold cross validation, 99.2%). Rows: true activity; columns: classified activity.

True   A1  A2  A3  A4  A5  A6  A7  A8  A9 A10 A11 A12 A13 A14 A15 A16 A17 A18 A19
A1    480   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
A2      0 478   0   0   0   0   0   2   0   0   0   0   0   0   0   0   0   0   0
A3      0   0 478   0   0   0   0   2   0   0   0   0   0   0   0   0   0   0   0
A4      0   0   0 480   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
A5      0   0   0   0 478   0   0   2   0   0   0   0   0   0   0   0   0   0   0
A6      0   0   0   0   0 477   0   3   0   0   0   0   0   0   0   0   0   0   0
A7      0   0   0   0   0   0 467  13   0   0   0   0   0   0   0   0   0   0   0
A8      0   0   0   0   0   0  44 435   0   0   0   0   0   0   0   0   0   0   1
A9      0   0   0   0   0   0   0   1 479   0   0   0   0   0   0   0   0   0   0
A10     0   0   0   0   0   0   0   0   0 478   2   0   0   0   0   0   0   0   0
A11     0   0   0   0   0   0   0   0   0   0 480   0   0   0   0   0   0   0   0
A12     0   0   0   0   0   0   0   0   0   0   0 480   0   0   0   0   0   0   0
A13     0   0   0   0   0   0   0   0   0   0   0   0 479   1   0   0   0   0   0
A14     0   0   0   0   0   0   0   2   0   0   0   0   0 478   0   0   0   0   0
A15     0   0   0   0   0   0   0   0   0   0   0   0   0   0 480   0   0   0   0
A16     0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 480   0   0   0
A17     0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 480   0   0
A18     0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 480   0
A19     0   0   0   0   0   0   0   4   0   0   0   0   0   0   0   0   0   0 476

Table 3
Confusion matrix for RBA (P-fold cross validation, 84.5%).
Rows: true activity; columns: classified activity.
True  A1  A2  A3  A4  A5  A6  A7  A8  A9 A10 A11 A12 A13 A14 A15 A16 A17 A18 A19
A1   418   6  22   4   0   0   8   2   0   0   1   1   0   5   7   2   2   0   2
A2     9 404   0   1   2   5  37   9   1   0   2   2   1   1   3   2   0   1   0
A3    12   1 439  13   0   0   2   1   1   1   1   2   3   0   2   1   1   0   0
A4     7   3  11 446   4   0   4   0   0   0   0   0   0   0   0   2   1   2   0
A5     0   2   1   1 421   1   4  11  10   4   5   1   1   4   0   6   3   3   2
A6     4   2   1   0   4 409  10  19   7   0   0   0   9   1   7   0   1   5   1
A7     8  33   2   3   1  16 360  48   0   0   0   0   1   0   2   0   1   2   3
A8     2  17   1   2  21  31  60 266  10   4   4   1  13   7   7   3   8   7  16
A9     0   3   1   1   6   5   1   4 397  20  11   4   7   8   0   9   0   2   1
A10    0   1   0   1   2   1   0   2  14 416  27   0   2   7   1   2   0   2   2
A11    0   1   1   2   2   0   0   2  13  38 404   3   1   9   0   1   0   2   1
A12    1   0   2   2   0   0   0   0   0   3   2 456   2   2   2   3   0   1   4
A13    1   1   0   0   1   0   1   4   8   1   3   4 404  35   6   6   0   1   4
A14    0   1   1   1   1   0   0   8   5   3   6   5  20 411   5   4   0   3   6
A15    3   0   2   0   1   8   0   4   2   0   0   3   4   1 432   9   9   0   2
A16    2   3   1   1   9   2   1   3   3   2   3   2   9   8   7 420   2   0   2
A17    1   0   3   0   2   7   0   1   1   0   0   0   2   0   5   1 455   2   0
A18    0   1   1   1   2   7   1   9   8   1   2   4   0   1   1   2   5 430   4
A19    1   1   1   3   1   6   1  17   2   5   2  11  12  10   0   1   7   5 394

Table 4. Confusion matrix for LSM (P-fold cross validation, 89.6%).
Rows: true activity; columns: classified activity.
True  A1  A2  A3  A4  A5  A6  A7  A8  A9 A10 A11 A12 A13 A14 A15 A16 A17 A18 A19
A1   415   0  60   0   0   0   3   2   0   0   0   0   0   0   0   0   0   0   0
A2     4 398   0   0   0   1  72   5   0   0   0   0   0   0   0   0   0   0   0
A3     3   0 471   0   0   0   1   0   0   0   0   0   0   0   0   0   5   0   0
A4     0   0   0 478   1   0   0   1   0   0   0   0   0   0   0   0   0   0   0
A5     0   0   0   0 480   0   0   0   0   0   0   0   0   0   0   0   0   0   0
A6     0   0   0   0   0 448   0  32   0   0   0   0   0   0   0   0   0   0   0
A7    16  55   0   1   6   1 350  51   0   0   0   0   0   0   0   0   0   0   0
A8     1  11   0   0   9   5  57 384   9   0   0   0   0   0   0   0   0   0   4
A9     0   0   0   0  19   7   0   0 361  52  35   0   6   0   0   0   0   0   0
A10    0   0   0   0   0   0   0   0   0 414  66   0   0   0   0   0   0   0   0
A11    0   0   0   0   1   0   0   0   0  78 401   0   0   0   0   0   0   0   0
A12    0   0   0   0   0   0   0   0   0   0   0 480   0   0   0   0   0   0   0
A13    0   0   0   0   0   0   0   4   0   0   0   0 466   9   0   0   0   1   0
A14    0   0   0   0   0   0   0   1   0   0   0   0 123 347   0   0   0   9   0
A15    0   0   0   0   0   0   0   0   0   0   0   0   1   0 476   1   2   0   0
A16    0   0   0   0  16   0   0   0   0   0   0   0   0   1   0 462   0   1   0
A17    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 480   0   0
A18    0   0   0   0   1  65   0   0  15   0   0   0   0   0   0   0   0 399   0
A19    0   0   0   0   0   1   0  11   0   0   0   1   2   0   0   0   0   0 465

Table 5. Confusion matrix for the k-NN algorithm for k = 7 (P-fold cross validation, 98.7%).
Rows: true activity; columns: classified activity.
True  A1  A2  A3  A4  A5  A6  A7  A8  A9 A10 A11 A12 A13 A14 A15 A16 A17 A18 A19
A1   480   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
A2     0 479   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0
A3     0   0 480   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
A4     0   0   0 479   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
A5     0   0   0   0 480   0   0   0   0   0   0   0   0   0   0   0   0   0   0
A6     0   0   0   0   0 480   0   0   0   0   0   0   0   0   0   0   0   0   0
A7     1  11   0   0   0   2 446  20   0   0   0   0   0   0   0   0   0   0   0
A8     0   5   0   0   4  10  38 422   1   0   0   0   0   0   0   0   0   0   0
A9     0   0   0   0   0   1   0   0 477   1   1   0   0   0   0   0   0   0   0
A10    0   0   0   0   0   0   0   0   0 474   6   0   0   0   0   0   0   0   0
A11    0   0   0   0   0   0   0   0   0   1 479   0   0   0   0   0   0   0   0
A12    0   0   0   0   0   0   0   0   0   0   0 480   0   0   0   0   0   0   0
A13    0   0   0   0   0   0   0   0   0   0   0   0 476   4   0   0   0   0   0
A14    0   0   0   0   0   0   0   0   0   0   0   0   1 479   0   0   0   0   0
A15    0   0   0   0   0   0   0   0   0   0   0   0   0   0 479   1   0   0   0
A16    0   0   0   0   0   0   0   0   0   0   0   0   0   0   1 479   0   0   0
A17    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 480   0   0
A18    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 480   0
A19    0   0   0   0   0   0   0   7   0   0   0   0   1   0   0   0   0   0 472

Performance of the k-NN method changes for different values of k. A value of k = 7 gave the best results, therefore the confusion matrix of the k-NN algorithm is provided for k = 7 in Table 5, and a successful differentiation rate of 98.7% is achieved.

We have implemented the DTW algorithm in two different ways: In the first (DTW1), the average reference feature vector of each activity is used for distance comparison. The confusion matrix for DTW1 is presented in Table 6, and a correct differentiation rate of 83.2% is achieved. As a second approach (DTW2), DTW distances are calculated between the test vector and each of the 8208 (= 9120 − 912) reference vectors from other classes. The class of the nearest reference vector is assigned as the class of the test vector. The success rate of DTW2 is 98.5% and the corresponding confusion matrix is given in Table 7.

In SVM, following the one-versus-the-rest method, each type of activity is assumed as the first class and the remaining 18 activity types are grouped into the second class. With P-fold cross validation, 19 different SVM models are created for classifying the vectors in each partition, resulting in a total of 190 SVM models. The number of correctly and incorrectly classified feature vectors for each activity type is tabulated in Table 8(a). The overall correct classification rate of the SVM method is calculated as 98.8%.

For ANN, since the network classifies some samples as belonging to none of the classes and output neurons take continuous values between 0 and 1, it is not possible to form a confusion matrix. The number of correctly and incorrectly classified feature vectors with P-fold cross validation is given in Table 8(b). The overall correct classification rate of this method is 96.2%. On average, the network converges in about 400 epochs when P-fold cross validation is used.

To determine which activities can be distinguished easily, we employ the receiver operating characteristic (ROC) curves of some of the classifiers [56]. For a specific activity, we consider the instances belonging to that activity as positive instances, and all other instances as negative instances. Then, by setting a decision threshold or criterion for a classifier, the true positive rate (TPR) (the ratio of the true positives to the total positives) and the false positive rate (FPR) (the ratio of the false positives to the total negatives) can be calculated. Varying the decision threshold over an interval, a set of TPRs and the corresponding FPRs are obtained and plotted as a ROC curve.

Fig. 7 depicts the ROC curves for BDM, LSM, k-NN, and ANN classifiers as examples. In BDM and k-NN, the decision threshold is chosen as the posterior probability. For BDM, the posterior probability is calculated using the Bayes' rule. For k-NN, it is estimated by the ratio $(k_i + 1)/(k + c)$, where $k = 7$ for our case, $c = 19$ is the total number of classes, and $k_i$ is the number of training vectors that belong to class $\omega_i$, out of the $k$ nearest neighbors. This gives smoother estimates than using binary probabilities. In LSM, the decision threshold is chosen as the distance between a test vector and the average reference vector of each class; and in ANN, the norm of the difference between the desired and actual outputs. Since there are 19 activities, the number of positive instances of each class is much less than the number of negative instances. Consequently, the FPRs are expected to be low and therefore, we plot the FPR in the logarithmic scale for better visualization. It can be observed in Fig. 7 that the sensitivity of BDM classifier is the highest. A test vector from classes A2, A7, or A8 is less likely to be correctly classified than a test vector belonging to one of the other classes. It is also confirmed by the confusion matrices that these are the most confused activities. For the LSM classifier, the same can be said for A13 and A14, as well as for A9, A10, and A11 where the FPRs for a given TPR are rather high. Despite this, for a tolerable FPR such as, say, 0.1, the TPR for LSM and ANN still remains above 0.75.
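As a rough illustration of the ROC construction and the smoothed k-NN posterior described in the text, the following Python sketch computes the estimate (k_i + 1)/(k + c) from the labels of the k nearest neighbors and evaluates one (TPR, FPR) pair for a given decision threshold in the one-versus-rest setting. The function names and the toy inputs are assumptions made for illustration and are not taken from the paper. The sketch also assumes that a larger score indicates the positive class, so a distance-based criterion such as the one used for LSM would have to be negated first.

```python
from collections import Counter


def knn_posteriors(neighbor_labels, classes):
    """Smoothed posterior estimate (k_i + 1) / (k + c) for every class.

    neighbor_labels: labels of the k nearest training vectors (hypothetical input).
    classes: list of all c class labels.
    """
    k = len(neighbor_labels)
    c = len(classes)
    counts = Counter(neighbor_labels)
    return {cls: (counts.get(cls, 0) + 1) / (k + c) for cls in classes}


def roc_point(scores, is_positive, threshold):
    """Return (TPR, FPR) when instances with score >= threshold are called positive."""
    tp = sum(1 for s, p in zip(scores, is_positive) if p and s >= threshold)
    fp = sum(1 for s, p in zip(scores, is_positive) if not p and s >= threshold)
    positives = sum(1 for p in is_positive if p)
    negatives = len(is_positive) - positives
    return tp / positives, fp / negatives


def roc_curve(scores, is_positive):
    """Sweep the threshold over the observed scores, from strictest to loosest."""
    thresholds = sorted(set(scores), reverse=True)
    return [roc_point(scores, is_positive, t) for t in thresholds]


# Toy example: k = 7 neighbors, c = 19 activities labeled A1..A19.
CLASSES = ["A%d" % i for i in range(1, 20)]
posteriors = knn_posteriors(["A7"] * 5 + ["A8"] * 2, CLASSES)
```

With five of the seven neighbors in A7, the estimate for A7 is 6/26 rather than 5/7, and classes with no neighbors still receive 1/26 instead of zero, which is the smoothing effect the text refers to.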

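The three cross-validation schemes of Section 5 (RRSS, P-fold, and subject-based L1O) can be sketched as index partitions over the 9120 feature vectors. This is a minimal sketch under stated assumptions: each feature vector is identified by a hypothetical (subject, activity, segment) key, the function names are invented for illustration, and no classifier is trained. It only reproduces the set sizes quoted in the text.

```python
import random

N_SUBJECTS, N_ACTIVITIES, N_SEGMENTS = 8, 19, 60

# One hypothetical (subject, activity, segment) key per feature vector: 8 * 19 * 60 = 9120.
KEYS = [(s, a, i)
        for s in range(N_SUBJECTS)
        for a in range(N_ACTIVITIES)
        for i in range(N_SEGMENTS)]


def rrss_split(rng):
    """RRSS: for each subject and activity, 40 random segments train, 20 test."""
    train, test = [], []
    for s in range(N_SUBJECTS):
        for a in range(N_ACTIVITIES):
            order = list(range(N_SEGMENTS))
            rng.shuffle(order)
            train += [(s, a, i) for i in order[:40]]
            test += [(s, a, i) for i in order[40:]]
    return train, test


def p_fold_splits(rng, p=10):
    """P-fold: shuffle all vectors, cut into p equal partitions, test on each once."""
    shuffled = KEYS[:]
    rng.shuffle(shuffled)
    size = len(shuffled) // p
    folds = [shuffled[j * size:(j + 1) * size] for j in range(p)]
    for j in range(p):
        train = [key for m, fold in enumerate(folds) if m != j for key in fold]
        yield train, folds[j]


def subject_l1o_splits():
    """Subject-based L1O: hold out every vector of one subject per round."""
    for held_out in range(N_SUBJECTS):
        train = [key for key in KEYS if key[0] != held_out]
        test = [key for key in KEYS if key[0] == held_out]
        yield train, test
```

In this sketch one RRSS draw gives 6080 training and 3040 test vectors, each P-fold round gives 8208 and 912, and each L1O round gives 7980 and 1140. Repeating `rrss_split` or `p_fold_splits` with 10 different seeds would correspond to the 10 runs that are averaged in the text.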