Table Of Content

SenseGen: A Deep Learning Architecture for Synthetic Sensor Data Generation Moustafa Alzantot Supriyo Chakraborty Mani Srivastava University of California, Los Angeles IBM T. J. Watson Research Center University of California, Los Angeles [email protected] [email protected] [email protected] Abstract—Ourabilitytosynthesizesensorydatathatpreserves data segments that are sensitive to the user thus protecting specificstatisticalpropertiesoftherealdatahashadtremendous privacy and resulting in improved analytics. 7 implicationsondataprivacyandbigdataanalytics.Thesynthetic However, increasingly adversarial roles taken by the data 1 data can be used as a substitute for selective real data segments recipients mandate that the synthetic data, in addition to 0 – that are sensitive to the user – thus protecting privacy and 2 resulting in improved analytics. However, increasingly adver- preserving statistical properties, should also be “difficult” to sarial roles taken by data recipients such as mobile apps, or distinguishfromtherealdata.Eveninnon-adversarialsettings, n other cloud-based analytics services, mandate that the synthetic analytics services can behave in unexpected ways if the input a data, in addition to preserving statistical properties, should also J data is different from the expected data, thereby requiring the 1 binesp“edcitfifiocnuhltastobedeinstiunsgeudisahsafrtoemstttohedirsetianlgduaistha.bTetywpeiceanlldy,atvaisseutasl. synthesizedandrealdatasetstoexhibit“similarity”.Typically, 3 But more recently, sophisticated classifier models (discrimina- visualinspectionhasbeenusedasatesttodistinguishbetween tors), corresponding to a set of events, have also been employed datasets. But more recently, sophisticated classifier models G] to distinguish between synthesized and real data. The model (discriminators), corresponding to a set of events, have also operates on both datasets and the respective event outputs are been employed to distinguish between synthesized and real L compared for consistency. data. The model operates on both datasets and the respective . Priorworkondatasynthesishaveoftenfocussedonclassifiers s eventoutputsarecomparedforconsistency.Infact,priorwork that are built for features explicitly preserved by the synthetic c [ data. This suggests that an adversary can build classifiers that on data synthesis have often focussed on classifiers that are canexploitapotentiallydisjointsetoffeaturesfordifferentiating built for features explicitly preserved by the synthetic data. 1 between the two datasets. In this paper, we take a step towards This suggests that an adversary can build classifiers that can v generating sensory data that can pass a deep learning based exploit a potentially disjoint set of features for differentiating 6 discriminator model test, and make two specific contributions: 8 first, we present a deep learning based architecture for synthe- between the two datasets. 8 sizing sensory data. This architecture comprises of a generator In this paper, we present SenseGen – a deep learning 8 model, which is a stack of multiple Long-Short-Term-Memory based generative model for synthesizing sensory data. While 0 (LSTM) networks and a Mixture Density Network (MDN); deep learning methods are known to be capable of generating . second,weuseanotherLSTMnetworkbaseddiscriminatormodel 1 realistic data samples, training them was considered to be for distinguishing between the true and the synthesized data. 0 difficult requiring large amounts of data. However, recent Using a dataset of accelerometer traces, collected using smart- 7 phones of users doing their daily activities, we show that the work on generative models such as Generative Adversarial 1 deep learning based discriminator model can only distinguish Networks[4],[5](GAN)andvariationalauto-encoders [6],[7] : v between the real and synthesized traces with an accuracy in the haveshownthatitispossibletotrainthesemodelswithmoder- i neighborhood of 50%. X atesizeddatasets.GANshaveprovensuccessfulingenerating differenttypesofdataincludingphoto-realistichighresolution ar I. INTRODUCTION images[8],realisticimagesfromtextdescription[9],andeven A large number of data recipients (e.g., mobile apps, and for new text and music composition [10], [11]. Furthermore, other cloud-based big-data analytics) rely on the collection of inspired by the architecture of GANs, we also use a deep personalsensorydatafromdevicessuchassmartphones,wear- learning based discriminator model. The goal of the generator ables,andhomeIoTdevicestoprovideservicessuchasremote model is to synthesize data that can pass the discriminator health monitoring [1], location tracking [2], automatic indoor test that is designed to distinguish between synthesized and map construction and navigation [3] and so on. However, the real data. Note, unlike prior work on data synthesis, a deep prospect of sharing sensitive personal data, often prohibits learningbaseddiscriminatorisnottrainedonapre-determined large-scale user adoption and therefore the success of such set of features. Instead, it continuously learns the best set of systems.Tocircumventtheseissuesandincreasedatasharing, features that can be used to differentiate between the real and syntheticdatagenerationhasbeenusedasanalternativetoreal synthesized data making it hard for the generator to pass the data sharing. The generated data preserves only the required discriminator test. statistics of the real data (used by the apps to provide service) Tosummarize,wemaketwocontributions.First,wepresent and nothing else and are used as a substitute for selective real a deep learning based architecture for synthesizing sensory data.Thisarchitecturecomprisesofageneratormodel,which h which is updated at each time-step according to the new t is a stack of multiple Long-Short-Term-Memory (LSTM) input x and previous internal state memory value h t t−1 networks and a Mixture Density Network (MDN). Second, we use another LSTM network based discriminator model h =σ(W h +W x +b ) t hh t−1 xh t h for distinguishing between the true and the synthesized data. where σ(x) is the sigmoid activation function Usingadatasetofaccelerometertraces,collectedusingsmart- phones of users doing their daily activities, we show that the 1 σ(x)= deep learning based discriminator model can only distinguish 1+e−x between the real and synthesized traces with an accuracy in Also each unit generates another time-series of outputs o as t the neighborhood of 50%. a function of the internal memory state which is computed The rest of this paper is organized as follows: Section II according to the following equation: provides a description for our model architecture and the o =tanh(W h +b ) training algorithm used. This is followed by Section III that t o t o describes our experimental design and initial results. Finally, The set θ = {W ,W ,b ,W ,b } represents the RNN hh xh h o o Section IV concludes the paper. cell parameters. The RNN training algorithm picks the values II. MODELDESIGN of θ that minimizes the defined loss function. In order to handle complex time-series sequences, Multiple Sensorsdata,e.g.accelerometer,gyroscope,barometer,etc., RNN units can be used at the same layer and also multiple are represented as a sequence of values x = (x ,x ,...,x ) 1 2 T RNN units can be stacked on top of each other such that the wherex ∈Rd,fori=1,...,|T|wheredisthedimensional- i time series of outputs from the RNN units at one layer are ityofthetimeseries(i.e.d=3incaseof3-axisaccelerometer usedasinputstotheRNNunitsontopofthem.Thisway,we ) and T is the number of time steps for which the data has candesignmorepowerfulrecurrentneuralnetworkswhichare been collected. both deep and wide. Like other neural networks, we train a SenseGen consists of two deep learning models: recurrentneuralnetworkspossiblebyusingamodifiedversion • Generator (G): The generator G is capable of generating of back-propagation known as back-propagation through time new synthetic time series data from random noise input. (BPTT) algorithm [12]. However, RNN units suffer from two • Discriminator (D): The goal of the discriminator D is major problems during training deep models over long time- to assess the quality of the examples generated by the seriesinputs.First,itisthevanishinggradientproblem,where generator G. the error gradient goes to zero during propagation presenting BothG andDarebasedonrecurrentneuralnetworkmodels difficulty while learning the weights of early layers or captur- whichhaveshownalotofsuccessinsequentialdatamodeling. inglong-termdepedencies.Second,itistheexplodinggradient We describe the model details below. problem, where the gradient value might grow exponentially causing numerical errors in the training algorithm. These two Algorithm 1 Training algorithm problems present a major hurdle in training RNNs. To solve 1: for t=1,2,...,T do the exploding gradient problem, the gradient value is clipped 2: Sample Xtrue minibatch from true data at each unit, while modified architectures of RNN units such 3: Sample Xgen minibatch from the generative model G as the Long Short Term Memory (LSTM) [13] and Gated 4: Train the discriminative model D on the training set Recurrent Units (GRU) [14] have been introduced to come (X ,X ) for 200 epochs true gen over the vanishing gradient problem. 5: Sample another Xtrue minibatch from true data LSTMunitsaremodifiedversionofthestandardRNNunits 6: Sample another Xgen minibatch from the generative that add three additional gates inside the RNN unit : input model G gate (i ), forget gate (f ) and output gate (o ). The values of t t t 7: TrainthegenerativemodelG onthetrainingset(Xtrue) thesegatesarecomputedasfunctionsoftheunit’sinternalcell for 100 epochs state c and current input x . These gates are used to control t t 8: end for what information being stored in the unit’s internal memory h to avoid vanishing gradient problem and become better in t remembering sequence dependencies for longer range. The A. Generative Model gates, internal memory and LSTM unit output at each time Recurrent neural networks (RNN) are a class of neural step are computed according to the following equations: networks which are distinguished by having units with feedback cycles which allows the units to maintain a memory of f =σ(W x +W h +b ) t xf t hf t−1 f state about the previous inputs. This makes them suitable for handling tasks dealing with sequential time-series inputs. The it =σ(Wxixt+Whiht−1+bi) input time-series is applied to the neural network units one o =σ(W x +W h +b ) t xo t ho t−1 o step at time. Each RNN artificial neuron (often called RNN unit or RNN cell) maintains a hidden internal state memory c =f (cid:12)c +i (cid:12)tanh(W h +W x +b ) t t t−1 t hc t−1 xc t c h =o (cid:12)tanh(c ) Our generative model architecture is shown in Figure II-A. t t t At the bottom we have a stack of 3 layers (l(1),l(2),l(3)) of where (cid:12) is the elementwise multiplication. In the rest of the t t t LSTM units. Each layer has 256 units. paper, we define the function LSTM that maps the current input xt and current LSTM unit output ht to new output as lt(1) =LSTM(lt(−1)1,xt) an abstraction of the previous LSTM update equations. l(2) =LSTM(l(2) ,l(1)) h =LSTM(x ,h ) t t−1 t t t t−1 LikethestandardRNNunits,LSTMunitscanalsobestacked l(3) =LSTM(l(2) ,l(1)) t t−1 t on top of each other in order to model complex time-series The output from the last LSTM layer is feed into a fully data.WeuseLSTMsinourmodelbecausetheyaresuccessful connected layer with 128 units with sigmoid activations. in modeling sequences with long-term dependencies. Recurrent Neural networks can be used for the generation l(4) =σ(W l(3)+b ) t 4 t 4 of a sequence with any length by predicting the sequence one step at a time. At each time step, the network output y is where W ∈ R256x128,b ∈ R128 The final layer is another t 4 4 used to define a probability distribution for the next step x fully connected layer with 72 output units. t+1 value. l(5) =σ(W l(4)+b ) x ∼pr(x |y ) t 5 t 5 t+1 t+1 t The value xt+1 is then fed back into the model as a new where W5 ∈R128x72,b5 ∈R72. The outputs from the last input to predict another time step. By repeatedly doing this, it layerlt(5) areusedastheweightsandparametersoftheoutput is theoretically possible to generate a sequence of any length. Gaussian mixture model (GMM). However, the choice of output distribution becomes critical π (x )=softmax(l(5)[1...24]) and must be chosen carefully to represent the type of data t 1..t t we are generating. The simplest choice that we consider the µ (x )=l(5)[25...48] t 1..t t output y as the next step sample x = y . and then we t t+1 t define the loss as the root mean squared difference between σt(x1..t)=e(lt(5)[49...72]) the sequence of inputs and the sequence of predictions. where the softmax function: T LG(θ )=(cid:88)(x −y )2 exk G t t softmax(x )= t=1 k (cid:80)24 exj j=1 Then we train the whole model by using gradient descent isusedtoensurethatweightsdefinedbypi arenormalized(i.e to minimize the loss value. However, we find this setup to be t (cid:80)24 π =1), and the exponential function while computing incapable of generating good sensory data sequences for the k=1 k the standard deviation of the Gaussians σ is meant to ensure following reasons: t that the σ is positive. The mixture weights π , guassian t t • Since all RNN update equations are deterministic, this means µ , and standard deviations σ are used to define to t t means that if you try generating sequences from a given a probability distribution for the next output start input value x (usually starting by zero) the model 0 will generate the same sequence again at every-time. pr(xt+1|πt,µt,σt)= • Assigningthemodeloutputasthenextsamplemeansthat 24 (cid:88) the next sample distribution is a uni-model distribution πk(x )∗N(x ;µk(x ),σk(x )) t 1..t t+1 t 1..t t 1..t with zero variance. Because for sensory data at a given k=1 step more than one value can be a good choice for the from which we can sample the predicted next step value. next step a uni-modal prediction is not enough. A more (cid:16) (cid:17) flexible generation of sensory data requires probabilistic x ∼pr x |π (x ),µ (x ),σ (x ) t+1 t+1 t 1..t t 1..t t 1..t sampling from a multi-modal distribution on top of the RNN. The whole model is trained end-to-end by RMSProp [17] and truncated back-propagation through time with a cost As a solution for these issues, we use Mixture Density functionL(θ )definedtoincreasethelikelihoodofgenerating Network (MDN) [15]. Mixture density network is a com- G the next timestep value. This is the equivalent to minimizing bination of a neural network and mixture distribution. The the negative log likelihood L(θ ) with respect to the set of outputsofneuralnetworkareusedtospecifytheweightsofthe G generative model parameters θ . mixtures and the parameters of each distribution. [16] shows G how MDN with Gaussian mixture model (GMM) defined on topofarecurrentneuralnetworkissuccessfulinlearninghow T (cid:88) to generate highly realistic handwriting by predicting the pen LG(θ )=− log(pr(x |π (x ),µ (x ),σ (x ))) G t+1 t 1..t t 1..t t 1..t location one point at a time. t=1 B. Discriminative Model showsthenegativeloglikelihoodcostofthegenerativemodel In order to quantify the similarity between the generated LG during training. This means that the model is becoming time-seriesandtherealsensortimeseriescollectedfromusers. better in assigning higher probability for the true next-step We build another model D whose goal is to distinguish values during prediction. between samples generated by G. The discriminative model D is trained to distinguish between the samples coming from the dataset for real sensor traces Xtrue and others samples 1 from the dataset X which is generated by the model G. gen ThearchitectureofmodelDconsistsofalayerof64LSTM 0 unitsfollowedbyafullyconnectedlayerswith16hiddenunits 1 using sigmoid activation function and an output layer with a single unit with sigmoid activation function. The output value 2 L thisdiscriminativemodelwhenagivenaninputsensorvalues NL 3 timeseries x is interpreted as the probability that the given test input timeseries is coming from the real dataset Xtrue. 4 D(x )=Pr(x ∈X ) 5 test test true We train the model D in a supervised way by using a 6 0 5000 10000 15000 20000 25000 training data consists from mini-batchs of m samples from Epochs the real data dataset X rue with their target output = 1, and t other mini-batchsof m samples generatedfrom the themodel G withtheirtargetoutput=0.Eachsamplesisatimeseriesof Fig.3. NegativeLoglikelihoodcostofthegenerativemodelduringtraining 400steps.Thetrainingaimstominimizethecross-entropyloss LD withrespecttothesetofdiscriminitivemodelparameters. Generative loss during training Figure 2 shows a visual (cid:32) m (cid:33) comparison between 4 random samples generated by genera- LD(θ )=− (cid:88)log(D(X(i) ))+log(1−D(X(i))) tive model G and 4 random subset of real accelerometer time- D true gen series values from the HAR dataset. i=1 III. RESULTSANDANALYSIS Indistinguishability between synthesized and real data samples We use another deep learning model D whose is For our experiments and evaluation studies, We use the training to quantify the differences between the real samples Human Activity Recognition database [18] as our training and the synthesized samples. Figure 4 shows how the accu- data. The HAR database contains accelerometer and gyro- racy of this model goes down as training continues. At the scope recordings of 30 individuals while performing activities beginning the accuracy in deciding whether input samples are of daily living (ADL) (Walking, walking upstairs, walking synthesized is almost 100% However, as we train the models downstairs, sitting, standing, and laying). Accelerometer and for more epochs, the accuracy of model D in identifying the gyroscopewerecollectedat50HzfromaSamsungGalaxySII synthetic samples reduces to around 50%. phone attached to the user’s the waist. The accelerometer and gyroscope values were pre-processed to compute the linear acceleration (by removing the gravity component). IV. CONCLUSION We train the deep learning model using Google Tensor- Flow [19] deep learning framework r0.11 on Nvidia GTX In this paper, we outlined our initial experiences of using a Titan X GPU with 3,584 CUDA cores running 11 TFLOPS deeplearningbasedarchitectureforsynthesizingtimeseriesof with12GBMemory.Thetrainingtakesabout5hoursuntilthe sensorydata.Weidentifiedthatthesynthesizeddatashouldbe generative model converges after 20,000 epochs when trained able to pass a deep learning based discriminator test designed on a time-series of 7000 time steps. to distinguish between the synthesized and true data. We then Evaluating a generative model is challenging because it is demonstrated that our generator can be successfully used to hard to find one metric that quantifies how realistic the output beat such a discriminator by restricting its accuracy to around looks and also how novel is it compared to the training data 50%. (to avoid the trap of having a model that just remembers the Our generator-discriminator model pair is a GAN-similar inputtrainingdataandoutputsitagain).Thesemetricsshould architecture. However, due to the difficulties of doing back- be specific according to the type of the data the model is propagation through the MDN-based stochastic network, we trained on. Prior work on generative models for images resort do not yet incorporate adversarial training by feeding back to human judgment of output samples quality. In our work, the discriminator output into the generator training. we hope we use the following methods for qualification: to close the feedback loop between the discriminator and the Generative loss during training We show how the loss generator model for synthesizing even more effective data of the generative model goes down while training. Figure III samples. [10] L. Yu, W. Zhang, J. Wang, and Y. Yu, “Seqgan: sequence generative Accuracy of Discriminator in distinguishing between generated samples and true samples adversarialnetswithpolicygradient,”arXivpreprintarXiv:1609.05473, 2016. 1.0 [11] S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Jozefowicz, andS.Bengio,“Generatingsentencesfromacontinuousspace,”arXiv preprintarXiv:1511.06349,2015. 0.8 [12] P.J.Werbos,“Backpropagationthroughtime:whatitdoesandhowto doit,”ProceedingsoftheIEEE,vol.78,no.10,pp.1550–1560,1990. Discriminator Accuracy00..46 [[[111345]]] crSJCe.o..cmHuCMprhor.ueuctBnhnatrgitesi,nohitenoeCu,prr.,vaaol“GnlMdn.uë9liJtcx¸,w.etnuhSoorrrcee.kh,s8dm,”,eKinpdC.sphio.utCyRb1hRe7nor,3e,,5ta“w–abL1noso/7dr1kn85sg0Y0,”,.2s1h1.0B9o929r9et37n4-6t.g.e7iro,m,20“m1Ge5am.teodry,f”eeNdebuarcakl [16] A.Graves,“Generatingsequenceswithrecurrentneuralnetworks,”arXiv preprintarXiv:1308.0850,2013. 0.2 [17] T.TielemanandG.Hinton,“Lecture6.5-rmsprop:Dividethegradient by a running average of its recent magnitude,” COURSERA: Neural NetworksforMachineLearning,vol.4,no.2,2012. 0.00 5 10 15 20 [18] D.Anguita,A.Ghio,L.Oneto,X.Parra,andJ.L.Reyes-Ortiz,“Human Training Step activityrecognitiononsmartphonesusingamulticlasshardware-friendly supportvectormachine,”inInternationalWorkshoponAmbientAssisted Fig.4. AccuracyofDiscriminatorDindecidingwhichsamplesarenotreal Living. Springer,2012,pp.216–223. [19] M.Abadi,A.Agarwal,P.Barham,E.Brevdo,Z.Chen,C.Citro,G.S. Corrado,A.Davis, J.Dean,M.Devin etal.,“Tensorflow:Large-scale machinelearningonheterogeneousdistributedsystems,”arXivpreprint ACKNOWLEDGEMENT arXiv:1603.04467,2016. This research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agree- mentNumberW911NF-16-3-0001.Theviewsandconclusions containedinthisdocumentarethoseoftheauthorsandshould not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are au- thorized to reproduce and distribute reprints for Government purposes notwithstanding any copy-right notation hereon. REFERENCES [1] K.M.Diaz,D.J.Krupka,M.J.Chang,J.Peacock,Y.Ma,J.Goldsmith, J.E.Schwartz,andK.W.Davidson,“Fitbit(cid:13)R:Anaccurateandreliable deviceforwirelessphysicalactivitytracking.”Internationaljournalof cardiology,vol.185,pp.138–140,2015. [2] H. Wang, S. Sen, A. Elgohary, M. Farid, M. Youssef, and R. R. Choudhury, “No need to war-drive: unsupervised indoor localization,” inProceedingsofthe10thinternationalconferenceonMobilesystems, applications,andservices. ACM,2012,pp.197–210. [3] M. Alzantot and M. Youssef, “Crowdinside: automatic construction of indoorfloorplans,”inProceedingsofthe20thInternationalConference on Advances in Geographic Information Systems. ACM, 2012, pp. 99–108. [4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S.Ozair,A.Courville,andY.Bengio,“Generativeadversarialnets,”in Advances in Neural Information Processing Systems, 2014, pp. 2672– 2680. [5] I. Goodfellow, “Nips 2016 tutorial: Generative adversarial networks,” arXivpreprintarXiv:1701.00160,2016. [6] D.P.KingmaandM.Welling,“Auto-encodingvariationalbayes,”arXiv preprintarXiv:1312.6114,2013. [7] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X.Chen,“Improvedtechniquesfortraininggans,”inAdvancesinNeural InformationProcessingSystems,2016,pp.2226–2234. [8] C.Ledig,L.Theis,F.Husza´r,J.Caballero,A.Cunningham,A.Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network,” arXiv preprintarXiv:1609.04802,2016. [9] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative adversarial text to image synthesis,” arXiv preprint arXiv:1605.05396,2016. Sampling GMM Model ……... Fully ……... connected Layers ……... 3 Layers LSTM Stack Fig.1. GenerativeModelarchitecture True Accelerometer Values Generative Model Outputs n n o o ati ati r 1 r 1 e e el el c 0 c 0 c c A A r 1 r 1 a a e e n n Li Li 0 100 200 300 400 500 600 700 800 0 100 200 300 400 500 600 700 800 Step Index Step Index n n o o ati ati r 1 r 1 e e el el c 0 c 0 c c A A r 1 r 1 a a e e n n Li Li 0 100 200 300 400 500 600 700 800 0 100 200 300 400 500 600 700 800 Step Index Step Index n n o o ati ati r 1 r 1 e e el el c 0 c 0 c c A A r 1 r 1 a a e e n n Li Li 0 100 200 300 400 500 600 700 800 0 100 200 300 400 500 600 700 800 Step Index Step Index n n o o ati ati r 1 r 1 e e el el c 0 c 0 c c A A r 1 r 1 a a e e n n Li Li 0 100 200 300 400 500 600 700 800 0 100 200 300 400 500 600 700 800 Step Index Step Index Fig.2. Visualcomparisonbetweentherealandgeneratedaccelerometertime-seriessamples