ebook img

Particle swarm optimization for time series motif discovery PDF

0.55 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Particle swarm optimization for time series motif discovery

1 Particle Swarm Optimization for Time Series Motif Discovery Joan Serra` and Josep Lluis Arcos Abstract—Efficientlyfindingsimilarsegmentsormotifsintime discoveryalgorithmsresorttosomekindofdatadiscretization series data is a fundamental task that, due to the ubiquity of or approximation that allows them to hash and retrieve seg- these data, ispresent in a wide range ofdomains and situations. ments efficiently. Following the works by Lin et al. [1] and Because of this, countless solutions have been devised but, to 5 date, none of them seems to be fully satisfactory and flexible. In Chiuetal.[4],manyofsuchapproachesemploytheSAXrep- 1 this article, we propose an innovative standpoint and present a resentation[5]and/orasparsecollisionmatrix[6].Theseallow 0 solution coming from it: an anytime multimodal optimization themtoachieveatheoreticallylowcomputationalcomplexity, 2 algorithm for time series motif discovery based on particle butsometimes atthe expenseofvery highconstant factors.In swarms.Byconsideringdatafromavarietyofdomains,weshow n addition,approximatealgorithmsusuallysufferfromanumber that this solution is extremely competitive when compared to a of data-dependent parameters that, in most situations, are not J thestate-of-the-art,obtainingcomparablemotifsinconsiderably less time using minimal memory. In addition, we show that it is intuitive to set (e.g., time/amplitude resolutions, dissimilarity 9 robust to different implementation choices and see that it offers radius, segment length, minimum segment frequency, etc.). 2 anunprecedenteddegreeofflexibilitywithregardtothetask.All A few recent approaches overcome some of these lim- these qualities make the presented solution stand out as one of ] itations. For instance, Castro & Azevedo [7] propose an G the most prominent candidates for motif discovery in long time amplitude multi-resolution approach to detect frequent seg- series streams. Besides, we believe the proposed standpoint can L be exploited in further time series analysis and mining tasks, ments, Li & Lin [8] use a grammar inference algorithm . widening the scope of research and potentially yielding novel for exploring motifs with lengths above a certain threshold, s c effective solutions. Wilson et al. [9] use concepts from immune memory to [ Index Terms—Particle swarm, multimodal optimization, time deal with different lengths, and Floratou et al. [10] combine 1 series streams, motifs, anytime. suffix trees with segment models to find motifs of any length. v Nevertheless, in general, these approaches still suffer from 9 other data-dependent parameters whose correct tuning can 9 I. INTRODUCTION requireconsiderabletime.Inaddition,approximatealgorithms 3 TIME SERIES are sequences of real numbers measured are restricted to a specific dissimilarity measure between seg- 7 0 at successive, usually regular time intervals. Data in the ments (the one implicit in their discretization step) and do not . form of time series pervade science, business, and society. allow easy access to preliminary results, which is commonly 1 Examplesrangefromeconomicstomedicine,frombiologyto known as anytime algorithms [11]. Finally, to the best of our 0 5 physics, and from social to computer sciences. Repetitions or knowledge,only[12]–[14]considertheidentificationofmotif 1 recurrences of similar phenomena are a fundamental charac- pairs containing segments of different lengths. This can be v: teristic of non-random natural and artificial systems and, as considered a relevant feature, as it produces better results in a i a measurement of the activity of such systems, time series number of different domains [13]. X often include pairs of segments of strikingly high similarity. In contrast to approximate approaches, algorithms that do r Thesesegmentpairsarecommonlycalledmotifs[1],andtheir notdiscretizethedatahavebeencomparativelymuchlesspop- a existence is unlikely to be due to chance alone. In fact, they ular,withlowefficiencygenerally.Exceptionstothisstatement usually carry important information about the underlying sys- achieved efficiency by sampling the data stream [15] or by tem. Thus, motif discovery is fundamental for understanding, identifying extreme points that constrained the search [16]. In characterizing, modeling, and predicting the system behind fact,untiltheworkofMueenetal.[17],theexactidentification the time series [2]. Besides, motif discovery is a core part of of time series motifs was thought to be intractable for even severalhigher-levelalgorithmsdealingwithtimeseries,inpar- timeseriesofmoderatelength.Insaidwork,acleversegment ticular classification, clustering, summarization, compression, ordering was combined with a lower bound based on the andrule-discoveryalgorithms(see,e.g.,referencesin[2],[3]). triangular inequality to yield the true, exact, most similar Identifying similar segment pairs or motifs implies examin- motif. According to the authors, the proposed algorithm was ing all pairwise comparisons between all possible segments more efficient than existing approaches, including all exact in a time series. This, specially when dealing with long and many approximate ones [17]. After Mueen et al.’s work, time series streams, results in prohibitive time and space a number of improvements have been proposed, the majority complexities. It is for this reason that the majority of motif focusing on eliminating the need to set a fixed segment length [18]–[20]. J.Serra`andJ.Ll.ArcosarewithIIIA-CSIC,theArtificialIntelligenceRe- Mueen himself has recently published a variable-length searchInstituteoftheSpanishNationalResearchCouncil,CampusdelaUAB s/n,08193Bellaterra,Barcelona,Spain,email:{jserra,arcos}@iiia.csic.es. motif discovery algorithm which clearly outperforms the it- 2 erative search for the optimal length using [17] and, from the −15 reported numbers, also outperforms further approaches such as [18]–[20]. This algorithm, called MOEN [3], is essentially −17 1 parameter-free, and is believed to be one of the most efficient C C motif discovery algorithms available nowadays. However, its MF −19 complexityisstillquadraticinthelengthofthetimeseries[3], and hence its applicability to large-scale time series streams −21 remainsproblematic.Furthermore,inordertoderivethelower 0 100 200 300 400 500 600 700 800 900 bounds used, the algorithm is restricted with regard to the i [samples] dissimilarity measure used to compare time series segments Fig. 1. Example of a time series motif pair found in the WILDLIFE time (Euclidean distance after z-normalization). In general, exact series of [21] using SWARMMOTIF and normalized dynamic time warping motif discovery algorithms have important restrictions with asthedissimilaritymeasure:a=248,wa=244,b=720,andwb=235. regard to the dissimilarity measure, and many of them still Note that wa (cid:54)= wb and, hence, a warping of the two segments needs to takeplace.Thisspecificsolutioncannotbefoundbyanyoftheapproaches sufferfrombeingnon-intuitiveandtedioustotuneparameters. mentionedinSec.I. Moreover, few of them allow for anytime versions and, to the best of our knowledge, not one of them is able to identify An example of a time series motif pair from a real data set is motif pairs containing segments of different lengths. shown in Fig. 1. In this article, we propose a new standpoint to time se- It is important to stress that D needs to normalize with ries motif discovery by treating the problem as an anytime respect to the lengths of the considered segments. Otherwise, multimodal optimization task. To the best of our knowl- we would not be able to compare motifs of different lengths. edge, this standpoint is completely unseen in the literature. Therearemanywaystonormalizewithrespecttothelengthof Here, we firstly reason and discuss its multiple advantages the considered segments. Ratanamahatana & Keogh [22] list (Sec. II). Next, we present SWARMMOTIF (Sec. III), an a number of intuitive normalization mechanisms for dynamic anytime algorithm for time series motif discovery based on timewarpingthatcaneasilybeappliedtoothermeasures.For particle swarm optimization (PSO). We subsequently evaluate instance, in the case of a dissimilarity measure based on the the performance of the proposed approach using 9 different L norm [23], we can directly divide by the segment length2, real-world time series from distinct domains (Sec. IV). Our p using brute-force upsampling to the largest length [22] when results show that SWARMMOTIF is extremely competitive w (cid:54)=w . when compared to the state-of-the-art, obtaining motif pairs a b From the definitions above, we can see that a brute-force of comparable similarity in considerably less time and with search in the motif space for the most similar motifs is of minimum storage requirements (Sec. V). Moreover, we show O(n2w 2), where w = w − w + 1 (for the final that SWARMMOTIF is significantly robust against different ∆ ∆ max min time complexity one needs to further multiply by the cost implementation choices. To conclude, we briefly comment of calculating D). Hence, for instance, in a perfectly feasible on the application of multimodal optimization techniques to casewheren=107andw =103,wehave1020possibilities. time series analysis and mining, which we believe has great ∆ Magnitudes like this challenge the memory and speed of any potential(Sec.VI).Thedataandcodeusedinourexperiments optimization algorithm, specially if we have no clue to guide will available online. the search [24]. However, it is one of our main objectives to show here that time series generally provide some continuity II. TIMESERIESMOTIFDISCOVERYASANANYTIME to this search space, and that this continuity can be exploited MULTIMODALOPTIMIZATIONTASK by optimization algorithms. A. Definitions and Task Complexity From the work by Mueen et al. [3], [17], we can derive a B. Continuity formal, generic similarity-based definition [2] of time series A fundamental property of time series is autocorrelation, motifs. Given a time series z of length n, z = [z ,...z ], a 1 n implying that consecutive samples in a time series have some normalized segment dissimilarity measure D, and a temporal degree of resemblance and that, most of the time, we do not window of interest between w and w samples, the top- min max observe extremal differences between them3. This property, k time series motifs M = {m ,...m } correspond to the k 1 k together with the established ways of computing similarity mostsimilarsegmentpairszwaa =[za,...za+wa−1]andzwbb = betweentimeseries[23],iswhatgivescontinuitytooursearch [z ,...z ],forw ,w ∈[w ,w ],a∈[1,n−w +1], b b+wb−1 a b min max a space. Consider a typical dissimilarity measure like dynamic and b ∈ [1,n − w + 1] where, in order to avoid repeated b time warping between z-normalized segments and the time and trivial matches [1], a + w < b. Thus, the i-th motif a series of Fig. 1. If we fix the motif starting points a and b can be fully described by the tuple m = {a,w ,b,w }. The i a b to some random values, we can compute D(zi,zj) for i,j = motifs in M are non-overlapping1 and ordered from lowest a b w ,...,w (Fig. 2A). We see that these two dimensions to highest dissimilarity such that D(m ) ≤ D(m ) ≤ ··· ≤ min max 1 2 D(mk) where D(mi) = D({a,wa,b,wb}) = D(zwaa,zwbb). 2The only exception is with L∞, which could be considered as already beingnormalized. 1Notice that, following [3], this definition can be trivially extended to 3If a time series had no autocorrelation, we might better treat it as an differentdegreesofoverlap. independentrandomprocess. 3 A200 B 0.1304 Random sampling 100 mples] 221200 mples] 200 0.0749 BAansyetliimnee raelfgeorreinthcme j [sa 223400 j [sa 340000 0.0430 250 500 200 210 220 230 240 250 100 200 300 400 500 m)i i [samples] i [samples] D( 0.0247 Fig.2. VisualizationsofthesearchspaceobtainedwiththeWILDLIFEtime series[21]anddynamictimewarpingasthedissimilaritymeasure:(A)fixing a=110andb=602buti,j=200,...250and(B)fixingwa=222and 0.0142 wb=240buti,j=1,...500.Darkercolorscorrespondstomoresimilarity. 0.0082 have a clear continuity, i.e., that D(zi,zj) ∼ D(zi+1,zj) ∼ a b a b −1 0 1 2 3 4 D(zi,zj+1)∼D(zi+1,zj+1),andsoforth.Similarly,ifwefix 10 10 10 10 10 10 a b a b t [s] the motif lengths w and w to some random values, we can a b compute D(zwia,zwjb) for i=1,...n−wa and j =1,...n− Fig. 3. Schema of a plot to assess the performance of an anytime motif w (Fig. 2B). We see that the remaining two dimensions of discovery algorithm (blue curve). Error bars indicate 5 and 95 percentiles b the problem also have some continuity, i.e., D(zwa,zwb+j)∼ of D(mi), their central marker indicates the median, and the isolated dots i j indicatemaximumandminimumvalues.Thegrayareaatthebottomdenotes D(zwi+a1,zwjb) ∼ D(zwia,zwj+b1) ∼ D(zwi+a1,zwj+b1), and so forth. the area where D(mi)∗ lie. The top-left black error bar acts as a reference The result is a four-dimensional, multimodal, continuous but and shows the range of D(mi) obtained by random sampling the motif space. The bottom-right red error bar is placed at the time that the baseline noisy4motifspace,wherethedissimilarityDactsasthefitness algorithmspentinthecalculations.Betterperforminganytimealgorithmshave measureandthetop-k valleypeaks(consideringdissimilarity) acurveclosertothebottomleftcorner,quicklyenteringthegrayareaastheir correspond to the top-k motifs in M. executiontimeincreases.Noticethatbothaxesarelogarithmic. C. Anytime Solutions M∗, we can easily and efficiently retrieve further repetitions Findinganoptimizationalgorithmthatcanlocatetheglobal via common established approaches [28], [29]. Thus, only minimaoftheprevioussearchspacesfasterthanexistingmotif non-frequent or singular motifs may be missed. These can discovery algorithms can be a difficult task. However, we be valuable too, as the fact that they are non-frequent does have robust and established algorithms for efficiently locating not imply that they cannot carry important information (think prominent local minima in complex search spaces [25]–[27]. for example of extreme events of interest that perhaps only Hence, we can intuitively devise a simple strategy: if we happen twice in a measurement). For those singular motifs, keep the best found minima and randomly reinitialize the we can wait longer if using an anytime algorithm, or we can optimization algorithm every time it stagnates, we should, resorttothestate-of-the-artifthatisabletoprovideitsoutput sooner or later, start locating the global minima. In the mean- within an affordable time limit. time, we could have obtained relatively good candidates. This correspondstothebasicparadigmofanytimealgorithms[11]. D. Particle Swarm Optimization Anytime algorithms have recently been highlighted as “very beneficial for motif discovery in massive [time series] The continuity and anytime observations above (Secs. II-B datasets” [19]. In an anytime algorithm for motif discovery, andII-C)relaxtherequirementsfortheoptimizationalgorithm D(m ) improves over time, until it reaches the top-k dis- to be employed in the considered motif spaces. In fact, if i similarity values D(m )∗ obtained by a brute-force search we do not have to assess the global optimality of a solution, i approach. Thus, we gradually improve M until we reach we have a number of approaches that can deal with large, the true exact solution M∗. A good anytime algorithm will multimodal, continuous but noisy search spaces [25]–[27]. quickly find low D(m ), ideally reaching D(m )∗ earlier than Amongthem,wechoosePSO[30]–[34].PSOisapopulation- i i its non-anytime competitors (Fig. 3). based stochastic approach for solving continuous and discrete Note that a sufficiently good M may suffice in most situa- optimization problems [33] which has been applied to multi- tions,withouttheneedthatM=M∗.Thisisparticularlytrue modal problems [35]. It is a metaheuristic [27], meaning that for more exploratory tasks, where one is typically interested it cannot guarantee whether the found solution corresponds to in data understanding and visual inspection (see [2]), and can a global optimum. The original PSO algorithm cannot even also hold for other tasks, as top-k motifs can be very similar guarantee the convergence to a local optimum, but adapted among themselves. In the latter situation, given a seed within versions of it have been proven to solve this issue [36]. Other versionsguaranteetheconvergencetotheglobaloptimum,but 4Weusethetermnoisyheretostressthatthecontinuityofthespacemay only with the number of iterations approaching infinity [36]. be altered at some points due to potential noise in the time series. It is not PSO has gained increasing popularity among researchers the case that we have a noisy, unreliable dissimilarity measurement D that couldchangeinsuccessiveevaluations. andpractitionersasarobustandefficienttechniqueforsolving 4 difficult optimization problems. It makes few or no assump- 2) Byconstruction,wehaveananytimealgorithm(Sec.II-D). tions about the problem being optimized, does not require it 3) Wecan obtainaccurate andmuch fastersolutions, ascom- to be differentiable, can search very large spaces of candidate pared to the state-of-the-art in time series motif discovery solutions, and can be applied to problems that are irregular, (Sec. V-C). incomplete,noisy,dynamic,etc.(see[30]–[35]andreferences 4) We have an essentially parameter-free algorithm [33]. As therein). PSO iteratively tries to improve a candidate solution will be shown, all our parameter choices turn out to be with regard to a given measure of quality or fitness function. non-critical to achieve the most competitive performances Hence,furthermore,itcanbeconsideredananytimealgorithm. (Secs. V-A and V-B). 5) We have an easily parallelizable algorithm. The agent- E. Advantages of an Optimization-Based Solution Using Par- basednatureofPSOnaturallyyieldstoparallelimplemen- ticle Swarms tations [32]. 6) We still have the possibility to apply lower bounding tech- Notice that treating time series motif discovery as an niques to D in order to reduce its computational cost [2], optimization problem naturally yields several advantages: [29]. Among others, we may exploit the particles’ best-so- 1) Wedonotrequiremuchmemory,aswecanbasicallystore far values or spatially close dissimilarities. only the stream time series and preprocess the required 7) All of these use a simple, easy to implement algorithm segments at every fitness evaluation. requiring low storage capabilities (Sec. III-B). 2) We are able to achieve a certain efficiency, as optimization algorithms do not usually explore the full solution space and perform few fitness evaluations [24]. III. SWARMMOTIF 3) We can employ any dissimilarity measure D as our fitness A. Main Algorithm function.Itsonlyrequirementsaresegmentlengthindepen- dence and a minimal search space continuity. Intuitively, OurPSOapproachtotimeseriesmotifdiscoveryisbasedon this holds for the high majority of time series dissimilarity thecombinationoftwowell-knownextensionstothecanonical measures that are currently used (Secs. II-A and II-B). PSO [31]. On one hand, we employ multiple reinitializations Additionally, we can straightforwardly incorporate notions oftheswarmonstagnation[39].Ontheotherhand,weexploit of‘interestingness’,hubness,orcomplexity(seereferences the particles’ “local memories” with the intention of forming in [23]). This flexibility is very uncommon in current time stable niches across different local minima [40]. The former series motif discovery algorithms (Sec. I). emulates a parallel multi-swarm approach [35] without the 4) We do not need to force the two segments of the motif to need of having to define the number of swarms and their be of the same length. The dissimilarity function D can communication. The latter, when combined with the former, expressly handle segments of different lengths or we can results in a low-complexity niching strategy [35] that does simply upsample to the largest length (see [22]). Although not require niching parameters (see the related discussion consideringdifferentsegmentlengthshasbeenhighlighted in[41],[42]).SWARMMOTIF,theimplementationofthetwo as an objectively better approach, practically none of the extensions, is detailed in Algorithm 1. currenttimeseriesmotifdiscoveryalgorithmscontemplates SWARMMOTIF takes a time series z of length n as input, this option (Sec. I). together with a segment dissimilarity measure D, and the 5) Since we search for the optimal wa and wb, together with range of segment lengths of interest, limited by wmin and a and b, we do not need to set the exact segment lengths wmax. The user also needs to specify k, the desired number of as a parameter. Instead, we can use a more intuitive and motifs, and tmax, the maximum time spent by the algorithm easier to set range of lengths w ,w ∈[w ,w ]. (in iterations5). SWARMMOTIF outputs a set of k non- a b min max 6) We can easily modify our fitness criterion to work with overlapping motifs M. We implement M as a priority queue, different task settings. Thus, just by replacing D, we are which typically stores more than k elements to ensure that it abletoworkwithmulti-dimensionaltimeseries[37],detect contains k non-overlapping motifs. This way, by sorting the sub-dimensional motifs [38], perform a constrained motif motifcandidatesassoonastheyarefound,weallowpotential discovery task [16], etc. queries to M at any time during the algorithm’s execution. In 7) We can incorporate notions of motif frequency to our that case, we only need to dynamically check the candidates’ fitness function and hence expand our similarity-based overlap (Sec. III-B). Notice that n, D, wmin, wmax, k, and definition of motif to incorporate both notions [2]. For tmax are not parameters of the algorithm, but requirements instance, instead of optimizing for individual motifs m , of the task (they depend on the data, the problem, and the i we can optimize sets of motifs M(cid:48) of size r such that available time). The only parameters to be set, as specified i i 1 (cid:80) D(m ) is minimal. We can choose r to be a in Algorithm 1’s requirements, are the number of particles κ, mriinimmuj∈mMf(cid:48)irequenjcy of motif appearance or we cian even the topology θ, the constriction constant φ, and the maximum decide to optimize it following any suitable criterion. amount of iterations at stagnation τ. Nevertheless, we will show that practically none of the possible parameter choices Inaddition,usingPSOhasanumberofinterestingproperties, some of which may be shared with other metaheurisics: 5The number of iterations is easy to infer from the available time as, for 1) Wehaveastraightforwardmappingtotheproblemathand thesameinput,theelapsedtimewillberoughlydirectlyproportionaltothe (Sec. III-A). numberofiterations. 5 Algorithm 1 SWARMMOTIF(z,D,wmin,wmax,k,tmax) Algorithm 2 INITIALIZESWARM(n,wmin,wmax,κ) Input: Time series z of length n, dissimilarity measure D, Input: Time series length n, minimum and maximum seg- minimum and maximum segment length w and w , ment length w and w , and number of particles κ. min max min max numberofmotifsk,andmaximumamountoftime(num- Output: ParticlepositionsX,velocitiesV,bestscoresS,and ber of iterations) t . best positions P. max Require: Number of particles κ, topology θ, constriction 1: for i=1,...κ constant φ, and maximum amount of time at stagnation 2: xi,2 ←wmin+(wmax+1−wmin)u (number of iterations) τ. 3: xi,4 ←wmin+(wmax+1−√wmin)u Output: A set of motifs M. 4: xi,1 ←1+(n−xi,2)(1− u) 1: c0,c1,c2 ← GETCONSTANTS(φ) 5: xi,3 ←1+(n−xi,4−(xi,1+xi,2))u 2: X,V,S,P ← INITIALIZESWARM(n,wmin,wmax,κ) 6: x(cid:48) ← As in lines 2–5 3: Θ← INITIALIZETOPOLOGY(θ,κ) 7: vi ←x(cid:48)−xi 4: s∗ ←∞ 8: si ←∞ 5: M← EMPTYPRIORITYQUEUE() 9: pi ←xi 6: for t=1,...tmax 10: return X,V,S,P 7: for i=1,...κ 8: if VALIDPOSITION(xi) 9: d←D(xi) a motif candidate, and have a direct correspondence with m i 10: if d<si (seeSec.III-B).AfurtherdatastructureΘindicatestheindices 11: si ←d of the neighbors of each particle according to a given social 12: pi ←xi topologyθ (line3).Apartfromtheswarm,wealsoinitializea 13: M.PUSH(d,xi) globalbestscores∗ (line4)andthepriorityqueueM(line5). 14: if d<s∗ We then enter the main loop (lines 6–26). In it, we perform 15: s∗ ←d three main actions. Firstly, we compute the particles’ fitness 16: tupdate ←t and perform the necessary updates (lines 7–16). Secondly, we 17: for i=1,...κ modifytheparticles’positionandvelocityusingtheirpersonal 18: g ←i and neighborhood best positions (lines 17–23). Thirdly, we 19: for j in Θi control for stagnation and reinitialize the swarm if needed 20: if sj ≤sg (lines 24–26). Finally, when we exit the loop, we return the 21: g ←j first k non-overlapping motif candidates from M (line 27). 22: vi ←c0vi+c1u1⊗(pi−xi)+c2u2⊗(pg−xi) The particles’ fitness loop (lines 7–16) can be described 23: xi ←xi+vi as follows. For the particles that have a valid position within 24: if t−tupdate =τ the ranges used for particle initializations (line 8; see also 25: X,V,S,P ← INITIALIZESWARM(n,wmin,wmax,κ) Algorithm 2 for initializations), we calculate their fitness D 26: s∗ ←∞ (line 9) and, if needed, update their personal bests s and p 27: return NONOVERLAPPING(M,k) i i (lines 10–12). As mentioned, D needs to be independent of the segments’ lengths, which is typically an easy condition for time series dissimilarity measures (Sec. II-A). In the case introduces a significant variation in the reported performance that the particles find a new personal best, we save the motif (Sec. V-A). dissimilarity d and its position x into M (line 13). Next, we i Having clarified SWARMMOTIF’s input, output, and re- update t , the last iteration when an improvement of the update quirements, we now elaborate on its procedures. Algorithm 1 global best score s∗ has occurred (lines 14–16). starts by computing the velocity update constants (line 1) The particles’ update loop (lines 17–23) is straightforward. following Clerc’s constriction method [43], i.e., We first select each particle’s best neighbor g using the 2 neighborhood personal best scores s (lines 18–21). Then, c0 = (cid:12) (cid:112) (cid:12) we use the positions of the best nejighbor’s personal best (cid:12)2−φ− φ2−4φ(cid:12) (cid:12) (cid:12) p and the particle’s personal best p to compute its new g i and velocity and position (lines 22–23). We employ component- wisemultiplication,denotedby⊗,andtworandomvectorsu c =c =c φ/2. (1) 1 1 2 0 and u whose individual components u = U(0,1), being 2 i,j Next, a swarm with κ particles is initialized (line 2). The U(l,h) a uniform real random number generator such that swarm is formed by four data structures: a set of particle l≤U(l,h)<h.Notethatbyconsideringtheparticles’neigh- positions X = {x ,...x }, a set of particle velocities V = borhoodpersonalbestsp wefollowtheaforementionedlocal 1 κ g {v ,...v }, a set of particle best scores S = {s ,...s }, neighborhoodnichingstrategy[40].Attheendoftheloopwe 1 κ 1 κ and a set of particle best positions P = {p ,...p } (the control for stagnation by counting the number of iterations 1 κ initialization of these four data structures is detailed in Algo- since the last global best update and applying a threshold τ rithm 2). Particles’ positions x and p completely determine (line 24). Note that this is the mechanism responsible for the i i 6 aforementioned multiple reinitialization strategy [39]. Algorithm3VariationstoAlgorithm1:sociability(lines1–2), TheinitializationoftheswarmusedinAlgorithm1(lines2 stochastic(line3),velocityclamping(lines4–6),andcraziness and 25) is further detailed in Algorithm 2. In it, for each (lines 7–10). particle,tworandompositionsxi andx(cid:48) aredrawn(lines2–6) 1: c1 ←c0φ(1−α) and the initial velocity is computed as the subtraction of the 2: c2 ←c0φα two(line7).Toobtainx andx(cid:48),uniformrealrandomnumbers i u = U(0,1) are subsequently generated. The personal best 3: vi ←(1−2(1−c0))uvi+c1u1⊗(pi−xi)+c2u2⊗(pg−xi) score s is set to infinite (line 8) and x is taken as the current i √ i bestpositionpi(line9).Notethat u(line4)isusedtoensure 4: vrange ←[n,w∆,n,w∆]/2 a uniform distribution of the particles across the triangular 5: for vrange in vrange j subspace formed by xi,1 and xi,3 (line 5; see also Sec. II-A). 6: vi,j ←min(vjrange,max(−vjrange,vi,j)) B. Implementation Details 7: v(cid:48) ← As in Algorithm 2 i Some implementation details are missing in Algorithms 1 8: for vi,j in vi and 2. Firstly, positions x are floored component-wise in- 9: if u<ρ i side VALIDPOSITION, D, and M (thus obtaining motif mi). 10: vi,j ←vi(cid:48),j. Secondly, the motif priority queue M is implemented as an associative container (logarithmic insertion time) that sorts its elementsaccordingtodandstoresm .Thirdly,thelastvisited constrictionfactorandvelocityclampingresultsinimproved i positions are cached into a hash table (constant lookup time) performance on certain problems [46]. To apply velocity in order to avoid some of the possible repeated dissimilarity clampingweaddlines4–6ofAlgorithm3betweenlines22 computations.Fourthly,weincorporatetheoptiontoconstrain and 23 of Algorithm 1. the motif search by specifying a maximum segment stretch in • Craziness: We introduce so-called “craziness” or “perturba- Algorithm 2 and VALIDPOSITION. Finally, the function that tion” in the particles’ velocities, as initially suggested by returns the non-overlapping top-k motifs employs a boolean Kennedy & Eberhart [45]. In such variant, inspired by the array of size n in order to avoid O(k2) comparisons between sudden direction changes observed in flocking birds, the membersofthequeue(cf.[3]).Noticethatwehaveamemory- particles’velocityisalteredwithacertainprobabilityρ,with efficient implementation, as we basically only need to store z the aim of favoring exploration by increasing directional and the boolean array (both of O(n) space), M (of O(k) diversity and discouraging premature convergence [47]. We space, k (cid:28)n), and X, V, S, P, and Θ (all of them of O(κ) coincide with [47] in that, in some sense, this can be seen space, κ(cid:28)n). The aforementioned hash table (optional) can as a mutation operation. To implement craziness we add beallocatedinanypredefined,availablememorysegment.For lines 7–10 of Algorithm 3 between lines 22 and 23 of thesakeofbrevity,theinterestedreaderisreferredtothepro- Algorithm 1. vided code for a full account of the outlined implementation details. IV. EVALUATIONMETHODOLOGY To evaluate SWARMMOTIF’s speed and accuracy we con- C. Variants siderplotsliketheonepresentedinFig.3.Asareference,we Given the main Algorithm 1, we consider a number of draw uniform random samples from the motif search space variations that may potentially improve SWARMMOTIF’s andcomputetheirdissimilarities(wetakeasmanysamplesas performance without introducing too much algorithmic com- thelengthnofthetimeseries).Asabaseline,weusethetop- plexity: 25motifsfoundbyMOEN[3],whichwewilldenotebyM∗. • Sociability: We study whether a “cognitive-only” model, a Existing empirical evidence [3], [17] suggests that MOEN is “social-only”one,ordifferentweightingsofthetwoyieldto themostefficientalgorithmtoretrievethetopexactsimilarity- someimprovements[44].Todoso,wejustneedtointroduce basedmotifsinarangeoflengths6(Sec.I).Noticefurthermore aparameterα∈[0,1]controllingthedegreeof‘sociability’ that here we are not that interested in obtaining all top-25 of the particles, and implement lines 1–2 of Algorithm 3 true exact motifs, but more concerned on obtaining good seed instead of Eq. 1. motifs within these using an anytime approach (Sec. II-C). • Stochastic: We investigate the use of a random inertia As its competitors, MOEN has however some limitations weight [39]. This may alleviate the need of using the same (Sec. I). Thus, to fairly compare results, we have to apply c in different environments, providing a potentially more some constrains to our algorithm. Since MOEN can only 0 adaptivetrade-offbetweenexplorationandexploitation(also use the Euclidean distance between z-normalized segments, controlled by α in the previous point). To consider this here we also adopt this formulation for D. In addition, as variant, we just need to replace line 22 in Algorithm 1 by MOEN only considers pairs of segments of the same length line 3 in Algorithm 3. (withoutresampling),wehavetoconstrainSWARMMOTIFso • Velocityclamping:Inadditiontoconstriction,westudylim- 6Besides,wecouldnotfindanyotherpromisingexactoranytimeapproach iting the maximum velocity of the particles [45]. Empirical with some available code, nor with sufficient detail to allow a reliable studieshaveshownthatthesimultaneousconsiderationofa implementation. 7 TABLEI (linear frequency) Vamp SDK example plugin from Sonic CHARACTERISTICSOFTHETIMESERIESUSEDFOREVALUATION, Visualizer [53]. We used a Hann window of 8192 samples TOGETHERWITHTHECONSIDEREDSEGMENTLENGTHRANGES. at 44.1 KHz with 75% overlap. Name Duration SampleRate n wmin wmax 6) WIND: The wind speed (in m/s) registered in the buoy of DOWJONES 129y 1d−1 35503 30 300 Rincon del San Jose, TX, USA, between January 1, 2010 CARCOUNT 175d 1/5min−1 47534 270 320 and April 11, 2014. The time series was retrieved from INSECT 32min 38Hz 73929 300 500 the Texas Coastal Ocean Observation Network website11. EEG 1h 50Hz 180214 150 250 Segments of missing values were manually interpolated or FIELDRECORDING 3h 21.5Hz 226383 50 150 WIND 4y 1/6min−1 369138 200 280 removed. POWER 32months 1/5min−1 410608 250 620 7) POWER: The electric power consumption (in KW) of an EOG 9h 25Hz 809948 150 200 individual household12. The data was retrieved from the RANDOMWALK - - 1000000 100 250 UCIMachineLearningRepository[51].Wetooktheglobal active power, removed missing values, and downsampled the original time series by a factor of 5 using averaging that x = x . Therefore, the reported motif dissimilarities i,2 i,4 and 50% overlap. D(m ) correspond to the Euclidean distance between two z- i 8) EOG: An electrooculogram tracking the eye movements normalized segments of the same length (we divide by the of a sleeping patient [54]. We took the downsampled time length of the segments to compare different segment lengths, series [48] from Mueen’s web page13. Sec. II-A). In the reported experiments, SWARMMOTIF is 9) RANDOMWALK: A random walk time series. This was run 10 times with k = 10. We stop its execution when we artificially synthesized using z = z + N(0,1) for i+1 i find 95% of D(m ) within M∗. This way, we assess the time i i = 2,...n and z = 0, where N(0,1) is a real Gaus- 1 taken to retrieve any 10 motifs from those with at least 95% sian random number generator with zero mean and unit confidence.Allexperimentsareperformedusingasinglecore variance. of an Intel(cid:13)R Xeon(cid:13)R CPU E5-2620 at 2.00 GHz. To assess the statistical significance of the differences To demonstrate that SWARMMOTIF is not biased towards betweenalternativeparametersettings,weemployatwostage a particular data source, time series length, or motif length, approach. First, we consider all settings at the same time and we consider 9 different time series of varying length, coming perform the Friedman’s test [55], which is a non-parametric from distinct domains, and a number of arbitrary but source- statistical test used to detect differences in treatments across consistent motif lengths (Table I). As mentioned, we make multiple test attempts. We use as inputs the median values for thesetimeseriesandourcodeavailableonline(Sec.I).Fourof all settings for 25 equally-spaced time steps. In the case some thetimeserieshavebeenusedformotifdiscoveryinprevious difference between settings is detected (i.e., we reject the null studies [17], [48], while the other five are employed here for hypothesisthatthesettings’performancescomefromthesame the first time for this task: distribution),weproceedtothesecondstage.Init,weperform 1) DOWJONES: The daily closing values of the Dow Jones all possible pairwise comparisons between settings using the average in the USA from May 2, 1885 to April 22, Wilcoxonsigned-ranktest[55],anothernon-parametricstatis- 2014 [49]. tical hypothesis test used for comparing matched samples. To 2) CARCOUNT: The number of cars measured for the Glen- counteract the problem of multiple comparisons and control dale on ramp for the 101 North freeway in Los Angeles, the so-called family-wise error rate, we employ the Holm- CA, USA [50]. The measurement was carried out by the Bonferroni correction [56]. In all statistical tests, we consider Freeway Performance Measurement System7 and the data a significance level of 0.01. was retrieved from the UCI Machine Learning Repos- itory [51]. Segments of missing values were manually V. RESULTS interpolated or removed. 3) INSECT:Theelectricalpenetrationgraphofabeetleafhop- A. Configuration per (circulifer tenellus) [17]. The time series was retrieved In pre-analysis, and according to common practice, we set from Mueen’s website8. κ = 100, φ = 4.1, and τ = 2000. We then experimented 4) EEG: A one hour electroencephalogram (in µV) from a with6differentstatictopologiesθ [57]:globalbest,localbest single channel in a sleeping patient [17]. The time series (two neighbors), Von Neumann, random (three neighbors), wasretrievedfromMueen’swebsite9 and,accordingtothe wheel, and binary tree. The results showed the qualitative authors, was smoothed and filtered using domain-standard equivalence of all topologies except, perhaps, global best procedures. and wheel (Fig. 4). In some data sets, these two turned 5) FIELDRECORDING: The spectral centroid (in Hz) of a out to yield slightly worse performances for short-time runs field recording retrieved from Freesound10 [52]. We used of the algorithm (small t), although for longer runs they the mean of the stereo channels and the spectral centroid gradually became equivalent to the rest. However, in general, 7http://pems.dot.ca.gov 11http://lighthouse.tamucc.edu/pq 8http://www.cs.ucr.edu/∼mueen/MK 12http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+ 9http://www.cs.ucr.edu/∼mueen/OnlineMotif power+consumption 10http://www.freesound.org/people/JeffWojo/sounds/121250 13http://www.cs.ucr.edu/∼mueen/DAME 8 0.1304 0.1219 Global best 20 Local best 40 Von Neumann 80 0.0749 0.0742 Random 160 Wheel 320 Binary tree 0.0430 0.0452 m)i m)i D( D( 0.0247 0.0276 0.0142 0.0168 0.0082 0.0102 1 2 3 4 0 1 2 3 4 10 10 10 10 10 10 10 10 10 t [s] t [s] Fig.4. EffectofθontheEEGtimeseries.Equivalentresultswereobserved Fig.5. EffectofκontheWINDtimeseries.Equivalentresultswereobserved withtheothertimeseries. withtheothertimeseries. no systematic statistically significant difference was detected 0.2012 500 betweentopologies.Withthisinmind,wechosethelocalbest 1000 topologytofurtherfavorexplorationandparallelism,andtobe 2000 0.1316 4000 more consistent with our local neighborhood design principle 8000 of Sec. III-A. Next, we studied the effect of the number of particles 0.0860 κ and the stagnation threshold τ. To do so, we kept the m)i previous configuration with the local best topology and sub- D( 0.0562 sequently evaluated κ = {20,40,80,160,320} and τ = {500,1000,2000,4000,8000}. Essentially, we observed al- most no performance changes under these alternative settings 0.0368 (Figs. 5 and 6). We only found a statistically significant differenceinthecaseoftheCARCOUNTdataset.Specifically, 0.0240 the performance with τ = 500 was found to be statistically 0 1 2 3 4 significantly worse than τ ≥ 2000. Regarding κ, and after 10 10 10 10 10 considering different n, w and k, a partial tendency seemed t [s] ∆ to emerge: an increasing number of particles κ was slightly Fig.6. Effectofτ ontheFIELDRECORDINGtimeseries.Equivalentresults beneficialforincreasinglengthsn,increasingw ,andincreas- ∆ wereobservedwiththeothertimeseries. ing k. Unfortunately, we could not obtain strong empirical evidence nor formal proof for this statement. Nonetheless, in subsequent experiments, we decided to use a value for κ and strange to observe that low φ values are more appropriate for τ that dynamically adapts SWARMMOTIF’s configuration to searching the large motif spaces we consider here. We finally such predefined task parameters. We arbitrarily set τ propor- chose φ=4.05. tional to κ, and κ proportional to n and in direct relation Overall,theresultofourpre-analysissuggestsahighdegree to w∆ and k (we refer to the provided code for the exact of robustness with respect to the possible configurations. The formulation). topologyθ,thenumberofparticlesκ,thestagnationthreshold To conclude our pre-analysis, we studied the influence of τ, and the constriction constant φ have, in general, no signif- the constriction constant φ. Following common practice, we icant influence on the obtained results. The only consistent consideredφ={4.02,4.05,4.1,4.2,4.4,4.8}.Inthiscase,we exception is observed for values of φ ≥ 4.4, which are not saw that high values had a negative impact on performance the most common practice [34]. The global best and wheel (Fig.7).Inparticular,valuesofφ≥4.2orφ≥4.4,depending topologies could also constitute a further exception. However, on the data set, statistically significantly increased the motif as we have shown, these become qualitatively equivalent to dissimilarities at a given t. Contrastingly, values 4<φ<4.2 the rest as execution time t increases, yielding no statistically yielded stable dissimilarities with no statistically significant significant difference. We believe that the reported stability variation (in some data sets, this range could be extended of SWARMMOTIF against the tested configurations and data to 4 < φ ≤ 4.4). It is well-known that higher φ values setsjustifiestheuseofoursettingforfindingmotifsindiverse favorexploitationratherthanexploration[43].Hence,itisnot time series coming from further application domains. 9 0.0895 0.1304 4.02 0 4.05 0.00001 4.1 0.0001 0.0579 4.2 0.0749 0.001 4.4 0.01 4.8 0.1 0.0375 0.0430 m)i m)i D( D( 0.0243 0.0247 0.0157 0.0142 0.0102 0.0082 0 1 2 3 4 1 2 3 4 10 10 10 10 10 10 10 10 10 t [s] t [s] Fig. 7. Effect of φ on the INSECT time series. Equivalent results were Fig.8. EffectofρontheEEGtimeseries.Equivalentresultswereobserved observedwiththeothertimeseries. withtheothertimeseries. C. Final Performance B. Variants After extending SWARMMOTIF with velocity clamping Usingtheconfigurationresultingfromtheprevioussection, and craziness, we assess its performance on all considered we subsequently assessed the performance of the variations time series using the default parameter combination resulting considered in Sec. III-C. We started with the sociability from the previous two sections. As can be seen, the obtained variant, experimenting with social-only models, α = 1, motif dissimilarities are far from the random sampling in all cognitive-only models, α = 0, and intermediate configura- cases(Fig.9;noticethelogarithmicaxes).Inaddition,wesee tions, α = {0.2,0.33,0.66,0.8}. Apart from the fact that no that SWARMMOTIF is able to already obtain dissimilarities cleartendencycouldbeobserved,noneoftheprevioussettings withinM∗ assoonasitsexecutionbegins.Specifically,motif wasabletoconsistentlyreachtheperformanceachievedbythe dissimilaritiesinMstarttooverlaptheonesinM∗att<10s original variant (α = 1/2, Eq. 1) in all time series. That is, for practically all time series. The only exceptions are EOG none of the previous settings could statistically significantly and RANDOMWALK, where M starts to overlap with M∗ at outperform α=1/2 in the majority of the data sets. t < 100s. We hypothesize that taking a smaller number of Next, we experimented with the stochastic and the velocity particles κ could make M overlap with M∗ earlier, but leave clamping variants. While the former did not improve our the formal assessment of this hypothesis for future work. results, the latter led to a statistically significant improvement Finally, as execution time t progresses, we see that forsometimeseries.Becauseofthat,wedecidedtodiscardthe SWARMMOTIFconsistentlyretrieveslowerdissimilarities,up useofastochasticvariantbuttoincorporatevelocityclamping to the point that M (cid:39) M∗ (Fig. 9 and Table II). Following to our main algorithm. The former could be difficult to justify the condition we specify in Sec. IV, this means that the while the latter has empirical evidence behind it (Sec. III-C). distancesinthemotifsetobtainedbySWARMMOTIFarenot Finally, we experimented with craziness and its probability statistically worse than the ones of the true exact motif set. ρ. The results showed a similar performance for 0 ≤ ρ < With respect to MOEN’s execution time, this happens 1487 0.001,aslightlybetterperformancefor0.001≤ρ≤0.01,and (DOWJONES), 184 (CARCOUNT), 50 (INSECT), 179 (EEG), an increasingly worse performance for ρ > 0.01 (Fig. 8). A 100 (FIELDRECORDING), 241 (WIND), 286 (POWER), 74 statisticallysignificantdifferencewasfoundbetweenρ≤0.01 (EOG), and 287 (RANDOMWALK) times faster. This implies and ρ > 0.1, being ρ > 0.1 a consistently worse setting. between one and three orders of magnitude speedups (more These results were expected, as the swarm performs a more than that for DOWJONES). Overall, we believe this is an random search with increasing ρ, being completely random in extremely competitive performance for an anytime motif dis- thelimitingcaseofρ=1.Theslightlybetterperformancefor covery algorithm. 0.001≤ρ≤0.01 was not found to be statistically significant under our criteria. However, it was visually noticeable for VI. CONCLUSION some data sets. For instance, with the EEG data set, we see that curves 33 and 34 hit the dissimilarities of the true exact In this article, we propose an innovative standpoint to the motifsetM∗ (grayarea)twoorthreetimesearlierthancurves task of time series motif discovery by formulating it as an 30, 31, and 32 (Fig. 8). With these results, and seeing that ρ anytime multimodal optimization problem. After a concise values between 0.001 and 0.01 never harmed the performance but comprehensive literature review, we reason out the new of the algorithm, we chose ρ=0.002. formulation and the development of an approach based on 10 A0.2243 B0.1040 C0.0895 0.1163 0.0766 0.0579 0.0602 0.0564 0.0375 m)i m)i m)i D( 0.0312 D( 0.0415 D( 0.0243 0.0162 0.0306 0.0157 0.0084 0.0225 0.0102 0 1 2 3 0 1 2 3 1 2 3 4 10 10 10 10 10 10 10 10 10 10 10 10 t [s] t [s] t [s] D0.1304 E0.2012 F0.1219 0.0749 0.1316 0.0742 0.0430 0.0860 0.0452 m)i m)i m)i D( 0.0247 D( 0.0562 D( 0.0276 0.0142 0.0368 0.0168 0.0082 0.0240 0.0102 1 2 3 4 1 2 3 4 1 2 3 4 10 10 10 10 10 10 10 10 10 10 10 10 t [s] t [s] t [s] G0.0911 H0.1242 I 0.1601 0.0588 0.0800 0.0879 0.0380 0.0515 0.0482 m)i m)i m)i D( 0.0246 D( 0.0332 D( 0.0264 0.0159 0.0213 0.0145 0.0102 0.0137 0.0080 1 2 3 4 5 1 2 3 4 5 2 3 4 5 10 10 10 10 10 10 10 10 10 10 10 10 10 10 t [s] t [s] t [s] Fig.9. SWARMMOTIFperformanceontheconsideredtimeseries:(A)DOWJONES,(B)CARCOUNT,(C)INSECT,(D)EEG,(E),FIELDRECORDING,(F) WIND,(G)POWER,(H)EOG,and(I)RANDOMWALK. evolutionary computation. We then highlight the several ad- flexibility, SWARMMOTIF stands out as one of the most vantages of this new formulation, many of which relate to a prominent choices for motif discovery in long time series high degree of flexibility of the solutions that come from it. streams. Since the used data and code are available online To the best of our knowledge, such a degree of flexibility is (Sec. I), the research presented here is fully reproducible, unseen in previous works on time series motif discovery. and SWARMMOTIF is freely available to researchers and We next present SWARMMOTIF, an anytime multimodal practitioners. optimization algorithm for time series motif discovery based We believe that the consideration of multimodal optimiza- on particle swarms. We show that SWARMMOTIF is ex- tion algorithms is a relevant direction for future research in tremely competitive when compared to the best approach we the field of time series analysis and mining. Not only with could find in the literature. It obtains motifs of comparable regard to motif discovery, but also in other tasks such as similarities, in considerably less time, and with minimum queryingfor segmentsofunknownlength [28]ordetermining memory requirements. This is confirmed with 9 independent optimal alignments and similarities [23]. With regard to the real-world time series of increasing length coming from a latter, we envision powerful approaches to variable-length variety of domains. Besides, we find that the high majority local similarity calculations, in the vein of existing dynamic of the possible implementation choices lead to non-significant programming approaches [58], [59]. Finally, we believe that performancechangesinallconsideredtimeseries.Thus,given considering the search spaces and the time constraints de- this robustness, we can think about the proposed solution rived from time series problems can be a challenge for the as being parameter-free from the user’s perspective. Over- evolutionary computation community. We look forward to all, if we add the aforementioned, unprecedented degree of exploring all these topics in forthcoming works, together with

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.