ebook img

Wavelet analysis on symbolic sequences and two-fold de Bruijn sequences PDF

1.1 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Wavelet analysis on symbolic sequences and two-fold de Bruijn sequences

WAVELET ANALYSIS ON SYMBOLIC SEQUENCES AND TWO-FOLD DE BRUIJN SEQUENCES. V.AL. OSIPOV Chemical Physics, Lund University, Getingev¨agen 60, 22241, Lund, Sweden Abstract. The concept of symbolic sequences play important role in study of complex systems. Intheworkweareinterestedinultrametricstructureofthesetofcyclicsequences 6 1 naturally arising in theory of dynamical systems. Aimed at construction of analytic and 0 numericalmethodsforinvestigationofclustersweintroduceoperatorlanguageonthespace 2 of symbolic sequences and propose an approach based on wavelet analysis for study of the n a cluster hierarchy. The analytic power of the approach is demonstrated by derivation of a J formulaforcountingoftwo-folddeBruijnsequences,theextensionofthenotionofdeBruijn 9 sequences. Possible advantages of the developed description is also discussed in context of ] h applied problem of construction of efficient DNA sequence assembly algorithms. p - h t a 1. Motivation and structure. m [ Nowadayssymbolicsequencesisafundamentalconceptwidelyusedinvariousfieldsofnat- 1 ural sciences [1–9]. In bioinformatics, theory of information, and theory of discreet Markov v 7 chains the intrinsic structure of objects under consideration provides direct mapping on sym- 9 0 bolic sequences [1]. Less obvious extension of the symbolic approach one can find in study 2 0 of complex behavior in dynamical systems [2,3]. In essence, the underlying idea is in organi- . 1 zation of a stroboscopic sampling of the multidimensional trajectory. In case of Hamiltonian 0 6 systems one constructs a Poincar´e section surface in phase-space [10], such that it is oriented 1 : orthogonal to the dynamical flow at each point. Linearization of the dynamics allows to v i separate stable and unstable directions of motion and thus define the set of feasible positions X r at the next crossing of the Poincar´e section surface. All intersections falling within the same a sub-region of the surface are designated by a certain symbol. For the system possessing a chaotic dynamics one can aver existence of the Markov partitioning of the surface, meaning that each next symbol in the symbolic dynamics is defined only by the previous one. For infinite or cyclic sequences the real trajectory can be uniquely restored from the symbolic dynamics. Often, the obtained Markov alphabet includes an infinite number of symbols, as, for instance, in chaotic billiards, out of the flow discontinuity on the billiard walls. It is Date: January 12, 2016. Key words and phrases. Symbolic sequences; wavelet; de Bruijn sequences; ultrametrics; dynamical sys- tems; DNA sequence assembly. 1 WAVELET ANALYSIS ON SEQUENCES, TWO-FOLD DE BRUIJN SEQUENCES 2 not forbidden, however, to define some other partition prompted by physical (geometrical) properties of the system, which would generate a finite alphabet. The fee for such conve- nience is formation of constraints on combinations of more then two subsequent symbols in the symbolic dynamics. In this article we consider a special case of symbolic dynamics gen- erated by baker’s map [11]. It is a chaotic map from the unit square into itself, its symbolic dynamics do not assume any constraints, i.e. all possible symbolic sequences correspond to some trajectory generated by the baker’s map. Study of periodic orbits and consequently cycled symbolic sequences plays a special role in theory of quantum chaotic systems [12]. Unstable periodic orbits form “skeleton” of a dynamical system and knowledge of their hierarchy can give one’s access to many of the system’s dynamical averages, such as, natural measure, Lyapunov exponents, fractal dimensions, entropy. They can efficiently expressed in terms of a sum over the unstable periodic orbits [13,14]. For instance, by means of semiclassical Gutzwiller trace formula [15] the eigenenergy density of chaotic quantum system can be expressed as a sum over long (of orderHeisenbergtime∝ (cid:126)1−d, whereddimensionofthesystem)classicalperiodicorbitsand, in turn, the density-density correlations as the double sum. Into the correlator formula the orbits come in pairs. The non-trivial oscillatory sub-leading terms in the correlator [16–18] result from interference between special type of periodic orbits, such that the partner orbits are close to each other everywhere in configuration space, but the same configuration points are visited in different time-order (see fig. 1 a). The time-order switching happens in the regions called encounters, where each of the orbits comes onto the shortest distance to itself, being, at the same time, mostly distanced from the partner. Symbolic representation of periodic orbits is not sensitive enough to distinguish points within the encounters and the free loops on the level of a single symbol (fig. 1 b). The closeness of the partner orbits is reflected in the fact that the corresponding symbolic sequences being globally different are locally identical, i.e. they both consist of the same set of sub-strings of some given length p (fig. 1 c). The above picture was recently formalized in [19] under the notion of p-closeness: Two cyclic symbolic sequences X and Y of the same length n are p-close p to each other (X ∼ Y) if any sub-string of the length p appears both in X and in Y the p same number of times (might be zero). The properties of the equivalence relation ∼ allows to introduce an ultrametric distance d(X,Y) on the set of periodic sequences. Such distance, in addition to the standard properties of distances, also satisfies the strong triangle inequality: d(X,Y) ≤ max[d(X,Z),d(Z,Y)] (see for instance [20]). In other words, all cyclic sequences of a given length n are distributed over hierarchically nested clusters with respect to their p-closeness and thus they can be classified with respect to their local content, see fig. 2. The p-closeness can be seen as generalization of de Bruijn sequences [4], which first appearance can be traced back to the mathematical work of Flye Sainte-Marie [21] published in 1894. WAVELET ANALYSIS ON SEQUENCES, TWO-FOLD DE BRUIJN SEQUENCES 3 Figure 1. a) Schematic representation of a periodic orbit with two encounters crossingthePoincar´esectionsurfaceinpointseandd(thecoloredloops). b)Possible choice of the finite partition on the Poincar´e section surface. The 8 letters of the emerging alphabet are encoded by binary numbers. c) Symbolic representation for the pair of periodic orbits. The sequences are different but all triples of symbols containing in the upper sequence enter the same number of times into the lower one. d) A part of DNA sequence and the frequencies of the three-letters sub-strings. e) Graphical representation of the DNA sequences as a path on de Bruijn graph. De Bruijn sequence is a cyclic sequence characterized by the property that each possible sub-string of a given length p enters the sequence and only once. Interesting, that the similar need in searching of symbolic sequences from the set of their sub-strings emerges in area absolutely different from dynamical systems, the problem of contiguous DNA sequence assembly from the shorter DNA fragments [5]. In more detailed exposition, the procedure of genome reading represents a set of multiple readings of small DNA pieces called reads, while reconstruction of the whole original nucleotide sequence becomes an algorithmic problem of assembling a number of reads into a single sequence. In the modern de novo assembly approach the sequence assembly is considered as a search of an Eulerian cycle (path on a graph which traverses all edges and each only once) on a directed graph [22] (see fig. 1 d,e), a sub-graph of de Bruijn graph (more precisely de Bruijn-Good graph [23,24]). The vertices of the sub-graph are associated with the strings of a fixed length p extracted from the set of all reads. Two vertices are connected if the corresponding strings share an exact p−1 overlap (exact definition of de Bruijn graph and its properties will be discussed in section 2.3). Taking into account that a certain sub-string can appear in the native DNA several times one have to deal with sequences possessing a certain encounter structure. There is a number of principal questions arising in the field, which has been WAVELET ANALYSIS ON SEQUENCES, TWO-FOLD DE BRUIJN SEQUENCES 4 Figure 2. Clustering of cyclic binary sequences of the length n = 11 (only half of the tree, 94 sequences out of 188, is shown). The end-points of the tree (black circles) represent the sequences, the numbers in the circles give the total number of sequences in the corresponding cluster. The tree is constructed by the procedure developed in section 3. Cluster of sequences with z = 2 ones and 9 zeros contains 5 sequences, with z = 3 ones – 15, with z = 4 – 30, with z = 5 – 42, the sequences with z = 0,1 do not form non-trivial clusters, see eq. (12). Parameter z determines the number of branches on the level p = 1 of the tree (first from the top), the tree brunches up to the maximal level p = 5, see section 3.4. max reported in the literature: how the number of possible realizations of sequences behaves with the length of the sequences segments, p, and the related question what is the optimal length of the reads [25]? The computational bottleneck in realization of de novo algorithms is the amount of memory required for saving the de Bruijn sub-graph data, which, obviously, becomes central for assembling of large genomes [26–28]. Generallyspeaking,boththestudyoftopologicalstructuresofperiodicorbitsindynamical systemsandtheproblemofDNAsequenceassemblyarebasedonthesamemathematicalap- paratus. Our recent finding of the ultrametric structure of the set of symbolic sequences [19] allows to extend our understanding of the problems and develop novel analytic and algo- rithmic approaches. Some of these ideas we demonstrate in the present article. Moreover, we believe that the introduced below operator language and wavelet analysis on cyclic sym- bolic sequences will be useful as well for formulation of advanced theoretical approaches in WAVELET ANALYSIS ON SEQUENCES, TWO-FOLD DE BRUIJN SEQUENCES 5 theories describing multi-scale stochastic processes on the set of symbolic sequences (see for instance [29–33]). In section 2 we introduce the operator formalism for the work with symbolic sequences in their frequency representation and reproduce the main known facts about the ultrametric structure of sequences (section 2.2), their geometric representation by means of de Bruijn graphs, as well as formula for the number of de Bruijn sequences obtained by counting Euler and Hamiltonian cycles on the de Bruijn graphs (section 2.3). Here (section 2.4) we also repeat the known results regarding the distribution of clusters of symbolic sequences and derive some related basic formulae. The main analytic structures discussed in the present research are rising (section 2.2) and lowering (section 3.1) operators on the cluster tree. Analysis of them in basis of wavelets, constructed by analogy with p-adic wavelets [34,35] (see also [36]), allows to define rigorous procedure for reconstruction of sequences belonging to certain clusters (see example on fig. 2). According to the procedure, on each hierarchical level one reduces the problem to the solution of a number of independent linear systems of Diophantine equations (section 3.2). The main advantage of the approach is that after a certain hierarchy level p, such that (cid:96)p > n ((cid:96) is the number of letters in alphabet, n is the length of the sequence) there is only a finite number of relevant solutions and this number decreases as p increases. As we show in section 3.3, the “center of masses” of our approach lies in analysis of connectivity of the obtained sequences. Such “factorisation” of the problem made possible derivation of a formula, eq. (33), for counting of two-fold binary de Bruijn sequences. The notion of two-fold de Bruijn sequences is introduced in section 3.5. Finally, in section 4 we discuss obtained results and consider possible applications of the revealed classification of sequences in various fields. 2. Set of cyclic symbolic sequences. Ultrametric structure of the set. 2.1. Primary definitions. Let XA(cid:96) denotes a set of all strings of n characters chosen from n (cid:96)-symbols alphabet A . The total number of such sequences is known to be (cid:96)n, we write (cid:96) (cid:13) (cid:13) (cid:13)XA(cid:96)(cid:13) = (cid:96)n. It is convenient to enumerate the letters of the alphabet A by integer numbers n (cid:96) from zero to (cid:96)−1. LetV bean(cid:96)p-dimensionalvectorspacewiththescalarproductdefinedinastandardway. (cid:96)p (cid:110) (cid:111) Let the set e(p), e(p),...,e(p) be the elementary basis in the linear space V . Elementary 1 2 (cid:96)p (cid:96)p basis consists of column-vectors, such that all entries of the kth vector, e(p), except the kth k one, are zeros. Below, we also use notation {e } for the elementary basis vectors of k k=1,...,(cid:96) the space V . (cid:96) Let us define the bijective transformation V between the symbolic sequences from XA(cid:96) n and the basis vectors of the space V by the rule (cid:96)p n (cid:79) (1) V : [α ,α ,...α ] ∈ XA(cid:96) → e ≡ e(n) ∈ V . 1 2 n n 1+αk 1+(cid:80)nk=1αk(cid:96)n−k (cid:96)n k=1 WAVELET ANALYSIS ON SEQUENCES, TWO-FOLD DE BRUIJN SEQUENCES 6 Here the symbol ⊗ stands for the Kronecker product, such that   aa(cid:48) ab(cid:48) ba(cid:48) bb(cid:48) (cid:32) (cid:33) (cid:32) (cid:33) a b a(cid:48) b(cid:48)  ac(cid:48) ad(cid:48) bc(cid:48) bd(cid:48)    ⊗ =  . c d c(cid:48) d(cid:48)  ca(cid:48) cb(cid:48) da(cid:48) db(cid:48)    cc(cid:48) cd(cid:48) dc(cid:48) dd(cid:48) Representation of matrices in terms of Kronecker product is useful out of its basic properties: (A⊗B)·(C ⊗D) = (A·C)⊗(B·D) and (A⊗B)† = A†⊗B†. Here and below the symbol † stands for transponation and, if applicable, simultaneous complex conjugation. To proceed with the formalization we introduce the reduction operator σ : V → V p (cid:96)n (cid:96)p for p ≤ n. Contrary to the transformation V the aim of the latter construction is to project onlythefirstpsymbolsofthesequence[α ,α ,...α ] ∈ XA(cid:96) ontothevectorspaceofreduced 1 2 n n dimensionality, ignoring information of other symbols: p (cid:79) σ V[α ,α ,...α ] = e ≡ e(p) ∈ V . p 1 2 n 1+αk 1+(cid:80)pk=1αk(cid:96)p−k (cid:96)p k=1 In terms of direct product the operator takes the following form (here and below 1 is k×k k identity matrix and for matrices 1 we omit the index (cid:96)) (cid:96) p n (cid:79) (cid:79) σ = 1⊗ q†, p k=1 k=p+1 where q† means the left multiplication on the vector of one’s, i.e. q† = (e +e +···+e )†. 1 2 (cid:96) It is assumed that at p = n the matrix σ coincides with the (cid:96)n ×(cid:96)n identity matrix. p Let us finally introduce operator T of a cyclic shift of symbols in symbolic sequence of n the length n, such that: V−1T V[α ,α ,...α ] → [α ,...α ,α ]. Its matrix representation n 1 2 n 2 n 1 in the basis e(n), ... e(n) is given by the formula 1 (cid:96)n (cid:32) (cid:33) (cid:96) n−1 (cid:88) (cid:79) T = e ⊗ 1 ⊗e†. n k k k=1 m=1 With the help of the shift operator one can define the set of cyclic sequences XA(cid:96) as the n factor set XA(cid:96) over T , XA(cid:96) = XA(cid:96)/T . The volume of the space can be estimated by the n n n n n combinatorial formula known as the number of necklaces [6,37] (cid:18) (cid:19) (cid:13) (cid:13) 1 (cid:88) (cid:89) 1 (2) (cid:13)XA(cid:96)(cid:13) = φ(d)(cid:96)n/d φ(d) = d 1− n n q d|n q|d where the sum runs over d, the divisors of n (including d = 1 and d = n), and φ(d) is the Euler’s totient function, q|d means that the product runs over all prime divisors of d. For n – prime the formula becomes particularly simple (cid:13) (cid:13) (cid:96)n +(n−1)(cid:96) (cid:13)XA(cid:96)(cid:13) = n is prime. n n WAVELET ANALYSIS ON SEQUENCES, TWO-FOLD DE BRUIJN SEQUENCES 7 Notealsothatfrom(2)onecanseethatthenumberofnon-primecyclicsequencesisnegligible in comparison with the number of prime sequences in the limit n → ∞. 2.2. Hierarchical structure of the set of cyclic symbolic sequences. In the article we are interested in the properties of the combined projecting operator n−1 (cid:88) (3) P = σ Tk p,n p n k=0 The operator P acting on the vector e(n) = V[α ,α ,...,α ] ∈ V has the image p,n 1+(cid:80)n α (cid:96)n−k 1 2 n (cid:96)n k=1 k in V . Its action on the basis vector e(n) produces a vector x ∈ V with non-negative (cid:96)p k p,n (cid:96)p integer entries, we write x ∈ N . This set we denote as F . The jth entry of the vector p,n 0 p,n x ∈ F can be seen as frequency, i.e. the number of entrances (might be zero) of the p,n p,n string [β ,...,β ] of the length p, such that j = 1+(cid:80)p β (cid:96)p−k, into the original sequence 1 p k=1 k [α ,α ,...,α ]. 1 2 n Consider some properties of P . Each vector x ∈ F has the property p,n p,n p,n (cid:96)p (cid:88) (4) [x ] = n. p,n j j=1 This follows from the fact that P is a sum of n terms (see eq. (3)), each of them produces p,n an elementary basis vector in V . In particular, action of P on the vector corresponding (cid:96)p p,n to the sequence with identical symbols produces a vector proportional to n: (5) P e(n) = ne(p) α = 0,...,(cid:96)−1. p,n 1+α(cid:80)n (cid:96)n−k 1+α(cid:80)p (cid:96)p−k k=1 k=1 This is reflection of the generic property, that if some sequence (e(n)) has the minimal period k d, then all entries of x = P e(n) are proportional to n/d. p,n p,n k All sequences obtained as a shift of a given one, i.e. vectors e(n),T e(n),...,Tn−1e(n), k n k k correspond to one and the same image x = P e(n). It follows directly from the structure p,n p,n k of P . Below, to avoid this trivial degeneracy we prefer to work with the operator (3) acting p,n on the factor space VXA(cid:96). n Let us now introduce a raising operator R : F → F which acts by the rule: for p−1 p,n p−1,n all k = 1,...,(cid:96)n (6) if x = P e(n) and x = P e(n), then R x = x . p+1,n p+1,n k p,n p,n k p p+1,n p,n Its matrix form can be read off from the relation P = R P , it has the form p,n p p+1,n p (cid:79) (7) R = 1⊗q†. p k=1 Note, that the form of operator R does not depend on n. p WAVELET ANALYSIS ON SEQUENCES, TWO-FOLD DE BRUIJN SEQUENCES 8 (cid:13) (cid:13) Example 1. Let n = 3, (cid:96) = 2. The number of sequences is (cid:13)X(2)(cid:13) = 8, the size of the (cid:13) 3 (cid:13) (cid:13) (cid:13) factor set (cid:13)X(2)(cid:13) = 4. The elementary basis has the form (cid:13) 3 (cid:13) (cid:32) (cid:33) (cid:32) (cid:33) 1 0 e = , e = . 1 2 0 1 The operators T and σ in this basis have the forms 2   1 0 0 0 0 0 0 0 (cid:32) (cid:33) 1 1 1 1 0 0 0 0   0 0 1 0 0 0 0 0 σ =   1   0 0 0 0 1 1 1 1  0 0 0 0 1 0 0 0       0 0 0 0 0 0 1 0    T =  , 1 1 0 0 0 0 0 0  0 1 0 0 0 0 0 0     0 0 1 1 0 0 0 0       0 0 0 1 0 0 0 0  σ2 =      0 0 0 0 1 1 0 0   0 0 0 0 0 1 0 0      0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 Operator P has the following matrix representation if it acts in the space VX(2) (on the left) p 3 and in the space VX(2) (the right column) 3 (cid:32) (cid:33) (cid:32) (cid:33) 3 2 2 1 2 1 1 0 3 2 1 0 P = P = 1,3 1,3 0 1 1 2 1 2 2 3 0 1 2 3     3 1 1 0 1 0 0 0 3 1 0 0     0 1 1 1 1 1 1 0 0 1 1 0     P =   P =   2,3 2,3  0 1 1 1 1 1 1 0   0 1 1 0      0 0 0 1 0 1 1 3 0 0 1 3 As one can see vectors from the sets F and F have integer entries and support the 1,3 2,3 general results (4) and (5). One can reveal (generically not uniquely) the corresponding symbolic sequence form x . For instance, p,n j = 1+(cid:80)2 β 22−k β β [x ] k=1 k 1 2 2,3 j 1 0 0 1 x = (1 1 1 0) : 2 0 1 1 2,3 3 1 0 1 4 1 1 0 correspond to the cyclic sequence [001]. Now we are at position to discuss the hierarchical structure of the set of cyclic sequences. First we introduce the notion of p-closeness (see [19]). Defenition 1. Two sequences a,b ∈ XA(cid:96) are p-close, a ∼p b, if their images x = P Va n p,n p,n and y = P Vb are equal x = y . p,n p,n p,n p,n WAVELET ANALYSIS ON SEQUENCES, TWO-FOLD DE BRUIJN SEQUENCES 9 p There are three important properties of the equivalence relation ∼: p p p i. The relations a ∼ b and a ∼ c also imply that b ∼ c; p p−1 ii. The relation a ∼ b implies that a ∼ b. p−1 p iii. The relation a ∼ b does not imply that a ∼ b. The first is the direct sequence of the definition 1, while the second follows from the linear properties of the raising operator (6), (7). The third sentence, in particular, means that the lowering operator translating x = P e(n) into x = P e(n) cannot be constructed p,n p,n k p+1,n p+1,n k as a linear operator. The latter claim will be clarified in the part 3. From the properties i, ii, iii it follows that all cyclic sequences can be distributed over hierarchically nested clusters with respect to their closeness. Using the notion of p-closeness one can naturally introduce the ultrametric distance on the set XA(cid:96) in the following way: n The ultrametric distance d(a,b) between a, b ∈ XA(cid:96) is defined as n (8) (cid:110) (cid:111) d(a,b) = e−γmax(a,b) γ (a,b) = arg max p : a ∼p b and d(a,b) = 0 if a = b. max p=0,...,n−1 The distance d(a,b) is positive, symmetric and satisfies the strong triangle inequality: d(a,b) ≤ max{d(a,c), d(b,c)} The latter can be easily proved. Assume that a ∼p c, b ∼p(cid:48) c but b p(cid:54)∼(cid:48)+1 c and p(cid:48) > p. From the p(cid:48) p(cid:48) p(cid:48)+1 definition of p-closeness it follows that a ∼ c and therefore a ∼ b but a (cid:54)∼ b. The function γ can be calculated with the help of the operator R . If x , x(cid:48) ∈ F max p n,n n,n n,n are respectively represent the sequences a and b then (cid:8) (cid:9) γ = arg max p : R ·R ...R (x −x(cid:48) ) = 0 . max p p+1 n−1 n,n n,n p=0,...,n−1 Before we start our study of the clusters it is instructive to discuss first the graphical representation of sequences and some basic results. 2.3. Geometric representation of the cyclic sequences. De Bruijn sequences. The simplest way to represent the set of sequences XA(cid:96) graphically is to use regularly branching n tree. Setting the “left” end of the sequence onto the root (the top point) of the tree one goes down choosing the branches according to the current letter within the sequence (see appendix A). Thus the “leaves” are enumerated by integer numbers from 1 to (cid:96)n, i.e. by the indexes of the elementary basis vectors of the space V . The obtained tree obviously (cid:96)n generates ultrametrics on the set of symbolic sequences, but does not reflect the idea of equivalence of symbols with respect to their positions in the sequence and dissimilarity of symbols with respect to their local surrounding. To this end a more adequate graphical structure, de Bruijn graph, has been proposed [23]. The notion of de Bruijn graph is used here for a class of directed graphs G (p) labeled by (cid:96) the integer numbers p and (cid:96). Graph G (p) has (cid:96)p vertices connected by (cid:96)p+1 directed edges. (cid:96) WAVELET ANALYSIS ON SEQUENCES, TWO-FOLD DE BRUIJN SEQUENCES 10 Figure 3. De Bruijn graphs, from left to right, G (0), G (1), G (3). The edges 2 2 2 and vertices are encoded by binary sequences, see text for explanation. Each vertex is designated with the symbolic string from XA(cid:96), figure 3. For our purposes, it is p convenient also to mark the vertices by the corresponding basis vectors e(p) ∈ V (exceptions k (cid:96)p are the graphs G (0) and G (1), otherwise one has to work with zero- and one-dimensional (cid:96) (cid:96) spaces), where the vector index k is calculated according to the rule (1). Two vertices e(p) k and e(p) are connected by the edge with the label e(p+1) = e(p)⊗e ∈ V (j = 1,...,(cid:96)) iff k(cid:48) (cid:96)k+j−1 k j (cid:96)p+1 e(p) = L e(p) ⊗e , where L is the incidence matrix: k(cid:48) p k j p (cid:32) (cid:33) p (cid:79) L = q† ⊗ 1 . p k=1 Consider now some properties of the graph G (p). First, vertices have exactly (cid:96) outgoing (cid:96) and (cid:96) incoming edges. Therefore, the graph is Eulerian, i.e. one can find a path through all vertices, such that it traverses each edge only once. Second, the (cid:96)p × (cid:96)p adjacency matrix Q of the graph G (p) can be restored from matrices R and L . Indeed, starting from some (cid:96)p (cid:96) p p vertex e(p) we obtain all possible outgoing edges by the formula T ·q⊗e(p) = R†e(p). The k p+1 k p k backward projection onto the set of vertices can be done with the help of L . Finally we p obtain (cid:32) (cid:33) p−1 (cid:79) (9) Q = q† ⊗ 1 ⊗q = L R†. (cid:96)p p p k=1 By analogy one can derive the expression for the connectivity (cid:96)p+1 × (cid:96)p+1 matrix Q˜ (the (cid:96)p adjacencymatrixforedges). ItisQ˜ = T ·q⊗L = L R† , whichis, ontheotherhand, (cid:96)p p+2 p p+1 p+1 ˜ the vertex adjacency matrix Q of the graph G (p+1). Due to equivalence Q = Q via (cid:96)p+1 (cid:96) (cid:96)p (cid:96)p+1 the line graph construction [38] each graph G (p) is Eulerian and Hamiltonian. The number (cid:96) of Hamiltonian paths on the graph G (p + 1) is equal to the number of Eulerian paths on (cid:96) the graph G (p). (cid:96) The sequences corresponding to the Eulerian paths on de Bruijn graph have a joint name of de Bruijn sequences [4]. The de Bruijn sequence is characterized by the fact that each possible sub-string of the length p enters the sequence and only once, therefore its length is n = (cid:96)p. Below we give definition of de Bruijn sequences in terms introduced in this article and as a special case of f-fold de Bruijn sequences.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.