Universal Joint Image Clustering and Registration using Partition Information

Ravi Kiran Raman and Lav R. Varshney

This work was supported in part by Air Force STTR Contract FA8650-16-M-1819. The authors are with the University of Illinois at Urbana-Champaign.

Abstract—The problem of joint clustering and registration of images is studied in a universal setting. We define universal joint clustering and registration algorithms using multivariate information functionals. We first study the problem of registering two images using maximum mutual information and prove its asymptotic optimality. We then show the shortcomings of pairwise registration in multi-image registration, and design an asymptotically optimal algorithm based on multiinformation. Finally, we define a novel multivariate information functional to perform joint clustering and registration of images, and prove consistency of the algorithm.

Index Terms—Image registration, clustering, universal information theory, unsupervised learning, asymptotic optimality

I. INTRODUCTION

Suppose you have an unlabeled repository of medical images of MRI, CT, and PET scans of brain regions corresponding to different patients from different stages of the diagnostic process. You wish to sort the images into clusters corresponding to individual patients and align the images within each cluster. In this work, we address this exact problem of joint clustering and registration of images using novel multivariate information functionals.

Different digital images of the same scene can appear significantly different from each other depending on the imaging device, the application of filters, and the orientation of the images. Such meta-data about the digital images is often not completely available. This emphasizes the need for universality in the design of reliable clustering and registration algorithms.

The tasks of image clustering and registration have often been dealt with independently. We emphasize here that the two problems are not necessarily independent, and define a universal, reliable, joint clustering and registration algorithm.

There is a rich literature on clustering and registration; we describe a non-exhaustive listing of relevant prior work. Unsupervised clustering of objects has been studied under a wide variety of optimality and similarity criteria. The k-means algorithm and its generalization to Bregman divergences [1] are popular distance-based methods. In this work, we focus on information-based clustering algorithms [2], [3], owing to the ubiquitous nature of information functionals in universal information processing. The task of universal clustering has been studied in the communication and crowdsourcing settings in [4] and [5], respectively.

Separate from clustering, multi-image registration has been studied extensively. Some prominent region-based registration methods include maximum likelihood (ML) [6], minimum KL divergence [7], correlation detection [8], and maximum mutual information (MMI) [9]. Feature-based techniques such as [10], [11] have also been explored; for a more comprehensive and detailed survey of existing techniques, see [12]. Lower bounds on the mean squared error of image registration in the presence of additive noise, obtained via Ziv-Zakai and Cramér-Rao bounds, have been explored recently [13], [14]. The MMI decoder was originally developed in communications [15], and deterministic reasons for its effectiveness in image registration have also been studied [16]. Correctness has also been established through information-theoretic arguments [17].
While the MMI method has been found to perform well through numerous empirical studies, concrete theoretical guarantees of optimality are still lacking. In this work, we extend the framework of universal delay estimation for memoryless sources [18] to derive universal asymptotic optimality guarantees for the MMI method under the Hamming loss. Further, we show the shortcomings of pairwise information measures in multi-image registration and define novel multivariate functionals that overcome them.

We next introduce the image and channel model, and characterize rigid-body transformations. We then prove asymptotic optimality of the MMI method for two-image registration. We then consider multi-image registration and define asymptotically optimal registration algorithms using multiinformation. Finally, we perform reliable, universal, joint clustering and registration of images using multivariate information.

II. MODEL

We now formulate the joint image clustering and registration problem and define the model of the images that we work with.

A. Image and Noise

In this work, we consider a simple model of images. Specifically, we assume that each image is a collection of $n$ pixels drawn independently and identically from an unknown prior distribution $P_R(\cdot)$ defined on a finite space of pixel values $[r]$. Since the pixels are drawn i.i.d., we represent the original scene of the image by an $n$-dimensional random vector $R \in [r]^n$. More specifically, the scene is drawn according to $\mathbb{P}[R] = P_R^{\otimes n}(R)$.

Consider a finite collection of $\ell$ distinct scenes (drawn i.i.d. according to the prior), $\mathcal{R} = \{R^{(1)}, \dots, R^{(\ell)}\}$. This set can be interpreted as a collection of different scenes. Each image may be viewed as a noisy depiction of an underlying scene.

Consider a collection $\{\tilde{R}^{(1)}, \dots, \tilde{R}^{(m)}\}$ of $m$ scenes drawn from $\mathcal{R}$, with each scene chosen independently and identically according to the pmf $(p_1, \dots, p_\ell)$. Specifically, image $i$ corresponds to a depiction of the scene $\tilde{R}^{(i)}$, for $i \in [m]$. We model the images corresponding to this collection of scenes as noisy versions of the underlying scenes, drawn as follows:

$$\mathbb{P}\Big[\tilde{X}^{[m]} \,\Big|\, \tilde{R}^{[m]}\Big] = \prod_{j=1}^{\ell} \prod_{i=1}^{n} W\Big(\tilde{X}_i^{(K(j))} \,\Big|\, R_i^{(j)}\Big), \tag{1}$$

where $K(j) \subseteq [m]$ is the inclusion-wise maximal subset such that $\tilde{R}^{(i)} = R^{(j)}$ for all $i \in K(j)$. That is, images corresponding to the same scene are jointly corrupted by a discrete memoryless channel, while images corresponding to different scenes are independent conditioned on the scenes. Here we assume $\tilde{X} \in [r]^n$. The system is depicted in Fig. 1.

Fig. 1. Model of the joint image clustering and registration problem.
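To make the model concrete, the following is a minimal simulation sketch in Python: it draws $\ell$ scenes of i.i.d. pixels, assigns each of $m$ images a scene, and corrupts each image through a symmetric discrete memoryless channel. For simplicity, the sketch corrupts each image independently given its scene, a special case of the joint channel in (1); all helper names and parameter values are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_scenes(num_scenes, n, prior):
    """Draw scenes of n i.i.d. pixels from the prior P_R on [r]."""
    return rng.choice(len(prior), size=(num_scenes, n), p=prior)

def corrupt(scene, W):
    """Pass a scene through a DMC with row-stochastic transition matrix W."""
    return np.array([rng.choice(len(W), p=W[pixel]) for pixel in scene])

# Illustrative parameters (assumptions for the sketch).
r, n, ell, m, eps = 4, 1000, 3, 5, 0.1
prior = np.full(r, 1.0 / r)            # uniform pixel prior P_R
W = np.full((r, r), eps / (r - 1))     # symmetric channel on [r]
np.fill_diagonal(W, 1 - eps)

scenes = draw_scenes(ell, n, prior)    # R^(1), ..., R^(ell)
labels = rng.integers(ell, size=m)     # scene of each image; pmf (p_1,...,p_ell) uniform here
images = np.stack([corrupt(scenes[k], W) for k in labels])  # noisy depictions
```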
B. Image Transformations

The corrupted images are also subject to independent rigid-body transformations such as rotation and translation. Since images are vectors of length $n$, transformations are represented by permutations of $[n]$. Let $\Pi$ be the set of all allowable transformations, and let $\pi_j \sim \mathrm{Unif}(\Pi)$ be the transformation of image $j$. Then, the final image is $X_i^{(j)} = \tilde{X}_{\pi_j(i)}^{(j)}$, for all $i \in [n]$. The image $X$ transformed by $\pi$ is denoted interchangeably as $\pi(X) = X_\pi$.

In order to facilitate registration, we assume that we are aware of $\Pi$. Further, we assume $\Pi$ forms a commutative algebra over the composition operator $\circ$. More specifically,
1) for $\pi_1, \pi_2 \in \Pi$, $\pi_1 \circ \pi_2 = \pi_2 \circ \pi_1 \in \Pi$;
2) there exists a unique $\pi_0 \in \Pi$ such that $\pi_0(i) = i$ for all $i \in [n]$;
3) for any $\pi \in \Pi$, there exists a unique inverse $\pi^{-1} \in \Pi$ such that $\pi^{-1} \circ \pi = \pi \circ \pi^{-1} = \pi_0$.

The number of distinct rigid transformations of images with $n$ pixels on the $\mathbb{Z}$-lattice is polynomial in $n$, i.e., $|\Pi| = O(n^\alpha)$ for some $\alpha \le 5$ [19].

Definition 1 (Permutation Cycle): A permutation cycle $\{i_1, \dots, i_k\}$ is a subset of a permutation $\pi$ of $[n]$ such that $\pi(i_j) = i_{j+1}$ for all $j < k$, and $\pi(i_k) = i_1$.

It is clear from the pigeonhole principle that any permutation is composed of at least one permutation cycle. Let the number of permutation cycles of a permutation $\pi$ be $\kappa_\pi$.

Definition 2 (Identity Block): The identity block of a permutation $\pi \in \Pi$ is the inclusion-wise maximal subset $I_\pi \subseteq [n]$ such that $\pi(i) = i$ for all $i \in I_\pi$.

Definition 3 (Simple Permutation): A permutation $\pi$ is simple if $\kappa_\pi = 1$ and $I_\pi = \emptyset$.

Definition 4 (Non-overlapping Permutations): Two permutations $\pi, \pi' \in \Pi$ are non-overlapping if $\pi(i) \ne \pi'(i)$ for all $i \in [n]$.

Lemma 1: Let $\pi$ be chosen uniformly at random from the set of all permutations of $[n]$. Then, for any constants $c \in (0,1]$ and $C$,

$$\mathbb{P}[|I_\pi| > cn] \lesssim \exp(-cn), \qquad \mathbb{P}\left[\kappa_\pi > C\,\frac{n}{\log n}\right] = o(1). \tag{2}$$

Proof: First, we observe that the number of permutations with an identity block of size at least $cn$ is bounded as

$$\nu_c \le \binom{n}{cn}\,((1-c)n)! = \frac{n!}{(cn)!}.$$

Thus,

$$\mathbb{P}[|I_\pi| \ge cn] \le \frac{1}{\sqrt{2\pi}}\exp\left(-\left(cn + \tfrac{1}{2}\right)\log(cn) + cn\right),$$

from Stirling's approximation. Lengths and numbers of cycles in a random permutation may be analyzed as detailed in [20]. In particular, we note that for a random permutation $\pi$, $\mathbb{E}[\kappa_\pi] = \log n + O(1)$. Using Markov's inequality, the result follows. ∎

Following the observations of Lemma 1, we assume that for any $\pi \in \Pi$, $\kappa_\pi = o\!\left(\frac{n}{\log n}\right)$, i.e., the number of permutation cycles does not grow too fast. Second, we assume $|I_\pi| \le \gamma_\pi n$ for every $\pi \in \Pi$, with $\gamma_\pi = o(1)$.
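The commutative algebra above, together with Definitions 1–4, is easy to exercise in code. This sketch represents transformations as permutation arrays and uses cyclic shifts as an illustrative (and commutative) choice of $\Pi$; the helper names are ours, not the paper's.

```python
import numpy as np

def compose(p1, p2):
    """Composition (p1 ∘ p2)(i) = p1(p2(i)) of permutations of [n]."""
    return p1[p2]

def inverse(p):
    """Unique inverse permutation: inv[p[i]] = i (property 3)."""
    inv = np.empty_like(p)
    inv[p] = np.arange(len(p))
    return inv

def identity_block(p):
    """I_pi: the fixed points of the permutation (Definition 2)."""
    return np.flatnonzero(p == np.arange(len(p)))

def num_cycles(p):
    """kappa_pi: the number of permutation cycles (Definition 1)."""
    seen, count = np.zeros(len(p), dtype=bool), 0
    for i in range(len(p)):
        if not seen[i]:
            count += 1
            j = i
            while not seen[j]:
                seen[j], j = True, p[j]
    return count

n = 8
Pi = [np.roll(np.arange(n), s) for s in range(n)]  # cyclic shifts: commutative under ∘
pi = Pi[3]
assert np.array_equal(compose(pi, inverse(pi)), np.arange(n))      # pi ∘ pi^{-1} = pi_0
assert len(identity_block(Pi[1])) == 0 and num_cycles(Pi[1]) == 1  # Pi[1] is simple
```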
C. Performance Metrics

We now formally introduce performance metrics to quantify the performance of joint clustering and registration algorithms.

Definition 5 (Correct Clustering): A clustering of a set of images $\{X^{(1)}, \dots, X^{(m)}\}$ is a partition $P$ of $[m]$. The sets of a partition are referred to as clusters. The clustering is correct if $i, j \in C \Leftrightarrow R^{(i)} = R^{(j)}$, for all $i, j \in [m]$ and $C \in P$.

Let $\mathcal{P}$ be the set of all partitions of $[m]$. For a given collection of scenes, we represent the correct clustering by $P^*$.

Definition 6 (Partition Ordering): A partition $P$ is finer than $P'$, written $P \preceq P'$, if for every $C \in P$ there exists $C' \in P'$ such that $C \subseteq C'$. Similarly, $P$ is denser than $P'$, written $P \succeq P'$, if $P' \preceq P$.

Definition 7 (Correct Registration): The correct registration of an image $X$ transformed by $\pi \in \Pi$ is $\hat{\pi} = \pi^{-1}$.

Definition 8 (Universal Clustering and Registration Algorithm): A universal clustering and registration algorithm is a sequence of functions $\Phi^{(n)} : \{X^{(1)}, \dots, X^{(m)}\} \to \mathcal{P} \times \Pi^m$ designed in the absence of knowledge of $W$, $\{p_1, \dots, p_\ell\}$, and $P_R$. Here the index $n$ corresponds to the number of pixels in each image.

In this work, we focus on the 0–1 loss function to quantify the performance of algorithms.

Definition 9 (Error Probability): The error probability of an algorithm $\Phi^{(n)}$ that outputs $\hat{P} \in \mathcal{P}$ and $(\hat{\pi}_1, \dots, \hat{\pi}_m) \in \Pi^m$ is

$$P_e(\Phi^{(n)}) = \mathbb{P}\left[\{\hat{P} \ne P^*\} \cup \{\exists\, i \in [m] : \hat{\pi}_i \ne \pi_i^{-1}\}\right]. \tag{3}$$

Definition 10 (Asymptotic Consistency): A sequence of decoders $\Phi^{(n)}$ is asymptotically consistent if

$$\lim_{n\to\infty} P_e(\Phi^{(n)}) = 0.$$

In particular, it is exponentially consistent if

$$\lim_{n\to\infty} -\frac{1}{n}\log P_e(\Phi^{(n)}) > 0.$$

Definition 11 (Error Exponent): The error exponent of an image clustering and registration algorithm $\Phi^{(n)}$ is

$$E(\Phi^{(n)}) = \lim_{n\to\infty} -\frac{1}{n}\log P_e(\Phi^{(n)}). \tag{4}$$

For simplicity, we write $\Phi$ for $\Phi^{(n)}$ when it is clear from context.

III. REGISTRATION OF TWO IMAGES

We now restrict the problem to that of registering two images. Specifically, we consider the scenario with $m = 2$ and $\ell = 1$. The problem thus reduces to registering an image $Y$, obtained by transforming the output of an equivalent discrete memoryless channel $W$ whose input is the reference image $X$. The model of the channel is depicted in Fig. 2.

Fig. 2. Model of the two-image registration problem: image Y is to be registered with source image X.

This problem has been well studied in practice, and a popular heuristic used extensively is the maximum mutual information (MMI) method,

$$\hat{\pi}_{\mathrm{MMI}} = \arg\max_{\pi \in \Pi} \hat{I}(X; Y_\pi), \tag{5}$$

where $\hat{I}(X;Y)$ is the mutual information of the empirical joint distribution

$$\hat{P}(x,y) = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\{X_i = x, Y_i = y\},$$

and $\mathbb{1}\{\cdot\}$ is the indicator function. It is worth noting that the MMI method for image registration is universal. Since transformations are chosen uniformly at random, the maximum likelihood (ML) estimate is Bayes optimal:

$$\hat{\pi}_{\mathrm{ML}} = \arg\max_{\pi \in \Pi} \prod_{i=1}^{n} W\big(Y_{\pi(i)} \,\big|\, X_i\big). \tag{6}$$

We first show that the MMI method, and consequently the ML method, are exponentially consistent. We then show that the error exponent of MMI matches that of ML.

A. Exponential Consistency

The empirical mutual information of i.i.d. samples is asymptotically exponentially consistent.

Theorem 1: Let $\{(X_1, Y_1), \dots, (X_n, Y_n)\}$ be random samples drawn i.i.d. according to a joint distribution $p$ defined on the finite space $\mathcal{X} \times \mathcal{Y}$. For fixed alphabet sizes $|\mathcal{X}|, |\mathcal{Y}|$, the ML estimates of entropy and mutual information are asymptotically exponentially consistent and satisfy

$$\mathbb{P}\left[|\hat{H}(X) - H(X)| > \epsilon\right] \le (n+1)^{|\mathcal{X}|}\exp\left(-cn\epsilon^4\right), \tag{7}$$

$$\mathbb{P}\left[|\hat{I}(X;Y) - I(X;Y)| > \epsilon\right] \le 3(n+1)^{|\mathcal{X}||\mathcal{Y}|}\exp\left(-\tilde{c}\,n\epsilon^4\right), \tag{8}$$

where $c = \left(2|\mathcal{X}|^2\log 2\right)^{-1}$ and $\tilde{c} = \left(32\max\{|\mathcal{X}|,|\mathcal{Y}|\}^2\log 2\right)^{-1}$. The proof is given in [5, Lemma 7].

Theorem 2: The MMI and ML image registration estimates are exponentially consistent.

Proof: Let $\Phi_{\mathrm{MMI}}(X,Y) = \hat{\pi}_{\mathrm{MMI}}$ and let the correct registration be $\pi^*$. Then,

$$P_e(\Phi_{\mathrm{MMI}}) = \mathbb{P}[\hat{\pi}_{\mathrm{MMI}} \ne \pi^*] \le \sum_{\pi \in \Pi} \mathbb{P}\left[\hat{I}(X; Y_\pi) > \hat{I}(X; \tilde{Y})\right] \tag{9}$$

$$\le \sum_{\pi \in \Pi} \mathbb{P}\left[\left|\hat{I}(X; Y_\pi) - \big(\hat{I}(X; \tilde{Y}) - I(X; \tilde{Y})\big)\right| > I(X; \tilde{Y})\right]$$

$$\le \sum_{\pi \in \Pi} \mathbb{P}\left[\hat{I}(X; Y_\pi) + \left|\hat{I}(X; \tilde{Y}) - I(X; \tilde{Y})\right| > I(X; \tilde{Y})\right] \tag{10}$$

$$\le 2|\Pi|\exp\left\{-\tilde{c}\,n\big(I(X; \tilde{Y})\big)^4\right\}, \tag{11}$$

where (9), (10), and (11) follow from the union bound, the triangle inequality, and (8), respectively. Thus MMI is exponentially consistent, as $|\Pi| = O(n^\alpha)$. Finally, $P_e(\Phi_{\mathrm{ML}}) \le P_e(\Phi_{\mathrm{MMI}})$, and thus the ML estimate is also exponentially consistent. ∎

Theorem 2 also proves that there exists $\epsilon > 0$ such that $E(\Phi_{\mathrm{ML}}) \ge E(\Phi_{\mathrm{MMI}}) \ge \epsilon$.
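A minimal sketch of the MMI rule (5): estimate the empirical joint distribution of $(X, Y_\pi)$ for each candidate $\pi$ and pick the transformation maximizing the empirical mutual information. The candidate set of cyclic shifts, helper names, and parameter values are our illustrative assumptions, not the paper's.

```python
import numpy as np

def empirical_mi(x, y, r):
    """Mutual information (in bits) of the empirical joint type of (x, y)."""
    joint = np.zeros((r, r))
    np.add.at(joint, (x, y), 1.0)
    joint /= len(x)
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (px @ py)[mask])).sum())

def mmi_register(x, y, candidates, r):
    """MMI rule (5): candidate transformation maximizing I^(X; Y_pi)."""
    return max(candidates, key=lambda s: empirical_mi(x, np.roll(y, s), r))

# Illustrative use with Pi = cyclic shifts: recover an unknown shift of Y.
rng = np.random.default_rng(1)
r, n, true_shift = 4, 2000, 7
x = rng.integers(r, size=n)
y = np.where(rng.random(n) < 0.85, x, rng.integers(r, size=n))  # noisy copy of x
y_obs = np.roll(y, -true_shift)                                  # unknown transformation
print(mmi_register(x, y_obs, range(20), r))                      # expect 7
```

Note that no knowledge of the channel $W$ is used anywhere in the estimator, which is exactly the sense in which the MMI method is universal.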
B. Whittle's Law and Markov Types

We now summarize a few results on the numbers of types and Markov types, which are eventually used to analyze the error exponent of image registration.

Consider a sequence $x \in \mathcal{X}^n$. The empirical distribution $q_X$ of $x$ is referred to as the type of the sequence. Let $X \sim q_X$ be a dummy random variable, and let $T_X^n$ be the set of all sequences of length $n$ of type $q_X$. The number of possible types of sequences of length $n$ is polynomial in $n$, i.e., $O(n^{|\mathcal{X}|})$ [21]. The number of sequences of length $n$ of type $q_X$ is

$$|T_X^n| = \frac{n!}{\prod_{a \in \mathcal{X}} (nq_X(a))!}.$$

From the bounds on multinomial coefficients, the number of sequences of length $n$ and type $q_X$ is bounded as [21]

$$(n+1)^{-|\mathcal{X}|}\, 2^{nH(X)} \le |T_X^n| \le 2^{nH(X)}. \tag{12}$$

Consider a Markov chain defined on the space $[k]$. Given a sequence of $n+1$ samples from $[k]$, we can compute the matrix $F$ of transition counts, where $F_{ij}$ is the number of transitions from state $i$ to state $j$. By Whittle's formula [22], the number of sequences $(a_1, \dots, a_{n+1})$, with $a_i \in [k]$ for $i \in [n+1]$, $a_1 = u$, and $a_{n+1} = v$, is

$$N_{uv}^{(n)}(F) = \left(\prod_{i \in [k]} \frac{\big(\sum_{j \in [k]} F_{ij}\big)!}{\prod_{j \in [k]} F_{ij}!}\right) G^*_{vu}, \tag{13}$$

where $G^*_{vu}$ is the $(v,u)$th cofactor of the matrix $G = \{g_{ij}\}_{i,j \in [k]}$ with

$$g_{ij} = \mathbb{1}\{i = j\} - \frac{F_{ij}}{\sum_{j \in [k]} F_{ij}}.$$

The first-order Markov type of a sequence $x \in \mathcal{X}^n$ is defined as the empirical distribution $q_{X_0,X_1}$ given by

$$q_{X_0,X_1}(a_0,a_1) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{(x_i, x_{i+1}) = (a_0, a_1)\}.$$

Here we assume the sequence is cyclic with period $n$, i.e., $x_{n+i} = x_i$ for any $i > 0$. Let $(X_0, X_1) \sim q_{X_0,X_1}$. Then, from (13), the set $T_{X_0,X_1}^n$ of sequences of type $q_{X_0,X_1}$ satisfies

$$|T_{X_0,X_1}^n| = \left(\sum_{a \in \mathcal{X}} G^*_{a,a}\right) \prod_{a_0 \in \mathcal{X}} \frac{(nq_0(a_0))!}{\prod_{a_1 \in \mathcal{X}} (nq_{X_0,X_1}(a_0,a_1))!},$$

where $q_0$ is the marginal of $q_{X_0,X_1}$. From the definition of $G$, we can bound the trace of the cofactor matrix of $G$ as

$$\frac{|\mathcal{X}|}{(n+1)^{|\mathcal{X}|}} \le \sum_{a \in \mathcal{X}} G^*_{a,a} \le |\mathcal{X}|.$$

Again using the bounds on multinomial coefficients, we have

$$|\mathcal{X}|(n+1)^{-(|\mathcal{X}|^2+|\mathcal{X}|)}\, 2^{n(H(X_0,X_1)-H(X_0))} \le |T_{X_0,X_1}^n| \le |\mathcal{X}|\, 2^{n(H(X_0,X_1)-H(X_0))}. \tag{14}$$

The joint first-order Markov type of a pair of sequences $x \in \mathcal{X}^n$, $y \in \mathcal{Y}^n$ is the empirical distribution

$$q_{X_0,X_1,Y}(a_0,a_1,b) = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\{(x_i, x_{i+1}, y_i) = (a_0, a_1, b)\}.$$

Then, given $x$, the set of conditional first-order Markov type sequences $T_{Y|X_0,X_1}^n(x)$ satisfies [18]

$$(n+1)^{-|\mathcal{X}|^2|\mathcal{Y}|}\, 2^{n(H(X_0,X_1,Y)-H(X_0,X_1))} \le |T_{Y|X_0,X_1}^n(x)| \le 2^{n(H(X_0,X_1,Y)-H(X_0,X_1))}. \tag{15}$$

Lemma 2: Let $\pi_1, \pi_2 \in \Pi$ be any two non-overlapping permutations such that $\pi_1^{-1} \circ \pi_2 \in \Pi$ is a simple permutation, and let $x_{\pi_1}, x_{\pi_2}$ be the corresponding permutations of $x$. Then, for every $x$, there exists $\tilde{x}$ such that

$$|T_{X_{\pi_1},X_{\pi_2}}^n| = |T_{X_0,X_1}^n|, \qquad |T_{Y|X_{\pi_1},X_{\pi_2}}^n(x)| = |T_{Y|X_0,X_1}^n(\tilde{x})|.$$

Proof: Since the permutations are non-overlapping, there exists a bijection from $T_{X_{\pi_1},X_{\pi_2}}^n$ to $T_{X_0,X_1}^n$, where $(X_0,X_1) \sim q_{X_{\pi_1},X_{\pi_2}}$. Specifically, consider the permutation $\pi \in \Pi$ defined iteratively as $\pi(i+1) = \pi_2(\pi_1^{-1}(\pi(i)))$, with $\pi(1) = \pi_1(1)$. Then, for any $x \in T_{X_{\pi_1},X_{\pi_2}}^n$, the sequence $x_\pi \in T_{X_0,X_1}^n$. Further, this map is invertible, and thus the sets are of equal size. The result for conditional types follows from the same argument. ∎

In particular, Lemma 2 implies that $|T_{X_{\pi_1},X_{\pi_2}}^n|$ and $|T_{Y|X_{\pi_1},X_{\pi_2}}^n(x)|$ satisfy (14) and (15), respectively. We now show that the result of Lemma 2 can be extended to any two permutations $\pi_1, \pi_2 \in \Pi$.
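To ground these definitions, here is a short sketch computing the cyclic first-order Markov type of a sequence, i.e., the normalized transition-count matrix $F/n$ underlying Whittle's formula (13). The function name is an illustrative assumption.

```python
import numpy as np

def markov_type(x, k):
    """Cyclic first-order Markov type q_{X0,X1}: empirical distribution of
    (x_i, x_{i+1}) with indices taken modulo n, as in the text."""
    F = np.zeros((k, k))
    np.add.at(F, (x, np.roll(x, -1)), 1.0)  # transition counts, cyclic wrap
    return F / len(x)

# Row sums of the Markov type recover the zeroth-order type q_X, consistent
# with the exponent H(X0, X1) - H(X0) appearing in (14).
x = np.array([0, 1, 1, 2, 0, 1, 2, 2])
q = markov_type(x, 3)
assert np.allclose(q.sum(axis=1), np.bincount(x, minlength=3) / len(x))
```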
Lemma 3: Let $\pi_1, \pi_2 \in \Pi$. For any $x$,

$$|T_{X_{\pi_1},X_{\pi_2}}^n| = 2^{n\left(H(q_{X_{\pi_1},X_{\pi_2}}) - H(q_X) + o(1)\right)}.$$

Proof: Let $\pi = \pi_1^{-1} \circ \pi_2$ and $\kappa = \kappa_\pi$. For $i \in [\kappa]$, let the length of permutation cycle $i$ of $\pi$ be $\alpha_i n$, for $\alpha_i \in (0,1]$; further, $\sum_{i=1}^{\kappa} \alpha_i \le 1$. Let $I$ be the identity block of $\pi$ and let $\gamma = \gamma_\pi$. Then we have the decomposition

$$q_{X_{\pi_1},X_{\pi_2}}(a_0,a_1) = \sum_{i=1}^{\kappa} \alpha_i q_i(a_0,a_1) + \gamma q_I(a_0,a_1), \tag{16}$$

for all $(a_0,a_1) \in \mathcal{X}^2$. Here, $q_i$ is the first-order Markov type defined on the $i$th permutation cycle of $\pi$, and $q_I$ is the zeroth-order Markov type corresponding to the identity block of $\pi$.

From (16), we see that given a valid decomposition of types $\{q_i\}, q_I$, the number of sequences can be computed as a product of the numbers of subsequences of each type. That is,

$$|T_{X_{\pi_1},X_{\pi_2}}^n| = \sum |T_{q_I}^{\gamma n}| \prod_{i=1}^{\kappa} |T_{q_i}^{\alpha_i n}|,$$

where the summation is over all valid decompositions in (16). Additionally, from Lemma 2 we know the number of valid subsequences of each type. Let $q_i'$ be the marginal corresponding to the first-order Markov type $q_i$. Thus, we upper bound the size of the set as

$$|T_{X_{\pi_1},X_{\pi_2}}^n| \le \sum 2^{\gamma n H(q_I)} \prod_{i=1}^{\kappa} |\mathcal{X}|\, 2^{\alpha_i n \left(H(q_i) - H(q_i')\right)} \tag{17}$$

$$\le |\mathcal{X}|^{\kappa} (\gamma n + 1)^{|\mathcal{X}|} \prod_{i=1}^{\kappa} (\alpha_i n + 1)^{|\mathcal{X}|^2}\, 2^{nM(q_{X_{\pi_1},X_{\pi_2}})}, \tag{18}$$

where

$$M(q_{X_{\pi_1},X_{\pi_2}}) = \max\left\{\gamma H(q_I) + \sum_{i=1}^{\kappa} \alpha_i \left(H(q_i) - H(q_i')\right)\right\},$$

the maximum taken over all valid decompositions defined by (16). Here, (17) follows from (12) and (14), and (18) follows from the fact that the total number of possible types is polynomial in the length of the sequence. Since $\kappa = o\!\left(\frac{n}{\log n}\right)$,

$$|T_{X_{\pi_1},X_{\pi_2}}^n| \le 2^{n\left(M(q_{X_{\pi_1},X_{\pi_2}}) + o(1)\right)}.$$

Now, let $q_i''(a_0,a_1) = \frac{1}{|\mathcal{X}|} q_i'(a_1)$ for all $i \in [\kappa]$ and $(a_0,a_1) \in \mathcal{X}^2$, and let $q = q_{X_{\pi_1},X_{\pi_2}}$. Since $\gamma = o(1)$, using Jensen's inequality we have

$$\sum_{i=1}^{\kappa} \alpha_i \left(H(q_i) - H(q_i')\right) = \sum_{i=1}^{\kappa} \alpha_i \left[\log|\mathcal{X}| - D\!\left(q_i \,\|\, q_i''\right)\right] \le \log|\mathcal{X}| - D\!\left(q \,\|\, q''\right) = H(q) - H(q').$$

Thus,

$$|T_{X_{\pi_1},X_{\pi_2}}^n| \le 2^{n\left(H(q) - H(q') + o(1)\right)}.$$

To obtain the lower bound, we note that the total number of sequences is at least the number of sequences obtained from any single valid decomposition. Thus, from (12) and (14),

$$|T_{X_{\pi_1},X_{\pi_2}}^n| \ge 2^{n\left(M(q_{X_{\pi_1},X_{\pi_2}}) + o(1)\right)}.$$

Now, for large $n$, consider $S = \{i \in [\kappa] : \alpha_i n = \Omega(n^\beta)\}$ for some $\beta > 0$. Any other cycle of smaller length contributes $o(1)$ to the exponent, owing to Lemma 1. One viable decomposition of (16) is to take $q_i = q$ for all $i \in [\kappa]$. However, the lengths of the subsequences differ, and $q$ may not be a valid type of the corresponding length. Nevertheless, for each $i \in S$ there exists a type $q_i$ such that

$$d_{\mathrm{TV}}(q_i, q) \le \frac{|\mathcal{X}|}{2\alpha_i n},$$

where $d_{\mathrm{TV}}(\cdot)$ is the total variational distance. Further, entropy is continuous in the distribution [23] and satisfies

$$|H(q_i) - H(q)| \le \frac{|\mathcal{X}|}{\alpha_i n}\log(\alpha_i n) = o(1).$$

This in turn indicates that

$$|T_{X_{\pi_1},X_{\pi_2}}^n| \ge 2^{n\left(H(q) - H(q') + o(1)\right)}.$$

Hence the result follows. ∎

A similar decomposition rule follows for the conditional type as well.

C. Error Analysis

We are interested in the error exponent of MMI-based image registration, in comparison to that of ML. We first note that the error exponent of the problem is characterized by the pair of transformations that is hardest to compare. Define $\Psi_{\pi,\pi'}$ as the binary hypothesis testing problem corresponding to image registration when the allowed transformations are only $\{\pi, \pi'\}$, and let $P_{\pi,\pi'}(\Phi)$ and $E_{\pi,\pi'}(\Phi)$ be the corresponding error probability and error exponent.
Lemma 4: Let $\Phi$ be an asymptotically exponentially consistent estimator. Then,

$$E(\Phi) = \min_{\pi,\pi' \in \Pi} E_{\pi,\pi'}(\Phi). \tag{19}$$

Proof: Let $\hat{\pi}$ be the estimate output by $\Phi$, and let $\pi^* \sim \mathrm{Unif}(\Pi)$ be the correct registration. We first upper bound the error probability as

$$P_e(\Phi) = \mathbb{P}[\hat{\pi} \ne \pi^*] \le \sum_{\pi,\pi' \in \Pi,\, \pi \ne \pi'} \frac{1}{|\Pi|}\, \mathbb{P}\big[\hat{\pi} = \pi \,\big|\, \pi^* = \pi'\big] \le \frac{2}{|\Pi|} \sum_{\pi,\pi' \in \Pi,\, \pi \ne \pi'} P_{\pi,\pi'}(\Phi), \tag{20}$$

where (20) follows from the union bound. Additionally, we have

$$P_e(\Phi) = \frac{1}{|\Pi|}\sum_{\pi \in \Pi}\sum_{\pi' \in \Pi \setminus \{\pi\}} \mathbb{P}\big[\hat{\pi} = \pi' \,\big|\, \pi^* = \pi\big] \ge \frac{1}{|\Pi|}\max_{\pi,\pi' \in \Pi} P_{\pi,\pi'}(\Phi).$$

Finally, since $|\Pi| = O(n^\alpha)$, the result follows. ∎

Thus, it suffices to consider binary hypothesis tests to study the error exponent of image registration.

Theorem 3: Let $\{\pi_1, \pi_2\} \subseteq \Pi$. Then,

$$\lim_{n\to\infty} \frac{P_{\pi_1,\pi_2}(\Phi_{\mathrm{MMI}})}{P_{\pi_1,\pi_2}(\Phi_{\mathrm{ML}})} = 1. \tag{21}$$

Proof: The probability of a pair of i.i.d. sequences is determined by its joint type. Thus, we have

$$P_{\pi_1,\pi_2}(\Phi_{\mathrm{MMI}}) = \mathbb{P}[\hat{\pi}_{\mathrm{MMI}} \ne \pi^*] = \sum_{x \in \mathcal{X}^n}\sum_{y \in \mathcal{Y}^n} \mathbb{P}[x,y]\,\mathbb{1}\{\hat{\pi}_{\mathrm{MMI}} \ne \pi^*\} = \sum_{q}\left(\prod_{a \in \mathcal{X}}\prod_{b \in \mathcal{Y}} (\mathbb{P}[a,b])^{nq(a,b)}\right)\nu_{\mathrm{MMI}}(q), \tag{22}$$

where the summation in (22) is over the set of all joint types of sequences of length $n$, and $\nu_{\mathrm{MMI}}(q)$ is the number of sequence pairs $(x,y)$ of length $n$ with joint type $q$ for which the MMI algorithm makes a decision error.

If a sequence $y \in T_{Y|X_{\pi_1},X_{\pi_2}}^n(x)$ is in error, then all sequences in $T_{Y|X_{\pi_1},X_{\pi_2}}^n(x)$ are in error, as $x$ is i.i.d. Now, we can decompose the number of sequences under error as follows:

$$\nu(q) = \sum_{x \in \mathcal{X}^n} \sum_{T_{Y|X_{\pi_1},X_{\pi_2}}^n \subseteq T_{Y|X}^n :\, \mathrm{error}} |T_{Y|X_{\pi_1},X_{\pi_2}}^n(x)| = \sum_{T_{X_{\pi_1},X_{\pi_2}}^n \subseteq T_X^n} |T_{X_{\pi_1},X_{\pi_2}}| \sum |T_{Y|X_{\pi_1},X_{\pi_2}}^n|,$$

where the inner sum is taken over $T_{Y|X_{\pi_1},X_{\pi_2}}^n \subseteq T_{Y|X}^n$ such that there is a decision error. The final equality follows from the fact that, given the joint first-order Markov type, the size of the conditional type class is independent of the exact sequence $x$. Finally, we have

$$P_{\pi_1,\pi_2}(\Phi_{\mathrm{MMI}}) = \sum_{q}\left(\prod_{a \in \mathcal{X}}\prod_{b \in \mathcal{Y}} (\mathbb{P}[a,b])^{nq(a,b)}\right)\nu_{\mathrm{ML}}(q)\left\{\frac{\nu_{\mathrm{MMI}}(q)}{\nu_{\mathrm{ML}}(q)}\right\} \le P_{\pi_1,\pi_2}(\Phi_{\mathrm{ML}})\max_{q}\left\{\frac{\nu_{\mathrm{MMI}}(q)}{\nu_{\mathrm{ML}}(q)}\right\}. \tag{23}$$

The result follows from Lemma 5. ∎

Lemma 5:

$$\lim_{n\to\infty} \max_{q}\left\{\frac{\nu_{\mathrm{MMI}}(q)}{\nu_{\mathrm{ML}}(q)}\right\} = 1. \tag{24}$$

Proof: We first observe that for images with i.i.d. pixels, image registration using MMI is the same as minimizing the joint entropy. That is,

$$\max_{\pi \in \Pi} \hat{I}(X; \pi(Y)) = \max_{\pi \in \Pi}\left[\hat{H}(X) + \hat{H}(\pi(Y)) - \hat{H}(X, \pi(Y))\right] = \hat{H}(X) + \hat{H}(\tilde{Y}) - \min_{\pi \in \Pi} \hat{H}(X, \pi(Y)).$$

Additionally, from Lemmas 2 and 3 we know that there exists a bijective mapping from the sequences corresponding to the permutations to the sequences of the corresponding first-order Markov type. Thus, the result follows from [18, Lemma 1]. ∎

Theorem 4:

$$E(\Phi_{\mathrm{MMI}}) = E(\Phi_{\mathrm{ML}}). \tag{25}$$

Proof: This follows as a direct consequence of Theorem 3 and Lemma 4. ∎

Thus, we see that using MMI for image registration is not only universal but also asymptotically optimal. We next study the problem of image registration for multiple images.

IV. MULTI-IMAGE REGISTRATION

Having studied the problem of registering two images in a universal sense, we now consider the problem of aligning multiple copies of the same image. For simplicity, let us consider the task of aligning three images; the results obtained for this case extend directly to any finite number of copies of images to be aligned.
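The identity used in the proof of Lemma 5 is easy to check numerically: since $\hat{H}(X)$ and $\hat{H}(\pi(Y))$ do not depend on the permutation (a permutation does not change the empirical type), maximizing $\hat{I}(X;\pi(Y))$ coincides with minimizing the empirical joint entropy $\hat{H}(X,\pi(Y))$. A minimal sketch, with illustrative names and shift-based transformations:

```python
import numpy as np

def empirical_joint_entropy(x, y, r):
    """H^(X, Y) of the empirical joint type, in bits."""
    joint = np.zeros((r, r))
    np.add.at(joint, (x, y), 1.0)
    joint /= len(x)
    p = joint[joint > 0]
    return float(-(p * np.log2(p)).sum())

# Minimizing H^(X, pi(Y)) selects the same alignment as the MMI rule (5),
# because H^(X) and H^(pi(Y)) are invariant to the shift pi.
rng = np.random.default_rng(2)
r, n = 4, 1000
x = rng.integers(r, size=n)
y = np.where(rng.random(n) < 0.8, x, rng.integers(r, size=n))  # aligned noisy copy
best = min(range(10), key=lambda s: empirical_joint_entropy(x, np.roll(y, s), r))
print(best)  # expect 0: the un-shifted alignment has the lowest joint entropy
```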
Again, let $X$ be the source image and $Y, Z$ the noisy, transformed versions to be aligned, as shown in Fig. 3.

Fig. 3. Model of the three-image registration problem: images Y, Z are to be registered with source image X.

Here, the ML estimates are

$$(\hat{\pi}_{Y,\mathrm{ML}}, \hat{\pi}_{Z,\mathrm{ML}}) = \arg\max_{(\pi_1,\pi_2) \in \Pi^2} \prod_{i=1}^{n} W\big(Y_{\pi_1(i)}, Z_{\pi_2(i)} \,\big|\, X_i\big). \tag{26}$$

A. MMI is not optimal

We know that MMI is asymptotically optimal for aligning two images. The natural question is whether performing MMI in a pairwise sense is optimal for multi-image registration, that is,

$$\hat{\pi}_Y = \arg\max_{\pi \in \Pi} \hat{I}(X; \pi(Y)), \qquad \hat{\pi}_Z = \arg\max_{\pi \in \Pi} \hat{I}(X; \pi(Z)). \tag{27}$$

We first address this question, showing explicitly that the pairwise MMI method is sub-optimal even though the individual transformations are chosen independently and uniformly from $\Pi$.

Theorem 5: There exist a channel $W$ and a source distribution such that the pairwise MMI method for multi-image registration is sub-optimal.

Proof: Let $X_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Bern}(1/2)$, $i \in [n]$. Consider the physically degraded images $Y, Z$ obtained as the output of the channel defined by

$$W(y, z \mid x) = W_1(y \mid x)\, W_2(z \mid y).$$

This scenario is depicted in Fig. 4. Naturally, the ML estimate of the transformations is obtained by registering image $Z$ to $Y$ and subsequently registering $Y$ to $X$, instead of registering each of the images in a pairwise sense to $X$. That is, from (26),

$$(\hat{\pi}_{Y,\mathrm{ML}}, \hat{\pi}_{Z,\mathrm{ML}}) = \arg\max_{(\pi_1,\pi_2) \in \Pi^2} \prod_{i=1}^{n} W_1\big(Y_{\pi_1(i)} \,\big|\, X_i\big)\, W_2\big(Z_{\pi_2(i)} \,\big|\, Y_{\pi_1(i)}\big).$$

It is evident that $\pi_Z$ is estimated based on the estimate of $\pi_Y$.

Let $E_W(\Phi_{\mathrm{ML}})$ be the error exponent of $\Phi_{\mathrm{ML}}$ for the physically degraded channel. Since the ML estimate here involves registration of $Y$ to $X$ and $Z$ to $Y$, the effective error exponent is given by

$$E_W(\Phi_{\mathrm{ML}}) = \min\{E_{W_1}(\Phi_{\mathrm{ML}}), E_{W_2}(\Phi_{\mathrm{ML}})\}.$$

Let $E_Q(\Phi_{\mathrm{MMI}})$ be the error exponent of image registration using MMI for the channel $Q$. Then, the error exponent of pairwise MMI-based image registration is given by

$$E(\Phi_{\mathrm{MMI}}) = \min\{E_{W_1}(\Phi_{\mathrm{MMI}}), E_{W_1 * W_2}(\Phi_{\mathrm{MMI}})\}.$$

We know that for registration of two images MMI is asymptotically optimal, and thus

$$E_{W_1}(\Phi_{\mathrm{MMI}}) = E_{W_1}(\Phi_{\mathrm{ML}}), \qquad E_{W_1 * W_2}(\Phi_{\mathrm{MMI}}) = E_{W_1 * W_2}(\Phi_{\mathrm{ML}}).$$

More specifically, let $W_1 = \mathrm{BSC}(\alpha)$ and $W_2 = \mathrm{BSC}(\beta)$ for some $\alpha, \beta \in (0, 1/2)$. Then $W_1 * W_2 = \mathrm{BSC}(\gamma)$, where $\gamma = \alpha(1-\beta) + (1-\alpha)\beta > \max\{\alpha, \beta\}$. Then,

$$E(\Phi_{\mathrm{MMI}}) \le E_{W_1 * W_2}(\Phi_{\mathrm{MMI}}) = E_{W_1 * W_2}(\Phi_{\mathrm{ML}}) < \min\{E_{W_1}(\Phi_{\mathrm{ML}}), E_{W_2}(\Phi_{\mathrm{ML}})\} = E_W(\Phi_{\mathrm{ML}}). \tag{28}$$

Fig. 4. Three-image registration problem with image Z a physically degraded version of Y, to be registered with source image X.
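A quick numerical illustration of the cascade step in the proof: the composition of two binary symmetric channels is a BSC whose crossover probability $\gamma = \alpha(1-\beta) + (1-\alpha)\beta$ always exceeds $\max\{\alpha, \beta\}$ for $\alpha, \beta \in (0, 1/2)$. The values below are arbitrary.

```python
def bsc_cascade(alpha: float, beta: float) -> float:
    """Crossover probability of BSC(alpha) followed by BSC(beta):
    a bit is flipped overall iff exactly one of the two stages flips it."""
    return alpha * (1 - beta) + (1 - alpha) * beta

alpha, beta = 0.1, 0.2
gamma = bsc_cascade(alpha, beta)
print(gamma)                      # 0.26
assert gamma > max(alpha, beta)   # the cascade is strictly noisier
```

This is the crux of the counterexample: the end-to-end channel from $X$ to $Z$ is strictly noisier than either stage, so pairwise registration of $Z$ against $X$ incurs a worse exponent than the chained registration used by ML.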
