Introduction to the Mathematics of Computed Tomography Adel Faridani ∗† Abstract Computed tomography (CT) entails the reconstruction of a function f from line integrals of f. This mathematical problem is encountered in a growing number of diverse settings in medicine, science, and technology. This introductory arti- cle is divided into three parts. The first part is concerned with general theory and explores questions of uniqueness, stability and inversion, as well as detection of singularities. The second part is devoted to local tomography and is centered around a discussion of recently developed methods for computing jumps of a func- tion from local tomographic data. The third part treats optimal sampling and has at its core a detailed error analysis of the parallel-beam filtered backprojection algorithm. Matlab source code for the filtered backprojection algorithm and the Feldkamp-Davis-Kress algorithm is included in an appendix. 1 Introduction Computed tomography (CT) entails the reconstruction of a function f from line integrals of f. This mathematical problem is encountered in a growing number of diverse settings in medicine, science, and technology, ranging from the famous application in diagnostic radiologytoresearchinquantumoptics. Asaconsequence,manyaspectsofCThavebeen extensively studied and are now well understood, thus providing an interesting model case for the study of other inverse problems. Other aspects, notably three-dimensional reconstructions, still provide numerous open problems. The purpose of this article is to give an introduction to the topic, treat some aspects in more detail, and to point out references for further study. The reader interested in a broader overview of the field, its relation to various branches of pure and applied mathematics, and its development over the years may wish to consult the monographs ∗Dept. of Mathematics, Oregon State University, Corvallis, OR 97331. Email: [email protected]; Homepage: http://oregonstate.edu/˜faridana †This work was supported by NSF grant DMS-9803352 and NIH grant R01 RR 11800-4. 1 [6, 31, 32, 36, 62, 67, 78], the volumes [21, 22, 28, 33, 34, 76, 77], and review articles [42, 49, 56, 58, 66, 84, 89]. In practice only integrals over finitely many lines can be measured, and the distribu- tion of these lines is sometimes restricted. The following presentation is centered around the question: What features of f can be stably recovered from a given collection of line integrals of f? For example, we may ask what resolution can be achieved with the avail- able data. If a full reconstruction of f is not possible, we may try to detect the location of boundaries (jump discontinuities of f), or also the sizes of the jumps. The exposition is divided into three parts. The first part is concerned with general theory. Its main themes are questions of uniqueness, stability and inversion for the x- ray transform, as well as detection of singularities. The second part is devoted to local tomography. The exposition is similar to [17] and is centered around a discussion of recently developed methods for computing jumps of a function from local tomographic data. The third part treats optimal sampling and has at its core a detailed error anal- ysis of the parallel-beam filtered backprojection algorithm. The article conludes with three appendices containing basic results on wavelets, Matlab source code for the filtered backprojection algorithm and the Feldkamp-Davis-Kress algorithm, and some exercises. 2 The x-ray and Radon transforms Webeginbyintroducingsomenotationandbackgroundmaterial. IRn consistsofn-tuples of real numbers, usually designated by single letters, x = (x ,...,x ), y = (y ,...,y ), 1 n 1 n etc. The inner product and absolute value are defined by < x,y >= nx y and x = 1 i i | | √< x,x >. The unit sphere Sn−1 consists of the points with absolute value 1. C∞(IRn) P 0 denotes the set of infinitely differentiable functions on IRn with compact support. A continuous linear functional on C∞ is called a distribution. If X is a set, X◦ denotes 0 its interior, X its closure, and Xc its complement. χ and χ denote the characteristic X n functions(indicatorfunctions)ofX,andoftheunitballinIRn,respectively. I.e.,χ (x) = X 1 if x X, and χ (x) = 0 if x X. X denotes the n-dimensional Lebesgue measure X ∈ 6∈ | | of X IRn. However, when it is clear that X should be treated as a set of dimension ⊂ m < n, X is the m-dimensional area measure. Thus | | Sk−1 = 2πk/2/Γ(k/2) | | is the (k 1)-dimensional area of the (k 1)-dimensional sphere. − − The convolution of two functions is given by f g(x) = f(x y)g(y)dy. ∗ ZIRn − The Fourier transform is defined by fˆ(ξ) = (2π)−n/2 f(x)e−i<x,ξ>dx ZIRn 2 for integrable functions f, and is extended to larger classes of functions or distributions by continuity or duality. For square-integrable functions f,g we have f g(x) = fˆ(ξ)gˆ(ξ)ei<x,ξ>dξ. (1) ∗ ZIRn The integral transforms most relevant for tomography are the x-ray transform and the Radon transform. Definition 2.1 Let θ Sn−1 and Θ⊥ the hyperplane through the origin orthogonal to θ. ∈ We parametrize a line l(θ,y) in IRn by specifying its direction θ Sn−1 and the point y ∈ where the line intersects the hyperplane Θ⊥. The x-ray transform of a function f L (IRn) is given by 1 ∈ Pf(θ,y) = P f(y) = f(y +tθ)dt, y Θ⊥. (2) θ ∈ ZIR The Radon transform of f is defined by Rf(θ,s) = R f(s) = f(x+sθ)dx, s IR. (3) θ ZΘ⊥ ∈ We see that Pf(θ,x) is the integral of f over the line l(θ,y) parallel to θ which passes through y Θ⊥, and that Rf(θ,s) is the integral of f over the hyperplane orthogonal to ∈ θ with signed distance s from the origin. In the following we will be mostly concerned with the x-ray transform. In two dimensions the two transforms coincide apart from the parameterization: We parametrize θ S1 by its polar angle ϕ and define a vector θ⊥ ∈ orthogonal to θ such that θ = (cosϕ, sinϕ), θ⊥ = ( sinϕ, cosϕ). (4) − Then the points in the subspace Θ⊥ are given by Θ⊥ = sθ⊥, s IR and we have the { ∈ } relation Pf(θ,sθ⊥) = Rf(θ⊥,s). Also, when working in two dimensions, we will often use the simplified notation Pf(θ,s) or P f(s) instead of Pf(θ,sθ⊥). Occasionally we will θ also replace θ by the polar angle ϕ according to (4) and write Pf(ϕ,s) . Let us consider two examples. Let G be the Gaussian function G(x) = e−<x,x>/2. Then PG(θ,y) = e−<y,y>/2 e−<tθ,tθ>/2dt = (2π)1/2e−<y,y>/2, y Θ⊥. (5) ∈ ZIR For χ , the characteristic function of the unit ball in IRn, we can use a geometrical n argument. We obtain Pχ (θ,y) = 0 for y > 1 since then the line l(θ,y) does not n | | intersect the unit ball. For y 1 observe that the intersection of the line l(θ,y) with | | ≤ the unit ball in IRn is a line segment of length 2 1 y 2 and that Pχ (θ,y) is equal to n −| | this length. q The following relation between the Fourier transforms of P f and f will prove to be θ useful: 3 Theorem 2.2 Under the hypotheses of Definition 2.1, (P f)∧(η) = (2π)1/2fˆ(η), η Θ⊥ (6) θ ∈ (R f)∧(σ) = (2π)(n−1)/2fˆ(σθ), σ IR (7) θ ∈ Proof: This is a straightforward computation. We demonstrateit for the x-ray transform. Let η Θ⊥. Then ∈ (Pf)∧(θ,η) = (2π)(1−n)/2 Pf(θ,x)e−i<x,η> dx ZΘ⊥ = (2π)(1−n)/2 f(x+sθ) ds e−i<x,η>dx ZΘ⊥ZIR = (2π)(1−n)/2 f(y)e−i<y,η> dy ZIRn = √2π fˆ(η), η Θ⊥.2 ∈ As we will see below, Theorem 2.2 can be used to explore questions of uniqueness, non- uniqueness, stability, and inversion. Current medical scanners employ an x-ray source which moves around the patient. To describe this type of data collection, the parameterization of lines by θ Sn−1 and ∈ y Θ⊥ is less convenient. It is more suitable to introduce the divergent beam x-ray ∈ transform ∞ Df(a,θ) = D f(θ) = f(a+tθ)dt, θ Sn−1, (8) a ∈ Z0 which gives the integral of f over the ray with direction θ emanating from the source point a. The x-ray and Radon transforms are special cases of the general k-plane transform, which maps a function into its integrals over k-dimensional affine subspaces; see, e.g., [42]. 3 Uniqueness and nonuniqueness Theorem 3.1 ([89, 42]) Let f L (IRn) have compact support, and let Pf(θ, ) 0 for 2 ∈ · ≡ infinitely many θ. Then f 0. ≡ 4 ˆ ˆ Proof: The Fourier transform f is analytic and f(η) = P f(η) = 0 on the hyper- θ planes < η,θ >= 0. Since no nontrivial entire function can vanish on an infinite set of hyperplanes through the origin, we must have fˆ 0. 2 d ≡ As an application, consider the so-called limited angle problem. Let Pf(θ, ) be given · for infinitely many θ concentrated in a cone C. Then f is uniquely determined, even if C is very small. However, if C = Sn−1, the reconstruction is not stable. Indeed, the proof 6 of the above theorem shows that reconstructing f is equivalent to analytic continuation ˆ of f, and analytic continuation is known to be extremely unstable. The uniqueness result requires an infinite number of directions, while in practice only a finite number can be measured. It was already recognized by the pioneers of CT that this entails the loss of uniqueness; see the example given in [3]. The next theorem shows that the nonuniqueness is quite extensive, i.e., given Pf(θ , ) for finitely many directions j · θ , there are null functions which can be prescribed arbitrarily on a large portion of their j domain. Theorem 3.2 ([89]) Let θ ,...,θ Sn−1, K IRn compact, and f C∞(K). Let 1 p ∈ ⊂ ∈ 0 K U K with U open and K compact. Then there is f C∞(K), f = f on K , 0 ⊂ ⊂ 0 0 ∈ 0 0 0 and Pf (θ , ) 0, k = 1,...,p. 0 k · ≡ While this result makes it seem difficult to obtain reliable reconstructions in practice, it is not the end of the story. It turns out that the null functions for the x-ray transform are high-frequency functions [51, 52, 53, 60] , and that it is possible to suppress such functions in practical reconstructions. Theorem 3.3 [51, 52, 53] Let f L (IR2) with support contained in the unit disk. If 0 2 ∈ Pf (θ , ) 0 for k = 1,...,p, then 0 k · ≡ fˆ(σθ) = imσ−1J (σ)q (θ), 0 m+1 m m>p X where σ IR, θ S1, J the order m+1 Bessel function of the first kind and q a m+1 m ∈ ∈ polynomial of degree m. SinceJ (t)isverysmallforl > t, itfollowsthatifP f vanishesforpdistinctdirections l θ θ ,thenfˆ(ξ)isalmostentirelyconcentratedintheset ξ IRn : ξ > p [60]. Thismeans j { ∈ | | } ˆ that measuring P f determines f(ξ) reliably for ξ < p. However, the reconstruction θj | | problem may still be severely unstable, e.g., when the directions are concentrated in a narrow range. In cases where sufficient stability is present, a low-pass filtered version of f may be recovered. A loose application of Shannon’s sampling theorem yields that the reconstruction will resolve details of size 2π/p or greater. Remark 3.4 It follows that the influence of nonuniqueness may be avoided in practice under the following conditions: 5 ˆ a) A-priori information that f(ξ) is small for ξ > b is available. | | | | b) Data P f for p > b directions θ are measured. θj j ˆ c) The reconstruction method used produces a function f with f (ξ) small for R R | | ξ > b. | | Nonuniqueness theorems for the divergent beam x-ray transform have been proved in [48, 93]. A generalization to the general k-plane transform has been given in [42]. 4 Inversion and Ill-posedness Calder´on’s operator Λ is defined in terms of Fourier transforms by (Λϕ)∧(ξ) = ξ ϕˆ(ξ), ϕ C∞(IRn). | | ∈ 0 It is extended by duality to the class of functions f for which (1+ x )−1−nf is integrable | | [14]. Note that Λ2 = ∆, ∆ = Laplacian. (9) − For n 2, Λ−1, the inverse of Λ, is given by convolution with the Riesz kernel R : 1 ≥ Λ−1f = R f, R (x) = (π Sn−2 )−1 x 1−n. (10) 1 1 ∗ | | | | In dimension n = 1 we have Λf = ∂f, where ∂f denotes the derivative of f and H H denotes the Hilbert transform 1 f(t) f(s) = dt (11) H π s t ZIR − where the integral is understood as a principal value. We can formally derive an inversion formula for Pf by combining Theorem 2.2 and theinverseFouriertransform. Forsimplicitywefirstconsiderdimensionn = 2. Usingthe Fourierinversionformula,Theorem2.2,therelation(4)andchangingtopolarcoordinates we obtain f(x) = (2π)−1 fˆ(ξ)ei<x,ξ>dξ ZIR2 2π ∞ = (2π)−1 σfˆ(σθ⊥)ei<x,σθ⊥>dσdϕ Z0 Z0 2π ∞ = (4π)−1 σ fˆ(σθ⊥)ei<x,σθ⊥>dσdϕ | | Z0 Z−∞ 1 2π ∞ = (2π)−3/2 σ P f(σ)eiσ<x,θ⊥>dσdϕ (12) θ 2 | | Z0 Z−∞ d 6 1 2π = (2π)−3/2 (ΛP f)∧(σ)eiσ<x,θ⊥>dσdϕ θ 2 Z0 ZIR 2π = (4π)−1 ΛP f(< x,θ⊥ >)dϕ (13) θ Z0 1 2π ∂P f(s) θ = dsdϕ (14) 4π2 < x,θ⊥ > s Z0 ZIR − In the last step we made use of the relation Λg = ∂g mentioned above. H For general dimension n one uses the change of variables g(ξ)dξ = Sn−2 −1 η g(η)dηdθ (15) ZIRn | | ZSn−1ZΘ⊥| | ([89, Formula (9.2’)] ), and obtains −1 f(x) = 2π Sn−2 ΛP f(E x)dθ (16) θ Θ⊥ (cid:16) | |(cid:17) ZSn−1 where E x denotes the orthogonal projection of x onto the subspace Θ⊥. Θ⊥ If we use the backprojection operator P♯ defined by P♯g(x) = g(θ,E x)dθ, (17) Θ⊥ ZSn−1 then (16) assumes the compact form −1 f(x) = 2π Sn−2 P♯ΛPf(x). (18) | | (cid:16) (cid:17) An inversion formula for the Radon transform can be derived in a similar way. For other inversion formulas see [62, II.2]. § From equation (14) we see that computation of f(x) requires integrals over lines far from x, because the Hilbert transform kernel has unbounded support. Note that P f(< x,θ⊥ >) is the integral over the line with direction θ which passes through x. θ Hence the inversion formula is not “local”. A local inversion formula would utilize only values P f(s) with s close to < x,θ⊥ >. We will discuss what can be done with local θ formulas in a later section. Equation (12) gives us valuable information about the stability of the inversion. The factor σ in the inverse Fourier integral will become arbitrarily large. This means that | | the inversion is unstable. In practice measurement and discretization errors will prevent accurate computation of P f(σ) for large σ , and these errors are then amplified by θ | | multiplication with σ . In other words, due to the integration in P, Pf is smoother than | | d f itself. The inversion has to reverse this smoothing and this makes it unstable. The 7 extent of this instability will depend on the amount of smoothing inherent in P. This can be quantified using Sobolev norms. For functions f with compact support we define 1/2 f = (1+ ξ 2)α fˆ(ξ) 2dξ Hα k k 0 (cid:18)ZIRn | | | | (cid:19) 1/2 Pf = dθ dη (1+ η 2)α P f(η) 2 . α θ k k (cid:18)ZSn−1 ZΘ⊥ | | | | (cid:19) d Then we have Theorem 4.1 [62, p. 42] If f C∞ is supported in the unit ball, then there are constants ∈ 0 c(α,n),C(α,n) such that c(α,n) f Pf C(α,n) f . k kH0α ≤ k kα+21 ≤ k kH0α Hence the operator P smoothes by an order 1/2 measured in a Sobolev scale. In order to see what the instability might mean in practice we assume that we have measured data gǫ such that Pf gǫ ǫ, and a-priori information about f of the form f ρ. k − kL2≤ k kH0β≤ For β > 0 this excludes highly oscillatory functions, so this condition corresponds to condition a) in Remark 3.4. Let f , f be two candidate functions for reconstruction, i.e., 1 2 f ,f both satisfy the a-priori condition and Pf gǫ ǫ. We are interested to know 1 2 k i− kL2≤ by how much f and f can differ. Since P(f f ) 2ǫ and f f 2ρ, we 1 2 k 1− 2 kL2≤ k 1− 2 kH0β≤ have the worst case error f f d(ǫ,ρ,β), with 1 2 L2 k − k ≤ d(ǫ,ρ,β) = sup f : Pf 2ǫ, f 2ρ . k kL2 k kL2≤ k kH0β≤ n o A natural choice for β is such that functions which are smooth except for jump discontinuities along smooth boundaries are in Hβ. This leads to the condition β < 1/2 0 [62, p. 92]. For the limiting case β = 1/2 the worst case error satisfies ([62, p.94]) 1 d(ǫ,ρ, ) c(n)√ǫρ. 2 ≤ This means that the reconstruction problem is moderately ill-posed. We expect a gain of 2k digits in data accuracy to yield k additional accurate digits in the reconstruction. In other words, the instability in the reconstruction causes a loss of half the number of accurate digits. Another approach to quantify the degree of ill-posedness is provided by the singular value decomposition of P [60]. Here one looks at how fast the singular values converge to zero. Again, the assessment of moderate ill-posedness is confirmed. 8 In order to use the inversion formula in practice we have to stabilize it. This involves a well-known trade-off between stability and accuracy of the reconstruction. Here we give up the goal of recovering the function f itself, and aim instead at reconstructing an approximation e f, where e is an approximate delta function. As the computation ∗ belowshows, stabilization requirestheFouriertransformeˆ(ξ)todecaysufficientlyfastfor large ξ . The price to pay for the stabilization is limited resolution, so e must be chosen | | carefully, depending on the amount and accuracy of the available measurements. Note also that a proper choice of e helps to satisfy the condition c) for avoiding the influence of nonuniqueness given in Remark 3.4. As we will see later, it is sometimes advantageous to reconstruct Λmf instead of f, with m > 1 an integer. The case m = 0 of course yields an approximation to the − function f itself. Using the convolution theorem (1) for the Fourier transform we obtain in a similar way as above e Λmf(x) ∗ = eˆ(ξ) ξ mfˆ(ξ)ei<x,ξ>dξ ZIRn | | = Sn−2 −1 η m+1eˆ(η)fˆ(η)ei<x,η>dηdθ | | ZSn−1ZΘ⊥| | = (2π)−1 Sn−2 −1 η m+1(Pθe)∧(η)(Pθf)∧(η)ei<EΘ⊥x,η>dηdθ | | ZSn−1ZΘ⊥| | = (k P f)(E x)dθ, m 1, (19) θ Θ⊥ ZSn−1 ∗ ≥ − with the convolution kernel k(y) = (2π Sn−2 )−1Λm+1P e(y), y Θ⊥. (20) θ | | ∈ If e is a radial function, then P e and the convolution kernel k are independent of θ. θ A corresponding formula for the Radon transform can be derived by using polar coordinates in IRn instead of (15). For rigorous proofs and general conditions on e and f for which (19) is valid see [48], [90] and [59]. Of greatest interest are the case m = 0, which gives the formulas for reconstructing the function f itself, and the cases m = 1. ± Letting e δ yields the exact inversion formula → Λmf(x) = (2π Sn−2 )−1 Λm+1P f(E x)dθ. (21) θ Θ⊥ | | ZSn−1 A desirable property would be the possibility of local reconstruction, i.e., reconstruc- tion at a point should require only lines passing through a small neighborhood of that 9 point. Since the parameters θ and y Θ⊥ of a line passing through a point x must ∈ satisfy the equation E x = y, reconstruction according to (19) will be local if the Θ⊥ kernel k is supported in a small neighborhood of the origin. However, for m even and ˆ e(x)dx = 0, k is not analytic, so k cannot have compact support. Hence ordinary IRn 6 tomography is global, not local. On the other hand, it follows from (20) and (9) that R k has compact support if m 1 is odd and e has compact support. This explains ≥ − the interest in the cases m = 1. Computing Λ−1f(x) consists of taking the average ± of all integrals over lines passing through x. This was done in early imaging techniques preceding CT. However, the result is a very blurry image of f which by itself is of limited usefulness; see the bottom left picture in Fig. 1. Current local tomography, reviewed below, avoids this disadvantage by computing a linear combination of Λf and Λ−1f. If f is supported in the unit ball, and the source points a lie on a sphere A with center in the origin and radius R > 1, then the approximate inversion formula for the divergent beam x-ray transform reads [90] e Λmf(x) = R−1 D f(θ) < a,θ > k(E (x a)) dθda, (22) a Θ⊥ ∗ ZAZSn−1 | | − with m 1 and k as in (20). ≥ − Weconcludethissectionwithafewremarksonreconstructionalgorithms. Thefiltered backprojection algorithm is the most popular reconstruction method. It is a computer implementation of the approximate reconstruction formulas (19) and (22) for parallel- beam and fan-beam sampling, respectively. We will discuss it in detail in a later section. For references on the filtered backprojection algorithm see, e.g., [49]. The Fourier reconstruction algorithm uses the Fast Fourier transform to compute P f(η) = √2πfˆ(η), η Θ⊥, j = 0,...,P 1. θj ∈ j − d ˆ In 2D this gives values of f on a polar grid. These are now interpolated onto a rectan- gular grid and a 2D inverse FFT is used to obtain f. This is much faster than filtered backprojection, but the interpolation is problematic, i.e., prone to cause artifacts in the reconstructed image. For further discussion and references on methods to overcome these drawbacks see [66]. Algebraic methods do not discretize an inversion formula or use the projection slice theorem, but start from an ansatz f(x) = N c ψ (x) and then solve the linear system i=1 i i P N c P ψ (y ) = g , j,k = 1,2,... i θj i k jk i=1 X for the unknown coefficients c . Here g = P f(y ) are the measured data. Often the i jk θj k basis functions are the characteristic functions of pixels or voxels, but this is of course not the only choice. Indeed, the advantage of such methods lies in their flexibility, e.g., in incorporating irregular sampling geometries or available a-priori information on f. The 10