A Unified Statistical Approach to Modeling the Shape of Human Hyoid Bones in CT Images

Moo K. Chung^{1,2,3}, Nagesh Adluru^{3}, Houri K. Vorperian^{2}

^1 Department of Biostatistics and Medical Informatics
^2 Vocal Tract Laboratory, Waisman Center
^3 Waisman Laboratory for Brain Imaging and Behavior
University of Wisconsin-Madison, USA
[email protected]

January 15, 2016

Abstract

We present a unified statistical approach to modeling 3D anatomical objects extracted from medical images. Due to image acquisition and preprocessing noise, the imaging data are expected to be noisy. Using the Laplace-Beltrami (LB) eigenfunctions, we smooth out the noisy data and perform statistical analysis. The method is applied to characterizing the 3D growth pattern of the human hyoid bone between ages 0 and 20 obtained from computed tomography (CT). We detected a significant age effect on localized parts of the hyoid bone.

1 Introduction

For normally developing children, age and gender could be major factors that affect the functions and structures of the growing hyoid bone. As in other developmental studies [13, 32], we expect highly localized, complex growth patterns to emerge between ages 0 and 20 in the hyoid bone. The growth is expected to be outward with respect to the surface of the bone. However, it is unclear which specific parts of the hyoid bone are growing. This provides a biological motivation to develop a local surface-based morphometric technique beyond simple volumetric techniques, which cannot detect localized subtle anatomical changes along the hyoid bone surface.

The end results of existing surface-based morphometric studies in medical imaging are statistical parametric maps (SPM) that show the statistical significance of growth at each surface mesh vertex [13, 27, 39]. In order to obtain stable and robust SPM, various signal smoothing and filtering methods have been proposed. Among them, diffusion equations, kernel smoothing, and wavelet-based approaches are probably the most popular.

Diffusion equations have been widely used in image processing as a form of noise reduction, starting with Perona and Malik in the 1990s [25]. Although numerous techniques have been developed for performing diffusion along surfaces [12, 2, 29, 28, 22, 30], most approaches are nonparametric and require finite element or finite difference schemes, which are known to suffer from various numerical instabilities [11].

Kernel smoothing based models have also been proposed for surface and manifold data [6, 11]. Kernel methods smooth data as a weighted average of neighboring mesh vertices, mostly using a Gaussian kernel, and their iterative application approximates the diffusion process.

Recently, wavelets have been popularized for surface and graph data. Spherical wavelets have been used on brain surface data that has been mapped onto a sphere [23, 7]. Since wavelet basis functions have local supports in both space and scale, the wavelet coefficients from the scale-space decomposition using spherical wavelets provide shape features that describe local shape variation at a variety of scales and spatial locations. However, spherical wavelets have an intrinsic problem: they require establishing a smooth mapping from the surface to a unit sphere, and such spherical mappings, e.g. conformal mapping, introduce a serious metric distortion that usually compounds the resulting SPM.
Furthermore, such basis functions defined on the sphere are likely suboptimal, relative to basis functions defined directly on the anatomical surface, for detecting the locations or scales of shape variations. To remedy the limitations of spherical wavelets, the spectral graph wavelet transform defined on a graph has been applied to arbitrary surface meshes by treating the meshes as graphs [3, 17, 19]. The wavelet transform is a powerful tool for decomposing a signal or function into a collection of components localized in both location and scale.

Although all three methods (diffusion-, kernel- and wavelet-based) look different from each other, it is possible to develop a unified framework that relates all of them in a coherent statistical fashion. Starting with a symmetric positive definite kernel, we propose a novel unified kernel regression framework within the Hilbert space theory. The proposed kernel regression works for any symmetric positive definite kernel, which behaves like a weight between two functional data points. We show how this facilitates a coherent statistical inference for functional signals defined on an arbitrary manifold.

The outline of this paper is as follows. (i) First, we present a unified bivariate kernel regression that is related to diffusion-like equations. (ii) We establish the relationship between the kernel regression and the recently popular spectral graph wavelets for manifolds. The proposed kernel regression is shown to be equivalent to the wavelet coefficients. This mathematical equivalence obviates the need for constructing wavelets with the complicated computational machinery often used in previous diffusion wavelet constructions [3, 17, 19]. (iii) A unified statistical inference framework is developed for neuroimaging applications by linking the kernel regression to the random field theory [31, 38]. (iv) Subsequently, we illustrate how the kernel regression procedure can be used to localize the hyoid bone growth pattern in humans.

Figure 1: CT image showing the location of the hyoid bone and 3D model showing the relative location of the hyoid bone with respect to the mandible (gray) and vocal tract structures (green).

2 Preliminary

First, let us illustrate two statistical problems in Euclidean space that motivate the development of the proposed kernel regression on manifolds. Consider measurements $f_i$ sampled at $p_i \in \mathbb{R}^d$. The measurements are usually modeled as
$$
f_i = h(p_i) + \epsilon_i
$$
with some noise $\epsilon_i$ and an unknown mean function $h$ that has to be estimated. In the well known local polynomial or kernel regression frameworks [6, 15, 24], a univariate kernel $G(p)$ is introduced to minimize the weighted least squares of the form
$$
\widehat{h}(p) = \arg\min_{\beta_0 \cdots \beta_k} \sum_{i=1}^{n} G(p - p_i) \Big| f_i - \sum_{j=0}^{k} \beta_j (p - p_i)^j \Big|^2. \qquad (1)
$$
Often Gaussian kernels are used for $G$. In many related local polynomial or kernel regression frameworks, the kernel $G$ and the polynomial basis $p^j$ are translated by the amount $p_i$ in fitting the data locally. In this fashion, at each data point $p_i$, exactly the same shape of kernel and distance are used. However, one immediately encounters a difficulty in directly generalizing the Euclidean formulation (1) to an arbitrary surface, since it is unclear how to translate the kernel and basis in a coherent fashion.

A similar problem is encountered with wavelets in Euclidean space. Consider a wavelet basis $W_{t,q}(p)$ obtained from a mother wavelet $W$ with scale and translation parameters $t$ and $q$:
$$
W_{t,q}(p) = \frac{1}{t} W\Big(\frac{p - q}{t}\Big). \qquad (2)
$$
Scaling a function on a surface is trivial, but the difficulty arises when one tries to define a mother wavelet and translate it on a surface. It is not straightforward to generalize the Euclidean formulation (2) to an arbitrary manifold. To remedy these two different but related problems, we propose to use a bivariate kernel, which bypasses the problem of translating a univariate kernel: simply changing the second argument has the effect of translating the kernel.
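To make the Euclidean starting point (1) concrete, the following is a minimal one-dimensional sketch with a Gaussian kernel $G$; the test function, noise level, bandwidth and sample design are arbitrary illustrative choices, not values from our study.

```python
import numpy as np

def local_poly_fit(p, pts, vals, t=0.4, k=1):
    """Evaluate the kernel regression estimator (1) at a point p.

    pts, vals : sample locations p_i and noisy measurements f_i
    t         : Gaussian kernel bandwidth
    k         : polynomial degree (k = 0 gives Nadaraya-Watson smoothing)
    """
    w = np.exp(-(p - pts) ** 2 / (2 * t ** 2))       # kernel weights G(p - p_i)
    X = np.vander(p - pts, k + 1, increasing=True)   # basis columns (p - p_i)^j
    sw = np.sqrt(w)
    # Weighted least squares over beta_0, ..., beta_k
    beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * vals, rcond=None)
    return beta[0]                                   # fitted value h-hat(p)

rng = np.random.default_rng(0)
pts = np.sort(rng.uniform(-3, 3, 200))
vals = np.sin(pts) + rng.normal(0, 0.2, size=200)    # f_i = h(p_i) + eps_i
hhat = np.array([local_poly_fit(p, pts, vals) for p in pts])
```

Note that the same kernel shape is recentered at every evaluation point $p$; it is precisely this translation step that has no canonical analogue on a curved surface.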
3 Methods

In many anatomical studies in CT, measurements are sampled densely at a resolution of 0.5 mm or less, so it is more practical to model the measurements as functions. Consider a functional measurement $f$ defined on a manifold $M \subset \mathbb{R}^d$. We assume the following additive model:
$$
f(p) = h(p) + \epsilon(p), \qquad (3)
$$
where $h$ is the unknown signal and $\epsilon$ is a zero-mean random field, possibly Gaussian. The manifold $M$ can consist of a single connected component or of multiple disjoint components, as in our hyoid bone application. We further assume $f \in L^2(M)$, the space of square integrable functions on $M$ with the inner product
$$
\langle f, g \rangle = \int_{M} f(p) g(p) \, d\mu(p),
$$
where $\mu$ is the Lebesgue measure. Define a self-adjoint operator $L$ satisfying
$$
\langle g_1, L g_2 \rangle = \langle L g_1, g_2 \rangle
$$
for all $g_1, g_2 \in L^2(M)$. Then $L$ induces eigenvalues $\lambda_j$ and eigenfunctions $\psi_j$ on $M$:
$$
L \psi_j = \lambda_j \psi_j. \qquad (4)
$$
Without loss of generality, we can order the eigenvalues $0 = \lambda_0 \le \lambda_1 \le \lambda_2 \le \cdots$. The eigenfunctions $\psi_j$ form an orthonormal basis in $L^2(M)$. Then, from Mercer's theorem [14], any smooth symmetric positive definite kernel on the manifold $M$ can be written as
$$
K(p, q) = \sum_{j=0}^{\infty} \tau_j \psi_j(p) \psi_j(q) \qquad (5)
$$
for some $\tau_j$. The constants $\tau_j$ are identified as follows. Apply the kernel convolution to the eigenfunction $\psi_j$:
$$
K * \psi_j(p) = \int_{M} K(p, q) \psi_j(q) \, d\mu(q). \qquad (6)
$$
Substituting (5) into (6), we have $K * \psi_j(p) = \tau_j \psi_j(p)$, indicating that $\tau_j$ and $\psi_j$ must be the eigenvalues and eigenfunctions of the convolution (6). The relationship between the two sets of eigenvalues $\lambda_j$ and $\tau_j$ is not given yet, but will be shown shortly.

3.1 Proposed kernel regression

Consider the subspace $H_k \subset L^2(M)$ spanned by the orthonormal basis $\{\psi_j\}$, i.e.
$$
H_k = \Big\{ \sum_{j=0}^{k} \beta_j \psi_j(p) : \beta_j \in \mathbb{R} \Big\}.
$$
We estimate $h$ by minimizing the weighted distance to the space $H_k$:
$$
\widehat{h}(p) = \arg\min_{h \in H_k} \int_{M} \int_{M} K(p, q) \big| f(q) - h(p) \big|^2 \, d\mu(q) \, d\mu(p). \qquad (7)
$$
Without loss of generality, we will assume the kernel to be a probability distribution so that
$$
\int_{M} K(p, q) \, d\mu(q) = 1
$$
for all $p \in M$. The solution of (7) has the following analytic expression.

Figure 2: Laplace-Beltrami eigenfunctions $\psi_j$ of various degrees ($j = 0, 1, 5, 20, 100, 500$) on the template. The first eigenfunction is constant in each component. As the degree increases, the spatial frequency increases.

Theorem 1.
$$
\widehat{h}(p) = \arg\min_{h \in H_k} \int_{M} \int_{M} K(p, q) \big| f(q) - h(p) \big|^2 \, d\mu(q) \, d\mu(p) = \sum_{j=0}^{k} \tau_j f_j \psi_j,
$$
where $f_j = \langle f, \psi_j \rangle$ are the Fourier coefficients.

Proof. See the Appendix.

Theorem 1 generalizes the spherical regression on a unit sphere to an arbitrary manifold [9]. It implies that the kernel regression can be performed by simply computing the Fourier coefficients $f_j = \langle f, \psi_j \rangle$ without any numerical optimization. The numerically difficult optimization problem is thus reduced to the problem of computing Fourier coefficients.
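In discrete form, Theorem 1 amounts to two matrix products. The sketch below assumes the first basis functions have already been evaluated at the mesh vertices and that the measure $d\mu$ is approximated by per-vertex area weights; the variable names are ours, introduced only for illustration.

```python
import numpy as np

def kernel_regression(f, psi, tau, area):
    """Kernel regression h-hat = sum_j tau_j f_j psi_j of Theorem 1.

    f    : (n,)   noisy measurements at the n mesh vertices
    psi  : (n, k) orthonormal basis functions psi_j at the vertices
    tau  : (k,)   weights tau_j defining the kernel in (5)
    area : (n,)   vertex area weights approximating the measure d-mu
    """
    fj = psi.T @ (area * f)   # Fourier coefficients f_j = <f, psi_j>
    return psi @ (tau * fj)   # h-hat evaluated at the vertices
```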
If the kernel $K$ is a Dirac delta function, the kernel regression simply collapses to the least squares estimation (LSE), which results in the standard Fourier series, i.e.
$$
\widehat{h}(p) = \arg\min_{h \in H_k} \int_{M} \big| f(q) - h(q) \big|^2 \, d\mu(q) = \sum_{j=0}^{k} f_j \psi_j.
$$
It can also be shown that as $k \to \infty$, the kernel regression
$$
\widehat{h} = \sum_{j=0}^{k} \tau_j f_j \psi_j
$$
converges to the convolution $K * f$, establishing the connection to the manifold-based kernel smoothing framework [5, 11]. Hence, asymptotically the proposed kernel regression should inherit many statistical properties of kernel smoothing.

Figure 3: Heat kernel regression with different bandwidths between 0.1 and 1000. As the bandwidth increases, the kernel regression becomes inversely proportional to the square root of the surface area.

3.2 Properties

The kernel regression can be shown to be related to the following diffusion-like Cauchy problem.

Theorem 2. For an arbitrary self-adjoint differential operator $L$, the unique solution of the initial value problem
$$
\frac{\partial g(p, t)}{\partial t} + L g(p, t) = 0, \quad g(p, t = 0) = f(p) \qquad (8)
$$
is given by
$$
g(p, t) = \sum_{j=0}^{\infty} e^{-\lambda_j t} f_j \psi_j(p). \qquad (9)
$$

Proof. See the Appendix.

For the particular choice of kernel $K$ with $\tau_j = e^{-\lambda_j t}$, the proposed kernel regression $\widehat{h} = \sum_{j=0}^{k} \tau_j f_j \psi_j$ converges to the solution of the diffusion-like equation (8). If we let $L$ be the Laplace-Beltrami operator, (8) becomes an isotropic diffusion equation as a special case, and we are then dealing with the heat kernel
$$
H_t(p, q) = \sum_{j=0}^{\infty} e^{-\lambda_j t} \psi_j(p) \psi_j(q),
$$
a mathematical object often explored in various fields [5, 11].

In order to construct wavelets on an arbitrary graph or mesh, the diffusion wavelet transform has recently been proposed [3, 17, 19]. The diffusion wavelet construction has so far been fairly involved; however, it can be shown to be a special case of the proposed kernel regression, which is substantially simpler to construct. Following the notation in [3, 17, 19], the diffusion wavelet $W_{t,q}(p)$ at position $q$ and scale $t$ is given by
$$
W_{t,q}(p) = \sum_{j=0}^{k} g(\lambda_j t) \psi_j(p) \psi_j(q)
$$
for some scale function $g$. If we let $\tau_j = g(\lambda_j t)$, the diffusion wavelet transform is given by
$$
\langle W_{t,q}, f \rangle = \int_{M} W_{t,q}(p) f(p) \, d\mu(p) = \sum_{j=0}^{k} \tau_j f_j \psi_j(q),
$$
which is exactly the kernel regression we introduced. Hence, the diffusion wavelet transform can be obtained by simply performing the kernel regression, without any additional wavelet machinery [19]. Further, if we let $g(\lambda_j t) = e^{-\lambda_j t}$, we have
$$
W_{t,q}(p) = H_t(p, q),
$$
which is the heat kernel. The bandwidth $t$ of the heat kernel controls the resolution, while translation is done by shifting one argument of the kernel.

Although the kernel regression is constructed using the global basis functions $\psi_j$, the kernel regression at each point $p$ coincides with the diffusion wavelet transform at that point. Hence, just like wavelets, the kernel regression has the localization property of wavelets. This is demonstrated in a toy example in Figure 4 with a hat-shaped surface in 3D. The hat-shaped step function is simulated as $z = 1$ for $x^2 + y^2 < 1$ and $z = 0$ for $1 \le x^2 + y^2 \le 2$. The step function is then reconstructed using the Fourier series expansion via LSE (top) and the kernel regression (bottom). In both cases, up to 7225 basis functions were used. For the kernel regression, the heat kernel with bandwidth $t = 0.0001$ was used. LSE clearly shows the visible Gibbs phenomenon, i.e., a ringing artifact [8, 16], compared to the kernel regression.

Figure 4: The Gibbs phenomenon on a hat-shaped simulated surface, showing the ringing effect of the traditional Fourier series expansion (top) and the reduced effect of the heat kernel regression (bottom). 7225 basis functions were used in both cases and the bandwidth $t = 0.001$ was used for the kernel regression.

3.3 Numerical Implementation

We choose the Laplace-Beltrami operator as the self-adjoint operator $L$. The eigenfunctions of the Laplace-Beltrami operator on an arbitrary curved surface are not known analytically, so it is necessary to discretize (4) using the Cotan formulation as a generalized eigenvalue problem [41, 26]:
$$
C \psi = \lambda A \psi, \qquad (10)
$$
where $C$ is the stiffness matrix, $A$ is the mass matrix and $\psi = (\psi(p_1), \cdots, \psi(p_n))'$ is an eigenfunction evaluated at the $n$ mesh vertices. Once we have obtained the basis functions $\psi_j$, the corresponding Fourier coefficients $\beta_j$ are estimated as
$$
\beta_j = \mathbf{f}' A \psi_j,
$$
where $\mathbf{f} = (f(p_1), \cdots, f(p_n))'$ and $\psi_j = (\psi_j(p_1), \cdots, \psi_j(p_n))'$ [41]. Figure 2 shows a few representative LB-eigenfunctions on the hyoid surface. For the heat kernel regression, we used the bandwidth $\sigma = 5$ and 500 LB-eigenfunctions on the final template. The number of eigenfunctions used is more than sufficient to guarantee a relative error of less than 0.3% in our data.
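As a sketch of this pipeline, the following assembles the cotan stiffness matrix $C$ and a barycentric lumped mass matrix $A$ (one common simplification of the full mass matrix) and solves (10) with a sparse symmetric eigensolver. The mesh arrays `verts` and `faces` are assumed inputs, not part of our released code.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

def cotan_laplacian(verts, faces):
    """Assemble the stiffness matrix C and lumped mass matrix A of (10)
    for a triangle mesh given as (n, 3) vertex and (m, 3) face arrays."""
    n = len(verts)
    I, J, V = [], [], []
    mass = np.zeros(n)
    for tri in faces:
        for k in range(3):                  # edge (i, j), opposite vertex l
            i, j, l = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
            e1, e2 = verts[i] - verts[l], verts[j] - verts[l]
            w = 0.5 * (e1 @ e2) / np.linalg.norm(np.cross(e1, e2))  # cot/2
            I += [i, j, i, j]
            J += [j, i, i, j]
            V += [-w, -w, w, w]             # off-diagonal and diagonal terms
        a, b, c = verts[tri]
        area = 0.5 * np.linalg.norm(np.cross(b - a, c - a))
        mass[tri] += area / 3.0             # a third of each triangle's area
    C = sparse.csr_matrix((V, (I, J)), shape=(n, n))
    return C, sparse.diags(mass)

# C, A = cotan_laplacian(verts, faces)
# Shift-invert slightly below zero keeps the factorization nonsingular
# even though lambda_0 = 0; this returns the 500 smallest eigenpairs.
# lam, psi = eigsh(C, k=500, M=A, sigma=-1e-8)
```

With `lam` and `psi` in hand, each Fourier coefficient $\beta_j = \mathbf{f}' A \psi_j$ above is a single sparse matrix-vector product.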
3.4 Statistical Inference

We are interested in determining the significance of functional signals on a manifold. We borrow the statistical parametric mapping (SPM) framework, often used in brain image analysis [2, 20, 33, 34, 40], for analyzing and visualizing statistical tests performed on the template surface. Since test statistics are constructed over all mesh vertices on the surface, multiple comparisons need to be accounted for. For continuous functional data, the random field theory [31, 34, 38] is natural to use. The random field theory assumes the measurements to be a smooth Gaussian random field. Heat kernel regression makes the data smoother and more Gaussian, and also increases the signal-to-noise ratio [10].

Consider functional measurements $f_1, \cdots, f_n$ on a manifold $M$. In the simplest statistical setting, the measurements can be modeled as
$$
f_i(p) = h(p) + \epsilon_i(p),
$$
where $h$ is an unknown group-level signal and $\epsilon_i$ is a zero-mean Gaussian random field [38]. At each fixed point $p$, we assume $\epsilon_i \sim N(0, \sigma^2)$.
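Under this simple one-sample model, the vertexwise statistic from which an SPM is built is a t-statistic field; the sketch below shows this reduced setting only for brevity (the hyoid study itself tests an age effect via a regression model), and the random field theory correction for multiple comparisons, which we do not reproduce here, would then be applied to the resulting field. Input names are hypothetical.

```python
import numpy as np
from scipy import stats

def vertexwise_tstat(F):
    """One-sample t-statistic testing h(p) = 0 at every vertex, for
    F : (n_subjects, n_vertices) kernel-smoothed measurements f_i(p)."""
    n = F.shape[0]
    se = F.std(axis=0, ddof=1) / np.sqrt(n)          # standard error of the mean
    tstat = F.mean(axis=0) / se
    pval = 2 * stats.t.sf(np.abs(tstat), df=n - 1)   # uncorrected p-values
    return tstat, pval
```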