Use R!

Jörg Polzehl, Karsten Tabelow

Magnetic Resonance Brain Imaging
Modeling and Data Analysis Using R

Use R! Series Editors
Robert Gentleman, 23andMe Inc., South San Francisco, USA
Kurt Hornik, Department of Finance, Accounting and Statistics, WU Wirtschaftsuniversität Wien, Vienna, Austria
Giovanni Parmigiani, Dana-Farber Cancer Institute, Boston, USA

This series of inexpensive and focused books on R will publish shorter books aimed at practitioners. Books can discuss the use of R in a particular subject area (e.g., epidemiology, econometrics, psychometrics) or as it relates to statistical topics (e.g., missing data, longitudinal data). In most cases, books will combine LaTeX and R so that the code for figures and tables can be put on a website. Authors should assume a background as supplied by Dalgaard’s Introductory Statistics with R or other introductory books so that each book does not repeat basic material.

More information about this series at http://www.springer.com/series/6991

Jörg Polzehl, WIAS Berlin, Berlin, Germany
Karsten Tabelow, WIAS Berlin, Berlin, Germany

Use R! series: ISSN 2197-5736, ISSN 2197-5744 (electronic)
ISBN 978-3-030-29182-2
ISBN 978-3-030-29184-6 (eBook)
https://doi.org/10.1007/978-3-030-29184-6

Mathematics Subject Classification (2010): 62-07, 92-C55

© Springer Nature Switzerland AG 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc.
in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

For Louis Pascal, Maurice, and Jean-Loup Frederic
- Karsten Tabelow

To my friends and family
- Jörg Polzehl

Preface

Our interest in neuroimaging started some 20 years ago, initiated by talks given by Fred Godtliebsen (UiT, Tromsø, Norway) and by Fridhjof Kruggel (Max-Planck-Institute of Cognitive Neuroscience, Leipzig) on functional magnetic resonance imaging (fMRI) in the colloquium of the Weierstrass Institute for Applied Analysis and Stochastics (WIAS). At that time, Vladimir Spokoiny and one of the authors of this book, Jörg Polzehl, were interested in imaging problems starting from purely theoretical developments in nonparametric statistics, particularly in adaptive smoothing for regression problems. This has led to interesting algorithms for image denoising that rely on qualitative assumptions on the image structure, adapt to this structure, and preserve essential spatial information (Polzehl and Spokoiny 2000). fMRI seemed to be a perfect application for the methodology.
It resulted in a paper, Polzehl and Spokoiny (2001), and a successful grant proposal within the DFG Research Center MATHEON. This approach was presented as a poster at the Organization for Human Brain Mapping (OHBM) Meeting at San Antonio in 2000, which led to a first contact with a group of statisticians working on neuroimaging problems and resulted in an invitation by Keith Worsley to participate in a statistics in neuroimaging workshop in Montreal in December 2000.

In 2005, the other author of this book, Karsten Tabelow, joined the project. We met Henning U. Voss from Weill Cornell Medical College, New York, USA. This collaboration was very important, as Henning is a physicist and acquires MRI data himself with clinical questions in the background. We always felt that the combination of mathematical development with a specific scientific question is key to the success of the project. The first paper of this collaboration (Tabelow et al. 2006) was a complete elaboration of a structural adaptive smoothing procedure for fMRI, but it also included the statistical inference on the fMRI activation areas.¹ It was important to show the usefulness of such an approach in a clinical context (Tabelow et al. 2008b), specifically for better alignment of activation regions in presurgical planning.

¹ As Keith Worsley nicely stated in one of his reviews on the accompanying R package fmri.

Henning U. Voss also introduced us to diffusion magnetic resonance imaging (dMRI) and diffusion models, e.g., the diffusion tensor model, but also to the orientation distribution function. Thus, we started to develop a structural adaptive smoothing method for dMRI data in the context of the tensor model (Tabelow et al. 2008a). Presenting the results at the Annual Meeting of the OHBM 2007, we met Michael Deppe (University Hospital, Münster, Germany), Siawoosh Mohammadi and Alfred Anwander (Max-Planck-Institute of Cognitive Neuroscience, Leipzig), which stimulated our interest in dMRI analysis. The first author got the chance to participate in the 2nd International Summer School in Biomedical Engineering on Diffusion-weighted Magnetic Resonance Imaging: Principles and Applications in 2007.

Together with our Ph.D. student Saskia Becker, we developed the model-free structural adaptive smoothing method for dMRI data, which we called (multi-shell) position-orientation adaptive smoothing (msPOAS, Becker et al. 2012, 2014). The motivation for these approaches stems from a collaboration with Remco Duits (TU Eindhoven, Netherlands). At the same time, we also contributed to the modeling of diffusion data by providing a tensor mixture model with automatic selection of the number of tensor components (Tabelow et al. 2012).

Meanwhile, Siawoosh Mohammadi moved to the Wellcome Trust Centre for Neuroimaging at UCL. There, we started discussions on another imaging modality based on multiparameter mapping (MPM) with Nikolaus Weiskopf and Martina F. Callaghan. One of the results of this collaboration is the structural adaptive smoothing method for MPM data (Mohammadi et al. 2017).

Along with the methodological development, we always implemented the algorithms in packages for R, specifically aws, fmri, dti, and qMRI, and released them on CRAN under a GNU license.
However, many more people worked on methods for neuroimaging using R. Around 2011, we felt, together with Brandon Whitcher, that the time was ripe to summarize those activities in a special volume of the Journal of Statistical Software on “Magnetic Resonance Imaging in R”, see also Tabelow et al. (2011). At CRAN, the “Medical Imaging Task View” was created by Brandon Whitcher.

We used some of the material that finally went into this book for a lecture course on Statistics in Neuroimaging at the Humboldt University Berlin in winter 2010. The plan to write this book dates back to this time.

With a workshop on Neuroimaging Data Analysis in 2013 and the Program on Challenges in Computational Neuroscience (CCNS)² at SAMSI in 2015–2016, both initiated by Hongtu Zhu (University of Chapel Hill, US), the statistics and neuroscience communities tried to further link both disciplines and encourage young statisticians to work on emerging problems at this interface of disciplines.

² Supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute.

In 2017, the Neuroconductor project³ (Muschelli et al. 2019) went online, collecting and adding many R packages related to neuroimaging and providing easy-to-use access to them. This really enhanced the usage of R in neuroimaging, e.g., because John Muschelli contributed a huge number of missing implementations as packages. This allowed us to present complete analysis pipelines solely based on R packages in all chapters of this book.

We have to thank a lot of people, not only those mentioned previously. Without the collaboration and support, our results and the writing of this book would not have been possible. The Weierstrass Institute for Applied Analysis and Stochastics (WIAS Berlin) provided the perfect environment. We want to specifically thank Jürgen Sprekels for his support of the whole imaging project at WIAS.
We are thankful to many colleagues from neuroscience we closely collaborate with, especially André Brechmann and Reinhard König (LIN, Magdeburg), John-Dylan Haynes (Einstein Zentrum Berlin, Charité Berlin), with whom we organized a workshop on “Statistics and Neuroimaging” in 2011, and Christophe Lenglet (CMRR, University of Minnesota, US). They introduced us to real-world problems, explained the physics and the neuroscientific background behind them, and provided us with extensive data or even performed specific experiments to validate our modeling approaches.

This book is about processing and modeling of images (in a general sense) from magnetic resonance imaging. For their implementation, we rely on R (R Development Core Team, 2019), the software environment for statistical computing and graphics. With this book, we want to bridge two often distinct communities. It is intended for statisticians who are interested in neuroimaging and look for an introduction. On the other hand, we hope it will be suited for neuroimaging students who want to learn about the statistical modeling and analysis of MRI data. By providing fully worked-out examples, the book shall serve as a tutorial for MRI analysis with R. We would very much like to see further development in the field, as R is already capable of handling most MRI analysis tasks.

In this book, we cover the theory underlying the modeling and data analysis or data description when we think it is necessary. For all imaging modalities, we outline complete analysis pipelines and illustrate them by worked-out examples. Still, the field of neuroimaging is huge, so the book remains incomplete. For example, we do not cover group fMRI analysis, but the second-level analysis can be performed using existing R functionality. We do not address clinical diagnostics based on multimodal structural MRI, see, e.g., Crainiceanu et al. (2017).
Furthermore, we also dropped the use of machine learning techniques in multivariate pattern analysis, which still would require some implementation in R to become a feasible analysis tool. We also leave out important diffusion models like NODDI or g-ratio maps for the lack of implementations in R. We are, however, convinced that R provides a rich environment for further developments.

³ Neuroconductor and its maintainers have been partially supported by the R01 grant NS060910 from the National Institute of Neurological Disorders and Stroke at the National Institutes of Health (NINDS/NIH).

Somewhat parallel to the development and success of the imaging technologies in neurosciences, the reproducibility crisis in science reached the field with the general question of how many of the results are false. Ioannidis (2005) argued that most published research findings are. The dispute raised important issues on the handling of the ever-growing amount of data, i.e., its organization, description, and publication, as well as the reporting of the methods that have been used for the analysis of the data and led to the scientific findings. It also met the discussion on the way of publishing scientific data, analysis, and results, promoting Open Access publication, Open Data, and, more generally, Open Science. Specifically, data sharing repositories like OpenNeuro (Stanford Center for Reproducible Neuroscience, 2019; Poldrack and Gorgolewski 2017) or repositories providing access for scientific purposes, see Turner et al. (2016, 2017) and also Appendix B, emerged. With the advance of new standards for data organization like the Brain Imaging Data Structure (BIDS) (Gorgolewski et al. 2016) for the organization of MRI data in NIfTI format, such repositories become an easy-to-use source for neuroimaging data to acquire new and initially unintended scientific results or, like the authors of this book, to develop new methodology.
They also, together with a complete description of the corresponding metadata and preferably the analysis software, enable the reproduction of published results by scientists other than the original authors. Furthermore, for the neuroimaging community appropriate recommendations for reporting the path to the scientific results are developed, e.g., by the Committee on Best Practices in Data Analysis and Sharing (COBIDAS) of the Organization for Human Brain Mapping (OHBM) (Nichols et al. 2017).

Within R a workflow is available that naturally supports the creation of scientific reports, e.g., based on LaTeX, that are directly linked to the data and the code to produce the results. Originally, the approach was to use the R function Sweave (Leisch 2002a,b), which, when run on a LaTeX document that includes R code, executes the latter and includes the resulting numbers, statistics, or even figures together with the commands that create them within the document. Later, the knitr package (Xie 2019) was developed to solve some long-standing problems in Sweave and to serve as a transparent engine for dynamic report generation with R (Xie 2015). In principle, such a dynamic report is created from a LaTeX document that includes so-called chunks of code in a *.Rnw file. R with the knitr package then creates the actual *.tex that contains highlighted code and the results of its execution. The system is flexible enough to work with other programming languages, like Python, Julia, or awk, and produce any output markup, like LaTeX, HTML, or Markdown.
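As a minimal sketch of the chunk mechanism described above, a hypothetical file example.Rnw could look as follows. The file name, the chunk label, and the numbers are illustrative assumptions and are not taken from the sources of this book.

```latex
% example.Rnw: hypothetical file name, for illustration only
\documentclass{article}
\begin{document}

% A code chunk opens with a header of the form <<label, options>>= at the
% start of a line and is closed by a line containing only an at-sign.
<<mean-example, echo=TRUE>>=
x <- c(2.1, 3.4, 5.0)   # made-up numbers, not data from the book
mean(x)
@

% The Sexpr macro inserts the value of an R expression into the running text.
The mean of the three values is \Sexpr{round(mean(x), 2)}.

\end{document}
```

Assuming the knitr package is installed, calling knitr::knit("example.Rnw") from R should write example.tex, which can then be compiled with LaTeX in the usual way.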
In an attempt at full reproducibility, this book relies on such dynamic report generation: It uses neuroimaging data publicly available on repositories; the PDF was created by running the R code in the included chunks and then running LaTeX on the *.tex markup code. Thus, almost all figures, numbers, and results are generated while producing the PDF from the sources. The only exception we made to this principle was due to the fact that some neuroimage analyses run for many hours or even days. The most extreme example is the call to xfibres for the ball-and-stick model in Sect. 5.2.5, which required almost 2 weeks of computation.