
Optimization and Data Analysis in Biomedical Informatics

199 Pages·2012·3.438 MB·English


Fields Institute Communications
VOLUME 63

The Fields Institute for Research in Mathematical Sciences

Fields Institute Editorial Board:
Carl R. Riehm, Managing Editor
Edward Bierstone, Director of the Institute
Matheus Grasselli, Deputy Director of the Institute
James G. Arthur, University of Toronto
Kenneth R. Davidson, University of Waterloo
Lisa Jeffrey, University of Toronto
Barbara Lee Keyfitz, Ohio State University
Thomas S. Salisbury, York University
Noriko Yui, Queen's University

The Fields Institute is a centre for research in the mathematical sciences, located in Toronto, Canada. The Institute's mission is to advance global mathematical activity in the areas of research, education and innovation. The Fields Institute is supported by the Ontario Ministry of Training, Colleges and Universities, the Natural Sciences and Engineering Research Council of Canada, and seven Principal Sponsoring Universities in Ontario (Carleton, McMaster, Ottawa, Toronto, Waterloo, Western and York), as well as by a growing list of Affiliate Universities in Canada, the U.S. and Europe, and several commercial and industrial partners.

For further volumes: http://www.springer.com/series/10503

Panos M. Pardalos • Thomas F. Coleman • Petros Xanthopoulos
Editors

Optimization and Data Analysis in Biomedical Informatics

The Fields Institute for Research in the Mathematical Sciences

Editors
Panos M. Pardalos, Center for Applied Optimization, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL, USA, and Laboratory of Algorithms and Technologies for Networks Analysis (LATNA), National Research University Higher School of Economics, Moscow, Russia
Thomas F. Coleman, Department of Mathematics, University of Waterloo, Waterloo, ON, Canada
Petros Xanthopoulos, Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, FL, USA

ISSN 1069-5265
ISSN 2194-1564 (electronic)
ISBN 978-1-4614-4132-8
ISBN 978-1-4614-4133-5 (eBook)
DOI 10.1007/978-1-4614-4133-5
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2012939726
© Springer Science+Business Media New York 2012

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

As science and society as a whole become more and more information intensive, there is an urgent need to develop, create, and apply new algorithms and methods to model, manage, and interpret this information. This is nowhere more evident than in biomedicine, where clinicians and scientists are routinely faced with conflicting (sometimes contradictory) sources of knowledge, in addition to the overwhelming and ever increasing stream of data. Bioinformatics and the -omics (genomics, proteomics, etc.) herald the advent of a new era and a new paradigm for scientific and, in particular, biomedical research. Together with the tools developed in optimization theory and the mathematical sciences, we are at a crossroads, where a more fundamental understanding of biological processes is within our grasp. This understanding will certainly pave the way for a more systematic attack on the mechanics of diseases, as opposed to a naive treatment of their symptoms (which has been the hallmark of classical medicine). It seems clear that there is an urgent need in biomedicine for new methods that will make sense out of clinical and experimental data that can be used to learn and generate rational hypotheses from the data and hence to advance the underlying disciplines.

In this volume we cover some of the topics that are related to this emerging and rapidly growing field.
On June 11–12, 2010, we organized a Workshop on Optimization and Data Analysis in Biomedical Informatics at the Fields Institute. Following this event we gathered invited contributions based on the talks presented at the workshop and additional invited chapters from world leading experts. We asked the authors to share their expertise in the form of state-of-the-art research and review chapters. Our goal was to bring together researchers from different areas and emphasize the value of mathematical methods in the areas of clinical sciences. This volume is targeted to applied mathematicians, computer scientists, industrial engineers, and clinical scientists who are interested in exploring emerging and fascinating interdisciplinary topics of research. We hope that this book will stimulate and enhance fruitful collaborations between scientists from different disciplines.

The editors would like to acknowledge the Fields Institute for their financial support and hospitality. In addition, we would like to thank all the authors of the invited chapters as well as Mrs. Debbie Iscoe for her valuable help during the editing of this volume.

Panos M. Pardalos, Gainesville, FL
Thomas F. Coleman, Waterloo, ON
Petros Xanthopoulos, Orlando, FL

Contents

Novel Biclustering Methods for Re-ordering Data Matrices
Peter A. DiMaggio Jr., Ashwin Subramani, and Christodoulos A. Floudas

Clustering Time Series Data with Distance Matrices
Onur Şeref and W. Art Chaovalitwongse

Mathematical Models of Supervised Learning and Application to Medical Diagnosis
Roberta De Asmundis and Mario Rosario Guarracino

Predictive Model for Early Detection of Mild Cognitive Impairment and Alzheimer's Disease
Eva K. Lee, Tsung-Lin Wu, Felicia Goldstein, and Allan Levey

Strategies for Bias Reduction in Estimation of Marginal Means with Data Missing at Random
Baojiang Chen and Richard J. Cook

Cardiovascular Informatics: A Perspective on Promises and Challenges of IVUS Data Analysis
Ioannis A. Kakadiaris and E. Gerardo Mendizabal Ruiz

An Introduction to the Analysis of Functional Magnetic Resonance Imaging Data
Gianluca Gazzola, Chun-An Chou, Myong K. Jeong, and W. Art Chaovalitwongse

Sensory Neuroprostheses: From Signal Processing and Coding to Neural Plasticity in the Central Nervous System
Fivos Panetsos, Abel Sanchez-Jimenez, and Celia Herrera-Rincon

EEG Based Biomarker Identification Using Graph-Theoretic Concepts: Case Study in Alcoholism
Vangelis Sakkalis and Konstantinos Marias

Maximal Connectivity and Constraints in the Human Brain
Roman V. Belavkin

Novel Biclustering Methods for Re-ordering Data Matrices

Peter A. DiMaggio Jr., Ashwin Subramani, and Christodoulos A. Floudas

Abstract. Clustering of large-scale data sets is an important technique that is used for analysis in a variety of fields. However, many clustering methods rely on heuristics to identify the best arrangement of data points. In this chapter, we present rigorous clustering methods based on the iterative optimal re-ordering of data matrices. Distinct mixed-integer linear programming (MILP) models have been implemented to carry out clustering of dense data matrices (such as gene expression data) and sparse data matrices (such as those arising in drug discovery and toxicology). We demonstrate the capability of the optimal re-ordering methods on a wide array of data sets from systems biology, molecular discovery and toxicology.
Mathematics Subject Classification (2010): Primary 54C40, 14E20; Secondary 46E25, 20C20

P.A. DiMaggio Jr., Department of Molecular Biology, Princeton University, Princeton, NJ 08540, USA
A. Subramani • C.A. Floudas, Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08540, USA

P.M. Pardalos et al. (eds.), Optimization and Data Analysis in Biomedical Informatics, Fields Institute Communications 63, DOI 10.1007/978-1-4614-4133-5_1, © Springer Science+Business Media New York 2012

The problem of data clustering is prevalent across a number of disciplines such as image processing [39], pattern recognition [3], microarray gene expression [27], information retrieval [68] and protein structure prediction [60, 74, 86]. In general, the aim of any clustering approach is to identify "similar" elements in the data set, and to organize it so that elements with similar attributes are brought together.

The most common approaches to clustering can be categorized as hierarchical [27] or partitioning [35] clustering algorithms. Although algorithms that identify the optimal solutions to these categories of problems do exist [8, 71, 72], most algorithms end up with suboptimal solutions because they rely on heuristic search techniques and identify only local solutions. While a number of approaches like model-based clustering [26, 84], neural networks [40], simulated annealing [44], genetic algorithms [9, 66], decomposition-based clustering [76–78], information-based clustering [73] and data classification [14, 63] have been proposed in the literature, the field of rearrangement clustering has recently emerged as a very useful alternative. It seeks the ordering of rows and columns that minimizes the sum of pairwise distances between neighboring rows and columns. It has been shown that this problem can be formulated as an instance of the traveling salesman problem (TSP), which can be solved to optimality [53, 54].
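As a rough illustration of the rearrangement idea, the following sketch scores a row ordering by the sum of distances between neighboring rows and finds the best ordering of a tiny matrix by exhaustive search. The function names, the squared Euclidean distance, the toy data, and the brute-force search are all assumptions made for illustration; they are not the chapter's MILP models or the TSP solvers cited above, which address the same kind of ordering problem at realistic scale.

```python
from itertools import permutations

def row_distance(a, b):
    """Squared Euclidean distance between two rows (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ordering_cost(matrix, order):
    """Sum of distances between rows that are adjacent in the given order."""
    return sum(
        row_distance(matrix[i], matrix[j]) for i, j in zip(order, order[1:])
    )

def best_row_ordering(matrix):
    """Try every row permutation; only feasible for very small matrices."""
    return min(
        permutations(range(len(matrix))),
        key=lambda order: ordering_cost(matrix, order),
    )

# Hypothetical toy data: rows 0 and 2 are similar, as are rows 1 and 3.
data = [
    [1.0, 1.0, 0.0],
    [9.0, 8.0, 9.0],
    [1.0, 2.0, 0.0],
    [8.0, 9.0, 9.0],
]
order = best_row_ordering(data)
print(order, ordering_cost(data, order))
```

The search treats the ordering as an open path through the rows, which is the TSP view with the closing edge dropped. The number of permutations grows factorially, so anything beyond a handful of rows calls for an exact TSP or integer programming solver instead.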
A bicluster is defined as a submatrix of the original matrix, which spans a subset of rows and columns. This way, common elements could be shared among a number of biclusters. This problem has been classified as an NP-hard problem [16]. An example of the application of biclustering methods is the study of downstream effects of global changes in regulated gene expression, as measured by DNA microarrays. The aforementioned clustering techniques would fail to uncover genes which are involved in more than one biological process or which are co-expressed under limited conditions [82]. This is because, in an attempt to generate biclusters, most algorithms either simplify the problem representation or employ heuristic methods.

A number of biclustering algorithms have been presented in the literature. The Cheng and Church algorithm [16] iteratively solves an optimization problem based on the mean squared residue using greedy heuristics. The mean squared residue measures the difference between the actual value of an element and its expected value based on its position in the data matrix. Since this algorithm does not transform the data, it allows for the integration of other data types. The plaid model [82] expresses data as a series of additive layers, while the spectral model [50] identifies eigenvectors which reveal the existence of checkerboard structures within the data matrix by using singular value decomposition. For a given factorization rank, the nsNMF method [15] uses non-negative matrix factorization with non-smoothness constraints to identify biclusters.
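To make the residue measure used by the Cheng and Church algorithm [16] concrete, here is a minimal sketch that computes the mean squared residue of a candidate bicluster, following the standard definition: each entry is compared with the value expected from its row mean, its column mean, and the overall mean of the submatrix. The function name and the toy matrix are assumptions for illustration, and the greedy row and column deletion steps of the full algorithm are not shown.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [
        sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)
    ]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: actual value minus the value expected from its position.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)

# Hypothetical expression matrix: the top-left 3x3 block is purely additive
# (each row is a shifted copy of the others), so its residue is close to zero.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [5.0, 6.0, 7.0, 4.0],
    [0.0, 9.0, 1.0, 5.0],
]
block_score = mean_squared_residue(expression, [0, 1, 2], [0, 1, 2])
full_score = mean_squared_residue(expression, [0, 1, 2, 3], [0, 1, 2, 3])
print(block_score, full_score)
```

A low score indicates that the selected rows vary coherently across the selected columns, which is why the additive block scores far lower than the full matrix.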
The biclustering methods Bimax [65] and Samba [79] discretize the expression levels, which allows them to enumerate a large number of biclusters in less time than more complicated models. To complement the assortment of problem representations for biclustering, a variety of algorithmic approaches have been developed to solve these models of varying complexity, such as zero-suppressed binary decision diagrams [85], evolutionary algorithms [10, 25], Markov chain Monte Carlo [67], bipartite graphs [79], and 0-1 fractional programming [13]. An excellent review of different bicluster definitions and biclustering algorithms can be found in [58].

One of the main applications of sparse matrix clustering is in the field of drug discovery. Drug discovery is a tedious and expensive process, involving several phases from target identification to clinical trials [62]. One of the bottlenecks in this process is the identification of potential drug compounds, normally small organic molecules or peptides, that can achieve multiple desired biological properties [57]. Finding such lead molecules can be highly difficult even with the assistance of combinatorial chemistry and high-throughput screening [7, 38]. For example, if a single molecular scaffold has N substituent sites with S distinct functional groups
