SCALING ONTOLOGY ALIGNMENT

by

RYAN E. FRECKLETON

B.S.C.p.E., University of Colorado Colorado Springs, 2008

A thesis submitted to the Graduate Faculty of the
University of Colorado Colorado Springs
in partial fulfillment of the
requirements for the degree of
Master of Science in Computer Science
Department of Computer Science
2013

© Copyright by Ryan E. Freckleton 2013
All Rights Reserved

This thesis for the Master of Science in Computer Science degree by
Ryan E. Freckleton
has been approved for the
Department of Computer Science
by

Dr. Jugal Kalita, Chair
Dr. Charles Shub
Dr. Lisa Hines
Dr. Suzette Stoutenburg

Date

Freckleton, Ryan E. (M.S.C.S., Computer Science)
Scaling Ontology Alignment
Thesis directed by Professor Dr. Jugal Kalita

Abstract

As ontologies become more prevalent in biomedicine and other fields, effective ontology alignment is necessary for their economical and practical use. An ontology is a group of concepts derived from a corpus of knowledge. Ontology alignment determines the relationships between these concepts across different ontologies. Ontology alignment is therefore an area of active research, especially scaling ontology alignment, as the number and size of ontologies increase dramatically.

This thesis describes an approach to and implementation of ontology alignment called Parallel Ontology Bridge, which maintains good alignment quality while increasing the scalability and speed of ontology alignment by matching linguistic and structural features in a support vector machine. This approach is based on Ontology Bridge [1] and provides the same advantages. It is able to handle non-equivalence relationships very effectively and is a general approach to ontology alignment that can be used across many domains. Parallel Ontology Bridge increases scalability by using map-reduce, an approach to breaking down problems and running them in parallel. This thesis describes how this is done. Parallel Ontology Bridge is almost two orders of magnitude faster than Ontology Bridge and shows very good scalability while maintaining quality as measured through F-Measure.
The results of Parallel Ontology Bridge are compared against several other scalability approaches, both with experimental data and theoretical maximum scalability. Parallel Ontology Bridge is significantly more scalable in the experimental data and maintains this advantage during theoretical analysis.

To my Montessori teacher, who always knew the joy of learning and understanding.

Acknowledgements

I'd like to acknowledge all the people who have positively affected the creation of this thesis: my employer, The MITRE Corporation, my coworkers, my advisory committee, and my family.

I'd like to especially thank Dr. Suzette Stoutenburg. Without her help and previous work in this area I would not have been able to create this thesis.

I'd especially like to thank my mother, Irene Freckleton, and father, Grover Freckleton, for their emotional support as well as deep discussions on the concepts of ontology alignment and graphical presentation. I'd also like to thank my Aunt Karen, whose excitement was infectious and whose way with words helped make this thesis succinct and clear.

My friend, Dr. Gregory Plett, gave me incomparable help and advice with typesetting. Tim Flink, my friend and fellow graduate student, saw architectural issues I was blind to. My friend and colleague Dr. Norman Facas gave unparalleled advice on organization and the appropriate layout of graphs and data.

Thank you, Dr. Lisa Hines, for giving me one-on-one attention to get up to speed on biology and medicine. Dr. Charlie Shub, thank you for your continued support and focus. Your mentorship during my undergraduate studies prepared me for this thesis and my professional career.

Finally, I'd like to thank my advisor, Dr. Jugal Kalita. His expertise in artificial intelligence has been unparalleled.

Without their assistance, feedback, and support this thesis would not have been possible to complete. It's been a long, sometimes stressful journey on this path of knowledge. I appreciate all that you've done for me. Thank you.