ebook img

NASA Technical Reports Server (NTRS) 20120010233: From Science to e-Science to Semantic e-Science: A Heliosphysics Case Study PDF

0.29 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview NASA Technical Reports Server (NTRS) 20120010233: From Science to e-Science to Semantic e-Science: A Heliosphysics Case Study

Computers&Geosciences](]]]])]]]–]]] ContentslistsavailableatSciVerseScienceDirect Computers & Geosciences journal homepage: www.elsevier.com/locate/cageo Review From science to e-Science to Semantic e-Science: A Heliophysics case study Thomas Narocka,b,n, Peter Foxc aAdnetSystems,Inc,Rockville,MD,USA bNASA/GoddardSpaceFlightCenter,HeliosphericPhysicsLaboratory,USA cRensselaerPolytechnicInstitute,TetherlessWorldResearchConstellation,USA a r t i c l e i n f o a b s t r a c t Articlehistory: Thepastfewyearshavewitnessedunparalleledeffortstomakescientificdatawebaccessible.TheSemantic Received30November2010 Web has proven invaluable in this effort; however, much of the literature is devoted to system design, Receivedinrevisedform ontologycreation,andtrialsandtribulationsofcurrenttechnologies.Inordertofullydevelopthenascent 15November2011 fieldofSemantice-Sciencewemustalsoevaluatesystemsinreal-worldsettings.Wedescribeacasestudy Accepted16November2011 withinthefieldofHeliophysicsandprovideacomparisonoftheevolutionarystagesofdatadiscovery,from manualtosemanticallyenable.Wedescribethesocio-technicalimplicationsofmovingtowardautomated Keywords: andintelligentdatadiscovery.Indoingso,wehighlighthowthisprocessenhanceswhatiscurrentlybeing Semantice-Science donemanuallyinvariousscientificdisciplines.OurcasestudyillustratesthatSemantice-Scienceismore SemanticWeb than just semantic search. The integration of search with web services, relational databases, and other Ontology cyberinfrastructure is a central tenet of our case study and one that we believe has applicability as a Provenance generalizedresearchareawithinSemantice-Science.Thiscasestudyillustratesaspecificexampleofthe benefits, and limitations, of semantically replicating data discovery. We show examples of significant reductionsintimeandeffortenablebySemantice-Science;yet,wearguethata‘‘complete’’solutionrequires integratingsemanticsearchwithotherresearchareassuchasdataprovenanceandwebservices. &2011ElsevierLtd.Allrightsreserved. Contents 1. Introduction........................................................................................................1 2. Setting:NASAHeliophysicsdataenvironment.............................................................................2 3. Casestudy .........................................................................................................3 3.1. Science......................................................................................................3 3.2. e-Science ....................................................................................................3 3.3. Semantice-Science.............................................................................................3 4. CurrentSemantice-Sciencelimitationsandinfrastructuralchallenges..........................................................4 4.1. Provenance...................................................................................................4 4.2. Ontologymappingandontologyintegration ........................................................................5 5. Summaryandconclusions.............................................................................................5 References .........................................................................................................6 1. Introduction aidsindatadiscoveryandintegrationandseveralimplementations canbefoundintheliterature(Madinetal.,2007;Neumann,2005; ManyorganizationsareleveragingtheInformationAge(Heyand Foxetal.,2007).Specifically,withinthefieldofheliophysicsthree Trefethen, 2005) to move online to find new avenues of reaching SemanticWebenabledsearchandintegrationsystemsareavailable their consumers. The past few years have witnessed unparalleled (McGuinnessetal.,2007a;Narocketal.,inpress). efforts (Dalton, 2007) to make scientific data web accessible However, much of the literature is devoted to system design, especiallyinthephysicalsciences.TheSemanticWeb(Berners-Lee ontologycreation,andtrialsandtribulationsofcurrenttechnologies etal.,2001)hasproveninvaluableinthiseffort.TheSemanticWeb (e.g.Madinetal.,2007;McGuinnessetal.,2007a;Narocketal.,in press). Moreover, Semantic e-Science is emerging as its own discipline (Fox and Hendler, 2009a; McGuinness et al., 2009) in nCorresponding author at: NASA/Goddard Space Flight Center, Code 672, which intelligent applications automate scientific research tasks. GreenbeltMD20771,USA.Tel.:þ13012861086. E-mailaddress:[email protected](T.Narock). Theaforementionedimplementationanddeploymentstudiesserve 0098-3004/$-seefrontmatter&2011ElsevierLtd.Allrightsreserved. doi:10.1016/j.cageo.2011.11.018 Pleasecitethisarticleas:Narock,T.,Fox,P., Fromsciencetoe-SciencetoSemantice-Science:AHeliophysicscasestudy.Computers& Geosciences(2011),doi:10.1016/j.cageo.2011.11.018 2 T.Narock,P.Fox/Computers&Geosciences](]]]])]]]–]]] asvaluablefirststeps;yet,weneedtocomplementthesetechnol- discovery methods. Following the recent trend of online data ogy demonstrations in order to move Semantic e-Science main- discovery, and wanting to retain the uniqueness of each of these stream. We need to investigate the socio-technical implications of sub-disciplines, NASA, in 2007, commissioned several federated semanticsearchandtheeffectsonendusers.Technologyresearch informationsystems.Fiveinformationsystemswerefundedserving and user behavior are not dichotomous, they are inseparable (Lee, thesolar,heliospheric,3magnetospheric,4Earth’sradiationbelts,and 2000).InordertofullydevelopSemantice-Sciencewemusttakea Earth’s upper atmosphere communities. Each of these systems is design-scienceapproach(Hevneretal.,2004)inwhichweinvesti- responsible for providinguniform accessto their respective under- gatenotonlyhowasystemworks,butalsowhyitworks.Evaluation lyingsourcesofheterogeneousanddistributeddata.Thesesystems ofsystemsinreal-worldsettingsisinvaluabletothisprocess. arebroadlyknownasVirtualObservatories,aparadigm(Szalayand ArecentNationalScienceFoundationreport(Cummingsetal., Gray, 2001) that began in astronomy and quickly spread to helio- 2008) concluded that ‘‘few studies integrate the distinct social, physics, oceanography, volcanology, and other diverse scientific organizational,andinfrastructuredimensionsofdynamicdistrib- communities. Specifically, the Virtual Observatory paradigm unites uted collaborations’’. The Cummings report (2008) also touches largequantitiesofdisparateandheterogeneousdatausuallyunder onthephysicalsciencesasakeyapplicationareaforthistypeof one web-based portal. The underlying data remain heterogeneous study.In regards tosemantictechnologies,thephysicalsciences and distributed, yet common metadata, access protocols, and are amongst the early adopters of semantic technologies (Finin terminology provide transparent access to users. Within the NASA andSachs,2004)andservetobenefitgreatlyfromit. Heliophysics domain the five Virtual Observatories are supported In this work we describe a case study within the field of by numerous data analysis and visualization capabilities,5,6 that Heliophysicsandprovideacomparisonoftheevolutionarystages enabletheprovisionofadiverseHeliophysicsdataenvironment. of scientific research. Specifically, we start with a recent Helio- The systems within NASA’s Heliophysics Data Environment physics study conducted under traditional means—manual implement search capabilities relevant to their domain. For inspection of large quantities of data and personal interaction example, the Virtual Solar Observatory,7 dealing primarily with with data producers and other domain experts. We label this solar images, focuses on optical search parameters such as effort science as it describes the norm in heliophysics over the wavelengthandintensity.TheVirtualHeliosphericObservatory,8 pastfewdecades.ThedatadiscoveryportionofthisHeliophysics by contrast, deals primarily with in situ time series data. In a study is then reproduced using multiple online search and similarmannertheremainingNASAVirtualObservatoriesimple- retrieval systems. This effort is termed e-Science and describes ment search capabilities analogous to the types of data they the recent trendto automate data discovery using webservices, contain. databases,andotherInternettechnologies.Finally,wereproduce The diversity of scientific data and numerous methods by thedatadiscoveryasecondtimeusinganewsemanticsearchand which to relate it makes for a challenging search problem. retrievalsystem.WelabelthisscenarioSemantice-Science. Previous research (Merka et al., 2008a; King et al., 2008) into Thesciencescenarioisusedasabaseline.Itprovidesuswith the applicability of traditional information retrieval techniques an estimate of the status quo. It tells us the processes by which implemented in contemporary internet search engines (e.g. theheliophysicistsconductedtheirresearchandtheresultsthat Google, Yahoo, etc.) has shown that relevance scoring, such as they achieved. Comparing the Semantic e-Science results with PageRank(Brinand Page,1998),whilesuccessful inwebsearch, scienceande-Scienceallowsustoseehowwellsemanticsearch fails in scientific search scenarios as there are few predefined enhancesthescientificprocess.Inotherwords,weareprimarily connections between data. Any connections, if they exist, are focusedonhowwellsemanticsearchenhancestraditionalscien- determinedduringtheresearchprocessbydomainexperts. tific data discovery. We are interested in the socio-technical The recently emerged microformats9 are also impractical for implications of moving toward automated intelligent discovery scientific search. Specifically, microformats offer a means to andweseektounderstandhowwellthisprocessenhanceswhat augmentXHTMLwithwell-definedsemantics.Microformatshave iscurrentlybeingdonemanuallyinvariousscientificdisciplines. seenextensiveusagelatelyasthenumberofwebsitesusingthem The novelty of this research is that we begin to understand the hasbeenestimatedtobeinthehundredsofmillions.10However, broaderimpactsofmovingSemantice-Sciencemainstreamandin theuseofmicroformatsimpliessearchoverwebpages,whichis doing so we are able to come full-circle and report on the not how search is conducted within the sciences (Merka et al., relationshipbetweensemantictechnologiesandtheirefficacyin 2008a, King et al., 2008).The aforementionedlack of predefined scientificresearch.Wealsoprovideempiricalevidenceinsupport connections between data and the fact that scientific data, and ofseveralfoundingpropositionsofSemantice-Science. associatedmetadata,areoftennotstoredaswebpagesprecludes Specifically we aim to address the following questions: How theuseofmicroformats.Inasimilarmanner,populartechniques efficiently and effectively does semantic search allow us to suchsocialtaggingandcollaborativefiltersarealsonotrelevant facilitate ingrained data discovery techniques in one physical formanyscientificsearchandretrievalsystems.Tagsandrecom- sciencearea?Doessemanticsearcharriveatthesame(orbetter) mendations can be useful in scientific collaborations; however, conclusionsasresearchersusingmanualdatadiscoverymeans?Is theyareprimarilyusefulforinitialdiscoveryandlatterevaluation semanticsearchbyitselfsufficientfordatadiscoveryinscientific of fitness for use, and not the primary means of data retrieval. research?Whataffectdoessemanticsearchhaveontheresearch Rather, users are supplying a set of constraints on the data and process?Whatgeneralstatementsofapplicabilitycanbemadeto the system is identifying time periods and data sets in which otherscienceareas? those constraints are met. Thus, solutions involving domain 2. Setting:NASAHeliophysicsdataenvironment 3TheregionofthesolarsystemaffectedbytheSun. 4TheregionofspaceencompassedbytheEarth’smagneticfield. TheNASAHeliophysicsdataenvironment1,2isbrokenintoseveral 5http://hpde.gsfc.nasa.gov/hpde_data_access.html. sub-disciplineseachwiththeirspecializedinstrumentationanddata 6http://spdf.gsfc.nasa.gov/. 7http://virtualsolar.org. 8http://vho.nasa.gov. 1http://dx.doi.org/10.1029/2009EO470001. 9http://microformats.org. 2http://hpde.gsfc.nasa.gov/. 10 http://microformats.org/blog/2007/06/21/microformatsorg-turns-2/. Pleasecitethisarticleas:Narock,T.,Fox,P., Fromsciencetoe-SciencetoSemantice-Science:AHeliophysicscasestudy.Computers& Geosciences(2011),doi:10.1016/j.cageo.2011.11.018 T.Narock,P.Fox/Computers&Geosciences](]]]])]]]–]]] 3 semantics and the use of logical inference are the ideal solution Table1 formostscientificsearchusecases. Spatio-temporalrestrictionsusedbySlavinetal.(2002)andMerkaetal.(2008b). Combiningthediversityofdata,thediversityofsearchtypes, Therestrictionsforthequeriesusedinourcasestudy. and the need to provide uniform online access we arrive at a 1. Measurementsoftypemagneticfieldoccurringintherange scenarioripeforsemanticsearch.Semantictechnologies,suchas (cid:3)150o¼GSMX(Re)o¼15 the Web Ontology Language (OWL) (McGuinness and van 2. Measurementsoftypethermalplasmaoccurringintherange Harmelen, 2004b) utilize the meaning of the data as well as (cid:3)15o¼GSMX(Re)o¼(cid:3)5 3. Measurementsoftypemagneticfieldoccurringintherange relationships between constituent data. The ability of semantic (cid:3)5o¼GSMX(Re)o¼(cid:3)2.5 search to enable subsumption, inheritance, and inferencing cap- 4. Magneticfieldandthermalplasmameasurementsoccurringinthe abilities helps overcome data heterogeneity and terminology range20o¼GSMX(Re)o¼50 differences and allows encoding of previously unconnected con- 5. Restrictions1.,2.,and3.mustalsoberestrictedtotherange cepts.Thelackofthesecapabilitiesinsyntacticsearchmakesits (cid:3)10o¼GSMY(Re)o¼10 6. Restriction4.mustalsoberestrictedtotherange utilizationinscientificdatadiscoverylimited. (cid:3)40o¼GSMY(Re)o¼40 Inparticular,twooftheNASAsystems(theVirtualHeliospheric Observatory (VHO) and the Virtual Magnetospheric Observatory (VMO)) now use formal ontologies to discover and integrate data (Narocketal.,inpress).Moreover,thesetwosystemswerespecifi- 3.2. e-Science callydesignedtosharetechnologyandinfrastructure(Merkaetal., 2008a)inordertoreducecostsandshortendevelopmenttime.Asa While automated web technologies can reduce much of the result,operationsofeachofthesesystemsarecompletelydrivenby aforementionedworkloadthequestionremainsastotheefficacy the ontologies. In other words, there is one underlying software of such a system and technology’s ability to enhance the search infrastructurewiththetwodomainsemanticscompletelycaptured process.Toexaminethisinacontrolledsettingwefirstreplicated in independent formal ontologies. Simply pointing the software (Merka et al., 2008b) the Slavin (2002) study using the initial to one of the ontologies changes the domain that is searched. versions of VHO and VMO. The initial systems used purely McGuinnessetal.previouslydeveloped(2007a)asimilarframework, relational database technology converting web form input into which may hint at a convergence of semantic deployment meth- Structured Query Language (SQL) queries. While these systems odologies.However,thequestionsposedinSection1regardingthe hadmanyadvantages,reducingthesearchtimeto(cid:2)2h,theyalso efficacyofthesesystemsstillremainlargelyunanswered. highlightedseveralsocio-technicallimitations. The VHO and VMO integrate data within two distinct helio- physics sub-domains. However, heliophysics research questions do not always fall into one sub-domain or the other. Rather, in 3. Casestudy today’s world of cross-disciplinary and system science, research questions often encompass resources from multiple domains. 3.1. Science Thus,researchersthinkintermsofcomplexinterrelatedsystems whilemanagersareoften forcedto deploymultiplecomponent- TheSlavinetal.(2002)studythatisthebasisofourcasestudy based IT systems that are more manageable. Moreover, despite represents research questions that are common within the field of best efforts, users are often unaware of such divisions. In our Heliophysics. Moreover, the Slavin (2002) study requires multiple particular case, our spatio-temporal requirements span the sub- typesofdatatobeavailableduringastringentsetofspatio-temporal domains covered by VHO and VMO. A standardized messaging conditions. Specifically, the Slavin study specifies a spatial config- framework (Narock and King, 2008) allows results from one urationoffourspacecraftwhilespecificinstrumentsonboardthese system to be saved and used as a query in the other system. spacecraftareoperatingandcollectingdata.Anyfourspacecraftare However,usersmustfirstbeawareoftheexistenceandcapabil- acceptable as long as they meet the spatio-temporal restrictions ities of both systems. Subsequently, the researcher must artifi- listed in Table 1. All restrictions must occur simultaneously. The cially decompose their query into multiple questions of the spatial restrictions are listed using the Geocentric Solar Magneto- respectivesystems—aprocessthatisforeigntomostresearchers. spheric (GSM) coordinate system—an Earth-centered system com- Other limitations come behind the scenes of the information moninHeliophysics.Suchaconfigurationofspacecraftisrareand systems. In a relational search system all relationships must be heliophysicistsdonotknowitsoccurrenceintervalsapriori. stated explicitly. This simple fact has two far-reaching conse- As a result, Slavin and colleagues (2002) selected the most quences. First, it limits the discovery of new knowledge. Users likely subset of spacecraft that might fit their needs. While this mayfindrelationshipsunbeknownsttothem;however,theywill selection was based upon extensive experience within the not find a relationship that was not pre-defined by system domain it nevertheless highlights the gap between data sources maintainers.Semanticsearchalleviatesthislimitationandoffers domain experts are familiar with and data sources that are the potential for knowledge discovery—a capability essential to regularly emerging and appearing online. In recent years this scientific research. Second, the system maintainer who defines, gap has been widening (Gray et al., 2005) to the point of being approves,andcommitsthesepre-definedrelationshipsservesasa acknowledgedinthepopularmedia(Hotz,2009).Withmore,and bottlenecktothesystem.Thispersonslowsdowntheadditionof increasinglyvoluminous,dataconstantlycomingonline,knowing newdata,whichinturnslowsdownthescientificprocess. whattosearchisbecomingintractable.Thisisincreasingtheneed forsearchingeneralandsemanticsearchinparticular. 3.3. Semantice-Science Slavin’s team manually investigated 16 months of an initial data source to undercover 43 possible events. Gradually, these The data search portion of the Slavin (2002) study was events were whittled away as the remaining spatio-temporal once again reproduced. By replacing the e-Science structured restraintswereadded.Alltold,2useableeventswhereidentified SQL queries with inferences over domain ontologies we sought after (cid:2)100hofmanuallabor.11 to identify which areas of the scientific research process are adequately addressed by semantic search and which parts 11Privatecommunicationwiththestudy’sfirstauthorDr.JamesSlavin. may still be lacking. For this process, we utilized the SPASE Pleasecitethisarticleas:Narock,T.,Fox,P., Fromsciencetoe-SciencetoSemantice-Science:AHeliophysicscasestudy.Computers& Geosciences(2011),doi:10.1016/j.cageo.2011.11.018 4 T.Narock,P.Fox/Computers&Geosciences](]]]])]]]–]]] Such is certainly the case in semantic eScience. The addition of basic Semantic Web technologies – ontologies, reasoner, and softwaretoolkit(Jena15)–reduces thedatadiscovery process in ourapplicationsfrom (cid:2)2hto (cid:2)15min.Suchareductionoccurs because we no longer need to manually visit 2 sites and more importantlywenolongerneedtoartificiallybreakourqueryinto sub-domainspecific questions. In the Semantic e-Science imple- mentation incoming queries are converted from a standard XML representation (Narock and King, 2008) into SPARQL (Prud’hommeauxandSeaborne,2008),asemanticquerylanguage approvedasaW3C16Recommendation.Itisatthisstagethatthe benefits of Semantic e-Science become apparent. Using our domain ontology, which spans all of Heliophysics, we can infer towhichsub-domaineachcomponentofthequeryrefers.Further inferences can then be made as to which information system withintheNASAdataenvironmentaddressesthissub-domain.At present,thiscapabilityislimitedtotheVHOandVMOastheyare the only systems utilizing semantics. Nevertheless, this limited Fig. 1. Graphical depiction of system complexity within the e-Science and Semantice-Sciencescenarios. capability illustrates an advantage over e-Science—namely the artificialdecompositionofqueries. ontology12 with its VHO13 and VMO14 instantiations. These ontologies are OWL encodings (Narock et al., in press) of the 4. CurrentSemantice-Sciencelimitationsandinfrastructural XML-based schema developed by the Space Physics Archive challenges Search and Extract (SPASE) consortium (Harvey et al., 2008). Specifically, these ontologies have Description Logic (Baader While basic Semantic Web technologies reduced the search etal.,2003)expressivityALCOIN(D),35classes,20objectproper- time by orders of magnitude they are not sufficient to offer a ties,45datatypeproperties,andontheorderof20,000instances. ‘‘complete’’answer.Thatis,theSemantice-Scienceapproachdid The aforementioned SPASE consortium, which consists of notretrievealloftheresultsfoundwithinthescienceapproach. heliophysicsresearchers,softwaredevelopersanddataproviders, While the Semantic e-Science approach offered previously was founded to aid in the integration of heliophysics data. The unknown events to ponder, the semantic search only returned intentofSPASEistocreateametadataschemathatwouldleadto oneofthetwooriginalSlavinevents. standardized descriptions of NASA Heliophysics data resources. More importantly the results lacked provenance information WithinthiscasestudytheOn-To-Knowledgemethodology(Staab regardingcertainaspectsofthequery. et al., 2001) was used (Narock et al., in press) to continuously While the semantic search correctly identified data sets of convertreleasesoftheSPASEmetadatamodelintoOWL.Itshould interest, calls to services that search the underlying numerical benotedthatknowledgeacquisitionandknowledgerepresenta- valuesfailedtoturnuponeofthetwooriginalevents.Thiswas tion have historically been challenging and time consuming due to the resolution of data that the service searched over and efforts.However,On-To-Knowledge(Staabetal.,2001)andother wasnotafaultofanysemanticinferencing.However,thishigh- emerging methodologies (e.g. Benedict et al., 2007; Fox and lights the importance of coupling semantic search to related McGuinness, submitted for publication) are beginning to lead to researchinwebservicesandSemanticWebservices.Inorderto quick deployment, rapid community buy-in, and reduced effort be completely successful these technologies need to be compo- ontheend-userspart. nents in frameworks rather than independent streams of We utilized the final versions of the VHO and VMO search research. This observation is consistent with previous research systems that now utilize formal ontologies (Narock et al., in (Lim et al., 2010), which noted that e-Science lacks ‘‘a single press). Fig. 1 illustrates the how the VHO and VMO search backbone that can support the entire spectrum of requirements systemswereaugmentedbytheadditionofsemantics. thatareessentialtoundertakecross-disciplinaryscientificcolla- EvidentfromFig.1isthatminimalchangesweremadetothe borations.’’Additionally,Abbott(2009)and(GoodmanandWong, system architecture. Specifically, the system now consults the 2009) have commented that current and future scientific chal- aforementioned ontologies prior to constructing SQL queries. lenges can only be addressed with a focus on systems thinking Domainlogiciscapturedwithintheontologiesandcanbeeasily and generalized frameworks, respectively. Such groundwork has manipulatedandinferencedwithusingSemanticWebtools.This been laid (Fox et al., 2009b) and is being advanced by many isoppositeoftheinitialsystemdesigninwhichthisinformation groupsintheEarthandSpaceSciencescommunities. hadtobehard-codedintothequeryalgorithms.Previousresearch has shown (Narock et al., in press) that such a scenario reduces 4.1. Provenance complexity, query execution time, and system maintenance. In addition,wecanposeourcompletequerytothesystem,haveit Also missing from the semantic search results is provenance answertheportionsitiscapableofanswering,andinfertowhich information. One definition of provenance is ‘‘the description of othersystemtosendtheremainingportionsofthequery. theoriginsofapieceofdataandtheprocessbywhichitarrivedin In what has become a catch phrase of the Semantic Web, a database’’ (Buneman et al., 2001, p. 316). Metadata does exist Hendler (2007) mused, ‘‘A little Semantics Goes a Long Way’’. foralldatasetsreturnedbythesystems,andthisinformationis readilyavailabletotheuser.However,subtletiesintheexecution 12http://vho.nasa.gov/ontology/spase.owl. 13http://vho.nasa.gov/ontology/vho.owl. 15 http://jena.sourceforge.net/. 14http://vho.nasa.gov/ontology/vmo.owl. 16 http://w3c.org/. Pleasecitethisarticleas:Narock,T.,Fox,P., Fromsciencetoe-SciencetoSemantice-Science:AHeliophysicscasestudy.Computers& Geosciences(2011),doi:10.1016/j.cageo.2011.11.018 T.Narock,P.Fox/Computers&Geosciences](]]]])]]]–]]] 5 ofqueriescanextrapolatetothepointthatresultsareambiguous Observatory (VSTO) (McGuinness et al., 2007a, Fox et al., or even unusable. For example, our query required a spatial 2009b). Specifically, Slavin (2002) utilized an auxiliary data restrictionthatwasexpressedbyakeyword—‘‘Near-EarthHelio- source in order to confirm initial assumptions. Such an effort sphere’’. Within the domain semantics this term has specific could have been replicated via VSTO data; yet, that system meaning and one can utilize various services to determine if a operateswithitsownindependentontologyandassociatedweb spacecraftisinsideoroutsideofthisspatialregion.However,how services(Foxetal.,2007).Atpresentthereisnoformalmapping thedataintheservicewasgeneratedisabsentfromtheresult,as betweenVSTOandVHO/VMOontologies.Thisisnotacriticismof isalmostalwaystrueforcurrentweb-basedsearchengines;only eithersystem.Rather,itisanempiricalconfirmationofthebroad limited provenance on the search result itself is often available. tendency (Cummings et al., 2008) to create one-off systems The lack of formal identification that provenance metadata is without taking time or effort for harmonization and as a result indeedrelatedto,andmustbereturnedwith,asearchresultisa the lack of interoperability (in this case, semantic interoperabil- seriousomissioninthesearchforrelevantscientificdata.Absence ity)canhindersystem-levelscience. of this information further leads to questions of accuracy as the Semantic e-Science needs to have a concerted effort to stan- computationalmodels(e.g.Peredoetal.,1995;MerkaandSzabo, dardize,align,andmapontologies.Suchneedshaverecentlybeen 2004) that predict the spatial region have inherent biases, recognized with several National Science Foundation projects assumptions, and errors known to the scientists who are most devoted to semantic interoperability, such as the SONET (Madin likely to have initiated the search. This particular case is easily etal.,2007)project.AlsotheSemanticeScienceFramework(SESF; remediedbysupplyingmoreinformationtotheresults,however, Fox et al., 2009b; McGuinness et al., 2010) project mentioned asqueriesbecomemorecomplexthesesubtlepiecesofinforma- earlierhasamongitsgoalstoenableconfigurablesemanticdata tionincreaseandtheuser’scompleteunderstandingoftheresults frameworks using modular approaches to ontologies and the diminishes. bridging of disciplines as well as application level integration, Provenance is an actively researched topic (Pinheiro da Silva also using ontologies to encode meaning and relations for such et al., 2004) that is now finding its place on the Semantic Web applications (Rozell et al., 2010). Further, SESF advances the key (McGuinness et al., 2007b), and specifically within semantic notionofsemanticallyawareapplication-leveltoolsasanessen- eScience—known as knowledge provenance (Fox et al., 2008a, tialpartofthesemanticecosystem,integratingnotonlydataand 2008b;Zedniketal.,2009;Zedniketal.,inpress),andweseethis informationsourcesbutprovenanceaswell. asakeycomponenttoprovidinguserswitha‘‘complete’’answer to their queries. Complete in the sense that essential explana- tions, verificationsand justificationsare given to satisfy a scien- 5. Summaryandconclusions tist’squestionsaboutanyreturnedresult. In the area of knowledge provenance, the encoding of prove- Admittedly, one case study can only begin to address the nance interlinguas such as the Proof Markup Language (PML; questionssurroundingtheemergenceofSemantice-Science.Yet, Pinheiro da Silva et al., 2004; McGuinness et al., 2007b) and the our case study has shed light on emerging issues and provides Open Provenance Model (Moreau et al., 2011) as ontologies empiricalevidenceinsupportofsomeofthefoundingtenets(Fox (Michaelis and McGuinness, 2010) brings new expressive power and Hendler, 2009a) of Semantic e-Science. We now return to to the important but previously disconnected provenance meta- considerourinitialresearchquestions. data.However,languagesforencodingarereallyonlyafoundation for provenance. What really address a scientist’s curiosity are (cid:4) Howefficientlyandeffectivelydoessemanticsearchallowus application tools that work with the encoded provenance. At tofacilitateingraineddatadiscoverytechniques? present these tools include provenance specific search browse (cid:4) Does semantic search arrive at the same (or better) conclu- and visualization (McGuinness and Pinheiro da Silva, 2004a) as sionsasresearchersusingmanualdatadiscoverymeans? well as integrated applications such as provenance-aware smart faceted search (Fox et al., in preparation). These tools work primarily with PML but similar tools are geared to work with Semanticsearchandbasicsemantictechnologies(i.e.RDF/OWL, OPM(e.g.theKARMAsystem;Caoetal.,2009).Manyofthesetools reasoners,andtoolkits)areprovidinggreatbenefittothescientific however are still clumsy for users to use and much work to research process. As evidenced by this case study these basic improvethemremains.Ofparticularsignificanceforconvergence technologies allow one to replicated previous studies in fractions inthisareaofweb-basedprovenanceistheformationandactivity of the time. While this is but one of many possible metrics ofaW3Cincubatorgroupforprovenance.17 (McGuinnessetal.,2007c)itisonethatbusyscientistsoftencare This progress not-withstanding, ‘complete’ answers to user about most. Moreover, these technologies do not require the queriesrequireamulti-domainknowledgebasethatisaninter- artificial decomposition of complex questions. Reasoning and section and super-set of domain knowledge, provenance, and inferencing capabilities are providing great benefit to both the oftenotherdataproductrelatedinformation(Zedniketal.,2009; user and the system maintainer. This study, as well as previous Foxetal.,inpreparation). work(Narocketal.,inpress),hasshownevidenceofeasiersystem maintenanceandeasierqueryexecutionwhentransitioningfrom e-Science to Semantic e-Science although further quantitative 4.2. Ontologymappingandontologyintegration studiesarerequiredtosubstantiatethisclaim. While semantic search is a vital component of Semantic The VHO and VMO share a common base ontology (Narock e-Science it is not the only component. Interspersed withontol- et al., in press) and as a result queries passed amongst the two ogies are non-semantic technologies such as Web Services and systemsareeasilyinterchangeableandexecutable.However,we relationaldatabases.Howandwhenthesetechnologiesinteractis foundthisnottobethecasewithinthebroadercommunity.Our justbeginningtobeunderstood.Forexample,provenanceinfor- casestudy querycould havebenefittedfromother datasources, mation is often thought of as the lineage of the data and some- specifically ones available via the Virtual Solar–Terrestrial thingtopresentattheendofexecution.Yetthebenefitsofusing provenancetoenhancethesearchprocesscanbeseeninprevious 17http://www.w3.org/2005/Incubator/prov/charter. research (Fox et al., in preparation) as well as in evidence Pleasecitethisarticleas:Narock,T.,Fox,P., Fromsciencetoe-SciencetoSemantice-Science:AHeliophysicscasestudy.Computers& Geosciences(2011),doi:10.1016/j.cageo.2011.11.018 6 T.Narock,P.Fox/Computers&Geosciences](]]]])]]]–]]] presentedearlierinthispaper.Similarscenariosshouldbesought we have shown, one can closely replicate manual discovery infutureresearch. methods,yeta‘‘complete’’answerwillrequiretheentireSeman- Semanticsearchreturnedmixedresultswithinourcasestudy. ticWebstackandthefullemergenceofSemantice-Science. While it pointed us toward previously unconsidered data it also failedtofindoneoftheeventsfromthescienceapproach.Again, this was not the result of the search processes itself, but rather References the ongoing challenges of connecting semantic search to other cyberinfrastructurecapabilities. Abbott,M.R.,2009.Anewpathforscience?Inthefourthparadigm:data-intensive scientificdiscovery.MicrosoftResearch,111–116. (cid:4) Issemanticsearchbyitselfsufficientfordatadiscoveryin Baader,F.,Calvanese,D.,McGuinness,D.L.,Nardi,D.,Patel-Schneider,P.F.,2003. The Description Logic Handbook: Theory, Implementation, Applications. scientificresearch? CambridgeUniversityPress,Cambridge,UK. Benedict,J.L.,McGuinness,D.L.,Fox,P.,2007.ASemanticWeb-basedmethodology Thiscasestudyindicatesthatsemanticsearchasimplemented forbuildingconceptualmodelsofscientificinformation.AmericanGeophysi- calUnion(FallMeeting(AGU2007)(EosTransactionAGU88(52),FallMeeting in our case study is insufficient for ‘‘complete’’ data discovery. Supplement,AbstractIN53A-0950)). This conclusion is reached for a number of reasons. First, the Berners-Lee, T., Hendler, J., Lassila, O., 2001. ‘‘The Semantic Web’’. Scientific massivequantitiesofscientificdata(Grayetal.,2005)thatapply American(May17). Berners-Lee, T., 2006. Artificial Intelligence and the Semantic Web, AAAI 2006, toHeliophysics,makeitunlikelythatall,possiblynotevenmost, /http://www.w3.org/2006/Talks/0718-aaai-tbl/Overview.htmlS#(14). datavalueswillresideasontologyinstances;letaloneasLinked Bizer,C.,Heath,T.,Berners-Lee,T.,2009.Linkeddata—thestorysofar.Interna- Data(Bizeretal.,2009).Thus,semanticsearchisonlyasgoodas tionalJournalonSemanticWebInformationSystems5(3),1–22.doi:10.4018/ jswis.2009081901. the cyberinfrastructure services to which it connects. This was Brin, S., Page, L., 1998. The anatomy of a large-scale hypertextual Web search evidenced in our study where the ontology returned the correct engineSource.ComputerNetworksandISDNSystems30(1–7)(April). data set, however, data services were unable to retrieve all Buneman,P.,Khanna,S.,Tan,W.-C.,2001.WhyandWhere:ACharacterizationof DataProvenance,DatabaseTheory,LectureNotesinComputerScienceBook relevant time periods. Second, provenance plays an important Series,vol.1973.Springer,Berlin/Heidelberg. role in semantic search. Without the complete Semantic Web Cao,B.,Plale,B.,GirishS.,Robertson,E.,Simmhan,Y.,2009.Provenanceinforma- stack (Berners-Lee, 2006, i.e. Provenance, Rules, and Trust) the tion model of karma, version 3. In: Proceedings of the IEEE 2009 Third capabilitiesofsemanticsearchappeartobeartificiallylimited. InternationalWorkshoponScientificWorkflows(SWF’09). Cummings,J.,Finholt,T.,Foster,I.,Kesselman,C.,Lawrence,K.,2008.BeyondBeing There:ABlueprintforAdvancingtheDesign,Development,andEvaluationof (cid:4) What affect does semantic search have on the research Virtual Observatories, Final Report From Workshops on Building Effective process? VirtualOrganizations,NationalScienceFoundation,May,2008.Availableat: /http://www.ci.uchicago.edu/events/VirtOrg2008S. Dalton,R.,2007.Geophysicistscombineforces.Nature447(7148),1037. Our case study has shown quantitative benefits in query Finin, T., Sachs, J., 2004. Will the Semantic Web change science? Science Next execution time. Similar results were found in a previous study Wave.AmericanAssociationfortheAdvancementofScience(September15, 2004). (Szabo et al., 2009) with researchers conducting studies in a Fox, P., Zednik, S., West, P. Semantic provenance in support of explaining, fractionofthetimepreviouslyrequired.Webelievethatthiswill justifyingandverifyingsciencedataontheweb.JournalofWebSemantics, slowly,butsteadily,acceleratethepaceofheliophysicsresearch. inpreparation. Fox, P., Hendler, J., 2009a. Semantic e Science: encoding meaning in next- Time, money, and resources saved on one research problem can generationdigitallyenhancedscience,inthefourthparadigm:data-intensive bedevotedtootherproblems.Thecumulativeeffectisaccelera- scientificdiscovery.MicrosoftResearch,147–153. tioninthecreationofnewknowledge. Fox,P.,McGuinness,D.L.,Cinquini,L.,West,P.,Garcia,J.,Benedict,J.,Middleton,D., 2009b. Ontology-supported scientific data frameworks: the virtual solar- terrestrial observatory experience. Computers and Geosciences 35 (4), (cid:4) Whatgeneralstatementsofapplicabilitycanbemadetoother 724–738. scienceareas? Fox,P.,McGuinness,D.L.,PinheirodaSilva,P.,Zednik,S.,Garcia,J.,DingL.,DelRio, N., Chang, C., 2008a. Semantic provenance for image data processing. In: ProceedingsofGeoinformaitcs2008DatatoKnowledge,June. The benefits of using provenance to enhance the search Fox,P.,McGuinness,D.L.,PinheirodaSilva,P.,2008b.Knowledgeprovenancein processhavebeenevidencedwithinthisstudyaswellasinother virtualobservatories:applicationtoimagedatapipelines.In:Proceedingsof domains (Fox et al., in preparation). This tenet extends the theInternationalSemanticWebConference. Fox,P.,Cinquini,L.,McGuinness,D.,West,P.,Garcia,J.,Benedict,J.L.,Zednik,S., traditionaldatalineageviewofprovenanceandappearstohave 2007.SemanticWebServicesforinterdisciplinaryscientificdataqueryand generalapplicability.Moreover,Semantice-Scienceismorethan retrieval. In: Proceedings of the AAAI Workshop on Semantic eScience. justsemanticsearch.Theintegrationofsearchwithwebservices, doi:10.11.144.7441. Fox, P. McGuinness, D.L., 2011b. An Open-world iterative methodology for the relational databases, and other cyberinfrastructure, has applic- developmentofsemantically-enabledapplications.ComputersandGeosciences, abilityasageneralizedresearchareawithinSemantice-Science. submittedforpublication. Massivedatavolumes(Grayetal.,2005)prohibit,oratleastdeem Gray, J., Liu, D., Nieto-Santisteban, M., Szalay, A., DeWitt, D., Heber, G., 2005. ScientificDataManagementintheComingDecade,ACMSIGMODRecord,vol. it very unlikely, that all information will reside as ontology 34,Issue4December2005,(pp.34–41). instances or Linked Data. Thus, specific and targeted research Goodman,A.A.,Wong,C.G.,2009.Bringingthenightskycloser:discoveriesinthe intocomponentandframework-basedsystemsisvitallyneeded. data deluge, in the fourth paradigm: data-intensive scientific discovery. MicrosoftResearch,39–44. ThefutureofSemantice-Scienceisverypromising.However, Harvey,C.,Gangloff,M.,King,T.,Perry,C.,Roberts,D.,Thieman,J.,2008.Virtual there are numerous challenges that still need to be overcome. observatoriesforspaceandsolarphysicsresearch.EarthScienceInformatics Semanticsearchispresentlyoneofthemaindrivers,however,it 1(1),5–13. cannotexistinisolation.Weneedtomovebeyondclosedsystems Hendler,J., 2007.Thedarksideof theSemantic Web.IEEE IntelligentSystems (January/Febuary). tobroadersemanticaccess.Dataaccess,integration,provenance, Hevner, A.R., March, S.T., Park, J., Ram, S., 2004. Design science in information andprocessingservicesallneedsemanticrepresentation(Foxand systemsresearch.MISQuarterly28(1),75–105(March). Hendler, 2009a). This case study has shown the benefits, and Hey, T., Trefethen, A.E., 2005. Cyberinfrastructure for e-Science. Science 308 (5723),817–821(May). limitations,ofsemanticallyreplicatingdatadiscoveryinonecase Hotz,R.,2009.Adatadelugeswampssciencehistorians.TheWallStreetJournal studyinthephysicalsciences.Ithasattemptedtoprovideamuch (August 28, 2009) available at:/http://online.wsj.com/article/ needed extension to the current technological discussion. SB125139942345664387.htmlS. King,T.,Narock,T.,Walker,R.,Merka,J.,Joy,S.,2008.Abravenew(virtual)world: Namely, we have investigated the socio-technical implications distributedsearches,relevancescoringandfacets.EarthScienceInformatics ofdeployingsemantictechnologiesinreal-worldapplications.As 1(1).doi:10.1007/s12145-008-0002-7(April). Pleasecitethisarticleas:Narock,T.,Fox,P., Fromsciencetoe-SciencetoSemantice-Science:AHeliophysicscasestudy.Computers& Geosciences(2011),doi:10.1016/j.cageo.2011.11.018 T.Narock,P.Fox/Computers&Geosciences](]]]])]]]–]]] 7 Lee,A.,2000.Systemsthinking,designscience,andparadigms:heedingthreelessons Moreau,L.,Clifford,B.,Freire,J.,Futrelle,J.,Gil,Y.,Groth,P.,Kwasnikowska,N., fromthepasttoresolvethreedilemmasinthepresenttodirectatrajectoryfor Miles,S.,Missier,P.,Myers,J.,Plale,B.,Simmhan,Y.,Stephan,E.,denBuss-che, futureresearchintheinformationsystemsfield,keynoteaddress.In:Proceedings J.V., 2011. The open provenance model core specification (v1.1). Future oftheEleventhInternationalConferenceonInformationManagement,Taiwan, GenerationComputingSystems27(6),743–756. May. (available online at: /http://www.people.vcu.edu/(cid:2)aslee/ Narock, T., King, T., 2008. Developing a SPASE query language. Earth Science ICIM-keynote-2000S. Informatics1(1).doi:10.1007/s12145-008-0007-2(April,2008). Lim,H.B.,Iqbal,M.,Yao,Y.,Wang,W.,2010.Asmarte-Sciencecyberinfrastructure Narock, T., Yoon, V., Merka, J., Szabo, A., The Semantic Web in federated forcross-disciplinaryscientificcollaborations.AnnalsofInformationSystems information systems: a space physics case study. Journal of Information 11,2010.doi:10.1007/978-1-4419-5908-9(Chapter3). TechnologyTheoryandApplication,inpress. Madin,J.,Bowers,S.,Schildhauer,M.,Drivov,S.,Pennington,D.,Villa,F.,2007.An Neuman,A.,2005.Alifesciencesemanticweb:arewethereyet?ScienceSTKE ontologyfordescribingandsynthesizingecologicalobservationdata.Ecology 2005(283),pe22.doi:10.1126/stke.2832005pe22. Information2(3),279–296. Peredo,M.,Slavin,J.A.,Mazur,E.,Curtis,S.A.,1995.Three-dimensionalposition McGuinness,D.L.,Fox,P.A.,West,P.,Rozell,E.,Zednik,S.,Chang,C.,2010.Progress and shape of the bow shock and their variation with Alfve´nic, sonic and toward a Semantic eScience framework; building on advanced cyberinfras- magnetosonicMachnumbersandinterplanetarymagneticfieldorientation. tructure.EOSTransactions(IN22A-02). McGuinness, D.L., Fox, P., Brodaric, B., Kendall, E., 2009. The emerging field of JournalofGeophysicalResearch100,7907–7916. PinheirodaSilva,P.,McGuinness,D.L.,Fikes,R.E.,2004.Aproofmarkuplanguage semantic scientific knowledge integration. IEEE Intelligent Systems 24 (1), 25–26. forSemanticWebservices.In:Bell,David,Bussler,Christoph,Yang,Jian(Eds.), McGuinness, D.L., Fox, P., Cinquini, L., West, P., Benedict, J., Garcia, J., 2007a. TheSemanticWebandWebservices:InformationSystems.31(4–5),381–395. CurrentandFutureUsesofOWLforScientificDataFrameworks:Successes Prud’hommeaux,E.,Seaborne,A.,2008.SPARQLQueryLanguageforRDF,W3C andLimitations,InOWLED. Recommendation, 15 January 2008, available at: /http://www.w3.org/TR/ McGuinness,D.L.,Ding.L.,PinheirodaSilva,P.,Chang,C.,2007b.PML2:amodular 2008/REC-rdf-sparql-query-20080115/S,lastaccessed30August2009. explanationinterlingua.In:Proceedingsofthe2007WorkshoponExplana- Rozell, E., Fox, P., Maffei, A., Belliau, S., 2010. S2S an application integration tion-awareComputing(ExaCt-2007).Vancouver,Canada,July22–23. frameworkfordistributedoceanographic repositories. EosTransaction AGU McGuinness,D.L.,P.Fox,L.Cinquini,P.West,J.Garcia,J.L.Benedict,D.Middleton, (FallMeetingSupplement,AbstractIN23A-1349). 2007c. The virtual solar-terrestrial observatory: a deployed Semantic Web Slavin,J.A.,Fairfield,D.H.,Lepping,R.P.,Hesse,M.,Ieda,A.,Tanskanen,E.,Østgaard, application case study for scientific research. In the proceedings of the N., Mukai, T., Nagai, T., Singer, H.J., Sutcliffe, P.R., 2002. Simultaneous 19th Conference on Innovative Applications of Artificial Intelligence (IAAI). observationsofearthwardflowburstsandplasmoidejectionduringmagneto- Vancouver,BC,Canada,July2007,pp.1730–1737andAImagazine,29,#1, spheric substorms. Journal of Geophysical Research 107 (A7). doi:10.1029/ pp.65–76. 2000JA003501. McGuinness, D.L., Pinheiro da Silva, P., 2004a. Explaining answers from the Staab, S., Schnurr, H.P., Studer, R., Sure, Y., 2001. Knowledge processes and Semantic Web: the inference web approach. Journal of Web Semantics ontologies.IEEEIntelligentSystems16(1),26–34. 1,397–413. Szabo, A., Merka, J., Narock, T.W., 2009. Multispacecraft observations of solar McGuinness,D.L.,vanHarmelen,F.,2004b.OWLWebOntologyLanguageOver- wind gradients near L1: VHO case studies. American Geophysical Union view, W3C Recommendation, 2004, available at: /http://www.w3.org/TR/ (FallMeeting2009,abstract#SH51B-1271). owl-featuresS/. Szalay, A., Gray, J., 2001. The World-Wide Telescope. Science 293, 2037–2038 Merka,J.,Narock,T.,Szabo,A.,2008a.NavigatingthroughSPASEtoheliospheric (14September2001). and magnetospheric data. Earth Science Informatics 1 (1). doi:10.1007/ Zednik,S.,Fox,P.,McGuinness,DeborahL.,PinheirodaSilva,P.Chang,C.,2009. s12145-008-0004-5(April). Semantic provenance for science data products: application to image data Merka,J.,Szabo,A.,Narock,T.,Walker,R.,King,T.,Slavin,J.,Imber,S.,Karimabadi, H., Faden, J., 2008b. Using the virtual heliospheric and magnetospheric processing. In: Juliana Freire, Paolo Missier, Satya Sanket Sahoo (Eds.), observatoriesforgeospacestudies.EosTransactionsAGU89(53)(FallMeeting Proceedings of the First International Workshop on the Role of Semantic Supplement,AbsractSA51B-05). WebinProvenanceManagement(SWPM2009),WashingtonDC,USA,October Merka,J.,Szabo,A.,2004.Bowshock’sgeometryatthemagnetosphericflanks. 25,2009.CEURWorkshopProceedings526,CEUR-WS.org. JournalofGeophysicalResearch109(A12224).doi:10.1029/2004JA010567. Zednik, S., Fox, P., McGuinness, D.L. System transparency, or how i learned to Michaelis,J.,McGuinness,D.L.,2010.Towardsprovenanceandcollectiveknowl- worry about meaning and love provenance. In: Proceedings of the 3rd edge for web applications. In: Proceedings of the Third International International Provenance and Annotation Workshop (IPAW2010), Troy, NY, ProvenanceandAnnotationWorkshop(IPAW2010),pp.265–273. USA(June15–16,2010),LNCS,inpress. Pleasecitethisarticleas:Narock,T.,Fox,P., Fromsciencetoe-SciencetoSemantice-Science:AHeliophysicscasestudy.Computers& Geosciences(2011),doi:10.1016/j.cageo.2011.11.018

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.