ebook img

Pulsar virtual observatory PDF

0.16 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Pulsar virtual observatory

Proceedingsofthe363.WE-HeraeusSeminaron:“NeutronStars and Pulsars”(Postersand contributedtalks) Physikzentrum BadHonnef, Germany,May. 14−19,2006, eds.W.Becker,H.H.Huang, MPEReport291,pp.223-225 Pulsar Virtual Observatory M. Keith1, B. Harbulot2, A. Lyne1, and J. Brooke2 1 Jodrell Bank Observatory,UniversityOf Manchester, Macclesfield, Cheshire SK11 9DL, United Kingdom 2 School of ComputerScience, Universityof Manchester, Oxford Road, Manchester M13 9PL, United Kingdom 7 0 0 2 n Abstract. ThePulsarVirtualObservatorywillprovidea 2. Data archives a means for scientists in all fields to access and analyse the J large data sets stored in pulsar surveys without specific The Pulsar Virtual Observatory has not been designed 4 knowledge about the data or the processing mechanisms. with any constraints over the data format that is used, This is achieved by moving the data and processing tools andit is intendedthat processingcancovermultiple data 1 v to a grid resource where the details of the processing are formats seamlessly. Data will be indexed by sky position 1 seen by the users as abstract tasks. By developing intelli- asitisexpectedthatuserswillsearchdataatspecificloca- 0 gent scheduling middle-ware the issues of interconnecting tionsofinterest,andsoaccesstomultipledatacatalogues 1 tasks and allocating resources are removed from the user that cover the same sky area is advantageous. 1 domain.This opensuplargesetsofradiotime-seriesdata 0 to a wider audience, enabling greater cross field astron- Initially the data set that will be made available 7 0 omy, in line with the virtual observatory concept. Imple- via the Pulsar Virtual Observatory will be the Parkes / mentationofthe PulsarVirtualObservatoryisunderway, Multi-beam Pulsar Survey (PMPS) raw data archive h utilising the UK National Grid Service as the principal (Manchester et al. 2001). This is the most complete sur- p - grid resource. vey of the galactic plane to date, and provides an ideal o resourcefor finding new pulsars and other transientradio r phenomenon. Data was collected using the 13 beam HI t s (21cm) receiver on the 64m Parkes telescope in Australia a 1. Background and quickly became the most successful pulsar survey to : v date. The archived data covers the galactic plane from Xi The concept of a Virtual Observatory (VO) is to enable 2600 to 500 in galactic longitude and ±50 of galactic lati- r access to astronomical data archives via a generic inter- tude.Therawdataarchiveisaround4.4terabytesinsize, a face (Quin & Gorski 2004). This interface can be some consisting of approximately 35000integrations of 35 min- well specified machine interface, e.g. an XML schema, or utes with a 288 MHz band centred on 1374 MHz. The a direct to user interface, e.g. a HTML interface. The use data archive has been extensively searchedfor period sig- of a generic interface is to reduce the level of domainspe- nals,howevermorepulsarscontinuetobediscoveredwith cificknowledgeaboutthedata,enablingusersfromawide in depth targeted searches and by using more advanced range of backgrounds to access the data. This reduces search tools(Faulkner et al. 2004). theissuesinvolvedinextractingandprocessingdatafrom many resources, allowing scientists to search for answers This data is stored as raw data, as it is taken from across a wide range of data archives. Standardisation of the telescope,givingthe maximumflexibility inanalysing communication protocols allows for intercommunication the data. Working with raw data means that the analysis betweenmanyVOprojects,furtherwidening the scopeof can be performed with the latest algorithms, which are research. constantly being improved. This allows for expansion of The PulsarVirtualObservatoryaimstoapplytheVO the functionality of the Pulsar Virtual Observatory that concepttodatafromarchivedpulsarsurveys.Thiswillen- would not be possible with archives of search results, or able new pulsar science as astronomers and theoreticians partially processed files. The Pulsar Virtual Observatory can access data that would otherwise be hidden. Access can also be expanded by adding new data archives to the will be made as simple as possible by providing a web system.Thiswillenablenewtypesofsearch,e.g.searching interface that canbe used by a standardweb browserap- in differentfrequency bands,as wellas expanding the sky plication. There will also be a range of options to allow coverageavailable.ItisexpectedthatmoreParkessurveys users with different levels of experience to benefit fully canbeaddedtothesystemwhenthedataismadepublicly from the Pulsar Virtual Observatory. available. 224 M. Keith et al.: Pulsar Virtual Observatory Therearesomeissuesregardingdatastoragethatneed is possible for users from outside the immediate scientific to be addressed when developing this system. Currently fieldtoanalysedata,leadingtomorecross-fieldscience,in the rawdataarchivesarestoredonRAID disksatJodrell line with the VO principles. It is intended that there will Bank Observatory. It is intended that these archives be be direct access to the raw data if required, allowing ad- made available to the Pulsar Virtual Observatory via a vanced users to develop their own processing algorithms. simple file transfer interface, however most frequently ac- Telescopedatacanbe searchedwithuserspecifiedpa- cesseddatacanbemovedclosetotheprocessingnodes,to rameters, using a supplied set of search tools. For exam- reduce data transfer times. The design of the Pulsar Vir- ple, a user could select a number of telescope pointings tual Observatorymakesit possible to store data in multi- and perform a Fourier analysis ona time-series generated plelocations,allowingforreplicationofdataforbackupor by the dedispersion of these pointings. Users will be able speedpurposes.Itis intendedthat the system alsoimple- to perform other types of searches, and new algorithms ment a pre-fetching system, where the data is transfered will be added when they are made available. toa locationnearto theprocessingnodepriorto the pro- cessing starting. This is can be achieved by transferring the appropriate data file when the task is entered into a processing queue. This reduces the processortime wasted while waiting for the file to transfer to the node that is performing the computation. 3. Computation resources The Pulsar Virtual Observatory will provide access to compute resourcesaswell asthe largedata archives.Pro- cessing will be made available via a simple web interface, directlylinkedtothe searchabledataarchive.Thisdesign allows users to analyse data without the need to locate and install software to their local systems. This enables greater access to the data as many users do not have the knowledge,skills,resourcesortimerequiredtoinstalland run the appropriate software.Advanced users can still be given the opportunity to customise the software that is being used, or downloadthe data to process by their own means. The current target system is the UK National Grid Service1 (NGS) a gridcomputing resourcefor UK science projects. There is however nothing to prevent the Pulsar VirtualObservatoryusing other resourcesin combination with or instead of the NGS. TheNGSisadedicatedhighperformancegridresource for the UK eScience programme. It comprises four core Fig.1. A simple overview of the search process on the nodesatManchester,OxfordandLeedsuniversitiesandat Pulsar Virtual Observatory. the Rutherford Appleton Laboratories. The NGS is suit- able to be used as our initial target system as it provides The searchparametersareusercustomisable,however highperformancecomputinganddatastoragefacilitiesin pre-defined default values will also be available for many a grid environment. The use of grid facilities makes inte- options. This means that expert users will be able to cus- grationwiththe PulsarVirtualObservatoryeasy,as each tomise the search routines for their specific task, but still site can be accessed via a uniform interface. giving wide access to users with less experience. The sys- BecausethePulsarVirtualObservatorycanusemulti- tem will also provide feedback regarding the status and ple resourcesfor analysingdata,the systemisdesignedin remainingruntime oftasks.This means thatuserswill be suchawaythattheuserdoesnotneedtoknowaboutthe able to better optimise the settings for their jobs. system their jobs are running on. This means that users of the Pulsar Virtual Observatory will not require sepa- rate authentication for each system that processing tasks 4. Implementation are to be run on. By providing a simple user interface, it The core architecture of the Pulsar Virtual Observatory 1 http://www.ngs.ac.uk is based around a three layer design. M. Keith et al.: Pulsar Virtual Observatory 225 The top layer handles user interaction and generates Testing of the basic system is currently in progress, the tasks that are required to process the user-selected and it is expected that the system will soon be usable for data.It is alsoresponsible for scheduling the tasks onthe a selection of trial problems. available compute resources. Compute resources are modelled by a web services 6. Conclusion based queueing system, developed in parallel with the Pulsar Virtual Observatory (Harbulot et al. 2006). The ThePulsarVirtualObservatorywillprovideaccesstodata queueingsystemis aapplicationindependent, i.e.there is and resources in order to enable greater use of archived nocodethatisspecifictothe PulsarVirtualObservatory, pulsarsurveydata.Thiswillallowuserstoanalysepulsar and is platform independent, i.e. there is no code specific data in a simple and uniform manner, without detailed to the compute resources that the tasks are being run knowledge of data formats and processing software, en- on. By having tasks pulled from the queues, rather than couragingmorescience to be carriedout.The PulsarVir- pushedontotheprocessingresources,thequeueingsystem tual Observatory is designed to integrate with other VO does not need any specialist code to interface with par- projects and so enable integration of pulsar data archives ticular resources. This means that all resources, whether with data sets from other astronomical fields. single machines, clusters or grids are viewed through the same interface. Acknowledgements. Michael Keith is funded by the Particle Physics and Astronomy Research Council (PPARC). Thelowlevelcomputationishandledbythefinallayer, Bruno Harbulot acknowledges funding under PPARC which is handled by a series of simple shell scripts. This Project PP/000653/1 (GridOneD). approach allows for deployment on a wide range of sys- tems as it does not rely on any specialist software being installed.Astasksarepulledfromthequeuesystemthere References is no need to run the scripts in a listening mode. This Faulkner,A.J., Stairs,I.H.,Kramer, M., etal., 2004, MNRAS meansthereisnoneedforinboundcommunication,which 355, 147 is often restricted, and the queueing server does not need Harbulot, B., Keith, M., Brooke, J., 2006, to appear in the to know the physical location of the machine that is run- Proceedings of the HPC-GECO/CompGrame workshop ning the processing script. Manchester,R.N.,LyneA.G.,Camilo,F.,etal.,2001,MNRAS Userinterfaceisprovidedviaawebinterfacethatcom- 328, 17 municates directly with the scheduler a database of the Quin, P.J. & Gorski, K.M., 2004, Towards an International available data archives and current process status. This Virtual Observatory will provide abilities to search the data catalogues and submit new jobs, as well as monitor progress of running jobs. The webinterface will also providetools for viewing the results of the data analysis, although the user may download the results to analyse with their own software. 5. Current status The Pulsar Virtual Observatory is not yet fully opera- tional, however a large fraction of the implementation is complete. There is a searchabledatabaseofthe data availablein thesystem,whichisaccessibleviathewebinterface.Users can select this data and select and customise processing to be performedon the data. This work is then scheduled on the available processing nodes via a simple scheduling algorithm. Processing algorithms have also been success- fully deployed on the NGS, and test workflows have been completed. Many issues remain to be resolved however, includ- ing error handling for failed jobs, user notification for job status updates and remaining runtime estimation. There is still a lot of development work that is required to get the output from the processing software back to the user, and to provide satisfactory visualisation tools. Data pre- fetching is not yet implemented.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.