ebook img

SoFiA: a flexible source finder for 3D spectral line data PDF

1.3 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview SoFiA: a flexible source finder for 3D spectral line data

Mon.Not.R.Astron.Soc.000,1–9(2013) Printed19January2015 (MNLATEXstylefilev2.2) SoFiA: a flexible source finder for 3D spectral line data Paolo Serra,1(cid:63) Tobias Westmeier,2 Nadine Giese,3 Russell Jurek,1 Lars Flo¨er,4 At- tila Popping,2,5 Benjamin Winkel,6 Thijs van der Hulst,3 Martin Meyer,2 Ba¨rbel S. Koribalski,1 Lister Staveley-Smith2,5 and He´le`ne Courtois7 1CSIROAstronomyandSpaceScience,AustraliaTelescopeNationalFacility,POBox76,Epping,NSW1710,Australia 2ICRAR,M468,TheUniversityofWesternAustralia,35StirlingHighway,Crawley,WA6009,Australia 3UniversityofGroningen,KapteynAstronomicalInstitute,Landleven12,NL-9747AD,Groningen,theNetherlands 5 4Argelander-Institutfu¨rAstronomie,AufdemHu¨gel71,53121Bonn,Germany 1 5ARCCentreofExcellenceforAll-skyAstrophysics(CAASTRO) 0 6Max-Planck-Institutfu¨rRadioastronomie,AufdemHu¨gel69,53121Bonn,Germany 2 7Universite´Lyon1,CNRS/IN2P3,InstitutdePhysiqueNucle´aire,Lyon,France n a J Accepted2014December30.Received2014December22;inoriginalform2014July28 6 1 ABSTRACT ] M We introduce SoFiA, a flexible software application for the detection and parameterization ofsourcesin3Dspectral-linedatasets.SoFiAcombinesforthefirsttimeinasinglepieceof I softwareasetofnewsource-findingandparameterizationalgorithmsdevelopedonthewayto . h futureHIsurveyswithASKAP(WALLABY,DINGO)andAPERTIF.Itisdesignedtoenable p thegeneraluseofthesenewalgorithmsbythecommunityonabroadrangeofdatasets.The - keyadvantagesofSoFiAaretheabilityto:searchforlineemissiononmultiplescalestode- o r tect3Dsourcesinacompleteandreliableway,takingintoaccountnoiselevelvariationsand t thepresenceofartefactsinadatacube;estimatethereliabilityofindividualdetections;look s a forsignalinarbitrarilylargedatacubesusingacatalogueof3Dcoordinatesasaprior;provide [ awiderangeofsourceparametersandoutputproductswhichfacilitatefurtheranalysisbythe user.WehighlightthemodularityofSoFiA,whichmakesitaflexiblepackageallowingusers 1 toselectandapplyonlythealgorithmsusefulfortheirdataandsciencequestions.Thismod- v 6 ularitymakesitalsopossibletoeasilyexpandSoFiAinordertoincludeadditionalmethods 0 astheybecomeavailable.ThefullSoFiAdistribution,includingadedicatedgraphicaluser 9 interface,ispubliclyavailablefordownload. 3 0 Keywords: methods:dataanalysis. . 1 0 5 1 1 INTRODUCTION studied. We illustrate this point in Fig. 1, where we show a data v: Thedetectionofastronomicalsignalaboveinstrumentalnoiseisa cubeoftheATLAS3D HIsurvey(Serraetal.2012).Inthisfigure, i the central object is bright (and therefore easy to detect) but has X crucial aspect of all astronomy observations. The techniques em- acomplex3Dstructure,includingalowsurfacebrightnessexten- ployedtodetectandcharacterisethissignaldependonthetypeof r siontowardslargeRA.Onthecontrary,thetopobjectisbrightand a databeinganalysed(seeMasiasetal.2012forareview).Standard relatively simple as emission is confined within a small range of methodsandtoolshaveemergedinfieldswithalargecommunity RAandDec.Finally,thebottomobjectisthetypicalcaseofare- basesuchas2Dimaging(e.g.,SExtractor;Bertin&Arnouts solved,edge-ongalaxywherethetwopeaksofthedouble-hornve- 1996)and1Dspectroscopy(e.g.,GANDALF;Sarzietal.2006).In locityprofileareclearlyvisible,anddetectionofthefaintemission other fieldswith relatively fewer users,detection algorithms vary between the two peaks is challenging. An ideal 3D source finder significantlybetweenprojects.Thisisthecaseforstudiesbasedon shouldbeabletodetectandparameterizeallthesedifferentsources 3D spectral line data (for brevity, data cubes), where the flux of inacompleteandreliableway. aspectrallineismappedasafunctionofpositionontheskyand line-of-sightvelocityoftheemittingmatter. Radio single dishes and interferometers have traditionally The diversity of source finding methods for data cubes is at beenthemostcommontelescopesusedtoconstructdatacubes(al- leastpartlyduetothediversityof3Dstructureofthesourcesbeing thoughopticalintegral-fieldspectrographsarenowalsogenerating largenumbersofsuchcubes–e.g.,Cappellarietal.2011;Croom etal.2012;Sa´nchezetal.2012).Theupgradeandcontinuingoper- (cid:63) E-mail:[email protected] ation of existing radio telescopes, as well as the construction of (cid:13)c 2013RAS 2 PaoloSerraetal. Input data cube Select sub-cube Flagbad voxels Input Apply weights weights cube S R E Noisenor- LT Convolution 2D-1D malisation FI Figure1.Volumerenderingofan HI datacubeshowingthatindividual sourceshaveacomplexanddiverse3Dstructure.Thismakestheirdetection andaccurateparameterizationchallenging. RCEERS Threshold CNHI OUND finder S+Cfinder finder SFI the Square Kilometre Array and its precursors, are leading to a rapidincreaseinthenumberandsizeofdatacubes.Standardand sufficientlygeneralsource-findingtoolswillbenecessarytoanal- Merge Input yse these data, and recent work has started addressing this need detected mask (see,e.g.,DuchampbyWhiting2012).Inthispaperweintroduce voxels cube SoFiA,anew,flexibleSourceFindingApplicationfordatacubes whichcombinesdetectionalgorithmsandtechniquesfromseveral sourcefinders. Rejectfalse detections SoFiA is designed to work on any data cube independent of telescope or observed spectral line. However, its development is part of preparatory work for a few specific, upcoming HI sur- Optimize veys: WALLABY, a blind HI survey of 3/4 of the entire sky out mask toz∼0.25tobecarriedoutwiththeAustralianSquareKilometre ArrayPathfinder(ASKAP;seeKoribalski2012a);DINGO,adeep HIsurveyouttoz ∼ 0.4(alsotobecarriedoutwithASKAP;see Parame- terize Meyer2009);andtheHIsurveysplannedforAPERTIF(Verheijen etal.2008).Thispreparatoryworkhasresultedinthedevelopment ofanumberofnewsource-findingalgorithms,whicharedescribed Filter inaseriesofpapersreferredtointhenextSection(forasummary output seeKoribalski2012b).SoFiAputsthesedifferentalgorithmsto- getherforthefirsttimeinacoherent,flexibleandpubliclyavailable pieceofsoftware. SoFiA can be obtained from https://github.com/ SoFiA-Admin/SoFiA.Onthesamewebpageweprovidealist Source Output Moment Single- mask source of requirements, installation instructions and a user manual. The catalogue cube maps products aimofthispaperistodescribehowSoFiAoperatesondatacubes andtherebyprovideareferenceforcurrentandfutureusers. Figure2.SoFiAflowchart.Wehighlightthe“Filteroutput”modulewith adashedboxasthiswillbecomeavailableinfuturereleasesofSoFiA. 2 DESCRIPTIONOFSoFiA cube(orasub-cubeselectedbytheuser)isloaded,thesemodules allowusersto: SoFiAisamodularapplicationwhoseaimistodetectandparam- eterize sources in a data cube. The flowchart in Fig. 2 shows the • modifytheinputcubebyapplyingflags,weights,orasetof variousmodulesthatuserscanchoosetouse(ornottouse),inthe filters; order in which they are executed by SoFiA. Once an input data • detectthespectrallinesignal; (cid:13)c 2013RAS,MNRAS000,1–9 SoFiA3Dsourcefinder 3 switchedoff.Alternatively,itcouldbegivenanumberofrelatively morecomplextaskssuchas,forexample,applyingawaveletfil- teringalgorithm,rejectingfalsedetectionsorfittingmodelstothe spectrumofthedetectedsources. WhileSoFiAwillcontinuetobeimproved,thisbasicprinci- pleofmodularitywillnotchange.Therefore,althoughthispaper describes the software as it is at the time of writing and new al- gorithms may be introduced in the future, the main workings of SoFiAwillremainasillustratedhere. 2.1 Datacube,weightscube,maskcubesandfilters Four different types of input and/or output cubes are relevant at differentstagesofSoFiA. • Datacube,whichincludessignalfromastronomicalsources superimposedoninstrumentalnoise(anderrors). • Weights cube, which allows users to weight voxel values to takeintoaccount,e.g.,noiselevelvariationsacrossthecubeorthe presenceofimagingartefactsincertainregionsofthedatacube. • Binary mask, where detected and non-detected voxels have valuesof1and0,respectively. • Objectmask,wherenon-detectedvoxelshaveavalueof0and Figure3.ScreenshotoftheSoFiAGUI.TheGUIadoptsautomatically detected voxels have an integer value corresponding to the ID of thenativestyleofthewindowmanagerusedonthesystemwhereSoFiA theobjecttheybelongto. isinstalled.InthisfigureweshowtheGUIasitappearsonaKubuntuLinux system.TheGUIalsoofferstheoptionofdisplayingthesourcecatalogue Allsource-findingalgorithmsimplementedinSoFiAandde- generatedbySoFiAandincludesahelpbrowserthatexplainstheavailable scribed in Sec. 2.2 below assume that the noise level is uniform parametersettings. across the data cube. Therefore, noise variations caused by, e.g., mosaicking or frequency-dependent flagging need to be removed first.ThiscanbedonewithinSoFiAbymeansofaweightscube • identifysourcesbymergingdetectedvoxelstogether; inversely proportional to the noise level. SoFiA removes noise • rejectfalsedetections; variationsbymultiplyingthedatacubebytheweightscube.Once • optimizethemaskofindividualsources; sourcedetectioniscompleted,SoFiAwillundothisoperationbe- • measuresourceparameters; foremeasuringsourceparameters.Theweightscubecouldalsobe • filtertheoutputbyselectingaregionofinterestinsourcepa- usefultodown-weightregionsofadatacubeaffectedbyimaging rameterspace; artefacts(e.g.,cleaningorcontinuum-subtractionresiduals). • and produce output catalogues as well as cubes, moment Theweightscubecanbeprovidedbytheuser.Alternatively, maps,position-velocitydiagramsandintegratedspectra. userscanprovideananalyticdescriptionoftheweightsvariation Individualmodulesaredescribedinmoredetailintherestofthis across thecube. Finally, a weightscube inversely proportional to Section. They are written in either Python or C++ and rely on a thelocalnoiselevelcanbederivedbySoFiAandappliedtothe rangeofexternallibraries,includingNumPyandSciPy(Jonesetal. datacube.Theevaluationofthelocalnoiseleveliscarriedoutin- 2001; Walt, Colbert & Varoquaux 2011), Cython (Behnel et al. dependently along any or all of the three axes of the data cube. 2011),Astropy(AstropyCollaborationetal.2013),theGNUSci- Forexample,ausermaywishtoremovenoisevariationsalongthe entificLibrary1and,optionally,matplotlib(Hunter2007).Provided frequencyaxisalone,undertheassumptionthatthenoisedoesnot thattheselibrariesareavailable,SoFiAcanrunonallmachines varywithineachfrequencyplane. withaUnixorLinuxoperatingsystem(including,e.g.,MacOSX WenotethatSoFiAmeasuresthenoisewithinadatacubeat and Ubuntu). We refer to the SoFiA webpage for up-to-date de- variousotherstagesoftheprocessing.Differentmethodsofnoise tails. measurementareimplementedanduserscandecidewhichoneis SoFiA can be executed from the command line or using a moreappropriatefortheirpurpose.Possiblechoicesare:i)standard dedicated graphical user interface (GUI) based on the Qt library deviation;ii)medianabsolutedeviation;andiii)standarddeviation (seeFig.3).Bothmethodsallowuserstoselectwhichcombination ofazero-centredGaussianfittothenegativesideofthefluxhis- oftheabovemodulesandwhichsourcefindingandparameteriza- togram. tionalgorithmstouse.ThisselectionisdoneusingeithertheGUI The calculation and application of the inverse-noise weights or a plain text parameter file (if running SoFiA from the com- cube described above is part of a more general SoFiA module mandline),allowingthesource-findingstrategyanditscomplexity which allows users to apply a filter to the data cube before run- to be optimized for the type of data and sources of interest. For ningtheselectedsource-findingandparameterizationalgorithms. example, SoFiA could be asked the simple question of creating AsindicatedinFig.2,thismoduleincludestwoadditionalfiltering amoment-0imageofallvoxelsaboveagiventhresholdinadata methods:firstly,theconvolutionwitha3Dkernelwhoseshapecan cube – in which case most of SoFiA’s functionalities would be be chosen among a few options and whose size can be specified bytheuser;andsecondly,the2D-1Dwaveletde-noisingalgorithm developedbyFlo¨er&Winkel(2012).Thisalgorithmprocessesthe 1 http://www.gnu.org/software/gsl/ twospatialdimensionsandthespectraldimensionofthedatacube (cid:13)c 2013RAS,MNRAS000,1–9 4 PaoloSerraetal. separately,andreturnsanoise-freedatacubereconstructedusing todeliverhighercompletenessandreliabilitythanS+CandCNHI only wavelet coefficients above a specified threshold. Additional forsourcesunresolvedonthesky,especiallyatnarrowlinewidths. filteringoptionsmaybeprovidedinfuturereleases. Incontrast,theS+Cmethodisbyconstructionwellsuitedtofind- AsindicatedbytheflowchartinFig.2,portionsofthecube ingsourcesonavarietyofscalesandPoppingetal.(2012)deemit canbeblankedout(flagged)priortosourcefinding.Thismaybe thebestchoiceforextendedobjects. necessaryatthelocationofverybrightcontinuumsourceswhose Wenotethatmanyofthealgorithmshavebeenimprovedsince spectrumwasnotsubtractedproperlyfromthedata,oratchannels thecomparativestudyofPoppingetal.(2012).Additionaltesting dominatedbylineemissionfromtheGalaxyoraffectedbystrong cannowbecarriedoutwithinSoFiAandwillbeusedtoinvesti- radiofrequencyinterference. gatehowtofurtherimprovetheirperformance.Untilthen,werefer Finally, mask cubes are generally calculated within SoFiA to the aforementioned papers for a complete discussion of these (seebelow)butcanalsobeprovidedbytheuser.Thelattercould methods. bedesirableifauser,followinganinitialsource-findingrun,wishes Alltheabovealgorithmsreturnabinarymaskofdetectedvox- tolookforadditionalsourceswithadifferentsearchalgorithmor els(andanyadditionalsource-findingalgorithmcouldbeaddedto parameters. In this case the new sources are added to the initial, SoFiAaslongastheysatisfythiscondition).Asanexample,the inputmask.Alternatively,aninputmaskcouldbeusedifsources toppanelsofFig.4showfivechannelsextractedfromthedatacube havealreadybeenidentifiedandonlysubsequentparameterization inFig.1and,withblackcontours,theregionsincludedinthebinary stepsarerequired. mask.InthiscasetheS+Cfinderwasemployedusing12different smoothingkernels.Therelativelylowadoptedthreshold(3.5σ)re- sultsinanumberofnoisepeaksbeingincludedinthemask.We 2.2 Detectionofspectrallinesignal comebacktothispointinSec.2.4. SoFiA is meant to offer a number of detection algorithms that userscanchoosefrom.Acommonadvantageofthesealgorithms 2.3 Mergingdetectedvoxelsintosources istheabilitytolookforemissiononmultiplescales,whichises- sentialtodetectsourcesin3D(seeFig1).Anexceptionisthesim- Theaforementionedbinarymaskisthebasisforidentifyingindi- plethresholdmethod(seebelow),unlessusedincombinationwith vidualsourcesorobjects.InSoFiA,thiscomputationallyexpen- someofthefilteringmethodsdescribedabove(e.g.,2D-1Dwavelet siveoperationisperformedusingtheC++implementationofthe denoising).ThefollowingalgorithmsareimplementedinSoFiA. Lutz(1980)one-passalgorithmbyJurek(2012),combinedwitha sparse representation of 3D objects. We refer to Jurek (2012) for • Simplethreshold.Thisisthesimplestpossiblealgorithm(and detailsonthisimplementation.Hereitissufficienttosaythatthis theonlyonenotoperatingonmultiplescales):onlyvoxelswhose algorithmproducesthesameresultasafriends-of-friendsmethod absolute value is above a specified threshold are detected. Users withlinkingelementequaltoanellipticcylinder.Userscanspecify canspecifythethresholdinfluxunitsorrelativetothenoiselevel. thecylindersize.ThisstepofSoFiAalsoreturnsbasicsourcepa- • S+C.Thisisthesmooth+clipalgorithmdevelopedbySerra rameterssuchastotalflux,peakflux(bothnormalisedbythenoise et al. (2012) on the basis of techniques traditionally used within level)andsize. the HI community. It consists of searching for emission at mul- ThebottompanelsofFig.4showtheobjectscreatedfromthe tipleangularandvelocityresolutionsbysmoothingthedatacube binarymaskusingamergingcylinderwitharadiusof3pixelsand with 3D kernels specified by the user. At each resolution, voxels a height of 7 channels (we show only objects with positive total are detected if their absolute value is above a threshold given by flux).Thesepanelsshowfourrealdetectionsaswellasanumber theuser(innoiseunits).Thefinalmaskistheunionofthemasks ofpositivenoise-peakobjects.Itisworthhighlightingthesuccess- constructedatthevariousresolutions. fuldetectionofafaint,extendedHItaileastofthebrightestgalaxy • CNHI.ThisalgorithmwasdevelopedbyJurek(2012).Indi- (second panel from the left). This detection is made possible by vidual 1D spectra (or bundles of adjacent spectra) are extracted thefactthatSoFiAlooksforemissiononmultiplescales.Further- from the data cube. For each of them, the Kuiper test is used to more, SoFiA correctly identifies as a single source the resolved, identifyregionsofthespectrumwhicharenotconsistentwithcon- edge-ongalaxylocatedinthesouthernpartofthecube(visiblein tainingonlynoise.Inpractice,usersneedtoprovideaprobability allpanelsbutthefirst)despitethelowlevelemissionatchannels thresholdabovewhichaspectralregionisconsidereddetectedand closetothesystemicvelocity. isaddedtothefinalbinarymask. Thenumerouspossiblecombinationsofthesesource-findingmeth- 2.4 Reliabilityandrejectionoffalsedetections odstogetherwiththefilteringalgorithmsdescribedinSec.2.1al- lowuserstodesignanumberofdifferentstrategiestodetectsignal Alldetectionalgorithmslistedaboverequireuserstospecifyade- intheirdatacube.Forexample,theCNHIfindercouldberunfol- tectionthreshold.Thecloserthisthresholdistothenoise,themore lowing convolution with a 3D kernel appropriate for the type of noise peaks will be included in the resulting binary mask. Some sources being searched. Alternatively, a simple threshold method of these noise peaks may be identified as separate objects if they couldbeusedafterthenoisehasbeenremovedfromthecubeby aresufficientlyfarfromarealobject(seebottompanelsofFig.4). the2D-1Dwaveletfilter. SoFiA offers two ways of removing these false detections from Popping et al. (2012) discuss strengths and weaknesses of thefinaloutput. thesealgorithmsandcomparetheirperformance.Animportantrec- Thefirstmethodisasimplesizefilterandisbasedonthefact ommendationofthatworkisthatallsourcefindersshouldincorpo- thatallrealdetectionsareatleastaslargeasthedatacube’sresolu- ratesomeformof3Dsmoothinginordertoincreasecompleteness. tion.Inpractice,userscanspecifytheminimumacceptablesource Inthisrespect,thesimplethresholdalgorithmisoflimiteduseun- size along each axis of the cube independently. The downside of less coupled with a filtering methods such as the 2D-1D wavelet thismethodisthatitmaypotentiallyremoverelativelybrightbut de-noising.Poppingetal.(2012)findthisparticularcombination unresolvedsourcesfromthefinalobjectmask. (cid:13)c 2013RAS,MNRAS000,1–9 SoFiA3Dsourcefinder 5 1110.3 km/s 1234.0 km/s 1357.7 km/s 1481.3 km/s 1605.0 km/s +62.3° +62.2° Dec (J2000)+62.1° +62.0° +61.9° 1110.3 km/s 1234.0 km/s 1357.7 km/s 1481.3 km/s 1605.0 km/s +62.3° +62.2° Dec (J2000)+62.1° +62.0° +61.9° 180.8° 180.6° 180.4° 180.2° 180.8° 180.6° 180.4° 180.2° 180.8° 180.6° 180.4° 180.2° 180.8° 180.6° 180.4° 180.2° 180.8° 180.6° 180.4° 180.2° RA (J2000) RA (J2000) RA (J2000) RA (J2000) RA (J2000) Figure4.IllustrationofthedetectionofsignalandidentificationofindividualsourcesinSoFiA.Toppanels.Channelmapsextractedfromthedatacube showninFig.1.Theline-of-sightvelocityofeachchannelisindicatedinthetop-leftcorner(notethatthesearenotadjacentchannelsintheoriginalcube). Thebeamisshowninthebottom-leftcorner.Blackcontoursshowregionsincludedinthebinarymask(Sec.2.2).Bottompanels.Samechannelmapsasin thetoppanelsbutnowshowingtheindividualobjectsformedonthebasisofthebinarymask(Sec.2.3).Weshowonlyobjectswithpositivetotalflux.Each objectisindicatedwithadifferentrandomcolour.Blackcontoursindicatethefourobjectswhosereliabilityishigherthan99percent(Sec.2.4). The second method is illustrated by Serra, Jurek & Flo¨er However, experience shows that masks can miss the faint, outer (2012)andestimatesthereliabilityofindividualobjectsbycom- edge of objects, in particular if obtained with a high detection paringthedistributionofpositiveandnegativesources(i.e.,sources threshold.Thiswouldintroducesystematiceffectsinthemeasured with positive and negative total flux, respectively) in parameter parameters(e.g.,thetotalfluxwouldbeunderestimated;seeWest- space.Thesimpleideaisthatthedistributionofpositiveandneg- meier,Popping&Serra2012).Topreventthis,SoFiAofferstwo ativenoisepeaksshouldbeidenticalwhilepositive,realdetections maskoptimizationmethodswhichmodifytheobjectmaskcubeby should not have a negative counterpart in parameter space. It is growingthemaskswhichdefineindividualobjects.Inbothmeth- basedontheassumptionsthatthenoiseissymmetricandthatreal ods,themaskisgrownindependentlyforeachobject. sourceshavepositivetotalflux(i.e.,absorptionlinesourceshave beenmasked). Thefirstmethodismostlyappropriateforsourcesthatareun- WithinSoFiA,thereliabilitycanbecalculatedfollowingthe resolvedontheskyor,ifresolved,face-onandsymmetric.Itstarts runofanysource-findingalgorithmchosenbytheuseraslongas byfittinganellipsetothemoment-0imageoftheobject.Theel- both positive and negative noise peaks are included in the binary lipseisthenusedasamaskforallvelocitychannelsoccupiedby mask,andafterthedetectedvoxelsaremergedintosources(Sec. theobject–i.e.,theinitialmask,whichgenerallyhasanarbitrary 2.3).Thereliabilitycalculationalsorequiresthatasufficientnum- 3Dshape,isconvertedintoanellipticcylinder.Finally,thesizeof berofnegativenoisepeaksisincludedinthemasksuchthattheir theellipseisincreaseduntilamaximumintotalfluxisreached(a distributioninparameterspacecanbestudiedmeaningfully.Users similarmethodisdescribedbyBardenetal.2012inthecontextof canselecttoproducediagnosticplotsonthereliabilitycalculation 2Dimaging). similartothoseshowninSerra,Jurek&Flo¨er(2012).Theblack The above method should in principle be applied only to contoursinthebottompanelsofFig.4highlightobjectswhosere- sourceswhichfillmostofthecylindricalmaskinallchannels(see liabilityishigherthan99percent. above),whileforobjectswithamorecomplex3Dstructureitcan Insummary,userscandecidetorunSoFiAwithahighdetec- resultinadecreaseoftheintegratedsignal-to-noiseratio.Forthis tionthreshold,resultinginareliablebutpossiblyincompletecata- reasonweprovideasecondmaskgrowthmethod.Thisconsistsin logueofdetections;buttheycanalsodecidetodigdeeperintothe performingabinarydilationoftheinitialmaskalongthetwospa- noiseusingalowerthreshold,andsuccessivelyremovefalsedetec- tialaxesofthedatacubeusinga2Ddilationstructuringelement tions. In the latter case, a reliability value can be returned for all whoseshapeapproximatesacircle.Thesizeofthestructuringele- positivedetections. mentisincreasediterativelyuntilthetotalfluxconverges(i.e.,until therelativefluxgrowthbetweensuccessiveiterationsislowerthan a threshold specified by the user). This method preserves the 3D 2.5 Maskoptimization shapeoftheinitialmask.Inadditiontogrowthalongthetwospa- SoFiAmeasurestheparametersofallsources(e.g.,totalflux,size, tialaxes,thisalgorithmcanalsogrowallmasksbyafixednumber line width) considering only voxels included in the mask cube. ofchannels(selectedbytheuser)alongthefrequencyaxis. (cid:13)c 2013RAS,MNRAS000,1–9 6 PaoloSerraetal. +62.2° 400 0.20 1 Dec (J2000)+62.1° velocity (km/s) 0 1200 1300 F(Jy)000...011505 0 1 1 0.00 180.7° 180.6° 180.5° 180.4° 180.7° 180.6° 180.5° 180.4° –0.1 –0.05 0 0.05 0.1 1000 1100 1200 1300 1400 RA (J2000) RA (J2000) offset (deg) velocity(km/s) Figure5.DataproductsforthebrightestgalaxyinthecubeshowninFig.1.Fromlefttoright:moment-0image;moment-1velocityfield;position-velocity diagramalongthemorphologicalmajoraxis(PA=32◦);andintegratedspectrum(filledcircles)withthebest-fittingbusyfunctionoverlaid(solidline).Inthe latterpanelthedottedanddashedlinesindicatethelinewidthsW50andW20estimatedat50and20percentofthepeakflux,respectively,onthebasisofthe busyfunctionfit. 2.6 Sourceparameterization 2.8 PerformanceofSoFiA As mentioned in Sec. 2.3, basic source parameters are measured InthecurrentimplementationofSoFiAtheentireinputdatacube whendetectedvoxelsaremergedintoobjects.Thesecanbeused (or the selected sub-cube; see Fig. 2) is loaded into memory and toestimatethereliabilityofeachsourceandrejectfalsedetections processed on a single core. Additional cubes will also need to inparameterspace.AftermaskoptimizationSoFiAre-computes be stored in memory at various stages of processing, such as the thoseparametersandmeasuresadditionalones.Theseinclude:po- weightscube,thebinaryorobjectmaskcube,andasmoothedver- sition (both geometric and centre of mass); total flux; minimum sion of the data cube if required by the source-finding algorithm and maximum voxel value; size and bounding region along each beingused(e.g.,S+C),plusapotentiallylargearrayofsourcepa- axis; line width measured using different methods (including the rameters.Itisthereforeinterestingtodiscusshowthememoryre- one proposed by Courtois et al. 2009); results of an ellipse fit to quirementandexecutiontimeofSoFiAvarywithcubesize.For themoment-0image;resultsofabusyfunctionfittotheintegrated thispurposewemakeuseoftwocubes.Thesmallercubeistheone spectrum(foradescriptionofthebusyfunctionseeWestmeieretal. usedforillustrationpurposeinthispaper(Figs.1and4).Ithas360 2014).Theseparametersareprovidedbothina“raw”format(i.e., pixelsalongbothspatialaxesand150channelsalongthefrequency coordinatesinpixelunits,fluxesindataunits,line-widthinchan- axis,resultinginafilesizeof78MB.Thesecondcubeistheone nels) as well as converted into more useful units (e.g., WCS co- usedforthesource-findingtestsofSerra,Jurek&Flo¨er(2012)and ordinatesandstandardfluxandvelocityunits).Someofthesepa- Westmeier, Popping & Serra (2012). It too has 360 pixels along rameterizationstepsareoptionalandimplementationofadditional bothspatialaxesbutconsistsof1464channelsalongthefrequency parametersisstraightforwardwithinthecode. axis.Therefore,itssizeis∼10timesthatofthefirstcube. We process the two cubes with identical settings employing a representative combination of the algorithms described in this paper: noise normalisation along the frequency axis; S+C source finding with 12 smoothing kernels; merging of detected voxels 2.7 Outputproducts into sources; calculation of reliability and removal of unreliable Users can decide what output SoFiA should produce. Available sources; optimization of the mask of individual objects using the outputproductsinclude: dilation method; source parameterization including busy function fit;creationofoutputproductsforthecubesasawholeandforthe • Catalog of objects and their parameters, both in ASCII and individualdetections;creationofASCIIandXMLcatalogues.The VO-compliantXMLformat. tworunsarecarriedoutonamachinerunningLinuxMint17with • Finalobjectmask. amemoryof16GBanda2.9GHzIntelCoreprocessor. • Moment 0 and 1 images of the sky area covered by the full Figure6showsthememoryusageofSoFiAasafunctionof datacubedeterminedfromthedatawithinthemask. time for the two cubes. Both axes of the plot are normalised by • Cut-out data cubes containing individual objects as well as thecubesize.Thetimebehaviourofthetwocurvesappearsvery their corresponding mask, moment 0, 1 and 2 images, integrated similar,indicatingthattheexecutiontimescalesapproximatelylin- spectrum and position-velocity diagram along the morphological earlywithcubesizewithintherangeexploredhere.Thememory majoraxis.AnexampleoftheseproductsisshowninFig.5. offsetbetweenthetwocurvesisduetotheloadingintomemoryof anumberoflibrariesusedbySoFiA.Thesecomewithamemory Infuturereleasesitwillbepossibletoproducetheseproductsfor overheadoftheorderofafewtensofmegabytes,whichismore just a subset of the detections by selecting a region of interest in noticeableinthecaseofasmallerdatacube.Fordatacubesmuch sourceparameterspace. larger than this overhead the memory usage is between 2 and 3 This output is designed to not only give useful information timesthesizeofthecube,withoccasionalpeaksbetween3and4 aboutthedetectedsourcesbutalsotoenablefurther,higher-level timesthecubesize. analysisbytheuser.Forexample,thecut-outcubesofindividual Figure6allowsustoinvestigatethememoryandprocessing objectsandthecorrespondingmaskscouldbeusedtomeasuread- timetakenbythevariousalgorithms.Inthiscase,mostofthetime ditional source parameters not included in SoFiA or to produce istakenbytheS+Cfinder.Thebeginningandendofitsexecution Gauss-Hermitevelocityfieldstoenablekinematicalstudies. aremarkedbyopenandfilledblackcirclesforthesmallandlarge (cid:13)c 2013RAS,MNRAS000,1–9 SoFiA3Dsourcefinder 7 5 sizecube=79MB sizecube=759MB 4 e b u ec3 z si / y r o 2 m e m 1 0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 time/size (s/MB) cube Figure6.MemoryusageasafunctionoftimefortworunsofSoFiAontwocubeswhosesizesdifferbyafactor∼10(seelegend).Asexplainedinthetext, bothrunsincludetheuseoftheS+Csourcefinder,andopenandfilledblackcirclesindicatethebeginningandtheendoftheS+Cexecutionforthetwocubes, respectively.SeeSec.2.8foramoredetaileddiscussionofthetimetakenbyotheralgorithmsduringthetwoSoFiAruns. datacube,respectively.ThepeaksinmemoryusageofS+Ccorre- thatitallowsuserstosearchforemissioninanynumberofsmall spondtothesmoothingoperations,whiletheplateausinbetween sub-cubescentredatasetof3Dcoordinateswithinanarbitrarily peakscorrespondtothenoiselevelcalculation.Eachadditionalfil- large data cube. For example, in an era of large HI and optical ter would contribute another ∼ 0.05 s/MB to the execution time redshiftsurveys,thismodecouldbeusedtolookforemissionin inthiscase.Thenoisecalculationappearstobeparticularlytime alarge HI cubeatthelocationofgalaxiesincludedinanoptical consuming.Inthiscase,itiscarriedoutusingtheaforementioned spectroscopiccatalogue. Gaussianfittothenegativesideoffluxhistogram(Sec.2.1).This ThismodeisfullyintegratedinSoFiAandinterestedusers calculationusesthefullcube,andanobviouswaytoincreaseits needtosimplyprovidetheinputdatacubeandacatalogueof3D speedwouldbetocalculatethenoiselevelonasub-cube.Thisop- coordinates.SoFiAwillprocessthevariouspositionssequentially, tionmaybecomeavailableinthefuture. eachtimeloadingintomemoryonlythesub-cubeofinterest.The ThetimebeforethebeginningoftheS+CfinderinFig.6is 3Dsizeofthesub-cubescanbesetbytheuserandisthesamefor takenbythenoisenormalisationalongthefrequencyaxisandby allpositions.Userscanalsorequestthecreationofasingleoutput an initial measurement of the noise level in the normalised cube. catalogueofsources,whichisgeneratedbymergingthecatalogues The time after S+C is taken by all other algorithms listed above, obtainedateachposition. andthesearetypicallymuchfaster.Thememorypeakrightatthe endofS+Ccorrespondstothemergingofvoxelsintosources.The heightofthispeakdependsonthenumberofsourcesdetected.This 2.10 Comparisontoothersourcefinders isfollowedbythecalculationofthereliabilityandtherejectionof unreliablesources,whicharebothrelativelyinexpensiveinterms Anumberofestablishedsoftwarepackagesforthereductionand of memory but can be time consuming. The final memory peaks analysis of interferometric data allow some source finding to be correspondtothecreationofmomentimages. carriedoutondatacubes(e.g.,GIPSY,Miriad).However,this approachrequiresuserstodevelopcustomcodeswhichmakeuse of (and are limited by) the tasks available within those general- purposepackages.Incontrast,themorespecialisedSoFiAoffers 2.9 Source-findingbasedonacatalogueof3Dcoordinates awiderangeofready-to-usesource-findingalgorithms,whichare The above discussion makes it clear that SoFiA is currently not alreadyintegratedwithoneanotherandcanbecombinedinaflex- able to process arbitrarily large data cubes but is limited by the iblewaytoproduceavarietyofoutputproducts. memoryofthesystemonwhichitisrun.Thisproblemispartially Theother3Dapplicationforspectrallinedatawhichshares alleviatedbythefactthatSoFiAisabletolimittheprocessingto some of these characteristics is Duchamp (Whiting 2012). This asub-cubewhoseboundariesarespecifiedbytheuser.Therefore, applicationdetectssourcesusingasimplethresholdmethod(sim- users could choose to run SoFiA multiple times on sufficiently ilartotheonedescribedinSec.2.2)andthengrowsthemusinga small portions of a large input cube, obtaining individual output secondarythreshold.Thisalgorithmdiffersfromthoseavailablein productsforeachofthem.Theycouldthencombinetheseproducts, SoFiAand,inthisrespect,thetwopackagescouldbeseenascom- creatingforexampleasinglemaskorcatalogue.Inthefuturewe plementary(Poppingetal.2012showsthatDuchamphasthebest may be able to offer such breaking up of a large input cube into performance for unresolved sources but does not reach the com- sub-cubes–andthecreationoffinaldataproductsforthefullcube pleteness of S+C for resolved sources). With respect to memory –asaprocessingmodefullyintegratedwiththeothermodulesof requirements, Duchamp is similar to SoFiA in that it loads and SoFiA. processesinonecorethefullinputdatacube.Therefore,ittoois Inthiscontext,ausefulfeaturealreadyavailableinSoFiAis limitedbythememoryofthesystemonwhichitisrun. (cid:13)c 2013RAS,MNRAS000,1–9 8 PaoloSerraetal. AsignificantadvantageofSoFiAcomparedtoDuchampis sources.Thisoutputisdesignedtobothprovideausefuldescrip- that it offers a larger number of algorithms for both source find- tionofthesourcesaswellasfacilitatesubsequentanalysis. ingandparameterization.ThisincludestheS+CandCNHIfinders, WehighlightthemodularityofSoFiA,whichallowsusersto the2D-1Dwaveletde-noising(denoisingisavailableinDuchamp optimize the source-finding and parameterization strategy for the butitusesisotropicwavelets,whicharenotidealforspectralline data and sources of interest. This modularity also enables future sources whose size along the spectral axis is decoupled from the expansionsofSoFiAtoincludenewsource-findingandparame- sizealongthetwospatialaxes),thecalculationofthereliabilityof terisationalgorithms. individualdetections,themaskoptimizationbybinarydilation,the SoFiA is publicly available at the website indicated in Sec. possibilityofsearchingforsignalonthebasisofaninputcatalogue 1togetherwithtechnicalinformationonhowtousethesoftware. of3Dcoordinatesandthebusyfunctionfit.Thecreationofcubelets Softwareupdates,improvementsandbugfixesarepostedregularly andPVdiagramsforindividualdetectionsisalsonotincludedin at this webpage. SoFiA is registered at the Astrophysics Source DuchampbutisavailableinSelavy,asourcefinderbuiltupon CodeLibrarywithIDascl:1412.001. Duchamp for distributed processing of large cubes (Whiting & Humphreys2012).Whilefuturedevelopmentmayreducethedif- ference between Duchamp/Selavy and our package, all above ACKNOWLEDGMENTS methodsareatthemomentuniquetoSoFiA. Finally,itisworthmentioningthatSoFiAdoesnotofferat TheauthorsacknowledgefinancialsupportfromaResearchCol- themomentafullanalysisofthesources’morphology.Forexam- laborationAwardoftheUniversityofWesternAustralia.TvdHand ple,agroupofnearbydetectedvoxelsismergedintoasinglesource NGweresupportedbytheEuropeanResearchCouncilunderunder regardlessofthesizeofthesourceandonlybasedonthemerging theEuropeanUnion’sSeventhFrameworkProgramme(FP/2007- elementchosenbytheuser(Sec.2.3).Thismeansthatnoinforma- 2013)/ERCGrantAgreementnr.291531.LFacknowledgessup- tionisgivenaboutwhetherthesource,whichcouldbeverylarge, portbytheDeutscheForschungsgemeinschaft(DFG)undergrant iscomposedofdistinctandeasilyrecognisablecomponents.Dif- numbers KE757/7-1, KE757/7-2, KE757/7-3 and KE757/9-1. LF ferentandmorespecialisedsourcefindersareabletoprovidesuch isamemberoftheInternationalMaxPlanckResearchSchool(IM- characterisation(e.g.,ClumpfindbyWilliams,deGeus&Blitz PRS)forAstronomyandAstrophysicsattheUniversitiesofBonn 1994;BLOBCATbyHalesetal.2012).Wenotehoweverthatthe andCologne. objectmaskcubereturnedbySoFiAcouldbeusedasastarting pointforfurthermorphologicalanalysisofthedetections. REFERENCES AstropyCollaborationetal.,2013,A&A,558,A33 BardenM.,Ha¨ußlerB.,PengC.Y.,McIntoshD.H.,GuoY.,2012, 3 SUMMARY MNRAS,422,449 We provide a high-level description of SoFiA, a flexible source Behnel S., Bradshaw R., Citro C., Dalcin L., Seljebotn D. S., finderfor3Dspectrallinedata.SoFiAputstogetherforthefirst SmithK.,2011,ComputinginScience&Engineering,13,31 timeinasinglepackageanumberofnewsource-findingandpa- BertinE.,ArnoutsS.,1996,A&AS,117,393 rameterization algorithms developed in preparation of upcoming CappellariM.etal.,2011,MNRAS,413,813 HIsurveyswithASKAP(WALLABY,DINGO)andAPERTIF.It CourtoisH.M.,TullyR.B.,FisherJ.R.,BonhommeN.,Zavodny is,however,designedtoenabletheuseofthesenewalgorithmson M.,BarnesA.,2009,AJ,138,1938 anydatacubeindependentofemissionlineortelescopeused. CroomS.M.etal.,2012,MNRAS,421,872 Wedescribethevariousmethodsandalgorithmsavailablein Flo¨erL.,WinkelB.,2012,PublicationsoftheAstronomicalSoci- SoFiA as well as planned developments. One key advantage of etyofAustralia,29,244 SoFiAisthatitallowsuserstosearchforspectrallinesignalon Hales C. A., Murphy T., Curran J. R., Middelberg E., Gaensler multiple scales on the sky and in frequency (using. e.g., the S+C B.M.,NorrisR.P.,2012,MNRAS,425,979 finderorthe2D-1Dwaveletfilter),whichiscrucialtodetectand HunterJ.D.,2007,ComputingInScience&Engineering,9,90 parameterize3Dsourcesinacompleteandreliableway.Further- JonesE.,OliphantT.,PetersonP.,etal.,2001,SciPy:Opensource more,withinSoFiAitispossibletotakeintoaccountnoiselevel scientifictoolsforPython.[Online;accessed2014-07-02] variationsacrossthecubeandthepresenceoferrorsandartefacts. JurekR.,2012,PublicationsoftheAstronomicalSocietyofAus- Moreover, SoFiA is able to estimate the reliability of individual tralia,29,251 detections,whichshouldbeparticularlyusefulforsurveysexpected KoribalskiB.S.,2012a,PublicationsoftheAstronomicalSociety todetectalargenumberofsources.Itcanalsoproduceavariety ofAustralia,29,359 of output products, including moment images, cut-out cubes and KoribalskiB.S.,2012b,PublicationsoftheAstronomicalSociety images,integratedspectraandcataloguesofsourceparameters.Fi- ofAustralia,29,213 nally,SoFiAisabletosearchforlineemissioninarbitrarilylarge LutzR.K.,1980,TheComputerJournal,23,262 datacubesonthebasisofacatalogueof3Dcoordinates.Mostof MasiasM.,FreixenetJ.,Llado´ X.,PeracaulaM.,2012,MNRAS, thesemethodsarenotavailableinothersourcefindersandarecur- 422,1674 rentlyuniquetoSoFiA. MeyerM.,2009,inPanoramicRadioAstronomy:Wide-field1-2 WeprovideafewvisualexamplesofhowSoFiAworksin- GHzResearchonGalaxyEvolution cluding a view of the dedicated graphical user interface. We de- PoppingA.,JurekR.,WestmeierT.,SerraP.,Flo¨erL.,MeyerM., scribetheavailableparameterizationandthewiderangeofoutput KoribalskiB.,2012,PublicationsoftheAstronomicalSocietyof products, which include mask cubes, moment images, position- Australia,29,318 velocity diagrams and busy function spectral fits of individual Sa´nchezS.F.etal.,2012,A&A,538,A8 (cid:13)c 2013RAS,MNRAS000,1–9 SoFiA3Dsourcefinder 9 SarziM.etal.,2006,MNRAS,366,1151 SerraP.,JurekR.,Flo¨erL.,2012,PublicationsoftheAstronomi- calSocietyofAustralia,29,296 SerraP.etal.,2012,MNRAS,422,1835 VerheijenM.A.W.,OosterlooT.A.,vanCappellenW.A.,Bakker L.,IvashinaM.V.,vanderHulstJ.M.,2008,inAmericanInsti- tuteofPhysicsConferenceSeries,Vol.1035,TheEvolutionof Galaxies Through the Neutral Hydrogen Window, Minchin R., MomjianE.,eds.,pp.265–271 Walt S. v. d., Colbert S. C., Varoquaux G., 2011, Computing in Science&Engineering,13,22 Westmeier T., Jurek R., Obreschkow D., Koribalski B. S., Staveley-SmithL.,2014,MNRAS,438,1176 WestmeierT.,PoppingA.,SerraP.,2012,PublicationsoftheAs- tronomicalSocietyofAustralia,29,276 WhitingM.,HumphreysB.,2012,PublicationsoftheAstronom- icalSocietyofAustralia,29,371 WhitingM.T.,2012,MNRAS,421,3242 WilliamsJ.P.,deGeusE.J.,BlitzL.,1994,ApJ,428,693 (cid:13)c 2013RAS,MNRAS000,1–9

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.