Mon.Not.R.Astron.Soc.000,1–18(2007) Printed11December 2013 (MNLATEXstylefilev2.2) Clustered Calibration: An Improvement to Radio Interferometric Direction Dependent Self-Calibration 1⋆ 2 1 S. Kazemi , S. Yatawatta , S. Zaroubi 1KapteynAstronomicalInstitute,UniversityofGroningen,P.O.Box800,9700AVGroningen,theNetherlands 2ASTRON,Postbus2,7990AADwingeloo,theNetherlands 3 11December 2013 1 0 2 ABSTRACT Thenewgenerationofradiosynthesisarrays,suchasLOFARandSKA,havebeendesignedto n surpassexistingarraysintermsofsensitivity,angularresolutionandfrequencycoverage.This a J evolutionhasledtothedevelopmentofadvancedcalibrationtechniquesthatensurethedeliv- eryof accurateresultsatthe lowestpossible computationalcost. However,the performance 3 ofsuchcalibrationtechniquesisstilllimitedbythecompact,brightsourcesinthesky,used ] ascalibrators.Itis importantto havea brightenoughsourcethatis welldistinguishedfrom M the backgroundnoise levelin orderto achievesatisfactory resultsin calibration.This paper presents“clusteredcalibration”asamodificationtotraditionalradiointerferometriccalibra- I . tion,inordertoaccommodatefaintsourcesthatarealmostbelowthebackgroundnoiselevel h intothecalibrationprocess.Themainideaistoemploytheinformationofthebrightsources’ p measuredsignalsasanaidtocalibratefaintersourcesthatarenearbythebrightsources.Inthe - o casewherewedonothavebrightenoughsources,asourceclustercouldactasabrightsource r that can be distinguished from backgroundnoise. For this purpose, we construct a number t s of source clusters assuming that the signals of the sources belongingto a single cluster are a corrupted by almost the same errors. Under this assumption, each cluster is calibrated as a [ single source,using the combinedcoherenciesof its sourcessimultaneously.This upgrades 1 thepowerofanindividualfaintsourcebytheeffectivepowerofitscluster.Thesolutionsthus v obtainedforeveryclusterareassignedtoeachindividualsourceinthecluster.We giveper- 3 formanceanalysisofclusteredcalibrationtoshowthesuperiorityofthisapproachcompared 3 to the traditional un-clustered calibration. We also provide analytical criteria to choose the 6 optimumnumberofclustersforagivenobservationinanefficientmanner. 0 . Keywords: methods:statistical,methods:numerical,techniques:interferometric 1 0 3 1 : v 1 INTRODUCTION before imaging. We also consider the removal of bright sources i fromtheobserveddatapartofcalibration,thatenableimagingthe X Low frequency radio astronomy is undergoing a revolution as a weakbackground sources.Forlowfrequencyobservationswitha r new generation of radio interferometers with increased sensitiv- widefieldofview,calibrationneedstobedonealongmultipledi- a ityandresolution,suchastheLOwFrequencyARray(LOFAR)1, rectionsinthesky.Propercalibrationacrossthefullfieldofviewis theMurchisonWide-fieldArray(MWA)2andtheSquareKilome- requiredtoachievetheinterferometer’sdesiredprecisionandsen- terArray(SKA)3 arebeingdevisedandsomearealreadybecom- sitivitygivingushighdynamicrangeimages. ingoperational.Thesearraysformalargeeffectiveaperturebythe combinationofalargenumberofantennasusingaperturesynthe- Early radio astronomy used external (classical) calibration sis (Thompsonetal. 2001). In order to achieve the full potential whichestimatestheinstrumentalunknownparametersusingara- of such an interferometer, it isessential that the observed data is dio source with known properties. This method has a relatively properlycalibratedbeforeanyimagingisdone.Calibrationofra- lowcomputationalcostandafastexecutiontime,butitgenerates diointerferometersreferstotheestimationandreductionoferrors results strongly affected by the accuracy with which the source introduced by the atmosphere and also by the receiver hardware, properties are known. In addition, existence of an isolated bright source, which is the most important requirement of external cal- ibration, in a very wide field of view is almost impractical, and even when it does, external calibration would only give informa- ⋆ E-mail:[email protected] 1 http://www.lofar.org tionalong thedirectionof the calibrator.Theexternal calibration 2 http://www.mwatelescope.org wasthenimprovedbyself-calibration(Pearson&Readhead1984). 3 http://www.skatelescope.org Self-calibration has the advantage of estimating both the source (cid:13)c 2007RAS 2 Kazemi et al. and instrumental unknowns, without the need of having a prior putationallycomplex. Hierarchical clustering(Johnson 1967)isa knowledgeofthesky,onlyutilizingtheinstrument’sobserveddata. well-knownclusteringapproachwithastraightforwardimplemen- Moreover,itcanachieveasuperiorprecisionbyiteratingbetween tationsuitableforourcase.But,itscomputational costgrowsex- theskyandtheinstrumentalparameters. ponentiallywiththesizeofitstargetdatasetwhichcanbeadis- The accuracy of any calibration scheme, regardless of the advantagewhenthenumberofsourcesishuge.WeightedK-means usedtechnique, isdeterminedbythelevelofSignaltoNoiseRa- clustering(Kerdprasopetal.2005;MacQueen1967)isalsooneof tio (SNR). This limits the feasibility of any calibration scheme themostusedofclusteringschemesapplicableinclusteredcalibra- toonlybright sources that haveahighenough SNRtobedistin- tion.Theadvantageofthisclusteringtechniqueisitslowcomputa- guishedfromthebackgroundnoise(Bernardietal.2010;Liuetal. tionalcost,whichisproportionaltothenumberofclustersandthe 2009;Pindoretal.2010).Notethatinterferometricimagesarepro- sizeofthetargetdataset.Nonetheless,weemphasizethatthecom- duced using the data observed during the total observation (inte- putational timetaken byanyof theaforementioned clusteringal- gration) time. However, calibration is done at shorter time inter- gorithmsisnegligiblecomparedwiththecomputationaltimetaken valsofthattotalduration.Thisincreasesthenoiselevelofthedata by the actual calibration. Therefore, we pursue all clustering ap- for which calibration is executed compared to the one in the im- proachesinthispaper.However,theuseofFuzzyC-meanscluster- ages.Inotherwords,therearesomefaintsourcesthatappearwell ing(Bezdek1981)inclusteredcalibrationrequiresmajorchanges abovethenoiseintheimageswhiletheyarewellbelowthenoise inthecalibrationdatamodelandwillbeexploredinfuturework. level during calibration.It hasbeen agreat challenge tocalibrate This paper is organized as follows: First, in section 2 we suchfaintsourceshavingaverylowSNR.Inthispaperwepresent presentthegeneraldatamodelusedinclusteredcalibration.Insec- clusteredself-calibration,introducedinKazemietal.(2011a),and tion3, wepresent modified weighted K-means and divisive hier- emphasizeitsbetterperformancecomparedtoun-clusteredcalibra- archical clustering forclustering sources inthesky. Next, insec- tionbelowthecalibrationnoiselevel.Existingun-clusteredcalibra- tion 4, we focus on analyzing the performance of clustered cali- tionapproaches(Intemaetal.2009;Smirnov2011)canonlyhan- bration,withanaprioriclusteredskymodel,andcomparethere- dleahandfulofdirectionswherestrongenoughsourcesarepresent sultswithun-clusteredcalibration.Inclusteredcalibration,thereis andweimprovethis,inparticularforsubtractionofthousandsof contentionbetweentheimprovementofSNRbyclusteringsources sources over hundreds of directions in the sky, as in the case for andtheerrorsintroducedbytheclusteringofsourcesitself.Thus, LOFAR(Bregman2012). we relate the clustered calibration’s performance to the effective Theimplementationof clusteredcalibrationisperformed by SignaltoInterferenceplusNoiseRatio(SINR)obtained foreach clusteringthesourcesinthesky,assumingthatallthesourcesina cluster. For this purpose, we use statistical estimation theory and singleclusterhavethesamecorruptions,andcalibratingeachclus- theCramer-RaoLowerBounds(CRLB)(Kay1993).Insection5, terasasinglesource.Attheend,theobtainedcalibrationsolutions wederivecriteriaforfindingtheoptimumnumberofclustersfora foreverysourceclusterisassignedtoallthesourcesinthatcluster. givensky.WeusetheSINRanalysisandadoptAkaike’sInforma- Thisprocedureimprovestheinformationusedbycalibrationbyin- tionCriterion(AIC)(Akaike1973)andtheLikelihoodRatioTest corporatingthetotalofsignalsobservedateachclusterinsteadof (LRT)(Graves1978)toestimatetheoptimumnumberofclusters. eachindividualsource’ssignal.Therefore,whenSNRisverylow,it Wepresentsimulationresultsinsection6toshowthesuperiority providesaconsiderablybetterresultcomparedtoun-clusteredcal- ofclusteredcalibrationtoun-clusteredcalibrationandtheperfor- ibration.Moreover, calibratingforasetofsourceclustersinstead mance of thepresented criteriaindetecting theoptimum number offor alltheindividual sourcesintheskyreducesthenumber of ofclusters.Finally,wedrawourconclusionsinsection7.Through directionsalongwhichcalibrationhastobeperformed,thuscon- this paper, calibration is executed by the Space Alternating Gen- siderably cutting down the computational cost. However, there is eralizedExpectationmaximization(SAGE)(Fessler&Hero1994; onedrawbackinclusteredcalibration:Thecorruptionsofeachin- Yatawattaetal. 2009; Kazemietal. 2011b) algorithm. Moreover, dividualsourceinonegivenclusterwillalmostsurelybeslightly in our simulations, radio sources are considered to be uniformly different from the corruptions estimated for the whole cluster by distributedintheskyandtheirfluxintensitiesfollowRaleighdis- calibration. Wecallthisadditional error as“clustering error”and tribution,whichistheworstcasescenario.Inrealskymodels,there in order to get the best performance in clustered calibration, we usually exist only a few (two or three) number of bright sources shouldfindtherightbalancebetweentheimprovementinSNRas whichdominatetheemission.Inthepresenceofthesesourcesand opposedtodegradationbyclusteringerror. thebackground noise, itisimpracticaltosolvefortheotherfaint Recently,clusteringmethodshavegainedalotofpopularityin sourcesinthefieldofviewindividually.Therefore,obtainingabet- dealingwithlargedatasets(Kaufman&Rousseeuw1990).There tercalibrationperformanceviatheclusteredcalibrationapproach, arevariousclusteringapproachesthatcouldbeappliedtocalibra- comparedtotheun-clusteredone,isguaranteed.Therefore,insec- tionbasedontheirspecificcharacteristics.Anoverviewofdifferent tion6weillustratethisusingsimulationsinwhichthebrightness clusteringmethodsisgiveninXu&Wunsch(2008).Clusteringof distributionofsourcesisapowerlawwithaverysteepslope.On radio sources should take into account (i) their physical distance topofthat,Kazemietal.(2012);Yatawattaetal.(submitted2012) fromeachotherand(ii)theirindividualintensity.Thesmallerthe alsopresenttheperformanceofclusteredcalibrationonrealobser- angular separations of sources are, the higher the likelihood that vationsusingLOFAR. they share the same corruptions in their radiated signals. More- The following notations areused inthispaper: Bold, lower- over, in order to get the best accuracy in the calibration results, caselettersrefertocolumnvectors,e.g.,y.Uppercaseboldletters thereshouldbeabalancebetweeneffectiveintensitiesofdifferent refertomatrices,e.g.,C.Allparametersarecomplexnumbers,un- clusters.Thus,intheclusteringprocedure,everysourceshouldbe lessstatedotherwise.Theinverse,transpose,Hermitiantranspose, weightedsuitablyaccordingtoitsbrightnessintensity. andconjugationofamatrixarepresentedby(.)−1,(.)T,(.)H,and The brightness distribution of radio source in the sky is a (.)∗,respectively.Thestatisticalexpectationoperatorisreferredto powerlawandthespatialdistributionisPoisson.Therefore,clus- asE . .ThematrixKroneckerproductandtheproper(strict)sub- {} tering thesources via probabilistic clustering approaches iscom- setaredenotedby and(,respectively.I isthen nidentity n ⊗ × (cid:13)c 2007RAS,MNRAS000,1–18 ClusteredRadioInterferometricCalibration 3 matrixand∅istheemptyset.TheKroneckerdeltafunctionispre- andfrequency resolutionisperformedtoestimateand correctfor sentedbyδ .RandCarethesetsofRealandComplexnumbers, G -sandthecorrectedvisibilitiesareobtainedas ij p respectively.TheFrobeniusnormisshownby . . Estimatedpa- rametersaredenotedbyahat,(.).Alllogarithm||ic||calculationsare Vpq =G−p1VpqG−qH. (4) to the base e. The multivariate Gaussian and Complex Gaussian Theremainingerrorsareuniquetoagivendirection,butresidual e distributionsaredenotedby cand ,respectively. errors in G -s are also absorbed into these errors, which are de- N CN p notedbyJ intheusualnotation. pi VectorizingEq.(4),thefinalvisibilityvector ofthebaseline p qisgivenby − 2 CLUSTEREDSELF-CALIBRATIONDATAMODEL K In this section, we present the measurement equation of a po- vpq = J∗qi(θθθ)⊗Jpi(θθθ)vec(Ci{pq})+npq. (5) larimetric clustered calibration in detail (Hamakeretal. 1996; i=1 X Hamaker 2006).Suppose wehave aradiointerferometer consist- Stacking up all the cross correlations (measured visibilities) and ing of N polarimetric antennas where each antenna iscomposed noise vectors in y and n, respectively, the un-clustered self- oftwoorthogonaldual-polarizationfeedsthatobserveKcompact calibrationmeasurementequationisgivenby sources in the sky. Every i-th source signal, i 1,2,...,K , causes an induced voltage of vpi = [vXpi vY∈pi]T{ at X and Y} K dipolesofeveryp-thantenna,p 1,2,...,N .Inpractice, y= si(θθθ)+n. (6) ∈{ } i=1 e X vpi =Jpiei, (1) InEq.(6),y,n CM,M =2N(N 1),thenoisevectoriscon- ∈ − wJphierreepereise=nts[etXhei2eYi]2T Jieosntehsemsoautrricxe’(sHealmecatkreicrefitealdl.v1e9c9t6o)rcaonrd- ΠΠsΠid,enre∼dtNoh(0av,ΠΠeΠaMz×eMro)m,aenadntGheaunsosnialinnedaisrtfruibnucttiioonnwsii(thθθθ)ciosvdaerifiannecde × as respondingtothedirection-dependentgaincorruptionsintheradi- a(awttghheaeditcisbohienglaan(mtaterlro.sopThnoah,spepiesnh,eetschryioicssrtresauemnpcdttiifoioronennsq,ouawserpienlhcloeybrrieigrceeinsxdpapiotsletanodisnreftei,rodoenmitsnc,.t)mheetaocnir.n)edspdttrrheuotemapiepler.rntoitpeas-l si(θθθ)≡ JJ∗2∗3ii((θθθθθθ))⊗⊗JJ11ii((θθθθθ...θ))vveecc((CCii{{ppqq}})) . (7) The signal vp obtained at every antenna p is a linear su- J∗Ni(θθθ)⊗J(N−1)i(θθθ)vec(Ci{pq}) perposition of the K sources corrupted signals, v where i pi ∈ CalibrationisessentiallytheMaximumLikelihood(ML)esti- 1,2,...,K ,plustheantenna’sthermalnoise.Themultitudeof { } mationoftheunknownparametersθθθ(P complexvaluesor2P real ignoredfaintersourcesalsocontributestothisaddietivenoise. values),oroftheJonesmatricesJ(θθθ),fromEq.(6)andremovalof Thevoltagescollectedattheinstrumentantennasgetcorrected theKsources.Notethatcalibrationmethodscouldalsobeapplied forgeometricdelays,basedonthelocationoftheirantennas, and totheuncorrectedvisibilitiesof(4)toestimateG andG errors p q someinstrumentaleffects,liketheantennaclockphasesandelec- aswell. tronicgains. Then, they arecorrelated inthearray’s correlator to TheJonesmatrixJ ,foreveryi-thdirectionandateveryp-th pi generatevisibilities(Hamakeretal.1996).Thevisibilitymatrixof antenna,isgivenas thebaselinep−q,E{vp⊗vqH},isgivenby J E Z F . (8) pi pi pi pi K ≡ Vpq =Gp Jpi(θθθ)Ci{pq}JHqi(θθθ) GHq +Npq. (2) In Eq. (8), Epi, Zpi, and Fpi are the antenna’s voltage pattern, i=1 ! ionospheric phase fluctuation, and Faraday Rotation Jones matri- X ces,respectively. Inpractice,theE,Z,andFJonesmatricesob- In Eq. (2), θθθ CP, P = 4KN, is the unknown instrumental ∈ tainedfornearbydirectionsandforagivenantennaarealmostthe andskyparametervector,Npq istheadditive2 2noisematrix × same.Thus,foreveryantennap,ifthei-thandj-thsourceshavea of thebaseline p q,andC istheFouriertransformof the i-th source cohere−ncy matrixi{Cpqi}= E{ei ⊗eHi } (Born&Wolf smallangularseparationfromeachother,wemayconsider i1s9I9i9,;thHeanmCakie=retI2ailI.21.9C9o6n).siIdferthinegit-hthisssoouurrcceertaodhiaatvioeneqinutaetnosriitayl Jpi∼=Jpj. (9) coordinate, (RightAscensionα,Declinationδ),equal to(αi,δi), Thisistheunderlyingassumptionforclusteredcalibration. and the geometric components of baseline p q tobe (u,v,w), Clusteredcalibrationfirstassignssourceclusters,Li fori then − 1,2,...,Q whereQ K,onwhichtheskyvariationisconsid∈- { } ≪ eredtobeuniform.Then,itassumesthereexistsauniqueJ which Ci{pq} =e−2π(ul+vm+w(√1−l2−m2−1))Ci, (3) issharedbyallthesourcesofthei-thclusterLi,i 1,2,p.i..,Q , ∈{ } atreceiverp,p 1,2,...,N .Basedonthat,thevisibielityatev- where2 = 1and, ∈{ } − erybaselinep q,givenbyEq.(4),isreformulatedas − l=sin(αi−α0)cos(δi), Q m=cos(δ0)sin(δi)−cos(αi−α0)cos(δi)sin(δ0), Vpq = Jpi(θθθ){ Cl{pq}}JHqi(θθθ)+Npq. (10) arethesourcedirectioncomponentscorrespondingtotheobserva- Xi=1 lX∈Li e e e e e e tionphasereferenceof(α0,δ0)(Thompsonetal.2001).Theerrors In Eq. (10), Npq is the clustered calibration’s effective noise at commontoalldirections,suchasthereceiverdelayandamplitude baselinep qwhichwillbeexplicitlydiscussedatsection4.Note − errors,aregivenbyGp andGq.Initialcalibrationatafinertime that clusteredecalibration estimates the new unknown parameter (cid:13)c 2007RAS,MNRAS000,1–18 4 Kazemi et al. θθθ CPe where P = 4QN. Denoting the effective signal of ev- For defining thecentroids, weassociate aweight toevery source ∈ eryi-thclusteratbaselinep qby xi,as − e e I Ci{pq} ≡ Cl{pq}, (11) wi =w(xi)≡ I∗i, fori∈{1,2,...,K}, (16) e lX∈Li whereIiisthesource’sintensityandI∗ =min I1,I2,...,IK . theclusteredcalibrationvisibilityvectoratthisbaseline(vectorized { } Applyingtheweightstotheclusteringprocedure,thecentroidsof formofEq.(10))is theclustersleanmostlytowardsthebrightestsources.Thatcauses a tendency in faint sources to gather with brighter sources close Q vpq = J∗qi(θθθ)⊗Jpi(θθθ)vec(Ci{pq})+npq. (12) tboeitnhgemaddiendtouponweitchlusostmere.bTrhiguhst,erthseoiurrwceesa’ksigsinganlas.lsMaorreeopvroerm,voeterdy i=1 X Finally,stackingupteheveisibilietieseofalltheeinstrumeent’sbaselines strongsourceswillbeisolatedsuchthattheirsignalsarecalibrated individually,withoutbeingaffectedbytheotherfaintsources. invector y,clusteredcalibration’sgeneral measurement equation We cluster radio sources using weighted K-means isresultedas (Kerdprasopetal. 2005) and divisive hierarchical clustering Q (Johnson 1967) algorithms. Since the source clustering for cal- y= si(θθθ)+n. (13) ibration is performed offline, its computational complexity is Xi=1 negligiblecomparedwiththecalibrationprocedureitself. InEq.(13)s isdefinedsimilarteoseinEeq.(7)whereJandCare i i replacedbyJandC,respectively. 3.1 WeightedK-meansclustering Becausee of the similarity between the clustered and the un- clustered caelibratioen’s measurement equations presented by Eq. Step1.SelecttheQbrightestsources,x1∗,x2∗,...,xQ∗,andini- (13) and Eq. (6), respectively, they could utilize the same cali- tializethecentroidsofQclustersbytheirlocationsas bration techniques. Thus, the only difference between these two typesofcalibrationisthatclusteredcalibrationsolvesforclusters cq ≡[αq∗,δq∗], forq∈{1,2,...,Q}, q∗ ∈{1∗,2∗,...,Q∗}. (17) ofsourcesinsteadoffortheindividualones.Thatupgradesthesig- Step2.Assigneachsourcetotheclusterwiththeclosestcentroid, nalswhichshouldbecalibratedforfromEq.(3)toEq.(11). definingthemembershipfunction mLq(xi)=(10,, iOfthde(rxwi,isceq)=min{d(xi,cj)|j =1,...,Q} 3 CLUSTERINGALGORITHMS Step3.Updatethecentroidsby Clusteringisgroupingasetofdatasothatthemembersofthesame g1r9o9u0p).(Tchluisstesirm) ihlaarvietysiosmdeefisniemdilbaaristeiedso(nKtahuefmapapnl&icaRtioounssoefeuthwe cq = PKi=Ki=11mmLLqq(x(xi)i)wwixii, forq∈{1,2,...,Q}. (18) clusteringmethod. We need to define clustering schemes in which two radio RepeatstepPs2and3untiltherearenoreassignmentsofsourcesto clusters. sources merge to a single cluster based on the similarity in their direction dependent gain errors ( Eq. (8) ). Radiations of sources that are close enough to each other in the sky are assumed to be 3.2 Divisivehierarchicalclustering affectedbyalmostthesamecorruptions(Eq.(9)).Basedonthat, weaimtodesignsourceclusterswithsmallangulardiameters.On Step 1. Initialize the cluster counter Q′ to 1, assign all the K theotherhand,everycluster’sintensityisthesumoftheintensities sourcestoasingleclusterL1and∅toasetofnullclustersA. ofitsmembers(Eq.(11)).Inordertokeepabalancebetweendif- Step2.ChooseclusterLq∗,forq∗ 1,2,...,Q′ A,withthe ∈{ }− ferentclusters’intensities,weintendtoapplyweightedclustering largestangulardiameter techniquesinwhichthesourcesareweightedproportionaltotheir D(Lq∗)=max D(Lq)q 1,2,...,Q′ A . (19) intensities. { | ∈{ }− } Suppose that the K sources, x ,x ,...,x have Step 3. Apply the presented weighted K-means clustering tech- 1 2 K (α1,δ1),(α2,δ2),...,(αK,δK) equatorial coordinates, re- niquetosplitLq∗ intotwoclusters,L′q∗ andL′q′∗. spectively. The aim is to provide Q clusters so that the objective fduianmcteiotenrfof=clusPteQqr=L1qD,f(oLrqq)∈is{m1,i2ni,m..iz.e,dQ.}D,d(Lefiqn)eidsatshe angular SLRteqe∗ppe≡a4t.LsItf′qe∗pD,s(L2LQ,′q3′∗,≡)an+LdD′q4′∗(u,Lnant′q′i∗dl)QA<′==D∅Q(,L.oqt∗h)e,rwthiesnesseettAQ′==AQ∪′{+q∗1},. D(Lq) max d(xi,xj)xi,xj Lq , (14) ≡ { | ∈ } andd(.,.)istheangularseparationbetweenanytwopointsonthe 3.3 Comparisonofclusteringmethods celestialsphere.Havingtworadiosourcesaandbwithequatorial Hierarchical clustering method tends to design clusters with al- coordinates(αa,δa)and(αb,δb),respectively,theangularsepara- mostthesameangulardiameters,whereas,theK-meansclustering tiond(a,b),inradians,isobtainedby methodtendstokeepthesamelevelofintensityatallitsclusters. tan−1pcos2δbsins2in∆δaαs+inδ[bco+sδcaossiδnaδcbo−sδsbicnoδsa∆coαsδbcos∆α]2, Iinndperdacictiactein,gsitnhceeshaimerearscohluictiaolncslutostseoriunrgcemseinthsomdamllackleusstleersss,ietrproerrs- (15) formsbetterthanWeightedK-meansclusteringinaclusteredcali- where∆α=αb αa. brationprocedure.But,whenthenumberofsources(andclusters) − (cid:13)c 2007RAS,MNRAS000,1–18 ClusteredRadioInterferometricCalibration 5 42 42 41 41 40 40 g.]39 g.]39 e e d d δ [38 δ [38 37 37 36 36 35 35 278 276 274 272 270 278 276 274 272 270 α [deg.] α [deg.] Figure1.Asimulated8by8degreesskyoffiftypointsourceswithinten- Figure 2. Fifty point sources are clustered into ten source clusters by sitiesbelow3Jy.Thesourcepositions andtheirbrightnessarefollowing WeightedK-meansclustering technique. Thereisnotagoodbalancebe- uniformandRayleighdistributions,respectively.Themarkersizesarepro- tweendifferentclustersangulardiameters. portionaltosourcesintensities. 42 isverylarge(Q>100),itsprohibitivecomputationalcostsmakes thefastK-meansclusteringmethodpreferable. 41 3.3.1 Example1:WeightedK-meansandhierarchicalclustering 40 Wesimulatean8by8degreesskywithfiftypointsourceswithin- tensitiesbelow3Jansky(Jy).Thesourcepositionsandtheirbright- g.]39 nessfollow uniform andRayleigh distributions, respectively. The e d resultisshownbyFig.1inwhichthesymbolsizesareproportional δ [38 tointensitiesofsources.WeightedK-meansanddivisivehierarchi- calclusteringmethodsareappliedtoclusterthefiftysourcesinto 37 tensourceclusters.TheresultsarepresentedinFig.2andFig.3, respectively. Fig. 2 shows that the Weighted K-means clustering 36 coulddesignsourceclusterswithconsiderablylargeangulardiam- eters. Assigning the same calibration solutions to the sources of 35 theselargeclusterscouldcausesignificanterrors.However,asFig. 278 276 274 272 270 3 shows, this isnot thecase for the hierarchical clustering and it α [deg.] constructsclusterswithalmostthesameangulardiameters. Sincethenumberofsourcesinthissimulationisnotthatlarge Figure3.Fiftypointsourcesareclusteredintotensourceclustersviahier- (K = 50),thedifferencebetweenexecutiontimeofthetwoclus- archicalclusteringmethod.Differentclustershavealmostthesameangular diameters. teringmethodsisnotsignificant.Henceinsuchacase,theuseofhi- erarchicalclusteringmethod,ratherthantheWeightedK-means,is advised.However,thisisnotthecasewhenwehavealargenumber SNR (Kazemietal. 2012). This superiority is in the sense of ofsources,andsubsequentlyalargenumberofsourceclusters,in achievinganunprecedentedprecisioninsolutionswithaconsider- thesky.Todemonstratethis,weusethetwoclusteringtechniques ablylowcomputationalcomplexity,giventheoptimumclustering forclusteringthousandofsources(K =1000)intoQsourceclus- scheme.Inthenextsection,wepresentdifferentcriteriaforfinding ters,Q 3,4,...,100 .Themethods’computationaltimesver- theoptimumnumberofclustersatwhichtheclusteredcalibration ∈{ } susthenumber of clustersareplottedinFig.4.AsFig.4shows, performsthebest. forlargeQs,thecomputationalcostofWeightedK-meansismuch cheaperthantheoneofthehierarchicalclustering.Thatcanmake the Weighted K-means clustering method more suitable than the 4.1 Cramer-RaoLowerBounds hierarchicalclusteringforsuchacase. The most fundamental assumption in clustered calibration is that thesourcesatthesamecluster haveexactlythesamecorruptions in their radiated signals. This assumption is of course incorrect, 4 PERFORMANCEANALYSIS nonetheless, it provides us with a stronger signal, the sum of the In this section, we explain the reasons for clustered calibration’s signalsinthewholecluster.Wepresentananalyticcomparisonof betterperformance,comparedtoun-clusteredcalibration,atalow clustered and un-clustered calibration where we use the Cramer- (cid:13)c 2007RAS,MNRAS000,1–18 6 Kazemi et al. Inother words, thevarianceof anyunbiased estimatorof theun- knownparametervectorθθθisboundedfrombelowbythediagonal Weighted K−means 800 elementsof[ (θθθ)]−1. Divisive Hierarchical I s) Using(23)and(26),theFisherinformationmatrixofthevis- me (600 ibilityvectoryisobtainedas al ti I(θθθ)=2σ−2Re(JsHJs), (27) on400 whereJsistheJacobianmatrixofswithrespecttoθθθ ati ut 2 ∂ mp200 Js(θθθ)= ∂θθθ{J∗qi⊗Jpi}[I4⊗vec(Ci{pq})]. (28) o i=1 C X Thus,variationsofanyunbiased estimatorofparameter vectorθθθ, 0 letssayθθθ,isboundedfrombelowbytheCRLBas 0 20 40 60 80 100 Number of clusters b Var(θθθ)>[2σ−2Re(JsHJs)]−1. (29) Figure4.WeightedK-meansanddivisivehierarchicalclusteringmethods Letstrytoboundtheerrorvariationsoftheclusteredcalibra- b computational costs.Forsmallnumberofsourceclusters,thereisnodif- tion parameters assuming that the two sources construct a single ferencebetweenexecutiontimesofthetwoclusteringmethods.But,when cluster,calledclusternumber1.WereformEq.(20)as thenumberofsourceclustersislarge,thecomputationalcostofWeighted K-meansbecomesmuchcheaperthantheoneofthehierarchicalclustering. Vpq =Jp1(C1{pq}+C2{pq})JHq1+Γ1{pq}+Γ2{pq}+Npq, (30) where Γ , referred to as the “clustering error” matrices, are ie{pq} e Rao Lower Bound (CRLB) (Kay 1993) as a tool to measure the givenby performanceofthecalibration. Γi{pq} =JpiCi{pq}JHqi−Jp1Ci{pq}JHq1, (31) andJp1(θθθ)istheclusteredcalibrationseolutionatreeceiverp. 4.1.1 EstimationsofCRLBfortwosourcesatasinglecluster Eq.(30)impliesthatwhatisconsideredasthenoisematrixN in pq Forsimplicity,firstconsiderobservingtwopointsourcesatasingle thecelusteeredcalibrationdatamodel,Eq.(10),isinfact baseline,letssaybaselinep q.BasedonEq.(4),thevisibilities e aregivenby − Npq ≡Γ1{pq}+Γ2{pq}+Npq. (32) Vectorizing Eq. (30), the clustered calibration visibility vector is Vpq =Jp1C1{pq}JHq1+Jp2C2{pq}JHq2+Npq, (20) obtainedby e binilitthyevuenc-etcolruissteredcalibrationstrategy.VectorizingVpq,thevisi- y=J∗q1⊗Jp1vec(C1{pq}+C2{pq})+npq, (33) y=J∗q1⊗Jp1vec(C1{pq})+J∗q2⊗Jp2vec(C2{peq})+npq. wherWeneppqo=intvoeeuct(Nthapteqd).ependingontheobservationeaswellasthe Assumingnpq (0,σ2I4),wehave positionesofthetwoesourcesonthesky,theclusteringerrorΓi{pq} ∼CN willhavedifferentproperties.However,inordertostudytheper- y (s(θθθ),σ2I4), (21) formance of theclusteredcalibrationinastatisticalsense, and to ∼CN simplifyouranalysis,wemakethefollowingassumptions. where s(θθθ)≡ J∗qi(θθθ)⊗Jpi(θθθ)vec(Ci{pq}). (22) and(i)ovCerondsifidfeerrensttastkisytirceaalliezxaptieocntsatwiohneroevtehredsioffuerrceenstaorbesrearnvdaotimonlys i=1,2 X distributedonthesky.Inthatcase,almostsurelyE J E J UsingEq.(21),thelog-likelihoodfunctionofthevisibilityvector { } → { } andconsequently yisgivenby e E Γ 0. (34) (θθθy)= 4ln π σ−2(y s(θθθ))H(y s(θθθ)). (23) { i{pq}}→ L | − {σ2}− − − Inotherwords,weassumetheclusteringerrortohavezeromean CRLBisatightlowerboundontheerrorvarianceofanyun- overmanyobservationsofdifferentpartsofthesky. biasedparameterestimators(Kay1993).Basedonitsdefinition,if (ii) Weassumethatthecloserthesourcesaretogetherinthesky, thelog-likelihoodfunctionoftherandomvectory, (θθθy),satis- thesmallertheerrorsintroducedbyclusteringwouldbe.Therefore, L | fiesthe“regularity”condition givenasetofsources,theclusteringerrorwillreduceasthenum- berofclustersincrease.Infactthiserrorintroducedbyclustering ∂ Ey ∂θθθL(θθθ|y) =0, forallθθθ, (24) vanishes when the number of clusters is equal to the number of (cid:20) (cid:21) sources(eachclustercontainsonlyonesource).Therefore,givena setofsources,thevarianceofΓ willdecreaseasthenumber thenforanyunbiasedestimatorofθθθ,θθθ, i{pq} ofclustersincrease. var(θθθi)>[I−1(θθθ)]ii, fobri∈{1,...,M}, (25) Using Eq. (34) and bearing in mind that E Npq = 0, we whereI(θθθ)istbheFisherinformationmatrixdefinedas can consider npq ∼ CN(0,σ2I4) where E{np{qnHpq}} = σ2I4. Therefore, ∂2 (θθθy) I(θθθ)=−Ey ∂Lθθθ∂θθθ|T . (26) e y e (s,σ2I4), e e e (cid:20) (cid:21) ∼CN e (cid:13)ec 2007RAS,MNRAS000,1–18 ClusteredRadioInterferometricCalibration 7 The CRLBs obtained for the un-clustered and clustered cal- 2000 ibrations in Eq. (29) and Eq. (35), respectively, are both almost Clustered calibration equaltotheinverseoftheSignaltoInterferenceplusNoiseRatio Un−clustered calibration (SINR), SINR−1. In un-clustered calibration, the effective signal 1500 for the faintest source is C where the noise is N . There- 2{pq} pq fore,SINRforthissourceis CRLB1000 SINR2 = ||CN2{ppqq}2||2. (36) || || But, in clustered calibration, the effective signal and noise are 500 C{pq} ≡ C1{pq} +C2{pq} and Npq, respectively. Thus, SINR fortheclusteris e e C 2 00 1000 2000 3000 4000 5000 SINRc = || N{pq}|2| . (37) Clustered calibration effective noise power ||e pq|| Clusteredcalibrationhasanimprovedperformancewhen Figure5.Clusteredandun-clusteredcalibrationsCRLB.Whentheeffec- e tivenoisepoweroftheclusteredcalibration,||Ne||2,issmallenough,then SINR SINR . (38) c 2 its CRLB is lower than of the un-clustered calibration’s and it reveals a ≫ superiorperformance. Considerthetwopossibleextremesinaclusteredcalibration procedure: s≡J∗q1⊗Jp1vec(C1{pq}+C2{pq}), sm(ai)llCnulumstbeerrinogfmclaunsytersso.uIrncetshiisncaaslea,rgtehefiaenldguolfarvideiwamteotearvoefrya andsimilartoeEq.e(29),weehave clusterisprobablytoolargefortheassumptionofuniformcorrup- tionstoapply.Subsequently,dedicatingasinglesolutiontoallthe Var(θθθ)>[2σ−2Re(JesHJes)]−1. (35) sourcesofeveryclusterbyclusteredcalibrationintroducescluster- ingerror matricesΓΓΓ witha largevariance (see Eq. (31)). Having We use numerical sibemulationes to compare the un-clustered and high interference power, the clustered calibration effective noise clustered calibrations performances via their CRLBs which are NofEq.(32)becomesverylarge.Therefore,clusteredcalibration givenbyEq.(29)andEq.(35),respectively. SINRwillbeverylowanditdoesnotproducehighqualityresults. e(ii) Clustering sources in a small field of view to a very large numberofclusters.Inthiscase,thevarianceofΓΓΓmatricesareal- 4.1.2 Example2:CRLBfortwosourcesandonecluster mostzerowhilethesignalpowersofsourceclustersarealmostas Wesimulateatwelvehour observation of twopoint sources with lowasoftheindividualsources.Therefore,theSINRofclustered intensitiesI =11.25andI =2.01Jyatskycoordinates(l,m) calibrationisalmostequaltotheun-clusteredcalibrationSINRand 1 2 equalto( 0.014, 0.005)and( 0.011, 0.010)radians,respec- the calibration performance isexpected tobe almost the same as − − − − tively.Weusetheuv-coverageofWesterborkSynthesisRadioTele- well. scope(WSRT)with14receiversinthissimulation. Thus,thebestefficiencyofclusteredcalibrationisobtainedatthe We Consider the J Jones matrices in Eq. (20) to be diagonal. smallestnumberofclustersforwhichEq.(38)issatisfied.Weuse Their amplitude and phase elements follow (0.75,0.95) and theSINRofEq.(38)asanefficientcriterionfordetectingtheopti- U (0.003,0.004)distributions,respectively.Thebackgroundnoise mumnumberofclusters. U isN (0,10I). Jones matricesof the clustered calibration, ∼ CN Jp1 for p = 1,2,areobtainedasJp1+ (0.02,0.40)ejU(0.5,2). U 4.1.4 Generalizationtomanysourcesandmanyclusters For20realizationsofJmatrices,wecalculatedCRLBoftheun- celustered and clustered calibrations using Eq. (29) and Eq. (35), Forthevisibilityvectoryofun-clusteredcalibration’sgeneraldata respectively.TheresuletsarepresentedinFig.5.Asshowninthis model,presentedbyEq.(6),wehave figure, for small enough errors matricesΓΓΓ of Eq. (31), the clus- tered calibration’s CRLB stands below the un-clustered calibra- K tion’sCRLB.Ontheotherhand,withincreasingpoweroferrorma- y∼CN( si(θθθ),ΠΠΠ). (39) trices,orthepowerofeffectivenoiseN,theun-clusteredcalibra- Xi=1 tion’sCRLBbecomeslowerthantheclusteredcalibration’sCRLB. Therefore,theCRLBofun-clusteredcalibrationis e K K −1 var(θθθ)> 2Re ( Jsi(θθθ))HΠΠΠ−1( Jsi(θθθ)) , 4.1.3 AnalysisofCRLB (cid:20) (cid:26) i=1 i=1 (cid:27)(cid:21) X X Generally, if source 1 is considerably brighter than source 2, whereJsi istheJacobianmatrixofsiwithrespecttoθθθ. C C , and if the weak source power ismuch Computing the exact CRLB is more complicated when we || 1{pq}|| ≫ || 2{pq}|| lower than the noise level, C N , then clustered haveclusteredcalibration.Intheclusteredcalibrationmeasurement || 2{pq}|| ≪ || pq|| calibration’s performance is better than un-clustered calibration. equation,givenbyEq.(13),wehave Note that the worst performance of both calibrations is at the K faintestsourceandwearemoreconcernedtocomparetheCRLBs n ΓΓΓi+n, (40) forthissource. ≡ i=1 X e (cid:13)c 2007RAS,MNRAS000,1–18 8 Kazemi et al. wherenistheun-clusteredcalibration’snoisevector, x 10−10 5 ΓΓΓi=[vec(ΓΓΓi{12})T...vec(ΓΓΓi{(N−1)N})T]T, (41) Clustered calibration Un−clustered calibration andΓΓΓ isgivenbyEq.(31).Duetotheexistenceofthenuisance i{pq} 4 parametersΓΓΓi intheclusteredcalibrationdatamodel, calculation ofitsconventionalCRLBisimpractical.Thisleadsustotheuseof B Cramer-Raolikeboundsdevisedinthepresenceofthenuisancepa- L3 rameters(Gini&Reggiannini2000).WeapplytheModifiedCRLB R C (MCRLB)(Ginietal.1998)totheperformanceof clusteredcali- M bration. 2 TheMCRLBforestimatingtheerrorsofθθ˜θinthepresenceof thenuisanceparametersΓΓΓ(clusteringerror)isdefinedas b 1 ∂ ∂ −1 var(θθ˜θ)> E E ln P(yΓΓΓ;θθ˜θ) , (cid:20) y,ΓΓΓ(cid:26)− y|ΓΓΓ(cid:26)∂θθ˜θ∂θθ˜θT { | }(cid:27)(cid:27)(cid:21) 0 10 Num20ber of clu3s0ters 40 50 (42) b whereP(yΓΓΓ;θθ˜θ)istheProbabilityDensityFunction(PDF)ofthe Figure 6. Clustered calibration’s MCRLB. Forvery small Q,where the | visibilityvector y assumingthat theΓΓΓmatricesofEq. (41) area effectofinterferenceislarge,MCRLBishigh.Byincreasingthenumber prioriknown.Sincen (0,ΠΠΠ),fromEq.(13)wehave of clusters, MCRLB decreases and reaches its minimum where the best ∼CN performanceoftheclusteredcalibrationisexpected.Afterthat,duetothe Q K dominanteffectofthebackgroundnoise,MCRLBstartstoincreaseuntilit yΓΓΓ ([ ˜si+ ΓΓΓi],ΠΠΠ), (43) reachestheun-clusteredcalibrationCRLB. | ∼CN i=1 i=1 X X andthereforeinEq.(42), E ∂ ∂ ln P(yΓΓΓ;θθ˜θ) ,which − y|ΓΓΓ ∂θθθ˜∂θθθ˜T { | } (cid:20) (cid:21) −6 iscalledthemodifiedFisherinformation,isequalto x 10 10 Q K Q K Clustered calibration 2Re [ J (θθ˜θ)+ J (θθ˜θ)]HΠΠΠ−1[ J (θθ˜θ)+ J (θθ˜θ)] . 9 { ˜si ΓΓΓi ˜si ΓΓΓi } Un−clustered calibration i=1 i=1 i=1 i=1 X X X X 8 Note that E in Eq. (42) could be estimated by Monte-Carlo y,ΓΓΓ method. R 7 Asaruleofthumb,reducingtheheavycomputationalcostof N MCRLB,onecaninterprettheSINRtestofEq.(38)asfollows:If SI 6 inaveragetheeffectiveSINRofclusteredcalibration,SINR ,gets c higherthantheeffectiveSINRofun-clusteredcalibrationobtained 5 fortheweakestobservedsignal,SINR , w 4 E SINRc E SINRw , (44) { }≫ { } 3 thenclusteredcalibrationcanachieveabetterresults.InEq.(44), 0 10 20 30 40 50 Number of clusters theexpectationistakenwithrespecttothenoiseN,errormatrices ΓΓΓ,andallthebaselines. Figure 7.Clustered calibration’s SINR. SINRis low for smallQ,when the interference is large. By increasing the number of clusters the SINR increasesandgetsitshighestlevelforwhichthebestperformanceofthe 4.1.5 Example3:MCRLBandSINRestimations calibrationisexpected.Afterthat,itdecreasesduetothedominanteffectof thebackgroundnoise,andconvergestotheun-clusteredcalibrationSINR. We simulate WSRT including N = 14 receivers which observe fifty sources with intensities below 15 Jy. The source positions andtheirbrightnessfollowuniformandRayleighdistributions,re- spectively. The background noise is N (0,15IM), where ∼ CN M = 2N(N 1) = 364. We cluster sources using divisive estimatedresultsofMCRLBandE SINRc arepresentedbyFig. − { } hierarchical clustering, into Q number of clusters where Q 6andFig.7,respectively.AswecanseeinFig.6,forverysmall ∈ 3,4,...,50 .Clusteredcalibration’sJonesmatrices,J,aregener- Q,wheretheeffectofinterferenceislarge,MCRLBishigh.Byin- a{tedas (0.9},1.1)ejU(0,0.2).Sinceforsmallernumberofclusters, creasingthenumberofclusters,MCRLBdecreasesandreachesits U weexpectlargerinterference(errors)inclusteredcaliberation’sso- minimumwherethebestperformanceoftheclusteredcalibrationis lutions,foreveryQ,weconsider 5i=01Γi ∼CN(0,1Q50IM).The expected.Afterthat,duetothedominanteffectofthebackground choiceofthecomplexGaussiandistributionfortheerrormatrices noise,MCRLBstartstoincreaseuntilitreachestheCRLBofun- P ΓΓΓisduetothecentrallimittheoremandtheassumptionsmadein clustered calibration. Thesame result isderived fromE SINRc { } section4.1.1. plotofFig.7.AsFig.7shows,E SINRc islowforverysmallQ, { } Weproceed tocalculate theclustered calibration’sMCRLB, whentheinterference(i.e.,theerrorduetoclustering)islarge.By given by Eq. (42), and E SINRc , utilizing the Monte-Carlo increasingthenumberofclusters,E SINRc increasesandgetsits { } { } method.JacobianmatricesforMCRLBarecalculatednumerically peak for which the clustered calibration performs the best. After andincomputationofE SINRc ,signalpowerofeveryclusteris that, it decreases and converges to the E SINR of un-clustered { } { } obtainedonlyusingthecluster’sbrightestandfaintestsources.The calibration. (cid:13)c 2007RAS,MNRAS000,1–18 ClusteredRadioInterferometricCalibration 9 4.2 Computationalcost and“solver noise”which isreferredtoaserrorsproduced bythe calibrationalgorithmitself.WeassumethatthetrueJonesmatrices Inthemeasurementequationofun-clusteredcalibration,presented along different directions (clusters) at the same antenna are sta- in Eq. (6), we have M = 2N(N 1) constraints given by the − tisticallyuncorrelated.Notethatthisassumptionisonlymadefor visibility vector y, and need to solve for P = 4KN unknown theLRTtoproduce reasonable resultsand thisassumption isnot parametersθθθ.IfP >M,thenEq.(6)willbeanunder-determined neededforclusteredcalibrationtowork.Therefore,ifsuchcorre- non-linearsystem.Thisclarifiestheneedofhavingasmallenough lations exist, they are caused by the aforementioned errors. Con- N(numberofantennas)andalargeenoughK(numberofsources) sequently,themoreaccuratetheclusteredcalibrationsolutionsare, forestimatingθθθ.However,clusteredcalibration,Eq.(13),hasthe thesmallertheirstatisticalsimilaritieswouldbe.Basedonthisgen- advantageofdecreasingthenumberofdirections,K,relativetothe eralstatement,thebestnumberofclustersinaclusteredcalibration numberofsourceclusters,Q K.Thisconsiderablycutsdown ≪ procedureistheonewhichprovidesuswiththeminimumcorrela- thenumberofunknownparametersP thatneedstobecalibrated, tionsinthecalibratedsolutions.Notethatforafixedmeasurement, thusreducingthecomputationalcostofcalibration. thecorrelationduetothesystemnoiseisfixed.Therefore,differ- encesinthestatisticalsimilaritiesofsolutionsobtainedbydifferent clusteringschemesareonlydueto“clusteringerrors”and“solver 5 SELECTIONOFNUMBEROFCLUSTERS noise”. Toinvestigatethestatisticalinteractionbetweenthegainsolu- Consideraclusteredcalibrationprocedurewithapredefinedclus- tionsweapplytheLikelihood-RatioTest(LRT). tering scheme. There is no guarantee that the calibration results forQnumber ofclusters,whereQ 1,2,...,K israndomly ConsidertheclusteredcalibrationsolutionJpi(θθθ)fordirectionsi, chosen, isthemostaccurate.Thus,w∈e{seektheopti}mumnumber i 1,2,...,Q ,atantennasp,wherep 1,2,...,N , ∈{ } ∈{ } ofclustersatwhichtheclusteredcalibrationperformsthebest.In e e J J ttheriisosnec(AtioICn,)w(Aekdaeiskceri1b9e7t3h)e,uassewoefl:l(ai)sA(iki)aiLkiek’esliIhnofoordm-RaatitoionTCersi-t Jpi =" Je1211,,pp Je1222,,pp #i. (49) (LRT) (Graves 1978), in finding this optimum Q for a given ob- e Then,theparametervectorθθθpei(corresepondingtothei-thdirection servation. Some other alternative criteria could also be found in andp-thantenna)isobtainedby Wax&Kailath(1985). Notethat for different clustering schemes e theoptimumQisnotnecessarilythesame. θθθpi=[Re(J11,p)Im(J11,p) ... Re(J22,p)Im(J22,p)]Ti . (50) Letusdefineforeachantennapandeachpairofdirectionskand e e e e e 5.1 Akaike’sInformationCriterion(AIC) l,wherekandlarebelongto 1,2,...,Q ,avectorzpklas { } Weutilize Akaike’s Information Criterion(AIC) tofind the opti- z =[θθθT θθθT]T. (51) pkl pk pl mumQforclusteredcalibration. Infact,weareconcatenatingthesolutionsofthesameantennafor Considerhavingn ∼ CN(0,σ2IM)inthegeneraldatamodelof twodifferentdirections(clusterse)togeether. clusteredcalibration,Eq.(13).Then,thelog-likelihoodofthevisi- WedefinethenullH modelas bilityvectoryisgeivenby e 0 H : z (m,Σ ). (52) (θθθy) = M logπ M logσ2 0 pkl ∼N 0 L | − − where Q Q 1 e − σ2(y− si(θθθ))He(y− si(θθθ)). (45) m=[m¯(θθθpk)T m¯(θθθpl)T]T, (53) i=1 i=1 X X Themaximumlikelihoodesetimeationofthenoieseevarianceσ2is and e e e σ2= M1 (y− Q si(θθθ))H(y− Q si(θθθ)). e (46) Σ0 =" s2(θθ0θpk) s2(0θθθpl) #. (54) i=1 i=1 e SubstitutingceEq.(46)inEqX.(4e5),ewearriveaXttheemeaximumlikeli- InEq.(53)andEq.(54),m¯ ands2 aredeenotingsamplemeanand samplevariance,respectively.Notethathavingalargenumberof hoodestimationofθθθ, samplesinhand,theassumptionofhavingaGaussiandistribution for solutions is justified according to the Central Limit theorem. (θθθy) = Melogπ M L | − − ThestructureofthevariancematrixΣ tellsusthatthestatistical 0 Q Q be − Mlog{M1 (y− si(θθθ))H(y− si(θθθ))}.(47) bcoetrwreelaetniotnhebseotwluetieonntshθθθecoamnpdoθθθnen,tissozfertoh.eTrahnisdiosmthveeicdteoarlzcqaksle,oinr i=1 i=1 pk pl X X whichtherearenoestimationerrors. UsingEq.(47),theAICisgivenbye e e e Toinvestigatetheevalidityeofthenullmodelcomparedwiththe caseinwhichthereexistssomecorrelationbetweenthesolutions, AIC(Q)= 2 (θθθy)+2(2P). (48) − L | wedefinethealternativeH1modelas TheoptimumQisselectedastheobenethatmineimizesAIC(Q). H1: zpkl (m,Σ1), (55) ∼N wherethevariancematrixΣ isgivenby 1 5.2 Likelihood-RatioTest(LRT) s2(θθθ ) Cov(θθθ ,θθθ ) pk pk pl Errorsinclusteredcalibrationoriginatefromthesystem(skyand Σ = . (56) 1 instrumental)noise,“clusteringerrors”introducedinsection4.1.1, Cov(θθθ ,θθθ )T s2(θθθ ) pek pl e ple (cid:13)c 2007RAS,MNRAS000,1–18 e e e 10 Kazemi et al. 1994;Yatawattaetal.2009;Kazemietal.2011b)withnineitera- Cov(θθθ ,θθθ )inEq.(56)isthe8 8samplecovariancematrix. tions.PlotsoftheaveragedFrobeniusdistancebetweenthesimu- pk pl × Usingtheabovemodels,theLikelihood-Ratioisdefinedas lated(true)Jonesmatricesandtheobtainedsolutionsisshownin e e Fig.11.AswecanseeinFig.11,forbothclusteringschemes,in- Likelihoodfornullmodel Λ= 2ln , (57) creasingthenumberofclustersdecreasesthisdistanceandthemin- − (cid:18)Likelihoodforalternativemodel(cid:19) imumisreachedatapproximatelythirtythreeclusters(Q = 33). Beyondthisnumberofclusters,itincreasesuntilthefiftyindividual whichhasaχ2 distributionwith64degreesoffreedom.AsΛbe- sourcesbecomeindividualclusters.Thisshowsthatthebestperfor- comessmaller,thenull model,inwhichthestatisticalcorrelation manceofboththedivisivehierarchicalandtheweightedK-means betweenthesolutionsiszero,becomesmoreacceptableratherthan clusteredcalibrationsisapproximatelyatthirtythreeclustersand thealternativemodel.Therefore,thesmallertheΛis,thelessthe issuperiortothatoftheun-clusteredcalibration. clusteredcalibration’serrorsare,andvice-versa. The Frobenius distance curves in Fig. 11, the MCRLB curve in Fig.6,andtheSINRcurveinFig.7illustratethatclusteredcalibra- tionwithanextremelylownumberofclustersdoesnotnecessarily performbetterthantheun-clusteredcalibration.Thereasonisthat 6 SIMULATIONSTUDIES whenthereareonlyasmallnumberofclusters,theinterference,or We use simulations to compare the performance of un-clustered theso-called“clusteringerrors”introducedinsection4.1.1,isrela- andclusteredcalibration.Workingwithsimulationshastheadvan- tivelylarge.Therefore,theeffectofthisinterferencedominatesthe tageofhavingthetruesolutionsavailable,whichisnotthecasein clustering of signals. On theother hand, wereach the theoretical real observations. That makes the comparison much more objec- performancelimitapproximatelyaftertwentyfivenumberofclus- tive.Nevertheless,thebetterperformanceoftheclusteredcalibra- ters and therefore increasing the number beyond this point gives tionincomparisonwiththeun-clusteredonesincalibratingforreal highlyvariableresults,mainlybecausewearelimitedbythenum- observations of LOFAR is also shown by Kazemietal. (2011a); berofconstraintsasopposedtothenumberofparametersthatwe Yatawattaetal.(submitted2012). needtosolvefor.But,thisisnotthecasefortheplotsinFig.6and WesimulateanEast-Westradiosynthesisarrayincluding14 Fig.7.ThereasonisthattheMCRLBresultsofFig.6aswellasthe antennas (similartoWSRT)and an8by 8degrees skywithfifty SINRresultsofFig.7areobtained byMonte-Carlomethodwith sourceswithlowintensities,below3Jy.Thesourcepositionsand iterationsoverfiftydifferentskyandnoiserealizations.However, theirbrightnessfollowuniformandRayleighdistributions,respec- Fig.11islimitedtothepresentedspecificsimulationwithonlyone tively. The single channel simulated observation at 355 MHz is skyandonenoiserealization. showninFig.8. Theresidualimagesoftheun-clusteredcalibrationaswellas Weproceedtoaddgainerrors,multiplyingsourcecoherencies thedivisivehierarchicalandweightedK-meansclusteredcalibra- bytheJonesmatrices,asitisshowninEq.(2),tooursimulation. tionsforQ = 33areshownbyFig.12andFig.13,respectively. TheamplitudeandphaseoftheJonesmatrices’elementsaregener- AsitisshownbyFig.12,intheresultofun-clusteredcalibration, atedusinglinearcombinationofsineandcosinefunctions.Weaim thesources arealmost not subtracted at all and there isasignifi- atsimulatingaskywithalmostuniformvariationsonsmallangular cantresidualerrorremaining.TheresidualshaveasymmetricGaus- scales.Inotherwords,weprovideverysimilarJonesmatricesfor siandistributionwithvarianceσ2 = 82.29 whichismuchlarger sourceswithsmallangularseparations.Toaccomplishthisgoal,for thanthesimulated(true)noisevarianceσ2 =10.85.Ontheother everyantenna,wefirstchooseasingledirectionasareferenceand hand,thesourceshavebeenperfectlysubtractedinthecaseofclus- simulate its Jones matrix as it is explained before. Then, for the teredcalibration,Fig.13,andtheresidualsconverge tothesimu- remaining forty nine sources, at that antenna, the Jones matrices lated background noise distribution. The residuals of the divisive (amplitudeandphaseterms)arethatinitialJonesmatrixmultiplied hierarchical and Weighted K-means clustered calibrations follow by the inverse of their corresponding angular distances from that symmetriczeromeanGaussiandistributionswithσ2 =20.17and reference direction. The result of adding such gain errors to our σ2 = 18.76,respectively.Thesevariancesareclosertothesimu- simulationisshowninFig.9. latedoneσ2=10.85andthisindicatesthepromisingperformance ofclusteredcalibration.Aswecansee,hierarchicalclusteredcal- ibrationprovidesaslightlybetterresultcomparedtotheK-means one. Thisisdue tothe fact that hierarchical clustering constructs 6.1 PerformancecomparisonoftheClusteredand clustersofsmallerangulardiametersandthusitassignsthesame un-clusteredcalibrationsatSNR=2 calibration solutions to the sources with smaller angular separa- Weaddnoisen (0,σ2I)withσ2 =28,asitisshowninEq. tions. ∼CN (6),tooursimulation.TheresulthasaSNR=2andispresented WealsocalculatetheRootMeanSquaredErrorofPrediction inFig.10.WehavechosentopresentthecaseofSNR=2since (RMSEP)toassesstheperformanceofclusteredandun-clustered forthisparticularsimulatedobservationboththedivisivehierarchi- calibrations’ non-linear regressions. The results of log(RMSEP), calandtheweightedK-meansclusteredcalibrationsachievetheir presentedbyFig.14,alsojustifythatthebestefficienciesofthehi- bestperformancesatthesamenumberofclusters,aswillbeshown erarchicalandK-meansclusteredcalibrationsareobtainedatthirty laterinthissection. three number of clusters. But, note that there is a difference be- Weapplyun-clusteredandclusteredcalibrationonthesimu- tween the behavior of log(RMSEP) plot of Fig. 14 and the plots lationtocomparetheirefficiencies.Thefiftysourcesaregrouped ofMCRLB,SINR,andFrobeniusdistancebetweenthesimulated into Q 3,4,...,49 number of clusters, using the proposed JonesmatricesandsolutionsinFig.6,Fig.7,andFig.11,respec- ∈ { } divisivehierarchicalandweightedK-meansclusteringalgorithms. tively.InFig.14,log(RMSEP)ofclusteredcalibrationislessthan Self-calibration is implemented via Space Alternating General- thatofun-clusteredcalibration,evenforextremelylownumberof izedExpectationMaximization(SAGE)algorithm(Fessler&Hero clusters.Thismeansthatevenwiththoselownumber ofclusters, (cid:13)c 2007RAS,MNRAS000,1–18