MNRAS000,1–23(2016) Preprint10August2016 CompiledusingMNRASLATEXstylefilev3.0 Scalable splitting algorithms for big-data interferometric imaging in the SKA era Alexandru Onose1?†, Rafael E. Carrillo2†, Audrey Repetti1, Jason D. McEwen3, Jean-Philippe Thiran2, Jean-Christophe Pesquet4 and Yves Wiaux1 1Institute of Sensors, Signals and Systems, Heriot-Watt University, Edinburgh EH14 4AS, United Kingdom 2Signal Processing Laboratory (LTS5), Ecole Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland 3Mullard Space Science Laboratory, University College London, Surrey RH5 6NT, United Kingdom 6 4Laboratoire d’Informatique Gaspard Monge, Université Paris-Est, Marne la Vallée F-77454, France 1 0 2 AcceptedXXX.ReceivedYYY;inoriginalformZZZ g u A ABSTRACT 9 In the context of next generation radio telescopes, like the Square Kilometre Ar- ray, the efficient processing of large-scale datasets is extremely important. Convex ] M optimisation tasks under the compressive sensing framework have recently emerged andprovidebothenhancedimagereconstructionqualityandscalabilitytoincreasingly I largerdatasets.Wefocushereinmainlyonscalabilityandproposetwonewconvexop- . h timisationalgorithmicstructuresabletosolvetheconvexoptimisationtasksarisingin p radio-interferometric imaging. They rely on proximal splitting and forward-backward - iterations and can be seen, by analogy with the clean major-minor cycle, as run- o r ning sophisticated clean-like iterations in parallel in multiple data, prior, and image t spaces. Both methods support any convex regularisation function, in particular the s a well studied ‘ priors promoting image sparsity in an adequate domain. Tailored for 1 [ big-data, they employ parallel and distributed computations to achieve scalability, in terms of memory and computational requirements. One of them also exploits ran- 3 domisation, over data blocks at each iteration, offering further flexibility. We present v 6 simulation results showing the feasibility of the proposed methods as well as their ad- 2 vantagescomparedtostate-of-the-artalgorithmicsolvers.OurMatlabcodeisavailable 0 online on GitHub. 4 Key words: techniques: image processing – techniques: interferometric 0 . 1 0 6 1 1 INTRODUCTION the image reconstruction need to be fast and to scale well : withthenumberofmeasurements.Suchchallengesprovided v Radio-interferometry (RI) allows the observation of radio motivationforvigorousresearchtoreformulateimagingand i emissionswithgreatsensitivityandangularresolution.The X calibration techniques for RI (Wijnholds et al. 2014). technique has been extensively investigated and provides r The construction of the first phase of SKA is sched- valuabledatadrivingmanyresearchdirectionsinastronomy, a uled to start in 2018. It will consist of two subsystems: a cosmology or astrophysics (Thompson et al. 2001). Next- low frequency aperture array, the SKA1-low, operating in generation radio telescopes, such as the LOw Frequency the 50-350 MHz frequency range and containing approxi- ARray (LOFAR) (van Haarlem et al. 2013) and the future mately131,000antennaelements;amidfrequencyarrayof Square Kilometre Array (SKA) (Dewdney et al. 2009), are reflector dishes, the SKA1-mid, operating above 350 MHz, envisaged to produce giga-pixel images and achieve a dy- consisting of 197 dishes (Dewdney et al. 2009; Broekema namic range of six or seven orders of magnitude. This will et al. 2015). Both subsystems are planned to operate on beanimprovementovercurrentinstrumentsbyaroundtwo theorderof65,000frequencybands.Datarateestimatesin orders of magnitude, in terms of both resolution and sen- this first phase are around five terabits per second for each sitivity. The amount of data acquired will be massive and subsystem (Broekema et al. 2015) and will present a great the methods solving the inverse problems associated with challenge for the infrastructure and signal processing. The celebratedcleanalgorithm(Högbom1974)anditsvariants ? E-mail:[email protected] do not scale well given the large dimension of the problem. † Theauthorshavecontributedequallytotheworkherein. They rely on local greedy iterative procedures and are slow ©2016TheAuthors 2 A. Onose, R. Carrillo et al. comparedtomodernconvexoptimisationtechniques,which fersonlypartialsplittingoftheobjectivefunctionleadingto are guaranteed to converge towards a global optimal solu- asub-iterativealgorithmicstructure.ThePDmethodoffers tion. Moreover, they are not designed for large-scale paral- the full splitting for both operators and functions. It does lelisation or distributed computing (Carrillo et al. 2014). not need sub-iterations or any matrix inversion. Addition- Inthepastfewyears,sparsemodelsandconvexoptimi- ally, it can attain increased scalability by using randomised sationtechniqueshavebeenappliedtoRIimaging,showing updates. It works by selecting only a fraction of the visi- the potential to outperform state-of-the-art imaging algo- bilities at each iteration, thus achieving great flexibility in rithms in the field (Wiaux et al. 2009a; Rau et al. 2009; Li terms of memory requirements and computational load per et al. 2011; Carrillo et al. 2012, 2013, 2014; Garsden et al. iteration, at the cost of requiring more iterations to con- 2015). These methods typically solve the imaging problem verge.Oursimulationssuggestnosignificantincreaseinthe by minimising an objective function defined as a sum of total computation cost. a data term, dependent on the measured visibilities, and The remainder of this article is organised as follows. severalregularisationterms,usuallypromotingsparsityand Section 2 introduces the RI imaging problem and describes positivity.Scalablealgorithms,specificallytailoredforlarge- thestate-of-the-artimagereconstructiontechniquesusedin scale problems using parallel and distributed schemes, are radio astronomy. In Section 3 we review some of the main justnowbeginningtogainattentioninthecontextofimag- tools from convex optimisation needed for RI imaging. Sec- ing (Carrillo et al. 2014; Ferrari et al. 2014) and calibra- tion 4 formulates the optimisation problem for RI imaging tion (Yatawatta 2015) for next-generation radio telescopes. given the large-scale data scenario and presents the pro- In this context, proximal splitting methods are very posedalgorithms,ADMMandPD,respectively.Wediscuss popularduetotheirabilitytodecomposetheoriginalprob- implementationdetailsandtheircomputationalcomplexity lemintoseveralsimpler,easiertosolve,sub-problems,each in Section 5. Numerical experiments evaluating the perfor- oneassociatedwithonetermoftheobjectivefunction(Com- mance of the algorithms are reported in Section 6. Finally, bettes & Pesquet 2011). Another class of algorithms cur- we briefly present the main contributions and envisaged fu- rently gaining traction for large-scale problems in optimi- ture research directions in Section 7. sation is based on primal-dual (PD) methods (Komodakis & Pesquet 2015). Such methods efficiently split the optimi- sation problem and, at the same time, maintain a highly 2 RADIO-INTERFEROMETRIC IMAGING parallelisable structure by solving concomitantly for a dual Radio-interferometricdata,thevisibilities,areproducedby formulationoftheoriginalproblem.Buildingonsuchtools, anarrayofantennapairsthatmeasureradioemissionsfrom the simultaneous direction method of multipliers (SDMM) agivenareaofthesky.Theprojectedbaselinecomponents, wasrecentlyproposedinthecontextofRIimagingbyCar- inunitsofthewavelengthofobservation,arecommonlyde- rillo et al. (2014). It achieves the complete splitting of the noted(u,v,w),wherewidentifiesthecomponentintheline functions defining the minimisation task. In the big-data of sight and u = (u,v) the components in the orthogonal context, SDMM scales well with the number of measure- plane. The sky brightness distribution x is described in the ments, however, an expensive matrix inversion is necessary samecoordinatesystem,withcomponentsl,m,nandwith when updating the solution, which limits the suitability of √ l = (l,m) and n(l) = 1−l2−m2, l2 +m2 ≤ 1. The the method for the recovery of very large images. general measurement equation for non-polarised monochro- The scope of this article is to propose two new algo- matic RI imaging can be stated as rithmicstructuresforRIimaging.Westudytheircomputa- Z tional performance and parallelisation capabilities by solv- y(u)= D(l,u)x(l)e−2iπu·ld2l, (1) ing the sparsity averaging optimisation problem proposed in the SARA algorithm (Carrillo et al. 2012), previously withD(l,u)=1/n(l)D¯(l,u)quantifyingalltheDDEs.Some showntooutperformthestandardcleanmethods.Theap- dominant DDEs can be modelled analytically, like the w plication of the two algorithms is not limited to the SARA component which is expressed as D¯ (l,u)=e−2iπw(n(l)−1). w prior,anyotherconvexpriorfunctionsbeingsupported.We At high dynamic ranges however, unknown DDEs, related assume a known model for the measured data such that to the primary beam or ionospheric effects, also affect the thereisnoneedforcalibration.WeuseSDMM,solvingthe measurementsintroducingtheneedforcalibration.Herewe same minimisation problem, to compare the computational work in the absence of DDEs. burden and parallelisation possibilities. Theoretical results The recovery of x from the visibilities relies on algo- ensure convergence, all algorithms reaching the same solu- rithms solving a discretised version of the inverse problem tion. We also showcase the reconstruction performance of (1). We denote by x∈RN the intensity image of which we the two algorithms coupled with the SARA prior in com- take M visibility measurements y∈CM. The measurement parison with CS-CLEAN (Schwab 1984) and MORESANE model is defined by (Dabbech et al. 2015). y=Φx+n, (2) Thefirstalgorithmicsolverisasub-iterativeversionof the well-known alternating direction method of multipliers wherethemeasurementoperator Φ∈CM×N isalinearmap (ADMM).ThesecondisbasedonthePDmethodanduses fromtheimagedomaintothevisibilityspaceandydenotes forward-backward (FB) iterations, typically alternating be- thevectorofmeasuredvisibilitiescorruptedbytheadditive tween gradient (forward) steps and projection (backward) noisen.Duetolimitationsinthevisibilitysamplingscheme, steps. Such steps can be seen as interlaced clean-like up- equation (2) defines an ill-posed inverse problem. Further- dates.Bothalgorithmsarehighlyparallelisableandallowfor more, the large number of the data points, M (cid:29) N, in- anefficientdistributedimplementation.ADMMhoweverof- troducesadditionalchallengesrelatedtothecomputational MNRAS000,1–23(2016) Scalable splitting algorithms for SKA 3 and memory requirements for finding the solution. In what the deconvolution performed by the operator T is named follows, we assume the operator Φ to be known is advance the minor cycle. All proposed versions of clean use vari- such that no calibration step is needed to estimate it. ations of these major and minor cycles (Rau et al. 2009). Duetothehighlyiterativenatureofthereconstruction cleanbuildsthesolutionimageiterativelybysearchingfor algorithms,afastimplementationofalloperatorsinvolvedin atomsassociatedwiththelargestmagnitudepixelfromthe theimagereconstructionisessential,forbothregularisation residual image. A loop gain factor controls how aggressive anddataterms.Tothispurpose,themeasurementoperator istheupdatestep,byonlyallowingafractionofthechosen is modelled as the product between a matrix G∈CM×noN atoms to be used. and an n -oversampled Fourier operator, Multiple improvements of clean have been suggested. o Inthemulti-scaleversion(Cornwell2008)thesparsitymodel Φ=GFZ. (3) isaugmentedthroughamulti-scaledecomposition.Anadap- ThematrixZ∈RnoN×N accountsfortheoversamplingand tive scale variant was proposed by Bhatnagar & Cornwell the scaling of the image to pre-compensate for possible im- (2004) and can be seen as MP with over-complete dictio- perfections in the interpolation (Fessler & Sutton 2003). In nariessinceitmodelstheimageasasuperpositionofatoms theabsenceofDDEs,Gonlycontainscompactsupportker- over a redundant dictionary. Another class of solvers, the nels that enable the computation of the continuous Fourier maximumentropymethod(Ables1974;Gull&Daniell1978; samples from the discrete Fourier coefficients provided by Cornwell & Evans 1985) solves a regularised global optimi- F. Alternatively, seen as a transform from the u–v space to sation problem through a general entropy prior. In practice the discrete Fourier space, G†, the adjoint operator of G, however, clean and its variants have been preferred even grids the continuous measurements onto a uniformly sam- thoughtheyareslowandrequireempiricallychosenconfigu- pledFourierspaceassociatedwiththeoversampleddiscrete rationparameters.Furthermore,thesemethodsalsolackthe Fourier coefficients provided by FZ. This representation of scalability required for working with huge, SKA-like data. the measurement operator enables a fast implementation thanks to the use of the fast Fourier transform for F and to the fact that the convolution kernels used are in gen- 2.2 Compressed sensing in radio-interferometry eralmodelledwithcompactsupportintheFourierdomain, Imaging algorithms based on convex optimisation and us- which leads to a sparse matrix G1. ing sparsity-aware models have also been proposed, espe- cially under the theoretical framework of compressed sens- ing (CS), reporting superior reconstruction quality with re- 2.1 Classical imaging algorithms spect to clean and its multi-scale versions. CS proposes Variousmethodshavebeenproposedforsolvingtheinverse both the optimisation of the acquisition framework, going problem defined by (2). The standard imaging algorithms beyondthetraditionalNyquistsamplingparadigm,andthe belongtothecleanfamilyandperformagreedynon-linear use of non-linear iterative algorithms for signal reconstruc- deconvolution based on local iterative beam removal (Hög- tion,regularisingtheill-posedinverseproblemthroughalow bom 1974; Schwarz 1978; Schwab 1984; Thompson et al. dimensionalsignalmodel(Donoho2006;Candès2006).The 2001). A sparsity prior on the solution is implicitly intro- keypremiseinCSisthattheunderlyingsignalhasasparse duced since the method reconstructs the image pixel by representation,x=Ψαwithα∈CD containingonlyafew pixel. Thus, clean is very similar to the matching pursuit nonzero elements (Fornasier & Rauhut 2011), in a dictio- (MP)algorithm(Mallat&Zhang1993).Itmayalsobeseen nary Ψ∈CN×D, e.g. a collection of wavelet bases or, more as a regularised gradient descent method. It minimises the generally, an over-complete frame. residualnormky−Φxk22viaagradientdescentsubjecttoan ThefirststudyofCSappliedtoRIwasdonebyWiaux implicitsparsityconstraintonx.Anupdateofthesolution et al. (2009a), who demonstrated the versatility of convex takes the following form optimisationmethodsandtheirsuperiorityrelativetostan- x(t) =x(t−1)+T(cid:16)Φ†(cid:0)y−Φx(t−1)(cid:1)(cid:17), (4) dwaarsddienvteelrofpereodmbeytriWc iiamuaxgientgalt.ec(h2n01iq0u)est.oArecCovSerapthperosaicgh- nal induced by cosmic strings in the cosmic microwave where Φ† is the adjoint of the linear operator Φ. In the as- background. McEwen & Wiaux (2011) generalised the CS tronomycommunity,thecomputationoftheresidualimage imaging techniques to wide field-of-view observations. Non- (cid:0) (cid:1) Φ† y −Φx(t−1) , which represents a gradient step of the coplanareffectsandtheoptimisationoftheacquisitionpro- residual norm, is being referred to as the major cycle while cess, were studied by Wiaux et al. (2009b) and Wolz et al. (2013).Alltheaforementionedworkssolveasynthesis-based problem defined by 1 Assumingpre-calibrateddatainthepresenceofDDEs,theline of G associated with frequency u, is explicitly given by the con- minkαk subject to ky−ΦΨαk ≤(cid:15), (5) 1 2 volutionofthediscreteFouriertransformofD(l, u),centredon α u,withtheassociatedgriddingkernel.Thismaintainsthesparse where(cid:15)isaboundonthe‘ normofthenoisen.Synthesis- 2 structureofG,sincetheDDEsaregenerallymodelledwithcom- basedproblemsrecovertheimagerepresentationαwiththe pact support in the Fourier domain. A non-sparse G drastically final image obtained from the synthesis relation x = Ψα. increasesthecomputationalrequirementsfortheimplementation Here, the best model for the sparsity, the non-convex ‘ ofthemeasurementoperator.However,itisgenerallytransparent 0 tothealgorithmssincetheydonotrelyonthesparsitystructure norm, is replaced with its closest convex relaxation, the explicitly. This is the case for all the algorithmic structures dis- ‘1 norm, to allow the use of efficient convex optimisation cussedherein. solvers. Re-weighting schemes are generally employed to MNRAS000,1–23(2016) 4 A. Onose, R. Carrillo et al. approximate the ‘ norm from its ‘ relaxation (Candès the alternating direction method of multipliers (Boyd et al. 0 1 et al. 2008; Daubechies et al. 2010). Imaging approaches 2011) or the simultaneous direction method of multipliers basedonunconstrainedversionsof (5)havealsobeenstud- (Setzer et al. 2010). All proximal splitting methods solve ied(Wengeretal.2010;Lietal.2011;Hardy2013;Garsden optimisation problems like et al. 2015). For example, Garsden et al. (2015) applied a ming (z)+···+g (z), (7) synthesis-based reconstruction method to LOFAR data. 1 n z Asopposedtosynthesis-basedproblems,analysis-based withg ,i∈{1,...,n},proper,lower-semicontinuous,convex i approaches recover the signal itself, solving functions. No assumptions are required about the smooth- minkΨ†xk subject to ky−Φxk ≤(cid:15). (6) ness, each non-differentiable function being incorporated 1 2 x into the minimisation through its proximity operator (C1). Thesparsityaveragingreweighedanalysis(SARA),basedon Constrained problems are reformulated to fit (7) through the analysis approach and an average sparsity model, was the use of the indicator function (C2) of the convex set C introduced by Carrillo et al. (2012). Carrillo et al. (2014) defined by the constraints. As a general framework, proxi- proposedascalablealgorithm,basedonSDMM,tosolve(6). mal splitting methods minimise (7) iteratively by handling For such large-scale problems, the use of sparsity operators eachfunctiong ,possiblynonsmooth,throughitsproximity i Ψ that allow for a fast implementation is fundamental. Hy- operator. A good review of the main proximal splitting al- bridanalysis-by-synthesisgreedyapproacheshavealsobeen gorithmsandsomeoftheirapplicationstosignalandimage proposed by Dabbech et al. (2015). processing is presented by Combettes & Pesquet (2011). To provide an analogy between clean and the FB it- Primal-dual methods (Komodakis & Pesquet 2015) in- erations employed herein, we can consider one of the most troduce another framework over the proximal splitting ap- basicapproaches,theunconstrainedversionoftheminimisa- proachesandareabletoachievefullsplitting.Alltheopera- tion problem (6), namely min kΨ†xk +βky−Φxk2 with tors involved, not only the gradient or proximity operators, x 1 2 β a free parameter. To solve it, modern approaches using butalsothelinearoperators,canbeusedseparately.Dueto FB iterations perform a gradient step together with a soft- this, no inversion of operators is required, which gives im- thresholding operation in the given basis Ψ† (Combettes & portantcomputationaladvantageswhencomparedtoother Pesquet 2007b). This FB iterative structure is conceptually splitting schemes (Combettes & Pesquet 2012). The meth- extremelyclosetothemajor-minorcyclestructureofclean. ods solve optimisation tasks of the form At a given iteration, the forward (gradient) step consists in ming (z)+g (Lz), (8) doingastepintheoppositedirectiontothegradientofthe z 1 2 ‘ normoftheresidual.Itisessentiallyequivalenttoamajor 2 with g and g proper, lower semicontinuous convex func- cycleof clean.Thebackward(soft-thresholding)stepcon- 1 2 tions and L a linear operator. They are easily extended to sists in decreasing the absolute values of all the coefficients problems, similar to (7), involving multiple functions. The of Ψ†x that are above a certain threshold by the thresh- minimisation(8),usuallyreferredtoastheprimal problem, old value, and setting to zero those below the threshold. accepts a dual problem (Bauschke & Combettes 2011), This step is very similar to the minor cycle of clean, with thesoft-thresholdvaluebeingananalogoustotheloopgain ming1∗(−L†v)+g2∗(v), (9) v factor. The soft-thresholding intuitively works by removing small and insignificant coefficients, globally, on all signal whereL† istheadjointofthelinearoperatorLandg2∗ isthe locations simultaneously while clean iteratively builds up Legendre-Fenchel conjugate function of g2, defined in (C3). the signal by picking up parts of the most important coef- Under our assumptions for g1 and g2 and, if a solution to ficients, a local procedure, until the residuals become negli- (8)exists,efficientalgorithmsforsolvingtogethertheprimal gible. Thus, clean can be intuitively understood as a very and dual problems can be devised (Condat 2013; Vu˜ 2013; specific version of the FB algorithm. As will be discussed Combettes & Pesquet 2012). Such PD approaches are able inSection4,fromtheperspectiveof clean,thealgorithms to produce highly scalable algorithms that are well suited presented herein can be viewed as being composed of com- forsolvinginverseproblemssimilarto(2).Theyareflexible plex clean-like FB steps performed in parallel in multiple andofferabroadclassofmethodsrangingfromdistributed data, prior and image spaces. computing to randomised or block coordinate approaches (Pesquet & Repetti 2015; Combettes & Pesquet 2015). AugmentedLagrangian(AL)methods(Bertsekas1982) havebeentraditionallyusedforsolvingconstrainedoptimi- 3 CONVEX OPTIMISATION sation problems through an equivalent unconstrained min- Optimisation techniques play a central role in solving the imisation. In our context, the methods can be applied for large-scale inverse problem (2) from RI. Some of the main findingthesolutiontoaconstrainedoptimisationtaskequiv- methodsfromconvexoptimisation(Bauschke&Combettes alent to (8), 2011) are presented in what follows. ming (z)+g (r), subject to r=Lz, (10) Proximal splitting techniques are very attractive due 1 2 z,r to their flexibility and ability to produce scalable algorith- by the introduction of the slack variable r. The solution mic structures. Examples of proximal splitting algorithms is found by searching for a saddle point of the augmented include the Douglas-Rachford method (Combettes & Pes- Lagrange function associated with (10), quet 2007a; Boţ & Hendrich 2013), the projected gradient approach(Calamai&Moré1987),theiterativethresholding s† 1 maxming (z)+g (r)+ (Lz−r)+ kLz−rk2. (11) algorithm (Daubechies et al. 2004; Beck & Teboulle 2009), s z,r 1 2 µ 2µ 2 MNRAS000,1–23(2016) Scalable splitting algorithms for SKA 5 The vector s and parameter µ, correspond to the La- wavelet bases (Carrillo et al. 2012) are a good candidate grange multipliers. No explicit assumption is required on butproblem(12)isnotrestrictedtothem.Are-weighted‘ 1 the smoothness of the functions g and g . Several algo- approach(Candèsetal.2008)mayalsobeusedbyimplicitly 1 2 rithms working in this framework have been proposed. The imposingweightsontheoperatorΨbutitisnotspecifically alternatingdirectionmethodofmultipliers(ADMM)(Boyd dealt with herein since it does not change the algorithmic et al. 2011; Yang & Zhang 2011) is directly applicable to structure. This would serve to approximate the ‘ pseudo 0 theminimisation(10).Ageneralisationofthemethod,solv- norm, kΨ†xk , by iteratively re-solving the same problem 0 ing (7), is the simultaneous direction method of multipli- as in (12) with refined weights based on the inverse of the ers (SDMM)(Setzer et al. 2010). It finds the solution to an solutioncoefficientsfromthepreviousre-weightedproblem. extended augmented Lagrangian, defined for multiple func- An efficient parallel implementation can be achieved tions g . Both methods can also be characterised from the from (2) by splitting of the data into multiple blocks i PD perspective (Boyd et al. 2011; Komodakis & Pesquet y Φ G M 2015). Algorithmically, they split the minimisation step by 1 1 1 1 alternatingbetweentheminimisationovereachoftheprimal y= .. , Φ= .. = .. FZ. (14) . . . variables,z andr,followedbyamaximisationwithrespect y Φ G M to the multipliers s, performed via a gradient ascent. nd nd nd nd Since Gj ∈CMj×noNj is composed of compact support ker- nels, the matrices Mj ∈ RnoNj×noN can be introduced to 4 LARGE-SCALE OPTIMISATION select only the parts of the discrete Fourier plane involved in computations for block j, masking everything else. The The next generation telescopes will be able to produce a selected,n N ,N ≤N,frequencypointsaredirectlylinked o j j hugeamountofvisibilitydata.Tothisregard,thereismuch to the continuous u–v coordinates associated with each of interest in the development of fast and well performing re- thevisibilitymeasurementsfromblocky .Thus,foracom- construction algorithms (Carrillo et al. 2014; McEwen & j pact grouping ofthevisibilitiesintheu–v space,eachblock Wiaux2011).Highlyscalablealgorithms,distributedorpar- onlydealswithalimitedfrequencyinterval.Thesefrequency allelised, are just now beginning to gather traction (Car- ranges are not disjoint since a discrete frequency point is rillo et al. 2014; Ferrari et al. 2014). Given their flexibility generally used for multiple visibilities due to the interpola- and parallelisation capabilities, the PD and AL algorith- tion kernels and DDEs modelled through the operator G . j micframeworksareprimecandidatesforsolvingtheinverse Since both have a compact support in frequency domain, problems from RI. without any loss of generality, we consider for each block j an overlap of n such points. v We rewrite (2) for each data block as 4.1 Convex optimisation algorithms for radio-interferometry y =Φ x+n , (15) j j j UndertheCSparadigm,wecanredefinetheinverseproblem with n being the noise associated with the measurements as the estimation of the image x ∈ RN given the measure- j y .Thus,wecanredefinetheminimisationproblem(12)as mentsy∈CM undertheconstraintthattheimageissparse j in an over-complete dictionary Ψ. Since the solution of in- Xnb Xnd terests is an intensity image, we also require x to be real minf(x)+ li(Ψ†ix)+ hj(Φjx) (16) x andpositive.Theanalysisformulation(6)ismoretractable i=1 j=1 since it generally produces a simpler optimisation problem where, similarly to (13), we have whenover-completedictionariesareused(Eladetal.2007). Additionally,theconstrainedformulationoffersaneasyway li =k·k1, (17) ofdefiningtheminimisationgivenaccuratenoiseestimates. h (z)=ι (z), B ={z∈CMj :kz−y k ≤(cid:15) }. Thus, we state the reconstruction task as the convex j Bj j j 2 j minimisation problem (Carrillo et al. 2013, 2014) Here, (cid:15)j represents the bound on the noise for each block. For the sparsity priors, the ‘ norm is additively separable minf(x)+l(Ψ†x)+h(Φx) (12) 1 and the splitting of the bases used, x (cid:2) (cid:3) withthefunctionsinvolvedincludingalltheaforementioned Ψ= Ψ ... Ψ , (18) 1 nb constraints, with Ψ ∈ CN×N for i ∈ {1,...,n }, is immediate. The i b l =k·k , 1 new formulation involving the ‘ terms remains equivalent 1 f =ι , C =RN, (13) to the original one. Note that there are no restrictions on C + the number of blocks Ψ is split into. However, a different h(z)=ι (z), B={z∈CM :kz−yk ≤(cid:15)}. B 2 splittingstrategymaynotallowfortheuseoffastalgorithms Thefunctionf introducestherealityandpositivityrequire- for the computation of the operator. ment for the recovered solution, l represents the sparsity Hereafter we focus on the block minimisation problem prior in the given dictionary Ψ and h is the term that en- definedin(16)andwedescribetwomainalgorithmicstruc- sures data fidelity constraining the residual to be situated turesforfindingthesolution.Thefirstclassofmethodsuses in an ‘ ball defined by the noise level (cid:15). aproximalADMManddetailsthepreliminaryworkofCar- 2 We set the operator Ψ∈CN×nbN to be a collection of rilloetal.(2015).ThesecondisbasedonthePDframework n sparsityinducingbases(Carrilloetal.2014).TheSARA andintroducestoRI,anewalgorithmabletoachievethefull b MNRAS000,1–23(2016) 6 A. Onose, R. Carrillo et al. splittingpreviouslymentioned.Thesemethodshaveamuch A proximal version of ADMM deals with the non- lightercomputationalburdenthantheSDMMsolverprevi- smooth functions from (21) and (23) by approximating the ously proposed by Carrillo et al. (2014). They are still able solutionviaproximalsplitting.Algorithm1presentsthede- toachieveasimilarlevelofparallelism,eitherthroughanef- tails. In Figure 1 we present a diagram of the algorithm to ficientimplementationinthecaseofADMMor,inthecase provide further insight into its parallelisation and distribu- ofPD,bymakinguseoftheinherentparallelisablestructure tioncapabilities.Itcanalsobeusedtounderstandthealgo- ofthealgorithm.ThemainbottleneckofSDMM,whichthe rithm from the clean perspective, performing FB clean- proposedalgorithmsavoid,istheneedtocomputethesolu- like updates in multiple data, prior and image spaces. Data tion of a linear system of equations, at each iteration. Such fidelity is enforced through the slack variables r(t), by min- j operation can be prohibitively slow for the large RI data imising(23)andthusconstrainingtheresidualtobelongto sets and makes the method less attractive. The structure the‘ ballsB .Thisacceptsaclosedformsolutionand,for 2 j of SDMM is presented in Appendix B, Algorithm 3. For its each ball j, represents the projection, complete description in the RI context we direct the reader z−y to Carrillo et al. (2014), the following presentation being (cid:15) j +y kz−y k >(cid:15) focused on the ADMM and PD algorithms. PBj(z)=∆ jkz−yjk2 j j 2 j (24) z kz−y k ≤(cid:15) j 2 j 4.2 Dual forward-backward based alternating onto the feasible regions defined by it. Given the structure direction method of multipliers of the function h¯, this is implemented in parallel with dis- tributedcomputationsandpresentedinAlgorithm1,step8, TheADMMisonlyapplicabletotheminimisationofasum together with the update of the Lagrange variables s(t), in of two functions and does not exhibit any intrinsic paral- j step9.Thevariablesb(t) ∈CnoNj,computedinsteps3to6, lelisationstructure.However,byrewritingtheminimisation j are required in the computations and need to be transmit- problem from (16) as ted to the different processing nodes. The nodes compute minf¯(x)+h¯(Φx), (19) the solution updates q(t) ∈ CnoNj in step 10 after which x j they are centralised and used to revise the previous solu- an efficient parallel implementation may be achieved. We tionestimatex(t−1) andtocomputex(t).Thus,bycarefully define the two functions involved in as defining the minimisation problem, a high degree of paral- Xnb Xnd lelismisachieved.Notethatthisstepcaneasilyincorporate f¯(x)=f(x)+ li(Ψ†ix), h¯(Φx)= hj(Φjx). (20) all types of weighting of the data specific to RI. i=1 j=1 Forourspecificproblem,theminimisationoverxfrom Furthermore, since h¯ is a sum of indicator functions (21)doesnotacceptaclosedformsolution.Weapproximate ι (Φ x), we can redefine it as h¯(Φx) = ι (Φx), with itbyusingaFBstep.Theforwardstepcorrespondstoagra- Bj j B¯ dientstepandthebackward stepisanimplicitsub-gradient- B¯=B ×B ×...×B . 1 2 nd like step performed through the proximity operator. Thus, ADMM iteratively searches for the solution to an aug- in step 12, the solution is updated using the descent step mented Lagrangian function similar to (11). The computa- ρ, in the direction of the gradient of the smooth part. This tionsareperformedinaserialfashionandexplicitparalleli- isfollowedbytheiterativedualFB(Combettesetal.2011) sation may only be introduced inside each of its three al- updatesnecessarytoapproximatetheproximityoperatorto gorithmic steps. Thus, at each iteration, ADMM alternates the non smooth f¯. Algorithm 1, function DualFB, details between the minimisation the required sub-iterations. In steps 23 and 20, the method minµf¯(x)+ 1(cid:13)(cid:13)Φx+s−r(cid:13)(cid:13)2 (21) alternatesbetween,projectionsontotheconvexsetC,which, x 2 2 component wise, are defined as othveerstlahcekvavrairaibableleofri,nterestxandtheminimisationinvolving (cid:16)P (z)(cid:17) =∆ (cid:26) <(zk) <(zk)>0 ∀k, (25) C 0 <(z )≤0 minµh¯(r)+ 1(cid:13)(cid:13)r−Φx−s(cid:13)(cid:13)2. (22) k k r 2 2 and the application of the proximity operator to the spar- These are followed by a gradient ascent with a step % per- sity prior functions li, which is the component wise soft- formed for the Lagrange multiplier variable s. Given the thresholding operator definitionofthefunctionh¯(r),theminimisationinvolvingr ( z {|z |−α} can be split into nd independent sub-problems (cid:16)Sα(z)(cid:17) =∆ k k|zk| + |zk|>0 ∀k, (26) mrijnµh¯j(rj)+ 21(cid:13)(cid:13)rj−Φjx−sj(cid:13)(cid:13)22, j ∈{1,...,nd}. (23) with threskhold α. The s0oft threshold r|ezsku|lt=in0g for the algo- Thisminimisationamountstocomputingtheproximityop- rithm is ηρµ. However, since µ is a free parameter, we re- eratorofµh¯ atΦ x+s ,which,giventhedefinitionofthe parameterisetheoperationtousethesoftthresholdκkΨk , j j j S function h¯ , reduces to a projection operation. The method withκasanewscale-freeparameter,independentoftheop- j imposes that every r approaches Φ x while x converges eratorΨused.Here,wedenotebykΨk theoperatornorm j j S towardsthesolution.Theconvergencespeedisgovernedby of the sparsifying transform. The operator {·} from (26) + the Lagrange multiplier µ and by the ascent step % asso- sets the negative values to 0. The parameter η serves as an ciated with the maximisation over the Lagrange multiplier update step for the sub-problem. In step 20, we have ad- variable s. ditionally used the Moreau decomposition (C4) to replace MNRAS000,1–23(2016) Scalable splitting algorithms for SKA 7 Algorithm 1 Dual forward-backward ADMM. 1: givenx(0),r(0),s(0),q(0),κ,ρ,% =FZ j j j 2: repeat fort=1,... 3: ˜b(t)=FZx(t−1) | {z } 54:: ∀j∈b(j{t)1,=..M.,jn˜bd(}t)set b(1t) ,b(2t) ,...,b(ntd) 6: end 7: ∀j∈{1,...,nd}distributebj(t) and do in parallel Data1 Datand (cid:16) (cid:17) 98:: sr(j(jtt))==sP(jtB−j1)G+j%b(cid:0)j(tG)j+bj(st(j)t−−1r)(jt)(cid:1) yG11 sr(1(1tt)) Gynndd srn((nttdd)) (cid:16) (cid:17) 10: q(jt)=G†j Gjbj(t)+r(jt)−sj(t) sequentialsteps sequentialsteps proxim(cid:8)alstep(cid:9) proxim(cid:8)alstep(cid:9) 11: end and gatherqj(t) PB1 ······ ˙˙ PBnd ······ Xnd | {z } | {z } 12: x˜(t)=x(t−1)−ρZ†F† M†jqj(t) r(1t) r(ntd) j=1 (cid:0) (cid:1) 13: x(t)=DualFB x˜(t),κ gradientascent(cid:0) (cid:1) gradientascent(cid:0) (cid:1) 14: untilconvergence s(1t)=s(1t−1)+% ··· sn(td)=sn(td−1)+% ··· (cid:0) (cid:1) 15: functionDualFB z,κ 16: givend(0),η 17: z¯(0)=PiC(cid:0)z(cid:1) FBstep q1(t) ,q(2t) ,...,q(ntd) 18: repeat fork=1,... z }| { 19: ∀i∈{1,...,n(cid:18)b}do in pa(cid:19)r(cid:16)allel (cid:17) DualFB −ρZ†F† 20: d(ik)= η1 I−SκkΨkS ηdi(k−1)+Ψ†iz¯(k−1) | {z } 21: d˜(ik)=Ψid(ik) sub-iterations 2223:: ez¯n(kd)=PC(cid:16)z−Xin=b1d˜(ik)(cid:17) SparsitΨy11 z¯(0) SparsitΨynnbbz¯(0) 24: untilconvergence d(1k) d(nkb) 25: returnz¯(k) FBstep FBstep ˙˙ forwardstep forwardstep the proximity operator of the conjugates l∗ with that of i thefunctionsl ,withI denotingtheidentityoperator.The (cid:8) (cid:9) (cid:8) (cid:9) computationsiinvolvingeachbasisΨ†aretobeperformedin SκkΨkS ···· SκkΨkS ···· i parallel, locally. Distributed processing is problematic here backwardstep backwardstep due to the large size of the image z¯(k) that would need to be transmitted. FBstep d˜(1k) ,d˜(2k) ,...,d˜(nkb) z }| { 4.3 Primal-dual algorithms with randomisation Xnb The main advantage that makes the PD algorithms attrac- PC − Ψi tiveforsolvinginverseproblemsistheirflexibilityandscal- i=1 | {z } ability. They are able to deal with both differentiable and non-differentiable functions and are applicable to a broad z¯(k) rangeofminimisationtasks.Theinherentparallelisationon thelevelofsplittingthefunctionsgivesadirectapproachfor solving(16).Anotherimportantaspectisgivenbytheuseof randomisation, allowing the update for a given component function to be performed less often and thus lowering the computational cost per iteration. Block coordinate compu- Figure 1. The diagram of the structure of ADMM, detailed in tationsarealsosupportedbutarenotexplicitlyusedherein. Algorithm 1, showcasing the parallelism capabilities and over- all computation flow. The algorithm performs in parallel proxi- WedefinetheminimisationtasktobesolvedusingPD mal and gradient updates (similarly to the CLEAN performing methods, similarly to (16), as major-minorcycle)foralldatafidelityterms.Itsstructureissub- Xnb Xnd iterativeandenforcessparsityandpositivitythroughthedualFB minf(x)+γ l (Ψ†x)+ h (Φ x), (27) algorithm.Theseupdates,performedinparallelforeachsparsity i i j j x basis,canbeagainseenasanalogoustoclean.Thus,thewhole i=1 j=1 algorithmcanbeseenascomposedofinterlacedclean-likeproxi- where γ is an additional tuning parameter. Note that the malsplittingandFBupdatesrunninginparallelinmultipledata, minimisation problem does not change, regardless of the prior,andimagespaces. MNRAS000,1–23(2016) 8 A. Onose, R. Carrillo et al. Algorithm 2 Randomised forward-backward PD. =FZ 1: givenx(0),x˜(0),u(i0),v(j0),u˜(i0),v˜(j0),κ,τ,σi,ςj,λ 2: repeat fort=1,... | {z } | {z } 3: generate setsP ⊂{1,...,nb}andD⊂{1,...,nd} 4: ˜b(t)=FZx˜(t−1) b(1t) ,b(2t) ,...,b(ntd) x˜(t−1) 5: ∀j∈D set 6: b(jt)=Mj˜b(t) 7: end Data1 Datand Sparsity1 Sparsitynb 8: run simultaneously Gy11 v(1t) Gynnddv(ntd) Ψu(11t) uΨ(nntbb) 9: ∀j∈D dis(cid:18)tributeb(j(cid:19)t)(cid:16)and do in para(cid:17)llel FBstep FBstep (cid:12) FBstep FBstep 10: v¯(jt)= I−PBj v(jt−1)+Gjb(jt) forwardstep ˙˙ forwardstep (cid:12)(cid:12) forwardstep ˙˙ forwardstep 11: v(t)=v(t−1)+λ(cid:16)v¯(t)−v(t−1)(cid:17) j j j j PB1(cid:8)····(cid:9) PBnd(cid:8)····(cid:9) SκkΨkS(cid:8)····(cid:9) SκkΨkS(cid:8)····(cid:9) 1123:: endv˜(jat)nd=gGa†jtvh(jte)rv˜(t) j backwardstep backwardstep backwardstep backwardstep 14: ∀j∈{1,...nd}\D set 15: v(t)=v(t−1) j j 16: v˜(t)=v˜(t−1) FBstep v˜(1t) ,v˜(2t) ,...,v˜(ntd) u˜(1t) ,u˜(2t) ,...,u˜(ntb) 17: endj j PC −τZ†F†z }| {−τXnb σizΨi }| { 1198:: ∀i∈u¯P(it)d=o(cid:18)inIp−aSraκlkleΨlkS(cid:19)(cid:16)u(it−1)+Ψ†ix˜(t−1)(cid:17) (cid:16) (cid:17) i=1 20: u(t)=u(t−1)+λ u¯(t)−u(t−1) | {z } i i i 21: u˜(it)=Ψiu(it) x(t) 22: end 23: ∀i∈{1,...nb}\P set 24: u(t)=u(t−1) i i 25: u˜(t)=u˜(t−1) i i Figure2.ThediagramofstructureofPD,detailedinAlgorithm 26: end 2flfo,orswmh.oIwanlclacusoipnndtgraattsheteswpoaintrhatlhAleelDisdMmuaMcla,vptaahrbeiailPbitlDieessaavlgn(tod)riotavhnemdraluils(ctao)bmluepsuitnotgaptFieorBn- 2278:: ex¯n(td)=PC(cid:18)x(t−1)−τ(cid:16)Z†F†Xnd ςjM†jv˜(jt)+Xnb σiu˜(it)(cid:17)(cid:19) i j j=1 i=1 iterationsandinparallel.Theupdateoftheprimalvariablex(t) 29: x(t)=x(t−1)+λ(cid:0)x¯(t)−x(t−1)(cid:1) is also a FB step. Viewed though the perspective of the intu- itivesimilaritybetweenaFBiterationandclean,thistranslates 30: x˜(t)=2x¯(t)−x(t−1) to performing clean-like iterations in parallel in multiple data, 31: untilconvergence prior,andimagespaces. dual variables in parallel. The update of the primal vari- value γ takes due to the use of the indicator functions in able, the image of interest x(t), requires the contribution f and hj which are invariant to scaling. This fits under the of all dual variables v(t) and u(t). The algorithm uses the i j frameworkintroducedbyCondat(2013);Vu˜(2013);Pesquet update steps τ, σ and ς to iteratively revise the solution i j & Repetti (2015) and we devise a PD algorithm towards andallowsforarelaxationwiththefactorλ.FBiterations, finding the solution. The method iteratively alternates be- consisting of a gradient descent step coupled with a prox- tweensolvingtheprimalproblem(27)andthedualproblem, imal update, are used to update both the primal and the ! dual variables. These FB updates can be seen as clean- minf∗ −Xnb Ψ u −Xnd Φ†v likestepsperformedinthemultiplesignalspacesassociated ui i i j j withtheprimalandthedualvariables.Inthedeterministic vj i=1 j=1 (28) case, the active sets P and D are fixed such that all the 1 Xnb Xnd dual variables are used. The randomisation capabilities of + l∗(u )+ h∗(v ), γ i i j j the algorithm are presented later on, given a probabilistic i=1 j=1 construction of the active sets. essentially converging towards a Kuhn-Tucker point. This When applied in conjunction with the functions from produces the algorithmic structure of Algorithm 2 where (17), the primal update from step 28 is performed through additionally we have used the Moreau decomposition (C4) the projection (25) onto the positive orthant defined by C. torewritetheproximaloperationsandreplacethefunction Thedualvariablesareupdatedinsteps10and19usingthe conjugates. A diagram of the structure is presented in Fig- proximityoperatorsforh andl ,whichbecometheprojec- j i ure 2 further exemplifying the conceptual analogy between tion onto an ‘ ball B defined by (24) and the component 2 j thePDalgorithmandclean.Thealgorithmallowsthefull wisesoft-thresholdingoperator(26).WeusetheMoreaude- split of the operations and performs all the updates on the composition (C4) to replace the proximity operator of the MNRAS000,1–23(2016) Scalable splitting algorithms for SKA 9 conjugate functions l∗ and h∗ with that of the function l canbetoocostly.Inthiscase,asharedmemoryarchitecture i j i andh ,respectively.TheidentityoperatorisdenotedbyI. mightbemoreappropriatethandistributedprocessing.For j Step19alsocontainsare-parametrisationsimilartotheone the data nodes, the communication cost is low and a dis- performed for ADMM. We replace the implicit algorithmic tributed approach is feasible. We have assumed these two soft-threshold size γ/σi with κkΨkS by appropriately choos- different strategies for dealing with the different terms in ingthefreeparameterγ.Thisensuresthatweareleftwith the presentation of Algorithms 1 and 2. the scale-free parameter κ independent to the operator Ψ. Mostoftheoperationstobeperformedareproportional Steps 11, 20 and 29 represent the relaxation of the applica- with N since the main variable of interest x(t) is the image tion of the updates. To make use of the parallelisation, the to be recovered. The most demanding operation performed application of the operators G† and Ψ is also performed in onx(t) istheapplicationoftheoversampledFourieropera- j i parallel,insteps12and21.Notethatthesplittingoftheop- tors. When computed with a fast Fourier algorithm (FFT) eratorsispresentedin(14),morespecificallyΦ =G M FZ, (Cooley&Tukey1965),thecomputationalcostofthetrans- j j j ∀j ∈{1,...,n }.Theseoperationsaregiveninsteps4to7. forms F and F† applied to n -oversampled data scales as d o The computation of the dual variables ui(t) associated O(noNlognoN). It should be noted that the FFT imple- with the sparsity priors requires the current solution esti- mentationcanbespedupbyusingmultipleprocessingcores mate. This solution estimate is then revised with the up- ornodes.ThewaveletoperatorsΨandΨ†areappliedtothe dates u˜(t) computed from the dual variables. Both x(t) and imagex(t) aswell.TheDiscreteWaveletTransform(DWT) i u˜(t) areofsizeN andtheircommunicationmightnotbede- can be performed with fast wavelet implementations using i sirableinalooselydistributedsystem.Insuchcaseallcom- liftingschemesorfilterbanks(Cohenetal.1993;Daubechies putationsinvolvingu(t)canbeperformedinparallelbutnot & Sweldens 1998; Mallat 2008) and achieves a linear com- i in a distributed fashion. The dual variables v(t), associated plexity of O(N) for compactly supported wavelets. A dis- j tributed processing of the operations involved in the appli- with the data fidelity functions, should be computed over a cationofeachsparsitybasisΨ maybeused.However,this distributedcomputingnetwork.Theyonlyrequirethecom- i munication of the updates b(t) ∈ CnoNj and dual updates requiresthecommunicationofthecurrentsolutionestimate, j whichmightnotbefeasible.Weconsiderthatthesecompu- v˜(t) ∈CnoNj which remains feasible. j tations are performed locally, on the central meta-node. The main challenge associated with the inverse prob- For the data nodes, a manageable computational load lem defined by (2) is linked with the dimensionality of the andanefficientcommunicationcanbeachievedbybothal- data. The large data size is a limiting factor not only from gorithms by adopting a balanced and compact split of the thecomputationalperspectivebutalsofromthatofmemory data; splitting the data into blocks of similar size having a availability. A randomisation of the computations following compact frequency range as proposed in (14). An overlap the same PD framework (Pesquet & Repetti 2015) is much ofsizen betweendiscretefrequencyrangesisnecessaryfor v more flexible at balancing memory and computational re- anefficientinterpolation(Fessler&Sutton2003)totheuni- quirements. By selectively deciding which data fidelity and formfrequencygridwhichallowsfastFouriercomputations sparsitypriorfunctionsareactiveateachiterations,fullcon- or to include DDEs (Wolz et al. 2013). Besides this over- trol over the memory requirements and computational cost lap, each block only deals with a limited frequency range per iteration can be achieved. In Algorithm 2, this is con- reducing the communication performed. In such case, the trolledbychangingthesetsP,containingtheactivesparsity matricesM maskoutthefrequenciesoutsidetherangeas- j prior dual variables, and D, which governs the selection of sociatedwiththeblocksy .Furthermore,theuseofcompact thedatafidelitydualvariables.Ateachiteration,eachdual j support interpolation kernels and DDEs with compact sup- variable has a given probability of being selected, p for Pi port in the Fourier domain makes Gj sparse, which lowers thesparsityprior,andp forthedatafidelity,respectively. Dj the computational load significantly. We consider it has a Theseprobabilitiesareindependentofeachother.Notethat generic sparsity percentage n . s thealgorithmhasinertiastillperformingtheprimalupdates Details on the levels of parallelisation and the scaling usingalldualvariableseventhoughsomedualvariablesre- tomultiplenodesforbothmethodsarepresentedbelow.As main unchanged. mentioned earlier, the main computational difficulties arise from working with large images and data sets, thus making important the way the complexity of the algorithms scales withN andM.Anoverviewofthecomplexityrequirements 5 IMPLEMENTATION DETAILS AND is presented in Table 1. COMPUTATIONAL COMPLEXITY AnefficientimplementationoftheADMMandthePDalgo- 5.1 Alternating direction method of multipliers rithms takes advantage of the data split and of the implicit parallelisationfromthedefinitionoftheminimisationprob- TheefficientimplementationofADMMfortheproblemde- lem. For presentation simplicity, we consider the processing fined by (19) offloads the data fidelity computations to the to be split between a central meta-node, a single processing data nodes. As can be seen from Figure 1 and Table 1, the unitorpossiblyacollectionofnodes,centralisingtheupdate basic structure of the algorithm is serial and the processing on the desired solution x(t) and performing the computa- is just accelerated by parallelising each serial step. tions associated with the sparsity priors, and a number of The iterative updates follow the operations presented datafidelitynodesdealingwiththeconstraintsinvolvingthe in Algorithm 1. The central node computes an estimate balls B . The computation of the sparsity prior terms can x˜(t) of the solution and iteratively updates it to enforce j be easily parallelised however, the distribution of the data sparsity and positivity. The update from step 12 requires MNRAS000,1–23(2016) 10 A. Onose, R. Carrillo et al. Table 1.ComplexityofADMM(top)andPD(bottom)algorithmsforoneiteration.Eachnodehasitscomputationalloadlisted.The ADMM algorithm iterates nf¯ times over steps 18 to 23. The serial nature of its structure can be observed, the nodes not operating simultaneously. The PD methods alternate between updating the primal and the dual variables. All dual variables are computed in parallel.Thevisibilitydataareassumedtobesplitintocompactblockscomposedofanequalnumberofvisibilitiesintheu–v space. Algorithm1(ADMM) centralnode nd datafidelitynodes (cid:0) (cid:1) steps3-6 O noNlognoN — (cid:16) (cid:17) steps8-10 — O 2nnsndoMNj+Mj (cid:0) (cid:1) (cid:0) (cid:1) step 12 O noNlognoN +O noN+ndnv — (cid:0) (cid:1) nf¯×steps17-22 O 2nbN — Algorithm2(PD) centralnode nd datafidelitynodes (cid:0) (cid:1) steps4-8 O noNlognoN — (cid:0) (cid:1) (cid:16) (cid:17) steps10-27 pPiO 2nbN pDjO 2nnsndoMNj+Mj (cid:0) (cid:1) (cid:0) (cid:1) steps29-30 O noNlognoN +O (nb+no)N+ndnv — O(n Nlogn N)operationsforthecomputationoftheover- Theprocessingisperformedintwosynchronousalternating o o sampled FFT. Given a compact partitioning of the matrix serialstepstoupdatetheprimalanddualvariables,respec- G, the sum involving the updates q(t) requires computa- tively.Eachstepishoweverhighlyparallelisable.Thecentral j tions of the order O(n N)+O(n n ). Note that it may be node uses the current estimate of the solution x(t−1) and o d v accelerated by using the data node network, however since distributes the oversampled Fourier transform coefficients generally n is not large, the gain remains small. The com- b(t) to the data fidelity nodes. The data fidelity and cen- v j putationoftheFouriercoefficientsfromstep3alsoincursa tral nodes compute simultaneously the dual variables and complexity O(n Nlogn N). provide the updates v˜(t) and u˜(t) to be centralised and in- o o j i For the approximation of the proximal operator of the cluded in the next solution estimate on the central node. function f¯, the algorithm essentially remains serial and re- Such a strategy requires at each step the propagation of quiresanumbern ofiterations.Inthiscase,thecomplexity variablesofsizen N ,betweenthecentralanddatafidelity f¯ o j ofeachupdateperformedforthesparsitypriorisdominated nodes. As suggested in Algorithms 2, the computation of by the application of the operators Ψ and Ψ†, which, given thesparsitypriordualvariablesisalsohighlyparallelisable. an efficient implementation of the DWT requires O(N) op- However, the communication of the current image estimate erations.Theupdatesd(k)andd˜(k)fromstep20and21may isrequired,limitingthepossibilitytodistributethedatadue i i be computed in parallel. Given a serial processing however to its large size. We leave the computation to be performed thiswouldneedO(n N)computations.Notethatalthough by the central node, without an explicit exploitation of the b in this case the complexity scales linearly with N, the scal- possible parallelism. ingconstantscanmakethecomputationstobeofthesame All dual variables can be computed simultaneously as level as the FFT. can be seen in Figure 2. The data fidelity nodes need to The data fidelity nodes perform steps 8 to 10 in par- apply the linear operators Gj as in steps 10 and 12. Simi- allel using the Fourier coefficients b(t) precomputed in step larlytoADMM,thisincurstheheaviestcomputationalbur- j 5. The computations are heavier due to the linear opera- den. Given the very sparse structure of the matrix Gj this torGj.Asmentionedearlier,theoperatorhasaverysparse accounts for a complexity of O(nsMjnoNj) with noNj be- structure. This reduces the computation cost for applying ingthepreviouslymentionednumberof,uniformlygridded, Gj or G†j to O(nsMjnoNj), where noNj is the number of frequency points for the visibilities yj. The remaining op- uniformly gridded, frequency points associated with each erations only involve vectors of size Mj and thus the over- visibility block y . The remaining operations only involve all resulting complexity is O(2nsMjnoNj)+O(2Mj). The j vectorsofsizeM .Theoverallresultingcomplexitypernode wavelet decomposition from steps 19 and 21 achieves a lin- j is O(n M n N )+O(M ). Under the assumption that the ear complexity of O(N) for compactly supported wavelets. s j o j j blocks contain an equal number of visibilities, this further Theotheroperationsfromsteps19and20areoforderO(N) reducestoO(ns/ndMnoNj)+O(Mj)Thecommunicationre- resulting in a load for the sparsity prior nodes that scales quired between the central and the data fidelity nodes is of linearly with N. order n N , the size of frequency range of each data block. In step 28 of Algorithm 2, the summing of the sparsity o j prior updates requires O(n N) operations. For the ‘ data b 2 fidelityterms,givenacompactpartitioninginfrequencyfor thematrixG,thecomputationrequiresO(n N)+O(n n ) 5.2 Primal-dual algorithm o d v operations.ThecomputationalcostofthetransformsFand An implementation of the PD algorithms benefits from the F†, steps 4 and 28, scales as O(n Nlogn N) since this o o fullsplitachievedbythemethodswhichallowsforthecom- requires the FFT computation of the n -oversampled im- o putationofallthedualvariablestobecompletedinparallel. age. The remaining operations, including the projection, MNRAS000,1–23(2016)