Darwin Meets Einstein: LISA Data Analysis Using Genetic Algorithms

Jeff Crowder, Neil J. Cornish, and Lucas Reddinger
Department of Physics, Montana State University, Bozeman, MT 59717

arXiv:gr-qc/0601036v2, 17 Jan 2006

This work presents the first application of the method of Genetic Algorithms (GAs) to data analysis for the Laser Interferometer Space Antenna (LISA). In the low frequency regime of the LISA band there are expected to be tens of thousands of galactic binary systems that will be emitting gravitational waves detectable by LISA. The challenge of parameter extraction of such a large number of sources in the LISA data stream requires a search method that can efficiently explore the large parameter spaces involved. As signals of many of these sources will overlap, a global search method is desired. GAs represent such a global search method for parameter extraction of multiple overlapping sources in the LISA data stream. We find that GAs are able to correctly extract source parameters for overlapping sources. Several optimizations of a basic GA are presented with results derived from applications of the GA searches to simulated LISA data.

I. INTRODUCTION

The Laser Interferometer Space Antenna (LISA) [1] is set to be launched in the middle of the next decade. As LISA is an all-sky antenna, it will detect sources in all directions, and across a great range of distances. The types of sources range from monochromatic white dwarf binaries in our own galaxy to rapidly coalescing supermassive black hole binaries in the distant reaches of the Universe. The challenge for analyzing the LISA data stream will be pulling out the various parameters of as many of these sources as is possible. A large impediment to completing this challenge is the many thousands of low frequency, effectively monochromatic sources [2, 3, 4, 5, 6] that will be present in the LISA data streams. Extracting the parameters from so many sources at once is analogous to determining what every member in the audience of a rock concert is saying. As more sources overlap the confusion grows rapidly [7]. The name given to this issue is 'The Cocktail Party Problem' (see Ref. [8] for a detailed discussion).

With so many sources, it will be impossible to extract the individual source parameters for every source in the LISA band. This will leave a background of sources whose indeterminable signals blend together into a confusion limited background. Several studies [4, 5, 6, 9, 10, 11] have indicated that the confusion noise may dominate instrument noise at the low end of the LISA frequency range, so that other sources of interest may be buried beneath the confusion background. For this reason a key goal of LISA data analysis is to reduce the level of the confusion noise as much as possible.

Previous approaches to the extraction of parameters from the LISA data stream have used several methods. Grid based template searches using optimal filtering provide a systematic method to search through all possible combinations of gravitational wave sources, but the computational cost of such a search appears to make it unfeasible [12]. Other techniques applied to simulated LISA data involve iterative refinement of a sequential search of sources [13, 14], a tomographic approach [15], global iterative refinement, and ergodic exploration of the parameter space such as Markov Chain Monte Carlo (MCMC) methods [8]. At this time, however, it is not clear which of these techniques, or which combination of techniques, will provide the best solution to the Cocktail Party Problem.

Here we present the first application of the method of genetic algorithms [16] to the challenge of extracting parameters from a simulated LISA data stream containing multiple monochromatic gravitational wave sources. The strength of this method lies in its searching capabilities, and thus GAs might be used as the first step in dealing with the confusion background. The initial solution could then be handed off to an MCMC algorithm [8], which specializes in determining the nature of the posterior distribution function.

In section II we explore various factors that influence the performance of a genetic search algorithm. A bare-bones algorithm is introduced in IIA, and succeeding layers of complexity are added to this algorithm in IIB through IIH, with an emphasis on developing an efficient algorithm that is robust enough to handle the entire low frequency regime of the LISA detector. Applications of the advanced algorithms to multiple source cases are shown in IIG. We conclude with a discussion of future improvements and plans for the application of genetic algorithms to LISA data analysis.

II. GENETIC SEARCH ALGORITHMS

The fundamental idea behind a genetic algorithm is the survival of the fittest. It is because of this that genetic algorithms are often referred to as evolutionary algorithms, though Darwin [17] would probably have considered GAs as "Variation under Domestication" since we are breeding toward a predetermined goal. Through the process of continually evolving solutions to the given problem, genetic algorithms provide a means to search the large parameter space that we will be confronted with in the low frequency region of the LISA band.

A few definitions are in order before delving into our applications of genetic algorithms to LISA data analysis. These definitions will refer to a hypothetical search of the LISA data stream for N monochromatic gravitational wave sources. The search will take advantage of the F-statistic to reduce the search space to 3N parameters. The hypothetical search will also involve the use of n simultaneous, competing solution sets.

An organism is a particular 3N parameter set that is a possible solution for the source parameters.
A gene is an individual parameter within an organism.
A generation is the set of all n concurrent organisms.
Breeding or cross-over is the process through which a new organism is formed from one or more organisms of the previous generation.
Mutation is a process which allows for variation of an organism as it is bred from the organisms of the previous generation.
Elitism is the technique of carrying over one or more of the best organisms in one generation to the next generation.

A simplified genetic algorithm begins with a set of n organisms that comprise the first generation. The genes of this generation may be chosen at random or selected through some other process. The organisms of each generation are checked for fitness, and those with the best fitness are more likely to breed, with mutation, to form the organisms of the next generation. With passing generations the organisms tend toward better solutions to the source parameters. We use the F-statistic to measure the fitness of each organism.

A. Basic Implementation

For our investigations source frequencies were chosen to lie within the range f ∈ [0.999995, 1.003164] mHz. This range spans 100 frequency bins of width ∆f = 1/year. Amplitudes were restricted to the range A ∈ [10⁻²³, 10⁻²¹]. By use of the F-statistic our searches are reduced to frequency f, and sky location θ and φ. For a detailed description of the F-statistic and its use in reducing the search space see Refs. [8, 18].

A simple approach is to represent the values of each search parameter with binary strings. The length of the strings determines the precision of the search, e.g. representing θ with a binary string of 8 digits gives precision to 0.7°. Resolution is given by (parameter range)/2^L, where L is the length of the binary string. Such a binary representation allows for ease of mutation and breeding. We employed binary strings of length L = 16 for f, L = 13 for θ and L = 14 for φ.

In this basic scheme, we first mutate the parent's parameter strings, and then breed the mutated gametes. Simple mutation consists of flipping the binary digits of the parent's parameter strings with probability PMR, the parameter mutation rate. A large PMR will tend to result in more variation in the gametes, and thus the offspring, while a small PMR will lessen variation, resulting in more offspring that resemble their parents.

We use a breeding pattern known as 1-point crossover, which consists of the combination of complementary sections of the binary strings of two parent organisms. The cross-over point can be chosen at random or fixed in advance. We chose a fixed cross-over with the cross-over point occurring at the midpoint of the strings. As an example we show the breeding of a parameter represented by strings that are 8 digits long.

TABLE I: Midpoint crossover for an 8 bit string

  Parent 1     0100 1110
  Parent 2     0011 0011
  Offspring    0011 1110

We will start with a basic search using 10 organisms in each generation. The first generation has the genes of its organisms chosen at random from their respective ranges. The probability of each of these organisms being chosen for reproduction is proportional to its likelihood (known as fitness proportionate cross-over). Mutated gametes are formed using a PMR of 0.04, and are bred using a single midpoint crossover.

Figure 1 shows trace plots of the log likelihood, frequency, θ, and φ for a source with SNR = 15.4464 and parameters: A = 1.97703 × 10⁻²², f = 1.000848032 mHz, θ = 1.2713, φ = 5.34003, ι = 2.73836, ψ = 1.43093, and γ_o = 5.59719 (it is this source that will be used repeatedly throughout the paper). The plotted values were for the organism with the best fit in each generation. As can be seen, the parameters are well determined with even this basic scheme, though the noise in the data stream pushes them off their true values. The parameter values are shifted by δf = −1.5 × 10⁻⁹ Hz, δθ = 2.9° and δφ = −1.5° from their input values. These shifts are consistent with the error predictions from a Fisher matrix analysis: ∆f = 1.7 × 10⁻⁹ Hz, ∆θ = 3.5° and ∆φ = 1.9°. The cost of the search is measured in terms of the number of calls to the F-statistic routine and is given by $ = n × g, where g is the generation number. Typical runs of our basic genetic algorithm cost $ = 32650 calls. This should be compared to a grid based search across the same frequency range, which, for a minimal match of MM = 0.9, would require $ = 110,000 calls to the F-statistic routine (this value is 2^(3/2) larger than that quoted in Ref. [8] as our earlier calculations used a noise level that was √2 larger than the LISA baseline due to a mix up between one and two sided noise spectral densities).

FIG. 1: Basic Algorithm: Trace plots for (a) log likelihood, (b) frequency, (c) θ, and (d) φ for the basic implementation of a genetic algorithm. The y-axes are the parameter values and log likelihoods of the best fit organism for each generation. The x-axes are generation number.

While the basic algorithm is sufficient for finding a solution, it is not efficient. Next we will discuss adjustments to the algorithm that will improve its efficiency, and make it considerably cheaper than a grid based search.

B. Aspects of Mutation

In the previous example the PMR was set at the fairly low value of 0.04. Figure 2 shows trace plots for the same search, but with PMR = 0.1. While the PMR = 0.04 example shows a tendency for small deviations from the improving solutions, the larger PMR search allows large swings in the solution away from a good fit to the true source parameters. On the other hand, Figure 3 shows how a small PMR (0.001) can cause the rate of progress to be greatly slowed. A small mutation rate slows the exploration of the likelihood surface.

FIG. 2: Large Mutation Rate: Trace plots for (a) log likelihood, (b) frequency, (c) θ, and (d) φ for the basic implementation of a genetic algorithm with PMR = 0.1. The y-axes are the parameter values and log likelihoods of the best fit organism for each generation. The x-axes are generation number.

FIG. 3: Small Mutation Rate: Trace plots for (a) log likelihood, (b) frequency, (c) θ, and (d) φ for the basic implementation of a genetic algorithm with PMR = 0.001. The y-axes are the parameter values and log likelihoods of the best fit organism for each generation. The x-axes are generation number.

As these examples show, choosing the proper PMR can have a significant effect on the efficiency of the algorithm. Knowing which value is the proper choice a priori is impossible. Furthermore, at different phases of the search different values of the PMR will be more efficient than those same values at other phases. Early on in the search, a large PMR is desirable for increased exploration. Once convergence to the solution has begun, a smaller PMR is preferable, to prevent suddenly mutating away from the solution. One can imagine a process which changes the PMR in a manner analogous to the simulated annealing process, where we start the PMR high (hot) and lower (cool) it in succeeding generations. In fact, this process is sometimes called simulated annealing in the GA literature. Figure 4 shows trace plots for the same source, using a genetic (PMR) simulated annealing scheme given by:

\[
\mathrm{PMR} =
\begin{cases}
\mathrm{PMR}_f \left( \dfrac{\mathrm{PMR}_i}{\mathrm{PMR}_f} \right)^{(g_{\rm cool} - g)/g_{\rm cool}} & 0 < g < g_{\rm cool} \\[1ex]
\mathrm{PMR}_f & g \geq g_{\rm cool}
\end{cases}
\tag{1}
\]

where PMR_i = 0.2, PMR_f = 0.01, g is the generation number, and g_cool = 1000 is the last generation of the cooling process. The best choice of values for this scheme is again impossible to know a priori. In section IIF we will see how "Genetic Genetic Algorithms" are able to provide a natural solution to this problem.

FIG. 4: Genetic Simulated Annealing: Trace plots for (a) log likelihood, (b) frequency, (c) θ, and (d) φ for the basic implementation of a genetic algorithm with the inclusion of genetic simulated annealing. The y-axes are the parameter values and log likelihoods of the best fit organism for each generation. The x-axes are generation number.

C. The effect of the number of organisms on efficiency

While choosing the PMR is one degree of freedom in our basic schema, another is the number of organisms used in the search. Here we look at how the choice of the number of organisms affects the efficiency of the algorithm. The efficiency is inversely related to the computational cost $, which is measured by the number of calls to the function calculating the F-statistic (where the bulk of calculations for an organism are performed), which occurs once per newly formed organism. For example, in Figure 1 there are 10 organisms in the search and the search surpasses the true parameter log likelihood value at 3851 generations. Thus its computational cost is $ = 38510 (function calls).

The data in Figure 5 show the interplay of the number of organisms with the PMR (held constant within each data run) and their effects on the computational cost. We would expect that relatively large PMRs would be less efficient, as was seen in subsection IIB (and will show up in Figure 7). The size of the effect, however, is modified by the number of organisms in the search. For example, one can find from Figure 5 that the minimum cost ($ = 4492) for a 20 organism search occurs when PMR = 0.1, whereas for 400 organisms in the search the minimum cost ($ = 7490) is at PMR = 0.14.

FIG. 5: Average Computational Cost as a function of PMR and the number of organisms. The z-axis is the average computational cost calculated from 1000 searches.

The addition of more organisms in the search provides a kind of stability to the system that decreases the chances of mutating away from good solutions. With just a handful of organisms, and a large PMR, the chances are higher of each organism undergoing a large mutation in at least one parameter. However, with hundreds of organisms the probability of all organisms undergoing such a mutation drops appreciably. Then in the succeeding generation, those organisms that remained a good fit are much more likely to breed the offspring of the next generation. However, this does not hinder great leaps forward. To illustrate this point we will use the data shown in Figure 1. In going from the 7th to the 8th generation the value of the likelihood of the best fit organism jumps from 1.48 × 10¹³ to 6.02 × 10²⁰. As the probability of breeding is set by the value of the organism likelihood, that new organism is going to be the primary breeder of the next generation (though it is possible that a second organism also jumped up to a point in parameter space with a similar likelihood value).

Increasing the number of organisms not only provides this stabilizing effect, it also provides more chances per generation for improvements due to mutations. One cannot, however, simply throw more organisms at the problem without paying a price; there will be an eventual drop in efficiency. As an extreme example, imagine using the basic scheme described in IIA and putting 40000 organisms into the search. Even if one of the randomly chosen organisms matched the best fit parameters, the computational cost ($ = 40000) is already larger than the cost of using 10 organisms ($ = 38510). Figure 5 provides a snapshot of how this choice affects efficiency.

D. Elitism

Elitism is akin to cloning. It allows for a perfect copy of an organism or organisms to be bred into the next generation. Including elitism is another way to provide a stabilizing force across generations. This allows for a larger PMR to enhance exploration without the danger of moving off the best fit solution.

Figure 6 shows trace plots for the nominal source with PMR = 0.1 and a single elite organism being cloned at each generation. As expected there is increased exploration (compared to results shown in Figure 1) due to the larger PMR, but unlike the results shown in Figure 2, convergence is now helped by the cloned organism.

FIG. 6: Elitism: Trace plots for (a) log likelihood, (b) frequency, (c) θ, and (d) φ for the basic implementation of a genetic algorithm with PMR = 0.1 and single organism elitism. The y-axes are the parameter values and log likelihoods of the best fit organism for each generation. The x-axes are generation number.

Figure 7 shows a plot relating the average computational cost to the PMR for the case of no elitism, and the case where a single organism is cloned. Computational cost is now derived from the average number of newly formed organisms (note: a cloned organism does not increase computational cost, as all of its associated values are already known). The plot shows the average computational cost of 100 searches, using 20 organisms, of a given source (SNR = 19.2335 and parameters: A = 1.61486 × 10⁻²², f = 1.003 mHz, θ = 0.8, φ = 2.14, ι = 0.93245, ψ = 2.24587, and γ_o = 5.29165). As was expected, elitism has allowed for a larger PMR, compared to the zero elitism case, increasing the parameter space exploration without sacrificing efficiency.

FIG. 7: Average Computational Cost for no elitism and single organism elitism, plotted against PMR. Data points are determined by the average of 100 distinct searches.

If one decides to use elitism there is the additional choice of how many elite organisms will be cloned at each generation. At one extreme all organisms are cloned, in which case there is no exploration beyond the first generation. At the other extreme of no elitism the algorithm is unstable against large PMR values, as was seen in Figure 2. There is a balance to be struck between the amount of elitism and the size of the PMR that will provide the most efficient scheme, but the exact nature of the balance can depend on the nature of the search. We describe a solution to this problem in IIF.

E. Simulated Annealing

Simulated annealing is a technique that effectively makes the detector more noisy, thus lessening the range of the likelihood function. This increases the probability of choosing poorer sources for reproduction, which allows for a more thorough exploration of the likelihood surface. Think of the likelihood as a partition function Z = C exp(−βE), in which the role of the energy is played by the log likelihood, E = (s − h|s − h), and β plays the role of the inverse temperature. Heating up the system (lowering β) lowers the likelihood range, providing for increased exploration. Starting hot, we use a power law cooling schedule given by:

\[
\beta =
\begin{cases}
\beta_0 \left( \dfrac{1}{2\beta_0} \right)^{g/g_{\rm cool}} & 0 < g < g_{\rm cool} \\[1ex]
\dfrac{1}{2} & g \geq g_{\rm cool}
\end{cases}
\tag{2}
\]

where β_0 is the initial value of the inverse temperature, g is the generation number, and g_cool is the last generation of the cooling process (subsequent generations have β = 1/2). As the likelihood is a sharply peaked function, we found for a single source an initial value of β_0 ∼ 1/100 was sufficient to speed the process. For multiple source searches increasing that by factors of 3 to 5 produced more efficient explorations. Similarly, for multiple sources an increase in g_cool was needed to properly explore the surface. This increase scaled roughly linearly with the number of sources.

This mode of simulated annealing, which will be referred to as standard simulated annealing, is markedly different than the genetic version of simulated annealing discussed in IIB. Standard simulated annealing alters the search space, using the heat/energy to smooth the likelihood surface, whereas in genetic simulated annealing the search space was left unchanged and the heat/energy of the organisms was increased via the larger PMRs.

Figure 8 shows trace plots of the log likelihood, frequency, θ, and φ searching for the same source as in Figure 4. The only change between the two examples is the type of annealing process. For this run PMR = 0.04, β_0 = 1/100, and g_cool = 300.

FIG. 8: Standard Simulated Annealing: Trace plots for (a) log likelihood, (b) frequency, (c) θ, and (d) φ for the basic implementation of a genetic algorithm with the inclusion of standard simulated annealing and PMR = 0.04. The y-axes are the parameter values and log likelihoods of the best fit organism for each generation. The x-axes are generation number.

F. Giving more control to the algorithm

In the previous examples, choices were required as to what PMR or which degree of elitism should be used with a particular source to provide the most efficient search. In making those choices, we are searching for a solution that depends on the information in the data stream. Just as we use the power of the genetic algorithm to search for the parameters of the gravitational wave sources that contribute to the data stream, we can also use that same power to search for efficient values for PMR or elitism.

By treating the PMR, elitism, or other factors in the genetic algorithm like a source parameter, these factors can be elevated, or one might say demoted, to the same level as the source parameters. We mentioned this at the end of subsection IIB and have implemented this idea for the PMR. The initial PMR for each organism is chosen randomly, and the PMR for each organism in the next generation is bred just as f, θ, and φ are, based on organism fitness. This changes the nature of the algorithm from a simple genetic algorithm to a genetic-genetic algorithm (GGA), in which a factor, or factors, determining the search for the source parameters evolve along with the organisms.

Figure 9 shows trace plots for a GGA with the PMR evolving with the organisms. This run includes the simulated annealing scheme used in the previous example and elitism of the single best fit organism. Figure 10 shows the evolution of the PMR for the same run. The 'genetic simulated annealing' scheme is visible in the plot with the larger PMRs more efficient earlier on, and smaller PMRs dominating in the later stages. As the evolving PMR values range over nearly two orders of magnitude, it is easy to see why a single, constant choice for the PMR would be so much less efficient. Also, as one can see from the data presented, the variations in the frequency are significantly smaller than those of θ and φ. We can extend the idea of the evolved PMR beyond the organism, and down to the gene. Giving a separate PMR to each parameter will allow for even better adaptation. (In the natural world organisms control their mutation rates by building in DNA repair mechanisms to counteract the externally determined mutation rate set by cosmic rays and other mutagens.)

FIG. 9: Genetic-Genetic Algorithm: Trace plots for (a) log likelihood, (b) frequency, (c) θ, and (d) φ for a genetic-genetic algorithm in which the PMR evolves with the organisms. The y-axes are the parameter values and log likelihoods of the best fit organism for each generation. The x-axes are generation number.

FIG. 10: Genetic-Genetic Simulated Annealing of the PMR: Trace plots for the PMR (elite PMR average and total PMR average, plotted against generation number) as it evolves with the organisms. The data for this plot is from the same run that produced the data in Figure 9.

G. Multiple sources in the data stream

At the low end of the LISA band there will be many thousands of sources. Thus, we expect to see multiple sources even in small segments of the data stream such as the one we have been considering. Simulations point to bright source densities of up to one source per five modulation frequency bins (f_mod = 1/year) [10]. Thus, any search algorithm must be able to perform multiple source searches at the low end of the LISA band.

Figure 11 shows an implementation of the GGA with standard simulated annealing to a LISA data stream snippet of width 100 f_mod, containing five monochromatic binary systems. The standard simulated annealing was completed in the first g_cool = 4000 generations, by which time the GGA had separated out the values for the source frequencies and co-latitudes. The grouping of azimuthal angles was separated soon thereafter, with minor modifications of the parameters occurring over the next 5000 generations. Search results are summarized in Table II. The GGA accurately recovered the source parameters in this and similar multiple (3 to 5) source data sets, converging to a best fit solution in less than 5000 generations per source with 10 organisms per generation, so long as the source correlation coefficients were below ∼0.25. The intrinsic parameters for the sources were recovered to within 2σ of the true parameters (based on a Fisher Information Matrix estimate of the uncertainties of the recovered parameters). When highly correlated sources are used, the GGA spends a correspondingly longer time to pick out the source parameters. Investigations in this area were limited. A full study of the effect of source correlation on computational cost is to be carried out in the future.

TABLE II: GGA search for 5 galactic binaries. The frequencies are quoted relative to 1 mHz as f = 1 mHz + δf with δf in µHz. All angles are quoted in radians.

           SNR    A (10⁻²²)   δf       θ       φ       ψ       ι       ϕ0
  True     12.7   1.02        1.638    2.77    1.48    2.28    0.886   0.273
  GA ML    11.6   1.08        1.635    2.86    1.40    2.63    1.02    5.94
  True     19.3   2.23        0.7000   2.41    5.87    0.435   1.88    4.29
  GA ML    17.7   2.11        0.7008   2.43    5.90    0.460   1.86    4.20
  True     17.8   1.74        0.3937   0.756   1.85    1.41    2.02    3.09
  GA ML    17.0   1.80        0.3942   0.777   1.84    1.27    1.95    2.57
  True     15.8   2.16        1.002    1.53    1.30    1.35    1.70    4.63
  GA ML    14.8   2.17        1.002    1.59    1.28    1.37    1.68    4.68
  True     12.1   0.836       1.944    0.872   0.802   1.56    0.805   3.87
  GA ML    11.8   1.09        1.950    0.876   0.803   2.87    1.09    3.48

FIG. 11: Genetic algorithm search for 5 sources: Trace plots for (a) log likelihood, (b) frequency, (c) θ, and (d) φ for a genetic algorithm searching for the presence of five gravitational wave sources in the data stream. The y-axes are the parameter values and log likelihoods of the best fit organism for each generation. The x-axes are generation number.

H. Using Active Organisms

So far all of the organisms that have been discussed are passive organisms. They are passive in the sense that once they are bred, the organisms themselves remain unchanged, and are simply used to breed the next generation. One can imagine organisms that 'learn' during their lifetime, advancing toward a better solution. Directed search methods such as an uphill simplex, i.e. an amoeba, provide a means for organisms to advance within a generation. As the likelihood surface is not entirely smooth, the simplex may get stuck in a local maximum that is removed from the global maximum. So the generational process is still necessary to ensure full exploration of the surface. One approach is to use the parameters bred from one generation as the centroid of the simplex (amoeba), which will then proceed to move uphill across the likelihood surface. Another approach, which we will describe in a future publication, is to use 'Genetic Amoeba', where genes code for each vertex of the simplex. The amoeba are allowed to breed after they have found enough food (i.e. increased their likelihood by a specified amount). Amoeba that eat well get to breed the most often and have the most offspring.

Figure 12 shows trace plots for an implementation of a GGA with a single directed organism per generation. The other 9 organisms were the standard passive organisms. There was elitism with a single organism being cloned into the succeeding generation, and there was no standard simulated annealing. What is missing from the plot is the computational cost. While computational cost can easily be derived from the plots with passive organisms, active organisms, such as an uphill simplex, involve multiple calls to the F-statistic function within a single generation. At the 8th generation, where the search surpasses the true likelihood value, the computational cost is $ = 876. This cost is slightly lower than the cost of a GGA with only passive organisms at the point where its search surpasses the likelihood value for the true parameters. However, for true LISA data, we will not know the true parameters, and thus will have to allow the algorithms to undergo extended runs to ensure they have fully explored the space and found the global maximum. The higher computational cost per generation of the simplex method (which averages ∼100 calls to find a local maximum) will quickly lead to a higher total cost of the search. Other directed methods that are more efficient than an uphill simplex may provide an overall improvement in efficiency. Future work will include an examination of other possibly more efficient directed methods, and a detailed study of the Genetic Amoeba algorithm.

FIG. 12: GGA with a directed organism: Trace plots for (a) log likelihood, (b) frequency, (c) θ, and (d) φ for a GGA with a single directed organism. The y-axes are the parameter values and log likelihoods of the best fit organism for each generation. The x-axes are generation number.

III. CONCLUSIONS

This work is the first application of a genetic algorithm to the search of gravitational wave source parameters. We have shown that the method is a feasible search method capable of handling multiple sources in a restricted frequency range. Next we will seek to determine the limits of the algorithm both in terms of source number and source density across the low frequency regime of the LISA band. While an optimal solution would employ a matched filter that includes every resolvable source in the LISA band [8], it is unlikely that a direct search for this "super template" is the best way to proceed. A better approach may be to start with a collection of "single cell" organisms that each code for a single source (or possibly small collections of highly correlated sources), then combine these cells into a multi-cellular organism that searches for the super template. This approach is motivated by the cellular slime molds Dictyostelida and Acrasida, which spend most of their lives as separate single-celled amoeboid protists, but upon the release of a chemical signal, the individual cells aggregate into a great swarm that acts as a single multi-cellular organism, capable of movement and the formation of large fruiting bodies. Future work will also include investigations into algorithm optimization and adaptation of the algorithm to other source types (e.g. coalescing binaries). Furthermore, a thorough study comparing the computational cost and resolution capabilities of an optimized genetic algorithm to other (optimized) search methods like Markov Chain Monte Carlo searches, gClean, Slice & Dice, and Maximum Entropy methods would provide guidance on how to proceed in solving the LISA Data Analysis Challenge.

Acknowledgements

This work was supported by NASA Grant NNG05GI69G and NASA Cooperative Agreement NCC5-579.

[1] P. L. Bender et al., LISA Pre-Phase A Report; Second Edition, MPQ 233 (1998).
[2] C. R. Evans, I. Iben & L. Smarr, ApJ 323, 129 (1987).
[3] V. M. Lipunov, K. A. Postnov & M. E. Prokhorov, A&A 176, L1 (1987).
[4] D. Hils, P. L. Bender & R. F. Webbink, ApJ 360, 75 (1990).
[5] D. Hils & P. L. Bender, ApJ 537, 334 (2000).
[6] G. Nelemans, L. R. Yungelson & S. F. Portegies Zwart, A&A 375, 890 (2001).
[7] J. Crowder & N. J. Cornish, Phys. Rev. D70, 082005 (2004).
[8] N. J. Cornish & J. Crowder, Phys. Rev. D72, 043005 (2005).
[9] A. J. Farmer & E. S. Phinney, Mon. Not. Roy. Astron. Soc. 346, 1197 (2003).
[10] S. Timpano, L. J. Rubbo & N. J. Cornish, gr-qc/0504071 (2005).
[11] L. Barack & C. Cutler, Phys. Rev. D70, 122002 (2004).
[12] J. R. Gair, L. Barack, T. Creighton, C. Cutler, S. L. Larson, E. S. Phinney & M. Vallisneri, Class. Quant. Grav. 21, S1595 (2004).
[13] N. J. Cornish & S. L. Larson, Phys. Rev. D67, 103001 (2003).
[14] N. J. Cornish, Talk given at GR17, Dublin, July (2004); N. J. Cornish, L. J. Rubbo & R. Hellings, in preparation (2005).
[15] M. S. Mohanty & R. K. Nayak, gr-qc/0512014 (2005).
[16] J. Holland, Adaptation in Natural and Artificial Systems (University of Michigan Press, Ann Arbor, Michigan, 1975).
[17] C. Darwin, The Origin of Species (J. Murray, London, 1859).
[18] P. Jaranowski, A. Krolak & B. F. Schutz, Phys. Rev. D58, 063001 (1998).
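As an illustration of the binary representation described in Section II A, the following minimal sketch maps a parameter value to a binary string and back, and evaluates the resolution (parameter range)/2^L. The function names and the choice of quoting θ over a 180° range are assumptions made for this example and are not taken from the paper.

```python
def resolution(lo, hi, n_bits):
    """Step size of an n_bits binary string spanning [lo, hi): range / 2**L."""
    return (hi - lo) / 2 ** n_bits


def decode(bits, lo, hi):
    """Map a binary string such as '10000000' to a value in [lo, hi)."""
    return lo + int(bits, 2) * resolution(lo, hi, len(bits))


def encode(value, lo, hi, n_bits):
    """Map a value in [lo, hi) to the binary string of the bin that contains it."""
    index = int((value - lo) / resolution(lo, hi, n_bits))
    index = min(max(index, 0), 2 ** n_bits - 1)  # clamp to the valid bins
    return format(index, f"0{n_bits}b")


if __name__ == "__main__":
    # Assumed 180 degree range for theta: 8 bits give about 0.7 degrees.
    print(resolution(0.0, 180.0, 8))
    print(encode(90.0, 0.0, 180.0, 8), decode("10000000", 0.0, 180.0))
```

Under the assumed 180° range, an 8 digit string gives a step of about 0.7°, which is in line with the precision quoted in Section II A.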
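The mutation and midpoint crossover steps of Section II A can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the order in which the two gametes contribute their halves is chosen so that the example reproduces Table I.

```python
import random


def mutate(bits, pmr, rng):
    """Flip each binary digit independently with probability pmr."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < pmr else b
        for b in bits
    )


def midpoint_crossover(first, second):
    """Join the first half of `first` to the second half of `second`."""
    mid = len(first) // 2
    return first[:mid] + second[mid:]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that the run is repeatable
    parent_1, parent_2 = "01001110", "00110011"
    # Table I: the offspring takes its first half from Parent 2
    # and its second half from Parent 1.
    print(midpoint_crossover(parent_2, parent_1))
    # Gametes mutated with PMR = 0.04 before breeding, as in the basic scheme.
    print(midpoint_crossover(mutate(parent_2, 0.04, rng),
                             mutate(parent_1, 0.04, rng)))
```

With PMR = 0 the gametes are exact copies of the parents, so the crossover alone determines the offspring; with larger PMR values more digits are flipped before the halves are joined.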
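One generation of a search with fitness proportionate selection (Section II A) and elitism (Section II D) might be organized as in the sketch below. The `fitness` and `breed` callables are placeholders supplied by the caller; in the paper the fitness is the likelihood obtained from the F-statistic, which is not implemented here. The toy fitness used in the usage example (number of 1 digits plus one) is purely illustrative.

```python
import random


def next_generation(organisms, fitness, breed, rng, n_elite=1):
    """Build the next generation from the current one.

    The n_elite fittest organisms are cloned unchanged. The remaining slots
    are filled by offspring whose parents are drawn with probability
    proportional to fitness (fitness values must be non-negative).
    """
    scores = [fitness(o) for o in organisms]
    ranked = sorted(zip(scores, organisms), key=lambda pair: pair[0], reverse=True)
    elite = [organism for _, organism in ranked[:n_elite]]
    children = []
    while len(elite) + len(children) < len(organisms):
        mother, father = rng.choices(organisms, weights=scores, k=2)
        children.append(breed(mother, father))
    return elite + children


if __name__ == "__main__":
    rng = random.Random(0)
    population = ["0000", "1111", "1010"]
    toy_fitness = lambda bits: bits.count("1") + 1      # illustrative only
    toy_breed = lambda a, b: a[: len(a) // 2] + b[len(b) // 2:]
    print(next_generation(population, toy_fitness, toy_breed, rng))
```

Because a cloned organism's fitness is already known, only the newly bred children would require new fitness evaluations, which is why elitism does not add to the computational cost counted in Section II D.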
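The two cooling schedules, Eq. (1) for the PMR (Section II B) and Eq. (2) for the inverse temperature β (Section II E), can be evaluated with the short sketch below. The default arguments are the values quoted in the text for the single source runs (PMR_i = 0.2, PMR_f = 0.01, g_cool = 1000 for Eq. (1); β_0 = 1/100, g_cool = 300 for Eq. (2)); the function names are hypothetical.

```python
def pmr_schedule(g, g_cool=1000, pmr_i=0.2, pmr_f=0.01):
    """Eq. (1): PMR decays geometrically from pmr_i to pmr_f over g_cool generations."""
    if g >= g_cool:
        return pmr_f
    return pmr_f * (pmr_i / pmr_f) ** ((g_cool - g) / g_cool)


def beta_schedule(g, g_cool=300, beta_0=0.01):
    """Eq. (2): inverse temperature rises from beta_0 to 1/2 over g_cool generations."""
    if g >= g_cool:
        return 0.5
    return beta_0 * (1.0 / (2.0 * beta_0)) ** (g / g_cool)


if __name__ == "__main__":
    for g in (0, 250, 500, 750, 1000):
        print(g, pmr_schedule(g))
    for g in (0, 150, 300):
        print(g, beta_schedule(g))
```

Both schedules are geometric interpolations between their end points, so halfway through the cooling the PMR equals the geometric mean of PMR_i and PMR_f, and β equals the geometric mean of β_0 and 1/2.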
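The genetic-genetic algorithm of Section II F treats the PMR as one more gene that is bred along with f, θ, and φ. The paper does not specify how the PMR gene is encoded, so the sketch below makes several assumptions that should be read as illustrative only: the PMR is stored as a binary string decoded log-uniformly over an assumed range [10⁻³, 1], each parent's own decoded PMR is used to mutate all of its genes (including the PMR gene), and genes are then combined by midpoint crossover.

```python
import random

PMR_MIN, PMR_MAX = 1e-3, 1.0  # assumed range; not specified in the paper


def decode_pmr(bits):
    """Decode the PMR gene log-uniformly over [PMR_MIN, PMR_MAX) (an assumption)."""
    fraction = int(bits, 2) / 2 ** len(bits)
    return PMR_MIN * (PMR_MAX / PMR_MIN) ** fraction


def make_gamete(parent, rng):
    """Mutate every gene of a parent using the parent's own PMR gene."""
    pmr = decode_pmr(parent["pmr"])
    return {
        name: "".join(
            ("1" if b == "0" else "0") if rng.random() < pmr else b
            for b in bits
        )
        for name, bits in parent.items()
    }


def breed(mother, father, rng):
    """Midpoint crossover, gene by gene, of the two mutated gametes."""
    a, b = make_gamete(mother, rng), make_gamete(father, rng)
    return {
        name: a[name][: len(a[name]) // 2] + b[name][len(b[name]) // 2:]
        for name in a
    }


if __name__ == "__main__":
    rng = random.Random(1)
    mother = {"f": "0100111001001110", "theta": "0011001100110",
              "phi": "01001110010011", "pmr": "10000000"}
    father = {"f": "0011001100110011", "theta": "0100111001001",
              "phi": "00110011001100", "pmr": "00100000"}
    print(breed(mother, father, rng))
```

Because the PMR gene is inherited and mutated like any other gene, organisms whose mutation rate suits the current phase of the search tend to leave more offspring, which is the behaviour the paper reports in Figure 10. Extending the idea to one PMR per parameter, as suggested in Section II F, would amount to adding one such gene per parameter in this sketch.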