Algorithmic differentiation of code with multiple context-specific activities

To cite this version: Jan Christian Hueckelheim, Laurent Hascoët, Jens-Dominik Müller. Algorithmic differentiation of code with multiple context-specific activities. ACM Transactions on Mathematical Software, 2016. hal-01413321

HAL Id: hal-01413321, https://hal.inria.fr/hal-01413321, submitted on 9 Dec 2016

Jan Christian Hückelheim, Queen Mary University of London
Laurent Hascoët, INRIA Sophia Antipolis
Jens-Dominik Müller, Queen Mary University of London

Algorithmic differentiation (AD) by source transformation is an established method for computing derivatives of computational algorithms. Static data-flow analysis is commonly used by AD tools to determine the set of active variables, that is, variables that are influenced by the program input in a differentiable way and have a differentiable influence on the program output. In this work, a context-sensitive static analysis combined with procedure cloning is used to generate specialised versions of differentiated procedures for each call site. This enables better detection and elimination of unused computations and memory storage, resulting in performance improvements of the generated code, in both forward and reverse mode AD. The implications of this multi-activity AD approach on the static analysis of an AD tool
are shown using data flow equations. The worst-case cost of multi-activity AD on the differentiation process is analysed, and practical remedies to avoid running into this worst case are presented. The method was implemented in the AD tool Tapenade, and we present its application to a 3D unstructured compressible flow solver, for which we generate an adjoint solver that performs significantly faster when multi-activity AD is used.

CCS Concepts: • Mathematics of computing → Automatic differentiation; • Software and its engineering → Automated static analysis; Source code generation;

Additional Key Words and Phrases: Algorithmic differentiation, automatic differentiation, source transformation, activity analysis, static analysis, reverse mode, adjoint, tangent-linear

ACM Reference Format:
Jan Christian Hückelheim, Laurent Hascoët and Jens-Dominik Müller, 2015. Algorithmic differentiation of code with multiple context-specific activities. ACM Trans. Math. Softw. 1, 1, Article 1 (January 2099), 20 pages.
DOI: 0000001.0000001

Authors' addresses: Jan Christian Hückelheim and Jens-Dominik Müller, School for Engineering and Materials Science, Queen Mary University of London, Mile End Road, E1 4NS London, UK; Laurent Hascoët, INRIA Sophia Antipolis Méditerranée, 2004 Route des Lucioles, 06902 Valbonne, France.

ACM Transactions on Mathematical Software, Vol. 1, No. 1, Article 1, Publication date: January 2099.

1. INTRODUCTION

Algorithmic differentiation (AD) is a tool to obtain derivatives of computer programs. Derivatives are an essential component for uncertainty quantification or gradient-based optimisation in countless application areas such as fluid and structural dynamics, weather forecasting, and finance. The derivatives computed by AD are the result of a symbolic differentiation of the original (primal) program, created by applying the chain rule of calculus to the sequence of program statements. The derivatives are therefore accurate, except for roundoff errors that affect both the primal and derivative program and are inevitable in any nontrivial floating-point computation. This distinguishes AD from approximate approaches such as finite differences that, in addition to roundoff errors, suffer from truncation errors that are often orders of magnitude larger.

A user of AD is typically interested in the derivatives of a subset of the output variables with respect to a subset of the input variables of a program. We refer to these subsets as dependent variables $y \in \mathbb{R}^m$ and independent variables $x \in \mathbb{R}^n$, respectively. The differentiable portion of a program is often a subset of the overall program, and we assume that the differentiable portion is implemented in a top procedure $P_0$ and all other procedures in the call tree that are dominated by $P_0$, that is, called directly or indirectly by $P_0$, where $P_0$ can be written as

$$[y, c_{out}] \leftarrow P_0(x, c_{in}),$$

and $c_{in}$ and $c_{out}$ are input and output parameters in addition to the independent and dependent variables. An AD tool in forward mode can generate the tangent-linear derivative procedure $\dot{P}_0$ that, given a seed vector $\dot{x} \in \mathbb{R}^n$, computes the product of the Jacobian of $P_0$ with $\dot{x}$ given by

$$\dot{y} := \frac{\partial P_0(x, c_{in})}{\partial x} \cdot \dot{x}, \qquad \dot{y} \in \mathbb{R}^m.$$

Alternatively, an AD tool in reverse mode can be used to generate the adjoint procedure $\bar{P}_0$ that, given a seed vector $\bar{y} \in \mathbb{R}^m$, computes the product of the transpose Jacobian of $P_0$ with $\bar{y}$ as

$$\bar{x} := \left(\frac{\partial P_0(x, c_{in})}{\partial x}\right)^T \cdot \bar{y}, \qquad \bar{x} \in \mathbb{R}^n.$$

The adjoint and tangent-linear derivative procedures, which can both be referred to as $P'$, do not actually compute the full Jacobian matrix, but instead propagate the seed vector through a linearisation of the primal procedure $P_0$ around $x$, at a runtime that is typically within an order of magnitude of the primal runtime $T$.

Depending on the programming language, $P_0$ can be a method, function, procedure or subroutine, and the inputs and outputs may be given for example as formal arguments, global variables, or class member variables. The same variable can be used as both an input and an output of $P_0$, and can be both independent and dependent. This is not a contradiction: in most programming languages, the same memory location, represented by the same variable name, can be used for an independent input before $P_0$ is called, and then overwritten with a dependent output of $P_0$. Even though both numbers reside in the same computer memory location (albeit at different points in time) and share the same variable name in a particular implementation of the procedure, these two are distinct mathematical objects. An AD tool must be able to handle these cases correctly.

Either the tangent-linear or the adjoint derivative procedure can be used to assemble the Jacobian matrix of $P_0$, by successively using all unit vectors in the appropriate spaces as seed vectors. This results in a total runtime of $O(n \cdot T)$ for the tangent-linear code, and $O(m \cdot T)$ for the adjoint code. The Jacobian matrix computed in both ways is identical, except for roundoff errors. However, the time needed to assemble the Jacobian can differ vastly if there is a large difference between $n$ and $m$, that is, a large difference between the number of independent and dependent variables. AD is in practice applied to large industrial codes that are called with many inputs such as material parameters or physical constants, and thousands or millions of independent
variables such as geometry coordinates or CAD parameters. On the other hand, the dependent variables that are of interest for differentiation can often be expressed in one or a few numbers, such as fuel consumption, average noise levels, or maximum material stress. In such cases where $m \ll n$, it is beneficial to use the reverse mode.

Industrial problem sizes dictate the use of careful performance optimisation to keep the runtimes of the primal and derivative codes acceptable, see [Müller and Cusdin 2005]. The amount of computation in the derivative code can be reduced by performing activity analysis, which is the process of identifying active variables in the primal code. A variable is said to be active if its current value influences a dependent variable in a differentiable way, and is influenced by an independent variable in a differentiable way [Bischof et al. 1992; Fagan and Carle 2004; Kreaseck et al. 2006; Shin et al. 2007]. This knowledge forms the basis of many subsequent code analysis and optimisation steps. The activity of variables in a given piece of code can depend on run-time input, and is in general undecidable at compile time. AD tools therefore make conservative assumptions during the activity analysis to ensure the correctness of the derivative code. The sharpness of these assumptions is decisive for the efficiency of the generated derivative code.

In this work, we present a way to improve the activity analysis of reverse and forward mode AD. The method, which we refer to as multi-activity AD, detects procedures that are accessed from multiple call sites with different sets of active arguments. We show that by creating specialised differentiated procedures for some call sites, we can achieve significant speedups in derivative code runtime. The specialisation is based on a combination of source code analysis and user input during the source transformation process. A similar approach based purely on the detected call site activity was used in early versions of ADIFOR [Bischof et al. 1992], but was dropped from later versions without a thorough investigation of its benefits.
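The cost argument made earlier in this introduction (one tangent-linear call per independent versus one adjoint call per dependent when assembling a Jacobian) can be illustrated with a minimal Python sketch. The primal function and its hand-written tangent and adjoint below are hypothetical stand-ins for AD-generated code, not output of Tapenade; with $n = 3$ and $m = 1$, the forward assembly needs three derivative calls and the reverse assembly one, while both yield the same Jacobian row.

```python
import math

# Hypothetical primal P0: y = x0 * x1 + sin(x2), so n = 3 independents and m = 1 dependent.
def primal(x):
    return [x[0] * x[1] + math.sin(x[2])]

def tangent(x, xd):
    """Hand-written tangent-linear code: returns yd = J(x) @ xd for a seed xd in R^n."""
    return [x[1] * xd[0] + x[0] * xd[1] + math.cos(x[2]) * xd[2]]

def adjoint(x, yb):
    """Hand-written adjoint code: returns xb = J(x)^T @ yb for a seed yb in R^m."""
    return [x[1] * yb[0], x[0] * yb[0], math.cos(x[2]) * yb[0]]

def unit(k, size):
    """k-th unit vector of length `size`, used as a seed vector."""
    e = [0.0] * size
    e[k] = 1.0
    return e

def jacobian_forward(x, m):
    """Assemble J column by column: one tangent call per independent (n calls)."""
    n = len(x)
    cols = [tangent(x, unit(j, n)) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(m)], n

def jacobian_reverse(x, m):
    """Assemble J row by row: one adjoint call per dependent (m calls)."""
    rows = [adjoint(x, unit(i, m)) for i in range(m)]
    return rows, m

x = [1.0, 2.0, 0.5]
J_fwd, calls_fwd = jacobian_forward(x, m=1)
J_rev, calls_rev = jacobian_reverse(x, m=1)
print(calls_fwd, calls_rev)
```

With many independents and a single dependent, the gap between the two call counts grows linearly in $n$, which is the situation the reverse mode is chosen for.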
After a review of AD activity analysis in Sec. 2, the contributions presented in this paper are
- a formal description of the analysis and code generation for multi-activity AD in Sec. 3,
- a complexity analysis showing exponential worst-case differentiation time and length of the generated derivative code, and a strategy to avoid this worst case in practice, in Sec. 4,
- an implementation of multi-activity AD in the differentiation tool Tapenade [Hascoët and Pascual 2013] for forward and reverse mode AD in Sec. 5, and
- a case study demonstrating the performance gained by applying Tapenade with multi-activity AD to a large fluid dynamics code in Sec. 6.

2. RELATED WORK AND BACKGROUND

The method presented in this work applies only to algorithmic differentiation (AD) using the source-transformation approach. In source-transformation AD, the source code of the primal program is parsed, analysed, and transformed into a derivative code that often uses the same programming language and a similar structure as the primal code. Examples of such tools are ADIFOR, TAMC [Giering 1999], TAF [Giering et al. 2006], OpenAD [Utke et al. 2008] and Tapenade. Strategies used by source-transformation tools to improve the efficiency of generated derivative programs include elimination techniques on the computational graph [Naumann 2004], exploitation of independent computations [Hascoët et al. 2002; Bücker et al. 2002], checkpointing schemes [Griewank and Walther 2000; Wang et al. 2009] and activity analysis [Bischof et al. 1992; Fagan and Carle 2004; Kreaseck et al. 2006; Shin and Hovland 2007; Hascoët and Pascual 2012].

One can manually implement derivative code in the same way as an AD tool, and use high-level knowledge of the mathematical properties of the primal program to create derivative programs that are more efficient than those generated by an AD tool.
However, this is a significant and continuous effort, as the derivative code is often as complex as the primal code, and all updates and bug fixes that influence the derivatives of the primal code have to be incorporated into the derivative code whenever they occur, with the risk of introducing bugs with every manual modification. In contrast, an AD tool can be used to create derivative code in an automated process, for example as part of the build process of an application, which guarantees consistent gradients even after modifications to the primal code. It can be beneficial in practice to use a combined approach where an AD tool is used to differentiate the majority of the code, and manual differentiation is used for parts of the software where a straightforward symbolic differentiation exists and changes to the code occur less frequently.

An AD tool relies on information that can be obtained from the source code of the primal program to generate an efficient derivative program. To this end, the tool may remove all instructions that do not contribute to the derivatives, leading to significant cost savings as shown in Sec. 6. Furthermore, the derivative code generated in reverse mode from a nonlinear primal needs to store or recompute some intermediate results of the primal computation, and understanding which of these intermediate results are actually needed for the derivative computation is crucial to keep the memory footprint of the generated program acceptable. Activity analysis is a prerequisite for all this.
We review activity analysis in the remainder of this section. A variable $v$ is called varied if $v$ currently holds a value that was influenced by an independent variable in a differentiable way, and useful if $v$ influences a dependent variable in a differentiable way. If $v$ is simultaneously varied and useful, $v$ is called active. To ensure the correctness of the derivative code, static activity analysis must treat all variables as active if they might be active for some possible run of the input code. The actual activity can vary at runtime, e.g. depending on user input or the current state of the program, and a given piece of source code may have different activities each time it is executed. We therefore determine an assumed activity that is a non-strict superset of the actual activity. For the remainder of this work, we use the word activity to refer to the assumed activity, as this is the property of practical interest, as opposed to the actual activity, which we cannot determine with certainty at compile time.

It is desirable to perform as many steps of the activity analysis as possible in an intra-procedural fashion, that is, separately for each procedure, to reduce the time and memory requirements of the analysis. At every call site to another procedure, the analysis depends on the precomputed differentiation dependency of the called procedure, operator or intrinsic function. The differentiation dependency determines the way in which an operation changes the variedness and usefulness of all affected variables. Consider a procedure $[w_1 \ldots w_m] \leftarrow P(v_1 \ldots v_n)$ with $n$ inputs and $m$ outputs. The differentiation dependency of an instruction $I$ that calls $P$, denoted as $\text{Diff-dep}(I)$, is defined as a set of variable pairs where $(v_j, w_i) \in \text{Diff-dep}(I)$ iff the value of $w_i$ has a differentiable dependence on $v_j$ through the call to $P$. The differentiation dependency of a procedure is the composition of the differentiation dependencies of all contained operators, intrinsic functions and procedure calls. The differentiation dependencies can be computed in a bottom-up sweep through the call tree, so that the property of each procedure is known when a call to it is encountered during the analysis.
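The bottom-up computation of a procedure's differentiation dependency can be sketched for straight-line code, assuming every instruction is an assignment that is differentiable in all of its right-hand-side variables and that the set of variables is fixed. Diff-dep is stored as a set of (input, output) pairs, variables that an instruction does not overwrite contribute identity pairs, and a sequence of two instructions is handled by relational composition. The representation and the helper names `assign_dep` and `compose` are our own illustration, not Tapenade's internals; branches and loops would additionally require a union over control-flow paths.

```python
# Sketch: Diff-dep as a set of (input, output) pairs over a fixed set of variables.
VARIABLES = {"a", "b", "c", "t", "w"}

def assign_dep(target, sources, variables=VARIABLES):
    """Diff-dep of `target = f(sources)`, assuming f is differentiable in every source."""
    deps = {(s, target) for s in sources}
    # Variables that are not overwritten keep their value: identity pairs.
    deps |= {(v, v) for v in variables if v != target}
    return deps

def compose(first, second):
    """Diff-dep of executing `first` and then `second` (relational composition)."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}

# Hypothetical procedure body:  t = a * b;  w = t - c
body = compose(assign_dep("t", ["a", "b"]), assign_dep("w", ["t", "c"]))
print(sorted(body))
```

Because `t` and `w` are overwritten, no pair starts at them in the composed relation, while `a` and `b` reach `w` through `t`.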
Following the differentiation dependency analysis, the activity analysis can be carried out in a top-down sweep through the call tree. For each procedure, a forward sweep through its flow graph is required to determine the variedness, and a reverse sweep is required to determine the usefulness. Both can then easily be combined to determine the activity. Every active variable $v_j$ receives a derivative counterpart $\dot{v}_j$ in forward mode or $\bar{v}_j$ in reverse mode. For each instruction $I$, the sets of variables that are varied before and after the execution of $I$ are denoted as $\mathrm{Varied}^-(I)$ and $\mathrm{Varied}^+(I)$, respectively. The sets of variables that are useful before and after $I$ are denoted as $\mathrm{Useful}^-(I)$ and $\mathrm{Useful}^+(I)$. The relationship between these sets can be expressed in the data flow equations shown below and in [Hascoët and Pascual 2012].

$$
\begin{aligned}
\mathrm{Varied}^+(I) &= \mathrm{Varied}^-(I) \otimes \text{Diff-dep}(I)\\
\mathrm{Useful}^-(I) &= \text{Diff-dep}(I) \otimes \mathrm{Useful}^+(I)\\
\mathrm{Active}^-(I) &= \mathrm{Varied}^-(I) \cap \mathrm{Useful}^-(I)\\
\mathrm{Active}^+(I) &= \mathrm{Varied}^+(I) \cap \mathrm{Useful}^+(I)
\end{aligned} \tag{1}
$$

The composition $\otimes$ is defined for an arbitrary set of variables $S$ as

$$
\begin{aligned}
v_2 \in S \otimes \text{Diff-dep}(I) &\iff \exists\, v_1 \in S \mid (v_1, v_2) \in \text{Diff-dep}(I)\\
v_1 \in \text{Diff-dep}(I) \otimes S &\iff \exists\, v_2 \in S \mid (v_1, v_2) \in \text{Diff-dep}(I)
\end{aligned}
$$

As an example, let us consider the subtraction $\mathrm{sub}: w \leftarrow v_1 - v_2$ with inputs $v_1, v_2$ and output $w$. If $v_1$ or $v_2$ is varied, then $w$ becomes varied. If $w$ is useful, then $v_1$ and $v_2$ become useful. Neither $v_1$ nor $v_2$ is modified. The differentiation dependency of the subtraction operator is therefore $\{(v_1, w), (v_2, w), (v_1, v_1), (v_2, v_2)\}$. To illustrate the forward sweep, let us now assume that only $v_1$ is varied before the subtraction. This means that $\mathrm{Varied}^-(\mathrm{sub}) = \{v_1\}$. It follows that $\mathrm{Varied}^+(\mathrm{sub}) = \{v_1\} \otimes \{(v_1, w), (v_2, w), (v_1, v_1), (v_2, v_2)\} = \{v_1, w\}$.
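The composition operator $\otimes$ and the data flow equations (1) translate directly into set comprehensions. The sketch below reproduces the subtraction example; the usefulness set $\mathrm{Useful}^+(\mathrm{sub}) = \{w\}$ is an assumption added for illustration, since the text only fixes the variedness.

```python
def compose_forward(S, diff_dep):
    """S ⊗ Diff-dep(I): variables reached from some variable in S by one dependency pair."""
    return {v2 for (v1, v2) in diff_dep if v1 in S}

def compose_backward(diff_dep, S):
    """Diff-dep(I) ⊗ S: variables that reach some variable in S by one dependency pair."""
    return {v1 for (v1, v2) in diff_dep if v2 in S}

# Subtraction sub: w <- v1 - v2, with the Diff-dep given in the text.
sub = {("v1", "w"), ("v2", "w"), ("v1", "v1"), ("v2", "v2")}

varied_before = {"v1"}                               # Varied-(sub)
varied_after = compose_forward(varied_before, sub)   # Varied+(sub)
useful_after = {"w"}                                 # assumed for illustration
useful_before = compose_backward(sub, useful_after)  # Useful-(sub)

active_before = varied_before & useful_before        # Active-(sub)
active_after = varied_after & useful_after           # Active+(sub)
```

Here `v2` is useful but not varied before the subtraction, so it is not active and would receive no derivative counterpart.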
An instruction $I$ has one successor and one predecessor, with the exception of the first and last instruction in control flow blocks such as branch or loop bodies. The number of predecessors can be larger than one for the first instruction of a control flow block, and the number of successors can be larger than one for the last instruction of a control flow block. The variedness of variables before $I$ is given by the union of varied variables after the predecessors $\mathrm{pre}(I)$ of $I$. Similarly, the usefulness of variables after the execution of $I$ is given as the union of all useful variables before the successor instructions $\mathrm{suc}(I)$ of $I$. Formally, this can be written as

$$
\begin{aligned}
\mathrm{Varied}^-(I) &= \bigcup_{J \in \mathrm{pre}(I)} \mathrm{Varied}^+(J)\\
\mathrm{Useful}^+(I) &= \bigcup_{J \in \mathrm{suc}(I)} \mathrm{Useful}^-(J).
\end{aligned} \tag{2}
$$

The only instruction that does not have a predecessor is the first instruction $I_0$ of the top procedure, denoted as $I_0(P_0)$, and $\mathrm{Varied}^-(I_0(P_0))$ is the set of independent variables as defined by the user. Likewise, the only instruction without a successor is the final instruction of the top procedure $I_\infty(P_0)$, and $\mathrm{Useful}^+(I_\infty(P_0))$ is given by the user-defined set of dependent variables. For all procedures other than $P_0$, the variedness before the first instruction depends on the variedness before the call sites to that procedure, and the usefulness after the last instruction depends on the usefulness after the call sites. When generating a derivative procedure $P'$, one can use the union of the call site variedness and usefulness for the internal activity analysis in $P$, given by

$$\mathrm{Varied}^-(I_0(P)) = \bigcup_{c \in C(P)} \mathrm{Varied}^-(c) \tag{3}$$

$$\mathrm{Useful}^+(I_\infty(P)) = \bigcup_{c \in C(P)} \mathrm{Useful}^+(c), \tag{4}$$

where $C(P)$ is the set of all instructions that are call sites to $P$. This can lead to an over-estimation of the activity in $P$ and to poor performance of the derivative code for two reasons.
(1) Assume that at least one variable $v_1$ is not varied at a given call site, but assumed to be varied before $I_0(P)$, or a variable $v_2$ is not useful at the call site, but assumed to be useful after $I_\infty(P)$. This results in the creation of dummy derivative variables for $v_1$ and $v_2$ at the call site location. In forward mode, $\dot{v}_1$ needs to be initialised to zero to avoid incorrect derivatives inside $\dot{P}$, while $\dot{v}_2$ receives a derivative value that remains unused. In reverse mode, $\bar{v}_2$ needs to be zeroed and $\bar{v}_1$ receives a value from $\bar{P}$ that remains unused.
(2) Inside $P'$ we assume variables to be active that are actually inactive. This increases the instruction count and memory footprint of $P'$. For instance, if a variable is assumed active while actually inactive, then code to compute its derivative is inserted. This code may depend on otherwise unneeded intermediate variables, thus requiring more code from the primal to be inserted into the derivative procedure to compute these intermediate values. Even worse, this additional code may overwrite other variables that need to be stored and retrieved, or recomputed, in reverse-differentiated code.

Multi-activity AD can overcome this problem by creating specialised differentiated procedures.

3. FORMAL DESCRIPTION OF MULTI-ACTIVITY AD

Multi-activity AD can result in the creation of multiple specialised differentiated procedures for any primal procedure $P$. Each of these specialised procedures is determined by the assumed variedness of the first instruction and the assumed usefulness of the final instruction of $P$. To simplify the notation, the activity pattern $A$ is used to denote a pair $(V, U)$, where $A.V$ is a set of varied variables and $A.U$ is a set of useful variables associated with that activity pattern. In Sec. 3.1 we redefine the data flow equations to take into account the presence of more than one activity pattern. After that, Sec. 3.2 presents strategies to select a set of activity patterns for a procedure based on, among other things, the properties of its call sites. Finally, Sec. 3.3 outlines the code generation in multi-activity AD.
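Point (1) above can be made concrete with a toy procedure $P(v_1, v_2)$ computing $w = v_1 v_2$, called from a hypothetical call site c1 where both arguments are varied and a call site c2 where $v_2$ is a constant. The sketch takes the union (3) of the call-site variedness, writes the corresponding generalised tangent by hand, and compares it with a tangent specialised for c2. Both tangents are hand-written Python illustrations (the names `p_d` and `p_d_c2` are hypothetical), not code generated by an AD tool.

```python
# Hypothetical procedure P(v1, v2): w = v1 * v2, called from two call sites.
call_site_varied = {
    "c1": {"v1", "v2"},  # both formal arguments are varied at c1
    "c2": {"v1"},        # v2 receives a constant at c2
}

# Eq. (3): without multi-activity, P is analysed once with the union.
varied_entry = set().union(*call_site_varied.values())

def p_d(v1, v1d, v2, v2d):
    """Generalised tangent of P: both arguments assumed varied."""
    return v1 * v2, v2 * v1d + v1 * v2d

def p_d_c2(v1, v1d, v2):
    """Tangent of P specialised for the activity at c2 (v2 not varied)."""
    return v1 * v2, v2 * v1d

x, xd, c = 3.0, 1.0, 5.0
w_gen, wd_gen = p_d(x, xd, c, 0.0)  # call site c2: dummy derivative of v2 zeroed
w_spec, wd_spec = p_d_c2(x, xd, c)  # no dummy argument, no unused product term
```

Both calls give the same value and derivative; the specialised version simply avoids the dummy zero and the term `v1 * v2d` that always evaluates to zero at c2.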
As mentioned before, the user must specify dependent and independent variables for the top procedure $P_0$. With multi-activity, a user may in addition specify one or more sets of dependent and independent variables for any procedure $P$ that is dominated by $P_0$, or for $P_0$ itself.

3.1. Data flow and activity patterns

The variedness of the first instruction and the usefulness of the last instruction of $P$ can formally be defined in terms of the activity pattern $A$ of $P$ as

$$
A := (V, U), \qquad \mathrm{Varied}^-(I_0(P), A) = V, \qquad \mathrm{Useful}^+(I_\infty(P), A) = U. \tag{5}
$$

The data flow equations (1) can be modified so that they are defined not only per instruction, but instead per instruction and for a given activity pattern $A$. We note that the differentiation dependency is calculated in the same way as without multi-activity differentiation.

$$
\begin{aligned}
\mathrm{Varied}^+(I, A) &= \mathrm{Varied}^-(I, A) \otimes \text{Diff-dep}(I)\\
\mathrm{Useful}^-(I, A) &= \text{Diff-dep}(I) \otimes \mathrm{Useful}^+(I, A)\\
\mathrm{Active}^-(I, A) &= \mathrm{Varied}^-(I, A) \cap \mathrm{Useful}^-(I, A)\\
\mathrm{Active}^+(I, A) &= \mathrm{Varied}^+(I, A) \cap \mathrm{Useful}^+(I, A)
\end{aligned} \tag{6}
$$

3.2. Selection of activity patterns

There is a tradeoff between the operation count of the resulting derivative program, which can be reduced by creating specialised differentiated procedures for as many call sites as possible, and the size of the resulting derivative program, which increases with the number of created differentiated procedures. We discuss different strategies to choose a set of activity patterns in this section.

We define a set of activity patterns $\mathcal{A}(P)$ for every procedure $P$ and demand that (5) and (6) hold $\forall A \in \mathcal{A}(P)$. To ensure the correctness of the differentiated program, for each call site $c$ to $P$ there must be at least one activity pattern $A \in \mathcal{A}(P)$ with $A.V \supseteq \mathrm{Varied}^-(c)$ and $A.U \supseteq \mathrm{Useful}^+(c)$. The user may insert activity patterns of their choice into $\mathcal{A}(P)$ by specifying sets of independent and dependent variables for any $P$.
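The correctness condition for a set of activity patterns can be written as a small predicate. This is a sketch assuming that the call-site variedness and usefulness have already been mapped onto the formal parameter names of $P$; the class `ActivityPattern` and the functions `covers` and `is_correct` are illustrative names, not part of any tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActivityPattern:
    V: frozenset  # assumed varied variables before the first instruction of P
    U: frozenset  # assumed useful variables after the last instruction of P

def covers(pattern, varied_at_call, useful_at_call):
    """A.V ⊇ Varied-(c) and A.U ⊇ Useful+(c): the pattern may be used at this call site."""
    return pattern.V >= varied_at_call and pattern.U >= useful_at_call

def is_correct(patterns, call_sites):
    """Correctness condition: each call site has at least one covering pattern."""
    return all(any(covers(a, v, u) for a in patterns) for (v, u) in call_sites)

patterns = {ActivityPattern(frozenset({"a", "b"}), frozenset({"c"}))}
sites = [({"a"}, {"c"}), ({"b"}, set())]
ok = is_correct(patterns, sites)
not_ok = is_correct(patterns, sites + [({"d"}, set())])
```

A frozen dataclass is used so that patterns are hashable and can be collected in a set, mirroring the set notation $\mathcal{A}(P)$.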
These user-defined activity patterns are referred to as differentiation heads, and for a given differentiation head $D$ we demand, similarly to (5), that $\mathrm{Varied}^-(I_0(P), D)$ is equal to the set of independent variables $D.V$ and the set of useful variables $\mathrm{Useful}^+(I_\infty(P), D)$ is equal to the set of dependent variables $D.U$. A user can specify more than one differentiation head for $P$, and we denote the set of differentiation heads as $\mathcal{D}(P)$.

In addition to the differentiation heads, the set of activity patterns for $P$ can contain one or more patterns based on the activities of call sites to $P$. We denote as $C(P)$ the set of all call instructions to $P$. The procedure in which a particular call site $c \in C(P)$ is contained is denoted as $\mathrm{caller}(c)$ and may itself have multiple activity patterns. This leads to a set of activity patterns for $c$ given by

$$
\mathcal{A}(c) := \left\{ (V, U) : \exists A \in \mathcal{A}(\mathrm{caller}(c)) : V = \mathrm{Varied}^-(c, A) \wedge U = \mathrm{Useful}^+(c, A) \right\} \qquad \forall c \in C(P).
$$

Based on the activity patterns of call sites to $P$, we can define the set of activity patterns for $P$. One extreme approach is to include the exact matching activity pattern for each call site into $\mathcal{A}$; we call this approach specialize-all and denote the corresponding set of activity patterns as $\mathcal{A}_s$. The other extreme is to create only one activity pattern that encloses all call site activities; we refer to this as generalize-all, denoted by $\mathcal{A}_g$.

Formally, $\mathcal{A}_s(P)$ can be defined as

$$
\begin{aligned}
\mathcal{A}_s^{int} &:= \bigcup_{c \in C(P)} \mathcal{A}(c)\\
\mathcal{A}_s(P) &:= \mathcal{D}(P) \cup \mathcal{A}_s^{int}
\end{aligned}
$$

where $\mathcal{A}_s^{int}$ is the set of activity patterns that was created due to internal call sites in the code. A user may explicitly define a differentiation head for a procedure $P$ that matches some other activity pattern for $P$, that is, $\mathcal{D}(P) \cap \mathcal{A}_s^{int} \neq \emptyset$. This does not affect the analysis, and only one instance of this pattern is contained in $\mathcal{A}$.

In contrast, the generalize-all approach yields at most one activity pattern for each procedure in addition to the differentiation heads.
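The specialize-all and generalize-all extremes differ only in which call sites receive exact patterns, so one hypothetical function `select_patterns` can express both, and also the mixed strategy with marked call sites that the paper formalises as (7). The sketch assumes each call site's set $\mathcal{A}(c)$ has already been computed as (V, U) pairs of frozensets; the merged pattern is built from the V and U components of those pairs, and it is only added when at least one unmarked call site exists, which is a choice of this sketch.

```python
def select_patterns(heads, call_sites, marked):
    """Activity patterns for one procedure P.

    heads:      set of (V, U) pairs from the differentiation heads D(P)
    call_sites: dict mapping a call-site id to its set A(c) of (V, U) pairs
    marked:     call-site ids that receive exact (specialised) patterns
    """
    result = set(heads)  # a head equal to a call-site pattern is stored only once
    merged_V, merged_U = set(), set()
    has_unmarked = False
    for site, site_patterns in call_sites.items():
        if site in marked:
            result |= site_patterns  # exact match for every pattern of this call site
        else:
            has_unmarked = True
            for V, U in site_patterns:  # one enclosing pattern for all other sites
                merged_V |= V
                merged_U |= U
    if has_unmarked:
        result.add((frozenset(merged_V), frozenset(merged_U)))
    return result

def specialize_all(heads, call_sites):
    return select_patterns(heads, call_sites, marked=set(call_sites))

def generalize_all(heads, call_sites):
    return select_patterns(heads, call_sites, marked=set())

fs = frozenset
sites = {
    "c1": {(fs({"x"}), fs({"z"})), (fs({"x", "y"}), fs({"z"}))},  # caller has two patterns
    "c2": {(fs({"y"}), fs({"z"}))},
}
heads = {(fs({"x"}), fs({"z"}))}  # user head coinciding with one pattern of c1
```

Specialize-all yields three patterns here (the head and the identical pattern of c1 collapse into one), while generalize-all yields the head plus a single enclosing pattern.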
To give the user more flexibility, one can implement a strategy that defaults to generalize-all, but allows user-defined specialisations, for example through additional differentiation heads supplied as arguments to the AD tool, or through pragmas in the primal code that select specific call sites for specialisation. Both options were implemented in Tapenade in the course of this work. If we consider $C_s(P)$ to be the call sites marked for specialisation, the set of activity patterns is given by

$$
\begin{aligned}
\mathcal{A}_{gs}^{int} &:= \bigcup_{c \in C_s(P)} \mathcal{A}(c)\\
V_{gn}^{int} &:= \bigcup_{c \in C(P) \setminus C_s(P)} \left( \bigcup_{A \in \mathcal{A}(c)} \mathrm{Varied}^-(c, A) \right)\\
U_{gn}^{int} &:= \bigcup_{c \in C(P) \setminus C_s(P)} \left( \bigcup_{A \in \mathcal{A}(c)} \mathrm{Useful}^+(c, A) \right)\\
\mathcal{A}_g(P) &:= \mathcal{D}(P) \cup \mathcal{A}_{gs}^{int} \cup \left\{ \left( V_{gn}^{int}, U_{gn}^{int} \right) \right\},
\end{aligned} \tag{7}
$$

that is, the union of the differentiation heads, the set of specialised activity patterns for all marked call instructions, and the set containing the generalised activity pattern for all other call instructions. The final set of activity patterns for each procedure is

$$
\mathcal{A}(P) = \begin{cases} \mathcal{A}_s(P) & \text{if } P \text{ specialised}\\ \mathcal{A}_g(P) & \text{else} \end{cases}, \tag{8}
$$

where a given procedure is specialised if it was marked for specialisation by the user by means of a command-line flag or a pragma in the code, or if the specialize-all approach was chosen. It follows from the above equations that, regardless of the specialisation method, there is always an exactly matching activity pattern for each differentiation head that was specified by the user.

Both specialisation and generalisation have their advantages. Specialisation facilitates the best possible activity analysis, leading to more efficient derivative code. On the other hand, there is a price to pay in terms of derivative code size and runtime of the differentiation tool, see Sec. 4.

3.3. Generation of the derivative code

Inference rules for forward and reverse differentiation have been shown in [Hascoët and Pascual 2012], following the natural semantics notation [Kahn 1987]. These rules are a way to formalise
the transformation of primal code into derivative code. Code generation is the step that follows the analysis in Sec. 3.1, and is based on the activity patterns selected in Sec. 3.2. For every primal procedure $P$, the AD tool must generate a total number of $\|\mathcal{A}_P\|$ specialised derivative procedures, $\dot{P}_A$ in forward or $\bar{P}_A$ in reverse mode, one for every $A \in \mathcal{A}_P$.

As an example, we discuss the inference rule for the forward differentiation of procedure headers. The following rule defines a code transformation for every procedure $P$, where $A \vdash$ denotes that the rule is executed for each activity pattern.

$$
\frac{A \vdash P \xrightarrow{\text{procName}} \dot{P} \qquad A, P, 0 \vdash \mathrm{ARGS} \rightarrow \dot{\mathrm{ARGS}} \qquad A \vdash \mathrm{INSTRS} \rightarrow \dot{\mathrm{INSTRS}} \qquad A \vdash \mathrm{DECLS} \rightarrow \dot{\mathrm{DECLS}}}{A \vdash \texttt{procedure } P(\mathrm{ARGS})\{\mathrm{DECLS}; \mathrm{INSTRS}\} \rightarrow \texttt{procedure } \dot{P}(\dot{\mathrm{ARGS}})\{\dot{\mathrm{DECLS}}; \dot{\mathrm{INSTRS}}\}} \tag{9}
$$

This rule connects one so-called conclusion predicate below the fraction bar with zero or more (here four) hypothesis predicates above the fraction bar. Each predicate represents some code transformation or rewrite, and is considered solved when it is successfully applied to some code. In predicates, we use an arrow to separate the code before and after rewriting. In order to solve the conclusion predicate of a given rule, all its hypothesis predicates must be solved, recursively, by using other rules.

With this in mind, the inference rule (9) can be read, or executed by a code rewriting system, as follows: for each activity pattern, if the primal code matches the pattern

$$\texttt{procedure } P(\mathrm{ARGS})\{\mathrm{DECLS}; \mathrm{INSTRS}\},$$

thus instantiating variables $P$, $\mathrm{ARGS}$, $\mathrm{DECLS}$, and $\mathrm{INSTRS}$ with the corresponding code pieces, and if the four hypothesis predicates can be recursively solved using other inference rules, thus instantiating variables $\dot{P}$, $\dot{\mathrm{ARGS}}$, $\dot{\mathrm{DECLS}}$, $\dot{\mathrm{INSTRS}}$, then the conclusion predicate is solved and it produces the derivative code built as

$$\texttt{procedure } \dot{P}(\dot{\mathrm{ARGS}})\{\dot{\mathrm{DECLS}}; \dot{\mathrm{INSTRS}}\},$$

where the variables $\dot{P}$, $\dot{\mathrm{ARGS}}$, $\dot{\mathrm{DECLS}}$, and $\dot{\mathrm{INSTRS}}$ hold the name of the derivative procedure and its arguments, declarations, and instructions.
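Rule (9) produces one derivative header per activity pattern. The Python sketch below mimics that for the header only (the DECLS and INSTRS hypotheses are omitted), using the recursive argument-list rewriting that the following paragraphs formalise: each formal argument is kept, and a derivative argument is inserted right after it when it is active for the pattern. The `d` suffix for derivative arguments, the `_d` and numbered `_d0`, `_d1`, ... procedure suffixes, and the `Procedure` class are hypothetical choices for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Procedure:
    name: str
    args: list  # formal argument names, in order

def diff_args(args, active_formals, rk=0):
    """Argument-list rules: keep ARG, and insert its derivative right after it if active."""
    if rk == len(args):
        return []
    arg = args[rk]
    is_diff_formal_arg = arg in active_formals  # plays the role of isDiffFormalArg(A, P, rk)
    head = [arg, arg + "d"] if is_diff_formal_arg else [arg]
    return head + diff_args(args, active_formals, rk + 1)

def diff_headers(proc, patterns):
    """One derivative header per activity pattern, as rule (9) prescribes for P."""
    headers = []
    for k, active_formals in enumerate(patterns):
        # Hypothetical naming: plain "_d" for one pattern, numbered suffixes otherwise.
        suffix = "_d" if len(patterns) == 1 else f"_d{k}"
        headers.append(Procedure(proc.name + suffix, diff_args(proc.args, active_formals)))
    return headers

flux = Procedure("flux", ["q", "n", "f"])
h = diff_headers(flux, [{"q", "f"}, {"n", "f"}])
```

For the two patterns above, `h` holds the headers `flux_d0(q, qd, n, f, fd)` and `flux_d1(q, n, nd, f, fd)`, so the two specialisations differ both in name and in which derivative arguments they take.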
We define a number of utility predicates for the elementary rewrite operations, identified by a superscript above the arrow. For instance, the predicate

$$A \vdash P \xrightarrow{\text{procName}} \dot{P}$$

means that the name $P$ of the procedure is successfully transformed into the name $\dot{P}$ of its differentiated version for activity pattern $A$. The predicate procName deals with an important aspect of the multi-activity approach: without specialisation, it can act simply by appending a suffix to the procedure name, e.g. _d for forward and _b for reverse differentiation. If, however, more than one specialisation is created, it is necessary to generate unique suffixes for each activity pattern to avoid assigning the same name to several procedures, e.g. by encoding the activity in a string, or by numbering the patterns. To avoid generating excessively long procedure names, we chose the latter approach and append a number, starting from zero, whenever two procedures would otherwise have the same name. There is no natural way to define an order over activity patterns, hence the numbering in our implementation depends on the order in which specialisations are created, which depends on implementation details of Tapenade and on the primal code. The user can choose custom suffixes for differentiation heads to make the naming predictable if needed.

The rewrite predicate for the procedure arguments (the second hypothesis predicate in (9)) requires as a context, in addition to $A$, the current procedure $P$ and the index $0$ of the next argument. The predicate can itself be formalised in the following rewrite rules for the argument list, whose first hypothesis predicate is a boolean condition that selects the applicable rule:

$$
\frac{\mathrm{isDiffFormalArg}(A, P, rk) \qquad \mathrm{ARG} \xrightarrow{\text{varName}} \dot{\mathrm{ARG}} \qquad A, P, rk+1 \vdash \mathrm{ARGS} \rightarrow \dot{\mathrm{ARGS}}}{A, P, rk \vdash (\mathrm{ARG}.\mathrm{ARGS}) \rightarrow (\mathrm{ARG}, \dot{\mathrm{ARG}}.\dot{\mathrm{ARGS}})}
$$
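$$
\frac{!\,\mathrm{isDiffFormalArg}(A, P, rk) \qquad A, P, rk+1 \vdash \mathrm{ARGS} \rightarrow \dot{\mathrm{ARGS}}}{A, P, rk \vdash (\mathrm{ARG}.\mathrm{ARGS}) \rightarrow (\mathrm{ARG}.\dot{\mathrm{ARGS}})}
$$

The predicate $\mathrm{isDiffFormalArg}(A, P, rk)$ is true if the $rk$-th formal argument of procedure $P$ is active for $A$, i.e. it belongs to $\mathrm{Active}^-(I_0(P), A)$ or to $\mathrm{Active}^+(I_\infty(P), A)$. In that case, the derivative argument $\dot{\mathrm{ARG}}$ is inserted into the derivative argument list; the primal argument $\mathrm{ARG}$ is inserted in all cases. The adapted inference rules for a procedure call are shown below, together with the rules for differentiating the actual arguments of the call.

$$
\frac{\mathrm{isActiveCall}(A, B) \qquad B \vdash P \xrightarrow{\text{procName}} \dot{P} \qquad A, B, P, 0 \vdash \mathrm{ARGS} \xrightarrow{\text{actualArgs}} \dot{\mathrm{ARGS}}}{A \vdash \texttt{call } P(\mathrm{ARGS}) \rightarrow \texttt{call } \dot{P}(\dot{\mathrm{ARGS}})} \tag{10}
$$

$$
\frac{\mathrm{isDiffFormalArg}(B, P, rk) \qquad A \vdash \mathrm{EXPR} \xrightarrow{\text{ref}} \dot{\mathrm{EXPR}} \qquad A, B, P, rk+1 \vdash \mathrm{EXPRS} \xrightarrow{\text{actualArgs}} \dot{\mathrm{EXPRS}}}{A, B, P, rk \vdash (\mathrm{EXPR}.\mathrm{EXPRS}) \xrightarrow{\text{actualArgs}} (\mathrm{EXPR}, \dot{\mathrm{EXPR}}.\dot{\mathrm{EXPRS}})}
$$

$$
\frac{!\,\mathrm{isDiffFormalArg}(B, P, rk) \qquad A, B, P, rk+1 \vdash \mathrm{EXPRS} \xrightarrow{\text{actualArgs}} \dot{\mathrm{EXPRS}}}{A, B, P, rk \vdash (\mathrm{EXPR}.\mathrm{EXPRS}) \xrightarrow{\text{actualArgs}} (\mathrm{EXPR}.\dot{\mathrm{EXPRS}})}
$$

All properties of calls and of call arguments are functions of the current activity $A$ of the containing procedure, or of the corresponding called activity of the called procedure. In particular, $\mathrm{isActiveCall}(A, B)$ is true at call site $c$ if one argument of this call is in $\mathrm{Active}^-(c, A)$ or in $\mathrm{Active}^+(c, A)$. In that case, this prerequisite unifies (i.e. "sets") $B$ with the corresponding activity for the called procedure. It is important to note that $B$ is an activity pattern of the called procedure, while $A$ is an activity pattern of the calling procedure. For any given call site and activity pattern, we have the task of finding an activity pattern $B \in \mathcal{A}_P$ that is a possible match for the variedness and usefulness of the call site. Formally,

$$\mathrm{Varied}^-(c, A) \subseteq B.V, \qquad \mathrm{Useful}^+(c, A) \subseteq B.U.$$

If either the procedure $P$ or the call site $c$ has been marked for specialisation, we can always find a perfect match, i.e.

$$\mathrm{Varied}^-(c, A) = B.V, \qquad \mathrm{Useful}^+(c, A) = B.U.$$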
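Selecting the called pattern $B$ at a call site reduces to subset and equality tests on (V, U) pairs. The sketch below follows the policy described in this section: return a perfect match if one exists, otherwise the first possible match. It assumes $\mathcal{A}_P$ is held as an ordered list, since "first found" depends on iteration order; the function name `find_callee_pattern` is hypothetical.

```python
def find_callee_pattern(patterns, varied_at_call, useful_at_call):
    """Pick B in A_P for a call site: a perfect match if one exists, else the first possible match."""
    # Possible matches: Varied-(c, A) ⊆ B.V and Useful+(c, A) ⊆ B.U.
    possible = [b for b in patterns
                if b[0] >= varied_at_call and b[1] >= useful_at_call]
    for b in possible:
        if b[0] == varied_at_call and b[1] == useful_at_call:
            return b  # perfect match: no superfluous derivative code
    return possible[0] if possible else None  # None: no pattern covers this call site

fs = frozenset
A_P = [(fs({"a", "b"}), fs({"c"})), (fs({"a"}), fs({"c"}))]  # ordered list of (V, U) pairs
```

A `None` result corresponds to a violation of the correctness condition of Sec. 3.2, which the pattern selection strategies are designed to rule out.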
If there is no perfect match, we have to accept that some unnecessary computations or initialisations are made in the derivative code, which is the behaviour that we encountered for AD without specialisation. We could try to find the best $B \in \mathcal{A}_P$ to minimise the cost that arises from superfluous derivative code, which would require a metric for said cost (in terms of memory, CPU time etc.) given by some function

$$\mathrm{cost}\big(B.V \setminus (\mathrm{Varied}^-(c, A) \cap B.V),\; B.U \setminus (\mathrm{Useful}^+(c, A) \cap B.U)\big).$$

This is, however, not always possible with static analysis, as the runtime and memory cost of the primal code as well as of the generated derivative code may depend on input that is only known at runtime. This is the case most of the time in practical applications (e.g. if the input defines the problem size or the desired quality of the output). Hence, we chose not to approximate this cost with static analysis. Instead, our implementation connects each call to the derivative procedure with the perfectly matching activity pattern if it exists, or the first found possible match