Automated Synthesis of Divide and Conquer Parallelism

Azadeh Farzan                 Victor Nicolet
University of Toronto         École Polytechnique

arXiv:1701.08345v1 [cs.PL] 29 Jan 2017

Abstract

This paper focuses on automated synthesis of divide-and-conquer parallelism, which is a common parallel programming skeleton supported by many cross-platform multithreaded libraries. The challenges of producing (manually or automatically) a correct divide-and-conquer parallel program from a given sequential code are two-fold: (1) assuming that individual worker threads execute code identical to the sequential code, the programmer has to provide the extra code for dividing the tasks and combining the computation results, and (2) sometimes, the sequential code may not be usable as is, and may need to be modified by the programmer. We address both challenges in this paper. We present an automated synthesis technique for the case where no modifications to the sequential code are required, and we propose an algorithm for modifying the sequential code to make it suitable for parallelization when some modification is necessary. The paper presents theoretical results for when this modification is efficiently possible, and an experimental evaluation of the technique and of the quality of the produced parallel programs.

1. Introduction

Despite big advances in optimizing and parallelizing compilers, correct and efficient parallel code is often hand-crafted in a difficult and error-prone process. The introduction of libraries like Intel's TBB [39] was motivated by this difficulty, with the aim of making it easy to write parallel programs with good performance. These libraries offer efficient implementations of commonly used parallel skeletons. This makes it easier for the programmer to write code in the style of a given skeleton without having to make special considerations for important factors like scalability of memory allocation or task scheduling. Divide-and-conquer parallelism is one of the most commonly used of these skeletons.

Consider the function sum that returns the sum of the elements of an array of integers. The following sequential loop computes this function:

    sum = 0;
    for (i = 0; i < |s|; i++) {
      sum = sum + s[i];
    }

To compute the sum in the style of divide-and-conquer parallelism, the computation should be structured as illustrated in Figure 1.

    [Figure 1: Divide and conquer style computation of sum. The leaves compute
    sum(s[0..k1]), sum(s[k1+1..k2]), ..., sum(s[kn-1+1..kn]), sum(s[kn+1..|s|-1]);
    adjacent partial results are combined by the join operator ⊙ at each level of
    the tree, producing sum(s) at the root.]

The array s of length |s| is partitioned into n+1 chunks, and sum is individually computed for each chunk. Then the results of these partial sum computations are joined (operator ⊙) into results for the combined chunks at each intermediate step, with the join at the root of the tree returning the result of sum for the entire array. The burden of a correct design is to come up with the correct implementation of the join operator. Here, it is easy to quickly observe that the join simply has to return the sum of the two partial results. However, the correct join is not always as simple to formulate.
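To make the structure of Figure 1 concrete, here is a minimal hand-written C sketch of the divide-and-conquer computation of sum, where the join is simply +. The recursive harness, the CHUNK cutoff, and the function names are our own illustrative choices; in a real implementation the two recursive calls would be spawned as parallel tasks (e.g. with TBB):

    #include <stdio.h>

    #define CHUNK 4   /* illustrative cutoff below which we run the sequential loop */

    /* Leaf: the original sequential loop, run on the chunk s[lo..hi). */
    static long sum_seq(const int *s, int lo, int hi) {
        long sum = 0;
        for (int i = lo; i < hi; i++)
            sum = sum + s[i];
        return sum;
    }

    /* Divide, solve the halves, and combine with the join operator (here, +). */
    static long sum_dc(const int *s, int lo, int hi) {
        if (hi - lo <= CHUNK)
            return sum_seq(s, lo, hi);
        int mid = lo + (hi - lo) / 2;
        long left  = sum_dc(s, lo, mid);    /* could run as a parallel task */
        long right = sum_dc(s, mid, hi);
        return left + right;                /* join: sum(x • y) = sum(x) + sum(y) */
    }

    int main(void) {
        int s[] = {1, -2, 3, -1, 3, 7, 0, 5};
        printf("%ld\n", sum_dc(s, 0, 8));   /* 16, same as the sequential loop */
        return 0;
    }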
Recent advances in program synthesis [2] demonstrate the power of synthesis in producing non-trivial computations. A natural question to ask is whether this power can be leveraged for this problem. In Section 2, we present an example of a sequential loop that may be tricky to parallelize for an average programmer. However, a correct join can be successfully synthesized if the synthesis problem is framed properly.

In this paper, we focus on a class of divide-and-conquer parallel programs that operate on sequences (lists, arrays, or in general any collection with a linear iterator) in which the divide operator is assumed to be the default sequence concatenation operator (i.e. divide s into s1 and s2 where s = s1 • s2). In Section 4, we discuss how we use syntax-guided synthesis efficiently to synthesize these join operators.

The main challenge for syntax-guided synthesis is to define a search space for synthesis that is expressive enough to include the desired solution, but limited enough to stay within the limits of scalability of current solvers. Moreover, in Section 7, we discuss how the proofs of correctness for all synthesized join operations can be automatically generated and checked. This addresses a second challenge that most synthesis approaches face, namely that the synthesized artifact is only guaranteed to be correct for the set of examples used in the synthesis process and not for the entire (infinite) data domain of program inputs.

A general divide-and-conquer parallel solution is not always as simple as the diagram in Figure 1. Consider the function is-sorted(s), which returns true if an array is sorted and false otherwise. Providing the partial computation result, a boolean value in this case, from both sides to the join will not suffice. If both sub-arrays are sorted, the join cannot make a decision about the sortedness of the combined array. In other words, a join cannot be defined solely in terms of the sortedness of the subarrays.

It is clear to a human programmer that the join requires the last element of the first subarray and the first element of the second subarray to connect the dots. The extra information is as simple as remembering a value, but as we demonstrate with another example in Section 2, the extra information may need to be computed to be available to the join. Intuitively, this means that the worker threads (i.e. leaf nodes in Figure 1) have to be modified to compute this extra information so that a join operator exists. We call this modification of the code an extension for short. The necessity of the extension raises two questions: (1) does such an extension always exist? and (2) can the overhead from the extension and the accompanying join overshadow the performance gains from parallelization? In Section 5, we answer both questions in the affirmative.
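For is-sorted, the extension just described is easy to write by hand. The following C fragment is our own sketch (not produced by the approach in this paper); it assumes non-empty chunks, and the struct and function names are hypothetical:

    #include <stdbool.h>

    /* Extended partial result: the flag plus the chunk's boundary elements. */
    typedef struct { bool sorted; int first, last; } sorted_res;

    /* Leaf: the sequential check over a non-empty chunk s[lo..hi),
     * additionally recording the first and last elements of the chunk. */
    static sorted_res sorted_leaf(const int *s, int lo, int hi) {
        sorted_res r = { true, s[lo], s[hi - 1] };
        for (int i = lo + 1; i < hi; i++)
            if (s[i - 1] > s[i]) { r.sorted = false; break; }
        return r;
    }

    /* Join: sortedness of the halves alone is not enough; the boundary
     * elements connect the dots across the split point. */
    static sorted_res sorted_join(sorted_res l, sorted_res r) {
        sorted_res res = { l.sorted && r.sorted && l.last <= r.first,
                           l.first, r.last };
        return res;
    }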
The challenge for automation is then how to modify the original sequential loop so that it computes enough additional information so that (i) a join does exist, (ii) this join is efficient, and (iii) the overhead of the extension is not unreasonably high. Furthermore, we lay the theoretical foundations to answer this question, and in Section 6 we present an algorithm that produces a solution satisfying all of the aforementioned criteria.

In summary, in this paper, we make the following contributions:

• We present an algorithm to synthesize a join for divide-and-conquer parallelism when one exists. Moreover, these joins are accompanied by automatically generated machine-checked proofs of correctness (Sections 4, 7).

• We present an algorithm for automatic extension of non-parallelizable loops (when a join does not exist) to transform them into parallelizable ones (Section 6).

• We lay the theoretical foundations for when a loop is or can be made efficiently parallelizable (divide-and-conquer style), and explore when an efficient extension exists and when it can be automatically discovered (Sections 5, 6).

• We implemented our approach as a tool, PARSYNT, and present experimental results that demonstrate the efficacy of the approach and the efficiency of the produced solutions (Section 8).

2. Overview

In this section, we present an overview of our approach by way of a few examples. We use successively more complicated examples to demonstrate the challenges of this problem, and how our approach deals with them. We refer to our linear collections as sequences that model any collection with a linear iterator.

Second Smallest. Consider the loop implementation of the function min2 that returns the second smallest element of a sequence (Figure 2). min and max are used for brevity; they can be replaced by their standard definitions min(a,b) = a < b ? a : b and max(a,b) = a > b ? a : b.

    m = MAX_INT; m2 = MAX_INT;
    for (i = 0; i < |s|; i++) {
      m2 = min(m2, max(m, s[i]));
      m = min(m, s[i]);
    }

    Figure 2: Second Smallest

The correct join operator for a divide-and-conquer parallel implementation is

    m = min(m_l, m_r);
    m2 = min(min(m2_l, m2_r), max(m_l, m_r));

where the _l and _r suffixes distinguish the inputs to the join coming from the left or right thread. It is easy to make the mistake of using the following join instead (in fact, a significant percentage of the undergraduate students in an algorithms class routinely make this mistake when given this exercise):

    m = min(m_l, m_r);
    m2 = min(m2_l, m2_r);

which would return an incorrect result if the partial results for the two sequences [5,1,2,7] (where m is 1 and m2 is 2) and [3,8,4,9] (where m is 3 and m2 is 4) were to be combined. Using a template which is based on the code of the loop body with unknowns (aka syntax-guided synthesis), we synthesize the correct join in 5s (more on this example in Section 4).
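To show how such a synthesized join slots into a divide-and-conquer harness, here is a minimal hand-written C sketch for the second-smallest example. Only the join body comes from the text above; the harness, the CHUNK cutoff, and the helper names are our own, and the two recursive calls would be spawned as parallel tasks in a real implementation:

    #include <limits.h>

    #define MAX_INT INT_MAX
    #define CHUNK 4                           /* illustrative cutoff */

    static int min_i(int a, int b) { return a < b ? a : b; }
    static int max_i(int a, int b) { return a > b ? a : b; }

    typedef struct { int m, m2; } min2_res;   /* one field per state variable */

    /* Leaf: the unmodified sequential loop of Figure 2, run on a chunk. */
    static min2_res min2_leaf(const int *s, int lo, int hi) {
        min2_res r = { MAX_INT, MAX_INT };
        for (int i = lo; i < hi; i++) {
            r.m2 = min_i(r.m2, max_i(r.m, s[i]));
            r.m  = min_i(r.m, s[i]);
        }
        return r;
    }

    /* The correct join: the overall second smallest is either one of the
     * partial second-smallests or the larger of the two partial minimums. */
    static min2_res min2_join(min2_res l, min2_res r) {
        min2_res res;
        res.m  = min_i(l.m, r.m);
        res.m2 = min_i(min_i(l.m2, r.m2), max_i(l.m, r.m));
        return res;
    }

    static min2_res min2_dc(const int *s, int lo, int hi) {
        if (hi - lo <= CHUNK) return min2_leaf(s, lo, hi);
        int mid = lo + (hi - lo) / 2;
        return min2_join(min2_dc(s, lo, mid), min2_dc(s, mid, hi));
    }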
Maximum Tail Sum. Consider the function mts that returns the maximum suffix sum of a sequence of integers (positive and negative). For example, mts([1,−2,3,−1,3]) = 5, which is the sum of the suffix [3,−1,3]. The following loop implements mts:

    mts = 0;
    for (i = 0; i < |s|; i++) {
      mts = max(mts + s[i], 0);
    }

To parallelize this loop, the programmer needs to come up with the correct join operator. In this case, even the most clever programmer would be at a loss, because there is no correct join. Consider the partial sequence [1,3], when extended by two different partial sequences [−2,5] and [0,5]. We have:

    mts([1,3]) = 4              mts([1,3]) = 4
    mts([−2,5]) = 5             mts([0,5]) = 5
    mts([1,3,−2,5]) = 7         mts([1,3,0,5]) = 9

The values of mts for all partial sequences are the same, but the value of mts for the concatenation of the sequences is different in the two cases. If a join function ⊙ exists such that mts(s • t) = mts(s) ⊙ mts(t), then this function would have to produce two different outputs for 4 ⊙ 5 in the two cases mentioned above. In other words, if the only information available to the join is the mts value of the partial sequences, then the mts value of the concatenation is not computable. What is to be done?

At this point, a clever programmer makes the observation that the loop is not parallelizable in its original form. In order to conclude that mts([1,3,−2,5]) = 7, beyond knowing that mts([1,3]) = 4 and mts([−2,5]) = 5, one needs to know that sum([−2,5]) = 3.

Consider an extension of the previous sequential loop for mts:

    mts = 0; sum = 0;
    for (i = 0; i < |s|; i++) {
      mts = max(mts + s[i], 0);
      sum = sum + s[i];
    }

There is an additional loop variable, sum, that records the sum of all the sequence elements. In the sequential loop, it is a redundant computation. We call sum an auxiliary accumulator for this reason. Now that the sum is available, the code can be parallelized with this join:

    sum = sum_l + sum_r;
    mts = max(mts_r, sum_r + mts_l);

With loops like this, the burden on the programmer is more than just coming up with the correct join code. She has to modify the original sequential code, by adding new computation, to make it parallelizable, and then devise the correct join for the new sequential loop. The algorithm presented in Section 5 does exactly that. We provide an overview of our algorithm by illustrating how it works for the mts example to discover the auxiliary information about the sum.

Consider the general case of computing mts(s) sequentially, where |s| = n. Imagine the point in the middle of the computation, when the sequence s has been processed up to and including index i. We have the following sequence of recurrence-like equations that represent the unfolding of this sequential loop starting from index i+1:

    mts_{i+1} = max(mts_i + s[i+1], 0)        (1)
    mts_{i+2} = max(mts_{i+1} + s[i+2], 0)    (2)
    ...

When the (unfolded) computation above starts, the initial value of mts_i is not available to it, since it is being computed in parallel by a different thread simultaneously. The challenge is how to (mostly) do the computation above without having the value of mts_i in the beginning, and adjust the computed value accordingly once the value of mts_i becomes available. Let us start by examining Equation (1):

    mts_{i+1} = max(mts_i + s[i+1], 0)
              = max(max(mts_i + s[i+1], s[i+1]), 0)
              = max(mts_i + s[i+1], max(s[i+1], 0))
              = max(mts_i + s[i+1], mts([s[i+1]]))

In the last line, the second term (under max) is computing the mts as if the element s[i+1] were the first element of the (input) sequence. So, one can imagine that mts_i (i.e. mts(s[0..i])) is computed by one thread, while a second thread (so far) computes mts([s[i+1]]) (the remaining part of the sequence seen so far). To compute the value of mts(s[0..i+1]), the join code needs the aforementioned partially computed values and the extra value s[i+1]. Our algorithm (presented in Section 6) at this point conjectures s[i+1] as extra auxiliary information and '+' as its accumulator, since it is actively looking for accumulators to learn.

Moving to Equation (2) (skipping the simplification steps):

    mts_{i+2} = max(mts_{i+1} + s[i+2], 0) = ...
              = max(mts_i + s[i+1] + s[i+2], mts(s[i+1..i+2]))

Following the same line of argument as above, our algorithm discovers that it needs to calculate and pass along s[i+1] + s[i+2]. At this point, the algorithm realizes that the conjectured accumulator from the last round makes this information available. Therefore, not having seen anything new, it stops and concludes that the auxiliary accumulator sum = sum + s[k] (where k is the current iteration number) is the new auxiliary computation required to make the loop parallelizable.
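Replaying the earlier counterexample against the extended loop and its join shows how the auxiliary sum resolves the ambiguity: the two right-hand chunks both have mts = 5, but their sums differ. The following self-contained C check is our own illustration (the struct and helper names are hypothetical):

    #include <assert.h>

    typedef struct { int mts, sum; } mts_res;

    static int max_i(int a, int b) { return a > b ? a : b; }

    /* Extended leaf: the original loop plus the auxiliary accumulator sum. */
    static mts_res mts_leaf(const int *s, int n) {
        mts_res r = { 0, 0 };
        for (int i = 0; i < n; i++) {
            r.mts = max_i(r.mts + s[i], 0);
            r.sum = r.sum + s[i];
        }
        return r;
    }

    /* The join from the text, made possible by the auxiliary sum. */
    static mts_res mts_join(mts_res l, mts_res r) {
        mts_res res = { max_i(r.mts, r.sum + l.mts), l.sum + r.sum };
        return res;
    }

    int main(void) {
        int s[] = {1, 3}, t1[] = {-2, 5}, t2[] = {0, 5};
        mts_res a = mts_leaf(s, 2);                      /* mts = 4, sum = 4 */
        assert(mts_join(a, mts_leaf(t1, 2)).mts == 7);   /* mts([1,3,-2,5])  */
        assert(mts_join(a, mts_leaf(t2, 2)).mts == 9);   /* mts([1,3,0,5])   */
        return 0;
    }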
3. Background and Notation

This section introduces the notation used in the remainder of the paper. It includes definitions of some concepts that are new, and some that are already known in the literature.

3.1 Sequences and Functions

We assume a generic type Sc that refers to any of the scalar types used in programs, such as int, float, char, and bool, when the specific type is not important in the context. The significance of the type is that scalars are assumed to be of constant size, and therefore any constant-size representable data type is considered scalar in this paper. Moreover, from the point of view of time complexity, we assume all operations on scalars to be constant-time.

Type S denotes the set of all sequences of elements of the type Sc. For any sequence x, we use |x| to denote the length of the sequence. x[i] (for 0 ≤ i < |x|) denotes the element of the sequence at index i, and x[i..j] denotes the subsequence between indexes i and j (inclusive). The concatenation operator • : S × S → S is defined over sequences in the standard way, and is associative. This sequence type stands in for arrays, lists, or any other linear collections that admit iterators and an associative composition operator (e.g. concatenation).

Definition 3.1. A function h : S → D is called rightwards iff there exists a binary operator ⊕ : D × Sc → D such that for all x ∈ S and a ∈ Sc, we have h(x • [a]) = h(x) ⊕ a.

Note that the notion of associativity for ⊕ is not well-defined, since it is not a binary operation defined over a single set.

Definition 3.2. A function h : S → D is called leftwards iff there exists a binary operator ⊗ : Sc × D → D such that for all x ∈ S and a ∈ Sc, we have h([a] • x) = a ⊗ h(x).

Associativity of ⊗ is likewise not well-defined.

Definition 3.3 (Single-Pass Computable). Function h is single-pass computable iff it is rightwards or leftwards.

In this paper, our focus is on single-pass computable functions on sequences (of scalars). The assumption is that their implementation (leftwards or rightwards) is given in the form of an imperative sequential loop.

3.2 Homomorphisms

Homomorphisms are a well-studied class of mathematical functions. In this paper, we focus on a special class of homomorphisms, where the source structure is a set of sequences with the standard concatenation operator.

Definition 3.4. A function h : S → D is called ⊙-homomorphic for binary operator ⊙ : D × D → D iff for all sequences x, y ∈ S we have h(x • y) = h(x) ⊙ h(y).

Note that, even though it is not explicitly stated in the definition above, ⊙ is necessarily associative on D since concatenation is associative (over sequences). Moreover, h([]) (where [] is the empty sequence) is the unit of ⊙, because [] is the unit of concatenation. If ⊙ has no unit, that means h([]) is undefined. For example, the function head(x), which returns the first element of a sequence, is not defined for an empty list. head(x) is ⊛-homomorphic, where a ⊛ b = a (for all a, b), but ⊛ does not have a left unit element.
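As a further instance of these definitions (our own illustration, not from the paper), the function sum from Section 1 satisfies all three:

    sum(x • [a]) = sum(x) + a          (rightwards, with ⊕ = +)
    sum([a] • x) = a + sum(x)          (leftwards, with ⊗ = +)
    sum(x • y)   = sum(x) + sum(y)     (⊙-homomorphic, with ⊙ = +)

By contrast, mts from Section 2 is single-pass computable rightwards, with mts(x • [a]) = max(mts(x) + a, 0), but the counterexample in Section 2 shows that it is not ⊙-homomorphic for any ⊙.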
3.3 Programs and Loops

For the presentation of the results in this paper, we assume that our sequential program is written in a simple imperative language with basic constructs for branching and looping. We assume that the language includes scalar types int and bool, and a collection type seq.

The syntax of our input programs is presented in Figure 3. We forego a semantic definition since it is intuitively clear.

    e ∈ Exp   ::=  e ○ e′                      e, e′ ∈ Exp
                |  x                            x ∈ Var
                |  s[e]                         s ∈ SeVar, e ∈ Exp
                |  k                            k ∈ Z
                |  if be then e else e′         be ∈ Exp

    Program   ::=  c; c′                        c, c′ ∈ Program
                |  x := e                       x ∈ Var, e ∈ Exp
                |  b := be                      b ∈ BVar, be ∈ Exp
                |  if (be) {c} else {c′}        be ∈ Exp, c, c′ ∈ Program
                |  for (i ∈ D) {c}              i ∈ Iterator

    Figure 3: Program syntax. The binary ○ operator represents any arithmetic
    operation (+, −, ∗, /) and any comparator (<, ≤, >, ≥, =, ≠). D is an
    iteration domain.

For readability we use a simple iterator, an integer index (instead of the generic i ∈ D), and the standard array random access a[i] to refer to the next element. In principle, any collection with an iterator and a split function would work. There has been a lot of research on iteration spaces and iterators (e.g. [52] in the context of translation validation and [23] in the context of partitioning) that formalizes complex traversals by abstract iterators. Programs with nested loops are admitted; however, our tool always focuses on parallelizing one inner loop at a time.

State and Input Variables. Every variable that appears on the left-hand side of an assignment statement in a loop body is called a state variable, and the set of state variables is denoted by SVar. Every other variable is an input variable, and the set of input variables is denoted by IVar. The sequence that is being read by the loop is considered an input variable, since it is only read and not modified. Note that SVar ∩ IVar = ∅.

Model of a Loop Body. We introduce a formal model for the body of a loop which has no other loops nested inside it. The loop body consists of assignment and conditional statements only. Let SVar = {s1, ..., sn}. The body is modelled by a system of (ordered) recurrence equations E = ⟨E1, ..., En⟩, where each equation Ei is of the form si = expi(SVar, IVar).

Remark 3.5. Every non-nested loop in the program model of Figure 3 can be modelled by a system of recurrence equations (as defined above).

This can be achieved through a translation of the loop body from the imperative form through the functional form. In Appendix A, we include a description of how this can be done, but we also refer the interested reader to [3, 24] for a more complete explanation. We provide an example that captures the essence of this translation, which is the transformation of conditional statements into conditional expressions.

Example 3.6. Consider the long version of the second smallest example from Section 2, where the auxiliary functions min and max are no longer used and the code instead complies with the syntax in Figure 3:

    m = MAX_INT; m2 = MAX_INT;
    for (i = 0; i < |s|; i++) {
      if m > s[i] then
        if m2 > m then m2 = m;
      else
        if m2 > s[i] then m2 = s[i];
      if m > s[i] then m = s[i];
    }

The loop body is then modelled by the recurrence equations below, through the transformation of the conditional statements above into assignment statements that use conditional expressions (if be then e else e′):

    m2 = if m2 < (if m > s[i] then m else s[i]) then m2
         else (if m > s[i] then m else s[i]);
    m  = if m < s[i] then m else s[i];

4. Synthesis of Join

The premise of this section is that the sequential loop under consideration is parallelizable, in the sense that a join operation for divide-and-conquer parallelism exists. The loop body is represented by a system of recurrence equations E in the style of Section 3.3 throughout this section. All formal and algorithmic developments in this paper use E as a model of the loop body, and therefore we use the term "loop body" to refer to E whenever appropriate.

4.1 Parallelizable Loops

We start by formally defining when a loop is parallelizable. We define a function f_E that models E, so that we can link the computation defined by E to homomorphisms. Let E = ⟨s1 = exp1(SVar, IVar), ..., sn = expn(SVar, IVar)⟩ be the system of recurrence equations that represents the body of a loop. Let IVar = {σ, ι, i1, ..., ik}, where σ is the sequence variable, ι is the current index (which we assume to be an integer for simplicity) in the sequence, and i⃗ = ⟨i1, ..., ik⟩ captures the rest of the input variables in the loop body. Let I be the set of all such k-ary vectors.

For a loop body E, define the function f_E = f1 △ f2 △ ... △ fn, where each fi : S × int × I → type(si) is a function such that fi(σ, ι, i⃗) returns the value of si at iteration ι with input values σ and i⃗. It is, by definition, straightforward to see that ⟨f1(σ, ι, i⃗), ..., fn(σ, ι, i⃗)⟩ represents the state of the loop at iteration ι.

Definition 4.1. A loop, with body E, is parallelizable iff f_E is ⊙-homomorphic for some ⊙.

Definition 4.1 basically takes the encoding of the loop as a tupled function, and declares the loop parallelizable if this function is homomorphic. It is important to note that parallelizable here is not used in the broadest sense of the term: it is limited to divide-and-conquer parallelism.
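Concretely, for the loop of Example 3.6 the tupled function f_E can be pictured as the following C sketch (our own illustration; the state struct, step, and f_E names are hypothetical, and we elide the extra input variables i⃗ since the example has none):

    #include <limits.h>
    #define MAX_INT INT_MAX

    typedef struct { int m, m2; } state;   /* one component per state variable */

    /* One step of the system of recurrence equations E from Example 3.6,
     * written with C's conditional expressions. */
    static state step(state st, int a) {
        state nxt;
        nxt.m2 = st.m2 < (st.m > a ? st.m : a) ? st.m2 : (st.m > a ? st.m : a);
        nxt.m  = st.m < a ? st.m : a;
        return nxt;
    }

    /* f_E(s, iota): the tupling f_1 △ f_2, i.e. the state of the loop after
     * processing s[0..iota-1]. */
    static state f_E(const int *s, int iota) {
        state st = { MAX_INT, MAX_INT };    /* initial state */
        for (int i = 0; i < iota; i++)
            st = step(st, s[i]);
        return st;
    }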
4.2 Syntax-Guided Synthesis of Join

We use syntax-guided synthesis (SyGuS) [2] to synthesize the join operator for parallelizable loops. The goal of program synthesis is to automatically synthesize a correct program based on a given specification. In SyGuS, syntactic constraints are used to make the synthesis problem more tractable. In this section, we describe what our specification is, and what syntactic constraints are used. Beyond that, this paper is not concerned with how SyGuS works. One (out of many available) SyGuS solvers is used to solve instances that are fully defined by these two parameters.

First, in this paper we focus on synthesizing binary join operators. The restriction is superficial, in the sense that any other statically fixed number would work, and it is not hard to imagine an easy generalization to a parametric join.

Correctness Specification

The homomorphism definition 3.4 provides us with the correctness specification to synthesize a join operator ⊙ for a loop body E (or rather f_E, to be precise). It is important to note that the SyGuS solver requires a bounded set of possible inputs, and therefore the correctness specification is formulated on symbolic inputs of bounded length K. A join ⊙ is a solution of the synthesis problem if for all sequences x, y of length less than K, we have f_E(x • y) = f_E(x) ⊙ f_E(y).

There is a tension between having a small enough K for the solver to scale, and having a large enough K for the answer to be correct for inputs that are larger than K. We use small enough values for the solver to scale, and take care of general correctness by automatically generating proofs of correctness (see Section 7).

Syntactic Restrictions

The syntactic restrictions are defined through a pair of a sketch and a grammar for expressions. A sketch is a partial program containing holes (i.e. unknown values) to be completed by expressions in the state space defined by the grammar. The sketch is an ordered set of equations E = ⟨s1 = exp1(SVar, IVar), ..., sn = expn(SVar, IVar)⟩. Intuitively speaking, it is produced from the loop body by replacing occurrences of variables and constants by holes. Note that the input to a join is the result produced by two worker threads, which we refer to as the left and right threads. To contain the state space of solutions described by this sketch, we distinguish two different types of holes.

Left holes ??_L are holes that can be completed by an expression over variables from both the left and the right threads. Right holes ??_R will be filled with expressions over variables from the right thread only. We define a compilation function C as

    C(k)              = ??_R
    C(x)              = ??_R  if x ∈ IVar
                        ??_L  if x ∈ SVar
    C(s[e])           = ??_R
    C(op(e1,..,en))   = op(C(e1), ..., C(en))

where e is an expression, op is an operator from the input language, x is a variable, and k is a constant. The sketch for the join code is then

    C(E) = ⟨s1 = C(exp1(SVar, IVar)), ..., sn = C(expn(SVar, IVar))⟩

where each hole in C(expi) (1 ≤ i ≤ n) can be completed by expressions in a predefined grammar.

The grammar for replacing the unknowns is similar to the general expression grammar described in Figure 3, i.e. expressions over a predefined set of operators, a set of variables, and integer/boolean constants. The state space considered is bounded by considering expressions of bounded depth from this set.

Example 4.2. Consider the second smallest example from Section 2. The sketch (for the code in Figure 2) is

    m2 = min(??_L, max(??_L, ??_R));
    m  = min(??_L, ??_R);

It is clear that the join presented in Section 2 can simply be discovered using this sketch when the unknowns are replaced by appropriate single variables.
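As a second illustration (our own rendering of the compilation rules above, not the tool's exact output), applying C to the extended mts loop body from Section 2 yields the sketch

    mts = max(??_L + ??_R, ??_R);
    sum = ??_L + ??_R;

and the join reported in Section 2 is one completion of it: filling the left holes with mts_l and sum_l, and the right holes with sum_r, mts_r, and sum_r respectively, gives mts = max(mts_l + sum_r, mts_r) and sum = sum_l + sum_r.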
4.3 Efficacy of Join Synthesis

SyGuS has its limitations: (i) the syntactic limitations imposed for scalability run the risk of leaving a correct candidate out of the search space, and (ii) most SyGuS solvers use bounded verification to ensure that the candidate satisfies the correctness specification, and therefore the synthesized solution may not be correct in the general case. We discuss how to deal with (ii) in Section 7. To characterize the impact of the former problem, we provide formal conditions under which the approach presented in Section 4.2 is successful in synthesizing a correct join.

Definition 4.3. Function g is a weak-right inverse of function f iff f ◦ g ◦ f = f.

Intuitively, g produces a sequence s′ (out of a result r = f(s)) such that f(s′) = r. In [33], this very notion of a weak-right inverse was used to produce parallel implementations of functions, given that s′ is always a bounded list independent of |s|. Our join synthesis is also guaranteed to succeed in these cases.

Proposition 4.4. If the loop body E is parallelizable and the weak-right inverse of f_E returns a list of bounded length, then there is a ⊙ in the space described by the sketch of E such that f_E is ⊙-homomorphic.

Proposition 4.4 (see Appendix B for a proof sketch) provides us with the guarantee that, under the given conditions, the state space defined for the join does not miss an existing solution. It, however, only defines a subset of the functions for which the join synthesis succeeds. There are many (simple) functions whose weak right inverse is not bounded, yet a join is successfully synthesized. Function length is a simple example, where the weak right inverse always has to be a list of the same length as the original list (which would lead to an inefficient parallel implementation).

In principle, without any syntactic restrictions, any constant-time join can be synthesized when the entire right-hand side of each equation in the join is one unknown. However, synthesis will not scale in this case. In Section 6.1, we extend the scope of efficacy of join synthesis beyond Proposition 4.4, while maintaining the scalability of synthesis.

5. Parallelizability

The mts (maximum tail sum) example from Section 2 demonstrates that a loop is not always parallelizable. Here (and later in Section 6), we discuss how a loop can be extended to become parallelizable.

5.1 Homomorphisms and Parallelism

It has been well known that there is a strong connection between homomorphisms and parallelism [16, 18, 33]; specifically, that homomorphic functions admit efficient parallel computation.

Theorem 5.1 (First Homomorphism Lemma [9]). A function h : S → Sc is a homomorphism if and only if h can be written as a composition of a map and a reduce operation.

The if part of Theorem 5.1 basically states that a homomorphism has an efficient parallel implementation, since efficient parallel implementations for maps and reductions are known. The only if part is equally important, because it indicates that a large class of simple and reasonable parallel implementations of a list function ought to be homomorphisms; this includes every implementation that can fit TBB's generic skeleton.

Theorem 5.1 poses two important research questions: (1) How can one check if a sequential loop is an implementation of a homomorphic function over sequences? (2) If it is not, can we modify the sequential code so that it becomes the implementation of a homomorphic function? The algorithm in Section 6 addresses both questions. Here, we build the formal foundations that provide the context for the algorithm and the specific design choices made in it.

Non-homomorphic Functions. There is a simple observation, which was made in [19] (and probably also in earlier work), that any non-homomorphic function can be made homomorphic in a rather trivial way:

Proposition 5.1. Given a function f : S → Sc, the function f △ ι is ⟐-homomorphic, where

    ι(x) = x    and    (a, x) ⟐ (b, y) = (f(x • y), x • y).

Operation △ denotes tupling of the two functions in the standard sense: (f △ g)(x) = (f(x), g(x)). Basically, ι remembers a copy of the string that is being processed, and the partial computation results f(x) and f(y) are discarded by ⟐, which then computes f(x • y) from scratch. It is clear that a parallel implementation based on this idea would be far less efficient than the sequential implementation of f.
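The trivial extension of Proposition 5.1 is easy to picture in code; the following C sketch (our own illustration, ignoring allocation-failure handling) makes it plain why it defeats the purpose of parallelization: every join node redoes work linear in the size of the chunks it combines.

    #include <stdlib.h>
    #include <string.h>

    typedef int (*seq_fun)(const int *, int);   /* any f : sequence -> scalar */

    /* Extended partial result (f △ ι): f's value plus a copy of the chunk. */
    typedef struct { int fval; int *copy; int len; } triv_res;

    static triv_res triv_leaf(seq_fun f, const int *s, int n) {
        triv_res r = { f(s, n), malloc(n * sizeof(int)), n };
        memcpy(r.copy, s, n * sizeof(int));
        return r;
    }

    /* The join discards fval of both sides, concatenates the remembered
     * chunks, and recomputes f from scratch on the concatenation. */
    static triv_res triv_join(seq_fun f, triv_res l, triv_res r) {
        triv_res res;
        res.len  = l.len + r.len;
        res.copy = malloc(res.len * sizeof(int));
        memcpy(res.copy, l.copy, l.len * sizeof(int));
        memcpy(res.copy + l.len, r.copy, r.len * sizeof(int));
        res.fval = f(res.copy, res.len);
        free(l.copy);
        free(r.copy);
        return res;
    }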
Therefore, not every homomorphic extension is a good choice for parallelism. Next, we identify a subset of homomorphic extensions that are computationally efficient.

5.2 Computationally Efficient Homomorphisms

Let us assume that f : S → D is a single-pass linear-time computable (see Definition 3.3) function and that elements of D are tuples of scalars. The assumption that the function is linear time comes from the fact that we target loops without any loops nested inside, which (by definition) are linear-time computable (in the size of their input sequence). In other words, for any sequence s, f(s) can be computed in time linear in |s|. Note that |s| is the only variable in the complexity argument below, and everything else is considered a constant.

Consider the (rightwards) sequential computation of f that is defined using the binary operator ⊗ as follows:

    f(y • a) = f(y) ⊗ a

where a ∈ Sc, f : S → D, and ⊗ : D × Sc → D. The fact that f is single-pass linear-time computable implies that the time complexity of computing f at every step (of the recursion above) is constant.

Consider an extension of f that is also ⊙-homomorphic (for some ⊙), which we call f′.

Proposition 5.2. The (balanced) divide-and-conquer implementation based on f′ has the same asymptotic complexity as f (i.e. linear time) iff the asymptotic complexity of ⊙ is sub-linear (i.e. o(n)).

Proof. See Appendix C for a simple proof.

The simplest case is when ⊙ is constant time. We call these constant homomorphisms for short. In this paper, (i) we provide a synthesis method that exclusively synthesizes constant-time join operators, and (ii) in the next section we focus on the discovery of constant-size auxiliary variables, namely scalars, which guarantees that the inputs to the join are constant-size and therefore the join is constant-time.

Let us briefly consider the super-constant yet sub-linear case for joins, to clarify why this paper only focuses on constant-time joins. Based on the definition of homomorphism, we know:

    f′([x1, ..., xn]) = f′([x1, ..., xk]) ⊙ f′([xk+1, ..., xn])

For ⊙ to be super-constant in n, f′([x1, ..., xk]) (respectively, f′([xk+1, ..., xn])) has to produce a super-constant output that is then processed by ⊙; constant-size inputs cannot yield super-constant time executions. Since the output of f was assumed to be constant-sized (scalars are by definition constant sized), the output of f′ is super-constant only due to the auxiliary information that is required to have a homomorphism but is not required for the sequential computation of f. We believe the automatic discovery of auxiliary information of this nature to be a fundamentally hard problem. The sub-linear auxiliary information is often the result of a clever algorithmic trick that improves over the trivial linear auxiliary (i.e. remembering the entire sequence, as discussed in Proposition 5.1). The field of efficient data streaming algorithms [1, 4] includes examples of such clever algorithms. In conclusion, despite the fact that the join synthesis can be adapted to handle such examples, since we consider the auxiliary information synthesis in this case to be always necessary and extremely difficult to do automatically, we do not target this class for automation in this paper.

6. Synthesizing Parallelizable Code

Let us assume that the synthesis of the join from Section 4.2 has failed. Then we know that either (1) the function is not a homomorphism, or (2) the function is a homomorphism, but the syntactic restrictions for the synthesis of the join were too restrictive to include the correct operator. We deal with case (1) here, and with case (2) in Section 6.1.

The Algorithm

In Section 2, we provided a step-by-step explanation of how considering the unfoldings of the computation of mts leads to the discovery of the auxiliary accumulator sum. Here, we present the algorithm that discovers these accumulators by unfolding the loop body starting from an arbitrary state. This arbitrary state symbolically represents the point at which a sequential computation is meant to be divided into two parallel computations, and the algorithm intuitively follows in the footsteps of the second computation.

    Algorithm 1: Homomorphic Extension Computation
      Data: A set of recurrence equations E
      Data: A constant cost bound C
      Result: A set of recurrence equations E′

      Function Extend(E)
        E′ ← ∅
        foreach s_i = exp_i in E (in order) do
          if Solve(s_i = exp_i) = null then
            report failure
          else
            E′ ← E′ ∪ Solve(s_i = exp_i)
        return E′

      Function Solve(s = exp(SVar, IVar))
        Initially k = 1, Aux = ∅, σ0 = ⟨s1^0, ..., sn^0⟩
        while Aux ≠ OldAux do
          OldAux ← Aux
          τ ← unfold(σ0, s, E, k)
          τ ← Normalize(τ, E, C)
          if τ = null then
            return null
          ℰ ← collect(τ, SVar)
          foreach e in ℰ do
            if e is already covered by something in Aux then
              add the accumulator and continue
            else
              create a new variable in Aux, with expression e
          k ← k + 1
        Aux ← remove-redundancies(Aux)
        return Aux

      Function Normalize(exp1, exp2, E, C)
        Output: An expression e = exp1 with a subexpression exp2
                and Cost_SVar(e) ≤ C
        "Intentionally left with no implementation."
Algorithm 1 illustrates our main algorithm. The algorithm starts from the first equation in the loop body E, and examines each recurrence equation in order (by calling Solve()).

In function Solve(), we start from an arbitrary state ⟨s1^0, ..., sn^0⟩. The while loop iteratively unfolds the expression in the style of equations (1)-(2) in Section 2; unfold simply computes the k-th unfolding. normalize is the key step in this algorithm. It takes the unfolded expression and returns an equivalent expression of the form exp(e1, ..., em, e), where e only refers to the input variables and each ei is an expression that includes one occurrence of a state variable from SVar at depth 1. The intuition is that the "remainder" of each ei (besides the one occurrence of the state variable) needs to be captured by an auxiliary accumulator. For example, in the mts example from Section 2, in both unfoldings we had m = 1 and e1 = mts(...) + sum(...). The simplification steps that we performed for mts_{i+1} and mts_{i+2} are exactly the steps that are performed by normalize.

collect gathers these ei's in ℰ. For each ei, we check whether it can already be accounted for using the existing auxiliary accumulators in Aux. Otherwise, a new auxiliary variable and accumulator is recorded.
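For concreteness (our own rendering), the second mts unfolding from Section 2 already has the normal form that normalize aims for and that collect picks apart:

    mts_{i+2} = max( mts_i + (s[i+1] + s[i+2]) , mts(s[i+1..i+2]) )

Here m = 1: the single subexpression e1 = mts_i + (s[i+1] + s[i+2]) contains one occurrence of the state variable mts at depth 1, its remainder s[i+1] + s[i+2] is exactly what the auxiliary accumulator sum makes available, and e = mts(s[i+1..i+2]) refers only to input values.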
A constant collect gathers these e ’s in . For each e , we check i E i homomorphicextensionincludesconstantlymanyconstant- if it can already be accounted for using existing auxiliary size auxiliary variables (see Section 5.2). Remember that accumulators in Aux. Otherwise, a new auxiliary variable extendingthelooptoahomomorphismwithasinglelinear- andaccumulatorisrecorded. sizeauxiliaryvariable(Proposition5.1)isalwayspossible. normalize relies on a cost function to ensure that all instancesofthestatevariablesappearatadepthlowerthan Theorem6.1. Iff isleftwardsandrightwardslineartime, apredeterminedconstant(C)inthenormalform. thenthereexistsabinaryoperator⊙anda⊙-homomorphic extensionoff thatisaconstanthomomorphism. Definition 6.1. The cost function Cost ∶ exp → Int × V Int assigns to any expression e, a pair (d,n) where d is Proof. The proof of this theorem is rather involved and is Maxv∈Vdepe(v) where depe(v) is the depth of the maxi- includedinAppendixD. mum occurrence of v in e, and n is ∑ occ (v), where v∈V e occ (v)isthenumberofoccurrencesvine. Theorem6.1deferstheexistenceofaconstanthomomor- e phic extension to the fact that function f can be computed normalize can always succeeds when the appropriate in single-pass linear-time both leftwards and rightwards on auxiliaryaccumulatorexist,aslongasallthealgebraicrules a sequence. This may not always be true. Even if a func- thatareneededforsimplificationareatitsdisposal.Infact, tionissing-passlinear-timecomputableinonedirection,it inthealgorithmpresentedinFigure1,weassumethatthis maybemoreexpensivetocomputeintheotherdirection.In idealizedversionofnormalize.Thisiswhynoparticularim- [38],theproblemoflanguagerecognitionofastringthatis plementationispresented.Thisimpliesthatifnormalizere- divided between two agents is discussed, and a complexity turnsnull thennogoodexpressionexists.Theexistenceof argument is provided that a restriction to a single-pass (in an efficient implementation (to replace this idealized ver- a certain direction) could increase the time complexity ex- sion) depends on the state space of the equivalent expres- ponentially.Here,weprovideasimpleexampletopassthe sions. intuitionontothereader. The process of normalization can be formalized by a rewritesystemRandasetofequationsAsuchthatRiscon- Remark 6.3. Let hn ∶ {0,1}∗ → bool be defined as vergentmoduloA.GivenasetofalgebraicrulesRandaset hn(w) = trueiffw[n] = 1(i.e.then-thletter).hisright- ofequationsA,wedefinethecostminimizingnormalization wardsbutnotleftwardslineartimecomputableinn. procedureasthesuccessiveapplicationofrewriterulesinR TheargumentforthisclaimisincludedinAppendixG.Fora and substitutions using equations in A such that we reduce functionlikeh,theredoesnotexistaconstanthomomorphic anexpressiontoanequivalentexpressionwithminimalcost. extension. Inourimplementation,weusesimpleheuristicfornormal- Proposition6.4. Ifafunctionhisnotleftwards(resp.right- izationwithalgebraicruleslikeassociativity,commutativity, wards) linear-time computable, then there exists no homo- anddistributivitywitheachruleappliedonlyinthedirection morphicextensionofhthatislinear-timecomputable. that it would reduce the cost of the expression, which we will evaluate in Section 8. However, many other heuristics This is a simple consequence of the Third Homomor- fornormalizationcanbeimagined[28,29,35]. phism Theorem [17]. A homomorphism naturally induces The following Proposition states the correctness of Al- a leftwards and a rightwards function. Existence of a con- gorithm 1. 
normalize can always succeed when the appropriate auxiliary accumulators exist, as long as all the algebraic rules that are needed for simplification are at its disposal. In fact, in Algorithm 1 we assume this idealized version of normalize; this is why no particular implementation is presented. This implies that if normalize returns null then no good expression exists. The existence of an efficient implementation (to replace this idealized version) depends on the state space of the equivalent expressions.

The process of normalization can be formalized by a rewrite system R and a set of equations A such that R is convergent modulo A. Given a set of algebraic rules R and a set of equations A, we define the cost-minimizing normalization procedure as the successive application of rewrite rules in R and substitutions using equations in A such that we reduce an expression to an equivalent expression with minimal cost. In our implementation, we use a simple heuristic for normalization with algebraic rules like associativity, commutativity, and distributivity, with each rule applied only in the direction that reduces the cost of the expression; we evaluate this heuristic in Section 8. However, many other heuristics for normalization can be imagined [28, 29, 35].

The following proposition states the correctness of Algorithm 1. Note that in all the formal statements about Algorithm 1, we assume the idealized version of normalize as discussed.

Proposition 6.2. If Algorithm 1 terminates successfully and reports no new auxiliary variables, then the loop body corresponds to a ⊙-homomorphic function (for some ⊙). If it terminates successfully and discovers new auxiliary accumulators, then the loop body extended with them corresponds to a ⊙-homomorphic function (for some ⊙).

6.1 Completeness

Algorithm 1 is correct when it terminates successfully. However, it can terminate reporting a failure. Here, we formally discuss when Algorithm 1 succeeds, and when it fails.

Existence of Constant Homomorphic Extensions

First, independently of any algorithm, we deliberate on the question of existence of an efficient solution. A constant homomorphic extension includes constantly many constant-size auxiliary variables (see Section 5.2). Remember that extending the loop to a homomorphism with a single linear-size auxiliary variable (Proposition 5.1) is always possible.

Theorem 6.1. If f is leftwards and rightwards linear time, then there exists a binary operator ⊙ and a ⊙-homomorphic extension of f that is a constant homomorphism.

Proof. The proof of this theorem is rather involved and is included in Appendix D.

Theorem 6.1 defers the existence of a constant homomorphic extension to the fact that the function f can be computed in single-pass linear time both leftwards and rightwards on a sequence. This may not always be true. Even if a function is single-pass linear-time computable in one direction, it may be more expensive to compute in the other direction. In [38], the problem of language recognition for a string that is divided between two agents is discussed, and a complexity argument is provided that a restriction to a single pass (in a certain direction) could increase the time complexity exponentially. Here, we provide a simple example to pass the intuition on to the reader.

Remark 6.3. Let hn : {0,1}* → bool be defined as hn(w) = true iff w[n] = 1 (i.e. the n-th letter is 1). hn is rightwards but not leftwards linear-time computable in n.

The argument for this claim is included in Appendix G. For a function like hn, there does not exist a constant homomorphic extension.

Proposition 6.4. If a function h is not leftwards (resp. rightwards) linear-time computable, then there exists no homomorphic extension of h that is linear-time computable.

This is a simple consequence of the Third Homomorphism Theorem [17]. A homomorphism naturally induces a leftwards and a rightwards function. Existence of a constant homomorphic extension would imply that the function should be single-pass linear-time computable leftwards and rightwards, which is a contradiction.

Corollary 6.5. There exists a linear-time loop for which there does not exist a linear-time homomorphic extension.

This corollary is significant, because it implies that not every simple sequential loop (i.e. one that models a single-pass linear-time computable function) necessarily has a constant homomorphic extension.

Completeness of the Algorithm

Since not every single-pass linearly computable function can be extended into a constant homomorphism, we can conclude that there exists no algorithm that is complete for the entire class of these functions. It remains to determine, for those that can be extended to constant homomorphisms (see Theorem 6.1), how "complete" Algorithm 1 is.

Theorem 6.2. For a function f, if there exists a constant ⊙-homomorphic extension f △ g, then there exists a ⊛-homomorphic extension where

    (f(x), g(x)) ⊛ (f(y), g(y)) =
        (exp(f(x), f(y), g(y)), exp′(f(x), g(x), f(y), g(y)))

that is, the value of the f(x • y) component of the join does not depend on g(x).

Proof. The proof is included in Appendix E.

Theorem 6.2 is very significant. It is the key to the completeness argument for Algorithm 1. The way the algorithm operates is that it tries to discover exp(f(x), f(y), g(y)) through the unfoldings of f(x • y), and it does not have access to g(x). To make any claims about the completeness of Algorithm 1, one has to argue that it is sufficient to look at f(x), f(y), and g(y) (and not g(x)).

Theorem 6.3 (Completeness). If there exists a constant homomorphic extension of the loop body E, then there is a finite set of algebraic rules R, and a run of Algorithm 1 in which the algorithm succeeds in discovering this extension.

The proof of Theorem 6.3 appears in Appendix F. Note that the existence of a constant homomorphic extension was formulated in Theorem 6.1. As stated by Theorem 6.3, the completeness result holds under the assumption of the availability of a sufficient set of algebraic rules, and for the nondeterministic algorithm that may theoretically explore all equivalent expressions. Any implementation will be determinized using a particular efficient heuristic for this exploration. In Section 8, we evaluate our heuristic using a set of benchmarks.

Finally, there is an important observation: Algorithm 1 produces information that can be helpful to join synthesis. Imagine that join synthesis fails while a constant-time join exists. When Algorithm 1 terminates successfully and declares that no new auxiliary accumulators were discovered, then it becomes clear that the problem lies with the join, i.e. the state space of the synthesis devised for the join is too restricted. In this instance, the normalized expression used for the discovery of the accumulators contains hints about the shape of the join operator, and with these new hints, join synthesis can be re-instantiated to succeed.
7. Correctness of the Synthesized Programs

As noted in Section 4, the synthesized join operators are not guaranteed to be correct for all input sequences. We use the Dafny [26] program verifier to prove the correctness of the join operator, and therefore of the synthesized parallel code, on all inputs. Dafny is an interactive program verifier, in which users have to provide the necessary invariants (or hints) for the proofs. Automated parallelization is meant to reduce the programmer's burden in correctly devising parallel programs. It is therefore unreasonable to assume that the programmer would invest the extra effort of guiding a program verifier to establish a proof. It is essential that the proofs of correctness are also automatically generated with the code. That is exactly what we do.

The proof of correctness of a join operator for a function (over a sequence) is an induction argument over the length of the sequence. The generality of this induction, in the sense that it is independent of the specific function, enables us to devise a generic proof template that we can instantiate for each function and its corresponding join operator.

We use an example to illustrate the generic induction template. Let us assume that we want to verify the correctness of the join operator synthesized to parallelize the mts loop body. As mentioned in Section 3.3, as part of the task of synthesizing the join, we obtain a functional variant of the loop body. We use this functional variant to model the loop body as a collection of functions in Dafny. The general rule is that every state variable in the loop body is modelled by a different function. Figures 4(a) and 4(b) illustrate the functions that model the state variables sum and mts from Section 2, respectively. The auxiliary function Max also needs to be separately defined in Dafny, and appears in Figure 4(c).

The next step is to model the synthesized join. Each state variable is modelled by a distinct join function. Therefore, in this case, we introduce a function to model the join for the Sum function, as illustrated in Figure 4(d), and another one for the Mts function, as illustrated in Figure 4(e).

Once all components of the divide-and-conquer program are modelled, we are ready to state the correctness of the join as two Dafny lemmas, one for each join operator. Let us start with Sum. The lemma in Figure 4(f) states the correctness of the join for Sum. This lemma basically states that Sum is SumJoin-homomorphic in the sense of Definition 3.4, through its ensures clause. Dafny cannot prove this lemma automatically. It can prove the lemma, however, if it is given a hint about how to formulate an induction argument: one needs to specify what the base case is, and how to break the argument to prove the induction step.

    (a) function Sum(s: seq<int>): int
        { if s == [] then 0 else Sum(s[..|s|-1]) + s[|s|-1] }

    (b) function Mts(s: seq<int>): int
        { if s == [] then 0 else Max(Mts(s[..|s|-1]) + s[|s|-1], 0) }

    (c) function Max(x: int, y: int): int
        { if x < y then y else x }

    (d) function SumJoin(sum_l: int, sum_r: int): int
        { sum_l + sum_r }

    (e) function MtsJoin(mts_l: int, sum_l: int, mts_r: int, sum_r: int): int
        { Max(mts_r, mts_l + sum_r) }

    (f) lemma HomomorphismSum(s: seq<int>, t: seq<int>)
          ensures Sum(s + t) == SumJoin(Sum(s), Sum(t))
        {
          if t == [] { assert(s + t == s); }
          else {
            calc {
              Sum(s + t);
              == {assert (s + t[..|t|-1]) + [t[|t|-1]] == s + t;}
              SumJoin(Sum(s), Sum(t));
            }
          }
        }

    (g) lemma HomomorphismMts(s: seq<int>, t: seq<int>)
          ensures Mts(s + t) == MtsJoin(Mts(s), Sum(s), Mts(t), Sum(t))
        {
          if t == [] { assert(s + [] == s); }
          else {
            calc {
              Mts(s + t);
              == {HomomorphismSum(s, t[..|t|-1]);
                  assert (s + t[..|t|-1]) + [t[|t|-1]] == s + t;}
              MtsJoin(Mts(s), Sum(s), Mts(t), Sum(t));
            }
          }
        }

    Figure 4: The encodings of sum (a), mts (b), and max (c); the encodings of
    the join in two parts, for sum (d) and mts (e); and the proofs that the
    joins form a homomorphism, in two parts, for sum (f) and mts (g). s + t
    denotes the concatenation of sequences s and t in Dafny, [] is the empty
    sequence, and |s| stands for the length of the sequence s. The lemmas are
    proved by induction through the separation of the base case t == [] and
    the induction step. The calc environment provides the hints necessary for
    the induction step proof.
There are several ways, for a human, to formulate the proof of correctness of the HomomorphismSum lemma in Dafny. The unique aspect of the illustrated proof (among many others) is that it does not employ any hints that are specific to the function Sum. The following hints are provided in Figure 4(f):

• The proof is an induction argument on the length of t (the second sequence), with the base case of an empty sequence (reflected in the if statement condition).

• In the base case, Dafny should be aware of the fact that an empty t basically simplifies the reasoning about s + t to reasoning about s.

• In the induction step (i.e. the else part), the guide required to prove the two sides equivalent is the hint that the last element of t should be peeled off, and the induction hypothesis needs to be recalled on s + t[..|t|-1] (i.e. the concatenation of s and t minus its last element).

These hints are generic enough to be applicable to any other function. Let us illustrate that by presenting the proof of the correctness of MtsJoin. Note that Mts calls Sum in its definition, and therefore the proof of correctness of the join operator for Mts has to call on the proof of the correctness of the join for Sum (namely, the HomomorphismSum lemma). Figure 4(g) illustrates this proof. The proof is identical to 4(f), with the minor difference of recalling the already proved lemma of 4(f).

The general idea of generating Dafny proofs follows the idea presented using the Mts example above. The general template of the proof is the same as the one for the HomomorphismSum or HomomorphismMts lemmas. However, the proof for each state variable v in addition recalls (as hints) the proofs (of smaller induction instances) of all state variables u that are mentioned on the right-hand side of the assignment to v in the code of the synthesized join. The proof for each instance can be discharged based on E (the model of the loop body) and the synthesized join operator.

It is important to be clear about the limitations of automated proofs. The scalar data types used in these implementations must belong to the Dafny language. Automation in provers, in general, is limited by the existence of decision procedures for a subset of all the data types (used in common programming languages) and the operations on them. These inherent limitations in the lack of decision procedures carry over to limitations on the reach of our automated proof generator.

8. Experiments

Our parallelization technique is implemented in a prototype tool called PARSYNT, for which we report experimental results in this section.

8.1 Implementation

We use CIL [36] to parse C programs, do basic program analysis, and locate inner loops. The loop bodies are then converted into functional form, from which a sketch and the correctness specification are generated. ROSETTE [47] is used as our backend solver, with a custom grammar for synthesizable expressions. In addition to narrowing the search space by using left and right holes (unknowns) as discussed in Section 4, we use type information to reduce the state space of the search for the solver. We also bound the size of the synthesizable expressions, especially when the loop body contains non-linear operators, which are difficult for solvers to handle. To sidestep this problem, we produce a more constrained join sketch with a smaller search space when non-linear operators are involved.

8.2 Evaluation

Benchmarks. We collected a set of benchmarks to evaluate the effectiveness of our approach. Table 1 includes a complete list. The benchmarks are all in C and form a ...
