Yet Another Way of Building Exact Polyhedral Model for Weakly Dynamic Affine Programs Arkady V. Klimov InstituteofDesignProblemsinMicroelectronics RussianAcademyofSciences Moscow,Russia [email protected] ABSTRACT We define (see Section 2) the PM as a mapping that as- 5 signs to each read (load) instance in the computation its Exact polyhedral model (PM) can be built in the general 1 uniquewrite(store)instancethathaswrittenthevaluebe- case if the only control structures are do-loops and struc- 0 ingread. Inotherwords,itisacollectionofsourcefunctions, tured ifs, and if loop counter bounds, array subscripts and 2 each bound to a single read operation in a program. Such if-conditions are affine expressions of enclosing loop coun- n ters and possibly some integer constants. In more general functiontakesiterationvectorofthereadinstanceandpro- a dynamic control programs, where arbitrary ifs and whiles duces the name and the iteration vector of a write instance J orsymbol⊥indicatingthatsuchwritedonotexistandthe are allowed, in the general case the usual dataflow analysis 5 can be only fuzzy. This is not a problem when PM is used original state of memory is read. 1 just for guiding the parallelizing transformations, but is in- However when the source program contains also one or sufficientfortransformingsourceprogramstoothercompu- severalif-statementswithnon-affine(e.g.,datadependent) ] tation models (CM) relying on the PM, such as our version conditions the known methods suggest only approximation L which, generally, provides a set of possible writes for some of dataflow CM or the well-known KPN. P reads. It is usually referred to as Fuzzy Array Dataflow Thepaperpresentsanovelwayofbuildingtheexactpoly- . Analysis (FADA) [1]. 
In some specific cases such model s hedral model and an extension of the concept of the exact c PM, which allowed us to add in a natural way all the pro- may provide a source function that uses as its input also [ values of predicates associated with non-affine conditionals cessingrelatedtothedatadependentconditions. Currently, 1 in our system, only arbirary ifs (not whiles) are allowed in order to produce the unique source. These cases seem to be those in which the number of such predicate values is v in input programs. The resulting polyhedral model can be finite (uniformly bounded). 9 easily put out as an equivalent program with the dataflow Usually, this does not makes a problem as parallelization 3 computation semantics. can proceed relying on the approximate PM. But our aim 8 is to convert the source program (part) completely into the 3 Keywords dataflowcomputationmodel,andanyapproximationisun- 0 acceptable for us. So, our task was to extend the class of . exact polyhedral model, weakly dynamic affine programs, 1 programsforwhichexactPMcanbebuiltbyprogramswith data dependent conditionals, dataflow computation model 0 non-affine conditionals. Our model representation language 5 is extended with predicate symbols corresponding to non- 1 1. INTRODUCTION affine Boolean expressions in the source code. From such a : v The exact Polyhedral Model (PM), a.k.a. Exact Array model the exact source function for each read can be easily i DataflowAnalysis(EADA),canbebuildforalimitedclass extracted. Thesefunctionsare,generally,recursiveandthey X of programs. Normally, it embraces affine loop nests with dependonusualaffineparametersaswellasonanunlimited r assignments in between, in which loop bounds and array number of predicate values. a element indexes are affine expressions of surrounding loop But the source function is not our aim. For building variables and fixed structure parameters (array sizes etc.). dataflow program we need the inversed, use set function. 
If-statementswithaffineconditionsarealsoallowed. Meth- Fromtheparallelizationperspectivethisprogramcarriesim- ods of EADA are well developed [2, 3, 11] for this class of plicitly the maximum amount of parallelism of the source programs. The results are usually used for guiding paral- program. A simple computation strategy (see Section 7.3) lelizing transformations. exhibitsallthisparallelism. Moredetailsandreferencescan be found in [8, 9]. Inthispaperwedescribebrieflyouroriginalwayofbuild- ingthedataflowmodelforaffineprogramsandthenexpand ittoprogramswithnon-affineconditionals. Theaffineclass Permissiontomakedigitalorhardcopiesofallorpartofthisworkfor andtheaffinesolutiontreearedefinedinSection2. Sections personalorclassroomuseisgrantedwithoutfeeprovidedthatcopiesare 3–5 describe our algorithm of building the PM. Several ex- notmadeordistributedforprofitorcommercialadvantageandthatcopies amples are presented in Section 6. Section 7 describes two bearthisnoticeandthefullcitationonthefirstpage.Tocopyotherwise,to differentsemanticsofthePMconsideredasaprogram. Sec- republish,topostonserversortoredistributetolists,requirespriorspecific tion8comparesourapproachandresultswithrelatedones. permissionand/orafee. Copyright20XXACMX-XXXXX-XX-X/XX/XX...$15.00. 2. SOMEFORMALISM we need the reversed map, that, for each write node, finds Consider a Fortran program fragment P. We are inter- all read nodes which read the very value written. So, we ested in memory reads and writes which have the form of need the multi-valued mapping array element or scalar variable. The latter will be treated G :(w,I )(cid:55)→{(r,I )} (2) below as 0-dimension arrays. P w r We define a computation graph by running the program which for each write node (w,I ) yields a set of all read w P with some input data. The graph consists of two kinds nodes {(r,I )} that read that very value written. 
We call r of nodes: reads and writes, corresponding respectively to this form of computation graph a use graph, or U-graph. individual executions of load or store memory operation. AsubgraphofS-graph(U-graph)associatedwithagiven There is a link from a write w to a read r if r reads the readr (writew)willbecalledr-component(w-component). value written by w. Thus, r uses the same memory cell as For each program statement (or point) s we define the w and w is the last write to this cell before r. domain Dom(s) as a set of values of iteration vector I , s Ourpurposeistoobtainacompactparametricdescription such that (s,I ) occurs in the computation. The follow- s of all computation graphs for a given program P. To make ingpropositionsummarizesthewell-establishedpropertyof suchdescriptionfeasibleweneedtoconsideralimitedclass static control programs [2, 3, 5, 10, 11] (which is also justi- of programs. It is a well known affine class, which can be fied by our algorithm). formally defined by the set of rules shown in Fig.1. Proposition 1. Foranystatement(s,I)inastaticcon- s Λ (empty statement) trol program P its domain Dom(s) can be represented as (cid:83) A(i1,...,ik)=e (assignment, k≥0) finite disjoint union iDi, such that each subdomain Di X ; X (sequence) canbespecifiedasaconjunctionofaffineconditionsofvari- 1 2 if c then X1; else X2; endif (conditional) ables Is and structure parameters, and, when the statement do v =e1,e2; X; enddo (do-loop) is a read (r,Ir), there exist such Di that the mapping FP on each subdomain D can be represented as either ⊥ or i (w,(e ,...,e ))forsomewritew,whereeache isanaffine 1 m i Figure 1: Affine program constructors expression of variables I and structure parameters. r Therighthandsideeofanassignmentmaycontainarray Thisresultsuggeststheideatorepresenteachr-component elementaccessA(i1,...,ik),k≥0. 
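The definitions of the source graph and of its inverse, the use graph, can be illustrated by brute force on the Sum program of Fig. 8a (S = 0.0, followed by a loop over i = 1..n executing S = S + X(i)). The sketch below is only an illustration under assumptions: it replays the program for a concrete n and records the last writer of every cell, whereas the paper builds the graphs parametrically. The node names `S2.s`, `S2.x` and the helper names `trace_sum` and `invert` are hypothetical.

```python
def trace_sum(n):
    """Replay Sum for a concrete n; return the S-graph as a dict
    read instance -> write instance (None stands for the symbol ⊥)."""
    last_write = {}   # memory cell -> write instance that stored to it last
    source = {}       # read instance -> its source write instance

    def read(node, cell):
        source[node] = last_write.get(cell)

    def write(node, cell):
        last_write[cell] = node

    write(("S1", ()), "S")                 # S = 0.0
    for i in range(1, n + 1):
        read(("S2.s", (i,)), "S")          # read of S in S = S + X(i)
        read(("S2.x", (i,)), ("X", i))     # read of X(i), never written in Sum
        write(("S2", (i,)), "S")           # the assignment itself
    read(("Sout", ()), "S")                # final value of S leaves the fragment
    return source


def invert(source):
    """Turn the S-graph into the U-graph: write instance -> set of readers."""
    uses = {}
    for reader, writer in source.items():
        if writer is not None:
            uses.setdefault(writer, set()).add(reader)
    return uses


s_graph = trace_sum(3)
u_graph = invert(s_graph)
print(s_graph[("S2.s", (1,))])   # source of S in the first iteration
print(u_graph[("S2", (3,))])     # readers of the value written in iteration 3
```

For every concrete n this enumerates one computation graph; the polyhedral model replaces the enumeration by solution trees that are valid for all n.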
Allindexexpressionsas of FP as a solution tree with affine conditions at branching well as bounds e1 and e2 of do-loops must be affine in sur- verticesandtermsoftheformW{e1,...,em}or⊥atleaves. rounding loop variables and structure parameters. Affine A similar concept of quasi-affine solution tree, quast, was expressions are those built from variables and integer con- suggested by P. Feautrier [3]. stants with addition, subtraction and multiplication by lit- Asingle-valuedsolutiontree(S-tree)isastructureusedto eral integer. Also, in an affine expression, we allow whole representr-componentsofaS-graph. Itssyntaxisshownin division by literal integer. Condition c also must be affine, Fig.2. It uses just linear expressions (L-expr) in conditions i.e., equivalent to e=0 or e>0 where e is affine. andtermarguments,soaspecialvertextypewasintroduced Programs (or program parts) following these limitations in order to implement integer division. havebeencalledstaticcontrolprograms(SCoP)[1,5]. Their computation graph depends only on structure parameters S-tree ::=⊥ | term and does not depend on dynamic data values. Below, we are to remove the restriction that conditional | (cond→S-treet:S-treef) (branching) | (L-expr=:numvar+var→S-tree) (integerdivision) expression c must be affine. Such extended program class term ::=name{L-expr ,...,L-expr } (k≥0) 1 k hasbeencalledweaklydynamicprograms(WDP)[12]. Here var ::=name we shall allow only arbitrary ifs but not whiles which will num ::=···|−2|−1|0|1|2|3|... be considered in the future. cond ::=L-cond|predicate (anycondition) L-cond::=L-expr=0|L-expr>0 (affinecondition) A point in the computation trace of an affine program L-expr ::=num|numvar+L-expr (affineexpression) may be identified as (s,I ), where s is a point in the pro- s atom ::=⊥|name{num1,...,numk} (groundterm,k≥0) gram and I is the iteration vector, i.e., a vector of integer s values of all enclosing loop variables of point s. 
The list of thesevariableswillbedenotedasIs,whichallowstodepict Figure 2: Syntax for single-valued solution tree the point s itself as (s,I ). (Here and below boldface sym- s bolsdenotevariablesorlistofvariablesassyntacticobjects, GivenconcreteintegervaluesofallfreevariablesoftheS- while normal italic symbols denote some values as usual). treeonecanevaluatethetreetoanatom. Thetwofollowing Thus, denoting an arbitrary read or write instance as evaluation rules must be applied iteratively. (r,Ir) or (w,Iw) respectively, we represent the whole com- A branching like (c → T1 : T2) evaluates to T1 if condi- putation graph as a mapping: tional expression c evaluates to true, otherwise to T . Non- 2 affine conditions are expressed by a predicate. This will be F :(r,I )(cid:55)→(w,I ) (1) P r w explained below in Section 3.3. which,foranyreadnode(r,I ),yieldsthewritenode(w,I ) Adivision(e=:mq+r→T)introducestwonewvariables r w thathaswrittenthevaluebeingread,oryields⊥ifnosuch (q,r)thattakerespectivelythequotientandtheremainder writeexistandthustheoriginalcontentsofthecellisread. of integer division of integer value of e by positive constant This form of graph is called a source graph, or S-graph. integer m. The tree evaluates as T with parameter list ex- However, for translation to dataflow computation model tended with values of these two new variables. It follows from Proposition 1 that for an affine program with affine condition the effect is built just by putting the P the r-component of the S-graph F for each read (r,I ) effects of branches into a new conditional node. P r can be represented in the form of S-tree T depending on variables Ir and structure parameters. EA[Λ]=⊥ (emptystatement) However the concept of S-tree is not sufficient for repre- EA[X1;X2]=Xeq(EA[X1],EA[X2]) (sequence) senting w-components of U-graph, because those must be EA[LA: A(e1,...,ek)=e]= (assignmentstoA) multi-valued functions in general. 
So, we extend the defini- (wqh1e=reeI1is→a.l.is.t(qokf=alleoku→terLlAoo{pI}va:r⊥ia)b·l·e·s:⊥) tion of S-tree to the definition of multi-valued tree, M-tree, EA[LB: B(...)=e]=⊥ (otherassignments) by two auxiliary rules shown on Fig 3. EA[if c then X1 else X2 endif]= (conditional) =(c→EA[X1]:EA[X2]) M-tree::=...thesameasforS-tree... EA[do v=e1,e2; X; enddo]= (do-loop) |(&M-tree1...M-treen) (finiteunion,n≥2) =Fold(v,e1,e2,EA[X]) |(@var→M-tree) (infiniteunion) Figure 4: The rules for computing effect tree over Figure 3: Syntax for multi-valued tree k-dimensional array A The semantics also changes. The result of evaluating M- The implementation of function Seq is straight. To com- tree is a set of atoms. Symbol ⊥ now represents the empty puteSeq(T ,T )wesimplyreplaceall⊥-sinT withacopy 1 2 2 set, and the term N{...} represents a singleton. ofT . ThentheresultissimplifiedbyafunctionPrunewhich 1 Toevaluate(&T1,...,Tn)onemustevaluatesub-treesTi prunesunreachablebranchesbycheckingtheconsistencyof and take the union of all results. The result of evaluating affineconditionsets(thecheckisknownasOmega-test[11]). (@v→T) is mathematically defined as the union of infinite TheoperationFold(v,e ,e ,T),wherevisavariableand 1 2 number of results of evaluating T with each integer value v e ande areaffineexpressions,producesS-treeT(cid:48)thatdoes 1 2 of variable v. In practice the result of evaluating T is non- not contain v and represents the following function. Given emptyonlywithinsomeboundintervalofvaluesv. Inboth values of all other parameters, T(cid:48) evaluates to cases the united subsets are supposed to be disjoint. BelowwepresentouralgorithmofbuildingaS-graph(Sec- max{v∈[e1,e2]|T(v) evaluates to a term } (3) tions 3 and 4) and then a U-graph (Section 5). MakingthisT(cid:48)usuallyinvolvessolvingsome1-Dparametric integer programming problems and combining the results. 3. 
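The operations Seq and Fold can be made concrete with a reference sketch. It is an assumption-laden illustration rather than the implementation described in the paper: conditions and term arguments are plain Python callables on an environment, Prune is omitted, and Fold is replaced by a brute-force search that returns the atom obtained at the largest admissible value of the loop variable, which is one reading of formula (3). The real Fold produces a tree by parametric integer programming.

```python
BOT = None  # the leaf ⊥


def seq(t1, t2):
    """Seq(T1, T2): copy T2, replacing every ⊥ leaf with T1 (no pruning)."""
    if t2 is BOT:
        return t1
    if t2[0] == "if":
        _, cond, a, b = t2
        return ("if", cond, seq(t1, a), seq(t1, b))
    return t2  # a term written later hides everything written before


def evaluate(tree, env):
    """Evaluate a tree whose conditions and arguments are callables on env."""
    while tree is not BOT and tree[0] == "if":
        _, cond, a, b = tree
        tree = a if cond(env) else b
    if tree is BOT:
        return BOT
    _, name, args = tree
    return (name, tuple(f(env) for f in args))


def fold_bruteforce(var, lo, hi, tree, env):
    """Atom produced at the largest v in [lo, hi] where tree is not ⊥."""
    for v in range(hi, lo - 1, -1):
        atom = evaluate(tree, {**env, var: v})
        if atom is not BOT:
            return atom
    return BOT


# Sum program: S1 is "S = 0.0", S2{i} is "S = S + X(i)" inside do i = 1, n.
s1 = ("term", "S1", [])
body_effect = ("term", "S2", [lambda e: e["i"]])
# Hand-written effect of the whole loop over S: (n >= 1 -> S2{n} : ⊥).
loop_effect = ("if", lambda e: e["n"] >= 1,
               ("term", "S2", [lambda e: e["n"]]), BOT)
state_after_loop = seq(s1, loop_effect)

print(fold_bruteforce("i", 1, 3, body_effect, {"n": 3}))
print(evaluate(state_after_loop, {"n": 0}))
```

The state after the loop agrees with the source of Sout drawn in Fig. 8b: S1 when n = 0 and S2{n} when n > 0.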
BUILDINGSTATEMENTEFFECT 3.2 GraphNodeStructure 3.1 StatementEffectanditsEvaluation In parallel with building the effect of each statement we Consider a program statement X, which is a part of an also compose a graph skeleton, which is a set of nodes with affine program P, and some k-dimensional array A. Let placeholdersforfuturelinks. Foreachassignmentaseparate (wA,IwA)denoteanarbitrarywriteoperationonanelement nodeiscreated. Atthisstagethegraphnodesareassociated of array A within a certain execution of statement X, or withASTnodes,orstatements,inwhichtheywerecreated, the totality of all such operations. Suppose that the body for the purpose that will be explained in Section 4. The of X depends affine-wise on free parameters p ,...,p (in syntax of a graph node description is presented in Fig.5. 1 l particular, they may include variables of loops surrounding XinP). WedefinetheeffectofXoverarrayAasafunction node::=(node(namecontext) (domconditions) EA[X]:(p1,...,pl;q1,...,qk)(cid:55)→(wA,IwA)+⊥ ((pboordtyscpoomrtps)utations) that, for each tuple of parameters p ,...,p and indexes ) 1 l q1,...,qkofanelementofarrayA,yieldsanatom(wA,IwA) pcoonrtte:x:=t::(=nanmaemteyspesource) or⊥. Theatomindicatesthatthewriteoperation(wA,IwA) computation::=(evalnametypeexpressiondestination) is the last among those that write to element A(q ,...,q ) 1 k source::=S-tree|IN duringexecutionofX withaffineparametersp1,...,pl and destination::=M-tree|OUT ⊥ means that there are no such operations. The following claim is another form of Proposition 1: the effectcanberepresentedasanS-treewithprogramstatement Figure 5: Syntax for graph node description labels as term names. Wecallthemsimplyeffect trees. (All assignments are supposed labeled during preprocessing). Non-terminals ending with -s usually denote a repetition Building effect is the core of our approach. Using S-trees ofitsbasewordnon-terminal,e.g.,portssignifieslist of ports. 
as data objects we implemented some operations on them Anodeconsistsofaheaderwithnameandcontext,domain that are used in the algorithm presented on Fig.4. A good description, list of ports that describe inputs and a body mathematical foundation of similar operations for similar that describes output result. The context here is just a list trees has been presented in [6]. of loop variables surrounding the current AST node. The The algorithm goes upwards along the AST from primi- domainspecifiesaconditiononthesevariablesforwhichthe tives like emptyandassignmentstatements. OperationSeq graphnodeinstanceexists. Besidescontextvariablesitmay computestheeffectofastatementsequencefromtheeffects depend on structure parameters. Ports and body describe ofcomponentstatements. OperationFoldbuildstheeffectof inputsandoutputs. Thesourceinaportinitiallyisusually ado-loopgiventheeffectoftheloopbody. Forif-statement an atom A{e ,...,e } (or, generally, an S-tree) depicting 1 k an array access A(e1,...,ek), which must be eventually re- (node (Unew i1...in) solved into a S-tree referencing other graph nodes as the (dom Dom(X)+path-to-U-in-TX) true sources of the value (see Section 4.2). A computation (ports (at(p→RW(A1):RW(A2))) consists of a local name and type of an output value, an (body (eval ata⊥) ) expressiontobeevaluated,andadestinationplaceholder⊥ ) whichmustbereplacedeventuallybyaM-treethatspecifies output links (see Section 5). The tag IN or OUT declares Figure 7: Initial contents of the blender node for the node as input or output respectively. atomic subtree U in E [X]=T ConsiderthestatementS=S+X(i)fromprograminFig.8a. A X The initial view of its graph node is shown in Fig.6. The expression in eval clause is built from the rhs by replacing into a single one. The domain of the new node is that of all occurrences of scalar or array element with their local statement X restricted by conditions on the path to the names (that became port names as well). 
A graph node for sub-treeU inthewholeeffecttreeT . Theresultisdefined assignment has a single eval clause and acts as a generator X as just copying the input value a (of type t). The most of values written by the assignment. Thus, a term of an intriguingisthesourcetreeofthesoleporta. Itisobtained effecttreemaybeconsideredasareferencetoagraphnode. from the atomic sub-tree U = (p → A : A ). Each A is 1 2 i replaced(byoperatorRW)asfollows. WhenA isatermit i remainsunchanged. Otherwise,whenA is⊥,itisreplaced (node (S1i) i withexplicitreferencetothearrayelementbeingrewritten, (dom (i≥1)(i≤n)) A{q ,...,q }. However, an issue arises: variables q ,...,q (ports (s1doubleS{})(x1doubleX{i})) 1 k 1 k are undefined in this context. The only variables allowed (body (eval Sdouble(s1+x1)⊥) ) here are i ,...,i (and fixed structure parameters). Thus ) 1 n weneedtoexpressindexesq ,...,q through“known”values 1 k i ,...,i . 1 n Figure 6: An initial view of graph node for state- Toresolvethisissueconsiderthelistof(affine)conditions ment S=S+X(i) L on the path to the subtree U in the whole effect tree T asasetofequationsconnectingvariablesq ,...,q and X 1 k i ,...,i . 1 n 3.3 ProcessingNon-affineConditionals Proposition 2. Conditions L specify a unique solution When the source program contains a non-affine condi- for values q ,...,q depending on i ,...,i . tional statement X, special processing is needed. We add a 1 k 1 n new kind of condition, a predicate function call, or simply Proof. ConsidertheotherbranchA ofsubtreeU,which j predicate, depicted as must be a term. We prove a stronger statement, namely, that given exact values of all free variables occurring in A , bool-const j name {L-exprs} (4) Vars(A ), all q-s are uniquely defined. The term A de- j j notesthesourceforarrayelementA(q ,...,q )withinsome that may be used everywhere in the graph where a normal 1 k branchoftheconditionalstatementX. Note,however,that affine condition can. 
It contains a name, sign T or F (affir- thisconcretesourceisawriteonasinglearrayelementonly. mation or negation) and a list of affine arguments. Hence,arrayelementindexesq ,...,q aredefineduniquely However,notalloperationscandealwithsuchconditions 1 k by Vars(A ). Now recall that all these variables are present in argument trees. In particular, the Fold cannot. Thus, j in the list i ,...,i (by definition of this list). weeliminateallpredicatesimmediatelyaftertheyappearin 1 n the effect tree of a non-affine conditional statement. Now that the unique solution does exist, it can be easily First, we drag the predicate p, which is initially on the found by our affine machinery. See Section 5 in which the top of the effect tree E [X] = (p → T : T ), downward to A 1 2 machinery used for graph inversion is described. leaves. The rather straightforward process is accomplished Thus, we obtain, for conditional statement X, the effect withpruning. Intheresulttree,T ,allcopiesofpredicatep X tree that does not contain predicate conditions. All pred- occuronlyindownmostpositionsoftheform(p→A :A ), 1 2 icates got hidden within new graph nodes. Hence we can whereeachA iseithertermor⊥. Wecallsuchconditional i continuetheprocessofbuildingeffectsusingthesameoper- sub-treesatomic. Intheworstcasetheresulttreewillhave ations on trees as we did in the purely affine case. Also, for a number of atomic sub-trees being a multiplied number of eachpredicateconditionanodemustbecreatedthatevalu- atoms in sub-trees T and T . 1 2 atesthepredicatevalue. Weshallreturnbacktoprocessing Second, each atomic sub-tree can now be regarded as an non-affine conditionals in Section 5.2. indivisiblecompositevaluesource. WhenoneofA is⊥,this i symbol depicts an implicit rewrite of an old value into the 4. EVALUATIONANDUSAGEOFSTATES target array cell A(q ,...,q ) rather than just“no write”. 
1 k With this idea in mind we now replace each atomic sub- 4.1 ComputingStates tree U with a new term U {i ,...,i } where argument new 1 n list is just a list of variables occurring in the sub-tree U. A state before statement (s,I ) in affine program frag- s Simultaneously,weaddthedefinitionofU intheformof ment P with respect to array element A(q ,...,q ) is a new 1 k a graph node (associated with the conditional statement X function that takes as arguments the iteration vector I = s asawhole)whichisshowninFig.7. Thiskindofnodeswill (i ,...,i ),arrayindexes(q ,...,q )andvaluesofstructure 1 n 1 k be referred to as blenders as they blend two input sources parameters and yields the write (w,I ) in the computation w ofP thatisthelastamongthosethatwritetoarrayelement (performinga”conditionalassignment”toA). Fromourway A(q ,...,q ) before (s,I ). ofhidingpredicateconditionsdescribedinSection3.3itfol- 1 k s Inotherwordsthisfunctionpresentsaneffect(overarray lowsthattheeffecttreeofX,E [X],aswellasofanyother A A) of executing the program from the beginning up to the statementcontainingX, willnotcontainatermwithname point just before (s,I ). It can be represented as an S-tree, A. Hence, due to our way of building states from effects s Σ [s],calledastatetreeatprogrampointbeforestatements described above, this is also true for the state tree of any A forarrayA. Itcanbecomputedwiththefollowingmethod. pointoutsideX,includingthestateT beforetheX itself. X For the starting point of program P we set Now, consider the state T of a point p within a branch b p 1 of X. (Below we’ll see that b =b). 
We have ΣA[P]=(q1≥l1→(q1≤u1→...Aini{q1,...}···:⊥):⊥) (5) 1 T =Seq(T ,T ), (6) where term A {q ,...,q } signifies an untouched value of p X X−p ini 1 k array element A(q1,...,qk) and li, ui are lower and upper where TX−p is the effect of executing the code from the bounds of the i-th array dimensions (which must be affine beginningofthebranchb1 top(recallthatthestatebefore functionsoffixedparameters). Thus,(5)meansthatallA’s the branch b1, Tb1, is the same as TX according to Rule 3 elements are untouched before the whole program P. above). Consider a term A{i1,...,ik} in Tp. As it is not ThefurthercomputationofΣA isdescribedbythefollow- fromTX,itmustbeinTX−p. Obviously,TX−pcontainsonly ing production rules: terms associated with statements of the same branch with p. Thus, b = b. And these terms are only such that their 1 1. Let ΣA[B1;B2] = T. Then ΣA[B1] = T. The state initial m indexes are just variables of m loops surrounding before any prefix of B is the same as that before B. X. Thus,giventhattheoperationSeqdoesnotchangeterm indexes, we have the conclusion of Proposition 3. 2. Let Σ [B ;B ] = T. Then Σ [B ] = Seq(T,E [B ]). A 1 2 A 2 A 1 ThestateafterthestatementB isthatbeforeB com- 4.2 ResolvingArrayAccesses 1 1 bined by Seq with the effect of B . 1 Usingstatesbeforeeachstatementwecanbuildthesource graph F . Consider agraph node X. So far, asource in its 3. Let Σ [if c then B else B endif] = T. Then P A 1 2 ports clause contains terms representing array access, say Σ [B ] = Σ [B ] = T. The state before any branch A 1 A 2 A{e ,...,e }. Recall that the node X is associated with of if-statement is the same as before the whole if- 1 k a point p in the AST. We take the state Σ [p] and use it statement. A to find the source for our access indexes (e ,...,e ) (just 1 k 4. LetΣ [do v=e ,e ; B; enddo]=T. ThenΣ [B]= doing substitution followed by pruning). The resulting S- A 1 2 A Seq(T,Fold(v,e ,v−1,E [B])). The state before the treereplacesoriginalaccessterm. 
Doingsowitheacharray 1 A loopbodyB withthecurrentvalueofloopvariablevis access term we obtain the source graph F . P that before the loop combined by Seq with the effect of Recall that each graph node X has a domain Dom(X) all preceding iterations of B. whichisasetofpossiblecontextvectors. Itisspecifiedbya listofconditions,whicharecollectedfromsurroundingloop The form in rule 4 worth some comments. Here the upper bounds and if conditions. We write D⇒p to indicate that limitintheFoldclausedependsonv. Tobeformallycorrect, the condition p is valid in D (or follows from D). In case wemustreplaceallotheroccurrencesofvintheclausewith of a predicate condition p = pb{e ,...,e } it implies that afreshvariable,sayv(cid:48). Thus,theresultingtreewill(gener- 1 k the list D just contains p (up to equality of e ). For a S- i ally) contain v, as it expresses the effect of all iterations of graphbuiltsofar,thefollowingpropositionlimitstheusage the loop before the v-th iteration. of atoms A{...} whose Dom(A) has a predicate condition. Using the rules 1-4 one can compute the state in any in- ternal point of the program P. The following proposition Proposition 4. Suppose that B is a regular node (not a limitstheusage,withinastatetreeT,oftermswhoseasso- blender) whose source tree T contains a term A{i1,...,ik} ciatedstatementisenclosedinaconditionalwithnon-affine (referingtoanassignmenttoanarrayA). LetDom(A)⇒p, condition. It will be used further in Section 5.2. where p=pb{j1,...,jm} is a predicate condition. Then: • m≤k, Proposition 3. LetaconditionalstatementX withnon- affine condition be at a loop depth m within a dynamic con- • j = i ,...,j = i , and all these are just variables 1 1 m m trol program P. Consider a state tree T =Σ [p] in a point of loops enclosing the conditional with predicate p, p A p within P over an array A. Let A{i ,...,i } be a term in 1 k • Dom(B)⇒p. T , whose associated statement, also A, is inside a branch p of X. 
Then the following claims are all true: Proof. AsDom(A)⇒pb{j ,...,j },thepredicatepde- 1 m notes the condition of a conditional statement X enclosed • m≤k, by m loops with variables j ,...,j , and this X contains 1 m • p is inside the same branch of X and the statement A in the branch b (by construction of Dom). The source tree T was obtained by a substitution into the • indexes i1,...,im are just variables of loops enclosing state tree before B, TB = ΣA[B], which must contain a X. term A{i(cid:48),...,i(cid:48)}. It follows, by Proposition 3, that state- 1 k ment B is inside the same branch b (hence, Dom(B)⇒p), Proof. Let A be a term name, whose associated state- m≤k and i(cid:48),...,i(cid:48) are just variables j ,...,j . However 1 m 1 m mentAisinsideabranchbofaconditionalstatementXwith thesubstitutionreplacesonlyformalarrayindexesanddoes non-affinecondition. Itiseitherassignmenttoanarray,say nottouchesenclosingloopvariables,herej ,...,j . Hence 1 m A, or a blender node emerged from some inner conditional i(cid:48) =i ,...,i(cid:48) =i . 1 1 m m WhenB isablendertheassertionoftheProposition4is and mutually inverse. Thus, we prefer to update the S- also valid but Dom(B) should be extended with conditions graphbeforeinversionsuchthatinversionwouldnotproduce on the path from the root of the source tree to the term predicates in M-trees. To this end, we check for each port A{...}. The details are left to the reader. whetheritssourcenodehasenoughpredicatesinitsdomain condition list. When we see that the source node has less predicates, then we insert a filter node before that port. 5. BUILDINGTHEDATAFLOWMODEL Andtheoppositecase,thatthesourcehasmorepredicates, is impossible, as it follows immediately from Proposition 4. 5.1 InvertingS-graph: AffineCase In our dataflow computation model a node producing a 6. 
EXAMPLES dataelementmustknowexactlywhichothernodes(andby Asetofsimpleexamplesofasourceprogram(subroutine) which port) need this data element, and send it to all such with the two resulting graphs – S-graph and U-graph – are ports. This information should be placed, in the form of shown in Figs. 8,9,11. All graphs were generated as text destination M-tree, into the eval clause of each graph node and then redrawn graphically by hand. Nodes are boxes insteadofinitialplaceholder⊥. TheseM-treesareobtained or other shapes and data dependences are arrows between by inversion of S-graph. them. Usually a node has several input and one output First, we split each S-tree into paths. Each path starts ports. The domain is usually shown once for a group of with header term R{i ,...,i }, containing the list of inde- 1 n nodes (in the upper side in curly braces). The groups are pendent variables, ends with term W{e ,...,e } and has 1 m separated by vertical line. Each node should be considered a list of affine conditions interleaved with division clauses asacollectionofinstancenodesofthesametypethatdiffer like (e =: kq+r) (k is a literal integer here). In all ex- indomain(context)parametersfromeachother. Arrowsbe- pressions, variables may be either from header or defined tweennodesmayforkdependingonsomecondition(usually by division earlier. The InversePath operation produces the it is affine condition of domain parameters), which is then inverted path that starts with header term W{j ,...,j } 1 m written near the start of the arrow immediately after the with new independent variables j ,...,j , ends with term 1 m fork. When arrow enters a node it carries a new context (if R{f ,...,f }andhasalistofaffineconditionsanddivisions 1 n it has changed) written there in curly braces. The simplest inbetween. Theinversionisperformedbyvariableelimina- and purely affine example in Fig.8 explains the notations. tion. 
When a variable cannot be eliminated it is simply Arrows in the S-graph are directed from a node port to its introduced with clause (@v). source. The S-graph arrows can be interpreted as the flow All produced paths are grouped by new headers, each of requests for input values. (See Section 7.2 for details). groupbeinganM-treeforrespectivegraphnode,intheform (& T T ...) where each T is a 1-path tree. Further, the 1 2 i M-tree is simplified by the operation SimplifyTree. This op- subroutine Sum(X,n,S) {} {i|1≤i≤n} {} {i|1≤i≤n} eration also involves finding bounds for @-variables, which rSe=a0l(.80) X(n),S Xin(i) Start Xin(i) are then included in(t@ov(@l-1vue1r)t(ilc2eus2i)n..th.Te)form: e den Sond d=i dS=o+ 1 X,n(i ) S1=0{} ii>=11S S2=XS+X n=S01=0n>{0i+{11}}SXS2=S+X where li,ui are affine lower and upper bounds of i-th inter- {i-1} {n} i<n i=n val, and v must belong to one of the intervals. n=0 n>0 Sout Sout 5.2 InvertingS-graphforProgramswithNon- (a) (b) (c) affineConditionals WhenprogramP hasnon-affineconditionalstheabovein- versionprocesswillprobablyyieldsomeM-treeswithpred- Figure 8: Fortran program Sum (a), its S-graph (b) icate conditions. The node with such M-tree need an addi- and U-graph (c) tional port for the value of predicate. We call such nodes filters. The simplest filter has just two ports, one for the IntheU-grapharrowsgofromnodeoutputtonodeinput. mainvalueandoneforthevalueofthepredicate,andsends IncontrastwiththeS-graph,theydenoteactualflowofdata. themainvaluetothedestinationwhenthepredicatevalueis The U-graph semantics is described in Section 7.3. true (or false) and does nothing otherwise. Splitting nodes IntheU-graphweneedtogetridofzero-portnodeswhich with complex M-trees we can always reduce our graph to arise from assignments with constant rhs. We insert into that with only simplest filters. them a dummy port that receives a dummy value. 
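Path inversion by variable elimination can be checked on the simplest path of the Sum example. In the S-graph the read of S in instance S2{i} takes its value from S2{i-1} when 1 < i ≤ n. Substituting j = i - 1 and eliminating i gives the inverted path: the write S2{j} feeds S2{j+1} when 1 ≤ j < n, which is the arrow labelled i<n, {i+1} in Fig. 8c. The sketch below only verifies by enumeration that the two paths describe the same set of links; the function names are hypothetical and the enumeration is not how InversePath works.

```python
def source_path(i, n):
    """Forward path: reader S2{i} -> writer S2{i-1}, valid for 1 < i <= n."""
    return (i - 1,) if 1 < i <= n else None


def inverted_path(j, n):
    """Inverted path: writer S2{j} -> reader S2{j+1}, valid for 1 <= j < n."""
    return (j + 1,) if 1 <= j < n else None


def paths_agree(n):
    """Both paths must yield the same set of (reader, writer) links."""
    span = range(-2, n + 3)
    forward = {((i,), source_path(i, n))
               for i in span if source_path(i, n) is not None}
    backward = {(inverted_path(j, n), (j,))
                for j in span if inverted_path(j, n) is not None}
    return forward == backward


print(all(paths_agree(n) for n in range(0, 8)))
```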
Thus a Generally, the domain of each arrow and each node may node Start sending a token to node S1 appeared in Fig.8c. haveseveralfunctionalpredicatesintheconditionlist. Nor- A simplest example with non-affine conditions is shown mally, an arrow has the same listof predicates as its source onFig.9. Hereappearsanewkindofnode,theblender,de- and target nodes. However, sometimes these lists may dif- picted as a blue truncated triangle (see Fig. 9b). Formally, fer by one item. Namely, a filter node emits arrows with it has a single port, which receives data from two different a longer predicate list whereas the blender node makes the sourcesdependingonthevalueofthepredicate. Thus,ithas predicatelistoneitemshortercomparedtothatofincoming an implicit port for Boolean value (on top). The main port arrow. In the examples below both green (dotted) and red arrows go out from sides; true and false arrows are dotted (dashed) arrows have additional predicate in their domain. green and dashed red respectively. However, our aim is to produce not only U-graph, but In the U-graph the blender does not use a condition: in both S-graph and U-graph which must be both complete eithercaseitgetsavalueonitssingleportwithoutknowing subroutine Sum(X,n,S) {} {i|1≤i≤n} {} {i|1≤i≤n} rSe=a0l(.80) X(n),S Xin(i) Start Xin(i) e s de n ue Sond brSde n d e=rSiond= doaS= d =0iuol d(+ S.=t18 0oiX n+ ,) 1 n (eX X,i )n((S i n)u )m,S( X ,n,S) n=S0{S1}1==n00>{{0}} iiii>>{==i111-11SS}{SSi|22X1=X=Xi≤nS(S{i+i≤+n)X}nX} n=n{SS0=}t1Sa=01r0=tn0>n{0>i+{0{1i+1}{1}1}S}iX<{SSin|2XX1=Si≤nS2(=i+i≤)SiXn=+}nX Sout {i-1} {n} Sout i<n i=n n=0 n>0 (a) (b) (c) Sout Sout (a) Fig.8. Fortran program Su(mb )( a), its S-Graph (b) and U-(gcr)a ph (c) Fig.8. 
which node has sent it and under which condition. However, as the source itself is not under the needed condition, a filter node must be inserted between the source node and the receiver port (it is shown in Fig. 9c as an inverted orange trapezoid). A circle at the entry point means that the filter is open when the condition is false.

Figure 9: Fortran program Max (a), its S-graph (b) and U-graph (c)

A more interesting example, a bubble sort program and its graphs, is shown in Fig. 10. In contrast with the previous ones, this U-graph exhibits high parallelism: the parallel time is 2n instead of n(n+1)/2 for sequential execution.

Figure 10: Fortran program Bubble (a), its S-graph (b) and U-graph (c)

7. USAGE OF POLYHEDRAL MODEL

7.1 General Form of Dataflow Graph

The general syntax of the PM format is shown in Fig. 5. The S-graph is comprised of port source S-trees (single-valued), whereas the U-graph is comprised of destination M-trees (multi-valued). Both graphs must be mutually inverse, i.e., they represent the same dependence relation.

Some nodes produce Boolean values, which can be used as predicates. In the S-graph, they are allowed in a blender, which is an identity node with a unique port source tree of the form (pb{e_1,...,e_k} → T_1 : T_2). In the U-graph, we forbid predicates in destination trees, but we allow filter nodes, which are in some sense inverse to blenders. Instead of a destination tree of the form (pb{e_1,...,e_k} → T_out : ⊥) they have an additional Boolean port p with source P{e_1,...,e_k} and a destination tree with just p as the condition. Thus, a filter is a gate which is open or closed depending on the value at port p. Note that filters are needed in the U-graph, but not in the S-graph.

The S-graph must satisfy the two following constraints. The first is a consistency restriction. Consider a node X{I} with domain D_X and a source tree T. Let I ∈ D_X. Then T(I) is some atom Y{J} such that J ∈ D_Y. The second constraint requires that the S-graph must be well-founded, which means that no object node X{I} may transitively depend on itself.

7.2 Using the S-graph as a Program

The S-graph can be used to evaluate output values given all input values (and structure parameters). For simplicity, we assume that each node produces a single output value. Following [4] we transform the S-graph into a System of Recurrence Equations (SRE), which can be treated as a recursive functional program. Fig. 11 presents the SRE for the S-graph from Fig. 9b. Execution starts with an invocation of the output node function. An evaluation step is to evaluate the right-hand side, calling other invocations recursively. For efficiency it is worth doing tabulation so that no function call is executed twice for the same argument list.

Figure 11: System of recurrence equations equivalent to the S-graph from Fig. 9b:
  P(i) = R(i) < X(i)
  B(i) = if P(i) then X(i) else R(i)
  R(i) = if i=1 then R1() else if i>1 then B(i−1) else ⊥
  Rout() = if n=0 then R1() else if n>0 then B(n) else ⊥
  R1() = 0

Note that the consistency and well-foundedness conditions together guarantee termination of the S-graph.

7.3 Computing the U-graph in the Dataflow Computation Model

The U-graph can be executed as a program in the dataflow computation model. A node instance with concrete context values fires when all its ports have received a data element in the form of a data token. Each fired instance is executed by computing all its eval clauses sequentially. All port and context values are used as data parameters in the execution. In each eval clause the expression is evaluated, the obtained value is assigned to a local variable and then sent out according to the destination M-tree. The tree is executed in an obvious way. In a conditional vertex, the left or right subtree is executed depending on the Boolean value of the condition. In &-vertices, all subtrees are executed one after another. An @-vertex acts as a do-loop with the specified bounds. Each term of the form R.x{f_1,...,f_n} acts as a token send statement that sends the computed value to the graph node R, to port x, with the context built of the values of f_i. The process stops when all output nodes get the token or when all activity stops (quiescence condition). To initiate the process, tokens to all necessary input nodes should be sent from outside.

7.4 Extracting Source Function from S-graph

There are two ways to extract the source function from the S-graph. First, we may use the S-graph itself as a program that computes the source for a given read when the iteration vector of the read as well as the values of all predicates are available. We take the SRE and start evaluating the term R(i_1,...,i_n), where i_1,...,i_n are known integers. We stop as soon as a term of the form W(j_1,...,j_m) is encountered on the top level (not inside predicate evaluation), where W is a node name corresponding to a true write operation (not a blender) and j_1,...,j_m are some integers.

Also, there is a possibility to extract the general definition of the source function for a given read in a program. We start from the term R{i_1,...,i_n}, where i_1,...,i_n are symbolic variables, and proceed by unfolding the S-graph symbolically into just the S-tree. Having encountered a predicate node, we insert a branching with a symbolic predicate condition. Having encountered a term W{e_1,...,e_m} for a regular assignment statement W, we stop unfolding the branch. Having encountered a term for a blender node, we unfold it further; this way we avoid taking our artificial dummy assignments as a source. Proceeding this way we will generate a possibly infinite S-tree (with predicate vertices) representing the source function in question. If we are lucky, the S-tree will be finite. It seems that in [1, 5] the exact result (in the same sense) is produced only when the above process yields a finite S-tree.

But we get a good result even when the generated S-tree is infinite (this is the case in the examples Max and Bubble). Using a technique like supercompilation [13] it is possible to fold the infinite S-tree into a finite cyclic graph.

8. RELATED WORK

The foundations of dataflow analysis for arrays were well established in the 1990s by Feautrier [2, 3], Pugh [11], Maslov [10] and others. Their methods use the Omega and PIP libraries and yield an exact dependence relation for any pair of read and write references in an affine program. Thus, our work adds almost nothing for the affine case (besides producing a program in the dataflow computation model).

However, for non-affine conditions, the state of the art is generally a fuzzy solution [1], in which the source function produces a set of possible sources. The authors claim that nothing more can be done. But all depends on the form in which we want to see the result. Sometimes one may be satisfied with the source function expressed in the form of a finite quast extended with predicate vertices. Then why not allow a slightly more general form: an S-graph with predicate nodes, or SRE? The main thing is that it is good for something.

Known translations from WDP to KPN [12] usually rely on FADA and seem to succeed only when FADA succeeds in being exact (judging by the examples used). It is interesting what they do with the Bubble Sort in Fig. 10, and how.

Our base affine machinery for building the exact PM also differs. While it is common to consider each read-write pair separately and then combine the results, our method first produces effects and states using only writes, and then resolves each read against the respective state. It is interesting to compare our effect/state building process with that of backward traversal of the control flow graph [5]. Both processes move along the same path but in opposite directions. Authors usually argue for moving backward, noting that the process can stop when the total source is found (cf. also [10]). We hope to obtain the same effect just by implementing our algorithm in a lazy language. Then the tree T will not be built at all in calls like Seq(T,t), where t is a term.

In principle, the way we deal with non-affine conditionals can be reformulated as follows: (1) push all dynamic ifs to the innermost level; (2) add else parts which just assign the existing value to the same variable; (3) do FADA, identifying different copies of each predicate value; (4) collect the resulting exact source functions as an S-graph, or SRE; (5) use this SRE as a recursive definition of the true source (passing by all dummy assignments introduced at item 2). I am grateful to the IMPACT reviewers for pointing out this relationship.

9. CONCLUSION

Our aim was to convert a program P of a specific class into the dataflow computation model. Thus we need not only to build the exact and complete polyhedral model (which is a set of exact source functions for all reads in P), but also to invert it and thus obtain the exact use set function for each write in P. The latter form can be used as a program in a dataflow computation model. A prototype translator is implemented in Refal-6 [7]. It admits an arbitrary WDP.

The work was supported by the Russian Academy of Sciences Presidium Program for Fundamental Research "Fundamental Problems of System Programming" in 2009–2014.

10. REFERENCES

[1] Denis Barthou, Jean-Francois Collard, and Paul Feautrier. Fuzzy array dataflow analysis. J. Parallel Distrib. Comput., 40(2):210–226, 1997.
[2] Paul Feautrier. Parametric integer programming. RAIRO Recherche Opérationnelle, 22:243–268, 1988.
[3] Paul Feautrier. Dataflow analysis of array and scalar references. International Journal of Parallel Programming, 20(1):23–53, 1991.
[4] Paul Feautrier. Array dataflow analysis. In S. Pande and D. Agrawal, editors, Compiler Optimizations for Scalable Parallel Systems, volume 1808 of LNCS, pages 173–219. Springer-Verlag New York, Inc., New York, NY, USA, 2001.
[5] Martin Griebl. Automatic parallelization of loop programs for distributed memory architectures. Habilitation thesis, Department of Informatics and Mathematics, University of Passau, 2004.
[6] S.A. Guda. Operations on the tree representations of piecewise quasi-affine functions. "Informatika i ee primeneniya" (Informatics and Applications), 7(1):58–69, 2013.
[7] Arkady Klimov. Refal-6. URL: http://refal.net/~arklimov/refal6/index.html, 2004.
[8] Arkady Klimov. Transforming affine nested loop programs to dataflow computation model. In Ershov Informatics Conference, PSI'11, Preliminary Proceedings, pages 274–285, Novosibirsk, Russia, 2011.
[9] Arkady V. Klimov. Construction of exact polyhedral model for affine programs with data dependent conditions. In Proc. of the Fourth International Valentin Turchin Workshop on Metacomputation, META'14, pages 136–160. University of Pereslavl, 2014. URL: http://meta2014.pereslavl.ru/papers/.
[10] Vadim Maslov. Lazy array data-flow dependence analysis. In POPL'94, pages 311–325. ACM, 1994.
[11] William Pugh. The Omega test: a fast and practical integer programming algorithm for dependence analysis. In Joanne L. Martin, editor, SC, pages 4–13. IEEE Computer Society / ACM, 1991.
[12] T. Stefanov. Converting Weakly Dynamic Programs to Equivalent Process Network Specifications. PhD thesis, Leiden University, The Netherlands, 2004.
[13] Valentin F. Turchin. Program transformation by supercompilation. In Harald Ganzinger and Neil D. Jones, editors, Programs as Data Objects, volume 217 of LNCS, pages 257–281. Springer, 1985.