Table Of ContentYet Another Way of Building Exact Polyhedral Model for
Weakly Dynamic Affine Programs
Arkady V. Klimov
InstituteofDesignProblemsinMicroelectronics
RussianAcademyofSciences
Moscow,Russia
arkady.klimov@gmail.com
ABSTRACT We define (see Section 2) the PM as a mapping that as-
5 signs to each read (load) instance in the computation its
Exact polyhedral model (PM) can be built in the general
1 uniquewrite(store)instancethathaswrittenthevaluebe-
case if the only control structures are do-loops and struc-
0 ingread. Inotherwords,itisacollectionofsourcefunctions,
tured ifs, and if loop counter bounds, array subscripts and
2 each bound to a single read operation in a program. Such
if-conditions are affine expressions of enclosing loop coun-
n ters and possibly some integer constants. In more general functiontakesiterationvectorofthereadinstanceandpro-
a dynamic control programs, where arbitrary ifs and whiles duces the name and the iteration vector of a write instance
J orsymbol⊥indicatingthatsuchwritedonotexistandthe
are allowed, in the general case the usual dataflow analysis
5 can be only fuzzy. This is not a problem when PM is used original state of memory is read.
1 just for guiding the parallelizing transformations, but is in- However when the source program contains also one or
sufficientfortransformingsourceprogramstoothercompu- severalif-statementswithnon-affine(e.g.,datadependent)
] tation models (CM) relying on the PM, such as our version conditions the known methods suggest only approximation
L which, generally, provides a set of possible writes for some
of dataflow CM or the well-known KPN.
P reads. It is usually referred to as Fuzzy Array Dataflow
Thepaperpresentsanovelwayofbuildingtheexactpoly-
. Analysis (FADA) [1]. In some specific cases such model
s hedral model and an extension of the concept of the exact
c PM, which allowed us to add in a natural way all the pro- may provide a source function that uses as its input also
[ values of predicates associated with non-affine conditionals
cessingrelatedtothedatadependentconditions. Currently,
1 in our system, only arbirary ifs (not whiles) are allowed in order to produce the unique source. These cases seem
to be those in which the number of such predicate values is
v in input programs. The resulting polyhedral model can be
finite (uniformly bounded).
9 easily put out as an equivalent program with the dataflow
Usually, this does not makes a problem as parallelization
3 computation semantics.
can proceed relying on the approximate PM. But our aim
8
is to convert the source program (part) completely into the
3 Keywords dataflowcomputationmodel,andanyapproximationisun-
0
acceptable for us. So, our task was to extend the class of
. exact polyhedral model, weakly dynamic affine programs,
1 programsforwhichexactPMcanbebuiltbyprogramswith
data dependent conditionals, dataflow computation model
0 non-affine conditionals. Our model representation language
5 is extended with predicate symbols corresponding to non-
1 1. INTRODUCTION affine Boolean expressions in the source code. From such a
:
v The exact Polyhedral Model (PM), a.k.a. Exact Array model the exact source function for each read can be easily
i DataflowAnalysis(EADA),canbebuildforalimitedclass extracted. Thesefunctionsare,generally,recursiveandthey
X
of programs. Normally, it embraces affine loop nests with dependonusualaffineparametersaswellasonanunlimited
r assignments in between, in which loop bounds and array number of predicate values.
a
element indexes are affine expressions of surrounding loop But the source function is not our aim. For building
variables and fixed structure parameters (array sizes etc.). dataflow program we need the inversed, use set function.
If-statementswithaffineconditionsarealsoallowed. Meth- Fromtheparallelizationperspectivethisprogramcarriesim-
ods of EADA are well developed [2, 3, 11] for this class of plicitly the maximum amount of parallelism of the source
programs. The results are usually used for guiding paral- program. A simple computation strategy (see Section 7.3)
lelizing transformations. exhibitsallthisparallelism. Moredetailsandreferencescan
be found in [8, 9].
Inthispaperwedescribebrieflyouroriginalwayofbuild-
ingthedataflowmodelforaffineprogramsandthenexpand
ittoprogramswithnon-affineconditionals. Theaffineclass
Permissiontomakedigitalorhardcopiesofallorpartofthisworkfor andtheaffinesolutiontreearedefinedinSection2. Sections
personalorclassroomuseisgrantedwithoutfeeprovidedthatcopiesare 3–5 describe our algorithm of building the PM. Several ex-
notmadeordistributedforprofitorcommercialadvantageandthatcopies amples are presented in Section 6. Section 7 describes two
bearthisnoticeandthefullcitationonthefirstpage.Tocopyotherwise,to
differentsemanticsofthePMconsideredasaprogram. Sec-
republish,topostonserversortoredistributetolists,requirespriorspecific
tion8comparesourapproachandresultswithrelatedones.
permissionand/orafee.
Copyright20XXACMX-XXXXX-XX-X/XX/XX...$15.00.
2. SOMEFORMALISM we need the reversed map, that, for each write node, finds
Consider a Fortran program fragment P. We are inter- all read nodes which read the very value written. So, we
ested in memory reads and writes which have the form of need the multi-valued mapping
array element or scalar variable. The latter will be treated
G :(w,I )(cid:55)→{(r,I )} (2)
below as 0-dimension arrays. P w r
We define a computation graph by running the program which for each write node (w,I ) yields a set of all read
w
P with some input data. The graph consists of two kinds nodes {(r,I )} that read that very value written. We call
r
of nodes: reads and writes, corresponding respectively to this form of computation graph a use graph, or U-graph.
individual executions of load or store memory operation. AsubgraphofS-graph(U-graph)associatedwithagiven
There is a link from a write w to a read r if r reads the readr (writew)willbecalledr-component(w-component).
value written by w. Thus, r uses the same memory cell as For each program statement (or point) s we define the
w and w is the last write to this cell before r. domain Dom(s) as a set of values of iteration vector I ,
s
Ourpurposeistoobtainacompactparametricdescription such that (s,I ) occurs in the computation. The follow-
s
of all computation graphs for a given program P. To make ingpropositionsummarizesthewell-establishedpropertyof
suchdescriptionfeasibleweneedtoconsideralimitedclass static control programs [2, 3, 5, 10, 11] (which is also justi-
of programs. It is a well known affine class, which can be fied by our algorithm).
formally defined by the set of rules shown in Fig.1.
Proposition 1. Foranystatement(s,I)inastaticcon-
s
Λ (empty statement) trol program P its domain Dom(s) can be represented as
(cid:83)
A(i1,...,ik)=e (assignment, k≥0) finite disjoint union iDi, such that each subdomain Di
X ; X (sequence) canbespecifiedasaconjunctionofaffineconditionsofvari-
1 2
if c then X1; else X2; endif (conditional) ables Is and structure parameters, and, when the statement
do v =e1,e2; X; enddo (do-loop) is a read (r,Ir), there exist such Di that the mapping FP
on each subdomain D can be represented as either ⊥ or
i
(w,(e ,...,e ))forsomewritew,whereeache isanaffine
1 m i
Figure 1: Affine program constructors expression of variables I and structure parameters.
r
Therighthandsideeofanassignmentmaycontainarray Thisresultsuggeststheideatorepresenteachr-component
elementaccessA(i1,...,ik),k≥0. Allindexexpressionsas of FP as a solution tree with affine conditions at branching
well as bounds e1 and e2 of do-loops must be affine in sur- verticesandtermsoftheformW{e1,...,em}or⊥atleaves.
rounding loop variables and structure parameters. Affine A similar concept of quasi-affine solution tree, quast, was
expressions are those built from variables and integer con- suggested by P. Feautrier [3].
stants with addition, subtraction and multiplication by lit- Asingle-valuedsolutiontree(S-tree)isastructureusedto
eral integer. Also, in an affine expression, we allow whole representr-componentsofaS-graph. Itssyntaxisshownin
division by literal integer. Condition c also must be affine, Fig.2. It uses just linear expressions (L-expr) in conditions
i.e., equivalent to e=0 or e>0 where e is affine. andtermarguments,soaspecialvertextypewasintroduced
Programs (or program parts) following these limitations in order to implement integer division.
havebeencalledstaticcontrolprograms(SCoP)[1,5]. Their
computation graph depends only on structure parameters S-tree ::=⊥
| term
and does not depend on dynamic data values.
Below, we are to remove the restriction that conditional | (cond→S-treet:S-treef) (branching)
| (L-expr=:numvar+var→S-tree) (integerdivision)
expression c must be affine. Such extended program class term ::=name{L-expr ,...,L-expr } (k≥0)
1 k
hasbeencalledweaklydynamicprograms(WDP)[12]. Here var ::=name
we shall allow only arbitrary ifs but not whiles which will num ::=···|−2|−1|0|1|2|3|...
be considered in the future. cond ::=L-cond|predicate (anycondition)
L-cond::=L-expr=0|L-expr>0 (affinecondition)
A point in the computation trace of an affine program
L-expr ::=num|numvar+L-expr (affineexpression)
may be identified as (s,I ), where s is a point in the pro-
s atom ::=⊥|name{num1,...,numk} (groundterm,k≥0)
gram and I is the iteration vector, i.e., a vector of integer
s
values of all enclosing loop variables of point s. The list of
thesevariableswillbedenotedasIs,whichallowstodepict Figure 2: Syntax for single-valued solution tree
the point s itself as (s,I ). (Here and below boldface sym-
s
bolsdenotevariablesorlistofvariablesassyntacticobjects, GivenconcreteintegervaluesofallfreevariablesoftheS-
while normal italic symbols denote some values as usual). treeonecanevaluatethetreetoanatom. Thetwofollowing
Thus, denoting an arbitrary read or write instance as evaluation rules must be applied iteratively.
(r,Ir) or (w,Iw) respectively, we represent the whole com- A branching like (c → T1 : T2) evaluates to T1 if condi-
putation graph as a mapping: tional expression c evaluates to true, otherwise to T . Non-
2
affine conditions are expressed by a predicate. This will be
F :(r,I )(cid:55)→(w,I ) (1)
P r w
explained below in Section 3.3.
which,foranyreadnode(r,I ),yieldsthewritenode(w,I ) Adivision(e=:mq+r→T)introducestwonewvariables
r w
thathaswrittenthevaluebeingread,oryields⊥ifnosuch (q,r)thattakerespectivelythequotientandtheremainder
writeexistandthustheoriginalcontentsofthecellisread. of integer division of integer value of e by positive constant
This form of graph is called a source graph, or S-graph. integer m. The tree evaluates as T with parameter list ex-
However, for translation to dataflow computation model tended with values of these two new variables.
It follows from Proposition 1 that for an affine program with affine condition the effect is built just by putting the
P the r-component of the S-graph F for each read (r,I ) effects of branches into a new conditional node.
P r
can be represented in the form of S-tree T depending on
variables Ir and structure parameters. EA[Λ]=⊥ (emptystatement)
However the concept of S-tree is not sufficient for repre- EA[X1;X2]=Xeq(EA[X1],EA[X2]) (sequence)
senting w-components of U-graph, because those must be EA[LA: A(e1,...,ek)=e]= (assignmentstoA)
multi-valued functions in general. So, we extend the defini- (wqh1e=reeI1is→a.l.is.t(qokf=alleoku→terLlAoo{pI}va:r⊥ia)b·l·e·s:⊥)
tion of S-tree to the definition of multi-valued tree, M-tree, EA[LB: B(...)=e]=⊥ (otherassignments)
by two auxiliary rules shown on Fig 3. EA[if c then X1 else X2 endif]= (conditional)
=(c→EA[X1]:EA[X2])
M-tree::=...thesameasforS-tree... EA[do v=e1,e2; X; enddo]= (do-loop)
|(&M-tree1...M-treen) (finiteunion,n≥2) =Fold(v,e1,e2,EA[X])
|(@var→M-tree) (infiniteunion)
Figure 4: The rules for computing effect tree over
Figure 3: Syntax for multi-valued tree k-dimensional array A
The semantics also changes. The result of evaluating M- The implementation of function Seq is straight. To com-
tree is a set of atoms. Symbol ⊥ now represents the empty puteSeq(T ,T )wesimplyreplaceall⊥-sinT withacopy
1 2 2
set, and the term N{...} represents a singleton. ofT . ThentheresultissimplifiedbyafunctionPrunewhich
1
Toevaluate(&T1,...,Tn)onemustevaluatesub-treesTi prunesunreachablebranchesbycheckingtheconsistencyof
and take the union of all results. The result of evaluating affineconditionsets(thecheckisknownasOmega-test[11]).
(@v→T) is mathematically defined as the union of infinite TheoperationFold(v,e ,e ,T),wherevisavariableand
1 2
number of results of evaluating T with each integer value v e ande areaffineexpressions,producesS-treeT(cid:48)thatdoes
1 2
of variable v. In practice the result of evaluating T is non- not contain v and represents the following function. Given
emptyonlywithinsomeboundintervalofvaluesv. Inboth values of all other parameters, T(cid:48) evaluates to
cases the united subsets are supposed to be disjoint.
BelowwepresentouralgorithmofbuildingaS-graph(Sec- max{v∈[e1,e2]|T(v) evaluates to a term } (3)
tions 3 and 4) and then a U-graph (Section 5). MakingthisT(cid:48)usuallyinvolvessolvingsome1-Dparametric
integer programming problems and combining the results.
3. BUILDINGSTATEMENTEFFECT
3.2 GraphNodeStructure
3.1 StatementEffectanditsEvaluation In parallel with building the effect of each statement we
Consider a program statement X, which is a part of an also compose a graph skeleton, which is a set of nodes with
affine program P, and some k-dimensional array A. Let placeholdersforfuturelinks. Foreachassignmentaseparate
(wA,IwA)denoteanarbitrarywriteoperationonanelement nodeiscreated. Atthisstagethegraphnodesareassociated
of array A within a certain execution of statement X, or withASTnodes,orstatements,inwhichtheywerecreated,
the totality of all such operations. Suppose that the body for the purpose that will be explained in Section 4. The
of X depends affine-wise on free parameters p ,...,p (in syntax of a graph node description is presented in Fig.5.
1 l
particular, they may include variables of loops surrounding
XinP). WedefinetheeffectofXoverarrayAasafunction node::=(node(namecontext)
(domconditions)
EA[X]:(p1,...,pl;q1,...,qk)(cid:55)→(wA,IwA)+⊥ ((pboordtyscpoomrtps)utations)
that, for each tuple of parameters p ,...,p and indexes )
1 l
q1,...,qkofanelementofarrayA,yieldsanatom(wA,IwA) pcoonrtte:x:=t::(=nanmaemteyspesource)
or⊥. Theatomindicatesthatthewriteoperation(wA,IwA) computation::=(evalnametypeexpressiondestination)
is the last among those that write to element A(q ,...,q )
1 k source::=S-tree|IN
duringexecutionofX withaffineparametersp1,...,pl and destination::=M-tree|OUT
⊥ means that there are no such operations.
The following claim is another form of Proposition 1: the
effectcanberepresentedasanS-treewithprogramstatement Figure 5: Syntax for graph node description
labels as term names. Wecallthemsimplyeffect trees. (All
assignments are supposed labeled during preprocessing). Non-terminals ending with -s usually denote a repetition
Building effect is the core of our approach. Using S-trees ofitsbasewordnon-terminal,e.g.,portssignifieslist of ports.
as data objects we implemented some operations on them Anodeconsistsofaheaderwithnameandcontext,domain
that are used in the algorithm presented on Fig.4. A good description, list of ports that describe inputs and a body
mathematical foundation of similar operations for similar that describes output result. The context here is just a list
trees has been presented in [6]. of loop variables surrounding the current AST node. The
The algorithm goes upwards along the AST from primi- domainspecifiesaconditiononthesevariablesforwhichthe
tives like emptyandassignmentstatements. OperationSeq graphnodeinstanceexists. Besidescontextvariablesitmay
computestheeffectofastatementsequencefromtheeffects depend on structure parameters. Ports and body describe
ofcomponentstatements. OperationFoldbuildstheeffectof inputsandoutputs. Thesourceinaportinitiallyisusually
ado-loopgiventheeffectoftheloopbody. Forif-statement an atom A{e ,...,e } (or, generally, an S-tree) depicting
1 k
an array access A(e1,...,ek), which must be eventually re- (node (Unew i1...in)
solved into a S-tree referencing other graph nodes as the (dom Dom(X)+path-to-U-in-TX)
true sources of the value (see Section 4.2). A computation (ports (at(p→RW(A1):RW(A2)))
consists of a local name and type of an output value, an (body (eval ata⊥) )
expressiontobeevaluated,andadestinationplaceholder⊥ )
whichmustbereplacedeventuallybyaM-treethatspecifies
output links (see Section 5). The tag IN or OUT declares
Figure 7: Initial contents of the blender node for
the node as input or output respectively.
atomic subtree U in E [X]=T
ConsiderthestatementS=S+X(i)fromprograminFig.8a. A X
The initial view of its graph node is shown in Fig.6. The
expression in eval clause is built from the rhs by replacing
into a single one. The domain of the new node is that of
all occurrences of scalar or array element with their local
statement X restricted by conditions on the path to the
names (that became port names as well). A graph node for
sub-treeU inthewholeeffecttreeT . Theresultisdefined
assignment has a single eval clause and acts as a generator X
as just copying the input value a (of type t). The most
of values written by the assignment. Thus, a term of an
intriguingisthesourcetreeofthesoleporta. Itisobtained
effecttreemaybeconsideredasareferencetoagraphnode.
from the atomic sub-tree U = (p → A : A ). Each A is
1 2 i
replaced(byoperatorRW)asfollows. WhenA isatermit
i
remainsunchanged. Otherwise,whenA is⊥,itisreplaced
(node (S1i) i
withexplicitreferencetothearrayelementbeingrewritten,
(dom (i≥1)(i≤n))
A{q ,...,q }. However, an issue arises: variables q ,...,q
(ports (s1doubleS{})(x1doubleX{i})) 1 k 1 k
are undefined in this context. The only variables allowed
(body (eval Sdouble(s1+x1)⊥) )
here are i ,...,i (and fixed structure parameters). Thus
) 1 n
weneedtoexpressindexesq ,...,q through“known”values
1 k
i ,...,i .
1 n
Figure 6: An initial view of graph node for state- Toresolvethisissueconsiderthelistof(affine)conditions
ment S=S+X(i) L on the path to the subtree U in the whole effect tree
T asasetofequationsconnectingvariablesq ,...,q and
X 1 k
i ,...,i .
1 n
3.3 ProcessingNon-affineConditionals
Proposition 2. Conditions L specify a unique solution
When the source program contains a non-affine condi-
for values q ,...,q depending on i ,...,i .
tional statement X, special processing is needed. We add a 1 k 1 n
new kind of condition, a predicate function call, or simply Proof. ConsidertheotherbranchA ofsubtreeU,which
j
predicate, depicted as must be a term. We prove a stronger statement, namely,
that given exact values of all free variables occurring in A ,
bool-const j
name {L-exprs} (4) Vars(A ), all q-s are uniquely defined. The term A de-
j j
notesthesourceforarrayelementA(q ,...,q )withinsome
that may be used everywhere in the graph where a normal 1 k
branchoftheconditionalstatementX. Note,however,that
affine condition can. It contains a name, sign T or F (affir-
thisconcretesourceisawriteonasinglearrayelementonly.
mation or negation) and a list of affine arguments.
Hence,arrayelementindexesq ,...,q aredefineduniquely
However,notalloperationscandealwithsuchconditions 1 k
by Vars(A ). Now recall that all these variables are present
in argument trees. In particular, the Fold cannot. Thus, j
in the list i ,...,i (by definition of this list).
weeliminateallpredicatesimmediatelyaftertheyappearin 1 n
the effect tree of a non-affine conditional statement.
Now that the unique solution does exist, it can be easily
First, we drag the predicate p, which is initially on the
found by our affine machinery. See Section 5 in which the
top of the effect tree E [X] = (p → T : T ), downward to
A 1 2 machinery used for graph inversion is described.
leaves. The rather straightforward process is accomplished
Thus, we obtain, for conditional statement X, the effect
withpruning. Intheresulttree,T ,allcopiesofpredicatep
X tree that does not contain predicate conditions. All pred-
occuronlyindownmostpositionsoftheform(p→A :A ),
1 2 icates got hidden within new graph nodes. Hence we can
whereeachA iseithertermor⊥. Wecallsuchconditional
i continuetheprocessofbuildingeffectsusingthesameoper-
sub-treesatomic. Intheworstcasetheresulttreewillhave
ations on trees as we did in the purely affine case. Also, for
a number of atomic sub-trees being a multiplied number of
eachpredicateconditionanodemustbecreatedthatevalu-
atoms in sub-trees T and T .
1 2 atesthepredicatevalue. Weshallreturnbacktoprocessing
Second, each atomic sub-tree can now be regarded as an
non-affine conditionals in Section 5.2.
indivisiblecompositevaluesource. WhenoneofA is⊥,this
i
symbol depicts an implicit rewrite of an old value into the
4. EVALUATIONANDUSAGEOFSTATES
target array cell A(q ,...,q ) rather than just“no write”.
1 k
With this idea in mind we now replace each atomic sub-
4.1 ComputingStates
tree U with a new term U {i ,...,i } where argument
new 1 n
list is just a list of variables occurring in the sub-tree U. A state before statement (s,I ) in affine program frag-
s
Simultaneously,weaddthedefinitionofU intheformof ment P with respect to array element A(q ,...,q ) is a
new 1 k
a graph node (associated with the conditional statement X function that takes as arguments the iteration vector I =
s
asawhole)whichisshowninFig.7. Thiskindofnodeswill (i ,...,i ),arrayindexes(q ,...,q )andvaluesofstructure
1 n 1 k
be referred to as blenders as they blend two input sources parameters and yields the write (w,I ) in the computation
w
ofP thatisthelastamongthosethatwritetoarrayelement (performinga”conditionalassignment”toA). Fromourway
A(q ,...,q ) before (s,I ). ofhidingpredicateconditionsdescribedinSection3.3itfol-
1 k s
Inotherwordsthisfunctionpresentsaneffect(overarray lowsthattheeffecttreeofX,E [X],aswellasofanyother
A
A) of executing the program from the beginning up to the statementcontainingX, willnotcontainatermwithname
point just before (s,I ). It can be represented as an S-tree, A. Hence, due to our way of building states from effects
s
Σ [s],calledastatetreeatprogrampointbeforestatements described above, this is also true for the state tree of any
A
forarrayA. Itcanbecomputedwiththefollowingmethod. pointoutsideX,includingthestateT beforetheX itself.
X
For the starting point of program P we set Now, consider the state T of a point p within a branch b
p 1
of X. (Below we’ll see that b =b). We have
ΣA[P]=(q1≥l1→(q1≤u1→...Aini{q1,...}···:⊥):⊥) (5) 1
T =Seq(T ,T ), (6)
where term A {q ,...,q } signifies an untouched value of p X X−p
ini 1 k
array element A(q1,...,qk) and li, ui are lower and upper where TX−p is the effect of executing the code from the
bounds of the i-th array dimensions (which must be affine beginningofthebranchb1 top(recallthatthestatebefore
functionsoffixedparameters). Thus,(5)meansthatallA’s the branch b1, Tb1, is the same as TX according to Rule 3
elements are untouched before the whole program P. above). Consider a term A{i1,...,ik} in Tp. As it is not
ThefurthercomputationofΣA isdescribedbythefollow- fromTX,itmustbeinTX−p. Obviously,TX−pcontainsonly
ing production rules: terms associated with statements of the same branch with
p. Thus, b = b. And these terms are only such that their
1
1. Let ΣA[B1;B2] = T. Then ΣA[B1] = T. The state initial m indexes are just variables of m loops surrounding
before any prefix of B is the same as that before B. X. Thus,giventhattheoperationSeqdoesnotchangeterm
indexes, we have the conclusion of Proposition 3.
2. Let Σ [B ;B ] = T. Then Σ [B ] = Seq(T,E [B ]).
A 1 2 A 2 A 1
ThestateafterthestatementB isthatbeforeB com- 4.2 ResolvingArrayAccesses
1 1
bined by Seq with the effect of B .
1 Usingstatesbeforeeachstatementwecanbuildthesource
graph F . Consider agraph node X. So far, asource in its
3. Let Σ [if c then B else B endif] = T. Then P
A 1 2
ports clause contains terms representing array access, say
Σ [B ] = Σ [B ] = T. The state before any branch
A 1 A 2
A{e ,...,e }. Recall that the node X is associated with
of if-statement is the same as before the whole if- 1 k
a point p in the AST. We take the state Σ [p] and use it
statement. A
to find the source for our access indexes (e ,...,e ) (just
1 k
4. LetΣ [do v=e ,e ; B; enddo]=T. ThenΣ [B]= doing substitution followed by pruning). The resulting S-
A 1 2 A
Seq(T,Fold(v,e ,v−1,E [B])). The state before the treereplacesoriginalaccessterm. Doingsowitheacharray
1 A
loopbodyB withthecurrentvalueofloopvariablevis access term we obtain the source graph F .
P
that before the loop combined by Seq with the effect of Recall that each graph node X has a domain Dom(X)
all preceding iterations of B. whichisasetofpossiblecontextvectors. Itisspecifiedbya
listofconditions,whicharecollectedfromsurroundingloop
The form in rule 4 worth some comments. Here the upper
bounds and if conditions. We write D⇒p to indicate that
limitintheFoldclausedependsonv. Tobeformallycorrect,
the condition p is valid in D (or follows from D). In case
wemustreplaceallotheroccurrencesofvintheclausewith of a predicate condition p = pb{e ,...,e } it implies that
afreshvariable,sayv(cid:48). Thus,theresultingtreewill(gener- 1 k
the list D just contains p (up to equality of e ). For a S-
i
ally) contain v, as it expresses the effect of all iterations of
graphbuiltsofar,thefollowingpropositionlimitstheusage
the loop before the v-th iteration.
of atoms A{...} whose Dom(A) has a predicate condition.
Using the rules 1-4 one can compute the state in any in-
ternal point of the program P. The following proposition Proposition 4. Suppose that B is a regular node (not a
limitstheusage,withinastatetreeT,oftermswhoseasso- blender) whose source tree T contains a term A{i1,...,ik}
ciatedstatementisenclosedinaconditionalwithnon-affine (referingtoanassignmenttoanarrayA). LetDom(A)⇒p,
condition. It will be used further in Section 5.2. where p=pb{j1,...,jm} is a predicate condition. Then:
• m≤k,
Proposition 3. LetaconditionalstatementX withnon-
affine condition be at a loop depth m within a dynamic con- • j = i ,...,j = i , and all these are just variables
1 1 m m
trol program P. Consider a state tree T =Σ [p] in a point of loops enclosing the conditional with predicate p,
p A
p within P over an array A. Let A{i ,...,i } be a term in
1 k • Dom(B)⇒p.
T , whose associated statement, also A, is inside a branch
p
of X. Then the following claims are all true: Proof. AsDom(A)⇒pb{j ,...,j },thepredicatepde-
1 m
notes the condition of a conditional statement X enclosed
• m≤k,
by m loops with variables j ,...,j , and this X contains
1 m
• p is inside the same branch of X and the statement A in the branch b (by construction of Dom).
The source tree T was obtained by a substitution into the
• indexes i1,...,im are just variables of loops enclosing state tree before B, TB = ΣA[B], which must contain a
X. term A{i(cid:48),...,i(cid:48)}. It follows, by Proposition 3, that state-
1 k
ment B is inside the same branch b (hence, Dom(B)⇒p),
Proof. Let A be a term name, whose associated state- m≤k and i(cid:48),...,i(cid:48) are just variables j ,...,j . However
1 m 1 m
mentAisinsideabranchbofaconditionalstatementXwith thesubstitutionreplacesonlyformalarrayindexesanddoes
non-affinecondition. Itiseitherassignmenttoanarray,say nottouchesenclosingloopvariables,herej ,...,j . Hence
1 m
A, or a blender node emerged from some inner conditional i(cid:48) =i ,...,i(cid:48) =i .
1 1 m m
WhenB isablendertheassertionoftheProposition4is and mutually inverse. Thus, we prefer to update the S-
also valid but Dom(B) should be extended with conditions graphbeforeinversionsuchthatinversionwouldnotproduce
on the path from the root of the source tree to the term predicates in M-trees. To this end, we check for each port
A{...}. The details are left to the reader. whetheritssourcenodehasenoughpredicatesinitsdomain
condition list. When we see that the source node has less
predicates, then we insert a filter node before that port.
5. BUILDINGTHEDATAFLOWMODEL
Andtheoppositecase,thatthesourcehasmorepredicates,
is impossible, as it follows immediately from Proposition 4.
5.1 InvertingS-graph: AffineCase
In our dataflow computation model a node producing a 6. EXAMPLES
dataelementmustknowexactlywhichothernodes(andby
Asetofsimpleexamplesofasourceprogram(subroutine)
which port) need this data element, and send it to all such
with the two resulting graphs – S-graph and U-graph – are
ports. This information should be placed, in the form of
shown in Figs. 8,9,11. All graphs were generated as text
destination M-tree, into the eval clause of each graph node
and then redrawn graphically by hand. Nodes are boxes
insteadofinitialplaceholder⊥. TheseM-treesareobtained
or other shapes and data dependences are arrows between
by inversion of S-graph.
them. Usually a node has several input and one output
First, we split each S-tree into paths. Each path starts
ports. The domain is usually shown once for a group of
with header term R{i ,...,i }, containing the list of inde-
1 n nodes (in the upper side in curly braces). The groups are
pendent variables, ends with term W{e ,...,e } and has
1 m separated by vertical line. Each node should be considered
a list of affine conditions interleaved with division clauses
asacollectionofinstancenodesofthesametypethatdiffer
like (e =: kq+r) (k is a literal integer here). In all ex-
indomain(context)parametersfromeachother. Arrowsbe-
pressions, variables may be either from header or defined
tweennodesmayforkdependingonsomecondition(usually
by division earlier. The InversePath operation produces the
it is affine condition of domain parameters), which is then
inverted path that starts with header term W{j ,...,j }
1 m written near the start of the arrow immediately after the
with new independent variables j ,...,j , ends with term
1 m fork. When arrow enters a node it carries a new context (if
R{f ,...,f }andhasalistofaffineconditionsanddivisions
1 n it has changed) written there in curly braces. The simplest
inbetween. Theinversionisperformedbyvariableelimina-
and purely affine example in Fig.8 explains the notations.
tion. When a variable cannot be eliminated it is simply
Arrows in the S-graph are directed from a node port to its
introduced with clause (@v).
source. The S-graph arrows can be interpreted as the flow
All produced paths are grouped by new headers, each
of requests for input values. (See Section 7.2 for details).
groupbeinganM-treeforrespectivegraphnode,intheform
(& T T ...) where each T is a 1-path tree. Further, the
1 2 i
M-tree is simplified by the operation SimplifyTree. This op- subroutine Sum(X,n,S) {} {i|1≤i≤n} {} {i|1≤i≤n}
eration also involves finding bounds for @-variables, which rSe=a0l(.80) X(n),S Xin(i) Start Xin(i)
are then included in(t@ov(@l-1vue1r)t(ilc2eus2i)n..th.Te)form: e den Sond d=i dS=o+ 1 X,n(i ) S1=0{} ii>=11S S2=XS+X n=S01=0n>{0i+{11}}SXS2=S+X
where li,ui are affine lower and upper bounds of i-th inter- {i-1} {n} i<n i=n
val, and v must belong to one of the intervals. n=0 n>0
Sout Sout
5.2 InvertingS-graphforProgramswithNon-
(a) (b) (c)
affineConditionals
WhenprogramP hasnon-affineconditionalstheabovein-
versionprocesswillprobablyyieldsomeM-treeswithpred- Figure 8: Fortran program Sum (a), its S-graph (b)
icate conditions. The node with such M-tree need an addi- and U-graph (c)
tional port for the value of predicate. We call such nodes
filters. The simplest filter has just two ports, one for the IntheU-grapharrowsgofromnodeoutputtonodeinput.
mainvalueandoneforthevalueofthepredicate,andsends IncontrastwiththeS-graph,theydenoteactualflowofdata.
themainvaluetothedestinationwhenthepredicatevalueis The U-graph semantics is described in Section 7.3.
true (or false) and does nothing otherwise. Splitting nodes IntheU-graphweneedtogetridofzero-portnodeswhich
with complex M-trees we can always reduce our graph to arise from assignments with constant rhs. We insert into
that with only simplest filters. them a dummy port that receives a dummy value. Thus a
Generally, the domain of each arrow and each node may node Start sending a token to node S1 appeared in Fig.8c.
haveseveralfunctionalpredicatesintheconditionlist. Nor- A simplest example with non-affine conditions is shown
mally, an arrow has the same listof predicates as its source onFig.9. Hereappearsanewkindofnode,theblender,de-
and target nodes. However, sometimes these lists may dif- picted as a blue truncated triangle (see Fig. 9b). Formally,
fer by one item. Namely, a filter node emits arrows with it has a single port, which receives data from two different
a longer predicate list whereas the blender node makes the sourcesdependingonthevalueofthepredicate. Thus,ithas
predicatelistoneitemshortercomparedtothatofincoming an implicit port for Boolean value (on top). The main port
arrow. In the examples below both green (dotted) and red arrows go out from sides; true and false arrows are dotted
(dashed) arrows have additional predicate in their domain. green and dashed red respectively.
However, our aim is to produce not only U-graph, but In the U-graph the blender does not use a condition: in
both S-graph and U-graph which must be both complete eithercaseitgetsavalueonitssingleportwithoutknowing
subroutine Sum(X,n,S) {} {i|1≤i≤n} {} {i|1≤i≤n}
rSe=a0l(.80) X(n),S Xin(i) Start Xin(i)
e s de n ue Sond brSde n d e=rSiond= doaS= d =0iuol d(+ S.=t18 0oiX n+ ,) 1 n (eX X,i )n((S i n)u )m,S( X ,n,S) n=S0{S1}1==n00>{{0}} iiii>>{==i111-11SS}{SSi|22X1=X=Xi≤nS(S{i+i≤+n)X}nX} n=n{SS0=}t1Sa=01r0=tn0>n{0>i+{0{1i+1}{1}1}S}iX<{SSin|2XX1=Si≤nS2(=i+i≤)SiXn=+}nX
Sout {i-1} {n} Sout i<n i=n
n=0 n>0
(a) (b) (c)
Sout Sout
(a) Fig.8. Fortran program Su(mb )( a), its S-Graph (b) and U-(gcr)a ph (c)
Fig.8. Fortran program Sum (a), its S-Graph (b) and U-graph (c)
subroutine Max(X,n,R) {} {i|1≤i≤n} {} {i|1≤i≤n} P(i)=R(i)<X(i)
real(8) X(n),R Xin(i) Start Xin(i) B(i)=if P(i)thenX(i)elseR(i)
e Rd e n s e ie ond =fu rRd e n Rn d eR0bie ondi=f dda= .r=Rn d <R00iooilX dd(f (= X.= u18 <0oia(Xf t(),Xi 1 in i))n()X (,i eni)t() h n Mteh),aneRx n ( X,n,R) nn==RR0{RR01o}1ou=u=tt00nn>>iii=i=>0>01111 ({{bii-R-R11)}} R{R{{XX<n<in|X}X}1Xin≤(ii)≤nRR}2=2=XX n=n{SRR0=}t1oRRau=0tr1o{0tu=1nt{}0>1n0}>0{(i+c{i<1i+)}ni <1R{}nii|=XRR1Xn≤<ini=(XiXR≤i)n<Rn2X}=RX2=X FleRRRingo1(tuiu()t)rt=eo==S1ii0ff1-ign:=r=aS1py0htshtthoeeenmnnRFRo1i1(fg)(.)r9eeelbsclesueirfirfieN>nc>1et0hetqehnueaBnt(Biio−(nns)1)eeleqslesue⊥iv⊥a-
(aF) ig.9. Fortran program Max( (ba)) and its S-Graph (b) and U(c-g) raph (c)
Fig.9. Fortran program Max (a) and its S-Graph (b) and U-graph (c)
F igure 9: Fortran program Max (a), its S-graph (b) constraint requires that the S-graph must be well-founded,
a n d U-graph (c) whichmeansthatnooneobjectnodeX{I}maytransitively
s u rd s b e d ou r ard o ib o ef id ol =rZA u (aAjo i o8f=n it=l( =Z u(i()jAn1j,8Aj) =nt1=-A=ei,()1n1(,i,Aj A1 ( -j-A)e,)B101(i<, ( ( j-): u)jB0A1n-< b1: u)A(n,bj)b)Z)( l ,bjet) Zhl (et Aeh(Aen,nn ,n) ) {{i|i|00A≤A≤inii≤(ni≤(i){ni){n0}0{}}{}jj}}ii<<jjn>in>i==11nn{{iijj,,==jjAA||111111≤≤AAi1i1≤≤<<nnAA,,2121≤≤AAjj2≤2≤i}ii}=i=nnZ i<i<nn n=n{0=Ai{|i0nA0(ii|in)≤0(nii)≤>≤n0in>≤}0n}{i,j{+i,1{j+}i,j1{|}ii1,={ji≤>0|ni1i0,=≤A1{i≤>10}nni0,,≤A1A11A}1n≤2<,jA1A≤A1≤2{2<i}njA≤,i2}{i}n,i{}i-1,{j-i1-1},j-1} d7ale.lT2pienhnpeduUSto-sgnviranailtpugsheesltfch.(aaennSdbe-sgturrsuaecdptuhtroeaepsvaaarlaumPateretoeogrusrt)pa.umFtovrasliumesplgiciviteyn,
e ne e d e n n e en d d e n nAdd en d ddo( nAAddjio -fddo((1 ( jjio -)f)a=1 ( =))aAZ= )( Zj - 1) nn==A0Ao0uotu(tn(in)i>)>00 {{iii>i,>,ijij=-=0-A0A110011}} {{ii,,(i(i}{}{bibi++1)1),, 1 1{{}}11,,11}}{{i+i+AZ1A21,2j,+j+11}} {0{A}0oAu}to(ui)t(i){{0i}{}{0i}} i>(jic>A(ij=1)cjA i=1)i=j 1{ii=-11i{>,1i-1}1i>,11}j=1j=1AZ2j>1AZ2j>1 wrctheueceFrusaoSirslvrl-seoegunwrfmaucipennehgcetthqi[fo4rauon]tamatweilaoepFcnrhtisograng.(9nSrobasRd.mfeoEEr.p)m,xIrnoewtcdFhhuuieitcgcieh.oS1sn-1cgaasristnasaippnrbhrtgeeslsientewrotnieouttaehttadpeiudnastyvSasovRtscaealEamutiefroo.oenr-f
of the output node function. Evaluation step is to evalu-
Fig.11. Fortran program Bubble (a), and its S-Graph (b) and U-graph (c)
Fig.11. Fortran program Bubble (a), and its S-Graph (b) and U-graph (c) atetherighthandsidecallingotherinvocationsrecursively.
For efficiency it is worth doing tabulation so that neither
F igure 10: Fortran program Bubble (a), its S-graph
function call is executed twice for the same argument list.
(b) and U-graph (c)
Note,thatboththeconsistencyandthewell-foundedness
conditions together provide the termination of the S-graph.
whichnodehassentitandunderwhichcondition. However, 7.3 Computing the U-graph in the Dataflow
asthesourceitselfisnotundertheneededcondition,afilter ComputationModel
node must be inserted in between the source node and the
TheU-graphcanbeexecutedasprograminthedataflow
receiver port (it is shown in Fig.8c as an inverted orange
computationmodel. Anodeinstancewithconcretecontext
trapezoid). Acircleattheentrypointmeansthatthefilter
values fires when all its ports get data element in the form
is open when the condition is false.
ofdatatoken. Eachfiredinstanceisexecutedbycomputing
Amoreinterestingexample,abubblesortprogramandits
all its eval clauses sequentially. All port and context values
graphs, is shown in Fig.10. In contrast with previous ones,
are used as data parameters in the execution. In each eval
this U-graph exhibits high parallelism: the parallel time is
clause the expression is evaluated, the obtained value is as-
2n instead of n(n+1)/2 for sequential execution.
signedtoalocalvariableandthensentoutaccordingtothe
destinationM-tree. Thetreeisexecutedinanobviousway.
7. USAGEOFPOLYHEDRALMODEL
In the conditional vertex, the left or right subtree is exe-
cuted depending on the Boolean value of the condition. In
7.1 GeneralFormofDataflowGraph
&-vertices, all sub-trees are executed one after another. An
The general syntax of PM format is shown in Fig.5. The @-vertexactsasado-loopwithspecifiedbounds. Eachterm
S-graph is comprised of port source S-trees (single-valued), oftheformR.x{f ,...,f }actsasatokensendstatement,
1 n
whereas the U-graph of destination M-trees (multi-valued). thatsendsthecomputedvaluetothegraphnodeRtoport
Both graphs must be mutually inverse, i.e., they represent x with the context built of values of f . The process stops
i
the same dependence relation. when all output nodes get the token or when all activity
SomenodesproduceBooleanvalues,whichcanbeusedas stops(quiescencecondition). Toinitiatetheprocess,tokens
predicates. In S-graph, they are alowed in a blender, which to all necessary input nodes should be sent from outside.
isanidentitynodewithuniqueportsourcetreeoftheform
7.4 ExtractingSourceFunctionfromS-graph
(pb{e ,...,e }→T :T ). InU-graph,weforbidpredicates
1 k 1 2
in destination trees, but we allow filter nodes, which are in Therearetwowaystoextractthesourcefunctionfromthe
somesenseinversetoblenders. Insteadofdestinationtreeof S-graph. First, we may use the S-graph itself as a program
theform(pb{e ,...,e }→T :⊥)theyhaveanadditional thatcomputesthesourceforagivenreadwhentheiteration
1 k out
booleanportpwithsourceP{e ,...,e }andthedestination vector of the read as well as values of all predicates are
1 k
tree with just p as condition. Thus, filter is a gate which is available. We take the SRE and start evaluating the term
open or closed depending on the value at port p. Note that R(i ,...,i ), where i ,...,i are known integers. We stop
1 n 1 n
filters are needed in U-graph, but not in S-graph. as soon as a term of the form W(j ,...,j ) is encountered
1 m
The S-graph must satisfy the two following constraints. on the top level (not inside predicate evaluation), where W
The first is a consistency restriction. Consider a node X{I} isanodenamecorrespondingtoatruewriteoperation(not
with domain D and a source tree T. Let I ∈ D . Then a blender) and j ,...,j are some integers.
X X 1 m
T(I) is some atom Y{J} such that J ∈ D . The second Also,thereisapossibilitytoextractthegeneraldefinition
Y
of the source function for a given read in a program. We 9. CONCLUSION
start from the term R{i1,...,in} where i1,...,in are sym- OuraimwastoconvertaprogramP ofaspecificclassinto
bolic variables and proceed unfolding the S-graph symboli- thedataflowcomputationmodel. Thusweneednotonlyto
callyintojusttheS-tree. Havingencounteredthepredicate build the exact and complete polyhedral model (which is a
node we insert the branching with symbolic predicate con- set of exact source functions for all reads in P), but also to
dition. HavingencounteredatermW{e1,...,em}forregu- invertitandthusobtaintheexactusesetfunctionforeach
lar assignment statement W we stop unfolding the branch. write in P. The latter form can be used as a program in
Having encountered a term for a blender node we unfold it a dataflow computation model. A prototype translator is
further - this way we avoid taking our artificial dummy as- implemented in Refal-6 [7]. It admits an arbitrary WDP.
signmentsasasource. Proceedingthiswaywewillgenerate TheworkwassupportedbyRussianAcademyofSciences
apossiblyinfiniteS-tree(withpredicatevertices)represent- Presidium Program for Fundamental Research ”Fundamen-
ingthesourcefunctioninquestion. Ifwe’reluckytheS-tree tal Problems of System Programming”in 2009–2014.
will be finite. It seems like in [1, 5] the exact result (in the
samesense)isproducedonlywhentheaboveprocessyields
10. REFERENCES
a finite S-tree.
ButwegetagoodresultevenwhenthegeneratedS-treeis [1] Denis Barthou, Jean-Francois Collard, and Paul
infinite(thisisthecaseinexamplesMaxandBubble). Using Feautrier. Fuzzy array dataflow analysis. J. Parallel
a technique like supercompilation [13] it is possible to fold Distrib. Comput., 40(2):210–226, 1997.
the infinite S-tree into a finite cyclic graph. [2] Paul Feautrier. Parametric integer programming.
RAIRO Recherche Op´eration-nelle, 22:243–268, 1988.
8. RELATEDWORK [3] Paul Feautrier. Dataflow analysis of array and scalar
references. International Journal of Parallel
Thefoundationsofdataflowanalysisforarrayshavebeen
Programming, 20(1):23–53, 1991.
well established in the 90-s by Feautrier [2, 3], Pugh[11],
[4] Paul Feautrier. Array dataflow analysis. In S. Pande
Maslov[10] and others. Their methods use the Omega and
and D. Agrawal, editors, Compiler Optimizations for
PIPlibrariesandyieldanexactdependencerelationforany
Scalable Parallel Systems, volume 1808 of LNCS,
pair of read and write references in affine program. Thus,
pages 173–219. Springer-Verlag New York, Inc., New
our work adds almost nothing for the affine case (besides
York, NY, USA, 2001.
producing a program in the dataflow computation model).
[5] Martin Griebl. Automatic parallelization of loop
However, for non-affine conditions, the state-of-the-art is
programs for distributed memory architectures.
generally a fuzzy solution [1], in which the source function
Habilitation thesis, Department of Informatics and
produces a set of possible sources. The authors claim that
Mathematics, University of Passau, 2004.
nothing more can be done. But all depends on the form we
want to see the result in. Sometimes one may be satisfied [6] S.A. Guda. Operations on the tree representations of
with the source function expressed in the form of a finite piecewise quasi-affine functions. ”Informatika i ee
quastextendedwithpredicatevertices. Thenwhynotallow primeneniya”, applications), 7(1):58–69, 2013.
a bit more general form - a S-graph with predicate nodes, [7] Arkady Klimov. Refal-6. URL:
or SRE? The main thing is that it was good for something. http://refal.net/~arklimov/refal6/index.html, 2004.
Known translations from WDP to KPN [12] usually rely [8] Arkady Klimov. Transforming affine nested loop
on FADA and seem to succeed only when FADA succeeds programs to dataflow computation model. In Ershov
tobeexact(judgingbytheexamplesused). Itisinteresting Informatics Conference, PSI’11, Preliminary
what and how they do with the Bubble Sort in Fig.10. Proceedings, pages 274–285, Novosibirsk, Russia, 2011.
Our base affine machinery of building the exact PM also [9] Arkady V. Klimov. Construction of exact polyhedral
differs. Whileitiscommontoconsidereachread-writepair model for affine programs with data dependent
separately and then combine the results, our method first conditions. In Proc. of the Forth International
produces effects and states using only writes, and then re- Valentine Turchin Workshop on Metacomputation,
solves each read against the respective state. It is interest- META’14, pages 136–160. University of Pereslavl,
ing to compare our effect/state building process with that 2014. URL: http://meta2014.pereslavl.ru/papers/.
ofbackwardtraversingthecontrolflowgraph[5]. Bothpro- [10] Vadim Maslov. Lazy array data-flow dependence
cessesmovealongthesamepathbutinoppositedirections. analysis. In POPL’94, pages 311–325. ACM, 1994.
Authorsusuallyargueformovingbackwardnotingthatthe [11] William Pugh. The Omega test: a fast and practical
process can stop when the total source is found (cf. also integer programming algorithm for dependence
[10]). We hope to obtain the same effect just implementing analysis. In Joanne L. Martin, editor, SC, pages 4–13.
our algorithm in a lazy language. Then the tree T will not IEEE Computer Society / ACM, 1991.
be built at all in calls like Seq(T,t), where t is a term.
[12] T. Stefanov. Converting Weakly Dynamic Programs to
Inprinciple,thewaywedealwithnon-affineconditionals
Equivalent Process Network Specifications. Phd thesis,
canbe reformulatedas follows: (1)pushall dynamicifsto
Leiden University, The Netherlands, 2004.
theinnermostlevel;(2)addelsepartswhichjustassignthe
[13] Valentin F. Turchin. Program transformation by
existing value to the same variable; (3) do FADA, identify-
supercompilation. In Harald Ganzinger and Neil D.
ing different copies of each predicate value; (4) collect the
Jones, editors, Programs as Data Objects, volume 217
resultingexactsourcefunctionsasS-graph,orSRE;(5)use
of LNCS, pages 257–281. Springer, 1985.
this SRE as recursive definition of true source (passing by
alldummyassignmentsintroducedatitem2). Iamgrateful
totheIMPACTreviewersforpointingoutthisrelationship.