Table Of ContentA Symbolic Execution Algorithm
for Constraint-Based Testing
of Database Programs
∗
Michaël Marcozzi Wim Vanhoof Jean-Luc Hainaut
FacultyofComputerScience FacultyofComputerScience FacultyofComputerScience
UniversityofNamur UniversityofNamur UniversityofNamur
5 RueGrandgagnage,21 RueGrandgagnage,21 RueGrandgagnage,21
1 Namur,Belgium Namur,Belgium Namur,Belgium
0 mmr@info.fundp.ac.be wva@info.fundp.ac.be jlh@info.fundp.ac.be
2
n ABSTRACT General Terms
a
J Inso-calledconstraint-basedtesting,symbolicexecutionisa Algorithms, Languages, Reliability
commontechniqueusedasapartoftheprocesstogenerate
3
2 testdataforimperativeprograms. Databasesareubiquitous Keywords
insoftwareandtestingofprogramsmanipulatingdatabases
Constraints-Based Testing, Test Inputs Generation, Sym-
] is thus essential to enhance the reliability of software. This
bolic Execution, Database Applications, Relational Cons-
E work proposes and evaluates experimentally a symbolic ex-
traints, Alloy
S ecution algorithm for constraint-based testing of database
. programs. First, we describe SimpleDB, a formal language
s 1. INTRODUCTION
c which offers a minimal and well-defined syntax and seman-
[ tics, to model common interaction scenarios between pro- Testing[16,17]isacomplexprocesswhichconstitutesthe
grams and databases. Secondly, we detail the proposed al- primaryapproachtoimprovethereliabilityofsoftware,mo-
1 gorithm for symbolic execution of SimpleDB models. This tivating [31] much research to develop relevant automated
v algorithm considers a SimpleDB program as a sequence of approaches to test the quality of all aspects of software. In
1 operations over a set of relational variables, modeling both thiswork,weconsidertheautomationoftestinputsgenera-
2
the database tables and the program variables. By inte- tion for functional unit testing of database-driven software.
8
grating this relational model of the program with classical Manyadvancedtechniqueshavebeendevelopedsofar(see
5
static symbolic execution, the algorithm can generate a set e.g. [39]) to automate the generation of adequate test data
0
ofpathconstraintsforanyfinitepathtotestinthecontrol- for proper testing of units of code, with regard to their ex-
.
1 flowgraphoftheprogram. Solutionsoftheseconstraintsare pectedfunctions,andindependentlyofanyinteractionwith
0 test inputsfor the program, including an initial content for adatabase. Manyapproacheshavealsobeenproposed(e.g.
5 the database. When the program is executed with respect [1,7,19,27,36])toautomatethegenerationoftestdatabase
1 to these inputs, it is guaranteed to follow the path with re- contents,tobeusedfortestingdatabasesanddatabaseman-
v: spect to which the constraints were generated. Finally, the agementsystems(DBMS),independentlyofthedata-flowin
i algorithm is evaluated experimentally using representative theprogramsinteractingwiththedatabaseorDBMSunder
X SimpleDB models. test. Databases are nowadays ubiquitous in software and
r many units of code interact intensively with a large, per-
a Categories andSubject Descriptors sistent and highly-structured relational database [6]. But
barely few works (see [3, 10, 22]) have studied how to au-
D.2.5 [Software Engineering]: Testing and Debugging— tomate the generation of test inputs for such database pro-
Code inspections and walk-throughs; F.4.1 [Mathematical grams, totest thecorrect interaction betweenthecodeand
Logicandformallanguages]: MathematicalLogic—Logic thedatabase.
and constraint programming; H.2.0 [Database Manage-
ment]: General—Security, integrity, and protection Inthiswork,weproposeatechniquetogeneratesimulta-
∗ neously input database contents and program input values,
F.R.S.-FNRSResearch Fellow
inordertotestanimperativeprograminteractingwithare-
lational database through SQL statements. The technique
adopts a white-box and structural testing approach. Given
Permission tomake digital orhardcopies ofall orpartofthis workfor a finite set of execution paths in the control flow graph of
personalorclassroomuseisgrantedwithoutfeeprovidedthatcopiesare the program, satisfying a given code coverage criterion, it
notmadeordistributedforprofitorcommercialadvantageandthatcopies generates, for each path in the set, test inputs leading to
bearthisnoticeandthefullcitationonthefirstpage.Tocopyotherwise,to the execution of this path, using static forward symbolic
republish,topostonserversortoredistributetolists,requirespriorspecific
execution.
permissionand/orafee.
Such symbolic execution was introduced by King [20],
WorkingPaperUniversityofNamur
Copyright20XXACMX-XXXXX-XX-X/XX/XX...$15.00. Clarke [4] and DeMillo & Offut[9] and has been advocated
in many constraint-based test data generation techniques tion paths, including those that may lead to an erroneous
(e.g. [2, 8, 12, 26, 28]), in the context of programs having interaction between theprogram and thedatabase.
no interaction with a database. Given a path through the Secondly,weintroduce,detailandevaluateacompleteal-
programcodetobetested,symbolicexecutionbuildsasetof gorithmbasedonsymbolicexecutiontogeneratetestinputs
pathconstraintsovertheprograminputs. Theseconstraints for SimpleDB models. This algorithm extends, concretizes
aresuchthatwhentheprogramisexecutedwithrespect to and permits experimentation to validate the raw strategy
inputvaluessatisfyingthem,theexecutionisguaranteedto that we suggested in [22]: given a SimpleDB model and a
follow the path to be tested. In order to build these con- pathintheprogramdescribedbythismodel,thealgorithm
straints, symbolic execution considers the static single as- generatesthecorrespondingrelationalinputconstraintssys-
signment form of the program and it expresses the control temintheAlloylanguage[14]. Aprototypeofthealgorithm
dependenciesimposedbytheexecutionoftheparticularex- hasbeenevaluatedusingsampleSimpleDBmodels,andthe
ecution path tobe tested. generated constraints have been solved using theAlloy An-
Thetechniqueproposed in thiswork adaptssymbolic ex- alyzer [14], showing promising results.
ecution to the particular case of database programs. The
core strategy of the technique is, as we proposed in [22], to The remainder of this paper is organized as follows. Sec-
modeleveryvariableoftheprogramandeverytable(which tion2detailsthesyntaxandsemanticsoftheSimpleDBlan-
is,fundamentally,arelation)inthedatabaseasarelational guage. In section 3, we describe and illustrate thesymbolic
variablecontaininga mathematical relation oversimpledo- executionalgorithm,abletogenerateAlloyinputconstraints
mains, like integers. Each statement in the program, in- for any finite execution path in any SimpleDB model. We
cluding both imperative and SQL statements, can then be provide experimental results in section 4, showing the effi-
modeled as a simple operation over these relational vari- ciencyofthealgorithmoverseveralsampleSimpleDBmod-
ables. By applying the classical static symbolic execution els. Finally, some conclusions, related and future work are
mechanism over this relational version of the program, we provided in section 5.
can derive a set of path constraints over the program in-
putvariables. Thegenerated constraintsare hererelational 2. SIMPLEDB: A MINIMALISTSYNTAX
constraints and the input variables refer both to the classi-
ANDSEMANTICSFORDBPROGRAMS
cal inputs of the program and to the content of each of the
database tables at theprogram start. In thissection, we detail thesyntaxand semantics of the
Assomeoftherelationalvariablesmanipulatedbythere- SimpleDBlanguageusingastepbystepapproach. Foreach
lational version of the program model tables in a database, step, a part of the BNF grammar, describing some of the
they must obey the constraints described by the relational syntactic constructs of thelanguage, is presented. For each
schema of this database, such as, for example, the primary syntacticconstruct,additionalsyntacticrulesareexplained,
key or foreign key constraints. The technique proposed in aswellastheconstruct’ssemantics. Themajorlanguagede-
this work expresses these schema constraints as relational signchoicesarediscussedifneeded. Thechosennotationfor
constraints as well, and the path and schema constraints the BNF grammar includes some additional meta-symbols:
can thenbecombined intoan uniqueinputconstraintssys- {...} (grouping), * (repetition zero or more times) and +
tem. Eachsolutiontothisrelationalconstraintssystemrep- (repetition one or more times). When a single nontermi-
resentsa test input,includingan initialstate for each table nal appears several times in a single production, subscript
in the database, with respect to which the program can be notation allows to distinguish between theoccurrences.
executed and is guaranteed to follow the execution path to A sample SimpleDB model is provided in figure 1. This
be tested. model defines a database with two tables: one for authors
and one for the theatrical plays these authors write. The
The two main contributions of this work can be summa- number of plays written by an author is stored for each
rized as follows. author. The model also defines a program manipulating
First, wepropose aformal language, called SimpleDB,to this database: the program adds a set of new plays to the
facilitate the formal analysis of database programs. Sim- database and updates the authors’ plays counts. If the au-
pleDB refines end extends the ImperDB language that we thor of an added play was not in the database, it is added
defined in [22], introducing database schema specification to the database as well. The plays are inserted one by one
within the language, transactions management, use of lists in isolated transactions.
asinputvariablesandafullydefinedsyntaxandsemantics.
2.1 Model
A SimpleDB model describes both an imperative program
to be tested and the schema of the relational database ma-
nipulated by this program. SimpleDB is a tiny formal lan- hmodeli::= MODELhidihdb-schemaihprogrami ENDMODEL
guage, offering only a minimal set of classical well-defined
hidi::= {a|... |z|A|... |Z}{a|... |z|A|... |Z|0|... |9}*
primitivesnecessary forbuildingdatabaseprograms. These
simplicityandformalnessallowtospecifyalgorithmsableto A SimpleDB model is given a name and details first the
automatetestingofdatabaseprogramsinasimple,fastand relationalschemaofthedatabaseandsubsequentlythecode
rigorousway. Butdespiteitssimplicity,SimpleDBallowsto of the program working on this database.
model a large and interesting part of the possible interac-
tion scenarios between a real program and a real database. 2.2 Databaseschema
Notably, SimpleDB proposes basic mechanisms for throw-
ing and catching exceptions. This is an important aspect hdb-schemai::= htablei*
of thelanguage, as it allows for a clean testing of all execu-
htablei::= TABLEhidi(hattribi+ hpr-keyihf-keyi*hconstri*);
Figure 1: SimpleDB model with program inserting plays and updating author’s count in a plays database
MODELexample
TABLEauthor(name,numberOfPlays,PRIMARYKEY(name),numberOfPlays>0);
TABLEplay(title,theAuthor,PRIMARYKEY(title),FOREIGNKEY(theAuthor)REFERENCESauthor);
COMMIT();
LOAD(newPlays);
WHILE(!(newPlays=NIL))DO
error =0;
READ(authorName);
authors=SELECTnameFROMauthor WHERE(name=authorName);
isEmpty=CATCH(NEXT(authors));
IF(isEmpty=1)THEN
INSERT INTOauthorVALUES(authorName,1);
ELSE
UPDATEauthorSETnumberOfPlays=(numberOfPlays+1)WHERE(name=authorName);
ENDIF;
error =CATCH(INSERT INTOplayVALUES(newPlays.HEAD,authorName));
IF(error=0)THEN
COMMIT();
ELSE
ROLLBACK();
ENDIF;
newPlays=newPlays.TAIL;
ENDWHILE;
COMMIT();
ENDMODEL
The relational database schema is a list of table defini- 2.3 Program code
tions. This list can be empty. In such a case, the program SimpleDBallowstowriteimperativeprogramsprocessing
works independently of any database. Each table is identi- integersandlistsofintegersandinteractingwithadatabase
fiedbyitsname,containsatleastoneattributeandendorses through transactions involving one or several simple SQL
exactly one primary key. Foreign keys and additional con- Select, Insert,Updateor Delete statements.
straints can be declared for a table. Semantics of schema
definition primitives in SimpleDB is the same as defined in hprogrami ::= COMMIT();hstmti*COMMIT();
theclassical SQL DDL semantics [11].
Thecodeoftheprogramisasequenceofstatementswhich
hattribi::= hidi,
starts and ends with a commit statement. Variables must
hpr-keyi::= PRIMARYKEY(hidi) not be declared in SimpleDB. A variable can be used in
hf-keyi::= ,FOREIGN KEY(hidiatt )REFERENCES hiditab all thestatements subsequentto its initialization througha
read, load or assignment (hidi= ...) statement. A variable
hconstri::= ,hidi{<|=|>}hnaturali
cannot be used outside of the code block where it was ini-
hnaturali ::= {1|... |9}{0|... |9}+ |{0|... |9} tialized. Thismeansthatavariableinitialized insidealoop
body/if branchcannot beused in thestatements outsideof
Each attributein a table is simply identified by its name
this loop body/if branch. A variable can be of three types:
andisofintegertype. Theexclusiveuseofintegervaluesin
integer, list of integers or table. The type of a variable is
SimpleDBmodelsdoesnotlimittheexpressivepowerofour
fixed by its initialization statement and any type change in
model of database programs since all other usual primitive
a subsequent statement will result in a compile-time error.
typessuchasbooleans, strings,andfloatingpointnumbers,
SQL statements semantics in SimpleDB is the same as de-
but also data structures such as sets, arrays and matrices,
fined bytheclassical SQL DML semantics [11].
caneasilybemappedtointegersand/orsimulatedusinglists
of integers. It does, however, make both the modeling and 2.3.1 Imperativestatementsandlistsmanagement
the use of a constraint solver conceptually simpler. These
last reasons explain why the examples of SimpleDB models
used in this work manipulate values which are not usually hst|mtWiH:I:=LEIhcFohncdoindDOiThsHtEmNthis*tmENtDi*WtHhIeLnEE;LSEhstmti*else ENDIF;
integers, likeauthor namesin theexample of table 1. | hidi=hexpri; (Assignmentofvariablehidi)
Amongtheattributesofatable,oneisdeclaredtobethe
hcondi::= TRUE (Logical truevalue)
primary key of the table. Anyattribute can be be declared | FALSE (Logicalfalsevalue)
tobeaforeign keyreferencinganothertableintheschema. | (hcondi1 {&&|||}hcondi2) (Logicalconjunction
An attribute can reference several different tables. A row anddisjunction)
in a table cannot be deleted or see its primary key value | (!hcondi) (Logical negation)
updated as long as there exists at least another row in the | (hint-expri1 {<|=|>}hint-expri2) (Arithmetic
comparison)
database that references it. Cycles in tables referencing are
| (hidi=NIL) (Listemptinesstestforlistinvariablehidi)
not allowed. Simple arithmetic constraints can be declared
over each of theattributesof a table. hexpri::= hint-expri|hlist-expri
hint-expri::= hidi (Integer-typed variable) statement will result in an exception to be thrown within
| hnaturali (Naturalnumber) theprogram. ANextstatementcanbewrappedinaCatch
| (hint-expri1 {+|-|*|/}hint-expri2) (Arithmetics) statement. The later will set an integer-typed variable to 1
| (-hint-expri) (Unaryminus)
if an exception has been thrown by thestatement it wraps,
| hidi.HEAD (Firstelementoflistinvariablehidi)
and to 0 otherwise. If an exception remains uncaught, the
hlist-expri::= hidi (Listofintegers-typedvariable)
programimmediatelyterminates,revealingapotentialfault
| NIL (Emptylist)
in thecodeor database design.
| [hint-expri,hlist-expri] (Appendingintegerhint-expri
Unlike the classical SQL semantics and in order to avoid
atthebeginningoflisthlist-expri)
anynon-deterministicbehavior,theSimpleDBsemanticsre-
| hidi.TAIL (Listinvariablehidiwithoutitsfirstelement)
quires that thetables returned by Select queriesare always
sorted by ascending order of the primary key attribute val-
SimpleDB allows to control the program flow using clas-
ues. Put simply, the following SimpleDB query over the
sical condition and loop statements. Classical variable as-
database in figure 1:
signment statement evaluates an integer or list of integers
expression and assigns the obtained value to a variable of SELECTname,numberOfPlays
the program. SimpleDB allows all basic logic operations in FROMauthor
if and while conditions, as well as arithmetic comparisons WHERE(numberOfPlays>10)
andlistemptinesstesting. SimpleDBallowsfullarithmetics
over integers and hasthebasic operations overlists of inte- is always equivalentto thefollowing SQL query:
gers(concatenationofanelementtoalist andselectingthe
SELECTname,numberOfPlays
head, respectively,tail of a list). Lists are immutable. FROMauthor
WHERE(numberOfPlays>10)
2.3.2 Interactingwiththeoutsideworld
ORDERBYname
hstmti::= READ(hidi); IntheWhereclauseofaSQLstatement,aswellasinthe
| LOAD(hidi); Set clause of an Update SQL statement, hidi in hint-expri
can represent both a program variable and an attribute of
Read(respectivelyLoad)statementassignsaninteger(re-
the read or written table. In case of potential ambiguity,
spectively list of integers) value from the outside world to
hidi always represents theattributein thetable.
one of the variables of the program. This models different
kinds of interaction between the program and the outside 2.3.4 Writingdataintothedatabase
world (except from the interaction with the database) such
asparametersreceivedfromacallingprogram,userprompt, hstmti::= hdb-writei;
network access, readingfrom a file, etc. | hidi=CATCH(hdb-writei);
2.3.3 Readingdatafromthedatabase hdb-writei::=INSERT INTOhidiVALUES({hint-exprii,}*hint-exprin)
| UPDATEhiditab SEThidiatt=hint-expriWHEREhdb-condi
| DELETEFROMhidiWHEREhdb-condi
hstmti::= hidi=SELECT{hidii,}*hidinFROMhiditabWHEREhdb-condi;
| NEXT(hidi);
SQLInsert,UpdateandDeletestatementsallow thepro-
| hidi=CATCH(NEXT(hidi));
gram to write data into the database. If the execution of a
hdb-condi::= TRUE
suchastatementprovokesaviolationofoneofthedatabase
| FALSE
schemaconstraints,thedatabaseremainsunmodifiedbythe
| (hdb-condi1 {&&|||}hdb-condi2) statement and an exception is thrown within the program.
| (!hdb-condi)
As with Next statements, exceptions either can be caught
| (hidi{<|=|>}hint-expri) (hidireferstoanattributeof
using a Catch statement or make the program terminate,
thetablebeingread/modified)
revealing a potential fault in thecode or database design.
In order to access database data within the program, a 2.3.5 Transactionsmanagement
SQLSelectquerymustbeprocessedoverthedatabasecon-
tent and thetable returned by this query must be assigned
hstmti::= COMMIT();
to a variable of the program. These two steps are executed
| ROLLBACK();
bythehidi=SELECT... statement. Subsequently,thetable
intheassignedvariablecanbeaccessedbytheprogram,but SQLtransactionsaremanagedthroughtheclassicalCom-
onlyonerowatatime,usingacursorpointingatthesingle mit and Rollback statements. A Commit statement makes
readable row of thetable. After assigning a table to a vari- permanentallthechangesmadetothedatabasebythepro-
able, the Next statement must be called over this variable gram duringthecurrenttransaction, closes thistransaction
to set thecursor in front of thefirst row of thetable in the and opens a new one. A Rollback statement restores the
variable. Every subsequent Next statement will move the databasetoitsstateat thestart of thecurrenttransaction,
cursoronerowahead. Theintegervalueofattributehidiatt closes this transaction and opens a new one. In any Sim-
in therow pointed to by thecursor of thetable assigned to pleDB program, a new transaction is started at program
variable hiditab can beaccessed usingthe following syntax: start, and all uncommitted changes are saved at program
end,using a Commit statement.
hint-expri::= hiditab(hidiatt)
Ifthecursorisinfrontofthelastrowofthetableorifthe
Selectqueryreturnedanemptytable,everycalltotheNext
3. AN ALGORITHM FOR SYMBOLIC EX- 3.3 Symbolicvariables andrelational
ECUTION OFSIMPLEDB PROGRAMS constraints generationrules
Inthissection,weillustratetheexecutionofthealgorithm
3.1 Inputs and outputs
over the sample SimpleDB model detailed in figure 1. We
The symbolic execution algorithm proposed herereceives detail each step of the symbolic execution process over the
as inputs a SimpleDB model and an execution path in the pathwheretheWhileloopisexecutedonce,theNextstate-
program definedwithinthismodel. Itproducesasoutputa mentthrowsan exception,theTHEN branchof boththeIf
relational constraintssystem, whosesolutions are suchthat statements are taken, and both the two Insert statements
when the program is executed with respect to any of these do not violate any database integrity constraint. At each
solutions, its execution will follow thegiven path. step,wepresenttherulesusedbyouralgorithmtogenerate
The execution path received as inputby our algorithm is Alloy symbolic variables and constraints. This step by step
supposed to be a path in the program’s control flow graph. rules description process allows us to introduce the whole
Inparticular,itdefineswhichbranchofeachoftheencoun- symbolic execution mechanism of thealgorithm.
tered If statements was taken, how many times the body
of each of the encountered loops was executed and which The algorithm always starts bygenerating symbolic vari-
of the encountered Next and hdb-writei statements threw ables and relational constraints for the database tables de-
an exception. Inthecaseof hdb-writei statementsthrowing fined within the model. For each table, an Alloy type is
exceptions, the path also specifies which database schema firstdefinedsothateverysymbolicvariablerepresentingthe
constraint caused the exception tobe thrown. content of the table will be of this type. Then a symbolic
The constraints system generated as output by our algo- variableis created torepresent theinitial content oftheta-
rithm allows to find values for the inputs of the input Sim- ble(1). Finally,constraintsaregeneratedtoenforceonthis
pleDB program. Inputs of a SimpleDB program are com- content the primary key (2) as well as all the foreign keys
posed of the content of each database table defined by the (3) and arithmetic constraints (4) defined in the table. For
model at program start, as well as all the values gathered themodeloftable1,thegenerated Alloycodeisasfollows.
from the outside world by the Read and Load statements The reader should note that symbolic variables will be al-
executed bythe program. ways created using the sig keyword followed by the name
of the variable and by its type. Here a symbolic variable
3.2 Algorithm principle
authorINPUTDB2 was defined to represent the initial con-
The principle of the algorithm is to perform a relational tentofthetableAUTHOR,andasymbolicvariableplayIN-
symbolic execution of the program path received as input. PUTDB1 wasdefinedtorepresenttheinitialcontentofthe
Each of the different values taken by the program variables tablePLAY.Relationalconstraintswillbealwaysgenerated
andbythedatabasetablesduringtheexecutionofthepath using thefact keyword.
is represented by a corresponding symbolic relational vari-
moduleexample// Name of the Alloy constraints model
able. First, symbolic execution generates constraints stat-
// STARTof Alloy type definition for table AUTHOR
ingthatthevariablescorrespondingtotheinitialcontentof
sig author{name:Int,numberOfPlays:Int}
eachtableinthedatabaseconform tothedatabaseschema. predequalauthor[a:author,b:author]
Then, symbolic execution analyzes one by one theprogram {a·name=b·name&&a·numberOfPlays=b·numberOfPlays}
statements executed by the path, in the order in which fact{all disj a,b: author | !equalauthor[a,b]}
the path specifies they are executed. Every time a state- // END of Alloy type definition for table AUTHOR
ment sets or changes the value of a program variable or of
(1) sig authorINPUTDB2inauthor{}
a database table, symbolic execution generates a new con-
(2) fact{all disj a,b:authorINPUTDB2|!((a·name=b·name))}
straint stating how the symbolic variable representing this (4) fact{all a: author| a·numberOfPlays>0}
new valuecan becomputed as a function of theother sym-
bolicvariables. Everytimeastatementoffersachoiceinthe // STARTof Alloy type definition for table PLAY
way it can be executed that depends on the values of the sig play{theAuthor:Int,title : Int}
predequalplay[a:play,b: play]
program variables and of the database tables (conditions,
{a·theAuthor =b·theAuthor &&a·title=b·title}
loops, next and hdb-writei statements), symbolic execution
fact{all disj a,b: play | !equalplay[a,b]}
generatesaconstraintoverthesymbolicvariablessuchthat // END of Alloy type definition for table PLAY
when the program is executed with respect to values satis-
fyingtheconstraint,theexecutionisguaranteedtotakethe (1) sig playINPUTDB1inplay{}
path underconsideration. (2) fact{all disj a,b: playINPUTDB1|!((a·title=b·title))}
(3) fact{all a: playINPUTDB1|
Onceall thestatements encountered along thepath have
oneb:authorINPUTDB2|a·theAuthor=b·name}
been analyzed, the set of relational constraints generated
during analysis can be solved to find values for the rela-
Thesecondstepexecutedbyouralgorithm istogenerate
tional symbolic variables that satisfy these constraints. If
arelational Alloytypeforlists, which arenotsupportedby
such a solution exists, the values of the symbolic variables
default in Alloy:
correspondingtotheprograminputsconstitutesometestin-
putdatathatwillguaranteetheexecutionoftheconsidered onesigNil{}
executionpath. Iftheconstraintshavenosolution,thenthe sig List {head:Int,tail : List + Nil}
considered path is infeasible.
The generated relational constraints are expressed using The algorithm can then proceed with symbolic execution
a widely used and well-documented language, offering good oftheprogramdefinedinthemodel. Thealgorithmconsid-
analysis tools, called Alloy [14]. erseachstatementonebyoneandfollows thepathreceived
as input. In the case of the example model and path con- 2. Call(s)toNextmadebeforeandnoexceptionthrown:
sidered in this section, the Load statement is symbolically
executed first. Symbolic execution for Load simply creates (a) No exception is thrown: Create a new symbolic
a new symbolic variable of typelist to represent the loaded variable to represent the content of the ”nexted”
value: SimpleDB variable. Generate a relational con-
straint stating that this new variable can be ob-
onesignewplaysINPUTPROG1inList+Nil{} tained from theold onebyremovingtheelement
with the lowest primary key value from the old
The second statement in the path is a While statement.
variable. Generate arelational constraint stating
As the path specifies that the loop body must be executed
that the newly created symbolic variable should
this time, a relational constraint is generated to specify
contain at least oneelement.
that the loop condition should be true. In this case, it
meansthatthecurrentcontent(representedbythesymbolic (b) An exception is thrown: Addarelationalconstraint
variablenewplaysINPUTPROG1)oftheSimpleDBvariable stating that the symbolic variable corresponding
newPlays should not betheempty list: to the current content of the ”nexted”SimbleDB
variable should contain exactly one element.
fact{!(newplaysINPUTPROG1=Nil)}
3. Last call to Next threwan exception:
Then the algorithm proceeds with symbolic execution of
the statements in the loop body, as specified within the in- (a) No exception is thrown: This is not allowed by
put path. The first statement is an Assignment statement. theSimpleDB semantics.
Symbolic execution for Assignment creates a new symbolic
(b) An exception is thrown: Do nothing.
variableofthecorrecttypetorepresentthenewvalueofthe
assigned variable and generates a constraint to specify that In the example considered here, the algorithm chose op-
this new symbolic variable contains the value that can be tion 1.b:
computed by evaluating the expression on the right of the
”=”symbol: (1) fact{#authorsINTERNALPROG2=0}
(2) onesigisemptyINTERNALPROG3inInt{}
onesigerrorINTERNALPROG1inInt{} (2) fact{isemptyINTERNALPROG3=1}
fact{errorINTERNALPROG1=0}
As the path specifies that the THEN branch of the IF
Symbolic execution for Read simply creates a new sym- statementmustbeexecutedthistime,arelationalconstraint
bolic variableof typeinteger to represent theread value: is generated to specify that the Ifcondition should be true.
In this case, in means that the current value (represented
onesigauthornameINPUTPROG2inInt{}
bythesymbolicvariableisemptyINTERNALPROG3)ofthe
SimpleDB variableisEmpty should beone:
SymbolicexecutionforSelectcreatesanewsymbolicvari-
ableofthetypeofthetableonwhichtheSelectqueryisexe-
fact{(isemptyINTERNALPROG3=1)}
cuted,torepresenttheSelectresulttable. Arelationalcon-
straint isthengeneratedtospecify thatarowispartofthe SymbolicexecutionforInsertcreatesanewsymbolicvari-
Selectresulttableifandonlyifitispartofthecurrentcon- able(1)forthenewcontentofthetableonwhichtheInsert
tent of the table on which the Select query is executed and statement is executed. Then a relational constraint is gen-
that it enforce the WHEREcondition of theSelect query: erated stating that this new variable can be obtained by
adding one row with the correct attribute values to the old
sig authorsINTERNALPROG2inauthor{}
fact{all e:author | (e in authorINPUTDB2 one (2). Finally, relational constraints are added to specify
&&(e·name=authornameINPUTPROG2)) that no constraint was violated duringinsert. In theexam-
⇔ e in authorsINTERNALPROG2} pleconsideredhere,arelationalconstraintisaddedtostate
that thereshould notbeanyrow in thepreviouscontentof
SymbolicexecutionforCatchfirstproceedswithsymbolic thetablewhoseprimarykeyvalueisthesameasintherow
execution of thestatement wrapped by theCatch and then inserted bythestatement (3):
actsasanAssignment(2)statement. Symbolicexecutionfor
Nextwilldependon thecursorstate(whichmustbestored (1) sig authorINTERNALDB1inauthor {}
bythealgorithm)ofthe”nexted”SimbleDBvariableandon (2) fact{onee:author|
authorINTERNALDB1=authorINPUTDB2+e
whether the Next statement should throw an exception or
&&e·numberOfPlays=1
not according tothe inputpath:
&&e·name=authornameINPUTPROG2}
(3) fact{noe:authorINPUTDB2|
1. No call to Nextmade before:
e·name=authornameINPUTPROG2}
(a) Noexception is thrown: Addarelationalconstraint
Here again, symbolic execution for Next statement first
stating that the symbolic variable corresponding
proceedswithsymbolicexecutionofthewrappedstatement
to the current content of the ”nexted”SimbleDB
and then acts as an Assignment statement (5). Symbolic
variableshould contain at least one element.
execution for this Insert statement acts identically as for
(b) Anexception is thrown: Addarelationalconstraint the previous Insert (1)(2)(3), but a relational constraint is
stating that the symbolic variable corresponding addedaswell,statingthatthenewlyinsertedrowreferences
to the current content of the ”nexted”SimbleDB an existing row in the current content of the Author table
variableshould contain noelement. (1) (4).
Table 1 details thesymbolic variables and relational con- 4. EXPERIMENTAL EVALUATION
straintsgeneratedbythealgorithmforeverypossiblebehav- Thealgorithmproposedinthisworkhasbeenprototyped
ior of an Insert, Update or Delete statement. Table 2 de- and evaluated experimentally using three SimpleDB test
scribestherulesusedbythealgorithmtotranslatebetween models. For each program, several execution paths have
SimpleDB and Alloy expressions and conditions. Table 3 beensymbolicallyexecutedbyouralgorithm. Foreachpath,
explains theabbreviations definedin table 1 and 2. weaskedtheAlloyanalyzer[14]tofindseveralsolutionsfor
the constraints generated by the algorithm. The evalua-
(1) sig playINTERNALDB2inplay{}
tion process was completed by checking that each of these
(2) fact{onee:play| playINTERNALDB2=playINPUTDB1+e
&&e·theAuthor =authornameINPUTPROG2 computedsolutions was actually inputdatawith respect to
&&e·title =newplaysINPUTPROG1·head} which the experimented program was guaranteed to follow
(3) fact{noe:playINPUTDB1| the selected path. Both the SimpleDB test models and the
e·title =newplaysINPUTPROG1·head} testedpathswerecarefully selectedtoofferagoodcoverage
(4) fact{onee:authorINTERNALDB1|
of the promised abilities of our algorithm. The Alloy ana-
e·name=authornameINPUTPROG2}
lyzer is a program which allows to solve Alloy constraints
(5) onesigerrorINTERNALPROG4inInt{}
(5) fact{errorINTERNALPROG4=0} in order to find structures that satisfy them. Basically, it
transforms the set of relational constraints into an equiva-
As the path specifies that the THEN branch of the IF lentsetofbooleanconstraints,andsolvesthemusingaSAT
statementmustbeexecutedthistime,arelationalconstraint solver. Themainstatisticsmeasuredduringtheexperimen-
is generated to specify that theIf condition should be true: tation process for each of the three models are synthesized
in table 4. The whole material used and produced during
fact{(errorINTERNALPROG4=0)} the experimentation process, including the source code of
the SimpleDB test models and the Alloy constraints gener-
Symbolic execution for Commit statements simply does atedbyouralgorithm,aswellasalltheperformance-related
nothing. Symbolic execution for Rollback statements tells informationgatheredduringtestscanbefoundontheweb1.
the algorithm to use the symbolic variable representing the The first SimpleDB test model contains eighty-five lines
contentofeachdatabasetablebeforethelastexecutedCom- of SimpleDB code that performs repeated manipulations of
mitstatement(savedbythealgorithm)torepresentthecur- integersandlistsusingassignment,IfandWhilestatements.
rent content of the table. First, twolists areloaded andtheirsizearecompared. The
SymbolicexecutionforAssignmentcreatesanewsymbolic program alsoreadsasmuchintegersfrom theoutsideworld
variableofthecorrecttypetorepresentthenewvalueofthe as the number of elements in the shortest of the two lists.
assigned variable and generates a constraint to specify that A third list is created using these integers in the inverted
this variable contains the value that can be computed by order of the one in which they were read. If the two first
evaluating theexpression on theright of the”=”symbol: lists have the same size, the elements of the three lists are
compared. Ifthesecondlistisaninvertedversionofthefirst
onesignewplaysINTERNALPROG5inList+Nil{}
one and if the third list is a copy of the first list where the
fact{newplaysINTERNALPROG5=newplaysINPUTPROG1·tail}
value of each element was doubled, then the three lists are
inserted into a database table. The database schema and
Asthepathspecifiesthattheloopbodymustnotbeexe-
theway thelists are inserted in thisdatabaseconstrain the
cutedanymore,arelationalconstraintisgeneratedtospec-
elementsin thefirst list tobedifferentfrom eachotherand
ify that theloop condition should be false.
their value to range between one and five. The eight paths
fact{!(!(newplaysINTERNALPROG5=Nil))} selected for this first test model were chosen so that only
input lists with particular size and content allow the path
As all the statements in the path havebeen symbolically tobefollowed duringprogram execution. Between eighteen
executed,thealgorithmstopsandreturnsthegeneratedAl- and one hundredand threesymbolic variables and between
loyconstraintsmodel. TheAlloyanalyzer[14]canbeasked thirtyandonehundredandfortyrelationalconstraintswere
to find a solution for these constraints using the following generated bythealgorithm for each of thetested paths.
commands: The second SimpleDB test model contains seventy five
lines of SimpleDB code that performs repeated reads and
assertinputsExist{!(newplaysINPUTPROG1inList+Nil writes in a database containing four tables that represent
&&authornameINPUTPROG2inInt customers making purchases of products. Customers with
&&authorINPUTDB2inauthor
fewpurchasesareprospectcustomers. First,theprogramin-
&&playINPUTDB1inplay)}
serts a new customer and new purchases intothe database.
checkinputsExist
Then it computes the total number and cost of the pur-
The solution returned by the Alloy analyzer for the sym- chases made by each customer, as well as the total number
bolicvariablesrepresentingtheinputvaluesoftheprogram of purchases for each product, and then updates the corre-
constitute input data (in this case, two empty initial con- spondingcustomerandproductattributes. Allunpurchased
tentsfortheauthorandplaytables,onelistcontainingonly products are deleted, and the name of the customers hav-
the integer 7 as input for the LOAD statement and the in- ing made no purchase is changed. Customers whose total
teger 7 as input for the READ statement) for the program count and cost of purchase is lower than two are registered
which guarantee that theconsidered path will beexecuted: as prospect customers. Finally, a product is replaced by
another one in every purchase details, and this product is
{authorINPUTDB2={}, playINPUTDB1={}, newplaysIN-
PUTPROG1=[7,NIL], authornameINPUTPROG2=7} 1See http://info.fundp.ac.be/~mmr/issta13
Table 1: Symbolic variables and relational constraints generation for hdb-writei statements
hdb-writei ::= if (noexception throwninpathforthisINSERT) {
INSERT INTO hidi sig freshAlloyVar in hidi {}
VhiAnLtU-eExSp(rhiini,t.-..e,xhipnrti-1e,x.p..r,in) fffaaacccttt {{{ ononnoeee:eea::lfhlkiodt∗yaiNb|a|mfreee·sf|hkAp∗ek·llpo=kyV=aalral=ollyaoOlyloOfy(hfN(inhaimtn-tee-x+epxerpi&rfki&p∗pokspeo)·sat)t}∗}//=//EaPlvlroeiyrmyOaffr(oyhrienkietg-ynexiksperynio∗its)v}niooltatveidolated
} else { if (Exceptionthrowninpath: insertedprimarykeyvaluealreadyexistsinthetable) {
fact { onee:alloyName| e·pk =alloyOf(hint-expripkpos)}
} else if (Exceptionthrowninpath: aninsertedforeignkeyvaluereferencesnorow) {
fact { noe:fktiab | e·fkpik =alloyOf(hint-exprifkpios) }
} else if (Exceptionthrowninpath: aninsertedattributeviolatesanarithmeticconstraint) {
fact {!(hint-expricopos coright)}} }
hdb-writei ::= if (noexception throwninpathforthisUPDATE) {
UPDATE hidi sig freshAlloyVar in hidi {}
SET hidiatt=hint-expri fact { all e:alloyName | (alloyOf(hdb-condi,hidi,e) &&(oney:freshAlloyVar|
WHERE hdb-condi ; y·att∗−{hidiatt}=e·att∗−{hidiatt} &&y·hidiatt =alloyOf(hint-expri,hidi,e)))
|| (!(alloyOf(hdb-condi,hidi,e)) &&(oney:freshAlloyVar|equalhidi[y,e])) }
fact { all y:freshAlloyVar | onee:alloyName| (alloyOf(hdb-condi,hidi,e) &&
y·att∗−{hidiatt}=e·att∗−{hidiatt} &&y·hidiatt =alloyOf(hint-expri,hidi,e)))
|| (!(alloyOf(hdb-condi,hidi,e)) &&(oney:freshAlloyVar|equalhidi[y,e])) }
fact { all disj a,b:freshAlloyVar | !(a·pk =b·pk) } // Primary key isnot violated
// If hidiatt =fki, updated rows should still reference a row
fact { all a:freshAlloyVar | oneb:fktiab | a·hidiatt =b·fkpik }
// If hidiatt = pk, none of the updated rows should have been referenced by another row
fact { all e:alloyName | nof:ifkt∗ab | (alloyOf(hdb-condi,hidi,e) &&e·hidiatt =f·ifka∗tt ) }
} else { if (Exceptionthrowninpath: updateonprimarykeyleadstoduplicateprimarykeys) {
fact { somedisja,b:alloyName| ((alloyOf(hdb-condi,hiditab,a)
&&alloyOf(hdb-condi,hiditab,b)&&(alloyOf(hint-expri,hidi,a)
=alloyOf(hint-expri,hidi,b))) || (!(alloyOf(hdb-condi,hiditab,a))
&&alloyOf(hdb-condi,hiditab,b)&&(a·hidiatt =alloyOf(hint-expri,hidi,b))) ) }
} else if (Exceptionthrowninpath: tryingtoupdatetheprimarykeyofareferencedrow) {
fact { somea:alloyName|alloyOf(hdb-condi,hiditab,a)
&&(someifktjab |(b·ifkajtt =a·hidiatt))}
} else if (Exceptionthrowninpath: updateonforeignkeyletrowwithoutrowtoreference) {
fact { somea:alloyName|alloyOf(hdb-condi,hiditab,a)
&&(nob:fktab |b·fkpk=alloyOf(hint-expri,hidi,a)) }
i i
} else if (Exceptionthrowninpath: aninsertedattributeviolatesanarithmeticconstraint) {
fact { somea:alloyName|alloyOf(hdb-condi,hiditab,a)
&&!(alloyOf(hint-expri,hidi,a))coright)}} }
i
hdb-writei ::= if (noexception throwninpathforthisDELETE) {
DELETE FROM hidi sig freshAlloyVar in hidi {}
WHERE hdb-condi ; fact {freshAlloyVar =alloyName−{e:alloyName| alloyOf(hdb-condi,hiditab,e)}}
// Not tryingto delete a referenced row
fact {all e:alloyName | nof:ifktjab | (alloyOf(hdb-condi,hiditab,e) &&e·pk=f·ifkajtt)}
} else { // Tryingto delete a referenced row
fact {somee:alloyName|alloyOf(hdb-condi,hiditab,e) &&(somef:ifktjab |e·pk=f·ifkajtt) } } }
Table 2: Translation of SimpleDB expressions and conditions into Alloy
Parameters alloyOf(Paramaters)
hidi alloyName
n∈N n
(hint-expri1 {+|-|*|/} hint-expri2) (alloyOf(hint-expri1)).{add | sub | mul| div}[alloyOf(hint-expri2)]
( - hint-expri ) (- (alloyOf(hint-expri)))
hidi.HEAD alloyName.head
hiditab(hidiatt) {a:alloyNametab | all b:alloyNametab | a.pktab<=b.pktab}.hidiatt
NIL Nil
hidi.TAIL alloyName.tail
TRUE (0=0)
FALSE (0=1)
(hcondi1 {&&|||} hcondi2) (alloyOf(hcondi1) {&&|||} alloyOf(hcondi2))
( ! hcondi ) (!(alloyOf(hcondi))
(hint-expri1 {<|=|>} hint-expri2) ((alloyOf(hint-expri1)) {<|=|>} (alloyOf(hint-expri2)))
(hidi=NIL) (alloyName=Nil)
alloyOf(x,y,z) behaves in a similar way to alloyOf(x) except in the two following cases
hidi, table, row if (tablecontains hidi) then row.hidi elsealloyName
(hidi{<|=|>} hint-expri), table, row (row.hidi {<|=|>} (alloyOf(hint-expri,table,row)))
Table 3: Abbreviations list and details
Abbreviation Meaning
freshAlloyVar A newAlloy variable namethat has still not been used in theAlloy code generated so far.
alloyNamesuperscript if (hidisuperscript refers to a database table name) then The symbolic variable that represents the
subscript subscript
currentcontentoftablehidisuperscript)elseThesymbolicvariablethatrepresentsthecurrentcontent
subscript
of theSimpleDBvariable hidisuperscript
subscript
atti Nameof the ith attributein thelist of attributesof table hidi
pksuperscript Nameof the primary keyattributeof table hidisuperscript.
subscript subscript
pkpos Position of primary keyin thelist of attributesof table hidi
fktab Nameof the tablereferenced by thei foreign keyin thelist of foreign keysof table hidi
i th
fkpk Nameoftheprimarykeyattributeofthetablereferencedbythei foreign keyinthelist offoreign
i th
keysof table hidi
fkpos Position of the foreign key attribute, declared by the i foreign key in the list of table hidi, in the
i th
list of attributesof tablehidi
fki Nameof the foreign key attributedeclared bytheith foreign key in thelist of tablehidi
ifktab Nameof the tablewhere is declared thei foreign keyreferencing table hidiin thewhole schema
i th
ifkatt Nameoftheforeignkeyattributedeclaredbythei foreignkeyreferencingtablehidiintheschema
i th
copos Position of theattributeconstrained by thei arithmetic constraint declared in table hidi
i th
coright Right part of thei arithmetic constraint declared in table hidi(i.e. right part of ”a>0”is ”>0”)
i th
xx∗ means ”for each xxi”and xx∗−{y} means”for each xxi except from y”
Table 4: Experimental evaluation statistics
Symbolic variables Relational constraints Constraints solving time
Model Code lines Tested paths
Min Max Min Max Min Max
1 85 8 18 103 30 140 132 ms 4,620 ms
2 65 8 4 47 17 131 309 ms 262,320 ms
3 71 3 21 47 35 76 118 ms 2,215 ms
deleted. The eight paths experimented over this program program, thereby inserting new variables representing the
werecarefullyselectedtotestthegeneration ofcorrectcon- databasestructure,andtranslatingall SQLstatementsand
straintsformostpossiblebehaviorsforSQLstatementsover integrity checks into imperative program code. Classical
differentkindsoftablesstructuresandconstraints. Between white-box testing approaches can then be applied to the
four and forty seven symbolic variables and between seven- modified program. In [10], the authors propose an algo-
teen and one hundred and thirty one relational constraints rithm for testing an imperative program performing SE-
weregeneratedbythealgorithmforeachofthetestedpaths. LECT queries on a relational SQL database, based on a
ThethirdSimpleDBtestmodelcontainsseventyonelines simultaneous concrete and symbolic (concolic) dynamicex-
of SimpleDB code that mixes SQL statements with imper- ecution of some of its execution paths. Concolic execution
ative code and uses SQL transactions. The database con- runstheprogramonrandominputdataandonarandomly
tainstwotablesthatrepresentauthorswritingtheaterplays. populated input database. Given the dynamic exploration
The code contains two transactions. During a first transac- ofanexecutionpathoftheprogram,theauthorsmodeland
tion, some authors are added and some removed from the solvetheproblemoffindingotherinputs,allowingtoexplore
database. During a second transaction, plays are added for dynamically anotherexecutionpath,asa set of integerand
the previously added authors, and some statistics are com- stringconstraintsoverthequantityandfieldcontentsofthe
puted for each author. If a database schema constraint is records in the database and over the input variables of the
violatedbyaSQLstatementinoneofthetwotransactions, tested program. These constraints must be combined with
this whole transaction is cancelled and thedatabase is roll- the constraints derived from the database schema. In [34],
backed to state it was when the transaction was launched. authors adopt a similar concolic approach where the pro-
The selected paths for this third test model focus on the gramisexecutedonaparameterizedmockdatabase. In[21]
transaction management and contain a path in which both and[30],authorsadaptthisapproachtotestingofprograms
transactions are commited, one in which the first transac- runningonanexistingdatabase,sothatinputtestdatacan
tion is committed but the second is rolled back due to a beselectedinthisdatabaseinsteadofbeinggeneratedfrom
foreign-key constraint violation, and a third one in which scratch. In [29], the same concolic approach is applied con-
both transactions are partially executed but rolled back, sideringadvancedcodecoveragecriteria. Comparedtoallof
also due to a foreign-key constraint violation in each of the these approaches, our approach does not need to transform
transactions. Between twenty one and forty seven symbolic theoriginalprogram,offersacleanmodelingoftheproblem
variables and between thirty five and seventy six relational as a single relational constraints system generation prob-
constraints were generated for each of the tested paths. lem, and allows to account for Insert, Update, and Delete
For each of the tested paths among the three examples, statements, as well as transactions management primitives,
everysolutionoftheconstraintsgeneratedbyouralgorithm that are commonly used in database applications. On the
provided a correct set of program inputs leading to the ex- other hand, our approach only considers static SQL where
ecution of the path. Concerning the infeasible path of the theconcoliconesallowtoaccountfordynamicSQL.Finally,
second test model, the Alloy analyzer was able to detect translation between database schemas/programs and Alloy
thatthegeneratedrelationalconstraintswerenotsatisfiable modelshasalreadybeenconsideredinothercontexts[5,15].
and did not proceed with SAT solving. Performance of the In future work, we intend to make our technique able to
constraint solvingprocess wasacceptablein mostcases, ex- generate inputs for more complex interaction scenarios be-
cept for the longest paths involving many SQL statements, tween databases and programs. First, it would be relevant
whereittookuptofourminutestosolvetheconstraintson to evaluate how and up to which extent the symbolic exe-
a recent dual core x86 processor with 8GB of memory. cutionmechanismproposedhereforsimpleSQLstatements
and simple relational database schemas can be generalized
5. CONCLUSIONAND RELATED WORK to more elaborate ones. Secondly, it should be investigated
how dynamic SQL can be integrated with our approach,
Inthiswork,wehaveproposedanddetailedacompleteal-
possibly relying on static analysis [37, 35] or on concolic
gorithm to execute symbolically database programs. Given
execution. Thirdly, it happens frequently that SQL state-
adatabaseprogramandanexecutionpathinthisprogram,
mentshaveanon-deterministicbehavior,eitherbecausethe
the algorithm uses static analysis to generate a symbolic
underlying DBMS executing the statement behaves non-
variableforeachpotentialvaluetakenbyaprogramvariable
deterministically, or because the database is modified con-
or database table before and during the path execution. It
currently by several programs. Whether and how the ap-
generates as well an Alloy relational constraints modelcon-
proachproposedherecanencompasssuchnon-deterministic
strainingthesesymbolicvariablestoguaranteetheexecution
behaviors remains a topicfor furtherresearch.
oftheconsideredpath. Anysolutiontotheseproducedcon-
Finally,ourapproachallowstobeusedwithrespecttoany
straints describes input data for the program, including an
classicalcodecoveragecriterionbasedonthenotionofanex-
initial content for the database, with respect to which the
ecutionpath. Nevertheless,severalworks[13,18,32,33,38]
program can be executed and is guaranteed to follow the
propose test adequacy criteria particularly adapted to the
considered execution path. Given a set of execution paths
testing of database-driven programs. Integrating such par-
totestinthedatabaseprogram,satisfyingagivencodecov-
ticular coverage criteria into our constraint-based approach
eragecriterion,theproposedalgorithmcanbeusedtogener-
is a topic of ongoing research.
ateinputsforeachpathintheset. Theseinputscanthenbe
used in turn as adequate test data for structural white-box
Thisworkisunpublishedworkpreliminaryto[25,23,24].
testing of the program.
An early approach that has considered test data gener-
6. ACKNOWLEDGMENTS
ation for imperative programs interacting with a relational
SQL database is [3]. The paper proposes to transform the This work has been funded by the Belgian Fund for Sci-