X-Data: Test Data Generation for Killing SQL Mutants
Bhanu Pratap Gupta
Devang Vira
S. Sudarshan
Dept. of Computer Science and Engineering, IIT Bombay
Complex SQL queries hard to get right
Question: How to check if an SQL query is
correct?
Formal verification is not applicable since we do
not have a separate specification and an
implementation
State of the art solution: Generate test databases
and check if the query gives the intended result
2
Automated Test Data generation
Based on database constraints, and SQL query
▪ Agenda [Chays et al., STVR04], a tool which generates test cases
for database applications which additionally uses user fed
heuristics
Ensuring query result is not empty
▪ Reverse Query Processing [Binnig et al., ICDE07] takes a desired
query output and generates relation instances
▪ Handle a subset of Select/Project/Join/GroupBy queries
None of the above guarantee anything about detecting errors in SQL
queries
Question: How do you model SQL errors?
Answer: Query Mutation
3
Mutant: Variation of the given query
Mutations model common programming errors, like
▪ Join used instead of outerjoin (or vice versa)
▪ Join/selection condition errors
▪ < vs. <=, missing or extra condition
▪ Wrong aggregate (min vs. max)
Mutant may be the intended query
4
Traditional use of mutation testing has been to check coverage of
dataset
Generate mutants of the original program by modifying the program in
a controlled manner
A dataset kills a mutant if query and the mutant give different results
on the dataset
A dataset is considered complete if it can kill all non-equivalent
mutants of the given query
Prior work:
Tuya and Suarez-Cabal [IST07], Chan et al. [QSIC05] defined a class of
SQL query mutations
Shortcoming: do not address test data generation
Our goal: generate a dataset for testing the query
Test dataset and query result on the dataset are shown to human, who
verifies that the query result is what is expected given this dataset
Note that we do not need to actually generate and execute mutants
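A minimal sketch of the "kills" definition above, using Python's built-in sqlite3 module. The table t(x), the helper names run and kills, and the min/max pair are illustrative assumptions, not part of the X-Data tool.

```python
import sqlite3

def run(query, rows):
    """Run `query` against a fresh in-memory table t(x) holding `rows`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in rows])
    result = sorted(conn.execute(query).fetchall())
    conn.close()
    return result

def kills(rows, query, mutant):
    """A dataset kills a mutant if query and mutant give different results on it."""
    return run(query, rows) != run(mutant, rows)

QUERY = "SELECT MIN(x) FROM t"
MUTANT = "SELECT MAX(x) FROM t"   # wrong-aggregate mutation (min vs. max)

print(kills([7], QUERY, MUTANT))     # one value: min equals max, mutant survives
print(kills([7, 9], QUERY, MUTANT))  # two distinct values: mutant is killed
```

The single-row dataset is small but does not distinguish the two queries, which is why the choice of test data matters.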
5
Address the problem of test data generation for killing
non-equivalent mutants
Equivalent mutants: r(A,B) ⋈ s(B,C) and r(A,B) ⟕ s(B,C), where
r.B is a foreign key to s and is not null, will always produce the
same result set
Define class of:
Join/outerjoin mutations
Selection predicate mutations
Algorithm for test data generation that kills all non-equivalent mutants in the above class
Under some simplifying assumptions (given in the paper)
With the guarantee that generated datasets are small and
realistic, to aid in human verification of results
6
Join type mutations: An occurrence of a join operator
(⋈, ⟕, ⟖, ⟗) is replaced by one of the other join
operators
Defining join mutations in SQL is complicated by the
absence of a particular join order
SELECT * FROM a,b,c WHERE (a.x = b.x) and (b.x = c.x)
We consider all relational algebra expressions (trees)
equivalent (under inner join reordering) to the given
SQL query
We consider join type mutations to single join nodes in
each tree above
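The single-node mutation step can be sketched as follows. The nested-tuple tree encoding, the operator names, and the function mutants are assumptions made for illustration. Only two of the trees equivalent to the example query are listed, and the enumeration of equivalent trees itself is not shown.

```python
JOIN_OPS = ("inner", "left_outer", "right_outer", "full_outer")

def mutants(tree):
    """Yield every tree that differs from `tree` in the operator of exactly one join node.

    A tree is either a relation name (str) or a tuple (op, left, right).
    """
    if isinstance(tree, str):
        return  # leaf: nothing to mutate
    op, left, right = tree
    # Mutate this node, keeping both subtrees unchanged.
    for other in JOIN_OPS:
        if other != op:
            yield (other, left, right)
    # Or keep this node and mutate exactly one node further down.
    for m in mutants(left):
        yield (op, m, right)
    for m in mutants(right):
        yield (op, left, m)

# Two trees equivalent to: SELECT * FROM a,b,c WHERE (a.x = b.x) and (b.x = c.x)
tree1 = ("inner", ("inner", "a", "b"), "c")
tree2 = ("inner", "a", ("inner", "b", "c"))

all_mutants = set(mutants(tree1)) | set(mutants(tree2))
print(len(all_mutants))  # 3 alternatives x 2 join nodes x 2 trees = 12
```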
7
Case I: Mutation at root node, with no foreign key
constraints
Schema: r(A), s(B)
To kill this mutant: ensure that for an r tuple there is no
matching s tuple
Generated test case: r(A)={(1)}; s(B)={}
Basic idea:
(a) run query on given database,
(b) from result extract matching tuples for r and s
(c) delete s tuple to ensure no matching tuple for r
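Steps (a) to (c) can be sketched with sqlite3 as below. The slide's query tree figure is not reproduced in this transcript, so the join condition r.A = s.B, the inner-join original, the left-outer-join mutant, and the starting database contents are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (A INTEGER);
    CREATE TABLE s (B INTEGER);
    INSERT INTO r VALUES (1), (2), (3);
    INSERT INTO s VALUES (2), (3), (4);
""")

INNER = "SELECT r.A, s.B FROM r JOIN s ON r.A = s.B ORDER BY r.A"
MUTANT = "SELECT r.A, s.B FROM r LEFT JOIN s ON r.A = s.B ORDER BY r.A"

# (a) run the query on the given database
rows = conn.execute(INNER).fetchall()
# (b) from the result, extract one pair of matching r and s tuples
a_val, b_val = rows[0]
conn.execute("DELETE FROM r WHERE A <> ?", (a_val,))
conn.execute("DELETE FROM s WHERE B <> ?", (b_val,))
# (c) delete the s tuple so the r tuple has no match
conn.execute("DELETE FROM s WHERE B = ?", (b_val,))

print(conn.execute(INNER).fetchall())   # []
print(conn.execute(MUTANT).fetchall())  # [(2, None)]
```

The resulting dataset has one r tuple and an empty s, the same shape as the generated test case on the slide, and the two queries disagree on it.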
8
Case II: Extra join above mutated node
Schema: r(A,B), s(C,D), t(E)
To kill this mutant we must ensure that for an r tuple there
is no matching s tuple, but there is a matching t tuple
Generated test case: r(A,B)={(1,2)}; s(C,D)={}; t(E)={(2)}
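The following sketch checks the Case II test case with sqlite3. The join conditions r.A = s.C and r.B = t.E and the choice of a left outer join as the mutant are assumptions, since the slide's query trees are not part of this transcript.

```python
import sqlite3

QUERY = "SELECT * FROM r JOIN s ON r.A = s.C JOIN t ON r.B = t.E"
# Mutant: the lower join (r with s) becomes a left outer join.
MUTANT = "SELECT * FROM r LEFT JOIN s ON r.A = s.C JOIN t ON r.B = t.E"

def results(r_rows, s_rows, t_rows):
    """Return (query result, mutant result) on the given dataset."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE r (A INTEGER, B INTEGER);
        CREATE TABLE s (C INTEGER, D INTEGER);
        CREATE TABLE t (E INTEGER);
    """)
    conn.executemany("INSERT INTO r VALUES (?, ?)", r_rows)
    conn.executemany("INSERT INTO s VALUES (?, ?)", s_rows)
    conn.executemany("INSERT INTO t VALUES (?)", t_rows)
    out = (conn.execute(QUERY).fetchall(), conn.execute(MUTANT).fetchall())
    conn.close()
    return out

# Generated test case from the slide: the mutant is killed.
print(results([(1, 2)], [], [(2,)]))  # ([], [(1, 2, None, None, 2)])

# Without the matching t tuple, the join above hides the difference.
print(results([(1, 2)], [], []))      # ([], [])
```

The second call shows why the t tuple is needed: the padded row produced by the outer join must survive the join above it to reach the result.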
9
Given join expression on relations r1, r2, …, rn
Create dataset where all relations have a set of matching tuples
For each relation ri, generate a dataset where rest of relations
match, but ri is empty
▪ Unless making ri empty makes join graph disconnected
Above procedure kills all join type mutations of given
inner join tree
Outer joins complicate picture when attributes are projected
out
▪ May have to make more than one ri empty at a time
Foreign keys may prevent making some ri empty
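A rough sketch of the inner-join part of this procedure: it only decides which relations get matching tuples in each dataset. It ignores outer joins, projections, and foreign keys, and the function names and the example join graph (taken from the student example later in the talk) are illustrative assumptions.

```python
def connected(nodes, edges):
    """True if the undirected graph restricted to `nodes` is connected."""
    nodes = set(nodes)
    if not nodes:
        return True
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        for a, b in edges:
            if a == n and b in nodes:
                stack.append(b)
            elif b == n and a in nodes:
                stack.append(a)
    return seen == nodes

def plan_datasets(relations, join_edges):
    """Return a list of sets; each set names the relations that get matching tuples."""
    plans = [set(relations)]  # dataset where all relations match
    for r in relations:
        rest = [x for x in relations if x != r]
        # Skip r if emptying it would disconnect the join graph.
        if connected(rest, join_edges):
            plans.append(set(rest))  # rest match, r is empty
    return plans

relations = ["student", "department", "program"]
edges = [("student", "department"), ("student", "program")]
for plan in plan_datasets(relations, edges):
    print(sorted(plan))
```

Here student is never emptied, because department and program are only connected through it.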
10
Case III: Mutation at root node with foreign key constraints and selection
on right side
Schema: r(A), s(B,C)
Foreign key: r.A → s.B
To kill this mutant we must create an s tuple which matches with the r
tuple on the foreign key reference, but which has s.C ≠ 4
Generated test case: r(A)={(2)}; s(B,C)={(2,5)}
Notion of valid nullable pattern defined in paper specifies which
relations can be made null/non-matching, given foreign key constraints
and join graph
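A sketch of Case III in sqlite3, with foreign key enforcement switched on. The selection s.C = 4 is inferred from the condition s.C ≠ 4 on the slide; applying it inside the join condition (so that it acts on s before the join) and using a left outer join as the mutant are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE s (B INTEGER PRIMARY KEY, C INTEGER);
    CREATE TABLE r (A INTEGER NOT NULL REFERENCES s(B));
    INSERT INTO s VALUES (2, 5);
    INSERT INTO r VALUES (2);
""")

QUERY = "SELECT r.A, s.B, s.C FROM r JOIN s ON r.A = s.B AND s.C = 4"
MUTANT = "SELECT r.A, s.B, s.C FROM r LEFT JOIN s ON r.A = s.B AND s.C = 4"

# The s tuple satisfies the foreign key but fails the selection,
# so r has no join partner and only the outer join keeps it.
print(conn.execute(QUERY).fetchall())   # []
print(conn.execute(MUTANT).fetchall())  # [(2, None, None)]

# An r tuple with no s tuple at all is rejected by the foreign key,
# which is why s cannot simply be left empty in this case.
try:
    conn.execute("INSERT INTO r VALUES (9)")
    fk_blocks_dangling = False
except sqlite3.IntegrityError:
    fk_blocks_dangling = True
print(fk_blocks_dangling)
```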
11
Implemented using Java and PostgreSQL
Creates datasets by extracting and modifying
tuples from given database
Currently handles join type mutation and
selection predicate mutation
For creating a merged dataset
▪ Tuples having same values for join attributes must be
blocked from being inserted again
Handling selection predicate mutation
▪ E.g., to distinguish r.A < 3 from r.A <= 3 we generate tuples
with r.A = 2 and 3
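The boundary-value idea for selection predicates can be checked with a few lines of sqlite3. The helper select_values and the single-column table r(A) are assumptions made for this sketch.

```python
import sqlite3

def select_values(predicate, values):
    """Return the A values of r(A) that satisfy `predicate`, in ascending order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE r (A INTEGER)")
    conn.executemany("INSERT INTO r VALUES (?)", [(v,) for v in values])
    rows = [a for (a,) in conn.execute(f"SELECT A FROM r WHERE {predicate} ORDER BY A")]
    conn.close()
    return rows

# With r.A = 2 and 3 the two predicates disagree on the boundary value 3.
print(select_values("A < 3", [2, 3]))   # [2]
print(select_values("A <= 3", [2, 3]))  # [2, 3]

# With r.A = 2 alone the mutant would survive.
print(select_values("A < 3", [2]) == select_values("A <= 3", [2]))  # True
```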
12
Ongoing work:
Synthetic data generation that takes database and
query constraints into account, which is non-trivial
▪ Idea (from RQP [Binnig et al., ICDE07]): Use a model
checker to generate data
▪ Under implementation using CVC3
Extend the technique to handle aggregations and
sub-queries
Future work: data generation for application
code with multiple queries
13
Questions
Problem: deciding whether Q is equivalent to a mutant Q' can be
reduced to query containment, and vice versa, in
polynomial time
The Chase algorithm can be used to generate datasets
to show that Q and Q' are not equivalent (for SPJ
queries and several extensions)
▪ Such a dataset would kill the mutant Q'
▪ Limited work on outerjoin containment data generation
However, we don't want to enumerate each mutant
and generate separate datasets
▪ Too expensive
15
Under the following conditions we can generate
merged datasets:
Tuples having same values for join attributes must be
blocked from being inserted again
The query must not contain any equality selection on
a unique key
The result of the query must contain one or more
attributes which together form a unique key for any
relation
Also, attributes from the result forming a unique key
must be guaranteed to be non-null in the result
16
Consider the three relations:
Student(rollno, name, deptcode, progcode),
Department(deptcode, deptname),
Program(progcode, progname)
And a query:
SELECT rollno, name, deptname, progname
FROM student s INNER JOIN department d
ON s.deptcode=d.deptcode INNER JOIN program p
ON s.progcode=p.progcode
17
[Figure: Query Tree 1, Query Tree 2, Query Tree 3]
Generate mutants by mutating join operator
of a single node for all above trees
18
Program
Progcode  Progname
0         B.Tech
1         M.Tech
2         PhD

Department
Deptcode  Deptname
CS        Computer
CH        Chemical
ME        Mechanical

Student
Rollno  Name      progcode  deptcode
501     Devang    1         CS
401     Abhijeet  0         CE
701     Sandeep   5         CH
101     Aditya    4         MA
Generated data shows:
A student (Devang) with valid program and department
A student (Abhijeet) with invalid department
A student (Sandeep) with invalid program
A student (Aditya) with invalid program and
department
A program (PhD) with no student
A department (Mechanical) with no student
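As a check on this dataset, the sketch below loads it into sqlite3 and compares the original query with two left-outer-join mutants. The column types are assumptions, and only two mutants are shown. Right and full outer join mutants are left out because SQLite only supports RIGHT and FULL JOIN from version 3.39. The program and department rows with no student are there to expose those mutants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE program (progcode INTEGER, progname TEXT);
    CREATE TABLE department (deptcode TEXT, deptname TEXT);
    CREATE TABLE student (rollno INTEGER, name TEXT, progcode INTEGER, deptcode TEXT);
    INSERT INTO program VALUES (0, 'B.Tech'), (1, 'M.Tech'), (2, 'PhD');
    INSERT INTO department VALUES ('CS', 'Computer'), ('CH', 'Chemical'), ('ME', 'Mechanical');
    INSERT INTO student VALUES (501, 'Devang', 1, 'CS'), (401, 'Abhijeet', 0, 'CE'),
                               (701, 'Sandeep', 5, 'CH'), (101, 'Aditya', 4, 'MA');
""")

COLS = "SELECT s.rollno, s.name, d.deptname, p.progname "
ORIGINAL = COLS + ("FROM student s JOIN department d ON s.deptcode = d.deptcode "
                   "JOIN program p ON s.progcode = p.progcode")
MUTANTS = {
    # student/department join mutated to a left outer join
    "left outer join on department": COLS + (
        "FROM student s LEFT JOIN department d ON s.deptcode = d.deptcode "
        "JOIN program p ON s.progcode = p.progcode"),
    # root join with program mutated to a left outer join
    "left outer join on program": COLS + (
        "FROM student s JOIN department d ON s.deptcode = d.deptcode "
        "LEFT JOIN program p ON s.progcode = p.progcode"),
}

original = set(conn.execute(ORIGINAL).fetchall())
extra = {name: set(conn.execute(sql).fetchall()) - original for name, sql in MUTANTS.items()}

print(original)  # only Devang has both a valid program and a valid department
for name, rows in extra.items():
    print(name, "killed" if rows else "survives", rows)
```

Abhijeet (invalid department) exposes the first mutant and Sandeep (invalid program) exposes the second.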
19
Program
Progcode  Progname
0         B.Tech
1         M.Tech

Department
Deptcode  Deptname
CS        Computer
EE        Electrical

Student
Rollno  Name    progcode  deptcode
501     Devang  1         CS
Foreign Keys are:
Student.progcode → Program.progcode
Student.deptcode → Department.deptcode
Generated data shows:
A student (Devang) with valid program and department
A program (B.Tech) with no student
A department (Electrical) with no student
20
Case of no foreign keys
21