* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Download Recursive XML Schemas, Recursive XML Queries, and Relational
Oracle Database wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Concurrency control wikipedia , lookup
ContactPoint wikipedia , lookup
Clusterpoint wikipedia , lookup
Relational model wikipedia , lookup
Recursive XML Schemas, Recursive XML
Queries, and Relational Storage: XML-to-SQL
Query Translation
Krishnamurthy, R., Chakaravarthy, V.T., Kaushik, R., &
Naughton, J.F. (2004) Proceedings of the 20th
International Conference on Data Engineering
(ICDE’04)
Presenter:
Jochen Stoesser
mailto: jstoesse@connect.carleton.ca
Motivation (1)
• General topic: XML-to-SQL query translation
• 3 scenarios of XML shredding
schema-oblivious shredding
XML Publishing
schema-based shredding
April 5, 2005
Advanced Database Systems, Jochen Stoesser
2
Motivation (2): Schema Graph
• Tree schema graph (see example)
• directed acyclic (DAG)
schema graph
multiple incoming
edges
• recursive schema
graph
Source: Krishnamurthy et al. (2004)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
3
Motivation (3): Schema-based Shredding
• for each non-leaf node create separate
relation
Book(id, title)
Author(id, parentid)
Section(id, parentid,
parentcode, title)
Para(id, parentid)
Figure(id, parentid, caption, image)
• each leaf node is associated with
a column name
• parentid and parentcode to preserve structure
April 5, 2005
Advanced Database Systems, Jochen Stoesser
4
Motivation (4)
• Focus of the paper: XML-to-SQL query translation with
schema-based shredding, especially in the presence of
recursive XML query, e.g.
/book//section/title
// /descendant-or-self::node()/
recursive XML schemas
•
(>> example)
not solved by
existing schema-based shredding algorithms,
schema-oblivious shredding algorithms,
XML Publishing algorithms
April 5, 2005
Advanced Database Systems, Jochen Stoesser
5
Queries: SQL(p)
• example: retrieve nodes 9 (titles of subsections, i.e.
nodes 7)
/book/section/section/title
• path p = <book, section(4), section(7), title>
• SQL(p):
select
from
where
S2.title
Book B, Section S1, Section S2
B.id=S1.parentid and S1.parentcode=1 and
S1.id=S2.parentid and S2.parentcode=2;
>>
April 5, 2005
Advanced Database Systems, Jochen Stoesser
6
Queries: RtoL(l)
• root-to-leaf SQL query
• possibly multiple root-to-leaf paths p1, ..., pm to leaf l
• RtoL(l) := m
i = 1SQL(pi )
• retrieves all information that would be retrieved
traversing all paths from the root to leaf l
• Problem: recursive schema
possibly infinite number of root-to-leaf paths
RtoL query: union of infinitely many queries
April 5, 2005
Advanced Database Systems, Jochen Stoesser
*
7
Query Translation
• two stages:
1. Identify the paths in the XML schema graph that satisfy the
query: PathId stage
2. Use annotations (schemas) from XML-to-Relational mapping
to construct equivalent relational query: SQLGen stage
XML schema
PathId
Mapping
schema
SQLGen
SQL query
XML query
April 5, 2005
Advanced Database Systems, Jochen Stoesser
8
PathId Stage
• Problems:
recursive schema: number of paths possibly infinite
DAG graph: exponential number of paths
• General idea:
Represent matching paths as graph instead of
enumerating to reflect shared information across
multiple paths (will become important for SQLGen
stage)
execute query on a schema graph and identify
statisfying paths
April 5, 2005
Advanced Database Systems, Jochen Stoesser
9
PathId Stage
• Algorithm outline:
Take automaton AS representing schema graph S
Translate Query into DFA AQ
Create cross-product automaton ASQ from AS and AQ,
eliminate all dead states
ASQ contains all matching paths
View ASQ as mapping schema SSQ
>>
April 5, 2005
Advanced Database Systems, Jochen Stoesser
10
PathId Stage: XPath Semantics
Query: //section//title
Source: Krishnamurthy et al. (2004)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
11
SQLGen Stage
XML schema
PathId
Mapping
schema
SQLGen
XML query
• informally: union of all RtoL in mapping schema SSQ
corresponds to query result
• problem: DAG and recursive graphs & queries
DAG: number of paths can be exponential in the size of
the component
recursive graphs & queries: infinite number of matching
paths
April 5, 2005
Advanced Database Systems, Jochen Stoesser
12
SQLGen Stage
• Solution: SQL99 with-clause
• Used to create temporary relations for
• Nodes in DAG components
shared computation, reflects shared information
contained in paths through DAG components
decrease exponential to polynomial complexity!
• Recursive components
>>
April 5, 2005
Advanced Database Systems, Jochen Stoesser
13
SQLGen: Algorithm (1)
• Query: /E0//E10
c1
• Find mapping schema SSQ (S=SSQ)
c2
• strongly connected, non-recursive
components (i = Ei):
{0}, {1}, {2},{3}, {4}, {5}, {6},
{10}, {11}
• merge adjacent components ci
and cj if ci dominates cj
c3
c1 = {0, 1, 2, 3, 4, 5, 6}, c3 = {10, 11}
E11
• recursive component c2 = {7, 8, 9}
Source: Krishnamurthy et al. (2004)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
14
SQLGen: Algorithm (2), DAG components
• further process in top-down topological order
• c1 = {0, 1, 2, 3, 4, 5, 6} non-recursive DAG component
• create temporary relation for each non-root node that is
leaf node
has child or parent in different
component
multiple incoming/outgoing edges
( shared computation)
• N1 = {2, 3, 6}
>>
April 5, 2005
c1
Advanced Database Systems, Jochen Stoesser
15
SQLGen: Algorithm (3), DAG components
• for example for node E6 N1:
shared computation
with T6 as (
select R6.* from R4, T3 where
R4.id=R6.parentid and
T3.id=R4.parentid and
R6.parentcode=1
union all
select R6.* from R5, T3 where
R5.id=R6.parentid and
T3.id=R5.parentid and
R6.parentcode=2
)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
16
SQLGen: Algorithm (4), recursive
components
• c2 = {7, 8, 9} recursive component
• temporary relation TR, schema is
union of the schemas of nodes in TR
• two parts:
1. Initialization part
captures all incoming edges
2. Recursive part
captures recursion
>>
April 5, 2005
Advanced Database Systems, Jochen Stoesser
17
SQLGen: Algorithm (5), initialization
• incoming edges from other components:
(2, 8) and (3, 7)
shared computation
• Q1 = select R8.*, id(8) as schemanode from T2, R8
where R8.parentcode=2 and R8.parentid=T2.id
• Q2 = select R7.*, id(7) as schemanode from T3, R7
where R7.parentcode=3
and R7.parentid=T3.id
• Qinit = Q1 Q2
April 5, 2005
Advanced Database Systems, Jochen Stoesser
18
SQLGen: Algorithm (6), recursion
• edges within the recursive component c2:
(7, 9), (8, 7), (8, 9), (9, 8)
• construct recursive query for each of these edges, e.g.
Qe1=(7,9) = select R9.*, id(9) as schemanode from TR, R9
where TR.schemanode=id(7) and
R9.parentid=TR.id and R9.parentcode=1
• recursive part QR = jQ ej
TR = Qinit QR
• for each n c2: temporary relation
T(n) = select * from TR where schemanode=id(n)
(>> example)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
19
SQLGen: Final Query
• result of SQLGen stage:
temporary relations
-
T2, T3, T6 for c1
-
T7, T8, T9, and TR for c2
-
T10 and T11 for c3
Final query:
/E0//E10
algorithm select elemid from T11
April 5, 2005
Advanced Database Systems, Jochen Stoesser
20
Empirical Evaluation (1)
• XMark Benchmark schema
• Translation of query fragments that appear in query suite
• XML-to-Relational mapping schema has 101 nodes
• Size of cross-product schema in all cases < 100 nodes
• Result: Translation processes for each query < 6ms
April 5, 2005
Advanced Database Systems, Jochen Stoesser
21
Empirical Evaluation (2)
• test under extreme conditions
• all transitions on single label x
• query //x//x//x//x//x
• mapping schema complete graph (clique) of n nodes
• runtime of translation algorithm:
Source: Krishnamurthy
et al. (2004)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
22
Contributions
• Claim: „first to present a generic algorithm that translates
path expression queries to SQL in the presence of recursion
in the schema in the context of schema-based XML storage
shredding of XML into relations“
• Algorithm translates a path expression query into a single
SQL query, irrespective of the XML schema‘s complexity
• SQL query‘s size polynomial in size of the input XML-toRelational mapping and the XML query
April 5, 2005
Advanced Database Systems, Jochen Stoesser
23
Limitations
• High complexity of SQL query even for relatively easy XML
queries
• Although running time may be small, memory requirements
may be high due to many temporary relations
• Furthermore, although authors indicate solutions for XPath
semantics and branching path expression queries (e.g.
p1[p2 op value]), there is no proposition about increase in
complexity regarding runtime, memory requirements etc.
April 5, 2005
Advanced Database Systems, Jochen Stoesser
24
Q&A
?
April 5, 2005
Advanced Database Systems, Jochen Stoesser
25
References
• Krishnamurthy, R. (2004) XML-to-SQL Query Translation.
Dissertation at the University of Wisconsin-Madison. Retrieved at
March 18 from http://www.cs.wisc.edu/~sekar/research/main.pdf
• XPath v1.0. W3C. Retrieved at March 18 from
http://www.w3.org/TR/1999/REC-xpath-19991116.html
• Tian, F., DeWitt, D.J., Chen, J., & Zhang, C. The Design and
Performance Evaluation of Alternative XML Storage Strategies.
Retrieved at April 4 from
http://www.cs.wisc.edu/~czhang/doc/publications/feng6page.pdf
April 5, 2005
Advanced Database Systems, Jochen Stoesser
26
Appendix: Recursive XML Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
version="1.0">
<xs:element name="element">
<xs:complexType>
<xs:sequence>
<xs:any processContents="skip" minOccurs="0" />
<xs:element ref="element" minOccurs="0" />
</xs:sequence>
*
</xs:complexType>
</xs:element>
element
</xs:schema>
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
27
Appendix: SQL(path p)
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
28
Appendix: PathId(Q,S)
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
29
Appendix: SQLGen(SSQ)
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
30
Appendix: SQLFromDAG(c)
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
31
Appendix: SQLFromRecursive(c) (1)
April 5, 2005
Advanced Database Systems, Jochen Stoesser
32
Appendix: SQLFromRecursive(c) (2)
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
33
Appendix: TR example
<<
April 5, 2005
Advanced Database Systems, Jochen Stoesser
34