Benchmarking XML storage systems
Information Systems Lab HS 2007
Final Presentation
© ETH Zürich | Benchmarking XML
11.02.08
Agenda
Project Overview
Motivation
Goal of the Project
Benchmark Overview
Results
RDBMS 1
Sedna
MonetDB
Benchmarking XML – Final Presentation
Motivation
Traditional DBMSs use the relational data model
Vendors extend their systems to process XML or build new native stores
XML processing is perceived to be slow
Benchmarks for XML are just being developed
Goal of the Project
Analyse and compare the performance of different systems for processing XML
Systems tested:
RDBMS1 – big player in the relational DBMS market, extended their product with XML capabilities
Sedna – free native XML DB designed to be a universal system for a wide range of XML applications
MonetDB – very fast compared to other XML DBs, but only supports a small part of the XQuery functions
Benchmark
Benchmark used: TPC-X
currently under development at ETH
models an Amazon-like online store in XML
complete database is one XML file
e.g.: users with history, products with comments
complex queries that put stress on query engine
RDBMS1
Impression of the System
Almost all queries work with few changes
Update queries were surprisingly easy to adapt
Impression of the System (contd.)
Not supported:
typeswitch (limited schema support)
user-defined functions
Current Performance
Data mining queries: about one order of magnitude slower than Sedna
Update and search queries: seem a bit faster (but still slower than the other systems)
Tuning possibilities
Any XPath expression can be indexed
Indexes seem to be based on rows rather than on trees
Issue with Indexing
Indexes help only with "split" tables, but they are slower in general
Issues
"When the only tool you own is a hammer, every problem begins to resemble a nail."
Abraham Maslow
Issues with Joins
Only nested-loops joins are available
Indexes are not used as soon as a join is needed
Joins are used for almost anything
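Why this combination hurts can be shown with a minimal Python sketch. It is not RDBMS1's implementation: the toy data, the field names and both helper functions are assumptions made purely for illustration. It contrasts a nested-loops join, which compares every pair of rows, with a join that probes a lookup table standing in for an index.

```python
from collections import defaultdict


def nested_loop_join(orders, users):
    """Compare every order with every user: len(orders) * len(users) steps."""
    comparisons = 0
    result = []
    for order in orders:
        for user in users:
            comparisons += 1
            if order["user_id"] == user["id"]:
                result.append((user["name"], order["item"]))
    return result, comparisons


def index_join(orders, users):
    """Build a lookup table on users once, then probe it once per order."""
    by_id = defaultdict(list)
    for user in users:
        by_id[user["id"]].append(user)
    probes = 0
    result = []
    for order in orders:
        probes += 1
        for user in by_id.get(order["user_id"], []):
            result.append((user["name"], order["item"]))
    return result, probes


# Hypothetical toy data, loosely modelled on users with an order history.
users = [{"id": i, "name": f"user{i}"} for i in range(1000)]
orders = [{"user_id": i % 1000, "item": f"item{i}"} for i in range(2000)]

slow, comparisons = nested_loop_join(orders, users)
fast, probes = index_join(orders, users)
print("nested loops comparisons:", comparisons)
print("index probes:", probes)
```

Both functions return the same pairs; only the amount of work differs, and the gap widens as the inputs grow.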
Summary
Almost anything works (even the adapter for XCheck!)
Everything is slow
Conclusion
RDBMS1 is not suited for the TPC-X benchmark
XML storage works as an improvement for relational data, but not as a stand-alone system
Sedna
Overview
Free native XML Database
No Schema support
Bulk-Load (native XML data storage)
Document Collections
Indexing
Full-Text indexing (dtSearch)
Impression
Good Introduction Example
Little Reference Material
Active Development Team
XQuery Support
Most of the queries worked with a few changes
Not supported:
Schema Import
FLWOR expression with update statement
Indexing (value Indices)
Based on B-Tree
For Elements and Attribute Values
Managing:
Create Index on Nodes by Keys
The query executor does not use indexes automatically
- use the "index-scan" function in XQuery
Indexing (cont.)
gainsPerMonth    100     1’000    10’000    50’000    100’000
Normal           0.36    27.53    -         -         -
With Indices     0.08    0.52     5.14      25.71     65.58
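The effect of the value indices can be quantified from the gainsPerMonth timings above. The following is a rough sketch, not part of the original analysis. It assumes that the three values for the larger datasets belong to the "With Indices" row and that all timings share one unit, which the slide does not state. The helper names are hypothetical.

```python
import math

# Timings copied from the slide; the unit is not given there.
# Assumption: the values for 10'000 and above belong to the indexed runs.
normal = {100: 0.36, 1_000: 27.53}
indexed = {100: 0.08, 1_000: 0.52, 10_000: 5.14, 50_000: 25.71, 100_000: 65.58}


def speedup(size):
    """How many times faster the indexed run is for a given dataset size."""
    return normal[size] / indexed[size]


def growth_exponent(times, small, large):
    """Slope in log-log space, i.e. k in time ~ size ** k between two sizes."""
    return math.log(times[large] / times[small]) / math.log(large / small)


for size in normal:
    print(f"size {size}: speedup {speedup(size):.1f}x")

print(f"normal growth exponent:  {growth_exponent(normal, 100, 1_000):.2f}")
print(f"indexed growth exponent: {growth_exponent(indexed, 100, 100_000):.2f}")
```

An exponent near 1 means roughly linear scaling and an exponent near 2 roughly quadratic scaling. With only two measurements for the unindexed runs, the estimate for that row is indicative at best.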
Indexing (Full-Text Indices)
Sedna provides Full-Text Indices with dtSearch
dtSearch: commercial text retrieval engine
No free download
Conclusion
Easy to start with the system
Little reference material
Most of the queries work with a few changes
Execution time grows steeply with larger datasets (without indices, roughly 76 times longer for 10 times the data)
Value indices deliver better execution times
MonetDB
Overview & impression of the system
Well-documented installation and usage
Many XQuery features not supported
Good performance
XML Schema support, but no noticeable effect on performance or functionality
No support for user-defined indexing ("automatic and self-tuning indexes")
Architecture
MonetDB: Open-source database system for high-performance applications in data mining, OLAP, XML Query, text and multimedia retrieval.
Provides the database functionality using the MIL interface (MonetDB Interpreter Language).
Pathfinder: XQuery compiler that translates XQuery expressions into relational algebra and calls MIL functions.
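To make "XQuery on relational algebra" more concrete, here is a small Python sketch of one well-known way to store an XML tree as a table: a pre/size/level encoding, in which a descendant step becomes a simple range selection. That Pathfinder works along these lines is stated here as background, not taken from the slides, and the sketch does not reproduce MonetDB's actual storage layout. The sample document and the function names are hypothetical, and only element nodes are encoded.

```python
import xml.etree.ElementTree as ET


def encode(xml_text):
    """Flatten an XML tree into rows of (pre, size, level, tag)."""
    rows = []

    def visit(node, level):
        pre = len(rows)          # document-order rank of this element
        rows.append(None)        # reserve the slot, fill it in after the subtree
        for child in node:
            visit(child, level + 1)
        size = len(rows) - pre - 1   # number of descendants
        rows[pre] = (pre, size, level, node.tag)

    visit(ET.fromstring(xml_text), 0)
    return rows


def descendants(rows, pre):
    """Descendant axis as a range selection: pre < p <= pre + size."""
    size = rows[pre][1]
    return [row for row in rows if pre < row[0] <= pre + size]


# Hypothetical miniature of an online-store document.
doc = (
    "<shop><user><name/><history><order/></history></user>"
    "<product><comment/></product></shop>"
)
table = encode(doc)
for row in table:
    print(row)
print([row[3] for row in descendants(table, 1)])
```

Because the descendants of a node occupy a contiguous range of pre values, a relational engine can answer such path steps with the same scan and selection operators it uses for ordinary tables.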
XQuery support
"… quite complete support for XQuery language…" (monetdb.cwi.nl)
Unsupported functions:
Date/Time functions (0/76)
String functions (21/32) fn:contains, fn:tokenize
Sequence functions (11/19) fn:insert-before
…
XML data import
pf:add-doc("url", "file", x%)
x > 0 is needed for update queries
-> XCheck needs to be adapted
Influence on performance is not clear
Performance
"... often achieves a 10-fold raw speed improvement for SQL and XQuery over competitor RDBMSs ..." (monetdb.cwi.nl)
Scalability
Conclusions
Very fast, good for large documents and expensive queries
Small documents: no drawback compared to other DBMSs
Big problem: lack of function support
If XQuery function support gets better, it’s probably the database of our choice!
Project Summary
RDBMS1
slow but can process almost anything.
XML as a feature.
Sedna
quite fast, can process a reasonable part of XML.
MonetDB
very fast, but only limited capabilities.