AM18
ASA INTERNALS: DATA MANAGEMENT
GLENN PAULLEY, DEVELOPMENT MANAGER
paulley@ianywhere.com
AUGUST 2005
Goals of this presentation
Overview of data management and query processing in
Adaptive Server Anywhere 9.0.2
Concentrate on performance issues and problem areas
Provide an overview of SQL Anywhere 9.0 technology
Highlight planned features for the Jasper release
Agenda
Section One: SQL language support, data management
Section Two: query execution and optimization
Design goals of SQL Anywhere Studio
Ease of administration
Good out-of-the-box performance
“Embeddability” features: self-tuning
Cross-platform support
Interoperability
Motivation for the ASA 9.0 release
Exploit the new architecture of 8.0 and add support for
additional language features, including
GROUP BY ROLLUP
RECURSIVE UNION
Window functions and other OLAP support
XML
Table Functions
INTERSECT and EXCEPT
ORDER BY, SELECT TOP N in any query block, including views
Improve performance
Highlights of the ASA 9.0 releases
HTTP server
ASA Index Consultant
Improved performance, scalability
better scalability in OLTP environments
Query processing improvements
optimization refinements – particularly with the server’s cost model
histograms modified according to update DML statements
alternate, efficient execution methods for complex queries
SNMP support
9.0.1 EBF build 1828, Windows platforms only
Formally part of the 9.0.2 release
Performance, performance, performance
[Chart: Version comparison, 10GB DB, minutes. Queries Q01 to Q22 and average, for builds 7.0.4.2788, 8.0.0.2065, 9.0.0.1073, 9.0.1.1751, and 10.0.1212]
Contents
Language Support
New SQL constructs supported with 9.0.1
Data Management in 9.0.1
Database organization
Table storage organization
Index storage organization
Physical database design tips
Jasper features
New SQL language support in 9.0.1
Table functions (SELECT over a stored procedure)
ORDER BY clause now supported in all SELECT blocks
Necessary to support SELECT TOP n in derived tables, views,
and subqueries with correct semantics
RECURSIVE UNION (bill-of-materials) queries
INTERSECT and EXCEPT query expressions
LATERAL keyword for derived tables
Now necessary for derived tables or table expressions containing
outer references
WITH clause (common table expressions)
Essentially in-lined view definitions
New SQL language support in 9.0.1
SELECT TOP n START AT m
Equivalent functionality to that in MySQL, Postgres
n and m can be variables or host variables
WITH INDEX hint in FROM clause
Named CHECK, PK, FK, UNIQUE constraints
Constraint violation message refers to the constraint name
New catalog tables:
SYSCONSTRAINT contains information about all constraints, even
referential integrity constraints
SYSCHECK contains the body of the CHECK constraint; now permit
multiple CHECK constraints on the same column(s)
Specific CHECK constraint that is violated appears in error
Not available in older database formats, even if DBUPGRAD is
used
New SQL language support in 9.0.1
OLAP support
VARIANCE, STD_DEV aggregate functions
ORDER BY clause for LIST aggregate function
GROUP BY
ROLLUP, CUBE, GROUPING SETS
Binary set functions (linear regression, co-variance, etc.)
Rank functions
Windowed aggregate functions
Construct “moving average” results in a single SQL statement
Support for multiple DISTINCT aggregate functions in a single
SELECT block
Necessitates the use of Hash Group By
New SQL language support in 9.0.1
Support for SET statement in Transact-SQL dialect stored
procedures
Implemented for MS SQL Server compatibility
EXECUTE IMMEDIATE extensions
Procedures can now use EXECUTE IMMEDIATE to execute
dynamically-constructed queries which return a result set
WITH ESCAPES ON | OFF
WITH QUOTES ON | OFF
Variable assignment permitted in UPDATE statements (8.0.1)
SELECT INTO base-table
New SQL language support in 9.0.1
FOR XML AUTO, FOR XML RAW, FOR XML EXPLICIT
OPENXML procedure (supports XPath queries over XML
column values)
SQLX functionality: xmlelement(), xmlforest(), xmlgen(),
xmlconcat(), and xmlagg()
EXPRTYPE() function – outputs the type of the expression
argument
Useful when defining computed columns
LOCATE() can handle negative offsets
INSERT WITH AUTO NAME (8.0.2)
Table functions
SELECT *
FROM SYS.SYSTABLE as st, sa_table_fragmentation() as tbfrg
WHERE st.table_name = tbfrg.tablename
Result set description determined from the catalog; result set
must match exactly
Otherwise SQLSTATE ‘WP012’
Workaround: use the WITH clause to annotate the procedure
reference in the FROM clause:
SELECT * FROM PROC() WITH( X Integer, Y char(17) )
Table functions
Procedure may return only one result set
Statistics regarding cost, result set cardinality of the procedure
are captured at run time; used for subsequent requests
Statistics are stored in SYS.SYSPROCEDURE
Minimally requires DBUPGRAD of older databases to 9.0.0
Recursive UNION
SQL-2003 implementation of recursive (bill-of-materials) queries
Among other products, only DB2 offers RECURSIVE UNION support; Oracle
implements a ‘cycle’ clause
Uses specialized join operators: recursive hash inner and outer
joins
Will use a nested-loop strategy if inputs are small; the choice is made adaptively at
run time during query execution
WITH RECURSIVE r (level, emp_id, manager_id) as (
SELECT 1, emp_id, manager_id
FROM employee
WHERE emp_id = manager_id
UNION ALL
SELECT level+1, e.emp_id, e.manager_id
FROM employee e JOIN r ON (e.manager_id = r.emp_id)
WHERE e.emp_id <> e.manager_id and level < 3)
SELECT * FROM r
Recursive UNION: restrictions
Query expression must be UNION ALL
Recursive reference must be in a query block that does not
contain DISTINCT, aggregation, or an ORDER BY clause
Recursive reference in a LEFT OUTER JOIN is permitted
Schema of WITH clause must match recursive query
Implicit type conversions involving truncation can yield undesired
results; SQLSTATE 42WA2 returned if server detects a type
mismatch
Use CAST to ensure compatible types
Infinite queries are possible; server kills the query after N
recursions
controlled by the new connection option
MAX_RECURSIVE_ITERATIONS (default 100)
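The way a recursive UNION ALL is evaluated, and why an iteration cap matters, can be sketched in Python. This is a simplified model and not the server's algorithm: the `EMPLOYEE` rows are made-up sample data, `recursive_union` is a hypothetical helper, and `max_iterations` only stands in for MAX_RECURSIVE_ITERATIONS.

```python
# Simplified model of WITH RECURSIVE ... UNION ALL evaluation (not ASA internals).
# Hypothetical sample data: (emp_id, manager_id); the root manages itself.
EMPLOYEE = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 4)]


def recursive_union(rows, max_level, max_iterations=100):
    """Return (level, emp_id, manager_id) rows, mimicking the slide's query."""
    # Initial query block: employees who are their own manager (the root).
    result = [(1, e, m) for e, m in rows if e == m]
    delta = result
    iterations = 0
    while delta:
        iterations += 1
        if iterations > max_iterations:
            # Stand-in for the server terminating a runaway recursion.
            raise RuntimeError("recursion limit exceeded")
        # Recursive block: join employee to the rows produced in the last round.
        delta = [
            (level + 1, e, m)
            for level, r_emp, _ in delta
            for e, m in rows
            if m == r_emp and e != m and level < max_level
        ]
        result.extend(delta)
    return result


if __name__ == "__main__":
    for row in recursive_union(EMPLOYEE, max_level=3):
        print(row)
```

Each round joins only against the rows produced by the previous round, so the loop ends as soon as a round yields nothing. With cyclic data and no `level` predicate the rounds never run dry, which is the case the iteration limit guards against.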
INTERSECT and EXCEPT
Implement set/bag difference and set/bag intersection
Both ALL and DISTINCT variants are supported; DISTINCT
performed by default
Form query expressions in the same fashion as UNION
NULL treated as a special value in each domain, hence NULLs
are equivalent to each other
Useful when formulating queries that require counting of
identical rows
See the help for order-of-precedence amongst the set
operators
EXCEPT and INTERSECT ALL
Rewrite to transform ALL to DISTINCT done automatically by
the optimizer
Both EXCEPT and INTERSECT can be computed through
either a merge or hashing technique
Also supports an (expensive) nested-loop strategy in case a
cache shortage is encountered
With ALL variants:
implicitly performs aggregation to count the number of duplicate
rows in each input
A new query execution operator, ROW REPLICATE, generates
the required copies of each row
SELECT description FROM product
EXCEPT ALL
SELECT description FROM product as p2 WHERE quantity < 15
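The difference between the ALL and DISTINCT variants can be illustrated with Python's `collections.Counter`, which counts duplicates (the implicit aggregation) and re-expands them with `elements()` (loosely analogous to ROW REPLICATE). The input lists and function names below are hypothetical; this models the semantics only, not the server's operators.

```python
from collections import Counter

# Hypothetical inputs: product descriptions returned by two query blocks.
left = ["Tee Shirt", "Tee Shirt", "Tee Shirt", "Cap", "Shorts"]
right = ["Tee Shirt", "Cap", "Cap"]


def except_all(a, b):
    """Bag difference: each row survives max(0, count_a - count_b) times."""
    return sorted((Counter(a) - Counter(b)).elements())


def intersect_all(a, b):
    """Bag intersection: each row appears min(count_a, count_b) times."""
    return sorted((Counter(a) & Counter(b)).elements())


def except_distinct(a, b):
    """Set difference: duplicates are eliminated from the result."""
    return sorted(set(a) - set(b))


if __name__ == "__main__":
    print(except_all(left, right))
    print(intersect_all(left, right))
    print(except_distinct(left, right))
```

With these inputs EXCEPT ALL keeps two of the three 'Tee Shirt' rows plus 'Shorts', whereas EXCEPT DISTINCT keeps only 'Shorts'.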
GROUP BY ROLLUP
Computes aggregates as usual, but result set contains
multiple sets of groups
Logically, grouping is performed N+1 times for N grouping
expressions
Essentially implements the functionality of COBOL Report
Writer in a single SQL request
SELECT state, zip, count(*), grouping(zip), grouping(state)
FROM customer
GROUP BY ROLLUP (state, zip)
GROUP BY CUBE
Computes aggregates as usual, but result set contains the
power set of the N grouping expressions
Expensive to execute for large N
Result can be restricted through the specification of
GROUPING SETS
SELECT state, zip, count(*), grouping(zip), grouping(state)
FROM customer
GROUP BY CUBE (state, zip)
SELECT state, zip, count(*), grouping(zip), grouping(state)
FROM customer
GROUP BY GROUPING SETS ( (state, zip), state, zip, () )
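The grouping sets implied by ROLLUP and CUBE can be enumerated with a short Python sketch. The helper names are hypothetical; the point is only to show the N+1 prefixes produced by ROLLUP versus the 2**N subsets produced by CUBE.

```python
from itertools import combinations


def rollup_sets(columns):
    """ROLLUP: the N+1 prefixes of the column list, longest first."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def cube_sets(columns):
    """CUBE: the power set of the column list (2**N grouping sets)."""
    n = len(columns)
    return [c for size in range(n, -1, -1) for c in combinations(columns, size)]


if __name__ == "__main__":
    cols = ["state", "zip"]
    print(rollup_sets(cols))
    print(cube_sets(cols))
```

For `(state, zip)` the CUBE enumeration yields `(state, zip)`, `(state)`, `(zip)` and `()`, the same four sets spelled out in the GROUPING SETS query above. The count doubles with every added column, which is why CUBE becomes expensive for large N.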
WINDOW functions
Part of SQL OLAP extensions
Computes aggregates (except LIST) over a window of rows
Provides an ANSI-compliant way to number the rows of a result set
ROW_NUMBER() rather than NUMBER(*)
Useful to:
Compute cumulative aggregates, or “moving averages”
Eliminate the need for correlated subqueries involving aggregation
WINDOW functions
List employees, by department, in four US states by their start
dates, along with their cumulative salaries:
SELECT dept_id, emp_lname, start_date, salary,
SUM(salary) OVER (PARTITION BY dept_id ORDER BY start_date
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "Sum_Salary"
FROM employee
WHERE state IN ('CA', 'UT', 'NY', 'AZ') AND dept_id IN ('100', '200')
ORDER BY dept_id, start_date;
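What the RANGE frame in this query computes can be modelled in a few lines of Python. The rows below are invented sample data and `cumulative_salary` is a hypothetical helper; the sketch shows the semantics of the window, not how the server executes it.

```python
from itertools import groupby

# Hypothetical rows: (dept_id, emp_lname, start_date, salary).
ROWS = [
    (100, "Ames", "1990-01-15", 50000),
    (100, "Bell", "1992-06-01", 40000),
    (100, "Cole", "1992-06-01", 30000),
    (200, "Diaz", "1991-03-10", 60000),
]


def cumulative_salary(rows):
    """Running SUM(salary) per dept_id ordered by start_date, RANGE framing."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for _, partition in groupby(ordered, key=lambda r: r[0]):
        running = 0
        # Rows sharing a start_date are peers: RANGE ... CURRENT ROW
        # includes all of them, so they receive the same running total.
        for _, peers in groupby(partition, key=lambda r: r[2]):
            peers = list(peers)
            running += sum(r[3] for r in peers)
            out.extend(r + (running,) for r in peers)
    return out


if __name__ == "__main__":
    for row in cumulative_salary(ROWS):
        print(row)
```

Bell and Cole share a start date, so both show the same cumulative value. A ROWS frame would instead give each of them a different running total.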
WINDOW functions
List all orders (with part information) where the part quantity
cannot cover the maximum single order for that part:
SELECT o.id, o.order_date, p.*
FROM sales_order o, sales_order_items s, product p
WHERE o.id = s.id and s.prod_id = p.id
and p.quantity < (SELECT max(s2.quantity)
FROM sales_order_items s2
WHERE s2.prod_id = p.id)
ORDER BY p.id, o.id
The same request rewritten with a window function:
SELECT order_qty.id, o.order_date, p.*, max_q
FROM ( SELECT s.id, s.prod_id,
MAX(s.quantity) OVER (partition BY s.prod_id order by s.prod_id) AS max_q
FROM sales_order_items s) as order_qty,
product p,
sales_order o
WHERE p.id = prod_id and o.id = order_qty.id and p.quantity < max_q
ORDER BY p.id, o.id
WINDOW functions
Find the salespeople with the best sales (total amount) for each
product, including ties:
SELECT s.prod_id, o.sales_rep, SUM(s.quantity) as total_quantity, SUM(s.quantity * p.unit_price) as total_sales
FROM sales_order o KEY JOIN sales_order_items s KEY JOIN product p
GROUP BY s.prod_id, o.sales_rep
HAVING total_sales = (SELECT FIRST SUM(s2.quantity * p2.unit_price) as sum_sales
FROM sales_order o2 KEY JOIN sales_order_items s2 KEY JOIN product p2
WHERE s2.prod_id = s.prod_id
GROUP BY o2.sales_rep
ORDER BY sum_sales DESC )
ORDER BY s.prod_id
The same request rewritten with RANK():
SELECT v.prod_id, v.sales_rep, v.total_quantity, v.total_sales
FROM ( SELECT o.sales_rep, s.prod_id, SUM(s.quantity) as total_quantity,
SUM(s.quantity * p.unit_price) as total_sales,
RANK() OVER (PARTITION BY s.prod_id
ORDER BY SUM(s.quantity * p.unit_price) DESC) as sales_ranking
FROM sales_order o KEY JOIN sales_order_items s KEY JOIN product p
GROUP BY o.sales_rep, s.prod_id ) as v
WHERE sales_ranking = 1
ORDER by v.prod_id
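How RANK() keeps ties can be shown with a small Python model. The aggregated rows and the `top_ranked` helper are hypothetical; the sketch only mirrors the `sales_ranking = 1` filter applied per product.

```python
from collections import defaultdict

# Hypothetical aggregated rows: (prod_id, sales_rep, total_sales).
TOTALS = [
    (300, 129, 900.0),
    (300, 195, 900.0),
    (300, 299, 450.0),
    (301, 129, 700.0),
]


def top_ranked(totals):
    """Keep rows whose RANK() within prod_id (by total_sales DESC) is 1."""
    by_product = defaultdict(list)
    for prod_id, rep, sales in totals:
        by_product[prod_id].append((rep, sales))
    winners = []
    for prod_id in sorted(by_product):
        rows = by_product[prod_id]
        for rep, sales in rows:
            # RANK() = 1 + number of rows in the partition with a strictly
            # greater ordering value, so tied rows share the same rank.
            rank = 1 + sum(1 for _, other in rows if other > sales)
            if rank == 1:
                winners.append((prod_id, rep, sales))
    return winners


if __name__ == "__main__":
    for row in top_ranked(TOTALS):
        print(row)
```

Both representatives tied at 900.0 for product 300 are returned, which is the "including ties" behaviour the subquery formulation has to reproduce with an equality comparison.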
Data Management in 9.0.2
Moving to ASA 9.0.2
If database is 8.0.2, unload/reload to 9.0 is largely
unnecessary
DBUPGRAD to 9.0 required for some catalog schema changes, in
particular for the Index Consultant
There should be no consequences of using DBUPGRAD with
respect to performance
However:
only 9.0 format databases support named constraints
only 9.0 format databases support cache warming
only 9.0.1 databases support page checksums
8.0.2 databases do not support index statistics collection by
default
Can be turned on when creating the database via CREATE
DATABASE (but not dbinit)
Moving to ASA 9.0.2
Otherwise, unload/reload from 8.0.1 or 8.0.0
recommended
Clustered index support
Better statistics management
Improved histogram organization, statistics collection
Index statistics kept persistent in the database file
Improved histograms
Cache warming on startup
Checksums on database pages
PCTFREE option for base and temporary tables
Moving to SQL Anywhere “Jasper”
The Jasper release of the SQL Anywhere server will not
support older database formats
Jasper will ship with a migration tool to convert an existing
database into a Jasper-format database
Database organization
A database consists of up to 13 “dbspaces”
Maximum size of each dbspace is limited by the underlying operating
system
Maximum database size is also determined by page size
Limit for any dbspace is 2**28 (about 268 million) pages
Each dbspace, the temporary file, and the transaction log are each a simple
OS file
Ease of administration, backup
Temporary file is used for temporary tables
A dbspace file grows in 256K extents (512K if 16K pages, 1MB if 32K
pages)
Database files can be copied to/from different endian machines
Can copy database from Wintel to big-endian UNIX systems and back
again
Server automatically does data conversion where necessary
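The interaction between the 2**28 page limit and page size is simple arithmetic, sketched here in Python. The function name is hypothetical, and the figures are upper bounds from the page limit alone; the operating system's file size limit may be lower.

```python
MAX_PAGES_PER_DBSPACE = 2 ** 28  # page limit quoted on the slide
PAGE_SIZES_KB = [1, 2, 4, 8, 16, 32]


def max_dbspace_bytes(page_size_kb):
    """Upper bound on one dbspace: page count limit times page size."""
    return MAX_PAGES_PER_DBSPACE * page_size_kb * 1024


if __name__ == "__main__":
    for kb in PAGE_SIZES_KB:
        gib = max_dbspace_bytes(kb) // 2 ** 30
        print(f"{kb:>2}K pages: {gib} GiB per dbspace")
```

A 1K page size caps a dbspace at 256 GiB, while 4K pages allow 1 TiB, so the page size chosen at creation time also fixes the ceiling on database growth.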
Database organization
A database file contains:
table pages
index pages
free pages
rollback log pages
checkpoint log pages
Each dbspace for a database must use the same page size
Physical organization: tables
Each table uses an independent set of table pages
Each table allocates at least one page, even if the table is empty
Server maintains bit-maps for table pages
Supports clustering of table pages in the same portion of the
database file
Facilitates large-block I/O – SQL Anywhere reads 64K at a time
when doing sequential scans
Result: considerably faster sequential scan performance
Physical organization: tables
New in 8.0.2: ‘scattered read’ support on Windows 2000 and
Windows XP
Another mainframe technology being reinvented on PC/UNIX
servers
aka “locate-mode I/O”
Improves performance, reduces memory requirements
Coming to other platforms as vendors implement it
Tables cannot span dbspaces
Each secondary index on a table can be stored in a separate
dbspace
Recommended if multiple spindles are available (not necessary
for RAID devices)
Partition dbspaces on separate devices whenever possible
Brings more disk arms to bear, reducing seek latency
Physical organization: tables
Rows are inserted into pages at a point where, if at all
possible, the entire row can be stored contiguously
Caveat: row segments are at most 4K; second or subsequent row
segments can appear on different pages
Columns are packed tightly together; only unpadded values
are stored on disk
Primary key columns are always at the beginning of each row,
in sequence
Server may rewrite all rows if PK added or modified
Rows can be of (almost) unlimited size; are split across pages
where necessary
Maximum length of any column is 2GB
Maximum number of rows per page is 255
Physical organization: tables
Rows are not guaranteed to be placed in pages corresponding to
their insertion order
By default, ASA uses a first-fit algorithm for page selection
To guarantee ordering of a result set, specify an ORDER BY clause
Space is not reserved for columns that are null
BLOB values are stored in a separate “arena” of pages
First 255 bytes are stored together with the row
Access to the rest of the BLOB value will almost certainly require a SEEK
Implications for choice of page size
Once inserted, a row identifier is immutable
An updated row must be split if its new length does not allow it to fit on
the page
Physical organization: tables
Table pages are allocated in 8 page clusters; cluster allocation
depends on page size
2K: grow 4 clusters at a time
4K: grow 2 clusters at a time
All other page sizes: one cluster at a time
ASA will re-use database pages for additional inserts if entire pages
are freed
Defaults: for 1K pages, free space is 100 bytes; for all other page sizes it is
200 bytes
DBA can specify freespace percentage to accommodate future table
UPDATEs using PCTFREE
PCTFREE characteristic stored in new catalog table SYSATTRIBUTE
(and corresponding table SYSATTRIBUTENAME)
Can be specified for temporary tables
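The cluster allocation rules translate into a growth step per page size, worked out below in Python. The function names are hypothetical and the numbers are derived only from the rules stated on this slide.

```python
PAGES_PER_CLUSTER = 8  # from the slide


def clusters_per_growth(page_size_kb):
    """Clusters added per growth step, as listed on the slide."""
    return {2: 4, 4: 2}.get(page_size_kb, 1)


def growth_step_kb(page_size_kb):
    """Kilobytes added to a table each time it grows."""
    return page_size_kb * PAGES_PER_CLUSTER * clusters_per_growth(page_size_kb)


if __name__ == "__main__":
    for kb in (1, 2, 4, 8, 16, 32):
        print(f"{kb:>2}K pages: table grows by {growth_step_kb(kb)}K")
```

Under these rules 2K, 4K and 8K pages all grow a table in 64K steps, the same size as the large-block read mentioned for sequential scans. Whether that match is the design intent is not stated in the slides.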
Page sizes
Page sizes supported are 1K, 2K, 4K, 8K, 16K, 32K
2K page size minimum on all UNIX platforms
Default changed to 2K in the 6.0.3 release
A server can support several databases concurrently
Buffer pool page size will be the largest database page size specified
on the command line
Consider tradeoffs with your choice of page size
4K recommended; occasionally 8K may offer improved
performance
Default will change to 4K with Jasper release
Do not use 16K or 32K pages unless you have a specialty
application
In typical environments, large page sizes cause inefficient use of cache
Choice of page size does matter
Larger rows usually require larger pages (requires fewer split
rows)
Random retrieval performance is dependent on the application
Larger pages can pollute the cache with unnecessary data
Often require larger buffer pools to accommodate the application’s
working set
Smaller pages are more cache efficient, but
Smaller pages reduce index fanout, and can increase index depth
Choice of page size does matter
Don’t ignore index maintenance costs when considering page
size (larger page sizes can mean increased cache pressure)
Test your application with different alternatives
Your mileage may vary
A 4K page size is a typical choice for many applications
My recommendation: use 4K pages unless thorough testing
proves that a different page size offers better
performance/scalability
See data storage whitepaper
Available at www.ianywhere.com/developer
Recently updated for 9.0.0
Physical organization: indexes
ASA 9.0 supports two different types of indexes:
Hash-based
Key is a one-way order-preserving encoding of at most nine bytes
of the data values
Hash-based indexes are still used when the key length does not
satisfy the limits for compressed indexes
Compressed
Contains Patricia tries in the index’s internal nodes
Used for keys > 10 bytes and less than
122 bytes with 1K pages
248 bytes for all other page sizes
Substantially improved performance with larger keys
Physical index organization: hash-based indexes
Values in an index are “hashed” into a key of at most 10 bytes
using an order-preserving encoding function
WITH HASH SIZE is deprecated
Each indexed column encoded separately, with a one-byte
length
A 10-byte hash value can hold two 32-bit integer values (including
two length bytes)
Hash values in an index are stored separately from the index
entry itself
The hash value for an identical secondary key is shared for
each index entry (row) in that index page
This improves fanout when data distribution is skewed
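The idea of an order-preserving encoding with a one-byte length per column can be sketched in Python. The actual encoding function used by the server is not described in these slides, so the scheme below (sign-bit bias plus big-endian bytes) is an assumption chosen for illustration, and `encode_int32` and `encode_key` are hypothetical names.

```python
import struct


def encode_int32(value):
    """Encode a signed 32-bit integer so byte order matches numeric order."""
    # Adding 2**31 maps -2**31..2**31-1 onto 0..2**32-1; the big-endian
    # bytes of that unsigned value then compare in numeric order.
    biased = (value + 2 ** 31) & 0xFFFFFFFF
    return struct.pack(">BI", 4, biased)  # one length byte + four data bytes


def encode_key(col1, col2):
    """Concatenate per-column encodings: 2 * (1 + 4) = 10 bytes."""
    return encode_int32(col1) + encode_int32(col2)


if __name__ == "__main__":
    keys = [(5, -1), (-3, 7), (5, -2), (0, 0), (-3, -8)]
    print(sorted(keys, key=lambda k: encode_key(*k)))
    print(len(encode_key(1, 2)))
```

Because the encoded bytes sort the same way as the original column values, an index can compare short fixed-width keys bytewise without decoding them, and two 32-bit integers with their length bytes fill the 10 bytes exactly.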
Physical index organization: compressed indexes
Internal nodes in the index contain a Patricia trie
PATRICIA: Practical Algorithm to Retrieve Information Coded
in Alphanumeric (D. R. Morrison, J. ACM Vol. 15, 1968)
Combines a binary trie with an optimization to skip over bit
comparisons that would result from one-way branching
Result: automatic compression of string data
Excellent fanout of internal nodes
Common substrings of key values have a negligible impact on
space requirements and performance
Superb performance improvements in many cases, especially
with composite primary and foreign keys
Clustered index support
First offered with the 8.0.2 release
At most one clustered index per table (may be a temporary table)
May be secondary index, PK, FK, UNIQUE constraint
Optimizer assumes PK indexes are clustered unless a different
clustering index exists
Engine will not attempt to maintain clustering on PK indexes unless
they are declared CLUSTERED
May be hash or compressed index
Clustering characteristic stored in SYSATTRIBUTE catalog table
CLUSTERED keyword can be used in both CREATE INDEX and
CREATE/ALTER TABLE statements
However, ALTER does not reorganize the table; use REORGANIZE
TABLE
Clustered index support
On INSERT/LOAD TABLE, server attempts to keep rows
physically adjacent in base table pages
Specification of PCTFREE on LOAD can be critical
Adjacency is NOT guaranteed; ORDER BY still requires a
physical sort or indexed retrieval
Can significantly improve performance
Optimizer costs clustered index access differently
Consider their use with queries that involve range predicates
Often useful with DATE or TIMESTAMP columns
Use REORGANIZE TABLE or UNLOAD/RELOAD if clustering
degrades over time
ALTER INDEX statement can rename an index or change its
clustering attribute
Physical index organization: fanout and page size
Fanout refers to the number of index entries on a page
Lower fanout means greater index depth, and hence more
costly random retrieval
Fanout is affected by
Page size
Hash value size/trie compression
Distribution of key values
Index maintenance
Fanout can degrade over time
sa_index_density() procedure
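The link between fanout and index depth can be made concrete with a rough Python estimate. It assumes an idealized balanced tree in which every page holds exactly `fanout` entries, which real indexes only approximate; `index_depth` is a hypothetical helper, not a server procedure.

```python
def index_depth(entries, fanout):
    """Levels needed for an idealized balanced tree holding `entries` keys."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    depth, capacity = 1, fanout
    # Each additional level multiplies the number of reachable entries
    # by the fanout.
    while capacity < entries:
        capacity *= fanout
        depth += 1
    return depth


if __name__ == "__main__":
    for fanout in (50, 100, 400):
        print(fanout, index_depth(10_000_000, fanout))
```

For ten million entries, a fanout of 400 needs three levels while a fanout of 50 needs five, and each extra level is one more page to visit on every random lookup.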
Indexes and query processing
ASA does not store actual data values in the index
implies each base row must be retrieved to
Fetch the values of any attributes, or
To compare keys longer than the maximum hash value size
Indexes are automatically created to enforce referential
integrity
Primary keys, foreign keys, unique constraints
All related indexes must be the same type (hash or compressed)
Maximum number of indexes is dependent on page size
<= 4K: 2048 indexes
8K: 1024 indexes
16K: 512 indexes
32K: 256 indexes
Indexes and query processing
Each indexed column can be ascending or descending
Index is scanned backwards if the application scrolls in the
opposite direction, or an ORDER BY clause specifies the reverse
sequence
Support for merge and hash joins means that ASA will often
use sequential scans, rather than indexed retrieval
REORGANIZE Statement – base tables
REORGANIZE TABLE tablename
Defragments rows on-the-fly by removing/inserting groups of
rows in clustered index (or PK) order
Exclusive lock held on the table while a group is processed;
commits occur periodically to enable other applications to run;
checkpoints are suspended while the group is being
processed
Performs implicit COMMITs during operation
Rows will be in clustered sequence when operation is
complete (except possibly for rows changed by concurrent UPDATEs)
Use new procedure sa_table_fragmentation() to discover
tables that warrant reorganization
REORGANIZE Statement - indexes
REORGANIZE TABLE tablename [ index specification ]
INDEX indexname
FOREIGN KEY indexname
PRIMARY KEY
Exclusive lock is held throughout
CHECKPOINTs are suspended
Reclaims space lost to update activity
Re-balances the index, especially important after many
DELETE operations
Use the new procedure sa_index_density() to identify indexes
that require reorganization
Data management improvements in 9.0.1
Better scalability – new lock-free cache manager
Substantially better performance across the board
Support for page checksums
New option for dbinit and CREATE DATABASE statement
Supported by dbvalid utility, and a new statement VALIDATE
CHECKSUM
Overhead: largely depends on CPU speed. Examples:
2.8 milliseconds per I/O for 32K pages
0.7 milliseconds per I/O for 8K pages
Improvements to dynamic cache sizing
Sampling rate changes with database growth or the starting of a
new database on the same server
Data management improvements in 9.0.1
Database cache warming feature
Two operational phases, collection and reload
During collection, page IDs are saved in the database as they are
accessed at startup
During reload, collected page IDs are read into cache as
background processing
Checks and balances used to prevent swamping the server with I/O
during server startup
Need to test performance before deploying
Cache warming is *enabled* by default
Data management improvements in 9.0.1
Optimistic locking introduced for WAIT_FOR_COMMIT
Controlled by a new connection option
OPTIMISTIC_WAIT_FOR_COMMIT
Temporary dbspace can be grown with ALTER DBSPACE
Can improve performance of complex queries by ensuring that the temp
file is not fragmented on disk
Size of temporary dbspace can be controlled with a governor
New public option TEMP_SPACE_LIMIT_CHECK (default OFF)
When OFF, engine’s default behaviour is to die with a DISK FULL error
Jasper release: default is ON
Server computes a temp space quota for each request; if quota is
exceeded and temporary dbspace is at least 80% of its maximum size,
request fails with SQLSTATE 54W05
Quota computed using amount of disk free space on that partition, and
number of active connections
Shipped in 9.0.0 build 1308, 9.0.1 build 1872, 8.0.3 build 4991
Data management improvements in 9.0.1
ALTER INDEX statement
Can rename an index, or alter its clustering attribute
Ability to create an index on a function
Automatically adds a computed column “column-name” to the
table
Creates an index on the computed column
Relies on the optimizer to replace any function occurrences with
the computed column
CREATE INDEX index-name
ON [owner.]table-name ( function( arg [, ...] ) AS column-name )
[{IN | ON} dbspace-name]
Data management improvements in 9.0.1
Non-transactional temporary tables
Unaffected by COMMIT or ROLLBACK; no entries made to
rollback log
Procedure, trigger, and view text can be hidden from other
users by using SET HIDDEN (8.0.2)
LOAD TABLE enhancements:
can be used on local temporary tables (8.0.2)
ORDER clause (8.0.2)
Control over which column histograms are built (9.0.0)
Data management improvements in 9.0.1
DEDICATED_TASK option (DBA-only, temporary only)
UUIDs and GUIDs can be used as surrogate keys - see
newid() function (8.0.2)
XML data type
SYSHISTORY system table
Statistics (depth, leaf pages) maintained on indexes in real
time (introduced in 8.0.2EBF)
Hash(), compress(), encrypt() builtin functions
Can be used to compress or encrypt individual string or binary
fields in the database
Values can be viewed, processed with decrypt() and
decompress() functions
Data management improvements in 9.0.1
ALTER DATABASE can now modify transaction log identically to
DBLOG utility
BACKUP and DBBACKUP can now rename the log copy
ALTER VIEW WITH RECOMPILE
Event handling improvements:
Two new parameters for event_parameter:
APPINFO
DisconnectReason: ‘from client’, ‘drop connection’, ‘liveness’, ‘inactive’,
‘connect failed’
New cost model for Ultralite requests
New DTT function based on analysis of several current models of pocket
PC devices
Equates random and sequential I/O to produce better Ultralite query
plans
Data management improvements in 9.0.2
Temporary stored procedures
Visible only to the connection that creates them, and
automatically dropped when the connection is dropped
Can be explicitly dropped, but may not be ALTERed
GRANT and REVOKE are not permitted on temporary
procedures
Not recorded in the catalog or in the transaction log
Can be created and dropped when connected to a read-only
database
A procedure owner cannot be specified for temporary procedures;
they are owned by the user that creates them
Temporary external procedures are not permitted
Temporary procedures execute with the permissions of their
creator (i.e. the current user)
Data management improvements in 9.0.2
CREATE LOCAL TEMPORARY TABLE
Defines a local temporary table that persists until the end of the
connection, or until the table is explicitly dropped
Intended for use inside procedures, functions, triggers
Similar to DECLARE LOCAL TEMPORARY TABLE if executed outside of a
procedure context
UUIDs are now a native SQL Anywhere type
UUID_HAS_HYPHENS option
Controls formatting of UUIDs (UniqueIdentifier values) when converted to
strings
Disk-full callback support
MIN_TABLE_SIZE_FOR_HISTOGRAM is deprecated
New option COLLECT_STATISTICS_ON_DML_UPDATES
New option LOG_DEADLOCKS, sa_report_deadlocks() procedure
Enhancements to START DATABASE statement: WITH DISTINCT
SQLSTATE
Application profiling improvements in 9.0.2
Procedure profiling can now be performed for an individual
connection or user
call sa_server_option('Profile_connection',<connection-id>)
call sa_server_option('ProfileFilterUser','<userid>')
Request-level logging enhancements:
New -zn switch to retain n log files in a ring
Or use sa_server_option('RequestLogNumFiles',<n>)
Can log either text or the plan for expensive queries (9.0.2EBF)
-zx <cost> specifies the threshold cost; a statement whose cost exceeds it at
either optimization or execution time is logged
Call sa_server_option('LogExpensiveQueries')
When -zp is also specified, the plans are output; otherwise, only the
statement text is logged
Physical database design tips
Physical database design tips: file placement
Database file placement
Place transaction log, database file(s), and temporary directory on
separate devices if possible
if using mirrored logging, ensure the two logs are on different
physical disks
Temporary file placement can dramatically affect performance
of complex queries
Use the ASTMP environment variable to specify location for
temporary file
Place on a different physical drive if possible
The more disk heads the better (RAID)
Physical database design tips: file placement
Consider the use of caching disk controllers/NT striping/RAID
Consider the tradeoffs
Software striping offers better performance, but offers no recovery
advantages
RAID 5 tends to have poor write request latency: each small write turns
into four I/O requests (two reads followed by two writes)
Not good for a transaction log
RAID 10 (1+0) offers much better performance, at the cost of
additional disks for redundancy
File system considerations
Defragment your file system occasionally, especially after an
unload/reload
Database file fragmentation is now displayed in the console window
when the database is started
Preallocate large quantities of space in contiguous chunks through
the ALTER DBSPACE command
Less problematic with 256K block allocation in recent ASA releases
ALTER DBSPACE <dbspace-name> ADD nnn {PAGES | KB | MB |
GB | TB}
Can also do this for the TEMPORARY DBSPACE
Use db_extended_property() function to determine
fragmentation/size of each dbspace individually (new in 9.0, also in
8.0.2.4215)
Can be done for temporary dbspace and the transaction log as well
File system considerations
Use caution when trying to run the database over a networked
drive!
Not all networks and/or operating systems guarantee network
packet ordering
Physical or logical corruption is likely
Can use “-r” (read-only) switch if necessary
SAN units are supported; they guarantee consistent semantics
Do not use cached filesystem writes unless persistence is
guaranteed
Corruption is virtually certain and database cannot be recovered;
will need to restore database from backup
Database fragmentation
ASA databases never shrink
Free pages will be reused for other purposes
Unload/reload will recover this unused space
If data is removed in the order it was inserted, fragmentation is
less likely
Avoid inserts of NULL values followed by updates with actual
data; use PCTFREE if necessary
Repair fragmentation with unload/reload, or REORGANIZE
TABLE
Useful tools
DBINFO -u
stored procedure sa_table_fragmentation()
Physical database design tips: tables
Load table data in clustering order (by default, primary key
sequence)
Sorting automatically performed by DBUNLOAD and by the
REORGANIZE TABLE statement
New ORDER syntax for (UN)LOAD TABLE
Use 4K pages unless conditions warrant otherwise
Watch for ordering, placement of PK columns
Order in table dictates order in index
Changed in Jasper!
Rows are rewritten if PK columns, or column order, are changed
Physical database design tips: tables
Use out-of-range default values instead of NULL
Reduces page fragmentation with updates
Can use PCTFREE as an alternative
Put large columns at end of row; fixed-size and frequently-accessed columns near start
Prevent seeks to another table page, required to access split rows
Choose your data types with care; trade off storage efficiency against
application requirements
For keys, alphanumeric strings are often more flexible
Physical database design tips: indexes
Compressed indexes prevent many of the problems with
relatively large or composite primary keys
However:
Surrogate keys can still be useful
Usually not a good idea for significant business objects to have
the same key format
Self-checking keys can simplify business processing
Watch for opportunities to specify a clustering index
Especially with date or timestamp columns used in range queries
Useful stored procedures:
sa_index_levels()
sa_index_density()
Physical database design tips: surrogate keys
Consider surrogate keys when appropriate
Exploit autoincrement support, or develop self-checking keys
to simplify error detection
9.0 and 8.0.2 support automatic generation of universal unique
identifiers (UUIDs) as surrogate keys
Compatible with Microsoft’s implementation
New native domain: uniqueidentifier in 9.0.2
No longer necessary to use string conversion functions such as
uuidtostr(); type conversion done automatically
Trade off their characteristics against GLOBAL AUTOINCREMENT
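As an illustration of a self-checking key, the sketch below appends a check digit to a sequence number so that most single-digit typing errors are detected before the key reaches the database. The Luhn algorithm is used here only as a familiar example; the presentation does not name a specific scheme, and the helper names `make_key` and `is_valid_key` are hypothetical. The last lines show the size contrast with a UUID, using Python's standard `uuid` module rather than the server's own generator.

```python
import uuid


def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the
    # rightmost payload digit (the check digit will sit to its right).
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def make_key(sequence_number: int) -> str:
    """Build a self-checking key from a plain sequence number."""
    payload = str(sequence_number)
    return payload + str(luhn_check_digit(payload))


def is_valid_key(key: str) -> bool:
    """Check that the trailing digit matches the rest of the key."""
    if not key.isdigit() or len(key) < 2:
        return False
    return luhn_check_digit(key[:-1]) == int(key[-1])


if __name__ == "__main__":
    key = make_key(7992739871)
    print(key, is_valid_key(key))
    print(is_valid_key("79927398710"))   # last digit mistyped

    # A UUID needs no central counter, but it is 16 bytes wide.
    u = uuid.uuid4()
    print(len(u.bytes), str(u))
```

A check digit catches keying errors at the application boundary, while a UUID avoids coordinating key ranges across databases; the two address different problems and can be weighed separately.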
Physical database design tips: foreign keys
Foreign keys are essential to the optimization of complex
queries
Join selectivity and cardinality estimation is much more accurate
when foreign key constraints are present
Also enable a variety of query rewrite optimizations
But weigh the tradeoffs of using declarative referential integrity
Downside is the maintenance cost for indexes that are not utilized
in query processing
In rare situations, consider eliminating some RI and check
constraints once application is fully tested
Physical database design tips: triggers, constraints
Use declarative referential integrity instead of triggers
Use CHECK constraints rather than triggers for simple
conditions
9.0 supports named constraints
Unnamed constraints are automatically named as ‘ASAnnn’
Mark columns as NOT NULL when appropriate
Don’t over-use CHECK constraints
e.g. in user-defined data types
Using a user-defined function in a CHECK constraint will
guarantee poor update performance
Server configuration tips: cache size
Dynamic cache sizing is instituted by default on platforms that
support it
Not supported for CE, Netware
Can override dynamic cache sizing as necessary
Server can dynamically adjust cache size depending on server workload;
this is more robust in 9.0.1
Use -ch to specify an upper bound larger than 256MB
If specifying cache size at startup:
Need to allow for OS and application overhead
CE has different defaults than other platforms
Java-enabled databases require a larger minimum cache for the Java
VM; 8 MB is usually sufficient
Watch for NT File Cache competition
See white paper on memory usage (available at
http://www.ianywhere.com/developer)
Data management in Jasper
Statements concerning iAnywhere Solutions' new products are
forward-looking statements that involve a number of uncertainties
and risks and cannot be guaranteed. Factors that could ultimately
affect such statements are detailed from time to time in Sybase's
Securities and Exchange Commission filings, including but not
limited to its annual report on Form 10-K and its quarterly reports on
Form 10-Q (copies of which can be viewed on the Company's
website).
All of the information in this presentation are forward-looking
statements, as defined above. As such, there is uncertainty
associated with if or when any of these features will be added to the
product.
Data management changes in Jasper
Default page size changed to 4K
New catalog implementation
Catalog base tables have been renamed
All catalog access by applications is through views
Catalog base tables are reorganized, more efficient
View dependencies on base tables and views are now tracked
Improved storage organization for BLOB columns
In-row BLOB prefix default is no longer fixed at 254:
CHAR/VARCHAR: minimum 8, maximum 128
BINARY/VARBINARY: minimum 0, maximum 256
can override on per-column basis
New storage architecture for long values, permits efficient random access
View dependency tracking
Three states for any view:
Valid: compiled and active, can be utilized in queries
Invalid: view has been invalidated by the server due to
dependency checking as a result of DDL on base tables
Upon reference, the server will attempt to compile the view and use it
if possible
Otherwise, query will get an error
Disabled: view has been explicitly disabled (via new statement,
DISABLE VIEW), and is unusable
View must be explicitly enabled in order to become valid (via new
statement, ENABLE VIEW)
View dependency tracking
Upon an ALTER (or DROP):
Server attempts to acquire an exclusive lock on the object to be
modified
Server honours the current setting of the BLOCKING option
Server then acquires exclusive locks on all dependent views
If any lock cannot be acquired, the statement gets an error
Once locked, all dependent views are invalidated
ALTER (or DROP) statement is executed
With ALTER, the server attempts to revalidate all the previously
invalidated views
Views successfully recompiled are marked as valid
Otherwise, the view is left in the invalid state
Server will attempt to recompile it when
First referenced in a server session, or
When other DDL is performed that may affect that view
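The view life cycle described on these two slides can be sketched as a small state machine. Everything in the sketch is a hypothetical model, not the server's implementation: the `Catalog` class, tracking dependencies as "table.column" strings, and treating "compiles successfully" as "every referenced column still exists" are all assumptions made for illustration. Locking and the BLOCKING option are left out, and disabled views are assumed to be untouched by DDL.

```python
from enum import Enum


class ViewState(Enum):
    VALID = "valid"
    INVALID = "invalid"
    DISABLED = "disabled"


class Catalog:
    """Toy catalog: tracks existing columns and the state of each view."""

    def __init__(self, columns):
        self.columns = set(columns)   # "table.column" strings
        self.depends_on = {}          # view name -> set of "table.column"
        self.state = {}               # view name -> ViewState

    def create_view(self, name, depends_on):
        self.depends_on[name] = set(depends_on)
        self.state[name] = ViewState.VALID

    def _recompile(self, name):
        """Mark the view valid only if everything it references exists."""
        ok = self.depends_on[name] <= self.columns
        self.state[name] = ViewState.VALID if ok else ViewState.INVALID
        return ok

    def _dependents(self, table):
        prefix = table + "."
        return [name for name, deps in self.depends_on.items()
                if self.state[name] is not ViewState.DISABLED
                and any(dep.startswith(prefix) for dep in deps)]

    def alter_table(self, table, add=(), drop=()):
        dependents = self._dependents(table)
        for name in dependents:           # invalidate before the DDL runs
            self.state[name] = ViewState.INVALID
        self.columns |= {f"{table}.{col}" for col in add}
        self.columns -= {f"{table}.{col}" for col in drop}
        for name in dependents:           # then try to revalidate each view
            self._recompile(name)

    def reference(self, name):
        """Use a view in a query; invalid views get one recompile attempt."""
        if self.state[name] is ViewState.DISABLED:
            raise LookupError(f"view {name} is disabled")
        if self.state[name] is ViewState.INVALID and not self._recompile(name):
            raise LookupError(f"view {name} is invalid")
        return name

    def disable_view(self, name):
        self.state[name] = ViewState.DISABLED

    def enable_view(self, name):
        self._recompile(name)
```

In this model, dropping a column that a view references leaves the view invalid until later DDL restores the column or the view is redefined, while a disabled view stays unusable until it is explicitly enabled.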
Internationalization improvements
Support for NCHAR data type
NCHAR strings are stored as UTF-8
NCHAR specification and functions use character semantics, not byte
semantics
NCHAR(10) means 10 characters (1-4 bytes per character)
CHAR specification now supports either BYTE or CHAR modifier
E.g. CHAR(10 BYTE) or CHAR(23 CHAR)
NCHAR can support either
UCA (Unicode Collation Algorithm) using IBM’s ICU library
Properly supports multi-byte character sorting
A legacy collation stored as UTF-8
Database now can have two collations, one for NCHAR, one for CHAR
Details in session SQL506 Monday afternoon
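The difference between character and byte length semantics is easy to see with UTF-8 encoded strings. The sketch below is illustrative only: the helper names are hypothetical, and Python's `len()` counts Unicode code points, which is assumed here to stand in for what the server treats as a character.

```python
def fits_char_semantics(value: str, limit: int) -> bool:
    """True if the value has at most `limit` characters (code points)."""
    return len(value) <= limit


def fits_byte_semantics(value: str, limit: int) -> bool:
    """True if the UTF-8 encoding of the value has at most `limit` bytes."""
    return len(value.encode("utf-8")) <= limit


if __name__ == "__main__":
    samples = [
        "database",                           # ASCII: 1 byte per character
        "Z\u00fcrich",                        # u-umlaut takes 2 bytes
        "\u65e5\u672c\u8a9e\u30c7\u30fc\u30bf",  # CJK/kana: 3 bytes each
        "\U0001d11e" * 10,                    # musical clef: 4 bytes each
    ]
    for text in samples:
        print(len(text), len(text.encode("utf-8")),
              fits_char_semantics(text, 10), fits_byte_semantics(text, 10))
```

Ten four-byte characters still satisfy a ten-character limit but need 40 bytes of storage, which is the upper end of the 1-4 bytes per character range quoted for NCHAR(10).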
Indexing changes
New index implementation
Improved implementation of compressed B-tree indexes
Key values are duplicated in the index to support index-only retrieval and
snapshot isolation
Older “hash”-based indexes have been dropped entirely
Index column order for primary keys now based on PK constraint
declaration, not column order in table
PK can be altered, reordered without rewriting all the rows in the table
Sort order (ASC or DESC) can now be specified for each column of any constraint index
e.g. PRIMARY KEY (X ASC, Y DESC, Z ASC)
Foreign key column order can now be different than that of PK
All indexes now appear in the SYSINDEXES view
Planned:
Ability to declare that a FK is unique (to enforce a 1:1 relationship)
Abstract indexes into logical and physical implementations
Redundant indexes will not be created
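To make the mixed-direction ordering concrete, the sketch below arranges a few sample rows in the order that a key declared as (X ASC, Y DESC, Z ASC) implies. It only models the logical ordering of key values; the sample rows and the function name `index_order` are made up, and nothing here reflects how the B-tree is physically stored.

```python
from operator import itemgetter


def index_order(rows):
    """Order (x, y, z) tuples as X ASC, Y DESC, Z ASC.

    Python's sort is stable, so sorting by the least significant column
    first and the most significant column last yields the combined order.
    """
    ordered = sorted(rows, key=itemgetter(2))                   # Z ASC
    ordered = sorted(ordered, key=itemgetter(1), reverse=True)  # Y DESC
    ordered = sorted(ordered, key=itemgetter(0))                # X ASC
    return ordered


if __name__ == "__main__":
    rows = [(1, 5, 2), (1, 9, 1), (2, 3, 7), (1, 9, 0), (2, 8, 4)]
    for row in index_order(rows):
        print(row)
```

A query whose ORDER BY matches the declared directions can read rows in this sequence directly, which is the practical reason for letting the constraint declaration, rather than the table's column order, determine the index layout.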
Shareable global temporary tables
Shared global temporary tables
New syntax:
CREATE GLOBAL TEMPORARY TABLE ….. SHARE BY ALL
The contents of the table will persist until explicitly deleted or until
the database is shut down. On database startup, the table will be
empty.
Row locking on shared temporary tables behaves the same as for
permanent tables
Inserts, updates and deletes on shared temporary tables are not
recorded in the transaction log
Column statistics are maintained in memory by the server.
Data management changes in Jasper
Last modification time for any row in a table now retained in
SYSTABLE
Resolution is one second
LOAD TABLE enhancements: better performance, ENCODING
option, ROW DELIMITED BY option
Apply multiple transaction logs at startup (can specify a directory)
Better row-level locking implementation
Elimination of key-range locking with anti-insert locks
Planned: introduction of INTENT locks (e.g. FETCH FOR UPDATE)
Improved administration of large databases:
Parallel backup
Auto-tuning to exploit multiple CPUs on SMP hardware
Faster unload/reload, index creation, database validation
Database mirroring
Provides “hot” failover for a SQL Anywhere database
Involves two or three separate servers: primary, mirror, arbiter
Transaction log pages are passed from the primary server to the mirror to
keep the mirror up-to-date
Mirror server is not accessible by any other connections
Effectively the mirror server is in continuous recovery mode
Log pages can be passed in three modes:
Synchronously (default) on COMMIT
Asynchronously on COMMIT – better performance than synchronous mode
Asynchronously when log page is full, with a timeout option
Async implies the usual caveats with possible lost transactions
Role switch occurs if primary server fails
Arbiter used to verify the mirror state before role switch proceeds
Clients are disconnected from the primary server
Must reconnect to the mirror
See Techwave session SQL508 – High Availability ASA on Wednesday
Snapshot isolation support
Provides read-consistency in the face of concurrent writes from other
transactions (e.g. writers do not block readers)
Enabled by a global database option, allow_snapshot_isolation
Three new transaction isolation levels:
“snapshot” – cleanest semantics, transaction sees a consistent view of
the database as of transaction start (the time the first row was accessed)
“stmt-snapshot” – requires fewer resources; each statement sees a
consistent state of the database, but different statements may see it as of different times
Only one snapshot time exists for a connection; outermost or first statement
sets the transaction time
“read-only-stmt-snapshot” – like stmt-snapshot, but only for queries;
update statements execute at isolation level 1
Usage is not free
Old copies of rows are maintained in a “row version store” (part of the
database’s temp file) for as long as necessary to ensure consistency for
any transaction
Indexes have a mix of “old” and “current” values
Can affect the performance of both sequential and index scans
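The difference between transaction-level and statement-level snapshots can be shown with a toy multi-version store. This is a minimal sketch and not the server's row version store: the class names, the integer commit clock, and the decision to keep every old version forever are all simplifying assumptions. Each call to `read` stands in for one statement.

```python
class VersionedStore:
    """Keeps every committed version of each key, tagged with a commit time."""

    def __init__(self):
        self.clock = 0
        self.versions = {}   # key -> list of (commit_time, value), ascending

    def commit_write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read_as_of(self, key, snapshot_time):
        """Return the newest value committed at or before snapshot_time."""
        visible = None
        for commit_time, value in self.versions.get(key, []):
            if commit_time <= snapshot_time:
                visible = value
        return visible


class SnapshotTransaction:
    """Snapshot time is fixed when the first row is accessed."""

    def __init__(self, store):
        self.store = store
        self.snapshot_time = None

    def read(self, key):
        if self.snapshot_time is None:
            self.snapshot_time = self.store.clock
        return self.store.read_as_of(key, self.snapshot_time)


class StatementSnapshotTransaction:
    """Every statement takes a fresh snapshot of committed data."""

    def __init__(self, store):
        self.store = store

    def read(self, key):
        return self.store.read_as_of(key, self.store.clock)


if __name__ == "__main__":
    store = VersionedStore()
    store.commit_write("balance", 100)
    snap = SnapshotTransaction(store)
    stmt = StatementSnapshotTransaction(store)
    print(snap.read("balance"), stmt.read("balance"))
    store.commit_write("balance", 250)   # a concurrent writer commits
    print(snap.read("balance"), stmt.read("balance"))
```

After the concurrent commit, the snapshot transaction keeps returning the value it first saw, while the statement-snapshot transaction picks up the new value on its next statement. Neither reader blocks the writer, and the old version has to be retained for as long as the snapshot transaction is open, which is the cost the slide refers to.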
Snapshot isolation support
Setting the isolation level:
set transaction isolation level snapshot
set transaction isolation level statement snapshot
set transaction isolation level read only statement snapshot
Or within an ODBC application, use
SA_SQL_TXN_SNAPSHOT
SA_SQL_TXN_STATEMENT_SNAPSHOT
SA_SQL_TXN_READ_ONLY_STATEMENT_SNAPSHOT
Update conflicts are still possible
Isolation levels can be mixed (but not recommended)
Database property VersionStorePages contains the number of pages
in the temp file devoted to copies of old rows
BLOB values do not reside in the temp file, but remain in the main
database file and are reference counted
Some restrictions on DDL when snapshot transactions are in
progress (ALTER TABLE, etc.)
Lazy CHECKPOINTs
A Jasper server can now initiate a CHECKPOINT and perform
other operations while it takes place.
In previous releases, all database activity would stop while the
CHECKPOINT took place.
There can only be one CHECKPOINT in progress at a time.
If a CHECKPOINT is already in progress, then any operation like
an ALTER TABLE or CREATE INDEX that wants to initiate a new
CHECKPOINT needs to wait for the last one to finish.
Lazy checkpoints are not used if using the -m option
Documented by START CHECKPOINT and FINISH
CHECKPOINT records in the transaction log
Application profiling and request-level logging
Major enhancements in the Jasper release
Unified logging architecture
Can log data to a database, rather than a flat file
Can log data to a different database, even on another server
Much lower overhead
Considerably greater detail in diagnostic information
Lock contention
Statements within stored procedures and triggers
Elapsed times
Query plans
Planned improvements to DBCONSOLE for real-time server status
Attend sessions SQL501/514 Tuesday afternoon at 1:30
ASA Performance Analysis from Start to Finish
iAnywhere at TechWave 2005
Ask the iAnywhere Experts on the Technology Boardwalk (exhibit hall)
• Drop in during exhibit hall hours and have all your questions answered by our technical
experts!
• Appointments outside of exhibit hall hours are also available to speak one-on-one with
our Senior Engineers. Ask questions or get your yearly technical review – ask us for
details!
TechWave ToGo Channel
• TechWave ToGo, an AvantGo channel providing up-to-date information about TechWave
classes, events, maps and more – now available via your handheld device!
• www.ianywhere.com/techwavetogo
iAnywhere Developer Community - A one-stop source for technical information!
• Access to newsgroups, new betas and code samples
• Monthly technical newsletters
• Technical whitepapers, tips and online product documentation
• Current webcast, class, conference and seminar listings
• Excellent resources for commonly asked questions
• All available express bug fixes and patches
• Network with thousands of industry experts
http://www.ianywhere.com/developer/
SQL Anywhere ‘Jasper’ Release
Learn more about 'Jasper', the upcoming SQL Anywhere release, loaded
with features focused on:
• Enhanced data management including performance, data protection, and
developer productivity
• Innovative data movement including manageability, flexibility and
performance, and messaging
Attend the following sessions:
SQL Anywhere 'Jasper' New Feature Overview
Session SQL512 will be held Monday, August 22nd, 1:30pm
MobiLink 'Jasper' New Feature Overview
Session SQL515 will be held Wednesday, August 24th, 1:30pm
... and remember to look for sneak peeks in other sessions and morning
education courses!
Register for the Jasper Beta program:
www.ianywhere.com/jasper
Questions?