Cost Model and Estimating Result Sizes
The Cost Model
• In the lecture we showed how to compute the cost of each join method
• To do so, we need to know the sizes of the relations,
some of which are obtained as intermediate results
• Therefore, the sizes of intermediate results must be computed
• We now explain how to estimate the size of a result
Choosing an Evaluation Plan for a Join of Three Relations
• We want to compute the join of three relations: Sailors, Reserves and Boats
• The two options (ignoring the order of the relations in the first join
operation) are:
(Sailors ⋈ Reserves) ⋈ Boats
Sailors ⋈ (Reserves ⋈ Boats)
• Which plan is cheaper depends, among other things, on which
intermediate result is smaller
Analysis of Result Sizes
• We need to estimate the size of the result of the join
(Sailors ⋈ Reserves) versus the size of the
result of the join (Reserves ⋈ Boats)
• The DBMS maintains statistics about the relations
and the indexes
Statistics Maintained by DBMS
• Cardinality: Number of tuples NTuples(R) in each
relation R
• Size: Number of pages NPages(R) in each relation
R
• Index Cardinality: Number of distinct key values
NKeys(I) for each index I
• Index Size: Number of pages INPages(I) in each
index I
• Index Height: Number of non-leaf levels
IHeight(I) in each B+ Tree index I
• Index Range: The minimum value ILow(I) and
maximum value IHigh(I) for each index I
Note
• The statistics are updated periodically
(not every time the underlying relations
are modified).
• We cannot use the cardinality for
computing
select count(*)
from R
Estimating Result Sizes
• Consider SELECT attribute-list
FROM relation-list
WHERE term1 and ... and termn
• The maximum number of tuples is the product
of the cardinalities of the relations in the
FROM clause
• Each term in the WHERE clause has an associated
reduction factor, which reflects how much the
term reduces the result size.
Result Size
• Estimated result size:
(maximum size) × (product of the reduction factors)
Assumptions
• There is an index I1 on R.Y and index I2
on S.Y
• Containment of value sets: if
NKeys(I1)<NKeys(I2) for attribute Y,
then every Y-value of R will be a Y-value
of S
Estimating Reduction Factors
• column = value: 1/NKeys(I)
– There is an index I on column.
– This assumes a uniform distribution.
– Otherwise, use 1/10.
• column1 = column2:
1/Max(NKeys(I1),NKeys(I2))
– There is an index I1 on column1 and an index I2 on
column2.
– Containment of value sets assumption
– If only one column has an index, we use it to estimate
the value.
– Otherwise, use 1/10.
Estimating Reduction Factors
• column > value: (High(I) - value)/(High(I) - Low(I)) if there is an index I on column.
– High(I) and Low(I) are the index range statistics IHigh(I) and ILow(I).
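The reduction-factor rules above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, exact fractions are used to avoid rounding, and the range rule assumes Low(I) <= value <= High(I), since the slides give no rule for values outside the index range.

```python
from fractions import Fraction

# Fallback from the slides when no index is available.
DEFAULT_RF = Fraction(1, 10)

def rf_equals_value(nkeys=None):
    """column = value: 1/NKeys(I) if an index I exists, otherwise 1/10."""
    return Fraction(1, nkeys) if nkeys else DEFAULT_RF

def rf_equals_column(nkeys1=None, nkeys2=None):
    """column1 = column2: 1/Max(NKeys(I1), NKeys(I2)).

    If only one column has an index, that index is used; with no index, 1/10.
    """
    known = [n for n in (nkeys1, nkeys2) if n]
    return Fraction(1, max(known)) if known else DEFAULT_RF

def rf_greater_than(value, low, high):
    """column > value: (High(I) - value) / (High(I) - Low(I))."""
    return Fraction(high - value, high - low)

print(rf_equals_value(100))        # 1/100
print(rf_equals_column(50, 200))   # 1/200
print(rf_greater_than(3, 0, 10))   # 7/10
```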
Example
Reserves (sid, agent), Sailors(sid, rating)
SELECT *
FROM Reserves R, Sailors S
WHERE R.sid = S.sid and S.rating > 3 and
R.agent = 'Joe'
• Cardinality(R) = 100,000
• Cardinality(S) = 40,000
• NKeys(Index on R.agent) = 100
• High(Index on Rating) = 10, Low = 0
Example (cont.)
• Maximum cardinality: 100,000 * 40,000
• Reduction factor of R.sid = S.sid: 1/40,000
– sid is a primary key of S
• Reduction factor of S.rating > 3: (10-3)/(10-0) =
7/10
• Reduction factor of R.agent = 'Joe': 1/100
• Total Estimated size: 700
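The arithmetic of this example can be checked with a few lines of Python. The variable names are illustrative; the numbers are the ones given on the slides.

```python
from fractions import Fraction
from math import prod

cardinalities = [100_000, 40_000]   # Reserves, Sailors
reduction_factors = [
    Fraction(1, 40_000),            # R.sid = S.sid (sid is a primary key of S)
    Fraction(10 - 3, 10 - 0),       # S.rating > 3
    Fraction(1, 100),               # R.agent = 'Joe'
]

maximum = prod(cardinalities)                  # size of the cross product
estimate = maximum * prod(reduction_factors)   # maximum size x reduction factors

print(maximum, estimate)  # 4000000000 700
```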
Database Tuning
• Problem: Make database run efficiently
• 80/20 Rule: 80% of the time, the
database is running 20% of the queries
– find what is taking all the time, and tune
these queries
Solutions
• Indexing
– this can sometimes degrade performance.
why?
• Tuning queries
• Reorganization of tables; perhaps
"denormalization"
• Changes in physical data storage
Denormalization
• Suppose you have tables:
– emp(eid, ename, salary, did)
– dept(did, budget, address, manager)
• Suppose you often ask queries which require
finding the manager of an employee. You might
consider changing the tables to:
– emp(eid, ename, salary, did, manager)
– dept(did, budget, address, manager)
– In emp there is now a functional dependency did -> manager, so emp is not in 3NF!
Denormalization (cont’d)
• How will you ensure that the redundancy
does not introduce errors into the
database?
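One possible safeguard is a periodic check that the denormalized emp table still satisfies did -> manager. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle, and the rows are made up for illustration; it only detects violations, it does not prevent them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table emp(eid integer primary key, ename text, "
    "salary real, did integer, manager text)"
)
# Hypothetical rows: department 20 is stored with two different managers.
conn.executemany(
    "insert into emp values (?, ?, ?, ?, ?)",
    [
        (1, "Ann", 100.0, 10, "Dana"),
        (2, "Bob", 90.0, 10, "Dana"),
        (3, "Cid", 80.0, 20, "Eli"),
        (4, "Dov", 70.0, 20, "Gil"),
    ],
)

# A department with more than one distinct manager violates did -> manager.
violations = conn.execute(
    "select did, count(distinct manager) from emp "
    "group by did having count(distinct manager) > 1"
).fetchall()

print(violations)  # [(20, 2)]
```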
Creating Indexes Using Oracle
Index
• Map between
– a key of a row
– the location of the row's data
• Oracle has two kinds of indexes
– B+ tree
– Bitmap
• Sorted
B+ tree
[Figure: B+ tree]
Root: 13 | 17 | 24 | 30
Leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
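A lookup in such a tree can be sketched in Python. The sketch assumes the figure shows a single root node with keys 13, 17, 24, 30 over five leaves, and that a key equal to a root key is found in the leaf to its right; the function names are hypothetical.

```python
from bisect import bisect_right

root_keys = [13, 17, 24, 30]
leaves = [
    [2, 3, 5, 7],
    [14, 16],
    [19, 20, 22],
    [24, 27, 29],
    [33, 34, 38, 39],
]

def find_leaf(key):
    """Choose the child whose key range contains `key`.

    Child i holds the keys k with root_keys[i-1] <= k < root_keys[i].
    """
    return leaves[bisect_right(root_keys, key)]

def contains(key):
    """Search the single leaf that could hold `key`."""
    return key in find_leaf(key)

print(find_leaf(20))  # [19, 20, 22]
print(contains(15))   # False
```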
Creating an Index
• Syntax:
create [unique | bitmap] index index_name on
table_name(column [, column] ...)
Unique Indexes
create unique index rating_bit on Sailors(rating);
• Creates an index that guarantees the
uniqueness of the key. The statement fails if
duplicate values already exist.
• When you create a table with a
– primary key constraint or
– unique constraint
a "unique" index is created automatically
Bitmap Indexes
• Appropriate for columns that may have very few
possible values
• For each value c that appears in the column, a
vector v of bits is created, with a 1 in v[i] if the
i-th row has the value c
– Vector length = number of rows
• Oracle can automatically convert bitmap entries
to RowIDs during query processing
Bitmap Indexes: Example
Sid  Sname  age  rating
12   Jim    55   3
13   John   46   7
14   Jane   46   10
15   Sam    37   3
create bitmap index rating_bit on Sailors(rating);
• Corresponding bitmaps:
– 3: <1 0 0 1>
– 7: <0 1 0 0>
– 10: <0 0 1 0>
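The construction of these bitmaps can be sketched in Python. The function name is hypothetical, and the final step (answering rating = 3 or rating = 7 with a bitwise OR) is an added illustration of how bitmaps combine, not something stated on the slide.

```python
ratings = [3, 7, 10, 3]  # rating column for Sids 12, 13, 14, 15

def build_bitmaps(column):
    """One bit vector per distinct value; bit i is 1 if row i has that value."""
    bitmaps = {}
    for i, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[i] = 1
    return bitmaps

bitmaps = build_bitmaps(ratings)
print(bitmaps)  # {3: [1, 0, 0, 1], 7: [0, 1, 0, 0], 10: [0, 0, 1, 0]}

# Rows with rating = 3 or rating = 7: OR the two vectors position by position.
either = [a | b for a, b in zip(bitmaps[3], bitmaps[7])]
print(either)   # [1, 1, 0, 1]
```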
When to Create an Index
• Large tables, on columns that are likely to
appear in where clauses as a simple
equality
• where s.sname = 'John' and s.age = 50
• where s.age = r.age
Function-Based Indexes
• You can't use an index on sname for the
following query:
select *
from Sailors
where UPPER(sname) = 'SAM';
• You can create a function-based index to speed
up the query:
create index upp_sname on Sailors(UPPER(sname));
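The effect can be tried out with Python's built-in sqlite3 module, used here only as a stand-in for Oracle (SQLite also supports indexes on expressions). The helper `plan` and the index name `sname_idx` are made up for the sketch, and the plan text differs between SQLite versions, so no exact output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Sailors(sid integer primary key, sname text, "
    "age integer, rating integer)"
)
# An ordinary index on sname does not help a predicate on upper(sname).
conn.execute("create index sname_idx on Sailors(sname)")

query = "select * from Sailors where upper(sname) = 'SAM'"

def plan(sql):
    """Return SQLite's plan description; the detail text is the last column."""
    rows = conn.execute("explain query plan " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

before = plan(query)
conn.execute("create index upp_sname on Sailors(upper(sname))")
after = plan(query)

print("before:", before)  # expected to be a full scan of Sailors
print("after: ", after)   # expected to mention the index upp_sname
```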
Index-Organized Tables
• An index organized table keeps its data sorted
by the primary key
• Rows do not have RowIDs
• They store their data as if they were an index
create table Sailors(
sid number primary key,
sname varchar2(30),
age number,
rating number)
organization index;
Index-Organized Tables (2)
• What advantages does this have?
– Enforce uniqueness: primary key
– Improve performance
• What disadvantages?
– expensive to add column, dynamic data
• When to use?
– where clause on the primary key
– static data
Clustering Tables Together
• You can ask Oracle to store several tables close
together on the disk
• This is useful if you usually join these tables
together
• Cluster: area in the disk where the rows of the
tables are stored
• Cluster key: the columns by which the tables
are usually joined in a query
Clustering Tables Together: Syntax
• create cluster sailor_reserves (X number);
– Create a cluster with nothing in it
• create table Sailors(
sid number primary key,
sname varchar2(30),
age number,
rating number)
cluster sailor_reserves(sid);
– Create the table in the cluster
Clustering Tables Together: Syntax (cont.)
• create index sailor_reserves_index on cluster
sailor_reserves
– Create an index on the cluster
• create table Reserves(
sid number,
bid number,
day date,
primary key(sid, bid, day) )
cluster sailor_reserves(sid);
– A second table is added to the cluster
Sailors
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0

Reserves
sid  bid  day
22   102  7/7/97
22   101  10/10/96
58   103  11/12/96

Stored
sid  sname   rating  age   bid  day
22   Dustin  7       45.0  102  7/7/97
                           101  10/10/96
31   Lubber  8       55.5
58   Rusty   10      35.0  103  11/12/96
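The stored layout can be mimicked in Python by grouping the rows of both tables under their shared cluster key. This is a conceptual sketch only: the function name and the dictionary layout are hypothetical and say nothing about Oracle's actual block format, and it assumes every sid in Reserves also appears in Sailors.

```python
sailors = [
    (22, "Dustin", 7, 45.0),
    (31, "Lubber", 8, 55.5),
    (58, "Rusty", 10, 35.0),
]
reserves = [
    (22, 102, "7/7/97"),
    (22, 101, "10/10/96"),
    (58, 103, "11/12/96"),
]

def cluster_by_sid(sailors, reserves):
    """Store each cluster key (sid) once, with the rows of both tables next to it."""
    cluster = {}
    for sid, sname, rating, age in sailors:
        cluster[sid] = {"sailor": (sname, rating, age), "reserves": []}
    for sid, bid, day in reserves:
        cluster[sid]["reserves"].append((bid, day))
    return cluster

cluster = cluster_by_sid(sailors, reserves)
print(cluster[22])
# {'sailor': ('Dustin', 7, 45.0), 'reserves': [(102, '7/7/97'), (101, '10/10/96')]}
```

Joining Sailors and Reserves on sid then amounts to reading each cluster entry once.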
The Oracle Optimizer
Types of Optimizers
• There are different modes for the optimizer
ALTER SESSION SET optimizer_mode =
{choose|rule|first_rows(_n)|all_rows}
• RULE: Rule-based optimizer (RBO)
– deprecated
• CHOOSE: Uses the cost-based optimizer (CBO) when
statistics are available; the CBO picks a plan based
on statistics (e.g. number of rows in a
table, number of distinct keys in an index)
– The data in the database must first be analyzed using
the analyze command
Types of Optimizers
• ALL_ROWS: execute the query so that all of
the rows are returned as quickly as possible
– Merge join
• FIRST_ROWS(n): execute the query so that
the first n rows are returned as quickly as
possible
– Block nested loop join
Analyzing the Data
analyze {table <table_name> | index <index_name>}
{compute statistics |
estimate statistics [sample <integer> {rows | percent}] |
delete statistics};
• For example:
analyze table Sailors estimate statistics sample 25 percent;
Viewing the Execution Plan
(Option 1)
• You need a PLAN_TABLE table. So, the first
time that you want to see execution plans, run
the command:
@$ORACLE_HOME/rdbms/admin/utlxplan.sql
• Set autotrace on to see all plans
– Display the execution path for each query, after
being executed
Viewing the Execution Plan
(Option 2)
• Another option:
explain plan
set statement_id='<name>'
for <statement>
explain plan
set statement_id='test'
for
SELECT *
FROM Sailors S
WHERE sname='Joe';
Select Plan_Table
Operations that Access Tables
• TABLE ACCESS FULL: sequential table scan
– Oracle optimizes by reading multiple blocks
– Used whenever there is no where clause on a query
select * from Sailors
• TABLE ACCESS BY ROWID: access rows by
their RowID values.
– How do you get the rowid? From an index!
select * from Sailors where sid > 10
Types of Indexes
• Unique: each row of the indexed table
contains a unique value for the indexed
column
• Nonunique: the row’s indexed values can
repeat
Operations that Use Indexes
• INDEX UNIQUE SCAN: Access of an
index that is defined to be unique
• INDEX RANGE SCAN: Access of an index
that is not unique or access of a unique
index for a range of values
When are Indexes Used/Not Used?
• If you set an indexed column equal to a value, e.g.,
sname = 'Jim'
• If you specify a range of values for an indexed
column, e.g., sname like 'J%'
– sname like '%m': will not use an index
– UPPER(sname) like 'J%' : will not use an index
– sname is null: will not use an index, since null values are
not stored in the index
– sname is not null: will not use an index, since every value
in the index would have to be accessed
When are Indexes Used? (cont)
• 2*age = 20: Index on age will not be used. Index
on 2*age will be used.
• sname != 'Jim': Index will not be used.
• MIN and MAX functions: Index will be used
• Equality of a column in a leading column of a
multicolumn index. For example, suppose we have
a multicolumn index on (sid, bid, day)
– sid = 12: Can use the index
– bid = 101: Cannot use the index
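Why only the leading column helps can be seen by modeling the index as a list of entries sorted on (sid, bid, day). The entries below are made up, the days are plain strings, and the function names are hypothetical; the point is only that matches on sid are adjacent while matches on bid are scattered.

```python
from bisect import bisect_left

# Hypothetical index entries, kept sorted on (sid, bid, day).
index = sorted([
    (12, 101, "10/10/96"),
    (12, 103, "7/7/97"),
    (22, 101, "11/12/96"),
    (58, 102, "7/7/97"),
])

def lookup_by_sid(sid):
    """Leading column: binary search to the first match, then read neighbors."""
    start = bisect_left(index, (sid,))
    result = []
    for entry in index[start:]:
        if entry[0] != sid:
            break
        result.append(entry)
    return result

def lookup_by_bid(bid):
    """Non-leading column: matches are not adjacent, so every entry is examined."""
    return [entry for entry in index if entry[1] == bid]

print(lookup_by_sid(12))   # [(12, 101, '10/10/96'), (12, 103, '7/7/97')]
print(lookup_by_bid(101))  # [(12, 101, '10/10/96'), (22, 101, '11/12/96')]
```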
When are Indexes Used?
(cont)
• If the index is selective
– A small number of records are associated
with each distinct column value
Hints
• You can give the optimizer hints about
how to perform query evaluation
• Hints are written in /*+ */ right after
the select
• Note: These are only hints. The Oracle
optimizer can choose to ignore your hints
Hints
• FULL hint: tell the optimizer to perform a
TABLE ACCESS FULL operation on the
specified table
• ROWID hint: tell the optimizer to perform a
TABLE ACCESS BY ROWID operation on the
specified table
• INDEX hint: tell the optimizer to use an index-based scan on the specified table
Examples
Select /*+ FULL (sailors) */ sid
From sailors
Where sname='Joe';
Select /*+ INDEX (sailors) */ sid
From sailors
Where sname='Joe';
Select /*+ INDEX (S s_ind) */ sid
From sailors S, reserves R
Where S.sid=R.sid AND sname='Joe';