* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download What is a Database System?
		                    
		                    
								Survey							
                            
		                
		                
                            
                            
								Document related concepts							
                        
                        
                    
						
						
							Transcript						
					
					Query optimisation Query optimisation 1 Query optimisation Example - hospital database Doctors Name Office … L. Johnson Sur1-left … … … … P. Thomson IC100 … … C. Craig Int-100 100 tuples Patients 2500 tuples Name Disease D_name … C. Reed M. Fox C. Fish … … P. Wolf prk11 blood press stomach-ul … … kidn-fail P. Thomson L. Johnson L. Johnson … … U. Ulrich … … … … … … 2 Query optimisation Query Get the name of the doctors who treat patients suffering from the prk11 disease SELECT Doctors.name FROM Doctors, Patients WHERE Disease = ‘prk11’ AND D_name = Doctors.Name 3 Query optimisation Evaluation #1  restrict Patients to those who suffer from prk11 • read: 2500 tuples; result: estimated 50 tuples; no need to write intermediate result - sufficiently small  join above result with Doctors • read: 100 tuples (Doctors); result 50 tuples; no need to write to disk intermediate result  project result over Doctors.name • the desired result is in the memory  estimated cost (read and write) 2600 4 Query optimisation Evaluation #2 • suppose the internal memory allows only some 350 tuples  join Patients with Doctors • read Patients in batches of 250 tuples; therefore read Doctors 10 times; read: 2500 + 1000 = 3500; write intermediate result (too big) to disk: 2500;  restrict above result • read 2500; result: estimated 50 tuples;  project  cost: 8500 (read and write) 5 Query optimisation Intermediate conclusions  the evaluation strategy (procedural aspect) can lead to very big differences in computation time, for the same query • computation time: read from and write to disk (quintessential) • processor time  the actual evaluation procedures are far more complex than in the previous introductory example 6 Query optimisation Optimisation - what  deciding upon the best strategy of evaluating a query  it is performed automatically by the optimiser of the DBMS  not just for data retrieval operations, but for updating operations as well (e.g. UPDATE)  not guaranteed to give the best result 7 Query optimisation Optimisation - how  based on statistical information about the specific database (not necessarily, though) perform  expression transformation (cast query in some internal form and convert to respective canonical form  candidate low level procedures selection  query plans generation and selection  statistical information - could you think of examples?  cardinality of base relations, indexes, ... 8 Query optimisation Cast (transform) query in some internal form  internal format • more suitable for automatic processing • trees (syntax tree or query tree)  from a conceptual point of view is is easier to assume that the internal format is relational algebra 9 Query optimisation Convert to canonical form  the initial expression is transformed into an equivalent but more efficient form • “efficient form” = efficient when executed • these transformation are performed independently from actual data values and access paths 10 Query optimisation Expression transformation  examples (A WHERE condition#1) WHERE condition#2 (A WHERE condition#1 AND condition#2) (A [projection#1] ) [projection#2] A [projection#2] (A [projection]) WHERE condition (A WHERE condition) [restriction] 11 Query optimisation Expression transformation  distributivity  commutativity and associativity  idempotence  scalar expressions  conditional expressions  semantic transformation 12 Query optimisation Set level operations  the operators of relational algebra are set level • i.e. they manipulate sets (relations) and not individual tuples  however, these operators are implemented by internal (DBMS) procedures • these procedures, inherently, need tuple-access (in fact, they need access to scalar values) 13 Query optimisation Choose candidate low-level procedures  the optimiser decides how to execute the query (expressed in canonical form) • access paths are relevant at this stage  in the main, each basic operation (join , restriction, …) has a set of procedures that implement it • e.g. RESTRICTION - (1) on candidate key; (2) on indexed key; (3) on other attributes … • each procedure has associated a cost function (usually based on the required I/O disk operations); these functions are used in the next stage 14 Query optimisation Implementing JOIN - examples  R and P - two relations to be joined  J - the attribute on which the (natural) join is performed  R[i] and P[j] mean the i-th tuple of R and the j-th tuple of P, respectively  R[i].J means the value of the attribute J for the i-th tuple of the relation R  R has M and P has N tuples, respectively 15 Query optimisation Implementing JOIN - brute force for i:=1 to M for j := 1 to N do if R[i].J = P[j].J then add joined tuple R[i]*P[j] to result end end end 16 Query optimisation Index lookup index X on Patients.D_name D_name Johnson Johnson Johnson Thomson Thomson U. Ulrich Pointer Name Disease D_name … C. Reed M. Fox C. Fish M. Maria P. Bosh P. Wolf prk11 blood press stomach-ul ear nose kidn-fail Thomson Johnson Johnson Thomson Johnson Ulrich … … … … … … 17 Query optimisation Implementing JOIN - index lookup /* index X on P.J */ for i:=1 to M for j := 1 to K[i] do add joined tuple (R[i] * PK[j]) to result /* PK[j] represents the tuple of P that K[j] points to */ end end 18 Query optimisation Choose the cheapest query plan  construct query plans (query evaluation plan) • • • • combine candidate low level procedures choose the cheapest total cost = the sum of individual costs individual costs depend on the actual data values; estimates are used instead, based on statistical data • usually not all possible evaluation procedures are generated; the search space is reduced by applying heuristics 19 Query optimisation Database statistics - in the data dictionary  for each base table • cardinality • space occupied • etc.  for each column of each base table • • • • no of distinct values maximum, minimum and average values histogram of values …  ... 20 Query optimisation An optimiser is never perfect  the following example is a real life example  suppose a Postgres definition for • base relation: Treatment(Patient, Drug, Disease, …)  the query • get all the drugs that are taken by patients that suffer from prk11 • (all the drugs, not only those for prk11) SELECT DISTINCT Drug FROM Treatment WHERE Patient IN (SELECT Patient FROM Treatment WHERE Disease = ‘prk11’) ;  the query is far slower that the equivalent one (next) ... 21 Query optimisation An optimiser is never perfect /* this query is faster than the previous one, even though it seems to be performing more computations - Patient is not unique! */ CREATE VIEW V_Treatment AS SELECT * FROM Treatment SELECT DISTINCT Treatment.Drug FROM Treatment, V_Treatment WHERE Treatment.Patient = V_Treatment.Patient AND Disease = ‘prk11’ ; 22