* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download 25-btrees
		                    
		                    
								Survey							
                            
		                
		                
                            
                            
								Document related concepts							
                        
                        
                    
						
						
							Transcript						
					
					CSE 373 Data Structures and Algorithms Lecture 25: B-Trees Assumptions of Algorithm Analysis Big-Oh assumes all operations take the same amount of time   2 Is this really true? Disk Based Data Structures All data structures we have examined are limited to main memory   Have been assuming data fits into main memory Counter-example: Transaction data of a bank > 1 GB per day    3 Uses secondary storage (e.g., hard disks) Operations: insert, delete, searches Cycles* to access: CPU Registers 1 Cache tens Main memory hundreds Disk millions *cycle: time it takes to execute an instruction Hard Disks Large amount of storage but slow access Identifying a particular block takes a long time    Read or write data in chunks (“a page”) of 0.5 – 8 KB in size (Newer technology) Solidstate drives are 50 – 100 times faster   5 Still “slow” Algorithm Analysis Running time of disk-based data structures measured in terms of    Computing time (CPU) Number of disk accesses Regular main-memory algorithms that work one data element at a time can not be "ported" to secondary storage in a straight forward way  6 Principles  Almost all of our data structure is on disk.  Every time we access a node in the tree it amounts to a random disk access.  How can we address this problem? 7 M-ary Search Tree  Suppose we devised a search tree with branching factor M:  M – 1 keys needed to decide branch to take  Complete tree has height: (logMn)  # Nodes accessed for search: (logMn) 8 B-Trees Internal nodes store (up to) M  1 keys  M=7 Order property:    Subtree between two keys x and y contain leaves with values v such that x  v < y Note the “” Leaf nodes contain up to L sorted values/ data items (“records”).  9 3 7 12 21 x<3 3x<7 7x<12 12x<21 21x B-Tree Structure Properties  Root (special case)   Internal nodes    Has between 2 and M children (or could be a leaf) Stores up to M1 keys Have between ceiling(M/2) and M children Leaf nodes   Nodes are at least ½ full Leaves are at least ½ full Where data is stored Contains between ceiling(L/2) and L data items The tree is perfectly balanced ! 10 B-Tree: Example  B-Tree with M = 4 (# pointers in internal node) and L = 5 (# data items in leaf) 12 44 Data objects… which we’ll ignore in slides 6 1, AB.. 2, GH.. 4, XY.. 6 8 9 10 20 27 34 12 14 16 17 19 20 22 24 27 28 32 50 34 38 39 41 44 47 49 Definition for later: “neighbor” is the next sibling to the left or right. 11 50 60 All leaves 70 at the same depth Disk Friendliness  Many keys stored in a node   Each node is one disk page/block. All brought to memory/cache in one disk access.  Internal nodes contain only keys; only leaf nodes contain actual data  What is limiting you from increasing the number of keys stored in each node?  12 The page size Exercise: Computing M and L  Exercise: If disk block is 4000 bytes, key size is 20 bytes, pointer size is 4 bytes, and data/value size is 200 bytes, what should M (# of branches) and L (# of data items per leaf) be for our B-Tree? Solve for M: M - 1 keys + M pointers = 20M - 20 + 4M = 24M – 20 24M - 20 <= 4000 M = 167 Solve for L: L = 4000 / 200 L = 20 13 B-trees vs. AVL trees Suppose we have n = 109 (billion) data items:  Depth of AVL Tree: log2 109 = 30  Depth of B-Tree with M = 256, L = 256: ~ log M / 2109 = log 128109 = 4.3 (M/2 because keys half-full) 14 14 Building a B-Tree with Insertions 3 Insert(3) 3 Insert(18) 3 Insert(14) 18 14 18 The empty B-Tree M=3 15 L=3 3 3 3 18 14 14 14 30 18 18 Insert(30) 30 M=3 16 L=3 18 3 18 18 3 18 18 Insert(32) 14 30 32 3 18 32 14 30 36 Insert(36) 14 30 32 18 M=3 L=3 Insert(15) 3 18 32 14 30 36 15 17 32 18 3 32 18 18 32 32 3 18 32 14 30 36 Insert(16) 14 30 36 15 15 16 18 15 32 18 15 18 32 3 15 18 32 14 16 30 36 M=3 L=3 Insert(12,40,45,38) 18 18 15 32 15 32 40 3 15 18 32 3 15 18 32 40 14 16 30 36 12 16 30 36 45 14 19 38 Insertion Algorithm: The Overflow Step K0 K1 K2 K3 Too big M=5 20 K0 K6 K4 K5 K1 K3 K2 K6 K4 K5 Insertion Algorithm 1. Insert the key in its leaf in sorted order 2. If the leaf ends up with L+1 items, overflow!    21 Split the leaf into two nodes with each half of the data. Add the new leaf to the parent. If the parent ends up with M+1 children, overflow! Insertion Algorithm 3. If an internal node ends up with M+1 children, overflow!    4. Split the internal node into two nodes each with half the keys. Add the new child to the parent If the parent ends up with M+1 items, overflow! If the root ends up with M+1 children, split it in two, and create new root with two children This makes the tree deeper! 22 And Now for Deletion… Delete(32) 18 15 32 18 15 40 40 3 15 18 32 40 3 15 18 36 40 12 16 30 36 45 12 16 30 38 45 14 38 M=3 23 L=3 14 Delete(15) 18 15 36 40 18 16 3 15 18 36 40 3 12 16 30 38 45 12 14 36 16 40 18 36 40 30 38 45 14 Are we okay? M=3 24 L=3 Leaf not half full! Are you using that 14? Can I borrow it? 18 18 16 3 36 16 12 14 M=3 25 L=3 40 14 36 40 18 36 40 3 14 18 36 40 30 38 45 12 16 30 38 45 Delete(16) 18 14 36 40 14 3 14 18 36 40 3 12 16 30 38 45 12 M=3 L=3 Are you using that 14? 26 18 36 14 40 18 36 40 30 38 45 18 18 14 3 36 14 12 36 40 40 18 36 40 3 18 36 40 30 38 45 12 30 38 45 14 M=3 27 L=3 Are you using the node18/30? 18 36 36 40 18 40 3 18 36 40 3 18 36 40 12 30 38 45 12 30 38 45 14 14 M=3 28 L=3 Delete(14) 36 36 18 40 18 40 3 18 36 40 3 18 36 40 12 30 38 45 12 30 38 45 14 M=3 29 L=3 Delete(18) 36 36 18 18 40 3 18 36 40 3 12 30 38 45 12 M=3 30 L=3 40 30 36 40 38 45 36 36 18 3 40 30 12 40 36 40 3 36 40 38 45 12 38 45 30 M=3 31 L=3 36 40 36 40 3 36 40 3 36 40 12 38 45 12 38 45 30 M=3 32 30 L=3 36 36 3 12 30 M=3 33 L=3 40 36 38 40 45 40 3 36 40 12 38 45 30 Deletion Algorithm: Rotation Step K0 K2 K1 K0 K3 K4 K5 K1 K3 K2 K4 Too small M=5 34 This is left rotation. Similarly, right rotation K5 Deletion Algorithm: Merging Step Too small! K2 K1 K3 Too small M=5 35 K4 Can’t get smaller K1 K2 K3 K4 Deletion Algorithm 1. Remove the key from its leaf 2. If the leaf ends up with fewer than L/2 items, underflow!    36 Try a left rotation If not, try a right rotation If not, merge, then check the parent node for underflow Deletion Slide Two 3. If an internal node ends up with fewer than M/2 children, underflow!    3. Try a left rotation If not, try a right rotation If not, merge, then check the parent node for underflow If the root ends up with only one child, make the child the new root of the tree This reduces the height of the tree! 37
 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                            