Slide 1: Designing and Tuning High Speed Data Loading
Thomas Kejser, Senior Program Manager
tkejser@microsoft.com

Slide 2: Agenda
- Tuning Methodology
- Bulk Load API Basics
- Design Pattern and Techniques
- Parallelism
- Table Layout
- Tuning the SQL Server Engine
- Tuning the Network Stack
- Tuning Integration Services

Slide 3: Tuning ETL and ELT: Tuning Methodology

Slide 4: The Tuning Loop
- Get a baseline
- Make one small change at a time
- Agree on targets for optimization
  - Actual runtime
  - CPU, Memory, I/O
- The greedy tuner: "Tune it till it breaks, then fix it, so you can break it again"
[Diagram: a loop with the steps Generate Hypothesis, Change, Measure, Save Result]

Slide 5: Tools of the Trade - Windows
- Perfmon
  - Logical Disk
  - Memory
  - Processor
  - Process (specifically the DTEXEC process)
  - Network Interface
- Task Manager
- WinDbg
- KernRate

Slide 6: Tools of the Trade – SQL Server
- sys.dm_os_wait_stats
  - All my tuning starts here
  - Get familiar with common wait types
- sys.dm_os_latch_stats
  - Allows deep dive into LATCH_<X> waits
- sys.dm_os_spinlock_stats
  - When too much CPU seems to be spent
- sys.dm_io_virtual_file_stats
  - Because I/O systems are rarely perfect

Slide 7: Designing and Tuning High Speed Data Loading: Bulk Load API Basics

Slide 8: Four Ways to Load Data to SQL Server
- Integration Services
  - OLEDB Destination
  - SQL Server Destination
- BULK INSERT
  - CSV or fixed width files
- BCP
  - Like BULK INSERT, but can be run remotely
- INSERT ... SELECT

Slide 9: Minimally Logged and Bulk
- Bulk Load
  - Feeds a continuous stream of data into a table
  - As opposed to running singleton INSERT statements
- Minimally logged
  - Only allocations are logged, not individual rows/pages
- Key takeaway: an operation can be a bulk load operation without being minimally logged

Slide 10: To TABLOCK or not to TABLOCK
- General rule (batch style):
  - Heaps: use TABLOCK
  - Clustered indexes: do NOT use TABLOCK
- Minimally logged:
  - INSERT Heap WITH (TABLOCK) SELECT ...
  - If TF610 is on: INSERT ClusterIndex SELECT ...
- The same rules apply to the OLEDB and SQL Server Destinations in SSIS

Slide 11: Designing and Tuning High Speed Data Loading: Design Patterns

Slide 12: Integration Services or T-SQL
- Sometimes: a matter of preference
- Integration Services is graphical
  - Some users like this
  - Hard to make modular
- SQL Server uses T-SQL, a "text language"
  - Modular programming
- The right tool for the right job
  - Learn both…

Slide 13: SQL Server – Which Load Method?
BULK INSERT / BCP
- Pros:
  - Can take a BU-lock
  - No need for Linked Servers or OPENROWSET
- Cons:
  - Only CSV and fixed width files for input
INSERT ... SELECT
- Pros:
  - Can perform transformations
  - Any OLEDB enabled input
- Cons:
  - Takes X-locks on table
  - Linked Servers or OPENROWSET needed

Slide 14: Integration Services – Which Destination?
OLEDB Destination
- Pros:
  - Can be used over TCP/IP
  - ETL servers can be scaled out remotely
- Con:
  - Typically slower than the SQL Server Destination
SQL Server Destination
- Pros:
  - Fastest option
  - Easy to configure
- Con:
  - Must run on the same box as SQL Server (shared memory connections)

Slide 15: Design Pattern: Parallel Load
- Create a (priority) queue for your packages
  - A SQL table is good for this purpose
- Packages / T-SQL include a loop:
  - Loop takes one item from the queue
  - Until the queue is empty…
[Diagram: a Priority Queue feeding DTEXEC (1) and DTEXEC (2)]

Slide 16: Design Pattern: Table Hash Partitioning
- Create filegroups to hold the partitions
  - Equally balance over LUNs using optimal layout
- Use the CREATE PARTITION FUNCTION command
  - Partition the tables into #cores partitions
- Use the CREATE PARTITION SCHEME command
  - Bind partition function to filegroups
- Add hash column to table (tinyint, just one byte per row)
  - Calculate a good hash distribution
  - For example, use hashbytes with modulo or binary_checksum
[Diagram: hash values 0, 1, 2, 3, 4, 5, 6 ... 253, 254, 255]

Slide 17: Design Pattern: Large Updates
[Diagram labels: Sales (partitions 2001, 2002, 2003, 2004), Sales_Old, Update Records, BULK INSERT, Sales_Delta, Sales_New, SWITCH, Sales Updated]

Slide 18: Design Pattern: Large Deletes
[Diagram labels: Sales (partitions 2001, 2002, 2003, 2004), SWITCH, Sales_Temp (2001), BULK INSERT, Sales_Temp (2001 Filtered), 2001 (Filtered)]
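
The loop from the Parallel Load design pattern (Slide 15) can be sketched in Python. This is only an illustration under stated assumptions: the deck keeps the queue in a SQL table and runs DTEXEC processes as workers, whereas the sketch below swaps in an in-memory `queue.PriorityQueue` and threads. The names `run_parallel_load` and `load_fn` are hypothetical and do not come from the presentation.

```python
import queue
import threading


def run_parallel_load(work_items, worker_count, load_fn):
    """Drain a priority queue of load jobs with a fixed number of workers.

    work_items   -- iterable of (priority, name); lower priority loads first
    worker_count -- number of concurrent workers (stand-ins for DTEXEC copies)
    load_fn      -- hypothetical callable that performs one load for `name`
    Returns a list of (worker_id, name) in completion order.
    """
    pending = queue.PriorityQueue()
    for item in work_items:
        pending.put(item)

    completed = []
    completed_lock = threading.Lock()

    def worker(worker_id):
        # Each worker takes one item at a time until the queue is empty.
        while True:
            try:
                _priority, name = pending.get_nowait()
            except queue.Empty:
                return
            load_fn(name)
            with completed_lock:
                completed.append((worker_id, name))

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(worker_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed
```

With a single worker the items are processed strictly in priority order; with several workers the start order still follows priority, but completion order depends on how long each load takes.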
Slide 19: Designing and Tuning High Speed Data Loading: Tuning the SQL Server Engine

Slide 20: ALLOC_FREESPACE_CACHE - Heap Limits
- Measure:
  - sys.dm_os_latch_stats
  - Long waits for ALLOC_FREESPACE_CACHE
- SQL Server Books Online:
  - "Used to synchronize the access to a cache of pages with available space for heaps and binary large objects (BLOBs). Contention on latches of this class can occur when multiple connections try to insert rows into a heap or BLOB at the same time. You can reduce this contention by partitioning the object."
- Hypothesis: more heaps = more speed
[Chart: throughput in MB/sec against the number of concurrent bulks]

Slide 21: PAGELATCH_UP – PFS Contention
- Measure:
  - sys.dm_os_wait_stats
- Hypothesis generation
  - I/O problem?
  - What can we predict?
- Fix: add more files to the filegroup!

Slide 22: RESOURCE_SEMAPHORE - Query Memory Usage
- DW load queries will often be very memory intensive
- By default, a single query can use at most 25% of SQL Server's allocated memory
- Queries waiting to get a memory grant will wait for RESOURCE_SEMAPHORE
- Can use Resource Governor (RG) to work around it

Slide 23: SOS_SCHEDULER_YIELD
- Hypothesis: caused by two bulk commands on the same scheduler
- Predict:
  - We should see multiple bulk commands on the same scheduler
- Observe: and we do…
  - scheduler_id in sys.dm_exec_requests

Slide 24: Fixing SOS_SCHEDULER_YIELD
- How can we fix this?
- Two ways:
  - Terminate and reconnect
  - Soft NUMA
[Diagram labels: Core 0, Soft-NUMA Node 0, TCP port 1433, BULK INSERT ... Core X, Soft-NUMA Node X, TCP port 1433 + X, BULK INSERT; x CPU cores]

Slide 25: I/O Related Waits for BULK INSERT
- BULK INSERT uses a double buffering scheme
- Important to feed it fast enough
- Also, the target SQL Server must be able to absorb the writes
[Diagram labels: CSV, IMPPROV_IOWAIT, 64KB, 64KB, Parse, Table, PAGEIOLATCH_EX, OLEDB, ASYNC_NETWORK_IO]

Slide 26: CXPACKET – When it Matters
- Statements of type INSERT…SELECT
- Measure: sometimes throughput drops with higher DOP
- Hypothesis: backpressure in query execution
[Chart: Throughput / DOP. Throughput (MB/sec) against DOP]

Slide 27: Drinking From a Fire Hose
- Solution: OPTION (MAXDOP X)
[Chart: CXPACKET waits / Throughput. CXPACKET Waits against Throughput (MB/sec)]

Slide 28: SQL Server Waits - Summary
Wait Type | Typical Cause | Resolution
PAGELATCH_UP | Contention on PFS pages | Add more data files to filegroup
ALLOC_FREESPACE_CACHE | Heap allocation bottleneck | Partition target table and use SWITCH
SOS_SCHEDULER_YIELD | Network speed not keeping up | Optimize network settings in Windows (Jumbo Frames); increase packet size
RESOURCE_SEMAPHORE | Too much memory used by query | Optimize query for less memory or use Resource Governor to limit max allocation
LCK_X | Locks prevent parallelism | Use correct lock hints
WRITELOG | Transaction log contention | Use TF610, seek minimally logged operations
PAGEIOLATCH_<X> | I/O system not keeping up | Tune I/O
IMPPROV_IOWAIT | Input file I/O too slow | Improve input file latency and/or throughput
CXPACKET | Normally harmless, but may be too much coordination | Use MAXDOP hint, but carefully
OLEDB/ASYNC_NETWORK_IO | Not feeding bulk load fast enough | Optimize source

Slide 29: Designing and Tuning High Speed Data Loading: Tuning the Network Stack

Slide 30: How to Affinitize NICs
- Using the Interrupt-Affinity Policy Tool you can affinitize individual NICs to CPU cores
- Affinitize each NIC to its own core
  - One NIC per hard NUMA node
  - Your mileage may vary; it depends on the box
- Match Soft NUMA TCP/IP connections with NICs
  - The NIC on a hardware NUMA node maps to the SQL bulk stream target on the same node

Slide 31: Tune Network Parameters
- Jumbo Frames = 9014 bytes enabled
- Adaptive Inter-Frame Spacing disabled
- Flow control = Tx & Rx enabled
- Client & server Interrupt Moderation = Medium
- Coalesce buffers = 256
- Set server Rx buffers to 512 and server Tx buffers to 512
- Set client Rx buffers to 512 and client Tx buffers to 256
- Link speed 1000 Mbps Full Duplex

Slide 32: Network Packet Size
- Measure
  - Perfmon shows a huge discrepancy between the number of reads and writes
- Hypothesis:
  - This is caused by the small network packet size (default 4096) forcing the stream to be broken into smaller pieces
- Test and prove:
  - Adjusting the network packet size to 32K increases throughput by 15%

Slide 33: Designing and Tuning High Speed Data Loading: Tuning Integration Services

Slide 34: Integration Services vs. SQL
- Lab test setup
  - Transform fact data with surrogate key lookups
  - 5 dimension tables, 100K rows each
  - Partitioned fact table, total of 320M rows
  - Test speed of hash joins
Test 2: Raw Join
Method | Time/s | Krows/s
SSIS 2008 | 144 | 2222
SQL MAXDOP = 0 | 158 | 2025
SQL MAXDOP = 1 x 32 | 162 | 1975
Test 3: Join and Write
Method | Time/s | Krows/s
SQL MAXDOP = 1 x 32 | 246 | 1301
SSIS 2008 | 278 | 1151
SQL MAXDOP = 0 | 1927 | 166
Integration Services lookup join is comparable in speed with T-SQL!

Slide 35: Baseline of Package
- Sanity check:
  - How much memory does each package use?
  - How much CPU does each package stream use?
  - Need enough CPU and memory to run them all
- Performance counters:
  - Process: Private Bytes / Working Set (DTEXEC)
  - Processor: % Processor Time
  - Network Interface: Current Bandwidth
  - Network Interface: Bytes Total/sec

Slide 36: Scaling the Package - Method
- Using the parallel load technique described earlier, you can run multiple copies of the package
- Using the baseline of the package, you can now calculate how many scale servers you will need

Slide 37: Data Loading – Fast Enough?
- Bulk load scales near linearly with bulk streams
- Measured so far up to 96 cores
- Possible to reach 100% CPU load on all cores
- "Just" get rid of all bottlenecks

Slide 39:
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Slide 40: Tuning ETL and ELT: Appendix

Slide 41: Data Loading Links
- The Data Loading Performance Guide
- Top 10 SQL Server Integration Services Best Practices
- Managing and Deploying SQL Server Integration Services
- SQL Server 2005 Integration Services: A Strategy for Performance
- Integration Services: Performance Tuning Techniques
- High Impact Data Warehousing with SQL Server Integration Services
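
The sizing step described on the "Scaling the Package - Method" slide can be sketched as a small calculation. This is a rough sketch, not the presenter's method: it assumes throughput scales linearly with the number of package copies (the deck only claims near-linear scaling), and the function name `servers_needed`, its parameters, and the figures in the usage example are all hypothetical.

```python
import math


def servers_needed(target_mb_per_sec, pkg_mb_per_sec, pkg_cores, pkg_memory_gb,
                   server_cores, server_memory_gb):
    """Estimate scale-out servers from a single-package baseline.

    Assumes linear scaling per package copy (an assumption, not a guarantee).
    Returns (package_copies, copies_per_server, server_count).
    """
    inputs = (target_mb_per_sec, pkg_mb_per_sec, pkg_cores, pkg_memory_gb,
              server_cores, server_memory_gb)
    if min(inputs) <= 0:
        raise ValueError("all inputs must be positive")

    # Copies required to reach the target throughput.
    copies = math.ceil(target_mb_per_sec / pkg_mb_per_sec)

    # A server can host only as many copies as its scarcer resource allows.
    copies_per_server = min(int(server_cores // pkg_cores),
                            int(server_memory_gb // pkg_memory_gb))
    if copies_per_server < 1:
        raise ValueError("one server cannot host even a single package copy")

    return copies, copies_per_server, math.ceil(copies / copies_per_server)


# Made-up baseline: one package moves 25 MB/sec using 1.5 cores and 2 GB.
print(servers_needed(400, 25, 1.5, 2, 16, 32))  # (16, 10, 2)
```

In this made-up example CPU is the limiting resource: each 16-core server fits 10 copies, so the 16 copies needed for 400 MB/sec require two servers.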