SQL Server 2016 PolyBase

Sean Werick
Principal Consultant
502.320.2918
swerick@pragmaticworks.com

SQL Server 2016 - Industry leading TCO
[Chart: cost comparison across R Server in-database advanced analytics, mobile BI, business intelligence and self-service BI, data warehousing, and ETL. Plotted values: $320K, $640,000, $803,000, $1,272,000, $3,433,000, $3,745,000.]

SQL Server 2016: Everything built-in
Built-in with SQL Server vs. expensive add-ons with Oracle
• #1 TPC-H: data warehousing
• Industry leader: OLTP (callout: 11.7x more)
• In-memory built-in: in-memory across all workloads
• End-to-end security built-in
• Advanced analytics built-in: at massive scale
• Complete mobile BI built-in
• Self-service BI per user
• Consistent experience from on-premises to cloud
[Chart: yearly counts, 2010-2015, for SQL Server, Oracle, MySQL, and SAP HANA.]
[Chart: rankings #1, #2, #3; Oracle is #5.]
[Chart: self-service BI per user for Microsoft, Tableau, and Oracle. Plotted values: $120, $480, $2,230.]

The above graphics were published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Microsoft. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

National Institute of Standards and Technology Comprehensive Vulnerability Database update 10/2015

TPC-H 10TB non-clustered results as of 04/06/15, 5/04/15, 4/15/14 and 11/25/13, respectively.
http://www.tpc.org/tpch/results/tpch_perf_results.asp?resulttype=noncluster

In-memory enhancements
Operational analytics & enhanced performance
[Diagram: in-memory OLTP in SQL Server supports real-time fraud detection, contrasted with a 2-24 hr ETL into an in-memory ColumnStore data warehouse before fraud is detected.]

Built-in advanced analytics
In-database analytics at massive scale
Example solutions:
• Sales forecasting
• Warehouse efficiency
• Predictive maintenance
• Credit risk protection
Extensibility: R integration, new R scripts, Microsoft Azure Marketplace
Built in to SQL Server: analytic library, T-SQL interface, relational data
Data scientist: interact directly with data
Data developer/DBA: manage data and analytics together

R integration and advanced analytics
Capability
• Extensible in-database analytics, integrated with R, exposed through T-SQL
• Centralized enterprise library for analytic models
Benefits
• Data scientists: publish algorithms, interact directly with data
• DBAs: manage storage and analytics together
• Business analysts: analysis through T-SQL, tools, and vetted algorithms
• Analytics library: share and collaborate, manage and deploy
• Analytical engines: full R integration, fully extensible
• Data management layer: relational data, T-SQL interface, stream data in-memory

Always Encrypted
Help protect data at rest and in motion, on-premises & cloud
Client side (trusted apps, enhanced ADO.NET library, column master key):
    SELECT Name FROM Patients WHERE SSN=@SSN
    @SSN='198-33-0987'
SQL Server side (column encryption key; the parameter arrives as ciphertext):
    SELECT Name FROM Patients WHERE SSN=@SSN
    @SSN=0x7ff654ae6d
Result set: Name = Jim Gray

dbo.Patients
Name        SSN (plaintext)  SSN (ciphertext)  Country
Jane Doe    243-24-9812      1x7fg655se2e      USA
Jim Gray    198-33-0987      0x7ff654ae6d      USA
John Smith  123-82-1095      0y8fj754ea2c      USA

RLS in three steps
One: Policy manager creates the filter predicate and security policy in T-SQL
    CREATE FUNCTION dbo.fn_securitypredicate(@wing int)
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN SELECT 1 AS [fn_securitypredicate_result]
           FROM dbo.StaffDuties d
           INNER JOIN dbo.Employees e ON (d.EmpId = e.EmpId)
           WHERE e.UserSID = SUSER_SID() AND @wing = d.Wing;

    CREATE SECURITY POLICY dbo.SecPol
        ADD FILTER PREDICATE dbo.fn_securitypredicate(Wing) ON dbo.Patients
        WITH (STATE = ON)
Two: App user (e.g., nurse) selects from Patients table
    SELECT * FROM Patients
Three: Security Policy transparently rewrites query to apply filter predicate. Conceptually, the query becomes
    SELECT * FROM Patients
    SEMIJOIN APPLY dbo.fn_securitypredicate(patients.Wing);
which is equivalent to
    SELECT Patients.* FROM Patients, StaffDuties d
    INNER JOIN Employees e ON (d.EmpId = e.EmpId)
    WHERE e.UserSID = SUSER_SID() AND Patients.wing = d.Wing;

Dynamic data masking walkthrough
1) Security officer defines dynamic data masking policy in T-SQL over sensitive data in Employee table
    -- Show only the last four digits of the number
    ALTER TABLE [Employee] ALTER COLUMN [SocialSecurityNumber]
        ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)');
    ALTER TABLE [Employee] ALTER COLUMN [Email]
        ADD MASKED WITH (FUNCTION = 'email()');
    ALTER TABLE [Employee] ALTER COLUMN [Salary]
        ADD MASKED WITH (FUNCTION = 'random(1,20000)');
    GRANT UNMASK TO admin1;
2) Application user selects from Employee table
    SELECT [Name], [SocialSecurityNumber], [Email], [Salary]
    FROM [Employee]
3) Dynamic data masking policy obfuscates the sensitive data in the query results

Monitoring performance by using the Query Store
Capability
• Query Store helps customers quickly find and fix query performance issues
• Query Store is a 'flight data recorder' for database workloads
Benefits
• Greatly simplifies query performance troubleshooting
• Provides performance stability across SQL Server upgrades
• Allows deeper insight into workload performance

Stretch SQL Server into Azure
Stretch warm and cold tables to Azure with remote query processing
[Diagram: customer data, product data, and order history in SQL Server; the Order History table (Name, SSN, Date; rows dated 2005) is stretched to Microsoft Azure and still queried from the app.]

• New Portal
• Updated Report Features
• New KPI Features
• Mobile Reporting
• Brand the Portal

SQL Server 2016 Features
So many great features, but which are included with which edition?
https://www.microsoft.com/en-us/server-cloud/products/sql-server-editions/

Deeper Insights Across Data with PolyBase

What is PolyBase

PolyBase
Query relational and non-relational data with T-SQL
Query relational and non-relational data, on-premises and in Azure
[Diagram: apps send a T-SQL query to SQL Server, which can access any data, including Hadoop.]
[Diagram: one T-SQL query joins a SQL Server table (Name, DOB, State: Jim Gray, 11/13/58, WA; Ann Smith, 04/29/76, ME) with quote data stored in Hadoop ($658.39).]

PolyBase Can…

PolyBase Performance
PolyBase scale-out groups

PolyBase Requirements
• SQL Server (64-bit)
• Java SE downloads
• Enable or Disable a Server Network Protocol

Setting up PolyBase
1. Install PolyBase
   a) PolyBase Data Movement Service
   b) PolyBase Engine
2. Configure SQL Server and enable the option
3. Configure Pushdown
4. Create external data source
5. Create external file format
6. Create Hadoop user
7. Create external table

PolyBase Configuration
First, is it enabled?
    SELECT SERVERPROPERTY ('IsPolybaseInstalled')

    -- 5 denotes the connection type
    EXEC sp_configure 'hadoop connectivity', 5;
    RECONFIGURE;

Option 0: Disable Hadoop connectivity
Option 1: Hortonworks HDP 1.3 on Windows Server
Option 1: Azure blob storage (WASB[S])
Option 2: Hortonworks HDP 1.3 on Linux
Option 3: Cloudera CDH 4.3 on Linux
Option 4: Hortonworks HDP 2.0 on Windows Server
Option 4: Azure blob storage (WASB[S])
Option 5: Hortonworks HDP 2.0 on Linux
Option 6: Cloudera 5.1, 5.2, 5.3, 5.4, and 5.5 on Linux
Option 7: Hortonworks 2.1, 2.2, and 2.3 on Linux
Option 7: Hortonworks 2.1, 2.2, and 2.3 on Windows Server
Option 7: Azure blob storage (WASB[S])

PolyBase Configuration
Restart:
• SQL Server
• PolyBase Data Movement Service
• PolyBase Engine

PolyBase Configuration
    EXEC sp_polybase_join_group 'PQTH4A-CMP01', 16450, 'MSSQLSERVER';

PolyBase Configuration
Create Scoped Credential
    USE [AdventureworksDW]
    GO
    -- 2: Create a database scoped credential for Kerberos-secured Hadoop clusters.
    -- IDENTITY: the user name
    -- SECRET: the password
    CREATE DATABASE SCOPED CREDENTIAL HDPUser
    WITH IDENTITY = 'hue', Secret = '';
    GO

PolyBase Configuration
Create External Data Source
    USE [AdventureworksDW]
    GO
    CREATE EXTERNAL DATA SOURCE [HDP2]
    WITH (TYPE = HADOOP,
          LOCATION = N'hdfs://pwpchadoop.cloudapp.net:8020',
          CREDENTIAL = HDPUser);
    GO

PolyBase Configuration
    CREATE EXTERNAL FILE FORMAT TSV
    WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (
            FIELD_TERMINATOR = '\t',
            DATE_FORMAT = 'MM/dd/yyyy'
        )
    )

PolyBase Configuration
    CREATE EXTERNAL TABLE HDP_FactInternetSales
        ([ProductKey] [int],
         [OrderDateKey] [int],
         [DueDateKey] [int],
         [ShipDateKey] [int],
         [CustomerKey] [int],
         …)
    WITH (LOCATION = '/apps/hive/warehouse/factinternetsales',
          DATA_SOURCE = HDP2,
          FILE_FORMAT = TSV,
          REJECT_TYPE = VALUE,
          REJECT_VALUE = 0)

Using PolyBase
    SELECT Insured_Customers.FirstName, Insured_Customers.LastName,
           Insured_Customers.YearlyIncome, Insured_Customers.MaritalStatus
    INTO Fast_Customers
    FROM Insured_Customers
    INNER JOIN (SELECT * FROM CarSensor_Data WHERE Speed > 35) AS SensorD
        ON Insured_Customers.CustomerKey = SensorD.CustomerKey

Using PolyBase
    -- Enable INSERT into external table
    EXEC sp_configure 'allow polybase export', 1;
    RECONFIGURE;
    -- Create an external table.
    CREATE EXTERNAL TABLE [dbo].[FastCustomers2009] (
        [FirstName] char(25) NOT NULL,
        [LastName] char(25) NOT NULL,
        [YearlyIncome] float NULL,
        [MaritalStatus] char(1) NOT NULL)
    WITH (LOCATION = '/old_data/2009/customerdata.tbl',
          DATA_SOURCE = HadoopHDP2,
          FILE_FORMAT = TextFileFormat,
          REJECT_TYPE = VALUE,
          REJECT_VALUE = 0);

Using PolyBase
    SELECT DISTINCT Insured_Customers.FirstName, Insured_Customers.LastName,
           Insured_Customers.YearlyIncome, CarSensor_Data.Speed
    FROM Insured_Customers, CarSensor_Data
    WHERE Insured_Customers.CustomerKey = CarSensor_Data.CustomerKey
      AND CarSensor_Data.Speed > 35
    ORDER BY CarSensor_Data.Speed DESC
    OPTION (FORCE EXTERNALPUSHDOWN);
    -- or OPTION (DISABLE EXTERNALPUSHDOWN)

Using PolyBase
    SELECT customer.name, customer.zip_code
    FROM customer
    WHERE customer.account_balance < 200000

PolyBase Troubleshooting
    -- Find the longest running query
    SELECT execution_id, st.text, dr.total_elapsed_time
    FROM sys.dm_exec_distributed_requests dr
    CROSS APPLY sys.dm_exec_sql_text(sql_handle) st
    ORDER BY total_elapsed_time DESC;

    -- Find the longest running step of the distributed query plan
    SELECT execution_id, step_index, operation_type, distribution_type,
           location_type, status, total_elapsed_time, command
    FROM sys.dm_exec_distributed_request_steps
    WHERE execution_id = 'QID4547'
    ORDER BY total_elapsed_time DESC;

PolyBase Troubleshooting
    -- Find the execution progress of SQL step
    SELECT execution_id, step_index, distribution_id, status,
           total_elapsed_time, row_count, command
    FROM sys.dm_exec_distributed_sql_requests
    WHERE execution_id = 'QID4547' AND step_index = 1;

PolyBase Troubleshooting
    SELECT execution_id, step_index, dms_step_index, compute_node_id,
           type, input_name, length, total_elapsed_time, status
    FROM sys.dm_exec_external_work
    WHERE execution_id = 'QID4547' AND step_index = 7
    ORDER BY total_elapsed_time DESC;

Start Playing
• Developer Edition
  https://www.microsoft.com/en-us/server-cloud/products/sql-server-editions/sql-server-developer.aspx
• SQL Server Data Tools For Visual Studio 2015
  https://msdn.microsoft.com/en-us/library/mt204009.aspx

Pragmatic Works Offers a Variety of Services to Help You with SQL Server
- Architectural Design Sessions
- Detailed Assessments and Roadmaps
- Migrations and Upgrades
Ask us about test-driven migrations and upgrades with LegiTest

Sean Werick
swerick@pragmaticworks.com
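As a companion to the PolyBase configuration and troubleshooting slides, the following is a minimal sketch of how the setup could be checked once the steps are complete. It is not part of the original deck. It assumes the catalog views and DMVs named in the comments, and that it runs in the database where the external objects were created (AdventureworksDW in the slides).

```sql
-- Sketch only: sanity checks after PolyBase setup (not from the original deck).

-- Returns 1 if the PolyBase feature is installed on this instance, 0 otherwise.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Configured vs. active 'hadoop connectivity' option.
-- The two values differ until RECONFIGURE has run and the services have restarted.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = N'hadoop connectivity';

-- External data sources defined in the current database.
SELECT name, type_desc, location
FROM sys.external_data_sources;

-- External file formats defined in the current database.
SELECT name, format_type, field_terminator, date_format
FROM sys.external_file_formats;

-- External tables and their reject settings.
SELECT name, location, reject_type, reject_value
FROM sys.external_tables;

-- Head and compute nodes that have joined the scale-out group.
SELECT compute_node_id, type, name, address
FROM sys.dm_exec_compute_nodes;
```

If the external data source, file format, or table created in the configuration slides is missing from these results, the corresponding CREATE statement most likely ran in a different database.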