Start to Finish with Azure Data Factory
Andy Roberts
Data Architect
andyrob@microsoft.com
Session Objectives
Understand where ADF fits in Cortana Analytics
Understand how ADF works and what its components are
Be able to deploy and manage a simple ADF implementation
Key Takeaway:
ADF can be used quickly and easily in real-world data pipeline scenarios
A suite of products that allows you to predict outcomes, prescribe actions, and automate decisions
Cortana
Power BI
Azure Stream Analytics
Azure HDInsight
Azure Machine Learning
Azure SQL DB, Data Warehouse, DocumentDB
Azure Data Lake
Azure Event Hubs
Azure Data Catalog
Azure Data Factory
(Diagram: Cortana Analytics on Microsoft Azure, with stages labeled Ingest, Store, Transform, Analyze, Orchestrate, and Act)
Cortana Analytics Process: https://tinyurl.com/caprocess
Create, orchestrate, and manage data movement and enrichment through the cloud
ADF Components
ADF Logical Flow
ADF Process
1. Define Architecture: Set up objectives and flow
2. Create the Data Factory: Portal, PowerShell, Visual Studio
3. Create Linked Services: Connections to data and services
4. Create Datasets: Input and output
5. Create Pipeline: Define activities
6. Monitor and Manage: Portal or PowerShell, alerts and metrics
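Steps 3 to 5 build on each other: datasets name the linked service that holds their data, and pipeline activities name the datasets they read and write. As a minimal sketch, assuming the definitions are held as plain Python dictionaries shaped like ADF JSON (every object name here is hypothetical), the check below reports any reference to an object that has not been defined yet. It illustrates the ordering only; it is not an ADF tool.

```python
# All object names below are hypothetical examples.


def find_missing_references(linked_services, datasets, pipeline):
    """Return a sorted list of references that point at undefined objects."""
    missing = []
    service_names = {ls["name"] for ls in linked_services}
    dataset_names = {ds["name"] for ds in datasets}

    # Step 4 depends on step 3: each dataset names its linked service.
    for ds in datasets:
        ls_name = ds["properties"]["linkedServiceName"]
        if ls_name not in service_names:
            missing.append(f"dataset {ds['name']} -> linked service {ls_name}")

    # Step 5 depends on step 4: activities read and write named datasets.
    for activity in pipeline["properties"]["activities"]:
        for ref in activity.get("inputs", []) + activity.get("outputs", []):
            if ref["name"] not in dataset_names:
                missing.append(f"activity {activity['name']} -> dataset {ref['name']}")
    return sorted(missing)


linked_services = [{"name": "StorageLinkedService"}]
datasets = [
    {"name": "WebLogsRaw", "properties": {"linkedServiceName": "StorageLinkedService"}},
    {"name": "WebLogsPartitioned", "properties": {"linkedServiceName": "StorageLinkedService"}},
]
pipeline = {
    "name": "WebLogsPipeline",
    "properties": {
        "activities": [
            {
                "name": "PartitionWebLogs",
                "inputs": [{"name": "WebLogsRaw"}],
                "outputs": [{"name": "WebLogsPartitioned"}],
            }
        ]
    },
}

print(find_missing_references(linked_services, datasets, pipeline))
```

An empty list means every dataset and activity can be created in the order the steps list them.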
Define data sources, processing requirements, and output, along with management and monitoring
Example - Churn
(Diagram: Azure Data Factory pipeline with stages Data Sources, Ingest, Transform & Analyze, and Publish; the data shown are Call Log Files, Customer Table, Customer Call Details, Customers Likely to Churn, and Customer Churn Table)
Our ADF:
• Business Goal: Transform and analyze web logs each month
• Design Process: Transform raw web logs stored in a temporary location with a Hive query, and store the results in Blob Storage
(Diagram: Web Logs in HDFS file store → Files ready for analysis and use in AzureML)
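The web-log design above could be expressed as a pipeline with a single Hive activity that runs monthly. The Python sketch below builds such a definition as a dictionary and prints it as JSON. The activity type HDInsightHive and the properties scriptPath, scriptLinkedService, and scheduler follow ADF version 1 naming and should be checked against the ADF JSON reference before use; every object name, the script path, and the dates are hypothetical.

```python
import json

# Hypothetical names: linked services, datasets, and the Hive script path.


def build_hive_pipeline(start, end):
    """Return an ADF (v1-style) pipeline dict with one monthly Hive activity."""
    hive_activity = {
        "name": "PartitionWebLogs",
        "type": "HDInsightHive",
        # Compute: the HDInsight cluster that runs the Hive query.
        "linkedServiceName": "HDInsightLinkedService",
        "typeProperties": {
            # The .hql script is read from Blob Storage via this linked service.
            "scriptPath": "adfscripts/partitionweblogs.hql",
            "scriptLinkedService": "StorageLinkedService",
        },
        "inputs": [{"name": "WebLogsRaw"}],
        "outputs": [{"name": "WebLogsPartitioned"}],
        # Run once per month, matching the business goal.
        "scheduler": {"frequency": "Month", "interval": 1},
    }
    return {
        "name": "WebLogsPipeline",
        "properties": {
            "description": "Transform raw web logs with Hive each month",
            "activities": [hive_activity],
            "start": start,
            "end": end,
        },
    }


pipeline = build_hive_pipeline("2016-04-01T00:00:00Z", "2016-07-01T00:00:00Z")
print(json.dumps(pipeline, indent=2))
```

Building the definition in code rather than by hand makes it easy to stamp out the same pipeline with different dates or names.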
Portal, PowerShell, and Visual Studio
Using the Portal
• Use in Non-MS Clients
• Use for Exploration
• Use when teaching or in a Demo
Using PowerShell
• Use in MS Clients
• Use for Automation
• Use for quick set up and tear down
PowerShell ADF Example
1. Run Add-AzureAccount and enter your user name and password.
2. Run Get-AzureSubscription to view all the subscriptions for this account.
3. Run Select-AzureSubscription to select the subscription you want to work with.
4. Run Switch-AzureMode AzureResourceManager
5. Run New-AzureResourceGroup -Name ADFTutorialResourceGroup -Location "West US"
6. Run New-AzureDataFactory -ResourceGroupName ADFTutorialResourceGroup -Name DataFactory(your alias)Pipeline -Location "West US"
Using Visual Studio
• Use in mature dev environments
• Use when integrated into a larger development process
Connection to data or connection to a compute resource; also termed a “Data Store”
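For a data connection, a linked service is a small JSON document. As a sketch, the Python function below produces one for an Azure Storage account using the AzureStorage type and a connectionString property, following ADF version 1 conventions; the linked service name and the placeholder account name and key are hypothetical.

```python
import json


def azure_storage_linked_service(name, account_name, account_key):
    """Return an ADF (v1-style) linked service dict for an Azure Storage account."""
    connection_string = (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};AccountKey={account_key}"
    )
    return {
        "name": name,
        "properties": {
            "type": "AzureStorage",
            "typeProperties": {"connectionString": connection_string},
        },
    }


# Placeholder credentials (hypothetical); never commit a real account key.
ls = azure_storage_linked_service("StorageLinkedService", "<accountname>", "<accountkey>")
print(json.dumps(ls, indent=2))
```

Datasets then refer to this object only by its name, so the credentials live in one place.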
Data Options
Source: Sink
Blob: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, DocumentDB, OnPrem File System, Data Lake Store
Table: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, DocumentDB, Data Lake Store
SQL Database: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, DocumentDB, Data Lake Store
SQL Data Warehouse: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, DocumentDB, Data Lake Store
DocumentDB: Blob, Table, SQL Database, SQL Data Warehouse, Data Lake Store
Data Lake Store: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, DocumentDB, OnPrem File System, Data Lake Store
SQL Server on IaaS: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, Data Lake Store
OnPrem File System: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, OnPrem File System, Data Lake Store
OnPrem SQL Server: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, Data Lake Store
OnPrem Oracle Database: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, Data Lake Store
OnPrem MySQL Database: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, Data Lake Store
OnPrem DB2 Database: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, Data Lake Store
OnPrem Teradata Database: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, Data Lake Store
OnPrem Sybase Database: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, Data Lake Store
OnPrem PostgreSQL Database: Blob, Table, SQL Database, SQL Data Warehouse, OnPrem SQL Server, SQL Server on IaaS, Data Lake Store
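Data movement between a source and a sink is done with a Copy activity whose source and sink types match the datasets involved. The sketch below encodes two rows of the Data Options table to check a planned copy, then builds a Copy activity from Blob to SQL Database. The BlobSource and SqlSink type names follow ADF version 1 conventions; the activity and dataset names are hypothetical.

```python
import json

# Two rows transcribed from the Data Options table; not exhaustive.
SUPPORTED_SINKS = {
    "Blob": {
        "Blob", "Table", "SQL Database", "SQL Data Warehouse", "OnPrem SQL Server",
        "SQL Server on IaaS", "DocumentDB", "OnPrem File System", "Data Lake Store",
    },
    "DocumentDB": {"Blob", "Table", "SQL Database", "SQL Data Warehouse", "Data Lake Store"},
}


def check_copy_supported(source_store, sink_store):
    """Raise ValueError if the table does not list sink_store for source_store."""
    if sink_store not in SUPPORTED_SINKS.get(source_store, set()):
        raise ValueError(f"copy from {source_store} to {sink_store} is not listed")


def build_copy_activity(name, input_dataset, output_dataset, source_type, sink_type):
    """Return an ADF (v1-style) Copy activity dict moving one dataset to another."""
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"name": input_dataset}],
        "outputs": [{"name": output_dataset}],
        "typeProperties": {
            "source": {"type": source_type},
            "sink": {"type": sink_type},
        },
    }


check_copy_supported("Blob", "SQL Database")  # listed, so no error is raised
activity = build_copy_activity(
    "CopyBlobToSql", "InputBlobDataset", "OutputSqlDataset", "BlobSource", "SqlSink"
)
print(json.dumps(activity, indent=2))
```

Checking the pairing before building the activity catches unsupported combinations, such as DocumentDB to OnPrem SQL Server, at design time.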
Activity Options
Transformation activity: Compute environment
Hive: HDInsight [Hadoop]
Pig: HDInsight [Hadoop]
MapReduce: HDInsight [Hadoop]
Hadoop Streaming: HDInsight [Hadoop]
Machine Learning activities (Batch Execution and Update Resource): Azure VM
Stored Procedure: Azure SQL
Data Lake Analytics U-SQL: Azure Data Lake Analytics
DotNet: HDInsight [Hadoop] or Azure Batch
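Each transformation activity needs a compute environment of the matching kind. A small Python sketch that transcribes the Activity Options table into a dictionary and flags planned pairings the table does not list; the activity names in the plan are hypothetical, and the keys are the table's display names rather than ADF JSON type strings.

```python
# Transformation activity -> allowed compute environments, transcribed from
# the Activity Options table (display names, not ADF JSON "type" values).
COMPUTE_FOR_ACTIVITY = {
    "Hive": ["HDInsight [Hadoop]"],
    "Pig": ["HDInsight [Hadoop]"],
    "MapReduce": ["HDInsight [Hadoop]"],
    "Hadoop Streaming": ["HDInsight [Hadoop]"],
    "Machine Learning": ["Azure VM"],
    "Stored Procedure": ["Azure SQL"],
    "Data Lake Analytics U-SQL": ["Azure Data Lake Analytics"],
    "DotNet": ["HDInsight [Hadoop]", "Azure Batch"],
}


def plan_mismatches(plan):
    """Return (activity, compute) pairs in plan that the table does not allow.

    plan maps a hypothetical activity name to a (kind, compute) tuple.
    """
    bad = []
    for activity, (kind, compute) in plan.items():
        if compute not in COMPUTE_FOR_ACTIVITY.get(kind, []):
            bad.append((activity, compute))
    return bad


plan = {
    "PartitionWebLogs": ("Hive", "HDInsight [Hadoop]"),
    "ScoreCustomers": ("DotNet", "Azure Batch"),
    "LoadChurnTable": ("Stored Procedure", "Azure Data Lake Analytics"),
}
print(plan_mismatches(plan))
```

Here only LoadChurnTable is flagged, because the table pairs Stored Procedure with Azure SQL.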
Named reference or pointer to data
Dataset Concepts
{
    "name": "<name of dataset>",
    "properties":
    {
        "structure": [ ],
        "type": "<type of dataset>",
        "external": <boolean flag to indicate external data>,
        "typeProperties":
        {
        },
        "availability":
        {
        },
        "policy":
        {
        }
    }
}
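Filled in for the web-log example, the dataset skeleton might look like the dictionary built below, printed as JSON. This is a sketch assuming a comma-delimited text folder in Blob Storage: AzureBlob, folderPath, TextFormat, and linkedServiceName (which the skeleton omits) follow ADF version 1 conventions, the optional structure section is left out, and the dataset name, linked service name, and folder path are hypothetical.

```python
import json


def blob_input_dataset(name, linked_service, folder_path):
    """Return a v1-style dataset dict for a monthly, externally produced blob folder."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlob",
            # Not in the skeleton: the linked service that holds the data.
            "linkedServiceName": linked_service,
            "typeProperties": {
                "folderPath": folder_path,
                "format": {"type": "TextFormat", "columnDelimiter": ","},
            },
            # True: the data is produced outside this data factory.
            "external": True,
            # One slice of data per month.
            "availability": {"frequency": "Month", "interval": 1},
            "policy": {},
        },
    }


# Hypothetical names and path.
dataset = blob_input_dataset("WebLogsRaw", "StorageLinkedService", "adfscripts/inputdata")
print(json.dumps(dataset, indent=2))
```

Python's True serializes to JSON true, which is the boolean the "external" placeholder expects.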
Logical Grouping of Activities
Pipeline Concepts
{
    "name": "PipelineName",
    "properties":
    {
        "description" : "pipeline description",
        "activities":
        [
        ],
        "start": "<start date-time>",
        "end": "<end date-time>"
    }
}
Scheduling, Monitoring, Disposition
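In the version of ADF these slides describe, scheduling works in time slices: the pipeline's start and end set its active period, and a dataset's availability frequency and interval divide that period into windows that are processed and monitored individually. A minimal Python sketch of that windowing, assuming Hour, Day, or Month frequencies, monthly windows that begin on day 1, and an end that falls on a window boundary; the dates are hypothetical.

```python
from datetime import datetime, timedelta


def next_boundary(t, frequency, interval):
    """Return the end of the window that starts at t."""
    if frequency == "Hour":
        return t + timedelta(hours=interval)
    if frequency == "Day":
        return t + timedelta(days=interval)
    if frequency == "Month":
        # Sketch assumption: monthly windows start on day 1 of a month.
        if t.day != 1:
            raise ValueError("monthly windows must start on day 1 in this sketch")
        months = t.month - 1 + interval
        return t.replace(year=t.year + months // 12, month=months % 12 + 1)
    raise ValueError(f"unsupported frequency: {frequency}")


def slice_windows(start, end, frequency, interval=1):
    """List (window_start, window_end) pairs from start up to end.

    Simplification: the last window is not clipped, so end should fall on a
    window boundary.
    """
    windows = []
    t = start
    while t < end:
        nxt = next_boundary(t, frequency, interval)
        windows.append((t, nxt))
        t = nxt
    return windows


# Hypothetical active period: three monthly slices, April through June 2016.
for window_start, window_end in slice_windows(datetime(2016, 4, 1), datetime(2016, 7, 1), "Month"):
    print(window_start.date(), "->", window_end.date())
```

Thinking in slices also helps with locating failures: a failed run corresponds to one specific window, which can be inspected on its own.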
Locating Failures within a Pipeline