Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
An Introduction to Dimensional Data Warehouse Design Presented by Joseph J. Sarna Jr. JJS Systems, LLC Relational Normal Form • Most relational databases are set to 3rd normal form 1st Normal form – Tables have unique keys and no repeating groups or multi-value fields 2nd Normal form – Every attribute is dependent ont the entire key of the table 3rd Normal form – Attributes are dependent only on the key. No derived elements Northwind Database Model – Relational Format Territories Region PK PK RegionID RegionDescription FK1 EmployeeTerritories Employees TerritoryID TerritoryDescription RegionID FK2 FK1 TerritoryID EmployeeID PK EmployeeID I1 LastName FirstName Title TitleOfCourtesy BirthDate HireDate Address City Region PostalCode Country HomePhone Extension Photo Notes ReportsTo PhotoPath I2 FK1 Categories PK CategoryID I1 CategoryName Description Picture Suppliers PK SupplierID I1 CompanyName ContactName ContactTitle Address City Region PostalCode Country Phone Fax HomePage I2 Orders Products PK ProductID I3 ProductName FK2,I4,I5 SupplierID FK1,I2,I1 CategoryID QuantityPerUnit UnitPrice UnitsInStock UnitsOnOrder ReorderLevel Discontinued Order Details PK,FK1,I2,I1 PK,FK2,I4,I3 OrderID ProductID UnitPrice Quantity Discount PK OrderID FK1,I1,I2 CustomerID FK2,I4,I3 EmployeeID I5 OrderDate RequiredDate I6 ShippedDate FK3,I7 ShipVia Freight ShipName ShipAddress ShipCity ShipRegion I8 ShipPostalCode ShipCountry Customers PK CustomerID I2 CompanyName ContactName ContactTitle Address City Region PostalCode Country Phone Fax I1 I4 I3 CustomerCustomerDemo PK,FK2 PK,FK1 CustomerID CustomerTypeID CustomerDemographics PK CustomerTypeID CustomerDesc Shippers PK ShipperID CompanyName Phone Why Dimensional Data Warehouses? • • Business needs to analyze data so that it can: – Understand trends – Predict future behavior and needs – Personalize contact with customers – Be competitive All of this in a speedy manner, with the ability to do “What if’s” Drawbacks to Relational Data Structures • Data is not structured for analytical usage • Multiple Joins are resource intensive • Missing data from external sources, context history, not operational sources What Is a Dimensional Data Warehouse? “A structured repository of validated and integrated historical information accessible to business people to provide the basis for both tactical and strategic business decisions.” • • • • Centralized extract and staging Separate from operational system Structured for analysis Historically contexted Dimensional Data Warehouse Architecture Relational Data External Data Enterprise Data Data Distribution Acquisition, Staging, Cleaning, Transformation Data Warehouse Storage Analytical Applications Levels of Design • Detail Level – – • Analytical Level – • Dimensional Normal form Value and feasibility Structured for the required analyses Summary Level – – Summaries for user requirements Better response time Dimensional Normal Form • Normalized for maintainability • De-normalized for performance, based on rules • 2 level structure, therefore only one level of joins required for queries Northwind Database Model – Dimensional Format Employees PK Territories Region PK RegionKey RegionID RegionDescription EmployeeTerritories PK TerritoryKey PK EmployeeTerritoryKey FK1 RegionKey TerritoryID TerritoryDescription FK1 FK2 EmployeeKey TerritoryKey TerritoryID TerritoryDescription RegionKey RegionID RegionDescription FK1 Order Details Products Categories PK PK CategoryKey CategoryID CategoryName Description Picture Suppliers PK FK1 SupplierKey SupplierID CompanyName ContactName ContactTitle Address City Region PostalCode Country Phone Fax HomePage FK2 ProductKey PK OrderDetailKey ProductID ProductName QuantityPerUnit UnitPrice UnitsInStock UnitsOnOrder ReorderLevel Discontinued CategoryKey CategoryID CategoryName Description SupplierKey SupplierID CompanyName ContactName ContactTitle Address City Region PostalCode Country Phone Fax HomePage FK2 OrderKey OrderID UnitPrice Quantity Discount ExtendedPrice ProductKey ProductID ProductName QuantityPerUnit UnitPrice UnitsInStock UnitsOnOrder ReorderLevel Discontinued CategoryKey CategoryID CategoryName Description SupplierKey SupplierID CompanyName ContactName ContactTitle Address City Region PostalCode Country Phone Fax HomePage FK1 Orders PK FK1 FK2 FK3 OrderKey OrderID CustomerKey EmployeeID ShipperKey OrderDate RequiredDate ShippedDate Freight ShipName ShipAddress ShipCity ShipRegion ShipPostalCode ShipCountry Shippers PK ShipperKey ShipperID CompanyName Phone EmployeeKey EmployeeID LastName FirstName Title TitleOfCourtesy BirthDate HireDate Address City Region PostalCode Country HomePhone Extension Photo Notes ReportsTo PhotoPath EmployeeTerritoryKey TerritoryKey TerritoryID TerritoryDescription RegionKey RegionID RegionDescription CustomerCustomerDemo Customers PK CustomerKey FK1 CustomerID CompanyName ContactName ContactTitle Address City Region PostalCode Country Phone Fax CustomerTypeID CustomerDesc CustomerDemographicsKey PK CustomerDemographicsKey FK1 CustomerKey CustomerTypeKey CustomerDesc CustomerDemographics PK CustomerTypeKey CustomerTypeID CustomerDesc Dimensional Schema • Fact Tables – – – – – • contain related measures Usually the largest tables Usually appended to Can contain detail or summary data Measures are usually additive Dimension Tables – – – – Contain descriptors Utilize business terminology Textual and discrete data Attributes through which the table measures are analyzed Northwind Database – Star Schema – Orders dimDate PK dimOrderDetails PK DateKey DayDate DayDate_YYYYMMDD DayOfWeekName DayOfWeekNameAbbrv DayNumberInWeek DayNumberInMonth DayNumberInQuarte DayNumberInYear WeekDayIndicator WeekEndIndicator Week_YYYYWW WeekNumberInYear Month_YYYYMM MonthName MonthNameAbbrv MonthNumberInYear Quarter_YYYYQ QuarterName QuarterNameAbrv QuarterNumberInYear Year ProductKey OrderID UnitPrice Quantity Discount ExtendedPrice ProductID ProductName QuantityPerUnit UnitPrice UnitsInStock UnitsOnOrder ReorderLevel Discontinued CategoryID CategoryName Description SupplierID CompanyName ContactName ContactTitle Address City Region PostalCode Country Phone Fax HomePage fctOrders PK OrderKey FK3 FK2 FK1 FK4 FK6 FK5 FK7 ProductKey EmployeeKey CustomerKey ShipperKey OrderDateKey RequiredDateKey ShippedDateKey OrderID ShipVia Freight ShipName ShipAddress ShipCity ShipRegion ShipPostalCode ShipCountry dimShippers PK ShipperKey ShipperID CompanyName Phone dimEmployees PK EmployeeKey EmployeeID LastName FirstName Title TitleOfCourtesy BirthDate HireDate Address City Region PostalCode Country HomePhone Extension Photo Notes ReportsTo PhotoPath TerritoryID TerritoryDescription RegionID RegionDescription dimCustomers PK CustomerKey CustomerID CompanyName ContactName ContactTitle Address City Region PostalCode Country Phone Fax CustomerTypeID CustomerDesc Resources • Books – – – • The Data Warehouse Toolkit, Ralph Kimball The Data Warehouse Lifecycle Toolkit, Ralph Kimball, et al Data Warehouse Design Solutions, Adamson / Venerable Websites – – – http://www.ralphkimball.com/ http://www.atre.com http://www.microsoft.com/sql/evaluation/overview/datawa re.asp