What Is a Dimensional Data Warehouse?
Training Program for Information and Informatics Personnel: Health Information Systems

References
- Paulraj Ponniah. 2010. Data Warehousing Fundamentals for IT Professionals. John Wiley & Sons.
- Vincent Rainardi. 2008. Building a Data Warehouse: With Examples in SQL Server. Apress.
- William H. Inmon. 2005. Building the Data Warehouse. Wiley.

1980s to early 1990s
- Focus on computerizing business processes
- To gain competitive advantage

By the early 1990s
- All companies had operational systems
- Operational systems no longer offered any advantage
- How to get competitive advantage?

From data to information
“A process of transforming data into information and making it available to users in a timely enough manner to make a difference.” [Forrester Research]

- Companies, over the years, gathered huge volumes of data: a “hidden treasure”
- Can this data be used in any way?
- Can we analyze this data to get any competitive advantage?

- Allows “efficient” analysis of data
- Competitive advantage
- Analysis aids strategic decision making
- Increased productivity of decision makers
- Potential high ROI
- Quick decisions

“The ultimate goal is simple: Give the battlefield commander access to all the information needed to win the war. And give it to him when he wants it, where he wants it and how he wants it.”
-- Gen. Colin L. Powell, “Information Warriors,” BYTE, 1992

- Retail: customer loyalty, market planning
- Manufacturing: cost reduction, logistics management
- Financial: risk management, fraud detection
- Utilities: asset management, resource management
- Airlines: route profitability, yield management
- Government: manpower planning, cost control

Strategic information is needed to:
- formulate the business strategies,
- establish goals,
- set objectives, and
- monitor results.
Examples of business objectives:
- Retain the present customer base
- Increase the customer base by 15% over the next 5 years
- Improve product quality levels in the top five product groups
- Gain market share by 10% in the next 3 years
- Enhance customer service level in shipments
- Bring three new products to market in 2 years
- Increase sales by 15% in the East Division

- INTEGRATED: Must have a single, enterprise-wide view.
- DATA INTEGRITY: Information must be accurate and must conform to business rules.
- ACCESSIBLE: Easily accessible with intuitive access paths, and responsive for analysis.
- CREDIBLE: Every business factor must have one and only one value.
- TIMELY: Information must be available within the stipulated time frame.

- Ease
  - It combines information from different, separate systems in one location, so the information is easy to access.
- Speed
  - DW tables are specifically designed for quick response time and handle large quantities of data.
  - Reports and other data are precalculated.
- Reliability
  - The DW is a read-only database, which gives stability over time.
- Flexibility
  - Utilizing BI tools

- Data warehousing is a simple concept.
- It is born out of the need for strategic information and is the result of the search for a new way to provide such information.

An Environment, Not a Product
A Blend of Many Technologies
- A data warehouse is not a single software or hardware product you purchase to provide strategic information.
- It is a computing environment where users can find strategic information, an environment where users are put directly in touch with the data they need to make better decisions.
- It is a user-centric environment.
Characteristics of the new computing environment called the data warehouse:
- An ideal environment for data analysis and decision support
- Fluid, flexible, and interactive
- 100% user-driven
- Very responsive and conducive to the ask–answer–ask again pattern
- Provides the ability to discover answers to complex, unpredictable questions

The basic concept of data warehousing is:
- Take all the data from the operational systems.
- Where necessary, include relevant data from outside, such as industry benchmark indicators.
- Integrate all the data from the various sources.
- Remove inconsistencies and transform the data.
- Store the data in formats suitable for easy access for decision making.

- A decision support database that is maintained separately from the organization’s operational databases.
- A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.

“A collection of integrated, subject-oriented databases designed to supply the information required for decision making.”
-- W. Inmon (1992)

“A data warehouse is a system that retrieves and consolidates data periodically from the source systems into a dimensional or normalized data store. It usually keeps years of history and is queried for business intelligence or other analytical activities. It is typically updated in batches, not every time a transaction happens in the source system.”
-- Vincent Rainardi (2008)

“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
-- Barry Devlin, IBM Consultant

[Figure: data warehouse architecture. Labels: Relational Databases, ERP Systems, Purchased Data, Legacy Data; Extraction, Cleansing, Optimized Loader; Data Warehouse Engine; Metadata Repository; Analyze, Query.]

- The primary concept of data warehousing is that the data stored for business analysis can most effectively be accessed by separating it from the data in the operational systems.
- Fundamental differences between the operational and informational (DW) environments:
  - Nature of the data
  - Development cycle
  - Supporting technology
  - User community
  - Processing characteristics

- Subject-Oriented Data
- Integrated Data
- Time-Varying Data
- Nonvolatile Data
- Data Granularity

Subject-Oriented Data
- A data warehouse is designed around “subjects” rather than processes.
- A company may have:
  - Retail Sales System
  - Outlet Sales System
  - Catalog Sales System
- The DW will have a single Sales subject area.

Integrated Data
- Heterogeneous source systems
- Little or no control over them
- Need to integrate source data
  - For example, product codes could be different in different systems.
  - Arrive at a common code in the DW.

Time-Varying Data
- Most business analysis has a time component.
- Trend analysis requires historical data.

Data Granularity
- In a data warehouse it is efficient to keep data summarized at different levels.
- Depending on the query, you can then go to the particular level of detail and satisfy the query.
- Data granularity in a data warehouse refers to the level of detail.
- The lower the level of detail, the finer the data granularity.
- If we want to keep data at the lowest level of detail, we have to store a lot of data in the data warehouse.
- We have to decide on the granularity levels based on the data types and the expected system performance for queries.
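The integration and granularity points above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the three source systems, the product codes, the `COMMON_CODE` mapping, and the sample rows are all invented here and do not come from the slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical mapping from (source system, local product code) to one
# common warehouse code. All codes are invented for illustration.
COMMON_CODE = {
    ("retail", "R-100"): "P001",
    ("outlet", "OUT_7"): "P001",
    ("catalog", "C55"): "P002",
}

# Rows as they might arrive from three separate sales systems:
# (source system, local product code, sale date, units, amount)
source_rows = [
    ("retail", "R-100", date(2024, 1, 5), 2, 20.0),
    ("outlet", "OUT_7", date(2024, 1, 5), 1, 8.0),
    ("outlet", "OUT_7", date(2024, 1, 20), 3, 24.0),
    ("catalog", "C55", date(2024, 2, 2), 1, 15.0),
]


def integrate(rows):
    """Translate local product codes into the common code of one Sales subject area."""
    return [
        {"product": COMMON_CODE[(src, code)], "day": day, "units": units, "amount": amount}
        for src, code, day, units, amount in rows
    ]


def summarize(sales, level):
    """Total the sale amount per product at 'day' (finer) or 'month' (coarser) granularity."""
    totals = defaultdict(float)
    for sale in sales:
        if level == "day":
            period = sale["day"].isoformat()
        else:
            period = sale["day"].strftime("%Y-%m")
        totals[(sale["product"], period)] += sale["amount"]
    return dict(totals)


sales = integrate(source_rows)
daily = summarize(sales, "day")      # detailed level
monthly = summarize(sales, "month")  # summarized level
```

Keeping both `daily` and `monthly` mirrors the idea of dual levels of granularity: a query about monthly trends reads the small summary, while a question about a specific day drops down to the detailed level.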
- Data granularity refers to the level of detail.
- Depending on the requirements, multiple levels of detail may be present.
- Many data warehouses have at least dual levels of granularity.

- Extract, Transform, Load (ETL) tools
- DW databases & DBMS tools
- Data marts
- Metadata
- DW administration & management tools
- Information delivery system

- Data Extraction
- Data Cleaning
- Data Transformation
  - Convert from legacy/host format to warehouse format
- Load
  - Sort, summarize, consolidate, compute views, check integrity, build indexes, partition

- Consumes 70-80% of project time
- Heterogeneous source systems
- Little or no control over source systems
- Source systems scattered
- Different currencies, measurement units
- Ensuring data quality

- A storage area where extracted data is cleaned, transformed, and deduplicated
- Initial storage for data
- Need not be based on the relational model
- Mainly sorting and sequential processing
- Does not provide data access to users
- Analogy: the kitchen of a restaurant

Commercial tools:
- Warehouse Builder (Oracle)
- MS Data Transformation Services
- SSIS (Microsoft)
- DataStage
- SAS ETL Server
Typical functions:
- Define source, query (run SQL), define transformation, define target, verify transformation, schedule run, audit report

- Almost always a relational DB
- Oracle, DB2, Sybase, SQL Server
- New DB designs for the special purpose of DW (e.g., scale up, speed up, parallel processing)

- OLTP systems are data capture systems
- OLTP systems are “DATA IN” systems
- DWs are “DATA OUT” systems

- The design of the DW must directly reflect the way the managers look at the business.
- It should capture the measurements of importance along with the parameters by which these measurements are viewed.
- It must facilitate data analysis, i.e., answering business questions.

ER modelling
- A logical design technique that seeks to eliminate data redundancy
- Illuminates the microscopic relationships among data elements
- Perfect for OLTP systems
- Responsible for the success of transaction processing in relational databases

ER models are NOT suitable for DW?
- End users cannot understand or remember an ER model
- Many DWs have failed because of overly complex ER designs
- Not optimized for complex, ad-hoc queries
- Data retrieval becomes difficult due to normalization
- Browsing becomes difficult

- Most relational databases are set to 3rd normal form
  - 1st normal form: tables have unique keys and no repeating groups or multi-value fields
  - 2nd normal form: every attribute is dependent on the entire key of the table
  - 3rd normal form: attributes are dependent only on the key; no derived elements

- Business needs to analyze data so that it can:
  - Understand trends
  - Predict future behavior and needs
  - Personalize contact with customers
  - Be competitive
- All of this in a speedy manner, with the ability to do “what ifs”

- Data is not structured for analytical usage
- Multiple joins are resource intensive
- Data from external sources and historical context is missing from the operational sources

“A structured repository of validated and integrated historical information accessible to business people to provide the basis for both tactical and strategic business decisions.”

- Centralized extract and staging
- Separate from operational system
- Structured for analysis
- Historically contexted

[Figure: data warehouse data flow. Labels: Relational Data, External Data, Enterprise Data; Acquisition, Staging, Cleaning, Transformation; Data Warehouse Storage; Data Distribution; Analytical Applications.]

- Detail level
  - Dimensional normal form
  - Value and feasibility
- Analytical level
  - Structured for the required analyses
- Summary level
  - Summaries for user requirements
  - Better response time

- Normalized for maintainability
- De-normalized for performance, based on rules
- Two-level structure, therefore only one level of joins is required for queries

- Subject
- Fact
- Dimension
  - Aspect / factor
  - Level of reality
  - Lifelike quality

- Facts are stored in FACT tables
- Dimensions are stored in DIMENSION tables
- Dimension tables contain textual descriptors of the business
- Fact and dimension tables form a star schema
- “BIG” fact table in the center surrounded by “SMALL” dimension tables

- Measures or facts
  - Facts are “numeric” and “additive”
  - For example: Sale Amount, Sale Units
- Factors or dimensions
- Star schemas
- Snowflake & starflake schemas

- Data mart = subset of the DW for community users, e.g. the accounting department
- Sometimes exists as a multidimensional database
- Info mart = summarized data + reports for community users

- Data about data
- Field descriptions, business rules (e.g. the formula for profit), log of file updates
- Helps users understand content and locate data

- Security & priority
- Keep track of updates
- QC
- Purging & copying to data marts

- Security is a critical issue (users at many levels)
- Some security measures to protect a DW:
  - Views = limit users to seeing certain rows/columns
  - Access control = grant rights to specific users to access selected data (can be set up by the DBA through SQL commands such as GRANT/REVOKE)
  - Admin controls such as group access, firewall, encryption
  - Audit = track what users are doing

- Tools
  - Query & reporting
  - OLAP
  - Data mining, visualization, segmentation, clustering
  - New developments: text mining, web mining & personalization
  - Mining multimedia data
- Commercial tools
  - MS SQL Server Business Intelligence, Oracle Business Intelligence Suite, Crystal Reports, Cognos Solution, WebFocus
- Increasingly common mode of delivery:
  - Web-enabled

Thank you
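As a closing illustration of the star schema and the view-based security measure described in the slides, here is a minimal Python sketch using the standard `sqlite3` module with an in-memory database. The table names (`fact_sales`, `dim_product`, `dim_date`), the view name `v_sales_units`, and all sample rows are assumptions made for this example, not part of the original deck. SQLite has no GRANT/REVOKE, so only the view part of the security list is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One "big" fact table in the center, two "small" dimension tables around it.
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    month    TEXT,
    year     INTEGER
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    sale_units  INTEGER,   -- numeric, additive fact
    sale_amount REAL       -- numeric, additive fact
);
-- A view that hides the sale amount from users who query it.
CREATE VIEW v_sales_units AS
    SELECT product_key, date_key, sale_units FROM fact_sales;
""")

conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240101, "2024-01", 2024), (20240201, "2024-02", 2024)],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [
        (1, 20240101, 2, 20.0),
        (2, 20240101, 1, 30.0),
        (3, 20240101, 4, 40.0),
        (1, 20240201, 5, 50.0),
    ],
)

# Each dimension is exactly one join away from the fact table.
sales_by_category = conn.execute("""
    SELECT p.category, d.month, SUM(f.sale_units), SUM(f.sale_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()

# Column names exposed by the restricted view.
view_columns = [col[0] for col in conn.execute("SELECT * FROM v_sales_units").description]
```

Because the facts are additive, `SUM` over any combination of dimension attributes gives a meaningful total, and the query never needs more than one level of joins.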