* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Data Warehouse - Information Management and Systems
		                    
		                    
								Survey							
                            
		                
		                
                            
                            
								Document related concepts							
                        
                        
                    
						
						
							Transcript						
					
					Data Warehouse IMS5024 – presented by Eder Tsang Data Warehouse  A data warehouse is a system consisting of processes and databases used to provide the “data infrastructure” for EIS and DSS  “… a subject-oriented, integrated, timevariant, and nonvolatile collection of data in support of management’s decisions” Inmon and Hackathorn (1994) Data warehouse - subject oriented  The data warehouse is organised by “data subjects” that are relevant to the organisation. – Customer, claim, shipment, product  This may be contrasted with the process orientation of many OLTP systems Data warehouse - integrated  Data in the warehouse is structured based on a corporatewide model, spanning the functional boundaries of legacy systems  This includes naming standards, units of measurement and periodicity Data warehouse - time variant  Data is the data warehouse is characterised by the timeseries nature of historical data  The data consists of a series of “snapshots” which are time-stamped and record values at a moment in time  This supports trend analysis of the data Data warehouse - non volatile  The data warehouse is not continuously updated (inserts, eletes and changes) like data in an OLTP system  Data in a data warehouse is periodically up-loaded at a scheduled time intervals (say daily) Motivations for data warehousing  Demands on OLTP data bases for query processing would be too great  Data warehousing is designed for efficient retrieval  Data in legacy systems is frequently inconsistent, of poor quality and stored in different formats  Reduce costs in providing data for decision making Motivations for data warehousing  Support for focus on complete business processes (BPR)  Support for new initiatives – CRM, Balanced Scorecard  Industry sources quote ROI’s averaging 401% over 3 years  Remain competitive Typical Data Warehouse Architecture An Actual Data Warehouse Data warehouse development  Requirements identification  Logical design, data modelling  Data extract, transform and load (ETL)  Warehouse architecture , technology and tools  Physical database design  Delivery systems  Operational policies Designing a data warehouse – data design There are two main approaches to data modelling or data warehouse design – entity relationship modelling and normalisation – dimensional modelling The design of databases using a traditional E-R approach  Entities and relationships  Normalisation 3NF Entity relationship schema Why do we normalise data?  Normalisation is a process for converting complex data structures into simple, stable data structures  Normalisation protects integrity of database by avoiding anomalies (update, delete, create) • Normalised data models are: • robust and stable • have minimum redundancy Dimensional Modeling (star schema) Dimensional Modeling (star schema) Components of dimensional model: – Fact Tables : contain measurements of business eg. Sales, purchase order, shipment – Dimension Tables : store the descriptions of the dimensions of the business eg. Product, customer, vendor, store Dimensional Modeling (star schema)  Each dimension table has a single primary key that corresponds exactly to one of the components of the multipart key in the fact table.  A fact table always expresses a many to many relationship (the key is composed of foreign keys  The most useful facts in a fact table are numeric and additive ( typically values are added up) Snowflake schema  Snowflake schema –all the tables are normalised  Star schemas are preferable to snowflake – fewer joins for information retrieval Dimensional Modelling vs E-R modelling  the purpose of dimensional modelling structure data for easy and efficient analysis  E-R modelling creates a single required to support organisation’s Whereas  DM creates individual models for business/decision interest eg. • model for sales info • model for Inventory info Entity relationship schema (3NF) Corresponding Star schema Corresponding snowflake schema Dimensional Modelling vs E-R modelling (Con’t)  OLTP and DW have different purpose: – operational vs informational  Normalisation protects integrity of database by avoiding anomalies (update, delete, create)  Data models for data warehouse do not have to be normalised – In contrast, data in DW does not change often – periodic additions of new data DM vs. E-R modeling debate (Kimball’s view)  OLTP systems are volatile – high rates of update transactions  In normalised models the goal is to reduce data redundancy and prevent update anomalies  Data in a data warehouse does not need to be normalised because it is periodically refreshed not updated by user transactions
 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                            