CIT 858: Data Mining and Data Warehousing
Course Instructor: Bajuna Salehe
Email: bajunar@yahoo.com
Web: www.ifm.ac.tz/staff/bajuna/courses/

Introduction to Data Mining and Data Warehousing

Data Mining and Data Warehousing
Agenda
- What is Data Mining?
- What is Data Warehousing?
- The source of invention of Data Mining and Data Warehousing.
- Drowning in Data, Starving for Knowledge.
- Evolution of Database Technology to the current state. (Home Work)

What Is Data Mining?
- Data mining (knowledge discovery from data)
  - Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
- Data mining: a misnomer?
  - Should have been named “knowledge mining from data”, which is too long
  - or “knowledge mining”, which does not reflect the emphasis on mining from huge amounts of data

What Is Data Mining?
- Many people treat data mining as a synonym for another popularly used term, Knowledge Discovery from Data/Databases (KDD).
- The KDD process is depicted below:

The KDD Process
[Figure: the KDD process. Labels, in order of flow: Databases, Cleaning & Integration, Data Warehouse, Selection & Transformation, Data Mining, Evaluation & Presentation, Knowledge]

KDD Process
1) Data cleaning
   - To remove noise and inconsistent data
2) Data integration
   - Where multiple data sources may be combined
3) Data selection
   - Where data relevant to the analysis task are retrieved from the database.

KDD Process
4) Data transformation
   - Where data are transformed or consolidated into forms appropriate for mining, for instance by performing summary or aggregation operations.
5) Data mining
   - An essential process where intelligent methods are applied in order to extract data patterns.

KDD Process
6) Pattern evaluation
   - To identify the truly interesting patterns representing knowledge.
7) Knowledge presentation
   - Where visualization and knowledge representation techniques are used to present the mined knowledge to the users.
8) Use of discovered knowledge

Data Mining: On What Kinds Of Data?
- Relational database
- Data warehouse
- Transactional database
- Advanced database and information repository
  - Spatial and temporal data
  - Stream data
  - Multimedia database
  - Text databases & WWW

Data Mining Functionalities
- Association (correlation and causality)
  - Cheese & Bread
- Classification and Prediction
  - Construct models that describe and distinguish classes or concepts for future prediction
  - Predict some unknown or missing numerical values

Data Mining Functionalities (cont…)
- Cluster analysis
  - Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
- Outlier analysis
  - Outlier: a data object that does not comply with the general behavior of the data
  - Noise or exception? No! Useful in fraud detection and rare event analysis

Necessity Is The Mother Of Invention
- Data explosion problem
  - Automated data collection tools and mature database technology lead to huge amounts of data accumulated
- We are drowning in data, but starving for knowledge!
- Solution: Data warehousing and data mining
  - Data warehousing and on-line analytical processing
  - Mining interesting knowledge (rules, regularities, patterns, constraints) from data

Evolution Of Database Technology
- 1960s:
  - Data collection, database creation, IMS and network DBMS
- 1970s:
  - Relational data model, relational DBMS implementation
- 1980s:
  - RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
Evolution Of Database Technology
- 1990s:
  - Data mining, data warehousing, multimedia databases, and Web databases
- 2000s:
  - Stream data management and mining
  - Data mining with a variety of applications
  - Web technology and global information systems

Potential Applications
- Data analysis and decision support
  - Market analysis and management
  - Risk analysis and management
  - Fraud detection and detection of unusual patterns
- Other applications
  - Text mining (email, documents) and Web mining
  - Stream data mining

Fraud Detection & Mining Unusual Patterns
- Applications: Health care, retail, credit card service, telecommunications
  - Auto insurance: ring of collisions
  - Money laundering: suspicious monetary transactions
  - Medical insurance
    - Professional patients, ring of doctors, and ring of references
    - Unnecessary or correlated screening tests
  - Telecommunications: phone-call fraud
    - Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
  - Retail industry
    - Analysts estimate that 38% of retail shrink is due to dishonest employees
  - Anti-terrorism

Other Applications
- Sports
  - IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for the New York Knicks and Miami Heat
- Internet Web Surf-Aid
  - IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preference and behavior, to help analyze the effectiveness of Web marketing, improve Web site organization, etc.

What Is a Data Warehouse?
- Defined in many different ways, but not rigorously
  - A decision support database that is maintained separately from the organization’s operational database
  - Supports information processing by providing a solid platform of consolidated, historical data for analysis
- “A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process” (Bill Inmon)

The Source of Invention of DW and Data Mining
- Data explosion problem
  - Automated data collection tools and mature database technology lead to huge amounts of data accumulated
- We are drowning in data, but starving for knowledge!
- Solution: Data warehousing and data mining
  - Data warehousing and on-line analytical processing
  - Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Drowning In Data, Starving For Knowledge
[Figure labels: DATA, KNOWLEDGE]

Importance of Data Mining
- By performing data mining, interesting knowledge, regularities, or high-level information can be extracted from databases and viewed or browsed from different angles.
- The discovered knowledge can be applied to the decision-making process.
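
Illustration (not part of the original slides)
The KDD steps listed earlier, from cleaning through presentation, can be made concrete with a toy example. The sketch below is a minimal illustration under stated assumptions, not the course's own method. The two transaction lists, the function names `clean` and `kdd_pipeline`, and the threshold values are all hypothetical. Support and confidence are commonly used measures for association rules, but the slides do not name them, so their use here is an assumption. The data is arranged so that the "Cheese & Bread" association from the slides shows up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transactions from two sources. None, stray whitespace and
# mixed case stand in for noisy, inconsistent data.
store_a = [["Cheese", "bread", None], ["bread", "milk"], ["cheese", "bread", "milk"]]
store_b = [[" cheese", "Bread"], ["milk"], []]


def clean(transaction):
    """Step 1 (cleaning): drop missing items and normalise spelling."""
    return {item.strip().lower() for item in transaction if item}


def kdd_pipeline(sources, min_support=0.4, min_confidence=0.7):
    # Step 2 (integration): combine all sources into one collection.
    integrated = [clean(t) for source in sources for t in source]
    # Step 3 (selection): keep only baskets relevant to the task (non-empty).
    selected = [t for t in integrated if t]
    # Step 4 (transformation): aggregate into item counts and item-pair counts.
    item_counts = Counter(item for t in selected for item in t)
    pair_counts = Counter(
        pair for t in selected for pair in combinations(sorted(t), 2)
    )
    # Step 5 (mining): derive candidate rules lhs -> rhs from each pair.
    n = len(selected)
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            # Step 6 (pattern evaluation): keep only rules above both thresholds.
            if support >= min_support and confidence >= min_confidence:
                rules.append((lhs, rhs, round(support, 2), round(confidence, 2)))
    return sorted(rules)


# Step 7 (presentation): show the surviving rules to the user.
for lhs, rhs, support, confidence in kdd_pipeline([store_a, store_b]):
    print(f"{lhs} -> {rhs}  support={support}  confidence={confidence}")
```

With this toy data only the bread/cheese rules pass both thresholds. Lowering `min_confidence` would let weaker rules such as milk -> bread through, which shows why the pattern evaluation step is needed to separate interesting patterns from incidental ones.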