PESIT SOUTH CAMPUS
10IS74 - Data Mining
Faculty: D Annapurna
No. of Hours: 52

Course Objectives

Huge volumes of scientific, medical, demographic, financial, and marketing data are now being collected. People have no time to examine data at this scale, so techniques are needed to automatically analyze, classify, and summarize the data, and to discover and characterize trends in it. This is one of the most interesting and active areas of database research.

Data mining is a multidisciplinary field that draws on database technology, machine learning, statistics, pattern recognition, information retrieval, neural networks, knowledge-based systems, artificial intelligence, high-performance computing, and data visualization. The study of data mining and warehousing helps students understand the algorithms and computational paradigms that allow computers to find patterns and regularities in databases.

The course discusses the major data mining technologies (frequent patterns, classification, and clustering) along with their applications, and the classification algorithms that have been designed, such as decision tree algorithms, Naïve Bayes, Bayesian networks, and nearest-neighbor schemes. It covers all these issues and illustrates the whole process with examples of practical applications.

Data mining and warehousing technology enables students to explore data in search of interesting patterns, drawing on work from artificial intelligence, statistics, and information retrieval. Students will be able to discover various kinds of patterns such as classification and regression models, clusters, and frequent patterns. Students will be able to identify applications of frequent patterns in diverse areas such as marketing, medicine, sports, and agriculture.
Students will be able to understand various classification algorithms that have been designed, such as decision tree algorithms, Naïve Bayes, Bayesian networks, and nearest-neighbor schemes. Students will be able to integrate ideas from various classifiers to design a grand classifier that has the best features of the other classifiers. Students will develop skills in using recent data mining software to solve practical problems, and gain experience in independent study and research.

B.E 7th Semester Information Science

Unit - 1: Data Warehousing (Reference: T2)
Classes: 1-6
% of portions covered: 11.54 (chapter), 11.54 (cumulative)
Topics to be covered:
- Introduction
- Operational Data Stores (ODS)
- Loading (ETL)
- Data Warehouses, Design Issues
- Guidelines for Data Warehouse Implementation
- Data Warehouse Metadata

Unit - 2: Online Analytical Processing (OLAP) (Reference: T2)
Classes: 7-12
% of portions covered: 11.54 (chapter), 23.08 (cumulative)
Topics to be covered:
- Introduction, Characteristics of OLAP systems
- Multidimensional view, Data cube
- Data Cube Implementations
- Data Cube operations
- Implementation of OLAP and Overview on OLAP Software

Unit - 3: Introduction, Data - 1 (Reference: T1)
Classes: 13-20
% of portions covered: 15.38 (chapter), 38.46 (cumulative)
Topics to be covered:
- What is data mining?
- Motivating challenges
- The origins of data mining, Data Mining Tasks
- Types of Data
- Data Quality
- Data preprocessing
- Measures of Similarity and Dissimilarity
- Measures of Similarity and Dissimilarity (contd.)
- Data Mining Applications

Unit - 4: Association analysis - 1 (Reference: T1)
Classes: 21-26
% of portions covered: 11.54 (chapter), 50.00 (cumulative)
Topics to be covered:
- Problem Definition
- Frequent Itemset generation
- Rule Generation
- Compact representation of frequent itemsets
- Alternative methods for generating frequent itemsets
- FP-Growth algorithm
- Evaluation of association patterns
Unit - 5: Classification - 1 (Reference: T1)
% of portions covered: 15.38 (chapter), 38.46 (cumulative)
Topics to be covered:
- Preliminaries; General approach to solving a classification problem
- Decision tree induction
- Rule-based classifier (contd.)
- Nearest-neighbor classifier

Unit - 6: Classification - 2 (Reference: T1)
% of portions covered: 15.38 (chapter), 38.46 (cumulative)
Topics to be covered:
- Bayesian Classifiers
- Bayesian Classifiers
- Improving accuracy of classification methods
- Improving accuracy of classification methods
- Evaluation criteria for classification methods
- Multiclass Problem

Unit - 7: Clustering Techniques (Reference: T1)
% of portions covered: 13.46 (chapter), 88.46 (cumulative)
Topics to be covered:
- Overview, Features of cluster analysis
- Types of Data and Computing Distance
- Types of Cluster Analysis Methods, Partitional Methods
- Hierarchical Methods
- Density Based Methods
- Quality and Validity of Cluster Analysis

Unit - 8: Web Mining (Reference: T1)
% of portions covered: 11.54 (chapter), 100.00 (cumulative)
Topics to be covered:
- Introduction
- Web content mining
- Text Mining
- Unstructured Text
- Text clustering
- Mining Spatial and Temporal Databases

Text Books
T1: Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Pearson Education, 2007.
T2: G. K. Gupta: Introduction to Data Mining with Case Studies, 3rd Edition, PHI, New Delhi, 2009.

Reference Books
R1: Arun K Pujari: Data Mining Techniques, 2nd Edition, Universities Press, 2009.
R2: Jiawei Han and Micheline Kamber: Data Mining - Concepts and Techniques, 2nd Edition, Morgan Kaufmann Publisher, 2006.
R3: Alex Berson and Stephen J. Smith: Data Warehousing, Data Mining, and OLAP Computing, McGraw-Hill Publisher, 1997.