PESIT SOUTH CAMPUS
10IS74 - Data Mining
Faculty: D Annapurna
No of Hours: 52
Course Objectives
Huge volumes of scientific, medical, demographic, financial, and marketing data are now being
collected. People have no time to examine data on this scale, so techniques are needed to
automatically analyze, classify, and summarize the data, and to discover and characterize trends
in it. This is one of the most interesting and active areas of database research. Data mining is a
multidisciplinary field that draws on database technology, machine learning, statistics, pattern
recognition, information retrieval, neural networks, knowledge-based systems, artificial
intelligence, high-performance computing, and data visualization. The study of data mining and
warehousing helps students understand the algorithms and computational paradigms that allow
computers to find patterns and regularities in databases. The course discusses the major data
mining technologies (frequent pattern mining, classification, and clustering) along with their
applications, and the classification algorithms that have been designed, such as decision tree
algorithms, Naïve Bayes, Bayesian networks, and nearest neighbor schemes.
The course covers all these issues and illustrates the whole process with examples of practical
applications. Data mining and warehousing technology enables students to explore data in search of
interesting patterns, drawing on work from artificial intelligence, statistics, and information
retrieval. Students will be able to discover various kinds of patterns, such as classification and
regression models, clusters, and frequent patterns, and to identify applications of frequent
patterns in diverse areas such as marketing, medicine, sports, and agriculture. They will
understand various classification algorithms, such as decision tree algorithms, Naïve Bayes,
Bayesian networks, and nearest neighbor schemes, and will be able to integrate ideas from various
classifiers to design a grand classifier that has the best features of the other classifiers.
They will also develop skills in using recent data mining software to solve practical problems
and gain experience in independent study and research.
B.E 7th Semester Information Science
Unit 1: Data Warehousing
Reference literature: T2
Class #: 1 to 6
Topics to be covered:
    Introduction
    Operational Data Stores (ODS)
    Loading (ETL)
    Data Warehouses. Design Issues
    Guidelines for Data Warehouse Implementation
    Data Warehouse Metadata
% of portions covered: 11.54 (reference chapter), 11.54 (cumulative)

Unit 2: Online Analytical Processing (OLAP)
Reference literature: T2
Class #: 7 to 12
Topics to be covered:
    Introduction, Characteristics of OLAP systems
    Multidimensional view
    Data cube, Data Cube Implementations
    Data Cube operations
    Implementation of OLAP and Overview on OLAP Software
% of portions covered: 11.54 (reference chapter), 23.08 (cumulative)

Unit 3: Introduction, Data-1
Reference literature: T1
Class #: 13 to 20
Topics to be covered:
    What is data mining? Motivating challenges
    The origins of data mining, Data Mining Tasks
    Types of Data
    Data Quality
    Data preprocessing
    Measures of Similarity and Dissimilarity
    Measures of Similarity and Dissimilarity contd
    Data Mining Applications
% of portions covered: 15.38 (reference chapter), 38.46 (cumulative)

Unit 4: Association analysis-1
Reference literature: T1
Class #: 21 to 26
Topics to be covered:
    Problem Definition
    Frequent Itemset generation
    Rule Generation; Compact representation of frequent itemsets
    Alternative methods for generating frequent itemsets
    FP-Growth algorithm
    Evaluation of association patterns
% of portions covered: 11.54 (reference chapter), 50.00 (cumulative)

Unit 5: Classification-1
Reference literature: T1
Class #: 1 to 4
Topics to be covered:
    Preliminaries; General approach to solving a classification problem
    Decision tree induction
    Rule-based classifier contd
    Nearest-neighbor classifier
% of portions covered: 15.38 (reference chapter), 38.46 (cumulative)

Unit 6: Classification-2
Reference literature: T1
Class #: 5 to 11
Topics to be covered:
    Bayesian Classifiers
    Bayesian Classifiers
    classification methods
    Improving accuracy of classification methods
    Improving accuracy of classification methods
    Evaluation criteria for classification methods
    Multiclass Problem
% of portions covered: 15.38 (reference chapter), 38.46 (cumulative)

Unit 7: Clustering Techniques
Reference literature: T1
Class #: 12 to 18
Topics to be covered:
    Overview, Features of cluster analysis
    Types of Data and Computing Distance
    Types of Cluster Analysis Methods
    Partitional Methods
    Hierarchical Methods
    Density Based Methods
    Quality and Validity of Cluster Analysis
% of portions covered: 13.46 (reference chapter), 88.46 (cumulative)

Unit 8: Web Mining
Reference literature: T1
Class #: 19 to 21
Topics to be covered:
    Introduction
    Web content mining, Text Mining
    Unstructured Text
    Text clustering
    Mining Spatial and Temporal Databases
% of portions covered: 11.54 (reference chapter), 100.00 (cumulative)

Text Books
T1. Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Pearson Education, 2007.
T2. G. K. Gupta: Introduction to Data Mining with Case Studies, 3rd Edition, PHI, New Delhi, 2009.

Reference Books
R1. Arun K Pujari: Data Mining Techniques, 2nd Edition, Universities Press, 2009.
R2. Jiawei Han and Micheline Kamber: Data Mining - Concepts and Techniques, 2nd Edition, Morgan Kaufmann Publisher, 2006.
R3. Alex Berson and Stephen J. Smith: Data Warehousing, Data Mining, and OLAP Computing, McGraw-Hill Publisher, 1997.