Efficient and Scalable Multi-Class
Classification using naive Bayes Tree
Yan Zhu
Agenda
Overview
Objective
The Proposed Algorithm
Experiments And Results
Conclusion
Overview
The goal of multi-class classification is to predict the class labels
of new instances whose attribute values are known, but class
values are unknown.
The decision tree (DT) is one of the most popular multi-class
classification tools and is commonly used in many real-world
classification problems, such as weather prediction, astronomy,
and intrusion detection.
Overview
DT provides a rapid and useful solution for classifying instances in
large datasets with a large number of variables.
Decision tree classification has various advantages: (a) simple to
understand, (b) easy to implement, (c) requiring little prior
knowledge, (d) able to handle both numerical and categorical
data, (e) robust, and (f) able to deal with large and noisy datasets.
Objective
Two common issues in DT construction:
the growth of the tree, so that it accurately categorizes the
training dataset
the pruning stage, whereby superfluous nodes and branches are
removed in order to improve classification accuracy.
Overview
The naive Bayes (NB) classifier is also widely used for
classification problems in the data mining and machine learning
fields because of its simplicity and impressive classification accuracy.
It has several advantages, such as (a) being easy to use, (b) requiring
only one scan of the training data, (c) handling missing attribute
values, and (d) handling continuous data.
Objective
In this paper, we propose an adaptive naive Bayes tree (NBTree)
algorithm for scaling up the classification accuracy for multi-class
classification tasks.
NBTree is a hybrid classifier using both decision tree and naive
Bayes classifiers.
In an NBTree, internal nodes contain splits as in a regular decision
tree, but the leaves are replaced by naive Bayes classifiers.
The Proposed Algorithm
The naive Bayes tree classifier
The naive Bayes tree (NBTree) classifier is a hybrid learning
approach combining decision tree (DT) and naive Bayes (NB) classifiers.
In an NBTree, internal nodes contain splits as in regular decision
trees, but the leaves are replaced by NB classifiers.
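As a minimal sketch of this structure (not the authors' implementation), the Python below routes an instance through internal split nodes until it reaches a leaf, where a naive Bayes model makes the prediction. The class names `SplitNode` and `NBLeaf`, the dictionary-based instance format, the Laplace smoothing, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class NBLeaf:
    """Leaf node: a categorical naive Bayes model over the instances routed here."""

    def __init__(self, instances, labels):
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
        self.values = defaultdict(set)            # attribute -> distinct values seen
        for x, c in zip(instances, labels):
            for attr, value in x.items():
                self.value_counts[(attr, c)][value] += 1
                self.values[attr].add(value)

    def predict(self, x):
        total = sum(self.class_counts.values())
        best_class, best_score = None, -1.0
        for c, n_c in self.class_counts.items():
            score = n_c / total  # prior P(C)
            for attr, value in x.items():
                # Laplace-smoothed P(value | C); the smoothing is an assumption
                k = len(self.values[attr] | {value})
                score *= (self.value_counts[(attr, c)][value] + 1) / (n_c + k)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


class SplitNode:
    """Internal node: tests one attribute and forwards the instance to a child."""

    def __init__(self, attribute, children, default):
        self.attribute = attribute
        self.children = children  # attribute value -> SplitNode or NBLeaf
        self.default = default    # child used when the value was never seen

    def predict(self, x):
        child = self.children.get(x[self.attribute], self.default)
        return child.predict(x)


# Toy data invented for illustration only
sunny = NBLeaf(
    [{"windy": "no"}, {"windy": "yes"}, {"windy": "no"}],
    ["play", "stay", "play"],
)
rainy = NBLeaf([{"windy": "yes"}, {"windy": "no"}], ["stay", "stay"])
tree = SplitNode("outlook", {"sunny": sunny, "rainy": rainy}, default=sunny)

print(tree.predict({"outlook": "sunny", "windy": "no"}))  # play
print(tree.predict({"outlook": "rainy", "windy": "no"}))  # stay
```

How the tree is grown (which attribute to split on, when to stop) is left out here; the sketch only shows how prediction combines the two model types.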
The Proposed Algorithm
In a given training dataset, each instance, 𝑥𝑖 , contains values
{𝑥𝑖1 , 𝑥𝑖2 , · · · , 𝑥𝑖ℎ }. A set of attributes is used to describe
the training data, D = {𝐴1 , 𝐴2 , · · · , 𝐴𝑛 }. Each attribute contains
attribute values 𝐴𝑖 = {𝐴𝑖1 , 𝐴𝑖2 , · · · , 𝐴𝑖𝑘 }. A set of classes C = {𝐶1 ,
𝐶2 , · · · , 𝐶𝑛 } is also used to label the training instances, where
each class 𝐶𝑖 = {𝐶𝑖1 , 𝐶𝑖2 , · · · , 𝐶𝑖𝑘 } also has some values.
The Proposed Algorithm
The aim of DT learning is to construct a tree model from the training
dataset, D. Correspondingly, for an attribute 𝐴𝑖 ∈ D, whether discrete
or continuous, P(𝐶𝑖 |𝐴𝑖𝑗 ) denotes the posterior probability of class 𝐶𝑖
given the attribute value 𝐴𝑖𝑗 , which Bayes' theorem expresses in terms
of P(𝐴𝑖𝑗 |𝐶𝑖 ), P(𝐶𝑖 ), and P(𝐴𝑖𝑗 ).
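Written out with the symbols defined on these slides, Bayes' theorem for a single attribute value has the standard form below. This is a reconstruction from the surrounding definitions, not a copy of the slide's own equation.

```latex
P(C_i \mid A_{ij}) = \frac{P(A_{ij} \mid C_i)\, P(C_i)}{P(A_{ij})}
```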
The Proposed Algorithm
The algorithm calculates the class conditional probabilities of
the attributes in each leaf node of the tree, T.
For each attribute, 𝐴𝑖 , the number of occurrences of each
attribute value, 𝐴𝑖𝑗 , can be counted to determine P(𝐴𝑖𝑗 ).
Similarly, the probability P(𝐴𝑖𝑗 |𝐶𝑖 ) can be estimated by
counting how often each 𝐴𝑖𝑗 occurs among the instances of class 𝐶𝑖
in a leaf node, t, of the DT.
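The counting step can be sketched in Python as follows. The function name `estimate_probabilities`, the list-of-dicts instance format, and the toy leaf data are assumptions for illustration; the sketch uses raw relative frequencies with no smoothing, which may differ from what the paper does.

```python
from collections import Counter, defaultdict


def estimate_probabilities(instances, labels):
    """Estimate P(C), P(A_ij) and P(A_ij | C) by counting within one leaf."""
    n = len(instances)
    class_counts = Counter(labels)
    value_counts = Counter()             # (attribute, value) -> count
    joint_counts = defaultdict(Counter)  # class -> (attribute, value) -> count
    for x, c in zip(instances, labels):
        for attr, value in x.items():
            value_counts[(attr, value)] += 1
            joint_counts[c][(attr, value)] += 1

    p_class = {c: k / n for c, k in class_counts.items()}
    p_value = {av: k / n for av, k in value_counts.items()}
    p_value_given_class = {
        c: {av: k / class_counts[c] for av, k in counts.items()}
        for c, counts in joint_counts.items()
    }
    return p_class, p_value, p_value_given_class


# Toy leaf contents invented for illustration only
leaf_instances = [
    {"windy": "no", "humidity": "high"},
    {"windy": "yes", "humidity": "high"},
    {"windy": "no", "humidity": "normal"},
    {"windy": "no", "humidity": "normal"},
]
leaf_labels = ["stay", "stay", "play", "play"]

p_class, p_value, p_cond = estimate_probabilities(leaf_instances, leaf_labels)
print(p_class["play"])                   # 0.5
print(p_value[("windy", "no")])          # 0.75
print(p_cond["stay"][("windy", "no")])   # 0.5
```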
The Proposed Algorithm
To calculate P(𝐶𝑖 |𝑥𝑖 ), we need the prior, P(𝐶𝑖 ), and the likelihood,
P(𝑥𝑖 |𝐶𝑖 ), for each class 𝐶𝑖 . The posterior probability,
P(𝐶𝑖 |𝑥𝑖 ), is then found for each 𝐶𝑖 . The class, 𝐶𝑖 , with the highest
posterior probability is used to label the instance, 𝑥𝑖 .
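A hedged sketch of this labelling step in Python is shown below. It assumes the usual naive Bayes factorisation, where P(x|C) is the product of the per-attribute probabilities P(A_ij|C), and works in log space for numerical stability. The function name `classify`, the `floor` value used for attribute values never seen with a class, and the toy probabilities are assumptions, not details from the paper.

```python
import math


def classify(x, p_class, p_value_given_class, floor=1e-9):
    """Return (best_class, posteriors) for instance x under naive Bayes."""
    scores = {}
    for c, prior in p_class.items():
        # log P(C) + sum over attributes of log P(A_ij | C)
        log_score = math.log(prior)
        for attr, value in x.items():
            p = p_value_given_class[c].get((attr, value), floor)
            log_score += math.log(p)
        scores[c] = log_score

    # Normalise the scores so they sum to 1, giving P(C | x)
    top = max(scores.values())
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    posteriors = {c: u / z for c, u in unnorm.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Hypothetical probabilities for one leaf (toy numbers, not from the paper)
p_class = {"play": 0.5, "stay": 0.5}
p_value_given_class = {
    "play": {("windy", "no"): 1.0, ("humidity", "normal"): 1.0},
    "stay": {("windy", "no"): 0.5, ("windy", "yes"): 0.5, ("humidity", "high"): 1.0},
}

label, posteriors = classify(
    {"windy": "no", "humidity": "high"}, p_class, p_value_given_class
)
print(label)  # stay
```

Dividing by the sum of the scores plays the role of the evidence term, which is the same for every class and therefore does not change which class wins.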
Experiments
Data Sets
10 real benchmark datasets from the UCI Machine Learning Repository
Experiments
Experimental setup
10-fold cross validation
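For readers unfamiliar with the protocol, a minimal Python sketch of k-fold index splitting follows. The function name `k_fold_indices`, the shuffling with a fixed seed, and the unstratified folds are assumptions; the slides do not say how the folds were drawn.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(25, k=10))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

A classifier is trained on each `train_idx` and scored on the matching `test_idx`, and the ten scores are then averaged.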
Measurement
Accuracy
Precision
Sensitivity-specificity analysis
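These measures can be computed from predicted and true labels as in the Python sketch below. Treating each class one-vs-rest is an assumption, as are the function names and the toy labels; the slides do not state how the per-class values were aggregated for multi-class problems.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, positive):
    """Precision, sensitivity and specificity for one class versus the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 when the denominator is empty instead of dividing by zero
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Toy labels invented for illustration only
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

print(accuracy(y_true, y_pred))
print(per_class_metrics(y_true, y_pred, positive="a"))
```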
Results
Conclusion
This paper proposed an adaptive NBTree algorithm to improve the
classification accuracy of multi-class classification problems.
It used DT induction to select a subset of attributes from the training dataset
for use under the naive assumption of class conditional independence.
The performance of the proposed algorithm was tested against traditional DT
and NB classifiers, and the experimental results showed that the proposed
NBTree algorithm produced impressive results on challenging real-life
multi-class classification problems.
Reference
Farid, D. M., Rahman, M. M., & Al-Mamuny, M. A. (2014, May).
Efficient and scalable multi-class classification using naïve Bayes tree.
In Informatics, Electronics & Vision (ICIEV), 2014 International
Conference on (pp. 1-4). IEEE.