Efficient and Scalable Multi-Class Classification using naive Bayes Tree
Yan Zhu

Agenda
• Overview
• Objective
• The Proposed Algorithm
• Experiments and Results
• Conclusion

Overview
• The goal of multi-class classification is to predict the class labels of new instances whose attribute values are known but whose class values are unknown.
• The decision tree (DT) is one of the most popular multi-class classification tools and is commonly used in real-world classification problems such as weather prediction, astronomy, and intrusion detection.

Overview
• A DT provides a rapid and useful solution for classifying instances in large datasets with a large number of variables.
• Decision tree classification has various advantages: (a) simple to understand, (b) easy to implement, (c) requires little prior knowledge, (d) able to handle both numerical and categorical data, (e) robust, and (f) able to deal with large and noisy datasets.

Objective
• Two common issues in building a DT:
  • growing the tree so that it accurately categorizes the training dataset
  • the pruning stage, in which superfluous nodes and branches are removed in order to improve classification accuracy

Overview
• The naive Bayes (NB) classifier is also widely used for classification problems in data mining and machine learning because of its simplicity and impressive classification accuracy.
• It has several advantages: (a) it is easy to use, (b) it requires only one scan of the training data, (c) it handles missing attribute values, and (d) it handles continuous data.

Objective
• In this paper, we propose an adaptive naive Bayes tree (NBTree) algorithm for scaling up the classification accuracy of multi-class classification tasks.
• NBTree is a hybrid classifier that uses both decision tree and naive Bayes classifiers.
• In an NBTree, internal nodes split the data as in a regular decision tree, but each leaf is replaced by a naive Bayes classifier.
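The hybrid structure described above (decision-tree splits in the internal nodes, a naive Bayes classifier in every leaf) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the adaptive algorithm from the paper: the names `NBTree`, `NaiveBayesLeaf`, `min_rows` and `min_gain`, the information-gain split criterion, the stopping rule, the add-one smoothing, and the toy data are all hypothetical choices made for the example. It handles categorical attributes only.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Entropy reduction from splitting on categorical attribute index `attr`."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


class NaiveBayesLeaf:
    """Naive Bayes over categorical attributes, estimated by counting."""

    def __init__(self, rows, labels, attrs, domains):
        self.attrs, self.domains, self.total = attrs, domains, len(labels)
        self.class_counts = Counter(labels)
        self.value_counts = Counter()  # (attribute, value, class) -> count
        for row, label in zip(rows, labels):
            for a in attrs:
                self.value_counts[(a, row[a], label)] += 1

    def predict(self, row):
        def log_posterior(c):
            # log P(c) + sum over attributes of log P(value | c), add-one smoothed
            score = math.log(self.class_counts[c] / self.total)
            for a in self.attrs:
                num = self.value_counts[(a, row[a], c)] + 1
                den = self.class_counts[c] + len(self.domains[a])
                score += math.log(num / den)
            return score
        # The class with the highest (unnormalized) posterior labels the instance.
        return max(self.class_counts, key=log_posterior)


class NBTree:
    """Decision-tree splits on top, naive Bayes classifiers at the leaves."""

    def __init__(self, rows, labels, attrs=None, domains=None,
                 min_rows=8, min_gain=0.01):
        if attrs is None:  # root call: use every attribute index
            attrs = list(range(len(rows[0])))
            domains = {a: {row[a] for row in rows} for a in attrs}
        self.split_attr = None
        self.children = {}
        # Every node keeps an NB model; it is the leaf model, and also the
        # fallback when an instance has a value unseen at this split.
        self.leaf = NaiveBayesLeaf(rows, labels, attrs, domains)
        if len(rows) < min_rows or not attrs:
            return
        best = max(attrs, key=lambda a: info_gain(rows, labels, a))
        if info_gain(rows, labels, best) < min_gain:
            return
        self.split_attr = best
        rest = [a for a in attrs if a != best]
        parts = defaultdict(lambda: ([], []))
        for row, label in zip(rows, labels):
            parts[row[best]][0].append(row)
            parts[row[best]][1].append(label)
        for value, (sub_rows, sub_labels) in parts.items():
            self.children[value] = NBTree(sub_rows, sub_labels, rest, domains,
                                          min_rows, min_gain)

    def predict(self, row):
        child = None
        if self.split_attr is not None:
            child = self.children.get(row[self.split_attr])
        return child.predict(row) if child is not None else self.leaf.predict(row)


# Toy data (hypothetical): attributes are (size, color, shape), classes A/B/C.
rows = [
    ("small", "red", "round"), ("small", "red", "square"),
    ("small", "blue", "round"), ("small", "blue", "square"),
    ("large", "red", "round"), ("large", "red", "square"),
    ("large", "blue", "round"), ("large", "blue", "square"),
    ("large", "red", "round"), ("large", "blue", "square"),
]
labels = list("AAAABBCCBC")

tree = NBTree(rows, labels)
print(tree.split_attr)                            # 0 (the root splits on size)
print(tree.predict(("large", "red", "square")))   # B
print(tree.predict(("large", "blue", "round")))   # C
```

In this toy run the root splits once on size, and both branches fall below `min_rows`, so each becomes a naive Bayes leaf trained only on the instances that reached it and only on the attributes not yet used for splitting. That is the point of the hybrid: the tree partitions the data, and the independence assumption is applied locally within each partition.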
The Proposed Algorithm
• The naive Bayes tree classifier
  • The naive Bayes tree (NBTree) classifier is a hybrid learning approach that combines decision tree (DT) and naive Bayes (NB) classifiers.
  • In an NBTree, internal nodes split the data as in a regular decision tree, but each leaf is replaced by an NB classifier.

The Proposed Algorithm
• In a given training dataset, each instance, 𝑥𝑖, contains values {𝑥𝑖1, 𝑥𝑖2, ..., 𝑥𝑖ℎ}.
• The training data is described by a set of attributes, D = {𝐴1, 𝐴2, ..., 𝐴𝑛}.
• Each attribute contains attribute values 𝐴𝑖 = {𝐴𝑖1, 𝐴𝑖2, ..., 𝐴𝑖𝑘}.
• A set of classes C = {𝐶1, 𝐶2, ..., 𝐶𝑛} is also used to label the training instances, where each class 𝐶𝑖 = {𝐶𝑖1, 𝐶𝑖2, ..., 𝐶𝑖𝑘} also has some values.

The Proposed Algorithm
• The aim of DT learning is to construct a tree model from the training dataset, D. Correspondingly, by Bayes' theorem, for an attribute 𝐴𝑖 ∈ D that is discrete or continuous, we have:
  P(𝐶𝑖 | 𝐴𝑖𝑗) = P(𝐴𝑖𝑗 | 𝐶𝑖) P(𝐶𝑖) / P(𝐴𝑖𝑗)
• Here P(𝐶𝑖 | 𝐴𝑖𝑗) denotes the probability of class 𝐶𝑖 given the attribute value 𝐴𝑖𝑗.

The Proposed Algorithm
• The algorithm calculates the class conditional probabilities of the attributes in each leaf node of the tree, T.
• For each attribute, 𝐴𝑖, the number of occurrences of each attribute value, 𝐴𝑖𝑗, can be counted to determine P(𝐴𝑖𝑗).
• Similarly, the probability P(𝐴𝑖𝑗 | 𝐶𝑖) can be estimated by counting how often each 𝐴𝑖𝑗 occurs in class 𝐶𝑖 at a leaf node, t, of the DT.

The Proposed Algorithm
• To calculate P(𝐶𝑖 | 𝑥𝑖), we need P(𝐶𝑖) for each 𝐶𝑖 and P(𝑥𝑖 | 𝐶𝑖), and we estimate the likelihood that 𝑥𝑖 is in each class.
• The posterior probability, P(𝐶𝑖 | 𝑥𝑖), is then found for each 𝐶𝑖.
• The class, 𝐶𝑖, with the highest posterior probability is used to label the instance, 𝑥𝑖.

Experiments
• Data sets
  • 10 real benchmark datasets from the UCI Machine Learning Repository

Experiments
• Experimental setup
  • 10-fold cross-validation
• Measurement
  • Accuracy
  • Precision
  • Sensitivity-specificity analysis

Results

Conclusion
• This paper proposed an adaptive NBTree algorithm to improve the classification accuracy of multi-class classification problems.
• It used DT induction to select a subset of attributes from the training dataset, to which the naive assumption of class conditional independence is then applied.
• The performance of the proposed algorithm was tested against traditional DT and NB classifiers, and the experimental results showed that the proposed NBTree algorithm produced impressive results on challenging real-life multi-class problems.

Reference
Farid, D. M., Rahman, M. M., & Al-Mamuny, M. A. (2014, May). Efficient and scalable multi-class classification using naïve Bayes tree. In 2014 International Conference on Informatics, Electronics & Vision (ICIEV) (pp. 1-4). IEEE.
