National Yunlin University of Science and Technology (N.Y.U.S.T. I. M.)
Intelligent Database Systems Lab

Effective Multi-label Active Learning for Text Classification
Presenter: Wu, Jia-Hao
Authors: Bishan Yang, Jian-Tao Sun, Tengjiao Wang, Zheng Chen
KDD (2009)

Outline
- Motivation
- Objective
- Problem definition
- Methodology
- Experiments
- Conclusion
- Personal comments

Motivation
Multi-label text classification has received considerable attention, since many text classification tasks are inherently multi-label: a single news article may belong to Tourism, Entertainment, and Finance at the same time.

Motivation (cont.)
Multi-label information: with three categories c1, c2, c3, the class probabilities of two instances might be
- x1: [c1: 0.8, c2: 0.5, c3: 0.1]
- x2: [c1: 0.7, c2: 0.1, c3: 0.1]
A single-label classification task keeps only the top class (x1: [c1: 0.8], x2: [c1: 0.7]), whereas a multi-label task keeps the full vectors. Taking this multi-label information into account in the sample-selection strategy is therefore very important.

Objective
The authors propose a novel multi-label active learning approach for text classification.
- The sample-selection strategy aims to label the data that maximizes the reduction rate of the expected model loss.
- They also propose an effective method to predict the label set of multi-label data.

Problem definition
- Denote the training examples as x1, ..., xn and the k classes as 1, ..., k.
- The label set of xi is a binary vector yi = [yi1, ..., yik].
- The set of all possible class combinations has 2^k elements; for k = 2, the combinations of (yi1, yi2) are:

    yi1: +1  +1  -1  -1
    yi2: +1  -1  +1  -1

Problem definition (cont.)
- SVMs are used as binary classifiers: fi denotes the binary classifier associated with target class i. Given a test instance x', if fi(x') > 0, then x' belongs to class i.
- A pool-based active learning approach is used.
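As a concrete illustration, the pool-based loop can be sketched with a toy one-dimensional setup. Everything below (the threshold "classifier", the smallest-|f(x)| query rule, and the `oracle` labeler) is a hypothetical stand-in for the paper's SVM machinery, not its actual implementation:

```python
# Toy pool-based active learning sketch (illustrative only, not the paper's method).

def train(labeled):
    """Toy 1-D 'classifier': threshold halfway between the class means."""
    pos = [x for x, y in labeled if y == +1]
    neg = [x for x, y in labeled if y == -1]
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda x: x - t  # f(x) > 0 -> class +1

def query(f, unlabeled):
    """Select the instance with the smallest |f(x)| (most uncertain)."""
    return min(unlabeled, key=lambda x: abs(f(x)))

def active_learn(labeled, unlabeled, oracle, rounds):
    """Each round: retrain, query the pool, ask the oracle, grow the labeled set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        f = train(labeled)
        x = query(f, unlabeled)
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))  # human annotator supplies the true label
    return train(labeled)
```

The smallest-|f(x)| rule plays the role of the SVM-margin uncertainty criterion: points near the decision boundary are assumed to shrink the version space the most when labeled.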
Denote the labeled data by Dl and the remaining unlabeled data by Du.

Methodology
- Let P(x) be the input distribution and fDl the multi-label prediction function trained on Dl; the predicted label set of x is fDl(x).
- Let y be the true label set of x. The loss on x is L(fDl(x), y), and taking its expectation over P(x) gives the model loss L(fDl).

Methodology (cont.)
- The active learner evaluates each possible set of unlabeled data Ds to find the optimal query set Ds*.
- The new training set is Dl' = Dl ∪ Ds, and the expected loss is measured for the classifier trained on Dl'.
- The optimization problem is to find the query set Ds* that minimizes this expected loss.

Methodology (cont.) - Sample selection strategy with SVM
The optimization problem raises two questions:
- How to measure the loss reduction of the multi-label classifier.
- How to provide a good estimate of the conditional probability p(y|x).
Estimating loss reduction: the SVM margin is used as the measure of the version-space size.

Methodology (cont.)
- The size of the version space of the binary classifier associated with target class i, learnt from the labeled data Dl, serves as the loss measure.
- After adding a new data point (x, yi), where yi is the true label of x on class i, the new model loss is compared against the old one on that binary classifier.

Methodology (cont.) - Label prediction
- Suppose there are k classes, so we have k binary classifiers. Given data x, consider the probability of x belonging to each class i.
- The logistic regression (LR) algorithm is used to predict the number of labels.
- Before LR is applied, the authors transform the SVM decision outputs on the training data into classification probabilities using the sigmoid function.
- Pipeline: SVM classifiers → sigmoid probabilities → sort the probabilities → train a logistic regression classifier on the sorted probabilities to predict the label count → output the labels with the largest probabilities.
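The label-prediction pipeline can be sketched as follows. The sigmoid parameters and the `num_labels_fn` count predictor are illustrative stand-ins: in the paper, the sigmoid mapping is fit on training data and the count predictor is a trained logistic regression classifier.

```python
import math

def sigmoid(z, a=-1.0, b=0.0):
    # Map an SVM decision value to a probability; the slope a and intercept b
    # would normally be fit on training data (defaults here are illustrative).
    return 1.0 / (1.0 + math.exp(a * z + b))

def predict_labels(decision_values, num_labels_fn):
    """decision_values: fi(x) for each of the k binary SVMs.
    num_labels_fn: stand-in for the trained logistic regression that maps the
    sorted probability vector to a label count m."""
    probs = [(sigmoid(z), i) for i, z in enumerate(decision_values)]
    probs.sort(reverse=True)                  # largest probability first
    m = num_labels_fn([p for p, _ in probs])  # predicted number of labels
    m = max(1, m)                             # every instance gets at least one label
    return sorted(i for _, i in probs[:m])    # top-m classes form the label set
```

For example, `predict_labels([2.0, -1.0, 0.5], lambda ps: sum(p > 0.5 for p in ps))` uses a naive thresholding count predictor in place of the trained LR and returns the indices of the two most probable classes.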
Methodology (cont.)
Incorporating the predicted label vector into the expected loss estimation gives the data-selection strategy: Maximum loss reduction with Maximal Confidence (MMC). The unknown true label vector yi is replaced by the predicted label vector.

Experiments
- The Micro-Average F1 score is used as the evaluation measure, where n is the number of test instances, yi is the true label vector of the i-th instance, ŷi is the predicted label vector, and k is the number of classes.
- Label-prediction methods are compared on the RCV1-V2 data set.
- Sensitivity experiments vary the sampling size per run.

Conclusion
- MMC reduces the required size of labeled data in multi-label classification while maintaining favorable accuracy.
- The method outperforms the other active learning techniques on multi-label text classification by a large margin and can significantly reduce the labeling cost.

Comments
- Advantage: the paper includes many experiments demonstrating the method's performance.
- Drawback: …
- Applications: news and e-mail classification; image classification.
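For reference, the Micro-Average F1 measure used in the experiments pools true positives, false positives, and false negatives over all instances and classes before computing precision and recall. A minimal sketch (label vectors here use 0/1 entries; the deck's ±1 vectors map to these directly):

```python
def micro_f1(true_labels, pred_labels):
    """true_labels, pred_labels: lists of binary vectors, one per instance,
    one 0/1 entry per class (the yi / ŷi notation of the deck)."""
    tp = fp = fn = 0
    for y, yhat in zip(true_labels, pred_labels):
        for t, p in zip(y, yhat):
            tp += (t == 1 and p == 1)  # correctly predicted label
            fp += (t == 0 and p == 1)  # spurious predicted label
            fn += (t == 1 and p == 0)  # missed true label
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Micro-averaging weights every (instance, class) decision equally, so frequent classes dominate the score; this matches the pooled-count definition rather than a per-class (macro) average.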