Machine Learning with WEKA
Eibe Frank
Department of Computer Science,
University of Waikato, New Zealand
• WEKA: A Machine Learning Toolkit
• The Explorer
  - Classification and Regression
  - Clustering
  - Association Rules
  - Attribute Selection
  - Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions
WEKA: the bird
Copyright: Martin Kramer (mkramer@wxs.nl)
1/10/2008
Machine Learning for Data Mining
University of Waikato
WEKA: the software
Machine learning/data mining software written in
Java (distributed under the GNU General Public License)
Used for research, education, and applications
Complements “Data Mining” by Witten & Frank
Main features:
Comprehensive set of data pre-processing tools,
learning algorithms and evaluation methods
Graphical user interfaces (incl. data visualization)
Environment for comparing learning algorithms
WEKA: versions
There are several versions of WEKA:
WEKA 3.0: “book version” compatible with
description in data mining book
WEKA 3.2: “GUI version” adds graphical user
interfaces (book version is command-line only)
WEKA 3.3: “development version” with lots of
improvements
This talk is based on the latest snapshot of WEKA
3.3 (soon to be WEKA 3.4)
WEKA only deals with “flat” files
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
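To make the “flat file” layout concrete, here is a minimal Python sketch of reading the ARFF subset shown above. It is not WEKA's own loader: it assumes only numeric and nominal attributes, unquoted names without spaces, and dense comma-separated data rows, and it turns `?` into `None` for a missing value. The function name `parse_arff` is hypothetical.

```python
def parse_arff(text):
    """Parse a small ARFF subset: @relation, numeric and nominal @attribute, dense @data."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values: {line}")
            row = []
            for (name, kind), v in zip(attributes, values):
                if v == "?":
                    row.append(None)  # '?' marks a missing value
                elif kind == "numeric":
                    row.append(float(v))
                elif v in kind:
                    row.append(v)  # nominal value declared in the header
                else:
                    raise ValueError(f"{v!r} is not a declared value of {name}")
            rows.append(row)
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            name, spec = line.split(None, 2)[1:]
            spec = spec.strip()
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            elif spec.lower() in ("numeric", "real", "integer"):
                kind = "numeric"
            else:
                raise ValueError(f"unsupported attribute type: {spec}")
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows

sample = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
38,female,non_anginal,?,no,not_present
"""
relation, attributes, rows = parse_arff(sample)
print(rows[1])  # [38.0, 'female', 'non_anginal', None, 'no', 'not_present']
```

Each instance is one row of fixed length, which is exactly the restriction the slide title refers to: relational or multi-table data has to be flattened into this single table first.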
Explorer: pre-processing the data
Data can be imported from a file in various
formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL
database (using JDBC)
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for:
Discretization, normalization, resampling, attribute
selection, transforming and combining attributes, …
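As an illustration of what two of these filters do, the sketch below applies min-max normalization and equal-width discretization to one numeric column (cholesterol values from the sample data above). It is a standalone Python approximation of the ideas, not WEKA's filter classes or their exact defaults; missing values (`None`) are passed through unchanged, and the function names are hypothetical.

```python
def normalize(values):
    """Min-max scale numeric values into [0, 1]; None (missing) passes through."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    return [None if v is None else (0.0 if span == 0 else (v - lo) / span)
            for v in values]

def discretize_equal_width(values, bins):
    """Replace each numeric value by the index of its equal-width bin."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    out = []
    for v in values:
        if v is None:
            out.append(None)
        elif width == 0:
            out.append(0)
        else:
            # the maximum value falls exactly on the upper edge, so clamp it into the last bin
            out.append(min(int((v - lo) / width), bins - 1))
    return out

cholesterol = [233.0, 286.0, 229.0, None]
scaled = normalize(cholesterol)
binned = discretize_equal_width(cholesterol, 3)  # [0, 2, 0, None]
```

Equal-width binning ignores the class; supervised discretization, which places cut points using class information, is a different technique.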
Explorer: building “classifiers”
Classifiers in WEKA are models for predicting
nominal or numeric quantities
Implemented learning schemes include:
Decision trees and lists, instance-based classifiers,
support vector machines, multi-layer perceptrons,
logistic regression, Bayes’ nets, …
“Meta”-classifiers include:
Bagging, boosting, stacking, error-correcting output
codes, locally weighted learning, …
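A “meta”-classifier wraps another learner. The sketch below shows the idea behind bagging in plain Python: train a base learner on several bootstrap samples and combine the models by majority vote. A one-level decision stump serves as the base learner. This is an illustration of the technique under those assumptions, not WEKA's implementation; all names are hypothetical.

```python
import random
from collections import Counter

def train_stump(X, y):
    """Decision stump: the single 'attribute <= threshold' test with fewest training errors."""
    best = None
    for a in range(len(X[0])):
        for t in sorted({row[a] for row in X}):
            left = [c for row, c in zip(X, y) if row[a] <= t]
            right = [c for row, c in zip(X, y) if row[a] > t]
            left_class = Counter(left).most_common(1)[0][0]
            right_class = Counter(right).most_common(1)[0][0] if right else left_class
            errors = (sum(c != left_class for c in left)
                      + sum(c != right_class for c in right))
            if best is None or errors < best[0]:
                best = (errors, a, t, left_class, right_class)
    _, a, t, left_class, right_class = best
    return lambda row: left_class if row[a] <= t else right_class

def train_bagging(X, y, base_learner, n_models=10, seed=1):
    """Bagging: train each model on a bootstrap sample, then predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(base_learner([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        return Counter(m(row) for m in models).most_common(1)[0][0]
    return predict

# toy data: one numeric attribute, class 'a' for values 1..10, 'b' for 11..20
X = [[v] for v in range(1, 21)]
y = ["a" if v <= 10 else "b" for v in range(1, 21)]
bagged = train_bagging(X, y, train_stump)
```

Because the base learner is passed in as a function, any learner with the same `(X, y) -> predict` shape can be bagged, which mirrors how a meta-classifier is configured with an arbitrary base classifier.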
Explorer: clustering data
WEKA contains “clusterers” for finding groups of
similar instances in a dataset
Implemented schemes are:
k-Means, EM, Cobweb, X-means, FarthestFirst
Clusters can be visualized and compared to “true”
clusters (if given)
Evaluation based on log-likelihood if clustering
scheme produces a probability distribution
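Of the schemes listed, k-means is the simplest to write down. The following is a minimal Python sketch of Lloyd's k-means with Euclidean distance on numeric tuples, assuming initial centroids drawn from the data; WEKA's own k-means implementation may differ in initialization, distance handling, and treatment of nominal attributes.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iterations=100, seed=1):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        updated = [tuple(sum(col) / len(members) for col in zip(*members)) if members
                   else centroids[j]  # an empty cluster keeps its old centroid
                   for j, members in enumerate(clusters)]
        if updated == centroids:
            break  # centroids stopped moving, so assignments will not change either
        centroids = updated
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, 2)
```

k-means assigns each point to exactly one cluster; a probabilistic scheme such as EM instead yields a distribution over clusters, which is what makes the log-likelihood evaluation above possible.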
Explorer: finding associations
WEKA contains an implementation of the Apriori
algorithm for learning association rules
Works only with discrete data
Can identify statistical dependencies between
groups of attributes:
milk, butter ⇒ bread, eggs (with confidence 0.9 and
support 2000)
Apriori can compute all rules that have a given
minimum support and exceed a given confidence
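The level-wise search behind Apriori can be sketched in a few lines of Python. Here support is counted as a number of transactions (as in the “support 2000” example) and confidence is support(X ∪ Y) / support(X). The shopping-basket transactions are invented for illustration, and this is a plain sketch of the algorithm, not WEKA's Apriori implementation or its options.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search for itemsets contained in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(items):
        return sum(items <= t for t in transactions)

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in level}
    k = 2
    while level:
        # candidates: size-k unions of frequent (k-1)-itemsets ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... kept only if every (k-1)-subset is frequent (the Apriori property)
        candidates = [c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent

def rules(frequent, min_confidence):
    """Rules X => Y with confidence = support(X u Y) / support(X) >= min_confidence."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), sup, conf))
    return out

baskets = [{"milk", "butter", "bread", "eggs"}, {"milk", "butter", "bread"},
           {"milk", "bread"}, {"butter", "eggs"}]
found = rules(frequent_itemsets(baskets, min_support=2), min_confidence=0.9)
```

The pruning step is what keeps the search tractable: any itemset with an infrequent subset cannot itself be frequent, so it is never counted.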
Explorer: attribute selection
Panel that can be used to investigate which
(subsets of) attributes are the most predictive ones
Attribute selection methods consist of two parts:
A search method: best-first, forward selection,
random, exhaustive, genetic algorithm, ranking
An evaluation method: correlation-based, wrapper,
information gain, chi-squared, …
Very flexible: WEKA allows (almost) arbitrary
combinations of these two
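One of the simplest combinations is a ranking search with an information-gain evaluator: score each attribute by how much it reduces class entropy, then sort. The Python sketch below does this for the nominal attributes of the first four sample instances shown earlier (numeric attributes dropped); it is an illustration of the measure, not WEKA's attribute-selection API, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels):
    """H(class) minus the weighted entropy of the class within each attribute value."""
    groups = defaultdict(list)
    for v, c in zip(column, labels):
        groups[v].append(c)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_attributes(rows, names, class_index):
    """Ranking search: score every non-class attribute, best first."""
    labels = [r[class_index] for r in rows]
    scores = [(names[i], info_gain([r[i] for r in rows], labels))
              for i in range(len(names)) if i != class_index]
    return sorted(scores, key=lambda s: s[1], reverse=True)

rows = [("male", "typ_angina", "no", "not_present"),
        ("male", "asympt", "yes", "present"),
        ("male", "asympt", "yes", "present"),
        ("female", "non_anginal", "no", "not_present")]
names = ["sex", "chest_pain_type", "exercise_induced_angina", "class"]
ranking = rank_attributes(rows, names, class_index=3)
```

On these four rows, chest_pain_type and exercise_induced_angina each separate the classes perfectly (1 bit), while sex scores about 0.311 bits. With so few instances the ranking is only illustrative.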
Explorer: data visualization
Visualization is very useful in practice: e.g., it helps to
determine the difficulty of the learning problem
WEKA can visualize single attributes (1-d) and
pairs of attributes (2-d)
To do: rotating 3-d visualizations (XGobi-style)
Color-coded class values
“Jitter” option to deal with nominal attributes (and
to detect “hidden” data points)
“Zoom-in” function
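The “jitter” idea is easy to state in code: nominal values are plotted at integer positions, so identical values overlap exactly; adding a small random offset spreads them out so that hidden points become visible. The Python sketch below assumes uniform noise of a fixed amount; how WEKA's visualizer scales its jitter is not shown here, and the function name is hypothetical.

```python
import random

def jitter(values, amount=0.2, seed=0):
    """Map nominal values to integer positions, then add uniform noise in [-amount, amount]."""
    rng = random.Random(seed)
    position = {v: i for i, v in enumerate(sorted(set(values)))}
    return [position[v] + rng.uniform(-amount, amount) for v in values]

angina = ["no", "yes", "yes", "no"]  # exercise_induced_angina values from the sample data
plotted = jitter(angina)
```

Without jitter the two "yes" instances would be drawn on the same spot; with it they land at distinct positions near 1.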
Performing experiments
Experimenter makes it easy to compare the
performance of different learning schemes
For classification and regression problems
Results can be written to a file or database
Evaluation options: cross-validation, learning
curve, hold-out
Can also iterate over different parameter settings
Significance-testing built in!
The Knowledge Flow GUI
New graphical user interface for WEKA
JavaBeans-based interface for setting up and
running machine learning experiments
Data sources, classifiers, etc. are beans and can
be connected graphically
Data “flows” through components: e.g.,
“data source” -> “filter” -> “classifier” -> “evaluator”
Layouts can be saved and loaded again later
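The “data flows through components” model can be mimicked with ordinary function composition. In the Python sketch below, each component takes the previous component's output, mirroring the data source -> filter -> classifier -> evaluator chain. The `Flow` class and all component names are hypothetical stand-ins for the graphical beans, not the Knowledge Flow API, and the evaluator scores on the training data purely to keep the example short.

```python
from collections import Counter

class Flow:
    """A linear chain of components; each receives the previous component's output."""
    def __init__(self, *components):
        self.components = components

    def run(self):
        payload = None
        for component in self.components:
            payload = component(payload)
        return payload

def data_source(_):
    # stand-in for a file or database loader: (age, cholesterol, class) rows
    return [(63, 233, "not_present"), (67, 286, "present"),
            (67, 229, "present"), (38, None, "not_present")]

def remove_missing(rows):
    # filter component: drop instances that contain a missing (None) value
    return [r for r in rows if None not in r]

def majority_classifier(rows):
    # classifier component: predicts the most frequent class, passes data on for evaluation
    majority = Counter(r[-1] for r in rows).most_common(1)[0][0]
    return (lambda row: majority), rows

def accuracy_evaluator(model_and_rows):
    # evaluator component: fraction of instances classified correctly (training data)
    model, rows = model_and_rows
    return sum(model(r) == r[-1] for r in rows) / len(rows)

flow = Flow(data_source, remove_missing, majority_classifier, accuracy_evaluator)
result = flow.run()
```

Swapping a component only requires that its input and output match its neighbours, which is the property that lets the graphical layout be rewired and saved.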
Conclusion: try it yourself!
WEKA is available at
http://www.cs.waikato.ac.nz/ml/weka
Also has a list of projects based on WEKA
WEKA contributors:
Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard
Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger,
Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg,
Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert,
Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy,
Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang