Data Mining with Naïve Bayesian Methods

Instructor: Qiang Yang
Hong Kong University of Science and Technology
Qyang@cs.ust.hk

Thanks: Dan Weld, Eibe Frank
How to Predict?

• From a new day’s data we wish to predict the decision
• Applications:
  • Text analysis
  • Spam email classification
  • Gene analysis
  • Network intrusion detection
Naïve Bayesian Models

• Two assumptions: attributes are
  • equally important
  • statistically independent (given the class value)
• This means that knowledge about the value of a particular attribute doesn’t tell us anything about the value of another attribute (if the class is known)
• Although based on assumptions that are almost never correct, this scheme works well in practice!
Why Naïve?

• Assume the attributes are independent, given the class
• What does that mean?

[Diagram: the class node “play” connected to the attribute nodes “outlook”, “temp”, “humidity”, “windy”]

Pr(outlook=sunny | windy=true, play=yes) = Pr(outlook=sunny | play=yes)
Weather data set

Outlook     Windy   Play
overcast    FALSE   yes
rainy       FALSE   yes
rainy       FALSE   yes
overcast    TRUE    yes
sunny       FALSE   yes
rainy       FALSE   yes
sunny       TRUE    yes
overcast    TRUE    yes
overcast    FALSE   yes
Is the assumption satisfied?

Counting over the nine play=yes instances of the table:

• #(play=yes) = 9
• #(outlook=sunny, play=yes) = 2
• #(windy=true, play=yes) = 3
• #(outlook=sunny, windy=true, play=yes) = 1

Pr(outlook=sunny | windy=true, play=yes) = 1/3
Pr(outlook=sunny | play=yes) = 2/9
Pr(windy=true | outlook=sunny, play=yes) = 1/2
Pr(windy=true | play=yes) = 3/9

• Thus, the assumption is NOT satisfied.
• But we can tolerate some errors (see later slides).
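These counts are easy to verify in code. Below is a minimal Python sketch (not part of the original slides; the variable names are mine) that recomputes the two probabilities from the nine play=yes rows:

```python
# Recompute the independence check from the nine play=yes instances:
# (outlook, windy) pairs taken from the weather table above.
yes_rows = [
    ("overcast", False), ("rainy", False),    ("rainy", False),
    ("overcast", True),  ("sunny", False),    ("rainy", False),
    ("sunny", True),     ("overcast", True),  ("overcast", False),
]

n_yes         = len(yes_rows)                                   # 9
n_sunny       = sum(o == "sunny" for o, w in yes_rows)          # 2
n_windy       = sum(w for o, w in yes_rows)                     # 3
n_sunny_windy = sum(o == "sunny" and w for o, w in yes_rows)    # 1

print(n_sunny_windy / n_windy)  # Pr(sunny | windy=true, yes) = 1/3
print(n_sunny / n_yes)          # Pr(sunny | yes)             = 2/9
```

Since 1/3 ≠ 2/9, the two attributes are not conditionally independent, exactly as the slide concludes.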
Probabilities for the weather data

Outlook      Yes   No
Sunny        2     3
Overcast     4     0
Rainy        3     2
Sunny        2/9   3/5
Overcast     4/9   0/5
Rainy        3/9   2/5

Temperature  Yes   No
Hot          2     2
Mild         4     2
Cool         3     1
Hot          2/9   2/5
Mild         4/9   2/5
Cool         3/9   1/5

Humidity     Yes   No
High         3     4
Normal       6     1
High         3/9   4/5
Normal       6/9   1/5

Windy        Yes   No
False        6     2
True         3     3
False        6/9   2/5
True         3/9   3/5

Play         Yes   No
             9     5
             9/14  5/14

A new day:

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?

Likelihood of the two classes:
For “yes” = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For “no”  = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

Conversion into probabilities by normalization:
P(“yes”) = 0.0053 / (0.0053 + 0.0206) = 0.205
P(“no”)  = 0.0206 / (0.0053 + 0.0206) = 0.795
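The arithmetic above can be checked in a few lines of Python (a sketch of mine, not from the slides):

```python
# Likelihoods for the new day (Sunny, Cool, High, True), read off the
# fraction table above, then normalized into probabilities.
like_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # ≈ 0.0053
like_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # ≈ 0.0206

total = like_yes + like_no
print(f"P(yes) = {like_yes / total:.3f}")  # 0.205
print(f"P(no)  = {like_no / total:.3f}")   # 0.795
```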
Bayes’ rule

• Probability of event H given evidence E:

  Pr[H | E] = Pr[E | H] · Pr[H] / Pr[E]

• A priori probability of H: Pr[H]
  • Probability of the event before the evidence has been seen
• A posteriori probability of H: Pr[H | E]
  • Probability of the event after the evidence has been seen
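As a concrete check (my own example, using counts from the weather table), take H = (play=yes) and E = (outlook=sunny):

```python
# Bayes' rule on the weather data: H = (play=yes), E = (outlook=sunny).
p_h = 9 / 14           # a priori: 9 of the 14 days are play=yes
p_e_given_h = 2 / 9    # 2 of the 9 yes-days are sunny
p_e = 5 / 14           # sunny occurs on 5 of the 14 days (2 yes + 3 no)

p_h_given_e = p_e_given_h * p_h / p_e
print(p_h_given_e)     # a posteriori: Pr[yes | sunny] = 2/5
```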
Naïve Bayes for classification

• Classification learning: what’s the probability of the class given an instance?
  • Evidence E = an instance
  • Event H = class value for the instance (Play=yes, Play=no)
• Naïve Bayes assumption: the evidence can be split into independent parts (i.e. the attributes of the instance are independent):

  Pr[H | E] = Pr[E1 | H] · Pr[E2 | H] · … · Pr[En | H] · Pr[H] / Pr[E]
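To make the formula concrete, here is a minimal Python sketch of a naive Bayes classifier for nominal attributes (my own code, not from the slides; no smoothing, so zero counts propagate, as discussed on the zero-frequency slide below):

```python
from collections import Counter, defaultdict

def train(instances, labels):
    """Count the Pr[H] and Pr[Ei | H] ingredients from training data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attr index, class) -> value counts
    for x, y in zip(instances, labels):
        for i, v in enumerate(x):
            value_counts[(i, y)][v] += 1
    return class_counts, value_counts

def predict(x, class_counts, value_counts):
    """Score Pr[E1|H] ... Pr[En|H] · Pr[H]; normalizing removes Pr[E]."""
    n = sum(class_counts.values())
    scores = {}
    for y, cy in class_counts.items():
        score = cy / n                             # Pr[H]
        for i, v in enumerate(x):
            score *= value_counts[(i, y)][v] / cy  # Pr[Ei | H]
        scores[y] = score
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}
```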
The weather data example

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?      ← evidence E

Probability for class “yes”:

Pr[yes | E] = Pr[Outlook=Sunny | yes] × Pr[Temperature=Cool | yes]
              × Pr[Humidity=High | yes] × Pr[Windy=True | yes] × Pr[yes] / Pr[E]
            = (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / Pr[E]
The “zero-frequency problem”

• What if an attribute value doesn’t occur with every class value (e.g. “Humidity = High” for class “yes”)?
  • The conditional probability will be zero: Pr[Humidity = High | yes] = 0
  • The a posteriori probability will also be zero: Pr[yes | E] = 0
    (no matter how likely the other values are!)
• Remedy: add 1 to the count for every attribute value-class combination (Laplace estimator)
• Result: probabilities will never be zero! (Also: stabilizes probability estimates.)
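A sketch of the Laplace estimator in Python (mine, not from the slides): add 1 to every value’s count for the attribute, and add the number of distinct values to the denominator.

```python
def laplace_probs(counts):
    """counts: attribute value -> frequency within one class."""
    total = sum(counts.values()) + len(counts)   # +1 for each value
    return {v: (c + 1) / total for v, c in counts.items()}

# Outlook counts for class yes (Sunny 2, Overcast 4, Rainy 3):
print(laplace_probs({"Sunny": 2, "Overcast": 4, "Rainy": 3}))
# -> Sunny 3/12, Overcast 5/12, Rainy 4/12 (never zero)
```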
Modified probability estimates

• In some cases adding a constant μ different from 1 might be more appropriate
• Example: attribute outlook for class yes:

  Sunny:    (2 + μ/3) / (9 + μ)
  Overcast: (4 + μ/3) / (9 + μ)
  Rainy:    (3 + μ/3) / (9 + μ)

• The weights don’t need to be equal (as long as they sum to 1):

  Sunny:    (2 + μ·p1) / (9 + μ)
  Overcast: (4 + μ·p2) / (9 + μ)
  Rainy:    (3 + μ·p3) / (9 + μ)
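The same idea in code (my sketch; `mu` and the prior weights generalize the +1 of the Laplace estimator):

```python
def m_estimate_probs(counts, mu, priors):
    """priors: value -> weight p_v, with the weights summing to 1."""
    total = sum(counts.values()) + mu
    return {v: (c + mu * priors[v]) / total for v, c in counts.items()}

counts  = {"Sunny": 2, "Overcast": 4, "Rainy": 3}
uniform = {v: 1 / 3 for v in counts}   # equal weights give the μ/3 form
print(m_estimate_probs(counts, 3, uniform))
# with mu=3 and equal weights this reproduces the Laplace estimates
```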
Missing values

• Training: the instance is not included in the frequency count for that attribute value-class combination
• Classification: the attribute is omitted from the calculation
• Example:

  Outlook  Temp.  Humidity  Windy  Play
  ?        Cool   High      True   ?

Likelihood of “yes” = 3/9 × 3/9 × 3/9 × 9/14 = 0.0238
Likelihood of “no”  = 1/5 × 4/5 × 3/5 × 5/14 = 0.0343
P(“yes”) = 0.0238 / (0.0238 + 0.0343) = 41%
P(“no”)  = 0.0343 / (0.0238 + 0.0343) = 59%
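In code, handling a missing attribute at classification time just means skipping its factor (my sketch, matching the numbers above):

```python
# Per-attribute factors Pr[value | yes]; Outlook is missing (None).
factors_yes = {"Outlook": None, "Temp": 3/9, "Humidity": 3/9, "Windy": 3/9}

like_yes = 9 / 14                 # start from Pr[yes]
for p in factors_yes.values():
    if p is not None:             # omit missing attributes from the product
        like_yes *= p
print(like_yes)                   # ≈ 0.0238
```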
Dealing with numeric attributes

• Usual assumption: attributes have a normal or Gaussian probability distribution (given the class)
• The probability density function for the normal distribution is defined by two parameters:
  • The sample mean μ:

    μ = (1/n) · Σᵢ xᵢ

  • The standard deviation σ:

    σ = √( (1/(n−1)) · Σᵢ (xᵢ − μ)² )

  • The density function f(x):

    f(x) = (1 / (√(2π) · σ)) · e^(−(x − μ)² / (2σ²))
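Translated into Python (my sketch of the three formulas above):

```python
from math import exp, pi, sqrt

def gaussian_params(xs):
    """Sample mean and standard deviation (n-1 denominator, as above)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, sqrt(var)

def density(x, mean, std):
    """Normal density f(x) for the given mean and standard deviation."""
    return exp(-(x - mean) ** 2 / (2 * std ** 2)) / (sqrt(2 * pi) * std)

# With the class-yes temperature statistics from the next slide:
print(density(66, 73.0, 6.2))   # ≈ 0.0340
```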
Statistics for the weather data

Outlook      Yes   No
Sunny        2     3
Overcast     4     0
Rainy        3     2
Sunny        2/9   3/5
Overcast     4/9   0/5
Rainy        3/9   2/5

Temperature  Yes    No
             83     85
             70     80
             68     65
             …      …
mean         73     74.6
std dev      6.2    7.9

Humidity     Yes    No
             86     85
             96     90
             80     70
             …      …
mean         79.1   86.2
std dev      10.2   9.7

Windy        Yes   No
False        6     2
True         3     3
False        6/9   2/5
True         3/9   3/5

Play         Yes   No
             9     5
             9/14  5/14

• Example density value:

  f(temperature = 66 | yes) = (1 / (√(2π) · 6.2)) · e^(−(66 − 73)² / (2 · 6.2²)) ≈ 0.0340
Classifying a new day

• A new day:

  Outlook  Temp.  Humidity  Windy  Play
  Sunny    66     90        true   ?

Likelihood of “yes” = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
Likelihood of “no”  = 3/5 × 0.0279 × 0.0380 × 3/5 × 5/14 = 0.000136
P(“yes”) = 0.000036 / (0.000036 + 0.000136) = 20.9%
P(“no”)  = 0.000136 / (0.000036 + 0.000136) = 79.1%

• Missing values during training are not included in the calculation of the mean and standard deviation
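Putting both attribute types together (my sketch, not from the slides; small rounding differences from the 20.9% / 79.1% above are expected):

```python
from math import exp, pi, sqrt

def density(x, mean, std):   # same Gaussian density as on the earlier slide
    return exp(-(x - mean) ** 2 / (2 * std ** 2)) / (sqrt(2 * pi) * std)

# Day: outlook=Sunny, temperature=66, humidity=90, windy=true.
like_yes = (2/9) * density(66, 73.0, 6.2) * density(90, 79.1, 10.2) \
           * (3/9) * (9/14)
like_no  = (3/5) * density(66, 74.6, 7.9) * density(90, 86.2, 9.7) \
           * (3/5) * (5/14)

total = like_yes + like_no
print(f"P(yes) = {like_yes / total:.3f}")   # ≈ 0.208
print(f"P(no)  = {like_no / total:.3f}")    # ≈ 0.792
```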
Probability densities

• Relationship between probability and density:

  Pr[c − ε/2 ≤ x ≤ c + ε/2] ≈ ε · f(c)

• But this doesn’t change the calculation of a posteriori probabilities, because ε cancels out
• Exact relationship:

  Pr[a ≤ x ≤ b] = ∫ₐᵇ f(t) dt
Example of Naïve Bayes in Weka

• Use the Weka Naïve Bayes module to classify
  • Weather.nominal.arff

[Slides 19–22: Weka screenshots of the classification steps]
Discussion of Naïve Bayes

• Naïve Bayes works surprisingly well (even if the independence assumption is clearly violated)
• Why? Because classification doesn’t require accurate probability estimates as long as the maximum probability is assigned to the correct class
• However: adding too many redundant attributes will cause problems (e.g. identical attributes)
• Note also: many numeric attributes are not normally distributed
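The redundant-attribute problem is easy to demonstrate (my sketch, not from the slides): duplicating an attribute counts the same evidence twice and inflates the winning class’s probability.

```python
# Effect of duplicating outlook=sunny as evidence.
p_sunny_yes, p_sunny_no = 2/9, 3/5
p_yes, p_no = 9/14, 5/14

once  = (p_sunny_yes * p_yes,    p_sunny_no * p_no)
twice = (p_sunny_yes**2 * p_yes, p_sunny_no**2 * p_no)  # duplicated attribute

print(once[1] / sum(once))    # P(no | sunny)        = 0.60
print(twice[1] / sum(twice))  # P(no | sunny twice) ≈ 0.80 -- overconfident
```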