Group Presentation
Top Changwatchai
18 October 2000
Revised 23 Oct 2000
1
The main point
• Last week I got several good questions
• I plan to address three issues:
– Explain my definition of the random variable
– Explain why we want the expectation, not the maximum-likelihood value
– Justify why it has a beta distribution under certain assumptions
Assumptions
• There are k different coins (1, 2, …, k)
• p_i = prior probability of picking coin i, with

  Σ_{i=1}^k p_i = 1

• w_i = weight of coin i = probability of getting heads on any given toss of coin i (independent of all other tosses)
• Our algorithm knows all this, and knows the values of the p_i’s and w_i’s
Random experiment 1
• Experiment:
– 1. Pick one of the k coins according to the p’s
– 2. Toss this coin one time
• Goal:
– Perform this experiment one time
– Without knowing anything else about the results of the experiment
(except for our assumed knowledge), we want to predict whether
we got heads or tails
• Algorithm A:
– 1. Calculate the probability of getting heads:

  p_heads = P(heads) = Σ_{i=1}^k P(heads | coin i) · P(coin i) = Σ_{i=1}^k w_i · p_i

– 2. If p_heads < 0.5, predict tails. Otherwise, predict heads.
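As a minimal sketch of Algorithm A (the function name and interface are my own, not from the slides):

```python
# Minimal sketch of Algorithm A; names are illustrative.
def algorithm_a(p, w):
    """Predict one toss of a randomly chosen coin.

    p[i] = prior probability of picking coin i (the p_i's, summing to 1)
    w[i] = probability of heads for coin i (the w_i's)
    """
    # p_heads = sum over coins of P(heads | coin i) * P(coin i)
    p_heads = sum(w_i * p_i for w_i, p_i in zip(w, p))
    prediction = "tails" if p_heads < 0.5 else "heads"
    return prediction, p_heads
```

With the three-coin example used later in the talk (p = [0.4, 0.3, 0.3], w = [0.2, 0.8, 0.9]), this gives p_heads = 0.59 and predicts heads.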
Confidence
• We want confidence to reflect how “good” our prediction is:
  – conf_ideal = P(make same prediction | more knowledge)
• Lots of different things can constitute extra knowledge. We focus on one type of knowledge in particular:
  – conf_exp1 = P(make same prediction | we know which coin was picked)
• Note: we don’t actually know which coin was picked. We want the probability that we will make the same prediction in the hypothetical case that we are told which coin was picked. (See next slide for an alternative explanation.) So:

  conf_exp1 = Σ_{i=1}^k P(make same prediction & we are told coin i was picked)
            = Σ_{i=1}^k P(make same prediction | we are told coin i was picked) · P(we are told coin i was picked)

• Our new prediction uses the same rule as in Algorithm A. Say we are told that coin i was picked. Then if w_i < 0.5, we will predict tails; otherwise, we will predict heads. In other words, if we predicted heads with Algorithm A:

  P(make same prediction | we are told coin i was picked) = { 0 if w_i < 0.5; 1 if w_i ≥ 0.5 }

• In addition: P(we are told coin i was picked) = P(coin i was picked) = p_i
• So, if we predicted heads:

  conf_exp1 = Σ_{i=1}^k { 0 if w_i < 0.5; 1 if w_i ≥ 0.5 } · p_i = Σ_{i: w_i ≥ 0.5} p_i
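Continuing the sketch (illustrative names, not the authors' code), the confidence is simply the prior mass on the coins whose bias agrees with the prediction:

```python
# Sketch of conf_exp1: the prior probability that, if told the coin's
# identity, we would make the same prediction as Algorithm A.
def conf_exp1(p, w, prediction):
    if prediction == "heads":
        # mass on coins biased toward heads (w_i >= 0.5)
        return sum(p_i for p_i, w_i in zip(p, w) if w_i >= 0.5)
    # mass on coins biased toward tails (w_i < 0.5)
    return sum(p_i for p_i, w_i in zip(p, w) if w_i < 0.5)
```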
Confidence (alternative explanation)
Random variable for experiment 1
• The sample space of random experiment 1 is:
  – { (coin i, heads or tails) }
• We define a discrete random variable X for this experiment:
  – X((coin i, heads or tails)) = w_i
  – Note that we ignore the outcome of the flip, since that’s what we’re predicting
  – The support of X is { w1, w2, …, wk }
• The pmf of X is defined as follows:
  – f(w) = { p_i if w = w_i; 0 otherwise }
• The expectation of X:

  E(X) = Σ_{i=1}^k w_i · p_i

• Note this is the same as p_heads in Algorithm A, so we define:
• Algorithm B:
  – 1. Calculate E(X)
  – 2. If E(X) < 0.5, predict tails. Otherwise, predict heads.
• This is why we use the expectation of X, not the maximum-likelihood value
• We also use X to compute confidence. For example, if we predict heads:

  conf_exp1 = P(make same prediction | we know which coin was picked) = Σ_{w ≥ 0.5} f(w) = P(X ≥ 0.5)
Example
[Figure: bar chart of the pmf f(w) — coin 1: w1 = 0.2, f(w1) = 0.4; coin 2: w2 = 0.8, f(w2) = 0.3; coin 3: w3 = 0.9, f(w3) = 0.3]
• The maximum-likelihood coin (the one with highest probability) is coin 1
  – w1 = 0.2, so we would predict tails (not what we want)
• Instead, we use the expectation:
  – E(X) = 0.2·0.4 + 0.8·0.3 + 0.9·0.3 = 0.59, so predict heads
• conf_exp1 = 0.3 + 0.3 = 0.6
Random experiment 2
• Same situation as above. Let N be a finite but very
large number.
• Experiment:
– 1. Pick one of the k coins according to the p’s
– 2. Toss this coin N times.
– 3. Toss the same coin one more time
• Goal:
– Perform this experiment one time
– Let H be the number of heads observed in the first N tosses
– Knowing H and N but nothing else about the results of the
experiment (except for our assumed knowledge), we want to
predict whether we got heads or tails on the last toss
– Note that for N=0, we have random experiment 1
Algorithm C
• Algorithm C:
  – 1. Calculate the probability of getting heads on the last toss:

    p_heads = P(heads | H, N) = Σ_{i=1}^k P(heads | coin i, H, N) · P(coin i | H, N)

    Since the last toss is independent of the first N tosses given the coin:

    P(heads | coin i, H, N) = P(heads | coin i) = w_i

    By Bayes’ rule:

    P(coin i | H, N) = P(H | coin i, N) · P(coin i | N) / Σ_{j=1}^k P(H | coin j, N) · P(coin j | N)

    where

    P(H | coin x, N) = C(N, H) · w_x^H · (1 − w_x)^(N−H)
    P(coin i | N) = P(coin i) = p_i

    The binomial coefficients cancel, so:

    p_heads = Σ_{i=1}^k w_i · [ w_i^H · (1 − w_i)^(N−H) · p_i ] / [ Σ_{j=1}^k w_j^H · (1 − w_j)^(N−H) · p_j ]

  – 2. If p_heads < 0.5, predict tails. Otherwise, predict heads.
• Confidence:
  – If we predict heads:

    conf_ideal = P(make same prediction | more knowledge)
               = P(make same prediction | all N + 1 data)
    conf_exp2 = P(make same prediction | we know which coin was picked)
              = Σ_{i: w_i ≥ 0.5} P(coin i | H, N)
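A sketch of Algorithm C under the same illustrative interface as before (my own names; the binomial coefficient C(N, H) cancels in the normalization, so it is omitted):

```python
# Sketch of Algorithm C; illustrative names, not the authors' code.
def algorithm_c(p, w, H, N):
    """Predict the last toss after observing H heads in N tosses of the picked coin."""
    # Posterior over coins: P(coin i | H, N) is proportional to
    # w_i^H * (1 - w_i)^(N - H) * p_i  (the binomial coefficient cancels).
    post = [w_i**H * (1 - w_i)**(N - H) * p_i for w_i, p_i in zip(w, p)]
    total = sum(post)
    post = [q / total for q in post]
    p_heads = sum(w_i * q for w_i, q in zip(w, post))  # = E(X) in Algorithm D
    prediction = "tails" if p_heads < 0.5 else "heads"
    # conf_exp2: posterior mass on coins that agree with the prediction
    conf = sum(q for w_i, q in zip(w, post)
               if (w_i >= 0.5) == (prediction == "heads"))
    return prediction, p_heads, conf
```

For N = 0 (no observed tosses) the posterior equals the prior, so this reduces to Algorithm A, as the slides note.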
Random variable for experiment 2
• The sample space of random experiment 2 is:
  – { (coin i, data from N tosses, heads or tails on last toss) }
• We define a discrete random variable X for this experiment:
  – X((coin i, data from N tosses, heads or tails on last toss)) = w_i
  – Note again that we ignore everything except the coin index
• The pmf of X is defined as follows:
  – f(w) = { P(coin i | H, N) if w = w_i; 0 otherwise }
• The expectation of X:

  E(X) = Σ_{i=1}^k w_i · P(coin i | H, N)

• Note this is the same as p_heads in Algorithm C, so we define:
• Algorithm D:
  – 1. Calculate E(X)
  – 2. If E(X) < 0.5, predict tails. Otherwise, predict heads.
• Confidence:
  – If we predict heads:

    conf_exp2 = P(make same prediction | we know which coin was picked) = Σ_{w ≥ 0.5} f(w) = P(X ≥ 0.5)
Continuous case
• Random experiment 3 (continuous version of experiment 2):
  – 1. Assume we have a random variable W with pdf g(w):

      P(W ≤ w0) = ∫_0^w0 g(w) dw

    • Pick a value w under this distribution
  – 2. Toss a coin with this weight N times
  – 3. Toss the same coin one more time
• We can use Algorithm C as well, using the following calculations (we abuse notation slightly; we will correct this on the next slide):

      p_heads = P(heads | H, N) = Σ_w P(heads | w, H, N) · P(w | H, N) = Σ_{w=0}^{1} w · P(w | H, N)

  – since:

      P(heads | w, H, N) = P(heads | w) = w

  – and:

      conf_ideal = P(make same prediction | more knowledge)
                 = P(make same prediction | all N + 1 data)
      conf_exp2 = P(make same prediction | we know w)

  – Assuming we predicted heads:

      conf_exp2 = Σ_{w=0.5}^{1} P(w | H, N)
Continuous case (con’t)
• We can translate all the probabilities as follows:

    P(w | H, N) dw = P(H | w, N) · P(w | N) / Σ_v P(H | v, N) · P(v | N)

  where

    P(H | x, N) = C(N, H) · x^H · (1 − x)^(N−H)
    P(x | N) = g(x) dx

• so we can write (the binomial coefficients cancel):

    P(w | H, N) dw = [ w^H · (1 − w)^(N−H) · g(w) dw ] / [ ∫_0^1 v^H · (1 − v)^(N−H) · g(v) dv ]
                   = [ P(H | w, N) · g(w) / P(H | N) ] dw

• Clearly, if we define a random variable X with the pdf:

    f(w) = P(H | w, N) · g(w) / P(H | N)

• then the equations on the previous page become:

    p_heads = ∫_0^1 w · f(w) dw = E(X)
    conf_exp2 = ∫_0.5^1 f(w) dw = P(X ≥ 0.5)

• which of course fit into Algorithms B and D
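As a numerical sanity check of these integrals (my own sketch, not from the slides: a midpoint Riemann sum, assuming a uniform prior g(w) = 1):

```python
# Numerical check of p_heads = E(X) and conf_exp2 = P(X >= 0.5)
# for a uniform prior g(w) = 1.  Midpoint Riemann sum; illustrative sketch.
def continuous_case(H, N, steps=20000):
    dw = 1.0 / steps
    ws = [(i + 0.5) * dw for i in range(steps)]       # midpoints of [0, 1]
    lik = [w**H * (1 - w)**(N - H) for w in ws]       # P(H | w, N) up to C(N, H)
    Z = sum(lik) * dw                                 # P(H | N), same constant dropped
    f = [l / Z for l in lik]                          # posterior pdf f(w)
    p_heads = sum(w * fw for w, fw in zip(ws, f)) * dw
    conf = sum(fw for w, fw in zip(ws, f) if w >= 0.5) * dw
    return p_heads, conf
```

For example, H = 7, N = 10 returns p_heads ≈ 8/12 ≈ 0.667, matching the closed form (H + 1)/(N + 2) for the uniform prior.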
Beta distribution
• Let’s say we don’t know g(w). If we assume W ~ beta(α_W, β_W), then:

    f(w) = [ w^H · (1 − w)^(N−H) / P(H | N) ] · [ Γ(α_W + β_W) / (Γ(α_W) · Γ(β_W)) ] · w^(α_W − 1) · (1 − w)^(β_W − 1)
         = C · w^(H + α_W − 1) · (1 − w)^(N − H + β_W − 1)

• where C is the appropriately defined constant. Clearly f(w) is also a beta distribution, with parameters α = H + α_W and β = N − H + β_W; that is, X ~ beta(H + α_W, N − H + β_W), with mean:

    E(X) = α / (α + β) = (H + α_W) / (N + α_W + β_W)

• For example, if W ~ beta(1, 1) = U(0, 1), the uniform distribution, then X ~ beta(H + 1, N − H + 1) and:

    E(X) = (H + 1) / (N + 2)

• Note that E(X) = H/N exactly only if H/N = ½
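A quick check of that last claim (my own sketch): (H + 1)/(N + 2) = H/N rearranges to N(H + 1) = H(N + 2), i.e. N = 2H, i.e. H/N = ½.

```python
from fractions import Fraction

# Check: with the uniform prior, the posterior mean (H + 1)/(N + 2) equals
# the raw frequency H/N exactly when H/N = 1/2.  Illustrative sketch.
def mean_equals_frequency(H, N):
    return Fraction(H + 1, N + 2) == Fraction(H, N)
```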