Group Presentation
Top Changwatchai, 18 October 2000 (revised 23 October 2000)

The main point
• Last week I got several good questions
• I plan to address three issues:
– Explain my definition of the random variable
– Explain why we want the expectation, not the maximum likelihood value
– Justify why it has a beta distribution under certain assumptions

Assumptions
• There are k different coins (1, 2, ..., k)
• p_i = prior probability of picking coin i, with $\sum_{i=1}^{k} p_i = 1$
• w_i = weight of coin i = probability of getting heads on any given toss of coin i (each toss is independent of any other tosses)
• Our algorithm knows all of this, including the values of the p_i's and w_i's

Random experiment 1
• Experiment:
– 1. Pick one of the k coins according to the p's
– 2. Toss this coin one time
• Goal:
– Perform this experiment one time
– Without knowing anything else about the results of the experiment (beyond our assumed knowledge), we want to predict whether we got heads or tails
• Algorithm A:
– 1. Calculate the probability of getting heads:
$p_{\text{heads}} = P(\text{heads}) = \sum_{i=1}^{k} P(\text{heads} \mid \text{coin } i)\, P(\text{coin } i) = \sum_{i=1}^{k} w_i p_i$
– 2. If p_heads < 0.5, predict tails. Otherwise, predict heads.

Confidence
• We want confidence to reflect how "good" our prediction is:
– conf_ideal = P(make same prediction | more knowledge)
• Lots of different things can constitute extra knowledge. We focus on one type of knowledge in particular:
– conf_exp1 = P(make same prediction | we know which coin was picked)
• Note: we don't actually know which coin was picked. We want the probability that we will make the same prediction in the hypothetical case that we are told which coin was picked. (See the next slide for an alternative explanation.) So:
$\text{conf}_{\text{exp1}} = \sum_{i=1}^{k} P(\text{same prediction and told coin } i) = \sum_{i=1}^{k} P(\text{same prediction} \mid \text{told coin } i)\, P(\text{told coin } i)$
• Our new prediction uses the same rule as in Algorithm A. Say we are told that coin i was picked.
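Algorithm A is simple enough to state in a few lines of code. The following is a minimal sketch (Python, not code from the talk); the weights and priors are the example values that appear later on the Example slide, assumed here only for illustration:

```python
# A minimal sketch of Algorithm A.  The priors p and weights w are the
# example values used later in the talk (w = 0.2, 0.8, 0.9 with
# p = 0.4, 0.3, 0.3); any distributions summing to 1 would do.

def algorithm_a(p, w):
    """Return (prediction, p_heads) for one run of random experiment 1."""
    p_heads = sum(pi * wi for pi, wi in zip(p, w))  # sum_i w_i * p_i
    prediction = "tails" if p_heads < 0.5 else "heads"
    return prediction, p_heads

prediction, p_heads = algorithm_a([0.4, 0.3, 0.3], [0.2, 0.8, 0.9])
print(prediction, round(p_heads, 2))  # heads 0.59
```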
Then if w_i < 0.5 we will predict tails; otherwise we will predict heads. In other words, if we predicted heads with Algorithm A:
$P(\text{same prediction} \mid \text{told coin } i) = \begin{cases} 0 & \text{if } w_i < 0.5 \\ 1 & \text{if } w_i \ge 0.5 \end{cases}$
• In addition, P(told coin i) = P(coin i was picked) = p_i
• So, if we predicted heads:
$\text{conf}_{\text{exp1}} = \sum_{i:\, w_i \ge 0.5} p_i$

Confidence (alternative explanation)
(figure only)

Random variable for experiment 1
• The sample space of random experiment 1 is { (coin i, heads or tails) }
• We define a discrete random variable X on this space:
– X((coin i, heads or tails)) = w_i
– Note that we ignore the outcome of the flip, since that is what we are predicting
– The support of X is { w_1, w_2, ..., w_k }
• The pmf of X is: f(w) = p_i if w = w_i, and 0 otherwise
• The expectation of X: $E(X) = \sum_{i=1}^{k} w_i p_i$
• Note this is the same as p_heads in Algorithm A, so we define Algorithm B:
– 1. Calculate E(X)
– 2. If E(X) < 0.5, predict tails. Otherwise, predict heads.
• This is why we use the expectation of X, not the maximum likelihood value
• We also use X to compute confidence. For example, if we predict heads:
$\text{conf}_{\text{exp1}} = P(\text{same prediction} \mid \text{we know which coin was picked}) = \sum_{w \ge 0.5} f(w) = P(X \ge 0.5)$

Example
[Figure: bar chart of the pmf f(w): f(0.2) = 0.4 (coin 1), f(0.8) = 0.3 (coin 2), f(0.9) = 0.3 (coin 3)]
• The maximum likelihood coin (the one with the highest probability) is coin 1
– w_1 = 0.2, so we would predict tails (not what we want)
• Instead, we use the expectation:
– E(X) = (0.2)(0.4) + (0.8)(0.3) + (0.9)(0.3) = 0.59, so predict heads
• conf_exp1 = 0.3 + 0.3 = 0.6

Random experiment 2
• Same situation as above. Let N be a finite but very large number.
• Experiment:
– 1. Pick one of the k coins according to the p's
– 2. Toss this coin N times
– 3. Toss the same coin one more time
• Goal:
– Perform this experiment one time
– Let H be the number of heads observed in the first N tosses
– Knowing H and N, but nothing else about the results of the experiment (beyond our assumed knowledge), we want to predict whether we got heads or tails on the last toss
– Note that for N = 0, we recover random experiment 1

Algorithm C
• Algorithm C:
– 1. Calculate the probability of getting heads on the last toss:
$p_{\text{heads}} = P(\text{heads} \mid H, N) = \sum_{i=1}^{k} P(\text{heads} \mid \text{coin } i, H, N)\, P(\text{coin } i \mid H, N)$
– Since the tosses are independent given the coin:
$P(\text{heads} \mid \text{coin } i, H, N) = P(\text{heads} \mid \text{coin } i) = w_i$
– By Bayes' rule:
$P(\text{coin } i \mid H, N) = \frac{P(H \mid \text{coin } i, N)\, P(\text{coin } i \mid N)}{\sum_{j=1}^{k} P(H \mid \text{coin } j, N)\, P(\text{coin } j \mid N)}$
– where $P(H \mid \text{coin } i, N) = \binom{N}{H} w_i^H (1 - w_i)^{N - H}$ and $P(\text{coin } i \mid N) = P(\text{coin } i) = p_i$. The binomial coefficients cancel, so:
$p_{\text{heads}} = \frac{\sum_{i=1}^{k} w_i \cdot w_i^H (1 - w_i)^{N - H}\, p_i}{\sum_{j=1}^{k} w_j^H (1 - w_j)^{N - H}\, p_j}$
– 2. If p_heads < 0.5, predict tails. Otherwise, predict heads.
• Confidence:
– conf_ideal = P(make same prediction | more knowledge), for example P(make same prediction | all N data)
– conf_exp2 = P(make same prediction | we know which coin was picked)
– If we predicted heads: $\text{conf}_{\text{exp2}} = \sum_{i:\, w_i \ge 0.5} P(\text{coin } i \mid H, N)$

Random variable for experiment 2
• The sample space of random experiment 2 is { (coin i, data from N tosses, heads or tails on last toss) }
• We define a discrete random variable X on this space:
– X((coin i, data from N tosses, heads or tails on last toss)) = w_i
– Note again that we ignore everything except the coin index
• The pmf of X is: f(w) = P(coin i | H, N) if w = w_i, and 0 otherwise
• The expectation of X: $E(X) = \sum_{i=1}^{k} w_i\, P(\text{coin } i \mid H, N)$
• Note this is the same as p_heads in Algorithm C, so we define Algorithm D:
– 1. Calculate E(X)
– 2. If E(X) < 0.5, predict tails. Otherwise, predict heads.
• Confidence:
– If we predict heads: $\text{conf}_{\text{exp2}} = P(\text{same prediction} \mid \text{we know which coin was picked}) = \sum_{w \ge 0.5} f(w) = P(X \ge 0.5)$

Continuous case
• Random experiment 3 (continuous version of experiment 2):
– 1.
Assume we have a random variable W with pdf g(w), i.e. $P(W \le w_0) = \int_0^{w_0} g(w)\, dw$. Pick a value w under this distribution.
– 2. Toss a coin with this weight N times
– 3. Toss the same coin one more time
• We can use Algorithm C as well, with the following calculations (we abuse notation slightly; we will correct this on the next slide):
$p_{\text{heads}} = P(\text{heads} \mid H, N) = \int_0^1 P(\text{heads} \mid w, H, N)\, P(w \mid H, N)\, dw = \int_0^1 w\, P(w \mid H, N)\, dw$
– since $P(\text{heads} \mid w, H, N) = P(\text{heads} \mid w) = w$
• And:
– conf_ideal = P(make same prediction | more knowledge), for example P(make same prediction | all N data)
– conf_exp2 = P(make same prediction | we know w)
– Assuming we predicted heads: $\text{conf}_{\text{exp2}} = \int_{0.5}^{1} P(w \mid H, N)\, dw$

Continuous case (con't)
• We can translate all the probabilities as follows:
$P(w \mid H, N) = \frac{P(H \mid w, N)\, g(w)}{\int_0^1 P(H \mid v, N)\, g(v)\, dv} = \frac{\binom{N}{H} w^H (1 - w)^{N - H}\, g(w)}{\int_0^1 \binom{N}{H} v^H (1 - v)^{N - H}\, g(v)\, dv} = \frac{P(H \mid w, N)\, g(w)}{P(H \mid N)}$
• Clearly, if we define a random variable X with the pdf:
$f(w) = \frac{P(H \mid w, N)\, g(w)}{P(H \mid N)}$
• then the equations on the previous slide become:
$p_{\text{heads}} = \int_0^1 w\, f(w)\, dw = E(X)$
$\text{conf}_{\text{exp2}} = \int_{0.5}^{1} f(w)\, dw = P(X \ge 0.5)$
• which of course fit into Algorithms B and D.

Beta distribution
• Let's say we don't know g(w). If we assume W ~ beta(α_w, β_w), then:
$f(w) = \frac{\binom{N}{H} w^H (1 - w)^{N - H} \cdot \frac{\Gamma(\alpha_w + \beta_w)}{\Gamma(\alpha_w)\,\Gamma(\beta_w)}\, w^{\alpha_w - 1} (1 - w)^{\beta_w - 1}}{P(H \mid N)} = C\, w^{H + \alpha_w - 1} (1 - w)^{N - H + \beta_w - 1}$
• where C is the appropriately defined normalizing constant. Clearly f(w) is also a beta density, with parameters α = H + α_w and β = N - H + β_w; that is, X ~ beta(H + α_w, N - H + β_w), with mean:
$E(X) = \frac{H + \alpha_w}{N + \alpha_w + \beta_w}$
• For example, if W ~ beta(1, 1) = U(0, 1), the uniform distribution, then X ~ beta(H + 1, N - H + 1) and:
$E(X) = \frac{H + 1}{N + 2}$
• Note that E(X) = H/N exactly only if H/N = 1/2
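The discrete posterior behind Algorithms C and D, and the conjugate beta update above, can both be sketched briefly. This is an illustrative Python sketch, not code from the talk; the observed counts H = 8, N = 10 are assumed purely for the example, and the coins are the ones from the earlier Example slide:

```python
# Sketch of Algorithms C/D (discrete case) and the conjugate beta
# update from the continuous case.  Illustrative values only.

def posterior(p, w, H, N):
    """P(coin i | H, N), proportional to w_i^H (1 - w_i)^(N - H) p_i.

    The binomial coefficient C(N, H) cancels in the normalization."""
    like = [wi**H * (1 - wi)**(N - H) * pi for pi, wi in zip(p, w)]
    total = sum(like)
    return [x / total for x in like]

def algorithm_d(p, w, H, N):
    """Predict the last toss via E(X) under the posterior pmf."""
    post = posterior(p, w, H, N)
    e_x = sum(wi * qi for wi, qi in zip(w, post))           # = p_heads
    conf = sum(qi for wi, qi in zip(w, post) if wi >= 0.5)  # conf_exp2
    prediction = "tails" if e_x < 0.5 else "heads"
    return prediction, e_x, conf

def beta_posterior_mean(H, N, a_w=1.0, b_w=1.0):
    """Continuous case: if W ~ beta(a_w, b_w), then
    X ~ beta(H + a_w, N - H + b_w) with mean (H + a_w) / (N + a_w + b_w)."""
    return (H + a_w) / (N + a_w + b_w)

p, w = [0.4, 0.3, 0.3], [0.2, 0.8, 0.9]
# Mostly heads in the first 10 tosses shifts the posterior toward
# the high-weight coins:
print(algorithm_d(p, w, H=8, N=10))
# With N = 0 we recover Algorithm B (E(X) = 0.59 on the example):
print(algorithm_d(p, w, H=0, N=0))
# Uniform prior beta(1, 1): E(X) = (8 + 1) / (10 + 2) = 0.75
print(beta_posterior_mean(H=8, N=10))
```

Note how setting N = 0 makes the posterior collapse back to the priors, matching the slide's remark that experiment 2 with N = 0 is just experiment 1.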