Scientific Methods 1
‘Scientific evaluation, experimental design & statistical methods’
COMP80131
Lecture 4: Statistical Methods-Probability
Barry & Goran
www.cs.man.ac.uk/~barry/mydocs/myCOMP80131
16 Nov 2011 COMP80131-SEEDSM2-4

Probability
There are two useful definitions of probability:
1. Bayesian probability is a person’s belief in the truth of a statement S, quantified on a scale from 0 (definitely not true) to 1 (definitely true).
2. Experimental (or frequentist) probability is determined by the number, M, of times that a statement S is found to be true when it is tested a large number, N, of times. The probability, P(S), may then be defined as the limit of M / N as more and more experiments are carried out & N tends to infinity.

Different language
• By either definition, probability P(S) is a number in the range 0 to 1.
• Multiply by 100 to express as a percentage.
• Or express as odds: e.g. ‘4 to 1 against’ means 1/5 = 0.2 = 20%.
• What do odds of ‘4 to 1 on’ mean?
• What does ‘50-50’ mean?

Calculating probability
• The 2 definitions of probability usually give the same value.
• By examining a coin, we could give ourselves good reason for believing that tossing it just once will give an even chance of getting heads, i.e. that by the Bayesian definition P(S) = 0.5 where S = ‘get heads’.
• If the coin is then tossed N = 100 times we would expect about M = 50 occurrences of heads, meaning that M/N ≈ 0.5.
• Increasing N to 1000 and then to 1000000 would be expected to produce closer & closer approximations to P(S) = 0.5.
• If this does not happen, our ‘a-priori’ belief may be wrong.
• The coin may be ‘weighted’ after all.

Random process
• Tossing a coin is a random process.
• It generates a ‘random variable’: Heads or Tails.
• It is random because the outcome cannot be predicted exactly.
• If 1 = heads and 0 = tails we have a random binary number.
• Throwing a die generates a random integer in the range 1-6.
• Spinning a Roulette wheel generates a random number in the range 0-36.
• Setting & marking an exam produces random numbers in the range 0-100.
• These are all random processes producing discrete variables.
• Some random processes produce continuous variables, e.g. measuring people’s heights.

Simulating random process
• MATLAB has functions that generate pseudo-random numbers.
• ‘rand’ produces a pseudo-random number ‘uniformly distributed’ in the range 0 to 1.
• May be considered ‘continuous’ since floating-point numbers are very finely spaced.
• Calling ‘rand’ repeatedly produces numbers evenly distributed across the range 0 to 1.
• They are ‘pseudo-random’ because if we know the algorithm used, we can predict the numbers.
• So we pretend we do not know the algorithm.
• ‘rand’ may be considered to simulate some random process that generates truly random numbers, uniformly distributed.

Simulating coin tossing in MATLAB
    Heads = zeros(1,20);   % preallocate: 1 = heads, 0 = tails
    for n=1:20
        R = rand;          % uniform random number between 0 & 1
        if R > 0.5, Heads(n)=1; else Heads(n)=0; end;
    end; % of n loop
    Heads                  % display the sequence
Result: 10110001110101011101 - 12 heads & 8 tails
• When I changed 20 to 10,000, I got 5066 heads: P(Heads) ≈ 0.5066
• When I ran it again, I got 4918 heads: P(Heads) ≈ 0.4918

Using an unfair coin
    Heads = zeros(1,20);   % preallocate: 1 = heads, 0 = tails
    for n=1:20
        R = rand;          % uniform random number between 0 & 1
        if R > 0.4, Heads(n)=1; else Heads(n)=0; end;   % heads with probability 0.6
    end; % of n loop
    Heads                  % display the sequence
Result: 00101001110101010101 - 10 heads & 10 tails
• When I changed 20 to 10,000, I got 6012 heads: P(Heads) ≈ 0.6012
• When I ran it again, I got 5979 heads: P(Heads) ≈ 0.5979

Estimating probability experimentally
• We cannot measure probability with 100% accuracy.
• All measurements are estimates that may be slightly or totally wrong.
• According to the experimental definition, we have to perform an experiment an infinite number of times to measure a probability.
• This is clearly impossible.
• In practice, we have to perform the experiment a finite number of times.
• (Cannot spend all our lives tossing coins)
• Accept the resulting measurement as an estimate of the true probability.

Bayesian Definition
• According to the Bayesian definition of probability, a person’s belief in the truth of a statement may be affected by one or more assumptions (hypotheses).
• “I assume it is a fair coin”
• Different people may have different beliefs.
• Can only estimate probability using the information we have at hand, though we can modify this estimate later if we get new information.

Conditional probability
• P(S | S1) means the probability of ‘statement S’ being true given that we know that another statement, S1, is definitely true.
• If S stands for ‘get heads’ we may at first believe that P(S) = 0.5.
• But what if someone tells us that the statement S1: ‘coin is weighted with heavier metal on one side’, is true?
• We may change our estimate of the probability to P(S | S1).
• P(S) is then referred to as the ‘prior’ probability.
• P(S | S1) is the ‘conditional’ or ‘posterior’ probability.

Bayes Theorem
• Expresses the probability of some fact ‘A’ being true when we know that some other fact ‘B’ is true:
  $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
• P(A) is ‘prior’ as it does not take into account any information about B.
• Similarly P(B) is ‘prior’.
• P(A | B) and P(B | A) are conditional or ‘posterior’ probabilities.
• Let A = ‘coin is fair’ & B = ‘getting 12 heads out of 20’
• P(A | B) = P(B | A) P(A) / P(B)

What is prob of getting 12 heads out of 20?
    clear all;               % WITH FAIR COIN
    HIS = zeros(21,1);       % HIS(1+k) counts the repetitions that gave k heads
    Heads = zeros(1,20);     % preallocate the 20 tosses of one repetition
    for rep=1:1000
        for n=1:20
            R = rand;        % Unif random number between 0 & 1
            if R > 0.5, Heads(n)=0; else Heads(n)=1; end;
        end; % of n loop
        Count = sum(Heads);  % number of heads in this repetition
        HIS(1+Count) = HIS(1+Count)+1;
    end; % of rep loop
    figure(1); stem(0:20,HIS);

Histogram for 1000 trials: FAIR COIN
[Figure: stem plot. x-axis: Number of Heads obtainable with 20 coin-tosses (0 to 20); y-axis: Frequency out of 1000 trials (0 to 200)]

Estimate of probability distribution: FAIR COIN
[Figure: stem plot. x-axis: Number of Heads obtainable with 20 coin-tosses (0 to 20); y-axis: Estimate of prob distribution based on 1000 trials (0 to 0.2)]

Probability estimate (fair coin)
Estimated probabilities:
for 0:9 heads:   0  0  0  0  0.008  0.011  0.024  0.087  0.119  0.160
for 10:19 heads: 0.194  0.157  0.115  0.076  0.03  0.012  0.003  0.003  0.001  0
for 20 heads:    0
So our estimate of the probability of getting 12 heads out of 20 with a fair coin is 0.115.

What is prob of getting 12 heads out of 20?
    clear all;               % WITH 60-40 WEIGHTED COIN
    HIS = zeros(21,1);       % HIS(1+k) counts the repetitions that gave k heads
    Heads = zeros(1,20);     % preallocate the 20 tosses of one repetition
    for rep=1:1000
        for n=1:20
            R = rand;        % Unif random number between 0 & 1
            if R > 0.4, Heads(n)=1; else Heads(n)=0; end;   % heads with probability 0.6
        end; % of n loop
        Count = sum(Heads);  % number of heads in this repetition
        HIS(1+Count) = HIS(1+Count)+1;
    end; % of rep loop
    figure(1); stem(0:20,HIS);

HISTOGRAM for ‘60-40’ weighted coin
[Figure: stem plot. x-axis: Number of Heads obtainable with 20 coin-tosses (0 to 20); y-axis: Frequency out of 1000 trials (0 to 200)]

Prob distribution estimate for ‘60-40’ weighted coin
[Figure: stem plot. x-axis: Number of Heads obtainable with 20 coin-tosses (0 to 20); y-axis: Estimate of prob distribution based on 1000 trials (0 to 0.2)]

Estimate Cumulative Prob Distrib
    CDF(1) = HIS(1)/1000;
    for n=2:21, CDF(n) = CDF(n-1) + HIS(n)/1000; end;   % running sum of the estimated probabilities
    figure(3); stem(0:20,CDF);
• Easily derived from a Histogram or Prob Distribution.
• Estimates the prob of getting between 0 and n Heads.

Estimate of Cumulative Prob Dist: FAIR COIN
[Figure: stem plot. x-axis: Number of Heads obtainable with 20 coin-tosses (0 to 20); y-axis: Estimate of cumulative prob dist based on 1000 trials (0 to 1). Annotation: usually an S shaped function]

4 coin-tosses: how many possible outcomes?
0000 0001 0010 0011
0100 0101 0110 0111
1000 1001 1010 1011
1100 1101 1110 1111
• How many with 0 heads? 1
• How many with 1 head? 4 = 4C1
• How many with 2 heads? 6 = 4C2 = 4×3 / (2!)
• How many with 3 heads? 4 = 4C3
• How many with 4 heads? 1
Combinations: nCr = no of ways of choosing r from n = n(n-1)…(n-r+1) / (r!)

Binomial Prob Distribution
• Distributions have up to now been estimated.
• For random processes with just 2 outputs, we can derive a true distribution:
• If p = prob(Heads), the prob of getting Heads exactly r times in n independent coin-tosses is:
  $^{n}C_{r}\, p^{r} (1-p)^{n-r}$
• For a fair coin, p = 0.5, this becomes $^{n}C_{r} / 2^{n}$
• For a fair die, the prob of throwing 3 sixes in five throws is:
  $[5 \times 4 \times 3 / (3 \times 2 \times 1)]\,(1/6)^{3}\,(5/6)^{2} \approx 0.032$

Implementing formula (fair coin)
    p = 0.5;   % for fair coin tossing
    n = 20;
    for r=0:n
        nCr = prod(n:-1:(n-r+1))/prod(1:r);    % n(n-1)...(n-r+1) / r!
        P(1+r) = nCr * (p^r) * (1-p)^(n-r);    % prob of exactly r heads
    end;
    figure(4); stem(0:20,P);
    axis([0 20 0 0.2]); grid on;

True prob distribution (n=20)
[Figure: stem plot for a fair coin. x-axis: No of heads obtainable with n coin-tosses (0 to 20); y-axis: True probability of getting that no of heads (0 to 0.2)]

True probability from formula
For 0-9 heads:   0  0  0.0002  0.0011  0.0046  0.015  0.037  0.074  0.12  0.16
For 10-19 heads: 0.176  0.16  0.12  0.074  0.037  0.015  0.0046  0.0011  0.0002  0
For 20 heads:    0
• True prob of getting 12 heads with a fair coin is 0.12.
• Changing p to 0.6 (the ‘60-40’ weighted coin, simulated earlier with the test R > 0.4), we find that the true probability of getting 12 heads out of 20 is 0.18.

Back to Bayes Theorem
• There are 2 coins: a fair one & a ‘60-40’ weighted one.
• We choose a coin at random & toss it 20 times.
• What is the probability of having the weighted coin when we get 12 heads out of 20?
• A = ‘coin is weighted 60-40’ & B = ‘get 12 heads out of 20’
• We know that P(B | Fair coin) is 0.12 & P(B | A) is 0.18.
• Each coin is equally likely to be chosen, so P(B) = 0.5 × 0.12 + 0.5 × 0.18 = 0.15, the average of 0.12 & 0.18.
• P(A | B) = P(B | A) P(A) / P(B) = 0.18 × 0.5 / 0.15 = 0.6

Further illustration of Bayes Theorem
• At a college there are:
  Country   Students   Girls   Boys
  France    10         5       5
  UK        15         5       10
  Canada    20         5       15

Calculation
• If we choose a student at random, the a-priori probability that this student is French is P(French) = 10/45 = 2/9 ≈ 0.22
• Now if we notice that this student is a boy, how does this change the probability that the student is French?
• Use Bayes’ Theorem as follows:
  $P(\mathrm{French} \mid \mathrm{Boy}) = \frac{P(\mathrm{Boy} \mid \mathrm{French})\,P(\mathrm{French})}{P(\mathrm{Boy})} = \frac{0.5 \times (10/45)}{30/45} = \frac{1}{6} \approx 0.167$
• The fact that we notice that the chosen student is a boy gives us additional information that changes the probability that the student chosen at random will be French.

Check the calculation
• We can check the previous result by common sense, noticing that out of 30 boys in the college, 5 are from France. Therefore, P(French | Boy) = 5/30 = 1/6.

Usefulness of Bayes Theorem
• In general Bayes’ theorem allows us to take additional information into account when calculating probabilities. Without the additional information, we have a ‘prior’ probability and with it we have a ‘conditional’ or ‘posterior’ probability.

Bayes Theorem in medicine
• A patient goes to a doctor with a bad cough & a fever. The doctor needs to decide whether he has ‘swine flu’.
• Let statement S = ‘has bad cough and fever’ and statement F = ‘has swine flu’.
• The doctor consults his medical books and finds that about 40% of patients with swine-flu have these same symptoms.
• Assuming that, currently, about 1% of the population is suffering from swine-flu and that currently about 5% have bad cough and fever (due to many possible causes including swine-flu), we can apply Bayes theorem to estimate the probability of this particular patient having swine-flu.

Another problem to solve
• A doctor in another country knows from his text-books that for 40% of patients with swine-flu, the statement S, ‘has bad cough and fever’ is true. He sees many patients and comes to believe that the probability that a patient with ‘bad cough and fever’ actually has swine-flu is about 0.1 or 10%. If there were reason to believe that, currently, about 1% of the population have a bad cough and fever, what percentage of the population is likely to be suffering from swine-flu?

Some questions from Lecture 2
• Analyse the fictitious exam results & comment on features.
• Compute means, stds & vars for each subject & histograms for the distributions.
• Make observations about performance in each subject & overall.
• Do marks support the hypothesis that people good at Music are also good at Maths?
• Do they support the hypothesis that people good at English are also good at French?
• Do they support the hypothesis that people good at Art are also good at Maths?
• If you have access to only 50 rows of this data, investigate the same hypotheses.
• What conclusions could you draw, and with what degree of certainty?

Continuous random processes
• Characterised by probability density functions (pdf).
• Prob of the random variable x lying between a and b is:
  $\mathrm{Prob} = \int_{a}^{b} \mathrm{pdf}(x)\,dx$
[Figure: Uniform pdf. pdf(x) plotted against x, with a and b marked on the x-axis]
Gaussian (Normal) pdf with mean m & std dev σ:
  $\mathrm{pdf}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^{2}}$
  $\mathrm{Prob} = \int_{a}^{b} \mathrm{pdf}(x)\,dx$
[Figure: Gaussian pdf. pdf(x) plotted against x, with a, b, m-σ, m & m+σ marked on the x-axis]
• 68% of the probability lies within m ± σ
• 95.5% for m ± 2σ
• 99.7% for m ± 3σ

pdf & Histograms
    Ru = rand(10000,1);    % 10000 unif samples
    hist(Ru,20);
    Rg = randn(10000,1);   % Gaussian with m=0, std=1
    hist(Rg,20);
[Figure: two 20-bin histograms of 10000 samples each, one for the uniform samples (x from 0 to 1) and one for the Gaussian samples (x from about -4 to 5)]

Converting histogram to estimate of pdf
• Divide each column by the number of samples.
• Then divide by the bin width. For samples spanning a range of width 1, such as those from ‘rand’, this is the same as multiplying by the number of bins.
• For a better approximation, increase the number of bins (and the number of samples).

Concept of a ‘null-hypothesis’
• A null-hypothesis is an assumption that is made and then tested by a set of experiments designed to reveal that it is likely to be false, if it is false.
• Testing is done by considering how probable the results are, assuming the null-hypothesis is true.
• If the results appear very improbable the researcher may conclude that the null-hypothesis is likely to be false.
• This is usually the outcome the researcher hopes for when he or she is trying to show that a new technique is likely to have some value.

An example
• Assume we wish to find out if a proposed technique designed to benefit users of a system is likely to have any value.
• Divide the users into two groups and offer the proposed technique to one group and something different to the other group.
• The null-hypothesis would be that the proposed technique offers no measurable advantage over the other techniques.

The testing
• This would be carried out by looking for differences between the sets of results obtained for each of the two groups.
• Careful experimental design will try to eliminate differences not caused by the techniques being compared.
• Must take a large number of users in each group & randomize the way the users are assigned to groups.
• Once other differences have been eliminated as far as possible, any remaining difference will hopefully be indicative of the effectiveness of the techniques being investigated.
• The vital question is whether the remaining differences are likely to be due to the advantages of the new technique, or to the inevitable random variations that arise from the other factors.
• Are the differences statistically significant?
• Can employ a statistical significance test to find out.

Failure of the experiment
• If the results are not found to look improbable under the null-hypothesis, i.e. if the differences between the two groups are not statistically significant, then no conclusion can be made.
• The null-hypothesis could be true, or it could still be false.
• It would be a mistake to conclude that the ‘null-hypothesis’ has been proved likely to be true in this circumstance.
• It is quite possible that the results of the experiment give insufficient evidence to draw any conclusions at all.

P-Value
• Probability of obtaining a test result at least as extreme as the one that was actually observed, assuming that the null-hypothesis is true.
• Reject the null-hypothesis if the p-value is less than some value α (significance level) which is often 0.05 or 0.01.
• When the null-hypothesis is rejected, the result is said to be statistically significant.

Checking whether a coin is fair
Suppose we obtain heads 14 times out of 20 flips. The p-value for this test result would be the probability of a fair coin landing on heads at least 14 times out of 20 flips. This is:
  $\left({}^{20}C_{14} + {}^{20}C_{15} + {}^{20}C_{16} + {}^{20}C_{17} + {}^{20}C_{18} + {}^{20}C_{19} + {}^{20}C_{20}\right) / 2^{20} \approx 0.058$
This is the probability that a fair coin would give a result as extreme as or more extreme than 14 heads out of 20.

Significance test
• Reject the null-hypothesis if p-value ≤ α.
• If α = 0.05, the rejection of the null-hypothesis is at the 5% (significance) level.
• The probability of wrongly rejecting the null-hypothesis when it is in fact true (Type 1 error) will be equal to α.
• This is considered sufficiently low.
• In this case, p-value > 0.05, therefore the observation is consistent with the null-hypothesis and we cannot reject it.
• Cannot conclude that the coin is likely to be unfair.
• But we have NOT proved that the coin is likely to be fair.
• 14 heads out of 20 flips can be ascribed to chance alone.
• It falls within the range of what could happen 95% of the time with a fair coin.
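
The check of whether the coin is fair can be reproduced numerically. What follows is a minimal MATLAB sketch, not taken from the original slides, that sums the binomial counts for 14 to 20 heads with the built-in ‘nchoosek’ and compares the resulting one-sided p-value with α = 0.05. The variable names (n, k, alpha, tailCount, pValue) are illustrative choices, and the null-hypothesis is assumed to be ‘the coin is fair’.

```matlab
% Sketch: one-sided p-value for "at least k heads in n flips of a fair coin".
% Assumption: the null-hypothesis is that the coin is fair (p = 0.5).
n = 20;         % number of flips
k = 14;         % observed number of heads
alpha = 0.05;   % significance level

tailCount = 0;
for r = k:n
    tailCount = tailCount + nchoosek(n, r);   % nCr = number of outcomes with exactly r heads
end
pValue = tailCount / 2^n;   % all 2^n outcomes are equally likely for a fair coin

fprintf('p-value = %.4f\n', pValue);
if pValue <= alpha
    disp('Reject the null-hypothesis at the 5% level');
else
    disp('Cannot reject the null-hypothesis at the 5% level');
end
```

With n = 20 and k = 14 the sum of counts is 60460 out of 2^20 = 1048576 outcomes, which agrees with the value of about 0.058 quoted above, so the second branch is taken. Changing k to 15 would be one way to see how quickly the tail probability falls.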