Probability and Random Variables (Rees: §5.1 - §5.4)

Earlier in this course, we looked at methods of describing the data in a sample. Next we would like to have models for the ways in which data can arise. Before we can develop these models, we need some probability theory.

There are many situations (experiments) in which we do not know in advance what will happen.

Examples
• A toss of a fair coin.
• Genetics.
• The weather in Edinburgh tomorrow.
• The winner of a particular horse race.

The result of an experiment is called an outcome. An event is a collection of the possible outcomes from an experiment.

Examples
A = “Get a head from a toss of a fair coin”.
B = “Get cystic fibrosis if parents are carriers”.
C = “It will rain or snow in Edinburgh tomorrow”.
D = “Fat Chance wins the 3.00 at Kempton”.

Probability uses a numeric scale to describe how likely these events are. Let E be an event (something that can happen!). We will assign a probability Pr(E) to that event such that 0 ≤ Pr(E) ≤ 1.
• Pr(E) = 0 means “(almost) never happens”.
• Pr(E) = 1 means “(almost) always happens”.

For event A, there are 2 possible outcomes and both are equally likely, so that Pr(A) = 0.5.

For event B, there are 4 equally likely outcomes and only one of these leads to the disease, so that Pr(B) = 0.25.

For events C and D, or if the coin is not fair, the meaning of probability is less clear. One approach is to consider it to be the proportion of successes in a very large number of trials. Thus for event C, one could consider weather records from previous years for tomorrow’s date. For event D, this approach is impractical, but we can still do the corresponding ‘thought experiment’. This approach works well in practice.
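The idea of probability as a long-run proportion can be tried out by simulation. The sketch below is only an illustration and is not taken from Rees: the function name relative_frequency, the fixed seed and the number of trials are all assumptions chosen for this example. It tosses a simulated fair coin many times and reports the proportion of heads, which should settle close to Pr(A) = 0.5.

```python
import random


def relative_frequency(event, trial, n_trials, seed=0):
    """Estimate Pr(event) as the proportion of trials in which it occurs.

    `trial` is a function that takes a random generator and returns one outcome;
    `event` is a function that returns True if the outcome belongs to the event.
    (Both names are hypothetical, introduced only for this sketch.)
    """
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    successes = sum(1 for _ in range(n_trials) if event(trial(rng)))
    return successes / n_trials


def toss_coin(rng):
    # A fair coin: the two outcomes are equally likely.
    return rng.choice(["H", "T"])


estimate = relative_frequency(lambda outcome: outcome == "H", toss_coin, 100_000)
print(f"Proportion of heads in 100000 tosses: {estimate:.4f}")
```

The same function could be reused with a different trial function (for example, a throw of a die) to check the equally likely outcome calculations by experiment.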
If there are n outcomes which are equally likely, then each outcome has probability 1/n. If the event E occurs for r of these outcomes, then

P(E) = r/n = (number of favourable outcomes) / (number of possible outcomes)

Probability satisfies various rules:
• 0 ≤ Pr(A) ≤ 1
• Pr(A) + Pr(Not A) = 1 (Either A occurs or it does not)

Betting odds compare P(E) with P(Not E).

Example: Throw a fair die.
Probability of getting a 6 is 1/6.
Odds are 5-1 against getting 6.
Odds are 5-1 on getting ‘Not 6’.
Most people find odds easier to understand as they avoid small decimal numbers, but they are difficult to combine.

If E and F are events, then:
(E And F) are all the outcomes that are in both E and F.
(E Or F) are all the outcomes that are in E or in F or in both.
These events can be shown on a Venn diagram.

[Venn diagram: two overlapping regions labelled E and F inside an outer rectangle]

The outer rectangle represents all the possible outcomes. The regions labelled E and F represent the outcomes in those events. The overlap region of E and F represents the event “E And F”. The larger outline that contains both letters represents the event “E Or F”.

In the Venn diagram, it is possible to draw all of the events so that the area corresponds to the probability of an event. This implies that the area enclosed by the outer rectangle is one unit. It is clear from the diagram that:

Pr(E OR F) = Pr(E) + Pr(F) − Pr(E AND F)

The subtraction of the ‘AND’ term is because adding the two probabilities means the overlap region has been counted twice. This is called the “ADDITION LAW” or the “OR LAW” (see Rees §5.10).

If E and F cannot both occur together, they are said to be disjoint or mutually exclusive events. On a Venn diagram, this corresponds to there being no overlap of their shapes. Then, the addition law simplifies to:

Pr(E OR F) = Pr(E) + Pr(F)

Example: Athletes data.
Distance    USA    GB    Kenya    Total
Sprinter      7     5        0       12
400m          2     2        0        4
Middle        2     4        7       13
Long          1     1        5        7
Total        12    12       12       36

If an athlete is picked at random:

Pr(400m) = 4/36 = 1/9
Pr(GB) = 12/36 = 1/3
Pr(USA or GB) = Pr(USA) + Pr(GB)   (because mutually exclusive)
              = 12/36 + 12/36 = 2/3
Pr(USA or Sprinter) = Pr(USA) + Pr(Sprinter) − Pr(USA and Sprinter)
                    = 12/36 + 12/36 − 7/36 = 17/36

Note that the events “Sprinter” and “Kenyan” are mutually exclusive, because there are no athletes who belong to both events.

In the previous example, each of the individuals was equally likely to be selected. Then:

Probability = (Number of favourable outcomes) / (Total number of outcomes)

Example: Gregor Mendel carried out breeding experiments on peas. Each pea was classified as (Round or Wrinkled) and (Yellow or Green). The results from one of these experiments were summarised as:

          Round    Wrinkled    Total
Yellow      315         101      416
Green       108          32      140
Total       423         133      556

For peas from this experiment:

Pr(Green) = 140/556 = 0.252
Pr(Wrinkled) = 133/556 = 0.239

Mendel noticed that both of these values are close to 1/4 = 0.25.

Pr(Green AND Wrinkled) = 32/556 = 0.058

This is close to 1/16 = 0.0625.

This suggested a theoretical model for his results.

            Yellow    Green
Round         9/16     3/16
Wrinkled      3/16     1/16

We now know that the Wrinkled and Green traits are controlled by recessive genes and these are on different chromosomes. Genetic theory would therefore predict these probabilities. This justification of the theoretical model was not available in Mendel’s time.

Problem: How should we decide if a data set is consistent with a possible theoretical model? This will be considered later in the course.

Example: Roulette
There are 37 numbers on the wheel: 0 to 36.
• 18 are red (9 odd and 9 even)
• 18 are black (9 odd and 9 even)
• 1 is green (zero)

We can display this in a table:

         Red    Black    Green    Total
Odd        9        9        0       18
Even       9        9        0       18
Zero       0        0        1        1
Total     18       18        1       37

Consider a spin of the wheel and assume it is fair, so that all numbers are equally likely. Let A = ‘Red’ and B = ‘Odd’.
Pr(Red) = Pr(A) = (Number of reds) / (Number of outcomes) = 18/37
Pr(Odd) = Pr(B) = (Number of odds) / (Number of outcomes) = 18/37
Pr(Red AND Odd) = (Number Red AND Odd) / (Number of outcomes) = 9/37

For ‘OR’ questions, it is usually best to use the Addition Law:

Pr(Red OR Odd) = Pr(Red) + Pr(Odd) − Pr(Red AND Odd)
               = 18/37 + 18/37 − 9/37
               = 27/37

Complementary Events

If A is an event, then the event “Not A” is said to be the complementary event. We have:

Pr(A) + Pr(Not A) = 1

The use of a complementary event can make the calculation of a probability easier.

Example: In the Mendel pea experiments, ‘Round’ and ‘Wrinkled’ were complementary events.

Example: Two dice are thrown. It helps to suppose that one die is Red while the other is Blue. Possible outcomes, written as (Red, Blue), are:

                       Blue
Red      1      2      3      4      5      6
 1     (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
 2     (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
 3     (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
 4     (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
 5     (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
 6     (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)

There are 36 possible outcomes, all equally likely.

Pr(Total = 3) = 2/36
Pr(Total = 7) = 6/36
Pr(Both odd) = 9/36
Pr(Red = Blue) = 6/36
Pr(Red > Blue) = 15/36

Example: For 2 fair dice let:
E be the event ‘Sum is 4’
F be the event ‘Dice show same even number’.
Find the probability that at least one of these events occurs.

P(E) = P(1, 3) + P(2, 2) + P(3, 1) = 3/36
P(F) = P(2, 2) + P(4, 4) + P(6, 6) = 3/36

These are not mutually exclusive because

P(E And F) = P(2, 2) = 1/36

Using the addition rule of probability:

P(E Or F) = P(E) + P(F) − P(E And F)
          = 3/36 + 3/36 − 1/36
          = 5/36

Example: For 2 fair dice (Red and Blue) let E be the event ‘At least one 6’.

P(E) = P(Red 6) + P(Blue 6) − P(Both 6)
     = 1/6 + 1/6 − 1/36
     = 11/36

Conditional Probability

Knowledge about the occurrence of one event may affect the probability of another event.

Example: Using straws to draw lots. Five students use 4 long straws and one short straw to decide who will buy the next round of drinks.
Before the draw, each student is equally likely to draw the short straw. However, if you are last to draw and the other four were long, it must be you.

Example: A standard pack of 52 cards has four suits (clubs, diamonds, hearts and spades) with 13 cards in each. Draw 2 cards at random without replacement.

P(First card is Club) = 13/52 = 0.250
P(Second card is Club) = 13/52 = 0.250

But, if we know that the first card is a Club, then:

P(Second card is Club) = 12/51 = 0.235

If we know that the first card is not a Club, then:

P(Second card is Club) = 13/51 = 0.255

Note: It is information that leads to a change in the probability, not the drawing process.

Example: Throw 2 fair dice, one Red and one Blue. The table gives the probabilities of having values for the Red die if we know the sum of the 2 dice.

              Value on Red die
Sum      1      2      3      4      5      6
  2      1      0      0      0      0      0
  3     1/2    1/2     0      0      0      0
  4     1/3    1/3    1/3     0      0      0
  5     1/4    1/4    1/4    1/4     0      0
  6     1/5    1/5    1/5    1/5    1/5     0
  7     1/6    1/6    1/6    1/6    1/6    1/6
  8      0     1/5    1/5    1/5    1/5    1/5
  9      0      0     1/4    1/4    1/4    1/4
 10      0      0      0     1/3    1/3    1/3
 11      0      0      0      0     1/2    1/2
 12      0      0      0      0      0      1

The various probabilities in the previous examples are Conditional Probabilities. We write:

P(Second card is Club | First is club) = 0.235
P(Red die shows 3 | Sum is 5) = 1/4 = 0.25

In speech, the vertical bar is read as “given that”.

To calculate a Conditional Probability
• Find all of the outcomes consistent with both events
• Find all of the outcomes consistent with the condition
Then the Conditional Probability is the ratio of the corresponding probabilities.

So P(E | C) = P(E And C) / P(C)
Or P(E And C) = P(E | C) P(C)

Example: 2 fair dice, Red and Blue. Calculate P(Red die shows 1 Or 2 | Sum is 6)
• 2 outcomes are consistent with both events: these are (1, 5) and (2, 4). So probability = 2/36
• 5 outcomes are consistent with a sum of 6: these are (1, 5) (2, 4) (3, 3) (4, 2) (5, 1). So probability = 5/36

So P(Red 1 Or Red 2 | Sum is 6) = (2/36) / (5/36) = 2/5

Example: Draw cards from standard pack without replacement.
P(First 2 cards are clubs) = P(first club) P(second club | first club)
                           = 13/52 × 12/51
                           = 1/17

P(First 3 cards are clubs) = P(first 2 clubs) P(3rd club | first 2 clubs)
                           = 13/52 × 12/51 × 11/50
                           = 0.0129

Example: Win jackpot on National Lottery. Draw 6 numbers out of 49 without replacement.

P(Win) = 6/49 × 5/48 × 4/47 × 3/46 × 2/45 × 1/44 = 7.2 × 10^−8

That is, odds of 13983815 : 1 against. With 2 draws a week and one entry per draw, you will win on average about once every 134 thousand years!

Example: Can consider birthdays as being drawn from the 365 possibilities with approximately equal probabilities.

P(2 out of 2 different) = 364/365 = 0.9973
P(3 out of 3 different) = 364/365 × 363/365 = 0.9918
P(4 out of 4 different) = 364/365 × 363/365 × 362/365 = 0.9836
P(23 out of 23 different) = 364/365 × 363/365 × · · · × 343/365 = 0.4927 < 0.5

In the last two examples, the drawing was done without replacement. The probabilities of interest were for either all matching or none matching respectively. If the required probability is for a partial match, the calculations are rather more complicated.

Example: Ten bats out of a colony of 50 bats have been ringed. Five bats are caught at random from this colony. Assume that each bat is equally likely to be caught. Then the probability that the first four bats caught are unmarked and the last one caught has a ring is:

40/50 × 39/49 × 38/48 × 37/47 × 10/46 = 0.08627

In a similar way, the probability could be calculated that only the first bat of five already had a ring. You should check that the probability comes out to be the same! This is true for any other position in the sequence, so the probability that only one bat from five trapped at this colony already has a ring is:

5 × 0.08627 = 0.431

Example: Four playing cards, two red and two black. Two cards are chosen at random. What is the probability that they are the same colour?

Answer: Suppose that one card has been chosen. There are 3 possibilities for the second card. Only one of these is the same colour as the first card. So the probability is 1/3.
This can also be considered as the number of ways of choosing 2 items out of 4 (see later).

Example: 3 cards. One card is blank on both sides, one has X on both sides and the third has a side of each type. A card is selected at random and one side shown. If this is blank, what is the probability that the other side is blank?

Answer: There are 3 sides that are blank out of the 6 possible. So P(Blank side chosen) = 3/6 = 1/2. However, of these 3 sides, 2 have blank sides on the reverse. So P(Blank on reverse | Blank side chosen) = 2/3.

Example: TV game show (Monty Hall). There are 3 doors with a good prize behind only one door. The game show host knows which door conceals the prize. The show winner selects one of the 3 doors. The host opens one of the other 2 doors and shows that it does not hide a prize. Should the winner:
• Stay with the selected door.
• Switch to the unopened door.
• Make random choice of 2 doors.

Answer: Probability of the good prize for the selected door is 1/3. Opening one of the other doors does not change this, so the probability for the other unopened door is 2/3. So the winner should always switch!

You can play this game at: http://math.ucsd.edu/~crypto/Monty/monty.html
Wikipedia has more information. The situation is not the same as that in the TV programme ‘Deal or No Deal’.

Example: Suppose there are 2 identical opaque bags A and B. Bag A contains 3 red counters and 1 blue counter. Bag B contains 2 red counters and 2 blue counters. One of the bags is chosen at random and a counter removed. If the selected counter is red, what is the probability that the chosen bag was bag A?

Suppose we did 200 trials of this experiment. What results would we expect to get?

          Red    Blue    Total
Bag A      75      25      100
Bag B      50      50      100
Total     125      75      200

Thus we would expect to get 125 red counters, and in 75 of these cases it would come from bag A.
So P(Bag A | Red counter) = 75/125 = 0.6

Note: If we actually did 200 trials, we are very unlikely to get the exact table above. Finding the expected numbers is just a device to make the calculations easier to understand.

A Tree Diagram (Rees §5.13) is a good way of representing small problems.

Bag A (probability 1/2)
    Red (3/4):   P = 3/8
    Blue (1/4):  P = 1/8
Bag B (probability 1/2)
    Red (1/2):   P = 2/8
    Blue (1/2):  P = 2/8

P(Bag A | Red counter) = (3/8) / (3/8 + 2/8) = 3/5 = 0.6

This is an example of the use of Bayes Rule. The calculations above using a table of expected numbers or using a tree diagram can be written as a formula. Most students find the formula unhelpful and prefer to use one of these other methods.

Example: Cystic Fibrosis is the most serious of the human genetic diseases in the UK. 5% of population are carriers but have no symptoms. 25% of children of 2 carriers have the disease. There is a diagnostic test which is positive with probability 0.85 for carriers and is always negative for non-carriers. If someone tests negative, what is the probability of being a carrier?

From 2000 individuals we would expect 100 to be carriers (= 5%) and 85 of these to test positive. We expect the rest (1900) to be non-carriers and to test negative.

                Positive    Negative    Total
Carriers              85          15      100
Non-carriers           0        1900     1900
Total                 85        1915     2000

So P(Carrier | Negative) = 15/1915 = 0.0078

Example: A factory makes equal numbers of cars on each weekday.

P(Fault | Monday) = 0.20
P(Fault | Friday) = 0.15
P(Fault | Other day) = 0.10

P(Monday car | Fault) = 0.20 / (0.20 + 0.10 + 0.10 + 0.10 + 0.15)
                      = 20/65
                      = 0.308

A Venn diagram for this:

[Venn diagram with regions labelled Mon, Tue, Wed, Thu, Fri]
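The table of expected numbers and the tree diagram both carry out the same arithmetic: multiply the probability of each possible cause by the probability of what was observed given that cause, then divide by the total. As a rough sketch only (the function name posterior and the dictionary layout are assumptions made for this illustration, not notation from the course or from Rees), the bag, carrier and factory examples can be checked with exact fractions:

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(cause | observation) for each cause.

    priors[c]      = P(cause c)
    likelihoods[c] = P(observation | cause c)
    (Hypothetical helper, written only for this sketch.)
    """
    # Branch probabilities of the tree: P(cause And observation).
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    total = sum(joint.values())  # P(observation)
    return {c: joint[c] / total for c in joint}


# Bags: each bag chosen with probability 1/2; the observation is "Red counter".
bags = posterior(
    {"A": Fraction(1, 2), "B": Fraction(1, 2)},
    {"A": Fraction(3, 4), "B": Fraction(2, 4)},
)

# Carriers: 5% are carriers; the observation is "test is Negative".
carrier = posterior(
    {"Carrier": Fraction(5, 100), "Non-carrier": Fraction(95, 100)},
    {"Carrier": Fraction(15, 100), "Non-carrier": Fraction(1)},
)

# Factory: equal numbers of cars each weekday; the observation is "Fault".
fault_rate = {"Mon": Fraction(20, 100), "Tue": Fraction(10, 100),
              "Wed": Fraction(10, 100), "Thu": Fraction(10, 100),
              "Fri": Fraction(15, 100)}
cars = posterior({day: Fraction(1, 5) for day in fault_rate}, fault_rate)

print("P(Bag A | Red) =", bags["A"])
print("P(Carrier | Negative) =", carrier["Carrier"])
print("P(Monday car | Fault) =", cars["Mon"])
```

Using Fraction rather than decimals keeps the results in the same form as the worked answers, so they can be compared directly with 3/5, 15/1915 and 20/65 after cancelling.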