Discrete Probability
CSC-2259 Discrete Structures
Konstantin Busch - LSU
Introduction to Discrete Probability

Unbiased die

Sample space: the set of all possible outcomes
S = {1, 2, 3, 4, 5, 6}
Event: any subset of the sample space
E1 = {3}
E2 = {2, 5}

Experiment: a procedure that yields events
Example: throw the die
Probability of event E:

p(E) = |E| / |S|    (size of the event set over the size of the sample space)

Note that 0 ≤ p(E) ≤ 1, since 0 ≤ |E| ≤ |S|.
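To make the definition concrete, here is a minimal Python sketch (the function name is illustrative, not from the slides):

from fractions import Fraction

def probability(event, sample_space):
    # p(E) = |E| / |S| under the uniform distribution
    return Fraction(len(set(event) & set(sample_space)), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
print(probability({3}, die))     # 1/6
print(probability({2, 5}, die))  # 1/3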
What is the probability that a die shows 3?

Event: E = {3}
Sample space: S = {1, 2, 3, 4, 5, 6}
Probability: p(E) = |E| / |S| = 1/6
What is the probability that a die shows 2 or 5?

Event: E = {2, 5}
Sample space: S = {1, 2, 3, 4, 5, 6}
Probability: p(E) = |E| / |S| = 2/6 = 1/3
Two unbiased dice

Sample space: 36 possible outcomes
S = {(1,1), (1,2), (1,3), ..., (6,6)}
Each outcome is an ordered pair: (first die, second die).
What is the probability that two dice show (1,1)?

Event: E = {(1,1)}
Sample space: S = {(1,1), (1,2), (1,3), ..., (6,6)}
Probability: p(E) = |E| / |S| = 1/36
What is the probability that two dice show the same number?

Event: E = {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}
Sample space: S = {(1,1), (1,2), (1,3), ..., (6,6)}
Probability: p(E) = |E| / |S| = 6/36 = 1/6
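The same answer can be checked by enumerating all 36 ordered pairs; a small sketch:

from itertools import product

S = list(product(range(1, 7), repeat=2))        # all 36 ordered pairs
E = [pair for pair in S if pair[0] == pair[1]]  # doubles
print(len(E), len(S), len(E) / len(S))          # 6 36 0.1666...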
Game with unordered numbers

The game authority selects a set of 6 winning numbers out of 40 (number choices: 1, 2, 3, ..., 40).
Example winning numbers: 4, 7, 16, 25, 33, 39

The player picks a set of 6 numbers (order is irrelevant).
Example player numbers: 8, 13, 16, 23, 33, 40

What is the probability that a player wins?
Winning event: E = {{4, 7, 16, 25, 33, 39}}, |E| = 1
(a single set with the 6 winning numbers)

Sample space:
S = {all subsets of 6 numbers out of 40}
  = {{1,2,3,4,5,6}, {1,2,3,4,5,7}, {1,2,3,4,5,8}, ...}

|S| = C(40, 6) = 3,838,380
Probability that the player wins:

p(E) = |E| / |S| = 1 / C(40, 6) = 1 / 3,838,380
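A quick check with Python's standard library:

from math import comb

print(comb(40, 6))      # 3838380
print(1 / comb(40, 6))  # ≈ 2.6e-07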
A card game

A deck has 52 cards: 13 kinds (2, 3, 4, 5, 6, 7, 8, 9, 10, a, k, q, j), each kind in 4 suits (h, d, c, s).
The player is dealt a hand of 4 cards.

What is the probability that the player's cards are all of the same kind?
Event: E = {{2h,2d,2c,2s}, {3h,3d,3c,3s}, ..., {jh,jd,jc,js}}
|E| = 13 (each set of 4 cards is of the same kind)

Sample space:
S = {all possible sets of 4 cards out of 52}
  = {{2h,2d,2c,2s}, {2h,2d,2c,3h}, {2h,2d,2c,3d}, ...}

|S| = C(52, 4) = 52! / (4! 48!) = (52 · 51 · 50 · 49) / (4 · 3 · 2) = 270,725
Probability that the hand has 4 cards of the same kind:

p(E) = |E| / |S| = 13 / C(52, 4) = 13 / 270,725
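Again straightforward to confirm numerically:

from math import comb

print(13 / comb(52, 4))  # ≈ 4.8e-05, i.e. exactly 1 in 20,825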
Game with ordered numbers

The game authority selects 5 balls from a bin, in some order; the balls are labeled with numbers 1...50 (number choices: 1, 2, 3, ..., 50).
Example winning numbers: 37, 4, 16, 33, 9

The player picks a sequence of 5 numbers (order is important).
Example player numbers: 40, 16, 13, 25, 33

What is the probability that a player wins?
Sampling without replacement: after a ball is selected, it is not returned to the bin.

Sample space size: 5-permutations of 50 balls
|S| = P(50, 5) = 50! / (50 - 5)! = 50! / 45! = 50 · 49 · 48 · 47 · 46 = 254,251,200

Probability of success: p(E) = |E| / |S| = 1 / 254,251,200
Sampling with replacement: after a ball is selected, it is returned to the bin.

Sample space size: 5-permutations of 50 balls with repetition
|S| = 50^5 = 312,500,000

Probability of success: p(E) = |E| / |S| = 1 / 312,500,000
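Both sample-space sizes in one short sketch:

from math import perm

print(perm(50, 5))  # 254251200  (without replacement)
print(50 ** 5)      # 312500000  (with replacement)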
Probability of the complement:

p(Ē) = 1 - p(E)

Proof: Ē = S - E, so |Ē| = |S| - |E|, and therefore

p(Ē) = |Ē| / |S| = (|S| - |E|) / |S| = 1 - |E| / |S| = 1 - p(E)

End of proof.
Example: What is the probability that a binary string of 8 bits contains at least one 0?

E = {01111111, 10111111, ..., 00111111, ..., 00000000}
Ē = {11111111}

p(E) = 1 - p(Ē) = 1 - |Ē| / |S| = 1 - 1/2^8
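A quick enumeration check of the complement rule:

from itertools import product

S = ["".join(bits) for bits in product("01", repeat=8)]
E = [s for s in S if "0" in s]
print(len(E) / len(S), 1 - 1 / 2**8)  # both 0.99609375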
Probability of union: for events E1, E2 ⊆ S,

p(E1 ∪ E2) = p(E1) + p(E2) - p(E1 ∩ E2)

Proof: |E1 ∪ E2| = |E1| + |E2| - |E1 ∩ E2|, so

p(E1 ∪ E2) = |E1 ∪ E2| / |S|
           = (|E1| + |E2| - |E1 ∩ E2|) / |S|
           = |E1| / |S| + |E2| / |S| - |E1 ∩ E2| / |S|
           = p(E1) + p(E2) - p(E1 ∩ E2)

End of proof.
Example: What is the probability that a binary string of 8 bits starts with 0 or ends with 11?

Strings that start with 0:
E1 = {00000000, 00000001, ..., 01111111}
|E1| = 2^7 (all binary strings of the form 0xxxxxxx)

Strings that end with 11:
E2 = {00000011, 00000111, ..., 11111111}
|E2| = 2^6 (all binary strings of the form xxxxxx11)
Strings that start with 0 and end with 11:
E1 ∩ E2 = {00000011, 00000111, ..., 01111111}
|E1 ∩ E2| = 2^5 (all binary strings of the form 0xxxxx11)

Strings that start with 0 or end with 11:

p(E1 ∪ E2) = p(E1) + p(E2) - p(E1 ∩ E2)
           = |E1| / |S| + |E2| / |S| - |E1 ∩ E2| / |S|
           = 2^7/2^8 + 2^6/2^8 - 2^5/2^8
           = 1/2 + 1/4 - 1/8
           = 5/8
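The same 5/8 falls out of direct enumeration; a short sketch:

from itertools import product

S = ["".join(bits) for bits in product("01", repeat=8)]
E = [s for s in S if s.startswith("0") or s.endswith("11")]
print(len(E) / len(S))  # 0.625 = 5/8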
Probability Theory

Sample space: S = {x1, x2, ..., xn}

A probability distribution function p satisfies:
0 ≤ p(xi) ≤ 1
Σ_{i=1}^{n} p(xi) = 1
Notice that we can have p(xi) ≠ p(xj).

Example: Biased coin
Heads (H) with probability 2/3, tails (T) with probability 1/3.

Sample space: S = {H, T}
p(H) = 2/3, p(T) = 1/3
p(H) + p(T) = 2/3 + 1/3 = 1
Uniform probability distribution: p(xi) = 1/n for sample space S = {x1, x2, ..., xn}.

Example: Unbiased coin
Heads (H) or tails (T), each with probability 1/2:
S = {H, T}, p(H) = p(T) = 1/2
Probability of event E = {x1, x2, ..., xk} ⊆ S:

p(E) = Σ_{i=1}^{k} p(xi)

For the uniform probability distribution this reduces to p(E) = |E| / |S|.
Example: Biased die, S = {1, 2, 3, 4, 5, 6}
p(1) = p(2) = p(3) = p(4) = p(5) = 1/7, p(6) = 2/7

What is the probability that the die outcome is 2 or 6?

E = {2, 6}
p(E) = p(2) + p(6) = 1/7 + 2/7 = 3/7
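Under a non-uniform distribution the event probability is the sum over its outcomes; a small sketch of this example:

from fractions import Fraction

# p(1) = ... = p(5) = 1/7, p(6) = 2/7
p = {x: Fraction(1, 7) for x in range(1, 6)}
p[6] = Fraction(2, 7)
print(sum(p[x] for x in {2, 6}))  # 3/7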
Combinations of events:

Complement: p(Ē) = 1 - p(E)
Union: p(E1 ∪ E2) = p(E1) + p(E2) - p(E1 ∩ E2)
Union of disjoint events: p(⋃_i Ei) = Σ_i p(Ei)
Conditional Probability

Three tosses of an unbiased coin, e.g. Tails, Heads, Tails.

Condition: the first coin is tails.
Question: What is the probability that there is an odd number of tails, given that the first coin is tails?
Sample space:
S = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH}

Restricted sample space given the condition (first coin is tails):
F = {TTT, TTH, THT, THH}
Event without the condition (odd number of tails):
E = {TTT, THH, HTH, HHT}

Event with the condition (first coin is tails):
E_F = E ∩ F = {TTT, THH}
F  {TTT , TTH , THT , THH }
EF  E  F  {TTT , THH }
Given condition,
the sample space changes to F
| E  F | | E  F | / | S | p( E  F ) 2 / 8
p( EF ) 



 0.5
|F|
|F|/|S|
p( F )
4/8
(the coin is unbiased)
Konstantin Busch - LSU
33
Notation for the event with a condition: E_F = E | F, the event E given F.

p(E_F) = p(E | F) = p(E ∩ F) / p(F)
Conditional probability definition (for an arbitrary probability distribution):

Given a sample space S with events E and F (where p(F) > 0), the conditional probability of E given F is:

p(E | F) = p(E ∩ F) / p(F)
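The three-toss example above, checked by enumeration:

from itertools import product

S = ["".join(t) for t in product("HT", repeat=3)]
F = [s for s in S if s[0] == "T"]              # condition: first toss is T
E_and_F = [s for s in F if s.count("T") % 2]   # odd number of tails
print(len(E_and_F) / len(F))                   # 0.5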
Example: What is the probability that a family of two children has two boys, given that one child is a boy? Assume equal probability to have a boy or a girl.

Sample space: S = {BB, BG, GB, GG}
Condition: F = {BB, BG, GB} (one child is a boy)
Event: E = {BB} (both children are boys)

Conditional probability of the event:

p(E | F) = p(E ∩ F) / p(F) = p({BB}) / p({BB, BG, GB}) = (1/4) / (3/4) = 1/3
Independent Events

Events E1 and E2 are independent iff:
p(E1 ∩ E2) = p(E1) p(E2)

Equivalent definition (if p(E2) ≠ 0):
p(E1 | E2) = p(E1)
Example: 4-bit uniformly random strings

E1: the string begins with 1
E2: the string contains an even number of 1s

E1 = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111}
E2 = {0000, 0011, 0101, 0110, 1001, 1010, 1100, 1111}
E1 ∩ E2 = {1111, 1100, 1010, 1001}

|E1| = |E2| = 8, |E1 ∩ E2| = 4
p(E1) = p(E2) = 8/16 = 1/2
p(E1 ∩ E2) = 4/16 = 1/4 = (1/2)(1/2) = p(E1) p(E2)

Events E1 and E2 are independent.
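The independence condition is mechanical to verify; a sketch:

from itertools import product

S = ["".join(b) for b in product("01", repeat=4)]
E1 = {s for s in S if s[0] == "1"}
E2 = {s for s in S if s.count("1") % 2 == 0}
p1, p2, p12 = len(E1) / 16, len(E2) / 16, len(E1 & E2) / 16
print(p12 == p1 * p2)  # True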
Bernoulli trial: an experiment with two outcomes, success or failure.

Success probability: p
Failure probability: q = 1 - p

Example: Biased coin
Success = heads: p = p(H) = 2/3
Failure = tails: q = p(T) = 1/3
Independent Bernoulli trials: the outcomes of successive Bernoulli trials do not depend on each other. Example: successive coin tosses.
Throw the biased coin 5 times. What is the probability of getting 3 heads?

Heads probability (success): p = 2/3
Tails probability (failure): q = 1/3
HHHTT
HTHHT
HTHTH
THHTH
...

Total number of ways to arrange 3 heads in a sequence of 5 tosses: C(5, 3)
The probability that any particular sequence has 3 heads and 2 tails in specified positions is p^3 q^2. For example:

HHHTT: p · p · p · q · q = p^3 q^2
HTHHT: p · q · p · p · q = p^3 q^2
HTHTH: p · q · p · q · p = p^3 q^2
Probability of having 3 heads:

p^3 q^2 + p^3 q^2 + ... + p^3 q^2 = C(5, 3) p^3 q^2

(one term per successful sequence with 3 heads: the 1st sequence, the 2nd sequence, ..., up to the C(5,3)-th sequence)
Throw the biased coin 5 times. The probability of exactly 3 heads:

C(5, 3) p^3 q^2 = (5! / (3! 2!)) (2/3)^3 (1/3)^2 = 10 · (8/27) · (1/9) = 80/243 ≈ 0.329

Here p^3 q^2 is the probability of 3 heads and 2 tails in specified sequence positions, and C(5, 3) counts all possible ways to arrange 3 heads in a sequence of 5 tosses.
Theorem: The probability of exactly k successes in n independent Bernoulli trials is

C(n, k) p^k q^(n-k)

This is also known as the binomial probability distribution:

b(k; n, p) = C(n, k) p^k q^(n-k)
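A direct sketch of the formula (the helper name is illustrative):

from math import comb

def binomial_pmf(k, n, p):
    # b(k; n, p) = C(n,k) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial_pmf(3, 5, 2/3))  # ≈ 0.329, the 3-heads example above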
Proof:

C(n, k) p^k q^(n-k)

C(n, k) is the total number of sequences with k successes and n - k failures; p^k q^(n-k) is the probability that a sequence has k successes and n - k failures in specified positions (example: SFSFFS...SSF).

End of proof.
Example: Random binary strings where the probability of a 0 bit is 0.9 and the probability of a 1 bit is 0.1.

What is the probability of exactly 8 zero bits out of 10 bits (e.g. 0100001000)?

p = 0.9, q = 0.1, k = 8, n = 10

b(k; n, p) = C(n, k) p^k q^(n-k) = C(10, 8) (0.9)^8 (0.1)^2 = 0.1937102445
Birthday Problem

Birthday collision: two people have their birthday on the same day.

Problem: How many people should be in a room so that the probability of a birthday collision is at least 1/2?

Assumption: equal probability of being born on any day.
There are 366 days in a year. If the number of people is 367 or more, then a birthday collision is guaranteed by the pigeonhole principle.

Assume that we have n ≤ 366 people.
We will compute p_n, the probability that n people all have different birthdays. This gives us 1 - p_n, the probability that there is a birthday collision among n people.
Sample space: the Cartesian product

S = {1, 2, ..., 366} × {1, 2, ..., 366} × ... × {1, 2, ..., 366}

with one factor for each person's birthday choices (1st person, 2nd person, ..., nth person):

S = {(1,1,...,1), (2,1,...,1), ..., (366,366,...,366)}

Sample space size: |S| = 366 · 366 ··· 366 = 366^n
Event: each person's birthday is different (the entries for the 1st, 2nd, ..., nth person are all distinct)

E = {(1, 2, ..., 366), (366, 1, ..., 365), ..., (366, 365, ..., 1)}

Event size: counting the number of birthday choices left for each person,

|E| = P(366, n) = 366! / (366 - n)! = 366 · 365 · 364 ··· (366 - n + 1)
Probability of no birthday collision:

p_n = |E| / |S| = (366 · 365 · 364 ··· (366 - n + 1)) / 366^n

Probability of a birthday collision: 1 - p_n

n = 22: 1 - p_n ≈ 0.475
n = 23: 1 - p_n ≈ 0.506
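A short sketch that reproduces both values:

def collision_probability(n, days=366):
    # 1 - p_n, where p_n = days · (days-1) ··· (days-n+1) / days^n
    p_no_collision = 1.0
    for i in range(n):
        p_no_collision *= (days - i) / days
    return 1 - p_no_collision

print(collision_probability(22))  # ≈ 0.475
print(collision_probability(23))  # ≈ 0.506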
Therefore n = 23 people have probability at least 1/2 of a birthday collision, since 1 - p_23 ≈ 0.506.
The birthday problem analysis can be used to determine appropriate hash table sizes that minimize collisions.

Hash function collision: h(k1) = h(k2) for distinct keys k1 ≠ k2.
Randomized algorithms: algorithms that make randomized choices (example: quicksort).

Las Vegas algorithms: randomized algorithms whose output is always correct (e.g. quicksort).

Monte Carlo algorithms: randomized algorithms whose output is correct with some probability (they may produce wrong output).
A Monte Carlo algorithm:

Primality_Test(n, k) {
    for (i = 1 to k) {
        b = random_number(1, ..., n)
        if (Miller_Test(n, b) == failure)
            return(false)    // n is not prime
    }
    return(true)             // most likely n is prime
}
Miller_Test(n, b) {
    write n - 1 = 2^s · t, where s, t ≥ 0, s ≤ log n, and t is odd
    for (j = 0 to s - 1) {
        if (b^t ≡ 1 (mod n) or b^(2^j · t) ≡ -1 (mod n))
            return(success)
    }
    return(failure)
}
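A runnable Python sketch of the two procedures above, using the standard equivalent form of the Miller test; the range chosen for the random base b is an implementation choice, not from the slides:

import random

def miller_test(n, b):
    # Write n - 1 = 2^s * t with t odd.
    s, t = 0, n - 1
    while t % 2 == 0:
        s += 1
        t //= 2
    x = pow(b, t, n)
    if x == 1 or x == n - 1:      # b^t ≡ ±1 (mod n)
        return True
    for _ in range(s - 1):        # check b^(2^j t) ≡ -1 (mod n)
        x = pow(x, 2, n)
        if x == n - 1:
            return True
    return False                  # b witnesses that n is composite

def primality_test(n, k):
    for _ in range(k):
        b = random.randint(2, n - 2)   # assumed base range (n > 3, odd)
        if not miller_test(n, b):
            return False               # n is certainly not prime
    return True                        # prime with probability >= 1 - (1/4)^k

print(primality_test(104729, 20))  # True: 104729 is prime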
A prime number n passes the Miller test for every b in the range 1 < b < n.

A composite number n passes the Miller test for fewer than n/4 of the values b in the range 1 < b < n.

So each trial gives a false positive with probability less than 1/4.
If the primality test algorithm returns false, then the number is certainly not prime.

If the algorithm returns true, then the answer is correct (the number is prime) with high probability:

1 - (1/4)^k ≥ 1 - 1/n   for k ≥ log_4 n
Bayes' Theorem

For p(E) > 0 and p(F) > 0:

p(F | E) = p(E | F) p(F) / (p(E | F) p(F) + p(E | F̄) p(F̄))

Applications: machine learning, spam filters.
Bayes' Theorem proof:

p(F | E) = p(E ∩ F) / p(E), so p(E ∩ F) = p(F | E) p(E)
p(E | F) = p(E ∩ F) / p(F), so p(E ∩ F) = p(E | F) p(F)

Therefore p(F | E) p(E) = p(E | F) p(F), which gives

p(F | E) = p(E | F) p(F) / p(E)
E  (E  F )  (E  F )
p( E )  p( E  F )  p( E  F )
(E  F )  (E  F )  
p( E  F )  p( E | F ) p( F )
p( E  F )  p( E | F ) p( F )
p( E )  p( E | F ) p( F )  p( E | F ) p( F )
Konstantin Busch - LSU
65
Substituting p(E) = p(E | F) p(F) + p(E | F̄) p(F̄) into p(F | E) = p(E | F) p(F) / p(E):

p(F | E) = p(E | F) p(F) / (p(E | F) p(F) + p(E | F̄) p(F̄))

End of proof.
Example: Select a random box, then select a random ball from that box.

(figure: Box 1 and Box 2, each containing red and green balls)

Question: If a red ball is selected, what is the probability it was taken from box 1?
E: select a red ball    Ē: select a green ball
F: select box 1         F̄: select box 2

Question probability: p(F | E) = ?
Bayes' Theorem:

p(F | E) = p(E | F) p(F) / (p(E | F) p(F) + p(E | F̄) p(F̄))

We only need to compute p(F), p(F̄), p(E | F), and p(E | F̄).
p(F) = 1/2 = 0.5 (probability of selecting box 1)
p(F̄) = 1/2 = 0.5 (probability of selecting box 2)
p(E | F) = 7/9 ≈ 0.777 (probability of selecting a red ball from box 1)
p(E | F̄) = 3/7 ≈ 0.428 (probability of selecting a red ball from box 2)
Putting it together:

p(F | E) = p(E | F) p(F) / (p(E | F) p(F) + p(E | F̄) p(F̄))
         = (0.777 · 0.5) / (0.777 · 0.5 + 0.428 · 0.5)
         = 0.777 / (0.777 + 0.428)
         ≈ 0.644

This is the final result.
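The whole computation in exact arithmetic (variable names are illustrative):

from fractions import Fraction

p_F = Fraction(1, 2)        # p(F): choose box 1
p_E_F = Fraction(7, 9)      # p(E|F): red ball from box 1
p_E_notF = Fraction(3, 7)   # p(E|not F): red ball from box 2

posterior = (p_E_F * p_F) / (p_E_F * p_F + p_E_notF * (1 - p_F))
print(posterior, float(posterior))  # 49/76 ≈ 0.6447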
What if we had more boxes? Generalized Bayes' Theorem:

p(Fj | E) = p(E | Fj) p(Fj) / Σ_{i=1}^{n} p(E | Fi) p(Fi)

where the sample space is S = F1 ∪ F2 ∪ ... ∪ Fn and the events Fi are mutually exclusive.
Spam Filters

Training set: spam (bad) emails B and good emails G. A user classifies each email in the training set as good or bad.
Find the words that occur in B and G:

nB(w): number of spam emails that contain word w
nG(w): number of good emails that contain word w

p(w) = nB(w) / |B|   (probability that a spam email contains w)
q(w) = nG(w) / |G|   (probability that a good email contains w)
A new email X arrives.

S: event that X is spam
E: event that X contains word w

What is the probability that X is spam, given that it contains word w? p(S | E) = ?

Reject the email if this probability is at least 0.9.
p(S | E) = p(E | S) p(S) / (p(E | S) p(S) + p(E | S̄) p(S̄))

We only need to compute:

p(S) = p(S̄) = 0.5   (simplifying assumption)
p(E | S) = p(w) = nB(w) / |B|   (computed from the training set)
p(E | S̄) = q(w) = nG(w) / |G|   (computed from the training set)
Example: Training set for the word "Rolex":

"Rolex" occurs in 250 of 2000 spam emails.
"Rolex" occurs in 5 of 1000 good emails.

If a new email contains the word "Rolex", what is the probability that it is spam?
"Rolex" occurs in 250 of 2000 spam emails:
nB(Rolex) = 250, so p(Rolex) = nB(Rolex) / |B| = 250 / 2000 = 0.125

"Rolex" occurs in 5 of 1000 good emails:
nG(Rolex) = 5, so q(Rolex) = nG(Rolex) / |G| = 5 / 1000 = 0.005
If a new email X contains the word "Rolex", what is the probability that it is spam?

S: event that X is spam
E: event that X contains the word "Rolex"

p(S | E) = ?
p(S | E) = p(E | S) p(S) / (p(E | S) p(S) + p(E | S̄) p(S̄))

We only need:

p(S) = p(S̄) = 0.5   (simplifying assumption)
p(E | S) = p(Rolex) = 0.125   (from the training set)
p(E | S̄) = q(Rolex) = 0.005   (from the training set)
0.125  0.5
0.125
p( S | E ) 

 0.961...
0.125  0.5  0.005  0.5 0.13
New email is considered to be spam because:
p( S | E )  0.961  0.9 spam threshold
Konstantin Busch - LSU
82
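The same computation as a small function; note that with the simplifying assumption p(S) = p(S̄) = 0.5 the priors cancel:

def spam_probability(n_spam_w, n_spam, n_good_w, n_good):
    p_w = n_spam_w / n_spam   # p(E|S), from the training set
    q_w = n_good_w / n_good   # p(E|not S), from the training set
    return p_w / (p_w + q_w)  # equal priors cancel out

print(spam_probability(250, 2000, 5, 1000))  # ≈ 0.9615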
Better spam filters use two words:

p(S | E1 ∩ E2) = p(E1 | S) p(E2 | S) / (p(E1 | S) p(E2 | S) + p(E1 | S̄) p(E2 | S̄))

Assumption: E1 and E2 are independent, i.e. the two words appear independently of each other.