PRIVATE COMMUNICATION XXX–XX–AB/VER 0.1

Notes on Probability Theory

Ayush Bhandari (Work in Progress)

Contents

I    Axioms of Probability
     I-A  Set Theory
          I-A1  Set Operations
                I-A1a  Subsets
                I-A1b  Mutually Exclusive Sets or Disjoint Sets
          I-A2  Partitions
          I-A3  Complements
     I-B  Probability Space
          I-B1  The Axioms of Probability
          I-B2  Properties
          I-B3  Von Mises and the Frequency Interpretation
          I-B4  Equality of Events
          I-B5  The class F of events
                I-B5a  Fields
          I-B6  Axiomatic Definition of an Experiment
                I-B6a  Countable Spaces: Case when I consists of N countable outcomes
                I-B6b  Non-countable Spaces: The real line
     I-C  Conditional Probability
          I-C1  Total Probability and Bayes' Theorem
          I-C2  Independence

II   Repeated Trials
     II-A  Combined Experiments
           II-A1  Cartesian Products or CP
           II-A2  Cartesian Products of Experiments
                  II-A2a  Independent Experiments
     II-B  Bernoulli Trials
           II-B1  Success and Failure of an event A in n independent trials
                  II-B1a  Most Likely Number of Successes
                  II-B1b  Success of an event in an interval
     II-C  Asymptotic Theorems
           II-C1  Normal Distribution (for large values of n in p_n(k))
                  II-C1a  DeMoivre–Laplace Theorem
                  II-C1b  Law of Large Numbers
                  II-C1c  Generalization of Bernoulli Trials
                  II-C1d  Generalized DeMoivre–Laplace Theorem
           II-C2  Poisson Distribution for Rare Events or when p ≪ 1
           II-C3  Random Poisson Points
           II-C4  Asymptotic Independence of Random Poisson Points

III  Concept of Random Variable
Ayush Bhandari is with the Biomedical Imaging Laboratory at the École Polytechnique Fédérale de Lausanne (EPFL), Station 17, CH–1015 Lausanne VD, Switzerland (e-mail: Ayush.Bhandari@{epfl.ch, googlemail.com}).

I. Axioms of Probability

A. Set Theory

A set is a collection of objects called elements, for example A = {ζ1, ..., ζn} or A = {all positive, even numbers}. The notation ζk ∈ A and ζk ∉ A means that ζk is, respectively is not, an element of A. The empty or null set, denoted by {} or ∅, is the set which contains no elements. A set containing k elements has 2^k subsets; the collection of all of them is the power set.

Example. Let f_k denote the faces of a die. Then ∅, ..., {f1}, ..., {f1, f2}, ..., {f1, f2, f3}, ..., I are the 2^6 = 64 subsets.

1) Set Operations:

a) Subsets: B ⊂ A (equivalently A ⊃ B) means that B is a subset of A, i.e. every element of B is an element of A. § ∅ ⊂ B ⊂ A ⊂ I. The following are consequences:
• Transitivity: If C ⊂ B and B ⊂ A then C ⊂ A.
• Equality: A = B ⇔ B ⊂ A and A ⊂ B.
• Sums or Unions: the commutative and associative operation A ∪ B, also written A + B. § If B ⊂ A then A + B = A; in particular A + A = A, A + ∅ = A, A + I = I.
• Products or Intersections: the commutative, associative and distributive operation A ∩ B, also written AB. § If A ⊂ B then AB = A; in particular AA = A, A∅ = ∅, AI = A.

b) Mutually Exclusive Sets or Disjoint Sets: A and B are mutually exclusive iff AB = ∅; several sets {Ak}, k ∈ K, are mutually exclusive iff Ai Aj = ∅ for all i ≠ j.

2) Partitions: A partition A of I is a collection of mutually exclusive subsets of I whose union is I: A1 + ··· + An = I, Ai Aj = ∅ for i ≠ j. The partition is denoted by A = [A1, ..., An].

3) Complements: The complement A^c (also written with an overbar) is the set of all elements of I which are not in A.
§ A + A^c = I, A A^c = ∅, (A^c)^c = A, I^c = ∅, ∅^c = I.
§ Property of subsets: If B ⊂ A then A^c ⊂ B^c.
§ De Morgan's Law: (A + B)^c = A^c · B^c and (A · B)^c = A^c + B^c.
§ Duality Principle: if in a set identity all unions are replaced by intersections, all intersections by unions, and I and ∅ are interchanged, the identity is preserved; this follows from De Morgan's law by removing the complement signs.

B. Probability Space

The space I is called the certain event; its elements are the experimental outcomes and its subsets are events. The empty set ∅ is the impossible event, and any event {ζk} with a single element ζk is an elementary event.

A trial is a single performance of an experiment. At each trial we observe a single outcome ζk. An event A occurs during the trial if ζk ∈ A. The certain event I occurs at every trial and the impossible event ∅ never occurs. At each trial exactly one of A and A^c occurs. Other events that might occur are A + B or AB.

1) The Axioms of Probability: The probability of an event A is a number P(A) which satisfies the following three conditions:

I.   P(A) ≥ 0,
II.  P(I) = 1,
III. AB = ∅ ⇒ P(A ∪ B) = P(A) + P(B).

The last axiom has an easy generalization: if Aj Ak = ∅ for all j ≠ k in K, then P(∪_{k∈K} Ak) = Σ_{k∈K} P(Ak). In the development of the theory of probability, all conclusions are directly or indirectly based on the above axioms.

2) Properties:

1) The probability of the impossible event is 0, i.e. P(∅) = 0. § The property of intersections gives A∅ = ∅, so A and ∅ are mutually exclusive. Also, since A + ∅ = A, we have P(A) = P(A + ∅) = P(A) + P(∅), hence P(∅) = 0.

2) The probability of an event is a number between 0 and 1: for every A, P(A) = 1 − P(A^c) ≤ 1. § A + A^c = I and A A^c = ∅. Therefore P(I) = P(A + A^c) = P(A) + P(A^c); but P(I) = 1 and the result follows.

3) The subset property: If B ⊂ A then P(A) ≥ P(B). § Write A = B + A B^c with B (A B^c) = ∅; then P(A) = P(B) + P(A B^c) ≥ P(B).

4) Probability of the union of two events: For all A, B we have P(A + B) = P(A) + P(B) − P(AB) ≤ P(A) + P(B). § Three cases: [1] A ∩ B = ∅ (immediate from axiom III); [2] B ⊂ A (then A + B = A and AB = B); [3] A ∩ B ≠ ∅: set B0 = A^c ∩ B, so that A ∪ B0 = A ∪ B with A B0 = ∅, hence P(A ∪ B) = P(A) + P(B0); but P(B) = P(A ∩ B) + P(B0), and the result follows.
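The properties above can be checked mechanically on a finite space. The following is a minimal Python sketch, not part of the original notes, which represents events of the die experiment as sets and verifies the power-set count, De Morgan's law and properties 1 to 4; the uniform weights 1/6 are an assumption made only for this illustration.

from itertools import chain, combinations

# Certain event I: the six faces of a die; uniform elementary probabilities
# (an illustrative assumption, not required by the axioms).
I = frozenset(range(1, 7))
p_elem = {face: 1.0 / 6.0 for face in I}

def P(event):
    # Probability of an event as the sum of its elementary probabilities (axiom III).
    return sum(p_elem[face] for face in event)

A = frozenset({2, 4, 6})   # event "even"
B = frozenset({1, 2, 3})   # event "at most three"

# A set with k elements has 2^k subsets: the die has 2^6 = 64.
power_set = list(chain.from_iterable(combinations(I, r) for r in range(len(I) + 1)))
assert len(power_set) == 2 ** len(I)

# De Morgan: (A + B)^c = A^c B^c.
assert I - (A | B) == (I - A) & (I - B)

# Property 1: P(empty set) = 0; Property 2: P(A) = 1 - P(A^c).
assert P(frozenset()) == 0.0
assert abs(P(A) - (1.0 - P(I - A))) < 1e-12

# Property 3: B0 contained in A implies P(B0) <= P(A).
B0 = frozenset({2, 4})
assert B0 <= A and P(B0) <= P(A)

# Property 4: P(A + B) = P(A) + P(B) - P(AB).
assert abs(P(A | B) - (P(A) + P(B) - P(A & B))) < 1e-12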
3) Von Mises and the Frequency Interpretation: The classical theory rests on empirical experiments, and the associated probability is defined as the relative frequency

P(A) = nA / n = (number of outcomes favorable to the event A) / (total number of trials).

The axioms of probability are compatible with this definition:

I.   P(A) ≥ 0, since nA ≥ 0 and n > 0.
II.  P(I) = 1, because I occurs at every trial.
III. If AB = ∅ then n_{A∪B} = nA + nB. Consequently, P(A ∪ B) = n_{A+B}/n = (nA + nB)/n = P(A) + P(B).

For the general case: n_{A∪B} = nA + nB − n_{A∩B}.

4) Equality of Events: Two events are equal if they have the same elements.

5) The class F of events: Events are subsets of I to which a probability has been assigned. In practice it is meaningful to associate a probability with only a specific subclass F of the subsets of I. For example, if in the die experiment we are interested only in the outcomes 'odd' and 'even', it suffices to work with the subclass F = {∅, odd, even, I}. In the subsequent discussion, F will be chosen to contain the unions and intersections of the events under consideration.

a) Fields: A field is a non-empty class F of sets such that:

If A ∈ F, then A^c ∈ F;
If A ∈ F and B ∈ F, then A ∪ B ∈ F.

§ If A ∈ F and B ∈ F then A ∩ B ∈ F, since A ∩ B = (A^c ∪ B^c)^c and F is closed under complements and unions; the same holds for any finite union or intersection of elements of F.
§ F contains the impossible event and the certain event. By definition a field is non-empty, so assume it contains at least one element A. Then A^c ∈ F, and since A ∪ A^c = I ∈ F we also have I^c = ∅ ∈ F.

All sets that can be written as unions and intersections of finitely many sets in F are also in F. This, however, need not be true for infinitely many sets.

Borel Fields: Let {Ak}, k = 0, 1, ..., be an infinite sequence of sets in F. If the countable unions and intersections of the Ak also belong to F, then F is called a Borel field.

Events: Certain subsets of I that form a Borel field. The key idea is to assign probabilities to the finite unions and intersections of events together with their limits. Repeated application of the last axiom yields P(∪ Ak) = Σ_{k∈K} P(Ak) for finitely many mutually exclusive events Ak. However, this finite extension does not by itself give

P(∪_k Ak) = Σ_k P(Ak)

for a countably infinite collection of mutually exclusive events. This is an additional condition, which will be referred to as the axiom of infinite additivity.

6) Axiomatic Definition of an Experiment: An experiment is specified in terms of:
• the set I of all experimental outcomes,
• the Borel field of all events of I,
• the probabilities of these events.

What follows next is the determination of probabilities in experiments with finitely and infinitely many elements.

a) Countable Spaces (the case when I consists of N countable outcomes): Since N is finite, the probabilities of all events can be expressed in terms of the probabilities of the elementary events, P(ζk) = pk, and from the axioms it follows that

pk ≥ 0,   Σ_k pk = 1.

Analogy with the classical definition: If I consists of N outcomes and the probabilities pk of the elementary events are all equal, then pk = 1/N.
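As an illustration of the relative-frequency interpretation (and of the classical assignment pk = 1/N), here is a short simulation sketch, not part of the original notes; the choice of events and the number of trials are arbitrary.

import random

random.seed(1)
n = 200_000                       # total number of trials
A = {2, 4, 6}                     # event "even"
B = {1, 2}                        # event "at most two"

nA = nB = nAB = nAuB = 0
for _ in range(n):
    face = random.randint(1, 6)   # one trial; classical assignment p_k = 1/6
    a, b = face in A, face in B
    nA += a
    nB += b
    nAB += a and b
    nAuB += a or b

# Relative frequencies approximate P(A) = 1/2 and P(B) = 1/3 ...
print(nA / n, nB / n)
# ... and the counts satisfy n_{A∪B} = n_A + n_B − n_{A∩B} exactly,
# so the frequency definition reproduces P(A + B) = P(A) + P(B) − P(AB).
assert nAuB == nA + nB - nAB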
b) Non-countable Spaces (the real line): If I consists of non-countably many elementary events, then the probability of an event cannot be determined in terms of the probabilities of the elementary events. For example, if I = R, it can be shown that it is impossible to assign a probability to every subset of I so as to satisfy the axioms. However, one may take as events the bounded intervals and their countable unions and intersections, provided that these events form a field F. This field (usually the Borel field) contains all open and closed intervals and all points.

Suppose α(x) is a non-negative function such that

α(x) ≥ 0,   ∫_R α(x) dx = 1.

We then define the probability of the event {x ≤ xk} by

P(x ≤ xk) = ∫_{−∞}^{xk} α(x) dx.

Also,

P(x ∈ (xk, xk+1]) = ∫_{xk}^{xk+1} α(x) dx,

and since the intervals involved are mutually exclusive, P(x ≤ x1) + P(x1 < x ≤ x2) = P(x ≤ x2). Another interesting property is that when the function α is bounded,

lim_{xk → xk+1} P(x ∈ (xk, xk+1]) = lim_{xk → xk+1} ∫_{xk}^{xk+1} α(x) dx = 0.

One deduces that the probability of the single point {xk+1} is 0; in this case the probability of every elementary event of I is 0, although the probability of their union is 1.

C. Conditional Probability

The conditional probability of an event A assuming B (A given B) is defined by

P_B(A) := P(A|B) = P(A ∩ B) / P(B),   P(B) ≠ 0.

§ If B ⊂ A then P(A|B) = 1.
§ If A ⊂ B then P(A|B) = P(A)/P(B) ≥ P(A).

The conditional probabilities for a fixed B are indeed probabilities, for they satisfy the axioms:
A1) Since P(X) ≥ 0 for X = A ∩ B and P(B) > 0, we have P(A|B) ≥ 0.
A2) Since I ∩ B = B, we have P(I|B) = P(B)/P(B) = 1.
A3) If X ∩ Y = ∅, then P(X ∪ Y|B) = P(X|B) + P(Y|B).

1) Total Probability and Bayes' Theorem:

Total Probability Theorem: If A = [A1, ..., An] is a partition of I and B is an arbitrary event, then

P(B) = P(B|A1) P(A1) + ··· + P(B|An) P(An).

Indeed, the events B ∩ Ak are mutually exclusive and their union is B, so P(B) = Σ_k P(B ∩ Ak); since P(B ∩ Ak) = P(B|Ak) P(Ak), the proof follows.

Fig. 1. Proof of the Total Probability Theorem: the partition [A1, ..., An] splits B into the pieces B ∩ Ak, so P(B) = Σ_k P(B ∩ Ak) = P(B|A1) P(A1) + ··· + P(B|An) P(An).

Also, because

P(B ∩ Ak) = P(B|Ak) P(Ak) = P(Ak|B) P(B),

where P(Ak) is the a priori and P(Ak|B) the a posteriori probability of Ak, we have the so-called Bayes' Theorem, which expresses the a posteriori probabilities in terms of the a priori ones:

P(Ak|B) = P(B|Ak) P(Ak) / [ P(B|A1) P(A1) + ··· + P(B|An) P(An) ].

2) Independence: Two events A and B are independent if P(A ∩ B) = P(A) P(B). If this is the case, then the pairs A, B^c and A^c, B and A^c, B^c are also independent.
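To make the two theorems concrete, here is a small numerical sketch, not part of the original notes; the three-part partition and the conditional probabilities used are invented purely for illustration.

# Hypothetical three-part partition [A1, A2, A3] of the certain event and an
# arbitrary event B; all numbers below are illustrative assumptions only.
P_A = [0.5, 0.3, 0.2]            # a priori probabilities P(A_k), summing to 1
P_B_given_A = [0.1, 0.4, 0.8]    # conditional probabilities P(B | A_k)

# Total Probability Theorem: P(B) = sum_k P(B | A_k) P(A_k).
P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))

# Bayes' Theorem: a posteriori P(A_k | B) in terms of the a priori P(A_k).
P_A_given_B = [pb * pa / P_B for pb, pa in zip(P_B_given_A, P_A)]

print(P_B)                       # 0.33
print(P_A_given_B)               # ~ [0.152, 0.364, 0.485], which sums to 1
assert abs(sum(P_A_given_B) - 1.0) < 1e-12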
II. Repeated Trials

The key idea of this section is to study the probability of an event that occurs k times in an experiment that is repeated n times. Quantitatively: let p = P(A) be the probability of a certain event A of an experiment I. If this experiment is repeated n times, then the probability that A occurs k times, in any order, is

p_n(k) = \binom{n}{k} p^k (1 − p)^{n−k}.

A. Combined Experiments

1) Cartesian Products or CP: Given two sets I1 and I2 with elements {ak}, k ∈ K, and {bm}, m ∈ M, respectively, their Cartesian product I = I1 × I2 is the set of all ordered pairs {ak, bm} with ak ∈ I1 and bm ∈ I2. If A ⊂ I1 and B ⊂ I2, then C = A × B consists of all pairs {ak, bm} with ak ∈ A and bm ∈ B. Alternatively,

A × B = (A × I2) ∩ (I1 × B).

2) Cartesian Products of Experiments: The CP of two experiments results in a new experiment

I = I1 × I2

whose events are all Cartesian products of the form A × B, where A is an event of I1 and B is an event of I2, together with their unions and intersections. In the combined experiment I we assign probabilities to these events so that

P(A × I2) = P1(A)   and   P(I1 × B) = P2(B),

where P1(A) is the probability of the event A in the experiment I1 and P2(B) is the probability of the event B in the experiment I2.

a) Independent Experiments: The two experiments are independent if

P(A × B) = P(A × I2) P(I1 × B) = P1(A) P2(B).

And since the elementary event {ak, bm} can be written as the CP {ak} × {bm}, we have P({ak, bm}) = P({ak}) P({bm}).

B. Bernoulli Trials

Let |F| denote the cardinality of a set F. If |A| = n, then the total number of subsets Ak ⊂ A such that |Ak| = k is the binomial coefficient

\binom{n}{k} = n! / (k! (n − k)!).

1) Success and Failure of an event A in n independent trials: Let A be an event of the experiment I; it may be the only event of I or one of many. Let

P(A) = p,   P(A^c) = q = 1 − p.

Repeating the experiment n times gives the combined experiment I^n = I1 × ··· × In. The probability P_k(A) = p_n(k) that the event A occurs exactly k times, in any order, is

P_k(A) = p_n(k) = \binom{n}{k} p^k (1 − p)^{n−k}.

This is the fundamental theorem of repeated trials.

a) Most Likely Number of Successes: In the function

p_n(k) = \binom{n}{k} p^k (1 − p)^{n−k},

the factor p^k accounts for the k successes and (1 − p)^{n−k} for the n − k failures. For fixed n, p_n(k) increases with k up to a maximum and then decreases, so there is a value of k which maximizes it. Setting

p_n(k − 1) / p_n(k) = k q / ((n − k + 1) p) = 1   ⇒   k_max = (n + 1) p.

Since p ∈ R and k ∈ Z, the integer value of k that maximizes p_n(k) is k_max = ⌊(n + 1) p⌋. In case k* = (n + 1) p is itself an integer, p_n(k*) = p_n(k* − 1), and the two values k = k* and k = k* − 1 both maximize p_n(k).

b) Success of an event in an interval: The probability that the number of occurrences of A in n trials, in any order, lies between k1 and k2 is

P({k1 ≤ k ≤ k2}) = Σ_{k=k1}^{k2} p_n(k).

C. Asymptotic Theorems

1) Normal Distribution (for large values of n in p_n(k)): We will use the normal density

g(x) = (1/√(2π)) exp(−x²/2)

and the auxiliary function

G(x) = ∫_{−∞}^{x} g(ξ) dξ,

with G(−∞) = 0 and G(−x) = 1 − G(x). This function is linked with the error function by erf(x) = G(x) − 1/2. For large x we can use the approximation G(x) ≃ 1 − g(x)/x.

a) DeMoivre–Laplace Theorem:

Theorem 1 (DeMoivre–Laplace Theorem). If npq ≫ 1, then for k in a √(npq) neighborhood of np, p_n(k) can be approximated as

\binom{n}{k} p^k q^{n−k} ≃ (1/√(2π npq)) e^{−(k−np)²/(2npq)} = (1/(σ√(2π))) e^{−(k−η)²/(2σ²)} = (1/σ) g((k − η)/σ),

where p + q = 1, p > 0, q > 0, η = np and σ = √(npq).

The equality in the DeMoivre–Laplace Theorem holds in the limit n → ∞. The result extends to the probability that in n trials the number of occurrences of the event A lies in the interval k1 ≤ k ≤ k2: for σ ≫ 1,

Σ_{k=k1}^{k2} p_n(k) = Σ_{k=k1}^{k2} \binom{n}{k} p^k (1 − p)^{n−k} ≃ G((k2 − η)/σ) − G((k1 − η)/σ).

Error correction: the summation above contains k2 − k1 + 1 terms, while the integral covers an interval of length k2 − k1. This causes no problem when k2 − k1 ≫ 1; when this is not the case, the correction

G((k2 + 1/2 − η)/σ) − G((k1 − 1/2 − η)/σ)

in the integration limits accounts symmetrically for the extra term in the summation.
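The quality of the approximation, and the effect of the ±1/2 correction, is easy to check numerically. The following sketch, not part of the original notes, compares the exact binomial sum with the two normal approximations; the values n = 100, p = 0.3, k1 = 25, k2 = 35 are arbitrary choices for illustration.

import math

def G(x):
    # Normal distribution function G(x) = integral of g from -infinity to x.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_n(k, n, p):
    # Exact binomial probability p_n(k) = C(n, k) p^k (1 - p)^(n - k).
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

n, p = 100, 0.3                            # illustrative values
q = 1.0 - p
eta, sigma = n * p, math.sqrt(n * p * q)   # eta = np, sigma = sqrt(npq)
k1, k2 = 25, 35

exact = sum(p_n(k, n, p) for k in range(k1, k2 + 1))
approx = G((k2 - eta) / sigma) - G((k1 - eta) / sigma)
corrected = G((k2 + 0.5 - eta) / sigma) - G((k1 - 0.5 - eta) / sigma)

print(exact, approx, corrected)            # the corrected value lies closest to the exact sum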
b) Law of Large Numbers: According to the Von Mises model, if there are k favorable outcomes for an event A in n trials, then P(A) ≈ k/n. Read the other way, if an event A with probability P(A) = p occurs k times in n trials, then k ≃ np. The theoretical justification of this empirical observation for Bernoulli trials is the following theorem.

Theorem 2 (Law of Large Numbers). For any ε > 0,

lim_{n→∞} P( |k/n − p| ≤ ε ) = 1.

The key idea behind its proof is that

|k/n − p| ≤ ε   ⇔   n(p − ε) ≤ k ≤ n(p + ε),

so with k1 = n(p − ε) and k2 = n(p + ε), and since p_n(k1 ≤ k ≤ k2) ≃ G((k2 − η)/σ) − G((k1 − η)/σ), we have

p_n(n(p − ε) ≤ k ≤ n(p + ε)) ≃ G(εn/σ) − G(−εn/σ).

Now, as n → ∞, εn/σ → ∞ (since σ = √(npq) grows only like √n), and G(∞) = 1. Also G(x) + G(−x) = 1. Consequently,

lim_{n→∞} P( |k/n − p| ≤ ε ) = 2G(∞) − 1 = 1.

c) Generalization of Bernoulli Trials: Suppose the experiment I is partitioned into r events Am, that is A = [A1, ..., Ar], with probabilities

P(Am) = pm,   p1 + p2 + ··· + pr = 1.

Repeating this experiment n times, the probability p_n(k1, ..., kr) that the event Am occurs km times (m = 1, ..., r), with k1 + k2 + ··· + kr = n, is

p_n(k1, ..., kr) = (n! / (k1! ··· kr!)) p1^{k1} ··· pr^{kr} = n! ∏_{m=1}^{r} pm^{km} / km!.

The interpretation of this result is as follows: given n items of which k1 are alike, k2 are alike, and so on, and since the event A1 occurs k1 times, A2 occurs k2 times, and so on, the total number of arrangements is

\binom{n}{k1} \binom{n − k1}{k2} ··· = \binom{n}{k1, ..., kr} = n! / (k1! ··· kr!)

(the multinomial coefficient). Multiplying by the probability p1^{k1} ··· pr^{kr} of each arrangement gives p_n(k1, ..., kr) as above.

d) Generalized DeMoivre–Laplace Theorem:

p_n(k1, ..., kr) = (n! / (k1! ··· kr!)) p1^{k1} ··· pr^{kr}
≃ exp( −(1/2) Σ_{m=1}^{r} (km − ηm)²/ηm ) / √( (2πn)^{r−1} p1 ··· pr )
= exp( −(1/2) [ (k1 − np1)²/(np1) + ··· + (kr − npr)²/(npr) ] ) / √( (2πn)^{r−1} p1 ··· pr ),   ηm = n pm.

2) Poisson Distribution for Rare Events or when p ≪ 1: The Poisson distribution is used to approximate the case when the event A is rare, meaning that P(A) = p ≪ 1.

Theorem 3 (Poisson Theorem). If p ≪ 1, n ≫ 1 and k is of the order of np, then

\binom{n}{k} p^k q^{n−k} ≃ e^{−np} (np)^k / k!.

This theorem can be generalized to r partitions:

p_n(k1, ..., kr) = (n! / (k1! ··· kr!)) p1^{k1} ··· pr^{kr}  →  ∏_{m=1}^{r−1} e^{−n pm} (n pm)^{km} / km!   as n → ∞.

3) Random Poisson Points: Let ∆ be an arbitrary subinterval of I = (−T/2, T/2), where ∆ also denotes the length of the interval. Place n points at random in the interval I. We are interested in the probability

P(A) = P({k points, in any order, fall in ∆}),

which is given by

P({k ∈ ∆}) = \binom{n}{k} p^k (1 − p)^{n−k},   p = ∆/T.

Assuming that n ≫ 1 and ∆ ≪ T, we can use the Poisson Theorem:

P({k ∈ ∆}) ≃ e^{−η} η^k / k!,   η = np = n∆/T,   k of the order of η.

4) Asymptotic Independence of Random Poisson Points: Consider two non-overlapping intervals ∆1 and ∆2. The probability that, out of the n points, k1 points lie in ∆1 and k2 points lie in ∆2 is

P({k1 ∈ ∆1, k2 ∈ ∆2}) = (n! / (k1! k2! (n − k1 − k2)!)) p1^{k1} p2^{k2} (1 − p1 − p2)^{n−k1−k2},

where p1 = ∆1/T and p2 = ∆2/T. In general,

P({k1 ∈ ∆1, k2 ∈ ∆2}) ≠ P({k1 ∈ ∆1}) P({k2 ∈ ∆2})   (not independent).

However, if n and T are increased indefinitely while keeping n/T ∼ λ constant, we have, with ηm = n pm,

P({k1 ∈ ∆1, k2 ∈ ∆2}) ≃ e^{−η1} (η1^{k1}/k1!) · e^{−η2} (η2^{k2}/k2!) = P({k1 ∈ ∆1}) P({k2 ∈ ∆2})   (independent).

In conclusion, random Poisson points in non-overlapping intervals are independent in the asymptotic limit.

III. Concept of Random Variable