REVIEW OF STATISTICS

• Sample Space: S = the set of all possible outcomes, not known for sure by the investigator at the current time.
§ An event A is a subset of the sample space S (i.e., A ⊂ S).
§ Events A and B are disjoint if A ∩ B = ∅ (where ∅ is the empty set or null set).
• Axioms of probability: P(⋅) is a probability function if it satisfies:
§ P(A) ≥ 0 for any event A ⊂ S,
§ P(S) = 1, and
§ P(A ∪ B) = P(A) + P(B) if A and B are disjoint events.
The probability of an event can be intuitively interpreted as measuring the relative frequency or relative likelihood of this event.
• A random variable X is a function that takes a specific real value at each point of the sample space. For any event A ⊂ S, P(A) is the probability that X ∈ A.
• A cumulative distribution function (CDF) for a random variable X is the function F(t) = P(X ≤ t) such that:
§ F(t) is non-decreasing and continuous from the right,
§ F(-∞) = 0, and
§ F(+∞) = 1.
• A probability density function (PDF) is f(x), where x can be a discrete or a continuous variable.
§ Discrete case: the random variable X can take a countable number of distinct values x1, x2, x3, ... Then S = {x1, x2, x3, ...},
♦ f(xi) = P(X = xi), and
♦ for A ⊂ S, P(X ∈ A) = Σ_{x∈A} f(x).
§ Continuous case: the random variable X can take all possible values between a and b: a ≤ x ≤ b. Then S = [a, b] = {x: a ≤ x ≤ b} and, for A ⊂ S,
♦ P(X ∈ A) = ∫_{x∈A} f(x) dx,
♦ f(x) = ∂F(x)/∂x under differentiability,
♦ F(x0) = P(X ≤ x0) = ∫_{-∞}^{x0} f(x) dx, and
♦ P(X = x) = 0 ≠ f(x).
• Multivariate case: consider X, Y, Z, ..., to be random variables. For simplicity, we consider the case of two random variables, X and Y.
§ Joint cumulative distribution function of (X, Y): F(x, y) = P(X ≤ x, Y ≤ y).
§ Marginal cumulative distribution of X: Fx(x) = F(x, ∞).
§ Marginal cumulative distribution of Y: Fy(y) = F(∞, y).
• The joint probability density function of X and Y is f(x, y), where x and y are assumed to be in the sample space.
§ Discrete case: f(x, y) = P(X = x, Y = y), and F(x0, y0) = Σ_{x≤x0} Σ_{y≤y0} f(x, y).
§ Continuous case: f(x, y) = ∂²F(x, y)/∂x∂y, and F(x0, y0) = ∫_{-∞}^{x0} ∫_{-∞}^{y0} f(x, y) dy dx.
• The marginal probability density functions are:
§ fx(x) = Σy f(x, y) in the discrete case, = ∫_{-∞}^{+∞} f(x, y) dy in the continuous case;
§ fy(y) = Σx f(x, y) in the discrete case, = ∫_{-∞}^{+∞} f(x, y) dx in the continuous case.
• The random variables (X, Y) are independent if
§ F(x, y) = Fx(x)⋅Fy(y), or
§ f(x, y) = fx(x)⋅fy(y),
for all x and y in the sample space.
• Conditional probability: Let f(x, y) be the joint probability density function at (x, y). Then, for all x and y in the sample space,
§ f(y|x) = f(x, y)/fx(x) is the conditional probability function of y given x (assuming fx(x) ≠ 0), and
§ f(x|y) = f(x, y)/fy(y) is the conditional probability function of x given y (when fy(y) ≠ 0).
• Bayes' theorem: Assuming that fx(x) ≠ 0,
f(y|x) = f(x|y)⋅fy(y)/fx(x)
= f(x|y)⋅fy(y)/[Σy f(x|y)⋅fy(y)] in the discrete case,
= f(x|y)⋅fy(y)/[∫_{-∞}^{+∞} f(x|y)⋅fy(y) dy] in the continuous case.
In the case where x corresponds to sample information, fy(y) is called the prior probability, f(x|y) is called the likelihood function of the sample, and f(y|x) is called the posterior probability.
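To make the discrete form of Bayes' theorem concrete, here is a minimal numerical sketch in Python (an added illustration, not part of the original notes); the three-state prior and the likelihood values are arbitrary assumptions.

```python
import numpy as np

# Minimal sketch of discrete Bayes updating:
#   f(y|x) = f(x|y)*f_y(y) / sum_y f(x|y)*f_y(y).
# The prior and likelihood numbers are illustrative assumptions only.

prior = np.array([0.5, 0.3, 0.2])       # f_y(y): prior over three states of y
likelihood = np.array([0.9, 0.4, 0.1])  # f(x|y): likelihood of the observed x under each y

joint = likelihood * prior              # f(x|y)*f_y(y), term by term
posterior = joint / joint.sum()         # divide by f_x(x) = sum of the joint terms

print(posterior)        # ~ [0.763, 0.203, 0.034]: the posterior f(y|x)
print(posterior.sum())  # 1.0, as any probability function must satisfy
```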
• Expectations: The expected value of some function g(X) is given by
E[g(X)] = Σx g(x)⋅f(x) in the discrete case, and
= ∫_{-∞}^{+∞} g(x)⋅f(x) dx in the continuous case,
where E is the "expectation operator".
• If g(X) = X, then E(X) is called the mean or average of X.
• If g(X) = (X - E(X))², then E[(X - E(X))²] is called the variance of X, denoted by V(X), with V(X) ≥ 0.
§ V(X) = E[(X - E(X))²] = E[X² + (E(X))² - 2⋅X⋅E(X)] = E(X²) - (E(X))².
§ Standard deviation of X = [V(X)]^(1/2).
• Covariance between X and Y: Cov(X, Y) = E[(X - E(X))⋅(Y - E(Y))] = E[X⋅Y - X⋅E(Y) - Y⋅E(X) + E(X)⋅E(Y)] = E(X⋅Y) - E(X)⋅E(Y).
• Correlation between X and Y: ρ(X, Y) = Cov(X, Y)/[V(X)⋅V(Y)]^(1/2), with -1 ≤ ρ ≤ 1.
• Let X = (X1, X2, ..., Xn)' be an (n×1) random vector with mean E(X) = µ = (µ1, µ2, ..., µn)' (an (n×1) vector) and variance
V(X) = Σ =
[ σ11  σ12  ⋯  σ1n ]
[ σ21  σ22  ⋯  σ2n ]
[  ⋮    ⋮    ⋱   ⋮  ]
[ σn1  σn2  ⋯  σnn ],
where σii = V(Xi) is the variance of Xi, σij = Cov(Xi, Xj) is the covariance of Xi with Xj, and Σ is an (n×n) symmetric positive semi-definite matrix.
• Let Y = A⋅X + b, where Y = (Y1, Y2, ..., Ym)' is an (m×1) random vector, A is an (m×n) known matrix, and b is an (m×1) known vector. Then (checked numerically in the sketch after the table below):
§ E(Y) = A⋅E(X) + b = A⋅µ + b, and
§ V(Y) = A⋅V(X)⋅A' = A⋅Σ⋅A'.
• Note: If X and Y are independently distributed with finite variances, then Cov(X, Y) = 0 and V(X + Y) = V(X) + V(Y).
• Chebyshev inequality: If V(X) exists (i.e., if it is finite), then P[|X - E(X)| ≥ t] ≤ V(X)/t².
• Conditional expectation: Let f(x, y) be a joint probability function, fy(y) be the marginal probability function of y, and h(x|y) = f(x, y)/fy(y) be the conditional probability of x given y.
§ The conditional expectation Ex|y of some function r(x, y) over the random variable x given y is the expectation of r(x, y) based on the conditional probability h(x|y): Ex|y r(x, y) = Σx r(x, y)⋅h(x|y) (assuming that x is a discrete random variable).
§ The unconditional expectation Ex,y of the function r(x, y) is given by Ex,y r(x, y) = Ey[Ex|y r(x, y)], where Ex|y is the conditional expectation operator and Ey is the expectation based on the marginal probability of y.
Proof: Ex,y r(x, y) = Σx,y r(x, y)⋅f(x, y) = Σx,y r(x, y)⋅h(x|y)⋅fy(y) = Σy [Σx r(x, y)⋅h(x|y)]⋅fy(y) = Ey[Ex|y r(x, y)].

Some Special Continuous Distributions

• Uniform: f(x) = 1/(b-a) for a < x < b; E(X) = (b+a)/2; V(X) = (b-a)²/12.
• Normal: X ∼ N(µ, σ²), x a scalar; f(x) = exp[-(x-µ)²/(2σ²)]/[σ⋅(2π)^(1/2)]; E(X) = µ; V(X) = σ².
Note: (X - µ)/σ ∼ N(0, 1) is called a standard normal random variable.
• Multivariate normal: X ∼ N(µ, Σ), x an (n×1) vector; f(x) = (2π)^(-n/2)⋅|Σ|^(-1/2)⋅exp[(-1/2)⋅(x-µ)'⋅Σ^(-1)⋅(x-µ)]; E(X) = µ, an (n×1) vector; V(X) = Σ, an (n×n) matrix.
• Gamma: f(x) = [β^α/Γ(α)]⋅x^(α-1)⋅e^(-βx) for α > 0, β > 0, x > 0; E(X) = α/β; V(X) = α/β².
• Exponential = Gamma with α = 1.
• Chi-square: χ²(k) = Gamma with α = k/2 and β = 1/2, where k is a positive integer (the "degrees of freedom"). If Z1, ..., Zk are independently distributed with Zi ∼ N(0, 1), then Y = Z1² + Z2² + ... + Zk² ∼ χ²(k).
• t-distribution: t(k). If Z ∼ N(0, 1), C ∼ χ²(k) (i.e., C has a chi-square distribution with k degrees of freedom), and Z and C are independently distributed, then t = Z/[C/k]^(1/2) has a t-distribution with k degrees of freedom, i.e., t ∼ t(k).
• F-distribution: F(k1, k2). If C1 ∼ χ²(k1) and C2 ∼ χ²(k2) are independent chi-square random variables with k1 and k2 degrees of freedom, respectively, then F = [C1/k1]/[C2/k2] has an F-distribution with k1 and k2 degrees of freedom, i.e., F ∼ F(k1, k2).
• Pareto: f(x) = α⋅k^α/x^(α+1) for x > k > 0, α > 0; E(X) = α⋅k/(α-1) for α > 1; V(X) = α⋅k²/[(α-2)⋅(α-1)²] for α > 2.
• Lognormal: f(x) = exp[-(log(x)-m)²/(2σ²)]/[x⋅σ⋅(2π)^(1/2)] for x > 0, σ > 0; E(X) = exp(m + σ²/2); V(X) = [exp(σ²) - 1]⋅exp(2m + σ²).
Note: Γ(α) = ∫_0^∞ y^(α-1)⋅e^(-y) dy; Γ(1) = 1, and Γ(α) = (α-1)! if α is a positive integer.
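The moment rules E(Y) = A⋅µ + b and V(Y) = A⋅Σ⋅A' can be checked by Monte Carlo simulation. The Python sketch below (an added illustration, not part of the original notes) draws X ∼ N(µ, Σ) using the multivariate normal from the table above; the particular µ, Σ, A, and b are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of E(Y) = A·mu + b and V(Y) = A·Sigma·A' for Y = A·X + b.
# mu, Sigma, A, and b are illustrative assumptions chosen for this sketch.

rng = np.random.default_rng(0)

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])            # (2x2) symmetric positive semi-definite
A = np.array([[1.0, 1.0],
              [2.0, -1.0],
              [0.0, 3.0]])                # (3x2): maps the (2x1) vector X to a (3x1) vector
b = np.array([0.0, 1.0, -2.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)   # draws of X ~ N(mu, Sigma)
Y = X @ A.T + b                                        # Y = A·X + b, applied row by row

print(Y.mean(axis=0))           # simulated mean, ~ A @ mu + b = [3, 1, 4]
print(A @ mu + b)               # exact mean
print(np.cov(Y, rowvar=False))  # simulated covariance, ~ A @ Sigma @ A.T
print(A @ Sigma @ A.T)          # exact covariance
```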
Some Special Discrete Distributions

• Binomial: f(x) = [n!/(x!⋅(n-x)!)]⋅p^x⋅(1-p)^(n-x) for 0 < p < 1, x = 0, 1, ..., n; E(X) = n⋅p; V(X) = n⋅p⋅(1-p).
• Bernoulli = Binomial with n = 1.
• Negative binomial: f(x) = [(r+x-1)!/(x!⋅(r-1)!)]⋅p^r⋅(1-p)^x for 0 < p < 1, x = 0, 1, 2, ...; E(X) = r⋅(1-p)/p; V(X) = r⋅(1-p)/p².
• Geometric = Negative binomial with r = 1.
• Poisson: f(x) = e^(-λ)⋅λ^x/x! for x = 0, 1, 2, ..., λ > 0; E(X) = λ; V(X) = λ.
• Uniform (discrete): f(x) = 1/n for x = 1, 2, ..., n, with n an integer; E(X) = (n+1)/2; V(X) = (n²-1)/12.
Note: n! = n⋅(n-1)⋅(n-2) ⋯ 2⋅1 = the factorial of n.

Some important relationships (illustrated numerically in the sketch below):
§ F(1, k) = t(k)²,
§ J⋅F(J, K) ≈ χ²(J) as K → ∞, and
§ t(K) ≈ N(0, 1) as K → ∞.
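A quick numerical check of these three relationships, added here as a sketch using scipy.stats; the evaluation point x and the degrees of freedom k, J, K are arbitrary illustrative choices.

```python
import scipy.stats as st

# Numerical sketch of the three relationships above; x, k, J, K are illustrative.

x, k = 2.0, 7

# F(1, k) = t(k)^2:  P(F <= x) should equal P(-sqrt(x) <= t <= sqrt(x)).
print(st.f.cdf(x, 1, k), st.t.cdf(x**0.5, k) - st.t.cdf(-x**0.5, k))

# J·F(J, K) ~ chi2(J) for large K:  P(J·F <= x) = P(F <= x/J) vs. P(chi2 <= x).
J, K = 3, 100_000
print(st.f.cdf(x / J, J, K), st.chi2.cdf(x, J))

# t(K) ~ N(0, 1) for large K: compare the two CDFs at an arbitrary point.
print(st.t.cdf(1.5, K), st.norm.cdf(1.5))
```

Each pair of printed values should agree (the middle and last pairs only approximately, with the agreement improving as K grows).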