Computational Statistics
3. The Most Important Distributions

3.1 Discrete Distributions

The Discrete Uniform Distribution

This is the simplest discrete distribution: a finite number of possible values x_1, x_2, …, x_n, each with equal probability 1/n.

Mean (for equally spaced values x_1 < x_2 < … < x_n; in general the mean is the average of the x_i):
$$\mu = E(X) = \frac{x_1 + x_n}{2}$$

Variance:
$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

Bernoulli Distribution: X ~ Bernoulli(p)

Let the probability of an event A be P(A) = p, 0 < p < 1, and let the random variable X be the indicator of A:
$$X = \begin{cases} 1, & \text{if } A \text{ happens} \\ 0, & \text{if } A \text{ does not happen} \end{cases}$$

A Bernoulli process must possess the following properties:
1. The number of repeated trials is n.
2. Each trial results in an outcome that is either a success or a failure.
3. The probability of a success remains constant from trial to trial.
4. The repeated trials are independent.

The probability mass function, mean and variance are
$$P(X = k) = p^k (1 - p)^{1 - k}, \quad k = 0, 1$$
$$E(X) = \mu = p, \qquad \sigma^2 = p(1 - p)$$

Binomial Distribution: X ~ Bin(n, p)

This distribution describes the number of successes in n Bernoulli trials. The probability mass function is
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n$$

The mean and variance of the binomial distribution are given by
$$E(X) = \mu = np, \qquad \sigma^2 = np(1 - p)$$

[Figure: Bin(30, 0.4)]

[Figure: Bin(100, 0.12)]

This distribution applies to sampling with replacement, where the success probability stays the same on every draw. When there are more than two possible outcomes for each trial, we have a multinomial distribution.
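As a numerical check of the binomial formulas, the following is a minimal sketch using only the Python standard library. The function name `binomial_pmf` is an assumption made for illustration; the parameters n = 30 and p = 0.4 are taken from the first binomial figure.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Parameters taken from the Bin(30, 0.4) figure.
n, p = 30, 0.4

# Evaluate the pmf on its whole support k = 0, 1, ..., n.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The probabilities should sum to 1.
total = sum(pmf)

# Mean and variance computed directly from the pmf.
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(f"sum of pmf = {total:.6f}")
print(f"mean       = {mean:.6f}  (np       = {n * p:.6f})")
print(f"variance   = {variance:.6f}  (np(1-p)  = {n * p * (1 - p):.6f})")
```

For these parameters the closed-form values are np = 12 and np(1 - p) = 7.2, and the sums computed from the pmf should agree with them up to floating-point rounding.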

Multinomial Distribution: X ~ Mu(n, p)

If a given trial can result in the k outcomes E_1, E_2, …, E_k with probabilities p_1, p_2, …, p_k, then the probability distribution of the random variables X_1, X_2, …, X_k, representing the number of occurrences of E_1, E_2, …, E_k in n independent trials, is
$$P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$
where
$$\sum_{i=1}^{k} x_i = n, \qquad \sum_{i=1}^{k} p_i = 1$$

Geometric Distribution: X ~ Geom(p)

In a series of Bernoulli trials, let the random variable X denote the number of trials until the first success. Then X is a geometric random variable with P(A) = p, 0 < p < 1. The probability mass function is
$$P(X = k) = p (1 - p)^{k - 1}, \quad k = 1, 2, \dots$$

[Figure: Geom(0.5)]

Negative Binomial Distribution: X ~ Negbin(r, p)

This is a generalization of the geometric distribution (the geometric distribution is the case r = 1). Trials are repeated until r successes occur, so k = r, r+1, r+2, … The probability mass function, mean and variance are
$$P(X = k) = \binom{k - 1}{r - 1} p^r (1 - p)^{k - r}$$
$$E(X) = \mu = \frac{r}{p}, \qquad \sigma^2 = \frac{r(1 - p)}{p^2}$$

Hypergeometric Distribution: X ~ Hypergeom(n, N, r)

A set of N objects contains r objects classified as defective. A sample of n items is selected at random, without replacement, and inspected. Let the random variable X be the number of defectives in the sample. The probability mass function, mean and variance are
$$P(X = k) = \frac{\binom{r}{k} \binom{N - r}{n - k}}{\binom{N}{n}}$$
$$E(X) = \mu = \frac{nr}{N}, \qquad \sigma^2 = \frac{nr}{N} \cdot \frac{N - r}{N} \cdot \frac{N - n}{N - 1}$$

Poisson Distribution: X ~ Poisson(λ)

Experiments yielding numerical values of a random variable X, the number of outcomes occurring during a given time interval or in a specified region, are called Poisson experiments.
Given an interval of real numbers, assume events occur randomly throughout the interval. If the interval can be partitioned into subintervals of small enough length such that
1. the probability of more than one event in a subinterval is zero,
2. the probability of one event in a subinterval is the same for all subintervals and is proportional to the length of the subinterval, and
3. the event in each subinterval is independent of the events in other subintervals,
then the random experiment is called a Poisson process.

The random variable X that equals the number of events in the interval is a Poisson random variable with parameter λ > 0, and the probability mass function of X is
$$P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \dots$$

The Poisson distribution takes infinitely many values. It can be shown that the Poisson distribution arises as a limiting case of the binomial distribution, by letting p → 0 and n → ∞ so that the binomial mean np approaches a finite value λ.

The mean and variance of a Poisson distribution are
$$E(X) = \mu = \lambda, \qquad \sigma^2 = \lambda$$

[Figure: Poisson(10)]

[Figure: Poisson(50)]

The differences between the geometric, binomial, negative binomial and Poisson distributions are summarized here.
Geometric: the random variable denotes the number of trials until the first success.
Binomial: the random variable denotes the number of successful trials out of a predetermined number of trials n.
Negative Binomial: the random variable denotes the number of trials required to obtain a given number of successes.
Poisson: the random variable denotes the number of events occurring in a given time interval or other region.
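The statement that the Poisson distribution is a limiting case of the binomial can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the original material: the helper names `binomial_pmf`, `poisson_pmf` and `max_gap`, the choice λ = 10, and the comparison range k = 0, …, 40 are all chosen here for demonstration. It fixes np = λ and lets n grow.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

def max_gap(n, lam, k_max=40):
    """Largest absolute difference between the Bin(n, lam/n) and
    Poisson(lam) probabilities over k = 0, ..., k_max."""
    p = lam / n  # keep the binomial mean n*p fixed at lam
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(k_max + 1))

lam = 10.0                 # illustrative choice of the Poisson parameter
sizes = (50, 500, 5000)    # increasing n, so p = lam/n decreases
gaps = [max_gap(n, lam) for n in sizes]

for n, gap in zip(sizes, gaps):
    print(f"n = {n:5d}, p = {lam / n:.4f}, max |Bin - Poisson| = {gap:.6f}")
```

The printed gap should shrink as n increases, which is the behavior the limit describes.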

3.2 Continuous Distributions

The Uniform Distribution

The probability density function is given by
$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$
and the mean and variance are
$$E(X) = \mu = \frac{a + b}{2}, \qquad \sigma^2 = \frac{(b - a)^2}{12}$$

The Exponential Distribution: X ~ Exp(λ)

Suppose the number of events in an interval follows a Poisson distribution. Then the time between successive events is a random variable that follows an exponential distribution. The exponential distribution is a special case of the gamma distribution (introduced next; it corresponds to α = 1 and β = 1/λ). The exponential probability density function is given by
$$f(t) = \begin{cases} \lambda e^{-\lambda t}, & t \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

The mean and variance are given by
$$E(X) = \mu = \frac{1}{\lambda}, \qquad \sigma^2 = \frac{1}{\lambda^2}$$
where λ > 0.

The Gamma Distribution: X ~ Gamma(α, β)

First we define the gamma function Γ(α) for α > 0:
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x} \, dx$$

It can be shown that Γ(α) = (α - 1) Γ(α - 1) for α > 1, that Γ(1) = 0! = 1, and that Γ(α) = (α - 1)! if α is a positive integer.

The random variable X has a gamma distribution with parameters α, β > 0 if its probability density function for x > 0 is given by
$$f(x) = \frac{1}{\beta^{\alpha} \Gamma(\alpha)} x^{\alpha - 1} e^{-x / \beta}$$

The Normal Distribution: X ~ N(μ, σ²)

A random variable X with probability density function
$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}, \quad -\infty < x < \infty$$
with parameters -∞ < μ < ∞ and σ > 0 is a normal random variable.

When E(X) = μ = 0 and σ² = 1, the normal random variable is called a standard normal variable, and its distribution the standard normal distribution.

Some properties of the normal distribution:
1. The mode, which is the point on the horizontal axis where the curve is at its maximum, occurs at x = μ.
2. The curve is symmetric about the mean μ.
3. The total area under the curve and above the x-axis is equal to 1.

[Figure: Normal(10, 3.25)]

Recall that the cumulative distribution function (cdf), F(x), of a continuous random variable X with density function f(x) is
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t) \, dt$$

For the normal distribution this integral has no closed form, so its values have to be calculated numerically. That is why values of the cdf are given in tabular form. Moreover, it is impossible to tabulate values for every possible normal distribution, so only the values of the standard normal distribution are tabulated. Other normal distributions are transformed to the standard normal by
$$Z = \frac{X - \mu}{\sigma}$$

[Figure: standard normal distribution]
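In place of a printed table, the standardization step can be sketched in code. The following minimal example uses only the Python standard library and relies on the identity Φ(z) = (1 + erf(z/√2))/2, where `math.erf` is itself evaluated numerically by the library. The function names and the values μ = 10, σ = 2, x = 12 are assumptions chosen for illustration, not taken from the slides.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1), expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), obtained by standardizing first."""
    z = (x - mu) / sigma  # Z = (X - mu) / sigma
    return standard_normal_cdf(z)

# Hypothetical example: X ~ N(10, 2^2), probability that X is at most 12.
mu, sigma, x = 10.0, 2.0, 12.0
z = (x - mu) / sigma
prob = normal_cdf(x, mu, sigma)

print(f"z = {z:.2f}")
print(f"P(X <= {x}) = Phi({z:.2f}) = {prob:.4f}")
```

Here x = 12 standardizes to z = 1, so the result is the same value one would look up for z = 1.00 in a standard normal table, approximately 0.8413.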