Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Introductory Statistics Lectures Probability density functions The normal distribution Anthony Tanbakuchi Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the author © 2009 (Compile date: Tue May 19 14:49:36 2009) Contents 1 Probability density functions 1.1 Introduction . . . . . . . R tip of the day: graphing functions . . Uniform distribution . . Finding probabilities from density functions . . . . 1.2 Cumulative distribution functions . . . . . . Finding probabilities using CDF’s . . . 1 1.1 1 1 1.3 1 2 3 5 1.4 1.5 6 Inverse cumulative distribution functions . . . . . . Normal distribution . . Standard normal distribution . . . . . . Finding probabilities involving the normal distribution . . . . . . Examples . . . . . . . . Summary . . . . . . . . Additional Examples . . 8 9 10 11 11 12 13 Probability density functions Introduction R TIP OF THE DAY: GRAPHING FUNCTIONS Graphing functions: curve(expression, xmin, xmax) expression an expression or function involving x xmin min value of x to plot xmax max value of x to plot 1 R Command 2 of 15 500 0 x^3 −1000 0 x^2 par ( mfrow = c ( 2 , 2 ) ) c u r v e ( x ˆ 2 , −10 , 1 0 ) c u r v e ( x ˆ 3 , −10 , 1 0 ) c u r v e ( s i n ( x ) , −2 ∗ pi , 2 ∗ p i ) c u r v e ( s i n ( p i ∗ x ) / ( p i ∗ x ) , −5, 5 ) 20 40 60 80 R: R: R: R: R: 1.1 Introduction −10 −5 0 5 10 −10 −5 0 10 0.6 −0.2 0.2 sin(pi * x)/(pi * x) 1.0 0.5 0.0 −1.0 sin(x) 5 x 1.0 x −6 −4 −2 0 2 4 6 −4 x −2 0 2 4 x UNIFORM DISTRIBUTION Definition 1.1 R Command Uniform distribution f (x). Occurs when the probability of a continuous random variable is equal across a range of values. Uniform density: dunif(x, min=0, max=1) Useful for graphing, not useful for directly finding probabilities. In R, all PDF’s have a “d” prefix for density. R: c u r v e ( d u n i f ( x , min = 2 , max = 6 ) , 0 , 8 , y l i m = c ( 0 , + 0 . 5 ) , y l a b = ” f ( x ) ” , main = ”Uniform D e n s i t y f ( x ) ” ) Anthony Tanbakuchi MAT167 Probability density functions 3 of 15 0.3 0.2 0.0 0.1 f(x) 0.4 0.5 Uniform Density f(x) 0 2 4 6 8 x Probability is area! FINDING PROBABILITIES FROM DENSITY FUNCTIONS Finding probabilities Area represents probability! Z a P (x < a) = f (x) dx (area to the left of a) −∞ Z b f (x) dx P (a < x < b) = (area between a and b) a Z P (x > a) = ∞ f (x) dx (area to the right of a) a Anthony Tanbakuchi MAT167 4 of 15 1.1 Introduction 0.00 2 4 6 8 0 2 4 x P(3<x<4)=0.25 P(x=3)=0 6 8 6 8 0.10 0.00 0.00 0.10 f(x) 0.20 x 0.20 0 f(x) 0.10 f(x) 0.10 0.00 f(x) 0.20 P(x>3)=0.75 0.20 P(x<3)=0.25 0 2 4 x 6 8 0 2 4 x Finding probabilities for uniform density is easy: width × height. Use the density below to answer the following question. Anthony Tanbakuchi MAT167 Probability density functions 5 of 15 0.10 0.00 f(x) 0.20 Uniform PDF 0 2 4 6 8 x Question 1. Shade the region representing P (x < 5) and find the probability. 1.2 Cumulative distribution functions Cumulative distribution function (cdf) F (x). Gives the area to the left of x on the probability density function. P (x < a0 ) = F (a0 ) Z a0 = f (x) dx Definition 1.2 (1) (2) −∞ F (x) is the tool for finding probabilities of continuous random variables. Uniform CDF: punif(x, min=0, max=1) Gives the area to the left of the uniform density at x. In R, all CDF’s have a “p” prefix for probability. R Command Example 1. Find P (x < 5) Anthony Tanbakuchi MAT167 6 of 15 1.2 Cumulative distribution functions Uniform CDF F(x) 0.8 0.6 ● 0.0 0.00 0.2 0.4 F(x) 0.10 f(x) 0.20 1.0 Uniform PDF f(x) 0 2 4 6 8 0 2 x 4 6 8 x R: p u n i f ( 5 , min = 2 , max = 6 ) [ 1 ] 0.75 FINDING PROBABILITIES USING CDF’S What CDF gives us Only give area to the left! P (x < a) = F (a) (area to the left of a) P (x > a) =? (area to the right of a) P (a < x < b) =? (area between a and b) Using CDF to find P(x>a) Example 2. Find P (x > 5). Anthony Tanbakuchi MAT167 Probability density functions 7 of 15 Uniform CDF F(x) 0.8 0.6 ● 0.0 0.00 0.2 0.4 F(x) 0.10 f(x) 0.20 1.0 Uniform PDF f(x) 0 2 4 6 8 0 2 4 x 6 8 x P (x > 5) = 1 − P (x < 5) = 1 − F (5) R: 1 − p u n i f ( 5 , min = 2 , max = 6 ) [ 1 ] 0.25 Using CDF to find P(a<x<b) Example 3. Find P (3 < x < 5). 0 2 4 x 6 8 0.20 0.00 0.05 0.10 f(x) 0.15 0.20 0.15 f(x) 0.10 0.05 0.00 0.00 0.05 0.10 f(x) 0.15 0.20 0.25 Shaded Area = F(5)−F(3) 0.25 Shaded Area = F(3) 0.25 Shaded Area = F(5) 0 2 4 6 8 x 0 2 4 6 8 x P (3 < x < 5) = F (5) − F (3) (always subtract larger from smaller) R: p u n i f ( 5 , min = 2 , max = 6 ) − p u n i f ( 3 , min = 2 , + max = 6 ) [ 1 ] 0.5 Anthony Tanbakuchi MAT167 8 of 15 1.2 Cumulative distribution functions Finding probabilities with CDF’s Using F(x) to find probabilities: P (x < a) = F (a) (area to the left of a) P (x > a) = 1 − F (a) (area to the right of a) P (a < x < b) = F (b) − F (a) (area between a and b) You must know how to use this! INVERSE CUMULATIVE DISTRIBUTION FUNCTIONS Uniform inverse CDF: qunif(p, min=0, max=1) Finds x with area p to the left on the density function. In R, all inverse CDF’s have a “q” prefix for quantile. Using inverse CDF to find x given p Example 4. Find x0 such that P (x < x0 ) = 0.75 (the value of x that has an area to the left of 0.75). 0.10 0.20 Uniform PDF f(x) 0.00 R Command Inverse cumulative distribution functions CDF−1 . Finds the value of x that has an area p to the left. (Inverse operation of CDF). f(x) Definition 1.3 0 2 4 6 8 x R: q u n i f ( 0 . 7 5 , min = 2 , max = 6 ) [1] 5 Question 2. Find x0 such that P (x > x0 ) = 0.25 (the value of x that has an area to the right of 0.25). Anthony Tanbakuchi MAT167 Probability density functions 1.3 9 of 15 Normal distribution Normal probability density function f (x). f (x | µ, σ) = √ 1 2πσ 2 Definition 1.4 2 1 (x−µ) σ2 · e− 2 (3) characterized by µ and σ. Occurs frequently in nature. Normal density: dnorm(x, mean=0, sd=1) By default it is the standard normal density. R Command Visualizing the normal distribution 0.4 0.3 0.2 0.1 0.0 dnorm(x, mean = 0, sd = 1) R: c u r v e ( dnorm ( x , mean = 0 , sd = 1 ) , −5, 5 ) −4 −2 0 2 4 x Visualizing effect of µ, σ Anthony Tanbakuchi MAT167 10 of 15 1.3 Normal distribution fnorm(x) −10 0 10 20 −20 −10 0 10 x fnorm(x,, µ = −5,, σ = 5) fnorm(x,, µ = 15,, σ = 0.75) 20 fnorm(x) 0.0 0.2 0.0 0.2 0.4 x 0.4 −20 fnorm(x) 0.2 0.0 0.2 0.0 fnorm(x) 0.4 fnorm(x,, µ = 5,, σ = 2) 0.4 fnorm(x,, µ = 0,, σ = 1) −20 −10 0 10 20 −20 −10 x 0 10 20 x Area under curve is always 1. STANDARD NORMAL DISTRIBUTION Definition 1.5 Standard normal distribution f (z). A normal distribution with µ = 0 and σ = 1. If you convert normally distributed x data into z-scores, you will have a standard normal distribution. Since there are an infinite set of normal distributions, historically we converted x to z and then only had one standard normal distribution and one standard normal cumulative distribution F (z). A single table of F (z) could then be used to solve most probability questions involving normal distributions. With computers, we can directly use any specific normal cumulative distriAnthony Tanbakuchi MAT167 Probability density functions 11 of 15 bution F (x) and very accurately find probabilities. 0.3 0.2 0.0 0.1 fstd(z) 0.4 0.5 Standard normal distribution −6 −4 −2 0 2 4 6 z FINDING PROBABILITIES INVOLVING THE NORMAL DISTRIBUTION Normal CDF: pnorm(x, mean=0, sd=1) Gives the area to the left of the normal density at x. R Command Normal inverse CDF: qnorm(p, mean=0, sd=1) Finds x with area p to the left on the density function. R Command Tips for solving probabilities involving normal dist. 1. Determine µ and σ. 2. Sketch the PDF & area representing probability. 3. If asked to find probability use CDF: R function pnorm(x,...) to find probability p. 4. If asked to find value of x corresponding to probability use CDF−1 : R function qnorm(p,...) to find the value of x. 5. If working with upper tail be sure to take compliment! Be careful, if you want to find the value of x that has an area p to the right you need to use qnorm(1-p, ...) . EXAMPLES Example 5. Given µ = 2 and σ = 1, find P (x > 3). Anthony Tanbakuchi MAT167 12 of 15 1.4 Summary Normal CDF F(x) 1.0 0.4 Normal PDF f(x), P(x<3) 0.8 0.6 0.0 0.0 0.2 0.4 F(x) 0.3 0.2 0.1 f(x) ● −4 0 2 4 6 8 x −4 0 2 4 6 8 x P (x > 3) = 1 − F (3) R: p = 1 − pnorm ( 3 , mean = 2 , sd = 1 ) R: p [ 1 ] 0.15866 Thus, P (x > 3) = 0.159 Now lets look at the inverse problem: Example 6. Given µ = 2 and σ = 1, what value of x0 satisfies P (x > x0 ) = 0.159? x0 = F −1 (1 − 0.159) R: p [ 1 ] 0.15866 R: x = qnorm ( 1 − p , mean = 2 , sd = 1 ) R: x [1] 3 Note where we had to take the compliment! Thus, the value of x that has an area 0.159 to the right is 3! 1.4 Summary • For discrete random variables, probability is given by the distribution p = P (xi ). (“d” prefix in R.) – For the binomial distribution, the probability of a specific number of successes x is p = dbinom(x,n,p) . • For continuous variables, probability is area on density f (x). – Use CDF’s F (x) to find probabilities. (“p” prefix in R) P (x < x0 ) = F (x0 ) (area to the left of x0 ) P (x > x0 ) = 1 − F (x0 ) (area to the right of x0 ) P (a < x < b) = F (b) − F (a) (area between a and b) Anthony Tanbakuchi MAT167 Probability density functions 13 of 15 – Use inverse CDF’s F −1 (p) to find specific value of x0 in p = P (x < x0 ) given probability p. (“q” prefix in R) x0 = F −1 (p) For the normal distribution: • CDF p = F (x): p=pnorm(x, mean=0, sd=1) P (x < x0 ) = pnorm(x’,...) P (x > x0 ) = 1 − pnorm(x’,...) P (a < x < b) = pnorm(b,...) − pnorm(a,...) where “. . . ” is “ mean=µ, sd=σ ”. • Use inverse CDF’s x = F −1 (p) to find x given probability p. (“q” prefix in R) To find x0 in p = P (x < x0 ): x’=qnorm(p, ...) To find x0 in p = P (x > x0 ): x’=qnorm(1-p, ...) For continuous variables in general: • Carefully determine location of area: to left, to right, interval. • Always make a sketch when doing problems. • CDF assumes areas to the left. Take the compliment when finding upper tail! 1.5 Additional Examples Given the following density function on the left and it’s corresponding CDF for the χ2 distribution, answer the following questions. Chi−Squared CDF 0.8 0.6 0.0 0.00 0.2 0.4 F(x) 0.08 0.04 f(x) 0.12 1.0 Chi−Squared Density 0 5 10 15 x 20 25 0 5 10 15 20 25 x Question 3. Find P (x > 10) Anthony Tanbakuchi MAT167 14 of 15 1.5 Additional Examples Question 4. Find P25 The SAT-I scores for females is normally distributes with a mean of 998 and a standard deviation of 202 (based on data from the college board). Question 5. If a female is randomly selected, what is the probability that her score is greater than 1100? Question 6. What would the score be for P75 ? Question 7. What proportion of students scored between 500-1100? Replacement times for CD players are normally distributed with a mean of 7.1 years and a standard deviation of 1.4 years. Question 8. Find the probability that a randomly selected CD player will have a replacement time less than 8 years. Question 9. If you want to provide a warranty so that on only 2% of the CD players will be replaced before the warranty expires, what is the time length of the warranty? Anthony Tanbakuchi MAT167 Probability density functions 15 of 15 Anthony Tanbakuchi MAT167