Download Probability density functions

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Central limit theorem wikipedia , lookup

Transcript
Introductory Statistics Lectures
Probability density functions
The normal distribution
Anthony Tanbakuchi
Department of Mathematics
Pima Community College
Redistribution of this material is prohibited
without written permission of the author
© 2009
(Compile date: Tue May 19 14:49:36 2009)
Contents
1 Probability density functions
1.1 Introduction . . . . . . .
R tip of the day: graphing functions . .
Uniform distribution . .
Finding probabilities
from
density
functions . . . .
1.2 Cumulative distribution functions . . . . . .
Finding probabilities
using CDF’s . . .
1
1.1
1
1
1.3
1
2
3
5
1.4
1.5
6
Inverse cumulative distribution functions . . . . . .
Normal distribution . .
Standard normal distribution . . . . . .
Finding probabilities
involving
the
normal distribution . . . . . .
Examples . . . . . . . .
Summary . . . . . . . .
Additional Examples . .
8
9
10
11
11
12
13
Probability density functions
Introduction
R TIP OF THE DAY: GRAPHING FUNCTIONS
Graphing functions:
curve(expression, xmin, xmax)
expression an expression or function involving x
xmin min value of x to plot
xmax max value of x to plot
1
R Command
2 of 15
500
0
x^3
−1000
0
x^2
par ( mfrow = c ( 2 , 2 ) )
c u r v e ( x ˆ 2 , −10 , 1 0 )
c u r v e ( x ˆ 3 , −10 , 1 0 )
c u r v e ( s i n ( x ) , −2 ∗ pi , 2 ∗ p i )
c u r v e ( s i n ( p i ∗ x ) / ( p i ∗ x ) , −5, 5 )
20 40 60 80
R:
R:
R:
R:
R:
1.1 Introduction
−10
−5
0
5
10
−10
−5
0
10
0.6
−0.2
0.2
sin(pi * x)/(pi * x)
1.0
0.5
0.0
−1.0
sin(x)
5
x
1.0
x
−6
−4
−2
0
2
4
6
−4
x
−2
0
2
4
x
UNIFORM DISTRIBUTION
Definition 1.1
R Command
Uniform distribution f (x).
Occurs when the probability of a continuous random variable is equal
across a range of values.
Uniform density:
dunif(x, min=0, max=1)
Useful for graphing, not useful for directly finding probabilities.
In R, all PDF’s have a “d” prefix for density.
R: c u r v e ( d u n i f ( x , min = 2 , max = 6 ) , 0 , 8 , y l i m = c ( 0 ,
+
0 . 5 ) , y l a b = ” f ( x ) ” , main = ”Uniform D e n s i t y f ( x ) ” )
Anthony Tanbakuchi
MAT167
Probability density functions
3 of 15
0.3
0.2
0.0
0.1
f(x)
0.4
0.5
Uniform Density f(x)
0
2
4
6
8
x
Probability is area!
FINDING PROBABILITIES FROM DENSITY FUNCTIONS
Finding probabilities
Area represents probability!
Z
a
P (x < a) =
f (x) dx
(area to the left of a)
−∞
Z b
f (x) dx
P (a < x < b) =
(area between a and b)
a
Z
P (x > a) =
∞
f (x) dx
(area to the right of a)
a
Anthony Tanbakuchi
MAT167
4 of 15
1.1 Introduction
0.00
2
4
6
8
0
2
4
x
P(3<x<4)=0.25
P(x=3)=0
6
8
6
8
0.10
0.00
0.00
0.10
f(x)
0.20
x
0.20
0
f(x)
0.10
f(x)
0.10
0.00
f(x)
0.20
P(x>3)=0.75
0.20
P(x<3)=0.25
0
2
4
x
6
8
0
2
4
x
Finding probabilities for uniform density is easy: width × height.
Use the density below to answer the following question.
Anthony Tanbakuchi
MAT167
Probability density functions
5 of 15
0.10
0.00
f(x)
0.20
Uniform PDF
0
2
4
6
8
x
Question 1. Shade the region representing P (x < 5) and find the probability.
1.2
Cumulative distribution functions
Cumulative distribution function (cdf) F (x).
Gives the area to the left of x on the probability density function.
P (x < a0 ) = F (a0 )
Z a0
=
f (x) dx
Definition 1.2
(1)
(2)
−∞
F (x) is the tool for finding probabilities of continuous random
variables.
Uniform CDF:
punif(x, min=0, max=1)
Gives the area to the left of the uniform density at x.
In R, all CDF’s have a “p” prefix for probability.
R Command
Example 1. Find P (x < 5)
Anthony Tanbakuchi
MAT167
6 of 15
1.2 Cumulative distribution functions
Uniform CDF F(x)
0.8
0.6
●
0.0
0.00
0.2
0.4
F(x)
0.10
f(x)
0.20
1.0
Uniform PDF f(x)
0
2
4
6
8
0
2
x
4
6
8
x
R: p u n i f ( 5 , min = 2 , max = 6 )
[ 1 ] 0.75
FINDING PROBABILITIES USING CDF’S
What CDF gives us
Only give area to the left!
P (x < a) = F (a)
(area to the left of a)
P (x > a) =?
(area to the right of a)
P (a < x < b) =?
(area between a and b)
Using CDF to find P(x>a)
Example 2. Find P (x > 5).
Anthony Tanbakuchi
MAT167
Probability density functions
7 of 15
Uniform CDF F(x)
0.8
0.6
●
0.0
0.00
0.2
0.4
F(x)
0.10
f(x)
0.20
1.0
Uniform PDF f(x)
0
2
4
6
8
0
2
4
x
6
8
x
P (x > 5) = 1 − P (x < 5) = 1 − F (5)
R: 1 − p u n i f ( 5 , min = 2 , max = 6 )
[ 1 ] 0.25
Using CDF to find P(a<x<b)
Example 3. Find P (3 < x < 5).
0
2
4
x
6
8
0.20
0.00
0.05
0.10
f(x)
0.15
0.20
0.15
f(x)
0.10
0.05
0.00
0.00
0.05
0.10
f(x)
0.15
0.20
0.25
Shaded Area = F(5)−F(3)
0.25
Shaded Area = F(3)
0.25
Shaded Area = F(5)
0
2
4
6
8
x
0
2
4
6
8
x
P (3 < x < 5) = F (5) − F (3) (always subtract larger from smaller)
R: p u n i f ( 5 , min = 2 , max = 6 ) − p u n i f ( 3 , min = 2 ,
+
max = 6 )
[ 1 ] 0.5
Anthony Tanbakuchi
MAT167
8 of 15
1.2 Cumulative distribution functions
Finding probabilities with CDF’s
Using F(x) to find probabilities:
P (x < a) = F (a)
(area to the left of a)
P (x > a) = 1 − F (a)
(area to the right of a)
P (a < x < b) = F (b) − F (a)
(area between a and b)
You must know how to use this!
INVERSE CUMULATIVE DISTRIBUTION FUNCTIONS
Uniform inverse CDF:
qunif(p, min=0, max=1)
Finds x with area p to the left on the density function.
In R, all inverse CDF’s have a “q” prefix for quantile.
Using inverse CDF to find x given p
Example 4. Find x0 such that P (x < x0 ) = 0.75 (the value of x that has an
area to the left of 0.75).
0.10
0.20
Uniform PDF f(x)
0.00
R Command
Inverse cumulative distribution functions CDF−1 .
Finds the value of x that has an area p to the left. (Inverse operation
of CDF).
f(x)
Definition 1.3
0
2
4
6
8
x
R: q u n i f ( 0 . 7 5 , min = 2 , max = 6 )
[1] 5
Question 2. Find x0 such that P (x > x0 ) = 0.25 (the value of x that has an
area to the right of 0.25).
Anthony Tanbakuchi
MAT167
Probability density functions
1.3
9 of 15
Normal distribution
Normal probability density function f (x).
f (x | µ, σ) = √
1
2πσ 2
Definition 1.4
2
1 (x−µ)
σ2
· e− 2
(3)
characterized by µ and σ.
Occurs frequently in nature.
Normal density:
dnorm(x, mean=0, sd=1)
By default it is the standard normal density.
R Command
Visualizing the normal distribution
0.4
0.3
0.2
0.1
0.0
dnorm(x, mean = 0, sd = 1)
R: c u r v e ( dnorm ( x , mean = 0 , sd = 1 ) , −5, 5 )
−4
−2
0
2
4
x
Visualizing effect of µ, σ
Anthony Tanbakuchi
MAT167
10 of 15
1.3 Normal distribution
fnorm(x)
−10
0
10
20
−20
−10
0
10
x
fnorm(x,, µ = −5,, σ = 5)
fnorm(x,, µ = 15,, σ = 0.75)
20
fnorm(x)
0.0
0.2
0.0
0.2
0.4
x
0.4
−20
fnorm(x)
0.2
0.0
0.2
0.0
fnorm(x)
0.4
fnorm(x,, µ = 5,, σ = 2)
0.4
fnorm(x,, µ = 0,, σ = 1)
−20
−10
0
10
20
−20
−10
x
0
10
20
x
Area under curve is always 1.
STANDARD NORMAL DISTRIBUTION
Definition 1.5
Standard normal distribution f (z).
A normal distribution with µ = 0 and σ = 1. If you convert normally
distributed x data into z-scores, you will have a standard normal distribution.
Since there are an infinite set of normal distributions, historically we converted x to z and then only had one standard normal distribution and one
standard normal cumulative distribution F (z). A single table of F (z) could
then be used to solve most probability questions involving normal distributions.
With computers, we can directly use any specific normal cumulative distriAnthony Tanbakuchi
MAT167
Probability density functions
11 of 15
bution F (x) and very accurately find probabilities.
0.3
0.2
0.0
0.1
fstd(z)
0.4
0.5
Standard normal distribution
−6
−4
−2
0
2
4
6
z
FINDING PROBABILITIES INVOLVING THE NORMAL DISTRIBUTION
Normal CDF:
pnorm(x, mean=0, sd=1)
Gives the area to the left of the normal density at x.
R Command
Normal inverse CDF:
qnorm(p, mean=0, sd=1)
Finds x with area p to the left on the density function.
R Command
Tips for solving probabilities involving normal dist.
1. Determine µ and σ.
2. Sketch the PDF & area representing probability.
3. If asked to find probability use CDF: R function pnorm(x,...)
to
find probability p.
4. If asked to find value of x corresponding to probability use CDF−1 : R
function qnorm(p,...) to find the value of x.
5. If working with upper tail be sure to take compliment!
Be careful, if you want to find the value of x that has an area p to the
right you need to use qnorm(1-p, ...) .
EXAMPLES
Example 5. Given µ = 2 and σ = 1, find P (x > 3).
Anthony Tanbakuchi
MAT167
12 of 15
1.4 Summary
Normal CDF F(x)
1.0
0.4
Normal PDF f(x), P(x<3)
0.8
0.6
0.0
0.0
0.2
0.4
F(x)
0.3
0.2
0.1
f(x)
●
−4
0
2
4
6
8
x
−4
0
2
4
6
8
x
P (x > 3) = 1 − F (3)
R: p = 1 − pnorm ( 3 , mean = 2 , sd = 1 )
R: p
[ 1 ] 0.15866
Thus, P (x > 3) = 0.159
Now lets look at the inverse problem:
Example 6. Given µ = 2 and σ = 1, what value of x0 satisfies P (x > x0 ) =
0.159?
x0 = F −1 (1 − 0.159)
R: p
[ 1 ] 0.15866
R: x = qnorm ( 1 − p , mean = 2 , sd = 1 )
R: x
[1] 3
Note where we had to take the compliment!
Thus, the value of x that has an area 0.159 to the right is 3!
1.4
Summary
• For discrete random variables, probability is given by the distribution
p = P (xi ). (“d” prefix in R.)
– For the binomial distribution, the probability of a specific number
of successes x is p = dbinom(x,n,p) .
• For continuous variables, probability is area on density f (x).
– Use CDF’s F (x) to find probabilities. (“p” prefix in R)
P (x < x0 ) = F (x0 )
(area to the left of x0 )
P (x > x0 ) = 1 − F (x0 )
(area to the right of x0 )
P (a < x < b) = F (b) − F (a)
(area between a and b)
Anthony Tanbakuchi
MAT167
Probability density functions
13 of 15
– Use inverse CDF’s F −1 (p) to find specific value of x0 in p = P (x <
x0 ) given probability p. (“q” prefix in R)
x0 = F −1 (p)
For the normal distribution:
• CDF p = F (x): p=pnorm(x, mean=0, sd=1)
P (x < x0 ) = pnorm(x’,...)
P (x > x0 ) = 1 − pnorm(x’,...)
P (a < x < b) = pnorm(b,...) − pnorm(a,...)
where “. . . ” is “ mean=µ, sd=σ ”.
• Use inverse CDF’s x = F −1 (p) to find x given probability p. (“q” prefix
in R)
To find x0 in p = P (x < x0 ): x’=qnorm(p, ...)
To find x0 in p = P (x > x0 ): x’=qnorm(1-p, ...)
For continuous variables in general:
• Carefully determine location of area: to left, to right, interval.
• Always make a sketch when doing problems.
• CDF assumes areas to the left. Take the compliment when finding upper
tail!
1.5
Additional Examples
Given the following density function on the left and it’s corresponding CDF for
the χ2 distribution, answer the following questions.
Chi−Squared CDF
0.8
0.6
0.0
0.00
0.2
0.4
F(x)
0.08
0.04
f(x)
0.12
1.0
Chi−Squared Density
0
5
10
15
x
20
25
0
5
10
15
20
25
x
Question 3. Find P (x > 10)
Anthony Tanbakuchi
MAT167
14 of 15
1.5 Additional Examples
Question 4. Find P25
The SAT-I scores for females is normally distributes with a mean of 998
and a standard deviation of 202 (based on data from the college board).
Question 5. If a female is randomly selected, what is the probability that her
score is greater than 1100?
Question 6. What would the score be for P75 ?
Question 7. What proportion of students scored between 500-1100?
Replacement times for CD players are normally distributed with a mean of
7.1 years and a standard deviation of 1.4 years.
Question 8. Find the probability that a randomly selected CD player will have
a replacement time less than 8 years.
Question 9. If you want to provide a warranty so that on only 2% of the CD
players will be replaced before the warranty expires, what is the time length of
the warranty?
Anthony Tanbakuchi
MAT167
Probability density functions
15 of 15
Anthony Tanbakuchi
MAT167