PHP2510: Principles of Biostatistics & Data Analysis
Lecture V: Continuous Random Variables
PHP 2510 – Lec 5: continuous RV

Consider a random variable X that can take values 0 and 1 with equal probability. What distribution does X have?

What if X can take values 0, .5, and 1, each with equal probability?

$$P(X = 0) = P(X = .5) = P(X = 1) = 1/3$$

What if X can take values 0, .1, .2, ..., .9, 1.0?

$$P(X = 0) = P(X = .1) = P(X = .2) = \dots = P(X = .9) = P(X = 1) = 1/11$$

What if X can take any value between 0 and 1, each with the same chance?

[Figure: four panels, horizontal axes from −0.5 to 1.5.]

For continuous random variables, we use a probability density function (PDF) instead of a probability mass function to describe the distribution.

For discrete random variables, the sum of all probability mass is 1:

$$\sum_{x \in \Omega} P(X = x) = 1$$

For continuous random variables we replace the "sum" with integration, so that the "area under the curve" is 1. For Unif(0,1),

$$f(x) = \begin{cases} 1 & \text{for } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1$$

In general, for the uniform distribution between a and b, Unif(a, b),

$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases} \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1$$

Unlike a mass function, a density function can take values greater than 1 (for example, Unif(0, .5) has density 2 on its interval); therefore it should not be interpreted as a probability (more on this later).

A continuous random variable X can take values over a continuum. In many cases this will be an interval on the real line $\mathbb{R} = (-\infty, +\infty)$.

Properties of density functions

Let f(x) be the density function for a random variable X. Then

1. Probabilities are calculated in terms of area under the curve:
$$P(k_1 < X < k_2) = \int_{k_1}^{k_2} f(x)\,dx$$
2. The density function integrates to 1 over the real line $\mathbb{R}$:
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
3. The cumulative distribution function is
$$F(k) = P(X < k) = P(X \le k) = \int_{-\infty}^{k} f(x)\,dx$$

These ideas are best illustrated with examples.

For example, Unif(5,25) has probability density function $f(x) = 1/20 = .05$ for $5 \le x \le 25$ and 0 otherwise.

[Figure: density of Unif(5,25), horizontal axis from 0 to 30.]

$$P(X < 10) = \int_{-\infty}^{10} f(x)\,dx = \int_{5}^{10} .05\,dx = 5 \times .05 = .25$$

$$P(13 < X < 17) = \int_{13}^{17} f(x)\,dx = \int_{13}^{17} .05\,dx = (17 - 13) \times .05 = .20$$

Notice that $P(X = 10) = \int_{10}^{10} .05\,dx = 0$. In fact, the probability that a continuous random variable X takes on any particular value is 0.

Example. Suppose X has a uniform distribution on the interval [0, 5]. Find the following:
• P(X < 2)
• P(1.5 < X < 2)
• The cumulative distribution function (CDF)
$$F(x) = P(X < x) = \int_{-\infty}^{x} f(u)\,du$$

[Figure: four panels titled "Uniform (0,5)", "P(X<2)", "P(1.5<X<2)", and "P(X<k)"; the first three plot the density f(x) against x from 0 to 5, the last plots P(X<k) against k from 0 to 5.]

Uniform distribution

$$f(x) = \frac{1}{5} \quad \text{if } x \in [0, 5]$$

$$P(X < 2) = \int_{0}^{2} \frac{1}{5}\,dx = \frac{2}{5}$$

$$P(1.5 < X < 2) = \int_{1.5}^{2} \frac{1}{5}\,dx = \frac{.5}{5} = .1$$

$$F(k) = P(X \le k) = \int_{0}^{k} \frac{1}{5}\,dx = \frac{k}{5} \quad \text{for } 0 \le k \le 5$$

What is the average value of this uniform R.V.?

Normal distribution

A random variable X having a normal distribution is characterized by its mean µ and variance σ². Its density function is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

(notice here that π represents the constant 3.14159...).

The probability density function (pdf) of the normal distribution is a bell-shaped curve. Two parameters determine the distribution, one for its location (mean) and one for its spread (variance). We will formally introduce the definition of mean and variance for a random variable later.

Properties of the normal distribution
• Takes values from −∞ to ∞: Ω = (−∞, ∞)
• Symmetric about its mean
• Mean = median = mode
• Like all other distributions, the area under the pdf curve is 1

[Figure: density of the normal distribution N(µ, σ²), centered at µ with widths 2σ and 4σ marked.]

All normal distributions have the same shape:

[Figure: four normal densities, each with µ, 2σ, and 4σ marked, plotted for x from −6 to 6. Panel titles: "mean=0, sd=1", "mean=0, sd=2", "mean=−1, sd=1.5", "mean=1, sd=.5".]

Calculating probabilities: The probability that X falls in any interval (a, b) is the area under the curve between a and b.

[Figure: normal density with the area between a and b shaded.]

$$P(a \le X \le b) = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$$

The standard normal distribution has mean zero and variance one. A random variable having the standard normal distribution is usually denoted by Z ∼ N(0, 1).
• Its density function is
$$f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$$
• Tail areas are probabilities:
$$P(Z > z) = \int_{z}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right) du$$
• Tail areas are difficult to compute by hand but are tabulated (Table A1, p. 744 in SMMR; Table 2 in Rice).

Table for standard normal distribution probabilities: The integral is not easy to compute by hand, but many books provide tables. In your textbook, the area under the curve for the upper tail is given. That is, you can look up P(Z > z0) for any z0.

[Figure: standard normal density with the upper tail beyond z0 shaded.]

Notice that not all normal tables give you P(Z > z0). Some give the lower tail P(Z < z0) (Rice Table 2), some give P(0 < Z < z0), and some may even give P(−z0 < Z < z0). It is important to read the description.
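When no table is at hand, the same areas can be computed numerically. The following is a minimal Python sketch, not part of the original lecture. It assumes only the standard library, and the helper names (upper_tail, lower_tail, zero_to_z, central_area) are hypothetical. It relies on the standard identity P(Z > z) = erfc(z/√2)/2 and returns the four quantities that different tables report.

```python
import math

def upper_tail(z):
    """P(Z > z) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def lower_tail(z):
    """P(Z < z) = 1 - P(Z > z)."""
    return 1.0 - upper_tail(z)

def zero_to_z(z):
    """P(0 < Z < z) = .5 - P(Z > z), valid for z >= 0."""
    return 0.5 - upper_tail(z)

def central_area(z):
    """P(-z < Z < z) = 1 - 2 P(Z > z), valid for z >= 0."""
    return 1.0 - 2.0 * upper_tail(z)

z0 = 1.0  # illustrative value, not taken from the lecture
print(round(upper_tail(z0), 4))    # about 0.1587
print(round(lower_tail(z0), 4))    # about 0.8413
print(round(zero_to_z(z0), 4))     # about 0.3413
print(round(central_area(z0), 4))  # about 0.6827
```

All four functions describe the same curve, so any one table is enough to recover the others.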
The symmetry of the normal distribution allows you to calculate P(a ≤ Z ≤ b) for any interval from whichever table is provided.

$$P(Z > z_0) = P(Z < -z_0)$$

[Figure: standard normal density with both tails, below −z0 and above z0, shaded.]

For any z0 > 0,

$$P(0 < Z < z_0) = .5 - P(Z > z_0)$$

[Figure: standard normal density with the area between 0 and z0 shaded.]

For example, to calculate P(−2.02 < Z < 1.53), we read off the table:

1.53 → P(Z > 1.53) = .063
2.02 → P(Z > 2.02) = P(Z < −2.02) = .022

[Figure: standard normal density with the area between −2.02 and 1.53 shaded.]

You can calculate

$$P(-2.02 \le Z \le 1.53) = P(-2.02 \le Z \le 0) + P(0 \le Z \le 1.53) = (.5 - .022) + (.5 - .063) = .915$$

OR

$$P(-2.02 \le Z \le 1.53) = P(Z \le 1.53) - P(Z \le -2.02) = (1 - .063) - .022 = .915$$

What about non-standard normal distributions?

Recall that all normal distributions have the same shape and all necessarily have total area under the curve equal to 1. Therefore we can "shift" and "scale" any normal distribution: if a random variable X follows a normal distribution with mean µ and standard deviation σ, then

$$Z = \frac{X - \mu}{\sigma}$$

has the standard normal distribution with mean 0 and standard deviation 1.

Example: Suppose infant birthweight follows a normal distribution with mean 3000 grams and standard deviation 1000 grams.

• What is the probability of an infant weighing more than 5000 g?
$$P(X > 5000) = P\left(\frac{X - 3000}{1000} > \frac{5000 - 3000}{1000}\right) = P(Z > 2) = .0228$$

• What is the probability of an infant weighing less than 3500 g?
$$P(X < 3500) = P\left(\frac{X - 3000}{1000} < \frac{3500 - 3000}{1000}\right) = P(Z < .5) = 1 - P(Z > .5) = 1 - .3085 = .6915$$

• What is the probability of an infant weighing between 2500 and 4000 g?
$$P(2500 < X < 4000) = P\left(\frac{2500 - 3000}{1000} < \frac{X - 3000}{1000} < \frac{4000 - 3000}{1000}\right) = P(-.5 < Z < 1)$$
$$= 1 - P(Z > 1) - P(Z < -.5) = 1 - .1587 - .3085 = .5328$$

Now a more challenging problem: What is the range of birthweight for the middle 80% of babies?

[Figure: normal density with µ = 3000; the central 80% lies between a and b, with 10% in each tail.]

That is, we want to find a and b, symmetric around the mean 3000, such that P(a < X < b) = 80%.

Since P(a < X < b) = .8, we have P(X > b) + P(X < a) = .2, and the symmetry of the normal distribution implies P(X > b) = .1.

From the normal table we find P(Z > 1.28) = .100. We also know that

$$P(X > x) = P\left(\frac{X - 3000}{1000} > \frac{x - 3000}{1000}\right) = P\left(Z > \frac{x - 3000}{1000}\right) = .100$$

Thus

$$\frac{x - 3000}{1000} = 1.28 \implies x = 4280,$$

so b = 4280. Can you find a?

In general, if a random variable X has a normal distribution with mean µ and standard deviation σ, then Y = aX + b (with a ≠ 0) is also normally distributed, with mean aµ + b and standard deviation |a|σ.

Standard normal distribution

Example: distribution of heights

Let X denote the height in cm of men randomly sampled from some population. Suppose the mean height is 172.5 cm and the standard deviation is 6.25 cm:

$$\mu = 172.5, \quad \sigma^2 = 6.25^2$$

We write $X \sim N(172.5, 6.25^2)$.

[Figure: density of X plotted against Height, from 140 to 200 cm.]

The probability that a randomly chosen man

• is taller than 180 cm:
$$P(X > 180) = P\left(\frac{X - 172.5}{6.25} > \frac{180 - 172.5}{6.25}\right) = P(Z > 1.2) = .115$$

• is shorter than 180 cm:
$$P(X < 180) = \dots = .885$$

• has height between 165 and 175 cm:
$$P(165 < X < 175) = P\left(\frac{165 - 172.5}{6.25} < Z < \frac{175 - 172.5}{6.25}\right) = P(-1.2 < Z < .40) = .54$$

• What is the 90th percentile for heights?
Find q such that

$$P(X < q) = P\left(\frac{X - 172.5}{6.25} < \frac{q - 172.5}{6.25}\right) = P\left(Z < \frac{q - 172.5}{6.25}\right) = .9$$

We know P(Z < 1.28) = .9 from the table, so

$$q = 1.28 \times 6.25 + 172.5 = 180.5 \approx 181$$

Now let Y be the height measured in inches when men wear shoes that add 2 inches to their height. Since 1 inch = 2.54 cm,

$$Y = X/2.54 + 2$$

The mean and standard deviation of Y are

$$172.5/2.54 + 2 = 69.91339, \qquad 6.25/2.54 = 2.46063,$$

so $Y \sim N(69.91339, 2.46063^2)$.

• Someone being taller than 180 cm means that, when he wears shoes that add 2 inches, he is taller than 180/2.54 + 2 = 72.86614 inches:
$$P(Y > 72.86614) = P\left(\frac{Y - 69.91339}{2.46063} > \frac{72.86614 - 69.91339}{2.46063}\right) = P(Z > 1.20)$$

• The probability that a randomly chosen man is shorter than 65 inches when he wears shoes that add 2 inches, P(Y < 65), is the same as the probability that a randomly chosen man is shorter than (65 − 2) × 2.54 = 160.02 cm:
$$P(Y < 65) = P(X < 160.02)$$

• Both statements above mean that this man is 1.9968 standard deviations below the average:
$$(65 - 69.91339)/2.46063 = (160.02 - 172.5)/6.25 = -1.9968$$

The normal (Gaussian) distribution is useful for describing the behavior of a continuous random variable.
• Sometimes the normal model is used to describe a single random variable such as age, weight, blood pressure, IQ score, etc.
• The distribution is symmetric and shaped like a bell (hence "bell-shaped curve").
• It can also be used to describe the distribution of certain summary statistics, such as a sample mean. In fact, the normal distribution is used far more often for applications of this type than as a model for a single random variable.

Exponential distribution

The exponential distribution is useful for modeling waiting times on a continuous scale.
It is characterized by a parameter θ, which represents the average waiting time, or equivalently by the parameter λ = 1/θ, which represents the rate. Its probability density function (PDF) and cumulative distribution function (CDF) are

$$f(x) = \frac{1}{\theta}\, e^{-x/\theta} = \lambda e^{-\lambda x} \quad \text{for } x \ge 0 \quad (f(x) = 0 \text{ for } x < 0)$$

$$F(x) = \int_{-\infty}^{x} f(u)\,du = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

[Figure: exponential PDF (dexp) and CDF (pexp) plotted for x from 0 to 10, with λ = 2, 1, and 0.5.]

Memoryless property of the exponential distribution

The exponential distribution is often used to model waiting times. If the r.v. T describes the time until an event happens, and T follows an exponential distribution, then

$$\begin{aligned}
P(T > t + s \mid T > s) &= \frac{P(T > t + s \text{ and } T > s)}{P(T > s)} \\
&= \frac{P(T > t + s)}{P(T > s)} \\
&= \frac{1 - F(t + s)}{1 - F(s)} \\
&= \frac{e^{-\lambda (t + s)}}{e^{-\lambda s}} \\
&= e^{-\lambda t} = P(T > t)
\end{aligned}$$

Given that one has already waited for time s, the probability that the event takes more than an additional time t to happen is the same as the unconditional probability of waiting longer than t.

Relationship between the Poisson and exponential distributions: Suppose events occur in time as a Poisson process with rate λ (the mean number of events per unit time). Let X(t) represent the number of events that happen in the interval (t0, t0 + t), so that X(t) is Poisson with mean λt, and let T be the waiting time from t0 until the next event. Then T > t exactly when no event occurs in the interval, so

$$P(T > t) = P(X(t) = 0) = e^{-\lambda t}$$
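The memoryless property and the Poisson connection can be checked numerically. The sketch below is an illustration rather than part of the original lecture. It uses only Python's standard library, the function names are hypothetical, and the values of λ, s, and t are arbitrary. It assumes that the number of events in an interval of length t is Poisson with mean λt.

```python
import math

def exp_survival(t, lam):
    """P(T > t) = 1 - F(t) = exp(-lam * t) for t >= 0, and 1 for t < 0."""
    return math.exp(-lam * t) if t >= 0 else 1.0

def conditional_survival(t, s, lam):
    """P(T > t + s | T > s) = P(T > t + s) / P(T > s)."""
    return exp_survival(t + s, lam) / exp_survival(s, lam)

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

lam, s, t = 2.0, 1.5, 0.7  # illustrative values, not from the lecture

# Memoryless property: the two numbers agree up to floating-point rounding.
print(conditional_survival(t, s, lam))
print(exp_survival(t, lam))

# Poisson link: P(no events in a window of length t) is also exp(-lam * t).
print(poisson_pmf(0, lam * t))
```

Changing s leaves the first printed value unchanged, which is exactly what "memoryless" means.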
