Stats 245.3(02) Review

Summarizing Data

Graphical Methods
• Histogram (built from a grouped frequency table)
• Stem-and-leaf diagram
• Box-and-whisker plot

[Figure: histogram, stem-and-leaf diagram, and grouped frequency table of the Verbal IQ and Math IQ data, grouped in the intervals 70 to 80, 80 to 90, ..., 120 to 130, together with a box-whisker plot.]

Summary (Numerical) Measures

Measures of Central Location
1. Mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$ (the centre of gravity of the data)
2. Median: the "middle" observation

Measures of Non-Central Location
1. Percentiles
2. Quartiles:
   1. Lower quartile (Q1) (25th percentile) (lower mid-hinge)
   2. Median (Q2) (50th percentile) (hinge)
   3. Upper quartile (Q3) (75th percentile) (upper mid-hinge)

Measures of Variability (Dispersion, Spread)
1. Range: R = max − min
2. Inter-quartile range: IQR = Q3 − Q1
3. Variance, standard deviation
4. Pseudo-standard deviation

The Sample Variance
Is defined as the quantity
$$s^2 = \frac{\sum_{i=1}^{n} d_i^2}{n-1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
and is denoted by the symbol $s^2$.

The Sample Standard Deviation, s
Definition: the sample standard deviation is defined by
$$s = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
Hence the sample standard deviation, s, is the square root of the sample variance.

Interpretations of s
• In Normal distributions:
  – Approximately 2/3 of the observations will lie within one standard deviation of the mean.
  – Approximately 95% of the observations lie within two standard deviations of the mean.
  – In a histogram of the Normal distribution, the standard deviation is approximately the distance from the mode to the inflection point.

Computing formulae for s and s²
The sum of squared deviations from the mean can also be computed using the identity
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
Then
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\!/\,n}{n-1} \qquad\text{and}\qquad s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\!/\,n}{n-1}}$$

A quick (rough) calculation of s
$$s \approx \frac{\text{Range}}{4}$$
The reason for this is that approximately all (95%) of the observations lie between $\bar{x} - 2s$ and $\bar{x} + 2s$. Thus $\max \approx \bar{x} + 2s$ and $\min \approx \bar{x} - 2s$, so
$$\text{Range} = \max - \min \approx (\bar{x} + 2s) - (\bar{x} - 2s) = 4s, \qquad\text{hence } s \approx \frac{\text{Range}}{4}.$$
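Below is a minimal Python sketch, assuming NumPy is available, that computes these summary measures for a small sample and checks the rough rule s ≈ Range/4 and the PSD defined next; the data values are invented for illustration.

```python
import numpy as np

# Hypothetical sample (illustrative values only)
x = np.array([82., 90., 94., 99., 104., 104., 105., 110., 115., 121.])

mean = x.mean()                        # centre of gravity
median = np.median(x)                  # "middle" observation
q1, q2, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1                          # inter-quartile range
rng = x.max() - x.min()                # range
s2 = x.var(ddof=1)                     # sample variance (divisor n - 1)
s = x.std(ddof=1)                      # sample standard deviation

print(f"mean={mean:.2f} median={median:.2f} Q1={q1:.2f} Q3={q3:.2f} IQR={iqr:.2f}")
print(f"s={s:.2f}  rough estimate Range/4={rng/4:.2f}  PSD=IQR/1.35={iqr/1.35:.2f}")
```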
The Pseudo Standard Deviation (PSD)
Definition: the pseudo standard deviation (PSD) is defined by
$$\text{PSD} = \frac{\text{IQR}}{1.35} = \frac{\text{inter-quartile range}}{1.35}$$

Properties
• For Normal distributions the pseudo standard deviation (PSD) and the standard deviation (s) will be approximately the same value.
• For leptokurtic distributions the standard deviation (s) will be larger than the pseudo standard deviation (PSD).
• For platykurtic distributions the standard deviation (s) will be smaller than the pseudo standard deviation (PSD).

Measures of Shape
• Skewness – based on the sum of cubes, $\sum_{i=1}^{n} (x_i - \bar{x})^3$
• Kurtosis – based on the sum of 4th powers, $\sum_{i=1}^{n} (x_i - \bar{x})^4$

The measure of skewness:
$$g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3}$$
The measure of kurtosis:
$$g_2 = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4} - 3$$

Interpretations of measures of shape
• Skewness: $g_1 > 0$ (skewed right), $g_1 = 0$ (symmetric), $g_1 < 0$ (skewed left).
• Kurtosis: $g_2 < 0$ (platykurtic, flatter than normal), $g_2 = 0$ (normal-like), $g_2 > 0$ (leptokurtic, more peaked than normal).

Inferential Statistics
Making decisions regarding the population based on a sample.

Estimation by Confidence Intervals
• Definition – a (100)P% confidence interval for an unknown parameter is a pair of sample statistics (t1 and t2) having the following properties:
  1. P[t1 < t2] = 1; that is, t1 is always smaller than t2.
  2. P[the unknown parameter lies between t1 and t2] = P.
• The statistics t1 and t2 are random variables.
• Property 2 states that the probability that the unknown parameter is bounded by the two statistics t1 and t2 is P.

Confidence Interval for a Proportion
$$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $z_{\alpha/2}$ is the upper $\alpha/2$ critical point of the standard normal distribution. The error bound is
$$B = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Determination of Sample Size
The sample size that will estimate p with an error bound B and level of confidence P = 1 − α is
$$n = \frac{z_{\alpha/2}^2\; p^*(1-p^*)}{B^2}$$
where
• B is the desired error bound,
• $z_{\alpha/2}$ is the $\alpha/2$ critical value for the standard normal distribution,
• $p^*$ is some preliminary estimate of p.
(A code sketch of this interval and sample-size formula appears after the decision table below.)

Confidence Intervals for the mean of a Normal Population, μ
$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \approx \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$
where $\bar{x}$ is the sample mean, $z_{\alpha/2}$ is the upper $\alpha/2$ critical point of the standard normal distribution, and s is the sample standard deviation.

Determination of Sample Size
The sample size that will estimate μ with an error bound B and level of confidence P = 1 − α is
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2} \approx \frac{z_{\alpha/2}^2\,(s^*)^2}{B^2}$$
where
• B is the desired error bound,
• $z_{\alpha/2}$ is the $\alpha/2$ critical value for the standard normal distribution,
• $s^*$ is some preliminary estimate of σ.

Hypothesis Testing
An important area of statistical inference.

Definition
Hypothesis (H) – a statement about the parameters of the population.
• In hypothesis testing there are two hypotheses of interest:
  – the null hypothesis (H0),
  – the alternative hypothesis (HA).

Type I, Type II Errors
1. Rejecting the null hypothesis when it is true (type I error).
2. Accepting the null hypothesis when it is false (type II error).

Decision table showing types of error:

          | H0 is True       | H0 is False
Accept H0 | Correct decision | Type II error
Reject H0 | Type I error     | Correct decision
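As a concrete illustration, here is a hedged Python sketch (SciPy's `norm` supplies the z critical value; the counts are invented) of the proportion confidence interval and the sample-size formula above.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical data: 228 successes in 400 trials (illustrative only)
x, n = 228, 400
alpha = 0.05
p_hat = x / n
z = norm.ppf(1 - alpha / 2)                # upper alpha/2 critical point

B = z * np.sqrt(p_hat * (1 - p_hat) / n)   # error bound
print(f"95% CI for p: {p_hat - B:.3f} to {p_hat + B:.3f}")

# Sample size to estimate p within B = 0.03, using p* = p_hat as a pilot estimate
B_target = 0.03
n_needed = z**2 * p_hat * (1 - p_hat) / B_target**2
print(f"required n = {int(np.ceil(n_needed))}")
```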
To define a statistical test we:
1. Choose a statistic (called the test statistic).
2. Divide the range of possible values for the test statistic into two parts:
   • the Acceptance Region,
   • the Critical Region.

To perform a statistical test we:
1. Collect the data.
2. Compute the value of the test statistic.
3. Make the decision:
   • If the value of the test statistic is in the Acceptance Region we decide to accept H0.
   • If the value of the test statistic is in the Critical Region we decide to reject H0.

Probability of the two types of error
Definitions: for any statistical testing procedure define
1. α = P[rejecting the null hypothesis when it is true] = P[type I error],
2. β = P[accepting the null hypothesis when it is false] = P[type II error].

Determining the Critical Region
1. The Critical Region should consist of values of the test statistic that indicate that HA is true (hence H0 should be rejected).
2. The size of the Critical Region is determined so that the probability of making a type I error, α, is at some pre-determined level (usually 0.05 or 0.01). This value is called the significance level of the test:
   significance level = α = P[test makes type I error].

To find the Critical Region
1. Find the sampling distribution of the test statistic when H0 is true.
2. Locate the Critical Region in the tails (left, right, or both) of the sampling distribution of the test statistic when H0 is true. Whether you locate the critical region in the left tail, the right tail, or both tails depends on which values indicate that HA is true: the tails chosen = the values indicating HA.
3. The size of the Critical Region is chosen so that the area over the critical region, under the sampling distribution of the test statistic when H0 is true, is the desired level of α = P[type I error].

[Figure: sampling distribution of the test statistic when H0 is true, with the Critical Region shaded; shaded area = α.]

The z-tests

Testing the probability of success:
$$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$

Testing the mean of a Normal population:
$$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
(A code sketch of both z-tests appears at the end of this section.)

Critical regions for testing the probability of success, p:

Alternative Hypothesis HA | Critical Region
HA: p ≠ p0                | z < −z_{α/2} or z > z_{α/2}
HA: p > p0                | z > z_α
HA: p < p0                | z < −z_α

Critical regions for testing the mean, μ, of a normal population:

Alternative Hypothesis HA | Critical Region
HA: μ ≠ μ0                | z < −z_{α/2} or z > z_{α/2}
HA: μ > μ0                | z > z_α
HA: μ < μ0                | z < −z_α

• You can compare a statistical test to a meter: the Critical Region is the red zone of the meter. If the value of the test statistic lands in the Acceptance Region, accept H0; if it lands in the Critical Region, reject H0.

Sometimes the critical region is located on one side only; these tests are called one-tailed tests. Whether you use a one-tailed test or a two-tailed test depends on:
1. the hypotheses being tested (H0 and HA),
2. the test statistic.

If only large positive values of the test statistic indicate HA, then the critical region should be located in the positive tail (a one-tailed test). If only large negative values of the test statistic indicate HA, then the critical region should be located in the negative tail (a one-tailed test). If both large positive and large negative values of the test statistic indicate HA, then the critical region should be located in both the positive and negative tails (a two-tailed test). Usually one-tailed tests are appropriate if HA is one-sided, and two-tailed tests are appropriate if HA is two-sided, but not always.
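A minimal sketch of the two z-tests above, using SciPy for the critical value; all sample numbers are hypothetical.

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)       # two-tailed critical point z_{alpha/2}

# Test H0: p = 0.5 vs HA: p != 0.5 (hypothetical: 230 successes in 400 trials)
p0, x, n = 0.5, 230, 400
p_hat = x / n
z_p = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)

# Test H0: mu = 100 vs HA: mu != 100 (hypothetical summary statistics)
mu0, xbar, s, m = 100.0, 103.2, 14.8, 64
z_mu = (xbar - mu0) / (s / np.sqrt(m))

for name, z in [("proportion", z_p), ("mean", z_mu)]:
    decision = "reject H0" if abs(z) > z_crit else "accept H0"
    print(f"{name}: z = {z:.2f}, critical value = {z_crit:.2f} -> {decision}")
```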
The p-value approach to Hypothesis Testing
Definition – once the test statistic has been computed from the data, the p-value is defined to be:
p-value = P[the test statistic is as or more extreme than the observed value of the test statistic, when H0 is true].
"More extreme" means giving stronger evidence for rejecting H0.

Properties of the p-value
1. If the p-value is small (< 0.05 or < 0.01), H0 should be rejected.
2. The p-value measures the plausibility of H0.
3. If the test is two-tailed, the p-value should be two-tailed.
4. If the test is one-tailed, the p-value should be one-tailed.
5. It is customary to report p-values when reporting the results. This gives the reader some idea of the strength of the evidence for rejecting H0.

Summary
• A common way to report statistical tests is to compute the p-value.
• If the p-value is small (< 0.05 or < 0.01) then H0 is rejected.
• If the p-value is extremely small this gives a strong indication that HA is true.
• If the p-value is marginally above the threshold 0.05 then we cannot reject H0, but there would be a suspicion that H0 is false.

"Student's" t-test

The Situation
• Let x1, x2, x3, …, xn denote a sample from a normal population with mean μ and standard deviation σ. Both μ and σ are unknown.
• Let
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \ \text{(the sample mean)}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} \ \text{(the sample standard deviation)}$$
• We want to test whether the mean, μ, is equal to some given value μ0.

The Test Statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
The sampling distribution of the test statistic is the t distribution with n − 1 degrees of freedom.

Alternative Hypothesis HA | Critical Region
HA: μ ≠ μ0                | t < −t_{α/2} or t > t_{α/2}
HA: μ > μ0                | t > t_α
HA: μ < μ0                | t < −t_α

Here t_α and t_{α/2} are critical values under the t distribution with n − 1 degrees of freedom.

Confidence Intervals using the t distribution
Confidence intervals for the mean of a Normal population, μ, using the standard normal distribution:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Confidence intervals for the mean of a Normal population, μ, using the t distribution:
$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$

Testing and Estimation of Variances

Sampling theory: the statistic
$$U = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2}$$
has a χ² distribution with n − 1 degrees of freedom.

Confidence intervals for σ² and σ
The (1 − α)100% confidence limits for σ² are
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \quad\text{to}\quad \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
and the (1 − α)100% confidence limits for σ are
$$\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} \quad\text{to}\quad \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}$$

Testing hypotheses for σ² and σ
Suppose we want to test H0: σ² = σ0² against HA: σ² ≠ σ0². The test statistic is
$$U = \frac{(n-1)s^2}{\sigma_0^2}$$
If H0 is true the test statistic, U, has a χ² distribution with n − 1 degrees of freedom. Thus we reject H0 if
$$\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha/2} \qquad\text{or}\qquad \frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha/2}$$
(each tail of the χ² distribution carries area α/2; otherwise accept H0).
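A sketch of the one-sample t-test and the χ²-based confidence interval for σ², with invented data; SciPy provides the t and χ² quantiles.

```python
import numpy as np
from scipy.stats import t, chi2

x = np.array([9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5])  # hypothetical sample
n, alpha, mu0 = len(x), 0.05, 10.0
xbar, s = x.mean(), x.std(ddof=1)

# One-sample t-test of H0: mu = mu0 (two-tailed)
t_stat = (xbar - mu0) / (s / np.sqrt(n))
p_value = 2 * t.sf(abs(t_stat), df=n - 1)     # two-tailed p-value
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")

# 95% confidence interval for sigma^2 from (n-1)s^2 / chi^2 critical points
lo = (n - 1) * s**2 / chi2.ppf(1 - alpha / 2, df=n - 1)   # divide by upper point
hi = (n - 1) * s**2 / chi2.ppf(alpha / 2, df=n - 1)       # divide by lower point
print(f"95% CI for sigma^2: {lo:.3f} to {hi:.3f}")
```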
One-tailed Tests for σ² and σ

Suppose we want to test H0: σ² = σ0² against HA: σ² > σ0². The test statistic is
$$U = \frac{(n-1)s^2}{\sigma_0^2}$$
and we reject H0 if
$$\frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha}$$
Or suppose we want to test H0: σ² = σ0² against HA: σ² < σ0². The test statistic is the same, and we reject H0 if
$$\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha}$$

Comparing Populations
Proportions and means.

Comparing proportions
Comparing two binomial probabilities p1 and p2. The test statistic is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}, \qquad\text{and}\qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

The Critical Region

Alternative Hypothesis HA | Critical Region
HA: p1 ≠ p2               | z < −z_{α/2} or z > z_{α/2}
HA: p1 > p2               | z > z_α
HA: p1 < p2               | z < −z_α

100(1 − α)% confidence interval for δ = p1 − p2:
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
or $\hat{p}_1 - \hat{p}_2 \pm B$, where
$$B = z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

Sample size determination
Again we want to choose n1 and n2 to set B at some predetermined level with a fixed level of confidence 1 − α.

Special solutions – case 1: n1 = n2 = n. Then
$$n = \frac{z_{\alpha/2}^2\left[p_1(1-p_1) + p_2(1-p_2)\right]}{B^2}$$

Special solutions – case 2: choose n1 and n2 to minimize N = n1 + n2, the total sample size. Then
$$n_1 = \frac{z_{\alpha/2}^2}{B^2}\left[p_1(1-p_1) + \sqrt{p_1(1-p_1)\,p_2(1-p_2)}\right]$$
$$n_2 = \frac{z_{\alpha/2}^2}{B^2}\left[p_2(1-p_2) + \sqrt{p_1(1-p_1)\,p_2(1-p_2)}\right]$$

Special solutions – case 3: choose n1 and n2 to minimize C = C0 + c1n1 + c2n2, the total cost of the study. Note: C0 = fixed (set-up) costs, c1 = cost per unit in population 1, c2 = cost per unit in population 2. Then
$$n_1 = \frac{z_{\alpha/2}^2}{B^2}\left[p_1(1-p_1) + \sqrt{\frac{c_2}{c_1}\,p_1(1-p_1)\,p_2(1-p_2)}\right]$$
$$n_2 = \frac{z_{\alpha/2}^2}{B^2}\left[p_2(1-p_2) + \sqrt{\frac{c_1}{c_2}\,p_1(1-p_1)\,p_2(1-p_2)}\right]$$

Comparing Means
The z-test (n and m large):
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}} \approx \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$

Confidence interval for δ = μ1 − μ2:
$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, \quad\text{i.e. } \bar{x}_1 - \bar{x}_2 \pm B \text{ with } B = z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Sample size determination
The sample sizes required, n1 and n2, to estimate μ1 − μ2 within an error bound B with level of confidence 1 − α are:
Equal sample sizes:
$$n = n_1 = n_2 = \frac{z_{\alpha/2}^2\,(\sigma_1^2 + \sigma_2^2)}{B^2}$$
Minimizing the total sample size N = n1 + n2:
$$n_1 = \frac{z_{\alpha/2}^2\,(\sigma_1^2 + \sigma_1\sigma_2)}{B^2}, \qquad n_2 = \frac{z_{\alpha/2}^2\,(\sigma_2^2 + \sigma_1\sigma_2)}{B^2}$$
Minimizing the total cost C = C0 + c1n1 + c2n2:
$$n_1 = \frac{z_{\alpha/2}^2}{B^2}\left(\sigma_1^2 + \sqrt{\frac{c_2}{c_1}}\,\sigma_1\sigma_2\right), \qquad n_2 = \frac{z_{\alpha/2}^2}{B^2}\left(\sigma_2^2 + \sqrt{\frac{c_1}{c_2}}\,\sigma_1\sigma_2\right)$$

The t test – for comparing means – small samples (equal variances)

Situation
• We have two normal populations (1 and 2).
• Let μ1 and σ1 denote the mean and standard deviation of population 1.
• Let μ2 and σ2 denote the mean and standard deviation of population 2.
• Note: we assume that the standard deviation for each population is the same: σ1 = σ2 = σ.

The t test for comparing means – small samples (equal variances):
$$t = \frac{\bar{x} - \bar{y}}{s_{\text{Pooled}}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}, \qquad s_{\text{Pooled}} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}}$$

Alternative Hypothesis HA | Critical Region
HA: μ1 ≠ μ2               | t < −t_{α/2} or t > t_{α/2}
HA: μ1 > μ2               | t > t_α
HA: μ1 < μ2               | t < −t_α

Here t_{α/2} and t_α are critical points under the t distribution with n + m − 2 degrees of freedom.

Confidence intervals for the difference in two means of normal populations (small sample sizes, equal variances): the (1 − α)100% confidence limits for μ1 − μ2 are
$$\bar{x} - \bar{y} \pm t_{\alpha/2}\,s_{\text{Pooled}}\sqrt{\frac{1}{n} + \frac{1}{m}}$$
where $s_{\text{Pooled}}$ is as above and df = n + m − 2.
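A sketch of the pooled two-sample t test (equal variances) with made-up samples; SciPy's `ttest_ind` reproduces the same statistic.

```python
import numpy as np
from scipy import stats

x = np.array([23.1, 21.8, 24.5, 22.0, 23.9, 22.7])    # hypothetical sample 1
y = np.array([20.4, 21.1, 19.8, 22.3, 20.9])          # hypothetical sample 2
n, m = len(x), len(y)

# Pooled standard deviation and t statistic, df = n + m - 2
sp = np.sqrt(((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2))
t_stat = (x.mean() - y.mean()) / (sp * np.sqrt(1 / n + 1 / m))
p_val = 2 * stats.t.sf(abs(t_stat), df=n + m - 2)
print(f"by hand: t = {t_stat:.3f}, p = {p_val:.3f}")

# The library call gives the same answer
print(stats.ttest_ind(x, y, equal_var=True))
```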
Tests and confidence intervals for the difference in two means of normal populations (small sample sizes, unequal variances)

The approximate test for comparing two means of normal populations (unequal variances) uses the test statistic
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
with approximate degrees of freedom
$$df = \frac{\left(\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}\right)^2}{\dfrac{1}{n-1}\left(\dfrac{s_x^2}{n}\right)^2 + \dfrac{1}{m-1}\left(\dfrac{s_y^2}{m}\right)^2}$$
Null hypothesis H0: μ1 = μ2. (A code sketch of this test and the paired t test follows the one-sample summary table below.)

Alternative Hypothesis HA | Critical Region
HA: μ1 ≠ μ2               | t < −t_{α/2} or t > t_{α/2}
HA: μ1 > μ2               | t > t_α
HA: μ1 < μ2               | t < −t_α

Confidence intervals for the difference in two means of normal populations (small samples, unequal variances): the (1 − α)100% confidence limits for μ1 − μ2 are
$$\bar{x} - \bar{y} \pm t_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$
with df as given above.

The paired t-test
An example of improved experimental design.

The matched pair experimental design (the paired sample experiment): prior to assigning the treatments, the subjects are grouped into pairs of similar subjects. Suppose that there are n such pairs (a total of 2n = n + n subjects or cases). The two treatments are then randomly assigned to each pair: one member of a pair receives treatment 1, while the other receives treatment 2. The data collected are
(x1, y1), (x2, y2), (x3, y3), …, (xn, yn),
where xi = the measurement of the response for the subject in pair i that received treatment 1, and yi = the measurement of the response for the subject in pair i that received treatment 2.

To test H0: μ1 = μ2 is equivalent to testing H0: μd = 0 (we have converted the two-sample problem into a single-sample problem). The test statistic is the single-sample t-test on the differences d1, d2, d3, …, dn (with di = xi − yi), namely
$$t_d = \frac{\bar{d} - 0}{s_d/\sqrt{n}}, \qquad df = n - 1$$
where $\bar{d}$ is the mean of the di's and sd is the standard deviation of the di's.

Testing for the equality of variances
The F test. The test statistic is
$$F = \frac{s_x^2}{s_y^2} \qquad\text{or}\qquad \frac{1}{F} = \frac{s_y^2}{s_x^2}$$
The sampling distribution of the test statistic: if the null hypothesis (H0) is true, then the sampling distribution of F is called the F-distribution with ν1 = n − 1 degrees of freedom in the numerator and ν2 = m − 1 degrees of freedom in the denominator.

Critical region for the test of H0: σx² = σy² against HA: σx² ≠ σy² (two-sided alternative): reject H0 if
$$F = \frac{s_x^2}{s_y^2} > F_{\alpha/2}(n-1,\, m-1) \qquad\text{or}\qquad \frac{1}{F} = \frac{s_y^2}{s_x^2} > F_{\alpha/2}(m-1,\, n-1)$$

Critical region for the one-tailed test of H0: σx² = σy² against HA: σx² > σy² (one-sided alternative): reject H0 if
$$F = \frac{s_x^2}{s_y^2} > F_{\alpha}(n-1,\, m-1)$$

Summary of Tests

One-Sample Tests

Situation | Test Statistic | H0 | HA | Critical Region
Sample from the Normal distribution, unknown mean, known variance (testing μ) | $z = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$ | μ = μ0 | μ ≠ μ0; μ > μ0; μ < μ0 | z < −z_{α/2} or z > z_{α/2}; z > z_α; z < −z_α
Sample from the Normal distribution, unknown mean, unknown variance (testing μ) | $t = \dfrac{\bar{x}-\mu_0}{s/\sqrt{n}}$ | μ = μ0 | μ ≠ μ0; μ > μ0; μ < μ0 | t < −t_{α/2} or t > t_{α/2}; t > t_α; t < −t_α
Testing of a binomial probability | $z = \dfrac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$ | p = p0 | p ≠ p0; p > p0; p < p0 | z < −z_{α/2} or z > z_{α/2}; z > z_α; z < −z_α
Sample from the Normal distribution, unknown mean, unknown variance (testing σ) | $U = \dfrac{(n-1)s^2}{\sigma_0^2}$ | σ = σ0 | σ ≠ σ0; σ > σ0; σ < σ0 | U < χ²_{1−α/2}(n−1) or U > χ²_{α/2}(n−1); U > χ²_α(n−1); U < χ²_{1−α}(n−1)
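A sketch of the unequal-variance (Welch) test and the paired t test via SciPy; the data are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, size=12)     # hypothetical sample 1
y = rng.normal(9.0, 4.0, size=15)      # hypothetical sample 2

# Welch test: unequal variances, approximate df as in the formula above
print(stats.ttest_ind(x, y, equal_var=False))

# Paired design: two measurements on the same n subjects
before = rng.normal(50.0, 5.0, size=10)
after = before + rng.normal(1.5, 2.0, size=10)   # hypothetical treatment effect
print(stats.ttest_rel(after, before))            # same as one-sample t on the differences
```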
Two-Sample Tests

Situation | Test Statistic | H0 | HA | Critical Region
Two independent samples from the Normal distribution, unknown means, known variances (testing μ1 − μ2) | $z = \dfrac{\bar{x}_1-\bar{x}_2}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}}$ | μ1 = μ2 | μ1 ≠ μ2; μ1 > μ2; μ1 < μ2 | z < −z_{α/2} or z > z_{α/2}; z > z_α; z < −z_α
Two independent samples from the Normal distribution, unknown means, unknown but equal variances (testing μ1 − μ2) | $t = \dfrac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{1/n_1+1/n_2}}$, with $s_p = \sqrt{\dfrac{(n-1)s_1^2+(m-1)s_2^2}{n+m-2}}$ and df = n + m − 2 | μ1 = μ2 | μ1 ≠ μ2; μ1 > μ2; μ1 < μ2 | t < −t_{α/2} or t > t_{α/2}; t > t_α; t < −t_α
Estimation of the difference between two binomial probabilities, p1 − p2 | $z = \dfrac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\,(1/n_1+1/n_2)}}$ | p1 = p2 | p1 ≠ p2; p1 > p2; p1 < p2 | z < −z_{α/2} or z > z_{α/2}; z > z_α; z < −z_α

Two-Sample Tests – continued

Situation | Test Statistic | H0 | HA | Critical Region
Two independent Normal samples, unknown means and variances (unequal) | $t = \dfrac{\bar{x}_1-\bar{x}_2}{\sqrt{s_1^2/n_1+s_2^2/n_2}}$, df = ν* (the approximate df given earlier) | μ1 = μ2 | μ1 ≠ μ2; μ1 > μ2; μ1 < μ2 | t < −t_{α/2} or t > t_{α/2} (df = ν*); t > t_α (df = ν*); t < −t_α (df = ν*)
Two independent Normal samples, unknown means and variances (unequal) | $F = \dfrac{s_1^2}{s_2^2}$ or $\dfrac{1}{F} = \dfrac{s_2^2}{s_1^2}$ | σ1 = σ2 | σ1 ≠ σ2; σ1 > σ2; σ1 < σ2 | F > F_{α/2}(n−1, m−1) or 1/F > F_{α/2}(m−1, n−1); F > F_α(n−1, m−1); 1/F > F_α(m−1, n−1)

The paired t test

Situation: n matched pairs of subjects are treated with two treatments; di = xi − yi has mean μd = μ1 − μ2.

Test Statistic | H0 | HA | Critical Region
$t = \dfrac{\bar{d}}{s_d/\sqrt{n}}$ | μ1 = μ2 | μ1 ≠ μ2; μ1 > μ2; μ1 < μ2 | t < −t_{α/2} or t > t_{α/2} (df = n − 1); t > t_α (df = n − 1); t < −t_α (df = n − 1)

[Figure: independent samples (two separate groups, possibly of equal size, one per treatment) versus matched pairs (Pair 1, Pair 2, Pair 3, …, Pair n, each receiving both treatments).]

Comparing k Populations
Means – one-way Analysis of Variance (ANOVA), the F test.

The F test – for comparing k means

Situation
• We have k normal populations.
• Let μi and σ denote the mean and standard deviation of population i, i = 1, 2, 3, …, k.
• Note: we assume that the standard deviation for each population is the same: σ1 = σ2 = … = σk = σ.

We want to test
H0: μ1 = μ2 = μ3 = … = μk against HA: μi ≠ μj for at least one pair i, j.

Use the test statistic
$$F = \frac{s^2_{\text{Between}}}{s^2_{\text{Pooled}}} = \frac{\displaystyle\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2 \Big/ (k-1)}{\displaystyle\sum_{i=1}^{k} (n_i-1)s_i^2 \Big/ \left(\sum_{i=1}^{k} n_i - k\right)}$$
where $\bar{x}_i$ = the mean for the ith sample, $s_i$ = the standard deviation for the ith sample, and
$$\bar{x} = \frac{n_1\bar{x}_1 + \dots + n_k\bar{x}_k}{n_1 + \dots + n_k} = \text{the overall mean}$$

The statistic $\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$ is called the Between Sum of Squares, denoted SS_Between; it measures the variability between samples. k − 1 is known as the Between degrees of freedom, and $\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2/(k-1)$ is called the Between Mean Square, denoted MS_Between.

The statistic $\sum_{i=1}^{k} (n_i-1)s_i^2$ is called the Within Sum of Squares, denoted SS_Within. $N - k = \sum_{i=1}^{k} n_i - k$ is known as the Within degrees of freedom, and $\sum_{i=1}^{k}(n_i-1)s_i^2 \big/ (N - k)$ is called the Within Mean Square, denoted MS_Within. Then
$$F = \frac{MS_{\text{Between}}}{MS_{\text{Within}}}$$

The computing formula for F: compute
1. $T_i = \sum_{j=1}^{n_i} x_{ij}$ = the total for sample i,
2. $G = \sum_{i=1}^{k} T_i = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$ = the grand total,
3. $N = \sum_{i=1}^{k} n_i$ = the total sample size,
4. $\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2$,
5. $\sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$.
Then
1. $SS_{\text{Between}} = \sum_{i=1}^{k} \dfrac{T_i^2}{n_i} - \dfrac{G^2}{N}$,
2. $SS_{\text{Within}} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$,
3. $F = \dfrac{SS_{\text{Between}}/(k-1)}{SS_{\text{Within}}/(N-k)}$.

The critical region for the F test: we reject H0: μ1 = μ2 = … = μk if F > F_α, where F_α is the critical point under the F distribution with ν1 = k − 1 degrees of freedom in the numerator and ν2 = N − k degrees of freedom in the denominator.

The ANOVA Table
A convenient method for displaying the calculations for the F-test:

Source  | d.f. | Sum of Squares | Mean Square | F-ratio
Between | k−1  | SS_Between     | MS_Between  | MS_B/MS_W
Within  | N−k  | SS_Within      | MS_Within   |
Total   | N−1  | SS_Total       |             |
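A sketch of the one-way ANOVA F test on three invented samples, computed both from the between/within formulas and with SciPy's `f_oneway`.

```python
import numpy as np
from scipy import stats

groups = [np.array([5.1, 4.8, 6.0, 5.5]),
          np.array([6.2, 6.8, 5.9, 7.1, 6.5]),
          np.array([4.2, 4.9, 4.4, 5.0])]          # hypothetical samples
k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
F = (ss_between / (k - 1)) / (ss_within / (N - k))
p = stats.f.sf(F, k - 1, N - k)
print(f"by hand: F = {F:.3f}, p = {p:.4f}")
print(stats.f_oneway(*groups))                     # same F and p
```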
Fisher's LSD (least significant difference) procedure:
1. Test H0: μ1 = μ2 = μ3 = … = μk against HA: at least one pair of means are different, using the ANOVA F-test.
2. If H0 is accepted we conclude that all means are equal (not significantly different); stop in this case.
3. If H0 is rejected we conclude that at least one pair of means is significantly different; then follow this by using two-sample t tests to determine which pairs of means are significantly different.

Comparing k Population Proportions
The χ² test for independence:
1. the number of populations (columns), k (or c),
2. the number of categories (rows), from 2 to r.

      | 1   | 2   | … | c   | Total
1     | x11 | x12 | … |     | R1
2     | x21 | x22 | … |     | R2
…     |     |     |   |     | …
r     |     |     |   |     | Rr
Total | C1  | C2  | … | Cc  | N

The χ² test for independence

Situation
• We have two categorical variables R and C.
• The number of categories of R is r; the number of categories of C is c.
• We observe n subjects from the population and count xij = the number of subjects for which R = i and C = j.
• R = rows, C = columns.

Define
$$R_i = \sum_{j=1}^{c} x_{ij} = i\text{th row total}, \qquad C_j = \sum_{i=1}^{r} x_{ij} = j\text{th column total}$$
$$E_{ij} = \frac{R_i C_j}{n} = \text{the expected frequency in the } (i,j)\text{th cell in the case of independence}$$

Then to test H0: R and C are independent against HA: R and C are not independent, use the test statistic
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - E_{ij})^2}{E_{ij}}$$
where xij is the observed frequency in the (i, j)th cell and Eij = RiCj/n is the expected frequency under independence.

Sampling distribution of the test statistic when H0 is true: the χ² distribution with degrees of freedom ν = (r − 1)(c − 1).

Critical and acceptance region: reject H0 if χ² > χ²_α; accept H0 if χ² ≤ χ²_α.
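A sketch of the χ² test for independence on an invented 2×3 contingency table; `scipy.stats.chi2_contingency` computes the expected counts E_ij = R_i C_j / n and the statistic.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical r x c table of observed counts x_ij
table = np.array([[30, 45, 25],
                  [20, 35, 45]])

chi2_stat, p_value, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2_stat:.3f}, df = {dof}, p = {p_value:.4f}")
print("expected counts under independence:\n", np.round(expected, 1))
```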
Linear Regression
Hypothesis testing and estimation.

Assume that we have collected data on two variables X and Y. Let (x1, y1), (x2, y2), (x3, y3), …, (xn, yn) denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population).

The Statistical Model
Each yi is assumed to be randomly generated from a normal distribution with mean μi = α + βxi and standard deviation σ (α, β and σ are unknown). The regression line Y = α + βX has slope β and intercept α; the line itself is unseen.

The Data and the Linear Regression Model
• The data fall roughly about a straight line, Y = α + βX.

The Least Squares Line
Fitting the best straight line to "linear" data. Let Y = a + bX denote an arbitrary equation of a straight line, where a and b are known values. This equation can be used to predict, for each value of X, the value of Y. For example, if X = xi (as for the ith case) then the predicted value of Y is
$$\hat{y}_i = a + bx_i$$
The residual
$$r_i = y_i - \hat{y}_i = y_i - (a + bx_i)$$
can be computed for each case in the sample: $r_1 = y_1 - \hat{y}_1,\ r_2 = y_2 - \hat{y}_2,\ \dots,\ r_n = y_n - \hat{y}_n$.

The residual sum of squares (RSS) is
$$RSS = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left[y_i - (a + bx_i)\right]^2$$
a measure of the "goodness of fit" of the line Y = a + bX to the data. The optimal choice of a and b will result in the residual sum of squares attaining a minimum. If this is the case then the line Y = a + bX is called the Least Squares Line.

Comments
• β and α are the slope and intercept of the regression line (unseen).
• b and a are the slope and intercept of the least squares line (calculated from the data); $\hat{\beta}$ and $\hat{\alpha}$ are sometimes used in place of b and a. They represent the same quantities.

The equation for the least squares line
Let
$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2, \qquad S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
Computing formulae:
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \qquad S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
Then the slope of the least squares line can be shown to be
$$b = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
and the intercept of the least squares line can be shown to be
$$a = \bar{y} - b\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x}$$

The residual sum of squares:
$$RSS = \sum_{i=1}^{n}\left[y_i - (a + bx_i)\right]^2 = S_{yy} - \frac{S_{xy}^2}{S_{xx}} \quad\text{(computing formula)}$$

Estimating σ, the standard deviation in the regression model:
$$s = \sqrt{\frac{\sum_{i=1}^{n}\left[y_i - (a + bx_i)\right]^2}{n-2}} = \sqrt{\frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)} \quad\text{(computing formula)}$$
This estimate of σ is said to be based on n − 2 degrees of freedom.

Sampling distributions of the estimators

The sampling distribution of the slope of the least squares line, $b = S_{xy}/S_{xx}$: it can be shown that b has a normal distribution with mean and standard deviation
$$\mu_b = \beta \qquad\text{and}\qquad \sigma_b = \frac{\sigma}{\sqrt{S_{xx}}} = \frac{\sigma}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$
Thus
$$z = \frac{b - \mu_b}{\sigma_b} = \frac{b - \beta}{\sigma/\sqrt{S_{xx}}}$$
has a standard normal distribution, and
$$t = \frac{b - \mu_b}{s_b} = \frac{b - \beta}{s/\sqrt{S_{xx}}}$$
has a t distribution with df = n − 2.

(1 − α)100% confidence limits for the slope β:
$$\hat{\beta} \pm t_{\alpha/2}\frac{s}{\sqrt{S_{xx}}}$$
where $t_{\alpha/2}$ is the critical value for the t-distribution with n − 2 degrees of freedom.

Testing the slope: H0: β = β0 vs HA: β ≠ β0. The test statistic is
$$t = \frac{b - \beta_0}{s/\sqrt{S_{xx}}}$$
which has a t distribution with df = n − 2 if H0 is true. The critical region: reject H0 if t < −t_{α/2} or t > t_{α/2} (df = n − 2). This is a two-tailed test; one-tailed tests are also possible.

The sampling distribution of the intercept of the least squares line, $a = \bar{y} - b\bar{x}$: it can be shown that a has a normal distribution with mean and standard deviation
$$\mu_a = \alpha \qquad\text{and}\qquad \sigma_a = \sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$
Thus
$$z = \frac{a - \mu_a}{\sigma_a} = \frac{a - \alpha}{\sigma\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}}$$
has a standard normal distribution, and
$$t = \frac{a - \mu_a}{s_a} = \frac{a - \alpha}{s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}}$$
has a t distribution with df = n − 2.

(1 − α)100% confidence limits for the intercept α:
$$\hat{\alpha} \pm t_{\alpha/2}\,s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$$
where $t_{\alpha/2}$ is the critical value for the t-distribution with n − 2 degrees of freedom.

Testing the intercept: H0: α = α0 vs HA: α ≠ α0. The test statistic is
$$t = \frac{a - \alpha_0}{s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}}$$
which has a t distribution with df = n − 2 if H0 is true. The critical region: reject H0 if t < −t_{α/2} or t > t_{α/2} (df = n − 2).

Confidence Limits for Points on the Regression Line
• The intercept α is a specific point on the regression line: it is the y-coordinate of the point on the regression line when x = 0, the predicted value of y when x = 0.
• We may also be interested in other points on the regression line, e.g. when x = x0. In this case the y-coordinate of the point on the regression line when x = x0 is α + βx0.

(1 − α)100% confidence limits for α + βx0:
$$a + bx_0 \pm t_{\alpha/2}\,s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
where $t_{\alpha/2}$ is the α/2 critical value for the t-distribution with n − 2 degrees of freedom.
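A sketch of the least squares computations and the slope inference above, on invented (x, y) data; `scipy.stats.linregress` returns the same slope, intercept, and standard error.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.8])   # hypothetical data
n = len(x)

Sxx = ((x - x.mean()) ** 2).sum()
Sxy = ((x - x.mean()) * (y - y.mean())).sum()
b = Sxy / Sxx                                 # slope
a = y.mean() - b * x.mean()                   # intercept
resid = y - (a + b * x)
s = np.sqrt((resid ** 2).sum() / (n - 2))     # estimate of sigma, df = n - 2

t_crit = stats.t.ppf(0.975, df=n - 2)
half = t_crit * s / np.sqrt(Sxx)
print(f"b = {b:.3f}, a = {a:.3f}, 95% CI for slope: {b-half:.3f} to {b+half:.3f}")
print(stats.linregress(x, y))                 # cross-check: slope, intercept, r, p, stderr
```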
Prediction Limits for new values of the dependent variable y
• An important application of the regression line is prediction: knowing the value of x (x0), what is the value of y?
• The predicted value of y when x = x0 is $\hat{y} = \alpha + \beta x_0$, estimated by $\hat{y} = \hat{\alpha} + \hat{\beta}x_0 = a + bx_0$.
• The predictor $\hat{y} = a + bx_0$ gives only a single value for y. A more appropriate piece of information would be a range of values: a range that has a fixed probability of capturing the value of y, that is, a (1 − α)100% prediction interval for y.

(1 − α)100% prediction limits for y when x = x0:
$$a + bx_0 \pm t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
where $t_{\alpha/2}$ is the α/2 critical value for the t-distribution with n − 2 degrees of freedom.

Correlation

Definition: the statistic
$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
is called Pearson's correlation coefficient.

Properties
1. −1 ≤ r ≤ 1, i.e. |r| ≤ 1 and r² ≤ 1.
2. |r| = 1 (r = +1 or −1) if the points (x1, y1), (x2, y2), …, (xn, yn) lie along a straight line (positive slope for +1, negative slope for −1).

The test for independence (zero correlation)
H0: X and Y are independent; HA: X and Y are correlated. The test statistic is
$$t = \sqrt{n-2}\,\frac{r}{\sqrt{1-r^2}}$$
The critical region: reject H0 if |t| > t_{α/2} (df = n − 2). This is a two-tailed critical region; the critical region could also be one-tailed.

Spearman's rank correlation coefficient ρ (rho)
Spearman's rank correlation coefficient is computed as follows:
• Arrange the observations on X in increasing order and assign them the ranks 1, 2, 3, …, n.
• Arrange the observations on Y in increasing order and assign them the ranks 1, 2, 3, …, n.
• For any case i let (xi, yi) denote the observations on X and Y and let (ri, si) denote the ranks on X and Y.
For each case let di = ri − si = the difference in the two ranks. Then Spearman's rank correlation coefficient (ρ) is defined as
$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

Properties of Spearman's rank correlation coefficient ρ
1. The value of ρ is always between −1 and +1.
2. If the relationship between X and Y is positive, then ρ will be positive.
3. If the relationship between X and Y is negative, then ρ will be negative.
4. If there is no relationship between X and Y, then ρ will be zero.
5. The value of ρ will be +1 if the ranks of X completely agree with the ranks of Y.
6. The value of ρ will be −1 if the ranks of X are in reverse order to the ranks of Y.

Relationship between Regression and Correlation
Recall $r = \dfrac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$. Also, since $s_x = \sqrt{\dfrac{S_{xx}}{n-1}}$ and $s_y = \sqrt{\dfrac{S_{yy}}{n-1}}$,
$$\hat{\beta} = b = \frac{S_{xy}}{S_{xx}} = r\,\frac{s_y}{s_x}$$
Thus the slope of the least squares line is simply the ratio of the standard deviations times the correlation coefficient.

The Coefficient of Determination
Sums of squares associated with linear regression:
$$RSS = \sum_{i=1}^{n}\left[y_i - (a + bx_i)\right]^2 = SS_{\text{unexplained}}$$
$$SS_{\text{Total}} = \sum_{i=1}^{n}(y_i - \bar{y})^2, \qquad SS_{\text{Explained}} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$$
It can be shown that
$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2,$$
$$SS_{\text{Total}} = SS_{\text{Explained}} + SS_{\text{Unexplained}}$$
(Total variability in Y) = (variability in Y explained by X) + (variability in Y unexplained by X).

It can also be shown that
$$r^2 = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
= the proportion of variability in Y explained by X = the coefficient of determination. Further,
$$1 - r^2 = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
= the proportion of variability in Y that is unexplained by X.
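A sketch of Pearson's r, the t test for zero correlation, and Spearman's ρ on invented data.

```python
import numpy as np
from scipy import stats

x = np.array([3.0, 5.0, 2.0, 8.0, 7.0, 4.0, 6.0, 9.0])
y = np.array([2.5, 4.1, 2.0, 7.2, 6.0, 3.9, 5.5, 8.1])   # hypothetical data
n = len(x)

r, p_r = stats.pearsonr(x, y)
t_stat = np.sqrt(n - 2) * r / np.sqrt(1 - r**2)   # the t statistic defined above
print(f"Pearson r = {r:.3f}, t = {t_stat:.3f}, p = {p_r:.4f}")

rho, p_rho = stats.spearmanr(x, y)                # rank-based correlation
print(f"Spearman rho = {rho:.3f}, p = {p_rho:.4f}")
```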
Regression (in general)
In many experiments we will have collected data on a single variable Y (the dependent variable) and on p (say) other variables X1, X2, X3, …, Xp (the independent variables). One is interested in determining a model that describes the relationship between Y (the response (dependent) variable) and X1, X2, …, Xp (the predictor (independent) variables). This model can be used for
– prediction,
– controlling Y by manipulating X1, X2, …, Xp.

The Model is an equation of the form
$$Y = f(X_1, X_2, \dots, X_p \mid \theta_1, \theta_2, \dots, \theta_q) + \varepsilon$$
where θ1, θ2, …, θq are unknown parameters of the function f and ε is a random disturbance (usually assumed to have a normal distribution with mean 0 and standard deviation σ).

The Multiple Linear Regression Model
In multiple linear regression we assume the following model:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$$
This model is called the Multiple Linear Regression Model. Again β0, β1, β2, …, βp are unknown parameters of the model, and ε is a random disturbance assumed to have a normal distribution with mean 0 and standard deviation σ.

Summary of the Statistics used in Multiple Regression

The least squares estimates $b_0, b_1, b_2, \dots, b_p$ are the values that minimize
$$RSS = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}\left[y_i - (b_0 + b_1x_{1i} + b_2x_{2i} + \dots + b_px_{pi})\right]^2$$

The Analysis of Variance Table entries:
a) Adjusted total sum of squares: $SS_{\text{Total}} = \sum_{i=1}^{n}(y_i - \bar{y})^2$, d.f. = n − 1.
b) Residual sum of squares: $RSS = SS_{\text{Error}} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, d.f. = n − p − 1.
c) Regression sum of squares: $SS_{\text{Reg}} = SS(\beta_1, \beta_2, \dots, \beta_p) = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$, d.f. = p.
Note:
$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \quad\text{i.e. } SS_{\text{Total}} = SS_{\text{Reg}} + SS_{\text{Error}}$$

The Analysis of Variance Table:

Source     | Sum of Squares | d.f.  | Mean Square                      | F
Regression | SS_Reg         | p     | SS_Reg/p = MS_Reg                | MS_Reg/s²
Error      | SS_Error       | n−p−1 | SS_Error/(n−p−1) = MS_Error = s² |
Total      | SS_Total       | n−1   |                                  |

Uses:
1. To estimate σ² (the error variance): use s² = MS_Error.
2. To test the hypothesis H0: β1 = β2 = … = βp = 0: use the test statistic
$$F = \frac{MS_{\text{Reg}}}{MS_{\text{Error}}} = \frac{SS_{\text{Reg}}/p}{SS_{\text{Error}}/(n-p-1)}$$
Reject H0 if F > F_α(p, n − p − 1).
3. To compute other statistics that are useful in describing the relationship between Y (the dependent variable) and X1, X2, …, Xp (the independent variables):
a) R² = the coefficient of determination = SS_Reg/SS_Total:
$$R^2 = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
= the proportion of variance in Y explained by X1, X2, …, Xp. 1 − R² = the proportion of variance in Y left unexplained by X1, X2, …, Xp = SS_Error/SS_Total.
b) Ra² = "R² adjusted" for degrees of freedom = 1 − [the proportion of variance in Y left unexplained by X1, X2, …, Xp, adjusted for d.f.]:
$$R_a^2 = 1 - \frac{MS_{\text{Error}}}{MS_{\text{Total}}} = 1 - \frac{SS_{\text{Error}}/(n-p-1)}{SS_{\text{Total}}/(n-1)} = 1 - \frac{(n-1)\,(1-R^2)}{n-p-1}$$
c) $R = \sqrt{R^2}$ = the multiple correlation coefficient of Y with X1, X2, …, Xp:
$$R = \sqrt{\frac{SS_{\text{Reg}}}{SS_{\text{Total}}}}$$
= the maximum correlation between Y and a linear combination of X1, X2, …, Xp.
Comment: the statistics F, R², Ra² and R are equivalent statistics.
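A sketch of the multiple-regression quantities above, fit by ordinary least squares with NumPy on invented data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 30, 2
X = rng.normal(size=(n, p))                      # hypothetical predictors
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.8, size=n)

A = np.column_stack([np.ones(n), X])             # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None) # least squares estimates b0..bp
y_hat = A @ beta_hat

ss_total = ((y - y.mean()) ** 2).sum()
ss_error = ((y - y_hat) ** 2).sum()
ss_reg = ss_total - ss_error
R2 = ss_reg / ss_total
R2_adj = 1 - (1 - R2) * (n - 1) / (n - p - 1)
F = (ss_reg / p) / (ss_error / (n - p - 1))
p_val = stats.f.sf(F, p, n - p - 1)
print(f"R^2 = {R2:.3f}, adjusted R^2 = {R2_adj:.3f}, F = {F:.2f}, p = {p_val:.2g}")
```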
Logistic Regression
The dependent variable y is binary: it takes on two values, "Success" (1) or "Failure" (0). We are interested in predicting y from a continuous independent variable x. This is the situation in which logistic regression is used.

The Logistic Regression Model
Let p denote P[y = 1] = P[Success]. This quantity will increase with the value of x. The ratio
$$\frac{p}{1-p}$$
is called the odds ratio. This quantity will also increase with the value of x, ranging from zero to infinity. The quantity
$$\ln\left(\frac{p}{1-p}\right)$$
is called the log odds ratio.

The logistic regression model assumes the log odds ratio is linearly related to x, i.e.
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$
In terms of the odds ratio,
$$\frac{p}{1-p} = e^{\beta_0 + \beta_1 x}$$
Solving for p in terms of x:
$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

Interpretation of the parameter β0 (determines the intercept): when x = 0,
$$p = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$$

Interpretation of the parameter β1 (determines, along with β0, where p is 0.50):
$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} = \frac{1}{2} \quad\text{when } \beta_0 + \beta_1 x = 0, \text{ i.e. } x = -\frac{\beta_0}{\beta_1}$$
β1 also determines the slope of the curve where p is 0.50: the slope there is β1/4.

The Multiple Logistic Regression Model
Here we attempt to predict the outcome of a binary response variable Y from several independent variables X1, X2, … etc.:
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p \quad\text{or}\quad p = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p}}$$

Nonparametric Statistical Methods

Definition: when the data are generated from a process (model) that is known except for a finite number of unknown parameters, the model is called a parametric model. Otherwise, the model is called a nonparametric model. Statistical techniques that assume a nonparametric model are called nonparametric.

The Sign Test
A nonparametric test for the central location of a distribution.

To carry out the sign test:
1. Compute the test statistic: S = the number of observations that exceed μ0 = s_observed.
2. Compute the p-value of the test statistic, s_observed:
   p-value = P[S ≥ s_observed] (= 2 P[S ≥ s_observed] for a 2-tailed test),
   where S is binomial with n = sample size and p = 0.50.
3. Reject H0 if the p-value is low (< 0.05).

Sign test for large samples
If n is large we can use the normal approximation to the binomial. Namely, S has a binomial distribution with p = 1/2 and n = sample size; hence for large n, S has approximately a normal distribution with mean
$$\mu_S = np = \frac{n}{2}$$
and standard deviation
$$\sigma_S = \sqrt{npq} = \sqrt{n \cdot \tfrac12 \cdot \tfrac12} = \frac{\sqrt{n}}{2}$$
Hence for large n use as the test statistic (in place of S)
$$z = \frac{S - \mu_S}{\sigma_S} = \frac{S - \frac{n}{2}}{\frac{\sqrt{n}}{2}}$$
Choose the critical region for z from the standard normal distribution, i.e. reject H0 if z < −z_{α/2} or z > z_{α/2} (two-tailed; a one-tailed test can also be set up).

Nonparametric Confidence Intervals
Assume that the data x1, x2, x3, …, xn is a sample from an unknown distribution. Now arrange the data in increasing order,
x(1) < x(2) < x(3) < … < x(n),
so x(1) = the smallest observation, x(2) = the 2nd smallest observation, and x(n) = the largest observation. Consider the kth smallest observation and the kth largest observation, x(k) and x(n−k+1). Then
P[x(k) < median < x(n−k+1)]
= P[k ≤ the number of observations greater than the median ≤ n − k]
= p(k) + p(k+1) + … + p(n−k) = P,
where the p(i)'s are binomial probabilities with n = the sample size and p = 1/2. This means that x(k) to x(n−k+1) is a P(100)% confidence interval for the median. Choose k so that P = p(k) + p(k+1) + … + p(n−k) is close to 0.95 (or 0.99).

Summarizing: x(k) to x(n−k+1) is a P(100)% confidence interval for the median, where P = p(k) + p(k+1) + … + p(n−k) and the p(i)'s are binomial probabilities with n = the sample size and p = 1/2.

For large values of n one can use the normal approximation to the binomial to find the value of k so that x(k) to x(n−k+1) is a 95% confidence interval for the median:
$$k \approx \frac{n}{2} - 1.96\frac{\sqrt{n}}{2}$$
Using this approximation:

n | 20  | 40   | 100  | 200
k | 5.6 | 13.8 | 40.2 | 86.1
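A sketch of the sign test and the order-statistic confidence interval for the median, using SciPy's exact binomial test; the sample is invented, and the CI endpoints depend on how k is rounded.

```python
import numpy as np
from scipy import stats

x = np.array([4.1, 5.3, 3.8, 6.2, 5.9, 4.7, 6.8, 5.1, 4.4, 6.5,
              5.6, 4.9, 6.1, 5.2, 4.0])          # hypothetical sample
mu0 = 4.5                                        # H0: median = 4.5

S = int((x > mu0).sum())                         # observations exceeding mu0
n = int((x != mu0).sum())                        # drop ties with mu0
res = stats.binomtest(S, n, p=0.5, alternative="two-sided")
print(f"sign test: S = {S} of {n}, p-value = {res.pvalue:.3f}")

# ~95% CI for the median: k chosen via the normal approximation above
k = int(np.floor(len(x) / 2 - 1.96 * np.sqrt(len(x)) / 2))
xs = np.sort(x)
print(f"approx 95% CI for median: {xs[k - 1]} to {xs[len(x) - k]}")  # x(k) to x(n-k+1)
```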
The Wilcoxon Signed Rank Test
An alternative to the sign test.
• For Wilcoxon's signed-rank test we assign ranks to the absolute values of x1 − μ0, x2 − μ0, …, xn − μ0:
  – a rank of 1 to the value of xi − μ0 which is smallest in absolute value,
  – a rank of n to the value of xi − μ0 which is largest in absolute value.
• W+ = the sum of the ranks associated with positive values of xi − μ0.
• W− = the sum of the ranks associated with negative values of xi − μ0.

To carry out Wilcoxon's signed rank test we:
1. Compute T = W+ or W− (usually the smaller of the two).
2. Let t_observed = the observed value of T.
3. Compute the p-value = P[T ≤ t_observed] (2 P[T ≤ t_observed] for a two-tailed test):
   i. for n ≤ 12 use the table;
   ii. for n > 12 use the normal approximation.
4. Conclude HA (reject H0) if the p-value is less than 0.05 (or 0.01).

For sample sizes n > 12 we can use the fact that T (W+ or W−) has approximately a normal distribution with mean
$$\mu_T = \frac{n(n+1)}{4}$$
and standard deviation
$$\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
so that
$$P[T \le t] = P\left[\frac{T - \mu_T}{\sigma_T} \le \frac{t - \mu_T}{\sigma_T}\right] \approx P\left[Z \le \frac{t - \mu_T}{\sigma_T}\right]$$

Comments
1. The t-test:
   i. requires the assumption of normality;
   ii. is invalid if the data are not normally distributed: the probability of a type I error may not be equal to its desired value (0.05 or 0.01);
   iii. if the data are normally distributed, the t-test commits type II errors with a smaller probability than any other test (in particular, Wilcoxon's signed rank test or the sign test).
2. The sign test:
   i. does not require the assumption of normality (true also for Wilcoxon's signed rank test);
   ii. ignores the magnitude of the observations completely. Wilcoxon's test takes the magnitude into account by ranking them.

Two-Sample Nonparametric Tests

The Mann-Whitney Test
A nonparametric two-sample test for comparison of central location.
• This is a nonparametric alternative to the two-sample t test (or z test) for independent samples.
• These tests (t and z) assume the data are normal; the Mann-Whitney test does not make this assumption.
• Sample of n from population 1: x1, x2, x3, …, xn.
• Sample of m from population 2: y1, y2, y3, …, ym.

The Mann-Whitney test statistics U1 and U2
Arrange the observations from the two samples combined in increasing order (retaining sample membership) and assign ranks to the observations. Let W1 = the sum of the ranks for sample 1 and W2 = the sum of the ranks for sample 2. Then
$$U_1 = nm + \frac{n(n+1)}{2} - W_1 \qquad\text{and}\qquad U_2 = nm + \frac{m(m+1)}{2} - W_2$$
• The distribution function of U (U1 or U2) has been tabled for various values of n and m (< n) when the two samples come from the same distribution.
• These tables can be used to set up critical regions for the Mann-Whitney U test.

The Mann-Whitney test for large samples
For large samples (n > 10 and m > 10) the statistics U1 and U2 have approximately a normal distribution with mean and standard deviation
$$\mu_{U_i} = \frac{nm}{2}, \qquad \sigma_{U_i} = \sqrt{\frac{nm(n+m+1)}{12}}$$
Thus we can convert Ui to a standard normal statistic
$$z = \frac{U_i - \mu_{U_i}}{\sigma_{U_i}} = \frac{U_i - \dfrac{nm}{2}}{\sqrt{\dfrac{nm(n+m+1)}{12}}}$$
and reject H0 if z < −z_{α/2} or z > z_{α/2} (for a two-tailed test).
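A sketch of the rank-based tests via SciPy on invented data: Wilcoxon's signed-rank test, the Mann-Whitney U test, and (previewing the next section) the Kruskal-Wallis k-sample analogue.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Wilcoxon signed-rank: one sample against a hypothesized median mu0
x = rng.normal(5.5, 1.0, size=15)
mu0 = 5.0
print(stats.wilcoxon(x - mu0))            # tests symmetry of x - mu0 about 0

# Mann-Whitney U: two independent samples
y1 = rng.normal(10.0, 2.0, size=12)
y2 = rng.normal(11.0, 2.0, size=14)
print(stats.mannwhitneyu(y1, y2, alternative="two-sided"))

# Kruskal-Wallis: k independent samples (see the next section)
y3 = rng.normal(10.5, 2.0, size=13)
print(stats.kruskal(y1, y2, y3))
```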
The Kruskal-Wallis Test
• Comparing the central location for k populations.
• A nonparametric alternative to the one-way ANOVA F-test.

Situation: data are collected from k populations. The sample size from population i is ni, and the data from population i are
$$x_{i1}, x_{i2}, \dots, x_{in_i}, \qquad i = 1, 2, \dots, k$$

The computation of the Kruskal-Wallis statistic: we group the N = n1 + n2 + … + nk observations from the k populations together and rank these observations from 1 to N. Let rij be the rank associated with the observation xij.

Handling of "tied" observations: if a group of observations are equal, the ranks that would have been assigned to those observations are averaged.

The Kruskal-Wallis statistic:
$$K = \frac{12}{N(N+1)}\sum_{i=1}^{k}\frac{U_i^2}{n_i} - 3(N+1)$$
where
$$U_i = \sum_{j=1}^{n_i} r_{ij} = r_{i1} + \dots + r_{in_i} = \text{the sum of the ranks for the } i\text{th sample}$$

The Kruskal-Wallis test: reject H0 (the k populations have the same central location) if
$$K > \chi^2_{\alpha} \quad\text{with d.f.} = k - 1$$

Probability Theory
Probability – models for random phenomena.

Definitions

The sample space, S: the sample space, S, for a random phenomenon is the set of all possible outcomes.

An event, E: the event, E, is any subset of the sample space, S, i.e. any set of outcomes (not necessarily all outcomes) of the random phenomenon. The event, E, is said to have occurred if, after the outcome has been observed, the outcome lies in E.

[Venn diagrams: the event E shown as a region inside the sample space S.]

Set operations on events

Union: let A and B be two events; then the union of A and B is the event (denoted by A ∪ B) defined by
A ∪ B = {e | e belongs to A or e belongs to B}.
The event A ∪ B occurs if the event A occurs or the event B occurs.

Intersection: let A and B be two events; then the intersection of A and B is the event (denoted by A ∩ B) defined by
A ∩ B = {e | e belongs to A and e belongs to B}.
The event A ∩ B occurs if the event A occurs and the event B occurs.

Complement: let A be any event; then the complement of A (denoted by Ā) is defined by
Ā = {e | e does not belong to A}.
The event Ā occurs if the event A does not occur.

In problems you will recognize that you are working with:
1. union if you see the word or,
2. intersection if you see the word and,
3. complement if you see the word not.

Definition: mutually exclusive. Two events A and B are called mutually exclusive if A ∩ B = ∅. If two events A and B are mutually exclusive then they have no outcomes in common: they can't occur at the same time, and the outcome of the random experiment cannot belong to both A and B.

Rules of Probability

The additive rule:
P[A ∪ B] = P[A] + P[B] − P[A ∩ B],
and P[A ∪ B] = P[A] + P[B] if A ∩ B = ∅.

The rule for complements: for any event E,
$$P[\bar{E}] = 1 - P[E]$$

Conditional probability:
$$P[A \mid B] = \frac{P[A \cap B]}{P[B]}$$

The multiplicative rule of probability:
$$P[A \cap B] = \begin{cases} P[A]\,P[B \mid A] & \text{if } P[A] \ne 0 \\[4pt] P[B]\,P[A \mid B] & \text{if } P[B] \ne 0 \end{cases}$$
and P[A ∩ B] = P[A] P[B] if A and B are independent. (This is the definition of independence.)

Counting Techniques

Summary of counting rules:

Rule 1: n(A1 ∪ A2 ∪ A3 ∪ …) = n(A1) + n(A2) + n(A3) + … if the sets A1, A2, A3, … are pairwise mutually exclusive (i.e. Ai ∩ Aj = ∅).

Rule 2: N = n1n2 = the number of ways that two operations can be performed in sequence, if n1 = the number of ways the first operation can be performed and n2 = the number of ways the second operation can be performed once the first operation has been completed.

Rule 3: N = n1n2…nk = the number of ways that k operations can be performed in sequence, if n1 = the number of ways the first operation can be performed and ni = the number of ways the ith operation can be performed once the first (i − 1) operations have been completed, i = 2, 3, …, k.

Basic counting formulae:
1. Orderings: n! = the number of ways you can order n objects.
2. Permutations: $_nP_k = \dfrac{n!}{(n-k)!}$ = the number of ways that you can choose k objects from n in a specific order.
3. Combinations: $_nC_k = \dbinom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$ = the number of ways that you can choose k objects from n (order of selection irrelevant).
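Python's standard library covers these three formulas directly; a quick check:

```python
import math

n, k = 10, 3
print(math.factorial(n))   # orderings: n! = 3628800
print(math.perm(n, k))     # permutations: n!/(n-k)! = 720
print(math.comb(n, k))     # combinations: n!/(k!(n-k)!) = 120
```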
Random Variables
Numerical quantities whose values are determined by the outcome of a random experiment.

Random variables are either:
• Discrete – integer valued; the set of possible values for X are integers.
• Continuous – the set of possible values for X are all real numbers; they range over a continuum.

The Probability Distribution of a Random Variable
A mathematical description of the possible values of the random variable, together with the probabilities of those values.

The probability distribution of a discrete random variable is described by its probability function p(x):
p(x) = the probability that X takes on the value x.
This can be given either in tabular form or in the form of an equation; it can also be displayed in a graph.

Comments: every probability function must satisfy
1. The probability assigned to each value of the random variable must be between 0 and 1, inclusive: 0 ≤ p(x) ≤ 1.
2. The sum of the probabilities assigned to all the values of the random variable must equal 1: $\sum_x p(x) = 1$.
3. $P[a \le X \le b] = \sum_{x=a}^{b} p(x) = p(a) + p(a+1) + \dots + p(b)$.

Probability Distributions of Continuous Random Variables
The probability distribution of a continuous random variable is described by its probability density curve f(x).
Notes:
• The total area under the probability density curve is 1.
• The area under the probability density curve from a to b is P[a < X < b].

Normal probability distributions (bell-shaped curve): P(a ≤ x ≤ b) is the area under the bell curve between a and b, centred at the mean μ.

Mean, Variance and Standard Deviation of Random Variables
Numerical descriptors of the distribution of a random variable.

Mean of a Discrete Random Variable
The mean, μ, of a discrete random variable x is found by multiplying each possible value of x by its own probability and then adding all the products together:
$$\mu = \sum_x x\,p(x) = x_1 p(x_1) + x_2 p(x_2) + \dots + x_k p(x_k)$$
Notes:
• The mean is a weighted average of the values of X.
• The mean is the long-run average value of the random variable.
• The mean is the centre of gravity of the probability distribution of the random variable.

Variance of a Discrete Random Variable
The variance, σ², of a discrete random variable x is found by multiplying each possible value of the squared deviation from the mean, (x − μ)², by its own probability and then adding all the products together:
$$\sigma^2 = \sum_x (x-\mu)^2 p(x) = \sum_x x^2 p(x) - \left[\sum_x x\,p(x)\right]^2 = \sum_x x^2 p(x) - \mu^2$$

Standard Deviation of a Discrete Random Variable: the positive square root of the variance,
$$\sigma = \sqrt{\sigma^2}$$

The Binomial Distribution
An important discrete distribution. X is said to have the binomial distribution with parameters n and p if:
1. X is the number of successes occurring in the n repetitions of a success-failure experiment.
2. The probability of success is p.
3. The probability function is
$$p(x) = \binom{n}{x} p^x (1-p)^{n-x}$$

Mean, Variance and Standard Deviation of the Binomial Distribution
The mean, variance and standard deviation of the binomial distribution can be found by using the following three formulas:
1. $\mu = np$
2. $\sigma^2 = npq = np(1-p)$
3. $\sigma = \sqrt{npq} = \sqrt{np(1-p)}$
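A sketch of binomial probabilities and moments with SciPy, checking μ = np and σ² = np(1 − p); n and p are invented.

```python
from scipy.stats import binom

n, p = 20, 0.3
print(binom.pmf(6, n, p))          # P[X = 6]
print(binom.cdf(8, n, p))          # P[X <= 8] = p(0) + p(1) + ... + p(8)
mean, var = binom.stats(n, p, moments="mv")
print(mean, var)                   # np = 6.0, np(1-p) = 4.2
```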
Mean of a Continuous Random Variable (uses calculus)
The mean, μ, of a continuous random variable x is
$$\mu = \int_{-\infty}^{\infty} x f(x)\,dx$$
Notes:
• The mean is a weighted average of the values of X.
• The mean is the long-run average value of the random variable.
• The mean is the centre of gravity of the probability distribution of the random variable.

Variance of a Continuous Random Variable:
$$\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx$$
Standard Deviation of a Continuous Random Variable: the positive square root of the variance,
$$\sigma = \sqrt{\int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx}$$

The Normal Probability Distribution
[Figure: the normal density with points of inflection at μ − σ and μ + σ, and markers at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ.]

Main characteristics of the Normal distribution:
• Bell shaped, symmetric.
• Points of inflection on the bell-shaped curve are at μ − σ and μ + σ, that is, one standard deviation from the mean.
• The area under the bell-shaped curve between μ − σ and μ + σ is approximately 2/3.
• The area under the bell-shaped curve between μ − 2σ and μ + 2σ is approximately 95%.

Normal approximation to the binomial distribution
Using the normal distribution to calculate binomial probabilities:
$$P[a \le X \le b] = p(a) + p(a+1) + \dots + p(b) \approx P\left[a - \tfrac12 \le Y \le b + \tfrac12\right]$$
where X has a binomial distribution with parameters n and p, and Y has a normal distribution with
$$\mu = np, \qquad \sigma = \sqrt{npq}$$
The ±1/2 is the continuity correction.

Sampling Theory
Determining the distribution of sample statistics.

The distribution of the sample mean
If x1, x2, …, xn denote n independent random variables, each coming from the same normal distribution with mean μ and standard deviation σ, then
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
has a normal distribution with mean $\mu_{\bar{x}} = \mu$, variance $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$, and standard deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

The Central Limit Theorem
The Central Limit Theorem (C.L.T.) states that if n is sufficiently large, the sample means of random samples from any population with mean μ and finite standard deviation σ are approximately normally distributed with mean μ and standard deviation σ/√n.
Technical note: the mean and standard deviation given in the CLT hold for any sample size; it is only the "approximately normal" shape that requires n to be sufficiently large.

[Figure: graphical illustration of the Central Limit Theorem – the original population and the distributions of x̄ for n = 2, n = 10, and n = 30, which become increasingly normal and concentrated.]

Implications of the Central Limit Theorem
• The conclusion that the sampling distribution of the sample mean is normal will be true if the sample size is large (> 30), even though the population may be non-normal.
• When the population can be assumed to be normal, the sampling distribution of the sample mean is normal for any sample size.
• Knowing the sampling distribution of the sample mean allows us to answer probability questions related to the sample mean.

Sampling Distribution of a Sample Proportion
Let p = the population proportion of interest, or the binomial probability of success. Let
$$\hat{p} = \frac{X}{n} = \frac{\text{no. of successes}}{\text{no. of binomial trials}}$$
= the sample proportion, or proportion of successes. Then the sampling distribution of $\hat{p}$ is approximately a normal distribution with mean
$$\mu_{\hat{p}} = p \qquad\text{and standard deviation}\qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
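A small simulation sketch of the CLT and the normal approximation with continuity correction (NumPy/SciPy; an exponential, hence non-normal, parent population, with all parameters invented).

```python
import numpy as np
from scipy.stats import norm, binom

rng = np.random.default_rng(3)

# CLT: means of n = 30 draws from a skewed (exponential) population
means = rng.exponential(scale=2.0, size=(10_000, 30)).mean(axis=1)
print(means.mean(), means.std())        # close to mu = 2 and sigma/sqrt(n) = 2/sqrt(30)

# Normal approximation to the binomial with continuity correction
n, p, a, b = 50, 0.4, 15, 25
exact = binom.cdf(b, n, p) - binom.cdf(a - 1, n, p)     # P[a <= X <= b]
mu, sd = n * p, np.sqrt(n * p * (1 - p))
approx = norm.cdf(b + 0.5, mu, sd) - norm.cdf(a - 0.5, mu, sd)
print(f"exact = {exact:.4f}, normal approx = {approx:.4f}")
```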
Sampling distribution of differences
Note: if X and Y are independent normal random variables, then X − Y is normal with mean
$$\mu_X - \mu_Y \qquad\text{and standard deviation}\qquad \sqrt{\sigma_X^2 + \sigma_Y^2}$$

Sampling distribution of a difference in two sample means

Situation
• We have two normal populations (1 and 2).
• Let μ1 and σ1 denote the mean and standard deviation of population 1.
• Let μ2 and σ2 denote the mean and standard deviation of population 2.
• Let x1, x2, x3, …, xn denote a sample from normal population 1.
• Let y1, y2, y3, …, ym denote a sample from normal population 2.
• The objective is to compare the two population means.

Then $D = \bar{x} - \bar{y}$ is normal with mean
$$\mu_{\bar{x}-\bar{y}} = \mu_{\bar{x}} - \mu_{\bar{y}} = \mu_1 - \mu_2$$
and standard deviation
$$\sigma_{\bar{x}-\bar{y}} = \sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2} = \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}$$

Sampling distribution of a difference in two sample proportions

Situation
• Suppose we have two success-failure experiments.
• Let p1 = the probability of success for experiment 1 and p2 = the probability of success for experiment 2.
• Suppose that experiment 1 is repeated n1 times and experiment 2 is repeated n2 times.
• Let x1 = the number of successes in the n1 repetitions of experiment 1, and x2 = the number of successes in the n2 repetitions of experiment 2, with
$$\hat{p}_1 = \frac{x_1}{n_1} \qquad\text{and}\qquad \hat{p}_2 = \frac{x_2}{n_2}$$

Then $D = \hat{p}_1 - \hat{p}_2$ is normal with mean
$$\mu_{\hat{p}_1 - \hat{p}_2} = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2$$
and standard deviation
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

The Chi-square (χ²) Distribution

The chi-squared distribution with ν degrees of freedom.
Comment: if z1, z2, …, zν are independent random variables, each having a standard normal distribution, then
$$U = z_1^2 + z_2^2 + \dots + z_\nu^2$$
has a chi-squared distribution with ν degrees of freedom.

[Figure: χ² densities for 2, 3 and 4 degrees of freedom.]

Statistics that have the chi-squared distribution:
1. $$\chi^2 = \sum_{j=1}^{c}\sum_{i=1}^{r} \frac{(x_{ij} - E_{ij})^2}{E_{ij}}$$
   This statistic is used to detect independence between two categorical variables; d.f. = (r − 1)(c − 1).
2. Let x1, x2, …, xn denote a sample from the normal distribution with mean μ and standard deviation σ; then
$$U = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2}$$
has a chi-square distribution with d.f. = n − 1.
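A closing sketch: simulate (n − 1)s²/σ² and compare its moments with those of χ²(n − 1), as a sanity check on fact 2 (NumPy/SciPy; all parameters invented).

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
mu, sigma, n = 10.0, 3.0, 8

samples = rng.normal(mu, sigma, size=(20_000, n))
U = (n - 1) * samples.var(axis=1, ddof=1) / sigma**2   # (n-1)s^2 / sigma^2

# Mean and variance of chi^2(n-1) are n-1 and 2(n-1)
print(U.mean(), U.var())                  # close to 7 and 14
print(chi2.mean(n - 1), chi2.var(n - 1))  # exactly 7 and 14
```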