1 Schaum’s Outline: Probability and Statistics
Chapter 7: HYPOTHESIS TESTING
presented by Professor Carol Dahl
Examples by Alfred Aird, Kira Jeffery, Catherine Keske, Hermann Logsend, Yris Olaya

2 Outline of Topics
Topics Covered
  Statistical Decisions
  Statistical Hypotheses
  Null Hypotheses
  Tests of Hypotheses
  Type I and Type II Errors
  Level of Significance
  Tests Involving the Normal Distribution
  One- and Two-Tailed Tests
  P-Value

3 Outline of Topics (Continued)
Special Tests of Significance
  Large Samples
  Small Samples
Estimation Theory/Hypothesis Testing Relationship
Operating Characteristic Curves and Power of a Test
Fitting Theoretical Distributions to Sample Frequency Distributions
Chi-Square Test for Goodness of Fit

4 “The Truth Is Out There”: The Importance of Hypothesis Testing
Hypothesis testing
  helps evaluate models based upon real data
  enables one to build a statistical model
  enhances your credibility as an analyst or economist

5 Statistical Decisions
Innocent-until-proven-guilty principle
Want to prove someone is guilty
Assume the opposite, the status quo: innocent
Ho: Innocent
H1: Guilty
Take a subsample of the possible information
If the evidence is not consistent with innocence, reject Ho
Otherwise the person is pronounced not guilty, rather than innocent

6 Statistical Decisions
Status quo (innocence) = null hypothesis
Evidence = sample result
Reasonable doubt = confidence level

7 Statistical Decisions
E.g., a tantalum ore deposit is feasible if quality > 0.0600 g/kg with 99% confidence
100 samples collected at random from a large deposit
Sample mean 0.071 g/kg, sample variance 0.0025 (standard deviation 0.05 g/kg)

8 Statistical Decisions
Should the deposit be developed?
Evidence = 0.071 (sample mean)
Reasonable doubt = 99%
Status quo = do not develop the deposit
Ho: µ ≤ 0.0600
H1: µ > 0.0600

9 Statistical Hypothesis: General Principles
Inferences about a population are made using a sample statistic
Prove A is true by assuming it isn’t true
Results of the experiment (sample) are compared with the model
If the results are unlikely under the model, reject the model
If the results are explained by the model, do not reject

10 Statistical Hypothesis
Event A is fairly likely: model would be retained
Event B is unlikely: model would be rejected
[Figure: density curve over z centred at 0, with event A near the centre and event B in the tail area]

11 Statistical Decisions
Should the deposit be developed?
Evidence = 0.071 (sample mean)
Reasonable doubt = 99%
Status quo = do not develop the deposit
Ho: µ = 0.0600
H1: µ > 0.0600
How likely is Ho given $\bar{X} = 0.071$?

12 Need Sampling Statistic
Need a statistic with
  the population parameter
  an estimate for the population parameter
  its distribution

13 Need Sampling Statistic
Population Normal: Two Choices (small sample, n < 30)
Known variance: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
Unknown variance: $\frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} \sim t_{n-1}$

14 Need Sampling Statistic
Population Not Normal, Large Sample
Known variance: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$ (approximately)
Unknown variance: $\frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} \sim N(0,1)$ (approximately)
Doesn’t matter whether the variance is known or not
If the population is finite and sampling is without replacement, an adjustment is needed

15 Normal Distribution
X ~ N(0,1): µ = 0, σ = 1
Within 1 SD: 68%
Within 2 SD: 95%
Within 3 SD: 99.7%

16 Statistical Decisions
Should the deposit be developed?
Evidence:
  0.071 g/kg (sample mean)
  0.0025 (sample variance)
  0.05 g/kg (sample standard deviation)
Reasonable doubt = 99%
Status quo = do not develop the deposit
Ho: µ = 0.0600
H1: µ > 0.0600
One-tailed test
How likely is Ho given $\bar{X} = 0.071$?

17 Hypothesis test
Evidence:
  0.071 g/kg (sample mean)
  0.05 g/kg (sample standard deviation)
Reasonable doubt = 99%
Status quo = do not develop the deposit
Ho: µ = 0.0600
H1: µ > 0.0600
$P\left(\frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} < Z_c\right) = 0.99 = 1 - \alpha$

18 Statistical Hypothesis
E.g.
$Z = \frac{0.071 - 0.0600}{0.05/\sqrt{100}} = 2.2$
2.2 < Zc = 2.33
Conclusion: don’t reject Ho, don’t develop the deposit

19 Null Hypothesis
Hypotheses cannot be proven
Reject or fail to reject based on the likelihood of the event occurring
The null hypothesis is not accepted

20 Test of Hypotheses
Maple Creek Mine and Potaro Diamond field in Guyana
Mine has potential for producing large diamonds
Experts want to know the true mean carat size produced
True mean said to be 4 carats
Experts want to know if this is true with 95% confidence
Random sample taken
Sample mean found to be 3.6 carats
Based on the sample, is 4 carats the true mean for the mine?

21 Tests of Hypotheses
Tests referred to as:
  “Tests of Hypotheses”
  “Tests of Significance”
  “Rules of Decision”

22 Types of Errors
Ho: µ = 4 (suppose this is true)
H1: µ ≠ 4
Two-tailed test
Choose α = 0.05
Sample n = 100 (assume X is normal), σ = 1
$P\left(-1.96 < \frac{\bar{X} - 4}{1/\sqrt{n}} < 1.96\right) = 0.95 = 1 - \alpha$

23 Type I Error (α): reject a true Ho
Ho: µ = 4, suppose true
$P\left(-1.96 < \frac{\bar{X} - 4}{1/\sqrt{n}} < 1.96\right) = 0.95 = 1 - \alpha$
[Figure: normal curve with rejection area α/2 in each tail]

24 Type II Error (β): accept a false Ho
Ho: µ = 4 not true, µ = 6 true
$\bar{X} - 4$ does not have mean 0 but mean 2
[Figure: sampling distributions centred at 0 (µ = 4) and 2 (µ = 6), with the area β marked]

25 Lower Type I: What happens to Type II?
Ho: µ = 4 not true, µ = 6 true
[Figure: sampling distributions centred at 0 (µ = 4) and 2 (µ = 6), with the area β marked]

26 Higher µ: What happens to Type II?
Ho: µ = 4 not true, µ = 7 true
$\bar{X} - 4$ does not have mean 0 but mean 3
[Figure: sampling distributions centred at 0 (µ = 4) and 3 (µ = 7), with the area β marked]

27 Type I and Type II Errors
Two types of errors can occur in hypothesis testing
To reduce errors, increase sample size when possible
α = P(Type I Error)
β = P(Type II Error)

                    Ho True            Ho False
Reject Ho           Type I Error       Correct Decision
Do Not Reject Ho    Correct Decision   Type II Error

28 To Reduce Errors
Increase sample size when possible
[Figure: Mean Sampling Distributions, Different Sample Sizes: population and n = 5, 10, 20]

29 Error Examples
Type I Error: rejecting a true null hypothesis
  Convicting an innocent person
  Rejecting that the true mean carat size is 4 when it is
Type II Error: not rejecting a false null hypothesis
  Setting a guilty person free
  Not rejecting that the mean carat size is 4 when it’s not

30 Level of Significance (α)
α = maximum probability of a Type I Error we’re willing to risk
α = tail area of the probability density function
If the “cost” of a Type I Error is high, choose α low
α is defined before the hypothesis test is conducted
α is typically set at 0.10, 0.05 or 0.01
  α = 0.10 for 90% confidence of correct test decision
  α = 0.05 for 95% confidence of correct test decision
  α = 0.01 for 99% confidence of correct test decision

31 Diamond Hypothesis Test Example
Ho: µ = 4
H1: µ ≠ 4
Choose α = 0.01 for 99% confidence
Sample n = 100, σ = 1
$\bar{X} = 3.6$, −Zc = −2.575, Zc = 2.575
[Figure: normal curve with rejection area 0.005 below −2.575 and 0.005 above 2.575]

32 Example Continued
$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = -2$
−2 (z) is not below −2.575 ($-z_{\alpha/2}$)
Observed not “significantly” different from expected
Fail to reject null hypothesis
At 99% confidence, the sample is consistent with a true mean of 4 carats

33 Tests Involving the t Distribution
Billy Ray has inherited a large, 25,000-acre homestead
Located on the outskirts of Murfreesboro, Arkansas, near:
  Crater of Diamonds State Park
  Prairie Creek Volcanic Pipe
Land now used for agriculture and recreation
No official mining has taken place

34 Case Study in Statistical Analysis: Billy Ray’s Inheritance
Billy Ray must now decide upon land usage
Options:
Exploration for diamonds
Conservation
Land biodiversity and recreation
Agriculture and recreation
Land development

35 Consider Costs and Benefits of Mining
Costs
  Opportunity cost
  Excessive diamond exploration damages land’s value
  Exploration and mining costs
Benefit
  Value of mineral produced

36 Consider Costs and Benefits of Mining
Sample for geologic indicators of diamonds
  kimberlite or lamproite
Larger sample more likely to represent the “true population”
Larger sample will cost more

37 How to decide: one-tailed or two-tailed
One-tailed test
  Do we change the status quo only if it’s bigger than the null?
  Do we change the status quo only if it’s smaller than the null?
Two-tailed test
  Change the status quo if it’s bigger or if it’s smaller

38 Tests of Mean: Normal or t
Population normal, known variance, small sample: Normal
Population normal, unknown variance, small sample: t
Large sample: Normal

39 Difference Between Normal and t
t has a “fatter” tail than the normal bell curve
[Figure: normal and t density curves plotted from −5 to 5]

40 Hypothesis and Sample
Need at least 30 g/m³ to mine
Null hypothesis Ho: µ = 30
Alternative hypothesis H1: ?
Sample data:
  n = 16 (holes drilled)
  X close to normal
  $\bar{X}$ = 31 g/m³
  variance of the sample mean (ŝ²/n) = 0.268

41 Normal or t?
One-tailed
Null hypothesis Ho: µ = 30
Alternative hypothesis H1: µ > 30
Sample data:
  n = 16 (holes drilled)
  $\bar{X}$ = 31 g/m³
  variance ŝ² = 4.29
  standard deviation ŝ = √4.29 = 2.07 g/m³
Small sample, estimated variance, X close to normal
Not exactly t, but close if X is close to normal

42 Tests Involving the t Distribution
$t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$
[Figure: t density with 16 − 1 = 15 degrees of freedom centred at 0; upper 5% rejection region beyond tc = 1.75]

43 Tests Involving the t Distribution
$t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} = \frac{31 - 30}{2.07/\sqrt{16}} = 1.93$
1.93 > tc = 1.75: reject Ho at the 5% level
[Figure: t density with 15 degrees of freedom centred at 0; upper 5% rejection region beyond tc = 1.75]

44 Wells produce oil
X = API gravity, approximately normal with mean 37
Periodically test to see if the mean has changed
If too heavy or too light, revise the contract
Ho: ?
H1: ?
Sample of 9 wells, $\bar{X}$ = 38, ŝ = 2
What is the test statistic? Normal or t?
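The one-tailed t test from the mine example (n = 16, sample mean 31, ŝ² = 4.29, Ho: µ = 30 against H1: µ > 30) can be checked numerically. The following is a minimal sketch using only the Python standard library. The function name `t_statistic` and the constant `T_CRIT_15_DF_5PCT` are illustrative names, not part of the slides, and the critical value 1.75 is copied from the slides rather than computed.

```python
import math


def t_statistic(sample_mean, mu0, sample_var, n):
    """Return (sample_mean - mu0) / (s / sqrt(n)), a t statistic with n - 1 df."""
    return (sample_mean - mu0) / math.sqrt(sample_var / n)


# Mine example: Ho: mu = 30 versus H1: mu > 30 (one-tailed)
t_mine = t_statistic(sample_mean=31.0, mu0=30.0, sample_var=4.29, n=16)

# Critical value quoted on the slides for 15 df and a 5% upper tail
# (assumption: taken from the slides, not computed here)
T_CRIT_15_DF_5PCT = 1.75

# One-tailed rule: reject Ho when the statistic exceeds the critical value
reject_h0 = t_mine > T_CRIT_15_DF_5PCT
print(f"t = {t_mine:.2f}, reject Ho: {reject_h0}")
```

The same function applies to a two-tailed test; only the decision rule changes, comparing the absolute value of the statistic with the critical value for α/2 in each tail.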
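45 Two-tailed t test on mean
$t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$
[Figure: t density centred at 0, with rejection area α/2 below −tc and α/2 above tc]

46 Two-tailed t test on mean
Ho: µ = 37
H1: µ ≠ 37
Sample of 9 wells, $\bar{X}$ = 38, ŝ = 2, α = 10%
$t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} = \frac{38 - 37}{2/\sqrt{9}} = 1.5$

47 P-values: one-tailed test
Level of significance for a sample statistic under the null
Smallest α for which the statistic would reject the null
$t_{16-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} = \frac{31 - 30}{2.07/\sqrt{16}} = 1.93$
P = 0.04, from =TDIST(1.93,15,1)

48 P-value: two-tailed test
Ho: µ = 37
H1: µ ≠ 37
Sample of 9 wells, $\bar{X}$ = 38, ŝ = 2, α = 10%
$t_{n-1} = \frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} = \frac{38 - 37}{2/\sqrt{9}} = 1.5$
=TDIST(1.5,8,2) = 0.172

49 Formal Representation of p-Values
p-value < α ⇒ reject Ho
p-value > α ⇒ fail to reject Ho

50 More tests
Survey: ranking refinery managers
Daily refinery production (1000 b/cd)
Samples from two refineries, of sizes 40 and 35
First refinery: mean = 74, standard deviation = 8
Second refinery: mean = 78, standard deviation = 7
Questions: difference of means? variances? difference of variances?
Again, statistics can help!

51 Differences of Means
Ho: µ1 − µ2 = 0
H1: µ1 − µ2 ≠ 0
$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
X1 and X2 normal with known variance, or large sample with known variance
α = 10%
[Figure: normal curve with 5% rejection area below −Zc and 5% above Zc]

52 Differences of Means
Ho: µ1 − µ2 = 0
H1: µ1 − µ2 ≠ 0
n1 = 40, $\bar{X}_1$ = 74, σ1 = 8
n2 = 35, $\bar{X}_2$ = 78, σ2 = 7
$Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{74 - 78}{\sqrt{\frac{8^2}{40} + \frac{7^2}{35}}} = \frac{-4}{\sqrt{3}} = -2.31$
−2.31 < −Zc = −1.645: reject Ho at α = 10%
[Figure: normal curve with 5% rejection area below −Zc = −1.645 and 5% above Zc = 1.645]

53 Difference of Means
X normal, unknown but equal variances
Do the above test with $t_{n_1+n_2-2}$:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{(n_1-1)\hat{s}_1^2 + (n_2-1)\hat{s}_2^2}{n_1+n_2-2}\cdot\frac{n_1+n_2}{n_1 n_2}}}$

54 Variance test (χ² distribution)
$\frac{(n-1)\hat{S}^2}{\sigma^2} \sim \chi^2_{n-1}$
Two-tailed
[Figure: χ² density with rejection area α/2 in each tail]

55 Variance test (χ² distribution)
$\frac{(n-1)\hat{S}^2}{\sigma^2} \sim \chi^2_{n-1}$
One-tailed
[Figure: χ² density with rejection area α in the upper tail]

56 Hypothesis Test on Variance
Suppose best practice in refinery is σ = 6
Does refinery 2 have different variability than best practice?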
Ho: σ² = 6²
H1: σ² ≠ 6²
Example: 2nd refinery, n − 1 = 34, standard deviation = 7
$P\left(\chi^2_{c1} < \frac{(n-1)\hat{S}^2}{\sigma^2} < \chi^2_{c2}\right) = 1 - \alpha$

57 Hypothesis Test on Variance
Ho: σ² = 6²
H1: σ² ≠ 6²
Example: 2nd refinery, n − 1 = 34, standard deviation = 7, α = 10%
$\frac{(n-1)\hat{S}^2}{\sigma^2} = \frac{(35-1)\,7^2}{6^2} = 46.278$
$P\left(\chi^2_{c1} < \frac{(n-1)\hat{S}^2}{\sigma^2} < \chi^2_{c2}\right) = 1 - \alpha$, with α/2 in each tail

58 Hypothesis Test on Variance
Suppose best practice in refinery
Ho: σ² = 6²
H1: σ² ≠ 6²
Example: 2nd refinery, n − 1 = 34, standard deviation = 7
Critical values: chiinv(0.95,34) = 21.664, chiinv(0.05,34) = 48.602

59 Variance test (χ² distribution)
$\frac{(n-1)\hat{S}^2}{\sigma^2} = 46.278$
Two-tailed, 0.05 in each tail
21.664 < 46.278 < 48.602: fail to reject Ho
[Figure: χ² density with critical values 21.664 and 48.602]

60 Variance test (χ² distribution)
More variance than best practice?
Ho: σ² = 6²
H1: σ² > 6²
One-tailed, α = 0.10

61 Variance test (χ² distribution)
More variance than best practice?
Ho: σ² = 6²
H1: σ² > 6²
One-tailed, α = 0.10
$\frac{(n-1)\hat{S}^2}{\sigma^2} = 46.278$
46.278 > chiinv(0.10,34) = 44.903: reject Ho

62 Testing if Variances the Same: F Distribution
2 samples of size n1 and n2
Sample variances: ŝ1², ŝ2²
Ho: σ1² = σ2² ⇒ Ho: σ2²/σ1² = 1
H1: σ1² ≠ σ2² ⇒ H1: σ2²/σ1² ≠ 1
$\frac{\hat{S}_1^2/\sigma_1^2}{\hat{S}_2^2/\sigma_2^2} = \frac{\hat{S}_1^2}{\hat{S}_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2}$ is $F_{n_1-1,\,n_2-1}$

63 Testing if Variances the Same: F Distribution
Ho: σ1²/σ2² = 1
H1: σ1²/σ2² ≠ 1
Test statistic: $\hat{S}_1^2/\hat{S}_2^2$
Two-tailed
[Figure: F density with rejection area α/2 in each tail]

64 Testing if Variances the Same: F Distribution
Ho: σ1²/σ2² = 1
H1: σ1²/σ2² > 1
Test statistic: $\hat{S}_1^2/\hat{S}_2^2$
One-tailed, α = 10%

65 Example: Testing if Variances the Same
2 samples of size n1 = 40 and n2 = 35
Sample variances: ŝ1² = 8², ŝ2² = 7²
Ho: σ2²/σ1² = 1
H1: σ2²/σ1² ≠ 1
$P\left(\mathrm{Finv}(0.95, 39, 34) < \frac{\hat{S}_1^2}{\hat{S}_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} < \mathrm{Finv}(0.05, 39, 34)\right) = 1 - 0.10$
Critical values: [0.579, 1.749]
$\hat{S}_1^2/\hat{S}_2^2 = 8^2/7^2 = 1.306$

66 Testing if Variances the Same: F Distribution
Ho: σ1²/σ2² = 1
H1: σ1²/σ2² ≠ 1
$\hat{S}_1^2/\hat{S}_2^2 = 1.306$
Two-tailed, 0.05 in each tail
Finv(0.95,39,34) = 0.579, Finv(0.05,39,34) = 1.749
0.579 < 1.306 < 1.749: fail to reject Ho

67 Testing if Variances the Same: F Distribution
Ho: σ1²/σ2² = 1
H1: σ1²/σ2² > 1
$\hat{S}_1^2/\hat{S}_2^2 = 1.306$
One-tailed, 0.10 in the upper tail
Finv(0.10,39,34) = 1.544
1.306 < 1.544: fail to reject Ho

68 Power of a test
Type II error: β = P(Fail to reject Ho | H1 is true)
Power = 1 − β
[Figure: sampling distributions centred at 0 (µ = 4) and 2 (µ = 6)]

70 Power of a test
Researcher controls level of
significance, α
Increase α: what happens to β?

71 Raise Type I (α): What happens to Type II (β)?
Ho: µ = 4 not true, µ = 6 true
$\bar{X} - 4$ does not have mean 0 but mean 2
[Figure: sampling distributions centred at 0 (µ = 4) and 2 (µ = 6), with the area β marked]

72 Higher α: What happens to Type II?
Trade-off: increasing β reduces α, and vice versa
[Figure: sampling distributions centred at 0 (µ = 4) and 2 (µ = 6), with the area β marked]

73 Operating Characteristic Curve
Can graph β against µ: called the operating characteristic curve
Useful in experimental design
[Figure: distributions under H0 (µ = µ0) and H1 (µ = µ1) plotted from −10 to 10, with β and Zβ marked]

74 Operating Characteristic Curve
[Figure: two panels, H0 (µ = µ0) against H1 (µ = µ1) and H0 (µ = µ0) against H1 (µ = µ2), each plotted from −10 to 10 with β and Zβ marked]

75 Fitting a probability distribution
Is electricity demand a log-normal distribution?
Observed mean: 18.42
Observed variance: 43
Observations: 20
   9.8261  20.8787  35.6834  13.1139  15.9879
  13.2253  20.2954  18.1785  24.3539  16.4685
  30.2449  14.182   20.275   17.243   12.8461
   9.2554  23.3099  17.2652  21.9764  13.9045

76 Fitting a probability distribution
Does electricity demand follow a normal distribution?
Same 20 observations
Observed mean: 18.42
Observed variance: 43
Observations: 20

77 You can test your model graphically:
1. Order observations from smallest Y1 to largest Yn
2. Compute the cumulative frequency distribution Pi
3. Plot ordered observations versus Pi on special probability paper
4. If the points follow a straight line within the critical range, can’t reject normal

78 You can test your model graphically:
  Yi      Pi        Yi      Pi
  9.26   0.05     17.27   0.55
  9.83   0.10     18.18   0.60
 12.85   0.15     20.28   0.65
 13.11   0.20     20.30   0.70
 13.23   0.25     20.88   0.75
 13.90   0.30     21.98   0.80
 14.18   0.35     23.31   0.85
 15.99   0.40     24.35   0.90
 16.47   0.45     30.24   0.95
 17.24   0.50     35.68   1.00

79 Or use the Graph/Probability Plot … option in Minitab

80 Statistical test of distribution
Ho: Xe ~ N(µ, σ²)
H1: Xe does not follow N(µ, σ²)
Order data
Estimate sample mean and variance
  Observed mean: 18.42
  Observed variance: 43
  Observations: 20
χ² statistic: goodness of fit of model

81 Statistical test of distribution
Again order the sample:
   9.26   9.83  12.85  13.11  13.23  13.90  14.18  15.99  16.47  17.24
  17.27  18.18  20.28  20.30  20.88  21.98  23.31  24.35  30.24  35.68
Create m = 5 categories: <10, 10-15, 15-20, 20-25, >25

82 Statistical test of distribution
Actual frequencies
  <10     2
  10-15   5
  15-20   5
  20-25   6
  >25     2

83 Statistical test of distribution
Frequencies   actual   expected
  <10         2        Normdist(10,18.42,6.56,1)*20
  10-15       5        (Normdist(15,18.42,6.56,1) − Normdist(10,18.42,6.56,1))*20
  15-20       5        (Normdist(20,18.42,6.56,1) − Normdist(15,18.42,6.56,1))*20
  20-25       6
  >25         2

84 Statistical test of distribution
Frequencies   observed   expected
  <10         2          1.99
  10-15       5          4.03
  15-20       5          5.88
  20-25       6          4.94
  >25         2          3.16

85 χ² Goodness of Fit Test
Is based on:
$\chi^2 = \sum_{i=1}^{m} \frac{(o_i - e_i)^2}{e_i}$
df = m − k − 1
k = number of parameters replaced by estimates
o_i: observed frequency, e_i: expected frequency

86 Statistical test of distribution
Frequencies   o_i   e_i     (o_i − e_i)²/e_i
  <10         2     1.99    (2 − 1.99)²/1.99
  10-15       5     4.03    (5 − 4.03)²/4.03
  15-20       5     5.88    (5 − 5.88)²/5.88
  20-25       6     4.94    (6 − 4.94)²/4.94
  >25         2     3.16    (2 − 3.16)²/3.16
$\chi^2 = \sum (o_i - e_i)^2/e_i = 1.02$

87 Statistical test of distribution
Ho: X ~ N(µ, σ²)
H1: X does not follow N(µ, σ²)
df = m − k − 1 = 5 − 2 − 1 = 2
$\chi^2 = \sum (o_i - e_i)^2/e_i = 1.02$
CHIINV(0.05,2) = 5.99
1.02 < 5.99: fail to reject normality

88 Outline of Topics (Continued)
Estimation Theory/Hypotheses
Testing Relationship
Operating Characteristic Curves and Power of a Test
Fitting Theoretical Distributions to Sample Frequency Distributions
Chi-Square Test for Goodness of Fit

89 Sum Up Chapter 7
Hypothesis testing
  null vs alternative
    null with equal sign
    null often status quo
    alternative often what we want to prove
  type I error vs type II error
    probability of type I error, α, called level of significance
  P-values
  1 − β = power of test = probability of rejecting a false null
  one-tailed vs two-tailed

90 Sum Up Chapter 7
Hypothesis tests
  mean, Normal test: population normal with known variance, or large sample
    $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
  mean, t test: population normal, unknown variance, small sample
    $\frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}}$

91 Sum Up Chapter 7
Normal and t

92 Sum Up Chapter 7
Hypothesis tests
  difference of means, Normal test: population normal, known variance
    $\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

93 Sum Up Chapter 7
Hypothesis tests
  variance
    $\frac{(n-1)\hat{S}^2}{\sigma^2} \sim \chi^2_{n-1}$
  Are variances equal?
    $\frac{\hat{S}_1^2}{\hat{S}_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2}$ is $F_{n_1-1,\,n_2-1}$

94 Sum Up Chapter 7
χ² and F

95 Sum Up Chapter 7
How is a random variable distributed?
  normal: graph the cumulative frequency distribution on special paper and look for a straight line
  statistical: $\chi^2_{m-k-1} = \sum (o_i - e_i)^2/e_i$
    m = categories
    k = estimated parameters
    always one-tailed

End of Chapter 7!
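As a recap, the test statistics from the chapter's worked examples (tantalum deposit, two refineries, refinery variance, variance ratio, and the normal fit to electricity demand) can be recomputed in a few lines. The following is a minimal sketch using only the Python standard library. All function and variable names are illustrative and do not come from the slides. Critical values still have to be looked up separately, as the slides do with chiinv and Finv.

```python
import math
from bisect import bisect_right
from statistics import NormalDist


def z_one_sample(xbar, mu0, sd, n):
    """Normal statistic for a mean: (xbar - mu0) / (sd / sqrt(n))."""
    return (xbar - mu0) / (sd / math.sqrt(n))


def z_two_sample(xbar1, xbar2, sd1, sd2, n1, n2):
    """Difference-of-means statistic with known (or large-sample) variances."""
    return (xbar1 - xbar2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def chi2_variance(sample_sd, sigma0, n):
    """(n - 1) * s^2 / sigma0^2, compared with chi-square on n - 1 df."""
    return (n - 1) * sample_sd ** 2 / sigma0 ** 2


def chi2_goodness_of_fit(data, edges, dist):
    """Observed counts, expected counts and sum((o - e)^2 / e) for bins cut at edges."""
    observed = [0] * (len(edges) + 1)
    for x in data:
        observed[bisect_right(edges, x)] += 1
    # Cumulative probabilities at the bin boundaries, padded with 0 and 1
    cdf = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    expected = [len(data) * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, expected, stat


# Worked examples, with the inputs given on the slides
z_tantalum = z_one_sample(0.071, 0.0600, 0.05, 100)
z_refineries = z_two_sample(74, 78, 8, 7, 40, 35)
chi2_refinery = chi2_variance(7, 6, 35)
f_ratio = 8 ** 2 / 7 ** 2

demand = [
    9.8261, 20.8787, 35.6834, 13.1139, 15.9879,
    13.2253, 20.2954, 18.1785, 24.3539, 16.4685,
    30.2449, 14.182, 20.275, 17.243, 12.8461,
    9.2554, 23.3099, 17.2652, 21.9764, 13.9045,
]
observed, expected, chi2_fit = chi2_goodness_of_fit(
    demand, [10, 15, 20, 25], NormalDist(18.42, 6.56)
)

print(f"tantalum z = {z_tantalum:.2f}")
print(f"refineries z = {z_refineries:.2f}")
print(f"variance chi-square = {chi2_refinery:.3f}")
print(f"variance ratio F = {f_ratio:.3f}")
print(f"goodness-of-fit chi-square = {chi2_fit:.2f}")
```

Keeping each statistic in its own small function mirrors the summary slides: one formula per test, with the decision left to a comparison against the appropriate critical value.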