Download File - Professor Fell
Chapter 8 Parameter Estimates and Hypothesis Testing

Estimating the Population Standard Deviation
• The SD and the mean of a population are estimates because we don’t have all the scores (this is why it’s called “inferential statistics”: we are estimating)
• Estimating σ: the sample SD tends to underestimate σ
-this is due to sampling error
-since the sample SD underestimates σ, it is called a biased estimator
• To correct for the biased estimator, we subtract 1 from N in the denominator of the SD formula

Estimating the Standard Error of the Mean
• Estimated Standard Error of the Mean: allows us to predict what the standard deviation of an entire distribution of means would be if we had measured the whole population
-i.e., the standard deviation of the sampling distribution of means
-σM symbolizes the standard deviation of an entire distribution of means
-We can estimate σM from a single sample
*when we do this it is called the Estimated Standard Error of the Mean
*symbolized as SEM
• Formula: SEM = s / √N, where s is the sample SD calculated with N – 1
• The standard error of the mean can’t be a negative number, in the same way an SD can’t!
• Simplest raw score formula:

Standard Error of the Mean Worksheet

Estimating the Population Mean
• Point Estimate: the sample mean (M) is used to estimate μ
-the most precise (best) single-value estimate
• Confidence Intervals: a range of values is estimated within which it is assumed that μ is contained
-The goal is to bracket μ within a specific range of high and low sample means.
EX: Let’s say we want to predict the average temperature for the next 4th of July. We could predict that the temp will fall between 95 & 105 degrees.
Q: What would be the problem with saying the temp will be between 0 & 120 degrees?
A: We wouldn’t be able to plan a picnic with that confidence interval because it isn’t precise at all!
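The estimated standard error of the mean described above can be sketched in a few lines of Python. This is a minimal illustration, not the worksheet's method: the function names `sample_sd` and `estimated_sem` and the list of scores are hypothetical, and the sketch assumes the usual form in which the SD is computed with N – 1 and then divided by √N.

```python
import math

def sample_sd(scores):
    """Estimate sigma from a sample, dividing the sum of squares by N - 1."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    return math.sqrt(sum_of_squares / (n - 1))

def estimated_sem(scores):
    """Estimated standard error of the mean: corrected SD divided by sqrt(N)."""
    return sample_sd(scores) / math.sqrt(len(scores))

# Hypothetical scores, used only to exercise the functions.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print("M   =", sum(scores) / len(scores))
print("s   =", round(sample_sd(scores), 3))
print("SEM =", round(estimated_sem(scores), 3))
```

Because the SEM is a square root divided by a square root, it can never come out negative, which matches the note above about the SD.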
**A .99 confidence interval isn’t as precise as a .95 interval because it gives a bigger range that the mean could fall in

Confidence Intervals
• Using the normal curve to calculate confidence intervals (when σ is known):
-A probability value can be calculated that indicates the degree of confidence we might have that μ is really in this interval
-Typically, we like to be at least 95% sure that μ falls within our confidence interval.
.95 confidence interval = ±1.96 σM + M
**At .95, any value that falls beyond z = ±1.96 leads to the conclusion that the mean is not part of the known population
.99 confidence interval = ±2.58 σM + M
**At .99, any value that falls beyond z = ±2.58 leads to the conclusion that the mean is not part of the known population
• Usually we don’t know σ, so we can’t calculate confidence intervals this way

Standard Error of the Mean Homework Due Next Class

t-distribution
• Use the t-distribution (Table C) to calculate confidence intervals
-when σ is unknown or you have a small sample size (N = 30 or less)
*usually it is unknown & with sampling, we often have a small N!
• The smaller the sample size, the less certain we are of the normality of the entire sampling distribution.
• Therefore, we use a t-distribution, which is a family of distributions, each of which deviates from normality depending on sample size.
• t-distributions are distinguished by their degrees of freedom, which are based on sample size.
-df = N – 1
-as the df increases, the t-distribution becomes more like the normal distribution
• The critical values on the t-distribution are at the .05 (95% confidence level) and .01 (99% confidence level) levels

William Sealy Gosset: developer of the t-distribution

Calculating Confidence Intervals
• Use the t-distribution to determine critical values
-NOTE: Critical values of t should be written with 3 digits after the decimal (as they appear in Tables C & D).
Step 1: Calculate df
df = N – 1
Step 2: Find ±t.05 or ±t.01 (as instructed)
Look up the value in Table C at either the .05 level or the .01 level using the df
Step 3: Complete this formula: confidence interval = (±t)(SEM) + M
*Note: the answer will be 2 numbers (a range)

One Sample t Tests
• We can use t-tests to answer research questions
• t-tests answer statistical questions such as:
1) Is the difference between the sample mean & μ statistically significant?
2) What is the probability that a sample mean could deviate from μ by the amount that it does?
3) Is the sample from this population or not?
• Example Research Problem: A researcher theorizes that the population mean among college students taking the new Social Conformity Test is a “neutral” 100. Scores higher than 100 represent more conformity than average and scores lower than 100 represent less conformity than average. A random sample of 30 students was selected and found to have a mean of 103 with a standard deviation of 10.83.
-Q: Using the t-test, what statistical questions could we ask about this problem?

One Sample t Tests
• We must test the Null Hypothesis (Ho), the hypothesis of no difference
Ho: μ1 = μ2
There is no significant difference between the sample mean and the population mean. OR The sample is from the population.
Ha: μ1 ≠ μ2
The alternate hypothesis (the hypothesis of difference) says that the sample mean deviates enough from μ that we can conclude the sample is NOT from the population in question.
***Note: μ1 is our sample mean or our “point estimate”
-i.e., it’s a μ because, theoretically, it estimates our population mean

Calculating One Sample t Tests
Step 1: Determine the number of degrees of freedom
df = N – 1
Step 2: Calculate a t-value for our sample mean so we can see its relation to μ
t = (M – μ) / SEM
EX: using the previous example: N = 30, M = 103, SD = 10.83
Step 3: Look at the Table C Handout under the calculated df at the .05 level & compare the calculated t-value.
-If your calculated t-value is equal to or greater than the table value, then you reject the null hypothesis.
-We can NOT reject our null because our calculated t-value is smaller than the table value of 2.045 (df = 29)
**we conclude that there is no significant difference between the sample mean & μ

One-sample t test Worksheet
One-sample t test Homework due Next Class

Statistical Significance
• In order to reject Ho, the t-values must fall within the .05 or .01 critical areas
• Using the previous example, let’s say we did reject the null hypothesis & accept the alternate hypothesis
• We must now say how unlikely our null hypothesis is
-Instead of saying: “It is highly unlikely that there is not a difference between the sample mean and population mean” OR “it is highly unlikely that our sample mean is from our population”
-We would say: “there is a significant difference between the population mean and the sample mean” or “the means have been found to be significantly different”
-i.e., the probability that the groups are the same or that the sample mean is from the population is very, very low
• Well, how low is very low?
-when the probability that the groups or means are the same is less than 5% or 1%
-i.e., “There is less than a 5% chance that these groups are the same”
• Significance levels are called alpha levels and are represented by the alpha symbol (α) or by p-values (p < .05 or p < .01)

Type I Error
• We use strict levels of significance to reduce the probability of committing a Type I Error (a.k.a. Alpha Error)
-Alpha Error: we reject the null (Ho) but we should have accepted it
*i.e., saying there is a difference between the groups when there isn’t
• The probability of making a Type I error is equal to alpha (.05 or .01)
-p < .05 (or α < .05) means that the probability of making a mistake in rejecting Ho is less than 5 in 100
*i.e.
At the .05 significance level (95% confidence level), we are willing to make a mistake in rejecting Ho 5% of the time
-p < .01 (or α < .01) means that the probability of making a mistake in rejecting Ho is less than 1 in 100
*i.e., at the .01 significance level (99% confidence level), we are willing to make a mistake in rejecting Ho 1% of the time

Two Tail (Nondirectional)
• Sample two-tail question: Is there a difference between groups?
• Use Table C for critical values
• It doesn’t matter if the t-value is positive or negative

One Tail (Directional)
• Sample one-tail question: Does one group perform better (or worse), or score higher (or lower), than another? Is one drug more (or less) likely to be effective?
• Use Table D for critical values
• It DOES matter if the t-value is positive or negative, and the expected sign will depend on the Ha
• If you hypothesize a positive t-value & get a negative one, or vice versa, you must accept Ho (even if the size of the t-value is larger than the table value)
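The one-sample t test, the t-based confidence interval, and the one-tail versus two-tail decision rule can be sketched together in Python using the Social Conformity Test example (N = 30, M = 103, SD = 10.83, μ = 100). This is only an illustration under stated assumptions: the function names are hypothetical, the critical value 2.045 is the Table C value quoted in the notes for df = 29, and the sketch assumes the reported SD of 10.83 was already computed with N – 1, so that SEM = SD / √N. If the SD was instead computed with N, the SEM would be SD / √(N – 1) and the t-value would be slightly smaller.

```python
import math

# Two-tailed .05 critical value for df = 29, as quoted from Table C in the notes.
T_CRIT_05_DF29 = 2.045

def one_sample_t(sample_mean, mu, s, n):
    """Return (t, df, sem), assuming s was computed with N - 1."""
    sem = s / math.sqrt(n)
    t = (sample_mean - mu) / sem
    return t, n - 1, sem

def confidence_interval(sample_mean, sem, t_crit):
    """Confidence interval = (+/- t)(SEM) + M, returned as (low, high)."""
    margin = t_crit * sem
    return sample_mean - margin, sample_mean + margin

def reject_null(t, t_crit, direction=0):
    """Decide whether to reject Ho.

    direction = 0  -> two-tail: the sign of t does not matter.
    direction = +1 -> one-tail, Ha predicts a positive t.
    direction = -1 -> one-tail, Ha predicts a negative t.
    """
    if direction == 0:
        return abs(t) >= t_crit
    same_direction = (t > 0) == (direction > 0)
    return same_direction and abs(t) >= t_crit

t, df, sem = one_sample_t(sample_mean=103, mu=100, s=10.83, n=30)
low, high = confidence_interval(103, sem, T_CRIT_05_DF29)

print(f"t = {t:.3f}, df = {df}")
print(f".95 confidence interval: {low:.3f} to {high:.3f}")
print("Reject Ho (two-tail, .05)?", reject_null(t, T_CRIT_05_DF29))
```

Under either SD convention the calculated t stays below 2.045, so the sketch agrees with the notes: Ho is not rejected, and the .95 interval brackets the hypothesized μ of 100. A one-tail decision would pass the Table D critical value as `t_crit` together with the direction predicted by Ha.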