Parametric statistics

Outline
- Measuring the accuracy of the mean
- Practical notes
- Inferential statistics: t-test, ANOVA

Measuring the accuracy of the mean
- The mean is the simplest statistical model that we use.
- This statistic predicts the likely score of a person.
- The mean is a summary statistic.
- The model we choose (mean / median / mode) should represent the state of the world. Does the model represent the world precisely?
- The mean is a perfect representation only if all the scores we collect are equal to the mean.

When the mean is a perfect fit, there is no difference between the mean and each data point:

  Child   Score
  1       10
  2       10
  3       10
  4       10
  5       10
  6       10
  7       10
  8       10
  Mean    80/8 = 10

Usually there are differences between the mean and the raw scores. If the mean is representative of the data, these differences are small:

  Child   Score
  1       10
  2       9
  3       8
  4       12
  5       8
  6       11
  7       10
  8       12
  Mean    80/8 = 10

Deviation
- The difference between the model's prediction (the mean) and each raw score is the deviation.
- Compute the deviation of each score from the mean, then measure the overall deviation (the sum):

  Child   Score   Mean   Deviation
  1       10      10      0
  2       9       10     -1
  3       8       10     -2
  4       12      10      2
  5       8       10     -2
  6       11      10      1
  7       10      10      0
  8       12      10      2
                  Sum     0

- The deviations always sum to zero: positive and negative deviations cancel out. Squaring each deviation removes the sign:

  Raw score   Mean   Deviation   Squared dev.
  10          10      0           0
  9           10     -1           1
  8           10     -2           4
  12          10      2           4
  8           10     -2           4
  11          10      1           1
  10          10      0           0
  12          10      2           4
                     Sum         18

- The sum of squared deviations (also called the sum of squared errors) is a good measure of the accuracy of the mean, except that it gets bigger as more scores are added.

Variance
- Divide the sum of squared deviations by the number of scores minus 1:

  Sum of squared deviations   18
  Number of scores (N)         8
  N - 1                        7
  Variance                    18/7 = 2.57
  Standard deviation          sqrt(2.57) = 1.60

- We can compare variance across samples.
- The square root of the variance is the standard deviation.

Accuracy of the mean
- The sum of squared deviations (sum of squared errors), the variance and the standard deviation all measure the same thing: the variability of the data.
- The standard deviation (SD) measures how well the mean represents the data: a small SD indicates data points close to the mean.

Standard deviation (SD): data close to the mean (small SDs)

  Sentence     Mean   SD
  Sentence 1   7      0.2
  Sentence 2   5      0.5
  Sentence 3   2      0.1
  Sentence 4   2      0.5
  Sentence 5   5      0.2
  Sentence 6   6      0.2
  Sentence 7   4      0.1
  Sentence 8   6      0.2

Standard deviation (SD): data far from the mean (large SDs)

  Sentence     Mean   SD
  Sentence 1   7      1.5
  Sentence 2   5      2.5
  Sentence 3   2      1.5
  Sentence 4   2      2.5
  Sentence 5   5      3.0
  Sentence 6   6      1.5
  Sentence 7   4      2.5
  Sentence 8   6      3.5

Sample and population
- The intended population of psycholinguistic research can be all people, all children aged 3, etc.
- In practice, we collect data only from a sample of the population we are interested in.
- We use the sample to make a guess about the linguistic behavior of the relevant population.
- The mean as a model is resistant to sampling variation: different samples from the same population usually have similar means.

Why divide by the number of scores minus 1?
- We are using a sample to estimate the variance in the population.
- Observations in a sample can vary freely: (5, 6, 2, 9, 3), mean = 5.
- But if we assume that the sample mean is the same as the population mean (mean = 5), then for the next sample not all observations are free to vary.
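The computation above (deviations, sum of squared deviations, variance, SD) can be sketched in a few lines of Python, reusing the same eight scores from the worked example:

```python
# Minimal sketch of the worked example: deviations, sum of squared
# deviations, variance (N - 1 denominator) and standard deviation.
scores = [10, 9, 8, 12, 8, 11, 10, 12]

mean = sum(scores) / len(scores)            # 80 / 8 = 10
deviations = [x - mean for x in scores]     # these sum to 0
ss = sum(d ** 2 for d in deviations)        # sum of squared deviations = 18
variance = ss / (len(scores) - 1)           # 18 / 7 = 2.57
sd = variance ** 0.5                        # sqrt(2.57) = 1.60

print(mean, ss, round(variance, 2), round(sd, 2))
```

Note the `len(scores) - 1` denominator: this is the degrees-of-freedom correction discussed in the slides, and it is also what Python's built-in `statistics.variance` uses.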
- For a sample of (5, 7, 1, 8, ?) with an assumed mean of 5, the last observation is already determined (? = 4).
- This does not mean we fix the value of the observation, but simply that for various statistics we have to calculate the number of observations that are free to vary.
- This number is called the degrees of freedom, and it is one less than the sample size (N - 1).

To summarize
- The mean represents the sample.
- The sample represents the population.

Many samples, one population
- Theoretically, if we take several samples from the same population, each sample will have its own mean and SD.
- If the samples are taken from the same population, they are expected to be reasonably similar.

  Sample       Mean
  Population   10
  Sample 1      9
  Sample 2     11
  Sample 3     10
  Sample 4     12
  Sample 5      9
  Sample 6     10
  Sample 7     11
  Sample 8      8

Sampling distribution of the means:

  Mean   Frequency
  8      1
  9      2
  10     3
  11     2
  12     1

- The average of all sample means gives the value of the population mean.
- How accurate is a sample likely to be? Calculate the SD of the sampling distribution. This is called the standard error of the mean (SE).

Standard error (SE)
- In practice we do not collect many samples; instead we estimate the SE from one sample: SE = SD / sqrt(N).

  Sentence     Mean   SD    SE
  Sentence 1   7      0.2   0.07
  Sentence 2   5      0.5   0.18
  Sentence 3   2      0.1   0.04
  Sentence 4   2      0.5   0.18
  Sentence 5   5      0.2   0.07
  Sentence 6   6      0.2   0.07
  Sentence 7   4      0.1   0.04
  Sentence 8   6      0.2   0.07

Why are the samples different?
- Possible sources of the variance: the samples come from different populations, or sampling error (a random effect, which can be calculated).
- Can we take results from the sample to make generalizations about the population?

Accuracy of sample means
- Calculate the boundaries within which most sample means will fall.
- Looking at the means of 100 samples, suppose the lowest mean is 2 and the highest mean is 7.
- The mean of any additional sample is likely to fall within these limits.
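The SE formula above can be checked against the sentence table. N = 8 observations per sentence is an assumption (the slides do not state it), but it reproduces the SE column exactly:

```python
# Sketch: standard error of the mean, SE = SD / sqrt(N).
# The SD values come from the sentence table above; N = 8 observations
# per sentence is an assumption that reproduces the SE column shown.
import math

sds = [0.2, 0.5, 0.1, 0.5, 0.2, 0.2, 0.1, 0.2]
n = 8
ses = [round(sd / math.sqrt(n), 2) for sd in sds]
print(ses)  # [0.07, 0.18, 0.04, 0.18, 0.07, 0.07, 0.04, 0.07]
```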
Confidence interval
- The limits within which a certain percentage (typically 95%) of sample means will fall.
- If we collect 100 samples, 95 of them are expected to have a mean within the 95% confidence interval.

PRACTICE
- Experiment: compare children with specific language impairment (SLI) and children who are typically developing (TD).
- Hypothesis: an effect of word order, SVO vs. VSO.
- Task: repeat a sentence. Scoring: is the verb (V) correct? SVO yes/no; VSO yes/no.
- 30 children, each presented with 10 sentences (5 SVO, 5 VSO).

  Child     SLI   Age    Gender   SVO1   SVO2   VSO1   VSO2
  Child 1   1     4;02   1        1      0      1      0
  Child 2   0     3;04   1        0      1      1      0
  Child 3   1     4;06   1        0      0      1      0
  Child 4   0     3;11   0        1      1      1      0

- Compute: mean? frequency? SD?
- Means to compute per group and condition: SLI SVO, SLI VSO, TD SVO, TD VSO.

Basic analysis with Excel
- Descriptive statistics: sum, average, percentage.
- Drawing graphs.
- Parametric statistics: mean, standard deviation, t-test.
- Smart sheets: COUNTIF.

INFERENTIAL STATISTICS

Statistical hypothesis
- We have a hypothesis about the effect of a linguistic phenomenon, and findings from a sample.
- Do the findings support the hypothesis? Do they show a linguistic effect?
- To answer this, we consider a null hypothesis.

The null hypothesis (H0)
- H0: the experiment has no effect.
- The purpose of statistical inference is to reject this hypothesis.
- H1: the mean of the population affected by the experiment is different from that of the general population.

Rejecting the null hypothesis
- Compare the mean of the sample to two populations (under H1 or H0).
- We cannot show directly that the sample belongs to the population under H1.
- All we can do is compare the sample to the population under H0 and consider the likelihood that it belongs to it.
- To check whether our sample belongs to the population under H0 or H1, we consider the confidence interval and SE, and compare means and variances.

Level of significance (alpha)
- Is the difference between the sample and the population big enough to reject H0?
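The 95% confidence interval defined above can be sketched numerically as mean ± 1.96 × SE. The 1.96 critical value assumes a normal sampling distribution and is not stated in the slides; the scores reuse the earlier worked example:

```python
# Sketch: 95% confidence interval for a sample mean using the normal
# approximation (critical value 1.96 is an assumption of normality).
import math

scores = [10, 9, 8, 12, 8, 11, 10, 12]   # the worked example from earlier
n = len(scores)
mean = sum(scores) / n
sd = (sum((x - mean) ** 2 for x in scores) / (n - 1)) ** 0.5
se = sd / math.sqrt(n)
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(round(lower, 2), round(upper, 2))  # 8.89 11.11
```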
- Determine a critical value (alpha) as the criterion for including the sample in the population: typically alpha < 0.05.

Parametric statistics
- Variables are on an interval scale (at least).
- Compute means of raw scores (several items per condition).
- Tests: t-tests, ANOVA, ANCOVA.

t-tests
- t-tests are used to compare two samples and decide whether they are significantly different or not.
- The t-value represents the difference between the means of the two samples, taking into consideration the degree to which these means could differ by chance.
- The degree to which the means could differ by chance is the standard error (SE).
- We do not usually calculate the t-value by hand; we use it to determine the effect of the experiment on the sample.
- How do we know if the t-value is significant (p < 0.05)? Every sample belongs to a different t-curve, depending on the degrees of freedom (df = N - 1). We check the critical values in the table of Student's t-distributions, which is indexed by df. The df is reported with the t-value, e.g. t(32) = 1.15.

Types of t-tests
- Matched/paired/dependent t-test: compares two sets of scores from the same sample, or the scores of matched samples.
- Independent (two-sample) t-test: compares two different samples (on the same test). Samples can be of equal or unequal size, and of equal or unequal variance. The df for an independent t-test is (N1 - 1) + (N2 - 1) = N1 + N2 - 2.

ANOVA: comparing the means of more than two samples
- ANOVA (analysis of variance) considers within-group variability as well as random and non-random between-group variability.
- The type of ANOVA depends on the research design: the number of independent and dependent variables.
- One-way ANOVA: one independent variable with more than two values, one dependent variable.
- Two-way independent ANOVA: two independent variables, with different participants in all groups (each person contributes one score).
- Two-way repeated measures ANOVA: all scores come from the same participants.
- Two-way mixed ANOVA: one independent variable is tested on the same participants, the other on different participants.

The F-score
- The F-score is obtained by dividing the between-group variance (which reflects both random and non-random variance) by the within-group variance.
- For every F-score we can determine the significance based on the dfs (between groups and within groups).

Post hoc comparisons
- Post hoc comparisons are used to find out where the differences that yielded a significant result come from.
- Tukey test: used when the sample sizes are the same.
- Scheffe test: used with unequal sample sizes, but can also be used with equal sample sizes.
- Bonferroni correction: when there are multiple comparisons, the level of significance is divided by the number of tests to avoid family-wise errors.
- These tests can also be used to test unplanned comparisons.

ANCOVA: analysis of covariance
- Allows introducing covariates (factors other than the experimental design which might influence the results) into the ANOVA.
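The F-score computation described above (between-group variance divided by within-group variance) can be sketched by hand for a one-way ANOVA. The three groups below are made-up illustrative data, not from the slides:

```python
# Sketch: one-way ANOVA F-score computed by hand.
# F = MS_between / MS_within, where MS = sum of squares / df.
groups = [
    [4, 5, 6],   # e.g. condition 1 (illustrative data)
    [7, 8, 9],   # e.g. condition 2
    [1, 2, 3],   # e.g. condition 3
]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of scores
grand = sum(sum(g) for g in groups) / n_total

# Between-group sum of squares: group means vs. the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
# Within-group sum of squares: scores vs. their own group mean.
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_between, df_within = k - 1, n_total - k
f = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f:.1f}")  # F(2, 6) = 27.0
```

The two dfs printed alongside F are exactly the "between groups" and "within groups" degrees of freedom the slides say the significance lookup depends on.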