Experimental Research Methods in Language Learning
Chapter 10: Inferential Statistics
Leading Questions
• What do you think is the main difference
between descriptive statistics and inferential
statistics?
• What is a population?
• What is a sample?
• What is hypothesis testing?
The Logic of Inferential Statistics
• Inferential statistics are used to gain a better understanding of the nature of the relationship between two or more variables (e.g., linear or causal-like relationships).
• A population is the entire group of people in which researchers are interested.
• A sample is a subset drawn from that population (i.e., a selection of people from the population).
• A parameter is a characteristic of a population.
• A statistic is a characteristic of a sample that will be used to infer a parameter.
Hypothesis and Inferential Statistics
• Inferential statistics are employed to estimate the
parameter of the target population.
• A null hypothesis is a statistical statement that
there is no relationship between two variables.
• Inferential statistics are used to test the null
hypothesis and aim to evaluate whether the null
hypothesis can be rejected.
• When the null hypothesis is rejected, researchers
can accept an alternative hypothesis which
states that there is a relationship.
Hypothesis Testing
• Hypothesis testing is a statistical approach to
investigating how well quantitative data support a
hypothesis.
• The null hypothesis (H0) is basically the prediction
that there is no relationship between two
variables, or no difference between two or more
groups of learners.
• When the data do not support the null hypothesis, researchers will accept the alternative hypothesis (H1), which is logically the opposite of the null hypothesis.
Probability Value
• In order to reject the null hypothesis,
researchers must set a probability value (i.e.,
p-value).
• The probability value is directly set for testing
the null hypothesis, not for the alternative
hypothesis.
• In language learning research, for example,
researchers usually set a probability value to
be less than 0.05 (p < 0.05) or sometimes
equal to or less than 0.05 (p ≤ 0.05).
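As a minimal sketch of how a computed p-value is compared against a preset probability value, consider the following Python example using SciPy; the scores and the population mean of 50 are invented for illustration.

```python
from scipy import stats

# Hypothetical vocabulary test scores for one group of learners
scores = [52, 61, 58, 49, 66, 57, 63, 55, 60, 54]

# One-sample t-test: H0 says the population mean equals 50
t_stat, p_value = stats.ttest_1samp(scores, popmean=50)

alpha = 0.05  # the probability value set in advance
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject the null hypothesis")
```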
Statistical Significance
• In statistics, significance does not mean
importance in common language use.
• Statistical significance means that the probability
value from an inferential statistic in the data set
has the value less than 0.05 (p < 0.05), equal to or
less than 0.05 (p ≤ 0.05), or less than 0.01 (p < 0.01),
depending on what the researcher has set.
• What is meant when there is statistical significance at 0.05 is that there is at most a 5% chance of obtaining such a result if the null hypothesis were true; in other words, researchers run a 5% risk of rejecting a null hypothesis that is in fact correct.
Type I and Type II Errors
• When we reject a null hypothesis when it is in fact
true, we make a Type I error.
• For example, suppose you found a correlation between t-shirt size and vocabulary knowledge, when in fact the two should not be theoretically related. Of course, as we grow older, we need larger t-shirts and, at the same time, gain more vocabulary knowledge, hence a positive correlation.
• We are wrong here because we reject the null hypothesis instead of retaining it.
Type I and Type II Errors
• The significance level is related to the possibility that researchers will have made a Type I error when they reject the null hypothesis.
• The significance level, therefore, is the level at which researchers agree to take the risk of making a Type I error.
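To make the link between the significance level and the Type I error rate concrete, this sketch (hypothetical, using NumPy and SciPy) draws many pairs of samples from the same population, so the null hypothesis is true by construction, and counts how often it is nonetheless rejected at alpha = 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_experiments = 10_000
false_rejections = 0

for _ in range(n_experiments):
    # Both groups come from the same population, so H0 is true
    group_a = rng.normal(loc=50, scale=10, size=30)
    group_b = rng.normal(loc=50, scale=10, size=30)
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < alpha:
        false_rejections += 1  # a Type I error

# The proportion of false rejections should be close to alpha
print(f"Type I error rate: {false_rejections / n_experiments:.3f}")
```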
Type I and Type II Errors
• In statistical testing, there is a possibility that we accept the null hypothesis when we should reject it. This error is known as a Type II error.
• For example, we claim that it did not rain when it actually did rain and the street was flooded.
• In language learning, it is like someone arguing that the development of speaking skills does not require any social interaction.
One-tailed or Two-tailed Test
• The one-tailed or two-tailed tests of
significance are related to the alternative
hypothesis.
• The one-tailed or two-tailed tests are concerned with whether researchers specify the direction of the alternative hypothesis (i.e., one-tailed) or do not specify a direction (i.e., two-tailed).
One-tailed or Two-tailed Test
• If a researcher hypothesizes that teaching
method A is more effective than teaching
method B, he/she has specified an alternative
hypothesis. This is a directional alternative
hypothesis.
• However, if we are not sure whether one method is better than the other, we just need to say that the effectiveness of method A is different from that of method B. This is a non-directional alternative hypothesis.
One-tailed or Two-tailed Test
• Note that the p-value in Figure 10.2 (page
199) should be 0.025 on both tails.
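The distinction can be illustrated with the alternative argument of SciPy's t-test (available in SciPy 1.6+); the scores for the two teaching methods below are invented for illustration.

```python
from scipy import stats

# Hypothetical test scores under two teaching methods
method_a = [68, 74, 71, 79, 66, 73, 77, 70]
method_b = [64, 69, 62, 71, 67, 60, 66, 65]

# Two-tailed test: H1 says the means simply differ (non-directional)
_, p_two = stats.ttest_ind(method_a, method_b, alternative="two-sided")

# One-tailed test: H1 says method A is more effective (directional)
_, p_one = stats.ttest_ind(method_a, method_b, alternative="greater")

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
# For a result in the predicted direction, the one-tailed p-value
# is half of the two-tailed p-value.
```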
Degrees of Freedom (df)
• When we aim to estimate a parameter of interest, we need to know the degrees of freedom (df).
• The df is essentially the number of independent pieces of information that we use to estimate a parameter.
• For example, when n scores are used to estimate a population mean, df = n − 1: once the sample mean is fixed, only n − 1 of the scores are free to vary.
Sample Size
• The larger the sample we have, the better we can infer the nature of the relationship between variables.
• Hatch and Lazaraton (1991) and Ary et al. (2006) recommend minimum sample sizes for quantitative research.
Parametric versus Non-parametric Tests
• Parametric tests in statistical analysis can be
performed when required statistical
assumptions, such as the normality of the
distribution and linearity, are met in the data
set.
• Non-parametric tests are used when certain
statistical assumptions cannot be met in the
data set.
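As a sketch of how this decision might look in practice, the following Python example checks normality with the Shapiro-Wilk test (one common check) before choosing between a parametric and a non-parametric test; the data are invented.

```python
from scipy import stats

group_a = [55, 61, 58, 49, 66, 57, 63, 55]
group_b = [52, 47, 50, 58, 45, 53, 49, 51]

# Check the normality assumption with the Shapiro-Wilk test
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

if p_norm_a > 0.05 and p_norm_b > 0.05:
    # Assumptions met: use a parametric test
    _, p = stats.ttest_ind(group_a, group_b)
    print(f"independent-samples t-test: p = {p:.4f}")
else:
    # Assumptions violated: fall back on a non-parametric test
    _, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    print(f"Mann-Whitney U test: p = {p:.4f}")
```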
Overview of Statistical Tests
• Correlation is used to examine non-causal
relationships between two variables. Through
correlational analysis, researchers examine
whether one variable can systematically
decrease or increase together with another,
rather than one variable causing the change
in the other.
• Examples: the Pearson product-moment correlation (Pearson r), point-biserial correlation, Spearman rho correlation, Kendall's tau-b correlation, and phi correlation.
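A brief sketch of running several of these correlational analyses in Python with SciPy; the paired scores below are invented for illustration.

```python
from scipy import stats

# Hypothetical paired data: weekly study hours and vocabulary scores
hours = [2, 5, 1, 4, 6, 3, 7, 2, 5, 4]
vocab = [48, 62, 45, 58, 66, 52, 70, 50, 60, 57]

r, p = stats.pearsonr(hours, vocab)          # Pearson r (interval data)
rho, p_rho = stats.spearmanr(hours, vocab)   # Spearman rho (ranked data)
tau, p_tau = stats.kendalltau(hours, vocab)  # Kendall's tau-b

print(f"Pearson r = {r:.2f} (p = {p:.4f})")
print(f"Spearman rho = {rho:.2f}, Kendall tau = {tau:.2f}")
```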
Overview of Statistical Tests
• Regression analysis is an extension of
bivariate correlation analysis. It tests whether
a dependent variable can be predicted from
the values of one or more independent
variables.
• Simple regression examines the effect of just
one independent variable on the dependent
variable.
• Multiple regression allows us to examine the
effect of two or more independent variables
on the dependent variable.
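A minimal sketch of a simple regression with SciPy's linregress, using invented data; multiple regression would need a package such as statsmodels.

```python
from scipy import stats

# Hypothetical data: weekly reading time predicting test scores
reading_hours = [1, 3, 2, 5, 4, 6, 2, 7, 5, 3]
test_scores = [50, 58, 54, 66, 61, 70, 55, 74, 64, 59]

result = stats.linregress(reading_hours, test_scores)

# Prediction equation: score = intercept + slope * reading_hours
print(f"slope = {result.slope:.2f}, intercept = {result.intercept:.2f}")
print(f"R^2 = {result.rvalue**2:.2f}, p = {result.pvalue:.4f}")
```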
Overview of Statistical Tests
• A t-test is a statistical procedure that allows
researchers to determine whether the difference
in the means of the data in two groups is
significant.
• A paired-samples t-test examines whether two
mean scores from the same group of participants
differ significantly (e.g., pretest-posttest
comparison).
• An independent-samples t-test investigates
whether the mean scores between two groups of
participants are significantly different (e.g.,
experimental and control group comparison).
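Both kinds of t-test are available in SciPy; the scores below are hypothetical.

```python
from scipy import stats

# Paired-samples t-test: same learners before and after instruction
pretest = [45, 52, 48, 50, 47, 55, 49, 51]
posttest = [50, 58, 53, 54, 49, 60, 55, 57]
t_paired, p_paired = stats.ttest_rel(pretest, posttest)

# Independent-samples t-test: experimental vs. control group
experimental = [62, 68, 65, 70, 66, 71, 64, 69]
control = [58, 60, 57, 63, 59, 61, 56, 62]
t_ind, p_ind = stats.ttest_ind(experimental, control)

print(f"paired: t = {t_paired:.2f}, p = {p_paired:.4f}")
print(f"independent: t = {t_ind:.2f}, p = {p_ind:.4f}")
```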
Overview of Statistical Tests
• Analyses of Variance (ANOVAs) provide
inferential statistics similar to the t-tests above.
• For example, a within-group ANOVA is similar
to a paired-samples t-test.
• A between-groups ANOVA is similar to an
independent-samples t-test.
• An ANOVA can compare two or more mean
scores of two or more groups, whereas a t-test
can only compare the means of two groups.
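A between-groups one-way ANOVA can be run with SciPy's f_oneway, as in this sketch with invented scores; a within-group (repeated-measures) ANOVA would need a package such as statsmodels.

```python
from scipy import stats

# Hypothetical scores under three teaching methods
method_a = [62, 68, 65, 70, 66, 71]
method_b = [58, 60, 57, 63, 59, 61]
method_c = [64, 72, 69, 75, 70, 73]

# One-way between-groups ANOVA across all three groups at once
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```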
Overview of Statistical Tests
• The analysis of covariance (ANCOVA) allows
researchers to control an extraneous variable
(treated as covariate) during the analysis.
• Extraneous variables include, for example,
pre-existing language proficiency differences
or a particular personal trait (e.g., age,
anxiety, motivation, prior language exposure).
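One way to run an ANCOVA in Python is through statsmodels' formula interface; this is a sketch with invented data in which pretest proficiency serves as the covariate.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data: posttest scores, group membership, pretest covariate
df = pd.DataFrame({
    "posttest": [58, 63, 60, 66, 55, 59, 54, 57, 52, 56],
    "group": ["exp"] * 5 + ["ctrl"] * 5,
    "pretest": [50, 54, 52, 57, 48, 51, 49, 50, 46, 49],
})

# ANCOVA: test the group effect while controlling for pretest scores
model = smf.ols("posttest ~ C(group) + pretest", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```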
Discussion
• What are differences between a null
hypothesis and an alternative hypothesis?
• What does ‘statistically significant’ mean?
• What is ‘practical significance’?
• How do you understand the concept of
statistical effect size?
Chi-square tests
• There are two kinds of chi-square (χ2) tests, both of which are non-parametric: the χ2 test for goodness of fit and the χ2 test for relatedness or independence.
• A χ2 test for goodness of fit helps researchers examine, using frequency scores, whether a difference between two groups of learners (e.g., males and females; group A and group B) exists.
• A χ2 test for relatedness or independence uses categorical or nominal data to test whether paired variables are independent of each other.
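Both χ2 tests are available in SciPy; the frequency counts below are invented for illustration.

```python
from scipy import stats

# Goodness of fit: do 40 males and 60 females differ from a 50/50 split?
chi2, p = stats.chisquare([40, 60], f_exp=[50, 50])
print(f"goodness of fit: chi2 = {chi2:.2f}, p = {p:.4f}")

# Independence: is course choice related to gender?
# Rows: male, female; columns: course A, course B
table = [[30, 10],
         [20, 40]]
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"independence: chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```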
Other Non-parametric Tests
• The Wilcoxon Signed Ranks Test = a non-parametric test, parallel to a paired-samples t-test.
• The Mann-Whitney U test = a non-parametric test that can address a research question in a similar manner to that of the independent-samples t-test.
• The Kruskal-Wallis test = a non-parametric test with a function similar to that of the one-way between-groups ANOVA.
• The Friedman test is parallel to the within-group ANOVA.
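Each of these tests has a SciPy counterpart; a quick sketch with invented scores:

```python
from scipy import stats

pretest = [45, 52, 48, 50, 47, 55, 49, 51]
posttest = [50, 58, 53, 54, 49, 60, 55, 57]
group_a = [62, 68, 65, 70, 66, 71]
group_b = [58, 60, 57, 63, 59, 61]
group_c = [64, 72, 69, 75, 70, 73]

# Wilcoxon signed-ranks test (paired-samples t-test analogue)
print(stats.wilcoxon(pretest, posttest))

# Mann-Whitney U test (independent-samples t-test analogue)
print(stats.mannwhitneyu(group_a, group_b, alternative="two-sided"))

# Kruskal-Wallis test (between-groups one-way ANOVA analogue)
print(stats.kruskal(group_a, group_b, group_c))

# Friedman test (within-group ANOVA analogue): three repeated measures
time1 = [45, 52, 48, 50, 47, 55]
time2 = [50, 58, 53, 54, 49, 60]
time3 = [53, 60, 57, 58, 52, 63]
print(stats.friedmanchisquare(time1, time2, time3))
```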
Practical Significance and the Effect Size
• An effect size is a magnitude-of-effect
estimate that is independent of sample size.
• A magnitude-of-effect estimate highlights the
distinction between statistical and practical
significance.
• Effect sizes can be classified as small, medium, or large (Cohen 1988).
• Larson-Hall (2010, pp. 118-119) provides a table of effect sizes, their formulas, and interpretations.
Different Effect Size Indices
• For example, in Pearson correlational analysis (Cohen 1988), Pearson r values of 0.10, 0.30, and 0.50 are considered to indicate small, medium, and large effect sizes, respectively.
• In a t-test, Cohen’s d has been used as an
effect size index: d-values of 0.2, 0.5, and 0.8
indicate small, medium, and large effect
sizes, respectively.
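Cohen's d for two independent groups is straightforward to compute by hand, since SciPy has no built-in for it; a sketch with invented scores:

```python
import numpy as np

def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled SD."""
    g1, g2 = np.asarray(group1, float), np.asarray(group2, float)
    n1, n2 = len(g1), len(g2)
    # Pooled standard deviation (sample variances, hence ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * g1.var(ddof=1) +
                         (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2))
    return (g1.mean() - g2.mean()) / pooled_sd

experimental = [62, 68, 65, 70, 66, 71, 64, 69]
control = [58, 60, 57, 63, 59, 61, 56, 62]

# Interpret against Cohen's (1988) benchmarks: 0.2 small, 0.5 medium, 0.8 large
print(f"Cohen's d = {cohens_d(experimental, control):.2f}")
```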