Experimental Research Methods in Language Learning
Chapter 10: Inferential Statistics
Leading Questions
• What do you think is the main difference
between descriptive statistics and inferential
statistics?
• What is a population?
• What is a sample?
• What is hypothesis testing?
The Logic of Inferential Statistics
• Inferential statistics are used to gain a better understanding of the nature of the relationship between two or more variables (e.g., linear or causal-like relationships).
• A population is the entire group of people in which researchers are interested.
• A sample is a subset drawn from that population (i.e., a selection of people from the population).
• A parameter is a characteristic of a population.
• A statistic is a characteristic of a sample that will be used to infer a parameter.
Hypothesis and Inferential Statistics
• Inferential statistics are employed to estimate the
parameter of the target population.
• A null hypothesis is a statistical statement that
there is no relationship between two variables.
• Inferential statistics are used to test the null
hypothesis and aim to evaluate whether the null
hypothesis can be rejected.
• When the null hypothesis is rejected, researchers
can accept an alternative hypothesis which
states that there is a relationship.
Hypothesis Testing
• Hypothesis testing is a statistical approach to
investigating how well quantitative data support a
hypothesis.
• The null hypothesis (H0) is basically the prediction
that there is no relationship between two
variables, or no difference between two or more
groups of learners.
• When the data do not support the null hypothesis, researchers will accept the alternative hypothesis (H1), which is logically the opposite of the null hypothesis.
Probability Value
• In order to reject the null hypothesis,
researchers must set a probability value (i.e.,
p-value).
• The probability value is directly set for testing
the null hypothesis, not for the alternative
hypothesis.
• In language learning research, for example,
researchers usually set a probability value to
be less than 0.05 (p < 0.05) or sometimes
equal to or less than 0.05 (p ≤ 0.05).
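As a minimal sketch of how a computed p-value is compared against a preset probability value, consider the following Python example using SciPy; the scores and the population mean of 50 are invented for illustration.

```python
from scipy import stats

# Hypothetical vocabulary test scores for one group of learners
scores = [52, 61, 58, 49, 66, 57, 63, 55, 60, 54]

# One-sample t-test: H0 says the population mean equals 50
t_stat, p_value = stats.ttest_1samp(scores, popmean=50)

alpha = 0.05  # the probability value set in advance
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject the null hypothesis")
```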
Statistical Significance
• In statistics, significance does not mean
importance in common language use.
• Statistical significance means that the probability
value from an inferential statistic in the data set
has the value less than 0.05 (p < 0.05), equal to or
less than 0.05 (p ≤ 0.05), or less than 0.01 (p < 0.01),
depending on what the researcher has set.
• What is meant when there is statistical significance at 0.05 is that there is at most a 5% chance of obtaining such a result if the null hypothesis were true; in other words, researchers run a 5% risk of rejecting a null hypothesis that is in fact correct.
Type I and Type II Errors
• When we reject a null hypothesis when it is in fact
true, we make a Type I error.
• For example, suppose you found a correlation between t-shirt size and vocabulary knowledge, when in fact the two should not be theoretically related. Of course, as we grow older, we need larger t-shirts and, at the same time, gain more vocabulary knowledge, hence a positive correlation.
• We are wrong here because we reject the null hypothesis instead of retaining it.
Type I and Type II Errors
• The significance level is related to the possibility that researchers will have made a Type I error when they reject the null hypothesis.
• The significance level, therefore, is the level at which researchers agree to take the risk of making a Type I error.
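To make the link between the significance level and the Type I error rate concrete, this sketch (hypothetical, using NumPy and SciPy) draws many pairs of samples from the same population, so the null hypothesis is true by construction, and counts how often it is nonetheless rejected at alpha = 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_experiments = 10_000
false_rejections = 0

for _ in range(n_experiments):
    # Both groups come from the same population, so H0 is true
    group_a = rng.normal(loc=50, scale=10, size=30)
    group_b = rng.normal(loc=50, scale=10, size=30)
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < alpha:
        false_rejections += 1  # a Type I error

# The proportion of false rejections should be close to alpha
print(f"Type I error rate: {false_rejections / n_experiments:.3f}")
```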
Type I and Type II Errors
• In statistical testing, there is a possibility that we accept the null hypothesis when we should reject it. This error is known as a Type II error.
• For example, we claim that it did not rain when it actually did rain and the street was flooded.
• In language learning, it is like someone arguing that the development of speaking skills does not require any social interaction.
One-tailed or Two-tailed Test
• The one-tailed or two-tailed tests of
significance are related to the alternative
hypothesis.
• The one-tailed or two-tailed tests are concerned with whether researchers specify the direction of the alternative hypothesis (i.e., one-tailed) or do not specify a direction (i.e., two-tailed).
One-tailed or Two-tailed Test
• If a researcher hypothesizes that teaching
method A is more effective than teaching
method B, he/she has specified an alternative
hypothesis. This is a directional alternative
hypothesis.
• However, if we are not sure whether one method is better than the other, we just need to say that the effectiveness of method A is different from that of method B. This is a non-directional alternative hypothesis.
One-tailed or Two-tailed Test
• Note that the p-value in Figure 10.2 (page
199) should be 0.025 on both tails.
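The distinction can be illustrated with the alternative argument of SciPy's t-test (available in SciPy 1.6+); the scores for the two teaching methods below are invented for illustration.

```python
from scipy import stats

# Hypothetical test scores under two teaching methods
method_a = [68, 74, 71, 79, 66, 73, 77, 70]
method_b = [64, 69, 62, 71, 67, 60, 66, 65]

# Two-tailed test: H1 says the means simply differ (non-directional)
_, p_two = stats.ttest_ind(method_a, method_b, alternative="two-sided")

# One-tailed test: H1 says method A is more effective (directional)
_, p_one = stats.ttest_ind(method_a, method_b, alternative="greater")

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
# For a result in the predicted direction, the one-tailed p-value
# is half of the two-tailed p-value.
```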
Degrees of Freedom (df)
• When we aim to estimate a parameter of interest, we need to know the degrees of freedom (df).
• The df is essentially the number of independent pieces of information that we use to estimate a parameter.
• For example, when n scores are used to estimate a population mean, df = n − 1: once the sample mean is fixed, only n − 1 of the scores are free to vary.
Sample Size
• The larger the sample we have, the better we can infer the nature of the relationship between variables.
• Hatch and Lazaraton (1991) and Ary et al. (2006) recommend minimum sample sizes for quantitative research.
Parametric versus Non-parametric Tests
• Parametric tests in statistical analysis can be
performed when required statistical
assumptions, such as the normality of the
distribution and linearity, are met in the data
set.
• Non-parametric tests are used when certain
statistical assumptions cannot be met in the
data set.
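As a sketch of how this decision might look in practice, the following Python example checks normality with the Shapiro-Wilk test (one common check) before choosing between a parametric and a non-parametric test; the data are invented.

```python
from scipy import stats

group_a = [55, 61, 58, 49, 66, 57, 63, 55]
group_b = [52, 47, 50, 58, 45, 53, 49, 51]

# Check the normality assumption with the Shapiro-Wilk test
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

if p_norm_a > 0.05 and p_norm_b > 0.05:
    # Assumptions met: use a parametric test
    _, p = stats.ttest_ind(group_a, group_b)
    print(f"independent-samples t-test: p = {p:.4f}")
else:
    # Assumptions violated: fall back on a non-parametric test
    _, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    print(f"Mann-Whitney U test: p = {p:.4f}")
```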
Overview of Statistical Tests
• Correlation is used to examine non-causal
relationships between two variables. Through
correlational analysis, researchers examine
whether one variable can systematically
decrease or increase together with another,
rather than one variable causing the change
in the other.
• Examples: the Pearson product-moment correlation (Pearson r), point-biserial correlation, Spearman rho correlation, Kendall's tau-b correlation, and phi correlation.
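A brief sketch of running several of these correlational analyses in Python with SciPy; the paired scores below are invented for illustration.

```python
from scipy import stats

# Hypothetical paired data: weekly study hours and vocabulary scores
hours = [2, 5, 1, 4, 6, 3, 7, 2, 5, 4]
vocab = [48, 62, 45, 58, 66, 52, 70, 50, 60, 57]

r, p = stats.pearsonr(hours, vocab)          # Pearson r (interval data)
rho, p_rho = stats.spearmanr(hours, vocab)   # Spearman rho (ranked data)
tau, p_tau = stats.kendalltau(hours, vocab)  # Kendall's tau-b

print(f"Pearson r = {r:.2f} (p = {p:.4f})")
print(f"Spearman rho = {rho:.2f}, Kendall tau = {tau:.2f}")
```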
Overview of Statistical Tests
• Regression analysis is an extension of
bivariate correlation analysis. It tests whether
a dependent variable can be predicted from
the values of one or more independent
variables.
• Simple regression examines the effect of just
one independent variable on the dependent
variable.
• Multiple regression allows us to examine the
effect of two or more independent variables
on the dependent variable.
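A minimal sketch of a simple regression with SciPy's linregress, using invented data; multiple regression would need a package such as statsmodels.

```python
from scipy import stats

# Hypothetical data: weekly reading time predicting test scores
reading_hours = [1, 3, 2, 5, 4, 6, 2, 7, 5, 3]
test_scores = [50, 58, 54, 66, 61, 70, 55, 74, 64, 59]

result = stats.linregress(reading_hours, test_scores)

# Prediction equation: score = intercept + slope * reading_hours
print(f"slope = {result.slope:.2f}, intercept = {result.intercept:.2f}")
print(f"R^2 = {result.rvalue**2:.2f}, p = {result.pvalue:.4f}")
```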
Overview of Statistical Tests
• A t-test is a statistical procedure that allows
researchers to determine whether the difference
in the means of the data in two groups is
significant.
• A paired-samples t-test examines whether two
mean scores from the same group of participants
differ significantly (e.g., pretest-posttest
comparison).
• An independent-samples t-test investigates
whether the mean scores between two groups of
participants are significantly different (e.g.,
experimental and control group comparison).
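Both kinds of t-test are available in SciPy; the scores below are hypothetical.

```python
from scipy import stats

# Paired-samples t-test: same learners before and after instruction
pretest = [45, 52, 48, 50, 47, 55, 49, 51]
posttest = [50, 58, 53, 54, 49, 60, 55, 57]
t_paired, p_paired = stats.ttest_rel(pretest, posttest)

# Independent-samples t-test: experimental vs. control group
experimental = [62, 68, 65, 70, 66, 71, 64, 69]
control = [58, 60, 57, 63, 59, 61, 56, 62]
t_ind, p_ind = stats.ttest_ind(experimental, control)

print(f"paired: t = {t_paired:.2f}, p = {p_paired:.4f}")
print(f"independent: t = {t_ind:.2f}, p = {p_ind:.4f}")
```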
Overview of Statistical Tests
• Analyses of Variance (ANOVAs) provide
inferential statistics similar to the t-tests above.
• For example, a within-group ANOVA is similar
to a paired-samples t-test.
• A between-groups ANOVA is similar to an
independent-samples t-test.
• An ANOVA can compare two or more mean
scores of two or more groups, whereas a t-test
can only compare the means of two groups.
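A between-groups one-way ANOVA can be run with SciPy's f_oneway, as in this sketch with invented scores; a within-group (repeated-measures) ANOVA would need a package such as statsmodels.

```python
from scipy import stats

# Hypothetical scores under three teaching methods
method_a = [62, 68, 65, 70, 66, 71]
method_b = [58, 60, 57, 63, 59, 61]
method_c = [64, 72, 69, 75, 70, 73]

# One-way between-groups ANOVA across all three groups at once
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```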
Overview of Statistical Tests
• The analysis of covariance (ANCOVA) allows
researchers to control an extraneous variable
(treated as covariate) during the analysis.
• Extraneous variables include, for example,
pre-existing language proficiency differences
or a particular personal trait (e.g., age,
anxiety, motivation, prior language exposure).
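One way to run an ANCOVA in Python is through statsmodels' formula interface; this is a sketch with invented data in which pretest proficiency serves as the covariate.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data: posttest scores, group membership, pretest covariate
df = pd.DataFrame({
    "posttest": [58, 63, 60, 66, 55, 59, 54, 57, 52, 56],
    "group": ["exp"] * 5 + ["ctrl"] * 5,
    "pretest": [50, 54, 52, 57, 48, 51, 49, 50, 46, 49],
})

# ANCOVA: test the group effect while controlling for pretest scores
model = smf.ols("posttest ~ C(group) + pretest", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```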
Discussion
• What are differences between a null
hypothesis and an alternative hypothesis?
• What does ‘statistically significant’ mean?
• What is ‘practical significance’?
• How do you understand the concept of
statistical effect size?
Chi-square tests
• There are two kinds of chi-square (χ2) tests, both of which are non-parametric: the χ2 test for goodness of fit and the χ2 test for relatedness or independence.
• A χ2 test for goodness of fit helps researchers examine, using frequency scores, whether a difference between two groups of learners (e.g., males and females; group A and group B) exists.
• A χ2 test for relatedness or independence uses categorical or nominal data to test whether paired variables are independent of each other.
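Both χ2 tests are available in SciPy; the frequency counts below are invented for illustration.

```python
from scipy import stats

# Goodness of fit: do 40 males and 60 females differ from a 50/50 split?
chi2, p = stats.chisquare([40, 60], f_exp=[50, 50])
print(f"goodness of fit: chi2 = {chi2:.2f}, p = {p:.4f}")

# Independence: is course choice related to gender?
# Rows: male, female; columns: course A, course B
table = [[30, 10],
         [20, 40]]
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"independence: chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```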
Other Non-parametric Tests
• The Wilcoxon Signed Ranks Test = a non-parametric test, parallel to a paired-samples t-test.
• The Mann-Whitney U test = a non-parametric test that can address a research question in a similar manner to that of the independent-samples t-test.
• The Kruskal-Wallis test = a non-parametric test with a function similar to that of the one-way between-groups ANOVA.
• The Friedman test is parallel to the within-group ANOVA.
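Each of these tests has a SciPy counterpart; a quick sketch with invented scores:

```python
from scipy import stats

pretest = [45, 52, 48, 50, 47, 55, 49, 51]
posttest = [50, 58, 53, 54, 49, 60, 55, 57]
group_a = [62, 68, 65, 70, 66, 71]
group_b = [58, 60, 57, 63, 59, 61]
group_c = [64, 72, 69, 75, 70, 73]

# Wilcoxon signed-ranks test (paired-samples t-test analogue)
print(stats.wilcoxon(pretest, posttest))

# Mann-Whitney U test (independent-samples t-test analogue)
print(stats.mannwhitneyu(group_a, group_b, alternative="two-sided"))

# Kruskal-Wallis test (between-groups one-way ANOVA analogue)
print(stats.kruskal(group_a, group_b, group_c))

# Friedman test (within-group ANOVA analogue): three repeated measures
time1 = [45, 52, 48, 50, 47, 55]
time2 = [50, 58, 53, 54, 49, 60]
time3 = [53, 60, 57, 58, 52, 63]
print(stats.friedmanchisquare(time1, time2, time3))
```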
Practical Significance and the Effect Size
• An effect size is a magnitude-of-effect
estimate that is independent of sample size.
• A magnitude-of-effect estimate highlights the
distinction between statistical and practical
significance.
• Effect sizes can be classified as small, medium, or large (Cohen 1988).
• Larson-Hall (2010, pp. 118-119) provides a table of effect sizes, their formulas, and interpretations.
Different Effect Size Indices
• For example, in Pearson correlational analysis (Cohen 1988), Pearson r values of 0.10, 0.30, and 0.50 are considered to indicate small, medium, and large effect sizes, respectively.
• In a t-test, Cohen’s d has been used as an
effect size index: d-values of 0.2, 0.5, and 0.8
indicate small, medium, and large effect
sizes, respectively.
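Cohen's d for two independent groups is straightforward to compute by hand, since SciPy has no built-in for it; a sketch with invented scores:

```python
import numpy as np

def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled SD."""
    g1, g2 = np.asarray(group1, float), np.asarray(group2, float)
    n1, n2 = len(g1), len(g2)
    # Pooled standard deviation (sample variances, hence ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * g1.var(ddof=1) +
                         (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2))
    return (g1.mean() - g2.mean()) / pooled_sd

experimental = [62, 68, 65, 70, 66, 71, 64, 69]
control = [58, 60, 57, 63, 59, 61, 56, 62]

# Interpret against Cohen's (1988) benchmarks: 0.2 small, 0.5 medium, 0.8 large
print(f"Cohen's d = {cohens_d(experimental, control):.2f}")
```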