Statistical inference

Definition: generalization from a sample to a population.

2 cases:

Does a sample belong to a hypothetical population?
Do two samples belong to the same hypothetical population?
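Statistical inference

1st possibility: one sample

[Figure: a sample with mean $\bar{x} = 96$ is compared with a hypothetical population with mean $\mu = 100$. Inference: does the sample come from a population with $\mu_1 = 100$, or from a population with $\mu_2 \neq 100$?]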
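Statistical inference

2nd possibility: two samples

[Figure: two samples with means $\bar{x}_1 = 104$ and $\bar{x}_2 = 110$, shown against a hypothetical population with mean $\mu = 100$. Inference: do both samples come from the same population ($\mu_1 - \mu_2 = 0$), or from different populations ($\mu_1 - \mu_2 \neq 0$)?]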
Hypotheses

Case 1 (one sample):
H0: $\mu = k$
H1: $\mu \neq k$
$\mu$ = Mean of the population
k = Constant

Case 2 (two samples):
H0: $\mu_1 = \mu_2$
H1: $\mu_1 \neq \mu_2$
$\mu_1$ = Mean of the first population
$\mu_2$ = Mean of the second population

H0 = Null hypothesis
H1 = Alternative hypothesis

We test H0.
Decision

From the sample(s), we decide whether or not to reject the null hypothesis.

When doing inference, we are never certain that we made the right decision.

                          Population
Decision (from sample)    Identical    Different
Identical                 Good         Error 2
Different                 Error 1      Good
Decision

2 types of errors:

1 - We inferred that 2 groups belong to two different populations when they don't. We rejected H0 when H0 was true (Type I error).

2 - We inferred that 2 groups belong to the same population when they don't. We kept H0 when H0 was false (Type II error).
1- Inference about the mean of a
population
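Sampling Distribution of the Mean

[Figure: from a population with $\mu = 72$ and $\sigma = 3$, repeated samples of size $n$ are drawn, giving sample means $\bar{x}_1, \bar{x}_2, \ldots$ These means form the sampling distribution of the mean, which has mean $\mu = 72$ and a standard deviation $\sigma_{\bar{x}}$ still to be determined.]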
Sampling Distribution of the Mean

Characteristics:

It follows a normal curve.
Its mean is equal to the mean of the population.
Its standard deviation, called the standard error, is equal to
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
The larger the sample size, the smaller the standard error.
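The formula above can be checked numerically. The following is a minimal sketch; the function name `standard_error` is a hypothetical helper, $\sigma = 3$ is the population value used in the slides, and the sample sizes are chosen here for illustration.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sigma / math.sqrt(n)

# Population standard deviation sigma = 3; sample sizes are illustrative.
for n in (9, 16, 36, 144):
    print(f"n = {n:3d}  standard error = {standard_error(3, n):.4f}")
```

Each fourfold increase in the sample size halves the standard error, because $n$ enters through its square root.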
Sampling Distribution of the Mean

Simulation: 10000 samples of size N are drawn from a population with $\mu = 72$ and $\sigma = 3$. For each N, the mean and the standard deviation of the 10000 sample means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_{10000}$ are:

N      Mean of the sample means    $\sigma_{\bar{x}}$
9      71.9958                     0.9959
16     71.9984                     0.74696
36     72.0146                     0.50165
144    72.0014                     0.24972
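A simulation of this kind can be sketched in a few lines. This is not the code used for the slides: it assumes a normally distributed population (the slides do not state the shape), an arbitrary seed, and a hypothetical helper name `simulate_sample_means`, so the printed values will differ slightly from the figures above.

```python
import math
import random
import statistics

def simulate_sample_means(mu, sigma, n, n_samples, rng):
    """Draw n_samples samples of size n from N(mu, sigma); return their means."""
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_samples)
    ]

MU, SIGMA = 72, 3        # population parameters from the slides
N_SAMPLES = 10_000       # number of samples drawn per sample size
rng = random.Random(0)   # arbitrary fixed seed so the run is repeatable

results = {}
for n in (9, 16, 36, 144):
    means = simulate_sample_means(MU, SIGMA, n, N_SAMPLES, rng)
    # Mean and standard deviation of the simulated sample means.
    results[n] = (statistics.fmean(means), statistics.pstdev(means))
    print(f"N = {n:3d}  mean = {results[n][0]:.4f}  "
          f"sd = {results[n][1]:.4f}  sigma/sqrt(N) = {SIGMA / math.sqrt(n):.4f}")
```

The simulated standard deviations should land close to $\sigma / \sqrt{N}$, that is, near 1, 0.75, 0.5 and 0.25.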
Test of Significance

If we suppose that the null hypothesis is true, what is the probability of observing the given sample mean?
If it is unlikely, we reject H0; otherwise we keep H0.
Unlikely: 5% or 1% = $\alpha$ = significance threshold

$z_{\bar{x}} = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
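Test of Significance
Example: one-sided test

H0: $\mu = 72$
H1: $\mu < 72$ (based on previous studies)
$\alpha = 0.05$ (5%)
$\sigma = 9$
$\bar{x} = 65$
$n = 36$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{9}{\sqrt{36}} = \dfrac{9}{6} = 1.5$

$z_{\bar{x}} = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \dfrac{65 - 72}{1.5} = -4.67$

$z_{\alpha} = 1.65$ (one-sided test, so the rejection region is $z_{\bar{x}} < -1.65$)

Because $z_{\bar{x}} = -4.67$ lies beyond the critical value ($|z_{\bar{x}}| = 4.67 > z_{\alpha} = 1.65$, in the direction stated by H1), we reject the null hypothesis and accept the alternative hypothesis.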
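The one-sided example can be reproduced with the Python standard library. This is a sketch under the slide's assumptions (known $\sigma$, normal sampling distribution); `z_statistic` is a hypothetical helper name, and the critical value comes from `statistics.NormalDist` rather than a table, so it is about -1.645 instead of the rounded -1.65.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """z = (sample mean - hypothesized mean) / standard error."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

alpha = 0.05
z = z_statistic(sample_mean=65, mu0=72, sigma=9, n=36)  # (65 - 72) / 1.5
z_crit = NormalDist().inv_cdf(alpha)   # lower-tail critical value
p_value = NormalDist().cdf(z)          # P(Z <= z) if H0 is true
reject_h0 = z < z_crit                 # H1 is mu < 72, so reject in the lower tail

print(f"z = {z:.2f}, critical z = {z_crit:.3f}, reject H0: {reject_h0}")
```

Comparing `z` with a negative critical value keeps the direction of H1 explicit, which avoids the ambiguity of comparing a negative statistic with a positive threshold.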
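Test of Significance
Example: two-sided test

H0: $\mu = 72$
H1: $\mu \neq 72$
$\alpha = 0.05$ (5%)
$\sigma = 9$
$\bar{x} = 68$
$n = 36$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{9}{\sqrt{36}} = \dfrac{9}{6} = 1.5$

$z_{\bar{x}} = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \dfrac{68 - 72}{1.5} = -2.667$

$z_{\alpha} = 1.96$ (two-sided test, so the rejection region is $|z_{\bar{x}}| > 1.96$)

Because $|z_{\bar{x}}| = 2.667$ is greater than $z_{\alpha} = 1.96$, we reject the null hypothesis and accept the alternative hypothesis.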
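For the two-sided case, the decision can also be expressed as a p-value. The sketch below assumes the same setting as the slide (known $\sigma$, normal sampling distribution); the function name `two_sided_z_test` is hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, two-sided p-value, reject H0?) for a one-sample z test."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability of a value at least this extreme in either tail under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z, p_value, abs(z) > z_crit

z, p_value, reject_h0 = two_sided_z_test(sample_mean=68, mu0=72, sigma=9, n=36)
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Rejecting when `abs(z) > z_crit` is equivalent to rejecting when the two-sided p-value is below $\alpha$.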
Confidence intervals

We are never sure that the mean of our sample is exactly the real mean of the population. Therefore, instead of giving the mean only, it is possible to quantify our level of certainty by specifying a confidence interval around the mean.

$CI_{1-\alpha} = \bar{x} - z_{\alpha}\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z_{\alpha}\,\sigma_{\bar{x}}$
Confidence intervals
Example: CI = 95%

$\bar{x} = 50.7$
$n = 100$
$\sigma = 20$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{20}{\sqrt{100}} = \dfrac{20}{10} = 2$

$\alpha = 1 - CI = 1 - 0.95 = 0.05$
$z_{\alpha} = 1.96$

$CI_{0.95} = 50.7 - 1.96 \times 2 \le \mu \le 50.7 + 1.96 \times 2$
$CI_{0.95} = 50.7 - 3.92 \le \mu \le 50.7 + 3.92$
$CI_{0.95} = 46.78 \le \mu \le 54.62$

Therefore, we can be 95% confident that the mean of the population is between 46.78 and 54.62.
Confidence intervals
Example: CI = 99%

$\bar{x} = 50.7$
$n = 100$
$\sigma = 20$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{20}{\sqrt{100}} = \dfrac{20}{10} = 2$

$\alpha = 1 - CI = 1 - 0.99 = 0.01$
$z_{\alpha} = 2.58$

$CI_{0.99} = 50.7 - 2.58 \times 2 \le \mu \le 50.7 + 2.58 \times 2$
$CI_{0.99} = 50.7 - 5.16 \le \mu \le 50.7 + 5.16$
$CI_{0.99} = 45.54 \le \mu \le 55.86$

Therefore, we can be 99% confident that the mean of the population is between 45.54 and 55.86.
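Both intervals can be computed with one small function. This is a sketch assuming a known population standard deviation; `mean_confidence_interval` is a hypothetical name. It uses the exact normal quantile instead of the rounded table values 1.96 and 2.58, so the 99% bounds differ from the hand calculation in the second decimal.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence):
    """Two-sided confidence interval for a mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    margin = z * sigma / math.sqrt(n)         # z times the standard error
    return sample_mean - margin, sample_mean + margin

intervals = {}
for level in (0.95, 0.99):
    intervals[level] = mean_confidence_interval(50.7, sigma=20, n=100, confidence=level)
    low, high = intervals[level]
    print(f"{level:.0%} CI: {low:.2f} <= mu <= {high:.2f}")
```

Raising the confidence level from 95% to 99% widens the interval, since a larger critical value multiplies the same standard error.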
2- Inference for the difference
between two population means
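Distribution of sample mean differences

[Figure: pairs of samples of size $n$ are drawn from a population with $\mu = 72$ and $\sigma = 3$, giving sample means $\bar{x}_1$ and $\bar{x}_2$. The differences $\bar{x}_1 - \bar{x}_2$ form the distribution of sample mean differences, which has mean 0 and a standard deviation $\sigma_{\bar{x}_1 - \bar{x}_2}$ still to be determined.]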
Distribution of sample mean differences

Characteristics:

It follows a normal distribution.
Its mean is equal to 0 ($\mu_1 - \mu_2 = 0$).
Its standard deviation, the standard error of the mean difference, is equal to:
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2}$
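The square root of a sum of squares follows from how variances combine. The short derivation below assumes the two samples are independent, which the slides do not state explicitly; $\sigma_i$ and $n_i$ denote the standard deviation and sample size of group $i$.

```latex
\begin{aligned}
\sigma_{\bar{x}_1 - \bar{x}_2}^2
  &= \sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2
  && \text{(independent samples: variances add)} \\
  &= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
  && \text{(since } \sigma_{\bar{x}_i} = \sigma_i / \sqrt{n_i}\text{)} \\
\sigma_{\bar{x}_1 - \bar{x}_2}
  &= \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
\end{aligned}
```

Variances add even though the means are subtracted, so the difference of two sample means is always more variable than either mean alone.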
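Decision rule

$z_{\bar{x}_1 - \bar{x}_2} = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$

$z_{\bar{x}_1 - \bar{x}_2} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{x}_1 - \bar{x}_2}}$, because $(\mu_1 - \mu_2) = 0$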
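Test of Significance
Example: How probable is the observed difference between the following groups if H0 is true?

H0: $\mu_1 = \mu_2$ ($\mu_1 - \mu_2 = 0$)
H1: $\mu_1 \neq \mu_2$ ($\mu_1 - \mu_2 \neq 0$)
$\alpha = 0.05$ (5%)

$\bar{x}_1 = 50$
$\bar{x}_2 = 48$
$\sigma_1 = 5$
$\sigma_2 = 5$
$n_1 = 36$
$n_2 = 36$

$\sigma_{\bar{x}_1} = \dfrac{\sigma_1}{\sqrt{n_1}} = \dfrac{5}{\sqrt{36}} = \dfrac{5}{6} = 0.833$

$\sigma_{\bar{x}_2} = \dfrac{\sigma_2}{\sqrt{n_2}} = \dfrac{5}{\sqrt{36}} = \dfrac{5}{6} = 0.833$

$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left(\dfrac{5}{6}\right)^2 + \left(\dfrac{5}{6}\right)^2} = 1.18$

$z_{\bar{x}_1 - \bar{x}_2} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \dfrac{50 - 48}{1.18} = 1.69$

Critical $z = \pm 1.96$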
Because the observed z (1.69) is lower than the critical value ($z_{\alpha} = 1.96$), we keep the null hypothesis.
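The two-group test can be sketched as follows, assuming independent samples and known population standard deviations; `two_sample_z` is a hypothetical helper name. Without intermediate rounding the statistic is about 1.697 rather than 1.69, which does not change the decision.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """Return (z, standard error of the difference) under H0: mu1 - mu2 = 0."""
    se_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (mean1 - mean2) / se_diff, se_diff

alpha = 0.05
z, se_diff = two_sample_z(50, 48, sigma1=5, sigma2=5, n1=36, n2=36)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
reject_h0 = abs(z) > z_crit

print(f"se = {se_diff:.3f}, z = {z:.3f}, critical z = {z_crit:.2f}, "
      f"reject H0: {reject_h0}")
```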
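Confidence intervals

$CI_{1-\alpha} = (\bar{x}_1 - \bar{x}_2) - z_{\alpha}\,\sigma_{\bar{x}_1 - \bar{x}_2} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_{\alpha}\,\sigma_{\bar{x}_1 - \bar{x}_2}$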
Test of Significance
Example: a 95% confidence interval

Same data as in the previous example: $\bar{x}_1 = 50$, $\bar{x}_2 = 48$, $\sigma_1 = \sigma_2 = 5$, $n_1 = n_2 = 36$, $\alpha = 0.05$, so $\sigma_{\bar{x}_1 - \bar{x}_2} = 1.18$ and $z_{\alpha} = 1.96$.

$CI_{0.95} = (50 - 48) - 1.96 \times 1.18 \le \mu_1 - \mu_2 \le (50 - 48) + 1.96 \times 1.18$
$CI_{0.95} = 2 - 2.3128 \le \mu_1 - \mu_2 \le 2 + 2.3128$
$CI_{0.95} = -0.3128 \le \mu_1 - \mu_2 \le 4.3128$
Therefore, we can be 95% confident that the mean difference between the populations is between -0.3128 and 4.3128. Because this interval contains 0, it agrees with the decision to keep the null hypothesis.
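The interval for the difference can be computed the same way. This sketch assumes independent samples with known standard deviations; `difference_confidence_interval` is a hypothetical name. It keeps full precision, so the bounds come out near -0.31 and 4.31 rather than exactly the rounded hand values.

```python
import math
from statistics import NormalDist

def difference_confidence_interval(mean1, mean2, sigma1, sigma2, n1, n2,
                                   confidence=0.95):
    """Two-sided confidence interval for mu1 - mu2 with known sigmas."""
    se_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * se_diff
    diff = mean1 - mean2
    return diff - margin, diff + margin

low, high = difference_confidence_interval(50, 48, 5, 5, 36, 36)
contains_zero = low <= 0 <= high   # True means H0 (no difference) is not rejected

print(f"95% CI: {low:.4f} <= mu1 - mu2 <= {high:.4f}, contains 0: {contains_zero}")
```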