Statistics for Social
and Behavioral Sciences
Session #15:
Interval Estimation, Confidence Interval
(Agresti and Finlay, Chapter 5)
Prof. Amine Ouazad
Statistics Course Outline
PART I. INTRODUCTION AND RESEARCH DESIGN
Week 1
PART II. DESCRIBING DATA
Weeks 2-4
PART III. DRAWING CONCLUSIONS FROM DATA:
INFERENTIAL STATISTICS
Weeks 5-9
Firenze’s and Lebanese Express’s ratings are within one MoE of each other!
PART IV. CORRELATION AND CAUSATION:
REGRESSION ANALYSIS
This is where we talk
about ZMapp and Ebola!
Weeks 10-14
Last Session: Inference
Central Limit Theorem:
• With a large sample size N, the sampling distribution of the sample mean is
approximately normal.
• The mean of the sampling distribution is the population mean.
• The standard deviation of the sampling distribution is σX/√N, where σX is the
standard deviation of X (estimated by the sample standard deviation sX).
• A conservative Margin of Error (= 2 standard errors)
for Café Firenze’s restaurant rating is 1.1 with 14
votes.
• For any rating from 1 to 5, the largest possible Margin
of Error is 4/√N, where N is the number of ratings.
• With TripAdvisor, we see the rating of each individual
customer, and so we can calculate sX!
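The 4/√N bound can be checked in a few lines. This is a minimal sketch, assuming ratings are bounded between 1 and 5 as on the slide; the function name max_margin_of_error is hypothetical. The largest possible standard deviation for a rating in [1, 5] occurs when half the ratings are 1 and half are 5, which gives a standard deviation of 2 and hence a conservative MoE of 2 × 2/√N.

```python
import math

def max_margin_of_error(n, low=1.0, high=5.0):
    """Largest possible conservative MoE (2 standard errors) for ratings in [low, high]."""
    # A rating confined to [low, high] has the largest standard deviation when
    # half the ratings sit at each end; that standard deviation is (high - low) / 2.
    max_sd = (high - low) / 2
    # Conservative MoE = 2 * standard error = 2 * max_sd / sqrt(n), i.e. 4 / sqrt(n) for 1-5 ratings.
    return 2 * max_sd / math.sqrt(n)

print(round(max_margin_of_error(14), 2))  # 1.07, i.e. about 1.1 for Café Firenze's 14 votes
```

Quadrupling the number of votes halves this bound, since N enters through √N.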
Today
• Use this margin of error to provide interval
estimates:
– A 95% confidence interval for Café Firenze is [2.3,4.5].
– “The true rating of Café Firenze is between 2.3 and 4.5
with probability 95%”.
– Note: average was 3.4 and MoE was 1.1.
– A 95% confidence interval for Cory Gardner’s vote share
in Colorado is [48-3.6,48+3.6]=[44.4,51.6].
– “The true vote share for Cory Gardner is between
44.4% of the vote and 51.6% of the vote with 95%
probability”.
– Note: MoE was 3.6.
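Both intervals above are simply point estimate ± MoE. A minimal sketch, with a hypothetical helper name interval_estimate; rounding to one decimal matches the precision used on the slide.

```python
def interval_estimate(point_estimate, moe):
    """Return [point estimate - MoE, point estimate + MoE], rounded to one decimal."""
    return (round(point_estimate - moe, 1), round(point_estimate + moe, 1))

print(interval_estimate(3.4, 1.1))  # Café Firenze: (2.3, 4.5)
print(interval_estimate(48, 3.6))   # Cory Gardner: (44.4, 51.6)
```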
News: Last Tuesday
• We learnt the population proportion p!!!
– Proportion of voters for Cory Gardner.
• The latest poll gave us a
sample proportion of the vote p̂ (N around 1000).
Outline
1. Interval Estimation
Confidence Interval
2. Choosing between 90, 95, 99% confidence
3. When distributions are normal: t-distribution
Next time:
Estimation, Confidence Intervals (continued)
Chapter 5 of A&F
Parameters and Interval Estimate
• An interval estimate is an interval of numbers
around the point estimate that includes the
parameter with a chosen probability, usually 90%, 95%,
or 99%.
• Example:
“the interval estimate
[156.2 cm – 0.49cm ; 156.2 cm + 0.49cm]
includes the population average height with probability 95%.”
• Sample mean: 156.2cm, MoE = 0.49 cm.
Parameters and Interval Estimate
• An interval estimate that includes the parameter with
probability 95% is called a 95% confidence interval.
• The expression “95% confidence interval” is widely used.
• Example:
“[156.2 cm – 0.49cm ; 156.2 cm + 0.49cm]
is a 95% confidence interval for the population average height.”
• Sample mean: 156.2cm, MoE = 0.49 cm.
We use 1.96 instead
of 2 from now on.
How do we build a
95% confidence interval?
Goal: estimate the population average μ.
From previous sessions:
[μ – MoE ; μ + MoE] includes the sample mean x̄ with
probability 95%.
Since x̄ is within MoE of μ exactly when μ is within MoE of x̄,
we conclude: the interval [x̄ – MoE; x̄ + MoE] includes the
population mean μ with probability 95%.
[x̄ – MoE; x̄ + MoE] is a 95% confidence interval for μ.
MoE = 1.96 × Standard Error
Standard Error = sX/√N
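The recipe above (Standard Error = sX/√N, MoE = 1.96 × Standard Error, interval = x̄ ± MoE) can be sketched from raw data as follows. The ratings list is hypothetical, made only to show the arithmetic (a pattern of 10 ratings repeated 30 times, so N = 300 is large enough for the Central Limit Theorem); statistics.stdev uses the N − 1 denominator for sX.

```python
import math
import statistics

def confidence_interval_95(sample):
    """95% CI for the population mean: sample mean +/- 1.96 * sX / sqrt(N)."""
    n = len(sample)
    x_bar = statistics.mean(sample)       # point estimate of the population mean
    s_x = statistics.stdev(sample)        # sample standard deviation sX
    standard_error = s_x / math.sqrt(n)   # Standard Error = sX / sqrt(N)
    moe = 1.96 * standard_error           # MoE = 1.96 x Standard Error
    return (x_bar - moe, x_bar + moe)

# Hypothetical ratings (N = 300), for illustration only.
ratings = [4, 3, 5, 2, 4, 3, 4, 5, 3, 4] * 30
low, high = confidence_interval_95(ratings)
print(f"[{low:.2f}, {high:.2f}]")  # [3.60, 3.80]
```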
Choosing between 90%, 95%, 99%
• The interval estimate
[Sample Mean – MoE, Sample Mean + MoE]
includes the population mean (the parameter)
with probability:
• 99% if MoE = 2.58 * Standard Error
• 95% if MoE = 1.96 * Standard Error
• 90% if MoE = 1.65 * Standard Error
• The width of a confidence interval:
1. Increases as the confidence level increases.
2. Decreases as the sample size increases.
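The multipliers 1.65, 1.96 and 2.58 are rounded quantiles of the standard normal distribution, leaving (1 − confidence)/2 in each tail. A minimal sketch using Python's statistics.NormalDist; the helper names and the standard deviation sX = 10 are hypothetical, chosen only to show how the width behaves.

```python
import math
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def z_multiplier(confidence):
    """Normal multiplier that leaves (1 - confidence) / 2 in each tail."""
    return standard_normal.inv_cdf(1 - (1 - confidence) / 2)

def ci_width(z, s_x, n):
    """Width of [Sample Mean - MoE, Sample Mean + MoE] with MoE = z * s_x / sqrt(n)."""
    return 2 * z * s_x / math.sqrt(n)

for confidence in (0.90, 0.95, 0.99):
    z = z_multiplier(confidence)
    print(f"{confidence:.0%}: z = {z:.3f}, "
          f"width at N=100: {ci_width(z, 10.0, 100):.2f}, "
          f"width at N=400: {ci_width(z, 10.0, 400):.2f}")
```

The printout shows both points of the slide: the width grows with the confidence level, and quadrupling N halves it.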
Building 90%, 95%, 99%
confidence intervals
Exercise:
• The sample mean weight (a sample of
individuals in the US) is 60.0 kg, and the
sample standard deviation is 29.9 kg.
• Find a 90% (resp., 95%, 99%) confidence
interval for the population mean weight.
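The exercise does not state the sample size N, so this sketch treats N as a parameter. N = 100 is a hypothetical value chosen only to show the arithmetic; mean_interval is likewise a hypothetical helper name.

```python
import math

def mean_interval(x_bar, s_x, n, z):
    """[Sample Mean - MoE, Sample Mean + MoE] with MoE = z * s_x / sqrt(n)."""
    moe = z * s_x / math.sqrt(n)
    return (x_bar - moe, x_bar + moe)

N_HYPOTHETICAL = 100  # assumption: the exercise does not give the sample size

for label, z in (("90%", 1.65), ("95%", 1.96), ("99%", 2.58)):
    low, high = mean_interval(60.0, 29.9, N_HYPOTHETICAL, z)
    print(f"{label}: [{low:.2f} kg, {high:.2f} kg]")
```

With N = 100 the 95% interval is about [54.14 kg, 65.86 kg]; the actual answer depends on the N of the sample used in class.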
Why 90%, 95%, 99%?
• Confidence intervals were invented by Jerzy Neyman in the 1930s.
• R.A. Fisher developed the theory of
statistical testing.
• Sample sizes were small at the time
(a few hundred), and 95% seemed
a reasonable confidence level.
• The medical sciences then adopted
confidence intervals.
• 95% became the standard.
R.A. Fisher
Central Limit Theorem
• Requires a large sample size N.
• This is because it applies to any distribution of X.
• Example #1:
– We had a sample of N songs and, for each song i, the number of times
Xi it had been played.
– The number of times Xi a song is played on Spotify does
not have a normal distribution.
– But we can build a confidence interval for the average
number of times a song is played (μ), provided we have
a large enough number N of songs.
– MoE = 1.96 * sX/√N for a 95% confidence
interval.
We can use our formulas to find
a 95% confidence interval for μ
around the sample mean x̄ = 360.63, because N is large,
even though X does not have a
normal distribution.
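A simulation can illustrate why a large N lets us ignore the shape of X. This sketch is not the course's Spotify data: it assumes an exponential distribution (mean 1) as a stand-in for skewed play counts, builds many 95% intervals x̄ ± 1.96·sX/√N, and reports the share that contain the true mean. The function name coverage_95 and the simulation sizes are hypothetical choices.

```python
import math
import random

def coverage_95(n, reps, seed=0):
    """Share of 95% intervals x_bar +/- 1.96 * sX / sqrt(n) that contain the true mean."""
    rng = random.Random(seed)
    true_mean = 1.0  # an exponential distribution with rate 1 has mean 1
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # skewed, not normal
        x_bar = sum(sample) / n
        s_x = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        moe = 1.96 * s_x / math.sqrt(n)
        if x_bar - moe <= true_mean <= x_bar + moe:
            hits += 1
    return hits / reps

coverage = coverage_95(n=400, reps=2_000)
print(f"Coverage with N=400: {coverage:.3f}")
```

With N = 400 the coverage lands close to 95% even though each Xi is strongly skewed.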
What if N is small?
• If N is “small”, the Central Limit Theorem does
not apply…
– We cannot use our formulas.
• “Small”? Fewer than a few hundred observations (from
experience).
• If N is very small (N=2, N=5), the sampling
distributions of the sample mean are not normal.
[Figure: sampling distributions of the sample mean for N=2 and N=5]
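The lost figure's point can be reproduced numerically. This sketch assumes a skewed X (exponential with mean 1, a hypothetical choice), draws many samples of size N, and measures the skewness of the resulting sample means: a normal distribution has skewness 0, so values far from 0 at N=2 and N=5 mean the sampling distribution is not normal. Function names and simulation sizes are hypothetical.

```python
import random
import statistics

def sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from an exponential X and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

def skewness(values):
    """Sample skewness: 0 for a symmetric distribution such as the normal."""
    m = sum(values) / len(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

skews = {n: skewness(sample_means(n, reps=20_000)) for n in (2, 5, 50)}
for n, s in skews.items():
    print(f"N={n}: skewness of the sample mean = {s:.2f}")
```

For this X the theoretical skewness of the sample mean is 2/√N (about 1.41 at N=2, 0.89 at N=5, 0.28 at N=50), so the simulated values should shrink toward 0 as N grows.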
If N is small
• The sample standard deviation sX is potentially very far from the population standard deviation σX.
• But… we can still find confidence intervals if X is normal.
• The standardized sample mean (Sample Mean – μ)/(sX/√N) follows Student’s
t distribution, with degrees of freedom (df) equal to N-1.
If N is small
A 95% confidence interval for the population mean is:
[Sample Mean – MoE , Sample Mean + MoE]
With MoE = z * Standard Error.
• z = 1.96
when df = ∞
• z > 1.96
when df is small.
• See the next table for the exact value of z.
t Table
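The table itself is not reproduced here. As a sketch under stated assumptions, the 95% multipliers can be recomputed with only the standard library: integrate the Student t density numerically (Simpson's rule) and invert the CDF by bisection. The helper names, step counts and search bracket are hypothetical choices, not the method behind Table 5.1 of Agresti and Finlay.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids the overflow math.gamma would hit for large df.
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus the area from 0 to x (Simpson's rule, even steps)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Multiplier z such that [-z, z] holds `confidence` of the t distribution."""
    target = 1 - (1 - confidence) / 2  # e.g. 0.975 for a 95% interval
    lo, hi = 0.0, 100.0
    for _ in range(50):  # bisection: t_cdf increases with x
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 4, 9, 29, 1000):
    print(f"df = {df}: z = {t_critical(0.95, df):.3f}")
```

The multiplier is very large for df = 1 and falls toward 1.96 as df grows, matching “z = 1.96 when df = ∞”.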
Why is it called Student’s t
distribution?
• The t distribution was published under the pseudonym
“Student”.
• That “Student” was William Sealy Gosset, who worked at the
Guinness brewery in Dublin, Ireland.
• He was working with small samples of beer, seeking
guidance for industrial quality control:
– He tested a small number of samples
(N=2, 4, perhaps 7).
– And from these samples he tried to infer the quality of
all containers of the product (the population).
W.S. Gosset and Some Neglected Concepts in Experimental Statistics: Guinnessometrics II,
Stephen T. Ziliak, 2011.
Wrap up
• Interval estimates for a population mean
(a parameter) when N is large, for any distribution of X.
• Build a confidence interval for a parameter:
the interval [Sample Mean – MoE ; Sample Mean + MoE]
includes the parameter with probability:
99% if MoE = 2.58 * Standard Error
95% if MoE = 1.96 * Standard Error
90% if MoE = 1.65 * Standard Error
• The t-distribution gives confidence intervals when the sample size
N is small… and when the distribution of X is normal.
• Use z given by Table 5.1 of Agresti and Finlay for degrees of
freedom N-1.
Coming up:
Readings:
• This week and next week:
– Chapter 5 entirely – estimation, confidence intervals.
• Online quiz deadline Tuesday 9am.
• Deadlines are sharp and attendance is recorded.
For help:
• Amine Ouazad
Office 1135, Social Science building
amine.ouazad@nyu.edu
Office hour: Tuesday from 5 to 6.30pm.
• GAF: Irene Paneda
Irene.paneda@nyu.edu
Sunday recitations.
At the Academic Resource Center, Monday from 2 to 4pm.