The art of computer science analysis
Emmanuel Jeannot
INRIA
Complex HPC Spring School
May, 10 2011
E. Jeannot (INRIA)
The art of computer science analysis
May, 10 2011
1 / 19
Outline
1. Comparing Systems Using Sample Data
2. Conclusion
Comparing Systems Using Sample Data
Comparing systems using sample data
[Jain 91, Chap 13]
Determine the confidence interval of the mean
Comparing two alternatives
Confidence interval for proportion
Determining sample size
Determine the confidence interval of the mean
Problem
S = {x_1, ..., x_n}: a set of results
Determine an interval [c_1, c_2] for the true mean µ such that: $P(c_1 \le \mu \le c_2) = 1 - \alpha$
α: significance level (e.g. 0.01)
1 − α: confidence level (e.g. 0.99)
Notations
n: number of experiments
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$: sample mean
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (\bar{x} - x_i)^2}$: estimate of the standard deviation (square root of the unbiased variance estimator)
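A minimal R sketch of the two estimators, written directly from the definitions above. The vector `x` holds arbitrary illustrative values (not measurements from the talk), and `sample_mean` and `sample_sd` are hypothetical helper names; comparing them with R's built-in `mean()` and `sd()` shows that `sd()` already uses the n − 1 denominator.

```r
# Sample mean: (1/n) * sum of the x_i
sample_mean <- function(x) {
  sum(x) / length(x)
}

# Standard-deviation estimate: square root of sum((xbar - x_i)^2) / (n - 1)
sample_sd <- function(x) {
  n <- length(x)
  sqrt(sum((sample_mean(x) - x)^2) / (n - 1))
}

x <- c(9.1, 10.4, 11.2, 8.7, 10.6)  # arbitrary illustrative values
print(c(sample_mean(x), mean(x)))   # same value twice
print(c(sample_sd(x), sd(x)))       # same value twice: sd() divides by n - 1
```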
When n is large (n ≥ 30)
Central-limit theorem: $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$
µ (resp. σ): true mean (resp. true standard deviation) of the distribution of the x_i.
$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
$P(-c \le Z \le c) = 1 - \alpha \Leftrightarrow c = z_{1-\alpha/2}$
$z_i$: value of the $i$-th quantile of a unit normal variate.
α = 0.1: $z_{1-\alpha/2} = z_{0.95} = 1.64$
[Figure: density of the standard normal N(0,1) for alpha = 0.1, with z_{0.95} = 1.64 marked on the x-axis]
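$-c \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le c \Leftrightarrow \bar{x} - c\,\sigma/\sqrt{n} \le \mu \le \bar{x} + c\,\sigma/\sqrt{n}$. σ is unknown, but for large n, s ≈ σ.
With probability 1 − α:
$\mu \in [\bar{x} - z_{1-\alpha/2}\, s/\sqrt{n},\ \bar{x} + z_{1-\alpha/2}\, s/\sqrt{n}]$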
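To see what "with probability 1 − α" means, one can simulate: draw many samples of size n from a distribution whose mean is known, build the interval for each sample, and count how often it contains the true mean. The sketch below assumes Uniform(0, 10) measurements (true mean 5, deliberately non-normal), n = 50 and α = 0.1; these choices are illustrative and not from the slides. The observed coverage should land near 0.9, possibly slightly below, partly because s replaces σ.

```r
set.seed(42)                        # reproducible simulation
true_mu <- 5                        # true mean of Uniform(0, 10)
n <- 50                             # sample size, large enough for the CLT
alpha <- 0.1
z <- qnorm(1 - alpha / 2)           # z_{1-alpha/2}, about 1.645

# For each simulated sample, does the interval contain true_mu?
covered <- replicate(2000, {
  x <- runif(n, min = 0, max = 10)
  half <- z * sd(x) / sqrt(n)       # half-width z * s / sqrt(n)
  (mean(x) - half <= true_mu) && (true_mu <= mean(x) + half)
})

coverage <- mean(covered)           # fraction of intervals containing true_mu
print(coverage)
```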
Example
x̄ = 10, n = 64, s = 2 (hence √n = 8)
α = 0.1 ⇒ $z_{0.95} = 1.64$ ⇒ µ ∈ [10 − 1.64 × 2/8, 10 + 1.64 × 2/8] = [9.59, 10.41]
α = 0.01 ⇒ $z_{0.995} = 2.58$ ⇒ µ ∈ [10 − 2.58 × 2/8, 10 + 2.58 × 2/8] = [9.35, 10.65]
R code
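# Normal-approximation confidence interval of the mean (large n, n >= 30)
interval <- function(x, conf_level = 0.9) {
  n <- length(x)
  X <- mean(x)                  # sample mean
  s <- sd(x)                    # standard-deviation estimate (n - 1 denominator)
  alpha <- 1 - conf_level
  q <- qnorm(1 - alpha / 2)     # z_{1-alpha/2}
  return(c(X - q * s / sqrt(n), X + q * s / sqrt(n)))
}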
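The worked example above only gives summary statistics, so the `interval` function (which takes raw data) cannot be applied to it directly. A minimal sketch of a hypothetical variant, `interval_from_summary`, that takes x̄, s and n instead: with unrounded quantiles it gives about [9.589, 10.411] at 90% and [9.356, 10.644] at 99%, which agrees with the example up to the rounding of z to 1.64 and 2.58.

```r
# Hypothetical helper: normal-approximation CI from summary statistics
interval_from_summary <- function(xbar, s, n, conf_level = 0.9) {
  alpha <- 1 - conf_level
  q <- qnorm(1 - alpha / 2)         # z_{1-alpha/2}, unrounded
  half <- q * s / sqrt(n)           # half-width z * s / sqrt(n)
  c(xbar - half, xbar + half)
}

print(interval_from_summary(10, 2, 64, conf_level = 0.90))
print(interval_from_summary(10, 2, 64, conf_level = 0.99))
```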
n < 30 and the x_i follow a normal distribution
t(k): Student distribution with k degrees of freedom.
$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t(n - 1)$ (σ is replaced by its estimate s).
$P(-c \le Z \le c) = 1 - \alpha \Leftrightarrow c = t_{[1-\alpha/2,\, n-1]}$
$t_{[i,k]}$: value of the $i$-th quantile of a Student variate with k degrees of freedom.
α = 0.1, n = 5: $t_{[0.95,4]} = 2.13$
With probability 1 − α:
$\mu \in [\bar{x} - t_{[1-\alpha/2,\, n-1]}\, s/\sqrt{n},\ \bar{x} + t_{[1-\alpha/2,\, n-1]}\, s/\sqrt{n}]$
R code
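# Student confidence interval of the mean (small n, normally distributed x_i)
student_interval <- function(x, conf_level = 0.9) {
  n <- length(x)
  X <- mean(x)
  s <- sd(x)
  alpha <- 1 - conf_level
  q <- qt(1 - alpha / 2, n - 1)   # t_{[1-alpha/2, n-1]}
  return(c(X - q * s / sqrt(n), X + q * s / sqrt(n)))
}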
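A short R sketch comparing the Student quantile t_{[1−α/2, n−1]} with the normal quantile z_{1−α/2} for α = 0.1 and a few illustrative sample sizes. For n = 5 it gives about 2.13, as above; the t quantile shrinks towards z ≈ 1.645 as n grows, which is why the Student interval is wider for small samples and why both intervals nearly coincide around n = 30.

```r
alpha <- 0.1
ns <- c(5, 10, 30, 100)                  # illustrative sample sizes
t_q <- qt(1 - alpha / 2, df = ns - 1)   # t_{[1-alpha/2, n-1]} for each n
z_q <- qnorm(1 - alpha / 2)             # z_{1-alpha/2}, independent of n
print(data.frame(n = ns, t = round(t_q, 3), z = round(z_q, 3)))
```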
Comparing two alternatives (paired observations)
Six benchmarks were used to compare two systems. The observations (first system, second system) are:
{(5.4, 19.1), (16.6, 3.5), (0.6, 3.4), (7.3, 1.7), (1.4, 2.5), (0.6, 3.6)}.
Is one system better than the other?
Differences (6 observations): {−13.7, 13.1, −2.8, 5.6, −1.1, −3.0}
Sample mean: x̄ = −0.32
Sample standard deviation: s = 9.03
Normality of these differences is plausible (Shapiro-Wilk test p-value = 0.82 > 0.1), so we can use the Student distribution.
α = 0.1, $t_{[0.95,5]} = 2.015$. 90% confidence interval:
$\mu \in [-0.32 - 2.015 \times 9.03/\sqrt{6},\ -0.32 + 2.015 \times 9.03/\sqrt{6}] = [-7.75, 7.12]$
The interval contains 0: at the 90% confidence level, we cannot conclude that the two systems differ.
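The paired example can be reproduced in base R, as a sketch. `sys1` and `sys2` are names chosen here for the first and second element of each observation pair. The block recomputes the differences, their mean and standard deviation, the Shapiro-Wilk p-value (to compare with the value quoted above), and the 90% Student interval both by hand and with `t.test(..., paired = TRUE)`; the two intervals should coincide.

```r
# Paired observations from the six benchmarks
sys1 <- c(5.4, 16.6, 0.6, 7.3, 1.4, 0.6)    # first element of each pair
sys2 <- c(19.1, 3.5, 3.4, 1.7, 2.5, 3.6)    # second element of each pair

d <- sys1 - sys2                 # paired differences
print(round(mean(d), 2))         # sample mean of the differences
print(round(sd(d), 2))           # sample standard deviation
print(shapiro.test(d)$p.value)   # Shapiro-Wilk normality check

# 90% Student interval on the differences, by hand
n <- length(d)
q <- qt(0.95, df = n - 1)        # t_{[0.95, 5]}
ci_manual <- mean(d) + c(-1, 1) * q * sd(d) / sqrt(n)

# Same interval with t.test on the paired samples
ci_ttest <- as.numeric(t.test(sys1, sys2, paired = TRUE, conf.level = 0.9)$conf.int)

print(round(ci_manual, 2))
print(round(ci_ttest, 2))
```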
Comparing two alternatives (unpaired observations)
Unpaired observations: there is no correspondence between the two samples (they cannot be subtracted pairwise).
Example: measured bandwidth between Europe and America versus between Europe and Asia.
The Student test (t-test) can compute the confidence interval of the difference of the means. With paired=FALSE, R's t.test uses Welch's variant by default (var.equal=FALSE), which does not assume equal variances.
R code
r <- t.test(x, y, paired = FALSE, conf.level = 0.9)
r$conf.int[1]   # lower bound
r$conf.int[2]   # upper bound
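For the unpaired case, the interval returned by `t.test` can be computed by hand. The sketch below assumes the default Welch variant (unequal variances) with the Welch-Satterthwaite degrees of freedom; `welch_interval` is a hypothetical helper name. Purely for illustration, it reuses the two columns of the earlier benchmark as if they were unpaired samples, which is not how those paired data should actually be analysed.

```r
# Hypothetical helper: Welch confidence interval for mean(x) - mean(y)
welch_interval <- function(x, y, conf_level = 0.9) {
  vx <- var(x) / length(x)      # squared standard error of mean(x)
  vy <- var(y) / length(y)      # squared standard error of mean(y)
  se <- sqrt(vx + vy)           # standard error of the difference
  # Welch-Satterthwaite approximation of the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  q <- qt(1 - (1 - conf_level) / 2, df)
  (mean(x) - mean(y)) + c(-1, 1) * q * se
}

# Illustration only: the earlier benchmark columns treated as if unpaired
x <- c(5.4, 16.6, 0.6, 7.3, 1.4, 0.6)
y <- c(19.1, 3.5, 3.4, 1.7, 2.5, 3.6)
print(welch_interval(x, y))
print(t.test(x, y, paired = FALSE, conf.level = 0.9)$conf.int)
```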
Confidence interval for proportions
System A is better than system B in n_1 out of n experiments.
Sample proportions: $\hat{p}_1 = n_1/n$, $\hat{p}_2 = 1 - \hat{p}_1 = (n - n_1)/n$
$n_1 \sim B(n, p_1)$ ($p_1$: the true probability that A outperforms B).
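If $np_1 \ge 10$ and $n(1 - p_1) \ge 10$:
$n_1 \sim B(n, p_1) \approx N\left(np_1, \sqrt{np_1(1 - p_1)}\right)$
$\Leftrightarrow \hat{p}_1 = n_1/n \approx N\left(p_1, \sqrt{p_1(1 - p_1)/n}\right) \approx N\left(p_1, \sqrt{\hat{p}_1\hat{p}_2/n}\right)$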
With probability 1 − α:
$p_1 \in \left[\hat{p}_1 - z_{1-\alpha/2}\sqrt{\hat{p}_1\hat{p}_2/n},\ \hat{p}_1 + z_{1-\alpha/2}\sqrt{\hat{p}_1\hat{p}_2/n}\right]$
If the interval contains 0.5, we cannot conclude that A outperforms B.
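A minimal sketch of the proportion interval and the 0.5 decision rule, working from counts rather than from a vector of outcomes. `prop_interval`, `a_better` and `b_better` are hypothetical names; since p_1 is unknown, the rule of thumb is checked on the estimate p̂_1 (an assumption), and a warning replaces a hard error. With n_1 = 30 and n = 40 at 99% it gives about [0.574, 0.926], so A would be judged better.

```r
# Hypothetical helper: normal-approximation CI for p1 from counts
prop_interval <- function(n1, n, conf_level = 0.9) {
  p1 <- n1 / n
  p2 <- 1 - p1
  # Rule of thumb checked on the estimate, since the true p1 is unknown
  if (n * p1 < 10 || n * p2 < 10) {
    warning("n * p1 or n * (1 - p1) is below 10: normal approximation is doubtful")
  }
  q <- qnorm(1 - (1 - conf_level) / 2)   # z_{1-alpha/2}
  half <- q * sqrt(p1 * p2 / n)
  c(p1 - half, p1 + half)
}

# Decision rule: conclude only when 0.5 lies outside the interval
a_better <- function(ci) ci[1] > 0.5   # whole interval above 0.5
b_better <- function(ci) ci[2] < 0.5   # whole interval below 0.5

ci <- prop_interval(30, 40, conf_level = 0.99)
print(ci)
print(a_better(ci))
```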
Example
An experiment is repeated 40 times. System A is found superior to system B 30 times. Can we state with 99% confidence that system A is superior?
n = 40, n_1 = 30
$\hat{p}_1 = 30/40 = 0.75$ ($n\hat{p}_1 = 30 \ge 10$, $n(1 - \hat{p}_1) = 10 \ge 10$)
$\sqrt{\hat{p}_1\hat{p}_2/n} = \sqrt{0.75 \times 0.25/40} = 0.068$
α = 0.01, $z_{0.995} = 2.58$
$p_1 \in [0.75 - 2.58 \times 0.068,\ 0.75 + 2.58 \times 0.068] = [0.57, 0.93]$
The confidence interval does not include 0.5. Hence, we can conclude with 99% confidence that system A is superior to system B.
Code
R code
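# x: vector of outcomes, assumed coded 1 when A outperforms B and 0 otherwise
proportion_test <- function(x, conf_level = 0.9) {
  n <- length(x)
  n1 <- sum(x >= 1)             # counts entries >= 1, as sum(findInterval(x, 1)) did
  n2 <- n - n1
  p1 <- n1 / n
  p2 <- n2 / n
  # Rule of thumb for the normal approximation of B(n, p1)
  if (p1 * n < 10 || p2 * n < 10) {
    stop("Cannot apply normal approximation!")
  }
  alpha <- 1 - conf_level
  q <- qnorm(1 - alpha / 2)     # z_{1-alpha/2}
  s <- sqrt(p1 * p2 / n)        # estimated standard deviation of p1
  return(c(p1 - q * s, p1 + q * s))
}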
R is nice!
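R provides all the functions above:
Confidence interval of the mean: r <- t.test(x, conf.level = 0.9)
Comparing paired experiments: r <- t.test(x, y, paired = TRUE, conf.level = 0.9)
Comparing unpaired experiments: r <- t.test(x, y, paired = FALSE, conf.level = 0.9)
CI for a proportion: r <- prop.test(n1, n, conf.level = 0.9) (score interval, with continuity correction by default) or r <- binom.test(n1, n, conf.level = 0.9) (exact Clopper-Pearson interval); both can differ slightly from the normal-approximation interval.
In every case, the bounds are:
inf <- r$conf.int[1]
sup <- r$conf.int[2]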
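As a sketch of how the built-in functions compare with the normal-approximation formula, the block below computes three 90% intervals for the n_1 = 30, n = 40 example: the formula from the proportion slides, `prop.test` (score interval with continuity correction) and `binom.test` (exact Clopper-Pearson interval). The exact bounds are not reproduced here; the point is that the three intervals are close but not identical.

```r
n1 <- 30
n <- 40
conf <- 0.9

# Normal approximation, as on the proportion slides
p <- n1 / n
z <- qnorm(1 - (1 - conf) / 2)
normal_ci <- p + c(-1, 1) * z * sqrt(p * (1 - p) / n)

# Built-in alternatives
score_ci <- as.numeric(prop.test(n1, n, conf.level = conf)$conf.int)  # score interval, continuity correction
exact_ci <- as.numeric(binom.test(n1, n, conf.level = conf)$conf.int) # Clopper-Pearson interval

print(rbind(normal = normal_ci, score = score_ci, exact = exact_ci))
```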
Computing the number of experiments
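Problem
You have a confidence interval. How many experiments (n) do you need to reduce it to a given precision ε?
CI of the mean: $\mu \in [\bar{x} - z_{1-\alpha/2}\, s/\sqrt{n},\ \bar{x} + z_{1-\alpha/2}\, s/\sqrt{n}]$
If you want $\mu \in [\bar{x}(1 - \epsilon), \bar{x}(1 + \epsilon)]$, you need $z_{1-\alpha/2}\, s/\sqrt{n} \le \epsilon\,\bar{x}$, i.e.:
$n \ge \left(\frac{z_{1-\alpha/2}\, s}{\epsilon\, \bar{x}}\right)^2$
CI of a proportion: $p_1 \in \left[\hat{p}_1 - z_{1-\alpha/2}\sqrt{\hat{p}_1\hat{p}_2/n},\ \hat{p}_1 + z_{1-\alpha/2}\sqrt{\hat{p}_1\hat{p}_2/n}\right]$
If you want $p_1 \in [\hat{p}_1 - \epsilon, \hat{p}_1 + \epsilon]$, you need $z_{1-\alpha/2}\sqrt{\hat{p}_1\hat{p}_2/n} \le \epsilon$, i.e.:
$n \ge \frac{z_{1-\alpha/2}^2\, \hat{p}_1\hat{p}_2}{\epsilon^2}$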
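The two sample-size formulas can be written as small R functions, as a sketch; `n_for_mean` and `n_for_proportion` are hypothetical names, and `eps` plays the role of ε. They round up to the next integer and use unrounded quantiles from `qnorm`, so results can differ slightly from hand calculations: 44 experiments for x̄ = 10, s = 2, α = 0.1, ε = 0.05, and about 49,762 for p̂_1 = 0.75, α = 0.01, ε = 0.005.

```r
# Hypothetical helper: smallest n with z * s / sqrt(n) <= eps * xbar (xbar > 0)
n_for_mean <- function(xbar, s, eps, conf_level = 0.9) {
  z <- qnorm(1 - (1 - conf_level) / 2)   # z_{1-alpha/2}
  ceiling((z * s / (eps * xbar))^2)
}

# Hypothetical helper: smallest n with z * sqrt(p1 * (1 - p1) / n) <= eps
n_for_proportion <- function(p1, eps, conf_level = 0.9) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  ceiling(z^2 * p1 * (1 - p1) / eps^2)
}

print(n_for_mean(xbar = 10, s = 2, eps = 0.05, conf_level = 0.90))
print(n_for_proportion(p1 = 0.75, eps = 0.005, conf_level = 0.99))
```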
Example
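For the mean:
x̄ = 10, s = 2
α = 0.1 ⇒ $z_{0.95} = 1.64$
ε = 0.05 ⇒ $n \ge \left(\frac{1.64 \times 2}{10 \times 0.05}\right)^2 = 43.03$, i.e. at least 44 experiments
For a proportion:
n = 40, n_1 = 30 ⇒ $\hat{p}_1 = 30/40 = 0.75$, $\hat{p}_2 = 0.25$
α = 0.01 ⇒ $z_{0.995} = 2.58$
ε = 0.005 ⇒ $n \ge \frac{2.58^2 \times 0.75 \times 0.25}{0.005^2} = 49{,}923$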
Conclusion
Computer science is also an experimental science.
There are different and complementary approaches to doing experiments in computer science.
Computer scientists often lack the tools and methods needed to perform insightful experiments, in particular:
General methodology
Performance analysis
Statistics and probability
Data analysis and representation
Further reading