					CPSC 531: Output Data Analysis
Instructor: Anirban Mahanti
Office: ICT 745
Email: mahanti@cpsc.ucalgary.ca
Class Location: TRB 101
Lectures: TR 15:30 – 16:45 hours
Slides primarily adapted from:
“The Art of Computer Systems Performance
Analysis” by Raj Jain, Wiley 1991.
[Chapters 12, 13, and 25]
CPSC 531: Data Analysis
Outline
• Measures of Central Tendency
  • Mean, Median, Mode
• How to Summarize Variability?
• Comparing Systems Using Sample Data
• Comparing Two Alternatives
• Transient Removal
Measures of Central Tendency (1)
• Sample mean: sum of all observations divided by the total number of observations
  • Always exists and is unique
  • Gives equal weight to all observations
  • Strongly affected by outliers
• Sample median: list the observations in increasing order; the observation in the middle of the list is the median
  • With an even number of observations, take the mean of the middle two values
  • Always exists and is unique
  • Resistant to outliers (compared to the mean)
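A minimal Python sketch of the two definitions above, using the standard-library `statistics` module. The response-time values are made up for illustration; the single outlier shows how the mean moves while the median does not.

```python
import statistics

# Hypothetical response times (ms); the last value is an outlier.
times = [2, 3, 3, 4, 100]

sample_mean = statistics.mean(times)      # sum of observations / count
sample_median = statistics.median(times)  # middle value of the sorted list

# With an even number of observations the median is the mean
# of the two middle values.
even_median = statistics.median([1, 2, 3, 4])

print(sample_mean, sample_median, even_median)
```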
Measures of Central Tendency (2)
• Sample mode: plot a histogram from the observations and find the bucket with peak frequency; the midpoint of this bucket is the mode
• The mode may not exist (e.g., all samples have equal weight)
• More than one mode may exist (e.g., a bimodal distribution)
• If there is only one mode, the distribution is unimodal
[Figure: three example plots of PDF f(x) versus x with the modes marked; one of the plots has two modes]
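The histogram procedure for the sample mode can be sketched as follows. The function name `sample_modes`, the fixed bucket width, and the rule for reporting "no mode" when all occupied buckets tie are assumptions of this sketch, not part of the slides.

```python
import math
from collections import Counter

def sample_modes(observations, bucket_width):
    """Return the midpoints of the histogram buckets with peak frequency.

    Returns an empty list when every occupied bucket has the same count
    (no mode); that tie rule is an assumption of this sketch.
    """
    counts = Counter(math.floor(x / bucket_width) for x in observations)
    peak = max(counts.values())
    if len(counts) > 1 and all(c == peak for c in counts.values()):
        return []
    return sorted((b + 0.5) * bucket_width for b, c in counts.items() if c == peak)

# Hypothetical observations.
unimodal = sample_modes([1.2, 1.9, 2.1, 2.4, 2.7, 2.9, 3.5, 7.8], 1.0)
bimodal = sample_modes([0.1, 0.2, 3.3, 5.1, 5.2], 1.0)
no_mode = sample_modes([0.5, 1.5, 2.5], 1.0)
print(unimodal, bimodal, no_mode)
```

The result depends on the chosen bucket width, so it is worth trying a few widths before trusting the mode.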
Measures of Central Tendency (3)
• Is the data categorical?
  • Yes: use the mode
    • e.g., most used resource in a system
• Is the total of interest?
  • Yes: use the mean
    • e.g., total response time for Web requests
• Is the distribution skewed?
  • Yes: use the median
    • The median is less influenced by outliers than the mean.
  • No: use the mean. Why?
Common Misuses of Means (1)
• Usefulness of the mean depends on the number of observations and the variance
  • E.g., two response time samples: 10 ms and 1000 ms. The mean is 505 ms! A correct index, but useless.
• Using the mean without regard to skewness

            System A    System B
            10          5
            9           5
            11          5
            10          4
            10          31
  Mean:     10          10
  Mode:     10          5
  Min,Max:  [9,11]      [4,31]
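The System A / System B comparison can be reproduced with a few lines of Python. The helper name `summarize` is introduced here only for illustration.

```python
import statistics

system_a = [10, 9, 11, 10, 10]
system_b = [5, 5, 5, 4, 31]

def summarize(xs):
    """Collect the indices shown in the table for one system."""
    return {
        "mean": statistics.mean(xs),
        "mode": statistics.mode(xs),
        "min_max": (min(xs), max(xs)),
    }

summary_a = summarize(system_a)
summary_b = summarize(system_b)
print(summary_a)
print(summary_b)
```

Both systems report the same mean, yet the mode and the range show that System B is skewed by a single large observation.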
Common Misuses of Means (2)
• Computing the mean of a product by multiplying means
  • The mean of a product equals the product of the means if the two random variables are independent.
  • If x and y are correlated, E(xy) ≠ E(x)E(y)
  • Avg. users in system: 23; avg. processes/user: 2. Avg. number of processes in system? Is it 46?
  • No! The number of processes spawned by users depends on the load.
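A small numeric sketch of this pitfall. The three load snapshots are invented so that the averages match the slide (23 users, 2 processes per user) while the per-user process count falls as the load rises.

```python
import statistics

# Hypothetical snapshots: (users in system, processes per user).
users = [10, 23, 36]
processes_per_user = [3, 2, 1]   # fewer processes per user under heavier load

product_of_means = statistics.mean(users) * statistics.mean(processes_per_user)
mean_of_products = statistics.mean(
    u * p for u, p in zip(users, processes_per_user)
)

print(product_of_means, mean_of_products)
```

Here the product of the means is 46, but the mean number of processes actually observed is lower because the two quantities are negatively correlated.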
Outline
• Measures of Central Tendency
• How to Summarize Variability?
• Comparing Systems Using Sample Data
• Comparing Two Alternatives
• Transient Removal
Summarizing Variability
• Summarizing by a single number is rarely enough.
  • Given two systems with the same mean, we generally prefer the one with less variability
[Figure: two plots of frequency versus response time, each with mean = 2 s. In one, 80% of the responses take 1.5 s and 20% take 4 s; in the other, 60% take ~0.001 s and 40% take ~5 s.]
• Indices of dispersion
  • Range, variance, 10- and 90-percentiles, semi-interquartile range, and mean absolute deviation
Range
• Easy to calculate: range = max – min
• In many scenarios, not very useful:
  • Min may be zero
  • Max may be an “outlier”
  • With more samples, max may keep increasing and min may keep decreasing → no “stable” point
• Range is useful if system performance is bounded
Variance and Standard Deviation
• Given a sample of n observations {x_1, x_2, …, x_n}, the sample variance is calculated as:
  $$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad \text{where } \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$
• Sample variance: s^2 (in the square of the unit of observation)
• Sample standard deviation: s (in the unit of observation)
• Note the (n-1) in the variance computation
  • Only (n-1) of the n differences are independent
  • Given (n-1) differences, the nth difference can be computed
  • The number of independent terms is the degrees of freedom (df)
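A short sketch of the sample variance computation, checked against the standard-library `statistics.variance`, which also divides by (n-1). The data values are illustrative. The last lines show why only (n-1) differences are independent: the differences from the mean always sum to zero, so the last one follows from the others.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative observations
n = len(data)

x_bar = sum(data) / n
deviations = [x - x_bar for x in data]

s2 = sum(d * d for d in deviations) / (n - 1)   # sample variance
s = math.sqrt(s2)                               # sample standard deviation

# The nth difference is determined by the first (n-1) differences.
last_from_others = -sum(deviations[:-1])

print(s2, s, last_from_others, deviations[-1])
```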
Standard Deviation (SD)
• Standard deviation and mean have the same units
  • Preferred!
  • E.g. a) Mean = 2 s, SD = 2 s; high variability?
  • E.g. b) Mean = 2 s, SD = 0.2 s; low variability?
• Another widely used measure: coefficient of variation (C.O.V.)
  • C.O.V. = ratio of standard deviation to mean
  • C.O.V. does not have any units
  • C.O.V. shows the magnitude of variability
  • C.O.V. in (a) is 1 and in (b) is 0.1
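The two cases above, written out as a tiny sketch; the function name is an assumption used only here.

```python
def coefficient_of_variation(sd, mean):
    """Ratio of standard deviation to mean (unitless)."""
    return sd / mean

cov_a = coefficient_of_variation(sd=2.0, mean=2.0)   # case (a)
cov_b = coefficient_of_variation(sd=0.2, mean=2.0)   # case (b)
print(cov_a, cov_b)
```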
Percentiles, Quantiles, Quartiles
• Lower and upper bounds expressed in percents or as fractions
  • 90-percentile → 0.9-quantile
  • α-quantile: sort and take the [(n-1)α+1]th observation
    • [] means round to the nearest integer
• Quartiles divide the data into parts at 25%, 50%, 75% → quartiles (Q1, Q2, Q3)
  • 25% of the observations ≤ Q1 (the first quartile)
  • The second quartile Q2 is also the median
  • The range (Q3 – Q1) is the interquartile range
  • (Q3 – Q1)/2 is the semi-interquartile range (SIQR)
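The quantile rule above can be sketched in Python as follows. The data set is illustrative, and rounding halves upward is an assumption, since the slide only says "round to nearest integer".

```python
import math

def quantile(observations, alpha):
    """alpha-quantile: the [(n-1)*alpha + 1]th sorted observation (1-based)."""
    xs = sorted(observations)
    # Round to the nearest integer, rounding halves up (an assumption;
    # Python's built-in round() rounds halves to even instead).
    position = math.floor((len(xs) - 1) * alpha + 1 + 0.5)
    return xs[position - 1]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # illustrative observations
q1, q2, q3 = quantile(data, 0.25), quantile(data, 0.50), quantile(data, 0.75)
siqr = (q3 - q1) / 2
p90 = quantile(data, 0.9)
print(q1, q2, q3, siqr, p90)
```

Library routines such as `statistics.quantiles` interpolate between observations, so their results can differ slightly from this rule.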
Mean Absolute Deviation
• Mean absolute deviation is calculated as:
  $$ \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}| $$
Influence of Outliers
• Range: considerably
• Sample variance: considerably, but less than range
• Mean absolute deviation: less than variance
  • Doesn’t square (i.e., magnify) the outliers
• SIQR: very resistant
• Use SIQR as the index of dispersion whenever the median is used as the index of central tendency
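To see the ranking above on concrete numbers, the sketch below replaces one value of a small, made-up data set with an outlier and reports how much each index grows. Two choices here are assumptions: the standard deviation is used instead of the variance so that all indices share the same unit, and the quartiles come from `statistics.quantiles` (the library's default interpolation) rather than the rounding rule given earlier. The ordering holds for this data set and is not a general proof.

```python
import statistics

def dispersion(xs):
    """Range, standard deviation, mean absolute deviation, and SIQR."""
    mean = statistics.mean(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)   # library default method
    return {
        "range": max(xs) - min(xs),
        "sd": statistics.stdev(xs),
        "mad": sum(abs(x - mean) for x in xs) / len(xs),
        "siqr": (q3 - q1) / 2,
    }

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]   # hypothetical outlier

before, after = dispersion(clean), dispersion(with_outlier)
growth = {name: after[name] / before[name] for name in before}
print(growth)
```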
Outline
• Measures of Central Tendency
• How to Summarize Variability?
• Comparing Systems Using Sample Data
  • Sample vs. Population
  • Confidence Interval for Mean
• Comparing Two Alternatives
• Transient Removal
Comparing Systems Using Sample Data
• The words “sample” and “example” have a common root: “essample” (French)
• One sample does not prove a theory; a sample is just an example
• The point is that a definite statement cannot be made about the characteristics of all systems.
• However, probabilistic statements about the range of most systems can be made
• The confidence interval concept serves as a building block
Sample versus Population
• Generate 1 million random numbers with mean μ and SD σ and put them in an urn
• Draw a sample of n observations
  • {x_1, x_2, …, x_n} has mean x̄ and standard deviation s
  • x̄ is likely different from μ!
• The population mean μ is unknown or impossible to obtain in many real-world scenarios
• Therefore, obtain an estimate of μ from x̄
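The urn experiment can be imitated in a few lines. The normal distribution, the values μ = 10 and σ = 2, the sample size of 30, and the seed are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(531)  # fixed seed so the sketch is repeatable

# The "urn": one million numbers; mu = 10 and sigma = 2 are illustrative.
population = [random.gauss(10, 2) for _ in range(1_000_000)]
mu = statistics.fmean(population)

sample = random.sample(population, 30)   # n = 30 observations
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)

print(f"mu = {mu:.3f}, x_bar = {x_bar:.3f}, s = {s:.3f}")
```

Drawing a different sample gives a different x̄, which is why the estimate needs an interval around it.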
Confidence Interval for the Mean
• Define bounds c1 and c2 such that:
  Prob{c1 < μ < c2} = 1-α
  • (c1, c2) is the confidence interval
  • α is the significance level
  • 100(1-α) is the confidence level
• Typically a small α is desired
  • Confidence level of 90%, 95%, or 99%
• One approach: take k samples, find the sample means, sort them, and (for a 90% interval) take the [1+0.05(k-1)]th as c1 and the [1+0.95(k-1)]th as c2
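The k-samples approach can be sketched as below. The choice of k = 101 samples of n = 30 observations, the normal population with mean 10 and SD 2, and the helper name `nth` are assumptions made for this illustration.

```python
import math
import random
import statistics

random.seed(531)
k, n = 101, 30

# Each sample mean comes from n draws; the normal population with
# mean 10 and SD 2 is an assumption for illustration.
sample_means = sorted(
    statistics.fmean(random.gauss(10, 2) for _ in range(n)) for _ in range(k)
)

def nth(sorted_values, position):
    """1-based lookup after rounding the position to the nearest integer."""
    return sorted_values[math.floor(position + 0.5) - 1]

c1 = nth(sample_means, 1 + 0.05 * (k - 1))   # 6th smallest mean
c2 = nth(sample_means, 1 + 0.95 * (k - 1))   # 96th smallest mean
print(c1, c2)
```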
Central Limit Theorem
• We do not need many samples. Confidence intervals can be determined from one sample because x̄ ~ N(μ, σ/sqrt(n))
• The SD of the sample mean, σ/sqrt(n), is called the standard error
• Using the CLT, a 100(1-α)% confidence interval for a population mean is
  $$ \left(\bar{x} - z_{1-\alpha/2}\, s/\sqrt{n},\ \bar{x} + z_{1-\alpha/2}\, s/\sqrt{n}\right) $$
  • z_{1-α/2} is the (1-α/2)-quantile of a unit normal variate (and is obtained from a table!)
  • s is the sample SD
Confidence Interval Example
• CPU times obtained by repeating an experiment 32 times. The sorted set consists of
  {1.9,2.7,2.8,2.8,2.8,2.9,3.1,3.1,3.2,3.2,3.3,3.4,3.6,3.7,3.8,3.9,3.9,4.1,4.1,4.2,4.2,4.4,4.5,4.5,4.8,4.9,5.1,5.1,5.3,5.6,5.9}
  • Mean = 3.9, standard deviation (s) = 0.95, n = 32
• For a 90% confidence interval, z_{1-α/2} = 1.645, and we get 3.90 ± (1.645)(0.95)/sqrt(32) = (3.62, 4.17)
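The same interval can be computed from the summary statistics, with the normal quantile taken from the standard library instead of a table. The function name `z_interval` is an assumption. Because the inputs are the rounded mean and SD, the upper bound evaluates to about 4.176, so it agrees with the slide's (3.62, 4.17) only to within rounding.

```python
import math
from statistics import NormalDist

def z_interval(mean, sd, n, confidence):
    """Normal-approximation confidence interval for the mean (large n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # (1 - alpha/2)-quantile of N(0, 1)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

low, high = z_interval(mean=3.90, sd=0.95, n=32, confidence=0.90)
print(f"({low:.3f}, {high:.3f})")
```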
Meaning of Confidence Interval
• What does this mean? With 90% confidence, we can say the population mean is within the above bounds; that is, the chance of error is 10%.
  • E.g., take 100 samples and construct a CI from each. In about 10 cases, the interval will not contain the population mean
[Figure: an interval from x̄ - c to x̄ + c centered at x̄; 90% chance that this interval contains μ]
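This interpretation can be checked by simulation: build many 90% intervals from independent samples and count how often they contain the true mean. The population parameters, sample size, number of trials, and seed below are illustrative assumptions.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(531)
MU, SIGMA, N, TRIALS = 10.0, 2.0, 32, 2000   # illustrative values
z = NormalDist().inv_cdf(0.95)               # quantile for 90% confidence

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = statistics.fmean(sample)
    c = z * statistics.stdev(sample) / math.sqrt(N)
    if x_bar - c < MU < x_bar + c:
        hits += 1

coverage = hits / TRIALS
print(f"fraction of intervals containing the mean: {coverage:.3f}")
```

The observed fraction should land near 0.9; it tends to fall slightly below because the normal quantile is used with an estimated SD.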
Length of Confidence Interval
• Let z_{1-α/2} s/sqrt(n) = c
• Then, z_{1-α/2} = (c sqrt(n))/s
• Larger s implies a wider confidence interval
• Larger n implies a shorter confidence interval
  • → with more observations, we are better able to predict the population mean
  • → the square-root-of-n relationship implies that increasing the number of observations by a factor of 4 only cuts the confidence interval by a factor of 2.
• The confidence interval computation, as described here, works for n ≥ 30.
What if n is not large?
• For smaller samples, confidence intervals can be constructed only if the observations come from a normally distributed population
  $$ \left(\bar{x} - t_{[1-\alpha/2;\,n-1]}\, s/\sqrt{n},\ \bar{x} + t_{[1-\alpha/2;\,n-1]}\, s/\sqrt{n}\right) $$
• t[1-α/2;n-1] is the (1-α/2)-quantile of a t-variate with (n-1) degrees of freedom
Testing for a Zero Mean
• Check if a measured value is significantly different from zero
  • Determine the confidence interval
  • Then check if zero is inside the interval.
• The procedure is applicable to any other value a
[Figure: confidence intervals for the mean shown relative to 0, labeled “Mean is zero” and “Mean is nonzero”]
Outline
• Measures of Central Tendency
• How to Summarize Variability?
• Comparing Systems Using Sample Data
• Comparing Two Alternatives
• Transient Removal
Comparing Two Alternatives
• Often interested in comparing systems
  • “naïve” VOD vs. “batching” VOD (assignment 3)
  • “SJF” vs. “FIFO” request scheduling (assignment 1)
• Statistical techniques for such comparison:
  • Paired observations
  • Unpaired observations (we will omit this!)
  • Approximate visual test
• Did you use any of these in your assignments?
Paired Observations (1)
• n experiments with a one-to-one correspondence between the test on system A and the test on system B
  • No correspondence => unpaired
• This test uses the zero mean idea…
  • Treat the two samples as one sample of n pairs
  • For each pair, compute the difference
  • Construct a confidence interval for the difference
  • CI includes zero => systems not significantly different
Paired Observations (2)
• Six similar workloads used on two systems: {(5.4, 19.1), (16.6, 3.5), (0.6, 3.4), (1.4, 2.5), (0.6, 3.6), (7.3, 1.7)}. Is one system better?
• The performance differences are {-13.7, 13.1, -2.8, -1.1, -3.0, 5.6}
• Sample mean = -0.32, sample SD = 9.03
• CI = -0.32 ± t sqrt(81.62/6) = -0.32 ± t(3.69)
• The 0.95-quantile of t with 5 degrees of freedom is 2.015
• 90% confidence interval = (-7.75, 7.11)
• Systems are not significantly different, as zero is in the CI
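The paired-observation calculation can be reproduced directly from the six pairs. Python's standard library has no t-distribution, so the sketch takes the quantile 2.015 from the slide as a constant; the variable names are assumptions.

```python
import math
import statistics

pairs = [(5.4, 19.1), (16.6, 3.5), (0.6, 3.4), (1.4, 2.5), (0.6, 3.6), (7.3, 1.7)]
diffs = [a - b for a, b in pairs]

mean_diff = statistics.mean(diffs)
std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))

T_95_5DF = 2.015   # 0.95-quantile of t with 5 degrees of freedom (table value)
low = mean_diff - T_95_5DF * std_err
high = mean_diff + T_95_5DF * std_err

# Zero-mean test: the systems differ only if the CI excludes zero.
significantly_different = not (low < 0 < high)
print(f"mean = {mean_diff:.2f}, CI = ({low:.2f}, {high:.2f}), "
      f"different = {significantly_different}")
```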
Approximate Visual Test
• Compute confidence intervals for the means
• If the CIs don’t overlap, one system is better than the other
[Figure: three cases, each showing two means with their CIs]
  • CIs do not overlap => alternatives are different
  • CIs overlap and the mean of one is in the CI of the other => not significantly different
  • CIs overlap but the mean of one is not in the CI of the other => need more testing
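The three cases can be written as a small decision function. The function name, the (low, high) tuple representation, and the reading that either mean lying in the other interval is enough for the second case are assumptions of this sketch.

```python
def visual_test(mean_a, ci_a, mean_b, ci_b):
    """Classify two alternatives from their means and (low, high) CIs."""
    overlap = ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
    if not overlap:
        return "different"
    # Assumption: it is enough for either mean to lie in the other CI.
    if ci_b[0] <= mean_a <= ci_b[1] or ci_a[0] <= mean_b <= ci_a[1]:
        return "not significantly different"
    return "need more testing"

# Hypothetical means and intervals, one per case.
case_1 = visual_test(10, (9, 11), 15, (14, 16))
case_2 = visual_test(10, (8, 12), 11, (9, 13))
case_3 = visual_test(10, (8, 12), 14, (11, 17))
print(case_1, case_2, case_3, sep="\n")
```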
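Determining Sample Size
• Goal: find the smallest sample size n that provides the desired confidence in the results
• Method:
  • Take a small set of preliminary measurements
  • Estimate the variance from the measurements
  • Use the estimate to determine the sample size for the desired accuracy
• r% accuracy => ±r% at 100(1-α)% confidence
  $$ \bar{x} \mp z \frac{s}{\sqrt{n}} = \bar{x}\left(1 \mp \frac{r}{100}\right) \quad\Longrightarrow\quad n = \left(\frac{100\, z\, s}{r\, \bar{x}}\right)^2 $$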
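A sketch of the sample-size formula in Python. The preliminary mean of 20, SD of 5, 5% accuracy target, and 95% confidence level are made-up inputs, and the function name is an assumption; the result is rounded up because n must be a whole number of observations.

```python
import math
from statistics import NormalDist

def required_sample_size(mean, sd, r_percent, confidence):
    """Smallest n giving +/- r% accuracy at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (100 * z * sd / (r_percent * mean)) ** 2
    return math.ceil(n)

# Hypothetical preliminary measurements: mean 20, SD 5.
n_needed = required_sample_size(mean=20.0, sd=5.0, r_percent=5.0, confidence=0.95)
print(n_needed)
```

Halving the allowed error r quadruples the required n, which mirrors the square-root relationship in the confidence interval width.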
Outline
• Measures of Central Tendency
• How to Summarize Variability?
• Comparing Systems Using Sample Data
• Comparing Two Alternatives
• Transient Removal
Transient Removal
• In many simulations, we are interested in steady-state performance
  • Remove the initial transient state
• However, defining exactly what constitutes the end of the transient state is difficult!
• Several heuristics have been developed:
  • Long runs
  • Proper initialization
  • Truncation
  • Initial data deletion
  • Moving average of replications
  • Batch means
Long Runs
• Use very long runs
• Impact of the transient state becomes negligible
• Wasteful use of resources
• How long is “long enough”?
• The Raj Jain text recommends that this method not be used in isolation
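Batch Means
• Run the simulation for a long duration
• Divide the observations (N) into m batches, each of size n
• Compute the variance of the batch means using the procedure shown, for n = 2, 3, 4, 5, …
  1) Compute the batch means:
     $$ \bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{ij}, \quad i = 1, 2, \ldots, m $$
  2) Compute the overall mean:
     $$ \bar{\bar{x}} = \frac{1}{m} \sum_{i=1}^{m} \bar{x}_i $$
  3) Compute the variance of the batch means:
     $$ \mathrm{Var}(\bar{x}) = \frac{1}{m-1} \sum_{i=1}^{m} (\bar{x}_i - \bar{\bar{x}})^2 $$
• Plot the variance vs. batch size
[Figure: variance of batch means versus batch size n, with an initial region marked “Ignore” and a “Transient interval” marked on the batch-size axis]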
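The three steps of the batch means procedure can be sketched as below. The function name, the tiny stand-in data set, and the choice to drop leftover observations that do not fill a complete batch are assumptions of this sketch.

```python
import statistics

def batch_means_variance(observations, n):
    """Variance of the means of consecutive batches of size n."""
    m = len(observations) // n   # number of complete batches; leftovers dropped
    # Step 1: mean of each batch.
    batch_means = [
        statistics.fmean(observations[i * n:(i + 1) * n]) for i in range(m)
    ]
    # Steps 2 and 3: statistics.variance computes the overall mean of the
    # batch means internally and divides the squared differences by (m - 1).
    return statistics.variance(batch_means)

observations = [1, 2, 3, 4, 5, 6, 7, 8]   # stand-in for simulation output
variance_by_batch_size = {
    n: batch_means_variance(observations, n) for n in (2, 3, 4)
}
print(variance_by_batch_size)
```

Plotting `variance_by_batch_size` against n for real simulation output gives the curve described on the slide.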