Statistical Data Analysis and
Simulation
João R. T. de Mello Neto
Jorge Andre Swieca School
Campos do Jordão, January 2003
Questions
• What is probability? How do we quantify it?
• What is the probability that something happens?
• What is the value of a given parameter?
• What is the uncertainty in a given parameter?
• Is this fit acceptable?
• What is the likelihood that a given signal is physics and not background?
• How does one separate signal from background?
Chance
The conception of chance enters into the very first
steps of scientific activity, in virtue of the fact that no
observation is absolutely correct.
Max Born
Natural Philosophy of Cause and Chance, p. 47
Chance is a devil and a god at the same time.
Machado de Assis
Lectures
• Basics: random variables, probability, distributions
• Random numbers, minimization techniques
• Maximum likelihood and chi-square methods
• Goodness of fit, limits
• Applications: pattern recognition in the LHCb muon system, sigma particle fitting in E791, Bayesian coin, …
First lecture
Basics: random variables,
probabilities and distributions
Jorge Andre Swieca School
Campos do Jordão, January 2003
References
• G. Cowan, Statistical Data Analysis, Oxford, 1998;
• R. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, J. Wiley & Sons, 1989;
• W. L. Martinez, A. R. Martinez, Computational Statistics Handbook with MATLAB, Chapman & Hall, 2002.
Random Variables
• Random experiment: the outcome cannot be predicted with certainty
  (errors in the measuring process; fundamental unpredictability)
• Statistics: model and analyze the outcomes
• Sample space S = set of all possible outcomes
• Die: X = {1, 2, 3, 4, 5, 6} → discrete random variable
• Period of a pendulum → continuous random variable
Probability
• Quantifies the degree of randomness;
• Definition in terms of set theory:
S composed of events A (subsets of S);
P(A): a real number satisfying three axioms:
1. for every A, P(A) ≥ 0
2. if A ∩ B = Ø (disjoint), then P(A ∪ B) = P(A) + P(B)
3. P(S) = 1
Consequences:
P(Ā) = 1 − P(A)
P(Ø) = 0
P(A ∪ Ā) = 1
if A ⊂ B, then P(A) ≤ P(B)
0 ≤ P(A) ≤ 1
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
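These consequences are easy to check by brute force. A minimal Monte Carlo sketch (the die and the events A, B are our own illustrative choices, not from the slides):

import random

random.seed(42)
N = 100_000
A = {1, 2, 3}          # example event: die shows 1, 2 or 3
B = {2, 4, 6}          # example event: die shows an even number

n_A = n_B = n_AB = n_AorB = 0
for _ in range(N):
    x = random.randint(1, 6)            # one throw of a fair die
    in_A, in_B = x in A, x in B
    n_A += in_A
    n_B += in_B
    n_AB += in_A and in_B
    n_AorB += in_A or in_B

# addition rule: P(A U B) = P(A) + P(B) - P(A n B) = 3/6 + 3/6 - 1/6 = 5/6
print(n_AorB / N)                       # ~ 0.833
print(n_A / N + n_B / N - n_AB / N)     # ~ 0.833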
Intuitive approach
[Venn diagram: sample space S with 10 elements; A contains 3, B contains 5, A ∩ B contains 2]
Conditional probability P(A|B): probability of event A given B
P(A) = 3/10,  P(B) = 5/10
P(A|B) = (events in A and B)/(events in B)
       = [(events in A and B)/total] / [(events in B)/total]
       = P(A ∩ B)/P(B) = 2/5
P(B|A) = P(B ∩ A)/P(A) = 2/3
P(A ∩ B) = P(B|A)P(A) = 2/3 × 3/10 = 2/10 = P(A|B)P(B)
Intuitive approach
Independent probabilities: A and B are independent if P(A|B) = P(A)
[first Venn diagram: S with 10 elements; A contains 3, B contains 5, A ∩ B contains 2]
P(A) = 3/10,  P(B) = 5/10,  P(A|B) = 2/5 ≠ P(A) → not independent
[second Venn diagram: S with 10 elements; A contains 5, B contains 4, A ∩ B contains 2]
P(A) = 5/10,  P(B) = 4/10,  P(A ∩ B) = 2/10
P(A|B) = (2/10)/(4/10) = 2/4 = P(A) → independent
Bayes Theorem
P(A|B) = P(A ∩ B)/P(B)
P(B|A) = P(B ∩ A)/P(A)
⇒ P(A|B)P(B) = P(B|A)P(A)
⇒ P(A|B) = P(B|A)P(A)/P(B)
If the Aᵢ are disjoint, Aᵢ ∩ Aⱼ = Ø for i ≠ j, and S = ∪ᵢ Aᵢ, then
B = B ∩ S = B ∩ (∪ᵢ Aᵢ) = ∪ᵢ (B ∩ Aᵢ)
P(B) = Σᵢ P(B ∩ Aᵢ) = Σᵢ P(B|Aᵢ)P(Aᵢ)   (law of total probability)
P(A|B) = P(B|A)P(A) / Σᵢ P(B|Aᵢ)P(Aᵢ)
Cherenkov counter
beam: 90% π, 10% K; signal efficiency for π: 95%; false signals (from K): 6%
P(π) = 0.9,  P(K) = 0.1
P(s|π) = 0.95,  P(s̄|π) = 0.05
P(s|K) = 0.06,  P(s̄|K) = 0.94
P(π|s) = P(s|π)P(π) / [P(s|π)P(π) + P(s|K)P(K)] = 99.3%
P(K|s) = 0.7%
P(K|s̄) = P(s̄|K)P(K) / [P(s̄|π)P(π) + P(s̄|K)P(K)] = 67.6%
P(π|s̄) = 32.4%
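The same arithmetic in a few lines of Python (a sketch; the variable names are ours, the numbers are from the slide):

# Bayes theorem for the Cherenkov counter
p_pi, p_K = 0.9, 0.1                  # priors: beam composition
p_s_pi, p_s_K = 0.95, 0.06            # P(s|pi), P(s|K): signal probabilities

p_s = p_s_pi * p_pi + p_s_K * p_K     # P(s) by the law of total probability
print(p_s_pi * p_pi / p_s)            # P(pi|s) ~ 0.993

p_ns_pi, p_ns_K = 1.0 - p_s_pi, 1.0 - p_s_K   # no-signal probabilities
p_ns = p_ns_pi * p_pi + p_ns_K * p_K
print(p_ns_K * p_K / p_ns)            # P(K|no signal) ~ 0.676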
AIDS positive
"About 0.01 percent of men with no known risk behaviour are infected with HIV (base rate). If such a man has the virus, there is a 99.9 percent chance that the test result will be positive (sensitivity). If a man is not infected, there is a 99.99 percent chance that the test result will be negative (specificity)."
What is the chance that a man who tests positive actually has the virus?
p(d) = 0.0001,  p(p|d) = 0.999,  p(n|d̄) = 0.9999
p(p|d̄) = 0.0001,  p(n|d) = 0.001
p(d|p) = p(p|d)p(d) / p(p) = p(p|d)p(d) / [p(p|d)p(d) + p(p|d̄)p(d̄)] ≈ 0.5
Reckoning with Risk, G. Gigerenzer, 2002
AIDS positive
natural frequencies (no known risk behaviour):
10 000 men
→ 1 with HIV: 1 positive, 0 negative
→ 9 999 without HIV: 1 positive, 9 998 negative
p(d|p) = 1/2
Many examples: in mammography screening, only 1 out of 10 positives!
Gigerenzer, 2002
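The natural-frequency picture can also be reproduced by simulation. A sketch (the sample size and seed are our choices):

import random

random.seed(1)
N = 1_000_000
base_rate   = 0.0001    # p(d): fraction of men infected
sensitivity = 0.999     # p(positive | infected)
specificity = 0.9999    # p(negative | not infected)

true_pos = false_pos = 0
for _ in range(N):
    if random.random() < base_rate:                # infected man
        true_pos += random.random() < sensitivity
    else:                                          # not infected
        false_pos += random.random() > specificity

# fraction of positives who actually have the virus:
print(true_pos / (true_pos + false_pos))   # ~ 0.5, up to statistical fluctuations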
Probability
What is the meaning of P(A)?
Frequentist: limit of relative frequencies
S: possible outcomes of an experiment (repeatable)
A: occurrence of a given outcome (event)
P(A) = lim_{n→∞} (number of occurrences of A in n measurements)/n
• consistent with the probability axioms
• the usual interpretation in standard textbooks
• appropriate to particle physics (many repeatable events)
• more problematic for unique phenomena: the big bang, rain tomorrow
Probability
Bayesian (subjective):
elements of S: hypotheses or propositions (true or false)
P(A) = degree of belief that hypothesis A is true
Hypothesis "a measurement will yield a given outcome a certain fraction of the time": subjective probabilities include the frequentist interpretation.
"m1 ≤ me ≤ m2 with P = 95%" is a Bayesian interpretation!
Bayesian statistics: interpretation of Bayes theorem
Probability
P(A|B) = P(B|A)P(A)/P(B)
A: a given theory is correct;
B: data will yield a particular result;
P(theory|data) = P(data|theory) P(theory) / P(data)
P(data|theory): likelihood;  P(theory): prior (a priori);  P(theory|data): posterior (a posteriori)
Distributions
x: continuous random variable
f(x): probability density function
probability to observe x in the interval [x, x + dx] = f(x)dx
normalization: ∫_S f(x)dx = 1
cumulative distribution function:
F(x) = ∫₋∞ˣ f(x′)dx′
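These definitions translate directly into numerics. A sketch with an exponential density (our choice of example) built on a grid:

import numpy as np

lam = 2.0                                # decay constant (our choice)
x = np.linspace(0.0, 10.0, 10_001)       # grid; the tail beyond 10 is negligible
f = lam * np.exp(-lam * x)               # exponential p.d.f. f(x)

# F(x) = integral of f from 0 up to x, via the trapezoidal rule
F = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))))

print(F[-1])     # normalization: ~ 1.0
print(F[1000])   # F(1) = 1 - exp(-2) ~ 0.8647  (x[1000] = 1.0)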
Distributions
joint p.d.f. f(x, y):
P(A ∩ B) = prob. of x in [x, x + dx] and y in [y, y + dy] = f(x, y) dx dy
marginal p.d.f.:
P(A) = [∫ f(x, y) dy] dx = f_x(x) dx,   f_x(x) = ∫ f(x, y) dy
conditional p.d.f.:
P(B|A) = P(A ∩ B)/P(A) = f(x, y) dx dy / [f_x(x) dx]
h(y|x) = f(x, y)/f_x(x) = f(x, y) / ∫ f(x, y) dy
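On a grid, the marginal becomes a sum over y and the conditional a ratio. A sketch with f(x, y) = x + y on the unit square (our example; it is properly normalized):

import numpy as np

# joint p.d.f. f(x, y) = x + y on [0,1]^2; its integral over the square is 1
y = np.linspace(0.0, 1.0, 1001)
dy = y[1] - y[0]
x0 = 0.3

f_joint = x0 + y                 # f(x0, y) along the slice x = x0
fx = np.sum(f_joint) * dy        # marginal f_x(x0) = integral of f dy
print(fx)                        # ~ x0 + 1/2 = 0.8 (analytic value)

h = f_joint / fx                 # conditional h(y|x0) = f(x0, y)/f_x(x0)
print(np.sum(h) * dy)            # h(y|x0) integrates to 1 in y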
Distributions
expectation value: E[x] = ∫ x f(x) dx = μ
population variance: V[x] = E[(x − E[x])²] = ∫ (x − μ)² f(x) dx = σ²
covariance: V_xy = E[(x − μ_x)(y − μ_y)] = E[xy] − μ_x μ_y = ∫∫ x y f(x, y) dx dy − μ_x μ_y
correlation coefficient: ρ_xy = V_xy/(σ_x σ_y)
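Sample-based estimates of these quantities (a sketch; the construction of the correlated pair is ours):

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# build a correlated pair from independent standard Gaussians:
# y = 0.6 u + 0.8 v gives sigma_y = 1 and rho_xy = 0.6 exactly
u, v = rng.standard_normal(n), rng.standard_normal(n)
x, y = u, 0.6 * u + 0.8 * v

print(np.mean(x * y) - np.mean(x) * np.mean(y))  # V_xy = E[xy] - mu_x mu_y ~ 0.6
print(np.corrcoef(x, y)[0, 1])                   # rho_xy = V_xy/(sigma_x sigma_y) ~ 0.6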
Distributions
binomial
• process with a given number N of identical trials, each with two possible outcomes: success (probability p), failure (probability 1 − p)
• what is the probability of n successes (and N − n failures)?
probability for a particular sequence: pⁿ(1 − p)^(N−n)
order does not matter: number of sequences = N!/(n!(N − n)!)
f(n; N, p) = [N!/(n!(N − n)!)] pⁿ(1 − p)^(N−n)
(a probability, not a probability density)
E[n] = Np
V[n] = E[n²] − (E[n])² = Np(1 − p)
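The p.m.f. and the moments check out numerically. A sketch with scipy (N, p, n are arbitrary example values):

import math
from scipy.stats import binom

N, p, n = 10, 0.3, 4

# f(n; N, p) = N!/(n!(N-n)!) p^n (1-p)^(N-n)
f = math.comb(N, n) * p**n * (1 - p)**(N - n)
print(f, binom.pmf(n, N, p))               # identical: ~ 0.2001

print(binom.mean(N, p), binom.var(N, p))   # E[n] = Np = 3.0, V[n] = Np(1-p) = 2.1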
binomial
[five chambers C1, C2, C3, C4, C5 along a track]
individual chamber efficiency: 0.95
track: at least 3 points
3 chambers: f(3; 3, 0.95) = 0.95³ = 0.857
4 chambers: f(3; 4, 0.95) + f(4; 4, 0.95) = 0.171 + 0.815 = 0.986
5 chambers: f(3; 5, 0.95) + f(4; 5, 0.95) + f(5; 5, 0.95) = 0.021 + 0.204 + 0.774 = 0.999
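A sketch reproducing these tracking efficiencies with scipy (the variable names are ours):

from scipy.stats import binom

eff = 0.95                       # single-chamber efficiency (from the slide)

for N in (3, 4, 5):
    # P(track) = P(n >= 3 hits) = sum over n = 3..N of f(n; N, eff)
    p_track = sum(binom.pmf(n, N, eff) for n in range(3, N + 1))
    print(N, round(p_track, 3))  # 0.857, 0.986, 0.999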
Poisson
binomial limit: N large, p very small, Np → ν
f(n; ν) = (νⁿ/n!) e^(−ν)
particular events, but no idea of the number of trials;
sharp events occurring in a continuum:
• Geiger counter near a radioactive source;
• number of flashes of lightning in a storm;
E[n] = ν,  V[n] = ν
Poisson
Proof: ν events expected in some interval; split the interval into N sections;
probability that a given section contains an event: p = ν/N
probability of n events in N sections:
f(n; ν/N, N) = [N!/(n!(N − n)!)] (ν/N)ⁿ (1 − ν/N)^(N−n)
N → ∞ with n finite:
N!/(N − n)! = N(N − 1)(N − 2)...(N − n + 1) → Nⁿ
(1 − ν/N)^(N−n) → (1 − ν/N)^N → e^(−ν)
⇒ f(n; ν) = (νⁿ/n!) e^(−ν)
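The limit can be watched numerically. A sketch comparing the binomial with fixed Np = ν to the Poisson value (ν and n are example values):

from scipy.stats import binom, poisson

nu, n = 2.0, 3
print(poisson.pmf(n, nu))              # (nu^n/n!) e^(-nu) ~ 0.1804

for N in (10, 100, 1000, 10_000):
    # binomial with p = nu/N approaches the Poisson value as N grows
    print(N, binom.pmf(n, N, nu / N))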
Poisson
Fatal horse kicks: number of Prussian soldiers kicked to death by horses. In ten different army corps, over 20 years, there were 122 deaths:
ν = 122 deaths / 200 (corps × years) = 0.610 per (corps × year)
no deaths: P(0; 0.61) = 0.5434 → expected number of (corps × years) with no deaths: 200 × 0.5434 = 108.7
one death: P(1; 0.61) = 0.3315 → expected number with one death: 200 × 0.3315 = 66.3

deaths per (corps × year):   0      1     2     3    4
actual number:              109     65    22    3    1
Poisson prediction:         108.7   66.3  20.2  4.1  0.6
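The whole table in a few lines (a sketch with scipy):

from scipy.stats import poisson

nu = 122 / 200                    # 0.61 deaths per (corps x year)
actual = [109, 65, 22, 3, 1]      # observed number of (corps x years) with n deaths

for n_deaths, obs in enumerate(actual):
    expected = 200 * poisson.pmf(n_deaths, nu)
    print(n_deaths, obs, round(expected, 1))   # 108.7, 66.3, 20.2, 4.1, 0.6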
Gaussian
f(x; μ, σ) = [1/√(2πσ²)] exp(−(x − μ)²/(2σ²))
E[x] = μ,  V[x] = σ²
standard Gaussian (μ = 0, σ = 1):
φ(x) = (1/√(2π)) exp(−x²/2)
cumulative: Φ(x) = ∫₋∞ˣ φ(x′)dx′, evaluated numerically
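In practice φ and Φ come from a library. A sketch with scipy.stats.norm:

from scipy.stats import norm

print(norm.pdf(0.0))                        # phi(0) = 1/sqrt(2 pi) ~ 0.3989
print(norm.cdf(1.0))                        # Phi(1) ~ 0.8413, evaluated numerically
print(norm.cdf(1.0) - norm.cdf(-1.0))       # P(|x| < 1) ~ 0.683: the 1-sigma band

mu, sigma = 10.0, 2.0                       # general Gaussian via loc and scale
print(norm.pdf(12.0, loc=mu, scale=sigma))  # f(12; mu = 10, sigma = 2)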
Gaussian
in N dimensions (x and μ column vectors):
f(x; μ, V) = 1/[(2π)^(N/2) |V|^(1/2)] × exp[−½ (x − μ)ᵀ V⁻¹ (x − μ)]
V: symmetric N×N covariance matrix, with N(N + 1)/2 independent parameters
E[xᵢ] = μᵢ,  V[xᵢ] = Vᵢᵢ,  cov[xᵢ, xⱼ] = Vᵢⱼ
in 2 dimensions, with V₁₂ = ρσ₁σ₂:
f(x₁, x₂; μ₁, μ₂, σ₁, σ₂, ρ) = 1/[2πσ₁σ₂√(1 − ρ²)] ×
exp{ −1/[2(1 − ρ²)] [ ((x₁ − μ₁)/σ₁)² + ((x₂ − μ₂)/σ₂)² − 2ρ((x₁ − μ₁)/σ₁)((x₂ − μ₂)/σ₂) ] }
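A sketch sampling the 2-dimensional case and recovering V and ρ (the numerical values are our example):

import numpy as np

rng = np.random.default_rng(0)

s1, s2, rho = 1.0, 2.0, 0.8            # example sigma_1, sigma_2, rho
mu = np.array([0.0, 0.0])
V = np.array([[s1**2,         rho * s1 * s2],
              [rho * s1 * s2, s2**2        ]])   # covariance matrix, V_12 = rho s1 s2

x = rng.multivariate_normal(mu, V, size=100_000)
print(np.cov(x, rowvar=False))              # ~ V
print(np.corrcoef(x, rowvar=False)[0, 1])   # ~ rho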
Central limit theorem
the sum of N independent continuous random variables xᵢ with means μᵢ and variances σᵢ² becomes, in the limit N → ∞, a Gaussian random variable with
μ = Σᵢ₌₁ᴺ μᵢ  and  σ² = Σᵢ₌₁ᴺ σᵢ²
regardless of the form of the individual p.d.f.s of the xᵢ.
Formal justification for treating measurement errors as Gaussian random variables:
total error = sum of a large number of small contributions
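A quick demonstration: the sum of N = 12 uniform variables (μᵢ = 1/2, σᵢ² = 1/12) is already very close to a Gaussian with μ = 6 and σ² = 1 (a sketch; 12 is the classic choice because it makes σ² exactly 1):

import numpy as np

rng = np.random.default_rng(0)
N = 12                                     # number of summed variables

# each uniform(0,1) has mu_i = 1/2 and sigma_i^2 = 1/12
s = rng.random((100_000, N)).sum(axis=1)   # 100 000 sums of N uniforms

print(s.mean())   # ~ mu = N * 1/2 = 6.0
print(s.var())    # ~ sigma^2 = N * 1/12 = 1.0
# a histogram of s is visually indistinguishable from a Gaussian(6, 1)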
Central limit theorem
Actually used in practice: algorithm R632, CERN library.