Statistics for HEP
Roger Barlow
Manchester University
Lecture 2: Distributions
Slide 1
The Binomial
[Figure: binomial distribution P(r) for r = 0 to 5]
n trials, r successes; individual success probability p (write q ≡ 1−p)

P(r; n, p) = [n! / (r!(n−r)!)] p^r (1−p)^(n−r)

Mean: μ = <r> = Σ r P(r) = np
Variance: V = <(r−μ)²> = <r²> − <r>² = np(1−p)

Met with in efficiency/acceptance calculations
Slide 2
Binomial Examples
[Figure: binomial distributions of r; top row p = 0.1 with n = 5, 20, 50;
bottom row n = 10 with p = 0.2, 0.5, 0.8]
Slide 3
Poisson
[Figure: Poisson distribution of r for λ = 2.5]
'Events in a continuum', e.g. Geiger counter clicks.
Mean rate λ in a time interval gives the number of events in the data.

P(r; λ) = e^(−λ) λ^r / r!

Mean: μ = <r> = Σ r P(r) = λ
Variance: V = <(r−μ)²> = <r²> − <r>² = λ
Slide 4
Poisson Examples
[Figure: Poisson distributions of r for λ = 0.5, 1.0, 2.0, 5.0, 10, 25]
Slide 5
Binomial and Poisson
From an exam paper:
A student is standing by the road, hoping to hitch a lift. Cars pass
according to a Poisson distribution with a mean frequency of 1 per
minute. The probability of an individual car giving a lift is 1%.
Calculate the probability that the student is still waiting for a lift
(a) after 60 cars have passed
(b) after 1 hour
a) 0.99^60 = 0.5472
b) e^(−0.6) = 0.5488
Slide 6
Poisson as approximate Binomial
[Figure: Poisson with mean 2 overlaid on Binomial p = 0.1, 20 tries,
and Binomial p = 0.01, 200 tries]
Use: MC simulation (Binomial) of real data (Poisson)
Slide 7
Two Poissons
2 Poisson sources, means λ₁ and λ₂. Combine samples:
e.g. leptonic and hadronic decays of the W,
forward and backward muon pairs,
tracks that trigger and tracks that don't.
What you get is a convolution:
P(r) = Σ_r' P(r'; λ₁) P(r−r'; λ₂)
Turns out this is also a Poisson, with mean λ₁+λ₂.
Avoids lots of worry.
Slide 8
The Gaussian
Probability density:

P(x; μ, σ) = (1 / (σ√(2π))) e^(−(x−μ)² / 2σ²)

Mean: <x> = ∫ x P(x) dx = μ
Variance: V = <(x−μ)²> = <x²> − <x>² = σ²
Slide 9
Different Gaussians?
There's only one!
Width scaling factor σ
Location change μ
Normalisation 1/(σ√(2π)) (if required)
Falls to 1/e of the peak at x = μ ± √2 σ
Slide 10
Probability Contents
68.27% within 1σ
95.45% within 2σ
99.73% within 3σ
90% within 1.645σ
95% within 1.960σ
99% within 2.576σ
99.9% within 3.290σ
These numbers apply to Gaussians, and only Gaussians.
Other distributions have equivalent values which you could use if you wanted.
Slide 11
Central Limit Theorem
Or: why is the Gaussian "normal"?
If a variable x is produced by the convolution of variables x₁, x₂ … x_N:
I) <x> = μ₁ + μ₂ + … + μ_N
II) V(x) = V₁ + V₂ + … + V_N
III) P(x) becomes Gaussian for large N
There were hints in the Binomial and Poisson examples.
Slide 12
CLT demonstration
Convolute Uniform distribution with itself
Slide 13
CLT Proof (1): Characteristic functions
Given P(x), consider
<e^(ikx)> = ∫ e^(ikx) P(x) dx ≡ P̃(k)
the Characteristic Function (CF).
For convolutions, CFs multiply:
if f(x) = g(x) ⊗ h(x) then f̃(k) = g̃(k) h̃(k)
so logs of CFs add.
Slide 14
CLT Proof (2): Cumulants
The CF is a power series in k:
<1> + <ikx> + <(ikx)²/2!> + <(ikx)³/3!> + …
= 1 + ik<x> − k²<x²>/2! − ik³<x³>/3! + …
Ln CF can then be expanded as a series:
ikK₁ + (ik)²K₂/2! + (ik)³K₃/3! + …
The K_r are the "semi-invariant cumulants of Thiele"; K_r carries total power x^r.
If x → x+a then only K₁ → K₁+a.
If x → bx then each K_r → b^r K_r.
Slide 15
CLT Proof (3)
• The FT of a Gaussian is a Gaussian:
e^(−x²/2σ²) → e^(−k²σ²/2)
Taking logs gives a power series that stops at k²: K_r = 0 for r > 2 defines a Gaussian.
• Self-convolute anything n times: K_r′ = n K_r.
Need to normalise: divide by √n, so K_r″ = n^(−r/2) K_r′ = n^(1−r/2) K_r.
• Higher cumulants die away faster.
If the distributions are not identical but similar, the same argument applies.
Slide 16
CLT in real life
Examples:
• Height
• Simple measurements
• Student final marks
Counterexamples:
• Weight
• Wealth
• Student entry grades
Slide 17
Multidimensional Gaussian
Uncorrelated:

P(x, y; μ_x, μ_y, σ_x, σ_y) = (1 / (2π σ_x σ_y)) e^(−(x−μ_x)²/2σ_x²) e^(−(y−μ_y)²/2σ_y²)

With correlation ρ:

P(x, y; μ_x, μ_y, σ_x, σ_y, ρ) = (1 / (2π σ_x σ_y √(1−ρ²)))
× exp{ −[ (x−μ_x)²/σ_x² + (y−μ_y)²/σ_y² − 2ρ(x−μ_x)(y−μ_y)/(σ_x σ_y) ] / (2(1−ρ²)) }
Slide 18
Chi squared

χ² = Σ_{i=1}^{n} ((x_i − μ_i)/σ_i)²

Sum of squared discrepancies, scaled by the expected error.
(Integrate all but one dimension of a multi-D Gaussian.)

P(χ²; n) = (χ²)^(n/2 − 1) e^(−χ²/2) / (2^(n/2) Γ(n/2))

Mean n, Variance 2n. The CLT is slow to operate here.
Slide 19
Generating Distributions
Given int rand() in stdlib.h:

float Random()                      // uniform in [0,1]
{return ((float)rand())/RAND_MAX;}

float uniform(float lo, float hi)   // uniform in [lo,hi]
{return lo+Random()*(hi-lo);}

float life(float tau)               // exponential decay time, mean tau
{return -tau*log(Random());}        // beware: log(0) if Random() returns exactly 0

float ugauss()
// really crude. Do not use
{float x=0; for(int i=0;i<12;i++) x+=Random();
 return x-6;}

float gauss(float mu, float sigma)
{return mu+sigma*ugauss();}
Slide 20
A better Gaussian Generator
float ugauss(){                     // Box-Muller, polar form
static bool igot=false;
static float got;
if(igot){igot=false; return got;}
float phi=uniform(0.0F,2*M_PI);
float r=sqrt(2*life(1.0F));         // r*r/2 is exponential with mean 1
igot=true;
got=r*cos(phi);
return r*sin(phi);}
Slide 21
More complicated functions
[Figure: hit-and-miss generation under a histogrammed curve]
Find P0 = max[P(x)]. Overestimate if in doubt.
Repeat:
  Repeat:
    Generate random x
    Find P(x)
    Generate random P in range 0 to P0
  till P(x) > P
till you have enough data
If necessary make x non-random and compensate.
Slide 22
Other distributions
Uniform (top hat): σ = width/√12
Breit-Wigner (Cauchy): has no variance. Useful for wide tails.
Landau: has no variance or mean; not given by a simple formula. Use CERNLIB.
Slide 23
Functions you need to know
• Gaussian / Chi squared
• Poisson
• Binomial
• Everything else
Slide 24