Stats Review Lecture 3 - Random Variables 08.29.12

Discrete Random Variable
• Let X denote the return of the S&P 500
tomorrow, rounded to the nearest percent
• What are the possibilities, e.g. 0%, 1%, …?
• What is the probability of each of these
possibilities?
• Probability distribution function:
f(x) = P(X=x)
Probability Distribution
[Figure: bar chart of probability (vertical axis, 0.00 to 0.35) against return (horizontal axis, -4 to 4)]
Cumulative Distribution Function
[Figure: CDF (vertical axis, 0.00 to 1.20) against return (horizontal axis, -6 to 6)]
Discrete Random Variable
• Expectation: $E[X] = \sum_x x\, f(x)$
• Variance: $\mathrm{Var}(X) = E[(X - E[X])^2] = \sum_x (x - E[X])^2 f(x)$
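The expectation and variance of a discrete r.v. can be computed directly from a table of outcomes and probabilities. The following is a minimal sketch, assuming a hypothetical pmf for the rounded return (the probabilities are illustrative and not taken from the lecture). It uses E[X] = sum of x f(x) and Var(X) = sum of (x - E[X])^2 f(x).

```python
# Hypothetical pmf for tomorrow's rounded return, in percent.
# These probabilities are illustrative only, not lecture data.
pmf = {-2: 0.05, -1: 0.20, 0: 0.40, 1: 0.25, 2: 0.10}

# A valid pmf must sum to one.
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# E[X] = sum over x of x * f(x)
mean = sum(x * p for x, p in pmf.items())

# Var(X) = sum over x of (x - E[X])^2 * f(x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```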
Which Distribution Has Higher Mean?
[Figure: two discrete probability distributions, each plotted over values -4 to 4 (vertical axes, 0.00 to 0.25)]
Which Distribution Has Higher Variance?
[Figure: two discrete probability distributions, each plotted over values -4 to 4 (vertical axes, 0.00 to 0.40 and 0.00 to 0.25)]
Expectation of a Function of a R.V.
• Function g(X):
– What is the expectation E(g(X))?
• General result: $E[g(X)] = \sum_x g(x)\, f(x)$
• Example – call option on the S&P 500
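The call-option example can be sketched as follows, assuming a hypothetical pmf for the return and a hypothetical strike of 0 (neither comes from the lecture). The payoff g(x) = max(x - K, 0) is averaged over the pmf, which is generally not the same as applying g to the mean.

```python
# Hypothetical pmf for the rounded return (percent); illustrative only.
pmf = {-2: 0.05, -1: 0.20, 0: 0.40, 1: 0.25, 2: 0.10}
strike = 0  # hypothetical strike, in the same units as the return


def call_payoff(x, k):
    """Payoff of a call option: max(x - k, 0)."""
    return max(x - k, 0)


# E[g(X)] = sum over x of g(x) * f(x)
expected_payoff = sum(call_payoff(x, strike) * p for x, p in pmf.items())

# For comparison: g(E[X]), which is a different quantity.
mean = sum(x * p for x, p in pmf.items())
payoff_of_mean = call_payoff(mean, strike)

print(f"E[g(X)] = {expected_payoff:.4f}, g(E[X]) = {payoff_of_mean:.4f}")
```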
Binomial Distribution
• Bernoulli distribution
– A r.v. X has two possible outcomes, 0 or 1
• Binomial distribution
– Number of successes that occur in n trials
• Example: Ch. 4, 6b
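As a rough illustration of the binomial distribution, the sketch below evaluates the standard pmf P(X = k) = C(n, k) p^k (1 - p)^(n - k) and checks its mean np and variance np(1 - p). The values n = 10 and p = 0.3 are hypothetical; the Bernoulli case is n = 1.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for the number of successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # hypothetical number of trials and success probability

probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(probs)
mean = sum(k * q for k, q in enumerate(probs))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(probs))

# With n = 1 the binomial reduces to the Bernoulli distribution.
bernoulli_success = binomial_pmf(1, 1, p)

print(f"total = {total:.6f}, mean = {mean:.4f}, variance = {variance:.4f}")
```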
Poisson
• A r.v. X takes on values 0, 1, 2, ....
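• X has a Poisson distribution if, for some $\lambda > 0$,
$P(X = i) = e^{-\lambda} \lambda^i / i!$, i = 0, 1, 2, ...
• The Poisson r.v. approximates the binomial when n is
large and p is small, with $\lambda = np$.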
• Example: how many days in a year will the
S&P500 drop more than 1%?
• Example 7b
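The quality of the Poisson approximation can be checked numerically. This is a minimal sketch for the "days in a year" example, assuming 250 trading days and a hypothetical 4% chance of a drop of more than 1% on any given day (both numbers are assumptions, not lecture data). It compares the binomial pmf with the Poisson pmf using the rate np.

```python
from math import comb, exp, lgamma, log


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(i, lam):
    # exp(-lam) * lam**i / i!, computed on the log scale so that
    # large factorials do not overflow.
    return exp(i * log(lam) - lam - lgamma(i + 1))


n, p = 250, 0.04  # hypothetical trading days and daily drop probability
lam = n * p       # Poisson rate matched to the binomial mean

# Largest pointwise difference between the two pmfs.
max_gap = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1)
)

for k in (5, 10, 15):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
print("largest gap:", max_gap)
```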
Geometric
• Independent trials with prob. of success p
– How many trials until a success occurs?
• What happens when n goes to infinity?
• Example: how many days until we get a
stock market drop of 2% or more?
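A minimal sketch of the geometric distribution, assuming a hypothetical 5% daily probability of a drop of 2% or more. With X the number of trials up to and including the first success, P(X = n) = (1 - p)^(n - 1) p. The partial sums approach 1 as n grows, and the mean is 1/p.

```python
def geometric_pmf(n, p):
    """P(first success occurs on trial n), for n = 1, 2, 3, ..."""
    return (1 - p) ** (n - 1) * p


p = 0.05  # hypothetical daily probability of a 2%+ drop

# Partial sums of the pmf approach 1 as the number of terms grows.
partial_sum = sum(geometric_pmf(n, p) for n in range(1, 501))

# Truncated numerical mean, compared with the closed form 1/p.
mean_numeric = sum(n * geometric_pmf(n, p) for n in range(1, 2001))
mean_closed_form = 1 / p

print(f"partial sum = {partial_sum:.10f}")
print(f"mean: numeric = {mean_numeric:.4f}, 1/p = {mean_closed_form:.4f}")
```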
Negative Binomial
• Independent trials with prob. of success p
– How many trials until r successes occur?
• What happens when n goes to infinity?
• Example: how many days until we get three
stock market drops of 2% or more (not
necessarily consecutive)?
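The three-drops example can be sketched with the negative binomial pmf P(X = n) = C(n - 1, r - 1) p^r (1 - p)^(n - r) for n = r, r + 1, ..., whose mean is r/p. The 5% daily probability is a hypothetical value, not lecture data.

```python
from math import comb


def negative_binomial_pmf(n, r, p):
    """P(the r-th success occurs on trial n), for n >= r."""
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)


r, p = 3, 0.05  # three drops; hypothetical daily drop probability

trials = range(r, 3001)  # truncation point chosen so the tail is negligible
total = sum(negative_binomial_pmf(n, r, p) for n in trials)
mean_numeric = sum(n * negative_binomial_pmf(n, r, p) for n in trials)

print(f"total = {total:.10f}, mean = {mean_numeric:.4f}, r/p = {r / p:.4f}")
```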
Hyper-Geometric
• Choose n balls out of N, without replacement
– m white, N – m black
– X = number of white balls selected
• Example 8i
• What happens if you choose the n balls with
replacement?
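A short sketch of the hypergeometric pmf, P(X = i) = C(m, i) C(N - m, n - i) / C(N, n), with hypothetical values N = 20, m = 8, n = 5. If the balls were drawn with replacement, the draws would be independent and X would instead be binomial with parameters n and m/N; the sketch prints both for comparison.

```python
from math import comb


def hypergeometric_pmf(i, n, m, N):
    """P(i white balls) when drawing n of N balls without replacement."""
    return comb(m, i) * comb(N - m, n - i) / comb(N, n)


def binomial_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p) ** (n - i)


N, m, n = 20, 8, 5  # hypothetical urn: 8 white, 12 black, draw 5

support = range(max(0, n - (N - m)), min(n, m) + 1)
total = sum(hypergeometric_pmf(i, n, m, N) for i in support)
mean = sum(i * hypergeometric_pmf(i, n, m, N) for i in support)

for i in support:
    without = hypergeometric_pmf(i, n, m, N)
    with_repl = binomial_pmf(i, n, m / N)
    print(i, round(without, 4), round(with_repl, 4))
```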
Continuous Random Variable
• Let X denote the return of the S&P 500
tomorrow, no rounding
• What are the possibilities?
• What is the probability of each of these
possibilities?
• Probability density function f(x):
$P(a \le X \le b) = \int_a^b f(x)\, dx$
Probability Density Function
[Figure: pdf (vertical axis, 0.00 to 0.35) against return (horizontal axis, -4 to 4)]
Cumulative Distribution Function
[Figure: cdf (vertical axis, 0.00 to 1.20) against return (horizontal axis, -4 to 4)]
Continuous Random Variable
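• Expectation: $E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx$
• Variance: $\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - E[X])^2 f(x)\, dx$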
• Example Ch5, 1a, 1b, 2a
Continuous Random Variable
• For any real-valued function g and
continuous r.v. X with pdf f:
$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx$
• Example: payoff on a call option, 2b
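For the call-option payoff, the expectation of g(X) = max(X - K, 0) can be approximated by numerical integration. The sketch below assumes, purely for illustration, that X is normal with mean 0 and standard deviation 1 and that the strike is 0; it compares a midpoint-rule integral with the known closed form for a normal X, (mu - K) Phi(d) + sigma phi(d) with d = (mu - K) / sigma.

```python
from statistics import NormalDist

mu, sigma, strike = 0.0, 1.0, 0.0  # hypothetical parameters
dist = NormalDist(mu, sigma)


def call_payoff(x):
    return max(x - strike, 0.0)


# Midpoint rule for the integral of g(x) f(x) over mu +/- 10 sigma.
lower, upper, steps = mu - 10 * sigma, mu + 10 * sigma, 20000
width = (upper - lower) / steps
numeric = 0.0
for i in range(steps):
    x = lower + (i + 0.5) * width
    numeric += call_payoff(x) * dist.pdf(x) * width

# Closed form for a normally distributed X.
std_normal = NormalDist()
d = (mu - strike) / sigma
closed_form = (mu - strike) * std_normal.cdf(d) + sigma * std_normal.pdf(d)

print(f"numeric = {numeric:.6f}, closed form = {closed_form:.6f}")
```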
Which Distribution Has Higher Mean?
[Figure: two continuous densities, each plotted over -4 to 4 (vertical axes, 0.00 to 0.25 and 0.00 to 0.40)]
Which Distribution Has Higher Mean?
[Figure: two continuous densities, each plotted over -4 to 4 (vertical axes, 0.00 to 0.60 and 0.00 to 0.40)]
Which Distribution Has Higher Variance?
[Figure: two continuous densities, each plotted over -4 to 4 (vertical axes, 0.00 to 0.40 and 0.00 to 0.30)]
Skewness
[Figure: two densities, each plotted over -4 to 4 (vertical axes, 0.00 to 0.40)]
Kurtosis
Additional Sample Questions
• Given a discrete probability function (pdf) (i.e., all
possible outcomes and their probabilities),
compute the mean and the variance
• Given a graph of several discrete or continuous
pdfs, estimate which one has the highest mean,
variance, skewness, or kurtosis
• Given two random variables, guess whether they
have positive or negative covariance and/or
correlation
The Uniform Distribution
[Figure: uniform density (vertical axis, 0.00 to 0.15) plotted over -4 to 4]
Example 3b
The Normal Distribution
[Figure: normal density (vertical axis, 0.00 to 0.50) plotted over -4 to 4]
The Normal Distribution
Example Ch 5, 4b, 4e
Properties of the Normal
Distribution
[Figure: x~N[.5,1] and 1+.5x, plotted over -4 to 4 (vertical axis, -0.50 to 2.00)]
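One property this slide appears to illustrate is that a linear transformation of a normal r.v. is again normal: if X ~ N(mu, sigma^2), then a + bX ~ N(a + b mu, b^2 sigma^2). A minimal sketch using the standard library's `NormalDist`, which takes the standard deviation as its second argument (it is assumed here that the second entry of N[.5,1] is the variance, which equals the standard deviation when both are 1):

```python
from statistics import NormalDist

# x ~ N[.5, 1] as on the slide.
x = NormalDist(mu=0.5, sigma=1.0)

# Scaling and shifting by constants yields another normal distribution.
y = 1 + 0.5 * x

print(f"mean = {y.mean}, variance = {y.variance}")
```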
Normal is an Approximation to
Binomial
• Sn = number of successes in n independent
trials with individual prob. of success p.
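• The DeMoivre-Laplace limit theorem: for any a < b, as n → ∞,
$P\left(a \le \frac{S_n - np}{\sqrt{np(1-p)}} \le b\right) \to \Phi(b) - \Phi(a)$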
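The normal approximation can be checked against the exact binomial probability. The sketch below uses hypothetical values n = 100 and p = 0.5 and computes P(45 <= Sn <= 55) both ways. The continuity correction of 0.5 is a common refinement added here as an assumption; it is not stated on the slide.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5        # hypothetical trials and success probability
low, high = 45, 55     # hypothetical range of interest

# Exact binomial probability.
exact = sum(
    comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(low, high + 1)
)

# Normal approximation with mean np and standard deviation sqrt(np(1-p)).
mean = n * p
sd = sqrt(n * p * (1 - p))
z = NormalDist()
approx = z.cdf((high + 0.5 - mean) / sd) - z.cdf((low - 0.5 - mean) / sd)

print(f"exact = {exact:.5f}, normal approximation = {approx:.5f}")
```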
Normal is an Approximation to
Binomial
Lognormal Distribution
• What is the distribution of the S&P 500
index tomorrow?
• If the return on the S&P500 is normally
distributed, the index itself is lognormally
distributed
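A minimal sketch of this relationship, assuming the normally distributed return is the continuously compounded (log) return R, so that the index tomorrow is today's index times exp(R). The index level and the return parameters are hypothetical. The lognormal cdf follows from the normal cdf of the log return, the median is S0 exp(mu), and the mean is S0 exp(mu + sigma^2 / 2).

```python
from math import exp, log
from statistics import NormalDist

index_today = 1400.0        # hypothetical index level
mu, sigma = 0.0005, 0.01    # hypothetical daily log-return mean and sd
log_return = NormalDist(mu, sigma)


def index_cdf(level):
    """P(index tomorrow <= level) when index = index_today * exp(R)."""
    return log_return.cdf(log(level / index_today))


median_index = index_today * exp(mu)
mean_index = index_today * exp(mu + sigma**2 / 2)

print(f"median = {median_index:.2f}, mean = {mean_index:.2f}")
print(f"P(index <= today's level) = {index_cdf(index_today):.4f}")
```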
Lognormal Distribution
Chi-squared Distribution
• Sum of squared standard normal variables
F distribution
• Ratio of two independent chi-squared
variables, each divided by its degrees of
freedom, n1 and n2
t distribution
• Very important for hypothesis testing
Normal vs. t distribution
Exponential Distribution
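• PDF: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (and 0 otherwise), with $\lambda > 0$
• CDF: $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$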
• Exercise:
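As a quick numerical check of the exponential distribution, the sketch below codes the standard pdf lambda exp(-lambda x) and cdf 1 - exp(-lambda x) for x >= 0, then confirms that integrating the pdf from 0 to 2 reproduces the cdf at 2. The rate of 0.5 is a hypothetical value.

```python
from math import exp

lam = 0.5  # hypothetical rate parameter


def exponential_pdf(x):
    return lam * exp(-lam * x) if x >= 0 else 0.0


def exponential_cdf(x):
    return 1 - exp(-lam * x) if x >= 0 else 0.0


# Midpoint-rule integral of the pdf over [0, 2].
steps = 10000
width = 2.0 / steps
area = sum(exponential_pdf((i + 0.5) * width) for i in range(steps)) * width

print(f"integral of pdf = {area:.6f}, cdf(2) = {exponential_cdf(2.0):.6f}")
```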
Joint Distributions of R. V.
• Joint probability distribution function:
f(x,y) = P(X=x, Y=y)
• Example Ch 6, 1c, 1d
Independence
• Two variables are independent if, for any
two sets of real numbers A and B,
$P(X \in A, Y \in B) = P(X \in A)\, P(Y \in B)$
• Operationally: two variables are independent
iff their joint pdf can be “separated” for any
x and y: $f(x, y) = f_X(x)\, f_Y(y)$
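The operational test can be sketched for discrete variables: compute the marginals from the joint pmf and check whether the joint equals their product at every pair (x, y). Both joint tables below are hypothetical, and `is_independent` is an illustrative helper, not a library function.

```python
def marginals(joint):
    """Marginal pmfs of X and Y from a joint pmf keyed by (x, y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def is_independent(joint, tol=1e-12):
    """True if f(x, y) = f_X(x) * f_Y(y) for every pair (x, y)."""
    px, py = marginals(joint)
    return all(
        abs(joint.get((x, y), 0.0) - px[x] * py[y]) <= tol
        for x in px
        for y in py
    )


# Hypothetical joint pmfs: the first factorizes, the second does not.
independent_joint = {(0, 0): 0.12, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.42}
dependent_joint = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}

print(is_independent(independent_joint), is_independent(dependent_joint))
```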
Joint Distributions of R. V.
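• The expectation of a sum equals the sum of
the expectations: $E[X + Y] = E[X] + E[Y]$
• The variance of a sum is more complicated:
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$
• If independent, then the variance of a sum
equals the sum of the variances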
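The variance of a sum can be verified on a small example. This sketch uses a hypothetical joint pmf in which X and Y are positively related, computes Var(X + Y) directly, and compares it with Var(X) + Var(Y) + 2 Cov(X, Y).

```python
# Hypothetical joint pmf of (X, Y); illustrative only.
joint = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}


def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Variance of the sum, computed directly from its definition.
var_sum = expect(lambda x, y: (x + y - mean_x - mean_y) ** 2)

print(f"Var(X+Y) = {var_sum:.4f}")
print(f"Var(X) + Var(Y) + 2 Cov = {var_x + var_y + 2 * cov_xy:.4f}")
```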
Sum of Normally Distributed RV
[Figure: densities of x~N[.5,1], y~N[1,1], and x+y~N[1.5,2], plotted over -4 to 4 (vertical axis, 0.00 to 0.50)]
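The sum shown on this slide can be reproduced with the standard library's `NormalDist`, which supports adding two independent normal variables. Independence of x and y is an assumption here; it is consistent with the variances on the slide adding up to 2. The second entry of N[.,.] is read as the variance, while `NormalDist` takes the standard deviation.

```python
from statistics import NormalDist

x = NormalDist(mu=0.5, sigma=1.0)  # x ~ N[.5, 1]
y = NormalDist(mu=1.0, sigma=1.0)  # y ~ N[1, 1]

# Sum of two independent normals: means add and variances add.
total = x + y

print(f"mean = {total.mean}, variance = {total.variance:.4f}")
```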
Additional Sample Questions
• Find the distribution of a transformation of
two or more normal random variables
• By looking at a graph of a pdf, guess
whether it is normal, log-normal, or t-distribution
• What normally distributed random variables
do you need to construct an F distribution
with 3 and 5 degrees of freedom?
Conditional Distributions
(Discrete)
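• For any two events, E and F, with P(F) > 0:
$P(E \mid F) = P(EF) / P(F)$
• Conditional pdf: $p_{X|Y}(x \mid y) = P(X = x \mid Y = y) = p(x, y) / p_Y(y)$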
• Examples Ch 6, 4a, 4b
Conditional Distributions
(Discrete)
• Conditional cdf: $F_{X|Y}(x \mid y) = P(X \le x \mid Y = y) = \sum_{a \le x} P(X = a \mid Y = y)$
Conditional Distributions
(Discrete)
• Example: what is the probability that the
TSX is up, conditional on the S&P500
being up?
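The TSX example can be sketched with a joint table of up/down outcomes. The joint probabilities below are hypothetical, chosen only to illustrate the calculation P(TSX up | S&P up) = P(both up) / P(S&P up).

```python
# Hypothetical joint probabilities, keyed by (S&P 500 move, TSX move).
joint = {
    ("up", "up"): 0.45,
    ("up", "down"): 0.10,
    ("down", "up"): 0.15,
    ("down", "down"): 0.30,
}

# Marginal probability that the S&P 500 is up.
p_sp_up = sum(p for (sp, _), p in joint.items() if sp == "up")

# Conditional probability: joint probability divided by the marginal.
p_tsx_up_given_sp_up = joint[("up", "up")] / p_sp_up

# Unconditional probability that the TSX is up, for comparison.
p_tsx_up = sum(p for (_, tsx), p in joint.items() if tsx == "up")

print(f"P(TSX up | S&P up) = {p_tsx_up_given_sp_up:.4f}")
print(f"P(TSX up) = {p_tsx_up:.4f}")
```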
Conditional Distributions
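(Continuous)
• Conditional pdf: $f_{X|Y}(x \mid y) = f(x, y) / f_Y(y)$
• Conditional cdf: $F_{X|Y}(x \mid y) = P(X \le x \mid Y = y) = \int_{-\infty}^{x} f_{X|Y}(t \mid y)\, dt$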
• Example 5b
Conditional Distributions
(Continuous)
• Example: what is the probability that the
TSX is up, conditional on the S&P500
being up 3%?
Joint PDF of Functions of R.V.
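• $f_{X_1,X_2}(x_1, x_2)$ = joint pdf of X1 and X2
• Equations $y_1 = g_1(x_1, x_2)$ and $y_2 = g_2(x_1, x_2)$
can be uniquely solved for $x_1$ and $x_2$, with solutions
given by: $x_1 = h_1(y_1, y_2)$ and $x_2 = h_2(y_1, y_2)$
• The functions $g_1$ and $g_2$ have continuous
partial derivatives, with nonzero Jacobian:
$J(x_1, x_2) = \frac{\partial g_1}{\partial x_1}\frac{\partial g_2}{\partial x_2} - \frac{\partial g_1}{\partial x_2}\frac{\partial g_2}{\partial x_1} \ne 0$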
Joint PDF of Functions of R.V.
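• Under the conditions on the previous slide, the joint pdf
of $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$ is (eq. 7.1, p275):
$f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(x_1, x_2)\, |J(x_1, x_2)|^{-1}$,
where $(x_1, x_2)$ is the unique solution of $y_1 = g_1(x_1, x_2)$, $y_2 = g_2(x_1, x_2)$
and J is the Jacobian determinant of $(g_1, g_2)$ with respect to $(x_1, x_2)$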
• Example: You manage two portfolios of
TSX and S&P500:
– Portfolio 1: 50% in each
– Portfolio 2: 10% TSX, 90% S&P 500
• What is the probability that both of those
portfolios experience a loss tomorrow?
Joint PDF of Functions of R.V.
• Example 7a – uniform and normal cases
Estimation
• Given limited data we make educated
guesses about the true parameters
• Estimation of the mean
• Estimation of the variance
• Random sample
Population vs. Sample
• Population parameter describes the true
characteristics of the whole population
• Sample parameter describes characteristics
of the sample
• Statistics is all about using sample
parameters to make inferences about the
population parameters
Distribution of the Sample Mean
• The standardized sample mean follows a t-distribution
with n - 1 degrees of freedom:
$\frac{\bar{X} - \mu}{s / \sqrt{n}} \sim t_{n-1}$
Confidence Intervals
• We can estimate the mean, but we’d like to
know how accurate our estimate is
• We’d like to put upper and lower bounds
on our estimate
• We might need to know whether the true
mean is above a certain value, e.g. zero
Constructing Confidence
Intervals
• We already know the distribution of our
estimate of the mean
• To construct a 95% confidence interval, for
instance, just find the values that contain
95% of the distribution
Constructing Confidence
Intervals
[Figure: distribution of $(\bar{X} - \mu) / (s / \sqrt{n})$; the statistic falls between the critical values 95% of the time, with 2.5% of the distribution in each tail]
Confidence Intervals and
Hypothesis Testing
• The critical values are available from a table
or in Matlab
>> tinv(.975, n-1)
• If the confidence interval includes the hypothesized
population mean (e.g. zero), then the sample mean is
not statistically different from it
• One-sided vs. two-sided tests
Example
• Are the returns on the S&P 500
significantly above zero?
– Sample mean = .23
– Sample standard deviation = .59
– Sample size = 128
• Compute the test: $t = \frac{0.23 - 0}{0.59 / \sqrt{128}} \approx 4.41$
• At 95% the critical value is 1.98
• Therefore, we reject the hypothesis that the mean return is zero
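The numbers in this example can be reproduced in a few lines. The sketch below takes the sample mean, standard deviation, sample size, and critical value from the slide, computes the t statistic for a hypothesized mean of zero, and builds the matching 95% confidence interval. The critical value is entered by hand because the Python standard library has no inverse t function (the lecture obtains it from a table or from Matlab's tinv).

```python
from math import sqrt

sample_mean = 0.23
sample_sd = 0.59
n = 128
critical_value = 1.98     # from the slide
hypothesized_mean = 0.0

# Standard error of the sample mean.
standard_error = sample_sd / sqrt(n)

# t statistic for the null hypothesis that the true mean is zero.
t_stat = (sample_mean - hypothesized_mean) / standard_error

# 95% confidence interval for the true mean.
ci_low = sample_mean - critical_value * standard_error
ci_high = sample_mean + critical_value * standard_error

reject_null = abs(t_stat) > critical_value

print(f"t = {t_stat:.2f}, CI = ({ci_low:.3f}, {ci_high:.3f})")
print("reject zero mean:", reject_null)
```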
Distribution of S&P500 Returns
• The direct use of historical data requires the
following assumptions:
– The true distribution of returns is constant
through time and will not change in the future
– Each period represents an independent draw
from this distribution
Distribution of Stock Returns
[Figure: S&P 500 return histogram; relative frequency (0 to 0.3) by return bin, from -0.80 to 2.20 and "more"]
Distribution of Stock Returns
[Figure: TSE 200 return histogram; relative frequency (0 to 0.2) by return bin, from -0.80 to 2.20]
Distribution of Stock Returns
[Figure: DAX return histogram; relative frequency (0 to 0.14) by return bin, from -0.80 to 2.20]
Linear Regression (Harvey 1989)
[Figure: time series of Growth and Spread, O-54 to J-09 (vertical axis, -0.04 to 0.10)]
Harvey 1989
[Figure: scatter plot of GNP Growth (vertical axis, -0.04 to 0.10) against Spread (horizontal axis, -0.03 to 0.04)]
Harvey 1989
Regression Line:
$\mathrm{Growth}_{t+1:t+5} = a + b\,(\mathrm{Spread})_t + u_{t+5}$
[Figure: scatter plot of GNP Growth (vertical axis, -0.04 to 0.10) against Spread (horizontal axis, -0.03 to 0.04), with the fitted regression line]
Regression
• Minimize the sum of squared residuals:
$\min_{a,b} \sum_t \left(\mathrm{Growth}_{t+1:t+5} - a - b\,(\mathrm{Spread})_t\right)^2$
Regression in Matrix Form
• Regression equation: $y = X\beta + u$
• Minimize the squared residuals: $\min_\beta\, u'u = (y - X\beta)'(y - X\beta)$
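A minimal sketch of least squares in matrix form for a regression with an intercept and one regressor. Writing X for the matrix whose rows are (1, x_i) and y for the vector of outcomes, the minimizer solves the normal equations X'X beta = X'y; for this two-parameter case the 2x2 system is solved by hand below. The spread and growth values are hypothetical, not Harvey's data.

```python
# Hypothetical data: term spread (x) and subsequent growth (y).
spread = [-0.01, 0.00, 0.01, 0.02, 0.03]
growth = [0.00, 0.02, 0.03, 0.05, 0.05]

n = len(spread)
sum_x = sum(spread)
sum_y = sum(growth)
sum_xx = sum(x * x for x in spread)
sum_xy = sum(x * y for x, y in zip(spread, growth))

# X'X = [[n, sum_x], [sum_x, sum_xx]] and X'y = [sum_y, sum_xy].
determinant = n * sum_xx - sum_x**2

# Solve the 2x2 normal equations X'X beta = X'y (Cramer's rule).
a = (sum_xx * sum_y - sum_x * sum_xy) / determinant  # intercept
b = (n * sum_xy - sum_x * sum_y) / determinant       # slope

# At the least-squares solution the residuals sum to zero.
residuals = [y - a - b * x for x, y in zip(spread, growth)]
sum_squared_residuals = sum(u * u for u in residuals)

print(f"a = {a:.4f}, b = {b:.4f}, SSR = {sum_squared_residuals:.6f}")
```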