1/3/2017
Statistics and Probability Distributions
Random Variables and Probability Distributions
• Certain probability distributions are assumed by many of the common statistical tests
  ANOVA assumes variables follow a normal distribution (need to meet assumptions to use ANOVA)
• Probability distributions fit many real-world data
Ecological Analyses
Random Variables
• Discrete Random Variables
  e.g., 1, 3, 5
  Presence versus Absence
  Number of offspring born to swallows
• Continuous Random Variables
  Can have any value within an interval
  e.g., body mass, wing length
Accuracy versus Precision
Which sample is ‘better’?
[Figure: two samples, one labelled "Precise" and one labelled "Accurate"]
Precision, Accuracy and Bias
• Accuracy is how close the estimated value is to the true value; this difference is the bias
• Precision is the variation in the measurement
• Your sample indicates precision, but you don’t know its accuracy!
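The distinction between bias and precision can be illustrated with a small simulation. This is only a sketch: the true body mass, the offsets and the spreads are hypothetical values chosen for illustration, and in real fieldwork the true value is unknown, so the bias cannot be computed this way.

```python
import random
import statistics

def summarize(sample, true_value):
    """Return (bias, spread): mean offset from the true value and sample standard deviation."""
    bias = statistics.mean(sample) - true_value
    spread = statistics.stdev(sample)
    return bias, spread

rng = random.Random(1)   # fixed seed so the sketch is repeatable
TRUE_MASS = 20.0         # hypothetical true body mass (g); unknown in practice

# Sample A: tightly clustered but centred on the wrong value (precise, not accurate).
sample_a = [rng.gauss(TRUE_MASS + 3.0, 0.2) for _ in range(30)]
# Sample B: centred on the true value but widely scattered (accurate, not precise).
sample_b = [rng.gauss(TRUE_MASS, 2.0) for _ in range(30)]

bias_a, spread_a = summarize(sample_a, TRUE_MASS)
bias_b, spread_b = summarize(sample_b, TRUE_MASS)
print(f"A: bias = {bias_a:.2f}, sd = {spread_a:.2f}")
print(f"B: bias = {bias_b:.2f}, sd = {spread_b:.2f}")
```

Only the spread (precision) can be estimated from the sample itself; the bias needs the true value, which is why a sample alone does not reveal its accuracy.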
Discrete Random Variable Distributions
• Bernoulli Random Variables
  Experiment has only two outcomes (e.g., organism present or absent)
  Bernoulli Random Variable describes the outcome of such an experiment
Bernoulli Random Variable
X ~ Bernoulli(p)
• The random variable X is distributed as a Bernoulli random variable with a single parameter ‘p’
• Best example would be the toss of a ‘fair’ coin in which either outcome is equally likely (i.e., p = 0.5)
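A Bernoulli trial can be sketched in a few lines of Python. The function name `bernoulli_trial` and the number of tosses are illustrative choices, not part of the slides; the sketch simply draws a uniform random number and compares it with p.

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 (success, e.g. organism present) with probability p, else 0."""
    return 1 if rng.random() < p else 0

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Toss a 'fair' coin (p = 0.5) many times and record the share of successes.
tosses = [bernoulli_trial(0.5, rng) for _ in range(10_000)]
proportion_heads = sum(tosses) / len(tosses)
print(f"Proportion of successes: {proportion_heads:.3f}")
```

With many tosses the observed proportion should settle close to p.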
Bernoulli Random Variable
• Might use a Bernoulli random variable to look at the presence or absence of a species in a number of different locations (e.g., habitats, lakes)
Binomial Random Variable
• Many Bernoulli Trials = Binomial Random Variable
• Necessary because we would also want to involve replication in our experiments
Binomial Random Variable
X ~ Bin(n,p)
• A binomial random variable X is the number of successful results in n independent Bernoulli trials (parameters n and p)
• If n = 1, then the result is equivalent to a Bernoulli trial
• One of the most common types of random variables encountered in ecological studies
Binomial Random Variable
The probability of obtaining X successes for a binomial random variable is:
$$P(X) = \frac{n!}{X!\,(n-X)!}\, p^{X} (1-p)^{n-X}$$
where n is the number of trials, X is the number of successful outcomes (X ≤ n) and n! is n factorial (i.e., n × (n-1) × (n-2) × ... × 1)
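The binomial probability can be computed directly from its definition. The sketch below assumes Python 3.8 or later for `math.comb`; the function name `binomial_pmf` and the example values of n and p are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): (n choose x) * p^x * (1 - p)^(n - x)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# With n = 1 the binomial reduces to a single Bernoulli trial.
single_trial = binomial_pmf(1, 1, 0.5)

# The probabilities over all possible outcomes 0..n add up to 1.
total_probability = sum(binomial_pmf(x, 10, 0.3) for x in range(11))

print(single_trial, total_probability)
```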
Binomial Random Variable
• Think of $\binom{n}{X} = \frac{n!}{X!\,(n-X)!}$ as “n choose X”, which is known as the binomial coefficient
• Needed because there are many ways to obtain combinations of successes and failures
Binomial Coefficient
• Consider the following set of five small mammals: {(red-backed vole), (meadow vole), (deer mouse), (short-tailed shrew), (jumping mouse)}
• How many unique pairs of small mammals can be formed from this set?
Binomial Coefficient
1. (red-backed vole), (meadow vole)
2. (red-backed vole), (deer mouse)
3. (red-backed vole), (short-tailed shrew)
4. (red-backed vole), (jumping mouse)
5. (meadow vole), (deer mouse)
6. (meadow vole), (short-tailed shrew)
7. (meadow vole), (jumping mouse)
8. (deer mouse), (short-tailed shrew)
9. (deer mouse), (jumping mouse)
10. (short-tailed shrew), (jumping mouse)
Binomial Coefficient
Using the binomial coefficient, we would set n = 5 and X = 2 and get 10 combinations:
$$\binom{5}{2} = \frac{5!}{2!\,(5-2)!} = \frac{120}{2 \times 6} = 10$$
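The ten pairs can also be enumerated by machine, which is a handy check on the binomial coefficient. This sketch uses the standard-library `itertools.combinations` and `math.comb`; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

mammals = [
    "red-backed vole",
    "meadow vole",
    "deer mouse",
    "short-tailed shrew",
    "jumping mouse",
]

# Every unordered pair of distinct species, in the same order as the numbered list.
pairs = list(combinations(mammals, 2))
for i, (first, second) in enumerate(pairs, start=1):
    print(f"{i}. ({first}), ({second})")

# "5 choose 2" counts the same pairs without listing them.
n_pairs = comb(len(mammals), 2)
print(n_pairs)
```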
The Binomial Distribution
• The following example (details on pages 31 and 32) illustrates the distribution of the various X values out of 25 trials
Binomial Random Variable
• By having a predicted (or theoretical) distribution, we can then see if our observed results ‘fit’ that distribution
• But first we need to be able to know about the available distributions
Calculating the Binomial Distribution
The Binomial Distribution
• Probability distribution
• Symmetrical (both tails equal)
• True only when p = (1 - p) = 0.5
Binomial Distribution with X~Bin(25,0.8)
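The effect of p on the shape of the distribution can be checked numerically for 25 trials. The sketch below is an illustration only: it tabulates the full distribution for p = 0.5 and p = 0.8, tests the first for symmetry, and finds the most probable count for the second. The helper name `binomial_distribution` is an assumption of this example.

```python
from math import comb

def binomial_distribution(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Bin(n, p)."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

n = 25
fair = binomial_distribution(n, 0.5)
skewed = binomial_distribution(n, 0.8)

# Symmetry means P(X = x) equals P(X = n - x) for every x.
fair_is_symmetric = all(abs(fair[x] - fair[n - x]) < 1e-12 for x in range(n + 1))
skewed_is_symmetric = all(abs(skewed[x] - skewed[n - x]) < 1e-12 for x in range(n + 1))

# The most probable number of successes when p = 0.8.
mode_skewed = max(range(n + 1), key=lambda x: skewed[x])

print(fair_is_symmetric, skewed_is_symmetric, mode_skewed)
```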
Poisson Random Variables
• The Binomial distribution is appropriate when there is a fixed number of trials (n) and the probability of success is not too small
• Formula becomes awkward when n becomes large and p becomes small (i.e., for rare occurrences of animals or plants)
• Also need to be able to directly count the trials themselves
Poisson Random Variables
• Instead we frequently count the events that occur within a sample
• Suppose that you are using a number of quadrats to sample for the presence of animal damage
• Each occurrence represents the ‘success’ of an unobserved event
Poisson Random Variables
• Can’t really determine how many ‘trials’ have taken place
• Similar for trials in time: number of birds visiting a feeder over a period of time
• We use the Poisson Distribution
Poisson Random Variables
X ~ Poisson(λ)
• X is the number of occurrences of an event recorded in a sample of fixed area or during a fixed time interval
• Used when occurrences are rare (i.e., the most common number of counts in any sample is 0)
Poisson Random Variables
X ~ Poisson(λ)
• X is the number of events in a sample
• Described by a single parameter, λ
• λ is the average value of the number of occurrences of the event in each sample
Poisson Random Variables
• Suppose that the average number of damaged plants in a 10-m² quadrat is 2
• What are the chances that a single quadrat will contain 3 damaged plants?
• λ = 2, x = 3
$$P(x) = \frac{\lambda^{x}}{x!}\, e^{-\lambda} = \frac{2^{3}}{3!}\, e^{-2} \approx 0.180$$
Poisson Random Variables
• The chances that a plot will contain no damage would be (λ = 2, x = 0):
$$P(0) = \frac{2^{0}}{0!}\, e^{-2} = e^{-2} \approx 0.135$$
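The quadrat calculations can be reproduced with a short function for the Poisson probability. The name `poisson_pmf` is illustrative; the sketch uses only the standard `math` module.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam): lam^x * e^(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)

lam = 2.0  # average number of damaged plants per quadrat

p_three = poisson_pmf(3, lam)  # chance of exactly 3 damaged plants
p_zero = poisson_pmf(0, lam)   # chance of no damage at all

print(f"P(3) = {p_three:.3f}, P(0) = {p_zero:.3f}")
```

The same function can be evaluated over x = 0, 1, 2, ... to tabulate a whole distribution for any value of λ.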
Poisson Distributions
[Figure: Poisson distributions for λ = 0.1, 0.5, 1.0, 2.0 and 4.0]
• Later we can test observed frequencies against these theoretical distributions to see if our predictions are met ...
Expected value of a Discrete Random Variable
• The entire distribution can be summarized by determining the average value
• Straight averaging can be misleading with probability distributions, because we need to weight the values by their probabilities
Variance of a Discrete Random Variable
• The variance of a random variable is a measure of how far the actual values of a random variable differ from the expected value
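Probability weighting can be made concrete with the standard definitions E(X) = Σ aᵢpᵢ and σ²(X) = Σ pᵢ(aᵢ − E(X))², where aᵢ are the possible values and pᵢ their probabilities. The sketch below applies them to X ~ Bin(25, 0.8) as an illustration; the function names are assumptions of this example.

```python
from math import comb

def expected_value(values, probs):
    """Probability-weighted average of the possible values."""
    return sum(v * p for v, p in zip(values, probs))

def variance(values, probs):
    """Probability-weighted squared deviation from the expected value."""
    mu = expected_value(values, probs)
    return sum(p * (v - mu) ** 2 for v, p in zip(values, probs))

n, p = 25, 0.8
values = list(range(n + 1))
probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in values]

weighted_mean = expected_value(values, probs)  # equals n * p for a binomial
spread = variance(values, probs)               # equals n * p * (1 - p)
naive_mean = sum(values) / len(values)         # ignores the probabilities

print(weighted_mean, spread, naive_mean)
```

The unweighted average of the possible values (12.5) is far from the weighted one (20), which is the sense in which straight averaging misleads.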
Discrete Statistical Distributions
Female horseshoe crabs with satellite males
[Figures: axis labels "Number of Satellite Males" and "Female Carapace Width (mm)"]
Continuous Random Variables
• Most ecological variables are not discrete:
  Body mass
  Wing length
  Concentrations of chemicals
  Heights and diameters of trees
• Within an interval, there are infinitely many possible values for a variable
Uniform Random Variables
• We break up the continuous variable into discrete intervals
• The sum of the probability of occurrence of all intervals will be 1.0
Uniform Random Variable
• The probability that this uniform random variable X occurs in any subinterval
• f(x) is a probability density function (PDF)
• It assigns the probability P that a continuous variable X occurs within an interval I
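For a uniform random variable the probability of a subinterval is just its length divided by the length of the whole range. The sketch below assumes a hypothetical range of [0, 10], which is not taken from the slides, and splits it into ten unit intervals.

```python
def uniform_pdf(x, a, b):
    """Density of X ~ Uniform(a, b): constant 1 / (b - a) inside [a, b], 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_interval_probability(lo, hi, a, b):
    """P(lo <= X <= hi) for X ~ Uniform(a, b): overlap length divided by (b - a)."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

A, B = 0.0, 10.0  # hypothetical range of the variable

p_single = uniform_interval_probability(3.0, 4.0, A, B)

# Break the range into ten unit intervals; their probabilities should add up to 1.
interval_probs = [uniform_interval_probability(i, i + 1, A, B) for i in range(10)]
p_total = sum(interval_probs)

print(p_single, p_total)
```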
Probability Density and Cumulative Distribution Functions
Cumulative Distribution Function
F(y) = P(X ≤ y)
• CDF represents the tail probability: the probability that a random variable X is less than or equal to some value y
• More when we look at statistical tests
Normal (Gaussian) Random Variables
Normal Random Variables
X ~ N(μ, σ)
• E(X) = μ
• σ(X) = σ
• Symmetric around μ
Standard Normal: X ~ N(0,1)
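Python's standard library includes `statistics.NormalDist` (Python 3.8+), which gives a quick way to evaluate a cumulative distribution function F(y) = P(X ≤ y) for a normal random variable. The sketch below uses the standard normal; the particular cut-off values are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# F(0): half of the probability lies at or below the mean of a symmetric distribution.
below_mean = standard_normal.cdf(0.0)

# P(-1 < X <= 1) is the difference of two CDF values.
within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

# Symmetry around the mean: the density is equal at -x and +x.
density_is_symmetric = abs(standard_normal.pdf(-1.5) - standard_normal.pdf(1.5)) < 1e-12

print(f"{below_mean:.3f} {within_one_sd:.4f} {density_is_symmetric}")
```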
Properties of the Normal Distribution
• Normal distributions can be added
  The sum of two independent normal random variables is also a normally distributed random variable
  E(X+Y) = E(X) + E(Y)
  σ²(X+Y) = σ²(X) + σ²(Y)
Properties of the Normal Distribution
• Normal distributions can be transformed
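The additivity of means and variances for independent normal variables can be checked by simulation. This is a rough sketch: the parameters (10, 2) and (5, 1) are hypothetical, and simulated values only approximate the theoretical ones.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable
N = 20_000

# Two independent normal variables with hypothetical parameters (mean, sd).
x = [rng.gauss(10.0, 2.0) for _ in range(N)]
y = [rng.gauss(5.0, 1.0) for _ in range(N)]
total = [a + b for a, b in zip(x, y)]

mean_total = statistics.fmean(total)     # theory: 10 + 5 = 15
var_total = statistics.pvariance(total)  # theory: 2**2 + 1**2 = 5

print(f"mean = {mean_total:.2f}, variance = {var_total:.2f}")
```

Note that it is the variances that add, not the standard deviations: the standard deviation of the sum is √5 ≈ 2.24, not 3.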
Properties of the Normal Distribution
• Normal distributions can be standardized
  A special case of a transformation (Y = aX + b)
  If a = 1/σ and b = -(μ/σ)
  E(Y) = aμ + b and σ²(Y) = a²σ²
  For X ~ N(μ, σ), Y = (1/σ)X - μ/σ = (X - μ)/σ
  E(Y) = 0, σ²(Y) = 1
  For each X, subtract μ and divide by σ
Log-normal and Exponential Distributions
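Standardizing amounts to subtracting μ and dividing by σ for every observation. In the sketch below the wing lengths are made-up numbers, and the mean and population standard deviation of the data stand in for μ and σ, which is an assumption of this example rather than something stated on the slides.

```python
import statistics

def standardize(values, mu, sigma):
    """Apply Y = (X - mu) / sigma to every value."""
    return [(x - mu) / sigma for x in values]

# Hypothetical wing lengths (mm), used only for illustration.
wing_lengths = [61.2, 63.5, 59.8, 64.1, 62.0, 60.7, 65.3, 62.9]

mu = statistics.fmean(wing_lengths)
sigma = statistics.pstdev(wing_lengths)  # population SD, standing in for sigma

z_scores = standardize(wing_lengths, mu, sigma)
z_mean = statistics.fmean(z_scores)
z_sd = statistics.pstdev(z_scores)

print(f"mean = {z_mean:.3f}, sd = {z_sd:.3f}")
```

After the transformation the values have mean 0 and standard deviation 1, matching E(Y) = 0 and σ²(Y) = 1.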
Continuous Statistical Distributions
Central Limit Theorem
• Cornerstone of probability and statistical analyses
• Standardizing any random variable that itself is a sum or average of a set of independent random variables results in a new random variable that is “nearly the same as” a standard normal one
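The Central Limit Theorem can be explored by simulation. The sketch below draws samples from a strongly skewed distribution, an exponential with rate 1 chosen here purely for illustration, standardizes each sample mean, and checks how closely the results resemble a standard normal. The sample size and number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

def standardized_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from Exponential(1) and standardize each sample mean."""
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(rate=1)
    z_values = []
    for _ in range(reps):
        sample_mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        # The standard deviation of a mean of n independent values is sigma / sqrt(n).
        z_values.append((sample_mean - mu) / (sigma / math.sqrt(n)))
    return z_values

rng = random.Random(2017)  # fixed seed so the sketch is repeatable
z = standardized_sample_means(n=50, reps=5000, rng=rng)

z_mean = statistics.fmean(z)
z_sd = statistics.stdev(z)
# For a standard normal, about 95% of values fall within +/- 1.96.
share_within_196 = sum(abs(v) <= 1.96 for v in z) / len(z)

print(f"mean = {z_mean:.2f}, sd = {z_sd:.2f}, within 1.96: {share_within_196:.3f}")
```

Rerunning with a small n (say 2 or 3) gives a visibly poorer match, which is a reminder that the approximation depends on sample size.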
Central Limit Theorem
• Allows us to use statistics that require a normal distribution even though the underlying data themselves may not be normally distributed
  ... Provided the sample size is large enough ...
Summary
• The distributions of random variables can be characterized by their expected values and variance
  Discrete: Bernoulli, Binomial, Poisson
  Continuous: Uniform, Normal, Exponential
Summary
• The Central Limit Theorem asserts that the sums or averages of large, independent samples will approximately follow a normal distribution if standardized
• For most ecological data, the Central Limit Theorem supports the use of statistical tests that assume normal distributions