MATH 7550-01
INTRODUCTION TO PROBABILITY
FALL 2011
Lecture 4. Measurable functions. Random variables.
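I have told you that we can work with $\sigma$-algebras generated by classes of sets in some
indirect ways.
The following microtheorem is useful here.
Theorem 4.1. Suppose $\mathcal{A}$ and $\mathcal{C}$ are classes of subsets of a space $X$. If
\[
\mathcal{A} \subseteq \sigma(\mathcal{C}), \qquad \mathcal{C} \subseteq \sigma(\mathcal{A}),
\tag{4.1}
\]
then
\[
\sigma(\mathcal{A}) = \sigma(\mathcal{C}).
\tag{4.2}
\]
Proof. It is clear that if $\mathcal{D} \subseteq \mathcal{E}$, then $\sigma(\mathcal{D}) \subseteq \sigma(\mathcal{E})$. Applying this to the first
inclusion in (4.1), we get:
\[
\sigma(\mathcal{A}) \subseteq \sigma\bigl(\sigma(\mathcal{C})\bigr) = \sigma(\mathcal{C})
\tag{4.3}
\]
(the last equality is quite clear: $\sigma(\mathcal{C})$ is already a $\sigma$-algebra, so the smallest $\sigma$-algebra
containing it is $\sigma(\mathcal{C})$ itself). From the second inclusion in (4.1) we obtain, in the same
way, that $\sigma(\mathcal{C}) \subseteq \sigma(\mathcal{A})$; which, together with (4.3), yields (4.2).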
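On a finite space one can watch Theorem 4.1 at work by brute force. What follows is only a sketch under a loud assumption: the space is finite, so closing a class under complements and pairwise unions already produces the generated $\sigma$-algebra. The function name `generate_sigma_algebra` and the two small classes `A` and `C` are made up for illustration; they are not part of the lecture.

```python
from itertools import combinations


def generate_sigma_algebra(space, generators):
    """Close `generators` under complement and pairwise union in a FINITE space.

    Assumption: `space` is finite, so this closure is the generated sigma-algebra.
    """
    space = frozenset(space)
    sets = {frozenset(), space} | {frozenset(g) for g in generators}
    while True:
        new = {space - s for s in sets}                   # complements
        new |= {s | t for s, t in combinations(sets, 2)}  # pairwise unions
        if new <= sets:                                   # nothing new: closed
            return sets
        sets |= new


X = {1, 2, 3, 4}
A = [{1}, {1, 2}]      # hypothetical class A
C = [{2}, {3, 4}]      # hypothetical class C

sigma_A = generate_sigma_algebra(X, A)
sigma_C = generate_sigma_algebra(X, C)

# Hypotheses (4.1): every set of A lies in sigma(C), and every set of C in sigma(A).
hypotheses_hold = (all(frozenset(a) in sigma_C for a in A)
                   and all(frozenset(c) in sigma_A for c in C))

# Conclusion (4.2): the two generated sigma-algebras coincide.
print(hypotheses_hold, sigma_A == sigma_C)
```

Here neither class contains the other, yet both generate the same eight sets, namely all unions of the "atoms" {1}, {2}, {3, 4}.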
Let us show an example of how this is used.
Let $\mathcal{A}$ be the class of all intervals, $(a, b]$, $(a, b)$, $[a, b)$, $[a, b]$, finite or infinite, in the
real line (a one-point set $\{a\}$ is also an interval: $\{a\} = [a, a]$: it is the set of all real $x$ such
that $a \le x \le a$; and the empty set can also be considered as an interval). Let $\mathcal{C}$ be the class of all
intervals, finite or infinite, of the form $(a, b]$ (we understand what the interval $(-\infty, b]$ is;
but what is $(a, \infty]$? By definition, it is the set $\{x \in \mathbb{R}^1 : a < x \le \infty\}$; that is, the same
as $\{x \in \mathbb{R}^1 : a < x < \infty\} = (a, \infty)$: no real number can be equal to $\infty$).
Clearly $\mathcal{C} \subseteq \mathcal{A}$. Let us prove that the $\sigma$-algebras $\sigma(\mathcal{A})$ and $\sigma(\mathcal{C})$ are the same.
It is enough to check that $\mathcal{A} \subseteq \sigma(\mathcal{C})$, which means that every interval of any kind
($( , )$, $[ , )$, $[ , ]$) belongs to $\sigma(\mathcal{C})$. And to do this, it is enough to represent every
interval as the result of applying countably many set-theoretic operations ($\cup$, $\cap$, and ${}^c$)
to intervals of the form $( , ]$.
An interval $(a, b)$ is the union of smaller semi-closed intervals:
\[
(a, b) = \bigcup_{n=k}^{\infty} (a, b - 1/n]
\tag{4.4}
\]
(make a picture; here $k$ is a natural number large enough that $1/k < b - a$. I did not write the
union from 1 to infinity because I did not want questions about what happens if the length of
the interval $(a, b)$ is less than 1. But in fact, it would be OK: the first several terms in (4.4)
would be intervals with the "right" end $b - 1/n$ to the left of the "left" end $a$; such intervals
are just empty, and it wouldn't affect the union). So $(a, b)$ belongs to $\sigma(\mathcal{C})$ by the
$\sigma$-algebra axiom $3_\sigma$) (see Lectures 1-2).
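The covering claim behind the union formula can be checked point by point. The sketch below is an illustration only: the helper name `witness_index` is hypothetical, and exact rational arithmetic (`Fraction`) is assumed for the inputs so that the comparisons are not blurred by rounding. For a point $x$ with $a < x < b$ it returns an index $n \ge k$ such that $x$ lies in the half-open interval $(a, b - 1/n]$.

```python
from fractions import Fraction
from math import ceil


def witness_index(a, b, x, k=1):
    """Return some n >= k with a < x <= b - 1/n, assuming a < x < b.

    Hypothetical helper for illustration; pass Fractions (or ints) for exactness.
    """
    if not (a < x < b):
        raise ValueError("x must lie in the open interval (a, b)")
    # n >= 1/(b - x) is equivalent to x <= b - 1/n.
    n = max(k, ceil(1 / (b - x)))
    assert a < x <= b - Fraction(1, n)
    return n


n_close_to_b = witness_index(Fraction(0), Fraction(1), Fraction(9, 10))
n_with_large_k = witness_index(Fraction(0), Fraction(1), Fraction(1, 2), k=5)
print(n_close_to_b, n_with_large_k)
```

The closer $x$ is to $b$, the larger the index that is needed, and the point $b$ itself is never reached because $b - 1/n < b$ for every $n$; this is why the union is the open interval.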
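The interval $[a, b]$ is the complement of the union
\[
(-\infty, a) \cup (b, \infty]
\tag{4.5}
\]
(if $a = -\infty$ or $b = \infty$, the corresponding intervals are empty), and such intervals are both
in $\sigma(\mathcal{C})$; so $[a, b] \in \sigma(\mathcal{C})$. Finally, the interval $[a, b)$ is the complement of $(-\infty, a) \cup [b, \infty]$
(where $[b, \infty]$ is itself the complement of $(-\infty, b)$), and also belongs to $\sigma(\mathcal{C})$.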
This $\sigma$-algebra also coincides with the $\sigma$-algebra generated by all semi-infinite
intervals $(-\infty, b]$, $-\infty < b < \infty$ (because $(a, b] = \bigl((-\infty, a] \cup (-\infty, b]^c\bigr)^c$), and with
that generated by all open subsets of $\mathbb{R}^1$ (the last statement is true because every open
subset of the real line is a countable union of open intervals).
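The statement about open sets can be checked with Theorem 4.1. The following is a sketch, not the lecture's own argument; the notation is an assumption made for this block: $\mathcal{C}$ stands for the class of intervals $(a, b]$ and $\mathcal{O}$ for the class of open subsets of $\mathbb{R}^1$. For $b = \infty$ the interval $(a, \infty] = (a, \infty)$ is itself open, and the union representing an open set $G$ may have finitely many terms or none.

```latex
% Sketch: \mathcal{C} = intervals (a, b], \mathcal{O} = open subsets of \mathbb{R}^1.
\begin{align*}
(a, b] &= \bigcap_{n=1}^{\infty} \Bigl(a,\; b + \tfrac{1}{n}\Bigr) \in \sigma(\mathcal{O})
  && \text{for finite } b, \text{ so } \mathcal{C} \subseteq \sigma(\mathcal{O}); \\
G &= \bigcup_{m=1}^{\infty} (a_m, b_m) \in \sigma(\mathcal{C})
  && \text{for open } G, \text{ so } \mathcal{O} \subseteq \sigma(\mathcal{C}).
\end{align*}
```

The first line uses the fact that a $\sigma$-algebra is closed under countable intersections (complements of countable unions of complements); the second uses that every open interval belongs to $\sigma(\mathcal{C})$. Both inclusions required by Theorem 4.1 hold, so $\sigma(\mathcal{C}) = \sigma(\mathcal{O})$.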
The $\sigma$-algebra
\[
\sigma\{\text{all intervals}\} = \sigma\{(a, b] : -\infty \le a \le b \le \infty\} = \sigma\{(-\infty, b] : b \in \mathbb{R}^1\} = \sigma\{\text{all open sets}\}
\tag{4.6}
\]
that we introduced is very important for us. Remember that I said that the choice of the
sample space $\Omega$ in applications of probability theory is natural, related to the nature of
the experiment whose mathematical model we are trying to build; and the choice of the
$\sigma$-algebra $\mathcal{F}$ of its subsets proclaimed as events is standard. Namely, if the sample space $\Omega$
is countable (possibly, finite), we take $\mathcal{F} = \mathcal{P}(\Omega)$, the class of all subsets of $\Omega$; and if $\Omega$ is
the real line, or part of it, or a set in $\mathbb{R}^n$, ..., then the choice of the $\sigma$-algebra $\mathcal{F}$ is also
standard.
The time has come to set this standard. If $\Omega = \mathbb{R}^1$, we take as $\mathcal{F}$ the $\sigma$-algebra (4.6).
This $\sigma$-algebra is called the (one-dimensional) Borel $\sigma$-algebra, and its elements are
called Borel sets. The notation for it that we are going to use is $\mathcal{B}^1$.
When Henri Lebesgue constructed what we now call the Lebesgue measure $\lambda^1$ on the real line, it was
defined on some $\sigma$-algebra $\mathcal{L}^1$ of subsets of $\mathbb{R}^1$, called measurable subsets. This $\sigma$-algebra contains all
intervals, and the Lebesgue measure of every interval is equal to its length. Since $\mathcal{L}^1$ is some $\sigma$-algebra
containing all intervals, and $\mathcal{B}^1$ is the smallest such $\sigma$-algebra, we have clearly
\[
\mathcal{L}^1 \supseteq \mathcal{B}^1 .
\]
Is the inclusion $\supseteq$, in fact, a strict inclusion $\supset$, or is $\mathcal{L}^1 = \mathcal{B}^1$? You can be sure that mathematicians
have worked to ascertain this; it turned out that $\mathcal{L}^1$ is wider than $\mathcal{B}^1$: $\mathcal{L}^1 \supset \mathcal{B}^1$ (and even much wider,
though I don't want to spend time explaining what it means).
So which of these $\sigma$-algebras should we use as the standard?
It turns out that the $\sigma$-algebra $\mathcal{B}^1$ of Borel sets is very large: quite large enough for all practical
purposes, and for most of the theoretical ones. As a matter of fact, it is hard to write down an explicit
example of a non-Borel set; and we usually learn about the existence of such sets only by indirect methods.
So in our probability theory we are quite satisfied with the Borel $\sigma$-algebra $\mathcal{B}^1$, and we don't in fact
need any sets belonging to $\mathcal{L}^1$ but not to $\mathcal{B}^1$. We don't even need to know that $\mathcal{L}^1$ is strictly larger
than $\mathcal{B}^1$. (The question arises: why did Lebesgue introduce the $\sigma$-algebra $\mathcal{L}^1$ if $\mathcal{B}^1$ is quite enough? But
this was discovered only later.) The second reason that we are using the Borel $\sigma$-algebra rather than some
larger ones is that the $\sigma$-algebra $\mathcal{L}^1$ is closely related to a specific measure: the Lebesgue measure $\lambda^1$; and
for other measures on the real line the $\sigma$-algebras on which one can consider them are different from $\mathcal{L}^1$. It
would be unreasonable to consider many different $\sigma$-algebras on the same space $\mathbb{R}^1$ if we can do with just
one, $\mathcal{B}^1$.
The words to describe the $\sigma$-algebra $\sigma(\mathcal{A})$ will be: the $\sigma$-algebra generated by the
class of sets $\mathcal{A}$.
How can we work with $\sigma$-algebras generated by classes of sets? We can do so in some
indirect ways.
We can consider Borel $\sigma$-algebras in multidimensional spaces $\mathbb{R}^n$; these $\sigma$-algebras
can be defined either as generated by all multidimensional "intervals", i.e., rectangles
$(a, b] \times (c, d]$, or parallelepipeds in the three-dimensional case, etc.; or as generated by all
open sets. Problem 7 given to you is about the fact that it is all the same. The Borel
$\sigma$-algebra in the $n$-dimensional Euclidean space is denoted $\mathcal{B}^n$.
We can also define the Borel $\sigma$-algebra $\mathcal{B}_X$ in every space $X$ that is a metric space,
or even a non-metric topological space: in every space where a class of open sets is defined.
Of course, since there needn't be any "intervals" or rectangles in the space $X$, the
$\sigma$-algebra $\mathcal{B}_X$ is defined as the one generated by the class $\mathcal{O}$ of open subsets of $X$.
I said that standard $\sigma$-algebras are used in spaces that are either Euclidean spaces $\mathbb{R}^n$,
or their parts; but I have spoken only of the whole spaces.
If $X$ is a Borel subset of $\mathbb{R}^n$, we can define the $\sigma$-algebra $\mathcal{B}_X$ either as the one generated
by subsets of $X$ that are open in $X$ (a set $A \subseteq X$ is called open in $X$ if for every point
$x_0 \in A$ there is a neighborhood of it in $X$ that is entirely within $A$: there exists a positive
radius $r$ such that $\{x \in X : |x - x_0| < r\} \subseteq A$); or as the $\sigma$-algebra consisting of all Borel
subsets of $\mathbb{R}^n$ that are contained in $X$. See Problem 8.
Now we go to random variables.
First, a piece belonging to the set-theoretic introduction to probability and measure
theory: what a measurable function is.
When Lebesgue introduced the concept of measurability, it was understood as measurability with respect to the Lebesgue measure. But afterwards mathematicians understood
that this concept can be formulated without reference to any measure: as one belonging
to the set-theoretic introduction.
A pair $(X, \mathcal{X})$ of a space $X$ and a $\sigma$-algebra $\mathcal{X}$ in it is called a measurable space.
Let $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$ be two measurable spaces. Let $f(x)$ be a function $f : X \mapsto Y$
(i.e., a function defined on $X$, with values in $Y$). We call this function $(\mathcal{X}, \mathcal{Y})$-measurable
(or measurable with respect to $\mathcal{X}$, $\mathcal{Y}$) if for every set $C \in \mathcal{Y}$ its inverse image $f^{-1}(C) =
\{x : f(x) \in C\}$ belongs to $\mathcal{X}$:
\[
C \in \mathcal{Y} \;\Rightarrow\; f^{-1}(C) \in \mathcal{X}.
\tag{4.7}
\]
If we consider a number-valued or a vector-valued function ($Y = \mathbb{R}^1$ or $\mathbb{R}^n$), and take the
standard $\sigma$-algebra $\mathcal{B}^1$ or $\mathcal{B}^n$ as the $\sigma$-algebra $\mathcal{Y}$, we omit the mention of $\mathcal{Y}$ and say
"$\mathcal{X}$-measurable", or "measurable with respect to the $\sigma$-algebra $\mathcal{X}$". If $\mathcal{X}$ is a Borel
$\sigma$-algebra, we call a function $f$ that is measurable with respect to it Borel measurable.
Let $\Omega$ be a sample space, and $\mathcal{F}$ a $\sigma$-algebra in it whose elements we proclaim events.
Let $(X, \mathcal{X})$ be a measurable space. A random variable taking values in this measurable
space is, by definition, a measurable function from $(\Omega, \mathcal{F})$ to $(X, \mathcal{X})$. We will denote
random variables with Greek letters. So, a random variable with values in $(X, \mathcal{X})$ is a
function $\xi : \Omega \mapsto X$ that is $(\mathcal{F}, \mathcal{X})$-measurable:
\[
C \in \mathcal{X} \;\Rightarrow\; \xi^{-1}(C) = \{\omega : \xi(\omega) \in C\} \ [\text{the short notation: } \{\xi \in C\}] \ \in \mathcal{F} \ (\text{is an event}).
\tag{4.8}
\]
Of course, for the most part we consider $X = \mathbb{R}^1$ (just random variables, without
any mention of the space $\mathbb{R}^1$ in which they take values); or $X = \mathbb{R}^n$ (random vectors); or,
say, the space of all $n \times n$ matrices (or symmetric matrices); then we speak of random
matrices (or random symmetric matrices). In all these cases as the $\sigma$-algebra $\mathcal{X}$ we take
the corresponding Borel $\sigma$-algebra.
Theorem 4.2. Let $f$ be a measurable function from $(X, \mathcal{X})$ to $(Y, \mathcal{Y})$, and $g$ a
measurable function from $(Y, \mathcal{Y})$ to $(Z, \mathcal{Z})$. Then the composition $g \circ f$ (i.e. the function
$X \mapsto Z$ defined by $(g \circ f)(x) = g\bigl(f(x)\bigr)$) is an $(\mathcal{X}, \mathcal{Z})$-measurable function.
The proof is so simple that I omit it.
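For a reader who wants it spelled out anyway, here is one way the omitted argument can go; it is a sketch, not necessarily the proof the lecturer had in mind. The symbols are the ones of the theorem: $f$ is measurable from $(X, \mathcal{X})$ to $(Y, \mathcal{Y})$, $g$ is measurable from $(Y, \mathcal{Y})$ to $(Z, \mathcal{Z})$, and $C$ is an arbitrary set of $\mathcal{Z}$.

```latex
\[
(g \circ f)^{-1}(C)
  = \{x : g(f(x)) \in C\}
  = \{x : f(x) \in g^{-1}(C)\}
  = f^{-1}\bigl(g^{-1}(C)\bigr),
  \qquad C \in \mathcal{Z}.
\]
```

Since $g$ is measurable, $g^{-1}(C) \in \mathcal{Y}$; since $f$ is measurable, the inverse image under $f$ of this set belongs to $\mathcal{X}$. So the inverse image of every $C \in \mathcal{Z}$ under the composition is in $\mathcal{X}$.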
The "probabilistic" formulation (i.e., in the language of (the set-theoretic introduction
to) probability theory):
Let $\xi$ be a random variable with values in $(X, \mathcal{X})$. If $f$ is an $(\mathcal{X}, \mathcal{Z})$-measurable
function $X \mapsto Z$, then $\eta = f(\xi)$ (which is the short notation for $\eta(\omega) = f\bigl(\xi(\omega)\bigr)$) is a
random variable with values in $(Z, \mathcal{Z})$.
The particular case of $X = Z = \mathbb{R}^1$, $\mathcal{X} = \mathcal{Z} = \mathcal{B}^1$:
Let $\xi$ be a random variable (by default, taking values in the real line), and $f$ a Borel
measurable real-valued function. Then $f(\xi)$ also is a random variable.
The definition (4.7) requires checking $f^{-1}(C) \in \mathcal{X}$ for all sets in $\mathcal{Y}$. This may be too
many sets.
But usually we are able to do it for far fewer sets:
Theorem 4.3. Let $\mathcal{Y}$ be the $\sigma$-algebra generated by some class $\mathcal{C}$ of subsets of $Y$.
Then if
\[
C \in \mathcal{C} \;\Rightarrow\; f^{-1}(C) \in \mathcal{X},
\tag{4.9}
\]
the function $f$ is $(\mathcal{X}, \mathcal{Y})$-measurable.
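Proof. Let us denote with $\mathcal{D}$ the class of all subsets $C$ of $Y$ for which $f^{-1}(C) \in \mathcal{X}$.
Clearly $\mathcal{C} \subseteq \mathcal{D}$.
If we prove that $\mathcal{D}$ is a $\sigma$-algebra, then, by definition, $\mathcal{D} \supseteq \sigma(\mathcal{C}) = \mathcal{Y}$, and $\mathcal{Y} \subseteq \mathcal{D}$ is
just the statement of our theorem.
So let us check that $\mathcal{D}$ is a $\sigma$-algebra in $Y$.
This means, first, that
\[
Y \in \mathcal{D};
\tag{4.10}
\]
second and third, that
\[
C \in \mathcal{D} \;\Rightarrow\; C^c = Y \setminus C \in \mathcal{D},
\tag{4.11}
\]
\[
C_1, C_2, ..., C_n, ... \in \mathcal{D} \;\Rightarrow\; \bigcup_{n=1}^{\infty} C_n \in \mathcal{D}.
\tag{4.12}
\]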
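Checking (4.10): $f^{-1}(Y) = \{x : f(x) \in Y\} = X \in \mathcal{X}$. Now to (4.11):
\[
f^{-1}(Y \setminus C) = X \setminus f^{-1}(C) \in \mathcal{X};
\tag{4.13}
\]
and (4.12):
\[
f^{-1}\Bigl(\bigcup_{n=1}^{\infty} C_n\Bigr) = \bigcup_{n=1}^{\infty} f^{-1}(C_n) \in \mathcal{X}.
\tag{4.14}
\]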
So, e.g., since the one-dimensional Borel $\sigma$-algebra is generated by all semi-infinite
intervals $(-\infty, b]$ (or by all $(-\infty, b)$, $-\infty < b < \infty$), to check that a real-valued function
$\xi = \xi(\omega)$ is a random variable it is enough to check that all sets $\{\omega : \xi(\omega) \le b\}$ (or $<$) are
events.
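A toy version of this check can be run on a finite sample space. Everything in the sketch below is an assumption made for illustration: the two-coin sample space, the $\sigma$-algebra of events that cannot tell HT from TH, the helper names `generate_sigma_algebra`, `preimage`, `is_measurable`, and the functions `xi` and `eta`. Finiteness is assumed throughout, so closure under complements and pairwise unions already gives the generated $\sigma$-algebra; the sets $\{y : y \le b\}$ play the role of the semi-infinite intervals.

```python
from itertools import combinations


def generate_sigma_algebra(space, generators):
    """Close `generators` under complement and pairwise union in a FINITE space."""
    space = frozenset(space)
    sets = {frozenset(), space} | {frozenset(g) for g in generators}
    while True:
        new = {space - s for s in sets}
        new |= {s | t for s, t in combinations(sets, 2)}
        if new <= sets:
            return sets
        sets |= new


def preimage(f, domain, target):
    """Inverse image of `target` under f, as a subset of `domain`."""
    return frozenset(x for x in domain if f(x) in target)


def is_measurable(f, domain, sigma_domain, target_sets):
    """Check that the inverse image of every set in `target_sets` is an event."""
    return all(preimage(f, domain, c) in sigma_domain for c in target_sets)


# Two coin tosses; events are those that do not distinguish HT from TH.
Omega = ["HH", "HT", "TH", "TT"]
F = generate_sigma_algebra(Omega, [{"HH"}, {"HT", "TH"}])

# Values 0, 1, 2 with generators {y <= b}, the analogue of the intervals (-inf, b].
Y = [0, 1, 2]
generators = [{y for y in Y if y <= b} for b in Y]
sigma_Y = generate_sigma_algebra(Y, generators)


def xi(w):
    """Number of heads."""
    return w.count("H")


def eta(w):
    """Indicator that the first toss is heads."""
    return 1 if w[0] == "H" else 0


xi_on_generators = is_measurable(xi, Omega, F, generators)   # the check of (4.9)
xi_on_all_sets = is_measurable(xi, Omega, F, sigma_Y)        # the check of (4.7)
eta_on_generators = is_measurable(eta, Omega, F, generators)
print(xi_on_generators, xi_on_all_sets, eta_on_generators)
```

The number of heads passes the test on the three generating sets and, in agreement with Theorem 4.3, on all eight sets of the generated $\sigma$-algebra. The indicator of "first toss is heads" already fails on a generator, because the set {TH, TT} is not an event in this coarse $\sigma$-algebra.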
The proof of Theorem 4.3 is the standard kind of thing that we have to do over and over
again if we want to set up probability theory rigorously; we cannot do without such things, but
they are pretty simple, and we have to remember that they are not what is most important
in probability theory.
Theorem 4.4. Every continuous function $f : X \mapsto Y$ (of course, if we want to
consider continuous functions, we have to suppose that $X$ and $Y$ are sets in Euclidean
spaces, or metric, or at least topological spaces) is $(\mathcal{B}_X, \mathcal{B}_Y)$-measurable (in short: Borel
measurable).
Proof. By definition, $\mathcal{B}_Y$ is the $\sigma$-algebra generated by the open subsets of $Y$; so by
Theorem 4.3 it is enough to check that for every open $C \subseteq Y$
\[
f^{-1}(C) = \{x : f(x) \in C\} \in \mathcal{B}_X \quad [= \sigma\{\text{open subsets of } X\}].
\tag{4.15}
\]
But if the function $f$ is continuous, the inverse image of every open set is again open; so
(4.15) is true.
In the same way we prove that every monotone function $f : \mathbb{R}^1 \mapsto \mathbb{R}^1$ is Borel
measurable: we use the fact that $\mathcal{B}^1 = \sigma\{\text{all intervals}\}$, and that the inverse image $f^{-1}(I)$ of
every interval $I$ is again an interval.
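The claim about inverse images of intervals can be seen as follows; this is a sketch for a nondecreasing $f$ (the nonincreasing case is symmetric), with $I$ an arbitrary interval and $x_1 < x < x_2$ real numbers, notation chosen for this block only. If both $x_1$ and $x_2$ belong to the inverse image of $I$, then so does every point between them:

```latex
\[
x_1 < x < x_2, \quad f(x_1),\, f(x_2) \in I
\;\Longrightarrow\;
f(x_1) \le f(x) \le f(x_2)
\;\Longrightarrow\;
f(x) \in I .
\]
```

The last step uses that an interval contains every number lying between two of its points. So the inverse image of $I$ contains, together with any two of its points, everything between them, and a subset of the real line with this property is an interval (possibly empty or consisting of one point).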
Now, the most important thing in probability theory must be the probabilities; and
we haven't even mentioned them. Of course it is because we just defined what random
variables are, and this is only the first, and not the most important thing about random
variables.
I don't really know in what place of my lecture notes to put what follows, just as I didn't know in
what place of the lecture I should put it. We are going to use some things of measure theory.
I would like to tell you that measure theory, the theory of measure and integration, contains material
of different degrees of difficulty: some things are very simple; some others are of medium difficulty (such is,
for example, the construction of the Lebesgue integral); and some other parts are pretty complicated. Such is
the construction of the Lebesgue measure on the real line or in $\mathbb{R}^n$. We could develop our theory using only
very simple and medium-simple things; but in probability theory this would restrict us to discrete random
variables, and we wouldn't be able to speak of continuous distributions, for which integration with respect
to the Lebesgue measure is needed. So in fact we cannot do without the Lebesgue measure, and with it, the
more complicated things in measure theory.
In contrast with this, things belonging to the set-theoretic introduction both to measure theory and
probability theory are all pretty simple.