LINEAR ALGEBRA
by
Alexander Givental
Published by Sumizdat
5426 Hillside Avenue, El Cerrito, California 94530, USA
http://www.sumizdat.org
University of California, Berkeley Cataloging-in-Publication Data
Givental, Alexander
Linear Algebra / by Alexander Givental ;
El Cerrito, Calif. : Sumizdat, 2009.
iv, 200 p. : ill. ; 23 cm.
Includes bibliographical references and index.
ISBN 978-0-9779852-4-1
1. Linear algebra. I. Givental, Alexander.
Library of Congress Control Number:
© 2009 by Alexander Givental
All rights reserved. Copies or derivative products of the whole work or
any part of it may not be produced without the written permission from
Alexander Givental (givental@math.berkeley.edu), except for brief excerpts
in connection with reviews or scholarly analysis.
Credits
Editing: Alisa Givental.
Art advising: Irina Mukhacheva, http://irinartstudio.com
Cataloging-in-publication: Catherine Moreno,
Technical Services Department, Library, UC Berkeley.
Layout, typesetting and graphics: using LaTeX and Xfig.
Printing and binding: Thomson-Shore, Inc., http://www.tshore.com
Member of the Green Press Initiative.
7300 West Joy Road, Dexter, Michigan 48130-9701, USA.
Offset printing on 30% recycled paper; cover: by 4-color process on Kivar-7.
Contents
1 A Crash Course
   1 Vectors
   2 Quadratic Curves
   3 Complex Numbers
   4 Problems of Linear Algebra
2 Dramatis Personae
   1 Matrices
   2 Determinants
   3 Vector Spaces
3 Simple Problems
   1 Dimension and Rank
   2 Gaussian Elimination
   3 The Inertia Theorem
4 Eigenvalues
   1 The Spectral Theorem
   2 Jordan Canonical Forms
Hints
Answers
Bibliography
Index
Foreword
“Mathematics is a form of culture.
Mathematical ignorance is a form of
national culture.”
“In antiquity, they knew few
ways to learn, but modern pedagogy
discovered 1001 ways to fail.”
August Dumbel¹
¹ From his eye-opening address to NCVM, the National Council of Visionaries of Mathematics.
Chapter 1
A Crash Course
1 Vectors
Operations and their properties
The following definition of vectors can be found in elementary geometry textbooks, see for instance [4].
A directed segment $\overrightarrow{AB}$ on the plane or in space is specified by
an ordered pair of points: the tail A and the head B. Two directed
segments $\overrightarrow{AB}$ and $\overrightarrow{CD}$ are said to represent the same vector if
they are obtained from one another by translation. In other words,
the lines AB and CD must be parallel, the lengths |AB| and |CD|
must be equal, and the segments must point toward the same of the
two possible directions (Figure 1).
[Figure 1]
[Figure 2: vectors u, v and their sum u + v]
A trip from A to B followed by a trip from B to C results in a
trip from A to C. This observation motivates the definition of the
vector sum w = v + u of two vectors v and u: if $\overrightarrow{AB}$ represents v
and $\overrightarrow{BC}$ represents u then $\overrightarrow{AC}$ represents their sum w (Figure 2).
The vector 3v = v + v + v has the same direction as v but is 3
times longer. Generalizing this example one arrives at the definition
of the multiplication of a vector by a scalar: given a vector v and
a real number α, the result of their multiplication is a vector, denoted
αv, which has the same direction as v but is α times longer. The
last phrase calls for comments since it is literally true only for α > 1.
If 0 < α < 1, being “α times longer” actually means “shorter.” If
α < 0, the direction of αv is in fact opposite to the direction of v.
Finally, 0v = 0 is the zero vector represented by directed segments
$\overrightarrow{AA}$ of zero length.
Combining the operations of vector addition and multiplication
by scalars we can form expressions αu+βv+...+γw. They are called
linear combinations of the vectors u, v, ..., w with the coefficients
α, β, ..., γ.
The pictures of a parallelogram and parallelepiped (Figures 3 and
4) prove that the addition of vectors is commutative and associative: for all vectors u, v, w,
u + v = v + u and (u + v) + w = u + (v + w).
[Figure 3]
[Figure 4]
From properties of proportional segments and similar triangles,
the reader will easily derive the following two distributive laws:
for all vectors u, v and scalars α, β,
(α + β)u = αu + βu and α(u + v) = αu + αv.
Coordinates
From a point O in space, draw three directed segments $\overrightarrow{OA}$, $\overrightarrow{OB}$, and
$\overrightarrow{OC}$ not lying in the same plane and denote by i, j, and k the vectors
they represent. Then every vector $u = \overrightarrow{OU}$ can be uniquely written
as a linear combination of i, j, k (Figure 5):
u = αi + βj + γk.
The coefficients form the array (α, β, γ) of coordinates of the vector u (and of the point U) with respect to the basis i, j, k (or the
coordinate system OABC).
[Figure 5]
Multiplying u by a scalar λ or adding another vector u′ = α′i + β′j + γ′k,
and using the above algebraic properties of the operations
with vectors, we find:
λu = λαi + λβj + λγk, and u + u′ = (α + α′)i + (β + β′)j + (γ + γ′)k.
Thus, the geometric operations with vectors are expressed by componentwise operations with the arrays of their coordinates:
λ(α, β, γ) = (λα, λβ, λγ),
(α, β, γ) + (α′, β′, γ′) = (α + α′, β + β′, γ + γ′).
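The componentwise rules above translate directly into code. The following is a minimal sketch in Python, assuming vectors are stored as 3-tuples of coordinates with respect to a fixed basis; the names `scale`, `add`, and `linear_combination` are illustrative and not taken from the text.

```python
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


def scale(lam: float, u: Vec3) -> Vec3:
    """Multiply a vector by the scalar lam, coordinate by coordinate."""
    return (lam * u[0], lam * u[1], lam * u[2])


def add(u: Vec3, v: Vec3) -> Vec3:
    """Add two vectors, coordinate by coordinate."""
    return (u[0] + v[0], u[1] + v[1], u[2] + v[2])


def linear_combination(coeffs: Iterable[float], vectors: Iterable[Vec3]) -> Vec3:
    """Form the sum of coefficient * vector over the paired inputs."""
    result: Vec3 = (0.0, 0.0, 0.0)
    for c, v in zip(coeffs, vectors):
        result = add(result, scale(c, v))
    return result
```

For instance, `linear_combination([2, -1], [(1, 0, 0), (0, 1, 0)])` forms 2i − j in the coordinates of the basis i, j, k.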
What is a vector?
No doubt, the idea of vectors is not new to the reader. However,
some subtleties of the above introduction do not easily meet the eye,
and we would like to say here a few words about them.
Like many other mathematical notions, vectors come from physics,
where they represent quantities, such as velocities and forces, which
are characterized by their magnitude and direction. Yet, the popular
slogan “Vectors are magnitude and direction” does not qualify for a
mathematical definition of vectors, e.g. because it does not tell us
how to operate with them.
The computer science definition of vectors as arrays of numbers,
to be added “apples with apples, oranges with oranges,” will meet
the following objection by physicists. When a coordinate system
rotates, the coordinates of the same force or velocity will change, but
the numbers of apples and oranges won’t. Thus forces and velocities
are not arrays of numbers.
The geometric notion of a directed segment resolves this problem.
Note however, that calling directed segments vectors would constitute
abuse of terminology. Indeed, strictly speaking, directed segments
can be added only when the head of one of them coincides with the
tail of the other.
So, what is a vector? In our formulations, we actually avoided
answering this question directly, and said instead that two directed
segments represent the same vector if . . . Such wording is due to the pedagogical wisdom of the authors of elementary geometry textbooks,
because a direct answer sounds quite abstract: A vector is the class
of all directed segments obtained from each other by translation in
space. Such a class is shown in Figure 6.
Figure 6
This picture has another interpretation: For every point in space
(the tail of an arrow), it indicates a new position (the head). The geometric transformation in space defined this way is translation. This
leads to another attractive point of view: a vector is a translation.
Then the sum of two vectors is the composition of the translations.
The dot product
This operation encodes metric concepts of elementary Euclidean geometry, such as lengths and angles. Given two vectors u and v of
lengths |u| and |v| and making the angle θ to each other, their dot
product (also called inner product or scalar product) is a number
defined by the formula:
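$\langle u, v\rangle = |u|\,|v| \cos\theta.$
Of the following properties, the first three are easy (check them!):
(a) $\langle u, v\rangle = \langle v, u\rangle$ (symmetricity);
(b) $\langle u, u\rangle = |u|^2 > 0$ unless u = 0 (positivity);
(c) $\langle \lambda u, v\rangle = \lambda\langle u, v\rangle = \langle u, \lambda v\rangle$ (homogeneity);
(d) $\langle u + v, w\rangle = \langle u, w\rangle + \langle v, w\rangle$ (additivity with respect to the
first factor).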
To prove the last property, note that due to homogeneity, it suffices to check it assuming that w is a unit vector, i.e. |w| = 1. In
this case, consider (Figure 7) a triangle ABC such that $\overrightarrow{AB} = u$,
$\overrightarrow{BC} = v$, and therefore $\overrightarrow{AC} = u + v$, and let $\overrightarrow{OW} = w$. We can
consider the line OW as the number line, with the points O and W
representing the numbers 0 and 1 respectively, and denote by α, β, γ
the numbers representing perpendicular projections to this line of the
vertices A, B, C of the triangle. Then
$\langle \overrightarrow{AB}, w\rangle = \beta - \alpha$, $\langle \overrightarrow{BC}, w\rangle = \gamma - \beta$, and $\langle \overrightarrow{AC}, w\rangle = \gamma - \alpha$.
The required identity follows, because γ − α = (γ − β) + (β − α).
[Figure 7]
Combining the properties (c) and (d) with (a), we obtain the
following identities, expressing bilinearity of the dot product (i.e.
linearity with respect to each factor):
$\langle \alpha u + \beta v, w\rangle = \alpha\langle u, w\rangle + \beta\langle v, w\rangle,$
$\langle w, \alpha u + \beta v\rangle = \alpha\langle w, u\rangle + \beta\langle w, v\rangle.$
The following example illustrates the use of the nice algebraic properties
of the dot product in elementary geometry.
[Figure 8]
Example. Given a triangle ABC, let us denote by u and v the
vectors represented by the directed segments $\overrightarrow{AB}$ and $\overrightarrow{AC}$ and use
properties of the inner product in order to compute the length |BC|.
Notice that the segment $\overrightarrow{BC}$ represents v − u. We have:
$|BC|^2 = \langle v - u, v - u\rangle = \langle v, v\rangle + \langle u, u\rangle - 2\langle u, v\rangle = |AC|^2 + |AB|^2 - 2|AB|\,|AC| \cos\theta.$
This is the famous Law of Cosines in trigonometry.
When the vectors u and v are orthogonal, i.e. $\langle u, v\rangle = 0$, then
the formula turns into the Pythagorean theorem:
$|u \pm v|^2 = |u|^2 + |v|^2.$
When basis vectors i, j, k are pairwise orthogonal and unit, the
coordinate system is called Cartesian.¹ We have:
$\langle i, i\rangle = \langle j, j\rangle = \langle k, k\rangle = 1$, and $\langle i, j\rangle = \langle j, k\rangle = \langle k, i\rangle = 0.$
Thus, in Cartesian coordinates, the inner squares and the dot products of vectors r = xi + yj + zk and r′ = x′i + y′j + z′k are given by
the formulas:
$|r|^2 = x^2 + y^2 + z^2, \qquad \langle r, r'\rangle = xx' + yy' + zz'.$
¹ After René Descartes (1596–1650).
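The Cartesian formulas make lengths and angles computable from coordinates alone. Below is a small Python sketch, assuming vectors are given as tuples of Cartesian coordinates; the helper names are illustrative. It also checks the Law of Cosines numerically for one sample triangle (the sample coordinates are arbitrary).

```python
import math


def dot(r, s):
    """Dot product in Cartesian coordinates: xx' + yy' + zz'."""
    return sum(x * y for x, y in zip(r, s))


def length(r):
    """Length of a vector: the square root of its inner square."""
    return math.sqrt(dot(r, r))


def angle(r, s):
    """Angle between two non-zero vectors, from <r, s> = |r| |s| cos(theta)."""
    c = dot(r, s) / (length(r) * length(s))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.acos(max(-1.0, min(1.0, c)))


def subtract(r, s):
    """Coordinates of r - s."""
    return tuple(x - y for x, y in zip(r, s))


# Law of Cosines for a triangle ABC with u = AB and v = AC (sample data).
u, v = (1.0, 2.0, 2.0), (2.0, -1.0, 3.0)
bc_squared = dot(subtract(v, u), subtract(v, u))
law_of_cosines = (length(v) ** 2 + length(u) ** 2
                  - 2 * length(u) * length(v) * math.cos(angle(u, v)))
```

Up to rounding, `bc_squared` and `law_of_cosines` agree, as the Example above predicts.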
EXERCISES
1. A mass m rests on an inclined plane making 30° with the horizontal
plane. Find the forces of friction and reaction acting on the mass.
2. A ferry, capable of making 5 mph, shuttles across a river of width 0.6 mi
with a strong current of 3 mph. How long does each round trip take?
3. Prove that for every closed broken line ABC . . . DE,
$\overrightarrow{AB} + \overrightarrow{BC} + \cdots + \overrightarrow{DE} + \overrightarrow{EA} = 0.$
4. Three medians of a triangle ABC intersect at one point M called the
barycenter of the triangle. Let O be any point on the plane. Prove that
$\overrightarrow{OM} = \frac{1}{3}(\overrightarrow{OA} + \overrightarrow{OB} + \overrightarrow{OC}).$
5. Prove that $\overrightarrow{MA} + \overrightarrow{MB} + \overrightarrow{MC} = 0$ if and only if M is the barycenter of
the triangle ABC.
6.⋆ Along three circles lying in the same plane, vertices of a triangle are
moving clockwise with the equal constant angular velocities. Find how the
barycenter of the triangle is moving.
7. Prove that if AA′ is a median in a triangle ABC, then
$\overrightarrow{AA'} = \frac{1}{2}(\overrightarrow{AB} + \overrightarrow{AC}).$
8. Prove that from medians of a triangle, another triangle can be formed.
9. Sides of one triangle are parallel to the medians of another. Prove that
the medians of the latter triangle are parallel to the sides of the former one.
10. From medians of a given triangle, a new triangle is formed, and from
its medians, yet another triangle is formed. Prove that the third triangle
is similar to the first one, and find the coefficient of similarity.
11. Midpoints of AB and CD, and of BC and DE are connected by two
segments, whose midpoints are also connected. Prove that the resulting
segment is parallel to AE and congruent to AE/4.
12. Prove that a point X lies on the segment AB if and only if for any
origin O and some scalar 0 ≤ λ ≤ 1 the radius-vector $\overrightarrow{OX}$ has the form:
$\overrightarrow{OX} = \lambda\overrightarrow{OA} + (1 - \lambda)\overrightarrow{OB}.$
13.⋆ Given a triangle ABC, we construct a new triangle A′B′C′ in such a
way that A′ is centrally symmetric to A with respect to the center B, B′
centrally symmetric to B with respect to C, and C′ centrally symmetric
to C with respect to A, and then erase the original triangle. Reconstruct
ABC from A′B′C′ by straightedge and compass.
14. Prove the Cauchy–Schwarz inequality: $\langle u, v\rangle^2 \le \langle u, u\rangle\langle v, v\rangle$. In
which cases does the inequality turn into equality? Deduce the triangle
inequality: |u + v| ≤ |u| + |v|.
15. Compute the inner product $\langle \overrightarrow{AB}, \overrightarrow{BC}\rangle$ if ABC is a regular triangle
inscribed into a unit circle.
16. Prove that if the sum of three unit vectors is equal to 0, then the angle
between each pair of these vectors is equal to 120°.
17. Express the inner product $\langle u, v\rangle$ in terms of the lengths |u|, |v|, |u + v|
of the two vectors and of their sum.
18. (a) Prove that if four unit vectors lying in the same plane add up to 0,
then they form two pairs of opposite vectors. (b) Does this remain true if
the vectors do not have to lie in the same plane?
19.⋆ Let AB . . . E be a regular polygon with the center O. Prove that
$\overrightarrow{OA} + \overrightarrow{OB} + \cdots + \overrightarrow{OE} = 0.$
20. Prove that if u + v and u − v are perpendicular, then |u| = |v|.
21. For arbitrary vectors u and v, verify the equality:
$|u + v|^2 + |u - v|^2 = 2|u|^2 + 2|v|^2,$
and derive the theorem: The sum of the squares of the diagonals of a
parallelogram is equal to the sum of the squares of the sides.
22. Prove that for every triangle ABC and every point X in space,
$\overrightarrow{XA} \cdot \overrightarrow{BC} + \overrightarrow{XB} \cdot \overrightarrow{CA} + \overrightarrow{XC} \cdot \overrightarrow{AB} = 0.$
23.⋆ For four arbitrary points A, B, C, and D in space, prove that if the
lines AC and BD are perpendicular, then $AB^2 + CD^2 = BC^2 + DA^2$, and
vice versa.
24.⋆ Given a quadrilateral with perpendicular diagonals. Show that every
quadrilateral, whose sides are respectively congruent to the sides of the
given one, has perpendicular diagonals.
25.⋆ A regular triangle ABC is inscribed into a circle of radius R. Prove
that for every point X of this circle, $XA^2 + XB^2 + XC^2 = 6R^2$.
26.⋆ Let $A_1B_1A_2B_2 \dots A_nB_n$ be a 2n-gon inscribed into a circle. Prove
that the length of the vector $\overrightarrow{A_1B_1} + \overrightarrow{A_2B_2} + \cdots + \overrightarrow{A_nB_n}$ does not exceed
the diameter.
27.⋆ A polyhedron is filled with air under pressure. The pressure force to
each face is the vector perpendicular to the face, proportional to the area
of the face, and directed to the exterior of the polyhedron. Prove that the
sum of these vectors is equal to 0.
2 Quadratic Curves
Conic Sections
On the coordinate plane, consider points (x, y), satisfying an equation of the form
$ax^2 + 2bxy + cy^2 + dx + ey + f = 0.$
Generally speaking, such points form a curve. The set of solutions is
called a quadratic curve, provided that not all of the coefficients
a, b, c vanish.
Being a quadratic curve is a geometric property. Indeed, if the
coordinate system is changed (say, rotated, stretched, or translated),
the same curve will be described by a different equation, but the
L.H.S. of the equation will remain a polynomial of degree 2.
Our goal in this section is to describe all possible quadratic curves
geometrically (i.e. disregarding their positions with respect to coordinate systems); or, in other words, to classify quadratic equations
in two variables up to suitable changes of the variables.
[Figure 9: A. Givental, "The Empty Curve," 1999, oil on canvas]
Example: Dandelin’s spheres. The equation $x^2 + y^2 = z^2$
describes in a Cartesian coordinate system a cone (a half of which
is shown in Figure 10). Intersecting the cone by planes, we obtain
examples of quadratic curves. Indeed, substituting the equation z =
αx + βy of a secting plane into the equation of the cone, we get a
quadratic equation $x^2 + y^2 = (\alpha x + \beta y)^2$ (which actually describes
the projection of the conic section to the horizontal plane).
The conic section on the picture is an ellipse. According to one
of many equivalent definitions,² an ellipse consists of all points on the
plane with a fixed sum of the distances to two given points (called
foci of the ellipse). Our picture illustrates an elegant way³ to locate
foci of a conic section. Place into the conic cup two balls (a small
and a large one), and inflate the former and deflate the latter until
they touch the plane (one from inside, the other from outside). Then
the points F and G of tangency are the foci. Indeed, let A be an
arbitrary point on the conic section. The segments AF and AG lie in
the cutting plane and are therefore tangent to the balls at the points
F and G respectively. On the generatrix OA, mark the points B
and C where it crosses the circles of tangency of the cone with the
balls. Then AB and AC are tangent at these points to the respective
balls. Since all tangent segments from a given point to a given ball
have the same length, we find that |AF| = |AB|, and |AG| = |AC|.
Therefore |AF| + |AG| = |BC|. But |BC| is the distance along the
generatrix between two parallel horizontal circles on the cone, and is
the same for all generatrices. Therefore the sum |AF| + |AG| stays
fixed when the point A moves along our conic section.
[Figure 10]
Besides ellipses, we find among conic sections: hyperbolas (when
a plane cuts through both halves of the cone), parabolas (cut by
planes parallel to generatrices), and their degenerations (obtained
when the cutting plane is replaced with the parallel one passing
through the vertex O of the cone): just one point O, pairs of intersecting lines, and “double-lines.” We will see that this list exhausts
all possible quadratic curves, except two degenerate cases: pairs of
parallel lines and (yes!) empty curves.
² According to a mock definition, “an ellipse is a circle inscribed into a square
with unequal sides.”
³ Due to Germinal Pierre Dandelin (1794–1847).
Orthogonal Diagonalization (Toy Version)
Let (x, y) be Cartesian coordinates on a Euclidean plane, and let
Q be a quadratic form on the plane, i.e. a homogeneous degree-2
polynomial:
$Q(x, y) = ax^2 + 2bxy + cy^2.$
Theorem. Every quadratic form in a suitably rotated coordinate system assumes the form:
$Q = AX^2 + CY^2.$
Proof. Rotating the basis vectors i and j counter-clockwise through
the angle θ we obtain (Figure 11):
[Figure 11]
I = (cos θ)i + (sin θ)j and J = −(sin θ)i + (cos θ)j.
Therefore
xi + yj = XI + YJ = (X cos θ − Y sin θ)i + (X sin θ + Y cos θ)j.
This shows that the old coordinates (x, y) are expressed in terms of
the new coordinates (X, Y) by the formulas
x = X cos θ − Y sin θ, y = X sin θ + Y cos θ. (∗)
Substituting into $ax^2 + 2bxy + cy^2$, we rewrite the quadratic form
in the new coordinates as $AX^2 + 2BXY + CY^2$, where A, B, C are
certain expressions of a, b, c and θ. We want to show that choosing
the rotation angle θ appropriately, we can make 2B = 0. Indeed,
making the substitution explicitly and ignoring $X^2$- and $Y^2$-terms,
we find Q in the form
$\cdots + XY\left(-2a \sin\theta\cos\theta + 2b(\cos^2\theta - \sin^2\theta) + 2c \sin\theta\cos\theta\right) + \cdots$
Thus 2B = (c − a) sin 2θ + 2b cos 2θ. When b = 0, our task is trivial,
as we can take θ = 0. When b ≠ 0, we can divide by 2b to obtain
$\cot 2\theta = \frac{a - c}{2b}.$
Since cot assumes arbitrary real values, the theorem follows.
Example. For $Q = x^2 + xy + y^2$, we have cot 2θ = 0, and
find 2θ = π/2 + πk (k = 0, ±1, ±2, . . . ), i.e. up to multiples of
2π, θ = ±π/4 or ±3π/4. (This is a general rule: together with a
solution θ, the angle θ + π as well as θ ± π/2, also work. Could
you give an a priori explanation?) Taking θ = π/4, we compute
$x = (X - Y)/\sqrt{2}$, $y = (X + Y)/\sqrt{2}$, and finally find:
$x^2 + y^2 + xy = X^2 + Y^2 + \frac{1}{2}(X^2 - Y^2) = \frac{3}{2}X^2 + \frac{1}{2}Y^2.$
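The proof is constructive, so it can be tried numerically. Here is a hedged Python sketch: `rotation_angle` picks one angle satisfying cot 2θ = (a − c)/(2b), and `rotate_form` substitutes the formulas (∗) into Q. Both function names, and the use of `atan2` to choose a particular solution, are choices made for this illustration rather than part of the text.

```python
import math


def rotation_angle(a, b, c):
    """One angle theta with cot(2*theta) = (a - c)/(2*b), for Q = a x^2 + 2 b x y + c y^2."""
    if b == 0:
        return 0.0  # the form is already diagonal
    # atan2(2b, a - c) is an angle 2*theta with sin proportional to 2b
    # and cos proportional to a - c, so its cotangent is (a - c)/(2b).
    return 0.5 * math.atan2(2 * b, a - c)


def rotate_form(a, b, c, theta):
    """Coefficients (A, B, C) of Q = A X^2 + 2 B X Y + C Y^2 after substituting (*)."""
    s, k = math.sin(theta), math.cos(theta)
    A = a * k * k + 2 * b * s * k + c * s * s
    B = (c - a) * s * k + b * (k * k - s * s)
    C = a * s * s - 2 * b * s * k + c * k * k
    return A, B, C


# The example from the text: Q = x^2 + xy + y^2, i.e. a = 1, 2b = 1, c = 1.
theta = rotation_angle(1.0, 0.5, 1.0)
A, B, C = rotate_form(1.0, 0.5, 1.0, theta)
```

For this form the sketch selects θ = π/4 and yields A = 3/2, C = 1/2 with B vanishing up to rounding, in agreement with the computation above.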
Completing the Squares
In our study of quadratic curves, the plan is to simplify the equation
of the curve as much as possible by changing the coordinate system.
In doing so we may assume that the coordinate system has already
been rotated to make the coefficient at xy-term vanish. Therefore
the equation at hand assumes the form
$ax^2 + cy^2 + dx + ey + f = 0,$
where a and c cannot both be zero. Our next step is based on
completing squares: whenever one of these coefficients (say, a) is
non-zero, we can remove the corresponding linear term (dx) this way:
$ax^2 + dx = a\left(x^2 + \frac{d}{a}x\right) = a\left[\left(x + \frac{d}{2a}\right)^2 - \frac{d^2}{4a^2}\right] = aX^2 - \frac{d^2}{4a}.$
Here X = x + d/2a, and this change represents translation of the
origin of the coordinate system from the point (x, y) = (0, 0) to
(x, y) = (−d/2a, 0).
Example. The equation $x^2 + y^2 = 2ry$ can be rewritten by completing the square in y as $x^2 + (y - r)^2 = r^2$. Therefore, it describes
the circle of radius r centered at the point (0, r) on the y-axis.
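As a quick check of the completing-the-square identity, here is a minimal Python sketch. The function name `complete_square` is hypothetical, and exact rational arithmetic via `fractions.Fraction` is an assumption made so that no rounding enters.

```python
from fractions import Fraction


def complete_square(a, d):
    """Return (h, k) such that a*x^2 + d*x == a*(x + h)^2 + k, assuming a != 0."""
    a, d = Fraction(a), Fraction(d)
    if a == 0:
        raise ValueError("cannot complete the square when a = 0")
    h = d / (2 * a)          # the shift: X = x + d/(2a)
    k = -d * d / (4 * a)     # the constant left over: -d^2/(4a)
    return h, k
```

For example, `complete_square(1, -2)` returns the shift −1 and the constant −1, i.e. x² − 2x = (x − 1)² − 1.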
With the operations of completing the squares in one or both
variables, renaming the variables if necessary, and dividing the whole
equation by a non-zero number (which does not change the quadratic
curve), we are well-armed to obtain the classification.
Classification of Quadratic Curves
Case I: a ≠ 0 ≠ c. The equation is reduced to $aX^2 + cY^2 = F$ by
completing squares in each of the variables.
Sub-case (i): F ≠ 0. Dividing the whole equation by F, we
obtain the equation $(a/F)X^2 + (c/F)Y^2 = 1$. When both a/F and
c/F are positive, the equation can be re-written as
$\frac{X^2}{\alpha^2} + \frac{Y^2}{\beta^2} = 1.$
This is the equation of an ellipse with semiaxes α and β (Figure
12). When a/F and c/F have opposite signs, we get (possibly
renaming the variables) the equation of a hyperbola (Figure 13)
$\frac{X^2}{\alpha^2} - \frac{Y^2}{\beta^2} = 1.$
When a/F and c/F are both negative, the equation has no real
solutions, so that the quadratic curve is empty (Figure 9).
[Figure 12]
[Figure 13]
Sub-case (ii): F = 0. Then, when a and c have opposite signs
(say, $a = \alpha^2 > 0$, and $c = -\gamma^2 < 0$), the equation $\alpha^2X^2 = \gamma^2Y^2$
describes a pair of intersecting lines Y = ±kX, where k = α/γ
(Figure 14). When a and c are of the same sign, the equation $aX^2 + cY^2 = 0$
has only one real solution: (X, Y) = (0, 0). The quadratic
curve is a “thick” point.⁴
⁴ In fact this is the point of intersection of a pair of “imaginary” lines consisting
of non-real solutions.
Case II: One of a, c is 0. We may assume without loss of generality that c = 0. Since a ≠ 0, we can still complete the square in x
to obtain an equation of the form $aX^2 + ey + F = 0$.
Sub-case (i): e ≠ 0. Divide the whole equation by e and put
Y = y + F/e to arrive at the equation $Y = -aX^2/e$. This curve is a
parabola $Y = kX^2$, where k = −a/e ≠ 0 (Figure 15).
Sub-case (ii): e = 0. The equation $X^2 = -F/a$ describes: a
pair of parallel lines X = ±k (where $k = \sqrt{-F/a}$), or the empty set
(when F/a > 0), or a “double-line” X = 0 (when F = 0).
[Figure 14]
[Figure 15]
We have proved the following:
Theorem. Every quadratic curve on a Euclidean plane is
one of the following: an ellipse, hyperbola, parabola, a pair
of intersecting, parallel, or coinciding lines, a “thick” point
or the empty set. In a suitable Cartesian coordinate system,
the curve is described by one of the standard equations:
$\frac{X^2}{\alpha^2} \pm \frac{Y^2}{\beta^2} = 1,\ -1,\ \text{or } 0; \qquad Y = kX^2; \qquad X^2 = k.$
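The case analysis behind the theorem can be followed mechanically. The sketch below, in Python, classifies an equation $ax^2 + cy^2 + dx + ey + f = 0$ whose xy-term has already been removed by a rotation. The function name `classify` and the returned labels are illustrative, and exact rational arithmetic is assumed so that the comparisons with zero are meaningful.

```python
from fractions import Fraction


def classify(a, c, d, e, f):
    """Classify a x^2 + c y^2 + d x + e y + f = 0 (no xy-term), following Cases I and II."""
    a, c, d, e, f = (Fraction(t) for t in (a, c, d, e, f))
    if a == 0 and c == 0:
        raise ValueError("not a quadratic curve: a and c both vanish")
    if a != 0 and c != 0:
        # Case I: completing both squares gives a X^2 + c Y^2 = F.
        F = d * d / (4 * a) + e * e / (4 * c) - f
        if F != 0:
            p, q = a / F, c / F
            if p > 0 and q > 0:
                return "ellipse"
            if p < 0 and q < 0:
                return "empty set"
            return "hyperbola"
        return "pair of intersecting lines" if a * c < 0 else "point"
    # Case II: rename the variables if necessary so that c = 0.
    if a == 0:
        a, c, d, e = c, a, e, d
    if e != 0:
        return "parabola"
    # Here the equation reads a X^2 + F = 0, i.e. X^2 = -F/a.
    F = f - d * d / (4 * a)
    if F == 0:
        return "double line"
    return "pair of parallel lines" if -F / a > 0 else "empty set"
```

For instance, `classify(1, 1, 0, -2, 0)` treats the circle $x^2 + y^2 = 2y$ from the earlier Example and reports an ellipse, of which a circle is a special case.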
EXERCISES
28. Prove that with the exception of parabolas, each conic section has a
center of symmetry.
29. Prove that a hyperbolic conic section consists of all points on the secting
plane with a fixed difference of the distances to two points (called foci).
Locate the foci by adjusting the construction of Dandelin’s spheres.
30. Locate foci of (a) ellipses and (b) hyperbolas given by the standard
equations $x^2/\alpha^2 \pm y^2/\beta^2 = 1$, where α > β > 0.
31. A line is called an axis of symmetry of a given function Q(x, y) if the
function takes on the same values at every pair of points symmetric about
this line. Prove that every quadratic form has two perpendicular axes of
symmetry. (They are called principal axes.)
32. Prove that if a line passing through the origin is an axis of symmetry
of a quadratic form $Q = ax^2 + 2bxy + cy^2$, then the perpendicular line is
also its axis of symmetry.
33. Can a quadratic form on the plane have more than two axes of symmetry?
34. Find axes of symmetry of the following quadratic forms Q:
(a) $x^2 + xy + y^2$, (b) $x^2 + 2xy + y^2$, (c) $x^2 + 4xy + y^2$.
Which of them have level curves Q = const ellipses? hyperbolas?
35. Transform the equation $23x^2 + 72xy + 2y^2 = 25$ to one of the standard
forms by rotating the coordinate system explicitly.
36. Prove that ellipses are obtained by stretching (or shrinking) the unit
circle in two perpendicular directions with two different coefficients.
37. Prove that every quadratic form on the plane in a suitable (but not
necessarily Cartesian) coordinate system assumes one of the forms:
$X^2 + Y^2,\ X^2 - Y^2,\ -X^2 - Y^2,\ X^2,\ -Y^2,\ 0.$
Sketch graphs of these functions.
38. Complete squares to find out which of the following curves are ellipses
and which are hyperbolas:
$x^2 + 4xy = 1,\quad x^2 + 2xy + 4y^2 = 1,\quad x^2 + 4xy + 4y^2 = 1,\quad x^2 + 6xy + 4y^2 = 1.$
39. Find the place of the quadratic curve $x^2 - 4y^2 = 2x - 4y$ in the classification of quadratic curves.
40.⋆ Prove that $ax^2 + 2bxy + cy^2 = 1$ is a hyperbola if and only if $ac < b^2$.
[Figure 16]
41. Examine Figure 16 (showing two cones centrally symmetric to each
other about the center of the ellipse E), and prove that |AO| + |AO′| =
|BB′|. Derive that the vertex O of the cone is a focus of the projection of a
conic section along the axis of the cone to the perpendicular plane passing
through its vertex.⁵
⁵ The proof based on Figure 16 is due to Anil Hirani.
42.⋆ Prove that ellipses, parabolas and hyperbolas can be characterized as
plane curves formed by all points with a fixed ratio e (called eccentricity)
between the distances to a fixed point (a focus) and a fixed line (called the
directrix), and that e > 1 for hyperbolas, e = 1 for parabolas, and 1 > e > 0
for ellipses (e.g. e = |AO|/|AC| in Figure 16).
43.⋆ Prove that light rays emitted from one focus of an ellipse and reflected
in it as in a mirror will focus at the other focus. Formulate and prove
similar optical properties of hyperbolas and parabolas.
3 Complex Numbers
Law and Order
Life is unfair: The quadratic equation $x^2 - 1 = 0$ has two solutions
x = ±1, but a similar equation $x^2 + 1 = 0$ has no solutions at all. To
restore justice one introduces a new number i, the imaginary unit,
such that $i^2 = -1$, and thus x = ±i become two solutions to the
equation. This is how complex numbers could have been invented.
More formally, complex numbers are introduced as ordered pairs
(a, b) of real numbers, written in the form z = a + bi. The real
numbers a and b are called respectively the real part and imaginary part of the complex number z, and are denoted a = Re z and
b = Im z.
The sum of z = a + bi and w = c + di is defined as
z + w = (a + c) + (b + d)i.
The product is defined so as to comply with the relation $i^2 = -1$:
$zw = ac + bdi^2 + adi + bci = (ac - bd) + (ad + bc)i.$
The operations of addition and multiplication of complex numbers
enjoy the same properties as those of real numbers do. In particular,
the product is commutative and associative.
The complex number z̄ = a − bi is called complex conjugate
to z = a + bi. The operation of complex conjugation respects sums
and products:
$\overline{z + w} = \bar{z} + \bar{w}$ and $\overline{zw} = \bar{z}\,\bar{w}.$
This can be easily checked from definitions, but there is a more profound explanation. The equation $x^2 + 1 = 0$ has two roots, i and −i,
and the choice of the one to be called i is totally ambiguous. The
complex conjugation consists in systematic renaming i by −i and
vice versa, and such renaming cannot affect properties of complex
numbers.
Complex numbers satisfying z̄ = z are exactly the real numbers
a+0i. We will see that this point of view on real numbers as complex
numbers invariant under complex conjugation is quite fruitful.
The product $z\bar{z} = a^2 + b^2$ (check this formula!) is real, and is
positive unless z = 0 + 0i = 0. This shows that
$\frac{1}{z} = \frac{\bar{z}}{z\bar{z}} = \frac{a}{a^2 + b^2} - \frac{b}{a^2 + b^2}\, i.$
Hence the division by z is well-defined for any non-zero complex
number z. In terminology of Abstract Algebra, complex numbers
form therefore a field⁶ (just as real or rational numbers do).
The field of complex numbers is denoted by C (while R stands
for reals, and Q for rationals).
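The definitions above can be exercised directly on ordered pairs. The following Python sketch represents a complex number a + bi as the pair (a, b); the function names are illustrative. Python's built-in `complex` type already provides these operations, so the sketch only mirrors the formulas of the text.

```python
def add(z, w):
    """(a + bi) + (c + di) = (a + c) + (b + d)i."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)


def mul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


def conj(z):
    """Complex conjugate: a + bi goes to a - bi."""
    a, b = z
    return (a, -b)


def inverse(z):
    """1/z = conj(z) / (z * conj(z)), defined for every z other than 0."""
    a, b = z
    n = a * a + b * b  # z * conj(z): real, and positive unless z = 0
    if n == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return (a / n, -b / n)
```

For example, `mul((0, 1), (0, 1))` returns (−1, 0), which is the relation i² = −1.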
The non-negative real number $|z| = \sqrt{z\bar{z}} = \sqrt{a^2 + b^2}$ is called the
absolute value of z. The absolute value function is multiplicative:
$|zw| = \sqrt{zw\,\overline{zw}} = \sqrt{z\bar{z}\,w\bar{w}} = |z| \cdot |w|.$
It actually coincides with the absolute value of real numbers when
applied to complex numbers with zero imaginary part: |a + 0i| = |a|.
[Figure 17]
Geometry
We can identify complex numbers z = a + bi with points (a, b) on
the real coordinate plane (Figure 17). This way, the number 0 is
identified with the origin, and 1 and i become the unit basis vectors
(1, 0) and (0, 1). The coordinate axes are called respectively the real
and imaginary axes. Addition of complex numbers coincides with
the operation of vector sum (Figure 18).
The absolute value function has the geometrical meaning of the
distance from the origin: $|z| = \langle z, z\rangle^{1/2}$. In particular, the triangle
inequality |z + w| ≤ |z| + |w| holds true. Complex numbers of unit
absolute value |z| = 1 form the unit circle centered at the origin.
The operation of complex conjugation acts on the radius-vectors
z as the reflection about the real axis.
⁶ This requires that a set be equipped with commutative and associative operations (called addition and multiplication) satisfying the distributive law
z(v + w) = zv + zw, possessing the zero and unit elements 0 and 1, additive
opposites −z for every z, and multiplicative inverses 1/z for every z ≠ 0.
In order to describe a geometric meaning of complex multiplication, let us study the way multiplication by a given complex number
z acts on all complex numbers w, i.e. consider the function w 7→ zw.
For this, write the vector representing a non-zero complex number
z in the polar (or trigonometric) form z = ru where r = |z| is a
positive real number, and u = z/|z| = cos θ + i sin θ has absolute
value 1 (see Figure 19). Here θ = arg z, called the argument of the
complex number z, is the angle that z as a vector makes with the
positive direction of the real axis.
Clearly, multiplication by r acts on all vectors w by stretching
them r times. Multiplication by u applied to w = x + yi yields a new
complex number uw = X + Y i according to the rule:
X = Re [(cos θ + i sin θ)(x + yi)] = x cos θ − y sin θ
Y = Im [(cos θ + i sin θ)(x + yi)] = x sin θ + y cos θ.
Comparing with the formula (∗) in Section 2, we conclude that the
transformation w ↦ uw is the counter-clockwise rotation through
the angle θ.
[Figure 18: the vector sum z + w. Figure 19: multiplication of w by z, which rotates w through the angle θ.]
Notice a difference though: In Section 2, we rotated the coordinate system, and the formulas (∗) expressed old coordinates of a
vector via new coordinates of the same vector. This time, we transform vectors, while the coordinate system remains unchanged. The
same formulas now express coordinates (X, Y ) of a new vector in
terms of the coordinates (x, y) of the old one.
Anyway, the conclusion is that multiplication by z is the composition of two operations: stretching |z| times, and rotating through
the angle arg z.
In other words, the product operation of two complex numbers
sums their arguments and multiplies absolute values:
|zw| = |z| · |w|, arg zw = arg z + arg w modulo 2π.
For example, if $z = r(\cos\theta + i\sin\theta)$, then $z^n = r^n(\cos n\theta + i\sin n\theta)$.
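The rule just stated can be checked numerically. The following is only a sketch using Python's standard `cmath` module; the sample values of z, w and n are our own choice and are not taken from the text.

```python
import cmath
import math

# Sample numbers given in polar form (chosen only for illustration).
z = cmath.rect(2.0, math.pi / 6)   # |z| = 2, arg z = pi/6
w = cmath.rect(3.0, math.pi / 4)   # |w| = 3, arg w = pi/4

# Absolute value and argument of the product.
r, theta = cmath.polar(z * w)
moduli_multiply = math.isclose(r, abs(z) * abs(w))
arguments_add = math.isclose(theta, math.pi / 6 + math.pi / 4)

# The n-th power formula: z^n = r^n (cos n*theta + i sin n*theta).
n = 5
power_direct = z ** n
power_polar = cmath.rect(2.0 ** n, n * math.pi / 6)
powers_agree = cmath.isclose(power_direct, power_polar)

print(moduli_multiply, arguments_add, powers_agree)
```

Note that `cmath.polar` reports the argument in the interval (−π, π], so for other sample values the sum of arguments has to be compared modulo 2π, exactly as in the formula above.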
Chapter 1. A CRASH COURSE
The Fundamental Theorem of Algebra
A degree 2 polynomial $z^2 + pz + q$ has two roots
\[ z_\pm = \frac{-p \pm \sqrt{p^2 - 4q}}{2}. \]
This quadratic formula works regardless of the sign of the discriminant $p^2 - 4q$, provided that we allow the roots to be complex and take multiplicity into account. Namely, if $p^2 - 4q = 0$, then $z^2 + pz + q = (z + p/2)^2$, and therefore the single root $z = -p/2$ has multiplicity two. If $p^2 - 4q < 0$, the roots are complex conjugate with $\operatorname{Re} z_\pm = -p/2$, $\operatorname{Im} z_\pm = \pm\sqrt{|p^2 - 4q|}/2$. The Fundamental Theorem of Algebra shows that not only has justice been restored, but that any degree n polynomial has n complex roots, possibly multiple.
Theorem. A degree n polynomial
\[ P(z) = z^n + a_1 z^{n-1} + \dots + a_{n-1} z + a_n \]
with complex coefficients $a_1, \dots, a_n$ factors as
\[ P(z) = (z - z_1)^{m_1} \cdots (z - z_r)^{m_r}. \]
Here $z_1, \dots, z_r$ are complex roots of P, and $m_1, \dots, m_r$ their multiplicities, $m_1 + \dots + m_r = n$.
A proof of this theorem deserves a separate chapter (if not a
book). Many proofs are known, based on various ideas of Algebra,
Analysis or Topology. We refer to [6] for an exposition of the classical
proof due to Euler, Lagrange and de Foncenex, which is almost entirely algebraic. Here we merely illustrate the theorem with several
examples.
Examples. (a) To solve the quadratic equation $z^2 = w$, equate the absolute value ρ and argument φ of the given complex number w with those of $z^2$:
\[ |z|^2 = \rho, \quad 2\arg z = \phi + 2\pi k, \quad k = 0, \pm 1, \pm 2, \dots \]
We find: $|z| = \sqrt{\rho}$, and $\arg z = \phi/2 + \pi k$. Increasing arg z by even multiples of π does not change z, and by odd multiples changes z to −z. Thus the equation has two solutions:
\[ z = \pm\sqrt{\rho}\left(\cos\frac{\phi}{2} + i\sin\frac{\phi}{2}\right). \]
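The recipe of Example (a) translates directly into a few lines of code. This is a sketch only; the function name `square_roots` and the sample inputs are hypothetical, and Python's `cmath.polar` and `cmath.rect` are used for the conversion to and from polar form.

```python
import cmath
import math

def square_roots(w):
    """Return the two solutions of z**2 == w, found via the polar form of w."""
    rho, phi = cmath.polar(w)                 # w = rho (cos phi + i sin phi)
    z = cmath.rect(math.sqrt(rho), phi / 2)   # |z| = sqrt(rho), arg z = phi/2
    return z, -z                              # the second root is the opposite one

z1, z2 = square_roots(-4)    # square roots of a negative real number
z3, z4 = square_roots(1j)    # square roots of i

for root, target in [(z1, -4), (z2, -4), (z3, 1j), (z4, 1j)]:
    print(root, cmath.isclose(root * root, target))
```

For w = 0 both returned values coincide, in agreement with the remark that the two roots are distinct unless both are equal to 0.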
(b) The equation $z^2 + pz + q = 0$ with coefficients p, q ∈ C has two complex solutions given by the quadratic formula (see above), because according to Example (a), the square root of a complex number takes on two opposite values (distinct, unless both are equal to 0).
(c) The complex numbers 1, i, −1, −i are the roots of the polynomial $z^4 - 1 = (z^2 - 1)(z^2 + 1) = (z - 1)(z + 1)(z - i)(z + i)$.
(d) There are n complex nth roots of unity (see Figure 20, where n = 5). Namely, if $z = r(\cos\theta + i\sin\theta)$ satisfies $z^n = 1$, then $r^n = 1$ (and hence r = 1), and $n\theta = 2\pi k$, $k = 0, \pm 1, \pm 2, \dots$. Therefore $\theta = 2\pi k/n$, where only the remainder of k modulo n is relevant. Thus the n roots are:
\[ z = \cos\frac{2\pi k}{n} + i\sin\frac{2\pi k}{n}, \quad k = 0, 1, 2, \dots, n-1. \]
For instance, if n = 3, the roots are 1 and
\[ \cos\frac{2\pi}{3} \pm i\sin\frac{2\pi}{3} = -\frac{1}{2} \pm i\,\frac{\sqrt{3}}{2}. \]
[Figure 20: the nth roots of unity for n = 5.]
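The formula for the nth roots of unity can be tabulated numerically. The sketch below is ours (the helper name `roots_of_unity` is hypothetical) and uses floating-point arithmetic, so equalities hold only up to rounding.

```python
import cmath
import math

def roots_of_unity(n):
    """The n complex solutions of z**n == 1, listed for k = 0, 1, ..., n - 1."""
    return [cmath.rect(1.0, 2 * math.pi * k / n) for k in range(n)]

cube_roots = roots_of_unity(3)

# Each listed number satisfies z^3 = 1 up to rounding error.
for z in cube_roots:
    print(z, cmath.isclose(z ** 3, 1))

# The two non-real cube roots, compared with -1/2 +- i sqrt(3)/2.
expected = [complex(-0.5, math.sqrt(3) / 2), complex(-0.5, -math.sqrt(3) / 2)]
print(all(cmath.isclose(a, b) for a, b in zip(cube_roots[1:], expected)))
```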
As illustrated by the previous two examples, even if all coefficients $a_1, \dots, a_n$ of a polynomial P are real, its roots don’t have to be real. But then the non-real roots come in pairs of complex conjugate ones. To verify this, we can use the fact that being real means staying invariant (i.e. unchanged) under complex conjugation. Namely, $\bar a_i = a_i$ for all i means that
\[ \overline{P(\bar z)} = z^n + \bar a_1 z^{n-1} + \dots + \bar a_n = P(z). \]
Therefore we have two factorizations of the same polynomial:
\[ \overline{P(\bar z)} = (z - \bar z_1)^{m_1} \cdots (z - \bar z_r)^{m_r} = (z - z_1)^{m_1} \cdots (z - z_r)^{m_r} = P(z). \]
They can differ only by the order of the factors. Thus, for each non-real root $z_i$ of P, the complex conjugate $\bar z_i$ must also be a root, and of the same multiplicity.
Expanding the product
\[ (z - z_1) \cdots (z - z_n) = z^n - (z_1 + \dots + z_n) z^{n-1} + \dots + (-1)^n z_1 \cdots z_n \]
we can express coefficients $a_1, \dots, a_n$ of the polynomial in terms of the roots $z_1, \dots, z_n$ (here multiple roots are repeated according to their multiplicities). In particular, the sum and product of the roots are
\[ z_1 + \dots + z_n = -a_1, \quad z_1 \cdots z_n = (-1)^n a_n. \]
These relations generalize Vieta’s theorem $z_+ + z_- = -p$, $z_+ z_- = q$ about roots $z_\pm$ of quadratic equations $z^2 + pz + q = 0$.
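The expansion above can be carried out mechanically by multiplying in one factor (z − z_k) at a time. The following sketch (the function name `poly_from_roots` and the sample roots are our own) checks the two relations for the sum and the product of the roots on a small example.

```python
import math

def poly_from_roots(roots):
    """Coefficients [1, a1, ..., an] of (z - z1)...(z - zn), highest degree first."""
    coeffs = [1]
    for root in roots:
        # Multiply the current polynomial by (z - root).
        shifted = coeffs + [0]                      # current polynomial times z
        scaled = [0] + [root * c for c in coeffs]   # current polynomial times root
        coeffs = [s - t for s, t in zip(shifted, scaled)]
    return coeffs

roots = [1, 2, 3]
coeffs = poly_from_roots(roots)
a1, an = coeffs[1], coeffs[-1]
n = len(roots)

print(coeffs)
print(sum(roots) == -a1)                     # z1 + ... + zn = -a1
print(math.prod(roots) == (-1) ** n * an)    # z1 ... zn = (-1)^n an
```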
EXERCISES
44. Can complex numbers be: real? real and imaginary? neither?
45. Compute: (a) $(1 + i)/(3 - 2i)$; (b) $(\cos \pi/3 + i \sin \pi/3)^{-1}$.
46. Verify the commutative and distributive laws for multiplication of complex numbers.
47. Show that $z^{-1}$ is real proportional to $\bar z$ and find the proportionality coefficient.
48. Find all z satisfying the equations: |z − 1| = |z + 1| = 2.
49. Sketch the solution set to the following system of inequalities: |z − 1| ≤ 1, |z| ≤ 1, Re(iz) ≤ 0.
50. Compute absolute values and arguments of (a) $1 - i$, (b) $1 - i\sqrt{3}$.
51. Compute $\left(\dfrac{\sqrt{3} + i}{2}\right)^{100}$.
52. Express cos 3θ and sin 3θ in terms of cos θ and sin θ.
53. Express $\cos(\theta_1 + \theta_2)$ and $\sin(\theta_1 + \theta_2)$ in terms of $\cos\theta_i$ and $\sin\theta_i$.
54. Prove Bézout’s theorem⁷: A number $z_0$ is a root of a polynomial P in one variable z if and only if P is divisible by $z - z_0$.
55. Find roots of degree 2 polynomials: $z^2 - 4z + 5$, $z^2 - iz + 1$, $z^2 - 2(1 + i)z + 2i$, $z^2 - 2z + i\sqrt{3}$.
56. Find all roots of polynomials: $z^3 + 8$, $z^3 + i$, $z^4 + 4z^2 + 4$, $z^4 - 2z^2 + 4$, $z^6 + 1$.
57. Prove that every polynomial with real coefficients factors into the product of polynomials of degree 1 and 2 with real coefficients.
58. Prove that the sum of all 5th roots of unity is equal to 0.
59.⋆ Find general Vieta’s formulas⁸ expressing all coefficients of a polynomial in terms of its roots.
⁷ Named after Étienne Bézout (1730–1783).
⁸ Named after François Viète (1540–1603), also known as Franciscus Vieta.
4
Problems of Linear Algebra
One of our goals in this book is to equip the reader with a unifying
view of Linear Algebra, or at least of what is studied under this name
in traditional university courses. Following this mission, we give here
a preview of the subject and describe its main achievements in lay
terms.
To begin with a few words of praise: Linear Algebra is a very
simple and useful subject, underlying most other areas of mathematics, as well as its applications to physics, engineering, and economics. What makes Linear Algebra useful and efficient is that it provides ultimate solutions to several important mathematical problems. Furthermore, as should be expected of a truly fruitful mathematical theory, the problems it solves can be formulated in a rather elementary language and make sense even before any advanced machinery is developed. Even better, the answers to these problems can also be described in elementary terms (in contrast with the justification of those answers, which is better postponed until adequate
tools are developed). Finally, those several problems we are talking
about are similar in their nature; namely, they all have the form of
problems of classification of very basic mathematical objects. Before presenting explicitly the problems and the answers, we need to
discuss the general idea of classification in mathematics.
Classifications in Mathematics
Classifications are intended to bring order into seemingly complex
or chaotic matters. Yet, there is a major difference between, say,
our classification of quadratic curves and Carl Linnaeus’ Systema
Naturae.
For two quadratic curves to be in the same class, it is not enough
that they share a number of features. What is required is a transformation of a prescribed type that would transform one of the curves
into the other, and thus make them equivalent in this sense, i.e. up
to such transformations.
What types of transformations are allowed (e.g., changes to arbitrary new coordinate systems, or only to Cartesian ones) may be
a matter of choice. With every choice, the classification of objects
of a certain kind (i.e. quadratic curves in our example) up to transformations of the selected type becomes a well-posed mathematical
problem.
A complete answer to a classification problem should consist of
– a list of normal (or canonical) forms, i.e. representatives of the
classes of equivalence, and
– a classification theorem establishing that each object of the kind
(quadratic curve in our example) is equivalent to exactly one of the
normal forms; in other words, that
(i) each object can be transformed into a normal form, and
(ii) no two normal forms can be transformed into each other.
Simply put, Linear Algebra deals with classifications of linear
and/or quadratic equations, or systems of such equations. One might
think that all that equations do is ask: Solve us! Unfortunately this
attitude toward equations does not lead too far. It turns out that
very few equations (and kinds of equations) can be explicitly solved,
but all can be studied and many classified.
The idea is to replace a given “hard” (possibly unsolvable) equation with another one, the normal form, which should be chosen to
be as “easy” as it is possible to find in the same equivalence class.
Then the normal form should be studied (and hopefully “solved”)
thus providing information about the original “hard” equation.
What sort of information? Well, any sort that remains invariant
under the equivalence transformations in question.
For example, in classification of quadratic curves up to changes
of Cartesian coordinate systems, all equivalent ellipses are indistinguishable from each other geometrically (in particular, they have the
same semiaxes) and differ only by the choice of a Cartesian coordinate system. However, if arbitrary rescaling of coordinates is also
allowed, then all ellipses become indistinguishable from circles (but
still different from hyperbolas, parabolas, etc.)
Whether a classification theorem really simplifies matters depends on the kind of objects in question, the chosen type of equivalence transformations, and the applications in mind. In practice,
the problem often reduces to finding sufficiently simple normal forms
and studying them in great detail.
The subject of Linear Algebra fits well into the general philosophy just outlined. Below we formulate four classification problems
and respective answers. Together with a number of variations and
applications, which will be presented later in due course, they form
what is usually considered the main course of Linear Algebra.
The Rank Theorem
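Question. Given m linear functions in n variables,
\[ \begin{aligned} y_1 &= a_{11}x_1 + \dots + a_{1n}x_n \\ &\;\;\vdots \\ y_m &= a_{m1}x_1 + \dots + a_{mn}x_n, \end{aligned} \]
what is the simplest form to which they can be transformed by linear changes of the variables,
\[ \begin{aligned} y_1 &= b_{11}Y_1 + \dots + b_{1m}Y_m \\ &\;\;\vdots \\ y_m &= b_{m1}Y_1 + \dots + b_{mm}Y_m, \end{aligned} \qquad \begin{aligned} x_1 &= c_{11}X_1 + \dots + c_{1n}X_n \\ &\;\;\vdots \\ x_n &= c_{n1}X_1 + \dots + c_{nn}X_n \end{aligned} \quad ? \]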
Theorem. Every system of m linear functions in n variables can be transformed by suitable linear changes of dependent and independent variables to exactly one of the normal forms:
\[ Y_1 = X_1, \ \dots, \ Y_r = X_r, \ Y_{r+1} = 0, \ \dots, \ Y_m = 0, \quad \text{where } 0 \le r \le m, n. \]
The number r featuring in the answer is called the rank of the
given system of m linear functions.
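In practice the rank of a given system is usually computed from its matrix of coefficients $a_{ij}$ by Gaussian elimination, counting the pivots. The sketch below assumes this standard fact (it is not proved at this point of the text); the function name `rank` and the sample matrices are ours, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                                      # number of pivots found so far
    for col in range(len(m[0]) if m else 0):
        # Look for a non-zero entry in this column at or below row r.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Three linear functions in three variables; the second is twice the first.
print(rank([[1, 2, 3], [2, 4, 6], [1, 0, 1]]))
```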
The Inertia Theorem
Question. Given a quadratic form (i.e. a homogeneous quadratic function) in n variables,
\[ Q = q_{11}x_1^2 + 2q_{12}x_1x_2 + 2q_{13}x_1x_3 + \dots + q_{nn}x_n^2, \]
what is the simplest form to which it can be transformed by a linear change of the variables
\[ \begin{aligned} x_1 &= c_{11}X_1 + \dots + c_{1n}X_n \\ &\;\;\vdots \\ x_n &= c_{n1}X_1 + \dots + c_{nn}X_n \end{aligned} \quad ? \]
Theorem. Every quadratic form in n variables can be transformed by a suitable linear change of the variables to exactly one of the normal forms:
\[ X_1^2 + \dots + X_p^2 - X_{p+1}^2 - \dots - X_{p+q}^2, \quad \text{where } 0 \le p + q \le n. \]
The numbers p and q of positive and negative squares in the
normal form are called inertia indices of the quadratic form in
question. If the quadratic form Q is known to be positive everywhere
outside the origin, the Inertia Theorem tells us that in a suitable
coordinate system Q assumes the form $X_1^2 + \dots + X_n^2$, i.e. its inertia
indices are p = n, q = 0.
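As a small worked illustration (our own example, not taken from the text), completing the square brings a quadratic form in two variables to one of the normal forms:

```latex
\begin{align*}
Q &= x_1^2 + 4x_1x_2 + x_2^2 = (x_1 + 2x_2)^2 - 3x_2^2 = X_1^2 - X_2^2, \\
  &\text{where } X_1 = x_1 + 2x_2, \quad X_2 = \sqrt{3}\,x_2 .
\end{align*}
```

The substitution can be inverted ($x_2 = X_2/\sqrt{3}$, $x_1 = X_1 - 2X_2/\sqrt{3}$), so it is a legitimate linear change of the variables, and the inertia indices of this form are p = 1, q = 1. Accordingly, this Q is not positive everywhere outside the origin: for instance, Q(1, −1) = −2.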
The Orthogonal Diagonalization Theorem
Question. Given two homogeneous quadratic forms in n variables,
Q(x1 , ..., xn ) and S(x1 , ..., xn ), of which the first one is known to be
positive everywhere outside the origin, what is the simplest form to
which they can be simultaneously transformed by a linear change of
the variables?
Theorem. Every pair Q, S of quadratic forms in n variables, of which Q is positive everywhere outside the origin, can be transformed by a linear change of the variables to exactly one of the normal forms
\[ Q = X_1^2 + \dots + X_n^2, \quad S = \lambda_1 X_1^2 + \dots + \lambda_n X_n^2, \quad \text{where } \lambda_1 \ge \dots \ge \lambda_n. \]
The real numbers λ1 , . . . , λn are called eigenvalues of the given
pair of quadratic forms (and are often said to form their spectrum).
The Jordan Canonical Form Theorem
The fourth question deals with a system of n linear functions in n
variables. Such an object is the special case of systems of m functions in n variables when m = n. According to the Rank Theorem, such a system of rank r ≤ n can be transformed to the form
Y1 = X1 , . . . , Yr = Xr , Yr+1 = · · · = Yn = 0 by linear changes of dependent and independent variables. There are many cases however
where relevant information about the system does not stay invariant
when dependent and independent variables are changed separately.
This happens whenever both groups of variables describe objects in
the same space (rather than in two different ones).
An important class of examples comes from the theory of Ordinary Differential Equations (for short: ODE). Such equations (e.g.
ẋ = ax) relate values of quantities with the rates of their change in
time. Transforming the variables (e.g. by rescaling: x = cy) would
make little sense if not accompanied with the simultaneous change of
the rates (ẋ = cẏ). We will describe the fourth classification problem
in the context of the ODE theory.
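Question. Given a system of n linear homogeneous 1st order constant coefficient ODEs in n unknowns:
\[ \begin{aligned} \dot x_1 &= a_{11}x_1 + \dots + a_{1n}x_n \\ &\;\;\vdots \\ \dot x_n &= a_{n1}x_1 + \dots + a_{nn}x_n, \end{aligned} \]
what is the simplest form to which it can be transformed by a linear change of the unknowns:
\[ \begin{aligned} x_1 &= c_{11}X_1 + \dots + c_{1n}X_n \\ &\;\;\vdots \\ x_n &= c_{n1}X_1 + \dots + c_{nn}X_n \end{aligned} \quad ? \]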
Due to the Fundamental Theorem of Algebra, there is an advantage in answering this question over complex numbers, i.e. assuming
that the coefficients cij in the change of variables, as well as the
coefficients aij of the given ODE system are allowed to be complex
numbers.
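Example. Consider a single mth order linear ODE of the form:
\[ \left(\frac{d}{dt} - \lambda\right)^m y = 0, \quad \text{where } \lambda \in \mathbb{C}. \]
By setting
\[ y = x_1, \quad \frac{d}{dt}y - \lambda y = x_2, \quad \left(\frac{d}{dt} - \lambda\right)^2 y = x_3, \ \dots, \ \left(\frac{d}{dt} - \lambda\right)^{m-1} y = x_m, \]
the equation can be written as the following system of m ODEs of the 1st order:
\[ \begin{aligned} \dot x_1 &= \lambda x_1 + x_2 \\ \dot x_2 &= \lambda x_2 + x_3 \\ &\;\;\vdots \\ \dot x_{m-1} &= \lambda x_{m-1} + x_m \\ \dot x_m &= \lambda x_m. \end{aligned} \]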
Let us call this system the Jordan cell of size m with the eigenvalue λ. Introduce a Jordan system of several Jordan cells of sizes $m_1, \dots, m_r$ with the eigenvalues $\lambda_1, \dots, \lambda_r$. It can be similarly compressed into the system
\[ \left(\frac{d}{dt} - \lambda_1\right)^{m_1} y_1 = 0, \ \dots, \ \left(\frac{d}{dt} - \lambda_r\right)^{m_r} y_r = 0 \]
of r unlinked ODEs of the orders $m_1, \dots, m_r$.
Theorem. Every constant coefficient system of n linear
1st order ODEs in n unknowns can be transformed by a
complex linear change of the unknowns to exactly one (up
to reordering of the cells) of the Jordan systems with m1 +
... + mr = n.
Fools and Wizards
In the rest of the text we will undertake a more systematic study of
the four basic problems and prove the classification theorems stated
here. The reader should be prepared however to meet the following
three challenges of Chapter 2.
Firstly, one will find there much more diverse material than what
has just been described. This is because many mathematical objects and classification problems about them can be reduced (speaking roughly or literally) to the four problems discussed above. The
challenge is to learn how to recognize situations where results of Linear Algebra can be helpful. Many of those objects will be introduced
in the opening section of Chapter 2.
Secondly, we will encounter one more fundamental result of Linear Algebra, which is not a classification, but an important (and
beautiful) formula. It answers the question which substitutions of
the form
x1 = c11 X1 + ... + c1n Xn
...
xn = cn1 X1 + ... + cnn Xn
are indeed changes of the variables and can therefore be inverted by
expressing X1 , ..., Xn linearly in terms of x1 , ..., xn , and how to describe such inversion explicitly. The answer is given in terms of
the determinant, a remarkable function of $n^2$ variables $c_{11}, \dots, c_{nn}$,
which will also be studied in Chapter 2.
Thirdly, Linear Algebra has developed an adequate language,
based on the abstract notion of vector space. It allows one to
represent relevant mathematical objects and results in ways much
less cumbersome and thus more efficient than those found in the previous discussion. This language is introduced at the end of Chapter
2. The challenge here is to get accustomed to the abstract way of
thinking.
Let us describe now the principle by which our main four themes
are grouped in Chapters 3 and 4.
Note that Jordan canonical forms and the normal forms in the
Orthogonal Diagonalization Theorem do not form discrete lists, but
instead depend on continuous parameters — the eigenvalues. Based
on experience with many mathematical classifications, it is considered that the number of parameters on which equivalence classes in a
given problem depend, is the right measure of complexity of the classification problem. Thus, Chapter 3 deals with simple problems of
Linear Algebra, i.e. those classification problems where equivalence
classes do not depend on continuous parameters. Respectively, the
non-simple problems are studied in Chapter 4.
Finally, let us mention that the proverb: Fools ask questions that
wizards cannot answer, fully applies in Linear Algebra. In addition
to the four basic problems, there are many similarly looking questions that one can ask: for instance, to classify triples of quadratic
forms in n variables up to linear changes of the variables. In fact,
in this problem, the number of parameters, on which equivalence
classes depend, grows with n at about the same rate as the number
of parameters on which the three given quadratic forms depend. We
will have a chance to touch upon such problems of Linear Algebra
in the last section, in connection with quivers. The modern attitude
toward such problems is that they are unsolvable.
EXERCISES
60. Classify quadratic forms $Q = ax^2$ in one variable with complex coefficients (i.e. a ∈ C) up to complex linear changes: x = cy, c ∈ C, c ≠ 0.
61. Using results of Section 2, derive the Inertia Theorem in dimension n = 2.
62. Show that classification of real quadratic curves up to arbitrary linear inhomogeneous changes of coordinates consists of 8 equivalence classes. Show that if the coordinate systems are required to remain Cartesian, then there are infinitely many equivalence classes, which depend on 2 continuous parameters.
63. Is there any difference between classification of quadratic equations in two variables F(x, y) = 0 up to linear inhomogeneous changes of the variables and multiplications of the equations by non-zero constants, and the classification of quadratic curves, i.e. sets {(x, y) | F(x, y) = 0} of solutions to such equations, up to the same type of coordinate transformations?
64. Prove the Rank Theorem on the line, i.e. show that every linear function y = ax can be transformed to exactly one of the normal forms: Y = X or Y = 0, by the changes: y = bY, x = cX, where b ≠ 0 ≠ c.
65. For the pair of linear functions describing a rotation of the plane (see the formula (∗) in Section 2), find the normal form to which they can be transformed according to the Rank Theorem.
66. Show that $X_1^2 + \dots + X_n^2$ is the only one of the normal forms of the Inertia Theorem which is positive everywhere outside the origin.
67.⋆ In the Inertia Theorem with n = 2, show that there are six normal forms, and prove that they are pairwise non-equivalent.
68. Sketch the surfaces $Q(X_1, X_2, X_3) = 0$ for all normal forms in the Inertia Theorem with n = 3.
69. How many equivalence classes are there in the Inertia Theorem for quadratic forms in n variables?
70.⋆ Prove the Orthogonal Diagonalization Theorem for n = 2 using results of Section 2.
71. Classify ODEs ẋ = ax up to the changes x = cy, c ≠ 0.
72. Verify that $y(t) = e^{\lambda t}\left(c_0 + t c_1 + \dots + c_{m-1} t^{m-1}\right)$, where $c_i \in \mathbb{C}$ are arbitrary constants, is the general solution to the ODE $\left(\frac{d}{dt} - \lambda\right)^m y = 0$.
73. Using the binomial formula $(a + b)^m = \sum_{k=0}^{m} \binom{m}{k} a^k b^{m-k}$, show that the Jordan cell of size m with the eigenvalue λ can be written as the m-th order ODE
\[ y^{(m)} - \binom{m}{1} \lambda y^{(m-1)} + \dots + (-1)^{m-1} \binom{m}{m-1} \lambda^{m-1} y' + (-1)^m \lambda^m y = 0. \]
74.⋆ Prove that the binomial coefficient $\binom{m}{k}$ (“m choose k”) is equal to the number of k-element subsets in a set of m elements.
75.⋆ Prove that $\binom{m}{k} = \dfrac{m!}{k!\,(m-k)!}$.
76. Rewrite the pendulum equation ẍ = −x as a system.
77.⋆ Identify the Jordan form of the system $\dot x_1 = x_2$, $\dot x_2 = -x_1$.
78.⋆ Solve the following Jordan systems and show that they are not equivalent to each other:
\[ \begin{aligned} \dot x_1 &= \lambda x_1 \\ \dot x_2 &= \lambda x_2 \end{aligned} \qquad \text{and} \qquad \begin{aligned} \dot y_1 &= \lambda y_1 + y_2 \\ \dot y_2 &= \lambda y_2. \end{aligned} \]
79. How many arbitrary coefficients are there in a quadratic form in n variables?
80.⋆ Show that equivalence classes of triples of quadratic forms in n variables must depend on at least $n^2/2$ parameters.
Chapter 2
Dramatis Personae
1
Matrices
Matrices are rectangular arrays of numbers. An m × n-matrix
\[ A = \begin{bmatrix} a_{11} & \dots & a_{1n} \\ \dots & a_{ij} & \dots \\ a_{m1} & \dots & a_{mn} \end{bmatrix} \]
has m rows and n columns. The matrix entry aij is positioned in
row i and column j.
We usually assume that the matrix entries are taken from either
of the fields R or C of real or complex numbers respectively. These
two choices are sufficient for all our major goals. However all we use
is basic properties of addition and multiplication of numbers. Thus
everything we say in this section (and in the next section about
determinants) works well when matrix entries are taken from any
field (or even any commutative ring with unity, e.g. the ring Z of all
integers, where multiplication is commutative, but division by nonzero numbers is not always defined). Various choices of a field will
be discussed at the beginning of Section 3 on vector spaces.
Matrices are found in Linear Algebra all over the place. Yet, the
main point of this section is that matrices per se are not objects of
Linear Algebra. Namely, the same matrix A can represent different
mathematical objects, and will behave differently depending on what
kind of object is meant by it.
Vectors
The standard convention is that vectors are represented by columns
of their coordinates. Thus, λx and x+y are expressed in coordinates
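by the following operations with 3 × 1-matrices:
\[ \lambda \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \lambda x_1 \\ \lambda x_2 \\ \lambda x_3 \end{bmatrix}, \qquad \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \end{bmatrix}. \]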
More generally, one refers to n × 1-matrices as coordinate vectors,
which can be term-wise multiplied by scalars or added. Such vectors
form the coordinate space $\mathbb{R}^n$ (or $\mathbb{C}^n$, or $\mathbb{Q}^n$, if the matrix entries
and the scalars are taken to be complex or rational numbers).
Linear Functions
A linear function (or linear form) on the coordinate space has
the form
a(x) = a1 x1 + · · · + an xn
and is determined by the row [a1 , . . . , an ] of its coefficients. Linear
combinations λa + µb of linear functions are linear functions. Their
coefficients are expressed by linear combinations of 1 × n-matrices:
λ[a1 , . . . , an ] + µ[b1 , . . . , bn ] = [λa1 + µb1 , . . . , λan + µbn ].
The operation of evaluation, i.e. taking the value a(x) of a linear function on a vector, leads to the simplest instance of matrix
product, namely the product of a 1 × n row with an n × 1 column:
\[ a(x) = [a_1, \dots, a_n] \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = a_1 x_1 + \dots + a_n x_n. \]
Note that
a(λx + µy) = λa(x) + µa(y) for all vectors x, y and scalars λ, µ.
This can be viewed as a manifestation of the distributive law for
matrix product. As we will see later, it actually expresses the very
property of linearity, namely, the property of linear functions to
respect operations with vectors, i.e. to assign to linear combinations
of vectors linear combinations of respective values with the same
coefficients.
Linear Maps
An m-tuple of linear functions a1 , . . . , am in n variables defines a
linear map A : Rn → Rm . Namely, to a vector x ∈ Rn it associates
the vector y = Ax ∈ Rm whose m components are computed as
\[ y_i = a_i(x) = a_{i1}x_1 + \dots + a_{in}x_n, \quad i = 1, \dots, m. \]
Whereas each yi is given by the product of a row with a column, the
whole linear map can be described as the product of the m×n-matrix
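A with the column:
\[ y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} = \begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = Ax. \]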
The rows of the matrix A represent the m linear functions a1 , . . . , am .
To interpret the columns, note that every vector x can be uniquely
written as a linear combination x = x1 e1 +· · ·+xn en of the standard
basis vectors e1 , . . . , en :
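\[ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \dots + x_n \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}. \]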
Respectively, Ax = x1 Ae1 + · · · + xn Aen . The n columns of the
matrix A are exactly Aej , j = 1, . . . , n, i.e. the images in Rm of the
standard basis vectors from Rn under the linear map A.
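The two readings of the matrix A (rows as linear functions, columns as images of the standard basis vectors) can be seen side by side in a small computation. This is a sketch with a sample 2 × 3 matrix of our own choosing; the helper name `apply_map` is hypothetical.

```python
def apply_map(A, x):
    """y = Ax: each component is the product of a row of A with the column x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]            # a 2 x 3 matrix, i.e. a linear map from R^3 to R^2

# Images of the standard basis vectors e_1, e_2, e_3 are the columns of A.
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
columns = [apply_map(A, e) for e in basis]

# Ax = x1 Ae1 + x2 Ae2 + x3 Ae3 for a sample vector x.
x = [7, -1, 2]
y = apply_map(A, x)
combo = [sum(x[j] * columns[j][i] for j in range(3)) for i in range(2)]

print(columns)
print(y, combo, y == combo)
```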
Given linear maps B : Rn → Rm and A : Rm → Rl , we can form
their composition Rn → Rl by substituting y = Bx into z = Ay.
The corresponding operation with the matrices A and B leads to the
general notion of matrix product C = AB defined whenever the
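number of columns of A coincides with the number of rows of B:
\[ \begin{bmatrix} c_{11} & \dots & c_{1n} \\ \dots & c_{ij} & \dots \\ c_{l1} & \dots & c_{ln} \end{bmatrix} = \begin{bmatrix} a_{11} & \dots & a_{1m} \\ \vdots & & \vdots \\ a_{l1} & \dots & a_{lm} \end{bmatrix} \begin{bmatrix} b_{11} & \dots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \dots & b_{mn} \end{bmatrix}. \]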
By definition of composition, the entry cij located at the intersection
of the ith row and jth column of C is the value of the linear function
ai on the image Bej ∈ Rm of the standard basis vector ej ∈ Rn .
Since ai and Bej are represented by the ith row and jth column
respectively of the matrices A and B, we find:
\[ c_{ij} = [a_{i1}, \dots, a_{im}] \begin{bmatrix} b_{1j} \\ \vdots \\ b_{mj} \end{bmatrix} = a_{i1} b_{1j} + \dots + a_{im} b_{mj}. \]
In other words, cij is the product of the ith row of A with the jth
column of B.
Based on this formula, it is not hard to verify that the matrix
product is associative, i.e. (AB)C = A(BC), and satisfies the
left and right distributive laws: P (λQ + µR) = λP Q + µP R and
(λX + µY )Z = λXZ + µY Z, whenever the sizes of the matrices are
right. However, one of the key points in mathematics is that there
is no point in making such verifications. The operations with matrices encode in the coordinate form meaningful operations with linear
maps, and the properties of matrices simply reflect those of the maps.
For instance, matrix product is associative because composition of
arbitrary (and not only linear) maps is associative. We will have a
chance to discuss properties of linear maps in the more general context of abstract vector spaces, where coordinate expressions may not
even be available.
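That the matrix product encodes composition of maps can be observed on a small example. The sketch below uses sample integer matrices of our own choosing and hypothetical helper names `matmul` and `apply_map`; it illustrates the statements above on one instance and is of course not a proof.

```python
def matmul(A, B):
    """C = AB, where c_ij is the i-th row of A times the j-th column of B."""
    assert len(A[0]) == len(B), "columns of A must match rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply_map(A, x):
    """The vector Ax for a matrix A and a coordinate list x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[1, 2], [0, 1], [3, -1]]     # 3 x 2: a map from R^2 to R^3
B = [[2, 0, 1], [1, 1, 0]]        # 2 x 3: a map from R^3 to R^2
C = [[1, 4], [2, 5], [3, 6]]      # 3 x 2: used only for the associativity check
x = [1, -2, 3]

# Substituting y = Bx into z = Ay gives the same result as applying AB to x.
print(apply_map(matmul(A, B), x) == apply_map(A, apply_map(B, x)))

# (AB)C = A(BC)
print(matmul(matmul(A, B), C) == matmul(A, matmul(B, C)))
```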
Changes of Coordinates
Suppose that standard coordinates (x1 , . . . , xn ) in Rn are expressed
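in terms of new variables $(x'_1, \dots, x'_n)$ by means of n linear functions:
\[ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} c_{11} & \dots & c_{1n} \\ \vdots & & \vdots \\ c_{n1} & \dots & c_{nn} \end{bmatrix} \begin{bmatrix} x'_1 \\ \vdots \\ x'_n \end{bmatrix}. \]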
In matrix notation, this can be written as x = Cx′ where C is the
square matrix of size n. We call this a linear change of coordinates, if conversely, x′ can be expressed by linear functions of
x. In other words, there must exist a square matrix D such that
the substitutions x′ = Dx and x = Cx′ are inverse to each other,
i.e. x = CDx and x′ = DCx′ for all x and x′ . It is immediate to
see that this happens exactly when the matrices C and D satisfy
CD = I = DC where I is the identity matrix of size n:
\[ I = \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \dots & 0 & 1 \end{bmatrix}. \]
When this happens, the square matrices C and D are called inverse
to each other, and one writes: $D = C^{-1}$, $C = D^{-1}$. The rows of C express old coordinates $x_i$ as linear functions of new coordinates $x'_j$. The columns of C represent, in the old coordinate system, the
vectors which in the new coordinate system serve as standard basis
vectors e′j . The matrix C is often called the transition matrix
between the coordinate systems.
In spite of apparent simplicity of this notion, it is easy to get lost
here. As we noticed in connection with rotations on the complex
plane, a change of coordinates is easy to confuse with a linear map
from the space Rn to itself. We will reserve for such maps the term
linear transformation. Thus, the formula x′ = Cx describes a linear transformation that associates to a vector $x \in \mathbb{R}^n$ a new vector x′ written in the same coordinate system. The inverse transformation, when it exists, is given by the formula $x = C^{-1}x'$.
Let us now return to changes of coordinates and examine how
they affect linear functions and maps.
Making the change of variables x = Cx′ in a linear function a
results in the values a(x) of this function being expressed as a′ (x′ )
in terms of new coordinates of the same vectors. Using the matrix
product notation, we find: a′ x′ = ax = a(Cx′ ) = (aC)x′ . Thus,
coordinates of vectors and coefficients of linear functions are transformed differently:
\[ x = Cx', \ \text{or} \ x' = C^{-1}x, \quad \text{but} \quad a' = aC, \ \text{or} \ a = a'C^{-1}. \]
Next, let y = Ax be a linear map from Rn to Rm , and let x = Cx′
and y = Dy′ be changes of coordinates in Rn and Rm respectively.
Then in new coordinates the same linear map is given by the new
formula y′ = A′x′. We compute A′ in terms of A, C and D: Dy′ = y = Ax = ACx′, i.e. $y' = D^{-1}ACx'$, and hence
\[ A' = D^{-1}AC. \]
In particular, if x ↦ Ax is a linear transformation on $\mathbb{R}^n$ (i.e., we remind, a linear map from $\mathbb{R}^n$ to itself), and a change of coordinates x = Cx′ is made, then in new coordinates the same linear transformation is x′ ↦ A′x′, where
\[ A' = C^{-1}AC, \]
i.e. in the previous rule we need to take D = C. This is because the same change applies to both the input vector x and its image Ax. The operation $A \mapsto C^{-1}AC$ over a square matrix A is often called the similarity transformation by the invertible matrix C.
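The rule for linear transformations can be tested on a 2 × 2 example. In the sketch below the matrices A and C, the vector x′, and the helper names `matmul` and `inverse_2x2` are all our own assumptions; the inverse is computed by the familiar explicit 2 × 2 formula, and exact fractions are used so that equalities can be checked literally.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix product of A and B given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(C):
    """Inverse of a 2 x 2 matrix by the explicit formula (requires det != 0)."""
    (a, b), (c, d) = C
    det = Fraction(a * d - b * c)
    assert det != 0, "C must be invertible"
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [0, 3]]           # a linear transformation in old coordinates
C = [[1, 1], [1, 2]]           # change of coordinates x = C x'
C_inv = inverse_2x2(C)
A_new = matmul(matmul(C_inv, A), C)          # A' = C^{-1} A C

x_new = [[Fraction(5)], [Fraction(-2)]]      # a vector x' written as a column
x_old = matmul(C, x_new)                     # the same vector in old coordinates

# Apply A in old coordinates and convert the image to new coordinates ...
image_via_old = matmul(C_inv, matmul(A, x_old))
# ... or apply A' directly in new coordinates.
image_direct = matmul(A_new, x_new)

print(A_new)
print(image_via_old == image_direct)
```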
Bilinear Forms
The dot product of two coordinate vectors x, y ∈ Rn is defined by
the formula
\[ \langle x, y\rangle = x_1 y_1 + \dots + x_n y_n. \]
It is an example of a bilinear form, i.e. a function of an ordered pair
(x, y) of vector variables, which is a linear function of each of them.
In general, the vectors may come from different spaces, x ∈ Rm and
y ∈ Rn. We can write a bilinear form as a linear function of xi whose
coefficients are linear functions of yj :
\[ B(x, y) = \sum_{i=1}^{m} x_i \sum_{j=1}^{n} b_{ij} y_j = \sum_{i=1}^{m} \sum_{j=1}^{n} x_i b_{ij} y_j. \]
Thus the bilinear form B is determined by the m × n-matrix of its
coefficients bij . Slightly abusing notation, we will denote this matrix
with the same letter B as the bilinear form. Here is the meaning of
the matrix entries: bij = B(ei , fj ), the value of the form on the pair
of standard basis vectors, ei ∈ Rm and fj ∈ Rn .
With a bilinear form B of x and y, one associates another bilinear
form called transposed to B, which is a function of y and x (in this
order!), denoted B t , and defined by B t (y, x) = B(x, y). Explicitly:
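\[ \sum_{i=1}^{n} \sum_{j=1}^{m} y_i b^t_{ij} x_j = B^t(y, x) = B(x, y) = \sum_{j=1}^{m} \sum_{i=1}^{n} x_j b_{ji} y_i . \]
We conclude that $b^t_{ij} = b_{ji}$. In other words, the $n \times m$-matrix of coefficients of $B^t$ is obtained from the $m \times n$-matrix $B$ by the operation of transposition:
\[ B = \begin{bmatrix} b_{11} & \dots & b_{1n} \\ \dots & & \dots \\ b_{m1} & \dots & b_{mn} \end{bmatrix}, \qquad B^t = \begin{bmatrix} b_{11} & \dots & b_{m1} \\ \dots & & \dots \\ b_{1n} & \dots & b_{mn} \end{bmatrix} . \]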
Note that transposition transforms rows into columns, thus messing
up our convention to represent vectors by columns and linear functions by rows. Taking advantage of this, we can compute values of
bilinear forms using matrix product and transposition:
\[ B(x, y) = x^t B y, \qquad B^t(y, x) = y^t B^t x . \]
1. Matrices
In particular, when $x, y \in \mathbb{R}^n$, we have: $\langle x, y \rangle = x^t y$.
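To make the coordinate formula for a bilinear form concrete, here is a small Python sketch. The matrix `B`, the vectors, and the function names `bilinear` and `transpose` are assumptions chosen for illustration only.

```python
def bilinear(B, x, y):
    """Value B(x, y) = sum_i sum_j x_i b_ij y_j, i.e. the matrix product x^t B y."""
    return sum(x[i] * B[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def transpose(B):
    """Rows become columns."""
    return [list(col) for col in zip(*B)]

B = [[1, 2, 0],
     [3, -1, 4]]                          # 2 x 3 matrix: x in R^2, y in R^3
x = [1, 2]
y = [3, 0, -1]

value = bilinear(B, x, y)                 # B(x, y)
value_t = bilinear(transpose(B), y, x)    # B^t(y, x), same number
entry = bilinear(B, [0, 1], [0, 0, 1])    # B(e_2, f_3) recovers the entry b_23
dot = bilinear([[1, 0], [0, 1]], [1, 2], [3, 4])   # dot product: B is the identity
print(value, value_t, entry, dot)
```

Evaluating the form on standard basis vectors reads off a single matrix entry, which is the meaning of $b_{ij} = B(e_i, f_j)$.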
To find out how changes of variables x = Dx′ , y = Cy′ transform
the matrix of a bilinear form, take B ′ (x′ , y′ ) = B(Dx′ , Cy′ ), the
value B(x, y) written in new coordinates. We have: b′ij = B ′ (ei , fj ) =
B(Dei , Cfj ) = (Dei )t B(Cfj ). Note that (Dei )t is the transposed ith
column of D, i.e. the ith row of D t , and B(Cfj ) is the jth column
of BC. Combining these results for all i and j, we find:
B ′ = D t BC.
Were the same matrix $B$ representing a linear map, the transformation law would have been different: $B \mapsto D^{-1}BC$.
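The transformation law $B' = D^t B C$ can be verified on a small example. The sketch below is only an illustration: the matrices `B`, `D`, `C`, the test vectors, and all helper names are assumptions.

```python
def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def apply(M, v):
    """Matrix times coordinate vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def bilinear(B, x, y):
    return sum(x[i] * B[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

B = [[1, 2], [3, 4]]
D = [[1, 1], [0, 1]]                      # x = D x'
C = [[2, 0], [1, 1]]                      # y = C y'
B_new = matmul(transpose(D), matmul(B, C))   # B' = D^t B C

xp, yp = [1, -1], [2, 3]                  # new coordinates x', y'
lhs = bilinear(B_new, xp, yp)             # B'(x', y')
rhs = bilinear(B, apply(D, xp), apply(C, yp))   # B(Dx', Cy')
print(lhs == rhs)
```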
Consider now the case when both vector inputs x, y of a bilinear
form B lie in the same space Rn . Then the transposed form B t
can be evaluated on the same pair of vectors (x, y) (in this order!)
and the result B(y, x) compared with B(x, y). The form B is called
symmetric if B t = B, i.e. B(y, x) = B(x, y) for all x, y. It is
called anti-symmetric if B t = −B, i.e. B(y, x) = −B(x, y) for all
x, y. Every bilinear form (of two vectors from the same space) can be uniquely written as the sum of a symmetric and an anti-symmetric bilinear form:
\[ B = \frac{B + B^t}{2} + \frac{B - B^t}{2} . \]
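As an illustration of this decomposition, the following Python sketch splits a square matrix into its symmetric and anti-symmetric parts. The matrix `B` and the function names are assumptions made for the example; fractions keep the halves exact.

```python
from fractions import Fraction

def symmetric_part(B):
    """(B + B^t) / 2"""
    n = len(B)
    return [[Fraction(B[i][j] + B[j][i], 2) for j in range(n)] for i in range(n)]

def antisymmetric_part(B):
    """(B - B^t) / 2"""
    n = len(B)
    return [[Fraction(B[i][j] - B[j][i], 2) for j in range(n)] for i in range(n)]

B = [[1, 2],
     [4, 3]]
S = symmetric_part(B)
A = antisymmetric_part(B)
total = [[S[i][j] + A[i][j] for j in range(2)] for i in range(2)]
print(S, A, total == B)
```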
Quadratic Forms
To a bilinear form B on Rn , one associates a function on Rn by
substituting the same vector x ∈ Rn for both inputs:
\[ B(x, x) = \sum_{i=1}^{n} \sum_{j=1}^{n} b_{ij} x_i x_j . \]
The result is a quadratic form, i.e. a degree-2 homogeneous polynomial in n variables (x1 , . . . , xn ). Since bij can be arbitrary, all
quadratic forms are obtained this way.
Moreover, if B is written as the sum S + A of symmetric and anti-symmetric forms, then A(x, x) = −A(x, x), hence A(x, x) = 0, and B(x, x) = S(x, x), i.e. the quadratic form depends only on the symmetric part of B. Further abusing our notation, we will write S(x) for the values S(x, x) of the quadratic form corresponding to the bilinear form S.
The correspondence between symmetric bilinear forms and
quadratic forms is not only “onto” but is also “one-to-one,” i.e. the
values S(x, y) of a symmetric bilinear form can be reconstructed
from the values of the corresponding quadratic form. Explicitly:
S(x + y, x + y) = S(x, x) + S(x, y) + S(y, x) + S(y, y).
Due to the symmetry S(x, y) = S(y, x), we have:
2S(x, y) = S(x + y) − S(x) − S(y).
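The reconstruction of a symmetric bilinear form from its quadratic form can be tried out directly. This is a sketch under assumed data: the symmetric matrix `S`, the vectors `x`, `y`, and the helper names are illustrative.

```python
def bilinear(S, x, y):
    return sum(x[i] * S[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def quadratic(S, x):
    """S(x) = S(x, x)"""
    return bilinear(S, x, x)

S = [[2, 1],
     [1, 3]]                              # symmetric matrix: s_ij = s_ji
x, y = [1, 2], [3, -1]
x_plus_y = [a + b for a, b in zip(x, y)]

# 2 S(x, y) = S(x + y) - S(x) - S(y)
twice = quadratic(S, x_plus_y) - quadratic(S, x) - quadratic(S, y)
print(twice, 2 * bilinear(S, x, y))
```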
The transformation law for coefficients of a quadratic form under the change of variables $x = Cx'$ is $S \mapsto C^t S C$. Here $S = S^t$ denotes the symmetric matrix of coefficients of the quadratic form
\[ S(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} s_{ij} x_i x_j , \qquad s_{ij} = s_{ji} \text{ for all } i, j. \]
EXERCISES
81. Are the functions $3x$, $x^3$, $x + 1$, $0$, $\sin x$, $(1+x)^2 - (1-x)^2$, $\tan \arctan x$, $\arctan \tan x$ linear?
82. Check the linearity property of functions $a(x) = \sum a_i x_i$, i.e. that $a(\lambda x + \mu y) = \lambda a(x) + \mu a(y)$.
83. Prove that every function on $\mathbb{R}^n$ that possesses the linearity property has the form $\sum a_i x_i$.
84. Let $v \in \mathbb{R}^m$, and let $a : \mathbb{R}^n \to \mathbb{R}$ be a linear function. Define a linear map $E : \mathbb{R}^n \to \mathbb{R}^m$ by $E(x) = a(x)v$, and compute the matrix of $E$.
85. Write down the matrix of rotation through the angle $\theta$ in $\mathbb{R}^2$.
86. Let
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & -1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 2 \\ -2 & 1 \\ 0 & -1 \end{bmatrix} . \]
Compute those of the products ABC, BAC, BCA, CBA, CAB, ACB which are defined.
87. Let $\sum_{j=1}^{n} a_{ij} x_j = b_i$, $i = 1, \dots, m$, be a system of m linear equations in n unknowns $(x_1, \dots, x_n)$. Show that it can be written in the matrix form $Ax = b$, where A is a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$.
88. A square matrix A is called upper triangular if $a_{ij} = 0$ for all $i > j$ and lower triangular if $a_{ij} = 0$ for all $i < j$. Prove that products of upper triangular matrices are upper triangular and products of lower triangular matrices are lower triangular.
89. For an identity matrix I, prove AI = A and IB = B for all allowed sizes of A and B.
90. For a square matrix A, define its powers $A^k$ for $k > 0$ as $A \cdots A$ (k times), for $k = 0$ as I, and for $k < 0$ as $A^{-1} \cdots A^{-1}$ ($-k$ times), assuming A invertible. Prove that $A^k A^l = A^{k+l}$ for all integer k, l.
91.⋆ Compute $\begin{bmatrix} \cos 19^\circ & -\sin 19^\circ \\ \sin 19^\circ & \cos 19^\circ \end{bmatrix}^{19}$.
92. Compute powers $A^k$, $k = 0, 1, 2, \dots$, of the square matrix A all of whose entries are zeroes, except that $a_{i,i+1} = 1$ for all i.
93. For which sizes of matrices A and B, both products AB and BA: (a) are defined, (b) have the same size?
94.⋆ Give examples of matrices A and B which do not commute, i.e. $AB \neq BA$, even though both products are defined and have the same size.
95. When does $(A + B)^2 = A^2 + 2AB + B^2$?
96. A square matrix A is called diagonal if $a_{ij} = 0$ for all $i \neq j$. Which diagonal matrices are invertible?
97. Prove that an inverse of a given matrix is unique when it exists.
98. Let A, B be invertible n × n-matrices. Prove that AB is also invertible, and $(AB)^{-1} = B^{-1}A^{-1}$.
99.⋆ If AB is invertible, does it imply that A, B, and BA are invertible?
100.⋆ Give an example of matrices A and B such that AB = I, but $BA \neq I$.
101. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be an invertible linear map. (Thus, m = n, but we consider $\mathbb{R}^n$ and $\mathbb{R}^m$ as two different copies of the coordinate space.) Prove that after suitable changes of coordinates in $\mathbb{R}^n$ and $\mathbb{R}^m$, the matrix of this transformation becomes the identity matrix I.
102. Is the function xy: linear? bilinear? quadratic?
103. Find the coefficient matrix of the dot product.
104. Prove that all anti-symmetric bilinear forms in $\mathbb{R}^2$ are proportional to each other.
105. Represent the bilinear form $B = 2x_1(y_1 + y_2)$ in $\mathbb{R}^2$ as the sum S + A of symmetric and anti-symmetric ones.
106. Find the symmetric bilinear forms corresponding to the quadratic forms $(x_1 + \cdots + x_n)^2$ and $\sum_{i<j} x_i x_j$.
107. Is AB necessarily symmetric if A and B are?
108. Prove that for any matrix A, both $A^t A$ and $AA^t$ are symmetric.
109. Find a square matrix A such that $A^t A \neq AA^t$.
2. Determinants
Definition
Let A be a square matrix of size n:
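\[ A = \begin{bmatrix} a_{11} & \dots & a_{1n} \\ & \dots & \\ a_{n1} & \dots & a_{nn} \end{bmatrix} . \]
Its determinant is a scalar $\det A$ defined by the formula
\[ \det A = \sum_{\sigma} \varepsilon(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)} . \]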
Here $\sigma$ is a permutation of the indices 1, 2, ..., n. A permutation $\sigma$ can be considered as an invertible function $i \mapsto \sigma(i)$ from the set of n elements {1, ..., n} to itself. We use the functional notation $\sigma(i)$ in order to specify the i-th term in the permutation $\sigma = \begin{pmatrix} 1 & \dots & n \\ \sigma(1) & \dots & \sigma(n) \end{pmatrix}$. Thus, each elementary product in the
determinant formula contains exactly one matrix entry from each
row, and these entries are chosen from n different columns. The sum
is taken over all n! ways of making such choices. The coefficient ε(σ)
in front of the elementary product equals 1 or −1 and is called the
sign of the permutation σ.
We will explain the general rule of the signs after a few examples.
In these examples, we begin using one more conventional notation for
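determinants. According to it, a square array of matrix entries placed between two vertical bars denotes the determinant of the matrix. Thus, $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ denotes a matrix, but $\begin{vmatrix} a & b \\ c & d \end{vmatrix}$ denotes a number equal to the determinant of that matrix.
Examples. (1) For n = 1, the determinant $|a_{11}| = a_{11}$.
(2) For n = 2, we have: $\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$.
(3) For n = 3, we have 3! = 6 summands
\[ \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} - a_{13}a_{22}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} \]
corresponding to permutations $\binom{123}{123}, \binom{123}{213}, \binom{123}{231}, \binom{123}{321}, \binom{123}{312}, \binom{123}{132}$.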
The rule of signs for n = 3 is schematically shown on Figure 21.
Figure 21
Parity of Permutations
The general rule of signs depends on properties of permutations.
We say that $\sigma$ inverts a pair of indices i < j if $\sigma(i) > \sigma(j)$. The total number $l(\sigma)$ of pairs i < j that $\sigma$ inverts is called the length of the permutation $\sigma$. Permutations are called even or odd depending on their lengths being respectively even or odd. We put $\varepsilon(\sigma) := (-1)^{l(\sigma)}$ (i.e. $\varepsilon(\sigma) = 1$ for even and $-1$ for odd permutations).
Examples. (1) If l(σ) = 0, then σ(1) < σ(2) < · · · < σ(n), and
hence σ = id, the identity permutation. In particular, ε(id) = 1.
(2) Consider a transposition $\tau$, i.e. a permutation that swaps two indices, say i < j, leaving all other indices in their respective places. Then $\tau(j) < \tau(i)$, i.e. $\tau$ inverts the pair of indices i < j. Besides, for every index k such that i < k < j we have: $\tau(j) < \tau(k) < \tau(i)$, i.e. both pairs i < k and k < j are inverted. Note that all other pairs of indices are not inverted by $\tau$, and hence $l(\tau) = 2(j - i - 1) + 1 = 2(j - i) - 1$. In particular, every transposition is odd: $\varepsilon(\tau) = -1$.
(3) There are n − 1 transpositions of length 1, namely $\tau^{(i)}$, $i = 1, \dots, n - 1$, defined as transpositions of nearby indices i and i + 1.
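The length and sign of a permutation are easy to compute by counting inverted pairs. In the sketch below a permutation is stored as its bottom row $[\sigma(1), \dots, \sigma(n)]$; the function names `length`, `sign` and `transposition` are assumptions made for illustration.

```python
def length(p):
    """Number of pairs i < j inverted by the permutation with bottom row p."""
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])

def sign(p):
    """+1 for even permutations, -1 for odd ones."""
    return -1 if length(p) % 2 else 1

def transposition(n, i, j):
    """Bottom row of the transposition swapping indices i < j (1-based) of {1..n}."""
    p = list(range(1, n + 1))
    p[i - 1], p[j - 1] = p[j - 1], p[i - 1]
    return p

print(length([1, 2, 3]), length([3, 2, 1]), sign([2, 1, 3]))
tau = transposition(6, 2, 5)              # swaps 2 and 5
print(tau, length(tau))                   # inverted pairs: one per (i, j), two per k between
```

Counting inverted pairs for every transposition of a small set is a direct way to confirm that each one has odd length.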
Lemma. Let $\sigma' = \sigma\tau^{(i)}$ be the composition: a permutation $\sigma$ preceded by $\tau^{(i)}$. Then
\[ l(\sigma') = \begin{cases} l(\sigma) + 1 & \text{if } \sigma(i) < \sigma(i+1) \\ l(\sigma) - 1 & \text{if } \sigma(i) > \sigma(i+1) \end{cases} . \]
In particular, $\sigma$ and $\sigma'$ have opposite parities.
Proof. Indeed, $\sigma' = \begin{pmatrix} 1 & \dots & i & i+1 & \dots & n \\ \sigma(1) & \dots & \sigma(i+1) & \sigma(i) & \dots & \sigma(n) \end{pmatrix}$, i.e. $\sigma'$ is obtained from $\sigma$ by the extra swap of $\sigma(i)$ and $\sigma(i+1)$. This swap does not affect monotonicity of any pairs of entries, except the monotonicity of $\sigma(i), \sigma(i+1)$, which is reversed.
Several corollaries follow immediately.
Corollary 1. Every permutation $\sigma$, of length $l > 0$, can be represented as a composition $\tau^{(i_1)} \cdots \tau^{(i_l)}$ of $l$ transpositions of length 1.
Indeed, locating a pair of nearby indices i < i + 1 inverted by $\sigma$ and composing $\sigma$ with $\tau^{(i)}$ as in the Lemma, we obtain a permutation $\sigma'$ of length $l - 1$. Continuing the same way with $\sigma'$, after $l$ steps we arrive at the permutation of length 0, which is the identity permutation (Example (1)). Reversing the order of transpositions, we obtain the required result.
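The argument of Corollary 1 is effectively an algorithm: keep swapping a nearby inverted pair until none is left. Here is a minimal Python sketch of it, with permutations stored as bottom rows; the function names are hypothetical.

```python
def length(p):
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])

def adjacent_factorization(p):
    """Return [i_1, ..., i_l] (1-based) such that p = tau^(i_1) ... tau^(i_l)."""
    row = list(p)
    steps = []
    while True:
        for i in range(len(row) - 1):
            if row[i] > row[i + 1]:                      # nearby pair inverted by sigma
                row[i], row[i + 1] = row[i + 1], row[i]  # sigma -> sigma tau^(i), length drops by 1
                steps.append(i + 1)
                break
        else:                                            # no inverted pair: identity reached
            return steps[::-1]                           # reverse the order of transpositions

def compose_adjacent(n, indices):
    """Bottom row of tau^(i_1) ... tau^(i_l), built by successive right multiplications."""
    row = list(range(1, n + 1))
    for i in indices:
        row[i - 1], row[i] = row[i], row[i - 1]
    return row

p = [3, 1, 4, 2]
factors = adjacent_factorization(p)
print(factors, len(factors) == length(p), compose_adjacent(4, factors) == p)
```

The number of factors produced equals the length of the permutation, since each swap lowers the length by exactly one.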
Corollary 2. If a permutation is represented as composition of transpositions of length 1, then the parity of the
number of these transpositions coincides with the parity of
the permutation.
Indeed, according to the lemma, precomposing with each transposition of length 1 reverses the parity of the permutation. Since the
identity permutation is even, the number of factors τ (i) in a composition must be even for even permutations and odd for odd.
Corollary 3. The sign of permutations is multiplicative
with respect to compositions: ε(σσ ′ ) = ε(σ)ε(σ ′ ).
Indeed, representing each σ and σ ′ as a composition of transpositions of length 1 and concatenating them, we obtain such a representation for the composition σσ ′ . The result follows from Corollary
2, since the number (and hence parity) of factors behaves additively
under concatenation.
Corollary 2′ . If a permutation is represented as a composition of arbitrary transpositions, then the parity of the
number of these transpositions coincides with the parity of
the permutation.
Indeed, according to Example (2), every transposition is odd, and
hence $\varepsilon(\tau_1 \cdots \tau_N) = \varepsilon(\tau_1) \cdots \varepsilon(\tau_N) = (-1)^N$.
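Multiplicativity of the sign can be confirmed by brute force on a small set. The sketch below checks all pairs of permutations of four indices (0-based here for convenience); `compose` and `sign` are illustrative names.

```python
from itertools import permutations

def sign(p):
    """(-1)^(number of inverted pairs)"""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def compose(s, t):
    """The composition s t: first t, then s, i.e. (s t)(i) = s(t(i))."""
    return tuple(s[t[i]] for i in range(len(t)))

perms = list(permutations(range(4)))
multiplicative = all(sign(compose(s, t)) == sign(s) * sign(t)
                     for s in perms for t in perms)
print(len(perms), multiplicative)
```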
Here are some illustrations of the above properties in connection
with the definition of determinants.
Examples. (4) The transposition $\binom{12}{21}$ is odd. That is why the term $a_{12}a_{21}$ occurs in 2 × 2-determinants with the negative sign.
(5) The permutations $\binom{123}{123}, \binom{123}{213}, \binom{123}{231}, \binom{123}{321}, \binom{123}{312}, \binom{123}{132}$ have lengths l = 0, 1, 2, 3, 2, 1 and respectively signs ε = 1, −1, 1, −1, 1, −1 (thus explaining Figure 21). Notice that each next permutation here is obtained from the previous one by an extra flip.
(6) The permutation $\binom{1234}{4321}$ inverts all the 6 pairs of indices and has therefore length l = 6. Thus the elementary product $a_{14}a_{23}a_{32}a_{41}$ occurs with the sign $\varepsilon = (-1)^6 = +1$ in the definition of 4 × 4-determinants.
(7) Permutations $\sigma$ and $\sigma^{-1}$ inverse to each other have the same parity since their composition $\sigma\sigma^{-1} = \sigma^{-1}\sigma = \mathrm{id}$ is even. This shows that the definition of determinants can be rewritten “by columns”:
\[ \det A = \sum_{\sigma} \varepsilon(\sigma)\, a_{\sigma(1)1} \cdots a_{\sigma(n)n} . \]
Indeed, each summand in this formula is equal to the summand in the original definition corresponding to the permutation $\sigma^{-1}$, and vice versa. Namely, reordering the factors $a_{\sigma(1)1} \cdots a_{\sigma(n)n}$, so that $\sigma(1), \dots, \sigma(n)$ increase monotonically, yields $a_{1\sigma^{-1}(1)} \cdots a_{n\sigma^{-1}(n)}$.
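The defining formula and its “by columns” version translate directly into code. The following Python sketch sums over all n! permutations, so it is meant for small matrices only; the sample matrix `A` and the function names are assumptions.

```python
from itertools import permutations

def sign(p):
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det_by_rows(A):
    """Sum over sigma of eps(sigma) a_{1 sigma(1)} ... a_{n sigma(n)}."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= A[i][p[i]]            # one entry from each row
        total += term
    return total

def det_by_columns(A):
    """Sum over sigma of eps(sigma) a_{sigma(1) 1} ... a_{sigma(n) n}."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= A[p[i]][i]            # one entry from each column
        total += term
    return total

A = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]
print(det_by_rows(A), det_by_columns(A))
```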
Properties of determinants
(i) Transposed matrices have equal determinants:
det At = det A.
This follows from the last Example. Below, we will think of an
n × n matrix as an array A = [a1 , . . . , an ] of its n columns of size n
(vectors from Cn if you wish) and formulate all further properties of
determinants in terms of columns. The same properties hold true for
rows, since the transposition of A changes columns into rows without
changing the determinant.
(ii) Interchanging any two columns changes the sign of
the determinant:
det[..., aj , ..., ai , ...] = − det[..., ai , ..., aj , ...].
Indeed, the operation replaces each permutation in the definition of determinants by its composition with the transposition of the indices i and j. This changes the parity of the permutation, and thus reverses the sign of each summand.
Rephrasing this property, one says that the determinant, considered as a function of n vectors a1 , . . . , an is totally anti-symmetric,
i.e. changes the sign under every odd permutation of the vectors, and
stays invariant under even. It implies that a matrix with two equal
columns has zero determinant. It also allows one to formulate further
column properties of determinants referring to the 1st column only,
since the properties of all columns are alike.
(iii) Multiplication of a column by a number multiplies
the determinant by this number:
det[λa1 , a2 , ..., an ] = λ det[a1 , a2 , ..., an ].
Indeed, this operation simply multiplies each of the n! elementary
products by the factor of λ.
This property shows that a matrix with a zero column has zero
determinant.
(iv) The determinant function is additive with respect to
each column:
det[a′1 + a′′1 , a2 , ..., an ] = det[a′1 , a2 , ..., an ] + det[a′′1 , a2 , ..., an ].
Indeed, each elementary product contains exactly one factor picked from the 1-st column and thus splits into the sum of two elementary products $a'_{\sigma(1)1} a_{\sigma(2)2} \cdots a_{\sigma(n)n}$ and $a''_{\sigma(1)1} a_{\sigma(2)2} \cdots a_{\sigma(n)n}$. Summing up over all permutations yields the sum of two determinants on the right hand side of the formula.
The properties (iv) and (iii) together mean that the determinant
function is linear with respect to each column separately. Together
with the property (ii), they show that adding a multiple of one
column to another one does not change the determinant of
the matrix. Indeed,
|a1 + λa2 , a2 , ...| = |a1 , a2 , ...| + λ |a2 , a2 , ...| = |a1 , a2 , ...| ,
since the second summand has two equal columns.
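The column properties (ii), (iii) and the invariance under adding a multiple of one column to another can be observed on an example. This is a sketch with an assumed sample matrix and hypothetical helper names; `det` is the permutation-sum formula, practical only for small sizes.

```python
from itertools import permutations
from math import prod

def det(A):
    """Determinant by the permutation-sum definition."""
    n = len(A)
    def sign(p):
        return (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def swap_columns(A, i, j):
    B = [row[:] for row in A]
    for row in B:
        row[i], row[j] = row[j], row[i]
    return B

def scale_column(A, j, c):
    return [[c * v if k == j else v for k, v in enumerate(row)] for row in A]

def add_multiple(A, src, dst, c):
    """Add c times column src to column dst."""
    return [[v + c * row[src] if k == dst else v for k, v in enumerate(row)]
            for row in A]

A = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]
d = det(A)
print(d, det(swap_columns(A, 0, 2)), det(scale_column(A, 0, 5)),
      det(add_multiple(A, 1, 0, 7)))
```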
The determinant function shares all the above properties with the identically zero function. The following property shows that these functions do not coincide.
(v) det I = 1.
Indeed, since all off-diagonal entries of the identity matrix are zeroes, the only elementary product in the definition of $\det I$ that survives is $a_{11} \cdots a_{nn} = 1$.
The same argument shows that the determinant of any diagonal
matrix equals the product of the diagonal entries. It is not hard to
generalize the argument in order to see that the determinant of any
upper or lower triangular matrix is equal to the product of the diagonal entries. One can also deduce this from the following factorization
property valid for block triangular matrices.
Consider an n × n-matrix $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ subdivided into four blocks A, B, C, D of sizes m × m, m × l, l × m and l × l respectively (where of course m + l = n). We will call such a matrix block triangular if C or B is the zero matrix 0. We claim that
\[ \det \begin{bmatrix} A & B \\ 0 & D \end{bmatrix} = \det A \, \det D . \]
Indeed, consider a permutation σ of {1, ..., n} which sends at least
one of the indices {1, ..., m} to the other part of the set,
{m + 1, ..., m + l}. Then σ must send at least one of {m + 1, ..., m + l}
back to {1, ..., m}. This means that every elementary product in our
n × n-determinant which contains a factor from B must also contain
a factor from C, and hence vanish, if C = 0. Thus only the permutations σ which permute {1, ..., m} separately from {m + 1, ..., m + l}
contribute to the determinant in question. Elementary products
corresponding to such permutations factor into elementary products from det A and det D and eventually add up to the product
det A det D.
Of course, the same holds true if B = 0 instead of C = 0.
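A quick check of the factorization formula for block triangular matrices, as a Python sketch. The blocks `A`, `B`, `D` below are arbitrary illustrative choices, and `det` is the permutation-sum formula.

```python
from itertools import permutations
from math import prod

def det(M):
    n = len(M)
    def sign(p):
        return (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]                      # arbitrary block: it does not affect the result
D = [[2, 1], [1, 1]]
zero = [[0, 0], [0, 0]]

upper = [ra + rb for ra, rb in zip(A, B)] + [rz + rd for rz, rd in zip(zero, D)]
lower = [ra + rz for ra, rz in zip(A, zero)] + [rb + rd for rb, rd in zip(B, D)]
print(det(upper), det(lower), det(A) * det(D))
```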
We will use the factorization formula in the 1st proof of the following fundamental property of determinants.
Multiplicativity
Theorem. The determinant is multiplicative with respect to
matrix products: for arbitrary n × n-matrices A and B,
det(AB) = (det A)(det B).
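Before the proofs, the statement can be tested on sample matrices. In this sketch `A` and `B` are illustrative 3 × 3 integer matrices, and the helper names are assumptions.

```python
from itertools import permutations
from math import prod

def det(M):
    n = len(M)
    def sign(p):
        return (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
B = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
print(det(matmul(A, B)), det(A) * det(B))
```

Note that `matmul(A, B)` and `matmul(B, A)` are in general different matrices, yet their determinants agree.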
We give two proofs: one ad hoc, the other more conceptual.
Proof I. Consider the auxiliary 2n × 2n matrix $\begin{bmatrix} A & 0 \\ -I & B \end{bmatrix}$ with the determinant equal to the product (det A)(det B) according to the factorization formula. We begin to change the matrix by adding to the last n columns linear combinations of the first n columns with such coefficients that the submatrix B is eventually replaced by the zero submatrix. Thus, in order to kill the entry $b_{kj}$ we must add the $b_{kj}$-multiple of the k-th column to the (n + j)-th column. According to the properties of determinants (see (iv)) these operations do not change the determinant but transform the matrix to the form $\begin{bmatrix} A & C \\ -I & 0 \end{bmatrix}$. We ask the reader to check that the entry $c_{ij}$ of the submatrix C in the upper right corner equals $a_{i1}b_{1j} + \dots + a_{in}b_{nj}$ so that C = AB is the matrix product! Now, interchanging the i-th and (n + i)-th columns, i = 1, ..., n, we change the determinant by the factor of $(-1)^n$ and transform the matrix to the form $\begin{bmatrix} C & A \\ 0 & -I \end{bmatrix}$. The factorization formula applies again and yields det C det(−I). We conclude that det C = det A det B since $\det(-I) = (-1)^n$ compensates for the previous factor $(-1)^n$.
Proof II. We will first show that the properties (i – v) completely characterize det[v1 , . . . , vn ] as a function of n columns vi of
size n.
Indeed, consider a function f, which to n columns $v_1, \dots, v_n$ associates a number $f(v_1, \dots, v_n)$. Suppose that f is linear with respect to each column. Let $e_i$ denote the ith column of the identity matrix. Since $v_1 = \sum_{i=1}^{n} v_{i1} e_i$, we have:
\[ f(v_1, v_2, \dots, v_n) = \sum_{i=1}^{n} v_{i1} f(e_i, v_2, \dots, v_n). \]
Using linearity with respect to the 2nd column $v_2 = \sum_{j=1}^{n} v_{j2} e_j$, we similarly obtain:
\[ f(v_1, v_2, \dots, v_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} v_{i1} v_{j2} f(e_i, e_j, v_3, \dots, v_n). \]
Proceeding the same way with all columns, we get:
\[ f(v_1, \dots, v_n) = \sum_{i_1, \dots, i_n} v_{i_1 1} \cdots v_{i_n n} f(e_{i_1}, \dots, e_{i_n}). \]
Thus, f is determined by its values f (ei1 , . . . , ein ) on strings of n
basis vectors.
Let us assume now that f is totally anti-symmetric. Then, if any
two of the indices i1 , . . . , in coincide, we have: f (ei1 , . . . , ein ) = 0.
All other coefficients correspond to permutations $\sigma = \begin{pmatrix} 1 & \dots & n \\ i_1 & \dots & i_n \end{pmatrix}$ of the indices (1, ..., n), and hence satisfy:
\[ f(e_{i_1}, \dots, e_{i_n}) = \varepsilon(\sigma) f(e_1, \dots, e_n). \]
Therefore, we find:
\[ f(v_1, \dots, v_n) = \sum_{\sigma} v_{\sigma(1)1} \cdots v_{\sigma(n)n}\, \varepsilon(\sigma) f(e_1, \dots, e_n) = f(e_1, \dots, e_n) \det[v_1, \dots, v_n]. \]
Thus, we have established:
Proposition 1. Every totally anti-symmetric function of
n coordinate vectors of size n which is linear in each of them
is proportional to the determinant function.
Next, given an n × n matrix C, put
f (v1 , . . . , vn ) := det[Cv1 , . . . , Cvn ].
Obviously, the function f is totally anti-symmetric in all $v_i$ (since det is). Multiplication by C is linear:
\[ C(\lambda u + \mu v) = \lambda Cu + \mu Cv \quad \text{for all } u, v \text{ and } \lambda, \mu. \]
Therefore, f is linear with respect to each $v_i$ (as composition of
two linear operations). By the previous result, f is proportional to
det. Since Cei are columns of C, we conclude that the coefficient
of proportionality f (e1 , . . . , en ) = det C. Thus, we have found the
following interpretation of det C.
Proposition 2. det C is the factor by which the determinant function of n vectors vi is multiplied when the vectors
are replaced with Cvi .
Now our theorem follows from the fact that when C = AB, the substitution $v \mapsto Cv$ is the composition $v \mapsto Bv \mapsto ABv$ of consecutive substitutions defined by B and A. Under the action of B, the function det is multiplied by the factor det B, then under the action of A by another factor det A. But the resulting factor (det A)(det B) must be equal to det C.
Corollary. If A is invertible, then det A is invertible.
Indeed, (det A)(det A−1 ) = det I = 1, and hence det A−1 is reciprocal to det A. The converse statement: that matrices with invertible
determinants are invertible, is also true due to the explicit formula
for the inverse matrix, described in the next section.
Remark. Of course, a real or complex number det A is invertible
whenever $\det A \neq 0$. Yet over the integers $\mathbb{Z}$ this is not the case:
the only invertible integers are ±1. The above formulation, and
several similar formulations that follow, which refer to invertibility
of determinants, are preferable as they are more general.
The Cofactor Theorem
In the determinant formula for an n × n-matrix A each elementary
product ±a1σ(1) ... begins with one of the entries a11 , ..., a1n of the
first row. The sum of all terms containing a11 in the 1-st place is
the product of $a_{11}$ with the determinant of the (n − 1) × (n − 1)-matrix obtained from A by crossing out the 1-st row and the 1-st
column. Similarly, the sum of all terms containing a12 in the 1-st
place looks like the product of a12 with the determinant obtained by
crossing out the 1-st row and the 2-nd column of A. In fact it differs
by the factor of −1 from this product, since switching the columns
1 and 2 changes signs of all terms in the determinant formula and
interchanges the roles of a11 and a12 . Proceeding in this way with
a13 , ..., a1n we arrive at the cofactor expansion formula for det A
which can be stated as follows.
[Figure 22] [Figure 23]
The determinant of the (n − 1) × (n − 1)-matrix obtained from A
by crossing out the row i and column j is called the (ij)-minor of A
(Figure 22). Denote it by Mij . The (ij)-cofactor Aij of the matrix
A is the number that differs from the minor Mij by a factor ±1:
\[ A_{ij} = (-1)^{i+j} M_{ij} . \]
The chess-board of the signs (−1)i+j is shown on Figure 23. With
these notations, the cofactor expansion formula reads:
det A = a11 A11 + a12 A12 + ... + a1n A1n .
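Example.
\[ \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} . \]
Using the properties (i) and (ii) of determinants we can adjust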
the cofactor expansion to the i-th row or j-th column:
det A = ai1 Ai1 + ... + ain Ain = a1j A1j + ... + anj Anj , i, j = 1, ..., n.
These formulas reduce evaluation of n × n-determinants to that of
(n − 1) × (n − 1)-determinants and can be useful in recursive computations.
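The recursive use of the cofactor expansion can be sketched in a few lines of Python. The function below expands along the first row; the name `det_cofactor` and the sample matrix are assumptions for illustration.

```python
def det_cofactor(A):
    """Determinant via cofactor expansion along the first row."""
    n = len(A)
    if n == 0:
        return 1                          # empty determinant, ends the recursion
    total = 0
    for j in range(n):
        # cross out the first row and the column j (0-based)
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_cofactor(minor)
    return total

A = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]
print(det_cofactor(A))
```

With 0-based column index `j` and the first row, the sign $(-1)^{i+j}$ of the text reduces to `(-1) ** j`.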
Furthermore, we claim that applying the cofactor formula to the
entries of the i-th row but picking the cofactors of another row we
get the zero sum:
\[ a_{i1}A_{j1} + \dots + a_{in}A_{jn} = 0 \quad \text{if } i \neq j. \]
Indeed, construct a new matrix $\tilde{A}$ replacing the j-th row by a copy of the i-th row. This forgery does not change the cofactors $A_{j1}, \dots, A_{jn}$ (since the j-th row is crossed out anyway) and yields the cofactor expansion $a_{i1}A_{j1} + \dots + a_{in}A_{jn}$ for $\det \tilde{A}$. But $\tilde{A}$ has two identical rows and hence $\det \tilde{A} = 0$. The same arguments applied to the columns yield the dual statement:
\[ a_{1i}A_{1j} + \dots + a_{ni}A_{nj} = 0 \quad \text{if } i \neq j. \]
All the above formulas can be summarized in a single matrix identity.
Introduce the n × n-matrix adj(A), called adjoint to A, by placing
the cofactor Aij on the intersection of j-th row and i-th column. In
other words, each aij is replaced with the corresponding cofactor Aij ,
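and then the resulting matrix is transposed:
\[ \operatorname{adj} \begin{bmatrix} a_{11} & \dots & a_{1n} \\ \dots & a_{ij} & \dots \\ a_{n1} & \dots & a_{nn} \end{bmatrix} = \begin{bmatrix} A_{11} & \dots & A_{n1} \\ \dots & A_{ji} & \dots \\ A_{1n} & \dots & A_{nn} \end{bmatrix} . \]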
Theorem. A adj(A) = (det A) I = adj(A) A.
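Corollary. If det A is invertible then A is invertible, and
\[ A^{-1} = \frac{1}{\det A} \operatorname{adj}(A). \]
Example. If $ad - bc \neq 0$, then $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \dfrac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.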
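The Cofactor Theorem and the resulting formula for the inverse matrix can be sketched as follows. The function names (`adjugate`, `inverse`, and so on) and the sample matrix are assumptions; exact fractions are used for the inverse.

```python
from fractions import Fraction

def minor_matrix(A, i, j):
    """A with row i and column j crossed out (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j))
               for j in range(len(A)))

def adjugate(A):
    """Entry (i, j) of adj(A) is the cofactor A_ji: cofactors, then transposition."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor_matrix(A, j, i)) for j in range(n)]
            for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse(A):
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0: no inverse over the rationals")
    return [[Fraction(entry, d) for entry in row] for row in adjugate(A)]

A = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]
print(matmul(A, adjugate(A)))             # (det A) I
print(matmul(A, inverse(A)))              # I
```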
Cramer’s Rule
This is an application of the Cofactor Theorem to systems of linear
equations. Consider a system
a11 x1 + · · · + a1n xn = b1
···
an1 x1 + · · · + ann xn = bn
of n linear equations with n unknowns (x1 , . . . , xn ). It can be written
in the matrix form
Ax = b,
where A is the n × n-matrix of the coefficients aij , b = [b1 , . . . , bn ]t is
the column of the right hand sides, and x is the column of unknowns.
In the following Corollary, ai denote columns of A.
Corollary. If det A is invertible then the system of linear equations Ax = b has a unique solution given by the formulas:
\[ x_1 = \frac{\det[b, a_2, \dots, a_n]}{\det[a_1, \dots, a_n]}, \ \dots, \ x_n = \frac{\det[a_1, \dots, a_{n-1}, b]}{\det[a_1, \dots, a_n]} . \]
Indeed, when $\det A \neq 0$, the matrix A is invertible. Multiplying the matrix equation Ax = b by $A^{-1}$ on the left, we find: $x = A^{-1}b$. Thus the solution is unique, and $x_i = (\det A)^{-1}(A_{1i}b_1 + \dots + A_{ni}b_n)$ according to the cofactor formula for the inverse matrix. But the sum $b_1 A_{1i} + \dots + b_n A_{ni}$ is the cofactor expansion for $\det[a_1, \dots, a_{i-1}, b, a_{i+1}, \dots, a_n]$ with respect to the i-th column.
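Example. Suppose that $a_{11}a_{22} \neq a_{12}a_{21}$. Then the system
\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 &= b_1 \\ a_{21}x_1 + a_{22}x_2 &= b_2 \end{aligned} \]
has a unique solution
\[ x_1 = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}, \qquad x_2 = \frac{\begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}} . \]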
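Cramer's Rule is straightforward to express in code, although for large systems it is far slower than elimination. The sketch below uses an assumed function name `cramer` and sample systems chosen for illustration.

```python
from fractions import Fraction

def det(A):
    """Cofactor expansion along the first row."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    """Solve Ax = b for square A with det A != 0, in exact fractions."""
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0: Cramer's Rule does not apply")
    solution = []
    for i in range(len(A)):
        # replace the i-th column of A with the column b
        A_i = [row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]
        solution.append(Fraction(det(A_i), d))
    return solution

print(cramer([[2, 1], [1, 3]], [5, 10]))
print(cramer([[2, 0, 1], [1, 3, -1], [0, 5, 4]], [5, 4, 22]))
```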
Three Cool Formulas
We collect here some useful generalizations of previous results.
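A. We don’t know of any reasonable generalization of determinants to the situation when matrix entries do not commute. However the following generalization of the formula $\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$ is instrumental in some non-commutative applications.¹
In the block matrix $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$, assume that $D^{-1}$ exists. Then
\[ \det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det(A - BD^{-1}C) \det D . \]
Proof:
\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} I & 0 \\ -D^{-1}C & I \end{bmatrix} = \begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix} . \]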
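Formula A can be checked on a 4 × 4 matrix with 2 × 2 blocks. In this Python sketch the blocks are illustrative, with `D` chosen invertible; the helpers `det`, `matmul` and `inv2` are assumptions, and fractions keep $D^{-1}$ exact.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def det(M):
    n = len(M)
    def sign(p):
        return (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    (a, b), (c, d) = M
    dt = Fraction(a * d - b * c)
    return [[d / dt, -b / dt], [-c / dt, a / dt]]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 2]]
D = [[1, 1], [0, 1]]                      # invertible block
M = [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]

BDinvC = matmul(B, matmul(inv2(D), C))
reduced = [[A[i][j] - BDinvC[i][j] for j in range(2)] for i in range(2)]  # A - B D^{-1} C
print(det(M), det(reduced) * det(D))
```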
B. Lagrange’s formula² below generalizes cofactor expansions.
By a multi-index I of length |I| = k we mean an increasing sequence $i_1 < \dots < i_k$ of k indices from the set {1, ..., n}. Given an n × n-matrix A and two multi-indices I, J of the same length k, we define the (IJ)-minor $M_{IJ}$ of A as the determinant of the k × k-matrix formed by the entries $a_{i_\alpha j_\beta}$ of A located at the intersections of the rows $i_1, \dots, i_k$ with columns $j_1, \dots, j_k$ (see Figure 24). Also, denote by $\bar{I}$ the multi-index complementary to I, i.e. formed by those n − k indices from {1, ..., n} which are not contained in I.
For each multi-index $I = (i_1, \dots, i_k)$, the following cofactor expansion with respect to rows $i_1, \dots, i_k$ holds true:
\[ \det A = \sum_{J : |J| = k} (-1)^{i_1 + \dots + i_k + j_1 + \dots + j_k} M_{IJ} M_{\bar{I}\bar{J}} , \]
where the sum is taken over all multi-indices $J = (j_1, \dots, j_k)$ of length k.
Similarly, one can write Lagrange’s cofactor expansion formula with respect to given k columns.
Example. Let $a_1, a_2, a_3, a_4$ and $b_1, b_2, b_3, b_4$ be 8 vectors on the plane. Then
\[ \begin{vmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \end{vmatrix} = |a_1 a_2||b_3 b_4| - |a_1 a_3||b_2 b_4| + |a_1 a_4||b_2 b_3| + |a_2 a_3||b_1 b_4| - |a_2 a_4||b_1 b_3| + |a_3 a_4||b_1 b_2| . \]
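Lagrange's expansion with respect to a chosen set of rows can be sketched in Python as follows. Indices are 0-based in the code, which shifts the exponent of the sign by the even number 2k and so leaves the sign unchanged; the function name `lagrange_expansion` and the sample matrix are assumptions.

```python
from itertools import combinations, permutations
from math import prod

def det(M):
    n = len(M)
    def sign(p):
        return (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def lagrange_expansion(A, rows):
    """Expansion of det A with respect to the increasing 0-based multi-index `rows`."""
    n, k = len(A), len(rows)
    other_rows = [i for i in range(n) if i not in rows]
    total = 0
    for cols in combinations(range(n), k):          # all multi-indices J of length k
        other_cols = [j for j in range(n) if j not in cols]
        minor = det([[A[i][j] for j in cols] for i in rows])
        complementary = det([[A[i][j] for j in other_cols] for i in other_rows])
        total += (-1) ** (sum(rows) + sum(cols)) * minor * complementary
    return total

A = [[1, 2, 0, 1],
     [3, 4, 1, 0],
     [2, 0, 1, 1],
     [0, 2, 0, 1]]
print(det(A), lagrange_expansion(A, (0, 1)), lagrange_expansion(A, (1, 3)))
```

With a single row, `lagrange_expansion(A, (i,))` reduces to the ordinary cofactor expansion along that row.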
¹ Notably in the definition of Berezinian in super-mathematics [7].
² After Joseph-Louis Lagrange (1736–1813).
In the proof of Lagrange’s formula, it suffices to assume that it
is written with respect to the first k rows, i.e. that I = (1, . . . , k). Indeed, interchanging them with the rows i1 < · · · < ik takes
(i1 − 1) + (i2 − 2) + · · · + (ik − k) transpositions, which is accounted
for by the sign $(-1)^{i_1 + \dots + i_k}$ in the formula.
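Next, multiplying out $M_{IJ} M_{\bar{I}\bar{J}}$, we find k!(n − k)! elementary products of the form:
\[ \pm a_{1, j_{\alpha_1}} \cdots a_{k, j_{\alpha_k}} a_{k+1, \bar{j}_{\beta_1}} \cdots a_{n, \bar{j}_{\beta_{n-k}}} , \]
where $\alpha = \begin{pmatrix} 1 & \dots & k \\ \alpha_1 & \dots & \alpha_k \end{pmatrix}$ and $\beta = \begin{pmatrix} 1 & \dots & n-k \\ \beta_1 & \dots & \beta_{n-k} \end{pmatrix}$ are permutations, and $j_{\alpha_\mu} \in J$, $\bar{j}_{\beta_\nu} \in \bar{J}$. It is clear that the total sum over multi-indices J contains each elementary product from det A, and does it exactly once. Thus, to finish the proof, we need to compare the signs.
[Figure 24]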
The sign ± in the above formula is equal to $\varepsilon(\alpha)\varepsilon(\beta)$, the product of the signs of the permutations $\alpha$ and $\beta$. The sign of this elementary product in the definition of det A is equal to the sign of the permutation $\begin{pmatrix} 1 & \dots & k & k+1 & \dots & n \\ j_{\alpha_1} & \dots & j_{\alpha_k} & \bar{j}_{\beta_1} & \dots & \bar{j}_{\beta_{n-k}} \end{pmatrix}$ on the set $J \cup \bar{J} = \{1, \dots, n\}$. Reordering separately the first k and last n − k indices in the increasing order changes the sign of the permutation by $\varepsilon(\alpha)\varepsilon(\beta)$. Therefore the signs of all summands of det A which occur in $M_{IJ} M_{\bar{I}\bar{J}}$ are coherent. It remains to find the total sign with which $M_{IJ} M_{\bar{I}\bar{J}}$ occurs in det A, by computing the sign of the permutation $\sigma := \begin{pmatrix} 1 & \dots & k & k+1 & \dots & n \\ j_1 & \dots & j_k & \bar{j}_1 & \dots & \bar{j}_{n-k} \end{pmatrix}$, where $j_1 < \dots < j_k$ and $\bar{j}_1 < \dots < \bar{j}_{n-k}$.
Starting with the identity permutation $(1, 2, \dots, j_1, \dots, j_2, \dots, n)$, it takes $j_1 - 1$ transpositions of nearby indices to move $j_1$ to the 1st place. Then it takes $j_2 - 2$ such transpositions to move $j_2$ to the 2nd place. Continuing this way, we find that
\[ \varepsilon(\sigma) = (-1)^{(j_1 - 1) + \dots + (j_k - k)} = (-1)^{1 + \dots + k + j_1 + \dots + j_k} . \]
This agrees with Lagrange’s formula, since I = {1, ..., k}.
C. Let A and B be k × n and n × k matrices (think of k < n). For each multi-index $I = (i_1, \dots, i_k)$, denote by $A_I$ and $B_I$ the k × k-matrices formed by respectively: columns of A and rows of B with the indices $i_1, \dots, i_k$.
The determinant of the k × k-matrix AB is given by the following Binet–Cauchy formula:³
\[ \det AB = \sum_{I} (\det A_I)(\det B_I). \]
Note that when k = n, this turns into the multiplicative property of determinants: det(AB) = (det A)(det B). Our second proof of it can be generalized to establish the formula of Binet–Cauchy. Namely, let $a_1, \dots, a_n$ denote columns of A. Then the jth column of C = AB is the linear combination: $c_j = a_1 b_{1j} + \dots + a_n b_{nj}$. Using linearity in each $c_j$, we find:
\[ \det[c_1, \dots, c_k] = \sum_{1 \le i_1, \dots, i_k \le n} \det[a_{i_1}, \dots, a_{i_k}]\, b_{i_1 1} \cdots b_{i_k k} . \]
If any two of the indices $i_\alpha$ coincide, $\det[a_{i_1}, \dots, a_{i_k}] = 0$. Thus the sum is effectively taken over all permutations $\begin{pmatrix} 1 & \dots & k \\ i_1 & \dots & i_k \end{pmatrix}$ on the set⁴ $\{i_1, \dots, i_k\}$. Reordering the columns $a_{i_1}, \dots, a_{i_k}$ in the increasing order of the indices (and paying the “fees” ±1 according to parities of permutations) we obtain the sum over all multi-indices of length k:
\[ \sum_{i'_1 < \dots < i'_k} \det[a_{i'_1}, \dots, a_{i'_k}] \sum_{\sigma} \varepsilon(\sigma)\, b_{i_1 1} \cdots b_{i_k k} . \]
The sum on the right is taken over permutations $\sigma = \begin{pmatrix} i'_1 & \dots & i'_k \\ i_1 & \dots & i_k \end{pmatrix}$. It is equal to $\det B_I$, where $I = (i'_1, \dots, i'_k)$.
Corollary 1. If k > n, det AB = 0.
³ After Jacques Binet (1786–1856) and Augustin Louis Cauchy (1789–1857).
⁴ Remember that in a set, elements are unordered!
This is because no multi-indices of length k > n can be formed from {1, ..., n}. In the oppositely extreme case k = 1, Binet–Cauchy’s formula turns into the expression $u^t v = \sum u_i v_i$ for the dot product of coordinate vectors. A “Pythagorean” interpretation of the following identity will come to light in the next chapter, in connection with volumes of parallelepipeds.
Corollary 2. $\det AA^t = \sum_{I} (\det A_I)^2$.
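The Binet–Cauchy formula and Corollary 2 can be verified on small rectangular matrices. This is a sketch with assumed sample data (a 2 × 3 matrix `A` and a 3 × 2 matrix `B`) and hypothetical helper names.

```python
from itertools import combinations, permutations
from math import prod

def det(M):
    n = len(M)
    def sign(p):
        return (-1) ** sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def binet_cauchy(A, B):
    """Sum over multi-indices I of det(A_I) det(B_I), for A of size k x n, B of size n x k."""
    k, n = len(A), len(B)
    total = 0
    for I in combinations(range(n), k):
        A_I = [[row[i] for i in I] for row in A]     # columns of A with indices in I
        B_I = [B[i] for i in I]                      # rows of B with indices in I
        total += det(A_I) * det(B_I)
    return total

A = [[1, 2, 3],
     [0, 1, 4]]
B = [[1, 0],
     [2, 1],
     [0, 3]]
A_t = [list(col) for col in zip(*A)]
print(det(matmul(A, B)), binet_cauchy(A, B))
print(det(matmul(A, A_t)), binet_cauchy(A, A_t))     # sum of squares of det A_I
print(det(matmul(B, A)))                             # k > n case: a 3 x 3 product of rank <= 2
```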
EXERCISES
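110. Prove that the following determinant is equal to 0:
$$\begin{vmatrix} 0 & 0 & 0 & a & b \\ 0 & 0 & 0 & c & d \\ 0 & 0 & 0 & e & f \\ p & q & r & s & t \\ v & w & x & y & z \end{vmatrix}.$$
111. Compute determinants:
$$\begin{vmatrix} \cos x & -\sin x \\ \sin x & \cos x \end{vmatrix}, \quad \begin{vmatrix} \cosh x & \sinh x \\ \sinh x & \cosh x \end{vmatrix}, \quad \begin{vmatrix} \cos x & \sin y \\ \sin x & \cos y \end{vmatrix}.$$
112. Compute determinants:
$$\begin{vmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{vmatrix}, \quad \begin{vmatrix} 0 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 3 & 6 \end{vmatrix}, \quad \begin{vmatrix} 1 & i & 1+i \\ -i & 1 & 0 \\ 1-i & 0 & 1 \end{vmatrix}.$$
113. List all the 24 permutations of {1, 2, 3, 4}, find the length and the sign of each of them.
114. Find the length of the following permutation:
$$\begin{pmatrix} 1 & 2 & \dots & k & k+1 & k+2 & \dots & 2k \\ 1 & 3 & \dots & 2k-1 & 2 & 4 & \dots & 2k \end{pmatrix}.$$
115. Find the maximal possible length of permutations of {1, ..., n}.
116. Find the length of a permutation $\begin{pmatrix} 1 & \dots & n \\ i_n & \dots & i_1 \end{pmatrix}$ given the length l of the permutation $\begin{pmatrix} 1 & \dots & n \\ i_1 & \dots & i_n \end{pmatrix}$.
117. Prove that inverse permutations have the same length.
118. Compare parities of permutations of the letters a,g,h,i,l,m,o,r,t in the words logarithm and algorithm.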
119. Represent the permutation $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 5 & 1 & 3 & 2 \end{pmatrix}$ as composition of a minimal number of transpositions.
120. Do products $a_{13}a_{24}a_{53}a_{41}a_{35}$ and $a_{21}a_{13}a_{34}a_{55}a_{42}$ occur in the defining formula for determinants of size 5?
121. Find the signs of the elementary products $a_{23}a_{31}a_{42}a_{56}a_{14}a_{65}$ and $a_{32}a_{43}a_{14}a_{51}a_{66}a_{25}$ in the definition of determinants of size 6 by computing the numbers of inverted pairs of indices.
122. Compute the determinants
$$\begin{vmatrix} 13247 & 13347 \\ 28469 & 28569 \end{vmatrix}, \quad \begin{vmatrix} 246 & 427 & 327 \\ 1014 & 543 & 443 \\ -342 & 721 & 621 \end{vmatrix}.$$
123. The numbers 195, 247, and 403 are divisible by 13. Prove that the following determinant is also divisible by 13:
$$\begin{vmatrix} 1 & 9 & 5 \\ 2 & 4 & 7 \\ 4 & 0 & 3 \end{vmatrix}.$$
124. Professor Dumbel writes his office and home phone numbers as a 7 × 1-matrix O and 1 × 7-matrix H respectively. Help him compute det(OH).
125. How does a determinant change if all its n columns are rewritten in the opposite order?
126.⋆ Solve the equation
$$\begin{vmatrix} 1 & x & x^2 & \dots & x^n \\ 1 & a_1 & a_1^2 & \dots & a_1^n \\ 1 & a_2 & a_2^2 & \dots & a_2^n \\ & & \dots & & \\ 1 & a_n & a_n^2 & \dots & a_n^n \end{vmatrix} = 0,$$
where all $a_1, \dots, a_n$ are given distinct numbers.
127. Prove that an anti-symmetric matrix of size n has zero determinant if n is odd.
128. How do similarity transformations of a given matrix affect its determinant?
129. Prove that the adjoint matrix of an upper (lower) triangular matrix is upper (lower) triangular.
130. Which triangular matrices are invertible?
131. Compute the determinants (∗ is a wild card):
$$\text{(a)} \ \begin{vmatrix} * & * & * & a_n \\ * & * & \dots & 0 \\ * & a_2 & 0 & \dots \\ a_1 & 0 & \dots & 0 \end{vmatrix}, \qquad \text{(b)} \ \begin{vmatrix} * & * & a & b \\ * & * & c & d \\ e & f & 0 & 0 \\ g & h & 0 & 0 \end{vmatrix}.$$
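132. Compute determinants using cofactor expansions:
$$\text{(a)} \ \begin{vmatrix} 1 & 2 & 2 & 1 \\ 0 & 1 & 0 & 2 \\ 2 & 0 & 1 & 1 \\ 0 & 2 & 0 & 1 \end{vmatrix}, \qquad \text{(b)} \ \begin{vmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{vmatrix}.$$
133. Compute inverses of matrices using the Cofactor Theorem:
$$\text{(a)} \ \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 2 & 3 & 1 \end{bmatrix}, \qquad \text{(b)} \ \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$
134. Solve the systems of linear equations Ax = b where A is one of the matrices of the previous exercise, and $b = [1, 0, 1]^t$.
135. Compute
$$\begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1}.$$
136. Express det(adj(A)) of the adjoint matrix via det A.
137. Which integer matrices have integer inverses?
138. Solve systems of equations using Cramer’s rule:
$$\text{(a)} \ \begin{aligned} 2x_1 - x_2 - x_3 &= 4 \\ 3x_1 + 4x_2 - 2x_3 &= 11 \\ 3x_1 - 2x_2 + 4x_3 &= 11 \end{aligned}\ , \qquad \text{(b)} \ \begin{aligned} x_1 + 2x_2 + 4x_3 &= 31 \\ 5x_1 + x_2 + 2x_3 &= 29 \\ 3x_1 - x_2 + x_3 &= 10 \end{aligned}\ .$$
139.⋆ Compute determinants:
$$\text{(a)} \ \begin{vmatrix} 0 & x_1 & x_2 & \dots & x_n \\ x_1 & 1 & 0 & \dots & 0 \\ x_2 & 0 & 1 & \dots & 0 \\ . & . & . & \dots & . \\ x_n & 0 & 0 & \dots & 1 \end{vmatrix}, \qquad \text{(b)} \ \begin{vmatrix} a & 0 & 0 & 0 & 0 & b \\ 0 & a & 0 & 0 & b & 0 \\ 0 & 0 & a & b & 0 & 0 \\ 0 & 0 & c & d & 0 & 0 \\ 0 & c & 0 & 0 & d & 0 \\ c & 0 & 0 & 0 & 0 & d \end{vmatrix}.$$
140.⋆ Let $P_{ij}$, 1 ≤ i < j ≤ 4, denote the 2 × 2-minor of a 2 × 4-matrix formed by the columns i and j. Prove the following Plücker identity:$^5$
$$P_{12}P_{34} - P_{13}P_{24} + P_{14}P_{23} = 0.$$
5 After Julius Plücker (1801–1868).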
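141. The cross product of two vectors $x, y \in R^3$ is defined by
$$x \times y := \left( \begin{vmatrix} x_2 & x_3 \\ y_2 & y_3 \end{vmatrix}, \ \begin{vmatrix} x_3 & x_1 \\ y_3 & y_1 \end{vmatrix}, \ \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix} \right).$$
Prove that the length $|x \times y| = \sqrt{|x|^2 |y|^2 - \langle x, y \rangle^2}$.
142.⋆ Prove that
$$a_n + \cfrac{1}{a_{n-1} + \cfrac{1}{\cdots + \cfrac{1}{a_1 + \cfrac{1}{a_0}}}} = \frac{\Delta_n}{\Delta_{n-1}}, \quad \text{where} \quad \Delta_n = \begin{vmatrix} a_0 & 1 & 0 & \dots & 0 \\ -1 & a_1 & 1 & \dots & 0 \\ . & . & . & . & . \\ 0 & \dots & -1 & a_{n-1} & 1 \\ 0 & \dots & 0 & -1 & a_n \end{vmatrix}.$$
143.⋆ Compute:
$$\begin{vmatrix} \lambda & -1 & 0 & \dots & 0 \\ 0 & \lambda & -1 & \dots & 0 \\ . & . & . & . & . \\ 0 & \dots & 0 & \lambda & -1 \\ a_n & a_{n-1} & \dots & a_2 & \lambda + a_1 \end{vmatrix}.$$
144.⋆ Compute:
$$\begin{vmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & \binom{2}{1} & \binom{3}{1} & \dots & \binom{n}{1} \\ 1 & \binom{3}{2} & \binom{4}{2} & \dots & \binom{n+1}{2} \\ . & . & . & \dots & . \\ 1 & \binom{n}{n-1} & \binom{n+1}{n-1} & \dots & \binom{2n-2}{n-1} \end{vmatrix}.$$
145.⋆ Prove Vandermonde’s identity$^6$
$$\begin{vmatrix} 1 & x_1 & x_1^2 & \dots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \dots & x_2^{n-1} \\ . & . & . & . & . \\ 1 & x_n & x_n^2 & \dots & x_n^{n-1} \end{vmatrix} = \prod_{1 \le i < j \le n} (x_j - x_i).$$
146.⋆ Compute:
$$\begin{vmatrix} 1 & 2 & 3 & \dots & n \\ 1 & 2^3 & 3^3 & \dots & n^3 \\ . & . & . & . & . \\ 1 & 2^{2n-1} & 3^{2n-1} & \dots & n^{2n-1} \end{vmatrix}.$$
6 After Alexandre-Théophile Vandermonde (1735–1796).
3 Vector Spaces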
In four words, the subject of Linear Algebra can be described as
geometry of vector spaces.
Axioms
By definition, a vector space is a set, equipped with operations
of addition and multiplication by scalars which are required to
satisfy certain axioms:
The set will be denoted here by V, and its elements referred to
as vectors. Here are the axioms.
(i) The sum of two vectors u and v is a vector (denoted u + v);
the result of multiplication of a vector v by a scalar λ is a vector
(denoted λv).
(ii) Addition of vectors is commutative and associative:
u + v = v + u, (u + v) + w = u + (v + w) for all u, v, w ∈ V.
(iii) There exists the zero vector (denoted by 0) such that
v + 0 = 0 + v = v for every v ∈ V.
(iv) For every vector u there exists the opposite vector, denoted
by −u, such that
−u + u = 0.
(v) Multiplication by scalars is distributive: For all vectors u, v
and scalars λ, µ we have
(λ + µ)(u + v) = λu + λv + µu + µv.
(vi) Multiplication by scalars is associative in the following sense:
For every vector u and all scalars λ, µ we have:
(λµ)u = λ(µu).
(vii) Multiplication by scalars 0 and 1 acts on every vector u as
0u = 0, 1u = u.
We have to add to this definition the following comment about
scalars. Taking one of the sets R or C of real or complex numbers
in the role of scalars, one obtains the definition of real vector spaces
or complex vector spaces. In fact these two choices will suffice for
all our major goals. The reader may assume that we use the symbol
K to cover both cases K = R and K = C in one go.
On the other hand, any field K would qualify for the role of
scalars, and this way one arrives at the notion of K-vector spaces. By
a field one means a set K equipped with two operations: addition and
multiplication. Both are assumed to be commutative and associative,
and satisfying the distributive law: a(b + c) = ab + ac. Besides, it is
required that there exist elements 0 and 1 such that a + 0 = a and
1a = a for all a ∈ K. Then, it is required that every a ∈ K has the
opposite −a such that −a + a = 0, and every non-zero a ∈ K has its
inverse $a^{-1}$ such that $a^{-1}a = 1$. To the examples of fields C and R,
we can add (omitting many other available examples): the field Q of
rational numbers; the field A ⊂ C of all algebraic numbers (i.e.
roots of polynomials in one variable with rational coefficients); the
field $\mathbb{Z}_p$ of integers modulo a given prime number p. For instance, the
set $\mathbb{Z}_2$ = {0, 1} of remainders modulo 2 with the usual arithmetic of
remainders (0+0 = 0 = 1+1, 0+1 = 1 = 1+0, 0·0 = 1·0 = 0·1 = 0,
1 · 1 = 1) can be taken in the role of scalars. This gives rise to the
definition of $\mathbb{Z}_2$-vector spaces useful in computer science and logic.
To reiterate: it is essential that division by all non-zero scalars
is defined. Therefore the set Z of all integers and the set F[x] of all
polynomials in one indeterminate x with coefficients in a field F are
not fields, and do not qualify for the role of scalars in the definition of
vector spaces, because division is not always possible. However,
the field Q of all rational numbers and the field F(x) of all rational
functions with coefficients in a field F are O.K.
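The role of division can be made concrete for remainders modulo m. The sketch below (in Python; the function name `inverses_mod` is a hypothetical helper, and the search for inverses is a deliberately naive brute force) tabulates the inverse of each non-zero remainder, or `None` when there is none.

```python
def inverses_mod(m):
    """Map each non-zero remainder a modulo m to its inverse, or None if it has none."""
    table = {}
    for a in range(1, m):
        # Look for b with a * b = 1 (mod m); there is at most one such b.
        table[a] = next((b for b in range(1, m) if (a * b) % m == 1), None)
    return table

inv5 = inverses_mod(5)   # 5 is prime: every non-zero remainder is invertible
inv6 = inverses_mod(6)   # 6 is not prime: some remainders have no inverse
```

For the prime modulus 5 every entry of the table is filled, while for m = 6 the remainders 2, 3 and 4 have no inverse, so the remainders modulo 6 do not form a field.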
Examples
The above definition of vector spaces is doubly abstract: not only does it
neglect to specify the set V of vectors, but it does not even tell us
anything explicit about the nature of the operations of addition of
vectors and multiplication of vectors by scalars. To find various examples of vector spaces we should figure out which operations would
be good candidates to satisfy the axioms (i–vii). It turns out that in
the majority of useful examples, the operations are pointwise addition
of functions and multiplication of functions by scalars.
Example 1. Let S be any set, and V be the set of all functions
on S with values in K. We will denote this set by $K^S$. The sum and
multiplication by scalars are defined on $K^S$ as pointwise operations
with functions. Namely, given two functions f, g and a scalar λ, the
values of the sum f + g and the product λf at a point s ∈ S are
(f + g)(s) = f(s) + g(s), (λf)(s) = λ(f(s)).
It is immediate to check that V = $K^S$ equipped with these operations
satisfies the axioms (i–vii). Thus $K^S$ is a K-vector space.
Example 1a. Let S be the set of n elements 1, 2, ..., n. Then the
space $K^S$ is the coordinate space $K^n$ (e.g. $R^S = R^n$ and $C^S = C^n$).
Namely, each function on the set {1, . . . , n} is specified by the column
$x = (x_1, ..., x_n)^t$ of its values, and the usual operations with such
columns coincide with pointwise operations with the functions.
Example 1b. Let S be the set of all ordered pairs (i, j), where
i = 1, ..., m, j = 1, ..., n. Then the vector space $K^S$ is the space
of m × n-matrices A whose entries $a_{ij}$ lie in K. The operations of
addition of matrices and their multiplication by scalars coincide with
pointwise operations with the functions.
Example 2. Let V be a K-vector space. Consider the set $V^S$ of
all functions on a given set S with values in V. Elements of V can
be added and multiplied by scalars. Respectively the vector-valued
functions can be added and multiplied by scalars in the pointwise
fashion. Thus, $V^S$ is an example of a K-vector space.
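For a finite set S, the pointwise operations of Example 1 are easy to spell out. In the following Python sketch a function on S is stored as a dictionary from points of S to values; the names `add` and `scale`, the three-point set, and the use of rational scalars are illustrative assumptions only.

```python
from fractions import Fraction

def add(f, g):
    """Pointwise sum of two functions defined on the same finite set."""
    return {s: f[s] + g[s] for s in f}

def scale(lam, f):
    """Pointwise product of a function by a scalar."""
    return {s: lam * f[s] for s in f}

# Two functions on the three-point set S = {"red", "green", "blue"}.
f = {"red": Fraction(1), "green": Fraction(2), "blue": Fraction(0)}
g = {"red": Fraction(-1), "green": Fraction(1, 2), "blue": Fraction(3)}
h = add(scale(Fraction(2), f), g)          # the linear combination 2f + g

# With S = {1, ..., n}, a function is just the column of its values (Example 1a).
x = {1: 4, 2: 5, 3: 6}
y = {1: 1, 2: 0, 3: -1}
column = [add(x, y)[i] for i in (1, 2, 3)]
```

Nothing in `add` or `scale` depends on what S is, which is the point of the example: the vector space structure comes entirely from the values.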
Example 3. A non-empty subset W in a vector space V is called
a linear subspace (or simply subspace) if linear combinations
λu + µv of vectors from W with arbitrary coefficients lie in W. In
particular, for every u ∈ W, −u = (−1)u ∈ W, and 0 = 0u ∈ W.
A subspace of a vector space satisfies the axioms of a vector space
on its own (since the operations are the same as in V). Thus every
subspace of a K-vector space is an example of a K-vector space.
Example 3a. For instance, all upper triangular n × n-matrices
(lower triangular, block triangular, block diagonal, diagonal matrices
— the reader can continue this line of examples) form subspaces in
the space of all n × n-matrices, and therefore provide examples of
vector spaces.
Example 3b. The set of all polynomials (say, in one variable)$^7$
forms a subspace in the space $R^R$ of all real-valued functions on the
number line and therefore provides an example of a real vector space.
More generally, polynomials with coefficients in K (as well as such
polynomials of degree not exceeding 7) form examples of K-vector
spaces.
7 As well as sets of all continuous, differentiable, 5 times continuously differentiable, infinitely differentiable, Riemann-integrable, measurable, etc. functions, introduced in Mathematical Analysis.
Example 3c. Linear forms or quadratic forms in $K^n$ form subspaces
in the space $K^{K^n}$ of all K-valued functions on $K^n$, and thus
provide examples of K-vector spaces.
Example 3d. All bilinear forms of two vectors $v \in K^m$, $w \in K^n$
form a subspace in the space of all functions on the Cartesian
product$^8$ $K^m \times K^n$ with values in K. Hence they form a K-vector
space.
Morphisms
The modern ideology requires objects of mathematical study to be
organized into categories. This means that in addition to specifying
objects of interest, one should also specify morphisms, i.e. maps
between them. The category of vector spaces is obtained by
taking vector spaces for objects, and linear maps for morphisms.
By definition, a function A : V → W from a vector space V to a
vector space W is called a linear map if it respects the operations
with vectors, i.e. if it maps linear combinations of vectors to linear
combinations of their images with the same coefficients:
A(λu + µv) = λAu + µAv
for all u, v ∈ V and λ, µ ∈ K.
With a linear map A : V → W, one associates two subspaces, one
in V, called the null space, or the kernel of A and denoted Ker A,
and the other in W, called the range of A and denoted A(V):
Ker A := {v ∈ V : Av = 0},
A(V) := {w ∈ W : w = Av for some v ∈ V}.
A linear map is injective, i.e. maps different vectors to different
ones, exactly when its kernel is trivial. Indeed, if Ker A ≠ {0}, then
it contains non-zero vectors mapped to the same point 0 in W as 0
from V. This makes the map A non-injective. Vice versa, if A is
non-injective, i.e. if Av = Av′ for some v ≠ v′, then u = v − v′ ≠ 0
lies in Ker A. This makes the kernel nontrivial.
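The argument can be traced on a concrete map. In this Python sketch the matrix `A` and the vectors `v`, `v_prime` are arbitrary illustrative choices: two different vectors have the same image, and their difference lands in the kernel.

```python
def apply(matrix, v):
    """Apply the linear map given by a matrix (list of rows) to a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

# An illustrative linear map from Q^3 to Q^2.
A = [[1, -1, 0],
     [0, 1, -1]]

v = [3, 2, 1]
v_prime = [5, 4, 3]                        # differs from v, yet has the same image
u = [a - b for a, b in zip(v, v_prime)]    # u = v - v'

same_image = apply(A, v) == apply(A, v_prime)
u_in_kernel = apply(A, u) == [0, 0]
```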
8 The Cartesian product of two sets A and B is defined as the set of all ordered
pairs (a, b) of elements a ∈ A and b ∈ B.
When the range of a map is the whole target space, A(V) = W,
the map is called surjective. If a linear map A : V → W is bijective, i.e. both injective and surjective, it establishes a one-to-one
correspondence between V and W in a way that respects vector operations. Then one says that A establishes an isomorphism between
the vector spaces V and W. Two vector spaces are called isomorphic (written V ≅ W) if there exists an isomorphism between them.
Example 4. Let V = W be the space K[x] of all polynomials in
one indeterminate x with coefficients from K. The differentiation
d/dx : K[x] → K[x] is a linear map defined by
$$\frac{d}{dx}(a_0 + a_1 x + \cdots + a_n x^n) = a_1 + 2a_2 x + \cdots + n a_n x^{n-1}.$$
It is surjective, with the kernel consisting of constant polynomials.$^9$
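Here is a small Python sketch of this example, with a polynomial stored as its list of coefficients `[a0, a1, ..., an]`. The names `derivative` and `antiderivative` are hypothetical helpers, and the scalars are assumed to be rational so that division by 1, 2, 3, ... makes sense.

```python
from fractions import Fraction

def derivative(p):
    """d/dx of the polynomial with coefficient list [a0, a1, ..., an]."""
    return [k * a for k, a in enumerate(p)][1:]

def antiderivative(p):
    """One preimage of p under d/dx; it divides by 1, 2, ..., so it assumes K contains Q."""
    return [Fraction(0)] + [Fraction(a) / (k + 1) for k, a in enumerate(p)]

p = [7, 0, 3, 2]                  # 7 + 3x^2 + 2x^3
dp = derivative(p)                # 6x + 6x^2
q = antiderivative(p)             # 7x + x^3 + x^4/2
constant = derivative([5])        # a constant is mapped to the zero polynomial
```

Since `derivative(antiderivative(p))` returns `p` for every coefficient list, each polynomial is in the range, which is the surjectivity claim; the empty list stands for the zero polynomial.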
Example 5. Linear combinations λA + µB of linear maps A, B :
V → W are linear. Therefore all linear maps from V to W form a
subspace in the space of all vector-valued functions V → W. The
vector space of linear maps from V to W is usually denoted by$^{10}$
Hom(V, W).
Example 5a. The space of linear functions (or linear forms)
on V, i.e. linear maps V → K, is called the space dual to V, and is
denoted by V ∗ .
The following formal construction indicates that every vector
space can be identified with a subspace in a space of functions with
pointwise operations of addition and multiplication by scalars.
Example 5b. Given a vector v ∈ V and a linear function f ∈ V∗,
the value f(v) ∈ K is defined. We can consider it not as a function
f of v, but as a function of f defined by v. This way, to a vector
v we associate the function $E_v : V^* \to K$ defined by evaluating all
linear functions V → K on the vector v. The function $E_v$ is linear,
since (λf + µg)(v) = λf(v) + µg(v). The linear function $E_v$ is an
element of the second dual space (V∗)∗. The formula f(λv + µw) =
λf(v) + µf(w), expressing linearity of linear functions, shows that
$E_v$ depends linearly on v. Thus the evaluation map $E : v \mapsto E_v$
is a linear map V → (V∗)∗. One can show$^{11}$ that E is injective
and thus provides an isomorphism between V and its range
E(V) ⊂ (V∗)∗.
9 When K contains Q (e.g. K = R or C), but not when K contains $\mathbb{Z}_p$.
10 From homomorphism, a word roughly synonymous to the terms linear map and morphism.
11 Although we are not going to do this, as it would drag us into the boredom of general set theory and transfinite induction.
The previous result and examples suggest that vector spaces need
not be described abstractly, and raise the suspicion that the axiomatic definition is misleading as it obscures the actual nature of
vectors as functions subject to the pointwise algebraic operations.
Here are, however, some examples where vectors do not come naturally as functions.
Example 6a. Geometric vectors (see Section 1 of Chapter 1),
and forces and velocities in physics, are not introduced as functions.
Example 6b. Rational functions are defined as ratios P/Q of
polynomials P and Q. Morally they are functions, but technically
they are not. More precisely, the domain of P/Q is the non-empty
set of points x where Q(x) ≠ 0, but it varies with the function. All
rational functions do form a vector space, but there is not a single
point x at which all of them are defined. The addition operation is
not defined as pointwise, but is introduced instead by means of the
formula: P/Q + P′/Q′ = (PQ′ + QP′)/QQ′.
[Figure 25] [Figure 26]
Example 6c. Vector spaces frequently occur in number theory
and field theory in connection with field extensions, i.e. inclusions
K ⊂ F of one field as a subfield into another. As examples, we
already have: Q ⊂ A ⊂ C, Q ⊂ R ⊂ C, and K ⊂ K(x). Given a
field extension K ⊂ F, it is often useful to temporarily forget that
elements of F can be multiplied, but still remember that they can
be multiplied by elements of K. This way, F becomes a K-vector
space (e.g. C is a real plane). Vectors here, i.e. elements of F, are
“numbers” rather than “functions.”
More importantly, various abstract constructions of new vector
spaces from given ones are used regularly, and it would be very awkward to express the resulting vector space as a space of functions
even when the given spaces are expressed this way. Here are two
such constructions.
Direct Sums and Quotients
Example 7. Given two vector spaces V and W, their direct sum
V ⊕ W (Figure 25) is defined as the set of all ordered pairs (v, w),
where v ∈ V, w ∈ W, equipped with the component-wise operations:
λ(v, w) = (λv, λw), (v, w) + (v′ , w′ ) = (v + v′ , w + w′ ).
Of course, one can similarly define the direct sum of several vector
spaces. E.g. Kn = K ⊕ · · · ⊕ K (n times).
Example 7a. Given a linear map A : V → W, its graph is
defined as a subspace in the direct sum V ⊕ W:
Graph A = {(v, w) ∈ V ⊕ W : w = Av}.
Example 8. The quotient space of a vector space V by a subspace W is defined as follows. Two vectors v and v′ (Figure 26) are
called equivalent modulo W, if v − v′ ∈ W. This way, all vectors
from V become partitioned into equivalence classes. These equivalence classes form the quotient vector space V/W.
More precisely, denote by π : V → V/W the canonical projection, which assigns to a vector v its equivalence class modulo W.
This class can be symbolically written as v + W, a notation emphasizing that the class consists of all vectors obtained from v by adding
arbitrary vectors from W. Alternatively, one may think of v + W
as a “plane” obtained from W as translation by the vector v. When
v ∈ W, we have v + W = W. When v ∉ W, v + W is not a linear
subspace in V. We will call it an affine subspace parallel to W.
The set V/W of all affine subspaces in V parallel to W is equipped
with algebraic operations of addition and multiplication by scalars
in such a way that the canonical projection π : V → V/W becomes a
linear map. In fact this condition leaves no choices, since it requires
that for every u, v ∈ V and λ, µ ∈ K,
λπ(u) + µπ(v) = π(λu + µv).
In other words, the linear combination of given equivalence classes
must coincide with the equivalence class containing the linear combination λu + µv of arbitrary representatives u, v of these classes. It
is important here that picking different representatives u′ and v′ will
result in a new linear combination λu′ + µv′ which is however equivalent to the previous one. Indeed, the difference λ(u − u′ ) + µ(v − v′ )
lies in W since u − u′ and v − v′ do. Thus linear combinations in
V/W are well-defined.
Example 8a. Projecting 3D images to a 2-dimensional screen is
described in geometry by the canonical projection π from the 3D
space V to the plane V/W of the screen along the line W of
sight (Figure 27).
[Figure 27]
Example 8b. The direct sum V ⊕ W contains V and W as subspaces consisting of the pairs (v, 0) and (0, w) respectively. The quotient of V⊕W by W is canonically identified with V, because each pair
(v, w) is equivalent modulo W to (v, 0). Likewise, (V ⊕ W) /V = W.
Example 8c. Let V = R[x] be the space of polynomials with real
coefficients, and W the subspace of polynomials divisible by $x^2 + 1$.
Then the quotient space V/W can be identified with the plane C
of complex numbers, and the projection π : R[x] → C with the map
$P \mapsto P(i)$ of evaluating a polynomial P at x = i. Indeed, polynomials
P and P′ are equivalent modulo W if and only if P − P′ is divisible
by $x^2 + 1$, in which case P(i) = P′(i). Vice versa, if P(i) = P′(i),
then P(−i) = P′(−i) (since the polynomials are real), and hence
P − P′ is divisible by $(x - i)(x + i) = x^2 + 1$.
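The identification of Example 8c can be tried out on a sample polynomial. In the Python sketch below, a polynomial is a coefficient list `[a0, a1, ...]`; the helper names and the polynomial `P` are illustrative assumptions. Reducing modulo $x^2 + 1$ (by replacing $x^2$ with −1) yields a remainder a + bx, and evaluating at x = i yields the complex number a + bi.

```python
def reduce_mod_x2_plus_1(p):
    """Remainder a + b*x of the polynomial p modulo x^2 + 1, returned as (a, b).

    Uses x^2 = -1, hence x^(2m) = (-1)^m and x^(2m+1) = (-1)^m * x.
    """
    a = b = 0
    for k, coeff in enumerate(p):
        s = -1 if (k // 2) % 2 else 1
        if k % 2 == 0:
            a += s * coeff
        else:
            b += s * coeff
    return a, b

def evaluate(p, z):
    """Value of the polynomial p at z, computed by Horner's rule."""
    result = 0
    for coeff in reversed(p):
        result = result * z + coeff
    return result

P = [1, 2, 3, 4, 5]                    # 1 + 2x + 3x^2 + 4x^3 + 5x^4
a, b = reduce_mod_x2_plus_1(P)         # representative a + bx of the class of P
value = evaluate(P, 1j)                # P(i) as a Python complex number
```

Adding any multiple of $x^2 + 1$ to `P` changes neither the pair `(a, b)` nor `value`, which is the statement that the map is well defined on V/W.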
Example 8d. For every linear map A : V → V ′ , there is a canonical isomorphism à : V/ Ker A → A(V) between the quotient by
the kernel of A, and its range. Namely, Au = Av if and only if
u − v ∈ Ker A, i.e. whenever u is equivalent to v modulo the kernel. Thus, one can think of every linear map as the projection of the
source space onto the range along the null space. This is a manifestation of a general homomorphism theorem in algebra, which in
the context of vector spaces can be formally stated this way:
Theorem. Every linear map A : V → V′ is uniquely represented as the composition A = iÃπ of the canonical projection π : V → V/Ker A with the isomorphism Ã : V/Ker A → A(V) followed by the inclusion i : A(V) ⊂ V′:
$$\begin{array}{ccc} V & \xrightarrow{\;A\;} & V' \\ \pi \downarrow & & \cup \; i \\ V/\operatorname{Ker} A & \xrightarrow[\tilde{A}]{\;\cong\;} & A(V) \end{array}$$
Quaternions
The assumption that scalars commute under multiplication is reasonable because they usually do, but it is not strictly necessary. Here is
an important example: vector spaces over a skew-field, namely
the skew-field H of quaternions.
By definition, $\mathbb{H} = \mathbb{C}^2 = \mathbb{R}^4$ consists of quaternions
q = z + wj = (a + bi) + (c + di)j = a + bi + cj + dk,
where z = a + bi and w = c + di are complex numbers, and k = ij.
The multiplication is determined by the requirements: $i^2 = j^2 = -1$,
ij = −ji, associativity, and bilinearity over R. Namely:
$$q'q = (z' + w'j)(z + wj) = z'z + w'jwj + z'wj + w'jz = (z'z - w'\bar{w}) + (z'w + w'\bar{z})j,$$
where we use that j(x + yi) = (x − yi)j for all real x, y. Equivalently,
in purely real notation, 1, i, j, k form the standard basis of $\mathbb{H} = \mathbb{R}^4$,
and the product is specified by the multiplication table:
$$i^2 = j^2 = k^2 = -1, \quad ij = k = -ji, \quad jk = i = -kj, \quad ki = j = -ik.$$
The quaternion
$$q^* = \bar{z} - wj = a - bi - cj - dk$$
is called adjoint or conjugate to q. We have:
$$q^*q = (\bar{z} - wj)(z + wj) = \bar{z}z + w\bar{w} + (\bar{z}w - w\bar{z})j = |z|^2 + |w|^2 = a^2 + b^2 + c^2 + d^2 = |q|^2.$$
By |q| we denote here the norm or absolute value of the quaternion
q. The norm coincides with the Euclidean length $\sqrt{a^2 + b^2 + c^2 + d^2}$
of the vector $q \in \mathbb{R}^4$. Note that $(q^*)^* = q$, and hence $qq^* = |q^*|^2 = |q|^2 = q^*q$. It follows that if q ≠ 0, then
$$q^{-1} = \frac{q^*}{|q|^2} = |q|^{-2}(a - bi - cj - dk)$$
is the quaternion inverse to q, i.e. $q^{-1}q = 1 = qq^{-1}$. We conclude
that although H is not a field, since multiplication is not commutative, it shares with fields the property that all non-zero elements of
H are invertible.
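These formulas are easy to test by direct computation. The following Python sketch represents a quaternion a + bi + cj + dk as the tuple `(a, b, c, d)`; the function names are hypothetical, the product is expanded from the multiplication table above, and exact fractions are assumed for the inverse.

```python
from fractions import Fraction

def qmul(p, q):
    """Product of quaternions given as tuples (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def conj(q):
    """The conjugate (adjoint) quaternion q*."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def norm_sq(q):
    """The squared norm |q|^2 = a^2 + b^2 + c^2 + d^2."""
    return sum(x * x for x in q)

def inverse(q):
    """The inverse q* / |q|^2 of a non-zero quaternion, in exact fractions."""
    n = Fraction(norm_sq(q))
    return tuple(x / n for x in conj(q))

I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
q = (1, 2, 3, 4)                       # an arbitrary sample quaternion
```

With these definitions `qmul(I, J)` is `K` while `qmul(J, I)` is its negative, so the product is visibly non-commutative, and `qmul(conj(q), q)` has the squared norm as its only non-zero component.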
In the definition of an H-vector space, one should be cautious
about only the axiom (vi): associativity of multiplication by scalars.
Namely, there are two versions of this axiom:
(qq′)u = q(q′u) and v(qq′) = (vq)q′.
Using the first or the second, one obtains the category of left or right
quaternionic vector spaces. The actual difference is not in the side of
the vector on which the scalars are written (this difference is purely
typographical) but in the order of the factors: in the left case v is
first multiplied by q′, and in the right by q.
Some people prefer to deal with left, others with right quaternionic
spaces, but in fact it is a matter of religion and hence does not
matter. Namely, a left vector space can be converted into a right
one (and vice versa) by defining the new multiplication by q as the
old multiplication by $q^*$. The trick is that conjugation $q \mapsto q^*$ is
not exactly an automorphism of H because it changes the order of
factors: $(q_1 q_2)^* = q_2^* q_1^*$. We prefer to work with right quaternionic
vector spaces, and the following example explains why.
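Define the coordinate quaternionic space $\mathbb{H}^n$ as the set of all
n-columns of quaternions, added component-wise, and multiplied by
scalars q ∈ H on the right:
$$\begin{bmatrix} q_1 \\ \vdots \\ q_n \end{bmatrix} + \begin{bmatrix} q'_1 \\ \vdots \\ q'_n \end{bmatrix} = \begin{bmatrix} q_1 + q'_1 \\ \vdots \\ q_n + q'_n \end{bmatrix}, \qquad \begin{bmatrix} q_1 \\ \vdots \\ q_n \end{bmatrix} q = \begin{bmatrix} q_1 q \\ \vdots \\ q_n q \end{bmatrix}.$$
Clearly, $\mathbb{H}^n$ is a right quaternionic vector space.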
Let A be an m × n matrix with entries $a_{ij} \in \mathbb{H}$, and $x \in \mathbb{H}^n$.
Then the usual matrix product y = Ax yields a vector $y \in \mathbb{H}^m$:
$$y_i = a_{i1}x_1 + \cdots + a_{in}x_n, \quad i = 1, \dots, m.$$
The matrix multiplication $x \mapsto y = Ax$ defines an H-linear
map $A : \mathbb{H}^n \to \mathbb{H}^m$. Namely, A is not only additive, but it also commutes with multiplication by quaternionic scalars: (Ax)q = A(xq).
The point is that the matrix entries $a_{ij}$ are applied to the components $x_j$ of the vectors on the left, while the scalar q is applied on
the right, and hence the order of these operations is irrelevant. For
left vector spaces, linear maps y = Ax would correspond to matrix
multiplication formulas $y_i = x_1 a_{i1} + \cdots + x_n a_{in}$ that look a bit weird.
EXERCISES
147. Give an example of a “vector space” that satisfies all axioms except the last one: 1u = u for all u.
148. Prove that the axiom: 0u = 0 for all u, in the definition of vector spaces is redundant, i.e. can be derived from remaining axioms.
149. Derive from axioms of vector spaces that (−1)u = −u for all u.
150.⋆ Prove that every non-zero element of $\mathbb{Z}_p$ is invertible.
151. Verify that $K^S$ and $V^S$ are vector spaces.
152. How many vectors are there in the $\mathbb{Z}_p$-vector space $\mathbb{Z}_p^n$? In $\operatorname{Hom}(\mathbb{Z}_p^n, \mathbb{Z}_p^m)$?
153. Show that in K[x], polynomials of degree n do not form a subspace, but polynomials of degree ≤ n do.
154. Prove that intersection of subspaces is a subspace.
155. How many vectors are there in the $\mathbb{Z}_p$-vector space of strictly upper triangular n × n-matrices?
156. Show that the map $f \mapsto \int_a^b f(x)\,dx$ defined by integration of (say) polynomial functions is a linear form R[x] → R.
157. Find the kernel and the range of the differentiation map $D = \frac{d}{dx} : K[x] \to K[x]$, when $K = \mathbb{Z}_p$.
158. For $V = K^n$, verify that the map $E : V \to V^{**}$ defined in Example 5b is an isomorphism.
159. Let W ⊂ V be any subset of V. Define $W^\perp \subset V^*$ as the set of all those linear functions which vanish on W. Prove that $W^\perp$ is a subspace of $V^*$. (It is called the annihilator of W.)
160. Let W ⊂ V be a subspace. Establish a canonical isomorphism between the dual space $(V/W)^*$ and the annihilator $W^\perp \subset V^*$.
161. Let B be a bilinear form on V × V, i.e. a function $(v, w) \mapsto B(v, w)$ of pairs of vectors linear in each of them. Prove that the left kernel and right kernel of B, defined as LKer B = {v | B(v, x) = 0 for all x ∈ V} and RKer B = {w | B(x, w) = 0 for all x ∈ V}, are subspaces of V, which coincide when B is symmetric or anti-symmetric.
162. Establish canonical isomorphisms between the spaces $\operatorname{Hom}(V, W^*)$, $\operatorname{Hom}(W, V^*)$, and the space B(V, W) of all bilinear forms on V × W.
163. Describe all affine subspaces in $R^3$.
164. Prove that the intersection of two affine subspaces, parallel to given linear ones, if non-empty, is an affine subspace parallel to the intersection of the given linear subspaces.
165. Prove that for a linear map A : V → W, Ker A = (Graph A) ∩ V.
166. Show that $(V \oplus W)^* = V^* \oplus W^*$.
167. Composing a linear map A : V → W with a linear form $f \in W^*$ we obtain a linear form $A^t f \in V^*$. Prove that this defines a linear map $A^t : W^* \to V^*$. (The map $A^t$ is called dual or adjoint to A.) Show that $(AB)^t = B^t A^t$.
168. Given the kernel and the range of a linear map A : V → W, find the kernel and the range of the dual map $A^t$.
169. Show that numbers of the form $a + b\sqrt{2}$, where a, b ∈ Q, form a subfield in R, and a Q-vector space.
170. For q = i cos θ + j sin θ, compute $q^2$, $q^{-1}$.
171. Prove multiplicativity of the norm of quaternions: $|q_1 q_2| = |q_1|\,|q_2|$.
172. Prove that in H, $(q_1 q_2)^* = q_2^* q_1^*$.
173. Prove that if $f : \mathbb{H}^n \to \mathbb{H}$ is a quaternionic linear function, then qf, where q ∈ H, is also a quaternionic linear function, but fq generally speaking not.
174. On the set $\operatorname{Hom}(\mathbb{H}^n, \mathbb{H}^m)$ of all H-linear maps $A : \mathbb{H}^n \to \mathbb{H}^m$, define the structure of a right H-vector space.
175. Let $x + yj \in \mathbb{C}^2$ be complex coordinates on H. Show that multiplication by q = z + wj ∈ H on the left is, generally speaking, not C-linear, but on the right is, and find its 2 × 2-matrix.
176.⋆ Find all quaternionic square roots of −1, i.e. solve the equation $q^2 = -1$ for q ∈ H.
Chapter 3
Simple Problems
1
Dimension and Rank
Bases
Let V be a K-vector space. A subset $V$ ⊂ V (finite or infinite) is
called a basis of V if every vector of V can be uniquely written as a
(finite!) linear combination of vectors from $V$. (Here and below, the italic $V$ stands for a set of vectors, and the upright V for the whole space.)
Example 1. Monomials $x^k$, k = 0, 1, 2, . . . , form a basis in the
space K[x] since every polynomial is uniquely written as a linear
combination of monomials.
Example 2. In $K^n$, every vector $(x_1, \dots, x_n)^t$ is uniquely written
as the linear combination $x_1 e_1 + \cdots + x_n e_n$ of unit coordinate vectors
$e_1 = (1, 0, \dots, 0)^t$, . . . , $e_n = (0, \dots, 0, 1)^t$. Thus, vectors $e_1, \dots, e_n$
form a basis. It is called the standard basis of $K^n$.
The notion of basis has two aspects which can be considered
separately.
Let $V$ ⊂ V be any set of vectors. Linear combinations $\lambda_1 v_1 + \cdots + \lambda_k v_k$, where the vectors $v_i$ are taken from the subset $V$, and $\lambda_i$ are
arbitrary scalars, form a subspace in V. (Indeed, sums and scalar
multiples of linear combinations of vectors from $V$ are also linear
combinations of vectors from $V$.) This subspace is often denoted as
Span $V$. One says that the set $V$ spans the subspace Span $V$, or that
Span $V$ is spanned by $V$.
A set $V$ of vectors is called linearly independent, if no vector
from Span $V$ can be represented as a linear combination of vectors
from $V$ in more than one way. To familiarize ourselves with this
notion, let us give several reformulations of the definition. Here is
one: no two distinct linear combinations of vectors from $V$ are equal
to each other. Yet another one: if two linear combinations of vectors
from $V$ are equal: $\alpha_1 v_1 + \cdots + \alpha_k v_k = \beta_1 v_1 + \cdots + \beta_k v_k$, then their
coefficients must be the same: $\alpha_1 = \beta_1, \dots, \alpha_k = \beta_k$. Subtracting
one linear combination from the other, we arrive at one more reformulation: if $\gamma_1 v_1 + \cdots + \gamma_k v_k = 0$ for some vectors $v_i$ from $V$, then
necessarily $\gamma_1 = \cdots = \gamma_k = 0$. In other words, $V$ is linearly independent, if the vector 0 can be represented as a linear combination
of vectors from $V$ only in the trivial fashion: $0 = 0v_1 + \cdots + 0v_k$.
Equivalently, every nontrivial linear combination of vectors from
$V$ is not equal to zero: $\gamma_1 v_1 + \cdots + \gamma_k v_k \neq 0$ whenever at least one
of the $\gamma_i \neq 0$.
Of course, a set $V$ ⊂ V is called linearly dependent if it is not
linearly independent. Yet, it is useful to have an affirmative reformulation: $V$ is linearly dependent if and only if some nontrivial linear
combination of vectors from $V$ vanishes, i.e. $\gamma_1 v_1 + \cdots + \gamma_k v_k = 0$,
where at least one of the coefficients (say, $\gamma_k$) is non-zero. Dividing
by this coefficient and moving all other terms to the other side of
the equality, we obtain one more reformulation: a set $V$ is linearly
dependent if one of its vectors can be represented as a linear combination of the others: $v_k = -\gamma_k^{-1}\gamma_1 v_1 - \cdots - \gamma_k^{-1}\gamma_{k-1} v_{k-1}$.$^1$ Obviously,
every set containing the vector 0 is linearly dependent; every set containing two proportional vectors is linearly dependent; adding new
vectors to a linearly dependent set leaves it linearly dependent.
Thus, a basis of V is a linearly independent set of vectors that
spans the whole space.
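For finitely many vectors with rational coordinates, linear independence can be tested mechanically. The Python sketch below uses Gaussian elimination, a technique swapped in here for illustration rather than taken from this section: the vectors are independent exactly when elimination leaves no zero row. The function names are hypothetical.

```python
from fractions import Fraction

def count_pivots(vectors):
    """Number of non-zero rows left after Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Find a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """True when only the trivial linear combination of the vectors vanishes."""
    return count_pivots(vectors) == len(vectors)

standard = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]     # the standard basis of Q^3
dependent = [[1, 1, 0], [0, 1, 1], [1, 2, 1]]    # third vector = first + second
```

Each elimination step replaces a vector by a linear combination of the given ones, so a zero row exhibits a nontrivial vanishing combination, in line with the affirmative reformulation of dependence above.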
Dimension
In this course, we will be primarily concerned with finite dimensional vector spaces, i.e. spaces which can be spanned by finitely
many vectors. If such a set of vectors is linearly dependent, then
one of its vectors is a linear combination of the others. Removing
this vector from the set, we obtain a smaller set that still spans V.
Continuing this way, we arrive at a finite linearly independent set
that spans V. Thus, a finite dimensional vector space has a basis,
consisting of finitely many elements. The number of elements in a
basis does not depend (as we will see shortly) on the choice of the
basis. This number is called the dimension of the vector space V
and is denoted dim V.
¹ It is essential here that division by all non-zero scalars is well-defined.
1. Dimension and Rank
Let v1 , . . . , vn be a basis of V. Then every vector x ∈ V is
uniquely written as x = x1 v1 + · · · + xn vn . We call (x1 , . . . , xn )
coordinates of the vector x with respect to the basis v1 , . . . , vn .
For y = y1 v1 + · · · + yn vn ∈ V and λ ∈ K, we have:
x + y = (x1 + y1 )v1 + · · · + (xn + yn )vn ,
λx = (λx1 )v1 + · · · + (λxn )vn .
This means that operations of addition of vectors and multiplication
by scalars are performed coordinate-wise. In other words, the map:
$$K^n \to V : \quad (x_1, \dots, x_n)^t \mapsto x_1 v_1 + \cdots + x_n v_n$$
defines an isomorphism of the coordinate space Kn onto the
vector space V with a basis {v1 , . . . , vn }.
Lemma. A set of n + 1 vectors in Kn is linearly dependent.
Proof. Any two vectors in K1 are proportional and therefore
linearly dependent. We intend to prove the lemma by deducing from
this that any 3 vectors in K2 are linearly dependent, then deducing
from this that any 4 vectors in K3 are linearly dependent, and so on.
Thus we only need to prove that if every set of n vectors in Kn−1 is
linearly dependent then every set of n + 1 vectors in Kn is linearly
dependent too.²
To this end, consider n + 1 column vectors v1 , ..., vn+1 of size
n each. If the last entry in each column is 0, then v1 , ..., vn+1 are
effectively (n − 1)-columns. Hence some nontrivial linear combination
of v1 , ..., vn is equal to 0 (by the induction hypothesis), and thus
the whole set is linearly dependent. Now consider the case when
at least one column has the last entry non-zero. Reordering the
vectors we may assume that it is the column vn+1 . Subtracting the
column vn+1 with suitable coefficients α1 , ..., αn from v1 , ..., vn we
can form n new columns u1 = v1 − α1 vn+1 , ..., un = vn − αn vn+1 so
that all of them have the last entries equal to zero. Thus u1 , ..., un
are effectively (n − 1)-vectors and are therefore linearly dependent:
β1 u1 + ... + βn un = 0 for some β1 , ..., βn not all equal to 0. Thus
β1 v1 + ... + βn vn − (α1 β1 + ... + αn βn )vn+1 = 0.
Here at least one of βi ≠ 0, and hence v1 , ..., vn+1 are linearly dependent.
² This way of reasoning is called mathematical induction. Put abstractly,
it establishes a sequence Pn of propositions in two stages called respectively the
base and step of induction: (i) P1 is true; (ii) for all n = 2, 3, 4, . . . , if Pn−1 is
true (the induction hypothesis) then Pn is true.
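The induction in the proof of the Lemma is constructive, and it can be turned into a recursive procedure. The following is only a sketch under our own assumptions: the function name find_dependency is hypothetical, scalars are taken to be rational numbers (Python's Fraction), and the base case is the statement about two vectors in K1 .

```python
from fractions import Fraction

def find_dependency(vectors):
    """Given n + 1 vectors of length n (n >= 1), return coefficients, not all
    zero, of a vanishing linear combination, following the induction step."""
    vs = [[Fraction(x) for x in v] for v in vectors]
    n = len(vs) - 1
    if n < 1 or any(len(v) != n for v in vs):
        raise ValueError("expected n + 1 vectors of length n, with n >= 1")
    if n == 1:
        # Base: two vectors a, b in K^1 are proportional: b*a - a*b = 0.
        a, b = vs[0][0], vs[1][0]
        return [b, -a] if (a, b) != (0, 0) else [Fraction(1), Fraction(0)]
    # Look for a vector with non-zero last entry (it plays the role of v_{n+1}).
    p = next((i for i, v in enumerate(vs) if v[-1] != 0), None)
    if p is None:
        # All last entries vanish: the first n vectors are effectively (n-1)-columns.
        return find_dependency([v[:-1] for v in vs[:n]]) + [Fraction(0)]
    others = [i for i in range(n + 1) if i != p]
    alpha = {i: vs[i][-1] / vs[p][-1] for i in others}
    # u_i = v_i - alpha_i v_p has last entry zero; drop that entry.
    us = [[x - alpha[i] * y for x, y in zip(vs[i], vs[p])][:-1] for i in others]
    beta = dict(zip(others, find_dependency(us)))
    coeffs = [Fraction(0)] * (n + 1)
    for i in others:
        coeffs[i] = beta[i]
    coeffs[p] = -sum(alpha[i] * beta[i] for i in others)
    return coeffs

vectors = [(1, 2), (3, 4), (5, 6)]
coeffs = find_dependency(vectors)   # here: -v1 + 2 v2 - v3 = 0
```

The returned coefficients are exactly β1 , ..., βn and −(α1 β1 + · · · + αn βn ) from the proof, placed back in the original order of the vectors.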
Chapter 3. SIMPLE PROBLEMS
Corollaries. (1) Any set of m > n vectors in Kn is linearly dependent.
(2) Kn and Km are not isomorphic unless n = m.
(3) Every finite dimensional vector space is isomorphic
to exactly one of the spaces Kn .
(4) In a finite dimensional vector space, all bases have
the same number of elements. In particular, dimension is
well-defined.
(5) Two finite dimensional vector spaces are isomorphic
if and only if their dimensions are equal.
Indeed, (1) is obvious because adding new vectors to a linearly
dependent set leaves it linearly dependent. Since the standard basis
in Km consists of m linearly independent vectors, Km cannot be
isomorphic to Kn if m > n. This implies (2) and hence (3), because
two spaces isomorphic to a third one are isomorphic to each other.
Now (4) follows, since the choice of a basis of n elements establishes
an isomorphism of the space with Kn . Rephrasing (3) in terms of
dimensions yields (5).
Example 3. Let V be a vector space of dimension n, and let
v1 , . . . , vn be a basis of it. Then every vector x ∈ V can be uniquely
written as x = x1 v1 + · · · + xn vn . Here x1 , . . . , xn can be considered
as linear functions from V to K. Namely, the function xi takes the
value 1 on the vector vi and the value 0 on all vj with j ≠ i. Every
linear function f : V → K takes on a vector x the value f (x) =
x1 f (v1 ) + · · · + xn f (vn ). Therefore f is the linear combination of
x1 , . . . , xn with the coefficients f (v1 ), . . . , f (vn ), i.e. x1 , . . . , xn span
the dual space V ∗ . In fact, they are linearly independent, and thus
form a basis of V ∗ . Indeed, if a linear combination γ1 x1 + · · · + γn xn
coincides with the identically zero function, then its values γi on the
vectors vi must be all zeroes. We conclude that the dual space V ∗
has the same dimension n as V and is isomorphic to it. The
basis x1 , . . . , xn is called the dual basis of V ∗ with respect to the
basis v1 , . . . , vn of V.
Remark. Corollaries 3, 5, and Example 3 suggest that in a sense
there is “only one” K-vector space in each dimension n = 0, 1, 2, . . . ,
namely Kn . The role of this fact, which is literally true if the uniqueness is understood up to isomorphism, should not be overestimated.
An isomorphism Kn → V is determined by the choice of a basis in
V, and is therefore not unique. For example, the space of polynomials of degree < n in one indeterminate x has dimension n and is
isomorphic to Kn . However, different isomorphisms may be useful
for different purposes. In elementary algebra one would use the basis
$1, x, x^2, \dots, x^{n-1}$. In Calculus, the basis $1, x, x^2/2, \dots, x^{n-1}/(n-1)!$ may be
more common. In the theory of interpolation the basis of Lagrange
polynomials is used:
$$L_i(x) = \frac{\prod_{j \neq i} (x - x_j)}{\prod_{j \neq i} (x_i - x_j)}.$$
Here x1 , . . . , xn are given distinct points on the number line, and
Li (xj ) = 0 for j ≠ i and Li (xi ) = 1. The theory of orthogonal
polynomials leads to many other important bases, e.g. those formed
by Chebyshev polynomials³ Tk , or Hermite polynomials Hk :
$$T_k(x) = \cos\left(k \cos^{-1}(x)\right), \qquad H_k(x) = e^{x^2} \frac{d^k}{dx^k} e^{-x^2}.$$
There is no preferred basis in an n-dimensional vector space V (and
hence no preferred isomorphism between V and Kn ).
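The defining property of the Lagrange polynomials, Li (xj ) = 0 for j ≠ i and Li (xi ) = 1, is easy to check numerically. The sketch below is ours: the helper name lagrange_basis and the sample points 0, 1, 3 are assumptions chosen for illustration, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def lagrange_basis(nodes):
    """Return the functions L_1, ..., L_n for the given distinct points."""
    nodes = [Fraction(t) for t in nodes]

    def make(i):
        def L(x):
            value = Fraction(1)
            for j, xj in enumerate(nodes):
                if j != i:
                    # one factor (x - x_j) / (x_i - x_j) of the quotient of products
                    value *= (Fraction(x) - xj) / (nodes[i] - xj)
            return value
        return L

    return [make(i) for i in range(len(nodes))]

nodes = [0, 1, 3]                      # sample distinct points (our choice)
basis = lagrange_basis(nodes)
# table[i][j] = L_i(x_j); it should be the identity matrix
table = [[L(t) for t in nodes] for L in basis]
```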
Figure 28
Figure 29
The lack of the preferred isomorphism becomes really important
when continuous families of vector spaces get involved. For instance,
consider on the plane R2 , all subspaces of dimension 1 (Figure 28).
When subspaces rotate, one can pick a basis vector in each of them,
which would vary continuously, but when the angle of rotation approaches π, the direction of the vector disagrees with the initial direction of the same line. In fact it is impossible to choose bases in
all the lines in a continuous fashion. The reason is shown on Figure
29: The surface formed by the continuous family of 1-dimensional
subspaces in R2 has the topology of a Möbius band (rather than
a cylinder). The Möbius band is a first example of nontrivial vector bundles. Vector bundles are studied in Homotopic Topology. It
turns out that among all k-dimensional vector bundles (i.e. continuous families of k-dimensional vector spaces) the most complicated are
the bundles formed by all k-dimensional subspaces in the coordinate
space of dimension n ≫ k.
³ After Pafnuty Chebyshev (1821–1894).
Corollary 6. A finite dimensional space V is canonically
isomorphic to its second dual V ∗∗ .
Here “canonically” means that there is a preferred isomorphism.
Namely, the isomorphism is established by the map E : V → V ∗∗ from
Example 5b of Chapter 2, Section 3. Recall that to a vector v ∈ V,
it assigns the linear function Ev : V ∗ → K, defined by evaluation of
linear functions on the vector v. The kernel of this map is trivial (e.g.
because one can point out a linear function that takes a non-zero value
on a given non-zero vector). The range E(V) must be a subspace
in V ∗∗ isomorphic to V. But dim V ∗∗ = dim V ∗ = dim V. Thus the
range must be the whole space V ∗∗ .
Rank
In the previous subsection, we constructed a basis of a finite dimensional vector space by starting from a finite set that spans it and
removing unnecessary vectors. Alternatively, one can construct a
basis by starting from any linearly independent set and adding, one
by one, new vectors linearly independent from the previous ones.
Since the number of such vectors cannot exceed the dimension of
the space, the process will stop when the vectors span the whole
space and form therefore a basis. Thus we have proved that in a
finite dimensional vector space, every linearly independent set
of vectors can be completed to a basis.⁴ We are going to use
this in the proof of the Rank Theorem.
The rank of a linear map A : V → W is defined as the dimension
of its range: rk A := dim A(V).
Example. Consider the map Er : Kn → Km given by the block
matrix $E_r = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$ of size m × n, where the left upper block is
the identity matrix Ir of size r×r, and the other three blocks are zero
matrices of appropriate sizes. In standard coordinates (x1 , . . . , xn )
in Kn and (y1 , . . . , ym ) in Km , the map Er is given by the formulas
y1 = x1 , . . . , yr = xr , yr+1 = 0, . . . , ym = 0. The range of Er is
the subspace of dimension r in Km given by the m − r equations
yr+1 = · · · = ym = 0. Thus rk Er = r. The kernel of Er is the subspace
of dimension n − r in Kn given by r equations x1 = · · · = xr = 0.
The map can be viewed geometrically as the projection along the
kernel onto the range.
⁴ Using the so called transfinite induction one can prove the same for infinite
dimensional vector spaces as well.
The Rank Theorem. A linear map A : V → W of rank r
between two vector spaces of dimensions n and m is given
by the matrix Er in suitable bases of the spaces V and W.
Proof. Let f1 , . . . , fr be any basis in the range A(V) ⊂ W. Complete it to a basis of W by choosing vectors fr+1 , . . . , fm as explained
above. Pick vectors e1 , . . . , er ∈ V such that Aei = fi . (They exist
because fi lie in the range of A.) Take vectors er+1 , er+2 , . . . to form
a basis in the kernel of A. We claim that e1 , . . . , er , er+1 . . . form a
basis in V (and in particular the total number of these vectors is equal
to n). The theorem follows from this, since Aei = fi for i = 1, . . . r,
and Aei = 0 for i = r + 1, . . . , n, and hence the matrix of A in these
bases coincides with Er .
Figure 30
To justify the claim, we will show that every vector x ∈ V is
uniquely written as a linear combination of ei . Indeed, we have:
Ax = α1 f1 + · · · + αr fr since Ax lies in the range of A. Then
A(x − α1 e1 − · · · − αr er ) = 0, and hence x − α1 e1 − · · · − αr er lies in
the kernel of A. Therefore x = α1 e1 + · · · + αr er + αr+1 er+1 + · · · ,
i.e. the vectors ei span V. On the other hand, if in the last equality we have x = 0, then Ax = α1 f1 + · · · + αr fr = 0 and hence
α1 = · · · = αr = 0, since fi are linearly independent in W. Finally,
0 = αr+1 er+1 + αr+2 er+2 + . . . implies that αr+1 = αr+2 = · · · = 0
since er+1 , er+2 , . . . are linearly independent in V.
Let A be an m×n matrix. It defines a linear map Kn → Km . The
rank of this map is the dimension of the subspace in Km spanned by
columns of A. It is called the rank of the matrix A. Applying the
Rank Theorem to this linear map, we obtain the following result.
Corollary. For every m × n-matrix A of rank r there
exist invertible matrices D and C of sizes m × m and n × n
respectively such that $D^{-1} A C = E_r$.
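The proof of the Rank Theorem can be followed step by step on a computer. The sketch below is an illustration under our own assumptions: the names rref, matmul and rank_bases are hypothetical, entries are rational, the vectors f1 , ..., fr are taken to be the columns of A containing the leading entries of its reduced form, and the basis of the range is completed to a basis of Km by standard coordinate vectors. Instead of inverting D, it checks the equivalent equality AC = DEr .

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan reduction; returns (reduced matrix, pivot column indices)."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    for l in range(len(M[0])):
        s = len(pivots)
        i = next((k for k in range(s, len(M)) if M[k][l] != 0), None)
        if i is None:
            continue
        M[s], M[i] = M[i], M[s]
        lead = M[s][l]
        M[s] = [x / lead for x in M[s]]
        for k in range(len(M)):
            if k != s:
                factor = M[k][l]
                M[k] = [x - factor * y for x, y in zip(M[k], M[s])]
        pivots.append(l)
    return M, pivots

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def unit(size, j):
    return [Fraction(int(i == j)) for i in range(size)]

def rank_bases(A):
    """Return (D, C, r) with A C = D E_r, columns of C and D being the bases."""
    m, n = len(A), len(A[0])
    R, pivots = rref(A)
    # e_1..e_r are coordinate vectors with A e_i = f_i (the pivot columns of A).
    e = [unit(n, j) for j in pivots]
    f = [[Fraction(A[i][j]) for i in range(m)] for j in pivots]
    # e_{r+1}, ...: a basis of the kernel, one vector per non-pivot column.
    for free in (j for j in range(n) if j not in pivots):
        v = unit(n, free)
        for i, j in enumerate(pivots):
            v[j] = -R[i][free]
        e.append(v)
    # Complete f_1..f_r to a basis of K^m by independent coordinate vectors.
    for j in range(m):
        candidate = f + [unit(m, j)]
        if len(rref(candidate)[1]) == len(candidate):
            f = candidate
    C = [list(col) for col in zip(*e)]
    D = [list(col) for col in zip(*f)]
    return D, C, len(pivots)

A = [[1, 2, 3], [2, 4, 6]]            # a sample matrix of rank 1 (our choice)
D, C, r = rank_bases(A)
E_r = [[Fraction(int(i == j and i < r)) for j in range(3)] for i in range(2)]
assert matmul(A, C) == matmul(D, E_r)
```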
The Rank Theorem has the following reformulation. Let A : V →
W and A′ : V′ → W′ be two linear maps. They are called equivalent
if there exist isomorphisms C : V′ → V and D : W′ → W such that
DA′ = AC. One expresses the last equality by saying that the
following square is commutative:
$$\begin{array}{ccc} V' & \xrightarrow{\ A'\ } & W' \\ {\scriptstyle C} \downarrow \cong & & \cong \downarrow {\scriptstyle D} \\ V & \xrightarrow{\ A\ } & W \end{array}$$
The Rank Theorem′. Linear maps between finite dimensional spaces are equivalent if and only if they have the
same rank.
Indeed, when $A' = D^{-1} A C$, the ranges of A and A′ must have
the same dimension (since C and D are isomorphisms). Conversely,
when rk A = r = rk A′ , each A and A′ is equivalent to Er : Kn → Km
by the Rank Theorem.
Below we discuss further corollaries and applications of the Rank
Theorem.
Adjoint Maps
Given a linear map A : V → W, one defines its adjoint map,
acting between dual spaces in the opposite direction: $A^t : W^* \to V^*$.
Namely, to a linear function a : W → K, the adjoint map $A^t$ assigns the
composition $V \xrightarrow{A} W \xrightarrow{a} K$, i.e.:
$(A^t a)(x) = a(Ax)$ for all x ∈ V and a ∈ W∗ .
In coordinates, suppose that A : Kn → Km is given by m linear
functions in n variables:
y1 = a11 x1 + · · · + a1n xn
···
ym = am1 x1 + · · · + amn xn .
These equalities show that the elements yi of the dual basis in (Km )∗
are mapped by At to the linear combinations ai1 x1 + · · · + ain xn of
the elements xj of the dual basis in (Kn )∗ . Therefore columns of
the matrix representing the map At in these bases are rows of A.
Thus, matrices of adjoint maps with respect to dual bases
are transposed to each other.
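In coordinates, the defining identity of the adjoint map can be verified directly. In the following sketch (the helper names apply, transpose and pair, as well as the sample matrix and vectors, are our own), a linear function a on K2 is represented by its coefficients in the dual basis y1 , y2 .

```python
def apply(matrix, vector):
    """Matrix times column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def pair(covector, vector):
    """Value of a linear function (given by its coefficients) on a vector."""
    return sum(c * v for c, v in zip(covector, vector))

A = [[1, 2, 0],
     [3, -1, 4]]                 # a sample map K^3 -> K^2
x = [2, 1, -1]                   # a vector in K^3
a = [5, -2]                      # the linear function 5 y1 - 2 y2 on K^2

lhs = pair(apply(transpose(A), a), x)   # (A^t a)(x), using the transposed matrix
rhs = pair(a, apply(A, x))              # a(Ax)
```

Both sides evaluate to the same number, as the identity (At a)(x) = a(Ax) requires.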
Corollary 1. Adjoint linear maps have the same rank.
Indeed, when a map A : V → W has the matrix Er in suitable
bases of V and W, the map $A^t$ has the matrix $E_r^t$ in respectively dual
bases of W∗ and V∗ . Thus $\operatorname{rk} A^t = \operatorname{rk} E_r^t = r$.
Remark. Here is a more geometric way to understand this fact.
According to the homomorphism theorem, the range of A : V → W
is a subspace in W canonically isomorphic to V/ Ker A. The range of
At is exactly the dual space (V/ Ker A)∗ considered as the subspace
in V ∗ which consists of all those linear functions on V that vanish on
the Ker A. Since dual spaces have the same dimension, we conclude
once again that rk A = rk At .
The range of the linear map A : Kn → Km is spanned by columns
of the matrix A. Therefore the rank of A is equal to the maximal
number of linearly independent columns of A.
Corollary 2. The maximal number of linearly independent rows of a matrix is equal to the maximal number of
linearly independent columns of it.
Ranks and Determinants
Corollary 3. The rank of a matrix is equal to the maximal
size k for which there exists a k × k-submatrix with a nonzero determinant.
Proof. Suppose that a given m × n-matrix has rank r. Then
there exists a set of r linearly independent columns of it. These
columns form an m × r-matrix of rank r. Therefore there exists a
set of r linearly independent rows of it. These rows form an r × r
matrix M of rank r. By the Corollary of the Rank Theorem, this matrix
can be written as the product $M = D E_r C^{-1}$ where D and C are
invertible r × r-matrices, and Er = Ir is the identity matrix of size
r. Since invertible matrices have non-zero determinants, we conclude
that det M = (det D)/(det C) ≠ 0.
On the other hand, let M ′ be a k × k-submatrix of the given
matrix, such that k > r. Then columns of M ′ are linearly dependent.
Therefore one of them can be represented as a linear combination of
the others. Since a determinant does not change when a linear combination of other
columns is subtracted from one of the columns, we
conclude that det M′ = 0.
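Corollary 3 can be tested by brute force on small matrices. The sketch below is ours and is meant for illustration only: the names det and rank_by_minors are hypothetical, the determinant is computed by expansion along the first row, and the search over all submatrices is exponentially slow, so it is of no practical use beyond tiny examples.

```python
from itertools import combinations

def det(M):
    """Determinant by expansion along the first row (tiny matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def rank_by_minors(A):
    """Largest k such that some k x k submatrix has non-zero determinant."""
    m, n = len(A), len(A[0])
    for k in range(min(m, n), 0, -1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if det([[A[i][j] for j in cols] for i in rows]) != 0:
                    return k
    return 0

A = [[1, 2, 3, -1],
     [2, 4, 5, 1],
     [3, 6, 8, 0]]               # a sample matrix whose third row is the sum of the first two
rank = rank_by_minors(A)         # all 3 x 3 minors vanish, some 2 x 2 minor does not
```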
Systems of Linear Equations — Theory
Let Ax = b be a system of m linear equations in n unknowns x with
the coefficient matrix A and the right hand side b. Let r = rk A.
Corollary 4. (1) The solution set to the homogeneous
system Ax = 0 is a linear subspace in Kn of dimension n − r
(namely, the kernel of A : Kn → Km ).
(2) The system Ax = b is consistent (i.e. has at least one
solution) only when b lies in a certain subspace of dimension
r (namely, in the range of A).
(3) When it does, the solution set to the system Ax = b
is an affine subspace in Kn of dimension n − r parallel to the
kernel of A.
Indeed, this is obviously true in the special case A = Er , and is
therefore true in general due to the Rank Theorem.
A subspace (affine or linear) of dimension n − r in a space of dimension n is said to have codimension r. Thus, rephrasing Corollary 4, we can say that the solution space to a system Ax = b is
either empty (when the column b does not lie in a subspace spanned
by columns of A), or is an affine subspace of codimension r parallel to the solution space of the corresponding homogeneous system
Ax = 0. One calls the rank r of the matrix A also the rank of the
system Ax = b.
Figure 31
Figure 32
Consider now the case when the number of equations is equal to
the number of unknowns.
Corollary 5. When det A ≠ 0, the linear system Ax = b
of n linear equations with n unknowns has a unique solution
for every b, and when det A = 0, solutions are non-unique
for some (but not all) b and do not exist for all others.
Indeed, when det A ≠ 0, the matrix A is invertible, and $x = A^{-1} b$
is the unique solution. When det A = 0, the rank r of the system is
smaller than n. Then the range of A has positive codimension, and
the kernel has positive dimension, both equal to n − r (Figure 31).
Dimension Counting
In 3-space, two distinct planes intersect in a line, and a line meets a
plane at a point. How do these geometric statements generalize to
higher dimensions?
It follows from the Rank Theorem, that dimensions of the range
and kernel of a linear map add up to the dimension of the domain
space. We will use this fact to answer the above question.
Corollary 6. If linear subspaces of dimensions k and l
span together a subspace of dimension n, then their intersection is a linear subspace of dimension k + l − n.
Proof. Let U, V ⊂ W be linear subspaces of dimensions k and
l in a vector space W, and T = U ∩ V be their intersection (Figure
32). Denote by U + V ⊂ W the subspace of dimension n spanned
by vectors of U and V. Define a linear map A : U ⊕ V → W, where
U ⊕ V = {(u, v) | u ∈ U, v ∈ V} is the direct sum, by A(u, v) = u − v.
The range of A coincides with U + V. The kernel of A consists of
all those pairs (u, v), where u ∈ U and v ∈ V, for which u = v.
Therefore Ker A = {(t, t) | t ∈ T } ≅ T . Thus dim(U + V) + dim T =
dim(U ⊕ V) = dim U + dim V. We conclude that dim T = k + l − n.
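The formula of Corollary 6 reduces the dimension of an intersection to three rank computations. Here is a minimal sketch under our own assumptions: subspaces are given by lists of spanning row vectors with rational entries, and the names rank and intersection_dim are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of the matrix with the given rows, by forward elimination."""
    M = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for l in range(len(M[0])):
        i = next((k for k in range(r, len(M)) if M[k][l] != 0), None)
        if i is None:
            continue
        M[r], M[i] = M[i], M[r]
        for k in range(r + 1, len(M)):
            factor = M[k][l] / M[r][l]
            M[k] = [x - factor * y for x, y in zip(M[k], M[r])]
        r += 1
    return r

def intersection_dim(U, V):
    """dim(U ∩ V) = k + l - n, where n = dim(U + V)."""
    k, l = rank(U), rank(V)
    n = rank(U + V)              # list concatenation: vectors spanning U + V
    return k + l - n

# Two distinct planes through the origin in 3-space meet in a line:
planes = intersection_dim([[1, 0, 0], [0, 1, 0]], [[0, 1, 0], [0, 0, 1]])
# A line not lying in a plane meets it at a single point (the origin):
line_plane = intersection_dim([[1, 1, 1]], [[1, 0, 0], [0, 1, 0]])
# Two planes in 4-space can meet at a single point:
in_4_space = intersection_dim([[1, 0, 0, 0], [0, 1, 0, 0]],
                              [[0, 0, 1, 0], [0, 0, 0, 1]])
```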
EXERCISES
177. Prove that columns of an invertible n × n-matrix form a basis in Kn ,
and vice versa: every basis in Kn is thus obtained.
178. Find the dimension of the subspace in K4 given by two equations:
x1 + x2 + x3 = 0 and x2 + x3 + x4 = 0.
179. Find the dimension of the subspace in $\mathbb{R}^{\mathbb{R}}$ spanned by functions
cos(x + θ1 ), . . . , cos(x + θn ), where θ1 , . . . , θn are given distinct angles.
180.⋆ In the space of polynomials of degree < n, express the basis $x^k$, k =
0, 1, . . . , n−1 of monomials in terms of the basis Li , i = 1, . . . , n, of Lagrange
polynomials.
181. In R[x], find coordinates of the Chebyshev polynomials T4 in the basis
of monomials $x^k$, k = 0, 1, 2, . . .
182. Professor Dumbel writes his office and home phone numbers as a 7 × 1-matrix O and 1 × 7-matrix H respectively. Help him compute rk(OH).
183. Prove that rk A does not change if to a row of A a linear combination
of other rows of A is added.
184. Prove that rk(A + B) ≤ rk A + rk B.
185. Following the proof of the Rank Theorem, find bases in the domain
and the target spaces in which the following linear map A : K3 → K3
y1 = 2x1 − x2 − x3
y2 = −x1 + 2x2 − x3
y3 = −x1 − x2 + 2x3
has the matrix E2 . For which b ∈ K3 is the system Ax = b consistent?
186. Given a linear map A : V → W, its right inverse (respectively,
left inverse) is defined as a linear map B : W → V such that AB =
idW (respectively, BA = idV ), where idV and idW denote the identity
transformations on V and W. Prove that a right (left) inverse to A exists if
and only if rk A = dim W (rk A = dim V), and that neither is unique unless
dim V = dim W .
187. Suppose that a system Ax = b of m linear equations in 2009 unknowns has a unique solution for $b = (1, 0, \dots, 0)^t$. Does this imply that:
(a) Ker A = {0}, (b) rk A = 2009, (c) m ≥ 2009, (d) $A^{-1}$ exists, (e) $A^t A$
is invertible, (f) $\det(A A^t) \neq 0$, (g) rows of A are linearly independent, (h)
columns of A are linearly independent?
188. Given A : Kn → Km , consider two adjoint systems: Ax = f and
$A^t a = b$. Prove that one of them (say, Ax = f ) is consistent for a given
right hand side vector (f ∈ Km ) if and only if this vector is annihilated
(i.e. a(f ) = 0) by all linear functions (a ∈ (Km )∗ ) satisfying the adjoint
homogeneous system ($A^t a = 0$).
189. Prove that two affine planes lying in a vector space are contained in
an affine subspace of dimension ≤ 5.
190. The solution set of a single non-trivial linear equation a(x) = b is
called a hyperplane (affine if b ≠ 0 and linear if b = 0). Show that a
hyperplane is an (affine or linear) subspace of codimension 1.
191. Find possible codimensions of intersections of k linear hyperplanes.
192. Prove that every subspace in Kn can be described as: (a) the range
of a linear map; (b) the kernel of a linear map.
193. Classify linear subspaces in Kn up to linear transformations of Kn .
194.⋆ Classify pairs of subspaces in Kn up to linear transformations.
195. Let K be a finite field of q elements. Compute the number of: (a)
vectors in a K-vector space of dimension n, (b) bases in Kn , (c) n × r-matrices of rank r, (d) subspaces of dimension r in Kn .
196.⋆ Prove that if a field has q elements, then q is a power of a prime.
2 Gaussian Elimination
Evaluating the determinant of a 20 × 20-matrix directly from the
definition of determinants requires 19 multiplications for each of the
$20! > 2 \cdot 10^{18}$ elementary products. On a typical PC that makes 1 gigaflops (i.e. $10^9$ FLoating point arithmetic Operations Per Second),
this would take about $4 \cdot 10^{10}$ seconds, which is a little longer than
1000 years. Algorithms based on Gaussian elimination allow your PC
to evaluate much larger determinants in tiny fractions of a second.
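The estimate above is a matter of simple arithmetic, which can be reproduced as follows. This is a sketch, with the length of a year taken as 365.25 days (our assumption) and with multiplications only counted, as in the text.

```python
import math

products = math.factorial(20)          # elementary products in a 20 x 20 determinant
multiplications = 19 * products        # 19 multiplications for each product
seconds = multiplications / 1e9        # at 10^9 operations per second
years = seconds / (365.25 * 24 * 3600)

print(f"{products:.2e} products, {seconds:.1e} seconds, about {years:.0f} years")
```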
Row Reduction
Usually, when solving a system of linear algebraic equations with coefficients given numerically, we use one of the equations to express the
1st unknown via the other unknowns and eliminate it from the remaining equations, then express the 2nd unknown from one of the
remaining equations, etc., and finally arrive at an equivalent algebraic system which is easy to solve starting from the last equation
and working backward. This computational procedure, called Gaussian elimination, can be conveniently organized as a sequence of
operations with rows of the coefficient matrix of the system. Namely,
we use three elementary row operations:
• transposition of two rows;
• division of a row by a non-zero scalar;
• subtraction of a multiple of one row from another one.
Example 1. Solving the system
$$\begin{array}{rcr} x_2 + 2x_3 & = & 3 \\ 2x_1 + 4x_2 & = & -2 \\ 3x_1 + 5x_2 + x_3 & = & 0 \end{array}$$
by Gaussian elimination, we pull the 2nd equation up (since the 1st
equation does not contain x1 ), divide it by 2 (in order to express
x1 via x2 ) and subtract it 3 times from the 3rd equation in order
to get rid of x1 therein. Then we use the 1st equation (which has
become the 2nd one in our pile) in order to eliminate x2 from the
3rd equation. The coefficient matrix of the system is subject to the
elementary row transformations:
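$$\left[\begin{array}{rrr|r} 0 & 1 & 2 & 3 \\ 2 & 4 & 0 & -2 \\ 3 & 5 & 1 & 0 \end{array}\right] \mapsto \left[\begin{array}{rrr|r} 2 & 4 & 0 & -2 \\ 0 & 1 & 2 & 3 \\ 3 & 5 & 1 & 0 \end{array}\right] \mapsto \left[\begin{array}{rrr|r} 1 & 2 & 0 & -1 \\ 0 & 1 & 2 & 3 \\ 3 & 5 & 1 & 0 \end{array}\right]$$
$$\mapsto \left[\begin{array}{rrr|r} 1 & 2 & 0 & -1 \\ 0 & 1 & 2 & 3 \\ 0 & -1 & 1 & 3 \end{array}\right] \mapsto \left[\begin{array}{rrr|r} 1 & 2 & 0 & -1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 3 & 6 \end{array}\right] \mapsto \left[\begin{array}{rrr|r} 1 & 2 & 0 & -1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 2 \end{array}\right].$$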
The final “triangular” shape of the coefficient matrix is an example
of the row echelon form. If read from bottom to top, it represents
the system x3 = 2, x2 + 2x3 = 3, x1 + 2x2 = −1 which is ready to
be solved by back substitution: x3 = 2, x2 = 3 − 2x3 = 3 − 4 =
−1, x1 = −1 − 2x2 = −1 + 2 = 1. The process of back substitution,
expressed in the matrix form, consists of a sequence of elementary
row operations of the third type:
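$$\left[\begin{array}{rrr|r} 1 & 2 & 0 & -1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 2 \end{array}\right] \mapsto \left[\begin{array}{rrr|r} 1 & 2 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 2 \end{array}\right] \mapsto \left[\begin{array}{rrr|r} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 2 \end{array}\right].$$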
The last matrix is an example of the reduced row echelon form
and represents the system x1 = 1, x2 = −1, x3 = 2 which is “already
solved”.
In general, Gaussian elimination is an algorithm of reducing an
augmented matrix to a row-echelon form by means of elementary
row operations. By an augmented matrix we mean simply a matrix
subdivided into two blocks [A|B]. The augmented matrix of a linear system Ax = b in n unknowns is [a1 , ..., an |b] where a1 , ..., an
are columns of A, but we will also make use of augmented matrices with B consisting of several columns. Operating with a row
[a1 , ..., an |b1 , ...] of augmented matrices we will refer to the leftmost
non-zero entry among aj as the leading entry⁵ of the row. We say
that the augmented matrix [A|B] is in the row echelon form of
rank r if the m × n-matrix A satisfies the following conditions:
• each of the first r rows has the leading entry equal to 1;
• leading entries of the rows 1, 2, ..., r are situated respectively in
the columns with indices j1 , ..., jr satisfying j1 < j2 < ... < jr ;
• all rows of A with indices i > r are zero.
Notice that a matrix in a row echelon form has zero entries everywhere
below and to the left of each leading entry. A row echelon form is
called reduced (Figure 33) if all the entries in the columns j1 , ..., jr
above the leading entries are also equal to zero.
⁵ Also called leading coefficient, or pivot.
If the matrix A of a linear system is in the row echelon form and
indeed has one or several zero rows on the bottom, then the system
contains equations of the form 0x1 + ... + 0xn = b. If at least one of
such b is non-zero, the system is inconsistent (i.e. has no solutions).
If all of them are zeroes, the system is consistent and ready to be
solved by back substitution.
Figure 33
Example 2. The following augmented matrix is in the row echelon form of rank 2:
$$\left[\begin{array}{rrr|r} 1 & 2 & 3 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{array}\right].$$
It corresponds to the system x1 + 2x2 + 3x3 = 0, x3 = 2, 0 = 0. The
system is consistent: x3 = 2, x2 = t, x1 = −3x3 − 2x2 = −6 − 2t
satisfy the system for any value of the parameter t.
We see that presence of leading entries in the columns j1 , ..., jr of
the row echelon form allows one to express the unknowns xj1 , ..., xjr
in terms of the unknowns xj with j ≠ j1 , ..., jr , while the values
t1 , ..., tn−r of the unknowns xj , j ≠ j1 , ..., jr remain completely arbitrary. In this case, solutions to the linear system depend on
the n − r parameters t1 , ..., tn−r .
The algorithm of row reduction of an augmented matrix [A|B] to
the row echelon form can be described by the following instructions.
Let n be the number of columns of A. Then the algorithm consists of
n steps. At the step l = 1, ..., n, we assume that the matrix formed
by the columns of A with the indices j = 1, ..., l − 1 is already in the
row echelon form of some rank s < l, with the leading entries located
in some columns j1 < ... < js < l. The l-th step begins with locating
the first non-zero entry in the column l below the row s. If none is
found, the l-th step is over, since the columns 1, ..., l are already in
the row echelon form of rank s. Otherwise the first non-zero entry is
located in a row i(> s), and the following operations are performed:
(i) transposing the rows i and s + 1 of the augmented matrix,
(ii) dividing the whole row s + 1 of the augmented matrix by the
leading entry, which is now $a_{s+1,l}$ (≠ 0),
(iii) annihilating all the entries in the column l below the leading
entry of the s + 1-st row by subtracting suitable multiples of the
s + 1-st row of the augmented matrix from all rows with indices
i > s + 1.
After that, the l-th step is over since the columns 1, ..., l are now
in the row echelon form of rank s + 1.
When an augmented matrix [A|B] has been reduced to a row
echelon form with the leading entries $a_{1,j_1} = \dots = a_{r,j_r} = 1$, the back
substitution algorithm, which reduces it further to a reduced row
echelon form, consists of r steps which we number by l = r, r − 1, ..., 1
and perform in this order. On the l-th step, we subtract from each
of the rows i = 1, ..., l − 1 of the augmented matrix, the l-th row
multiplied by $a_{i,j_l}$, and thus annihilate all the entries of the column
$j_l$ above the leading entry.
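The two algorithms just described, row reduction with its operations (i)–(iii) and back substitution, translate into code almost literally. The following is a sketch under our own assumptions: the function names row_echelon and back_substitute are hypothetical, indices are 0-based, and entries are exact rational numbers, so that no question of rounding arises. It is applied to the augmented matrix of Example 1.

```python
from fractions import Fraction

def row_echelon(aug, n):
    """Row reduction of [A|B]; n is the number of columns of A.
    Returns the row echelon form and the list of columns with leading entries."""
    M = [[Fraction(x) for x in row] for row in aug]
    pivots = []
    for l in range(n):                       # the l-th step
        s = len(pivots)                      # rank found so far
        i = next((k for k in range(s, len(M)) if M[k][l] != 0), None)
        if i is None:
            continue                         # no non-zero entry below row s
        M[s], M[i] = M[i], M[s]              # (i) transpose two rows
        lead = M[s][l]
        M[s] = [x / lead for x in M[s]]      # (ii) divide by the leading entry
        for k in range(s + 1, len(M)):       # (iii) annihilate entries below it
            factor = M[k][l]
            M[k] = [x - factor * y for x, y in zip(M[k], M[s])]
        pivots.append(l)
    return M, pivots

def back_substitute(M, pivots):
    """Reduce a row echelon form further to the reduced row echelon form."""
    M = [row[:] for row in M]
    for l in reversed(range(len(pivots))):   # steps l = r, r - 1, ..., 1
        j = pivots[l]
        for i in range(l):                   # rows above the l-th one
            factor = M[i][j]
            M[i] = [x - factor * y for x, y in zip(M[i], M[l])]
    return M

aug = [[0, 1, 2, 3],
       [2, 4, 0, -2],
       [3, 5, 1, 0]]                         # the system of Example 1
echelon, pivots = row_echelon(aug, 3)
reduced = back_substitute(echelon, pivots)
solution = [row[-1] for row in reduced]      # x1, x2, x3
```

On this input the intermediate matrix echelon coincides with the triangular matrix obtained in Example 1, and solution reproduces x1 = 1, x2 = −1, x3 = 2.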
Applications
Row reduction algorithms allow one to compute efficiently determinants and inverses of square matrices given numerically, and to find
a basis in the null space, column space and row space of a given
rectangular matrix (i.e., speaking geometrically, in the kernel of the
matrix, its range, and the range of the transposed matrix).
Proposition 1. Suppose that an m × n-matrix A has been
reduced by elementary row operations to a row echelon form
A′ of rank r with the leading entries $a_{1,j_1} = \dots = a_{r,j_r} = 1$,
j1 < ... < jr . Then
(1) rk A = rk A′ = r,
(2) rows 1, . . . , r of A′ form a basis in the row space of A,
(3) the columns of A with indices j1 , ..., jr form a basis in
the column space of A.
Proof. Elementary row operations do not change the space
spanned by rows of the matrix. The non-zero rows of a row echelon matrix are linearly independent and thus form a basis in the
row space. In particular, rk A = rk A′ = r.
The row operations change columns a1 , ..., an of the matrix A,
but preserve linear dependencies among them: α1 a1 + ... + αn an = 0
if and only if α1 a′1 + ... + αn a′n = 0. The r columns a′j1 , ..., a′jr of the
matrix A′ in the row echelon form which contain the leading entries
are linearly independent. Therefore columns aj1 , ..., ajr of the matrix
A are linearly independent too and hence form a basis in the column
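space of A.
Example 3. The following row reduction
$$\begin{bmatrix} 1 & 2 & 3 & -1 \\ 2 & 4 & 5 & 1 \\ 3 & 6 & 8 & 0 \end{bmatrix} \mapsto \begin{bmatrix} 1 & 2 & 3 & -1 \\ 0 & 0 & -1 & 3 \\ 0 & 0 & -1 & 3 \end{bmatrix} \mapsto \begin{bmatrix} 1 & 2 & 3 & -1 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
shows that the matrix has rank 2, rows (1, 2, 3, −1), (0, 0, 1, −3) form
a basis in the row space, and columns $(1, 2, 3)^t$, $(3, 5, 8)^t$ a basis in
the column space.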
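Proposition 1 can be applied mechanically. In the sketch below (the name echelon_form is hypothetical, and column indices are 0-based, so the leading entries of Example 3 are reported in columns 0 and 2), the rank, a basis of the row space and a basis of the column space are read off from the row echelon form.

```python
from fractions import Fraction

def echelon_form(rows):
    """Row echelon form with leading entries 1; returns (matrix, pivot columns)."""
    M = [[Fraction(x) for x in r] for r in rows]
    pivots = []
    for l in range(len(M[0])):
        s = len(pivots)
        i = next((k for k in range(s, len(M)) if M[k][l] != 0), None)
        if i is None:
            continue
        M[s], M[i] = M[i], M[s]
        lead = M[s][l]
        M[s] = [x / lead for x in M[s]]
        for k in range(s + 1, len(M)):
            factor = M[k][l]
            M[k] = [x - factor * y for x, y in zip(M[k], M[s])]
        pivots.append(l)
    return M, pivots

A = [[1, 2, 3, -1],
     [2, 4, 5, 1],
     [3, 6, 8, 0]]                                   # the matrix of Example 3
E, pivots = echelon_form(A)
r = len(pivots)                                      # (1) the rank
row_basis = E[:r]                                    # (2) non-zero rows of the echelon form
col_basis = [[row[j] for row in A] for j in pivots]  # (3) columns of A itself, not of E
```

Note that the basis of the column space is formed by columns of the original matrix A, since row operations change the columns.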
Suppose that the augmented matrix [A|b] of the system Ax = b
has been transformed to a reduced row echelon form [A′ |b′ ] with the
leading entries positioned in the columns j1 < j2 < ... < jr . These
columns are the unit coordinate vectors e1 , ..., er , and the system is
consistent only if b′ is their linear combination, b′ = b′1 e1 + ... +
b′r er . Assuming that this is the case we can assign arbitrary values
t1 , ..., tn−r to the unknowns xj , j ≠ j1 , ..., jr , and express xj1 , ..., xjr
as linear inhomogeneous functions of t1 , ..., tn−r . The general solution
to the system will have the form x = v0 + t1 v1 + ... + tn−r vn−r of
a linear combination of some n-dimensional vectors v0 , v1 , ..., vn−r .
We claim that the vectors v1 , ..., vn−r form a basis in the null
space of the matrix A. Indeed, substituting t = 0 we conclude
that v0 satisfies the equation Av0 = b. Therefore x−v0 = t1 v1 +...+
tn−r vn−r form the general solution to the homogeneous system Ax =
0, i.e. the null space of A. In addition, we see that the solution
set to the inhomogeneous system is the affine subspace in
Rn obtained from the null space by the translation through
the vector v0 .
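Example 4. Consider the system Ax = 0 with the matrix A
from Example 3. Transform the matrix to the reduced row echelon
form:
$$\cdots \mapsto \begin{bmatrix} 1 & 2 & 3 & -1 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \mapsto \begin{bmatrix} 1 & 2 & 0 & 8 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
The general solution to the system assumes the form
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -2t_1 - 8t_2 \\ t_1 \\ 3t_2 \\ t_2 \end{bmatrix} = t_1 \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t_2 \begin{bmatrix} -8 \\ 0 \\ 3 \\ 1 \end{bmatrix}.$$
The columns $(-2, 1, 0, 0)^t$ and $(-8, 0, 3, 1)^t$ form therefore a basis in
the null space of the matrix A.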
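The recipe for a basis in the null space can be sketched in code as follows. The names rref and null_space_basis are our own, the reduction is done in the Gauss–Jordan manner (entries above and below each leading entry are cleared at once, which leads to the same reduced row echelon form), and each basis vector corresponds to setting one parameter ti equal to 1 and the others to 0.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form; returns (matrix, pivot columns)."""
    M = [[Fraction(x) for x in r] for r in rows]
    pivots = []
    for l in range(len(M[0])):
        s = len(pivots)
        i = next((k for k in range(s, len(M)) if M[k][l] != 0), None)
        if i is None:
            continue
        M[s], M[i] = M[i], M[s]
        lead = M[s][l]
        M[s] = [x / lead for x in M[s]]
        for k in range(len(M)):
            if k != s:
                factor = M[k][l]
                M[k] = [x - factor * y for x, y in zip(M[k], M[s])]
        pivots.append(l)
    return M, pivots

def null_space_basis(A):
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for free in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)                # one parameter equals 1, the others 0
        for i, j in enumerate(pivots):
            v[j] = -R[i][free]               # express the unknowns x_{j_1}, ..., x_{j_r}
        basis.append(v)
    return basis

A = [[1, 2, 3, -1],
     [2, 4, 5, 1],
     [3, 6, 8, 0]]                           # the matrix of Examples 3 and 4
basis = null_space_basis(A)
```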
Proposition 2. Suppose that in the process of row reduction of an $n \times n$ matrix $A$ to a row echelon form $A'$, row
transpositions occurred $k$ times, and the operations of division by leading entries $\alpha_1, \dots, \alpha_r$ were performed. If $\operatorname{rk} A' < n$
then $\det A = 0$. If $\operatorname{rk} A' = n$ then $\det A = (-1)^k \alpha_1 \cdots \alpha_n$.
Indeed, each transposition of rows reverses the sign of the determinant, division of a row by $\alpha$ divides the determinant by $\alpha$,
and subtraction of a multiple of one row from another one does not
change the determinant. Thus $\det A = (-1)^k \alpha_1 \cdots \alpha_r \det A'$. The row
echelon matrix is upper triangular. When $\operatorname{rk} A' = n$, it has $n$ leading
1's on the diagonal, and hence $\det A' = 1$. When $r < n$ we have
$\det A' = 0$.
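Proposition 2 translates directly into a procedure for computing determinants. The sketch below is an illustration only (the name `det_by_row_reduction` is hypothetical): it tracks the sign changes from transpositions and the product of the leading entries by which rows are divided, using exact fractions.

```python
from fractions import Fraction

def det_by_row_reduction(rows):
    """Determinant of a square matrix as (-1)^k times the product of leading entries."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    sign, product = 1, Fraction(1)
    for c in range(n):
        # find a row at or below row c with a non-zero entry in column c
        p = next((i for i in range(c, n) if a[i][c] != 0), None)
        if p is None:
            return Fraction(0)                # rank < n, so the determinant vanishes
        if p != c:
            a[c], a[p] = a[p], a[c]           # a transposition reverses the sign
            sign = -sign
        lead = a[c][c]
        product *= lead                       # dividing a row by lead divides det by lead
        a[c] = [x / lead for x in a[c]]
        for i in range(c + 1, n):             # subtractions do not change det
            f = a[i][c]
            a[i] = [x - f * y for x, y in zip(a[i], a[c])]
    return sign * product
```

For the matrix with rows (0, 1, 2), (2, 4, 0), (3, 5, 1) this performs one transposition and divides by 2, 1 and 3, giving −6.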
Proposition 3. Given an $n \times n$-matrix $A$, introduce the
augmented matrix $[A \,|\, I_n]$ (where $I_n$ is the identity matrix) and
transform it to the reduced row-echelon form $[A' \,|\, B]$ by elementary row operations. If $A' = I_n$ then $B = A^{-1}$.
Indeed, the equality $A' = I_n$ means that $\operatorname{rk} A = n$ and thus $A^{-1}$
exists. Then the system $Ax = b$ has a unique solution for any $b$, and
for $b = e_1, \dots, e_n$ the corresponding solutions $x = A^{-1} e_1, \dots, A^{-1} e_n$
are the columns of the inverse matrix $A^{-1}$. These solutions can
be found by simultaneous row reduction of the augmented matrices
$[A \,|\, e_1], \dots, [A \,|\, e_n]$ and thus coincide with the columns of the matrix $B$
in the reduced row-echelon form $[I_n \,|\, B]$.
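Example 5. Let us compute $\det A$ and $A^{-1}$ for the matrix of
Example 1. We have:
\[
\left[\begin{array}{ccc|ccc} 0 & 1 & 2 & 1 & 0 & 0 \\ 2 & 4 & 0 & 0 & 1 & 0 \\ 3 & 5 & 1 & 0 & 0 & 1 \end{array}\right]
\mapsto
\left[\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & \tfrac12 & 0 \\ 0 & 1 & 2 & 1 & 0 & 0 \\ 3 & 5 & 1 & 0 & 0 & 1 \end{array}\right]
\mapsto
\]
\[
\left[\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & \tfrac12 & 0 \\ 0 & 1 & 2 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 & -\tfrac32 & 1 \end{array}\right]
\mapsto
\left[\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & \tfrac12 & 0 \\ 0 & 1 & 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & \tfrac13 & -\tfrac12 & \tfrac13 \end{array}\right].
\]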
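Here one transposition of rows and divisions by 2 and by 3 were
applied. Thus $\det A = (-1) \cdot 2 \cdot 3 = -6$, and the matrix is invertible.
Back substitution eventually yields the inverse matrix:
\[
\mapsto
\left[\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & \tfrac12 & 0 \\ 0 & 1 & 0 & \tfrac13 & 1 & -\tfrac23 \\ 0 & 0 & 1 & \tfrac13 & -\tfrac12 & \tfrac13 \end{array}\right]
\mapsto
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -\tfrac23 & -\tfrac32 & \tfrac43 \\ 0 & 1 & 0 & \tfrac13 & 1 & -\tfrac23 \\ 0 & 0 & 1 & \tfrac13 & -\tfrac12 & \tfrac13 \end{array}\right].
\]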
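The computation of Example 5 can be reproduced by reducing the augmented matrix $[A \,|\, I_n]$ as in Proposition 3. This is a sketch under stated assumptions: the name `inverse_by_row_reduction` is invented for illustration, and the code clears entries above and below each leading 1 in one pass (Gauss-Jordan style) instead of doing back substitution as a separate stage.

```python
from fractions import Fraction

def inverse_by_row_reduction(rows):
    """Reduce [A | I] to [I | B] and return B, the inverse of A."""
    n = len(rows)
    # augmented matrix [A | I_n]
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(rows)]
    for c in range(n):
        p = next((i for i in range(c, n) if a[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is not invertible")
        a[c], a[p] = a[p], a[c]               # bring a non-zero entry into place
        lead = a[c][c]
        a[c] = [x / lead for x in a[c]]       # make the leading entry equal to 1
        for i in range(n):
            if i != c:                        # clear column c in all other rows
                f = a[i][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[c])]
    return [row[n:] for row in a]

A = [[0, 1, 2], [2, 4, 0], [3, 5, 1]]
A_inv = inverse_by_row_reduction(A)
```

With the matrix of Example 1 as input, `A_inv` agrees with the right-hand block obtained above.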
Remark. Gaussian elimination algorithms are unlikely to work
well for matrices depending on parameters. To see why, try row
reduction in order to solve a linear system of the form (λI − A)x = 0
depending on the parameter λ, or (even better!) apply Gaussian
elimination to the system a11 x1 + a12 x2 = b1 , a21 x1 + a22 x2 = b2
depending on 6 parameters a11 , a12 , a21 , a22 , b1 , b2 .
LPU Decomposition
Gaussian elimination is an algorithm. The fact that it always works is
a theorem. In this and the next subsection, we reformulate this theorem
(or rather an important special case of it) first in the language of
matrix algebra, and then in geometric terms.
Recall that a square matrix $A$ is called upper triangular (respectively lower triangular) if $a_{ij} = 0$ for all $i > j$ (respectively
$i < j$). We call $P$ a permutation matrix if it is obtained from the
identity matrix $I_n$ by a permutation of columns. Such $P$ is indeed the
matrix of a linear transformation in $K^n$ defined as the permutation
of coordinate axes.
Theorem. Every invertible matrix $M$ can be factored as
the product $M = LPU$ of a lower triangular matrix $L$, a
permutation matrix $P$, and an upper triangular matrix $U$.
The proof of this theorem is based on the interpretation of elementary row operations in terms of matrix multiplication. Consider the
following $m \times m$-matrices:
• $T_{ij}$ ($i \neq j$), a transposition matrix, obtained by transposing
the $i$-th and $j$-th columns of the identity matrix;
• $D_i(d)$ ($d \neq 0$), a diagonal matrix, all of whose diagonal entries
are equal to 1 except the $i$-th one, which is equal to $1/d$;
• $L_{ij}(\alpha)$ ($i < j$), a lower triangular matrix, all of whose diagonal
entries are equal to 1, and all off-diagonal entries equal to 0 except the entry
in the $j$-th row and $i$-th column, which is equal to $-\alpha$.
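Here are examples $T_{13}$, $D_2(3)$, and $L_{24}(-2)$ of size $m = 4$:
\[
\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},
\qquad
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \tfrac13 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},
\qquad
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 2 & 0 & 1 \end{bmatrix}.
\]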
Elementary row operations on a given m×n-matrix can be described
as multiplication on the left by Tij , Di (d), or Lij (α), which results
respectively in transposing the i-th and j-th rows, dividing the i-th
row by d, and subtracting the i-th row times α from the j-th row.
Note that inverses of elementary row operations are also elementary
row operations. Thus, Gaussian elimination allows one to represent
any matrix A as the product of a row echelon matrix with matrices
of elementary row operations. In order to prove the LPU decomposition theorem, we will combine this idea with a modification of the
Gaussian elimination algorithm.
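To see the elementary matrices at work, one can build them explicitly and multiply on the left. The sketch below is illustrative only: the helper names `t_matrix`, `d_matrix`, `l_matrix` and `matmul` are invented, indices are 1-based as in the text, and `l_matrix(m, i, j, alpha)` follows the row-operation description above (it subtracts the $i$-th row times $\alpha$ from the $j$-th row, so the entry $-\alpha$ sits in row $j$, column $i$).

```python
from fractions import Fraction

def identity(m):
    return [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def t_matrix(m, i, j):
    """T_ij: the identity matrix with the i-th and j-th rows (equivalently columns) swapped."""
    t = identity(m)
    t[i - 1], t[j - 1] = t[j - 1], t[i - 1]
    return t

def d_matrix(m, i, d):
    """D_i(d): diagonal matrix with 1/d in the i-th diagonal position."""
    e = identity(m)
    e[i - 1][i - 1] = Fraction(1) / d
    return e

def l_matrix(m, i, j, alpha):
    """L_ij(alpha): identity plus the entry -alpha in row j, column i."""
    e = identity(m)
    e[j - 1][i - 1] = -Fraction(alpha)
    return e

A = [[0, 1, 2], [2, 4, 0], [3, 5, 1]]
swapped = matmul(t_matrix(3, 1, 2), A)           # rows 1 and 2 transposed
scaled = matmul(d_matrix(3, 1, 2), swapped)      # row 1 divided by 2
cleared = matmul(l_matrix(3, 1, 3, 3), scaled)   # 3 x (row 1) subtracted from row 3
```

These three products retrace the first steps of the row reduction of Example 5 applied to the matrix $A$ alone.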
Let M be an invertible n × n-matrix. We apply the row reduction
process to it, temporarily refraining from using permutations and
divisions of rows and using only the row operations equivalent to left
multiplication by Lij (α). On the i-th step of the algorithm, if the
i-th row does not contain a non-zero entry where expected, we don’t
swap it with the next row. Instead, we locate in this row the leading
(i.e. leftmost non-zero) entry, which must exist since the matrix is
invertible. When it is found in a column j, we subtract multiples of
the row i from rows i + 1, . . . , n with such coefficients that all entries
of these rows in the column j become annihilated.
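Example 7. Let us illustrate the modified row reduction with
the matrix taken from Example 1, and at the same time represent
the process as matrix factorization. On the first step, we subtract
the 1st row from the 2nd and 3rd 4 and 5 times respectively, and
on the second step subtract the 2nd row times $\tfrac32$ from the 3rd. The
lower triangular factors shown are inverses of the matrices $L_{12}(4)$,
$L_{13}(5)$ and $L_{23}(\tfrac32)$. The leading entries are boldfaced:
\[
\begin{bmatrix} 0 & 1 & 2 \\ 2 & 4 & 0 \\ 3 & 5 & 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 5 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & \mathbf{1} & 2 \\ 2 & 0 & -8 \\ 3 & 0 & -9 \end{bmatrix}
\]
\[
=
\begin{bmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 5 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \tfrac32 & 1 \end{bmatrix}
\begin{bmatrix} 0 & \mathbf{1} & 2 \\ \mathbf{2} & 0 & -8 \\ 0 & 0 & \mathbf{3} \end{bmatrix}.
\]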
Products of lower triangular matrices are lower triangular. As a
result, applying the modified row reduction we obtain a factorization
of the form $M = LM'$, where $L$ is a lower triangular matrix with
all diagonal entries equal to 1. Note that on the $i$-th step of the
algorithm, when a leading entry in the row $i$ is searched, the leading
entries of the previous rows can be in any columns $j_1, \dots, j_{i-1}$. The
entries of the $i$-th row situated in these columns have been annihilated
at the previous steps. Therefore the leading entry of the $i$-th row will
be found in a new column. This shows that the columns $j_1, \dots, j_n$
of the leading entries found in the rows $1, \dots, n$ are all distinct and
thus form a permutation of $\{1, \dots, n\}$. We now write $M' = PU$,
where $P$ is the matrix of the permutation $\binom{1 \,\dots\, n}{j_1 \dots j_n}$. The operation
$M' \mapsto U = P^{-1} M'$ permutes rows of $M'$ and places all leading entries
of the resulting matrix $U$ on the diagonal. Here is how this works in
our example:
\[
\begin{bmatrix} 0 & \mathbf{1} & 2 \\ \mathbf{2} & 0 & -8 \\ 0 & 0 & \mathbf{3} \end{bmatrix}
=
\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \mathbf{2} & 0 & -8 \\ 0 & \mathbf{1} & 2 \\ 0 & 0 & \mathbf{3} \end{bmatrix}.
\]
Since leading entries are leftmost non-zero entries in their rows, the
matrix U turns out to be upper triangular. This completes the proof.
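The modified row reduction used in this proof can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the names `lpu` and `matmul` are hypothetical, rows and columns are indexed from 0, and exact fractions are used so that the factors can be compared with Example 7.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lpu(rows):
    """Return (L, P, U) with M = L P U, L unipotent lower triangular, U upper triangular."""
    n = len(rows)
    m = [[Fraction(x) for x in row] for row in rows]
    low = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    lead_cols = []
    for i in range(n):
        # leading (leftmost non-zero) entry of row i; it exists when M is invertible
        j = next((c for c in range(n) if m[i][c] != 0), None)
        if j is None:
            raise ValueError("matrix is not invertible")
        lead_cols.append(j)
        for k in range(i + 1, n):
            coeff = m[k][j] / m[i][j]
            m[k] = [x - coeff * y for x, y in zip(m[k], m[i])]  # annihilate column j below row i
            low[k][i] = coeff                 # L collects the multipliers
    # M' = P U: row i of M', whose leading entry is in column j, is row j of U
    up = [None] * n
    perm = [[Fraction(0)] * n for _ in range(n)]
    for i, j in enumerate(lead_cols):
        up[j] = m[i]
        perm[i][j] = Fraction(1)
    return low, perm, up

M = [[0, 1, 2], [2, 4, 0], [3, 5, 1]]
low, perm, up = lpu(M)
```

For the matrix of Example 1 the three factors coincide with the product of the lower triangular matrices, the permutation matrix, and the upper triangular matrix displayed above.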
Remarks. (1) As is seen from the proof, the matrix $L$ in the
LPU factorization can be required to have all diagonal entries equal
to 1. Triangular matrices with this property are called unipotent.
Alternatively, one can require that $U$ is unipotent.
(2) When the permutation matrix $P = I_n$, one obtains the LU
decomposition, $M = LU$, of some invertible matrices. Which ones?
One can work backward and consider products $LDU$ of lower and
upper triangular unipotent matrices $L$ and $U$ with an invertible diagonal matrix $D$ in between (the so-called LDU decomposition).
A choice of such matrices depends on a number of arbitrary parameters: $n(n-1)/2$ for each of $L$ and $U$, and $n$ non-zero parameters for $D$,
i.e. $n^2$ in total. This is equal to the total number of matrix elements,
suggesting that a typical $n \times n$-matrix admits an LDU factorization.
(3) This heuristic claim can be made precise. As illustrated by
the above example of LPU decomposition, when $P \neq I_n$, certain
entries of the resulting factor $U$ come out equal to 0. This is because some entries of the matrix $M'$ on the right of leading ones are
annihilated at previous steps of row reduction. As a result, such decompositions involve fewer than $n^2$ arbitrary parameters, and hence
cover a positive codimension locus in the matrix space. Thus, the
bulk of the space is covered by factorizations LPU with $P = I_n$.
Flags and Bruhat cells
A sequence V1 ⊂ V2 ⊂ · · · ⊂ Vn of nested subspaces is said to form a
flag in the space V = Vn . When dim Vk = k for all k = 1, . . . , n, the
flag is called complete.
Given a basis f1 , . . . , fn in V, one can associate to it the standard
coordinate flag (Figure 34)
Span(f1 ) ⊂ Span(f1 , f2 ) ⊂ · · · ⊂ Span(f1 , . . . , fn ) = V.
Let $U : V \to V$ be an invertible linear transformation preserving the
standard coordinate flag. Then the matrix of $U$ in the given basis
is upper triangular. Indeed, since $U(V_k) \subset V_k$, the vector $U f_k$ is a
linear combination of $f_1, \dots, f_k$, i.e. $U f_k = \sum_{i \le k} u_{ik} f_i$. Since this is
true for all $k$, the matrix $[u_{ik}]$ is upper triangular. Reversing this
argument, we find that if the matrix of $U$ is upper triangular, then
$U$ preserves the flag.
[Figure 34: the standard coordinate flag $V_1 \subset V_2 \subset V_3$ with the basis vectors $f_1, f_2, f_3$.]
[Figure 35: the vector $e_i$ and the subspaces $W_{n-i} \subset W_{n-i+1}$ and $V_{j-1} \subset V_j$.]
Conversely, given a complete flag $V_1 \subset \dots \subset V_n = K^n$, one can
pick a basis $f_1$ in the line $V_1$, then complete it to a basis $f_1, f_2$ in the
plane $V_2$, and so on, until a basis $f_1, \dots, f_n$ in the whole space $V_n$
is obtained, such that $V_k = \operatorname{Span}(f_1, \dots, f_k)$ for each $k$. This shows
that every complete flag in $K^n$ can be obtained from any
other by an invertible linear transformation. Indeed, the linear transformation,
defined in terms of the standard basis $e_1, \dots, e_n$ of $K^n$ by
$\sum x_i e_i \mapsto \sum x_i f_i$, transforms the standard coordinate flag
$\operatorname{Span}(e_1) \subset \dots \subset \operatorname{Span}(e_1, \dots, e_n)$ into the given flag $V_1 \subset \dots \subset V_n$.
Example 8. Let $P_\sigma : K^n \to K^n$ act by a permutation
$\sigma = \binom{1 \,\dots\, n}{i_1 \dots i_n}$ of coordinate axes. It transforms the standard coordinate
flag into a coordinate flag
\[
F_\sigma : \operatorname{Span}(e_{i_1}) \subset \operatorname{Span}(e_{i_1}, e_{i_2}) \subset \dots \subset \operatorname{Span}(e_{i_1}, \dots, e_{i_n}) = K^n,
\]
called so because all the spaces are spanned by vectors of the standard basis. There are $n!$ such flags, one for each permutation. For
instance, $F_{\mathrm{id}}$ is the standard coordinate flag. When $\sigma = \binom{1 \,\dots\, n}{n \,\dots\, 1}$, the
flag opposite to the standard one is obtained:
\[
\operatorname{Span}(e_n) \subset \operatorname{Span}(e_n, e_{n-1}) \subset \dots \subset \operatorname{Span}(e_n, \dots, e_1).
\]
Transformations of $K^n$ defined by lower triangular matrices are exactly those that preserve this flag.
Theorem. Every complete flag in Kn can be transformed
into exactly one of n! coordinate flags by invertible linear
transformations preserving one of them (e.g. the flag opposite to the standard one).
Proof. Let $F$ be a given complete flag in $K^n$, $F_{\mathrm{id}}$ the standard
coordinate flag, $M$ a linear transformation such that $F = M(F_{\mathrm{id}})$,
and $M = L P_\sigma U$ its LPU decomposition. Since $U(F_{\mathrm{id}}) = F_{\mathrm{id}}$, and
$P_\sigma(F_{\mathrm{id}}) = F_\sigma$, we find that $F = L(F_\sigma)$. Therefore the given flag
$F$ is transformed into a coordinate flag $F_\sigma$ by $L^{-1}$, which is lower
triangular and thus preserves the flag opposite to $F_{\mathrm{id}}$.
It remains to show that the same flag $F$ cannot be transformed
this way into two different coordinate flags $F_\sigma$. Let $V_j$, $\dim V_j = j$,
denote the spaces of the flag $F$, and $W_{n-i} = \operatorname{Span}(e_n, \dots, e_{i+1})$ the
spaces of the flag opposite to the standard one, $\operatorname{codim} W_{n-i} = i$.
Invertible transformations preserving the spaces $W_{n-i}$ can change
their intersections with $V_j$, but cannot change the dimensions of these
intersections. Thus it suffices to show that, in the case of the flag
$F_\sigma$, the permutation $\sigma$ is uniquely determined by these dimensions.
Note that $e_i \in W_{n-i+1}$ but $e_i \notin W_{n-i}$ (Figure 35), i.e. the quotient (1-dimensional!) space $W_{n-i+1}/W_{n-i}$ is spanned by the image
of $e_i$. Suppose that in the flag $F_\sigma$, the vector $e_i$ first occurs in the
subspace $V_j$, i.e. $\sigma(j) = i$. Consider the increasing sequence of spaces
$V_1 \subset \dots \subset V_n = K^n$, their intersections with $W_{n-i+1}$, and the projections of these intersections to the quotient space $W_{n-i+1}/W_{n-i}$.
Examining the ranges of these maps and their dimensions, we find
the sequence of $j - 1$ zeroes followed by $n - j + 1$ ones. Thus $j = \sigma^{-1}(i)$
is determined by the flag.
Remark. The theorem solves the following classification problem:
In a vector space equipped with a fixed complete flag $W_1 \subset \dots \subset W_n$,
classify all complete flags up to invertible linear transformations preserving the fixed flag. According to the theorem, there are $n!$ equivalence classes determined by dimensions of intersections of spaces of
the flags with the spaces of the fixed flag. The equivalence classes
are known as Bruhat cells. This formulation also shows that the
Gaussian elimination algorithm can be understood as a solution to
a simple geometric classification problem. One can give a purely
geometric proof of the above theorem (and hence a new proof of
the LPU decomposition) by refining the argument in the proof of
the Rank Theorem. This would bring the Rank Theorem, Gaussian elimination, and
Bruhat cells under the same roof.
EXERCISES
197. Solve systems of linear equations:

x1 + x2 − 3x3 = −1
2x1 + x2 − 2x3 = 1
x1 + x2 + x3 = 3
x1 + 2x2 − 3x3 = 1

2x1 − x2 − x3 = 4
3x1 + 4x2 − 2x3 = 11
3x1 − 2x2 + 4x3 = 11

2x1 + x2 + x3 = 2
x1 + 3x2 + x3 = 5
x1 + x2 + 5x3 = −7
2x1 + 3x2 − 3x3 = 14

x1 − 2x2 + x3 + x4 = 1
x1 − 2x2 + x3 − x4 = −1
x1 − 2x2 + x3 + 5x4 = 5

x1 − 2x2 + 3x3 − 4x4 = 4
x2 − x3 + x4 = −3
x1 + 3x2 − 3x4 = 1
−7x2 + 3x3 + x4 = −3

2x1 + 3x2 − x3 + 5x4 = 0
3x1 − x2 + 2x3 − 7x4 = 0
4x1 + x2 − 3x3 + 6x4 = 0
x1 − 2x2 + 4x3 − 7x4 = 0

3x1 + 4x2 − 5x3 + 7x4 = 0
2x1 − 3x2 + 3x3 − 2x4 = 0
4x1 + 11x2 − 13x3 + 16x4 = 0
7x1 − 2x2 + x3 + 3x4 = 0

x1 + x2 + x3 + x4 + x5 = 7
3x1 + 2x2 + x3 + x4 − 3x5 = −2
x2 + 2x3 + 2x4 + 6x5 = 23
5x1 + 4x2 + 3x3 + 3x4 − x5 = 12
198.⋆ Find those λ for which the system is consistent:
2x1 − x2 + x3 + x4 = 1
x1 + 2x2 − x3 + 4x4 = 2
x1 + 7x2 − 4x3 + 11x4 = λ.
199. For each of the following matrices, find the rank and bases in the null,
column, and row spaces:
0
4 10 1
8 18 7 
 4
(b) 
(a) 
10 18 40 17 
1
7 17 3
14 2
6 8
6 104 21 9
7
6
3 4
35 30 15 20
2
17 
1 
5
(c) 
1
0
0
1
4
0
1
0
2
5
0 1 4
0 2 5
1 3 6
3 14 32
6 32 77
 2
 1
 1
(d) 
 1
1
1
1
3
1
1
2
1
1
1
4
1
3
1
1 
2
1
1 
1 
 3 −1
(e) 
1
3
5 
4 −3
4
1
3
2
4
1
−1
0 
.
−2 
1
200. For each of the following matrices, compute the determinant and
inverse matrix, and an LUP decomposition:
"
#
1
1
1
1
2
1 0 0
2
2 −3
1 −1 −1 
2 0 0
 1
 3
1 −1
0
(a)
(b) 
(c) 
1 −1
1 −1 
1
1 3 4
−1
2
1
1 −1 −1
1
2 −1 2 3
the
.
201. Prove that $D_i^{-1}(d) = D_i(d^{-1})$ and $L_{ij}^{-1}(\alpha) = L_{ij}(-\alpha)$.
202. Prove that the inverse of a permutation matrix $P$ is $P^t$.
203. Prove that every invertible matrix $M$ has an LUP decomposition
$M = LUP$ where $L$ is lower triangular, $U$ upper triangular, and $P$ is a
permutation matrix, and compute such factorizations for the matrices from
the previous exercise.
204. Prove that every invertible matrix $M$ has a PLU decomposition
$M = PLU$.
205. Prove that every invertible matrix has factorizations of the form $UPL$,
$PUL$, and $ULP$, where $L$, $U$, and $P$ stand for lower triangular, upper
triangular, and permutation matrices respectively.
206. List all coordinate complete flags in $K^3$.
207. For each permutation matrix $P$ of size $4 \times 4$, describe all upper triangular matrices $U$ which can occur as a result of the modified Gaussian
algorithm from the proof of the LPU decomposition theorem. For each
permutation, find the maximal number of non-zero entries of $U$.
208.⋆ Compute the dimension of each Bruhat cell, i.e. the number of
parameters on which flags in the equivalence class of $F_\sigma$ depend.
209.⋆ When $K$ is a finite field of $q$ elements, find the number of all complete
flags in $K^n$.
210.⋆ Prove that the number of Bruhat cells of dimension $l$ is equal to the
coefficient at $q^l$ in the product (called q-factorial)
\[
[n]_q! := (1 + q)(1 + q + q^2) \cdots (1 + q + q^2 + \dots + q^{n-1}).
\]
3
The Inertia Theorem
We study here the classification of quadratic forms and some generalizations of this problem. The answer actually depends on properties
of the field of scalars K. Our first goal will be to examine the case
K = R. We begin however with a key argument that remains valid
in general.
Orthogonal Bases
We will assume here that the field $K$ does not contain $\mathbb{Z}_2$, i.e. that
$1 + 1 \neq 0$ in $K$. Then, for any $K$-vector space, there is a one-to-one
correspondence between symmetric bilinear forms and quadratic
forms:
\[
Q(x) = Q(x, x), \qquad Q(x, y) = \frac12 \left[ Q(x + y) - Q(x) - Q(y) \right].
\]
Let $\dim V = n$, $\{f_1, \dots, f_n\}$ be a basis of $V$, and $(x_1, \dots, x_n)$
corresponding coordinates. In these coordinates, the quadratic form
$Q$ is written as:
\[
Q(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij} x_i x_j .
\]
The coefficients $q_{ij}$ here are the values $Q(f_i, f_j)$ of the corresponding
symmetric bilinear form:
\[
Q(x, y) = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i q_{ij} y_j .
\]
These coefficients form a matrix, also denoted $Q$, which is symmetric:
$q_{ij} = q_{ji}$ for all $i, j = 1, \dots, n$. The basis $\{f_1, \dots, f_n\}$ is called $Q$-orthogonal if $Q(f_i, f_j) = 0$ for all $i \neq j$, i.e. if the coefficient matrix
is diagonal.
Lemma. Every quadratic form in $K^n$ has an orthogonal
basis.
Proof. We use induction on $n$. For $n = 1$ the requirement
is empty. Let us construct a $Q$-orthogonal basis in $K^n$ assuming
that every quadratic form in $K^{n-1}$ has an orthogonal basis. If $Q$
is the identically zero quadratic form, then the corresponding symmetric bilinear form is identically zero too, and so any basis will be
$Q$-orthogonal. If the quadratic form $Q$ is not identically zero, then
there exists a vector $f_1$ such that $Q(f_1) \neq 0$. Let $V$ be the subspace in $K^n$ consisting of all vectors $Q$-orthogonal to $f_1$:
$V = \{ x \in K^n \mid Q(f_1, x) = 0 \}$. This subspace does not contain $f_1$ and is given
by 1 linear equation. Thus $\dim V = n - 1$. Let $\{f_2, \dots, f_n\}$ be a
basis in $V$ orthogonal with respect to the symmetric bilinear form
obtained by restricting $Q$ to this subspace. Such a basis exists by
the induction hypothesis. Therefore $Q(f_i, f_j) = 0$ for all $1 < i < j$.
Besides, $Q(f_1, f_i) = 0$ for all $i > 1$, since $f_i \in V$. Thus $\{f_1, f_2, \dots, f_n\}$
is a $Q$-orthogonal basis of $K^n$.
Corollary. For every symmetric $n \times n$-matrix $Q$ with entries from $K$ there exists an invertible matrix $C$ such that
$C^t Q C$ is diagonal.
The diagonal entries here are the values $Q(f_1), \dots, Q(f_n)$.
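The inductive proof of the Lemma can be turned into a small procedure. The sketch below is an illustration, not the author's algorithm verbatim: the names `bilinear`, `pick_vector`, `q_orthogonal_basis` and `gram` are invented, the field is taken to be the rationals (via `fractions.Fraction`), and a vector with $Q(f) \neq 0$ is found either among the current basis vectors or as a sum $u + v$ with $Q(u, v) \neq 0$, for which $Q(u + v) = 2Q(u, v) \neq 0$ when $Q(u) = Q(v) = 0$.

```python
from fractions import Fraction

def bilinear(q, x, y):
    """Value Q(x, y) = sum of x_i q_ij y_j of the symmetric bilinear form."""
    return sum(x[i] * q[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def pick_vector(q, rest):
    """Return (f, k): f in the span of rest with Q(f) != 0, replacing rest[k]; or (None, None)."""
    for k, v in enumerate(rest):
        if bilinear(q, v, v) != 0:
            return v, k
    for k, u in enumerate(rest):
        for v in rest[k + 1:]:
            if bilinear(q, u, v) != 0:        # here Q(u) = Q(v) = 0, so Q(u + v) = 2 Q(u, v)
                return [a + b for a, b in zip(u, v)], k
    return None, None

def q_orthogonal_basis(q):
    """Return a Q-orthogonal basis of K^n as a list of vectors."""
    n = len(q)
    q = [[Fraction(x) for x in row] for row in q]
    rest = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    done = []
    while rest:
        f, k = pick_vector(q, rest)
        if f is None:                         # the form vanishes on the remaining subspace
            done.extend(rest)
            break
        qf = bilinear(q, f, f)
        others = rest[:k] + rest[k + 1:]
        # replace each remaining vector v by its component Q-orthogonal to f
        rest = [[a - bilinear(q, f, v) / qf * b for a, b in zip(v, f)] for v in others]
        done.append(f)
    return done

def gram(q, basis):
    """Matrix of values Q(f_i, f_j), i.e. C^t Q C for C with columns f_i."""
    return [[bilinear(q, u, v) for v in basis] for u in basis]

q = [[0, 1], [1, 0]]                          # the form 2 x_1 x_2
basis = q_orthogonal_basis(q)
g = gram(q, basis)
```

For the form $2x_1x_2$ the sketch returns the vectors $(1, 1)$ and $(-\tfrac12, \tfrac12)$, and `g` is the diagonal matrix with entries $2$ and $-\tfrac12$.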
Inertia Indices
Consider the case K = R.
Given a quadratic form $Q$ in $\mathbb{R}^n$, we pick a $Q$-orthogonal basis $\{f_1, \dots, f_n\}$ and then rescale those of the basis vectors for which
$Q(f_i) \neq 0$: $f_i \mapsto \tilde{f}_i = |Q(f_i)|^{-1/2} f_i$. After such rescaling, the non-zero
coefficients $Q(\tilde{f}_i)$ of the quadratic form will become $\pm 1$. Reordering the basis so that terms with positive coefficients come first, and
negative next, we transform $Q$ to the normal form:
\[
Q = X_1^2 + \dots + X_p^2 - X_{p+1}^2 - \dots - X_{p+q}^2, \qquad p + q \le n.
\]
Note that by restricting $Q$ to the subspace $X_{p+1} = \dots = X_n = 0$ of
dimension $p$ we obtain a quadratic form on this subspace which is
positive (or positive definite), i.e. takes on positive values everywhere outside the origin.
Proposition. The numbers $p$ and $q$ of positive and negative squares in the normal form are equal to the maximal
dimensions of the subspaces in $\mathbb{R}^n$ where the quadratic form
$Q$ (respectively, $-Q$) is positive.
Proof. The quadratic form $Q = X_1^2 + \dots + X_p^2 - X_{p+1}^2 - \dots - X_{p+q}^2$ is
non-positive everywhere on the subspace $W$ of dimension $n - p$ given
by the equations $X_1 = \dots = X_p = 0$. Let us show that the existence of
a subspace $V$ of dimension $p + 1$ where the quadratic form is positive
leads to a contradiction. Indeed, the subspaces $V$ and $W$ would
intersect in a subspace of dimension at least $(p + 1) + (n - p) - n = 1$,
containing therefore non-zero vectors $x$ with $Q(x) > 0$ and $Q(x) \le 0$.
Thus, $Q$ is positive on some subspace of dimension $p$ and cannot be
positive on any subspace of dimension $> p$. Likewise, $-Q$ is positive
on some subspace of dimension $q$ and cannot be positive on any
subspace of dimension $> q$.
The maximal dimensions of positive subspaces of $Q$ and $-Q$
are called respectively positive and negative inertia indices of
the quadratic form in question. By definition, inertia indices of a
quadratic form do not depend on the choice of a coordinate system.
Our Proposition implies that the normal forms with different pairs
of values of $p$ and $q$ are pairwise non-equivalent. This establishes the
Inertia Theorem (as stated in Section 4 of Chapter 1).
Theorem. Every quadratic form in $\mathbb{R}^n$ by a linear change
of coordinates can be transformed to exactly one of the normal forms:
\[
X_1^2 + \dots + X_p^2 - X_{p+1}^2 - \dots - X_{p+q}^2, \quad \text{where } 0 \le p + q \le n.
\]
The matrix formulation of the Inertia Theorem reads:
Every real symmetric matrix $Q$ can be transformed to exactly one
of the diagonal forms
\[
\begin{bmatrix} I_p & 0 & 0 \\ 0 & -I_q & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]
by transformations of the form
$Q \mapsto C^t Q C$ defined by invertible real matrices $C$.
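As a small worked illustration (an example added here, not taken from the text), consider the form $Q = 2x_1x_2$ in $\mathbb{R}^2$, whose coefficient matrix has zero diagonal. The change of coordinates below is one possible choice of $C$; it brings $Q$ to the normal form with $p = q = 1$.

```latex
\[
x_1 = \tfrac{1}{\sqrt{2}}(X_1 + X_2), \quad
x_2 = \tfrac{1}{\sqrt{2}}(X_1 - X_2)
\quad\Longrightarrow\quad
2x_1x_2 = (X_1 + X_2)(X_1 - X_2) = X_1^2 - X_2^2 ,
\]
\[
C = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},
\qquad
C^t \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} C
= \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.
\]
```

The form is positive on the line $X_2 = 0$ and negative on the line $X_1 = 0$, in agreement with the Proposition.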
Complex Quadratic Forms
Consider the case K = C.
Theorem. Every quadratic form in $\mathbb{C}^n$ can be transformed
by linear changes of coordinates to exactly one of the normal forms:
\[
z_1^2 + \dots + z_r^2, \quad \text{where } 0 \le r \le n.
\]
Proof. Given a quadratic form $Q$, pick a $Q$-orthogonal basis in
$\mathbb{C}^n$, order it in such a way that vectors $f_1, \dots, f_r$ with $Q(f_i) \neq 0$ come
first, and then rescale these vectors by $f_i \mapsto Q(f_i)^{-1/2} f_i$, so that the new values $Q(f_i)$ are equal to 1.
In particular, we have proved that every complex symmetric
matrix $Q$ can be transformed to exactly one of the forms
\[
\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}
\]
by the transformations of the form $Q \mapsto C^t Q C$ defined by invertible
complex matrices $C$. As it follows from the Rank Theorem, here
$r = \operatorname{rk} Q$, the rank of the coefficient matrix of the quadratic form.
This guarantees that the normal forms with different values of $r$ are
pairwise non-equivalent, and thus completes the proof.
To establish the geometrical meaning of $r$, consider a more general
situation.
Given a quadratic form Q on a K-vector space V, its kernel
is defined as the subspace of V consisting of all vectors which are
Q-orthogonal to all vectors from V:
\[
\operatorname{Ker} Q := \{ z \in V \mid Q(z, v) = 0 \text{ for all } v \in V \}.
\]
Note that the values $Q(x, y)$ do not change when a vector from the
kernel is added to either of $x$ and $y$. As a result, the symmetric
bilinear form $Q$ descends to the quotient space $V / \operatorname{Ker} Q$.
The rank of a quadratic form $Q$ on $K^n$ is defined as the codimension of $\operatorname{Ker} Q$. For example, the quadratic form $z_1^2 + \dots + z_r^2$ on $K^n$ corresponds to the symmetric bilinear form $x_1 y_1 + \dots + x_r y_r$, and has the
kernel of codimension $r$ defined by the equations $z_1 = \dots = z_r = 0$.
Conics
The set of all solutions to one polynomial equation in Kn :
F (x1 , . . . , xn ) = 0
is called a hypersurface. When the polynomial F does not depend
on one of the variables (say, xn ), the equation F (x1 , . . . , xn−1 ) = 0
defines a hypersurface in Kn−1 . Then the solution set in Kn is called
a cylinder, since it is the Cartesian product of the hypersurface in
Kn−1 and the line of arbitrary values of xn .
Hypersurfaces defined by polynomial equations of degree 2 are
often referred to as conics — a name reminiscent of conic sections,
which are “hypersurfaces” in K2 . The following application of the
Inertia Theorem allows one to classify all conics in Rn up to equivalence defined by compositions of translations with invertible linear
transformations.
Theorem. Every conic in $\mathbb{R}^n$ is equivalent to either the
cylinder over a conic in $\mathbb{R}^{n-1}$, or to one of the conics:
\[
x_1^2 + \dots + x_p^2 - x_{p+1}^2 - \dots - x_n^2 = 1, \quad 0 \le p \le n,
\]
\[
x_1^2 + \dots + x_p^2 = x_{p+1}^2 + \dots + x_n^2, \quad 0 \le p \le n/2,
\]
\[
x_n = x_1^2 + \dots + x_p^2 - x_{p+1}^2 - \dots - x_{n-1}^2, \quad 0 \le p \le (n - 1)/2,
\]
known as hyperboloids, cones, and paraboloids respectively.
For $n = 3$, all types of “hyperboloids” (of which the first type
contains spheres and ellipsoids) are shown in Figure 36, and cones
and paraboloids in Figure 37.
[Figure 36: hyperboloids in $\mathbb{R}^3$, panels p = 3, p = 2, p = 1, p = 0.]
[Figure 37: cones and paraboloids in $\mathbb{R}^3$, panels p = 0 and p = 1 for each.]
Proof. Given a degree 2 polynomial $F = Q(x) + a(x) + c$, where
$Q$ is a non-zero quadratic form, $a$ a linear form, and $c$ a constant,
we can apply a linear change of coordinates to transform $Q$ to the
form $\pm x_1^2 \pm \dots \pm x_r^2$, where $r \le n$, and then use the completion
of squares in the variables $x_1, \dots, x_r$ to make the remaining linear
form independent of $x_1, \dots, x_r$. When $r = n$, the resulting equations
$\pm x_1^2 \pm \dots \pm x_n^2 = C$ (where $C$ is a new constant) define hyperboloids
(when $C \neq 0$), or cones (when $C = 0$). When $r < n$, we can take the
remaining linear part of the function $F$ (together with the constant)
for a new, $(r + 1)$-st coordinate, provided that this linear part is non-constant. When $r = n - 1$, we obtain the equations of paraboloids.
When $r < n - 1$, or if $r = n - 1$, but the linear function was constant,
the function $F$, written in new coordinates, does not depend on the
last of them, and this defines the cylinder over a conic in $\mathbb{R}^{n-1}$.
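To make the completion-of-squares step concrete, here is a short worked example added for illustration; the polynomial is chosen here and does not come from the text. Take $n = 2$ and a polynomial whose quadratic part is already diagonal:

```latex
\begin{align*}
F &= x_1^2 - 4x_2^2 + 2x_1 + 8x_2 - 7 \\
  &= (x_1 + 1)^2 - 4(x_2 - 1)^2 - 4, \\
F = 0 \ &\Longleftrightarrow\ X_1^2 - X_2^2 = 1,
  \qquad X_1 = \tfrac{1}{2}(x_1 + 1), \quad X_2 = x_2 - 1 .
\end{align*}
```

Here $r = n = 2$ and the new constant is non-zero, so the conic is a "hyperboloid" with $p = 1$ (a hyperbola), obtained by a translation followed by an invertible linear rescaling.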
Classification of conics in Cn is obtained in the same way, but
the answer looks simpler, since there are no signs ± in the normal
forms of quadratic forms over C.
Theorem. Every conic in $\mathbb{C}^n$ is equivalent to either the
cylinder over a conic in $\mathbb{C}^{n-1}$, or to one of the three conics:
\[
z_1^2 + \dots + z_n^2 = 1, \qquad z_1^2 + \dots + z_n^2 = 0, \qquad z_n = z_1^2 + \dots + z_{n-1}^2 .
\]
Example. Let $Q$ be a non-degenerate quadratic form with real
coefficients in 3 variables. According to the previous (real) classification theorem, the conic $Q(x_1, x_2, x_3) = 1$ can be transformed by a
real change of coordinates into one of the 4 normal forms shown in
Figure 36. The same real change of coordinates identifies the set of
complex solutions to the equation $Q(z_1, z_2, z_3) = 1$ with that of the
normal form: $\pm z_1^2 \pm z_2^2 \pm z_3^2 = 1$. However, $-z^2$ becomes $z^2$ after the
change $z \mapsto \sqrt{-1}\, z$, which identifies the set of complex solutions with
the complex sphere in $\mathbb{C}^3$, given by the equation $z_1^2 + z_2^2 + z_3^2 = 1$.
Thus, various complex conics equivalent to the complex sphere and
given by equations with real coefficients, “expose” themselves in $\mathbb{R}^3$
by various real forms: real spheres or ellipsoids, hyperboloids of one
or two sheets (as shown in Figure 36), or even remain invisible (when
the set of real points is empty).
Remark. The same holds true in general: various hyperboloids
(as well as cones or paraboloids) of the real classification theorem are
real forms of complex conics defined by the same equations. They
become equivalent when complex changes of coordinates are allowed.
In this sense, the three normal forms of the last theorem represent
hyperboloids, cones and paraboloids of the previous one.
Hermitian and Anti-Hermitian Forms
We introduce here a variant of the notion of a quadratic form which
makes sense in complex vector spaces. It has no direct analogues
over arbitrary fields, but it plays a central role in geometry and
mathematical physics.
Let $V$ be a $\mathbb{C}$-vector space. A function⁶ $P : V \times V \to \mathbb{C}$ is called
a sesquilinear form if it is linear in the second argument, i.e.
\[
P(z, \lambda x + \mu y) = \lambda P(z, x) + \mu P(z, y) \quad \text{for all } \lambda, \mu \in \mathbb{C} \text{ and } x, y, z \in V,
\]
and semilinear (or antilinear) in the first one:
\[
P(\lambda x + \mu y, z) = \bar{\lambda} P(x, z) + \bar{\mu} P(y, z) \quad \text{for all } \lambda, \mu \in \mathbb{C} \text{ and } x, y, z \in V.
\]
⁶ We leave it to the reader to examine the possibility when the function $P$ is
defined on the product $V \times W$ of two different spaces.
When $P$ is sesquilinear, the form $P^*$ defined by
\[
P^*(x, y) := \overline{P(y, x)}
\]
is also sesquilinear, and is called Hermitian adjoint⁷ to $P$. When
$P^* = P$, i.e.
\[
P(y, x) = \overline{P(x, y)} \quad \text{for all } x, y \in V,
\]
the form $P$ is called Hermitian symmetric. When $P^* = -P$, i.e.
\[
P(y, x) = -\overline{P(x, y)} \quad \text{for all } x, y \in V,
\]
the form $P$ is called Hermitian anti-symmetric. When $P$ is Hermitian symmetric, $iP$ is Hermitian anti-symmetric (and vice versa)
since $\bar{i} = -i$. Every sesquilinear form is uniquely written as the sum
of an Hermitian symmetric and an Hermitian anti-symmetric form:
\[
P = H + Q, \qquad H = \frac12 (P + P^*), \qquad Q = \frac12 (P - P^*).
\]
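In terms of coefficient matrices, the Hermitian adjoint is the conjugate transpose, and the decomposition $P = H + Q$ can be checked numerically. The following sketch is only an illustration: the function names `adjoint` and `hermitian_parts` and the sample matrix are invented here, and matrices are plain lists of rows of Python complex numbers.

```python
def adjoint(p):
    """Hermitian adjoint: transpose combined with entry-wise complex conjugation."""
    n = len(p)
    return [[p[j][i].conjugate() for j in range(n)] for i in range(n)]

def hermitian_parts(p):
    """Return (H, Q) with H = (P + P*)/2 Hermitian and Q = (P - P*)/2 anti-Hermitian."""
    ps = adjoint(p)
    n = len(p)
    h = [[(p[i][j] + ps[i][j]) / 2 for j in range(n)] for i in range(n)]
    q = [[(p[i][j] - ps[i][j]) / 2 for j in range(n)] for i in range(n)]
    return h, q

P = [[1 + 2j, 3], [1j, 4 - 1j]]     # an arbitrary sample matrix
H, Q = hermitian_parts(P)
```

Here `H` equals its own adjoint, `Q` equals minus its adjoint, and their entry-wise sum gives back `P`.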
In coordinates, when $x = \sum x_i e_i$ and $y = \sum y_j e_j$, we have
\[
P(x, y) = \sum_{i=1}^{n} \sum_{j=1}^{n} \bar{x}_i p_{ij} y_j, \quad \text{where } p_{ij} = P(e_i, e_j).
\]
The coefficients $p_{ij}$ of a sesquilinear form can be arbitrary complex
numbers, and form a square matrix which we also denote by $P$. The
Hermitian adjoint form has the coefficient matrix, denoted $P^*$, whose
entries are $p^*_{ij} = \overline{P(e_j, e_i)} = \overline{p_{ji}}$. Thus two complex square matrices,
$P$ and $P^*$, are Hermitian adjoint if they are obtained from each
other by transposition and entry-wise complex conjugation:
\[
P^* = \bar{P}^t .
\]
A complex square matrix $P$ is called Hermitian if $P^* = P$, and
anti-Hermitian if $P^* = -P$.
To any Hermitian symmetric sesquilinear form on a $\mathbb{C}$-vector
space $V$, there corresponds an Hermitian quadratic form, or simply Hermitian form, $z \mapsto H(z, z)$. Following our abuse-of-notation
convention, we denote this form by the same letter $H$, i.e. put
\[
H(z) := H(z, z) \quad \text{for all } z \in V.
\]
The values $H(z)$ of an Hermitian form are real: $H(z, z) = \overline{H(z, z)}$.
⁷ After French mathematician Charles Hermite (1822–1901).
In coordinates $z = \sum z_i e_i$, an Hermitian form is written as
\[
H(z) = \sum_{i=1}^{n} \sum_{j=1}^{n} \bar{z}_i h_{ij} z_j ,
\]
where $h_{ij} = \overline{h_{ji}}$. For example, the Hermitian form
\[
|z|^2 = |z_1|^2 + \dots + |z_n|^2
\]
corresponds to the sesquilinear form
\[
\langle x, y \rangle = \bar{x}_1 y_1 + \dots + \bar{x}_n y_n ,
\]
which plays the role of the Hermitian dot product in $\mathbb{C}^n$. Note
that $|z|^2 > 0$ unless $z = 0$. An Hermitian form on a vector space $V$
is called positive definite (or simply positive) if all of its values
outside the origin are positive.
Since $H(z, w) + H(w, z) = 2 \operatorname{Re} H(z, w)$, we have:
\[
\operatorname{Re} H(z, w) = \frac12 \left[ H(z + w) - H(z) - H(w) \right],
\]
\[
\operatorname{Im} H(z, w) = \operatorname{Re} H(iz, w) = \frac12 \left[ H(iz + w) - H(iz) - H(w) \right],
\]
i.e. a Hermitian symmetric sesquilinear form is uniquely determined
by the corresponding Hermitian quadratic form.
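The two polarization formulas can be checked on a concrete Hermitian matrix. The sketch below is illustrative only: the function names and the sample data `h`, `z`, `w` are chosen here, and the entries are Gaussian integers so that the floating-point arithmetic is exact.

```python
def sesquilinear(h, x, y):
    """H(x, y) = sum of conj(x_i) h_ij y_j: antilinear in x, linear in y."""
    return sum(x[i].conjugate() * h[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def quadratic(h, z):
    """Hermitian quadratic form H(z) = H(z, z); its value is real."""
    return sesquilinear(h, z, z).real

def polarize(h, z, w):
    """Recover H(z, w) from values of the quadratic form only."""
    def add(a, b):
        return [s + t for s, t in zip(a, b)]
    iz = [1j * s for s in z]
    re = (quadratic(h, add(z, w)) - quadratic(h, z) - quadratic(h, w)) / 2
    im = (quadratic(h, add(iz, w)) - quadratic(h, iz) - quadratic(h, w)) / 2
    return complex(re, im)

h = [[2, 1 + 1j], [1 - 1j, 3]]      # a sample Hermitian matrix: h_ij = conj(h_ji)
z = [1 + 2j, -1j]
w = [3, 1 - 1j]
recovered = polarize(h, z, w)
```

The value `recovered` coincides with `sesquilinear(h, z, w)`, as the formulas predict.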
Theorem. Every Hermitian form $H$ in $\mathbb{C}^n$ can be transformed by a $\mathbb{C}$-linear change of coordinates to exactly one
of the normal forms
\[
|z_1|^2 + \dots + |z_p|^2 - |z_{p+1}|^2 - \dots - |z_{p+q}|^2, \qquad 0 \le p + q \le n.
\]
Proof. It is the same as in the case of the Inertia Theorem for
real quadratic forms. We pick a vector $f_1$ such that $H(f_1) = \pm 1$,
and consider the subspace $V_1$ consisting of all vectors $H$-orthogonal
to $f_1$. It does not contain $f_1$ (since $H(f_1, f_1) = H(f_1) \neq 0$), and
has codimension 1. We consider the Hermitian form obtained by
restricting $H$ to $V_1$ and proceed the same way, i.e. pick a vector
$f_2 \in V_1$ such that $H(f_2) = \pm 1$, and pass to the subspace $V_2$ consisting
of all vectors of $V_1$ which are $H$-orthogonal to $f_2$. The process stops
when we reach a subspace $V_r$ of codimension $r$ in $\mathbb{C}^n$ such that the
restriction of the form $H$ to $V_r$ vanishes identically. Then we pick
any basis $\{f_{r+1}, \dots, f_n\}$ in $V_r$. The vectors $f_1, \dots, f_n$ form a basis
in $\mathbb{C}^n$ which is $H$-orthogonal (since $H(f_i, f_j) = 0$ for all $i < j$ by
construction), and $H(f_i, f_i) = \pm 1$ (for $i \le r$) or $= 0$ (for $i > r$).
Reordering the vectors $f_1, \dots, f_r$ so that those with the values $+1$
come first, we obtain the required normal form for $H$, where $p + q = r$.
To prove that the normal forms with different pairs of values of
$p$ and $q$ are non-equivalent to each other, we show (the same way
as in the case of real quadratic forms) that the number $p$ ($q$) of
positive (respectively negative) squares in the normal form
is equal to the maximal dimension of a subspace where the
Hermitian form $H$ (respectively $-H$) is positive definite.
To any given Hermitian anti-symmetric sesquilinear form Q, there
corresponds an anti-Hermitian form Q(z) := Q(z, z), which takes
on purely imaginary values, and determines the given sesquilinear
form uniquely. Applying the theorem to the Hermitian form iQ, we
obtain the following result.
Corollary 1. An anti-Hermitian form $Q$ in $\mathbb{C}^n$ can be
transformed by a $\mathbb{C}$-linear change of coordinates to exactly
one of the normal forms
\[
i|z_1|^2 + \dots + i|z_p|^2 - i|z_{p+1}|^2 - \dots - i|z_{p+q}|^2, \qquad 0 \le p + q \le n.
\]
Using matrix notation, we can express the Hermitian dot product
by $x^* y$ (where $x^* = \bar{x}^t$), and respectively the values of an arbitrary
sesquilinear form by $P(x, y) = x^* P y$. Making a linear change of
variables $x = Cx'$, $y = Cy'$, we find $x^* P y = (x')^* P' y'$, where $P'$ is
the coefficient matrix of the same form in the new coordinates. The
value $p'_{ij} = P(Ce_i, Ce_j)$ is the product of the $i$-th row of $C^*$ with $P$
and the $j$-th column of $C$. Thus $P' = C^* P C$. Therefore the previous
results have the following matrix reformulations.
Corollary 2. Any Hermitian (anti-Hermitian) matrix
can be transformed to exactly one of the normal forms
\begin{pmatrix} I_p & 0 & 0 \\ 0 & -I_q & 0 \\ 0 & 0 & 0 \end{pmatrix}, respectively \begin{pmatrix} iI_p & 0 & 0 \\ 0 & -iI_q & 0 \\ 0 & 0 & 0 \end{pmatrix},
by transformations of the form P ↦ C∗P C defined by invertible complex matrices C.
It follows that p + q is equal to the rank of the coefficient matrix
of the (anti-)Hermitian form.
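The transformation rule P′ = C∗P C can be checked numerically on a small example. The following is a minimal sketch, not part of the text: the helper names (`conj_transpose`, `mat_mul`, `form_value`, `change_coordinates`) are made up for illustration, and matrices are assumed to be stored as lists of rows with Gaussian-integer entries so that the arithmetic is exact.

```python
def conj_transpose(m):
    """Return the Hermitian adjoint (conjugate transpose) of a matrix."""
    rows, cols = len(m), len(m[0])
    return [[m[i][j].conjugate() for i in range(rows)] for j in range(cols)]

def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def form_value(p, x, y):
    """Value P(x, y) = x* P y of the sesquilinear form with matrix p."""
    n = len(p)
    return sum(x[i].conjugate() * p[i][j] * y[j]
               for i in range(n) for j in range(n))

def change_coordinates(p, c):
    """Coefficient matrix P' = C* P C of the same form in new coordinates."""
    return mat_mul(mat_mul(conj_transpose(c), p), c)

# A Hermitian matrix P and an invertible matrix C (both chosen arbitrarily).
P = [[2, 1j], [-1j, 3]]
C = [[1, 1], [0, 1j]]
P_new = change_coordinates(P, C)
columns = [[C[0][j], C[1][j]] for j in range(2)]  # the vectors C e_1, C e_2
```

In this sketch each entry of `P_new` coincides with `form_value(P, columns[i], columns[j])`, and `P_new` is again Hermitian, as the corollary requires.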
Sylvester’s Rule
Let H be a Hermitian n × n-matrix. Denote by ∆0 = 1, ∆1 = h11 ,
∆2 = h11 h22 − h12 h21 , . . . , ∆n = det H the minors formed by the
intersection of the first k rows and columns of H, k = 1, 2, . . . , n
(Figure 38). They are called leading minors of the matrix H.
Note that det H = det H^t = det H̄ = \overline{det H} is real, and the same is
true for each ∆k , since it is the determinant of an Hermitian k × k-matrix. The following result is due to the English mathematician
James Sylvester (1814–1897).
[Figure 38: the leading minors ∆1 , ∆2 , ∆3 , . . . , ∆n−1 , ∆n of the matrix H = (hij ), formed by its first k rows and columns.]
Theorem. Suppose that an Hermitian n × n-matrix H has
non-zero leading minors. Then the negative inertia index
of the corresponding Hermitian form is equal to the number
of sign changes in the sequence ∆0 , ∆1 , . . . , ∆n .
Remark. The hypothesis that det H ≠ 0 means that the Hermitian form is non-degenerate, or equivalently, that its kernel is
trivial. In other words, for each non-zero vector x there exists y
such that H(x, y) ≠ 0. Respectively, the assumption that all leading
minors are non-zero means that restrictions of the Hermitian forms
to all spaces of the standard coordinate flag
Span(e1 ) ⊂ Span(e1 , e2 ) ⊂ · · · ⊂ Span(e1 , . . . , ek ) ⊂ . . .
are non-degenerate. The proof of the theorem consists in classifying such Hermitian forms up to linear changes of coordinates that
preserve the flag.
Proof. As before, we inductively construct an H-orthogonal
basis {f1 , . . . , fn } and normalize the vectors so that H(fi ) = ±1, requiring however that each fk ∈ Span(e1 , . . . , ek ). When such vectors
f1 , . . . , fk−1 are already found, the vector fk , H-orthogonal to them,
can be found (by the Rank Theorem) in the k-dimensional space of
the flag, and can be assumed to satisfy H(fk ) = ±1, since the Hermitian form on this space is non-degenerate. Thus, an Hermitian
form non-degenerate on each space of the standard coordinate flag can be transformed to one (and in fact exactly one)
of the 2^n normal forms ±|z1|^2 ± · · · ± |zn|^2 by a linear change
of coordinates preserving the flag.
In matrix form, this means that there exists an invertible upper
triangular matrix C such that D = C ∗ HC is diagonal with all diagonal entries equal to ±1. Note that transformations of the form
H ↦ C∗HC may change the determinant but preserve its sign:
det(C∗HC) = (det C∗)(det H)(det C) = det H · |det C|^2 .
When C is upper triangular, the same holds true for all leading
minors, i.e. each ∆k has the same sign as the leading k × k-minor
of the diagonal matrix D with the diagonal entries d1 , . . . , dn equal
±1. The latter minors form the sequence 1, d1 , d1 d2 , . . . , d1 . . . dk , . . . ,
where the sign changes at exactly those steps k for which dk = −1. Thus the total
number of sign changes is equal to the number of negative squares
in the normal form.
When the form H is positive definite, its restriction to any subspace is positive definite and hence automatically non-degenerate.
We obtain the following corollaries.
Corollary 1. Any positive definite Hermitian form in Cn
can be transformed into |z1|^2 + · · · + |zn|^2 by a linear change
of coordinates preserving a given complete flag.
Corollary 2. A Hermitian form in Cn is positive definite
if and only if all of its leading minors are positive.
Note that the standard basis of C^n is orthonormal with respect
to the Hermitian dot product ⟨x, y⟩ = Σ x̄i yi , i.e. ⟨ei , ej⟩ = 0 for
i ≠ j, and ⟨ei , ei⟩ = 1.
Corollary 3. Every positive definite Hermitian form in C^n has an orthonormal basis {f1 , . . . , fn } such that fk ∈ Span(e1 , . . . , ek ).
Remarks. (1) The process of replacing a given basis {e1 , . . . , en }
with a new basis, orthonormal with respect to a given positive definite Hermitian form and such that each fk is a linear combination of
e1 , . . . , ek , is called Gram–Schmidt orthogonalization.
(2) Results of this subsection hold true for quadratic forms in Rn .
Namely, our reasoning can be easily adjusted to this case. Note also
that every real symmetric matrix is Hermitian.
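Sylvester's rule is easy to try on small real symmetric matrices, which are Hermitian by Remark (2). The sketch below is only an illustration under stated assumptions: the helper names `det`, `leading_minors` and `negative_index` are invented here, entries are assumed to be rational, and exact `Fraction` arithmetic is used so that signs of minors are not affected by rounding.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as a pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def leading_minors(h):
    """Return the list [Delta_0, Delta_1, ..., Delta_n] with Delta_0 = 1."""
    minors = [Fraction(1)]
    for k in range(1, len(h) + 1):
        minors.append(det([row[:k] for row in h[:k]]))
    return minors

def negative_index(h):
    """Number of sign changes in Delta_0, ..., Delta_n (Sylvester's rule)."""
    minors = leading_minors(h)
    if any(d == 0 for d in minors):
        raise ValueError("Sylvester's rule needs non-zero leading minors")
    return sum(1 for a, b in zip(minors, minors[1:]) if a * b < 0)
```

For instance, the form x1^2 + 4x1 x2 + x2^2 has coefficient matrix [[1, 2], [2, 1]] with leading minors 1, 1, −3, so `negative_index` reports one negative square, while a matrix with all leading minors positive gives 0, in agreement with Corollary 2.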
Finite Fields
Consider the case K = Zp , the field of integers modulo a prime
number p ≠ 2. It consists of p elements corresponding to possible
remainders 0, 1, 2, . . . , p − 1 of integers divided by p. It is indeed
a field due to some facts of elementary number theory. Namely,
when an integer a is not divisible by p, it follows from the Euclidean
algorithm, that the greatest common divisor of a and p, which is 1,
can be represented as their linear combination: 1 = ma+np. Modulo
p, this means that m is inverse to a. Thus every non-zero element of
Zp is invertible.
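The argument above is constructive: the Euclidean algorithm produces the coefficients m and n explicitly. A minimal sketch follows, with the function names `extended_gcd` and `inverse_mod` chosen here purely for illustration.

```python
def extended_gcd(a, b):
    """Return (g, m, n) such that g = gcd(a, b) and g == m*a + n*b."""
    old_r, r = a, b
    old_m, m = 1, 0
    old_n, n = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_m, m = m, old_m - q * m
        old_n, n = n, old_n - q * n
    return old_r, old_m, old_n

def inverse_mod(a, p):
    """Inverse of a in Z_p, obtained from 1 = m*a + n*p."""
    g, m, _ = extended_gcd(a % p, p)
    if g != 1:
        raise ValueError("a is divisible by p, so it has no inverse")
    return m % p
```

For example, `extended_gcd(3, 13)` gives 1 = (−4)·3 + 1·13, so the inverse of 3 in Z_13 is −4, i.e. 9.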
As we have seen, in classification of quadratic forms, it is important to know which scalars are complete squares.
Examples. (1) Let Q = ax^2 and Q′ = a′x^2 be two non-zero
quadratic forms on K^1 (i.e. a, a′ ∈ K − {0}). Rescaling x to cx,
where c can be any element from K − {0}, transforms Q into ac^2 x^2 .
Thus, the quadratic forms in K^1 are equivalent if and only if a′ = ac^2
for some non-zero c, i.e. if the ratio a′/a is a complete square in
K − {0}.
(2) When K = C, every element is a complete square, so there is
only one equivalence class of non-zero quadratic forms in C^1 . When
K = R, there are two such classes according to the sign of the coefficient (because non-zero complete squares are exactly the positive reals).
(3) When K = Zp , there are p − 1 non-zero quadratic forms which
are divided into two equivalence classes. One class can be represented
by the normal form x^2 and consists of those quadratic forms whose
coefficient is a complete square, a = c^2 ≠ 0. There are (p − 1)/2 such
forms, i.e. a half of all non-zero ones, since each complete square
a = c^2 has exactly two different square roots: c and −c. Let ε ≠ 0
be any non-square. Then, when c^2 runs over all squares, εc^2 runs over all the
(p − 1)/2 non-squares. Thus Q = εx^2 can be taken for the normal
form in the other equivalence class.
(4) In Z_13 , there are 12 non-zero elements represented by the
integers ±1, . . . , ±6. The squares of 1, . . . , 6 are 1, 4, −4, 3, −1, −3 respectively,
and the non-squares are ±2, ±5, ±6. Any of them (e.g. 2) can be
taken for ε. Thus, every non-zero quadratic form on Z_13^1 is equivalent
to either x^2 or 2x^2 . This example suggests that there may be no
choice of the normal form εx^2 good for all Zp at once.
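The list of squares in Example (4) can be reproduced by direct enumeration. The snippet below is a small sketch; the function name `squares_and_non_squares` is hypothetical, and residues are represented by 1, . . . , p − 1 rather than by ±1, . . . , ±6.

```python
def squares_and_non_squares(p):
    """Split the non-zero elements of Z_p (p an odd prime) into squares and non-squares."""
    squares = sorted({(c * c) % p for c in range(1, p)})
    non_squares = [a for a in range(1, p) if a not in squares]
    return squares, non_squares

squares, non_squares = squares_and_non_squares(13)
# squares     == [1, 3, 4, 9, 10, 12]   i.e. 1, 3, 4, -4, -3, -1
# non_squares == [2, 5, 6, 7, 8, 11]    i.e. 2, 5, 6, -6, -5, -2
```

Each class has (p − 1)/2 = 6 elements, as stated in Example (3).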
Theorem. Every non-zero quadratic form on Z_p^n , p ≠ 2, is
equivalent to exactly one of the forms
x_1^2 + x_2^2 + · · · + x_r^2 , εx_1^2 + x_2^2 + · · · + x_r^2 , 1 ≤ r ≤ n.
Proof. First note that both normal forms have rank r. Since the
rank of the coefficient matrix Q of a quadratic form does not change
under the transformations C^t QC defined by invertible matrices C,
it suffices to prove that a quadratic form of a fixed rank r > 0 is
equivalent to exactly one of the two normal forms.
Next, the symmetric bilinear form corresponding to the quadratic
form Q of rank r has kernel Ker(Q) (non-trivial when r < n) and
defines a non-degenerate symmetric bilinear form on the quotient
space K^n / Ker(Q) of dimension r. Thus, it suffices to prove that a
non-degenerate quadratic form on K^r = Z_p^r is equivalent to exactly
one of the two normal forms.
The normal forms, considered as non-degenerate quadratic forms
on Z_p^r , are not equivalent to each other. Indeed, they have diagonal
coefficient matrices with determinants equal to 1 and ε respectively, of
which the first one is a square in Zp , and the second is not. But for
equivalent non-degenerate quadratic forms, the ratio of the determinants is a complete square: det(C^t QC)/(det Q) = (det C)^2 .
To transform a non-degenerate quadratic form Q on Z_p^r to one of
the normal forms, we can construct a Q-orthogonal basis {f1 , . . . , fr }
and thus reduce Q to the form a1 x_1^2 + · · · + ar x_r^2 . Here ai = Q(fi ) ≠ 0.
We would like to show that a better choice of a basis can be made,
such that Q(fi ) = 1 for all i > 1. Let us begin with the case r = 2.
Lemma. Given non-zero a, b ∈ Zp , there exist (x, y) ∈ Z_p^2
such that ax^2 + by^2 = 1.
Indeed, when each of x and y runs over all p possible values (including
0), each of ax^2 and 1 − by^2 takes on (p − 1)/2 + 1 different values.
Since the total number exceeds p, we must have ax^2 = 1 − by^2 for
some x and y.
Thus, given a non-degenerate quadratic form P = ax^2 + by^2 in Z_p^2 ,
there exists f ∈ Z_p^2 such that P (f ) = 1. Taking a second vector
P -orthogonal to f , we obtain a new basis in which P takes on the
form a′x^2 + b′y^2 with a′ = 1 and b′ ≠ 0.
We can apply this trick r − 1 times to the quadratic form Q =
a1 x_1^2 + · · · + ar x_r^2 using two of the variables at a time, and end up
with the form where ar = ar−1 = · · · = a2 = 1. Finally, rescaling x1
as in Example 3, we can make a1 equal to either 1 or ε.
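The counting argument in the Lemma guarantees a solution without exhibiting it; for a concrete prime one can simply search. The following sketch (with the made-up name `solve_lemma`) looks for a common value of ax^2 and 1 − by^2, exactly as in the proof.

```python
def solve_lemma(a, b, p):
    """Find (x, y) in Z_p^2 with a*x^2 + b*y^2 = 1, for non-zero a, b and odd prime p."""
    # Record every value taken by 1 - b*y^2 together with some y producing it.
    right = {(1 - b * y * y) % p: y for y in range(p)}
    for x in range(p):
        value = (a * x * x) % p
        if value in right:  # a*x^2 == 1 - b*y^2 for the stored y
            return x, right[value]
    return None  # not reached for odd primes p, by the Lemma

x, y = solve_lemma(2, 5, 13)
```

By the Lemma the search never returns `None` when p is an odd prime and a, b are non-zero; the pair found is in general not unique.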
Remark. Readers comfortable with arbitrary finite fields can easily check that our proof and the theorem remain true over any finite
field K ⊃ Zp , p ≠ 2, with any non-square in K taken in the role of ε.
The Case of K = Z2 .
This is a peculiar world where 2 = 0, −1 = 1, and where therefore the
usual one-to-one correspondence between quadratic and symmetric
bilinear forms is broken, and the distinction between symmetric and
anti-symmetric forms lost.
Yet, consider a symmetric bilinear form Q on Z_2^n :
Q(x, y) = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i q_{ij} y_j , where q_{ij} = q_{ji} = 0 or 1 for all i, j.
The corresponding quadratic form Q(x) := Q(x, x) still exists, but
satisfies Q(x + y) = Q(x) + 2Q(x, y) + Q(y) = Q(x) + Q(y) and
hence defines a linear function Z_2^n → Z_2 . This linear function can be
identically zero, i.e. Q(x, x) = 0 for all x, in which case the bilinear
form Q is called even. This happens exactly when all the diagonal entries of the coefficient matrix vanish: qii = Q(ei , ei ) = 0 for
all i. Otherwise the bilinear form Q is called odd. We begin with
classifying even non-degenerate forms. As it is implied by the following theorem, such forms exist only in Z2 -spaces of even dimension
n = 2k.
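The linearity of Q(x) = Q(x, x) over Z_2 can be confirmed by exhaustive checking in small dimensions. The sketch below is illustrative only: the names `bilinear`, `quadratic`, `is_even` and `is_additive` are assumptions of this example, and the form is stored as a symmetric 0/1 matrix.

```python
from itertools import product

def bilinear(q, x, y):
    """Value Q(x, y) of the bilinear form with 0/1 coefficient matrix q, modulo 2."""
    n = len(q)
    return sum(x[i] * q[i][j] * y[j] for i in range(n) for j in range(n)) % 2

def quadratic(q, x):
    """The corresponding quadratic form Q(x) = Q(x, x)."""
    return bilinear(q, x, x)

def is_even(q):
    """A form is even exactly when all diagonal entries q_ii vanish."""
    return all(q[i][i] == 0 for i in range(len(q)))

def is_additive(q):
    """Check Q(x + y) = Q(x) + Q(y) for all x, y in Z_2^n."""
    n = len(q)
    vectors = list(product((0, 1), repeat=n))
    for x in vectors:
        for y in vectors:
            s = tuple((a + b) % 2 for a, b in zip(x, y))
            if quadratic(q, s) != (quadratic(q, x) + quadratic(q, y)) % 2:
                return False
    return True

odd_form = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]   # diagonal entries 1, 0, 1
even_form = [[0, 1], [1, 0]]                    # the form x1 y2 + x2 y1
```

Here `quadratic(odd_form, x)` reduces to x1 + x3 modulo 2, the linear function read off from the diagonal, while `quadratic(even_form, x)` vanishes identically.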
Theorem. Every even non-degenerate symmetric bilinear
form Q on Z_2^n in a suitable coordinate system is given by
the formula:
Q(x, y) = x_1 y_2 + x_2 y_1 + · · · + x_{2k−1} y_{2k} + x_{2k} y_{2k−1} . (i)
Proof. Pick any f1 ≠ 0 and find f2 such that Q(f1 , f2 ) = 1.
Such f2 must exist since Q is non-degenerate. In Span(f1 , f2 ), we
have: Q(x1 f1 + x2 f2 , y1 f1 + y2 f2 ) = x1 y2 + x2 y1 , since Q is even, i.e.
Q(f1 , f1 ) = Q(f2 , f2 ) = 0.
Let V denote the space of all vectors Q-orthogonal to Span(f1 , f2 ).
It is given by two linear equations: Q(f1 , x) = 0, Q(f2 , x) = 0, which
are independent (since x = f1 satisfies the first one but not the
second, and x = f2 the other way around). Therefore codim V =
2. If v ∈ V is Q-orthogonal to all vectors from V, then, being also Q-orthogonal to f1 and f2 , it lies in Ker Q, which is trivial. This shows
that the restriction of the bilinear form Q to V is non-degenerate.
We can continue our construction inductively, i.e. find f3 , f4 ∈ V such
that Q(f3 , f4 ) = 1, take their Q-orthogonal complement in V, and
so on. At the end we obtain a basis f1 , f2 , . . . , f2k−1 , f2k such that
Q(f2i−1 , f2i ) = 1 = Q(f2i , f2i−1 ) for i = 1, . . . , k, and Q(fi , fj ) = 0 for
all other pairs of indices. In the coordinate system corresponding to
this basis, the form Q is given by (i).
Whenever Q is given by the formula (i), let us call the basis a
Darboux basis⁸ of Q.
Consider now the case of odd non-degenerate forms. Let W ⊂ Z_2^n
denote the subspace given by one linear equation Q(x) = 0. It has
dimension n − 1. The restriction to it of the bilinear form Q is
even, but possibly degenerate. Consider vectors y Q-orthogonal to
all vectors from W. They are given by the system of n − 1 linear
equations: Q(w1 , y) = · · · = Q(wn−1 , y) = 0, where w1 , . . . , wn−1
is any basis of W. Since Q is non-degenerate, and wi are linearly
independent, these linear equations are also independent, and hence
the solution space has dimension 1. Let f0 be the non-zero solution
vector, i.e. f0 ≠ 0, and Q(f0 , x) = 0 for all x ∈ W. There are two
cases: f0 ∈ W (Figure 39) and f0 ∉ W (Figure 40).
[Figures 39 and 40: the two cases f0 ∈ W and f0 ∉ W; labels appearing in the figures: f , f0 , W, V.]
In the first case, f0 spans the kernel of the form Q restricted to
W. We pick f such that Q(f , f0 ) = 1. Such f exists (since Q is non-degenerate), but f ∉ W, i.e. Q(f ) = 1. Let V consist of all vectors
of W which are Q-orthogonal to f . It is a subspace of codimension 1
in W, which does not contain f0 . Therefore the restriction of Q to V
is non-degenerate (and even). Let {f1 , . . . , f2k } (with 2k = n − 2) be
a Darboux basis in V. Then f , f0 , f1 , . . . , f2k form a basis in Z_2^n such
that in the corresponding coordinate system:
Q = xy + x y_0 + x_0 y + \sum_{i=1}^{k} (x_{2i−1} y_{2i} + x_{2i} y_{2i−1}). (ii)
In the second case, Q(f0 , f0 ) = 1, so that the restriction of Q to
W is non-degenerate (and even). Let {f1 , . . . , f2k } (with 2k = n − 1)
be a Darboux basis in W. Then f0 , f1 , . . . , f2k form a basis in Z_2^n such
that in the corresponding coordinate system:
Q = x_0 y_0 + \sum_{i=1}^{k} (x_{2i−1} y_{2i} + x_{2i} y_{2i−1}). (iii)
⁸ After the French mathematician Jean-Gaston Darboux (1842–1917).
Corollary. A non-degenerate symmetric bilinear form
in Z_2^n is equivalent to (iii) when n is odd, and to one of the
forms (i) or (ii) when n is even.
The Case of K = Q
All previous problems of this section belong to Linear Algebra, which
is Geometry, and hence are relatively easy. Classification of rational
quadratic forms belongs to Arithmetic and is therefore much harder.
Here we can only hope to whet reader’s appetite for the theory which
is one of the pinnacles of classical Number Theory, and refer to [5]
for a serious introduction.
Of course, every non-degenerate quadratic form Q on Qn has a
Q-orthogonal basis and hence can be written as
Q = a_1 x_1^2 + · · · + a_n x_n^2 ,
where a1 , . . . , an are non-zero rational numbers. Furthermore, by
rescaling xi we can make each ai integer and square free (i.e. expressed as a signed product ±p1 . . . pk of distinct primes). The problem is that such quadratic forms with different sets of coefficients can
sometimes be transformed into each other by transformations mixing
up the variables, and it is not obvious how to determine if this is the
case. A necessary condition is that the inertia indices of equivalent
rational forms must be the same, since such forms are equivalent over
R. However, there are many other requirements.
To describe them, let us start with writing integers and fractions
using the binary number system, e.g.:
2009_(10) = 11111011001_(2) , −1/3 = −.010101 . . ._(2) .
We usually learn in school that every rational (and even real) number
can be represented by binary sequences, which are either finite or
infinite to the right. What we usually don’t learn in school is that
rational numbers can also be represented by binary sequences infinite
to the left. For instance,
−1/3 = 1/(1 − 2^2) = 1 + 2^2 + 2^4 + 2^6 + · · · _(10) = . . . 1010101._(2)
For this, one should postulate that powers 2^k of the base become
smaller (!) as k increases, and moreover: lim 2^k = 0 as k → +∞.
Just as the standard algorithms for the addition and multiplication
of finite binary fractions can be extended to binary fractions infinite
to the right, they can be extended to such fractions infinite to the
left. While the former possibility leads to completing the field Q into
R, the latter one gives rise to another completion, denoted Q(2) . In
fact the same construction can be repeated with any prime base p
each time leading to a different completion, Q(p) , called the field of
p-adic numbers.
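Binary expansions infinite to the left can be generated digit by digit. The sketch below is an illustration only: the function name `two_adic_digits` is made up, and the digit-extraction rule (the last digit of a/b with b odd equals a modulo 2, after which one subtracts it and divides by 2) is a standard procedure not spelled out in the text. It is restricted to fractions with odd denominators.

```python
def two_adic_digits(numerator, denominator, count):
    """Return the last `count` binary digits of numerator/denominator in Q_(2),
    lowest digit first. The denominator is assumed to be odd."""
    if denominator % 2 == 0:
        raise ValueError("this sketch handles odd denominators only")
    digits = []
    a = numerator
    for _ in range(count):
        d = a % 2                      # a/b is congruent to a modulo 2, since b is odd
        digits.append(d)
        a = (a - d * denominator) // 2  # (a/b - d) / 2 = ((a - d*b)/2) / b
    return digits

def as_string(digits):
    """Write the digits in the usual order, highest digit on the left."""
    return "".join(str(d) for d in reversed(digits))

minus_third = two_adic_digits(-1, 3, 7)
```

Here `as_string(minus_third)` is "1010101", matching the expansion of −1/3 above, and for an ordinary positive integer such as 2009 the same routine returns its usual binary digits.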
If two quadratic forms with rational coefficients are equivalent
over Q they must be equivalent not only over R (denoted in this
context by Q(∞) ), but also over Q(p) for each p = 2, 3, 5, 7, 11, . . . .
Classification of quadratic forms over each Q(p) is relatively tame.
For instance, it can be shown that over Q(2) , there are 16 (respectively 15 and 8) equivalence classes of quadratic forms of rank r when
r > 2 (respectively r = 2 and r = 1). However, the classification of
quadratic forms over Q is most concisely described by the following
celebrated theorem.⁹
Theorem. Two quadratic forms with rational coefficients
are equivalent over Q if and only if they are equivalent over
each Q(p) , p = 2, 3, 5, 7, . . . , ∞.
These infinitely many equivalence conditions are not independent. Remarkably, if all but any one of them are satisfied, then the
last one is satisfied too. It follows, for example, that if two rational
quadratic forms are equivalent over every p-adic field, then they are
equivalent over R.
EXERCISES
211. Find orthogonal bases and inertia indices of quadratic forms:
x_1 x_2 + x_2^2 , x_1^2 + 4x_1 x_2 + 6x_2^2 − 12x_2 x_3 + 18x_3^2 , x_1 x_2 + x_2 x_3 + x_3 x_1 .
212. Prove that Q = \sum_{1≤i≤j≤n} x_i x_j is positive definite.
213. A minor of a square matrix formed by rows and columns with the
same indices is called principal. Prove that all principal minors of the
coefficient matrix of a positive definite quadratic form are positive.
214.⋆ Let a1 , . . . , ap and b1 , . . . , bq be linear forms in R^n , and let Q(x) =
a_1^2 (x) + · · · + a_p^2 (x) − b_1^2 (x) − · · · − b_q^2 (x). Prove that the positive and
negative inertia indices of Q do not exceed p and q respectively.
⁹ This is essentially a special case of the Minkowski–Hasse theorem named
after Hermann Minkowski (1864–1909) and Helmut Hasse (1898–1979).
215. Find the place of surfaces x_1 x_2 + x_2 x_3 = ±1 and x_1 x_2 + x_2 x_3 + x_3 x_1 =
±1 in the classification of conics in R^3 .
216. Examine normal forms of hyperboloids in R^4 and find out how many
connected components (“sheets”) each of them has.
217. Find explicitly a C-linear transformation that identifies the sets of
complex solutions to the equations xy = 1 and x^2 + y^2 = 1.
218. Find the rank of the quadratic form z_1^2 + 2i z_1 z_2 − z_2^2 .
219. Define the kernel of an anti-symmetric bilinear form A on a vector
space V as the subspace Ker A := {z ∈ V | A(z, x) = 0 for all x ∈ V}, and
prove that the form descends to the quotient space V/ Ker A.
220. Classify conics in C^2 up to linear inhomogeneous transformations.
221. Find the place of the complex conic z_1^2 − 2i z_1 z_2 − z_2^2 = i z_1 + z_2 in the
classification of conics in C^2 .
222. Classify all conics in C^3 up to linear inhomogeneous transformations.
223. Prove that there are 3n − 1 equivalence classes of conics in C^n .
224. Check that P∗ = P^t if and only if P is real.
225. Show that diagonal entries of an Hermitian matrix are real, and of
anti-Hermitian imaginary.
226. Find all complex matrices which are symmetric and anti-Hermitian
simultaneously.
227.⋆ Prove that a sesquilinear form P of z, w ∈ C^n can be expressed in
terms of its values at z = w, and find such an expression.
228. Define sesquilinear forms P : C^m × C^n → C of pairs of vectors (z, w)
taken from two different spaces, and prove that P (z, w) = ⟨z, P w⟩, where
P is the m × n-matrix of coefficients of the form, and ⟨·, ·⟩ is the Hermitian
dot product in C^m .
229. Prove that under changes of variables v = Dv′ , w = Cw′ the coefficient matrices of sesquilinear forms are transformed as P ↦ D∗P C.
230. Prove that ⟨Az, w⟩ = ⟨z, Bw⟩ for all z ∈ C^m , w ∈ C^n if and only if
A = B∗ . Here ⟨·, ·⟩ denote Hermitian dot products in C^n or C^m .
231. Prove that (AB)∗ = B∗A∗ .
232. Prove that for (anti-)Hermitian matrices A and B, the commutator
matrix AB − BA is (anti-)Hermitian.
233. Find out which of the following forms are Hermitian or anti-Hermitian
and transform them to the appropriate normal forms:
z̄1 z2 − z̄2 z1 , z̄1 z2 + z̄2 z1 , z̄1 z1 + i z̄2 z1 − i z̄1 z2 − z̄2 z2 .
234. Prove that for every symmetric matrix Q all of whose leading minors
are non-zero there exists a unipotent upper triangular matrix C such that
D = C^t QC is diagonal, and express the diagonal entries of D in terms of
the leading minors.
235. Use Sylvester’s rule to find inertia indices of quadratic forms:
x_1^2 + 2x_1 x_2 + 2x_2 x_3 + 2x_1 x_4 , x_1 x_2 − x_2^2 + x_3^2 + 2x_2 x_4 + x_4^2 .
236. Compute determinants and inertia indices of quadratic forms:
x_1^2 − x_1 x_2 + x_2^2 , x_1^2 + x_2^2 + x_3^2 − x_1 x_2 − x_2 x_3 .
237. Prove positivity of the quadratic form \sum_{i=1}^{n} x_i^2 − \sum_{1≤i<j≤n} x_i x_j .
238.⋆ Prove that when the square of a linear form is added to a positive
quadratic form, the determinant of the coefficient matrix increases.
239. In Z_11 , compute multiplicative inverses of all non-zero elements, find
all non-square elements, and find out if any of the quadratic forms x_1 x_2 ,
x_1^2 + x_1 x_2 + 3x_2^2 , 2x_1^2 + x_1 x_2 − 2x_2^2 are equivalent to each other in Z_11^2 .
240. Prove that when p is a prime of the form 4k − 1, then every non-degenerate quadratic form in Z_p^n is equivalent to one of the two normal
forms ±x_1^2 + x_2^2 + · · · + x_n^2 .
241. Prove that in a suitable coordinate system (u, v, w) in the space Z_p^3 of
symmetric 2 × 2-matrices \begin{pmatrix} a & b \\ b & c \end{pmatrix} over Z_p , the determinant ac − b^2 takes
on the form u^2 + v^2 + w^2 .
242. For x ∈ Z_2^n , show that \sum a_i x_i^2 = \sum a_i x_i .
243. Show that on Z_2^2 , there are 4 non-degenerate symmetric bilinear forms,
and find how they are divided into 2 equivalence classes.
244. Let Q be a quadratic form in n variables x1 , . . . , xn over Z_2 , i.e. a
sum of monomials xi xj . Associate to it a function of x, y ∈ Z_2^n given by
the formula: B_Q (x, y) = Q(x + y) + Q(x) + Q(y). Prove that B_Q is an
even bilinear form.
245.⋆ Let Q and B denote vector Z_2 -spaces of quadratic and symmetric
bilinear forms on Z_2^n respectively. Denote by p : Q → B the linear map
Q ↦ B_Q , and by q : B → Q the linear map that associates to a symmetric
bilinear form B the quadratic form Q(x) = B(x, x). Prove that the range
of p consists of all even forms and thus coincides with the kernel of q, and
that the range of q in its turn coincides with the kernel of p, i.e. consists
of all “diagonal” quadratic forms Q = \sum a_i x_i^2 .
246. Show that in Q_(2) , the field of 2-adic numbers, −1 = . . . 11111.
247.⋆ Compute the (unsigned!) binary representation of 1/3 in Q_(2) .
248.⋆ Prove that every non-zero 2-adic number is invertible in Q_(2) .
249.⋆ Prove that a 2-adic unit · · · ∗ ∗ ∗ 1. (where ∗ is a wild card) is a
square in Q_(2) if and only if it has the form · · · ∗ ∗ ∗ 001.
250.⋆ Prove that over the field Q_(2) , there are 8 equivalence classes of
quadratic forms in one variable.
251.⋆ Let K, K^× , (K^×)^2 and V := K^× /(K^×)^2 stand for: any field, the set
of non-zero elements in it, complete squares in K^× , and equivalence classes
of all non-zero elements modulo complete squares. Show that V, equipped
with the operation induced by multiplication in K^× , is a Z_2 -vector space.
Show that when K = C, R, or Q_(2) , dim V = 0, 1 and 3 respectively.
252. Let Q and Q′ be quadratic forms in n variables with integer coefficients. Prove that if these forms can be transformed into each other by
linear changes of variables with coefficients in Z, then det Q = det Q′ .
(Thus, det Q, which is called the discriminant of Q, depends only on
the equivalence class of Q.)
Chapter 4
Eigenvalues
1
The Spectral Theorem
Hermitian Spaces
Given a C-vector space V, an Hermitian inner product in V is
defined as a Hermitian symmetric sesquilinear form such that the
corresponding Hermitian quadratic form is positive definite. A space
V equipped with an Hermitian inner product ⟨·, ·⟩ is called a Hermitian space.¹
The inner square ⟨z, z⟩ is interpreted as the square of the length
|z| of the vector z. Respectively, the distance between two points z
and w in an Hermitian space is defined as |z − w|. Since the Hermitian
inner product is positive, distance is well-defined, symmetric, and
positive (unless z = w). In fact it satisfies the triangle inequality²:
|z − w| ≤ |z| + |w|.
This follows from the Cauchy–Schwarz inequality:
|⟨z, w⟩|^2 ≤ ⟨z, z⟩ ⟨w, w⟩,
where the equality holds if and only if z and w are linearly dependent.
To derive the triangle inequality, write:
|z − w|^2 = ⟨z − w, z − w⟩ = ⟨z, z⟩ − ⟨z, w⟩ − ⟨w, z⟩ + ⟨w, w⟩
≤ |z|^2 + 2|z||w| + |w|^2 = (|z| + |w|)^2 .
¹ Other terms used are unitary space and finite dimensional Hilbert space.
² This makes a Hermitian space a metric space.
To prove the Cauchy–Schwarz inequality, note that it suffices to
consider the case |w| = 1. Indeed, when w = 0, both sides vanish,
and when w ≠ 0, both sides scale the same way when w is normalized
to the unit length. So, assuming |w| = 1, we put λ := ⟨w, z⟩ and
consider the projection λw of the vector z to the line spanned by
w. The difference z − λw is orthogonal to w: ⟨w, z − λw⟩ =
⟨w, z⟩ − λ⟨w, w⟩ = 0. From positivity of inner squares, we have:
0 ≤ ⟨z − λw, z − λw⟩ = ⟨z, z − λw⟩ = ⟨z, z⟩ − λ⟨z, w⟩.
Since ⟨z, w⟩ = \overline{⟨w, z⟩} = λ̄, we have λ⟨z, w⟩ = |⟨z, w⟩|^2 , and conclude that |z|^2 ≥ |⟨z, w⟩|^2 as
required. Notice that the equality holds true only when z = λw.
All Hermitian spaces of the same dimension are isometric (or Hermitian isomorphic), i.e. isomorphic through isomorphisms respecting Hermitian inner products. Namely, as it follows from the Inertia Theorem for Hermitian forms, every Hermitian
space has an orthonormal basis, i.e. a basis e1 , . . . , en such that
⟨ei , ej⟩ = 0 for i ≠ j and = 1 for i = j. In the coordinate system
corresponding to an orthonormal basis, the Hermitian inner product
takes on the standard form:
⟨z, w⟩ = z̄1 w1 + · · · + z̄n wn .
An orthonormal basis is not unique. Moreover, as it follows from
the proof of Sylvester’s rule, one can start with any basis f1 , . . . , fn in
V and then construct an orthonormal basis e1 , . . . , en such that ek ∈
Span(f1 , . . . , fk ). This is done inductively; namely, when e1 , . . . , ek−1
have already been constructed, one subtracts from fk its projection
to the space Span(e1 , . . . , ek−1 ):
f̃k = fk − ⟨e1 , fk⟩e1 − · · · − ⟨ek−1 , fk⟩ek−1 .
The resulting vector f̃k lies in Span(f1 , . . . , fk−1 , fk ) and is orthogonal
to Span(f1 , . . . , fk−1 ) = Span(e1 , . . . , ek−1 ). Indeed,
⟨ei , f̃k⟩ = ⟨ei , fk⟩ − \sum_{j=1}^{k−1} ⟨ej , fk⟩⟨ei , ej⟩ = 0
for all i = 1, . . . , k − 1. To construct ek , one normalizes f̃k to the unit
length:
ek := f̃k / |f̃k |.
The above algorithm of replacing a given basis with an orthonormal
one is known as Gram–Schmidt orthogonalization.
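The inductive procedure just described translates almost line by line into code. The following is a minimal sketch rather than a robust numerical routine: the names `inner` and `gram_schmidt` are chosen here for illustration, vectors are plain lists of complex numbers, and floating-point arithmetic with a small tolerance is assumed.

```python
import math

def inner(z, w):
    """Hermitian dot product <z, w> = conj(z_1) w_1 + ... + conj(z_n) w_n."""
    return sum(zi.conjugate() * wi for zi, wi in zip(z, w))

def gram_schmidt(basis):
    """Replace a basis f_1, ..., f_n with an orthonormal one e_1, ..., e_n
    such that e_k lies in Span(f_1, ..., f_k)."""
    ortho = []
    for f in basis:
        # Subtract from f its projection to Span(e_1, ..., e_{k-1}).
        residual = list(f)
        for e in ortho:
            coeff = inner(e, f)
            residual = [r - coeff * ei for r, ei in zip(residual, e)]
        length = math.sqrt(inner(residual, residual).real)
        if length < 1e-12:
            raise ValueError("the input vectors are linearly dependent")
        # Normalize the residual to the unit length.
        ortho.append([r / length for r in residual])
    return ortho

f = [[1, 1j, 0], [1, 0, 1], [0, 1, 1]]
e = gram_schmidt(f)
```

Up to rounding, `inner(e[i], e[j])` is 1 for i = j and 0 otherwise, and `e[0]` is `f[0]` divided by its length.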
Normal Operators
Let A : V → W be a linear map between two Hermitian spaces. Its
Hermitian adjoint (or simply adjoint) is defined as the linear map
A∗ : W → V characterized by the property
⟨A∗w, v⟩ = ⟨w, Av⟩ for all v ∈ V, w ∈ W.
One way to look at this construction is to realize that an Hermitian
inner product on a vector space U allows one to assign to each vector
z ∈ U a C-linear function ⟨z, ·⟩ (i.e. the function U → C whose value
at w ∈ U is equal to ⟨z, w⟩). Moreover, since the inner product is
non-degenerate, each C-linear function on U is represented this way
by a unique vector. Thus, given w ∈ W, one introduces a C-linear
function on V: v ↦ ⟨w, Av⟩, and defines A∗w to be the unique
vector in V that represents this linear function.
Examining the defining identity of the adjoint map in coordinate
systems in V and W corresponding to orthonormal bases, we conclude
that the matrix of A∗ in such bases is the Hermitian adjoint to A,
i.e. A∗ = Ā^t . Indeed, in matrix product notation,
⟨w, Av⟩ = w∗Av = (A∗w)∗v = ⟨A∗w, v⟩.
Note that (A∗)∗ = A, and that ⟨Av, w⟩ = ⟨v, A∗w⟩ for all v ∈ V and
w ∈ W.
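The defining identity of the adjoint can be tested on a concrete rectangular matrix. This is a small sketch under assumptions made here: orthonormal (standard) bases in V = C^3 and W = C^2, Gaussian-integer entries so that the comparison is exact, and hypothetical helper names `inner`, `mat_vec` and `adjoint`.

```python
def inner(z, w):
    """Hermitian dot product, conjugate-linear in the first argument."""
    return sum(zi.conjugate() * wi for zi, wi in zip(z, w))

def mat_vec(a, v):
    """Apply the matrix a (a list of rows) to the vector v."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def adjoint(a):
    """Conjugate transpose of a rectangular matrix."""
    rows, cols = len(a), len(a[0])
    return [[a[i][j].conjugate() for i in range(rows)] for j in range(cols)]

A = [[1 + 2j, 3, -1j],
     [0, 2 - 1j, 4]]          # a map from C^3 to C^2
v = [1, 1j, 2]                # a vector in V = C^3
w = [1 - 1j, 2]               # a vector in W = C^2

left = inner(mat_vec(adjoint(A), w), v)    # <A* w, v>
right = inner(w, mat_vec(A, v))            # <w, A v>
```

The two numbers `left` and `right` coincide, and applying `adjoint` twice returns the original matrix.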
Consider now linear maps A : V → V from an Hermitian space
to itself. To such a map, one can associate a sesquilinear form:
A(z, w) := ⟨z, Aw⟩. Vice versa, every sesquilinear form corresponds
this way to a unique linear transformation. (It is especially obvious
when the inner product is written in the standard form ⟨z, w⟩ = z∗w
using an orthonormal basis.) In particular, one can define Hermitian
(A∗ = A) and anti-Hermitian (A∗ = −A) linear maps which correspond to Hermitian and anti-Hermitian forms respectively.
A linear map A : V → V on a Hermitian vector space is called
normal³ if it commutes with its Hermitian adjoint: A∗A = AA∗ .
Example 1. Hermitian and anti-Hermitian transformations are
normal.
Example 2. An invertible linear transformation U : V → V is
called unitary if it preserves inner products:
⟨U z, U w⟩ = ⟨z, w⟩ for all z, w ∈ V.
Equivalently, ⟨z, (U∗U − I)w⟩ = 0 for all z, w ∈ V. Taking z =
(U∗U − I)w, we conclude that (U∗U − I)w = 0 for all w ∈ V,
and hence U∗U = I. Thus, for a unitary map U , U^{−1} = U∗ . The
converse statement is also true (and easy to check by starting from
U^{−1} = U∗ and reversing our computation). Since every invertible
transformation commutes with its own inverse, we conclude that unitary transformations are normal.
³ The term normal operator is frequently in use.
Example 3. Every linear transformation A : V → V can be
uniquely written as the sum A = B + C of Hermitian (B = (A +
A∗)/2) and anti-Hermitian (C = (A − A∗)/2) operators. We
claim that an operator is normal if and only if its Hermitian and
anti-Hermitian parts commute. Indeed, A∗ = B − C, AA∗ = B^2 −
BC + CB − C^2 , A∗A = B^2 + BC − CB − C^2 , and hence AA∗ = A∗A
if and only if BC = CB.
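The criterion of Example 3 can be tried on small matrices. The sketch below uses helper names invented for this illustration (`adjoint`, `mat_mul`, `hermitian_parts`, `is_normal`, `parts_commute`) and assumes entries that are small integers or Gaussian integers, so that halving and multiplying stay exact in floating point.

```python
def adjoint(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[a[i][j].conjugate() for i in range(n)] for j in range(n)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def hermitian_parts(a):
    """Return (B, C) with B = (A + A*)/2 Hermitian and C = (A - A*)/2 anti-Hermitian."""
    s = adjoint(a)
    n = len(a)
    b = [[(a[i][j] + s[i][j]) / 2 for j in range(n)] for i in range(n)]
    c = [[(a[i][j] - s[i][j]) / 2 for j in range(n)] for i in range(n)]
    return b, c

def is_normal(a):
    return mat_mul(a, adjoint(a)) == mat_mul(adjoint(a), a)

def parts_commute(a):
    b, c = hermitian_parts(a)
    return mat_mul(b, c) == mat_mul(c, b)

rotation = [[0, -1], [1, 0]]      # unitary, hence normal
shear = [[1, 1], [0, 1]]          # not normal
mixed = [[1, 1j], [1j, 1]]        # normal: A A* = A* A = 2I
```

For each of these matrices `is_normal` and `parts_commute` return the same answer: true for `rotation` and `mixed`, false for `shear`.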
The Spectral Theorem for Normal Operators
Let A : V → V be a linear transformation, v ∈ V a vector, and
λ ∈ C a scalar. The vector v is called an eigenvector of A with
the eigenvalue λ, if v ≠ 0, and Av = λv. In other words, A
preserves the line spanned by the vector v and acts on this line as
the multiplication by λ.
Theorem. A linear transformation A : V → V on a finite
dimensional Hermitian vector space is normal if and only
if V has an orthonormal basis of eigenvectors of A.
Proof. In one direction, the statement is almost obvious: If a
basis consists of eigenvectors of A, then the matrix of A in this basis
is diagonal. When the basis is orthonormal, the matrix of the adjoint
operator A∗ in this basis is adjoint to the matrix of A and is also
diagonal. Since all diagonal matrices commute, we conclude that A
is normal. Thus, it remains to prove that, conversely, every normal
operator has an orthonormal basis of eigenvectors. We will prove
this in four steps.
Step 1. Existence of eigenvalues. We need to show that there
exists a scalar λ ∈ C such that the system of linear equations Ax =
λx has a non-trivial solution. Equivalently, this means that the linear
transformation λI − A has a non-trivial kernel. Since V is finite
dimensional, this can be re-stated in terms of the determinant of the
matrix of A (in any basis) as
det(λI − A) = 0.
This relation, understood as an equation for λ, is called the characteristic equation of the operator A. When A = 0, it becomes
λ^n = 0, where n = dim V. In general, it is a degree-n polynomial
equation
λ^n + p1 λ^{n−1} + · · · + pn−1 λ + pn = 0,
where the coefficients p1 , . . . , pn are certain algebraic expressions of
matrix entries of A (and hence are complex numbers). According to
the Fundamental Theorem of Algebra, this equation has a complex
solution, say λ0 . Then det(λ0 I − A) = 0, and hence the system
(λ0 I − A)x = 0 has a non-trivial solution, v ≠ 0, which is therefore
an eigenvector of A with the eigenvalue λ0 .
Remark. Solutions to the system Ax = λ0 x form a linear subspace W in V, namely the kernel of λ0 I − A, and eigenvectors of A
with the eigenvalue λ0 are exactly all non-zero vectors in W. Slightly
abusing terminology, W is called the eigenspace of A corresponding to the eigenvalue λ0 . Obviously, A(W) ⊂ W. Subspaces with
such property are called A-invariant. Thus eigenspaces of a linear
transformation A are A-invariant.
Step 2. A∗ -invariance of eigenspaces of A. Let W ≠ {0} be the
eigenspace of a normal operator A corresponding to the eigenvalue
λ. Then for every w ∈ W,
A(A∗ w) = A∗ (Aw) = A∗ (λw) = λ(A∗ w).
Therefore A∗ w ∈ W, i.e. the eigenspace W is A∗ -invariant.
Step 3. Invariance of orthogonal complements. Let W ⊂ V be
a linear subspace. Denote by W ⊥ the orthogonal complement of
the subspace W with respect to the Hermitian inner product:
W ⊥ := {v ∈ V | ⟨w, v⟩ = 0 for all w ∈ W}.
Note that if e1 , . . . , ek is a basis in W, then W ⊥ is given by k linear
equations ⟨ei , v⟩ = 0, i = 1, . . . , k, and thus has dimension ≥ n − k.
On the other hand, W ∩ W ⊥ = {0}, because no vector w ≠ 0
can be orthogonal to itself: ⟨w, w⟩ > 0. It follows from dimension
counting formulas that dim W ⊥ = n − k. Moreover, this implies that
V = W ⊕ W ⊥ , i.e. the whole space is represented as the direct sum
of two orthogonal subspaces.
We claim that if a subspace is both A- and A∗ -invariant, then its
orthogonal complement is also A- and A∗ -invariant. Indeed, suppose
that A∗ (W) ⊂ W, and v ∈ W ⊥ . Then for any w ∈ W, we have:
⟨w, Av⟩ = ⟨A∗ w, v⟩ = 0, since A∗ w ∈ W. Therefore Av ∈ W ⊥ , i.e.
W ⊥ is A-invariant. By the same token, if W is A-invariant, then W ⊥
is A∗ -invariant.
Step 4. Induction on dim V. When dim V = 1, the theorem is
obvious. Assume that the theorem is proved for normal operators in
spaces of dimension < n, and prove it when dim V = n.
According to Step 1, a normal operator A has an eigenvalue λ.
Let W ≠ {0} be the corresponding eigenspace. If W = V, then
the operator is scalar, A = λI, and any orthonormal basis in V will
consist of eigenvectors of A. If W ≠ V, then (by Steps 2 and 3)
both W and W ⊥ are A- and A∗ -invariant and have dimensions < n.
The restrictions of the operators A and A∗ to each of these subspaces still satisfy AA∗ = A∗ A and ⟨A∗ x, y⟩ = ⟨x, Ay⟩ for all x, y.
Therefore these restrictions remain normal operators on W and W ⊥ , adjoint to each other. Applying the induction hypothesis, we can
find orthonormal bases of eigenvectors of A in each of W and W ⊥ . The
union of these bases forms an orthonormal basis of eigenvectors of A
in V = W ⊕ W ⊥ .
Remark. Note that Step 1 is based on the Fundamental Theorem of Algebra, but does not use normality of A and applies to
any C-linear transformation. Furthermore, Step 2 actually applies
to any commuting transformations and shows that if AB = BA then
eigenspaces of A are B-invariant. The fact that B = A∗ is used in
Step 3.
Corollary 1. A normal operator has a diagonal matrix
in a suitable orthonormal basis.
Corollary 2. Let A : V → V be a normal operator, λi distinct roots of its characteristic polynomial, mi their multiplicities,
and Wi corresponding eigenspaces. Then dim Wi = mi , and
Σ dim Wi = dim V.
Indeed, this is true for transformations defined by any diagonal
matrices. For normal operators, in addition Wi ⊥ Wj when i ≠ j.
In particular we have the following corollary.
Corollary 3. Eigenvectors of a normal operator corresponding to different eigenvalues are orthogonal.
Here is a matrix version of the Spectral Theorem.
Corollary 4. A square complex matrix A commuting
with its adjoint matrix A∗ can be transformed to a diagonal form by transformations A ↦ U ∗ AU defined by unitary
matrices U .
Note that for unitary matrices, U ∗ = U⁻¹ , and therefore the
above transformations coincide with similarity transformations A ↦
U⁻¹ AU . This is how the matrix A of a linear transformation changes
under a change of the basis. When both the old and new bases are
orthonormal, the transition matrix U must be unitary (because in
old and new coordinates the Hermitian inner product has the same
standard form: ⟨x, y⟩ = x∗ y). The result follows.
Unitary Transformations
Note that if λ is an eigenvalue of a unitary operator U then |λ| =
1. Indeed, if x ≠ 0 is a corresponding eigenvector, then ⟨x, x⟩ =
⟨U x, U x⟩ = λλ̄⟨x, x⟩, and since ⟨x, x⟩ ≠ 0, it implies λλ̄ = 1.
Corollary 5. A transformation is unitary if and only if
in a suitable orthonormal basis its matrix is diagonal, and
the diagonal entries are complex numbers of the absolute
value 1.
On the complex line C, multiplication by λ with |λ| = 1 and
arg λ = θ defines the rotation through the angle θ. We will call
this transformation on the complex line a unitary rotation. We
arrive therefore at the following geometric characterization of unitary
transformations.
Corollary 6. Unitary transformations in an Hermitian
space of dimension n are exactly unitary rotations (through
possibly different angles) in n mutually perpendicular complex directions.
Orthogonal Diagonalization
Corollary 7. A linear operator is Hermitian (respectively
anti-Hermitian) if and only if in a suitable orthonormal
basis its matrix is diagonal with all real (respectively imaginary) diagonal entries.
Indeed, if Ax = λx and A∗ = ±A, we have:
λ⟨x, x⟩ = ⟨x, Ax⟩ = ⟨A∗ x, x⟩ = ±λ̄⟨x, x⟩.
Therefore λ = ±λ̄ provided that x ≠ 0, i.e. eigenvalues of a Hermitian operator are real and of anti-Hermitian imaginary. Vice versa,
a real diagonal matrix is obviously Hermitian, and imaginary anti-Hermitian.
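The reality of eigenvalues of Hermitian operators, and the purely imaginary nature of eigenvalues of anti-Hermitian ones, can be observed on a small example. The following is a minimal sketch: the matrices H and K and the helper `eigenvalues_2x2` are hypothetical illustrations, not objects from the text.

```python
import cmath

def eigenvalues_2x2(m):
    """Both roots of det(lam*I - M) = lam^2 - (tr M)*lam + det M for a 2x2 matrix."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

# Hypothetical sample matrices.
H = [[2, 1 - 1j], [1 + 1j, 3]]                    # H* = H (Hermitian)
K = [[1j * entry for entry in row] for row in H]  # K = iH, so K* = -K

print(eigenvalues_2x2(H))  # imaginary parts vanish
print(eigenvalues_2x2(K))  # real parts vanish
```

Multiplying a Hermitian matrix by i gives an anti-Hermitian one, so the two spectra differ exactly by the factor i.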
Recall that (anti-)Hermitian operators correspond to (anti-)Hermitian forms A(x, y) := ⟨x, Ay⟩. Applying the Spectral Theorem
and reordering the basis eigenvectors in the monotonic order of the
corresponding eigenvalues, we obtain the following classification results for forms.
Corollary 8. In a Hermitian space of dimension n, an
Hermitian form can be transformed by unitary changes of
coordinates to exactly one of the normal forms
$$\lambda_1 |z_1|^2 + \cdots + \lambda_n |z_n|^2, \qquad \lambda_1 \ge \cdots \ge \lambda_n.$$
Corollary 9. In a Hermitian space of dimension n, an
anti-Hermitian form can be transformed by unitary changes
of coordinates to exactly one of the normal forms
$$i\omega_1 |z_1|^2 + \cdots + i\omega_n |z_n|^2, \qquad \omega_1 \ge \cdots \ge \omega_n.$$
Uniqueness follows from the fact that eigenvalues and dimensions
of eigenspaces are determined by the operators in a coordinate-less
fashion.
Corollary 10. In a complex vector space of dimension n,
a pair of Hermitian forms, of which the first one is positive
definite, can be transformed by a choice of a coordinate
system to exactly one of the normal forms:
$$|z_1|^2 + \cdots + |z_n|^2, \qquad \lambda_1 |z_1|^2 + \cdots + \lambda_n |z_n|^2, \qquad \lambda_1 \ge \cdots \ge \lambda_n.$$
This is the Orthogonal Diagonalization Theorem for Hermitian forms. It is proved in two stages. First, applying the Inertia
Theorem to the positive definite form one transforms it to the standard form; the 2nd Hermitian form changes accordingly but remains
arbitrary at this stage. Then, applying Corollary 8 of the Spectral
Theorem, one transforms the 2nd Hermitian form to its normal form
by transformations preserving the 1st one.
Note that one can take the positive definite sesquilinear form
corresponding to the 1st Hermitian form for the Hermitian inner
product, and describe the 2nd form as ⟨z, Az⟩, where A is an operator Hermitian with respect to this inner product. The operator, its
eigenvalues, and their multiplicities are thus defined by the given pair
of forms in a coordinate-less fashion. This guarantees that pairs with
different collections λ1 ≥ · · · ≥ λn of eigenvalues are non-equivalent to
each other.
Singular Value Decomposition
Theorem. Let A : V → W be a linear map of rank r between
Hermitian spaces of dimensions n and m respectively. Then
there exist orthonormal bases: v1 , . . . , vn of V and w1 , . . . , wm
of W, and positive reals µ1 ≥ · · · ≥ µr , such that
$$Av_1 = \mu_1 w_1, \ \dots, \ Av_r = \mu_r w_r, \quad Av_{r+1} = \cdots = Av_n = 0.$$
Proof. For x, y ∈ V, put⁴
H(x, y) := ⟨Ax, Ay⟩ = ⟨x, A∗ Ay⟩.
This is an Hermitian sesquilinear form on V (thus obtained by pulling
the Hermitian inner product on W back to V by means of A). It is
non-negative (i.e. H(x, x) ≥ 0 for all x ∈ V), and corresponds
to the Hermitian operator H := A∗ A. The eigenspace of H corresponding to the eigenvalue 0 coincides with the kernel of A. Indeed,
Hx = 0 implies |Ax| = 0, i.e. Ax = 0.
Applying the Spectral Theorem to H, we obtain an orthonormal
basis of V consisting of eigenvectors v1 , . . . , vr of H with positive
eigenvalues, λ1 ≥ · · · ≥ λr > 0, and eigenvectors vr+1 , . . . , vn lying
in the kernel of A. For i ≤ r, put $w_i = \lambda_i^{-1/2} A v_i$. We have:
$$\langle w_i, w_j\rangle = \frac{1}{\sqrt{\lambda_i \lambda_j}} \langle Av_i, Av_j\rangle = \frac{1}{\sqrt{\lambda_i \lambda_j}} \langle v_i, A^*Av_j\rangle = \sqrt{\frac{\lambda_j}{\lambda_i}}\, \langle v_i, v_j\rangle,$$
which is equal to 1 when i = j and 0 when i ≠ j. Thus w1 , . . . , wr
form an orthonormal basis in the range of A. Completing it to an
orthonormal basis of W, we obtain the required result with $\mu_i = \sqrt{\lambda_i}$.
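The construction in the proof can be followed step by step in the smallest non-trivial case. The sketch below assumes a real invertible 2 × 2 matrix (so that r = n = m = 2 and A∗ = A^t); the function name `svd_2x2` and the sample matrix are hypothetical, and the closed-form eigenvalue formula for a symmetric 2 × 2 matrix replaces the general Spectral Theorem.

```python
import math

def svd_2x2(a):
    """Follow the proof for an invertible real 2x2 matrix A (r = n = m = 2).

    Returns (mus, vs, ws) with A v_i = mu_i w_i, mu_1 >= mu_2 > 0, and both
    v_1, v_2 and w_1, w_2 orthonormal.
    """
    (a11, a12), (a21, a22) = a
    # H = A^t A, the matrix of the form H(x, y) = <Ax, Ay>.
    h11 = a11 * a11 + a21 * a21
    h12 = a11 * a12 + a21 * a22
    h22 = a12 * a12 + a22 * a22
    # Eigenvalues lam_1 >= lam_2 of the symmetric 2x2 matrix H.
    mean, gap = (h11 + h22) / 2, math.hypot((h11 - h22) / 2, h12)
    lams = (mean + gap, mean - gap)
    # An eigenvector of H for lam_1 (the better conditioned of two formulas).
    x, y = (lams[0] - h22, h12) if h11 >= h22 else (h12, lams[0] - h11)
    if x == 0 and y == 0:          # H is scalar: any orthonormal basis will do
        x, y = 1.0, 0.0
    norm = math.hypot(x, y)
    v1 = (x / norm, y / norm)
    v2 = (-v1[1], v1[0])           # eigenvector for lam_2, perpendicular to v1
    mus = tuple(math.sqrt(lam) for lam in lams)
    # w_i = lam_i^(-1/2) A v_i
    ws = tuple(((a11 * v[0] + a12 * v[1]) / mu, (a21 * v[0] + a22 * v[1]) / mu)
               for v, mu in zip((v1, v2), mus))
    return mus, (v1, v2), ws

mus, vs, ws = svd_2x2([[3, 0], [4, 5]])   # hypothetical sample matrix
print(mus)  # approximately (sqrt(45), sqrt(5))
print(ws)
```

For this sample matrix A^t A has eigenvalues 45 and 5, so the singular values are √45 and √5, and their product equals |det A| = 15.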
Remark. When Av = µw and A∗ w = µv for unit vectors v and
w, they are called right and left singular vectors of A corresponding to the singular value µ. The next reformulation provides the
singular value decomposition of a matrix.
Corollary. For every complex m × n-matrix A of rank
r there exist: unitary matrices U and V of sizes m and n
respectively, and a diagonal r × r-matrix M with positive
diagonal entries, such that
$$A = U^* \begin{bmatrix} M & 0 \\ 0 & 0 \end{bmatrix} V.$$
⁴ Note that the 1st inner product is in W while the 2nd one is in V.
Complexification
Since R ⊂ C, every complex vector space can be considered as a
real vector space simply by “forgetting” that one can multiply by
non-real scalars. This operation is called realification; applied to a
C-vector space V, it produces an R-vector space, denoted V^R , of real
dimension twice the complex dimension of V.
In the reverse direction, to a real vector space V one can associate
a complex vector space, V^C , called the complexification of V. As
a real vector space, it is the direct sum of two copies of V:
V^C := {(x, y) | x, y ∈ V}.
Thus the addition is performed componentwise, while the multiplication by complex scalars α + iβ is introduced with the thought in
mind that (x, y) stands for x + iy:
(α + iβ)(x, y) := (αx − βy, βx + αy).
This results in a C-vector space V^C whose complex dimension equals
the real dimension of V.
Example. (R^n)^C = C^n = {x + iy | x, y ∈ R^n }.
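The multiplication rule for pairs can be compared with ordinary complex arithmetic in the case V = R^n. This is only a sketch: the helpers `scalar_mul` and `as_complex` and the sample pair are hypothetical names introduced for illustration.

```python
def scalar_mul(alpha, beta, pair):
    """(alpha + i*beta)(x, y) := (alpha*x - beta*y, beta*x + alpha*y) for x, y in R^n."""
    x, y = pair
    return ([alpha * xk - beta * yk for xk, yk in zip(x, y)],
            [beta * xk + alpha * yk for xk, yk in zip(x, y)])

def as_complex(pair):
    """Read the pair (x, y) as the vector x + iy in C^n."""
    x, y = pair
    return [complex(xk, yk) for xk, yk in zip(x, y)]

z = ([1, 2], [3, -1])                         # stands for (1 + 3i, 2 - i)
lhs = as_complex(scalar_mul(2, 5, z))         # multiply the pair by 2 + 5i
rhs = [(2 + 5j) * c for c in as_complex(z)]   # multiply in C^n directly
print(lhs, rhs)
```

Both computations give the same vector, and applying multiplication by i twice returns (−x, −y), as multiplication by i² = −1 should.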
A productive point of view on complexification is that it is a complex vector space with an additional structure that “remembers” that
the space was constructed from a real one. This additional structure
is the operation of complex conjugation (x, y) ↦ (x, −y). The
operation in itself is a map σ : V^C → V^C , satisfying σ^2 = id, which
is anti-linear over C. The latter means that σ(λz) = λ̄σ(z) for
all λ ∈ C and all z ∈ V^C . In other words, σ is R-linear, but anti-commutes with multiplication by i: σ(iz) = −iσ(z).
Conversely, let W be a complex vector space equipped with an
anti-linear operator whose square is the identity⁵:
σ : W → W, σ^2 = id, σ(λz) = λ̄σ(z) for all λ ∈ C, z ∈ W.
Let V denote the real subspace in W that consists of all σ-invariant
vectors. We claim that W is canonically identified with the
complexification of V: W = V^C . Indeed, every vector z ∈ W
is uniquely written as the sum of σ-invariant and σ-anti-invariant
vectors:
$$z = \frac{1}{2}(z + \sigma z) + \frac{1}{2}(z - \sigma z).$$
⁵ A transformation whose square is the identity is called an involution.
Since σi = −iσ, multiplication by i transforms σ-invariant vectors to
σ-anti-invariant ones, and vice versa. Thus, W as a real space is the
direct sum V ⊕ (iV) = {x + iy | x, y ∈ V}, where multiplication by
i acts in the fashion required for the complexification: i(x + iy) =
−y + ix.
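For W = C^n with the usual complex conjugation, the splitting into σ-invariant and σ-anti-invariant parts is the familiar decomposition into real and imaginary parts. A minimal sketch, with hypothetical helper names `sigma` and `split` and a hypothetical sample vector:

```python
def sigma(z):
    """Complex conjugation on C^n: an anti-linear involution."""
    return [c.conjugate() for c in z]

def split(z):
    """Return (p, q) with z = p + q, sigma(p) = p and sigma(q) = -q."""
    s = sigma(z)
    p = [(a + b) / 2 for a, b in zip(z, s)]
    q = [(a - b) / 2 for a, b in zip(z, s)]
    return p, q

z = [1 + 2j, 3 - 4j]
p, q = split(z)
print(p, q)   # p is a real vector, q is i times a real vector
```

Here p lies in V = R^n and q in iV, which is the direct sum decomposition W = V ⊕ (iV) described above.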
The construction of complexification and its abstract description
in terms of the complex conjugation operator σ are the tools that
allow one to carry over results about complex vector spaces to real
vector spaces. The idea is to consider real objects as complex ones
invariant under the complex conjugation σ, and apply (or improve)
theorems of complex linear algebra in a way that would respect σ.
Example. A real matrix can be considered as a complex one. This
way an R-linear map defines a C-linear map (on the complexified
space). More abstractly, given an R-linear map A : V → V, one
can associate to it a C-linear map A^C : V^C → V^C by A^C (x, y) :=
(Ax, Ay). This map is real in the sense that it commutes with the
complex conjugation: A^C σ = σA^C .
Vice versa, let B : V^C → V^C be a C-linear map that commutes
with σ: σ(Bz) = Bσ(z) for all z ∈ V^C . When σ(z) = ±z, we find
σ(Bz) = ±Bz, i.e. the subspaces V and iV of real and imaginary
vectors are B-invariant. Moreover, since B is C-linear, we find that
for x, y ∈ V, B(x + iy) = Bx + iBy. Thus B = A^C where the linear
operator A : V → V is obtained by restricting B to V.
Euclidean Spaces
Let V be a real vector space. A Euclidean inner product (or Euclidean structure) on V is defined as a positive definite symmetric
bilinear form ⟨·, ·⟩. A real vector space equipped with a Euclidean inner product is called a Euclidean space. A Euclidean inner product
allows one to talk about distances between points and angles between
directions:
$$|x - y| = \sqrt{\langle x - y, x - y\rangle}, \qquad \cos\theta(x, y) := \frac{\langle x, y\rangle}{|x|\,|y|}.$$
It follows from the Inertia Theorem that every finite dimensional Euclidean vector space has an orthonormal basis. In
coordinates corresponding to an orthonormal basis e1 , . . . , en the inner product is given by the standard formula:
$$\langle x, y\rangle = \sum_{i,j=1}^{n} x_i y_j \langle e_i, e_j\rangle = x_1 y_1 + \cdots + x_n y_n.$$
Thus, every Euclidean space V of dimension n can be identified with
the coordinate Euclidean space R^n by an isomorphism R^n → V
respecting inner products. Such an isomorphism is not unique, but
can be composed with any invertible linear transformation U : V → V
preserving the Euclidean structure:
⟨U x, U y⟩ = ⟨x, y⟩ for all x, y ∈ V.
Such transformations are called orthogonal.
A Euclidean structure on a vector space V allows one to identify
the space with its dual V ∗ by the rule that to a vector v ∈ V assigns
the linear function on V whose value at a point x ∈ V is equal to the
inner product ⟨v, x⟩. Respectively, given a linear map A : V → W
between Euclidean spaces, the adjoint map A^t : W ∗ → V ∗ can be
considered as a map between the spaces themselves: A^t : W → V.
The defining property of the adjoint map reads:
⟨A^t w, v⟩ = ⟨w, Av⟩ for all v ∈ V and w ∈ W.
Consequently matrices of adjoint maps A and A^t with respect to
orthonormal bases of the Euclidean spaces V and W are transposed
to each other.
As in the case of Hermitian spaces, one easily derives that a linear
transformation U : V → V is orthogonal if and only if U⁻¹ = U^t .
In the matrix form, the relation U^t U = I means that columns of U
form an orthonormal set in the coordinate Euclidean space.
Our nearest goal is to obtain real analogues of the Spectral Theorem
and its corollaries. One way to do it is to combine corresponding
complex results with complexification. Let V be a Euclidean space.
We extend the inner product to the complexification V^C in such a
way that it becomes an Hermitian inner product. Namely, for all
x, y, x′ , y′ ∈ V, put
⟨x + iy, x′ + iy′⟩ = ⟨x, x′⟩ + ⟨y, y′⟩ + i⟨x, y′⟩ − i⟨y, x′⟩.
It is straightforward to check that this form on V^C is sesquilinear and
Hermitian symmetric. It is also positive definite since ⟨x + iy, x +
iy⟩ = |x|^2 + |y|^2 . Note that changing the signs of y and y′ preserves
the real part and reverses the imaginary part of the form. In other
words, for all z, w ∈ V^C , we have:
$$\langle \sigma(z), \sigma(w)\rangle = \overline{\langle z, w\rangle} \ (= \langle w, z\rangle).$$
This identity expresses the fact that the Hermitian structure of V^C
came from a Euclidean structure on V. When A : V^C → V^C is a real
operator, i.e. σAσ = A, the Hermitian adjoint operator A∗ is also
real.⁶ Indeed, since σ^2 = id, we find that for all z, w ∈ V^C
⟨σA∗ σz, w⟩ = ⟨σw, A∗ σz⟩ = ⟨Aσw, σz⟩ = ⟨σAw, σz⟩ = ⟨z, Aw⟩,
i.e. σA∗ σ = A∗ . In particular, complexifications of orthogonal
(U⁻¹ = U^t ), symmetric (A^t = A), anti-symmetric (A^t = −A),
normal (A^t A = AA^t ) operators in a Euclidean space are respectively unitary, Hermitian, anti-Hermitian, normal operators on the
complexified space, commuting with the complex conjugation.
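The extension of the Euclidean inner product to the complexification, and its behavior under complex conjugation, can be checked on V = R^n with vectors of V^C stored as pairs (x, y). The sketch below uses hypothetical helper names `dot`, `herm`, `sigma` and hypothetical sample vectors.

```python
def dot(x, y):
    """Standard Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def herm(z, w):
    """<x + iy, x' + iy'> = <x,x'> + <y,y'> + i<x,y'> - i<y,x'> for z = (x, y), w = (x', y')."""
    (x, y), (xp, yp) = z, w
    return complex(dot(x, xp) + dot(y, yp), dot(x, yp) - dot(y, xp))

def sigma(z):
    """Complex conjugation (x, y) -> (x, -y)."""
    x, y = z
    return (x, [-c for c in y])

z = ([1, 2], [3, -1])    # 1 + 3i, 2 - i
w = ([0, 1], [2, 5])     # 2i, 1 + 5i
print(herm(z, w), herm(w, z), herm(sigma(z), sigma(w)), herm(z, z))
```

The first value agrees with the coordinate formula Σ z̄k wk, the next two are its complex conjugate, and ⟨z, z⟩ = |x|^2 + |y|^2 is real and positive.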
The Real Spectral Theorem
Theorem. Let V be a Euclidean space, and A : V → V a normal
operator. Then in the complexification V^C , there exists an
orthonormal basis of eigenvectors of A^C which is invariant
under complex conjugation and such that the eigenvalues
corresponding to conjugated eigenvectors are conjugated.
Proof. Applying the complex Spectral Theorem to the normal
operator B = A^C , we obtain a decomposition of the complexified
space V^C into a direct orthogonal sum of eigenspaces W1 , . . . , Wr
of B corresponding to distinct complex eigenvalues λ1 , . . . , λr . Note
that if v is an eigenvector of B with an eigenvalue µ, then Bσv =
σBv = σ(µv) = µ̄σv, i.e. σv is an eigenvector of B with the conjugate eigenvalue µ̄. This shows that if λi is a non-real eigenvalue,
then its conjugate λ̄i is also one of the eigenvalues of B (say, λj ), and
the corresponding eigenspaces are conjugated: σ(Wi ) = Wj . By the
same token, if λk is real, then σ(Wk ) = Wk . This last equality means
that Wk itself is the complexification of a real space, namely of the
σ-invariant part of Wk . It coincides with the space Ker(λk I −A) ⊂ V
of real eigenvectors of A with the eigenvalue λk . Thus, to construct
a required orthonormal basis, we take: for each real eigenspace Wk ,
a Euclidean orthonormal basis in the corresponding real eigenspace,
and for each pair Wi , Wj of complex conjugate eigenspaces, an Hermitian orthonormal basis {fα } in Wi and the conjugate basis {σ(fα )}
in Wj = σ(Wi ). The vectors of all these bases altogether form an
orthonormal basis of V^C satisfying our requirements.
⁶ This is obvious in the matrix form: In a real orthonormal basis of V (which is
a complex orthonormal basis of V^C ) A has a real matrix, so that A∗ = A^t . We use
here a “hard way” to illustrate how various aspects of σ-invariance fit together.
Example 1. Identify C with the Euclidean plane R^2 in the usual
way, and consider the operator (x + iy) ↦ (α + iβ)(x + iy) of multiplication by a given complex number α + iβ. In the basis 1, i, it has
the matrix
$$A = \begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix}.$$
Since A^t represents multiplication by α − iβ, it commutes with A.
Therefore A is normal. It is straightforward to check that
$$z = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -i \end{bmatrix} \quad\text{and}\quad \bar z = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ i \end{bmatrix}$$
are complex eigenvectors of A with the eigenvalues α + iβ and α − iβ
respectively, and form an Hermitian orthonormal basis in (R^2)^C .
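The check called straightforward in Example 1 can also be delegated to a few lines of code. In this sketch the values of α and β and the helper names `apply`, `herm`, `is_eigenpair` are hypothetical choices made for illustration.

```python
import math

alpha, beta = 3.0, 4.0                  # hypothetical sample values
A = [[alpha, -beta], [beta, alpha]]
s = 1 / math.sqrt(2)
z = [s, -1j * s]                        # expected eigenvalue alpha + i*beta
zbar = [s, 1j * s]                      # expected eigenvalue alpha - i*beta

def apply(m, v):
    """Matrix-vector product."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def herm(u, v):
    """Standard Hermitian inner product on C^2, anti-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def is_eigenpair(m, lam, v, tol=1e-12):
    """True if M v = lam v up to the tolerance."""
    return all(abs(x - lam * y) < tol for x, y in zip(apply(m, v), v))

print(is_eigenpair(A, complex(alpha, beta), z),
      is_eigenpair(A, complex(alpha, -beta), zbar),
      abs(herm(z, zbar)), herm(z, z))
```

The basis z, z̄ is invariant under complex conjugation, and the conjugated eigenvectors carry conjugated eigenvalues, in agreement with the Real Spectral Theorem.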
Example 2. If A is a linear transformation in R^n , and λ0 is a
non-real root of its characteristic polynomial det(λI − A), then the
system of linear equations Az = λ0 z has non-trivial solutions, which
cannot be real though. Let z = u + iv be a complex eigenvector of A
with the eigenvalue λ0 = α + iβ. Then σz = u − iv is an eigenvector
of A with the eigenvalue λ̄0 = α − iβ. Since λ0 ≠ λ̄0 , the vectors z
and σz are linearly independent over C, and hence the real vectors
u and v must be linearly independent over R. Consider the plane
Span(u, v) ⊂ R^n . Since
A(u − iv) = (α − iβ)(u − iv) = (αu − βv) − i(βu + αv),
we conclude that A preserves this plane and in the basis u, −v in it
(note the sign change!) acts by the matrix
$$\begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix}.$$
If we assume
in addition that A is normal (with respect to the standard Euclidean
structure in R^n ), then the eigenvectors z and σz must be Hermitian
orthogonal, i.e.
⟨u − iv, u + iv⟩ = ⟨u, u⟩ − ⟨v, v⟩ + 2i⟨u, v⟩ = 0.
We conclude that ⟨u, v⟩ = 0 and |u|^2 − |v|^2 = 0, i.e. u and v are
orthogonal and have the same length. Normalizing the length to 1,
we obtain an orthonormal basis of the A-invariant plane, in which
the transformation A acts as in Example 1. The geometry of this
transformation is known to us from studying geometry of complex
numbers: It is the composition of the rotation through the angle
arg(λ0 ) with the expansion by the factor |λ0 |. We will call such a
transformation of the Euclidean plane a complex multiplication
or multiplication by a complex scalar, λ0 .
Corollary 1. Given a normal operator on a Euclidean
space, the space can be represented as a direct orthogonal
sum of invariant lines and planes, on each of which the
transformation acts as multiplication by a real or complex
scalar respectively.
Corollary 2. A transformation in a Euclidean space is
orthogonal if and only if the space can be represented as
the direct orthogonal sum of invariant lines and planes on
each of which the transformation acts as multiplication by
±1 and rotation respectively.
Corollary 3. In a Euclidean space, every symmetric operator has an orthonormal basis of eigenvectors.
Corollary 4. Every quadratic form in a Euclidean space
of dimension n can be transformed by an orthogonal change
of coordinates to exactly one of the normal forms:
$$\lambda_1 x_1^2 + \cdots + \lambda_n x_n^2, \qquad \lambda_1 \ge \cdots \ge \lambda_n.$$
Corollary 5. In a Euclidean space of dimension n, every
anti-symmetric bilinear form can be transformed by an orthogonal change of coordinates to exactly one of the normal
forms
$$\langle x, y\rangle = \sum_{i=1}^{r} \omega_i (x_{2i-1} y_{2i} - x_{2i} y_{2i-1}), \qquad \omega_1 \ge \cdots \ge \omega_r > 0, \quad 2r \le n.$$
Corollary 6. Every real normal matrix A can be written
in the form A = U ∗ M U where U is an orthogonal matrix,
and M is a block-diagonal matrix with each block of size 1, or
of size 2 of the form
$$\begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix}, \quad\text{where } \alpha^2 + \beta^2 \ne 0.$$
If A is symmetric, then only blocks of size 1 are present
(i.e. M is diagonal).
If A is anti-symmetric, then blocks of size 1 are zero,
and of size 2 are of the form
$$\begin{bmatrix} 0 & -\omega \\ \omega & 0 \end{bmatrix}, \quad\text{where } \omega > 0.$$
If A is orthogonal, then all blocks of size 1 are equal
to ±1, and blocks of size 2 have the form
$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \quad\text{where } 0 < \theta < \pi.$$
Courant–Fisher’s Minimax Principle
One of the consequences (equivalent to Corollary 4) of the Spectral
Theorem is that a pair (Q, S) of quadratic forms in Rn of which the
first one is positive definite can be transformed by a linear change of
coordinates to the normal form:
$$Q = x_1^2 + \cdots + x_n^2, \qquad S = \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2, \qquad \lambda_1 \ge \cdots \ge \lambda_n.$$
The eigenvalues λ1 ≥ · · · ≥ λn form the spectrum of the pair (Q, S).
The following result gives a coordinate-less, geometric description of
the spectrum.
Theorem. The k-th greatest spectral number is given by
$$\lambda_k = \max_{W:\ \dim W = k}\ \min_{x \in W - 0} \frac{S(x)}{Q(x)},$$
where the maximum is taken over all k-dimensional subspaces W ⊂ R^n , and minimum over all non-zero vectors in
the subspace.
Proof. When W is given by the equations xk+1 = · · · = xn = 0,
the minimal ratio S(x)/Q(x), achieved on vectors proportional to
ek , is equal to λk because
$$\lambda_1 x_1^2 + \cdots + \lambda_k x_k^2 \ge \lambda_k (x_1^2 + \cdots + x_k^2) \quad\text{when } \lambda_1 \ge \cdots \ge \lambda_k.$$
Therefore it suffices to prove that for every other k-dimensional subspace
W the minimal ratio cannot be greater than λk . For this, denote
by V the subspace of dimension n − k + 1 given by the equations
x1 = · · · = xk−1 = 0. Since λk ≥ · · · ≥ λn , we have:
$$\lambda_k x_k^2 + \cdots + \lambda_n x_n^2 \le \lambda_k (x_k^2 + \cdots + x_n^2),$$
i.e. for all non-zero vectors x in V the ratio S(x)/Q(x) ≤ λk . Now
we invoke the dimension counting argument: dim W + dim V = k +
(n − k + 1) = n + 1 > dim R^n , and conclude that W has a non-trivial
intersection with V. Let x be a non-zero vector in W ∩ V. Then
S(x)/Q(x) ≤ λk , and hence the minimum of the ratio S/Q on W − 0
cannot exceed λk .
Applying the Theorem to the pair (Q, −S) we obtain yet another
characterization of the spectrum (the k-th greatest spectral number of −S is −λn−k+1 , which is why subspaces of dimension n − k + 1 occur):
$$\lambda_k = \min_{W:\ \dim W = n-k+1}\ \max_{x \in W - 0} \frac{S(x)}{Q(x)}.$$
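The maximin formula of the Theorem can be probed numerically for k = 2 in R^3. The sketch below is an experiment, not a proof: the form S (given by the hypothetical constant `LAMS`), the helper names, and the random sampling of planes are all assumptions made for illustration.

```python
import math
import random

LAMS = (3.0, 2.0, 1.0)   # hypothetical S = 3*x1^2 + 2*x2^2 + x3^2, with Q = |x|^2

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def s_form(x, y):
    """Symmetric bilinear form whose quadratic form is S."""
    return sum(lam * a * b for lam, a, b in zip(LAMS, x, y))

def min_ratio_on_plane(u, v):
    """Minimum of S(x)/Q(x) over non-zero x in the plane spanned by u and v."""
    # Gram-Schmidt: an orthonormal basis e1, e2 of the plane.
    nu = math.sqrt(dot(u, u))
    e1 = [a / nu for a in u]
    t = dot(e1, v)
    r = [b - t * a for a, b in zip(e1, v)]
    nr = math.sqrt(dot(r, r))
    e2 = [a / nr for a in r]
    # S restricted to the plane is a binary quadratic form; its minimum on
    # the unit circle is the smaller eigenvalue of the 2x2 matrix [[a, b], [b, d]].
    a, b, d = s_form(e1, e1), s_form(e1, e2), s_form(e2, e2)
    return (a + d) / 2 - math.hypot((a - d) / 2, b)

rng = random.Random(0)
best = min_ratio_on_plane([1, 0, 0], [0, 1, 0])    # the plane x3 = 0
samples = [min_ratio_on_plane([rng.gauss(0, 1) for _ in range(3)],
                              [rng.gauss(0, 1) for _ in range(3)])
           for _ in range(1000)]
print(best, max(samples))   # no sampled plane beats the coordinate plane
```

On the coordinate plane x3 = 0 the minimal ratio equals λ2 = 2, and every sampled plane gives a minimal ratio not exceeding it, as the dimension counting argument predicts.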
Formulating some applications, we assume that the space Rn is
Euclidean, and refer to the spectrum of the pair (Q, S) where Q =
|x|2 , simply as the spectrum of S.
Corollary 1. When a quadratic form increases, its spectral numbers do not decrease: If S ≤ S ′ then λk ≤ λ′k for all
k = 1, . . . , n.
Proof. Indeed, since S/Q ≤ S ′ /Q, the minimum of the ratio
S/Q on every k-dimensional subspace W cannot exceed that of S ′ /Q,
which in particular remains true for that W on which the maximum
of S/Q equal to λk is achieved.
The following result is called Cauchy’s interlacing theorem.
Corollary 2. Let λ1 ≥ · · · ≥ λn be the spectrum of a
quadratic form S, and λ′1 ≥ · · · ≥ λ′n−1 be the spectrum of
the quadratic form S ′ obtained by restricting S to a given
hyperplane R^{n−1} ⊂ R^n . Then:
λ1 ≥ λ′1 ≥ λ2 ≥ λ′2 ≥ · · · ≥ λn−1 ≥ λ′n−1 ≥ λn .
Proof. Maximum over all k-dimensional subspaces W cannot be
smaller than maximum (of the same quantities) over subspaces lying
inside the hyperplane. This proves that λk ≥ λ′k . Applying the same
argument to −S and subspaces of dimension n − k, we conclude
that −λk+1 ≥ −λ′k .
An ellipsoid in a Euclidean space is defined as the level-1 set
E = {x | S(x) = 1} of a positive definite quadratic form, S. It follows
from the Spectral Theorem that every ellipsoid can be transformed
by an orthogonal transformation to principal axes: a normal form
$$\frac{x_1^2}{\alpha_1^2} + \cdots + \frac{x_n^2}{\alpha_n^2} = 1, \qquad 0 < \alpha_1 \le \cdots \le \alpha_n.$$
The vectors x = ±αk ek lie on the ellipsoid, and their lengths αk
are called semiaxes of E. They are related to the spectral numbers
λ1 ≥ · · · ≥ λn > 0 of the quadratic form by $\alpha_k^{-1} = \sqrt{\lambda_k}$. From
Corollaries 1 and 2 respectively, we obtain:
If one ellipsoid is enclosed by another, the semiaxes of the inner
ellipsoid do not exceed corresponding semiaxes of the outer:
If E ′ ⊂ E, then α′k ≤ αk for all k = 1, . . . , n.
Semiaxes of a given ellipsoid are interlaced by semiaxes of any
section of it by a hyperplane passing through the center:
If E ′ = E ∩ R^{n−1} , then αk ≤ α′k ≤ αk+1 for k = 1, . . . , n − 1.
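In the plane, the interlacing of semiaxes says that every chord of an ellipse through its center has half-length between the two semiaxes. A minimal numerical sketch, with hypothetical coefficients a, b, d and hypothetical helper names `spectrum` and `restricted`:

```python
import math

def spectrum(a, b, d):
    """Spectral numbers lam1 >= lam2 of S = a*x1^2 + 2*b*x1*x2 + d*x2^2 in R^2."""
    mean, gap = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mean + gap, mean - gap

def restricted(a, b, d, theta):
    """Spectral number of S restricted to the line spanned by (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return a * c * c + 2 * b * c * s + d * s * s

a, b, d = 2.0, 1.0, 3.0                       # positive definite: a > 0, a*d - b*b > 0
lam1, lam2 = spectrum(a, b, d)
alpha1, alpha2 = lam1 ** -0.5, lam2 ** -0.5   # semiaxes of S = 1, alpha1 <= alpha2
thetas = [k * math.pi / 180 for k in range(180)]
sections = [restricted(a, b, d, t) ** -0.5 for t in thetas]  # semiaxis of each section
print(alpha1, min(sections), max(sections), alpha2)
```

Each line through the origin plays the role of the hyperplane R^{n−1} with n = 2, so the printed values illustrate α1 ≤ α′1 ≤ α2.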
EXERCISES
253. Prove that if two vectors u and v in an Hermitian space are orthogonal, then |u|^2 + |v|^2 = |u − v|^2 . Is the converse true?
254. Prove that for any vectors u, v in an Hermitian space,
|u + v|^2 + |u − v|^2 = 2|u|^2 + 2|v|^2 .
Find a geometric interpretation of this fact.
255. Apply Gram–Schmidt orthogonalization to the basis f1 = e1 + 2ie2 +
2ie3 , f2 = e1 + 2ie2 , f3 = e1 in the coordinate Hermitian space C^3 .
256. Apply Gram–Schmidt orthogonalization to the standard basis e1 , e2
of C^2 to construct an orthonormal basis of the Hermitian inner product
⟨z, w⟩ = z̄1 w1 + 2z̄1 w2 + 2z̄2 w1 + 5z̄2 w2 .
257. Let v ∈ V be a vector in an Hermitian space, and e1 , . . . , ek an orthonormal basis in a subspace W. Prove that u = Σ ⟨ei , v⟩ei is the point of W
closest to v, and that v − u is orthogonal to W. (The point u ∈ W is called
the orthogonal projection of v to W.)
258.⋆ Let f1 , . . . , fN be a finite sequence of vectors in an Hermitian space.
The Hermitian N × N -matrix ⟨fi , fj⟩ is called the Gram matrix of the
sequence. Show that two finite sequences of vectors are isometric, i.e. obtained from each other by a unitary transformation, if and only if their
Gram matrices are the same.
259. Prove that ⟨A, B⟩ := tr(A∗ B) defines an Hermitian inner product on
the space Hom(C^n , C^m ) of m × n-matrices.
260. Let A1 , . . . , Ak : V → W be linear maps between Hermitian spaces.
Prove that if $\sum A_i^* A_i = 0$, then A1 = · · · = Ak = 0.
261. Let A : V → W be a linear map between Hermitian spaces. Show
that B := A∗ A and C := AA∗ are Hermitian, and that the corresponding
Hermitian forms B(x, x) := ⟨x, Bx⟩ in V and C(y, y) := ⟨y, Cy⟩ in W are
non-negative. Under what hypothesis about A is the 1st of them positive?
the 2nd one? are they both positive?
262. Let W ⊂ V be a subspace in an Hermitian space, and let P : V → V
be the map that to each vector v ∈ V assigns its orthogonal projection to
W. Prove that P is an Hermitian operator, that P^2 = P , and that
Ker P = W ⊥ . (It is called the orthogonal projector to W.)
263. Prove that an n × n-matrix is unitary if and only if its rows (or
columns) form an orthonormal basis in the coordinate Hermitian space C^n .
264. Prove that the determinant of a unitary matrix is a complex number
of absolute value 1.
265. Prove that the Cayley transform: C ↦ (I − C)/(I + C), well-defined
for linear transformations C such that I + C is invertible, transforms unitary
operators into anti-Hermitian and vice versa. Compute the square of the
Cayley transform.
266. Prove that the commutator AB − BA of anti-Hermitian operators A
and B is anti-Hermitian.
267. Give an example of a normal 2 × 2-matrix which is not Hermitian,
anti-Hermitian, or unitary.
268. Prove that for any n × n-matrix A and any complex numbers α, β of
absolute value 1, the matrix αA + βA∗ is normal.
269. Prove that A : V → V is normal if and only if |Ax| = |A∗ x| for all
x ∈ V.
270. Prove that the characteristic polynomial det(λI − A) of a square matrix A does not change under similarity transformations A ↦ C⁻¹ AC and
thus depends only on the linear operator defined by the matrix.
271. Show that if λ^n + p1 λ^{n−1} + · · · + pn is the characteristic polynomial
of a matrix A, then pn = (−1)^n det A, and p1 = − tr A.
272. Prove that all roots of characteristic polynomials of Hermitian matrices are real.
273. Find eigenspaces and eigenvalues of an orthogonal projector to a subspace W ⊂ V in an Hermitian space.
274. Prove that every Hermitian operator P satisfying P^2 = P is an orthogonal projector. Does this remain true if P is not Hermitian?
275. Prove directly, i.e. not referring to the Spectral Theorem, that every
Hermitian operator has an orthonormal basis of eigenvectors.
276. Prove that if A and B are normal and AB = 0, then BA = 0.
277.⋆ Let A be a normal operator. Prove that the set of complex numbers
{⟨x, Ax⟩ | |x| = 1} is a convex polygon whose vertices are eigenvalues of
A.
278. Prove that two (or several) commuting normal operators have a common orthonormal basis of eigenvectors.
279. Prove that if A is normal and AB = BA, then AB ∗ = B ∗ A, A∗ B =
BA∗ , and A∗ B ∗ = B ∗ A∗ .
280. Prove that if (λ − λ1 ) · · · (λ − λn ) is the characteristic polynomial of
a normal operator A, then Σ |λi |^2 = tr(A∗ A).
281. Classify up to linear changes of coordinates pairs (S, A) of forms,
where S is positive definite Hermitian, and A anti-Hermitian.
282. An Hermitian operator S is called positive (written: S ≥ 0) if
⟨x, Sx⟩ ≥ 0 for all x. Prove that for every positive operator S there is
a unique positive square root (denoted by √S), i.e. a positive operator
whose square is S.
283. Using Singular Value Decomposition theorem with m = n, prove that
every linear transformation A of an Hermitian space has a polar decomposition A = SU , where S is positive, and U is unitary.
284. Prove that the polar
√ decomposition A = SU is unique when A is
invertible; namely S = AA∗ , and U = S −1 A. What are polar decompositions of non-zero 1 × 1-matrices?
285. Describe the complexification of Cn considered as a real vector space.
286. Let σ be the complex conjugation operator on Cn . Consider Cn as a
real vector space. Show that σ is symmetric and orthogonal.
287. Prove that every orthogonal transformation in R3 is either a rotation
through an angle θ, 0 ≤ θ ≤ π, about some axis, or the composition of such
a rotation with the reflection about the plane perpendicular to the axis.
288. Find an orthonormal basis in Cn in which the transformation defined
by the cyclic permutation of coordinates: (z1 , z2 , . . . , zn ) 7→ (z2 , . . . , zn , z1 )
is diagonal.
289. In the coordinate Euclidean space Rn with n ≤ 4, find real and complex normal forms of orthogonal transformations defined by various permutations of coordinates.
290. Transform to normal forms by orthogonal transformations:
(a) x1 x2 + x3 x4 , (b) 2x21 − 4x1 x2 + x22 − 4x2 x3 ,
(c) 5x21 + 6x22 + 4x23 − 4x1 x2 − 8x1 x3 − 4x2 x3 .
291. In Euclidean spaces, classify all operators which are both orthogonal
and anti-symmetric.
292. Let U and V be two subspaces of dimension 2 in the Euclidean 4space. Consider the map T : V → V defined as the composition: V ⊂ R4 →
U ⊂ R4 → V, where the arrows are the orthogonal projections to U and
V respectively. Prove that T is positive, and that its eigenvalues have the
form cos φ, cos ψ where φ, ψ are certain angles, 0 ≤ φ, ψ ≤ π/2.
293. Solve Gelfand’s problem: In the Euclidean 4-space, classify pairs
of planes passing through the origin up to orthogonal transformations of
the space.
294. Prove that every ellipsoid in Rn has pairwise perpendicular hyperplanes of bilateral symmetry.
295. Prove that every ellipsoid in R4 has a plane section which is a circle.
Does this remain true for ellipsoids in R3 ?
296. Formulate and prove counterparts of Courant–Fisher’s minimax principle and Cauchy’s interlacing theorem for Hermitian forms.
297. Prove that semiaxes $\alpha_1 \le \alpha_2 \le \dots$ of an ellipsoid in Rn and semiaxes
$\alpha'_1 \le \alpha'_2 \le \dots$ of its section by a linear subspace of codimension k are
related by the inequalities: $\alpha_i \le \alpha'_i \le \alpha_{i+k}$, i = 1, . . . , n − k.
2 Jordan Canonical Forms
Characteristic Polynomials and Root Spaces
Let V be a finite dimensional K-vector space. We do not assume that
V is equipped with any structure in addition to the structure of a K-vector space. In this section, we study the geometry of linear operators
on V. In other words, we study the problem of classification of linear
operators A : V → V up to similarity transformations $A \mapsto C^{-1}AC$,
where C stands for arbitrary invertible linear transformations of V.
Let n = dim V, and let A be the matrix of a linear operator with
respect to some basis of V. Recall that
\[ \det(\lambda I - A) = \lambda^n + p_1 \lambda^{n-1} + \cdots + p_{n-1} \lambda + p_n \]
is called the characteristic polynomial of A. In fact it does not
depend on the choice of a basis. Indeed, under a change x = Cx′ of
coordinates, the matrix of a linear operator $x \mapsto Ax$ is transformed
into the matrix $C^{-1}AC$ similar to A. We have:
\[ \det(\lambda I - C^{-1}AC) = \det[C^{-1}(\lambda I - A)C] = (\det C^{-1}) \det(\lambda I - A) (\det C) = \det(\lambda I - A). \]
Therefore, the characteristic polynomial of a linear operator is well-defined (by the geometry of A). In particular, coefficients of the
characteristic polynomial do not change under similarity
transformations.
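The invariance of the coefficients can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `matmul` and `char_poly`, the sample matrices, and the use of the Faddeev-LeVerrier recursion to obtain the coefficients are all choices made for this illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients [1, p1, ..., pn] of det(lambda*I - A), computed exactly
    by the Faddeev-LeVerrier recursion (a technique chosen for this sketch)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / k   # next coefficient p_k
        coeffs.append(c)
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]
    return coeffs

# Hypothetical sample data: A, an invertible C, and its inverse.
A = [[2, 1, 0], [0, 2, 0], [1, 0, 3]]
C = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
C_inv = [[1, -1, 1], [0, 1, -1], [0, 0, 1]]

similar = matmul(matmul(C_inv, A), C)          # the matrix C^{-1} A C
print(char_poly(A) == char_poly(similar))      # the coefficients agree
```

Exact rational arithmetic is used so that the comparison of coefficients is not affected by rounding.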
Let λ0 ∈ K be a root of the characteristic polynomial. Then
det(λ0 I − A) = 0, and hence the system of homogeneous linear equations Ax = λ0 x has a non-trivial solution, x ≠ 0. As before, we
call any such solution an eigenvector of A, and call λ0 the corresponding eigenvalue. All solutions to Ax = λ0 x (including
x = 0) form a linear subspace in V, called the eigenspace of A
corresponding to the eigenvalue λ0 .
Let us change slightly our point of view on the eigenspace. It
is the null space of the operator A − λ0 I. Consider powers of this
operator and their null spaces. If $(A - \lambda_0 I)^k x = 0$ for some k > 0,
then $(A - \lambda_0 I)^l x = 0$ for all l ≥ k. Thus the null spaces are
nested:
\[ \operatorname{Ker}(A - \lambda_0 I) \subset \operatorname{Ker}(A - \lambda_0 I)^2 \subset \cdots \subset \operatorname{Ker}(A - \lambda_0 I)^k \subset \cdots \]
On the other hand, since dim V < ∞, nested subspaces must stabilize, i.e. starting from some m > 0, we have:
\[ W_{\lambda_0} := \operatorname{Ker}(A - \lambda_0 I)^m = \operatorname{Ker}(A - \lambda_0 I)^{m+1} = \cdots \]
We call the subspace $W_{\lambda_0}$ a root space of the operator A, corresponding to the root λ0 of the characteristic polynomial.
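The stabilization of the nested null spaces can be observed on a concrete matrix. The sketch below is an illustration only: the functions `rank` and `kernel_dims` and the sample matrix are assumptions of this example, and the dimensions are computed as n minus the rank, with the rank found by Gaussian elimination over the rationals.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(M):
    """Rank of a matrix, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def kernel_dims(A, lam):
    """List of dim Ker (A - lam*I)^k for k = 1, ..., n."""
    n = len(A)
    B = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    P, dims = B, []
    for _ in range(n):
        dims.append(n - rank(P))
        P = matmul(P, B)
    return dims

# Hypothetical sample operator with roots 2 (multiplicity 3) and 5.
A = [[2, 1, 0, 0],
     [0, 2, 1, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 5]]
print(kernel_dims(A, 2))   # dimensions grow, then stabilize at dim W_2
print(kernel_dims(A, 5))   # already stable at the first power
```

In this sample the dimensions for the root 2 are 1, 2, 3, 3, so the null spaces stabilize at m = 3, while for the root 5 they are constant.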
Note that if $x \in W_{\lambda_0}$, then $Ax \in W_{\lambda_0}$, because $(A - \lambda_0 I)^m Ax = A(A - \lambda_0 I)^m x = A0 = 0$. Thus a root space is A-invariant. Denote
by $U_{\lambda_0}$ the range of $(A - \lambda_0 I)^m$. It is also A-invariant, since if $x = (A - \lambda_0 I)^m y$, then $Ax = A(A - \lambda_0 I)^m y = (A - \lambda_0 I)^m (Ay)$.
Lemma. $V = W_{\lambda_0} \oplus U_{\lambda_0}$.
Proof. Put $B := (A - \lambda_0 I)^m$, so that $W_{\lambda_0} = \operatorname{Ker} B$, $U_{\lambda_0} = B(V)$. Let x = By ∈ Ker B. Then Bx = 0, i.e. $y \in \operatorname{Ker} B^2$. But
$\operatorname{Ker} B^2 = \operatorname{Ker} B$ by the assumption that $\operatorname{Ker} B = W_{\lambda_0}$ is the root
space. Thus y ∈ Ker B, and hence x = By = 0. This proves that
Ker B ∩ B(V) = {0}. Therefore the subspace in V spanned by Ker B
and B(V) is their direct sum. On the other hand, for any operator,
dim Ker B + dim B(V) = dim V. Thus, the subspace spanned by
Ker B and B(V) is the whole space V.
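A small worked example (not from the original text; the matrix is chosen for illustration) may help to see why the Lemma uses the stabilized power m rather than the first power. Take λ0 = 0 and the operator A on K³ below.

```latex
\[
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
A^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} = A^3 = \cdots
\]
\[
\operatorname{Ker} A = \operatorname{Span}(e_1), \quad
A(V) = \operatorname{Span}(e_1, e_3), \quad
\operatorname{Ker} A \cap A(V) = \operatorname{Span}(e_1) \neq \{0\},
\]
\[
W_0 = \operatorname{Ker} A^2 = \operatorname{Span}(e_1, e_2), \quad
U_0 = A^2(V) = \operatorname{Span}(e_3), \quad
V = W_0 \oplus U_0 .
\]
```

Here the null spaces stabilize at m = 2. For the first power the null space and the range intersect non-trivially, although their dimensions still add up to 3; the direct sum decomposition appears only for the stabilized power.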
Corollary 1. Root spaces of the restriction of A to $U_{\lambda_0}$
are exactly those root spaces $W_\lambda$ of A in the whole space V,
which correspond to roots λ ≠ λ0 .
Proof. It suffices to prove that if λ ≠ λ0 is another root of the
characteristic polynomial, then $W_\lambda \subset U_{\lambda_0}$. Indeed, $W_\lambda$ is invariant
with respect to A − λ0 I, but contains no eigenvectors of A with
eigenvalue λ0 . Therefore A − λ0 I and all powers of it are invertible
on $W_\lambda$. Thus $W_\lambda$ lies in the range $U_{\lambda_0}$ of $B = (A - \lambda_0 I)^m$.
Corollary 2. Suppose that $(\lambda - \lambda_1)^{m_1} \cdots (\lambda - \lambda_r)^{m_r}$ is the
characteristic polynomial of A : V → V, where λ1 , . . . , λr ∈ K
are pairwise distinct roots. Then V is the direct sum of root
spaces:
\[ V = W_{\lambda_1} \oplus \cdots \oplus W_{\lambda_r}. \]
Proof. By induction on r, it follows from Corollary 1 that the
subspace W ⊂ V spanned by root spaces $W_{\lambda_i}$ is their direct sum
and moreover, V = W ⊕ U, where U is the intersection of all $U_{\lambda_i}$.
Furthermore, the characteristic polynomial of A restricted to U has
none of λi as a root. From our assumption that the characteristic
polynomial of A factors into the linear factors λ − λi , it follows however that dim U = 0.
Indeed, the characteristic polynomial of A is equal to the product of
the characteristic polynomials of its restrictions to W and U (because
the matrix of A in a suitable basis of a direct sum is block diagonal).
Hence the factor corresponding to U must be of degree 0.
Remarks. (1) We will see later that dimensions of the root spaces
coincide with multiplicities of the roots: dim Wλi = mi .
(2) The restriction of A to Wλi has the property that some power
of A − λi I vanishes. A linear operator some power of which vanishes
is called nilpotent. Our next task will be to study the geometry of
nilpotent operators.
(3) Our assumption that the characteristic polynomial factors
completely over K is automatically satisfied in the case K = C due
to the Fundamental Theorem of Algebra. Thus, we have proved for
every linear operator on a finite dimensional complex vector space,
that the space decomposes in a canonical fashion into the direct sum
of invariant subspaces on each of which the operator differs from a
nilpotent one by a scalar summand.
Nilpotent Operators
Example. Introduce a nilpotent linear operator $N : K^n \to K^n$ by
describing its action on vectors of the standard basis:
\[ N e_n = e_{n-1}, \quad N e_{n-1} = e_{n-2}, \quad \dots, \quad N e_2 = e_1, \quad N e_1 = 0. \]
Then $N^n = 0$ but $N^{n-1} \neq 0$. We will call N , as well as any operator
similar to it, a regular nilpotent operator. The matrix of N in the
standard basis has the form
\[
\begin{pmatrix}
0 & 1 & 0 & \dots & 0 \\
0 & 0 & 1 & \dots & 0 \\
 & & \ddots & \ddots & \\
0 & 0 & \dots & 0 & 1 \\
0 & 0 & \dots & 0 & 0
\end{pmatrix}.
\]
It has the range of dimension n − 1 spanned by e1 , . . . , en−1 and the
null space of dimension 1 spanned by e1 .
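The properties stated in the Example are easy to confirm by direct computation. The following is a minimal sketch in which the helper names `regular_nilpotent` and `power`, and the choice n = 4, are assumptions of this illustration.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def regular_nilpotent(n):
    """Matrix of N in the standard basis: N e_j = e_{j-1}, N e_1 = 0,
    i.e. ones just above the diagonal and zeros elsewhere."""
    return [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]

def power(M, k):
    """The k-th power of a square matrix (k >= 0)."""
    n = len(M)
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = matmul(P, M)
    return P

n = 4
N = regular_nilpotent(n)
zero = [[0] * n for _ in range(n)]
print(power(N, n) == zero)       # N^n vanishes
print(power(N, n - 1) != zero)   # N^(n-1) does not: it sends e_n to e_1
```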
Proposition. Let N : V → V be a nilpotent operator on a
K-vector space of finite dimension. Then the space can be
decomposed into the direct sum of N -invariant subspaces,
on each of which N is regular.
Proof. We use induction on dim V. When dim V = 0, there is
nothing to prove. Now consider the case when dim V > 0.
The range N(V) is N-invariant, and dim N(V) < dim V (since
otherwise N could not be nilpotent). By the induction hypothesis, the space N(V) can be decomposed into the direct sum of N-invariant subspaces, on each of which N is regular. Let l be the number of these subspaces, $n_1, \dots, n_l$ their dimensions, and $e_1^{(i)}, \dots, e_{n_i}^{(i)}$
a basis in the ith subspace such that N acts on the basis vectors as
in Example:
\[ e_{n_i}^{(i)} \mapsto \cdots \mapsto e_1^{(i)} \mapsto 0. \]
Since each $e_{n_i}^{(i)}$ lies in the range of N, we can pick a vector $e_{n_i+1}^{(i)} \in V$
such that $N e_{n_i+1}^{(i)} = e_{n_i}^{(i)}$. Note that $e_1^{(1)}, \dots, e_1^{(l)}$ form a basis in
(Ker N) ∩ N(V). We complete it to a basis
\[ e_1^{(1)}, \dots, e_1^{(l)}, e_1^{(l+1)}, \dots, e_1^{(r)} \]
of the whole null space Ker N. We claim that all the vectors $e_j^{(i)}$
form a basis in V, and therefore the r subspaces
\[ \operatorname{Span}(e_1^{(i)}, \dots, e_{n_i}^{(i)}, e_{n_i+1}^{(i)}), \ i = 1, \dots, l, \quad \text{and} \quad \operatorname{Span}(e_1^{(i)}), \ i = l+1, \dots, r, \]
(of which the last r − l are 1-dimensional) form a decomposition of
V into the direct sum with required properties.
To justify the claim, notice that the subspace U ⊂ V spanned
by the $n_1 + \cdots + n_l = \dim N(V)$ vectors $e_j^{(i)}$ with j > 1 is mapped by
N onto the space N(V). Therefore: those vectors form a basis of
U, dim U = dim N(V), and U ∩ Ker N = {0}. On the other hand, the
vectors $e_j^{(i)}$ with j = 1 form a basis of Ker N, and since dim Ker N +
dim N(V) = dim V, together with the above basis of U, they form a
basis of V.
Corollary 1. The matrix of a nilpotent operator in a
suitable basis is block diagonal with regular diagonal blocks
(as in Example) of certain sizes $n_1 \ge \cdots \ge n_r > 0$.
The basis in which the matrix has this form, as well as the decomposition into the direct sum of invariant subspaces as described
in Proposition, are not canonical, since choices are involved on each
step of induction. However, the dimensions n1 ≥ · · · ≥ nr > 0 of
the subspaces turn out to be uniquely determined by the geometry
of the operator.
To see why, introduce the following Young tableau (Figure
41). It consists of r rows of identical square cells. The lengths of
the rows represent dimensions n1 ≥ n2 ≥ · · · ≥ nr > 0 of invariant
subspaces, and in the cells of each row we place the basis vectors
of the corresponding subspace, so that the operator N sends each
vector to its left neighbor (and vectors of the leftmost column to 0).
[Figure 41: a Young tableau with rows of lengths $n_1 \ge \cdots \ge n_r$; the cells of the ith row contain the vectors $e_1^{(i)}, e_2^{(i)}, \dots, e_{n_i}^{(i)}$.]
The format of the tableau is determined by the partition of the
total number n of cells (equal to dim V) into the sum $n_1 + \cdots + n_r$
of positive integers. Reading the same format by columns, we obtain
another partition $n = m_1 + \cdots + m_d$, called transposed to the first
one, where $m_1 \ge \cdots \ge m_d > 0$ are the heights of the columns.
Obviously, two transposed partitions determine each other.
It follows from the way the cells are filled with the vectors $e_j^{(i)}$
that the vectors in the columns 1 through k form a basis of the space
$\operatorname{Ker} N^k$. Therefore
\[ m_k = \dim \operatorname{Ker} N^k - \dim \operatorname{Ker} N^{k-1}, \quad k = 1, \dots, d. \]
Corollary 2. Consider the flag of subspaces defined by a
nilpotent operator N : V → V:
\[ \operatorname{Ker} N \subset \operatorname{Ker} N^2 \subset \cdots \subset \operatorname{Ker} N^d = V \]
and the partition of n = dim V into the summands $m_k = \dim \operatorname{Ker} N^k - \dim \operatorname{Ker} N^{k-1}$. Then summands of the transposed
partition $n = n_1 + \cdots + n_r$ are the dimensions of the regular
nilpotent blocks of N (described in Proposition and Corollary 1).
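Corollary 2 gives a recipe for reading off the block sizes from the dimensions of the null spaces. A minimal sketch of that recipe follows; the function names `transpose_partition` and `block_sizes` and the sample inputs are hypothetical, introduced only for this illustration.

```python
def transpose_partition(parts):
    """Transpose a partition given as a non-increasing list of positive
    integers: the k-th entry of the result counts the parts exceeding k."""
    if not parts:
        return []
    return [sum(1 for p in parts if p > k) for k in range(parts[0])]

def block_sizes(kernel_dims):
    """Sizes of the regular nilpotent blocks of N, given the list
    [dim Ker N, dim Ker N^2, ..., dim Ker N^d], where dim Ker N^d = dim V."""
    # m_k = dim Ker N^k - dim Ker N^(k-1), with dim Ker N^0 = 0.
    m = [kernel_dims[0]] + [b - a for a, b in zip(kernel_dims, kernel_dims[1:])]
    return transpose_partition(m)

print(block_sizes([2, 3, 4]))   # column heights (2, 1, 1) give rows (3, 1)
print(block_sizes([2, 4]))      # column heights (2, 2) give rows (2, 2)
print(block_sizes([3, 5, 6]))   # column heights (3, 2, 1) give rows (3, 2, 1)
```

The input is assumed to come from an actual nilpotent operator, so that the differences m_k are non-increasing and form a partition.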
Corollary 3. The number of equivalence classes of nilpotent operators on a vector space of dimension n is equal to
the number of partitions of n.
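For small n the count in Corollary 3 can be tabulated. The sketch below counts partitions by a standard dynamic-programming recurrence; the function name `count_partitions` is an assumption of this example.

```python
def count_partitions(n):
    """Number of ways to write n as a sum of positive integers,
    disregarding the order of the summands."""
    ways = [1] + [0] * n          # ways[t] = partitions of t using parts seen so far
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]

# Similarity classes of nilpotent operators in dimensions 1 through 6.
print([count_partitions(n) for n in range(1, 7)])
```

For instance, in dimension 4 there are five classes, corresponding to the partitions 4, 3+1, 2+2, 2+1+1 and 1+1+1+1.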
The Jordan Canonical Form Theorem
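We proceed to classification of linear operators $A : \mathbb{C}^n \to \mathbb{C}^n$ up to
similarity transformations.
Theorem. Every complex matrix is similar to a block diagonal normal form with each diagonal block of the form:
\[
\begin{pmatrix}
\lambda_0 & 1 & 0 & \dots & 0 \\
0 & \lambda_0 & 1 & \dots & 0 \\
 & & \ddots & \ddots & \\
0 & 0 & \dots & \lambda_0 & 1 \\
0 & 0 & \dots & 0 & \lambda_0
\end{pmatrix},
\quad \lambda_0 \in \mathbb{C},
\]
and such a normal form is unique up to permutations of
the blocks.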
The block diagonal matrices described in the theorem are called
Jordan canonical forms (or Jordan normal forms). Their diagonal blocks are called Jordan cells.
It is instructive to analyze a Jordan canonical form before going
into the proof of the theorem. The characteristic polynomial of a
Jordan cell is $(\lambda - \lambda_0)^m$ where m is the size of the cell. The characteristic polynomial of a block diagonal matrix is equal to the product
of characteristic polynomials of the diagonal blocks. Therefore the
characteristic polynomial of the whole Jordan canonical form is the
product of factors $(\lambda - \lambda_i)^{m_i}$, one per Jordan cell. Thus the diagonal
entries of Jordan cells are roots of the characteristic polynomial. After subtracting the scalar matrix λ0 I, Jordan cells with λi = λ0 (and
only these cells) become nilpotent. Therefore the root space $W_{\lambda_0}$ is
exactly the direct sum of those subspaces on which the Jordan cells
with λi = λ0 operate.
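The analysis above can be traced on a small Jordan canonical form. The matrix J below is a hypothetical example chosen for illustration; it has three Jordan cells: one of size 2 and one of size 1 with the eigenvalue 2, and one of size 1 with the eigenvalue 5.

```latex
\[
J = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 5 \end{pmatrix},
\qquad
\det(\lambda I - J) = (\lambda - 2)^2 (\lambda - 2) (\lambda - 5) = (\lambda - 2)^3 (\lambda - 5),
\]
\[
J - 2I = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix},
\qquad
(J - 2I)^2 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 9 \end{pmatrix},
\qquad
W_2 = \operatorname{Ker}(J - 2I)^2 = \operatorname{Span}(e_1, e_2, e_3).
\]
```

After subtracting 2I, the two cells with the eigenvalue 2 become nilpotent, while the cell with the eigenvalue 5 becomes the invertible 1 × 1 block 3. The root space W_2 is the sum of the subspaces of the first two cells, and its dimension 3 equals the multiplicity of the root 2.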
Proof of Theorem. Everything we need has been already established in the previous two subsections.
Thanks to the Fundamental Theorem of Algebra, the characteristic polynomial det(λI − A) of a complex n × n-matrix A factors into the
product of powers of distinct linear factors: $(\lambda - \lambda_1)^{n_1} \cdots (\lambda - \lambda_r)^{n_r}$.
According to Corollary 2 of Lemma, the space $\mathbb{C}^n$ is decomposed in a
canonical fashion into the direct sum $W_{\lambda_1} \oplus \cdots \oplus W_{\lambda_r}$ of A-invariant
root subspaces. On each root subspace $W_{\lambda_i}$, the operator A − λi I
is nilpotent. According to Proposition, the root space $W_{\lambda_i}$ is represented (in a non-canonical fashion) as the direct sum of invariant
subspaces on each of which A − λi I acts as a regular nilpotent. Since
the scalar operator λi I leaves every subspace invariant, this means
that $W_{\lambda_i}$ is decomposed into the direct sum of A-invariant subspaces,
on each of which A acts as a Jordan cell with the eigenvalue λ0 = λi .
Thus, existence of a basis in which A is described by a Jordan normal
form is established.
To prove uniqueness, note that the root spaces $W_{\lambda_i}$ are intrinsically determined by the operator A, and the partition of $\dim W_{\lambda_i}$
into the sizes of Jordan cells with the eigenvalue λi is uniquely determined, according to Corollary 2 of Proposition, by the geometry of
the operator A − λi I nilpotent on $W_{\lambda_i}$. Therefore the exact structure
of the Jordan normal form of A (i.e. the numbers and sizes of Jordan
cells for each of the eigenvalues λi ) is uniquely determined by A, and
only the ordering of the diagonal blocks remains ambiguous.
Corollary 1. Dimensions of root spaces $W_{\lambda_i}$ coincide
with multiplicities of λi as roots of the characteristic polynomial.
Corollary 2. If the characteristic polynomial of a complex matrix has only simple roots, then the matrix is diagonalizable, i.e. is similar to a diagonal matrix.
Corollary 3. Every operator A : Cn → Cn in a suitable
basis is described by the sum D + N of two commuting matrices, of which D is diagonal, and N strictly upper triangular.
Corollary 4. Every operator on a complex vector space
of finite dimension can be represented as the sum D + N of
two commuting operators, of which D is diagonalizable and
N nilpotent.
Remark. We used that K = C only to factor the characteristic polynomial of the matrix A into linear factors. Therefore the
same results hold true over any field K such that all non-constant
polynomials from K[λ] factor into linear factors. Such fields are
called algebraically closed. In fact (see [8]) every field K is contained in an algebraically closed field. Thus every linear operator
$A : K^n \to K^n$ can be brought to a Jordan normal form by transformations $A \mapsto C^{-1}AC$, where however entries of C and scalars λ0 in
Jordan cells may belong to a larger field F ⊃ K.⁷ We will see how
this works when K = R and F = C.
⁷ For this, F does not have to be algebraically closed, but only needs to contain
all roots of det(λI − A).
The Real Case
Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be an R-linear operator. It acts⁸ on the complexification $\mathbb{C}^n$ of the real space, and commutes with the complex
conjugation operator $\sigma : x + iy \mapsto x - iy$.
The characteristic polynomial det(λI − A) has real coefficients,
but its roots λi can be either real or come in pairs of complex conjugate roots (of the same multiplicity). Consequently, the complex
root spaces $W_{\lambda_i}$, which are defined as null spaces in $\mathbb{C}^n$ of sufficiently
high powers of A − λi I, come in two types. If λi is real, then the root
space is real in the sense that it is σ-invariant, and thus is the complexification of the real root space $W_{\lambda_i} \cap \mathbb{R}^n$. If λi is not real, and λ̄i is
its complex conjugate, then $W_{\lambda_i}$ and $W_{\bar\lambda_i}$ are different root spaces of
A, but they are transformed into each other by σ. Indeed, σA = Aσ,
and σλi = λ̄i σ. Therefore, if $z \in W_{\lambda_i}$, i.e. $(A - \lambda_i I)^d z = 0$ for some
d, then $0 = \sigma (A - \lambda_i I)^d z = (A - \bar\lambda_i I)^d \sigma z$, and hence $\sigma z \in W_{\bar\lambda_i}$.
This allows one to obtain the following improvement for the Jordan
Canonical Form Theorem applied to real matrices.
Theorem. A real linear operator $A : \mathbb{R}^n \to \mathbb{R}^n$ can be represented by a matrix in a Jordan normal form with respect
to a basis of the complexified space $\mathbb{C}^n$ invariant under complex conjugation.
Proof. In the process of constructing bases in $W_{\lambda_i}$ in which A
has a Jordan normal form, we can use the following procedure. When
λi is real, we take the real root space $W_{\lambda_i} \cap \mathbb{R}^n$ and take in it a real
basis in which the matrix of A − λi I is block diagonal with regular
nilpotent blocks. This is possible due to Proposition applied to the
case K = R. This real basis serves then as a σ-invariant complex
basis in the complex root space $W_{\lambda_i}$. When $W_{\lambda_i}$ and $W_{\bar\lambda_i}$ are a pair
of complex conjugate root spaces, we take a required basis in
one of them, and then apply σ to obtain such a basis in the other.
Taken together, the bases form a σ-invariant set of vectors.
Of course, for each Jordan cell with a non-real eigenvalue λ0 ,
there is another Jordan cell of the same size with the eigenvalue λ̄0 .
Moreover, if $e_1, \dots, e_m$ is the basis in the A-invariant subspace of
the first cell, i.e. $Ae_k = \lambda_0 e_k + e_{k-1}$, k = 2, . . . , m, and $Ae_1 = \lambda_0 e_1$,
then the A-invariant subspace corresponding to the other cell comes
with the complex conjugate basis $\bar e_1, \dots, \bar e_m$, where $\bar e_k = \sigma e_k$. The
direct sum $U := \operatorname{Span}(e_1, \dots, e_m, \bar e_1, \dots, \bar e_m)$ of the two subspaces is
both A- and σ-invariant and thus is a complexification of the real
A-invariant subspace $U \cap \mathbb{R}^n$. We use this to describe a real normal
form for the action of A on this subspace.
⁸ Strictly speaking, it is the complexification $A^{\mathbb{C}}$ of A that acts on $\mathbb{C}^n = (\mathbb{R}^n)^{\mathbb{C}}$,
but we will denote it by the same letter A.
Namely, let λ0 = α − iβ, and write each basis vector $e_k$ in terms of
its real and imaginary part: $e_k = u_k + i v_k$. Then the real vectors $u_k$
and $v_k$ form a real basis in the real part of the complex 2-dimensional
space spanned by $e_k$ and $\bar e_k = u_k - i v_k$. Thus, we obtain a basis
$u_1, v_1, u_2, v_2, \dots, u_m, v_m$ in the subspace $U \cap \mathbb{R}^n$. The action of A
on this basis is found from the formulas:
\[ Au_1 + iAv_1 = (\alpha - i\beta)(u_1 + iv_1) = (\alpha u_1 + \beta v_1) + i(-\beta u_1 + \alpha v_1), \]
\[ Au_k + iAv_k = (\alpha u_k + \beta v_k + u_{k-1}) + i(-\beta u_k + \alpha v_k + v_{k-1}), \quad k > 1. \]
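Corollary 1. A linear operator $A : \mathbb{R}^n \to \mathbb{R}^n$ is represented in a suitable basis by a block diagonal matrix with
the diagonal blocks that are either Jordan cells with real
eigenvalues, or have the form (β ≠ 0):
\[
\begin{pmatrix}
\alpha & -\beta & 1 & 0 & 0 & \dots & 0 & 0 \\
\beta & \alpha & 0 & 1 & 0 & \dots & 0 & 0 \\
0 & 0 & \alpha & -\beta & 1 & 0 & \dots & 0 \\
0 & 0 & \beta & \alpha & 0 & 1 & \dots & 0 \\
\vdots & & & \ddots & \ddots & & & \vdots \\
0 & 0 & \dots & 0 & \alpha & -\beta & 1 & 0 \\
0 & 0 & \dots & 0 & \beta & \alpha & 0 & 1 \\
0 & 0 & \dots & 0 & 0 & 0 & \alpha & -\beta \\
0 & 0 & \dots & 0 & 0 & 0 & \beta & \alpha
\end{pmatrix}
\]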
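One can check directly that a real block of this form carries a complex Jordan chain. In the sketch below (an illustration only; the helper names `real_block` and `apply`, and the sample values α = 3, β = 2, m = 2 are assumptions), the coordinates are taken with respect to the real basis u1, v1, u2, v2, so that the complex vector u_k + i v_k has coordinates 1 and i in the kth pair of positions.

```python
def real_block(alpha, beta, m):
    """The 2m x 2m real block: 2 x 2 blocks [[alpha, -beta], [beta, alpha]]
    on the diagonal and 2 x 2 identity blocks right above them."""
    size = 2 * m
    M = [[0] * size for _ in range(size)]
    for k in range(m):
        r = 2 * k
        M[r][r], M[r][r + 1] = alpha, -beta
        M[r + 1][r], M[r + 1][r + 1] = beta, alpha
        if k + 1 < m:
            M[r][r + 2] = 1
            M[r + 1][r + 3] = 1
    return M

def apply(M, x):
    """Matrix-vector product; the vector may have complex entries."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

alpha, beta, m = 3, 2, 2
M = real_block(alpha, beta, m)
lam = complex(alpha, -beta)          # the eigenvalue alpha - i*beta

e1 = [1, 1j, 0, 0]                   # coordinates of u1 + i*v1
e2 = [0, 0, 1, 1j]                   # coordinates of u2 + i*v2

print(apply(M, e1) == [lam * c for c in e1])                    # M e1 = lam e1
print(apply(M, e2) == [lam * a + b for a, b in zip(e2, e1)])    # M e2 = lam e2 + e1
```

Small integer values are used so that the complex arithmetic is exact and the equalities can be tested literally.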
Corollary 2. If two real matrices are related by a complex similarity transformation, then they are related by a
real similarity transformation.
Remark. The proof meant here is that if two real matrices are
similar over C then they have the same Jordan normal form, and
thus they are similar over R to the same real matrix, as described in
Corollary 1. However, Corollary 2 can be proved directly, without
a reference to the Jordan Canonical Form Theorem. Namely, if two
real matrices, A and A′ , are related by a complex similarity transformation: $A' = C^{-1}AC$, we can rewrite this as CA′ = AC, and
taking C = B + iD where B and D are real, obtain: BA′ = AB and
DA′ = AD. The problem now is that neither B nor D is guaranteed
to be invertible. Yet, there must exist an invertible linear combination E = λB + µD, for if the polynomial det(λB + µD) of λ and µ
vanishes identically, then det(B + iD) = 0 too. For invertible E, we
have EA′ = AE and hence $A' = E^{-1}AE$.
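The direct argument of the Remark is constructive, and can be sketched as follows. The sample matrices and the names `det` and `real_intertwiner` are hypothetical. The search over t = 0, 1, . . . , n is a choice made for this sketch: det(B + tD) is a polynomial in t of degree at most n which does not vanish at t = i, so it cannot vanish at all of these n + 1 real values.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det(M):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def real_intertwiner(B, D):
    """An invertible real combination E = B + t*D, found by trying t = 0, ..., n."""
    n = len(B)
    for t in range(n + 1):
        E = [[B[i][j] + t * D[i][j] for j in range(n)] for i in range(n)]
        if det(E) != 0:
            return E
    raise ValueError("det(B + tD) vanishes identically: B + iD is not invertible")

# Hypothetical data: A2 = C^{-1} A C with C = B + iD, where B and D are both singular.
A = [[2, 0], [0, 3]]
A2 = [[2, -1], [0, 3]]
B = [[1, 1], [0, 0]]
D = [[0, 0], [0, 1]]

E = real_intertwiner(B, D)
print(E)
print(matmul(E, A2) == matmul(A, E))   # E A' = A E with E real and invertible
```

In this sample t = 0 fails because B itself is singular, and t = 1 already gives an invertible E.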
EXERCISES
298. Let A, B : V → V be two commuting linear operators, and p and q
two polynomials in one variable. Show that the operators p(A) and q(B)
commute.
299. Prove that if A commutes with B, then root spaces of A are B-invariant.
300. Let λ0 be a root of the characteristic polynomial of an operator A,
and m its multiplicity. What are possible values for the dimension of the
eigenspace corresponding to λ0 ?
301. Let v ∈ V be a non-zero vector, and a : V → K a non-zero linear
function. Find eigenvalues and eigenspaces of the operator x 7→ a(x)v.
302. Let Vn ⊂ K[x] be the space of all polynomials of degree < n. Prove
that the differentiation d/dx : Vn → Vn is a regular nilpotent operator.
303. Prove that a square matrix satisfies its own characteristic equation;
namely, if p denotes the characteristic polynomial of a matrix A, then
p(A) = 0. (This identity is called the Hamilton–Cayley equation.)
304. Is there an n × n-matrix A such that $A^2 \neq 0$ but $A^3 = 0$: (a) if n = 2?
(b) if n = 3?
305. Classify similarity classes of nilpotent 4 × 4-matrices.
306. Prove that the number of similarity classes of unipotent n×n-matrices
is equal to the number of partitions of n.
307. Find Jordan normal forms of the following matrices:
"
#
"
#
"
#
1
2 0
4
6 0
13 16 16
0
2 0 , (b) −3 −5 0 , (c) −5 −7 −6 ,
(a)
−2 −2 1
−3 −6 1
−6 −8 −7
#
#
"
#
"
"
7 −12 −2
−4 2 10
3
0
8
3 −4
0 ,
3 −1 −6 , (e) −4 3 7 , (f )
(d)
−2
0
2
−3 1 7
−2
0 −5
#
#
"
#
"
"
1
1 −1
0
3
3
−2
8
6
3 ,
8
6 , (i) −3 −3
6 , (h) −1
(g) −4 10
−2 −2
2
2 −14 −10
4 −8 −4
#
#
"
#
"
"
3
7 −3
−1
1
1
1 −1 2
2 ,
21
17 , (l) −2 −5
(j) 3 −3 6 , (k) −5
−4 −10
3
6 −26 −21
2 −2 4
"
#
"
#
"
#
8
30 −14
9 22 −6
4
5 −2
9 , (n) −1 −4
1 , (o) −2 −2
1 .
(m) −6 −19
−6 −23
11
8 16 −5
−1 −1
1
308. Find all matrices commuting with a regular nilpotent.
309. Compute powers of Jordan cells.
310. Prove that if some power of a complex matrix is the identity, then the
matrix is diagonalizable.
311. Prove that transposed square matrices are similar.
312. Prove that $\operatorname{tr} A = \sum \lambda_i$ and $\det A = \prod \lambda_i$, where $\lambda_1, \dots, \lambda_n$ are all roots
of the characteristic polynomial (not necessarily distinct).
313. Find complex eigenvectors and eigenvalues of rotations on the Euclidean plane.
314. Classify all linear operators in R2 up to linear changes of coordinates.
315. Consider real traceless 2 × 2-matrices $\begin{pmatrix} a & b \\ c & -a \end{pmatrix}$ as points in the 3-dimensional space with coordinates a, b, c. Sketch the partition of this space
into similarity classes.
Hints
8. $\overrightarrow{AA'} = 3\overrightarrow{AM}/2$.
14. $\cos^2 \theta \le 1$.
19. Rotate the regular n-gon through 2π/n.
22. Take X = A first.
23. $\overrightarrow{AB} = \overrightarrow{OB} - \overrightarrow{OA}$.
24. If a + b + c + d = 0, then $2(b + c)(c + d) = a^2 - b^2 + c^2 - d^2$.
25. $XA^2 = |\overrightarrow{OX} - \overrightarrow{OA}|^2 = 2R^2 - 2\langle \overrightarrow{OX}, \overrightarrow{OA} \rangle$ if O is the center.
26. Consider projections of the vertices to any line.
27. Project faces to an arbitrary plane and compute signed areas.
28. Q(−x, −y) = Q(x, y) for a quadratic form Q.
31. $AX^2 + CY^2$ is even in X and Y .
32. Q(−x, y) = Q(x, y) only if b = 0.
35. cos θ = 4/5.
38. $x^2 + 4xy = (x + 2y)^2 - 4y^2$.
42. e = |AB|/|AC| is the slope of the secting plane.
43. Show that the sum of distances to the foci of an ellipse from points
on a tangent line is minimal at the tangency point; then reflect the source
of light about the line.
52. Compute $(\cos \theta + i \sin \theta)^3$.
54. Divide P by z − z0 with a remainder.
57. (z − z0 )(z − z̄0 ) is real.
58. Apply Vieta’s formula.
67. Draw the regions of the plane where the quadratic forms Q > 0.
70. First apply the Inertia Theorem to make $Q = x_1^2 + x_2^2$.
74. In each $a^k b^{m-k}$, each a comes from one of m factors (a + b).
76. Take x1 = x, x2 = ẋ.
2
3n(n+1)
2
− n2 = n(n+3)
> n2 .
2
83. Write $x = \sum x_i e_i$.
94. Pick two 2 × 2-matrices at random.
97. Compute BAC, where B and C are two inverses of A.
80.
98. Compute (B −1 A−1 )(AB) and (AB)(B −1 A−1 ).
99. Consider 1 × 2 and 2 × 1 matrices.
101. Solve D−1 AC = I for D and C.
104. Which 2 × 2-matrices are anti-symmetric?
109. Start with any non-square matrix.
110. Each elementary product contains a zero factor.
113. There are 12 even and 12 odd permutations.
115. Permutation $\begin{pmatrix} 1 & 2 & \dots & n-1 & n \\ n & n-1 & \dots & 2 & 1 \end{pmatrix}$ has length $\binom{n}{2}$.
117. Pairs of indices inverted by σ and σ −1 are the same.
118. The transformation: logarithm → lagorithm → algorithm consists
of two transpositions.
123. 247 = 2 · 100 + 4 · 10 + 7.
127. det(−A) = (−1)n det A.
139. Apply the 1st of the “cool formulas.”
140. Compute the determinant of a 4 × 4-matrix the last two of whose
rows repeat the first two.
141. Apply Binet–Cauchy’s formula to the 2 × 3 matrix whose rows are
xt and yt .
142. Show that ∆n = an ∆n−1 + ∆n−2 .
144. Use the defining property of Pascal’s triangle: $\binom{m}{k} = \binom{m-1}{k} + \binom{m-1}{k-1}$.
k−1
145. Use the fact of algebra that a polynomial in (x1 , . . . , xn ), which
vanishes when xi = xj , is divisible by xi − xj .
146. Divide the kth column by k and apply Vandermonde’s identity.
147. Redefine multiplication by scalars as λu = 0 for all λ ∈ K and all
u ∈ V.
148. 0u = (0 + 0)u = 0u + 0u.
149. (−1)u + u = (−1 + 1)u = 0u = 0.
150. Use the Euclidean algorithm (see e.g. [3]) to show that for relatively
prime k and p, the least positive integer of the form mk + np is equal to 1.
153. There is no 0 among polynomials of a fixed degree.
160. To a linear form f : V/W → K, associate the composition $V \xrightarrow{\pi} V/W \xrightarrow{f} K$.
162. Given B : V → W ∗ , show that evaluation (Bv)(w) of the linear
form Bv ∈ W ∗ on w ∈ W defines a bilinear form.
164. Translate given linear subspaces by any vector in the intersection of
the affine ones.
166. A linear function on V ⊕ W is uniquely determined by its restrictions
to V and W.
171. Compute q1 q2 q2∗ q1∗ .
174. Introduce multiplication by scalars q ∈ H as A 7→ q ∗ A.
176. |q|2 = | − 1| = 1.
181. cos 4θ + i sin 4θ = (cos θ + i sin θ)4 .
182. H : R7 → R1 .
184. The space spanned by columns of A and B contains columns of A+B.
188. (At a)(x) = a(Ax) for all x ∈ Kn and a ∈ (Km )∗ .
196. Prove that in a finite field, multiples of 1 form a subfield isomorphic
to Zp where p is a prime.
203. Modify the Gaussian elimination algorithm of Section 2 by permuting
unknowns (instead of equations).
204. Apply the LU P decomposition to M t .
205. Apply LP U , LU P , and P LU decompositions to M −1 .
210. When K has q elements, each cell of dimension l has $q^l$ elements.
214. Consider the map $\mathbb{R}^n \to \mathbb{R}^{p+q}$ defined by the linear forms.
227. P = A + iB where both A and B are Hermitian.
228. $\langle x, y \rangle = x^* y$.
230. $\langle z, w \rangle = z^* w$.
231. Use the previous exercise.
233. The normal forms are: $i|Z_1|^2 - i|Z_2|^2$, $|Z_1|^2 - |Z_2|^2$, $|Z_1|^2$.
238. Take the linear form for one of new coordinates.
240. Prove that −1 is a non-square.
243. There are 1 even and 3 odd non-degenerate forms.
247.
1
3
= 1 − 32 .
248. Start with inverting 2-adic units · · · ∗ ∗ ∗ 1. (∗ is a wild card).
250. Use the previous exercise.
275. Find an eigenvector, and show that its orthogonal complement is invariant.
277. Compute $\langle x, Ax \rangle$ for $x = \sum t_i v_i$, where $\{v_i\}$ is an orthonormal basis
of eigenvectors, and $\sum t_i^2 = 1$.
278. If AB = BA, then eigenspaces of A are B-invariant.
293. Use the previous exercise.
309. Use Newton’s binomial formula.
Answers
1. $mg/2$, $mg\sqrt{3}/2$.
2. 18 min (reloading time excluded).
6. It rotates with the same angular velocity along a circle centered at the
barycenter of the triangle formed by the centers of the given circles.
10. 3/4.
13. $7\overrightarrow{OA} = \overrightarrow{OA'} + 2\overrightarrow{OB'} + 4\overrightarrow{OC'}$ for any O.
15. 3/2.
17. $2\langle u, v \rangle = |u + v|^2 - |u|^2 - |v|^2$.
18. (b) No.
30. (a) $\pm(\sqrt{\alpha^2 - \beta^2}, 0)$, (b) $\pm(\sqrt{\alpha^2 + \beta^2}, 0)$.
33. Yes, $k(x^2 + y^2)$.
34. y = ±x; level curves of (a) are ellipses, of (c) hyperbolas.
35. $2X^2 - Y^2 = 1$.
38. 2nd ellipse, 1st & 4th hyperbolas.
39. A pair of intersecting lines, x − 1 = ±(2y − 1).
44. Yes; Yes (0); Yes.
1+5i
13 ;
−2
45. (a)
(b)
√
1−i 3
.
2
47. |z|
.
√
48. ±i 3.
√
50. (a) 2, −π/4; (b) 2, −π/3.
51.
√
−1+i 3
.
2
√
√
√
2
55. 2 ± i; i 1±2 5 ; 1 + i, 1 + i; 1 ± 6−i
.
2
√
√
√
√
√
√
√
3±i
3±i
,
;
±i,
±
56. −2, 1 ± i 3; i, ± 23−i ; ±i 2, ±i 2; 3±i
2
2
2 .
P
k
59. ak = (−1)
1≤i1 <···<ik ≤n zi1 · · · zik .
60. For normal forms, y 2 and 0 can be taken.
63. Yes, some non-equivalent equations represent the empty set.
65. m = n = r = 2.
69. n(n + 1).
71. The ODEs ẋ = ax and ẋ = a′ x are non-equivalent whenever a ≠ a′ .
77. Ẋ1 = iX1 , Ẋ2 = −iX2 .
78. $x_1 = c_1 e^{\lambda t}$, $x_2 = c_2 e^{\lambda t}$, and $y_1 = (c_1 + c_2 t)e^{\lambda t}$, $y_2 = c_2 e^{\lambda t}$.
79. n(n + 1)/2.
81. yes, no, no, yes, no, yes, yes, no.
84. E = va, i.e. eij = vi aj , i = 1, . . . , m, j = 1, . . . , n.
85. $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$.
86. Check your answers using associativity, e.g. (CB)A = C(BA).
0 −2
3 1
.
, ACB =
Other answers: BAC =
3
6
3 3
91.
cos 1◦
sin 1◦
− sin 1◦
cos 1◦
.
92. If B = Ak , then bi,i+k = 1, and all other bij = 0.
93. (a) If A is m × n, B must be n × m. (b) Both must be n × n.
95. Exactly when AB = BA.
96. Those with all aii 6= 0.
102. No; yes (of x, y ∈ R1 ); yes (in R2 ).
103. I.
105. S = 2x1 y1 + x1 y2 + x2 y1 , A = x1 y2 − x2 y1 .
106. $(\sum x_i)(\sum y_i)$ and $\sum_{i \neq j} x_i y_j / 2$.
107. No, only if AB = BA.
111. 1; 1; cos(x + y).
112. 2, 1; −2.
114. k(k − 1)/2.
116. n(n − 1)/2 − l.
119. E.g. τ14 τ34 τ25 .
120. No; yes.
121. + (6 inverted pairs); + (8 inverted pairs).
122. −1 522 200; −29 400 000.
124. 0.
125. Changes sign, if n = 4k + 2 for some k, and remains unchanged
otherwise.
126. x = a1 , . . . , an .
128. Leave it unchanged.
131. (a) $(-1)^{n(n-1)/2} a_1 a_2 \cdots a_n$, (b) (ad − bc)(eh − f g).
132. (a) 9, (b) 5.
5
7
− 18
18
1
133. (a) 
−5
18
7
18
134. (a) x
18
1
18
= [− 92 , 31 , 19 ]t ,
n−1
1
18
7
18
5
− 18
, (b)
"
1
0
0
−1
0
1 −1
0
1
#
.
(b) x = [1, −1, 1]t.
136. (det A)
, where n is the size of A.
137. Those with determinants ±1.
138. (a) x1 = 3, x2 = x3 = 1, (b) x1 = 3, x2 = 4, x3 = 5.
139. (a) −x21 − · · · − x2n , (b) (ad − bc)3 .
143. $\lambda^n + a_1 \lambda^{n-1} + \cdots + a_{n-1} \lambda + a_n$.
144. 1.
146. 1!3!5! · · · (2n − 1)!.
152. $p^n$; $p^{mn}$.
155. $p^{n(n-1)/2}$.
157. Ker D = K[xp ] ⊂ K[x].
163. Points, lines, and planes.
168. Ker At = A(V)⊥ , At (W ∗ ) = (V/ Ker A)∗ ⊂ V ∗ .
170. −1; −q.
z −w̄
.
175.
w
z̄
176. {bi + cj + dk | b2 + c2 + d2 = 1}.
178. 2.
179. 2, if n > 1.
180. xk = xk1 L1 (x) + · · · + xkn Ln (x), k = 0, . . . , n − 1.
182. 1.
185. The system is consistent whenever b1 + b2 + b3 = 0.
187. (a) Yes, (b) yes, (c) yes, (d) no, (e) yes, (f) no, (g) no, (h) yes.
191. 0 ≤ codim ≤ k.
193. Two subspaces are equivalent if and only if they have the same dimension.
194. The equivalence class of an ordered pair U, V of subspaces is determined by k := dim U , l := dim V, and r := dim(U + V), where k, l ≤ r ≤ n
can be arbitrary.
195. (a) $q^n$;
(b) $(q^n - 1)(q^n - q)(q^n - q^2) \cdots (q^n - q^{n-1})$;
(c) $(q^n - 1)(q^n - q)(q^n - q^2) \cdots (q^n - q^{r-1})$;
(d) $\dfrac{(q^n - 1)(q^n - q) \cdots (q^n - q^{r-1})}{(q^r - 1)(q^r - q) \cdots (q^r - q^{r-1})}$.
197. x1 = 3, x2 = 1, x3 = 1; inconsistent; x1 = 1, x2 = 2, x3 = −2;
x1 = 2t1 − t2 , x2 = t1 , x3 = t2 , x4 = 1;
x1 = −8, x2 = 3 + t, x3 = 6 + 2t, x4 = t; x1 = x2 = x3 = x4 = 0;
$x_1 = \frac{3}{17} t_1 - \frac{13}{17} t_2$, $x_2 = \frac{19}{17} t_1 - \frac{20}{17} t_2$, x3 = t1 , x4 = t2 ;
x1 = −16 + t1 + t2 + 5t3 , x2 = 23 − 2t1 − 2t2 − 6t3 , x3 = t1 , x4 = t2 , x5 = t3 .
198. λ = 5.
199. (a) rk = 2, (b) rk = 2, (c) rk = 3, (d) rk = 4, (e) rk = 2.
200. Inverse matrices are:
 
#
"
2 −1
0
0
1
1
1
1
1 −4 −3
1 1
2
0
0 
1 −1 −1   −3
1 −5 −3 , 
.
,
3 −4 
1 −1
1 −1   31 −19
4
−1
6
4
−23
14 −2
3
1 −1 −1
1
208. $\binom{n}{2} - l(\sigma)$, where l(σ) is the length of the permutation.
209. $[n]_q! := [1]_q [2]_q \cdots [n]_q$ (called q-factorial), where $[k]_q := \dfrac{q^k - 1}{q - 1}$.
211. Inertia indices (p, q) = (1, 1), (2, 0), (1, 2).
216. Empty for p = 0, has 2 components for p = 1, and 1 for p = 2, 3, 4.
220. $z_1^2 + z_2^2 = 1$, $z_1^2 + z_2^2 = 0$, $z_2 = z_1^2$, $z_1^2 = 1$, $z_1^2 = 0$.
221. Two parallel lines.
226. Those all of whose entries are imaginary.
227. $P(z, w) = \frac{1}{2} [P(z + w, z + w) + iP(iz + w, iz + w) - (1 + i)(P(z, z) + P(w, w))]$.
234. dii = ∆i /∆i−1 .
235. (p, q) = (2, 2), (3, 1).
253. No.
254. The sum of the squares of all sides of a parallelogram is equal to the
sum of the squares of the diagonals.
265. id.
"
#
"
#
"
#
1 0
0
−2 0 0
−3 0 0
0 (b)
0 1 0 (c)
0 1 1
307. (a) 0 2
0 0 −1
0 0 1
0 0 1
"
#
"
#
"
#
"
#
2 1 0
2
0 0
−1 0 0
1 0
0
0 0 1 (l) 0 i
0 .
(e) 0 2 1 (f ) 0 −1 0 (k)
0 0 2
0
0 0
0 0 0
0 0 −i
Bibliography
[1]
[2] D. K. Faddeev, I. S. Sominsky. Problems in Linear Algebra. 8th
edition, Nauka, Moscow, 1964 (in Russian).
[3] A. P. Kiselev. Kiselev’s Geometry / Book I. Planimetry. Adapted
from Russian by Alexander Givental. Sumizdat, El Cerrito, 2006.
[4] A. P. Kiselev. Kiselev’s Geometry / Book II. Stereometry. Adapted
from Russian by Alexander Givental. Sumizdat, El Cerrito, 2008.
[5] J.-P. Serre. A Course in Arithmetic.
[6] Ronald Solomon. Abstract Algebra. Thomson — Brooks/Cole, Belmont, CA, 2002.
[7] Berezin
[8] Van der Waerden
[9] Hermann Weyl. Space. Time. Matter.
Index
absolute value, 18, 67
addition of vectors, 59
additivity, 5
adjoint linear map, 119
adjoint map, 70, 78
adjoint matrix, 50
adjoint quaternion, 67
adjoint systems, 82
affine subspace, 65
algebraic number, 60
algebraically closed field, 143
annihilator, 69
anti-Hermitian form, 105
anti-Hermitian matrix, 103
anti-symmetric form, 37
anti-symmetric operator, 129
antilinear function, 103
argument of complex number, 19
associative, 34
associativity, 2
augmented matrix, 84
axiom, 59
axis of symmetry, 14
Bézout, 22
Bézout’s theorem, 22
back substitution, 84
barycenter, 7
basis, 3, 71
bijective, 63
bilinear form, 36
bilinearity, 5
Binet, 54
Binet–Cauchy formula, 54
binomial coefficient, 30
binomial formula, 30
block, 46
block triangular matrix, 46
Bruhat cell, 94
canonical form, 24
canonical projection, 65
Cartesian coordinates, 6
Cartesian product, 62
category of vector spaces, 62
Cauchy, 54
Cauchy–Schwarz inequality, 8, 117
Cauchy’s interlacing theorem, 133
Cayley transform, 134
change of coordinates, 34
characteristic equation, 121
characteristic polynomial, 137
Chebyshev, 75
Chebyshev polynomials, 75
classification theorem, 24
codimension, 80
cofactor, 49
cofactor expansion, 49, 52
column space, 86
commutative square, 78
commutativity, 2
commutator, 114
complementary multi-index, 52
complete flag, 92
completing squares, 12
complex conjugate, 17
complex conjugation, 126
complex multiplication, 130
complex sphere, 102
complex vector space, 60
complexification, 126
composition, 33
conic section, 9
conics, 100
conjugate quaternion, 67
coordinate Euclidean space, 128
coordinate flag, 92
coordinate quaternionic space, 68
coordinate space, 32
coordinate system, 3
coordinate vector, 32
coordinates, 3, 73
Cramer’s rule, 51
cross product, 58
cylinder, 100
Dandelin, 10
Dandelin’s spheres, 9
Darboux, 111
Darboux basis, 111
Descartes, 6
determinant, 28, 41
diagonal, 39
diagonalizable matrix, 143
differentiation, 63
dimension, 72
dimension of Bruhat cell, 95
direct sum, 65
directed segment, 1
directrix, 15
discriminant, 20, 116
distance, 117
distributive law, 18, 32
distributivity, 2
dot product, 5, 36
dual basis, 74
dual map, 70
dual space, 63
eccentricity, 15
eigenspace, 121, 137
eigenvalue, 26, 120, 137
eigenvector, 120, 137
elementary product, 41
elementary row operations, 83
ellipse, 10
ellipsoid, 133
equivalent, 23
equivalent conics, 100
equivalent linear maps, 78
Euclidean inner product, 127
Euclidean space, 127
Euclidean structure, 127
evaluation, 32
evaluation map, 63
even form, 110
even permutation, 42
field, 18, 60
field extension, 64
field of p-adic numbers, 113
finite dimensional spaces, 72
flag, 92
focus of ellipse, 10
focus of hyperbola, 14
focus of parabola, 15
Gaussian elimination, 83
Gelfand’s problem, 136
Gram matrix, 134
Gram–Schmidt process, 107, 118
graph, 65
Hamilton–Cayley equation, 146
Hasse, 113
head, 1
Hermite, 103
Hermite polynomials, 75
Hermitian adjoint, 103, 119
Hermitian adjoint matrix, 103
Hermitian anti-symmetric form, 103
Hermitian dot product, 104
Hermitian form, 103
Hermitian inner product, 117
Hermitian isomorphic, 118
Hermitian matrix, 103
Hermitian quadratic form, 103
Hermitian space, 117
Hermitian symmetric form, 103
Hilbert space, 117
Hirani, 15
homogeneity, 5
homogeneous system, 80
homomorphism, 63
homomorphism theorem, 66
hyperbola, 10
hyperplane, 82
hypersurface, 100
identity matrix, 34
identity permutation, 42
imaginary part, 17
imaginary unit, 17
inconsistent system, 85
indices in inversion, 42
induction hypothesis, 73
inertia index, 99
injective, 62
inner product, 5
invariant subspace, 121
inverse matrix, 35
inverse quaternion, 68
inverse transformation, 35
inversion of indices, 42
involution, 126
isometric Hermitian spaces, 118
isomorphic spaces, 63
isomorphism, 63
Jordan canonical form, 142
Jordan cell, 27, 142
Jordan normal form, 142
Jordan system, 27
kernel, 62
kernel of form, 100, 114
Lagrange, 52
Lagrange polynomials, 75
Lagrange’s formula, 52
law of cosines, 6
LDU decomposition, 91
leading coefficient, 84
leading entry, 84
leading minors, 106
left inverse, 82
left kernel, 69
left singular vector, 125
left vector space, 68
length, 117
length of permutation, 42
linear combination, 2
linear form, 32, 63
linear function, 32, 63
linear map, 33, 62
linear subspace, 61
linear transformation, 35
linearity, 32
linearly dependent, 72
linearly independent, 71
Linnaeus, 23
lower triangular, 38, 89
LPU decomposition, 89
LU decomposition, 91
LUP decomposition, 95
Möbius band, 75
mathematical induction, 73
matrix, 31
matrix entry, 31
matrix product, 32, 33
metric space, 117
Minkowski, 113
Minkowski–Hasse theorem, 113
minor, 49, 52
multi-index, 52
multiplication by scalar, 2
multiplication by scalars, 59
multiplicative, 18
multiplicity, 20
nilpotent operator, 139
non-degenerate Hermitian form, 106
non-negative form, 125
nontrivial linear combination, 72
norm of quaternion, 67
normal form, 24
normal operator, 119, 129
null space, 62, 86
odd form, 110
odd permutation, 42
opposite coordinate flag, 93
opposite vector, 59
orthogonal, 118
orthogonal basis, 97
orthogonal complement, 121
orthogonal diagonalization, 124
orthogonal projection, 134
orthogonal projector, 134
orthogonal transformation, 128
orthogonal vectors, 6
orthonormal basis, 107, 118, 127
parabola, 13
partition, 141
Pascal’s triangle, 150
permutation, 41
permutation matrix, 89
pivot, 84
Plücker, 57
Plücker identity, 57
PLU decomposition, 95
polar, 19
polar decomposition, 136
positive definite, 98
positive Hermitian form, 104
positive operator, 135
positivity, 5
power of matrix, 39
principal axes, 14, 133
principal minor, 113
projection, 118
Pythagorean theorem, 6
q-factorial, 95, 156
quadratic curve, 9
quadratic form, 11, 25, 37
quadratic formula, 20
quaternions, 67
quotient space, 65
range, 62
rank, 25, 76
rank of linear system, 80
rank of matrix, 77
real part, 17
real spectral theorem, 129
real vector space, 60
realification, 126
reduced row echelon form, 84
regular nilpotent, 139
right inverse, 82
right kernel, 69
right singular vector, 125
right vector space, 68
ring, 31
root of unity, 21
root space, 138
row echelon form, 84
row echelon form of rank r, 84
row space, 86
scalar, 59, 60
scalar product, 5
semiaxes, 133
semiaxis of ellipse, 13
semilinear function, 103
sesquilinear form, 102
sign of permutation, 41
similarity, 137
similarity transformation, 35
simple problems, 29
singular value, 125
singular value decomposition, 125
skew-field, 67
span, 71
spectral theorem, 120
spectrum, 26, 132
square matrix, 34
square root, 21, 135
standard basis, 33, 71
standard coordinate flag, 92
surjective, 63
Sylvester, 106
symmetric bilinear form, 37
symmetric matrix, 38
symmetric operator, 129
symmetricity, 5
system of linear equations, 38
tail, 1
total anti-symmetry, 44
transition matrix, 35
transposed bilinear form, 36
transposed partition, 141
transposition, 36
transposition matrix, 89
transposition permutation, 42
triangle inequality, 8, 117
unipotent matrix, 91
unit vector, 5
unitary rotation, 123
unitary space, 117
unitary transformation, 119
upper triangular, 38, 89
Vandermonde, 58
Vandermonde’s identity, 58
vector, 1, 59
vector space, 28, 59
vector sum, 2
Vieta, 22
Vieta’s theorem, 22
Young tableaux, 141
zero vector, 2, 59