Mathematics 3463 — Logic and foundations,
Michaelmas 2016
Colm Ó Dúnlaing
April 21, 2017
Sources.
• Mendelson. Introduction to Mathematical Logic.
• Chang and Lee. Symbolic logic and mechanical theorem-proving.
• Davis. Computability and unsolvability.
• Machtey and Young. Introduction to the general theory of algorithms.
• Rogers. Theory of recursive functions and effective computability.
• Boolos. The unprovability of consistency.
• Shoenfield. Mathematical logic.
• Various papers, mentioned when they arise.
1 Turing machines
(1.1) Definition An alphabet is a finite nonempty set Σ. Any finite nonempty set will do, but the elements of Σ are
called symbols (or sometimes letters). Σ∗ is the set of finite sequences of symbols from Σ. They are
called strings over Σ, and they include the empty string, denoted λ.
Thus {0, 1}∗ is the set of bitstrings.
(1.2) Free monoid. Often Σ∗ is called the free monoid (free semigroup with identity) on Σ. The
identity is λ, and the semigroup operation is concatenation. If x and y are strings, then xy is the string
obtained by concatenating x and y in that order.
Turing machines (after Alan Turing) are abstract computers which manipulate strings of symbols.
Traditionally, in mathematical logic, they help us to define the notion of computable function, as
applied to the natural number system.
[Figure 1: Turing machine configuration (tape: 1 B 1 0 1 1). The current symbol is 0 and the current state is q2.]
For mathematical logic, they have another, more basic, but important use: they allow us to define
what we mean by a natural number (nonnegative integer). In fact, we shall give three alternative definitions of natural numbers just as strings of symbols. The first definition relates directly to counting,
the second to arithmetic of binary numbers, and the third uses a bijection between natural numbers
and the set of all bitstrings (distinguishing 10 and 010, for example).
(1.3) Definition A Turing Machine is an abstract computing system with the following components:
• A finite set K of states.
• An input alphabet Σ.
• A tape alphabet Γ. Γ ⊇ Σ ∪ {B} where B is a distinguished blank symbol (B ∉ Σ).
• A distinguished state q0 ∈ K called the initial state
• A set
δ ⊆ K × Γ × Γ × {L,R} × K
of quintuples, which define a partial function on the first two arguments. In other words, δ
cannot contain two different quintuples
pabµq
pab′µ′q′
beginning with the same state and symbol pa.
The system is imagined to describe a computing machine which operates on an infinite tape
divided into squares. It has a read/write head which scans one square at a time. Each square contains
exactly one tape symbol. A configuration of the machine consists of the machine’s
• current state, which belongs to K,
• its tape contents,
• and the current square, i.e., the square being scanned, i.e., the square where the read/write head
is currently positioned. See Figure 1.
The machine executes a computation in a series of steps. Its action at a particular step is determined uniquely, by
pa
where p is its current state and a is the symbol being scanned.
If δ contains no quintuple beginning pa . . . then the machine does nothing; it is in a halting
configuration.
Otherwise, there is a unique quintuple
pabµq
beginning with pa. The machine
• writes b in the current square (overwriting the current symbol),
• moves one square to the left/right if µ = L or R respectively, and
• adopts the state q.
The tape contents are almost entirely blank. That is, a finite part of the tape contains the nonblank
symbols, if any, and the square being scanned.
(1.4) Example. M = (K, Σ, Γ, q0, δ), where K = {q0, q1, q2}, Σ = {1}, Γ = {1, B}. Quintuples:
q0 11Rq0 ,
q0 B1Lq1 ,
q1 11Lq1 ,
q1 BBRq2 .
This machine appends a single 1 to the input, moves to the leftmost nonblank square, and stops.[1]
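The machine of (1.4) is small enough to trace by hand, but a simulator makes it easy to experiment. The following Python sketch is not part of the course material: the names `run` and `nonblank`, the dictionary representation of δ, and the step budget are assumptions made for illustration only.

```python
from collections import defaultdict

BLANK = "B"

def run(delta, tape_input, start="q0", max_steps=10_000):
    """Run a machine given as {(state, symbol): (write, move, new_state)}.

    Returns (state, head, tape) for the halting configuration.
    """
    tape = defaultdict(lambda: BLANK, enumerate(tape_input))
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape[head])
        if key not in delta:          # no quintuple begins with p a: halt
            return state, head, tape
        write, move, state = delta[key]
        tape[head] = write
        head += 1 if move == "R" else -1
    raise RuntimeError("step budget exhausted (the machine may loop)")

def nonblank(tape):
    """Tape contents from the leftmost to the rightmost nonblank square."""
    cells = [i for i, s in tape.items() if s != BLANK]
    if not cells:
        return ""
    return "".join(tape[i] for i in range(min(cells), max(cells) + 1))

# Quintuples p a b mu q of Example (1.4), stored as delta[(p, a)] = (b, mu, q).
SUCC = {
    ("q0", "1"): ("1", "R", "q0"),
    ("q0", "B"): ("1", "L", "q1"),
    ("q1", "1"): ("1", "L", "q1"),
    ("q1", "B"): ("B", "R", "q2"),
}

state, head, tape = run(SUCC, "11")
print(state, nonblank(tape))
```

Storing δ as a dictionary keyed by (state, symbol) builds in the requirement that no two quintuples begin with the same pair p a.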
(1.5) The initial configuration on input x. An input string is a string in Σ∗ . Given an input string
x = a1 . . . an , the initial configuration on input x is where the string x occupies n consecutive tape
squares, the rest of the tape is blank, the current state is the initial state q0 , and the current square is
the square containing a1 (or any square if n = 0, i.e., x is the empty string).
(1.6) Definition Let M = (K, Σ, Γ, q0, δ) be a Turing machine. Since K is a finite nonempty set, one
can view it as a set of symbols. Without loss of generality K and Γ are disjoint sets of symbols; one
can ensure this by renaming the states in K, if necessary.
Note that
Γ∗ \ ({B} · Γ∗)
is the set of strings in Γ∗ not beginning with the blank symbol, and Γ∗ \ (Γ∗ · {B}) is the set of strings not
ending with B. CONFIG_M is a set of strings in (K ∪ Γ)∗:
CONFIG_M = {αpaβ : α ∈ Γ∗ \ ({B} · Γ∗), β ∈ Γ∗ \ (Γ∗ · {B}), p ∈ K, a ∈ Γ}.
In the above definition, the string
αpaβ
encodes the configuration with state p, tape contents αaβ (extended by blanks infinitely in both directions), α left of the current square, a the current symbol, and β right of the current square.
[1] On Monday 26/9/16 I said, wrongly, that it did nothing on empty input.
The string
αaβ
is the shortest string which contains the current symbol and every nonblank symbol currently on the
tape. It encodes the current tape contents.
Given an input string x ∈ Σ∗, the initial configuration on input x is q0 x if x ≠ λ and q0 B if x = λ.
We shall use the notation σ ⊢M σ′ to mean that σ is a configuration and σ′ is obtained by M from
σ in a single step.
The notation is very convenient for describing the actions of M. From the first example (1.4):
q0 11 ⊢M 1q0 1 ⊢M 11q0 B ⊢M 1q1 11 ⊢M q1 111 ⊢M q1 B111 ⊢M q2 111
With the notation, the actions are very easy to define. For example, the quintuple
pabRq
is applied to any configuration of the form αpaβ, or possibly αp if a is the blank symbol, and yields
αbqβ. There are a few more cases to be considered for quintuples pabLq, but it is all quite simple.
(1.7) Lemma If M is a Turing machine with initial state q0 , and x is an input string, then there is a
unique longest sequence σ0 , σ1 , . . . such that σ0 = q0 x, the initial configuration on input x, and for
j = 1, 2, . . . if σj is defined then σj−1 ⊢M σj .
This longest sequence is bounded if and only if it contains a halting configuration, which is the
last in the sequence, and depends uniquely on σ0 and hence on x. (Easy proof omitted.)
(1.8) Definition If the sequence mentioned above is bounded then we say M halts on input x.
A partial function from a set X to a set Y is a function f : W → Y where W ⊆ X. Given x ∈ X,
we say f (x) is defined if x ∈ W .
Given two alphabets Σ and ∆, let f be a partial function from Σ∗ to ∆∗ . A Turing machine
M = (K, Σ, Γ, q0, δ) realises f if ∆ ⊆ Γ\{B} and for any x ∈ Σ∗, M halts on input x if and
only if f(x) is defined, and in this case the unique halting configuration has the form qy where either
f(x) = λ and y = B, or f(x) ≠ λ and y = f(x).
(1.9) Example. Suppose a natural number n is represented by the string 1^n (λ if n = 0). Let
Σ = {1, ‘,’}, so the comma is itself an input symbol. Addition can be realised by a partial function
f : 1^m , 1^n ↦ 1^(m+n).
The domain W of f is 1∗ , 1∗ (i.e., strings in which the comma occurs exactly once). It is realised by
the following Turing machine.
q0 11Rq0     search for comma
q0 ,1Rq1     replace comma by 1
q1 11Rq1     search for blank
q1 BBLq2
q2 1BLq3     erase a 1
q3 11Lq3     move left
q3 BBRq4     and halt
One would prove that it works using some kind of mathematical induction.
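Short of the inductive proof, one can at least check the machine of (1.9) exhaustively for small m and n. The sketch below is an illustration only: `output_of` is a hypothetical helper, and reading the result from the scanned square rightward is an assumption that follows the definition of 'realises' in (1.8).

```python
BLANK = "B"

# Quintuples of Example (1.9), keyed by (state, scanned symbol).
ADD = {
    ("q0", "1"): ("1", "R", "q0"),   # search for comma
    ("q0", ","): ("1", "R", "q1"),   # replace comma by 1
    ("q1", "1"): ("1", "R", "q1"),   # search for blank
    ("q1", "B"): ("B", "L", "q2"),
    ("q2", "1"): ("B", "L", "q3"),   # erase a 1
    ("q3", "1"): ("1", "L", "q3"),   # move left
    ("q3", "B"): ("B", "R", "q4"),   # and halt: q4 has no quintuples
}

def output_of(delta, word, start="q0", max_steps=100_000):
    """Run the machine on `word`; return the nonblank run from the head rightward."""
    tape = dict(enumerate(word))
    state, head = start, 0
    for _ in range(max_steps):
        symbol = tape.get(head, BLANK)
        if (state, symbol) not in delta:
            break                      # halting configuration reached
        write, move, state = delta[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    else:
        raise RuntimeError("step budget exhausted")
    out = []
    while tape.get(head, BLANK) != BLANK:
        out.append(tape[head])
        head += 1
    return "".join(out)

# Check 1^m , 1^n -> 1^(m+n) for all small m and n, including m = 0 and n = 0.
for m in range(6):
    for n in range(6):
        assert output_of(ADD, "1" * m + "," + "1" * n) == "1" * (m + n)
```

A finite check of this kind is evidence, not a proof; the induction is still needed to cover all m and n.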
Obviously a natural number n ∈ N can be represented uniquely by a string of n 1s. We can go
further: we can define N in this way.
(1.10) Definition (N: the ‘tally model’ or ‘Gold standard’).
N = {1}∗ .
A k-tuple (n1, . . . , nk) ∈ N^k is represented as a string over {1, ‘,’}, namely, 1^n1 , . . . , 1^nk. Addition,
multiplication, etcetera, are defined explicitly by Turing machines.
(1.11) Definition Let Σ and ∆ be alphabets. If k = 1 let Σ′ = Σ, else let Σ′ = Σ ∪ {, } where the
comma is a new ‘punctuation’ symbol not in Σ.
A function f : (Σ∗)^k → ∆∗ is computable if there exists a Turing machine which on input
q0 x1 , . . . , xk
(xj ∈ Σ∗) reaches a halting configuration
qi y
where y = f(x1, . . . , xk), or y = B and f(x1, . . . , xk) = λ.
A function f : N^k → N is computable[2] if the corresponding function
(1^m1 , . . . , 1^mk) ↦ 1^f(m1,...,mk)
is computable.
For example, the successor function is computable (§1.4).
Addition: A Turing machine for addition is given in §1.9.
Multiplication is laborious. A Turing machine could execute multiplication as follows.
The input is q0 1^m , 1^n and the machine should halt in a configuration qi 1^(mn).
If the symbol being scanned is a comma (m = 0), erase all nonblank tape symbols and halt.
If the symbol after the comma is blank (n = 0), erase all nonblank tape symbols and halt.
Else move to the leftmost nonblank symbol.
Repeatedly
    Erase leftmost 1
    Scan right past comma
    Repeatedly
        Replace leftmost 1 after comma by x
        Scan right past nonblanks
        Replace first blank by y
        Scan left past ys and 1s to leftmost 1 after comma, if any
    until all 1s after comma have been changed to xs
    Replace all xs after comma by 1s
    Scan leftwards to leftmost nonblank
until leftmost nonblank symbol is comma: erase it.
Scan past all nonblank symbols.
Scan left replacing y by 1,
to leftmost nonblank (which is now a 1) on the tape.
Thus successor, addition, and multiplication are computable.
[2] Traditionally, the word ‘recursive’ was used, but I prefer ‘computable’ since ‘recursive’ has a different meaning in
computer programming.
2 Binary alphabets
(2.1) Partial functions. Given f : Y → Z with Y ⊆ X, we can call f a partial function from X to Z.
(2.2) Multivariate functions. If f : N^k → N is a partial function (k ≥ 1), we can choose bijections
e_k : N^k → N so that computability properties of f are transferred to the univariate partial function
f ◦ e_k^(−1).
To begin with, e_1 is the identity on N. Define e_2 : N × N → N by
\[ e_2 : (i, j) \mapsto \frac{(i + j)(i + j + 1)}{2} + i \]
which is bijective and computable; moreover, its inverse is also computable.
Inductively, for k ≥ 3, define
e_k(x1, . . . , xk) = e_2(e_{k−1}(x1, . . . , x_{k−1}), xk).
Instead of e_k, use angle brackets:
⟨x1, . . . , xk⟩ = (def) e_k(x1, . . . , xk), so if k ≥ 2
⟨x1, . . . , xk⟩ = ⟨⟨x1, . . . , x_{k−1}⟩, xk⟩.
For each k there exists a Turing machine which can convert a k-tuple of comma-separated numbers into a single number. There are also Turing machines to decode single numbers into k-tuples.
It follows (after investigation) that a function f(r1, . . . , rk) is computable if and only if the function g : N → N is computable where
g(⟨r1, . . . , rk⟩) = f(r1, . . . , rk).
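The pairing function and its extension to k-tuples are easy to experiment with. The following Python sketch is an illustration, not part of the notes: the names `e2_inverse`, `encode` and `decode` are assumptions, and the inverse is computed with an integer square root rather than by a Turing machine.

```python
from functools import reduce
from math import isqrt

def e2(i, j):
    """The pairing function (i + j)(i + j + 1)/2 + i."""
    return (i + j) * (i + j + 1) // 2 + i

def e2_inverse(t):
    """Recover (i, j) from t = e2(i, j)."""
    w = (isqrt(8 * t + 1) - 1) // 2     # w = i + j: the largest w with w(w+1)/2 <= t
    i = t - w * (w + 1) // 2
    return i, w - i

def encode(xs):
    """<x1, ..., xk> = <<x1, ..., x(k-1)>, xk>; for k = 1 this is the identity."""
    return reduce(e2, xs)

def decode(t, k):
    """Inverse of encode for tuples of known length k."""
    xs = []
    for _ in range(k - 1):
        t, last = e2_inverse(t)
        xs.append(last)
    xs.append(t)
    return tuple(reversed(xs))

print(encode((3, 1, 4)), decode(encode((3, 1, 4)), 3))
```

Note that `decode` must be told k: the same number decodes to a different tuple for each tuple length.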
Often we use binary alphabets. For example, we could represent numbers by bitstrings, ignoring
leading zeroes.
• Converting between unary (tally) and binary representations is straightforward.
• Hence any function which is computable when the arguments are bitstring representations is
computable when the arguments are unary, and vice-versa.
The difference is efficiency. Binary (or decimal) notation is unnatural: the natural model is the
tally model. One can, of course, produce relatively efficient implementations of successor, and so on,
with binary numbers. Here is a machine computing the successor function.
q0 00Rq0     q1 10Lq1     q2 11Lq2
q0 11Rq0     q1 01Lq2     q2 00Lq2
q0 BBLq1     q1 B1Lq2     q2 BBRq3
Another way of looking at things is to equate the sequence
λ, 1, 11, 111, 1111, 11111, . . .
with
0, 1, 10, 11, 100, 101, . . .
emphasising that the successor function is what matters here.
Computation here is not concerned with efficiency, only with feasibility. It is convenient to devise
a numbering of bitstrings which is bijective. The following sequence does the trick:
λ, 0, 1, 00, 01, 10, 11, 000, . . .
called ranked lexicographical or length-lexicographical order.
A few hours’ (or days’) investigation would lead one to accept the following:
(2.3) Proposition A function f : N^k → N is computable iff it is computable in tally notation, or with
bitstrings in the usual way, or with length-lexicographically ordered bitstrings.
Furthermore, a multivariate function f(r1, . . . , rk) is computable iff the single-argument function
g : ⟨r1, . . . , rk⟩ ↦ f(r1, . . . , rk)
is computable.
Repeat: we recognise three ways of representing the natural numbers N = {0, 1, 2, . . .}.
• Tally notation. The number n is represented by a string of n 1s; so 0 is represented by the
empty string.
• Binary. The numbers 0, 1, 2, . . . are represented in binary:
0, 1, 10, 11, 100, 101, 110, 111, 1000, . . .
• Length-lexicographical. The numbers are represented in length-lexicographical binary form:
number  0  1  2  3   4   5   6   7    8    9    10   11   12   13   14   . . .
string  λ  0  1  00  01  10  11  000  001  010  011  100  101  110  111  . . .
This has the convenience that the correspondence between natural numbers and binary strings
is bijective.
• Whichever number-system is used, by using the given bijection from Nk to N, k-argument
functions can be related to unary functions, and we can confine our attention to unary functions.
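The length-lexicographical numbering has a compact description: the bitstring numbered n is the binary expansion of n + 1 with its leading 1 removed. The sketch below illustrates this; the function names are assumptions, and the computation uses Python's built-in binary conversion rather than a Turing machine.

```python
def to_bitstring(n):
    """Bitstring with length-lex number n: write n + 1 in binary, drop the leading 1."""
    return bin(n + 1)[3:]       # bin() yields '0b1...'; strip '0b' and the leading 1

def to_number(s):
    """Inverse map: restore the leading 1, read as binary, subtract 1."""
    return int("1" + s, 2) - 1

# The empty string is shown as λ.
print([to_bitstring(n) or "λ" for n in range(15)])
```

Because every positive integer has exactly one binary expansion beginning with 1, the two maps are mutually inverse, which is the bijection claimed above.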
3 The halting problem
In this section we concentrate on Turing Machines whose alphabet is {0, 1} — bitstrings.
Any such machine can be specified fully by listing its quintuples on the assumption that q0 is its
initial state.
Furthermore, the quintuples can be encoded as bitstrings: each tape symbol would be represented
by a bitstring.
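Given there are k states and n tape symbols, a state qj can be represented as
qj = 1 0^(j+1) 1
and a tape symbol aj as
aj = 11 0^(j+1) 11
and a quintuple
qi aj ak L qℓ   or   qi aj ak R qℓ
by the bitstring
qi aj ak 0 qℓ   or   qi aj ak 1 qℓ
respectively, where qi, aj, ak, qℓ stand for their bitstring representations.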
If we stipulate that a0 and a1 represent 0 and 1 respectively, a Turing machine can be represented
as a sequence of bitstrings. The symbol 1 occurs at most 4 times in succession, so if we pad the
representation with 5 1s on the right, we have
Q1 Q2 . . . Qn 11111        (3.1)
and in this representation there is the added bonus that no such representation can be part of a longer
representation.
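As a concrete illustration, here is a Python sketch of the encoding. It rests on assumptions that the notes do not fix: the blank B is taken to be the tape symbol a2, the direction bit is taken to be 0 for L and 1 for R, and all function names are hypothetical.

```python
def enc_state(j):
    """q_j  ->  1 0^(j+1) 1"""
    return "1" + "0" * (j + 1) + "1"

def enc_symbol(j):
    """a_j  ->  11 0^(j+1) 11"""
    return "11" + "0" * (j + 1) + "11"

# a0 = 0 and a1 = 1 as stipulated in the notes; B = a2 is an assumption.
SYMBOL_INDEX = {"0": 0, "1": 1, "B": 2}

def enc_quintuple(p, a, b, move, q):
    """Encode q_p a b move q_q, with states given by their indices."""
    direction = "0" if move == "L" else "1"     # assumed: 0 for L, 1 for R
    return (enc_state(p) + enc_symbol(SYMBOL_INDEX[a])
            + enc_symbol(SYMBOL_INDEX[b]) + direction + enc_state(q))

def enc_machine(quintuples):
    """Q1 Q2 ... Qn followed by the padding 11111, as in (3.1)."""
    return "".join(enc_quintuple(*t) for t in quintuples) + "11111"

# A one-state machine that moves right forever on every input.
LOOP = [(0, "0", "B", "R", 0), (0, "1", "B", "R", 0), (0, "B", "B", "R", 0)]

code = enc_machine(LOOP)
print(code)
```

Within Q1 . . . Qn the longest run of 1s has length 4, so the five 1s of the padding cannot be confused with any part of a quintuple.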
The same Turing machine has several such representations depending on the order of quintuples
and tape symbols. However
(3.2) Definition Say that a property (or set) of bitstrings is decidable or computable if its characteristic function is computable.
So, it is decidable of a bitstring y whether it encodes a valid Turing machine. In that case, Ty is
the Turing machine it encodes.
TM is the set of bitstrings encoding Turing machines.
We sometimes write ‘Ty exists’ when y is a valid encoding of a Turing machine, i.e., y ∈ TM.
CHANGE in definition. I now think it is preferable to define
Ty
for all y: if y is not a valid encoding of a Turing machine then Ty should be a machine computing the
nowhere-defined function ∅. . .
Σ = {0, 1},
Γ = Σ ∪ {B}, K = {q0 },
δ:
q0 0BRq0
q0 1BRq0
q0 BBRq0
(3.3) Proposition Every Turing machine can be encoded in this way.
It should be easy to believe that deciding whether y encodes a valid Turing machine is computable
(that is, there is a Turing machine which, on input y, always halts with the result 1 if y is valid, 0 if y
is invalid).
Also, deciding whether the input x can be factored as yz where y encodes a Turing machine, is
computable.
Because the encoding string y should end in a long string of 1s, there is at most one such factorisation possible.
The Halting Problem is to decide whether a Turing machine halts on a given input.
That is,
HALTING = {yz : Ty exists and halts on input z}.
Note that for any string x, there exists at most one factorisation x = yz (§1.2) such that Ty exists.
This has been ensured by padding (Equation 3.1).
(3.4) Theorem The Halting Problem is unsolvable.
Sketch proof. Otherwise there exists a Turing machine T , processing input bitstrings x, such that if
x ∈ HALTING then T produces the output 1, and otherwise it produces the output 0.
It is possible to construct a related machine T′ so that
• T′ converts its input x to a string xx, then follows the action of T on xx, except
• T′ produces output 0 (on input x) where T produces output 0 (on input xx), and
• T′ loops (on input x) where T produces output 1 on input xx
(that is, where T ends in configuration qi 1, T′ adds a new state qj and the quintuples
qi 11Rqj
qj AARqj
for every tape symbol A).
In other words,
\[ T'(x) = \begin{cases} \uparrow & \text{if } xx \in \text{HALTING} \\ 0 & \text{if } xx \notin \text{HALTING.} \end{cases} \]
T′ is a well-defined Turing machine and it must be on the list. Suppose T′ = Tc. Then
\[ T'(c) = \begin{cases} \uparrow & \text{if } cc \in \text{HALTING} \\ 0 & \text{if } cc \notin \text{HALTING.} \end{cases} \]
Suppose T′(c) ↑. Then cc ∈ HALTING = {yz : . . .}. By the unique factorisation property,
Tc(c) ↓, contradicting T′ = Tc.
Suppose T′(c) ↓. Then cc ∉ HALTING. That is, for no correct factorisation yz of cc . . . does
Ty(z) ↓. But y = z = c is the only correct factorisation of cc, so Tc(c) ↑. Contradiction.
4 Universal Turing machines
In view of the Halting Problem, we cannot always decide (that is, no Turing machine can always
decide) by inspecting its quintuples whether a Turing machine will halt on a given input.
Therefore we generalise the notion of computable function to semicomputable:
(4.1) Definition A function f : D → ∆∗,
whose domain D ⊆ (Σ∗)^k is a set of k-tuples of strings over Σ, is semicomputable if there exists a Turing
machine which computes f for every tuple in D and loops for every tuple outside D.
Textbooks say ‘partial recursive.’
If D = (Σ∗)^k then f is (of course) computable, also called ‘total recursive’ or plain ‘recursive.’
(4.2) Proposition Given our encoding of Turing machines as bitstrings, there exists a Universal
Turing machine U which can ‘simulate’ every Turing machine as follows. On input x (a bitstring)
• If x cannot be factored as yz where Ty exists, then U loops on input x
• If x = yz where Ty exists and halts on input z, then U halts with the same output as Ty
produces on input z.
• If x = yz where Ty exists but loops on input z, U loops on input x.
Proof omitted. U can be constructed explicitly: how much effort is needed is a matter for speculation.
5 Partial computable functions φn; fixed point theorem
5.1 The partial function φn
Every bitstring defines a number. We interpret it in length lex order, so the relation is bijective.
It might be ok now to think of n as a number, in the sense that we can add and subtract, etcetera,
and also of n as a bitstring (under the length-lex order).
When a TM (with binary input) halts, we need to give a rule for interpreting its tape contents as
a number — for example, the square being scanned could be part of a bitstring, and the longest such
bitstring could be interpreted as the result. That’s probably the simplest way to do it (if the symbol in
that square is not 0 or 1, then the bitstring is λ).
Again, if a bitstring y does not encode a Turing machine, we could define Ty by the following
quintuples
q0 00Rq0 q0 11Rq0 q0 BBRq0
Obviously it loops on every input.
Now we can define a list φ_n of partial recursive functions: φ_n(m) is the number computed by T_n on input m.
5.2 Pairing functions
As has been noted in earlier sections, there exist bijections
(n1, . . . , nk)  ↔  ⟨n1, . . . , nk⟩.
On the left, we have a k-tuple of natural numbers, on the right a single natural number.
The correspondence is easy to deal with computationally, and it allows us to combine several
arguments into a unique single number.
5.3 Universal Turing Machine
There is a partial recursive function ψ, say, with one argument, such that
\[ \psi(t) = \begin{cases} \varphi_n(m) & \text{if } \varphi_n(m) \downarrow \\ \uparrow & \text{if } \varphi_n(m) \uparrow \end{cases} \]
where t = ⟨n, m⟩. More succinctly,
ψ(⟨n, m⟩) = φ_n(m)
This is not the same as before, because it uses an arithmetic pairing function rather than concatenation of strings, but it is true.
5.4 The fixpoint theorem
(5.1) Theorem (fixed point theorem or maybe recursion theorem) Let f : N → N be any (total)
recursive function. Then there exists an index n such that φ_{f(n)} = φ_n.
Proof. This is probably the slickest proof of any result in existence. It is easiest to remember in
the following form: suppose n = g(m) where m is another index; then we would need to show
φ_{f◦g(m)} = φ_{g(m)}.
Now imagine that f ◦ g = φ_m:
φ_{φ_m(m)} = φ_{g(m)}.
This leads us to a requirement for g, apart from its being total recursive, which would seem reasonable:
φ_{φ_m(m)} = φ_{g(m)} if φ_m(m) ↓.
Indeed, there exists a recursive function g(m) with this property: given input m, g(m) is the index
of another Turing machine M:
• If m is not the valid encoding of a TM, then M loops on all inputs.
• Otherwise, on input n, M first attempts to calculate φ_m(m). If this computation halts with
output r, and r is the valid encoding of a Turing machine N, then M imitates N on input n,
otherwise M loops.
In other words, if φ_m(m) ↓ then M evaluates the partial function φ_{φ_m(m)}. The function g(m) is
an encoding of the machine M, and it is a recursive function of m.
Now, given f, let φ_m = f ◦ g, a total recursive function, and n = g(m), so
φ_{f(n)} = φ_{f◦g(m)} = φ_{φ_m(m)} = φ_{g(m)} = φ_n.
Q.E.D.
6 Multitape Turing machines
7 Total is worse than halting
8 Propositional logic, truth tables, and resolution
8.1 Truth tables and propositional connectives
Propositional logic is concerned with truth-functions, functions whose values are the two truth-values 0, 1 (for false and true respectively), and whose arguments are also truth-values.
(8.1) Definition Boolean variables are variables which are restricted to truth-values. A boolean expression, boolean formula, or formula for short, is a correctly formed expression involving boolean
variables and boolean connectives.
Certain truth-functions are well-known.
0 ↦ 1, 1 ↦ 0
is simply negation (not). If X is a boolean variable then ¬X is its negation. Negation can be represented in a truth table as follows.
X   ¬X
0   1
1   0
(0, 0) ↦ 0, (0, 1) ↦ 0, (1, 0) ↦ 0, (1, 1) ↦ 1
is conjunction (and). If X and Y are boolean variables, X ∧ Y represents their conjunction. Here is
the truth table for conjunction.
X   Y   X ∧ Y
0   0   0
0   1   0
1   0   0
1   1   1
It can also be displayed in a table as follows.
X ∧ Y   Y=0  Y=1
X=0      0    0
X=1      0    1
Disjunction (or) is represented X ∨ Y and has the following table.
X ∨ Y   Y=0  Y=1
X=0      0    1
X=1      1    1
(8.2) Definition Two formulae are equivalent if they have the same truth-table.
Implication (if. . . then) is represented X ⇒ Y and has the following table.
X ⇒ Y   Y=0  Y=1
X=0      1    1
X=1      0    1
It is just a way of connecting boolean variables, and in fact X =⇒ Y is equivalent to (¬X) ∨ Y
— the two expressions have the same truth-table. I believe it is called the Philonian conditional.
(8.3) The Philonian conditional is the weakest kind of ‘implication’ which guarantees the following:
If X is true and (X =⇒ Y ) is true then Y is true.
(8.4) The propositional connectives have the following properties, which can be checked by inspecting the truth-tables.
• ∧ and ∨ are commutative and associative.
• ∧ distributes over ∨, that is, X ∧ (Y ∨ Z) and (X ∧ Y ) ∨ (X ∧ Z) are equivalent.
• ∨ distributes over ∧.
• X =⇒ Y is equivalent to (¬X) ∨ Y .
• X ∧ ¬X is always false and X ∨ ¬X is always true.
• ¬¬X and X are equivalent.
• (De Morgan laws.) ¬(X ∧Y ) and (¬X)∨(¬Y ) are equivalent, and ¬(X ∨Y ) and (¬X)∧(¬Y )
are equivalent.
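Checking these properties 'by inspecting the truth-tables' is mechanical, so it can be delegated to a few lines of code. The sketch below is an illustration; the helper `equivalent` and the connective names are assumptions, and implication is defined directly from its table so that the check against (¬X) ∨ Y is not circular.

```python
from itertools import product

def equivalent(f, g, n):
    """True iff the n-ary truth-functions f and g have the same truth-table."""
    return all(f(*v) == g(*v) for v in product((0, 1), repeat=n))

NOT = lambda x: 1 - x
AND = lambda x, y: x & y
OR = lambda x, y: x | y
IMP = lambda x, y: 0 if (x, y) == (1, 0) else 1    # the table of X => Y

checks = {
    "and distributes over or": equivalent(
        lambda x, y, z: AND(x, OR(y, z)),
        lambda x, y, z: OR(AND(x, y), AND(x, z)), 3),
    "or distributes over and": equivalent(
        lambda x, y, z: OR(x, AND(y, z)),
        lambda x, y, z: AND(OR(x, y), OR(x, z)), 3),
    "implication is (not X) or Y": equivalent(
        IMP, lambda x, y: OR(NOT(x), y), 2),
    "double negation": equivalent(lambda x: NOT(NOT(x)), lambda x: x, 1),
    "De Morgan for and": equivalent(
        lambda x, y: NOT(AND(x, y)), lambda x, y: OR(NOT(x), NOT(y)), 2),
    "De Morgan for or": equivalent(
        lambda x, y: NOT(OR(x, y)), lambda x, y: AND(NOT(x), NOT(y)), 2),
}

for name, ok in checks.items():
    print(name, ok)
```

Each check enumerates all 2^n rows, which is exactly the truth-table method and just as expensive.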
(8.5) Conventions about precedence of connectives. Just as with arithmetic expressions, it is
convenient to drop parentheses from boolean expressions.
• ¬ has highest precedence
• =⇒ has lowest precedence
• there is no distinction of precedence between ∧ and ∨
• ∧ and ∨ are evaluated from left to right
• =⇒ is evaluated from right to left
For example
A ∧ B ∧ C ∧ D =⇒ E ∨ F =⇒ G
means
(((A ∧ B) ∧ C) ∧ D) =⇒ ((E ∨ F ) =⇒ G)
If mathematics is about separating the true from the false, then the main problem in propositional
logic is, given a boolean expression, is it always true no matter what the values of its boolean variables? Is it always false?
(8.6) Definition Let F be a boolean formula involving the variables X1, . . . , Xn. An interpretation or
truth-assignment to F is a map X1 ↦ T1, . . . , Xn ↦ Tn, where T1, . . . , Tn is a vector of truth-values.
There are 2^n interpretations of F.
A boolean formula is a tautology if it is true in all interpretations, and it is inconsistent if it is
false in all interpretations.
To begin with, every truth-function can be realised by a boolean expression using only ∧, ∨, ¬.
(8.7) Definition Sometimes we write X̄ for ¬X. We also define the complement of X̄ to be X.
A literal is either a boolean variable X or the negation X̄ of a boolean variable.
A disjunctive normal formula (DNF) is a boolean expression of the form
(L1 ∧ L2 ∧ . . . ∧ Lk ) ∨ (Lk+1 ∧ Lk+2 ∧ . . . Lℓ ) ∨ . . . ∨ (Lr+1 ∧ Lr+2 ∧ . . . Ls )
where L1 , . . . , Ls are literals, not necessarily distinct.
For example
X ∨Y
is a very simple DNF.
(8.8) Lemma Every truth-function f can be realised by a DNF.
Non-proof. Suppose f has the following truth-table:
X1  X2  X3  f
0   0   0   0
0   0   1   1
0   1   0   1
0   1   1   0
1   0   0   0
1   0   1   1
1   1   0   0
1   1   1   0
The DNF is easily formed by picking out the rows where the f-value is 1.
(X̄1 ∧ X̄2 ∧ X3) ∨ (X̄1 ∧ X2 ∧ X̄3) ∨ (X1 ∧ X̄2 ∧ X3)
This will break down when f is everywhere zero (false). In that case, use the single conjunct
X1 ∧ X̄1.
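The construction in the non-proof can be written out directly. In the Python sketch below, which is an illustration only, a literal is a pair (index, polarity) and a DNF is a list of conjuncts; these representations and the function names are assumptions.

```python
from itertools import product

def dnf(f, n):
    """DNF for the n-ary truth-function f (n >= 1), one conjunct per row where f is 1."""
    rows = [v for v in product((0, 1), repeat=n) if f(*v)]
    if not rows:                        # f is everywhere false: use X1 and not-X1
        return [[(0, 1), (0, 0)]]
    return [[(i, bit) for i, bit in enumerate(v)] for v in rows]

def eval_dnf(terms, v):
    """Value of the DNF under the truth-assignment v (a tuple of 0s and 1s)."""
    return int(any(all(v[i] == pol for i, pol in term) for term in terms))

def show(terms):
    """Render the DNF with ¬ for negated variables."""
    lit = lambda i, pol: f"X{i + 1}" if pol else f"¬X{i + 1}"
    return " ∨ ".join(
        "(" + " ∧ ".join(lit(i, p) for i, p in term) + ")" for term in terms)

# The truth-function tabulated above: f = 1 exactly on the rows 001, 010 and 101.
f = lambda x1, x2, x3: int((x1, x2, x3) in {(0, 0, 1), (0, 1, 0), (1, 0, 1)})
print(show(dnf(f, 3)))
```

The DNF produced this way has one conjunct per satisfying row, so it can be exponentially long in n.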
(8.9) Definition A formula is in conjunctive normal form (CNF) if it is of the form
(L1 ∨ L2 ∨ . . . ∨ Lk ) ∧ (Lk+1 ∨ Lk+2 ∨ . . . Lℓ ) ∧ . . . ∧ (Lr+1 ∨ Lr+2 ∨ . . . Ls )
where L1 , . . . , Ls are literals, not necessarily distinct.
(8.10) Corollary Every truth-function can be realised by a CNF.
Proof. Let D be a DNF realising the negation of f (T1 , . . . , Tn ). The formula ¬D is easily
converted into a CNF using De Morgan’s laws, and it realises f . Q.E.D.
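The proof of the corollary amounts to a simple recipe: each row where f is 0 gives a conjunct of D, and De Morgan's laws turn the negation of that conjunct into a clause with every literal complemented. The sketch below illustrates this; the representation (a literal as an (index, polarity) pair, the empty list standing for the always-true empty conjunction) and the names are assumptions.

```python
from itertools import product

def cnf(f, n):
    """CNF for f: one clause per row where f is 0, with that row's literals complemented."""
    return [[(i, 1 - bit) for i, bit in enumerate(v)]
            for v in product((0, 1), repeat=n) if not f(*v)]

def eval_cnf(clauses, v):
    """Value of the CNF under v: every clause must contain a true literal."""
    return int(all(any(v[i] == pol for i, pol in clause) for clause in clauses))

xor = lambda x, y: x ^ y
clauses = cnf(xor, 2)
print(clauses)
for v in product((0, 1), repeat=2):
    print(v, eval_cnf(clauses, v), xor(*v))
```

For exclusive-or the recipe yields two clauses, one excluding the row (0, 0) and one excluding the row (1, 1).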
8.2 The first goal of mathematical logic
The first goal is to provide methods for proving things which are true.
At present, the ‘things’ are Boolean formulae, and ‘true’ means ‘tautology.’
A certain way of proving something (containing n Boolean variables) true is to check it against
all 2^n interpretations — in other words, build the truth-table.
Resolution (see below) provides a generally more efficient method. It is not always very efficient,
as was shown at different times by Tseitin, Galil, Haken, and Fouks. The P=NP? question makes it
very doubtful whether truly efficient methods exist.
8.3 Resolution proofs and refutations
There is an important proof method called Robinson’s Resolution Principle. It can be applied to a
DNF to test for a tautology and to a CNF to test for inconsistency (it is easy, but not very useful, to
test a CNF for tautology or a DNF for inconsistency). The method is essentially the same for each.
We consider testing a CNF for inconsistency (i.e., whether it is contradictory).
The subformulae Li ∨ Li+1 ∨ . . . ∨ Lj are called clauses. One regards each clause as a set of
literals. This is acceptable because ∨ is commutative and associative. One also views the CNF as a
set of clauses, and repeatedly adds resolvents to the set of clauses.
Given two clauses C and C′, a resolvent of C and C′ is constructed as follows. It is necessary
that C contains a literal L whose complement L̄ occurs in C′. In this case suppose
C = L1 ∨ . . . ∨ Lk ∨ L
and C′ = L′1 ∨ . . . ∨ L′m ∨ L̄;
then the clause obtained by resolving L and L̄ is
L1 ∨ . . . ∨ Lk ∨ L′1 ∨ . . . ∨ L′m.
It is possible that k = m = 0, in which case the resolvent is not a conventional formula but is called
the empty clause and written □.
Note. We extend the definition of truth-value under a truth-assignment, by saying that a clause
(in a CNF) is true if and only if at least one of the literals in the clause is true.
This extends the definition because □ is automatically false, whatever the interpretation.
It does no harm to regard a CNF as a list of clauses, or even a set of clauses in no particular order.
To construct a Resolution refutation of a CNF F means to start with F (as a list of clauses) and repeatedly add new clauses to the list by resolving clauses already present, until the list contains □.
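A naive way to search for a refutation is to keep adding resolvents until either the empty clause appears or nothing new can be added. The Python sketch below does exactly that. It is an illustration rather than an efficient prover, and its conventions are assumptions: a literal is a string, `~` marks negation, and a clause is a frozenset of literals.

```python
from itertools import combinations

def complement(lit):
    """Complement of a literal: X <-> ~X."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    return {
        (c1 - {lit}) | (c2 - {complement(lit)})
        for lit in c1
        if complement(lit) in c2
    }

def refutable(clauses):
    """Saturate under resolution; True iff the empty clause is generated."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            new |= resolvents(c1, c2)
        if frozenset() in new or frozenset() in known:
            return True
        if new <= known:            # nothing new: saturation without the empty clause
            return False
        known |= new

# A small clause set: P, (not P) or Q, not Q.
print(refutable([{"P"}, {"~P", "Q"}, {"~Q"}]))
# Dropping the last clause leaves a consistent set.
print(refutable([{"P"}, {"~P", "Q"}]))
```

The loop terminates because only finitely many clauses can be built from a finite set of literals.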
For example, Modus Ponens is another ‘inference rule’ (see 8.3):
From X and X =⇒ Y , infer Y .
The following is a kind of justification of Modus Ponens: we show that
X, X̄ ∨ Y, Ȳ
are inconsistent.
X, X̄ ∨ Y, Ȳ
X, X̄ ∨ Y, Ȳ, Y
X, X̄ ∨ Y, Ȳ, Y, □
Or we may present the proof by listing the clauses as they are supplied or generated by resolution.
Given the CNF
A ∨ D,  A ∨ D,  A ∨ B ∨ C,  A ∨ B ∨ C,  B,  D ∨ C,  D ∨ C
here is a resolution refutation (proof of inconsistency).
A ∨ B ∨ C, B ↦ A ∨ C
A ∨ B ∨ C, B ↦ A ∨ C
A ∨ C, C ∨ D ↦ A ∨ D
A ∨ D, A ∨ D ↦ A
A, A ∨ D ↦ D
C ∨ D, D ↦ C
A ∨ C, C ↦ A
A, A ↦ □
8.4 Proof trees
A resolution refutation can be given in a tree-like arrangement as illustrated in Figure 2. If we label
the edges by the literals eliminated, and reverse the direction, and remove the resolvents labelling the
tree nodes, we get a structure as illustrated in Figure 3. The interesting thing about this arrangement
is that for every leaf in the tree, every literal in the clause labelling in the leaf occurs as an edge-label
in the path from the root to that leaf.
This gives a direct way of associating interpretations with input clauses which they contradict. As
an example take the interpretation
A ↦ 1, B ↦ 0, C ↦ 1, D ↦ 0
Choose the path from the root which ‘refutes’ this interpretation: A, B, C, D. This leads to the clause
ABC. Every interpretation is refuted in this way.
[Figure 2: resolution proof tree]
[Figure 3: refutation tree]
In the resolution tree, every time a resolvent is used it must be ‘re-proved.’ This inefficiency can
be avoided if we consider graphs rather than trees. These are illustrated in Figures 4 and 5.
[Figure 4: resolution proof graph]
[Figure 5: refutation graph]
(8.11) Lemma Given a CNF S, if □ can be constructed from the clauses in S using resolution, then
S is false in every interpretation.
Sketch proof. Informally, a refutation tree shows that S is false under every interpretation. A
formal proof is easily produced by induction on the number of boolean variables.
(8.12) Definition Let S be a set of clauses and L a literal.
S\L = (def) {C\{L} : C ∈ S ∧ L̄ ∉ C}.
(8.13) Lemma If S is inconsistent then S\L is inconsistent.
Proof. Let T be a truth-assignment making S\L true. Extend it to all literals
occurring[3] in S by making L false. Then for every clause C in S, if L̄ ∈ C then C is true, and if
L̄ ∉ C then C\{L} is true, so C is again true. Hence S is true under the extended assignment, so S is consistent.
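The operation S\L and the lemma can be tried out on small examples. The sketch below is an illustration under stated assumptions: S\L is read as 'drop every clause containing the complement of L and delete L from the rest', literals are strings with `~` marking negation, and satisfiability is decided by brute force over all interpretations.

```python
from itertools import product

def complement(lit):
    """Complement of a literal: X <-> ~X."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def remove_literal(S, L):
    """S\\L: drop clauses containing the complement of L, delete L from the others."""
    return {c - {L} for c in S if complement(L) not in c}

def satisfiable(S):
    """Brute-force test: is some interpretation making every clause of S true?"""
    variables = sorted({lit.lstrip("~") for c in S for lit in c})
    for values in product((False, True), repeat=len(variables)):
        env = dict(zip(variables, values))
        # A literal is true iff its variable's value differs from its negation flag.
        if all(any(env[l.lstrip("~")] != l.startswith("~") for l in c) for c in S):
            return True
    return False

# An inconsistent set of clauses: X or Y, (not X) or Y, not Y.
S = {frozenset({"X", "Y"}), frozenset({"~X", "Y"}), frozenset({"~Y"})}
for L in ("X", "~X"):
    print(L, sorted(map(sorted, remove_literal(S, L))), satisfiable(remove_literal(S, L)))
```

Forming S\L corresponds to fixing L as false, which is why an assignment satisfying S\L extends to one satisfying S.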
(8.14) Theorem A CNF S is inconsistent if and only if the empty clause is in S or can be generated
from S by resolution.
[3] It was pointed out in class that other literals besides L, L̄ might be lost, that is, S\L may omit some other boolean
variables as well. But if another literal A is lost, then A and Ā can occur only in clauses containing L̄, and they can be
assigned arbitrary truth-values and S will still be satisfied.
Sketch proof. The ‘if’ part has been mentioned already (8.11).
Only if: by induction on n, the number of boolean variables in S. Immediate if n = 0: S = ∅
(consistent) or S = {□} (inconsistent). If n = 1, so that S contains just one boolean variable X, and S is
inconsistent, then either □ ∈ S or X, ¬X ∈ S, and in either case □ can be generated.
Induction: Choose a boolean variable X occurring in S. By the above lemma, S\X and S\¬X are inconsistent. By the inductive
hypothesis □ can be generated both from S\X and from S\¬X.
Take a proof-tree showing □ generated from S\X. For every leaf of the proof tree, if it is labelled
by a clause C in S, leave it untouched; else, it is labelled by a clause C\{X} where X ∈ C and
C ∈ S. Relabel it by C.
Work from the leaves to the root, relabelling nodes. Generally, the new clause labelling a node
will either be the old clause C ′ or a clause C ′ ∪ {X}, and it is a resolvent of the clauses labelling its
children. We obtain a resolution proof tree whose leaves are labelled with clauses from S.
It follows that either no clause was altered and □ has been derived from S, or some clauses were
altered and X has been derived from S.
Similarly, □ or ¬X can be derived from S. Since X and ¬X resolve to □, the proof is complete.
Example. S = {X, Y}, {¬X, Y}, {¬Y}. S\X = {Y}, {¬Y} ↦ □. Thus from S, {X, Y}, {¬Y} ↦ X.
S\¬X = {Y}, {¬Y} ↦ □. Thus from S, {¬X, Y}, {¬Y} ↦ ¬X.
Then X, ¬X ↦ □.
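Theorem 8.14 suggests a brute-force test for inconsistency: keep adding resolvents until either the empty clause appears or nothing new can be added. The following is a minimal sketch of that idea, not an efficient prover. The integer encoding of literals (k for a variable, -k for its complement) and the function names are assumptions made for illustration.

```python
from itertools import combinations

def resolvents(C1, C2):
    """All resolvents of two clauses (frozensets of nonzero integers)."""
    return {(C1 - {L}) | (C2 - {-L}) for L in C1 if -L in C2}

def is_inconsistent(S):
    """Saturate S under resolution and report whether the empty clause appears.
    Terminates because only finitely many clauses exist over the literals of S."""
    clauses = set(S)
    while True:
        if frozenset() in clauses:
            return True
        new = set()
        for C1, C2 in combinations(clauses, 2):
            new |= resolvents(C1, C2)
        if new <= clauses:          # nothing new: saturation reached
            return False
        clauses |= new


# Hypothetical example with X = 1, Y = 2: {X, Y}, {not X, Y}, {not Y}.
S = {frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2})}
```

Saturation can generate exponentially many clauses, so this is only meant to make the statement of the theorem concrete.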
8.5 Converting expressions to CNF and DNF
Given any expression E involving, say, ∧, ∨, ¬, =⇒ , we can produce an equivalent DNF by
• constructing the truth-table for E
• constructing the DNF from the truth-table
Suppose for simplicity that the expression E is fully parenthesised. For example, it cannot be
A =⇒ B ∨ C: it should be A =⇒ (B ∨ C). We can perform the conversion directly as follows.
• Repeatedly replace subformulae (X =⇒ Y ) in E by (¬X ∨ Y ) until all such expressions
have been replaced.
• Repeatedly replace ¬(X ∧ Y ) by (¬X) ∨ (¬Y ), and ¬(X ∨ Y ) by (¬X) ∧ (¬Y ) until all such
expressions have been replaced.
Perform this procedure ‘outside-in,’ that is, if a replaceable expression B occurs within another
A, change A first. This pushes the ¬ connectives to the inside of the expression tree.
• Repeatedly remove ¬¬ until no double negatives remain.
• By this time the expression consists of literals connected by mixtures of ∧ and ∨. Repeatedly,
while there exists a subexpression of the form (X∨Y )∧Z or X∧(Y ∨Z), find an innermost such
sub-expression, i.e., one where X, Y, Z do not need to be fixed, and replace it by (X ∧ Z) ∨ (Y ∧ Z)
or (X ∧ Y ) ∨ (X ∧ Z), respectively.
This description could be expanded into a formal proof of the following result.
(8.15) Lemma Any expression E involving =⇒ , ¬, ∨, ∧, can be converted to a DNF by transformations which apply only the rules presented in paragraph 8.4.
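The conversion steps can be sketched as three recursive passes. This is an illustrative sketch only: the tuple representation of expressions and all function names are assumptions, and the second pass merges the De Morgan step with the removal of double negations rather than performing them separately as in the description above.

```python
from itertools import product

# Assumed representation: a string is a boolean variable; otherwise a tuple
# ('not', X), ('and', X, Y), ('or', X, Y) or ('imp', X, Y).

def elim_imp(e):
    """Replace every (X => Y) by ((not X) or Y)."""
    if isinstance(e, str):
        return e
    op, *args = e
    args = [elim_imp(a) for a in args]
    if op == 'imp':
        return ('or', ('not', args[0]), args[1])
    return (op, *args)

def push_not(e):
    """Outside-in De Morgan, removing double negations on the way (no 'imp' left)."""
    if isinstance(e, str):
        return e
    op, *args = e
    if op == 'not' and not isinstance(args[0], str):
        inner, *sub = args[0]
        if inner == 'not':
            return push_not(sub[0])
        dual = 'or' if inner == 'and' else 'and'
        return (dual, push_not(('not', sub[0])), push_not(('not', sub[1])))
    return (op, *[push_not(a) for a in args])

def distribute(e):
    """Distribute 'and' over 'or'; the input has negations only on variables."""
    if isinstance(e, str) or e[0] == 'not':
        return e
    op, x, y = e
    x, y = distribute(x), distribute(y)
    if op == 'and':
        if not isinstance(x, str) and x[0] == 'or':
            return ('or', distribute(('and', x[1], y)), distribute(('and', x[2], y)))
        if not isinstance(y, str) and y[0] == 'or':
            return ('or', distribute(('and', x, y[1])), distribute(('and', x, y[2])))
    return (op, x, y)

def to_dnf(e):
    return distribute(push_not(elim_imp(e)))

def evaluate(e, v):
    """Truth-value of e under the assignment v (a dict from names to bools)."""
    if isinstance(e, str):
        return v[e]
    op, *a = e
    if op == 'not':
        return not evaluate(a[0], v)
    if op == 'and':
        return evaluate(a[0], v) and evaluate(a[1], v)
    if op == 'or':
        return evaluate(a[0], v) or evaluate(a[1], v)
    return (not evaluate(a[0], v)) or evaluate(a[1], v)

def is_conjunction(e):
    return (isinstance(e, str) or (e[0] == 'not' and isinstance(e[1], str))
            or (e[0] == 'and' and is_conjunction(e[1]) and is_conjunction(e[2])))

def is_dnf(e):
    return is_conjunction(e) or (e[0] == 'or' and is_dnf(e[1]) and is_dnf(e[2]))


# Hypothetical example: (A => B) and (C or not D).
E = ('and', ('imp', 'A', 'B'), ('or', 'C', ('not', 'D')))
E_dnf = to_dnf(E)
```

Comparing evaluate(E, v) with evaluate(E_dnf, v) over all sixteen assignments v is a simple way to confirm that the two expressions are equivalent.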
9 Axiom system for propositional logic
Resolution provides a procedure for verifying contradictions, and hence, tautologies. For Mathematical logic, a ‘generating’ view is taken rather than a verification: axioms are supplied, and tautologies
are verified by being deduced from the axioms. The system is due to Frege (I think): anyway, it is the
system covered in Mendelson’s book.
A logical system involves formulae, axioms, and rules of inference. Resolution is an example of
a rule of inference. Our system for propositional logic uses modus ponens, which is a restricted form
of resolution.
Formulae are built using the two connectives ¬ and =⇒ . Since X ∨Y is equivalent to (¬X) =⇒
Y , and X ∧ Y is equivalent to ¬(X =⇒ ¬Y ), any CNF can easily be translated into a formula
using only these connectives. Therefore the two connectives ¬, =⇒ , are adequate for expressing all
truth-functions.
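The two equivalences used here can be confirmed by running through the four truth-assignments. A minimal check, with implies as a hypothetical helper name:

```python
from itertools import product

def implies(p, q):
    """Truth-function of p => q."""
    return (not p) or q

def equivalences_hold():
    """Check X or Y == (not X) => Y and X and Y == not (X => not Y)."""
    return all((x or y) == implies(not x, y) and
               (x and y) == (not implies(x, not y))
               for x, y in product([False, True], repeat=2))
```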
There are three groups of logical axioms in our system. Each group represents infinitely many
axioms, since A, B, and C can be any formula:
(I) A =⇒ (B =⇒ A)
(II) (A =⇒ (B =⇒ C)) =⇒ ((A =⇒ B) =⇒ (A =⇒ C))
(III) ((¬B) =⇒ (¬A)) =⇒ (((¬B) =⇒ A) =⇒ B)
(9.1) Lemma Every logical axiom is a tautology.
Sketch proof. Easily proved by analysing the truth-tables of each logical axiom.
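The truth-table analysis is mechanical. The truth-value of an instance of an axiom depends only on the truth-values of the formulae A, B, C, so it is enough to run through the eight combinations. A short sketch, with function names chosen for illustration:

```python
from itertools import product

def imp(p, q):
    """Truth-function of p => q."""
    return (not p) or q

def axiom_I(a, b, c):
    return imp(a, imp(b, a))

def axiom_II(a, b, c):
    return imp(imp(a, imp(b, c)), imp(imp(a, b), imp(a, c)))

def axiom_III(a, b, c):
    return imp(imp(not b, not a), imp(imp(not b, a), b))

def is_tautology(f, nvars=3):
    """True if f holds under every assignment of truth-values to its arguments."""
    return all(f(*vals) for vals in product([False, True], repeat=nvars))
```

For contrast, is_tautology(lambda a, b, c: imp(a, b)) is false.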
There is one rule of inference:
Modus ponens (see footnote 4). From A and A =⇒ B, deduce B.
Systems may also include some extra proper axioms.
Supposing that Γ is the set of proper axioms, possibly empty, and Z a formula, a deduction
or proof of Z from Γ in the system is a finite sequence of formulae with justifications, where the
justification of each step A is that either
• A is a logical axiom,
• A is a proper axiom, i.e., A ∈ Γ, or
• A is deduced from two earlier formulae B and B ⇒ A by Modus Ponens.
and Z occurs in one of the steps of the proof. We write
Γ⊢Z
when Z can be deduced from Γ, and
⊢Z
when ∅ ⊢ Z. In this case, i.e., Γ = ∅, Z is called a theorem (of SC).
Footnote 4. This is a restricted kind of resolution.
(9.2) Definition A system of the above kind, with logical axioms I–III and Modus ponens, is called a
sentential calculus. When there are no proper axioms, we call the system a pure sentential calculus.
(9.3) Lemma Suppose Γ ⊢ Z, a particular proof being given. Let I be an interpretation of all the
Boolean variables occurring in Γ and in the formulae occurring in the proof. Suppose
I(A) = 1 for all A ∈ Γ.
Then I(Z) = 1. In particular if ⊢ Z then Z is a tautology.
Proof. (By induction on the length of the given proof.) If Z is a proper axiom then I(Z) = 1. If
Z is a logical axiom then it is a tautology (this is easily checked with truth-tables), so I(Z) = 1. If Z
is deduced from earlier formulae A and A ⇒ Z, then I(A) = 1 and I(A =⇒ Z) = 1, so I(Z) = 1
also. Q.E.D.
Now to prove our first theorem within the system.
(9.4) Lemma ⊢ A ⇒ A.
Proof. The following is a proof of A =⇒ A.
1. (A ⇒ ((A ⇒ A) ⇒ A)) ⇒ ((A ⇒ (A ⇒ A)) ⇒ (A ⇒ A)) (Axioms II).
2. A ⇒ ((A ⇒ A) ⇒ A) (Axioms I).
3. (A ⇒ (A ⇒ A)) ⇒ (A ⇒ A) (1,2, MP).
4. A ⇒ (A ⇒ A) (Axioms I).
5. A ⇒ A (3,4, MP). Q.E.D.
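A proof in the system is a finite object that can be checked mechanically. The sketch below checks the five-step proof of A =⇒ A. The representation is an assumption made for illustration: formulae are nested tuples, the axiom schemes are written with metavariables '?A', '?B', '?C', and a justification is 'axiom', 'given', or ('mp', i, j), meaning that step i is some X and step j is X =⇒ (the current formula), with steps numbered from 1.

```python
def imp(x, y):
    return ('imp', x, y)

def neg(x):
    return ('not', x)

# The three axiom schemes; in a scheme every string is a metavariable.
SCHEMES = [
    imp('?A', imp('?B', '?A')),
    imp(imp('?A', imp('?B', '?C')), imp(imp('?A', '?B'), imp('?A', '?C'))),
    imp(imp(neg('?B'), neg('?A')), imp(imp(neg('?B'), '?A'), '?B')),
]

def match(pattern, formula, binding):
    """Try to instantiate the metavariables of pattern so that it equals formula."""
    if isinstance(pattern, str):
        if pattern in binding:
            return binding[pattern] == formula
        binding[pattern] = formula
        return True
    return (isinstance(formula, tuple) and len(formula) == len(pattern)
            and formula[0] == pattern[0]
            and all(match(p, f, binding) for p, f in zip(pattern[1:], formula[1:])))

def is_axiom(formula):
    return any(match(scheme, formula, {}) for scheme in SCHEMES)

def check_proof(steps, gamma=()):
    """steps is a list of (formula, justification) pairs; gamma holds the proper axioms."""
    for n, (f, just) in enumerate(steps):      # n steps precede the current one
        if just == 'axiom':
            ok = is_axiom(f)
        elif just == 'given':
            ok = f in gamma
        else:
            _, i, j = just
            ok = (1 <= i <= n and 1 <= j <= n
                  and steps[j - 1][0] == imp(steps[i - 1][0], f))
        if not ok:
            return False
    return True


A = 'A'
AA = imp(A, A)
steps = [
    (imp(imp(A, imp(AA, A)), imp(imp(A, AA), AA)), 'axiom'),   # Axioms II
    (imp(A, imp(AA, A)), 'axiom'),                             # Axioms I
    (imp(imp(A, AA), AA), ('mp', 2, 1)),
    (imp(A, AA), 'axiom'),                                     # Axioms I
    (AA, ('mp', 4, 3)),
]
```

Here check_proof(steps) succeeds, whereas presenting A =⇒ A directly as an axiom fails, since it is not an instance of any of the three schemes.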
(9.5) Corollary (¬A =⇒ A) ⊢ A
Proof.
1. ¬A =⇒ A given
2. ¬A =⇒ ¬A Lemma 9.4
3. A (2,1,III,MP twice) Q.E.D.
(9.6) In mathematical proofs, in order to prove A ⇒ B, it is customary to assume A and deduce B.
In fact, this is almost the invariable practice. The following simple yet very important result shows
that the practice is just a convenient short-cut.
(9.7) Theorem (the Deduction Theorem for Sentential Calculus).
If Γ, A ⊢ B then Γ ⊢ A ⇒ B.
Proof. By induction on the length of proofs. In proofs of length 1 either (i) B = A, (ii) B ∈ Γ,
or (iii) B is a logical axiom.
In case (i) Γ ⊢ A ⇒ B by Lemma 9.4.
In cases (ii) and (iii), Γ ⊢ B, and Γ ⊢ B ⇒ (A ⇒ B) (Axioms I), so Γ ⊢ A ⇒ B by MP.
For the inductive step, suppose that B is the formula given in the n + 1st step of a proof. If B is
justified under cases (i)–(iii) above, the same arguments apply. Otherwise (iv) B arises from using
MP from two previous formulae in the proof, so Γ, A ⊢ C and Γ, A ⊢ C ⇒ B in proofs of length ≤ n. By
induction
Γ ⊢ A ⇒ (C ⇒ B)
and
Γ ⊢ A ⇒ C.
Since
Γ ⊢ (A ⇒ (C ⇒ B)) ⇒ ((A ⇒ C) ⇒ (A ⇒ B))
(Axioms II), Γ ⊢ A ⇒ B by two applications of MP. This completes the inductive step. Q.E.D.
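The proof of the Deduction Theorem is constructive: it describes how to rewrite a deduction of B from Γ, A into a deduction of A ⇒ B from Γ, step by step. The following sketch carries out that rewriting. The representation is assumed for illustration: formulae are nested tuples, steps are numbered from 0, and a justification is 'axiom', 'given' (a member of Γ), 'hyp' (the formula A itself), or ('mp', i, j), meaning that step j is (step i) ⇒ (current formula). The justification 'axiom' in the output is a claim, not something this sketch verifies.

```python
def imp(x, y):
    return ('imp', x, y)

def deduction(steps, A):
    """Turn a proof from Γ together with A into a proof from Γ alone.
    Each line X of the input yields a line A ⇒ X of the output."""
    out = []        # the new proof
    where = {}      # input step index -> output index of the line A ⇒ X

    def emit(formula, just):
        out.append((formula, just))
        return len(out) - 1

    for n, (B, just) in enumerate(steps):
        if just == 'hyp':
            # Case (i): reproduce the five-line proof of A ⇒ A (Lemma 9.4).
            AA = imp(A, A)
            s1 = emit(imp(imp(A, imp(AA, A)), imp(imp(A, AA), AA)), 'axiom')
            s2 = emit(imp(A, imp(AA, A)), 'axiom')
            s3 = emit(imp(imp(A, AA), AA), ('mp', s2, s1))
            s4 = emit(imp(A, AA), 'axiom')
            where[n] = emit(AA, ('mp', s4, s3))
        elif just in ('axiom', 'given'):
            # Cases (ii), (iii): B, then B ⇒ (A ⇒ B) from Axioms I, then MP.
            s1 = emit(B, just)
            s2 = emit(imp(B, imp(A, B)), 'axiom')
            where[n] = emit(imp(A, B), ('mp', s1, s2))
        else:
            # Case (iv): B came from C and C ⇒ B; use Axioms II and MP twice.
            _, i, j = just
            C = steps[i][0]
            s1 = emit(imp(imp(A, imp(C, B)), imp(imp(A, C), imp(A, B))), 'axiom')
            s2 = emit(imp(imp(A, C), imp(A, B)), ('mp', where[j], s1))
            where[n] = emit(imp(A, B), ('mp', where[i], s2))
    return out


# Hypothetical input: C deduced from A, A ⇒ B, B ⇒ C, with A as the hypothesis.
steps = [
    ('A', 'hyp'),
    (imp('A', 'B'), 'given'),
    ('B', ('mp', 0, 1)),
    (imp('B', 'C'), 'given'),
    ('C', ('mp', 2, 3)),
]
proof = deduction(steps, 'A')
```

The last line of proof is A ⇒ C, and no line is justified as a hypothesis. The output is considerably longer than the input, which is one reason the theorem is used as a short-cut rather than applied literally.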
(9.8) Corollary Implication is transitive, i.e.,
A =⇒ B, B =⇒ C ⊢ A =⇒ C
Proof.
1. A hypothesis
2. A =⇒ B given
3. B 1,2,MP
4. B =⇒ C given
5. C 3,4,MP.
Thus, A, A =⇒ B, B =⇒ C ⊢ C, so by the Deduction Theorem, A =⇒ B, B =⇒ C ⊢
A =⇒ C. Q.E.D.
(9.9) Implication and deduction. Suppose A ⊢ B. By the Deduction Theorem, ⊢ A =⇒ B.
Suppose ⊢ A =⇒ B. By Modus Ponens, A ⊢ B. Thus, in Sentential Calculus, if we can prove
A =⇒ B then we can deduce B from A, and vice-versa.
Our aim is
(9.10) Theorem A formula S is a tautology if and only if ⊢ S.
This will be proved by connecting SC with resolution proofs. The main point is that resolution
can be imitated in SC.
(9.11) Lemma ¬¬A ⊢ A
Proof.
1. ¬¬A given
2. ¬A =⇒ ¬¬A 1, I, MP
3. ¬A =⇒ ¬A Lemma 9.4
4. (¬A =⇒ ¬¬A) =⇒ ((¬A =⇒ ¬A) =⇒ A) III
5. A 2,3,4, MP twice. Q.E.D.
(9.12) Lemma (counterpositive). (A =⇒ B) ⊢ (¬B) =⇒ (¬A)
Proof. It is enough to prove A =⇒ B, ¬B ⊢ ¬A.
1. (¬¬A) =⇒ A (Lemma 9.11)
2. A =⇒ B given
3. (¬¬A) =⇒ B 1,2,Transitivity
4. ¬B given
5. (¬¬A) =⇒ ¬B I,4,MP
6. ¬A 5,3,III, MP twice. Q.E.D.
(9.13) Lemma A ⊢ ¬¬A
Proof: exercise.
(9.14) Definition Two formulae B and B ′ in SC are equivalent in SC if B ⊢ B ′ and B ′ ⊢ B.
(9.15) Corollary (subformula substitution). Suppose A, B, B ′ are formulae where B, B ′ are equivalent in SC. Let A′ be the formula obtained from A by replacing some occurrences of B in A by B ′ .
Not all occurrences of B need be replaced by B ′ . Then A and A′ are equivalent in SC.
Sketch proof. The proof is by induction on the length of A. If A is a boolean variable and B
is the same variable, then A = B and A′ = A or A′ = B′. The result holds in this case.
If A is ¬C then A′ is ¬C′ where by induction we can assume that C and C′ are equivalent in SC.
Thus
C ⊢ C′ : ⊢ C =⇒ C′ : ⊢ (¬C′) =⇒ (¬C)
by Lemma 9.12. By symmetry, ⊢ (¬C) =⇒ (¬C′).
If A is (C =⇒ D), then A′ is (C′ =⇒ D′). By induction, ⊢ C′ =⇒ C and
⊢ D =⇒ D′, so assuming A, by transitivity, C′ =⇒ D′: A ⊢ A′. Similarly A′ ⊢ A. Q.E.D.
(9.16) Definition We introduce ∨ and ∧ and define them in terms of ¬ and =⇒ as follows.
(A ∨ B) =(definition) ((¬A) =⇒ B)
(A ∧ B) =(definition) (¬((¬A) ∨ (¬B)))
(9.17) Lemma (i) B ⊢ A ∨ B
(ii) ∨ is commutative, i.e., A ∨ B ⊢ B ∨ A
(iii) A ⊢ A ∨ B
(iv) A ∧ B ⊢ A
(v) A ∧ B ⊢ B
(vi) A, B ⊢ A ∧ B
(vii) ∧ is associative
(viii) ∨ distributes over ∧
(ix) ∧ is commutative and ∨ is associative
(x) ∧ distributes over ∨
Proof. (i) B ⊢ A ∨ B from I and MP.
(ii) Suppose A ∨ B, i.e., (¬A) =⇒ B.
1. (¬A) =⇒ B given
2. (¬B) =⇒ ¬¬A (Lemma 9.12)
3. (¬B) =⇒ A (2, Lemma 9.11, transitivity)
i.e. B ∨ A as required.
(iii) Immediate from (i) and (ii).
(iv) Suppose A ∧ B, i.e., ¬((¬A) ∨ (¬B)).
1. (¬A) =⇒ ((¬A) ∨ (¬B)) (from (iii)).
2. (¬((¬A) ∨ (¬B))) =⇒ ¬¬A (1, Lemma 9.12), i.e. A ∧ B =⇒ ¬¬A.
3. ¬¬A =⇒ A (Lemma 9.11).
By transitivity, A ∧ B =⇒ A, which is equivalent to (iv).
(v) Similarly, using commutativity of ∨.
(vi) Let X be ¬¬A =⇒ ¬B, so ¬X is identical to A ∧ B.
1. A (given)
2. X (hyp)
3. ¬¬A (1, Lemma 9.13)
4. ¬B (3,2,MP)
5. X =⇒ ¬B (1–4, DT, discharging the hypothesis X)
6. ¬¬B =⇒ ¬X (5, Lemma 9.12)
7. B (given)
8. ¬¬B (7, Lemma 9.13)
9. ¬X (8,6,MP) as required.
(vii – ix): exercises.
(x) can be deduced from (viii), proof omitted. Q.E.D.
(9.18) Lemma (resolution valid in SC). A ∨ L, B ∨ ¬L ⊢ A ∨ B. (Here A or B can be empty, but
not both).
Proof.
1. ¬A hyp
2. A ∨ L given, i.e. (¬A) =⇒ L
3. L 1,2,MP
4. B ∨ ¬L given
5. (¬L) ∨ B, i.e., L =⇒ B (∨ commutative)
6. B 3,5,MP.
In other words, A ∨ L, B ∨ ¬L, ¬A ⊢ B, so
A ∨ L, B ∨ ¬L ⊢ ¬A =⇒ B (DT). In other words, A ∨ L, B ∨ ¬L ⊢ A ∨ B. Q.E.D.
Similarly,
(9.19) Lemma (dual resolution valid in SC). A ∧ B ⊢ (A ∧ L) ∨ (B ∧ ¬L).
(9.20) Corollary If X, a formula of SC, is a tautology, then it is a theorem of SC.
Proof. By applying valid transformations to ¬X, we can get a CNF Z ′ which is equivalent to ¬X
in SC. This has the form
C1 ∧ C2 ∧ . . . ∧ CN
(Parentheses are unnecessary because ∧ is associative.) From this, the separate clauses C1 , C2 , . . . , CN
can be deduced in SC. Since resolution is valid in SC (Lemma 9.18), every step in a resolution refutation, except the last, can be simulated in SC. The last step can’t be simulated, since □ is outside
the scope of SC; for this reason also, none of the clauses Cj can be empty, so there is a final step
involving resolution.
C1 , . . . , CN ⊢ L and C1 , . . . , CN ⊢ ¬L
for some literal L. Therefore
Z′ ⊢ L
so
⊢ Z ′ =⇒ L
by the Deduction Theorem. Since Z ′ and ¬X are equivalent,
⊢ ¬X =⇒ L.
Similarly
⊢ ¬X =⇒ ¬L.
Using the instance of Axioms III
(¬X =⇒ ¬L) =⇒ ((¬X =⇒ L) =⇒ X),
and MP twice, we have completed a proof of X in SC. Q.E.D.
10 First-order languages
We have seen zero-order logic which is concerned with formulae built from boolean variables.
First-order logic is infinitely more expressive. It allows for functions, and in place of boolean
variables we have infinitely many atomic formulae. There is a notion of interpretation, more elaborate
than the straightforward truth-assignment of zero-order logic.
First-order theories include Frege’s three axiom schemes in Propositional logic, plus two more
schemes. Note: these schemes are subject to certain conditions.
IV (∀xi A(xi )) =⇒ A(t)
V (∀xi (A ⇒ B)) =⇒ (A ⇒ ∀xi B)
Exceptions.
∀x1 ∃x2 (x1 ≠ x2 ) does not necessarily imply ∃x2 (x2 ≠ x2 ),
∀x1 (x1 = 0 ⇒ x1 = 0) does not imply x1 = 0 ⇒ ∀x1 (x1 = 0).
Recall that Sentential Calculus enables proof of all tautologies, formulae which are true under all
interpretations. The system we aim towards will enable proof of all logically valid formulae, formulae
which are true under all interpretations.
The order of development is as follows.
• Introduce first-order languages in which the formulae are constructed.
• Introduce interpretations of first-order languages, and define when a formula is true under an
interpretation, and under all interpretations (logical validity).
• Show that formulae (IV) and (V), with important qualifications, are logically valid.
• Introduce first-order theories.
Our study of zero-order logic is a prelude to the much more important first-order logic. First-order
logic is suitable for formalising most mathematical theories, such as Peano arithmetic, Zermelo-Fraenkel set theory, theory of groups, and so on. For example, here is a set of axioms for elementary
group theory:
• Equality is reflexive, symmetric, and transitive, and substitutions of equals yields equals.
• Multiplication is associative.
• e is a 2-sided identity.
• Every element has a 2-sided inverse.
These axioms must be expressed in a formal language. A first-order language is built from the
following elements:
• A countable sequence x1 , x2 , . . . of individual variables (see footnote 5).
• A set (possibly empty) of individual constants ai .
• A set (possibly empty) of function symbols fi . With each fi is associated its arity, a positive
integer.
• A set (possibly empty) of predicate symbols Pi . With each Pi is associated its arity, a nonnegative integer, possibly zero (see footnote 6).
For group theory we need one constant a1 for the identity, a binary function f1 for multiplication,
a unary function f2 for the inverse, and a binary predicate P1 for equality. The axioms can be expressed as follows. (∀ means ‘for all.’)
∀x1 P1 (x1 , x1 )
∀x1 (∀x2 (P1 (x1 , x2 ) ⇒ P1 (x2 , x1 )))
∀x1 (∀x2 (∀x3 (P1 (x1 , x2 ) ⇒ (P1 (x2 , x3 ) ⇒ P1 (x1 , x3 )))))
∀x1 (∀x2 (∀x3 (∀x4 (P1 (x1 , x3 ) ⇒ (P1 (x2 , x4 ) ⇒ P1 (f1 (x1 , x2 ), f1 (x3 , x4 )))))))
∀x1 (∀x2 (P1 (x1 , x2 ) ⇒ P1 (f2 (x1 ), f2 (x2 ))))
∀x1 (∀x2 (∀x3 P1 (f1 (x1 , f1 (x2 , x3 )), f1 (f1 (x1 , x2 ), x3 ))))
∀x1 P1 (f1 (a1 , x1 ), x1 )
∀x1 P1 (f1 (x1 , a1 ), x1 )
∀x1 P1 (f1 (x1 , f2 (x1 )), a1 )
∀x1 P1 (f1 (f2 (x1 ), x1 ), a1 )
The formal notation is of course extremely clumsy, and we shall always deal with informal translations which can be converted into formal versions, in principle. The same goes for the logical
connectives: we can use ∨, ∧, ⇐⇒ , ∃ freely on the assumption that the formulae can be converted
into equivalent versions in the formal language. The symbol ∃ means ‘for some’ (‘there exists’ is the
obvious meaning, but it requires an additional ‘such that’; see footnote 7). It is understood that ∃xi A is equivalent to
¬(∀xi (¬A)).
Footnote 5. The adjective ‘individual’ is immediately forgotten and individual variables and individual constants are called just
variables and constants.
Footnote 6. An interpretation of a 0-ary predicate (rather unlikely to be useful) is true or false without qualification: essentially,
a boolean variable.
Indeed, the relations between ∀ and ∃ are also De Morgan laws, except that for us the first is a
definition (of an abbreviation) and the second follows from other De Morgan laws.
∃xA abbreviates ¬∀x¬A
∀xA is equivalent to ¬∃x¬A
The expressions of the formal language are its terms and formulae. These are built up recursively
as follows.
• The set of terms is the smallest set T such that T contains all variables and constants, and if f
is a k-ary function symbol and t1 , . . . , tk ∈ T then the string of symbols f (t1 , . . . , tk ) is also in
T.
• A term in which no variable occurs is called a ground term.
• The atomic formulae are strings of the form P (t1 , . . . , tk ) where P is a k-ary predicate symbol
and t1 , . . . , tk are terms.
• The set of formulae is the smallest set F containing all atomic formulae with the property that
whenever A, B ∈ F then so are (¬A), (A =⇒ B), and (∀xj A), for any variable xj .
• An expression is either a term or a formula.
Our formalised axioms for group theory are formulae as required above, except that the outermost
parentheses are omitted, being unnecessary. We shall frequently omit parentheses where they seem
unnecessary, and where the result is not obviously ambiguous, without formulating rules to disambiguate it.
(10.1) Definition Let A be a formula.
• If A contains a subformula (∀xr B), then that subformula is the scope of that occurrence of the
quantifier ∀xr .
• All occurrences of xr within that scope are bound.
• A free occurrence of xr is one not occurring within the scope of a quantifier ∀xr .
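The recursive definitions of terms and formulae translate directly into recursive functions. The sketch below computes the set of indices of variables having a free occurrence in a formula. The tuple representation is an assumption made for illustration: ('var', i), ('const', i), ('fn', name, t1, ..., tk) for terms, and ('pred', name, t1, ..., tk), ('not', A), ('imp', A, B), ('all', i, A) for formulae.

```python
def term_vars(t):
    """Indices of the variables occurring in the term t."""
    kind = t[0]
    if kind == 'var':
        return {t[1]}
    if kind == 'const':
        return set()
    return set().union(*(term_vars(a) for a in t[2:]))     # ('fn', name, args...)

def free_vars(A):
    """Indices of the variables with at least one free occurrence in A."""
    kind = A[0]
    if kind == 'pred':
        return set().union(*(term_vars(t) for t in A[2:]))
    if kind == 'not':
        return free_vars(A[1])
    if kind == 'imp':
        return free_vars(A[1]) | free_vars(A[2])
    return free_vars(A[2]) - {A[1]}                         # ('all', i, B)


# Hypothetical example: (∀x1 P(x1, x2)) ⇒ Q(x1).
x1, x2 = ('var', 1), ('var', 2)
example = ('imp', ('all', 1, ('pred', 'P', x1, x2)), ('pred', 'Q', x1))
```

In the example x1 has a bound occurrence inside the quantified subformula and a free occurrence in Q(x1), so free_vars(example) contains both 1 and 2.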
11 Interpretations, quantifiers, substitution
We now consider theories which use the universal quantifier ∀, and also the existential quantifier ∃,
which is just an abbreviation for ¬∀¬.
Footnote 7. It is more consistent with ‘for all’ to read ∃ as ‘for some.’
11.1 Interpretations and models
First we must understand how to interpret such theories. The definition is Tarski’s. Compare this with
propositional logic: an interpretation there is an assignment of truth-values to the boolean variables.
These would be equivalent to 0-ary predicates in first-order logic.
An interpretation I of a first-order language consists of the following:
• A nonempty domain D
• For each constant ai an element a_i^I of D
• For each k-ary function symbol fi a map f_i^I : D^k → D
• For each k-ary predicate symbol Pi a map P_i^I : D^k → {true,false} (rather, {1, 0}), or equivalently (via characteristic functions), a subset of D^k .
For example, if D is the set of invertible 3 × 3 matrices then the natural interpretation, call it
M , would make e^M = I (see footnote 8), and take the product map to matrix product, the inverting map to matrix
inverse, and the equality predicate to equality of matrices.
Such an interpretation can be extended to all ground terms and ground formulas, but not to terms
and formulas with variables. For this we introduce snapshots (a term used only in this course). A
snapshot σ is a countable sequence σ1 , σ2 , . . . of elements of D. It is used purely as a mapping from
the set of variables into D, i.e., x_i ↦ σ_i . We write
x_i^σ
rather than σ(x_i ).
(11.1) Definition If σ and τ are snapshots, and i, j, . . . are indices, we write
σ = τ (eex i, j, . . .),
that is, σ and τ coincide everywhere except i, j, . . ., to mean that for all indices r, if r is not listed as an
exception then σ_r = τ_r , and if it is then σ_r and τ_r may differ, though not necessarily.
Also, if d ∈ D,
σ_{i↦d}
is the snapshot σ′ = σ (eex i) such that
σ′_j = d if j = i, and σ′_j = σ_j if j ≠ i.
Footnote 8. This is not what it seems. Recall that e represents the identity in group theory.
(11.2) Definition (Truth relative to a snapshot.) Given an interpretation I and a snapshot σ, one
can assign a value to every term and formula, with or without variables:
• If ai is a constant, then a_i^σ is a_i^I .
• If xi is a variable, then x_i^σ is σ_i .
• If t = f (t1 , . . . , tn ) is a term, then t^σ is the result of applying the function f^I to the arguments
t_1^σ , . . . , t_n^σ .
• If A = Pi (t1 , . . . , tn ) is an atomic formula, then A^σ is the result of applying the truth-function
P_i^I to the arguments t_1^σ , . . . , t_n^σ .
• If A is ¬B, then A^σ is the complement of B^σ .
• If A is B ⇒ C, then A^σ is true iff either B^σ is false or C^σ is true.
• If A is (∀xi )B, then A^σ is true if and only if for all d ∈ D,
B^{σ_{i↦d}}
is true.
We usually prefer to write
I, σ |= A
to mean that A^σ is true.
The following short definition is crucial to our understanding of first-order logic.
(11.3) Definition (Truth in I.) I |= X iff for all snapshots σ,
I, σ |= X.
From this definition, it would appear that universal quantifiers are unnecessary. No, they are necessary. It is only near the beginning of formulae that quantifiers might be unnecessary. They affect
substitution. Anyway,
(11.4) Corollary I |= X iff I |= ∀xi X.
(11.5) Definition A model of a theory T is an interpretation in which all proper axioms of T are
true.
So an interpretation of group theory could attach any kind of nonsensical meaning to the formulae; a model must conform to the axioms.
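Over a finite domain, Tarski's definition can be executed directly, which makes the difference between an interpretation and a model concrete. The sketch below is an illustration under stated assumptions: formulae are nested tuples, an interpretation is a dictionary with keys 'domain', 'const', 'fn', 'pred', and a snapshot is a dictionary giving values only to the variables that matter. The interpretations Z3 and BAD are hypothetical examples: Z3 reads f1 as addition modulo 3, while BAD reads it as subtraction modulo 3.

```python
from itertools import product

def term_value(t, I, sigma):
    """Value of a term relative to the interpretation I and snapshot sigma."""
    kind = t[0]
    if kind == 'var':
        return sigma[t[1]]
    if kind == 'const':
        return I['const'][t[1]]
    return I['fn'][t[1]](*(term_value(a, I, sigma) for a in t[2:]))

def satisfies(A, I, sigma):
    """I, sigma |= A, following the recursive clauses of the definition."""
    kind = A[0]
    if kind == 'pred':
        return I['pred'][A[1]](*(term_value(t, I, sigma) for t in A[2:]))
    if kind == 'not':
        return not satisfies(A[1], I, sigma)
    if kind == 'imp':
        return (not satisfies(A[1], I, sigma)) or satisfies(A[2], I, sigma)
    # ('all', i, B): position i of the snapshot ranges over the whole domain.
    return all(satisfies(A[2], I, {**sigma, A[1]: d}) for d in I['domain'])

def true_in(A, I, indices=()):
    """I |= A, checking every assignment to the listed variable indices
    (sufficient when they include every variable free in A)."""
    return all(satisfies(A, I, dict(zip(indices, vals)))
               for vals in product(I['domain'], repeat=len(indices)))


def x(i): return ('var', i)
def mul(s, t): return ('fn', 'f1', s, t)
def inv(s): return ('fn', 'f2', s)
def eq(s, t): return ('pred', 'P1', s, t)
def forall(i, A): return ('all', i, A)

e = ('const', 'a1')
associative = forall(1, forall(2, forall(3,
    eq(mul(x(1), mul(x(2), x(3))), mul(mul(x(1), x(2)), x(3))))))
right_inverse = forall(1, eq(mul(x(1), inv(x(1))), e))

Z3 = {'domain': range(3), 'const': {'a1': 0},
      'fn': {'f1': lambda a, b: (a + b) % 3, 'f2': lambda a: (-a) % 3},
      'pred': {'P1': lambda a, b: a == b}}
BAD = dict(Z3, fn={'f1': lambda a, b: (a - b) % 3, 'f2': lambda a: (-a) % 3})
```

Both Z3 and BAD are interpretations of the language, but associativity fails in BAD, so BAD is not a model of group theory.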
When I |= X we say that X is true in I, and when I, σ |= X we say that X is true with respect
to (w.r.t.) I, σ.
The following lemma is easily proved from the above recursive definitions.
(11.6) Lemma Let i, σ, τ satisfy
σ = τ (eex i).
(i) If t is a term in which xi does not occur, then t^σ = t^τ .
(ii) If X is a formula containing no free occurrence of xi , then
I, σ |= X ⇐⇒ I, τ |= X.
11.2 Substitution
(11.7) First, in a quantified formula (∀xi A), one calls all occurrences of xi bound, and none of them
are altered by substitution. That is, the quantifier causes substitutions of xi to be suppressed. Here
are the rules for substitution.
Let y1 , . . . , yr be distinct variables. (That is, there is a selection i(1), . . . , i(r) of indices such that
yj = xi(j) , 1 ≤ j ≤ r.) Also, let t1 , . . . , tr be terms.
Suppose that s is a term. We will write it as
s(y1 , . . . , yr )
to indicate that the variables y1 , . . . , yr probably occur in s (it is not obligatory; nor must they be the
only variables occurring in s). They are listed with a view to substituting.
Definition.
s(t1 , . . . , tr )
is the term got by taking all occurrences in s of all variables yj and simultaneously replacing each yj
by the corresponding tj .
Given a quantifier-free formula A(y1 , . . . , yr ),
A(t1 , . . . , tr )
is the formula got by taking all occurrences in A of all variables yj and simultaneously replacing each
yj by the corresponding tj .
(This is an informal rather than a formal definition, of course. A formal definition would use
some kind of induction.) We concentrate on compound formulae.
Also, we use ~y as an abbreviation for y1 , . . . , yr and ~t as an abbreviation for t1 , . . . , tr .
• If A is (¬B), then A(~t) is (¬B(~t)).
• If A is (B =⇒ C), then A(~t) is (B(~t) =⇒ C(~t)).
• If A is
(∀xi B(~y ))
where xi is not one of the variables yj , then
A(~t)
is
(∀xi B(~t))
• If A is
(∀xi B(~y ))
where xi is one of the variables yj , let
C(y1 , . . . , yj−1 , yj+1 , . . . , yr ) = B(y1 , . . . , yr )
This means that while yj is probably mentioned in C, it is removed from the list to suppress its
substitution. Then
A(t1 , . . . , tr )
is (∀xi C(t1 , . . . , tj−1 , tj+1 , . . . , tr ))
(In the case where r = 1, A(t1 ) and A(xi ) are the same.)
(11.8) Lemma If xi does not occur free in an expression (term or formula) E then E(t) = E.
(11.9) Second, we would expect
(∀xi A(xi )) =⇒ A(t)
to be always true. It isn’t. For example, we would not expect
(∀x1 ∃x2 (x1 ≠ x2 )) =⇒ (∃x2 (x2 ≠ x2 ))
to be true. This leads to the idea of a term t being free for a variable xi in a formula A.
(11.10) Definition t free for xi in A: Given a variable xi , term t, and formula A, if some variable
xj occurs in t where some free occurrence of xi is within the scope of a quantifier ∀xj , then t is not
free for xi in A.
(Put another way: substituting t for xi in A would produce an occurrence of t in which some
variable in t would be bound.)
So (corrected): t is free for xi in A means that no free occurrence of xi is within the scope of a
quantifier ∀xj where xj occurs in t.
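Substitution of a single term and the test for ‘t is free for xi in A’ are both short recursions on the structure of the formula. The sketch below is illustrative only. It assumes the tuple representation ('var', i), ('const', i), ('fn', name, args...) for terms and ('pred', name, args...), ('not', A), ('imp', A, B), ('all', i, A) for formulae, and it writes ∃x2 B as ¬∀x2 ¬B, following the abbreviation in the notes. All function names are hypothetical.

```python
def term_vars(t):
    """Indices of the variables occurring in the term t."""
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'const':
        return set()
    return set().union(*(term_vars(a) for a in t[2:]))

def subst_term(s, i, t):
    """s(t): replace every occurrence of x_i in the term s by t."""
    if s[0] == 'var':
        return t if s[1] == i else s
    if s[0] == 'const':
        return s
    return s[:2] + tuple(subst_term(a, i, t) for a in s[2:])

def subst(A, i, t):
    """A(t): replace the free occurrences of x_i in the formula A by t."""
    kind = A[0]
    if kind == 'pred':
        return A[:2] + tuple(subst_term(a, i, t) for a in A[2:])
    if kind == 'not':
        return ('not', subst(A[1], i, t))
    if kind == 'imp':
        return ('imp', subst(A[1], i, t), subst(A[2], i, t))
    if A[1] == i:
        return A                    # a quantifier on x_i suppresses the substitution
    return ('all', A[1], subst(A[2], i, t))

def occurs_free(A, i):
    kind = A[0]
    if kind == 'pred':
        return any(i in term_vars(a) for a in A[2:])
    if kind == 'not':
        return occurs_free(A[1], i)
    if kind == 'imp':
        return occurs_free(A[1], i) or occurs_free(A[2], i)
    return A[1] != i and occurs_free(A[2], i)

def free_for(t, i, A):
    """True if no free occurrence of x_i in A lies within the scope of a
    quantifier on a variable occurring in t."""
    kind = A[0]
    if kind == 'pred':
        return True
    if kind == 'not':
        return free_for(t, i, A[1])
    if kind == 'imp':
        return free_for(t, i, A[1]) and free_for(t, i, A[2])
    j, B = A[1], A[2]
    if j == i or not occurs_free(B, i):
        return True                 # no free x_i inside this scope
    return j not in term_vars(t) and free_for(t, i, B)


# ∃x2 (x1 ≠ x2), written as ¬∀x2 ¬¬E(x1, x2) with E standing for equality.
x1, x2, x3 = ('var', 1), ('var', 2), ('var', 3)
A = ('not', ('all', 2, ('not', ('not', ('pred', 'E', x1, x2)))))
```

Here x2 is not free for x1 in A, which is exactly the situation of the example above, while x3 is free for x1 in A.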
(11.11) Lemma Let s(xi ), t be terms, σ a snapshot. Let
τ = σ_{i↦t^σ}
Then τ = σ (eex i) and
s^τ = (s(t))^σ .
Proof By induction on the structure of s.
• If s is a constant ar then s(t) is s and s^τ = s(t)^σ = a_r^I .
• If s is xi then s(t) is t, s(t)^σ is t^σ , and s^τ is τ_i = t^σ .
• If s is xj , j ≠ i, then s(t) is s, s(t)^σ = σ_j , and s^τ = τ_j = σ_j .
• If s is fj (t1 (xi ), . . . , tk (xi )), then s(t) is fj (t1 (t), . . . , tk (t)), s(t)^σ is f_j^I (t_1(t)^σ , . . . , t_k(t)^σ ), s^τ
is f_j^I (t_1^τ , . . . , t_k^τ ), and by induction these objects are the same.
(11.12) Lemma Suppose A(xi ) is a formula and t a term,
where t is free for xi in A.
Let σ be any snapshot. Then
I, σ |= A(t)
iff
I, σ_{i↦t^σ} |= A(xi ).
Proof. By induction on the structure of A. The proof is routine except for one case.
• If A is an atomic formula Pj (t1 (xi ), . . . , tk (xi )), then A(t) is Pj (t1 (t), . . . , tk (t)).
I, τ |= A iff P_j^I (t_1^τ , . . . , t_k^τ ) is true.
With τ = σ_{i↦t^σ} , by the above lemma, the right-hand side is P_j^I (t_1(t)^σ , . . . , t_k(t)^σ ), i.e., I, σ |=
A(t).
• (The hardest case.) Suppose that A(xi ) is (∀xj B(xi )) where xi has a free occurrence in B.
By the way, the induction hypothesis will be assumed for every σ.
It is required to show that (i) for every d ∈ D, I, σ_{j↦d} |= B(t) if and only if (ii) for every
d ∈ D, I, τ_{j↦d} |= B(xi ).
Assuming (i), pick any d, and let τ′ = τ_{j↦d} .
Let σ′ = σ_{j↦d} .
Note that
τ′_j = σ′_j = d
τ′_i = τ_i = t^σ
Since xi occurs free in A(xi ) and t is free for xi in that formula, xj does not occur in t; but
σ′ = σ (eex j), so
t^{σ′} = t^σ .
Therefore
τ′_i = t^{σ′} ,
that is, τ′ = σ′_{i↦t^{σ′}} .
Since (i) is assumed,
I, σ′ |= B(t)
so by induction,
I, τ′ |= B(xi ),
as desired.
That (ii) =⇒ (i):
Assume (ii), choose any d ∈ D, and let σ′ = σ_{j↦d} . We need to prove
I, σ′ |= B(t).
Let τ′ = τ_{j↦d} . By similar reasoning, since xj does not occur in t, τ′ = σ′_{i↦t^{σ′}} , so assuming (ii),
I, τ′ |= B(xi ), so by induction, I, σ′ |= B(t), as desired.
• Suppose A is ¬B and the result holds for B. Then A(t) is ¬B(t). Again, τ = σ_{i↦t^σ} .
I, σ |= A(t) iff
B(t) is false in I under σ iff (induction)
B is false in I under τ iff
I, τ |= A.
• Suppose A is B =⇒ C. Again τ = σ_{i↦t^σ} . Then A is false in I, τ iff
B is true and C is false, iff (induction)
B(t) is true and C(t) is false in I, σ, iff
A(t) is false in I, σ.
• Suppose xi does not occur free in A. Then A and A(t) are the same, and
I, σ |= A iff I, τ |= A (Lemma 11.6).
• This holds in particular if A has the form (∀xi B).
(11.13) Corollary If t is free for xi in B(xi ), then (∀xi B(xi )) =⇒ B(t) is true in all interpretations.
Proof. Given an interpretation I, we need to prove that
I |= (∀xi B(xi )) =⇒ B(t).
For each σ, we need to show that either ∀xi B(xi ) is false, or B(t) is true, under I and σ.
We assume the first part is not false, so for every τ = σ (eex i),
I, τ |= B(xi ).
This is true in particular when τ = σ_{i↦t^σ} , and by Lemma 11.12,
I, σ |= B(t)
as required.
(11.14) Lemma If xi does not occur free in B, then
(∀xi (B ⇒ C)) =⇒ (B ⇒ (∀xi C))    (11.15)
is true in all interpretations.
Proof. Let I be an interpretation. If
(∀xi (B ⇒ C))    (11.16)
is false in I, then the implication (11.15) is true. So we assume that (11.16) is true in I, and hence
I, σ_{i↦d} |= B ⇒ C    (11.17)
for every snapshot σ and domain element d.
Suppose B is false under I, σ. Then
I, σ |= B ⇒ ∀xi C
Suppose B is true under I, σ:
I, σ |= B
Then for any d ∈ D, since xi does not occur free in B,
I, σ_{i↦d} |= B
By (11.17),
I, σ_{i↦d} |= C
for every d, so
I, σ |= ∀xi C
so
I, σ |= B ⇒ ∀xi C
in this case also.
12 First-order theories
(12.1) A first-order theory T consists of the following:
• A first-order language L. An expression of T is an expression of L, and an interpretation of
T is an interpretation of L.
• A set of formulae of L, possibly empty, constituting the proper axioms of T ,
• Five groups of logical axioms of T , including the axioms from Sentential Calculus. In the
following axioms, A, B, and C are any formulae in L.
1. (A ⇒ (B ⇒ A)).
2. ((A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C))).
3. (((¬B) ⇒ (¬A)) ⇒ (((¬B) ⇒ A) ⇒ B)).
4. ((∀xi A(xi )) ⇒ A(t)), if t is free for xi in A(xi ).
5. ((∀xi (A ⇒ B)) ⇒ (A ⇒ (∀xi B))), if xi does not occur free in A.
• Two rules of inference: Modus ponens,
A, A ⇒ B ⊢ B,
and Generalisation,
A ⊢ (∀xi A).
(12.2) Deductions, proofs, and theorems of a first-order theory. Let T be a first-order theory and
Γ a set of formulae of T . A proof or deduction from Γ in T is a sequence of steps, each step giving a
formula and a justification. If B is the formula given in a step of the proof, then the justification must
be one of the following.
1. B ∈ Γ,
2. B is a proper or logical axiom of T ,
3. B is deduced from two earlier formulae C and C ⇒ B using Modus Ponens, or
4. B is (∀xi C), C is an earlier formula, and B is deduced from C using Generalisation.
If A is a formula in T , then A is deducible from Γ in T , written
Γ ⊢T A,
if there exists a deduction from Γ in T where A is a formula given in one of the steps of the deduction.
If
∅ ⊢T A
then we also write
⊢T A
and say A is a theorem of T .
(12.3) Lemma (Details omitted to reduce clutter). Every instance of a tautology is provable in T ; it
is also true in every interpretation.
(12.4) Definition Let T be a first-order theory. A model of T is an interpretation I of T with the
property that I |= A for every proper axiom A of T .
(12.5) Lemma Let T be a first-order theory, Γ a set of formulae in T , and I a model of T such that
I |= Γ.
Then every formula deducible from Γ in T is also true in I.
Proof omitted to reduce clutter. Easy induction on proof length.
(12.6) Lemma If t is free for xi in A(xi ) then (i) (∀xi A(xi )) ⊢T A(t), and (ii) A ⊢T A(t).
Proof. For (ii),
1. A(xi ) (Given).
2. (∀xi A(xi )) (1, Gen).
3. (∀xi A(xi )) ⇒ A(t) (Axioms IV).
4. A(t) (2,3, MP).
Omitting (1) we get a proof of (i). Q.E.D.
In particular, with t = xi , we have two valid inferences:
A ⊢ (∀xi A)
and
(∀xi A) ⊢ A.
which seems to imply that A and ∀xi A are equivalent. So they are, in some sense. However, this is
not to say that A ⇒ (∀xi A). Indeed, this last formula need not be logically valid (see footnote 9).
Footnote 9. This has been noted earlier.
(12.7) This is an example of A ⊢T B where A ⇒ B is false in at least one model of T , and hence
A ⇒ B is not a theorem of T . In other words, the Deduction Theorem for zero-order logic does not
always hold in first-order theories.
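A two-element interpretation is enough to see the failure. The sketch below is a hypothetical example, not taken from the notes: the domain is {0, 1}, the unary predicate P1 is interpreted as the subset {0}, and A is the formula P1(x1). Snapshots are represented by dictionaries giving a value only to position 1, which is the only position that matters here.

```python
D = [0, 1]          # domain
P1 = {0}            # P1 interpreted as the subset {0} of D

def A(sigma):
    """P1(x1) relative to a snapshot."""
    return sigma[1] in P1

def forall_A(sigma):
    """∀x1 P1(x1): position 1 ranges over the whole domain."""
    return all(A({**sigma, 1: d}) for d in D)

def A_implies_forall_A(sigma):
    """P1(x1) ⇒ ∀x1 P1(x1) relative to a snapshot."""
    return (not A(sigma)) or forall_A(sigma)

def true_in_I(formula):
    """Truth in the interpretation: true relative to every snapshot."""
    return all(formula({1: d}) for d in D)
```

Relative to the snapshot with x1 ↦ 0 the formula A is true while ∀x1 A is false, so A ⇒ ∀x1 A is not true in this interpretation. By contrast A and ∀x1 A are either both true in the interpretation or both not, in line with Corollary 11.4.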
(12.8) Definition Let P be a deduction from Γ, A in a first-order theory T . An occurrence of B in
one of the proof steps depends on A if either B = A, and the justification is that it is a given formula,
or B is deduced from C, C ⇒ B using Modus Ponens, where C or C ⇒ B depends on A, or B is
deduced from C using Generalisation, where C depends on A.
(12.9) Lemma Suppose Γ, A ⊢ B in a proof where the considered occurrence of B does not depend
on A. Then Γ ⊢T B.
Proof. By induction on the length of proof. If B ∈ Γ or B is an axiom of T then Γ ⊢ B. If B
is deduced from two earlier formulae C, C ⇒ B not depending on A, using Modus Ponens, then by
induction Γ ⊢T C and Γ ⊢ C ⇒ B, so Γ ⊢ B using Modus Ponens. If B is deduced from an earlier
formula C using Generalisation, where C does not depend on A, then Γ ⊢ C by induction, so Γ ⊢ B
by Generalisation. Q.E.D.
(12.10) Theorem (Deduction theorem in first-order theories). Suppose Γ, A ⊢T B with a proof in
which no formula depending on A is subjected to generalisation on a variable occurring free in A.
Then
Γ ⊢T A ⇒ B.
Proof. By induction on the length of proof. If B (more correctly, if the step being considered in
which B is deduced) does not depend on A, then Γ ⊢ B (by Lemma 12.9), so Γ ⊢ A ⇒ B using an
Axiom I and Modus Ponens.
If B = A then ⊢T A ⇒ B (Lemma 9.4), and if B ∈ Γ or B is a logical or proper axiom of T then
Γ ⊢ A =⇒ B using one of the axioms I. If B is deduced from earlier formulae C and C ⇒ B using
Modus Ponens then by induction Γ ⊢ A ⇒ C and Γ ⊢ A ⇒ (C ⇒ B). Also,
⊢T (A ⇒ (C ⇒ B)) ⇒ ((A ⇒ C) ⇒ (A ⇒ B)).
Using two applications of Modus Ponens, Γ ⊢ A ⇒ B.
Finally, if B is deduced by Generalisation from an earlier formula C, where C depends on A, so B is (∀xi C), then
xi does not occur free in A. Therefore
⊢T (∀xi (A ⇒ C)) ⇒ (A ⇒ (∀xi C)).
By induction, Γ ⊢T A ⇒ C, so using Generalisation, Γ ⊢T (∀xi (A ⇒ C)). Therefore Γ ⊢T A ⇒ B
using Modus Ponens. Q.E.D.
(12.11) Derived rules. There are various short-cuts, called derived rules, to abbreviate proofs in
first-order theories (also in zero order). We have already seen the derived rule
A(xi ) ⊢ A(t)
where t is free for xi in A. With the use of the De Morgan formula ∃ ≡ ¬∀¬ and (X ⇒ Y ) ⇒ ((¬Y ) ⇒
(¬X)) (Lemma 9.12), we obtain the next rule, which we call ∃-introduction:
A(t) ⊢ (∃xi A(xi )), (12.12)
assuming t is free for xi in A.
We have also used derived rules A ⊢ B ⇒ A from Axioms I, and similar shortcuts involving
Axioms II and III.
More generally, if a formula A is seen to be an instance of a tautology, then ⊢ A by Corollary
9.20. We might say also that B is a tautological consequence of A if A ⇒ B is an instance of a
tautology, and again say A ⊢ B.
The following lemma is left vague, but spells out how these abbreviations are used.
(12.13) Lemma If A is deduced from Γ in T using an abbreviated proof, or pseudo-proof, incorporating the above-mentioned derived rules, then Γ ⊢T A. That is, the abbreviated proof can be
converted into a correct proof in T .
The next derived rule is more subtle. It is a modified form of the ‘choice rule’ given by Mendelson,
and is common mathematical practice. We call it the ‘Fix xi ’ rule. Suppose a proof reaches a formula
∃xi B(xi ).
To fix xi means to allow the formula B(xi ) to be assumed for the remainder of the proof.
The important thing is that xi be treated as a constant, not a variable, which means that it cannot
be generalised.
If generalisation were not suppressed then the effects would be ridiculous. For example:
1. ∃x1 P1 (x1 ) (Given).
2. P1 (x1 ) (1, fixing x1 ).
3. ∀x1 P1 (x1 ) (2, Gen).
(12.14) Definition The fix xi rule is as follows. If in a proof (of a formula A) a formula
∃xi B(xi )
occurs, we can admit the formula
B
provided that
neither xi nor any (other) variable free in B is generalised in any step depending on the
step in which B is introduced,
and
xi does not occur free in A.
(12.15) Theorem (the fix xi rule or fixing rule). Suppose that an abbreviated proof is given in
which A is deduced from Γ in T using the ‘Fix xi ’ rule (one or more times). Then Γ ⊢T A.
Proof by induction on the length of proofs. It is enough to consider proofs containing just one
occurrence of the fixing rule. Suppose B(xi ) is admitted, fixing xi and all variables free in B. This
is equivalent to a deduction
Γ, B(xi ) ⊢ A.
The restriction on variables in B ensures that the criteria for the Deduction Theorem 12.10 are
satisfied, so
Γ ⊢ B(xi ) ⇒ A.
Applying De Morgan rules to this last formula we deduce
(¬A) ⇒ (¬B(xi ))
where xi is not free in A. Then by generalisation
∀xi ((¬A) ⇒ (¬B(xi ))),
and using Axioms V (since xi is not free in ¬A),
(¬A) ⇒ (∀xi (¬B(xi ))).
This is equivalent to
(∃xi B(xi )) ⇒ A,
and Γ ⊢T ∃xi B(xi ), so Γ ⊢T A. Q.E.D.
(12.16) Definition Given a formula A(xi ) and a variable xj , say that xj can replace xi if xj is free
for xi in A(xi ), and xj does not occur free in A(xi ).
In this case we say that A(xi ) and A(xj ) are similar.
Recall (?) X ⇐⇒ Y is an abbreviation for (X ⇒ Y )∧(Y ⇒ X), which itself is an abbreviation.
(12.17) Lemma If xj can replace xi in A(xi ) then (i) xi can replace xj in A(xj ) (producing the
original formula A(xi )), and (ii) ⊢ (∀xi A(xi )) ⇐⇒ (∀xj A(xj )).
Proof omitted to reduce clutter.
One last result we shall state without proof; it extends Corollary 9.15.
(12.18) Lemma Suppose A, B, B ′ are formulae where
⊢T B ⇐⇒ B ′ .
Let A′ be the formula obtained from A by replacing some occurrences of B in A by B ′ . Not all
occurrences of B need be replaced by B ′ .
Then
⊢T A ⇐⇒ A′ .
13 Prenex normal forms and Skolem functions
(If it hasn’t been mentioned already, ⊢PC means provability in predicate calculus, i.e., in the first-order theory with given language and no proper axioms. Also, ⊢SC means provability in Sentential
Calculus, i.e., axiom schemes I-III and MP.)
As usual, ∃ is an abbreviation for ¬∀¬.
Relabelling quantified variables. Let A(xi ) be a formula, and suppose
• xj is free for xi in A(xi ).
• xj has no free occurrences in A(xi ).
Then A(xi ) and A(xj ) (in the latter, all free occurrences of xi are replaced by xj ) are called similar.
(13.1) Lemma If A(xi ) and A(xj ) are similar, then xi is free for xj in A(xj ), and replacing all free
occurrences of xj by xi in A(xj ), we obtain the original formula A(xi ).
Also,
⊢PC (∀xi A(xi )) ⇐⇒ (∀xj A(xj )).
Proof. If xi were not free for xj in A(xj ), then there would be a free occurrence of xj in A(xj )
within the scope of a quantifier ∀xi ; but all free occurrences of xj in A(xj ) match free occurrences
of xi in A(xi ), which of course cannot occur in the scope of ∀xi .
Since all free occurrences of xi are replaced to yield A(xj ), xi has no free occurrences in A(xj ).
Thus A(xj ) and A(xi ), the latter meaning the result of substituting xi for xj in A(xj ), are similar.
Again, because free occurrences of xj in A(xj ) match free occurrences of xi in (the original)
A(xi ), substituting xi for xj in A(xj ) produces the original A(xi ).
Thus A(xj ) and A(xi ) are similar.
Next, observe ∀xi A(xi ) ⊢PC ∀xj A(xj ) (the proof uses Axiom IV and Generalisation, so it lies in PC rather than SC):
1. ∀xi A(xi ) (Given).
2. (∀xi A(xi )) ⇒ A(xj ) (Axiom IV).
3. A(xj ) (1, 2, MP).
4. ∀xj A(xj ) (3, Gen).
Since the single use of generalisation was on a variable xj not occurring free in ∀xi A(xi ), the Deduction Theorem 12.10 gives
⊢PC (∀xi A(xi )) ⇒ ∀xj A(xj ).
The converse implication follows similarly or by symmetry.
A formula is prenex if all its quantifiers are on the left:
Q1 y1 . . . Qk yk A(y1 , . . . , yk )
where for 1 ≤ i ≤ k, Qi is ∀ or ∃. Possibly k = 0.
(13.2) Theorem For any formula A, there is a formula A′ , where A′ is prenex and provably equivalent
to A, i.e.,
⊢PC A ⇐⇒ A′ .
Given x does not occur free in B, the following equivalences are provable in PC:
(B ⇒ ∀xC) ⇐⇒ ∀x(B ⇒ C)
((∃xC) ⇒ B) ⇐⇒ ∀x(C ⇒ B)
(∀x¬C) ⇐⇒ ¬∃xC
(∃x¬C) ⇐⇒ ¬∀xC
(B ⇒ ∃xC) ⇐⇒ ∃x(B ⇒ C)
((∀xC) ⇒ B) ⇐⇒ ∃x(C ⇒ B). (13.3)
Since these can be used repeatedly (renaming quantified variables where necessary) to move quantifiers
to the left, Theorem 13.2 follows.
13.1 Skolem form
Let T be a theory in which all proper axioms are closed and in prenex normal form. For example,
∃u∀v∀w∃x∀y∃zA(u, v, w, x, y, z)
We can add new function letters and constant letters and replace each existentially-quantified
variable by a term depending on the universally-quantified variables to its left (a constant if it is
leftmost):
A(a, v, w, f (v, w), y, g(v, w, y))
The new constants and functions are called Skolem functions. The procedure is fairly common: for
example, in set theory, the axiom of the empty set
∃x∀y(y ∉ x)
becomes
∀y(y ∉ ∅)
or in the theory of monoids,
∃x∀y(xy = y ∧ yx = y)
becomes
∀y(ye = ey = y)
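The replacement of existential variables by Skolem terms is entirely mechanical. The following is a minimal Python sketch, assuming the quantifier prefix is given as a list of (quantifier, variable) pairs; the function name skolemise and the naming scheme c1, c2, . . . and f1, f2, . . . for the new letters are illustrative choices, not part of the notes.

```python
def skolemise(prefix):
    """Map each existentially quantified variable to a Skolem term.

    prefix: list of (quantifier, variable) pairs read left to right,
            with quantifier "A" (for all) or "E" (exists).
    Returns (substitution, universals): the Skolem term chosen for each
    existential variable, and the universal variables that remain.
    The names c1, c2, ... and f1, f2, ... are hypothetical new letters.
    """
    universals = []      # universally quantified variables seen so far
    substitution = {}    # existential variable -> Skolem term (a string)
    n_consts = n_funcs = 0
    for quantifier, var in prefix:
        if quantifier == "A":
            universals.append(var)
        elif quantifier == "E":
            if universals:
                # depends on every universal variable to its left
                n_funcs += 1
                substitution[var] = f"f{n_funcs}({', '.join(universals)})"
            else:
                # leftmost existential: a new constant
                n_consts += 1
                substitution[var] = f"c{n_consts}"
        else:
            raise ValueError(f"unknown quantifier {quantifier!r}")
    return substitution, universals


# The prefix of the example above: ∃u ∀v ∀w ∃x ∀y ∃z
prefix = [("E", "u"), ("A", "v"), ("A", "w"),
          ("E", "x"), ("A", "y"), ("E", "z")]
subst, remaining = skolemise(prefix)
```

Here subst plays the role of the substitution u ↦ a, x ↦ f (v, w), z ↦ g(v, w, y) in the text, up to the names of the new letters.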
On the other hand, existential quantification seldom defines a function. We could have
∀x∃y(y ≠ x)
which becomes
∀x(f (x) ≠ x),
and f has the appearance of a ‘choice function.’ If the ‘Skolemised’ version of a theory has a model,
it seems reasonable that the theory itself has a model, and the converse holds using the Axiom of
Choice. Actually, the Axiom of Choice is not necessary. We can prove the result with a direct
analysis of the proof procedures.
With the Axiom of Choice, it is easy.
Of course, one should avoid the Axiom of Choice. This axiom can be proved relatively consistent
(Gödel), but the arguments bring in the completeness theorem.
14 Herbrand models and Gödel’s completeness theorem
Gödel’s completeness theorem for first-order logic says that every consistent theory has a model.
If T is a theory and A a formula (in its language), we write
T |= A
to mean that A is true in every model of T .
Recall that a closed formula is one without free variables, and a ground term is a term without
variables.
(14.1) Lemma If T is a theory, A a closed formula, and A cannot be proved in T , then T + ¬A is
consistent.
Proof. If T + ¬A is inconsistent, then there exists a formula X such that X and ¬X can both be proved
in T + ¬A. By the Deduction Theorem (which applies since ¬A is closed),
⊢T (¬A) ⇒ X
⊢T (¬A) ⇒ ¬X
and A can be deduced in T using Axiom III and MP, contrary to hypothesis.
(14.2) Definition Let T be a theory whose axioms are all ‘Skolemised.’ If T has no constants then
extend it by adding a1 as a constant.10
The Herbrand universe of T is the set of all ground terms.
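The Herbrand universe is usually infinite, but it can be generated level by level. A small Python sketch, assuming terms are written as strings and that function letters are supplied with their arities; the name herbrand_universe and the depth cut-off are conveniences of the sketch, not part of the definition.

```python
from itertools import product


def herbrand_universe(constants, functions, depth):
    """Ground terms of nesting depth at most `depth`.

    constants: iterable of constant names.
    functions: dict mapping each function letter to its arity (>= 1).
    If there are no constants, a1 is added, as in Definition 14.2.
    """
    terms = set(constants) or {"a1"}
    for _ in range(depth):
        new_terms = set(terms)
        for name, arity in functions.items():
            # apply the function letter to every tuple of existing terms
            for args in product(sorted(terms), repeat=arity):
                new_terms.add(f"{name}({', '.join(args)})")
        terms = new_terms
    return terms


# One unary function letter s and no constants: a1, s(a1), s(s(a1)), ...
level2 = herbrand_universe([], {"s": 1}, 2)
```

With no function letters the universe is just the (finite) set of constants; a single function letter already makes it infinite.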
We shall need König’s Lemma.
(14.3) Lemma Let T be a tree in which the children of each node are explicitly ordered,11 and every
node has finitely many children.
If T is infinite, then there exists an infinite branch. (Elementary set theory.)
10 One must extend the range of the axioms I–V and one must show that this does not cause inconsistencies.
11 Otherwise the Axiom of Choice is needed.
(14.4) Theorem If T is a ‘Skolemised’ theory in which every finite set of ground instances of the
proper axioms is consistent, then T has a Herbrand model.
Proof. Extend the theory if necessary to include at least one constant; let H be the Herbrand
universe. Let G be the set of ground instances of (proper) axioms of T ; the logical axioms are
tautologies. Let P1 , P2 , . . . be an enumeration of the ground atomic formulae of T . For n = 1, 2, . . .,
there are 2^n truth-assignments
I_j^n : {P1 , . . . , Pn } → {0, 1}, j = 1, . . . , 2^n,
enumerated in some suitable manner. Each truth-assignment corresponds to a formula X_j^n = ±P1 ∧
±P2 ∧ . . . ∧ ±Pn .
Fix n. If for every j,
⊢T ¬X_j^n
then ⊢T ¬X, where X is the disjunction ⋁_j X_j^n .
Choose any proof of ¬X within T , and let C be the conjunction of all the ground instances of the non-logical formulae involved. Then
C ⊢PC ¬X.
By the Deduction Theorem, since C has no variables,
⊢PC C ⇒ ¬X
so ⊢PC X ⇒ ¬C. But X is a tautology, so ¬C is a tautology, so ⊢T ¬C and T would be inconsistent.
Therefore for each n there exists an assignment
±P1 , . . . , ±Pn
which is consistent with T . Apply König’s Lemma to the tree whose nodes carry truth-assignments
±P1 , . . . , ±Pn which are consistent with T and where the assignment at the child or children of a
node extends the assignment at a node.
There is an infinite branch, hence a truth-assignment
±P1 , ±P2 , . . .
to all the ground atomic formulae. This defines a complete interpretation of the language over the
Herbrand Universe. By construction, every ground instance of every proper axiom of T is true in this
interpretation, so it is a model of T .
(14.5) Lemma If T is a first-order theory and T has a model M , then T is consistent.
Sketch Proof. Every axiom of T is true in M . We have seen that MP and Gen preserve truth in
M , so every theorem of T is true in M . But no contradiction can be true in any model.
(14.6) Corollary A (Skolemised) theory is consistent if and only if it has a model.
(14.7) Corollary Every formula which is true in every model of T is a theorem of T .
Proof. Let A be such a formula (a closed formula). If A is not a theorem then T +¬A is consistent
and there exists a model of T in which A is false.
(14.8) Corollary If A is a closed formula which is ‘logically valid,’ meaning true in every interpretation, then A can be proved in Predicate Calculus (in which there are no proper axioms).
(14.9) Corollary Every consistent theory has a countable model.
15 Peano Arithmetic
(15.1) The natural numbers, or nonnegative integers, are denoted N.
N = {0, 1, 2, 3, . . .}.
Zero is a natural number; the rest are positive.
Our formal language for arithmetic has
• One constant which we’ll call 0.
• One unary function, which we’ll write as s, and call the successor function. (Textbooks often
write x′ for the successor of x.)
• Two binary functions, sum and product, for which we’ll use the usual notation x + y and xy.
• One relation, equality. (The inequality relation, x ≠ y, is an abbreviation for ¬(x = y).)
(15.2) Peano’s axioms are as follows:12
Axiom 0 Equality is reflexive, symmetric, and transitive; substitution of equals for equals in arithmetic
expressions yields equal results; substitution of equals for equals in formulae yield equivalent
formulae.
Axiom 1 0 is not a successor, i.e.
∀x1 (s(x1 ) ≠ 0).
Axiom 2 Every other natural number is a successor, i.e.,
∀x1 (x1 ≠ 0 ⇒ ∃x2 (x1 = s(x2 ))).
12 These axioms are different from the original version of Peano’s.
Axiom 3 Cancellation: if x + 1 = y + 1 then x = y.
∀x1 ∀x2 (s(x1 ) = s(x2 ) ⇒ x1 = x2 ).
Axiom 4 Principle of induction: if P (x) is a property of natural numbers x such that P (0) and P (x) ⇒
P (x + 1) for all natural numbers x, then P (x) is true for all natural numbers x.
[P (0) ∧ ∀x(P (x) ⇒ P (s(x)))] ⇒ ∀xP (x)
Addition and multiplication of natural numbers are described by some equations:
Axiom 5 ∀x1 (x1 + 0 = x1 )
Axiom 6 ∀x1 ∀x2 ((x1 + s(x2 )) = s(x1 + x2 ))
Axiom 7 ∀x1 (x1 0 = 0)
Axiom 8 ∀x1 ∀x2 (x1 s(x2 ) = (x1 x2 ) + x1 )
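Axioms 5 to 8 can be read as rewrite rules which evaluate sums and products of ground terms built from 0 and s. A minimal Python sketch under that reading; the representation of s(t) as a tuple and the helper names numeral and value are assumptions of the sketch.

```python
ZERO = "0"


def s(x):
    """The successor term s(x), represented as a tagged tuple."""
    return ("s", x)


def numeral(n):
    """The ground term s(s(...s(0)...)) with n occurrences of s."""
    term = ZERO
    for _ in range(n):
        term = s(term)
    return term


def add(x, y):
    if y == ZERO:            # Axiom 5: x + 0 = x
        return x
    return s(add(x, y[1]))   # Axiom 6: x + s(y') = s(x + y')


def mul(x, y):
    if y == ZERO:                 # Axiom 7: x0 = 0
        return ZERO
    return add(mul(x, y[1]), x)   # Axiom 8: x s(y') = (x y') + x


def value(term):
    """Count the occurrences of s in a term built from 0 and s."""
    n = 0
    while term != ZERO:
        term = term[1]
        n += 1
    return n
```

Each recursive call corresponds to one use of an axiom, so a run of add or mul on two such terms traces out the skeleton of a (much longer) formal proof of the corresponding equation.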
We can deduce a surprisingly strong result about multiplication:
(15.3) Lemma If x and y are nonzero, then so is xy.
Proof. Using Axiom 2, let x = s(u) and y = s(v). Then xy = s(u)s(v) = s(u)v + s(u) =
s(s(u)v + u), using Axioms 8 and 6 respectively. Hence xy ≠ 0 by Axiom 1.
More formally:
(We omit a proof of 21 steps, many of which are abbreviated, appealing to the Fix rule, Deduction
Theorem, properties of equality, etcetera).
This is not, of course, a formal proof in the system, as details have been omitted. Formal proofs
are too long — we shall have to be satisfied with informal proofs.
Indeed, to save time, we shall be satisfied with no proofs.
(15.4) Definition 1 is an abbreviation for s(0).
(15.5) Corollary (to the definition.) x + 1 = s(x).
Proof. x + 1 = x + s(0) = s(x + 0) = s(x), using the definition of 1 and Axioms 6 and 5
respectively. Q.E.D.
Now let us derive some properties of addition and multiplication.
(15.6) Lemma Zero is an additive identity on the left: 0 + x = x.
Proof. By induction. Let P (x) mean 0 + x = x.
The basis of the induction is P (0), which is proved as follows: from Axiom 5, 0 + 0 = 0. This is
just P (0).
P (x) ⇒ P (s(x)): this is proved by assuming P (x) (the inductive hypothesis) and deducing
P (s(x)). Assume P (x), so assume 0 + x = x. From Axiom 6, 0 + s(x) = s(0 + x). Since by
the inductive hypothesis 0 + x = x, 0 + s(x) = s(x), that is, P (s(x)), as required. Q.E.D.
We state some other fairly easy theorems (of PA) without proof.
• x + (y + z) = (x + y) + z
• s(x) + y = x + s(y)
• x + y = y + x
• x + z = y + z ⇒ x = y
• 0x = 0
• s(x)y = xy + y
• x1 = x
• 1x = x
• x(y + z) = (xy) + (xz)
• (x + y)z = (xz) + (yz)
• x(yz) = (xy)z
• xy = yx
(15.7) Definition
x ≤ y or y ≥ x
is an abbreviation, meaning ∃z(x + z = y) (z is a nonnegative integer), and
x < y or y > x
means ∃z ≠ 0 (x + z = y).
(15.8) Definition (there exists a unique . . . )
∃!xi A(xi )
is an abbreviation for
(∃xi A(xi )) ∧ ∀xi ∀xj ((A(xi ) ∧ A(xj )) ⇒ xi = xj )
Also,
∃∗ xi A(xi )
means that there exists at most one xi such that A(xi ).
(15.9) Lemma ∀x(∀y ≥ x)∃!z(x + z = y)
(15.10) Definition (subtraction). If x ≤ y then we write y − x for the unique z such that x + z = y.
Abbreviations: divergence from Mendelson. Mendelson would introduce new predicate letters
for <, ≥, etcetera; we consider ‘x < y’ and so on as abbreviations for formulas truly in the language
of PA.
We have a reasonable definition of subtraction. In Mendelson’s book this would be treated as an
excuse to introduce a new function symbol to the language of PA, to represent this function.
(There is an objection: subtraction is a partial function, not a total function. It should be made a
total function. Usually, one stipulates y − x = 0 if y < x.)
Instead, we regard the minus sign as an abbreviation of some kind. It really boils down to using
the subtraction function in a very restricted way, i.e., in the form
x = y − z,
which is an abbreviation for
(y < z ∧ x = 0) ∨ (y ≥ z ∧ y = x + z).
Since y < z and y ≥ z are themselves abbreviations, this is an abbreviation for a longer formula.
• ≤ is transitive
• Trichotomy: exactly one of the following holds for any x, y:
x < y,
x = y,
x > y.
• If x ≠ 0 and xy = xz then y = z.
• Division algorithm:
∀x(∀y ≠ 0)(∃!q, r)(r < y ∧ x = qy + r).
16 Representability of semicomputable functions
Unlike theories about R, for instance, Peano arithmetic has the property that all the numbers it is
‘thinking about’ have ‘names.’
(16.1) Definition A numeral is a ground term of the form
s(s(. . . (s(0)) . . .))
or, abbreviating,
s^n (0)
where n ≥ 0. We call these ground terms numerals and write
n̄
for s^n (0).
Thus:
The numerals are part, not all, of the Herbrand Universe.
In an odd sort of way, PA (and weaker theories of arithmetic) can be thought of as a calculator, in
the sense that a proof of a result such as
⊢PA 2 + 2 = 4
is a bit like evaluating 2 + 2. Because PA can prove theorems like this, we say that addition is
representable in PA.
The Prolog programming language uses ideas like this.
Nonstandard models. N is not the only model of Peano Arithmetic. From the principle of
induction, there cannot exist infinite descending sequences
t1 > t2 > . . .
(in this case t1 would have to be infinite) yet one can construct nonstandard models which allow such
sequences. The subtle point is that there is no formula of PA describing such a sequence.
(16.2) Definition A function
f : N^k → N
(or a partial function) is representable in Peano Arithmetic (by a formula A(x1 , . . . , xk+1 )) if for any
natural numbers r1 , . . . , rk ,
⊢PA ∃∗ xk+1 A(r1 , . . . , rk , xk+1 )
and if f (r1 , . . . , rk ) is defined then
⊢PA A(r1 , . . . , rk , f (r1 , . . . , rk ))
Logic textbooks don’t pay much attention to partial functions, only to total functions; computability textbooks are different. Notice that a function — i.e., a total function — which is semicomputable
is actually computable. When total functions are considered, ∃∗ can be replaced by ∃!.
Our goal is to prove that
every semicomputable function is representable in PA
(actually, in certain finitely axiomatised subtheories).
17 Recursive and primitive recursive functions
(17.1) Definition A function is primitive recursive (p.r.) if it can be constructed from the following
functions:
• The zero function
Z : n ↦ 0.
• The successor function
s : n ↦ n + 1.
• The projection functions
U_j^k : (n1 , . . . , nk ) ↦ nj (1 ≤ j ≤ k).
by a finite sequence of applications of
• Substitution: given p.r. functions fj : N^m → N, 1 ≤ j ≤ k, and a p.r. function g : N^k → N,
the composite function g(f1 (x⃗), . . . , fk (x⃗)) (from N^m to N) is also p.r.
• Primitive recursion: given a k-ary p.r. function g and a (k + 2)-ary p.r. function h (if k = 0 then g is
a natural number), the (k + 1)-ary function f defined by the recurrence
f (n⃗, 0) = g(n⃗),
f (n⃗, r + 1) = h(n⃗, r, f (n⃗, r))
is also p.r.
(17.2) Proposition Every p.r. function is computable.
(Because the primitive functions are computable, and primitive recursion and composition can be
carried out on a Turing machine.)
(17.3) Lemma Addition and multiplication are primitive recursive.
Proof. For addition,
n + 0 = n
n + (r + 1) = s(n + r),
which is an instance of primitive recursion with
g(n) = n
h(n, r, n + r) = s(n + r).
Similarly for multiplication.
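Definition 17.1 translates almost word for word into higher-order functions. The following Python sketch is one way to do it, assuming ordinary Python integers stand in for natural numbers; the names U, substitute and prim_rec are illustrative. It builds addition and multiplication as in Lemma 17.3.

```python
def Z(n):
    """The zero function."""
    return 0


def S(n):
    """The successor function."""
    return n + 1


def U(j, k):
    """The projection U_j^k (1 <= j <= k)."""
    def proj(*args):
        assert len(args) == k
        return args[j - 1]
    return proj


def substitute(g, *fs):
    """The composite x |-> g(f1(x), ..., fk(x))."""
    def composite(*xs):
        return g(*(f(*xs) for f in fs))
    return composite


def prim_rec(g, h):
    """f(n, 0) = g(n); f(n, r + 1) = h(n, r, f(n, r)).

    For k = 0 this sketch expects g as a zero-argument function
    rather than as a bare natural number.
    """
    def f(*args):
        *ns, r = args
        value = g(*ns)
        for i in range(r):          # unwind the recurrence bottom-up
            value = h(*ns, i, value)
        return value
    return f


# Addition: g(n) = n, h(n, r, v) = s(v).
add = prim_rec(U(1, 1), substitute(S, U(3, 3)))
# Multiplication: g(n) = 0, h(n, r, v) = v + n.
mul = prim_rec(Z, substitute(add, U(3, 3), U(1, 3)))
```

The loop inside prim_rec is a bounded loop whose length is known in advance, which is the intuitive reason every p.r. function is total and computable.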
Ackermann’s function is the best-known example of a function which is recursive but not primitive recursive. A(m, n) is defined as follows:
A(m, n) = n + 1 if m = 0,
A(m, n) = A(m − 1, 1) if m > 0 and n = 0,
A(m, n) = A(m − 1, A(m, n − 1)) if m, n > 0.
Ackermann’s function is computable, but is not p.r. because every p.r. function has growth rate
comparable to some ‘row’ A(m, . . .) of Ackermann’s function (the Grzegorczyk Hierarchy). The
function A(m, m) grows extremely fast.
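The defining equations can be run directly, which makes the growth visible for very small arguments. A direct Python transcription (only a sketch: it is practical for m ≤ 3 and small n, and beyond that the recursion depth and running time get out of hand):

```python
def ackermann(m, n):
    """Ackermann's function A(m, n), straight from the three cases."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))


# The first few diagonal values A(m, m).
diagonal = [ackermann(m, m) for m in range(4)]
```

The recursion is doubly nested in the third case, which is what prevents it from being expressed by the single bounded loop of primitive recursion.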
(17.4) Lemma The Zero function is representable. Likewise for the successor and projection functions.
Proof. Z(n) = 0 is represented by the formula
x1 = x1 ∧ x2 = 0.
I’m not sure why we are obliged to ‘mention’ the variable x1 , but that’s how it’s done in Mendelson.
It is provable (in PA) that for any n there exists at most one x2 such that n = n ∧ x2 = 0, and so on.
The successor function is represented by the formula x1 = x1 ∧ x2 = s(x1 ).
The projection function Ujk is represented by the formula
x1 = x1 ∧ . . . ∧ xk = xk ∧ xk+1 = xj .
Further details are omitted.
(17.5) Lemma If a (partial) function is a composite (i.e., got by substitution) of representable functions, then it is representable.
Partial proof. In the case f (n) = g ◦ h(n). Given that G(x1 , x2 ) represents g and H(x1 , x2 )
represents h, f is represented by the formula
F (x1 , x2 ) : ∃x3 (H(x1 , x3 ) ∧ G(x3 , x2 )).
Primitive recursion is a longer story. Gödel used his so-called β-function β(r, s, t) by means of
which any finite sequence of numbers ti , 0 ≤ i ≤ n, say, can be generated as a sequence β(b, c, i),
0 ≤ i ≤ n. This property of the β-function depends on the Chinese Remainder Theorem. I don’t
think that the CRT can be proved in PA — indeed, it hardly makes sense in PA being a property
of sequences of numbers rather than individual numbers. But any ground instance of the CRT is
provable in PA, which is enough for the ‘representability’ theorem.
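To see the β-function at work on a ground instance, here is a Python sketch using the usual definition β(b, c, i) = b mod (1 + (i + 1)c), which is the form given in Mendelson; the choice c = j! with j = max(n, t0 , . . . , tn ) makes the moduli pairwise coprime, and b is then found by the Chinese Remainder Theorem. The function name encode_sequence is illustrative, and the three-argument pow with exponent −1 needs Python 3.8 or later.

```python
from math import factorial


def beta(b, c, i):
    """Goedel's beta-function: the remainder of b modulo 1 + (i + 1)c."""
    return b % (1 + (i + 1) * c)


def encode_sequence(ts):
    """Find (b, c) with beta(b, c, i) == ts[i] for every index i.

    ts must be a non-empty list of natural numbers.
    """
    n = len(ts) - 1
    c = factorial(max([n] + list(ts)))  # moduli 1 + (i + 1)c are coprime
    b, modulus = 0, 1
    for i, t in enumerate(ts):
        m = 1 + (i + 1) * c
        # Chinese Remainder step: add a multiple of `modulus` to b so
        # that b = t (mod m), keeping all earlier congruences intact.
        k = ((t - b) * pow(modulus, -1, m)) % m
        b += k * modulus
        modulus *= m
    return b, c


ts = [3, 1, 4, 1, 5]
b, c = encode_sequence(ts)
decoded = [beta(b, c, i) for i in range(len(ts))]
```

The pair (b, c) is typically enormous, but that does not matter: all that representability needs is that suitable b and c exist for each particular sequence.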
We could work with another, more intuitive function, if we think of bitstrings. We could, for
example, use the bitstring
1 0^t0 1 1 0^t1 1 . . . 1 0^tn 1
(where 0^t denotes a block of t zeros) to represent the (n + 1)-tuple t0 , . . . , tn , and create a suitable formula of PA for extracting the i-th
number from this list.
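The bitstring alternative is easy to check by machine. A short Python sketch, assuming each number t is written as a block 1, then t zeros, then 1; the function names are illustrative.

```python
def encode_tuple(ts):
    """Write each number t as the block 1 0^t 1 and concatenate."""
    return "".join("1" + "0" * t + "1" for t in ts)


def decode_tuple(bits):
    """Recover the list of numbers from a string made by encode_tuple."""
    ts, pos = [], 0
    while pos < len(bits):
        if bits[pos] != "1":
            raise ValueError("each block must start with 1")
        end = bits.index("1", pos + 1)   # the 1 that closes this block
        ts.append(end - pos - 1)         # number of zeros in between
        pos = end + 1
    return ts


encoded = encode_tuple([2, 0, 3])
```

Decoding is a single left-to-right scan, so extracting the i-th entry needs only bounded searches, which is the kind of operation one can hope to express by a formula of PA.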
(17.6) Lemma If f (n⃗, r) is obtained from representable functions g(n⃗) and h(n⃗, r, f (n⃗, r)) by primitive recursion, then f is also representable.
Sketch proof. Suppose that G and H represent the functions g and h. To evaluate f (n⃗, r), one
would naturally calculate f (n⃗, 0), f (n⃗, 1), and so on. That is, there exists a sequence
x0 = g(n⃗), x1 = h(n⃗, 0, x0 ), x2 = h(n⃗, 1, x1 ), . . . , xr = h(n⃗, r − 1, xr−1 ).
Every sequence x0 , . . . , xr can be generated using Gödel’s β-function or some variant, and the formulae G and H will pin down the unique correct sequence and hence the value of xr = f (n⃗, r).
Let us reconsider the Universal Turing machine U . It is clearly possible to represent a sequence
of configurations of any Turing machine as a sequence of bitstrings. A (partial) computation of a
Turing machine Ty , on input z, could be represented by a sequence
y z C1 C2 C3 . . . Cn
Here C1 represents the initial configuration on input z, and no configuration Cj (represented as a
bitstring) has more than five 1s in succession, so we can interleave blocks of (let us say) eleven 1s and
allow the entire sequence to be represented unambiguously as a single bitstring.
It is decidable of a string of this kind whether C1 , . . . , Cn is a partial computation of Ty on input
z, and if so, whether it is a halting computation, and if so, what the result of the computation is. This
is easy to believe. But more strongly:
(17.7) Proposition There are primitive recursive functions f and g such that on input x, f (x) = 1
if the length-lexicographical bitstring corresponding to x encodes a sequence y, z, C1 , . . . , Cn where
y is a valid TM encoding and C1 , . . . , Cn is a halting computation on input z, otherwise f (x) = 0;
and if f (x) = 1 then g(x) is the number encoded (length-lexicographical) by the bitstring produced
by the computation.
Now a new design of Universal Turing machine is possible. This time, on input x, if the input
can be factorised as yz where Ty exists, then the universal machine can cycle through all possible
bitstrings w, each time testing whether yzw encodes a halting configuration of Ty on input z; this test
is a ‘primitive recursive problem,’ and all the guesswork is left in the suffix w. In consequence
(17.8) Theorem There are primitive recursive functions f (y, z, w) and g(y, z, w) such that for any
semicomputable function φ, there exists y, such that
• If φ(z) ↓ u, then there exists a w such that f (y, z, w) = 1 and g(y, z, w) = u;
• If φ(z) ↑ then for all w, f (y, z, w) = 0 and g(y, z, w) is irrelevant.
18 Gödel, Tarski, Church
Restating our result about Universal Turing machines:
There are primitive recursive functions f (y, z, w) and g(y, z, w) such that
• if Ty exists and w encodes a halting computation of Ty on input z, then f (y, z, w) = 1, and
g(y, z, w) is the number computed;
• otherwise f (y, z, w) = 0, and g(y, z, w) is irrelevant.
Then the universal Turing machine can be interpreted (differently from its previous definition) as
computing the function
φy (z) = g(y, z, min{w : f (y, z, w) = 1})
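The shape of this definition, an unbounded search wrapped around two primitive recursive tests, can be sketched in Python. The functions f_toy and g_toy below are hypothetical stand-ins chosen only so that the sketch runs; they are not the f and g of Theorem 17.8, which would have to decode Turing machine computations.

```python
from itertools import count


def minimise(f, g):
    """Return phi with phi(y, z) = g(y, z, min{w : f(y, z, w) = 1}).

    If no such w exists, the search never ends: phi(y, z) is undefined,
    just as a semicomputable function may be.
    """
    def phi(y, z):
        for w in count():           # w = 0, 1, 2, ...
            if f(y, z, w) == 1:
                return g(y, z, w)
    return phi


# Toy stand-ins (assumptions): "w is a witness" means w * w >= z,
# and the output read off from the witness is w itself.
def f_toy(y, z, w):
    return 1 if w * w >= z else 0


def g_toy(y, z, w):
    return w


phi = minimise(f_toy, g_toy)
```

All of the non-termination is concentrated in the single search for w; everything inside the loop always halts.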
Uparrow, Downarrow. Write φm (n) ↓ if φm (n) is defined, and ↑ if undefined.
(18.1) Lemma φy (z) is representable by the formula
A(x1 , x2 , x3 ) : ∃x4 (F (x1 , x2 , x4 , 1) ∧ G(x1 , x2 , x4 , x3 ))
where F (x1 , x2 , x4 , x5 ) represents f (x1 , x2 , x4 ) = x5
and G(x1 , x2 , x4 , x3 ) represents x3 = g(x1 , x2 , x4 ).
The phi-listing. We let φm (. . .) represent the function computed by Turing machine Tm , if it
exists; else φm is nowhere defined. Thus φm (n) = r means that m encodes a valid Turing machine
which halts on input n — with output r.
(18.2) Theorem (the fixpoint or recursion theorem). For any (fully) computable function f , there
exists an n such that φn = φ_{f (n)} .
Proof. Given any x, there is a Turing machine which first ignores its input n, attempts to compute
φx (x), and, if this succeeds with result r, simulates Tr on input n.
This Turing machine more-or-less calculates φ_{φx (x)} (n). This Turing machine can be computed
directly from x. The encoding of this Turing machine is a recursive function g(x) of x.
If φx (x) ↑ then this Turing machine never halts. But we can say: there is a total computable
function g(x) such that
φ_{g(x)} (. . .) = φ_{φx (x)} (. . .) if φx (x) ↓
φ_{g(x)} (. . .) ↑ if φx (x) ↑
Choose m so that φm = f ◦ g, and let n = g(m). Then
φn = φ_{g(m)} = φ_{φm (m)} = φ_{f ◦g(m)} = φ_{f (n)}
(18.3) Definition Two sets A, B ⊆ N are recursively inseparable if no computable set C has the
property that A ⊆ C and C ∩ B = ∅.
(18.4) Corollary The sets
{m : φm (0) ↓ 0}
and
{m : φm (0) ↓ 1}
are recursively inseparable.
Proof. Write X for the first set and Z for the second. Suppose there existed a recursive (computable) set Y which separated them. Fix any a ∈ X and b ∈ Z. Define
f (x) = b if x ∈ Y ,
f (x) = a if x ∉ Y .
The assumption that Y is computable, i.e., its characteristic function is, implies that f is a total
computable function, so by the recursion theorem, there exists n so that φn = φ_{f (n)} .
If n ∈ Y , then f (n) = b, so φn = φb , so n ∈ Z, which is impossible since Y ∩ Z = ∅.
If n ∉ Y then f (n) = a, so φn = φa , so n ∈ X, which is impossible since X ⊆ Y . Q.E.D.
(18.5) Corollary The set of theorems of PA and the set of formulae which are false in N are recursively inseparable.
Proof. Let A(x1 , x2 , x3 ) be a formula representing φm (n) = r. With X and Z as in Corollary
18.4, if m ∈ X, i.e., φm (0) = 0, then
⊢PA A(m, 0, 0)
and if φm (0) = 1, i.e., m ∈ Z, then A(m, 0, 0) is false in N. Since X and Z are recursively
inseparable, so are these sets. Q.E.D.
(18.6) Corollary (Tarski’s Theorem). The set of formulae of PA false in N is not computable.
(18.7) Corollary (Gödel undecidability theorem). There is no way of deciding which (closed)
formulae of PA are theorems.
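Proof. In other words, the set of theorems of PA is undecidable: this follows immediately from its recursive
inseparability from the set of formulae false in N (Corollary 18.5), since a computable set of theorems would itself separate the two sets.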
(18.8) Corollary (Gödel-Rosser Incompleteness Theorem). PA is incomplete.
Proof. The theorems of PA are semidecidable. Equivalently, there is a Turing machine which
halts on those inputs encoding theorems of PA (and loops otherwise).
We assume PA is consistent; if not, there’s little to be said. It is, of course, consistent since it
describes ‘known’ properties of N.
If PA were complete, then for every closed formula A, exactly one of A or ¬A would be a theorem.
One could ‘run two Turing machines in parallel’ on a given closed formula F , one attempting to prove
F and the other to prove ¬F . Assuming PA were complete, one or other would eventually succeed.
If F is proved, it is a theorem. If ¬F is proved, F is not a theorem (since PA is consistent). This
gives a decision procedure for PA, which is impossible (Corollary 18.7): hence PA is incomplete. Q.E.D.
(18.9) Corollary (Church’s Theorem). In (almost) any predicate calculus, theoremhood is undecidable.
Sketch of proof. We have shown, or rather cited, the fact that any partial recursive function is
representable in PA. If one worked through the details one would see that only finitely many axioms
of PA are needed for the proof. Hence there is a finite subtheory N of PA which has the property that
every partial recursive function is representable in N. The above theorems (recursive inseparability,
Gödel undecidability, Gödel-Rosser) hold for the theory N.
But N has finitely many proper axioms which can be gathered together into a single closed formula
C. By Gödel’s undecidability theorem, it is undecidable for a (closed) formula A whether
⊢N A.
But this is equivalent to
C ⊢ A
or
⊢ C ⇒ A
from the Deduction Theorem. Therefore it is undecidable whether C ⇒ A is a theorem of Predicate
Calculus, so theoremhood is generally undecidable. Q.E.D.
19 Gödel’s first incompleteness theorem
Gödel’s first incompleteness theorem was different from what we have seen. It says
(19.1) Theorem One can construct a closed formula G in PA (and in many weaker subtheories)
such that if PA is consistent (which we can see is clearly true) then G is not a theorem and if PA is
ω-consistent (which we can see is clearly true) then ¬G is not a theorem.
(19.2) Definition A theory N describing the natural numbers (whose language includes the language of PA) is ω-consistent if for
any formula A(x1 ) (with 1 free variable), if for n = 0, 1, 2, . . .
⊢N ¬A(n)
then
∃x1 A(x1 )
is not a theorem.
Construction of the Gödel sentence G begins by encoding the formulae (of PA) as numbers. This
is a Gödel numbering of the theory. We have seen how to encode Turing machines: one could proceed
much the same way with formulae. We will not go into any detail of the encoding. The following
functions are primitive recursive.
• D(x1 , x2 ): the ‘Diagonal function.’ D(m, n) holds whenever m is the Gödel number of a formula A(x1 ) with
just one free variable x1 , and n is the Gödel number of the formula A(m).
• Prf(x1 , x2 ): Prf(m, n) is true (meaning provable in PA) iff m is a Gödel number of a formula
F of PA and n is the Gödel number of a proof of F in PA.
Then there is a semicomputable relation
Bew(x1 ) ⇐⇒ ∃x2 Prf(x1 , x2 )
‘beweisbar,’ Gödel’s famous provability relation. Bew(n) is true in N, and provable in PA, if and
only if n is the Gödel number of a theorem of PA.
(19.3) Definition If C is a formula of PA, then
⌈C⌉
is the numeral m where m is the Gödel number of the formula C.
(19.4) Lemma (Diagonalisation Lemma — Fixpoint Lemma). Let B(x1 ) be a formula with exactly one free variable. Then there exists a closed formula C such that
⊢PA C ⇐⇒ B(⌈C⌉)
Proof. Let A(x1 ) be the formula
∀x2 (D(x1 , x2 ) ⇒ B(x2 )).
Let m be the Gödel number of A(x1 ), and let C be the closed formula
A(m).
Notice that
∀x2 (D(m, x2 ) ⇐⇒ x2 = ⌈C⌉)
is a theorem of PA.
We are required to prove C ⇐⇒ B(⌈C⌉). First assume C:
∀x2 (D(m, x2 ) ⇒ B(x2 ))
D(m, ⌈C⌉) ⇒ B(⌈C⌉)
from which B(⌈C⌉) follows immediately.
Then assume B(⌈C⌉).
B(⌈C⌉)
x2 = ⌈C⌉ ⇒ B(x2 )
D(m, x2 ) ⇒ x2 = ⌈C⌉
so, from transitivity of ⇒,
D(m, x2 ) ⇒ B(x2 ).
Generalising, we get C again.
Proof of Gödel’s first incompleteness theorem. By the Diagonalisation lemma, one can construct a formula G such that
⊢PA G ⇐⇒ ¬Bew(⌈G⌉).
Suppose G were a theorem of PA. Then by analysing a proof of G we could construct the Gödel
number of a proof of G, call it n:
Prf(⌈G⌉, n)
from which
∃x2 Prf(⌈G⌉, x2 )
follows directly; i.e., Bew(⌈G⌉), which is equivalent to ¬G: PA would be inconsistent.
If ¬G were a theorem of PA, then assuming PA is consistent, no proof of G can be found. Therefore for each n,
¬Prf(⌈G⌉, n).
Assuming PA is ω-consistent,
∃x2 Prf(⌈G⌉, x2 )
is not a theorem of PA, i.e., Bew(⌈G⌉) is not a theorem of PA. Since ¬G is provably equivalent to Bew(⌈G⌉), this contradicts the assumption that ¬G
is a theorem of PA.
20 Gödel’s second incompleteness theorem
Gödel’s second incompleteness theorem is that the consistency of PA cannot be proved in PA — on
the basis that PA is consistent. The proof involves primitive recursive functions.13
Gödel’s second incompleteness theorem is proved easily with the help of the so-called Hilbert-Bernays derivability conditions.
(20.1) Lemma (HB1). If A is provable (in PA) then Bew⌈A⌉ is provable in PA.
Proof. Let m be the Gödel number of A and let n be the Gödel number of a proof of A in PA.
Then
Prf(m, n)
from which Bew(m), i.e., Bew⌈A⌉, is easily deduced.
The proof of Bew⌈A⌉ was derived from a proof of A, not from A itself. It is certainly not true in
general that
⊢PA A ⇒ Bew⌈A⌉.
The third derivability condition (HB3) gives one class of formulae A for which it is true.
(20.2) Lemma (HB2). For any formulae X and Y,
⊢PA Bew⌈X⌉ ∧ Bew⌈X ⇒ Y⌉ =⇒ Bew⌈Y⌉.
HB2 is quite easy.
Condition HB3 is the third and trickiest. It takes considerable analysis.
(20.3) Lemma (HB3). For any formula A,
⊢PA Bew⌈A⌉ =⇒ Bew⌈Bew⌈A⌉⌉
(20.4) Definition ConsisPA abbreviates ¬Bew⌈0 ≠ 0⌉.
(20.5) Theorem (Gödel’s second incompleteness theorem). If (or rather, because) PA is consistent,
ConsisPA is not a theorem of PA.
Proof. Let G be a (or the) Gödel sentence. By Gödel’s Theorem, because PA is consistent, G is
not a theorem of PA. Let X abbreviate Bew⌈G⌉ (corrected14), so
⊢PA G ⇐⇒ ¬X
It is enough to prove
⊢PA ConsisPA =⇒ G.
0. X (Hyp)
1. G =⇒ ¬X (GiT 1)
13 I’m not certain this is necessary, but it’s at least conventional. See George Boolos, ‘The unprovability of consistency.’
14 Previously X was ¬Bew⌈G⌉.
2. Bew⌈G =⇒ ¬X⌉ (HB1)
3. Bew⌈G⌉ =⇒ Bew⌈¬X⌉ (2, HB2), i.e.,
X =⇒ Bew⌈¬X⌉
4. Bew⌈¬X⌉ (0,3,MP)
5. Bew⌈G⌉ =⇒ Bew⌈Bew⌈G⌉⌉ (HB3), i.e.,
X =⇒ Bew⌈X⌉
6. Bew⌈X⌉ (0,5,MP)
7. X =⇒ (¬X ⇒ 0 ≠ 0) (Tautology)
8. Bew⌈X =⇒ (¬X ⇒ 0 ≠ 0)⌉ (HB1)
9. Bew⌈X⌉ =⇒ (Bew⌈¬X⌉ ⇒ Bew⌈0 ≠ 0⌉) (8, HB2 twice)
10. Bew⌈0 ≠ 0⌉ (4,6,9,MP twice)
11. X =⇒ ¬ConsisPA (0–10, DT)
Hence ConsisPA =⇒ ¬X. Since ¬X =⇒ G, ConsisPA ⇒ G. But G is (granted PA is
consistent) unprovable in PA, and therefore ConsisPA is unprovable in PA.
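Step 7 of the derivation rests on a propositional tautology: X ⇒ (¬X ⇒ Q) holds for every Q, and in particular for Q taken to be 0 ≠ 0. The following Python sketch checks this by brute-force truth table. The helper names `implies`, `is_tautology` and `step7` are chosen for this example and are not part of the notes.

```python
from itertools import product


def implies(p, q):
    """Material implication p => q."""
    return (not p) or q


def is_tautology(f, nvars):
    """True if the Boolean function f holds under every truth assignment."""
    return all(f(*vals) for vals in product([False, True], repeat=nvars))


def step7(x, q):
    """The shape of step 7: X => (not X => Q), with Q standing for 0 != 0."""
    return implies(x, implies(not x, q))
```

Calling `is_tautology(step7, 2)` confirms that the formula is valid, whereas a formula such as X ⇒ Q is rejected. Because step 7 is a tautology it is a theorem of PA, which is what HB1 needs in order to give step 8.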
21 Syllabus for 2017 exam
You will be asked to attempt 3 questions out of 4.
Synopsis of course topics:
The 5 quizzes.
Turing machines: construction of a Turing machine to perform some relatively simple task. Definition of computable and semicomputable functions. The Halting Problem. Universal Turing machines,
listing semicomputable functions φn , and the fixed point theorem (recursion theorem).
Propositional logic: resolution refutations applied to simple examples.
Resolution is complete
Axioms I–III, Modus Ponens
Deduction theorem in propositional logic
Completeness of SC: If a formula X is a tautology, then ¬X is equivalent to a CNF which can be
disproved using resolution, hence X can be proved in SC.
First-order languages: expressions, terms, formulae
Interpretations, ‘snapshots,’ and models
If I, σ |= ∀xj A(xj) and t is free for xj in A then I, σ_{j↦t^σ} |= A(t).
The five logical axiom schemes. Generalisation
Deduction theorem (with restrictions) in first-order logic.
The ‘fix xi ’ rule: its correct usage, and justification.
Prenex normal forms. Skolem functions and conversion of prenex form to Skolem form.
Completeness of theories in Skolem form.
Peano arithmetic: axioms.
Representability of semicomputable functions.
Gödel, Tarski, Church.
Gödel numbering, diagonalisation lemma, and Gödel’s First incompleteness theorem.
The Hilbert-Bernays derivability conditions and Gödel’s Second incompleteness theorem.