\documentclass[10pt,a4paper]{article}
\usepackage{geometry}
\geometry{a4paper,textheight=9.3in}
%\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{thm-restate}
\usepackage{bussproofs}
\newtheorem{defn}{Defn.}
\newtheorem{theorem}{Theorem}
\newtheorem{schema}{Schema}
\newtheorem{axiom}{Axiom}
\newcommand{\lif}{\to}
\renewcommand{\equiv}{\leftrightarrow}
\newcommand{\Q}{$\mathsf{Q}$}
\newcommand{\PA}{$\mathsf{PA}$}
\newcommand{\gd}{G\"odel}
\newtheorem*{repeatdefn}{}
\frenchspacing
%\title{{\Large \emph{G\"odel Without Tears}}\\[8pt] 1: Incompleteness -- the very idea}
%\author{Peter Smith}
%\date{} % Activate to display a given date or no date
\begin{document}
%\vskip{60pt}
\begin{center}%
{{\Large \emph{G\"odel Without (Too Many) Tears -- 1}}\\[16pt]{\LARGE Incompleteness -- the very idea} \par%
\vskip 1.5em%
{\large
\lineskip .75em%
\begin{tabular}[t]{c}%
Peter Smith
\end{tabular}\par}}%
\vskip 0.75em%
{University of Canterbury, Christchurch, NZ}\\[6pt]
{April 7, 2010}%
\vskip 1.5em%
\end{center}%\par
\noindent\hrulefill
\begin{itemize}\setlength{\itemsep}{0pt}
\item The notion of effective decidability
\item What's a formalized language?
\item What's a formal axiomatized theory?
\item What's negation incompleteness?
\item `Deductivism' about basic arithmetic
\item Two versions of G\"odel's First Incompleteness Theorem
\item The First Theorem as an \emph{incompletability} theorem
\item How did G\"odel prove (one version of) the First Incompleteness Theorem?
\end{itemize}
\noindent\hrulefill
\vspace{8pt}\noindent Why these notes? After all, I've already written a pretty detailed book, \emph{An Introduction to G\"odel's Theorems} (CUP, heavily corrected fourth printing 2009: henceforth \emph{IGT}). Surely that's more than enough to be going on with?
Ah, but there's the snag. It \emph{is} more than enough. In the writing, as is the way with these things, the book grew far beyond the scope of the lecture notes from which it started. And while I hope the result is still pretty accessible to someone prepared to put in the time and effort, there is -- to be frank -- a \emph{lot} more in the book than is really needed by philosophers meeting the incompleteness theorems for the first time, or indeed by mathematicians wanting a brisk introduction. You might reasonably want to get your heads around only those technical basics which are actually necessary for understanding how the theorems are proved and for appreciating philosophical discussions about incompleteness.
So you need a cut-down version of the book -- an introduction to the \emph{Introduction}! Well, isn't that what lectures are for? Indeed. But there's another snag. I haven't got many lectures to play with. So either (A) I crack on at a very fast pace (hard-core mathmo style), cover those basics, but perhaps leave too many people puzzled and alarmed. Or (B) I do relaxed talk'n'chalk, highlighting the really Big Ideas, making sure everyone is grasping them as we go along, but inevitably omit important stuff and leave quite a gap between what happens in the lectures and what happens in the book. What to do?
I'm going for plan (B). But then I ought to do something to fill that gap between lectures and book. Hence these notes.
The idea, then, is to give relaxed lectures, highlighting Big Ideas, not worrying too much about depth or fine-detail (nor even about getting through \emph{all} of the day's intended menu of topics). These notes then expand things just enough, and give pointers to relevant chunks of \emph{IGT}. Though I hope these notes will to a fair extent be stand-alone, and tell a brief but coherent story read by themselves: so occasionally I'll copy a paragraph or two from the book, rather than just refer to them. And the notes come with a Logical Health Warning: in the interests of relative brevity, I'll occasionally have to apply that good maxim `Where it doesn't itch, don't scratch'. In other words, sometimes I'll say things that are not utterly rigorous, but I hope in unworrying ways that can be easily remedied if you are feeling pernickety. %If you are bright enough to spot the slight cheats or corner-cutting, you should be bright enough to spot how to repair what I say, at the cost of a bit of fuss and bother, so no harm done!
The bullet-pointed headers to each helping of notes -- to each episode, as I'll call it -- give pointers/reminders to the coverage.
A final introductory remark. If you notice any typos/thinkos in these notes and/or the latest printing of the first edition of \emph{IGT} please let me know (\textsf{peter$\_\!\_$smith@me.com}). In due course, there will be a second edition of the book; so I'd also be very grateful for any more general comments that might help me improve it. Some further relevant materials, plus the latest version of these notes, can be found at \textsf{www.logicmatters.net}.
\section{Kurt G\"odel (1906--1978)}
The greatest logician of the twentieth century. Born in what is now Brno. Educated in Vienna. At 23, his doctoral dissertation established the \emph{completeness} theorem for the first-order predicate calculus (i.e. a standard proof system for first-order logic indeed captures all the valid inferences -- where validity is defined semantically). Later he would do immensely important work on set theory, as well as make contributions to proof theory. Later still, he wrote on models of General Relativity. Talk of `G\"odel's Theorems', however, typically refers to his two \emph{incompleteness} theorems in an epoch-making 1931 paper.\footnote{Yes, G\"odel proved a `completeness theorem' and `incompleteness theorems'. By the end of this first episode you should be able to tell the difference!}
G\"odel left Austria for the USA in 1938, and spent the rest of his life at the Institute for Advanced Study in Princeton. Always a perfectionist, after the mid 1940s he more or less stopped publishing.
For a brief overview of his life and work, see \textsf{http://en.wikipedia.org/wiki/Kurt\_G\"odel}, or better -- though you'll need to skip -- \textsf{http://plato.stanford.edu/entries/goedel}. There's a very nice biography, John Dawson \emph{Logical Dilemmas} (A. K. Peters, 1997), which will also give you a real sense of the logical scene in the glory days of the 1930s.
\section{{`On\; formally\; undecidable\; propositions\; of\;
\emph{Principia} {\emph{Mathematica}} and related systems I'}}
This is the title of the 1931 paper which proves the First Incompleteness Theorem and states the Second Theorem. (The `I' indicates that it is the first part of what was going to be a two-part paper, with Part II spelling out the proof of the Second Theorem. But that was never written. I'll explain later why G\"odel didn't need to bother.)
Even the title gives us a number of things to explain. What's a `formally undecidable proposition'? What's \emph{Principia Mathematica}? -- you've heard of it, no doubt, but what's the project of that triple-decker work? What counts as a `related system'? In fact, just what is meant by `system' here? We'll take the last question first.
\subsection{`Systems' -- i.e. formal axiomatized theories}
Our concern is with systems in the sense of \emph{formal axiomatized theories}. $T$ is such a theory if it has (i) an effectively formalized language $L$, (ii) an effectively decidable set of axioms, (iii) an effectively formalized proof-system in which we can deduce theorems from the axioms.
To explain, we first need a definition:
\begin{defn}\label{defn_decidable}
A property $P$ defined over a domain $D$ is \emph{effectively decidable} iff there's an algorithm for settling in a finite number of steps, for any $o \in D$, whether $o$ has property $P$ -- i.e. there's a step-by-step mechanical routine for settling the issue, one that a suitably programmed computer could in principle execute. A set $\Sigma$ is effectively decidable if the property of being a member of that set is effectively decidable.
\end{defn}
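As a toy illustration (my example, not from the notes): the property of being prime is effectively decidable, since trial division is a mechanical routine guaranteed to settle the question in finitely many steps for any given number. A hypothetical Python sketch:

```python
def is_prime(n: int) -> bool:
    """Decide primality by trial division: a step-by-step routine
    that terminates in finitely many steps for every input n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:          # only finitely many trial divisors to check
        if n % d == 0:
            return False
        d += 1
    return True

# The set of primes is effectively decidable in the sense of the
# definition: membership is settled mechanically for each candidate.
print([n for n in range(20) if is_prime(n)])
```

The point is only that there is \emph{some} mechanical procedure; nothing here turns on the procedure being efficient.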
\noindent Now take in turn those conditions (i) to (iii) for being a formal axiomatized theory.
(i) We'll assume that the general idea of a formal $L$ is familiar from earlier logic courses. There will be a \emph{syntax} which fixes which strings of symbols form terms, which form wffs, and in particular which strings of symbols form \emph{sentences}, i.e. closed wffs with no unbound variables dangling free. And crucially, to emphasize what is perhaps not emphasized in introductory courses,
\begin{defn}\label{formal_lang}
For an effectively formalized language $L$, the basic alphabet of $L$ is to be finite, and the syntactic rules of $L$ must be such that the properties of being a term, a wff, a wff with one free variable, a sentence, etc., are effectively decidable.
\end{defn}\noindent NB, the restriction to a finite basic vocabulary still allows us, e.g., to have an infinite supply of variables: for example, given the two symbols `$\mathsf{x}$' and `$'$' we can construct an infinite supply of composite symbols $\mathsf{x}$, $\mathsf{x'}$, $\mathsf{x''}$, $\mathsf{x'''}$.\footnote{In some contexts, for technical purposes, there is interest in talking about formal languages with an uncountably infinite number of primitives. That's why the finiteness constraint needs to be made explicit.} As to the effective decidability of the properties of being a term, etc., the point of setting up a formal language is usually (inter alia) precisely to put issues of what is and isn't a sentence beyond dispute, so we want to be able effectively to decide whether a string of symbols is or is not a sentence. A formal interpreted language will also normally have an intended \emph{semantics} which gives the interpretation of $L$, fixing truth conditions for each $L$-sentence -- again, the semantics should be presented in such a way that we can mechanically read off from the interpretation rules the interpretation of any given sentence.
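To make the variable-building trick concrete, here is a hypothetical Python sketch (my illustration, not part of the notes): with just the two symbols `$\mathsf{x}$' and `$'$' we get infinitely many variables, and a trivially mechanical test for being one.

```python
def is_variable(s: str) -> bool:
    """Effectively decide whether s is a variable of the form
    x, x', x'', x''', ... built from the finite alphabet {x, '}."""
    return s.startswith('x') and set(s[1:]) <= {"'"}

# One finite alphabet, infinitely many variables, one mechanical test:
print([("x" + "'" * k, is_variable("x" + "'" * k)) for k in range(4)])
```

A full decision procedure for termhood or wffhood would work the same way, just with more clauses tracking the syntactic rules.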
(ii) A theory $T$ built in language $L$ will have a certain class of $L$-sentences picked out as \emph{axioms}.\footnote{We'll allow the class of axioms to be null. It should be familiar that we can trade in axioms for rules of inference -- though we can't trade in all rules of inference for axioms if we want to be able to deduce anything: cf. Lewis Carroll's Achilles and the Tortoise!} Again it is to be \emph{effectively decidable} what's an axiom. (After all, if we are making a theory rigorous, but then can't routinely tell whether a given sentence is one of its axioms, that would -- usually -- be pretty pointless.)
(iii) Just laying down a bunch of axioms would normally be pretty idle if we can't deduce conclusions from them! So a formal axiomatized theory $T$ comes equipped with a proof-system, a set of rules for deducing further theorems from our initial axioms. But describing a proof-system such that we couldn't then routinely tell whether its rules are in fact being followed wouldn't have much point. Hence we naturally require that it is effectively decidable whether a given array of wffs is indeed a proof from the axioms according to the rules. It doesn't matter for our purposes whether the proof-system is e.g. a Frege/Hilbert axiomatic logic, a natural deduction system, a tree/tableau system -- so long as it is indeed effectively checkable that a candidate proof-array has the property of being properly constructed according to the rules.
So, in summary
\begin{defn}\label{formal_theory}
A formal axiomatized theory $T$ has an effectively formalized language $L$, a certain class of $L$-sentences picked out as \emph{axioms}, where it is effectively decidable what's an axiom, and it has a proof-system such that it is effectively decidable whether a given array of wffs is indeed a proof from the axioms according to the rules.
\end{defn}
\noindent Careful, though! To say that, for a properly formalized theory $T$, it must be effectively decidable whether a given purported $T$-proof of $\varphi$ is indeed a kosher proof according to $T$'s deduction system is not, repeat \emph{not}, to say that it must be effectively decidable whether $\varphi$ has a proof. It is one thing to be able to effectively \emph{check} a proof once proposed, it is another thing to be able to effectively \emph{decide in advance} whether there exists a proof to be discovered. (It will turn out, for example, that any formal axiomatized theory $T$ containing a certain modicum of arithmetic is such that, although you can mechanically check a purported proof of $\varphi$ to see whether it \emph{is} a proof, there's no general way of telling of an arbitrary $\varphi$ whether it is provable in $T$ or not.)
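The check/decide contrast can be dramatized with a toy proof system of my own devising (a hypothetical illustration: in this trivial system provability too happens to be decidable, which is exactly what fails for serious arithmetics; the sketch just shows that proof-\emph{checking} is a simple mechanical matter):

```python
# Toy 'proof system': sole axiom 'I'; sole rule: from a line s, infer s + 'O'.
# A purported proof is a list of lines ending in the target sentence.
AXIOM = 'I'

def is_proof(lines, target):
    """Effectively CHECK a purported proof of target: the first line must
    be the axiom, the last must be the target, and each later line must
    follow from its predecessor by the rule."""
    if not lines or lines[0] != AXIOM or lines[-1] != target:
        return False
    return all(cur == prev + 'O' for prev, cur in zip(lines, lines[1:]))

print(is_proof(['I', 'IO', 'IOO'], 'IOO'))   # a correctly constructed proof
print(is_proof(['I', 'IOO'], 'IOO'))         # rejected: a step is skipped
```

Checking a given array of lines is always this kind of routine, local verification; deciding whether \emph{some} proof of a given sentence exists is, in general, an entirely different and harder question.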
\subsection{Notational conventions}
Before going on, we should highlight a couple of useful notational conventions that we'll be using from now on in these notes (the same conventions are used in \emph{IGT}, and indeed are not uncommon):
\begin{enumerate}
\item Particular expressions from formal systems -- and abbreviations of them -- will be in \textsf{sans serif} type. Examples: $\mathsf{SS0 + S0 = SSS0}$, $\mathsf{\forall x\,Sx \neq 0}$. [Blackboard convention: overline formal wffs when clarity demands. Bracketing will tend to be casual.]
\item Expressions in informal mathematics will be in ordinary serif font (with variables, function letters etc. in italics). Examples: $2 + 1 = 3$, $n + m = m + n$, $S(x + y) = x + Sy$.
\item Greek letters, as in the `$\varphi$' we've just used, are schematic variables in the metalanguage in which we talk about our formal systems.
\end{enumerate}
For more explanations, see \emph{IGT}, \S\S2.2, 3.1--3.3, 4.1.
\subsection{`Formally undecidable propositions' and negation incompleteness}
\begin{defn}`\/$T \vdash \varphi$' says: there is a formal deduction in $T$'s proof-system from $T$-axioms to the sentence $\varphi$ as conclusion. If $\varphi$ is a sentence and $T \vdash \varphi$, then $\varphi$ is said to be a \emph{theorem} of $T$.
\end{defn}
\noindent So NB, `$\vdash$' officially signifies provability in $T$, a formal syntactically definable relation, not semantic entailment.
\begin{defn}
If $T$ is a theory, and $\varphi$ is some sentence of the language
of that theory, then $T$ \emph{{formally decides}} $\varphi$ iff either
$T \vdash \varphi$ or $T \vdash \neg\varphi$.
\end{defn}
\noindent Hence,
\begin{defn}\label{def:formallyundecidable}A sentence $\varphi$ is \emph{{formally undecidable}} by $T$
iff $T \nvdash \varphi$ and $T \nvdash \neg\varphi$.
\end{defn}
Another bit of terminology:
\begin{defn}
A theory $T$ is \emph{{{negation complete}}} iff it formally decides
every closed wff of its language -- i.e. for every sentence $\varphi$, $T \vdash \varphi$
or $T \vdash \neg\varphi$.
\end{defn}
\noindent Trivially, then, there are `formally undecidable propositions' in $T$ if and only if
$T$ isn't negation complete.
Of course, it is very easy to construct negation-incomplete theories: just leave out some necessary basic assumptions about the matter in hand! But suppose we are trying to fully pin down some body of truths using a formal theory. We fix on an interpreted formal language $L$ apt for expressing such truths. And then we'd ideally like to build a theory $T$ in $L$, whose axioms are such that when (but only when) $\varphi$ is true, $T \vdash \varphi$. So, making the classical assumption that either $\varphi$ is true or $\neg\varphi$ is true, we'd like $T$ to be such that either $T \vdash \varphi$ or $T \vdash \neg\varphi$. Negation completeness, then, is a natural desideratum for theories.
For more explanations, see \emph{IGT}, \S3.4.
\subsection{Deductivism, logicism, and \emph{Principia}}
The elementary arithmetic of successor (`next number'), addition, and multiplication is child's play (literally!). It is entirely plausible to suppose that, whether the answers are readily available to us or not, questions posed in what we'll call \emph{the language of basic arithmetic} -- i.e. the language of successor, addition, and multiplication plus familiar first-order logical apparatus -- have entirely determinate answers. These answers are surely `{{fixed}}' by (a) the fundamental zero-and-its-successors structure of the natural number series (with zero not being a successor, every number having a successor, distinct numbers having distinct successors, and so the sequence of zero and its successors never circling round but marching off for ever) plus (b) the nature of addition and multiplication as given by the school-room explanations.
So it is surely plausible to suppose that we should be able to lay down a bunch of axioms which characterize the number series, addition and multiplication (which codify what we teach the kids), and that these axioms should settle every truth of basic arithmetic, in the sense that every such truth of the language of successor, addition, and multiplication is logically provable from these axioms. For want of a standard label, call this view \emph{{deductivism}} about basic arithmetic.
What could be the status of the axioms? I suppose you might, for example, be a Kantian deductivist who
holds that the axioms encapsulate `intuitions' in which we grasp
the fundamental structure of the numbers and the nature of addition and multiplication,
where these `intuitions' are a special cognitive achievement in which we somehow
represent to ourselves the arithmetical world.
But talk of intuition is very puzzling and problematic. So we might well be tempted instead by Frege's view that the axioms
are \emph{analytic}, truths of logic
or rather of logic-plus-definitions. On this view, we don't need Kantian `intuitions' going beyond logic: logical reasoning alone is enough.
The Fregean brand of deductivism is standardly dubbed `logicism'.
Famously, Frege's attempt to be a logicist deductivist about arithmetic (in fact, for him, more than basic arithmetic) hit the rocks, because -- as Russell showed -- his logical system is in fact inconsistent in a pretty elementary way (it is beset by Russell's Paradox). That devastated Frege, but Russell was undaunted, and still gripped by deductivist ambitions he wrote:
\begin{quote}
All mathematics [yep! -- \emph{all} mathematics] deals exclusively with concepts definable in terms of a very
small number of logical concepts, and \dots all its propositions are deducible
from a very small number of fundamental logical principles.
\end{quote}
That's a big promissory note in Russell's \emph{The Principles of Mathematics} (1903). And \emph{Principia Mathematica} (three volumes, though unfinished, 1910, 1912, 1913) is Russell's attempt with Whitehead to make good
on that promise. The project is to set down some logical axioms and definitions and deduce the laws of basic arithmetic (and then more) from them. Famously, they eventually get to prove that $1 + 1 = 2$ at *110.643 (Volume II, page 86), accompanied by the wry comment, `The above proposition is occasionally useful'.
\subsection{G\"odel's bomb}
\emph{Principia}, frankly, is a bit of a mess -- in terms of clarity and rigour, it's quite a step backwards from Frege. And there are technical complications which mean that not all \emph{Principia}'s
axioms are clearly `logical' even in a stretched sense. In particular, there's an appeal to a brute-force \emph{{Axiom of Infinity}} which in effect states that there is an infinite number of objects; and then there is the notoriously dodgy \emph{Axiom of Reducibility}.\footnote{\emph{Principia} without the dodgy Axiom is a `type theory' which is quite nicely motivated, but you can't reconstruct much maths in it: the dodgy Axiom of Reducibility allows you to reconstruct classical maths by pretending that the type distinctions by which we are supposed to avoid paradox can be ignored when we need to do so: for more on this see \textsf{http://plato.stanford.edu/entries/principia-mathematica/}.} But leave those worries aside -- they pale into insignificance compared with the bomb
exploded by G\"odel.
For G\"odel's First Incompleteness Theorem shows that any form of
deductivism about even just basic arithmetic (not just \emph{Principia}'s) is in trouble.
Why? Well, the proponent of deductivism about basic arithmetic (logicist or otherwise) wants to pin down first-order arithmetical truths about successor/addition/multiplication, without leaving any out: so he wants to give a negation-complete theory. \emph{And there can't be such a theory}. G\"odel's First Theorem says -- at a very rough first shot -- that \emph{nice theories containing enough basic arithmetic are always negation incomplete}.
So varieties of deductivism, and logicism in particular, must always fail. Which is a rather stunning result!\footnote{`Hold on! I've heard of neo-logicism which has its enthusiastic advocates. How can that be so if G\"odel showed that logicism is a dead duck?' Well, we might still like the idea that some logical principles plus what are more-or-less definitions together \emph{{semantically}} entail all arithmetical truths, while allowing that we can't capture the relevant entailment relation in a single properly axiomatized deductive system of logic. Then the resulting overall system of arithmetic won't count as a formal axiomatized theory of all arithmetical truth since its logic is not formalizable, and G\"odel's theorems don't apply.}
\section{The First Incompleteness Theorem, a bit more carefully}
\subsection{Two versions of the First Theorem}
Three more definitions. First, let's be a bit more careful about that idea of `the language of basic arithmetic':
\begin{defn}\label{def:langcontainsbasicarith}
The formalized language $L$ \emph{contains the language of basic arithmetic} if $L$ has at least the standard first-order logical apparatus (including identity), has a term `\/$\mathsf{0}$' which denotes zero and function symbols for the successor, addition and multiplication functions defined over numbers -- either built-in as primitives or introduced by definition -- and has a predicate whose extension is the natural numbers.
\end{defn}
\noindent The point of that last clause is that if `$\mathsf{N}$' is a predicate satisfied just by numbers, then the wff $\mathsf{\forall x(Nx \to \varphi(x))}$ says that every number satisfies $\varphi$; so $L$ can make general claims specifically about natural numbers. (If $L$ is already defined to be a language whose quantifiers run over the numbers, then you could use `$\mathsf{x = x}$' for `$\mathsf{N}$', or -- equivalently -- just forget about it!)
\begin{defn}
A theory $T$ is \emph{sound} if its axioms are true (on the interpretation built in to $T$'s language), and its logic is truth-preserving, so all its theorems are true.
\end{defn}
\begin{defn}
A theory $T$ is \emph{consistent} if there is no $\varphi$ such that $T \vdash \varphi$ and $T \vdash \neg\varphi$,
\end{defn}
\noindent where `$\neg$' is $T$'s negation operator. In a classical setting, if $T$ is inconsistent, then $T \vdash \psi$ for all $\psi$. And of course, trivially, soundness implies consistency.
G\"odel now proves (more accurately, gives us most of the materials to prove) the following:
\begin{restatable}{theorem}{Godelsemantic}\label{th:soundrichtheoriesincomplete}
If $T$ is a sound formal axiomatized theory whose language contains the language of basic arithmetic, then there will be a true sentence $\mathsf{G}_T$ of basic arithmetic such that $T \nvdash \mathsf{G}_T$ and $T \nvdash \neg \mathsf{G}_T$, so
$T$ is negation incomplete.
\end{restatable}
However, that \emph{isn't} what is usually referred to as the First Incompleteness Theorem. For note, Theorem 1 tells us what follows from a \emph{semantic} assumption, namely that $T$ is sound. And soundness is defined in terms of truth. Now, post-Tarski, we aren't particularly scared of the notion of truth. To be sure, there are issues about how best to treat the notion formally, to preserve as many as possible of our pre-formal intuitions while blocking versions of the Liar Paradox. But most of us think that we don't have to regard the general idea of truth as \emph{metaphysically} loaded in an obscure and worrying way. But G\"odel was writing at a time when, for various reasons (think logical positivism!), the very idea of truth-in-mathematics was under some suspicion. There were other reasons too for wanting to steer away from semantic notions, reasons to do with `Hilbert's program', about which more anon. So it was \emph{extremely} important to G\"odel to show that you don't need to deploy any semantic notions to get (again roughly) the following result:
\begin{restatable}{theorem}{Godelsyntactic}\label{thm:Godelsyntactic}
For any consistent formal axiomatized theory $T$ which contains a certain modest amount of arithmetic (and has a certain additional desirable property that any sensible formalized arithmetic will share), there is a sentence of basic arithmetic $\mathsf{G}_T$ such that $T \nvdash \mathsf{G}_T$ and $T \nvdash \neg \mathsf{G}_T$, so
$T$ is negation incomplete.
\end{restatable}
\noindent (Here `contains' means not just can express but \emph{can prove}.) Of course, we'll need to be a lot more explicit in due course, but that indicates the general character of G\"odel's result. The `contains a modest amount of arithmetic' is what makes a theory sufficiently related to \emph{Principia}'s for the theorem to apply -- remember the title of G\"odel's paper! I'll not pause in this first episode to spell out just how much arithmetic that is, but we'll find that it is stunningly little. (Nor will I pause now to explain that `additional desirable property' condition. We'll meet it in due course, but also explain how -- by a cunning trick discovered by J. Barkley Rosser in 1936 -- we can drop that condition.)
For the present, however, let's concentrate on the semantic version of G\"odel's theorem, i.e. Theorem 1.
\subsection{Theorem 1 is better called an \emph{incompletability} theorem}\label{sec:bettercalledincompleteability}
Suppose $T$ is a sound theory which can express claims of basic arithmetic. Then we can find a true $\mathsf{G}_T$ such that $T \nvdash \mathsf{G}_T$ and $T \nvdash \neg \mathsf{G}_T$. \emph{Of course, that \emph{doesn't} mean that $\mathsf{G}_T$ is `absolutely unprovable', whatever that could mean. It just means that $\mathsf{G}_T$-is-unprovable-in-$T$. }
Now, we might want to `repair the gap' in $T$ by adding $\mathsf{G}_T$ as a new axiom. So consider the theory $U = T + \mathsf{G}_T$ (to use an obvious notation). Then (i) $U$ is still sound (for the old $T$-axioms are true, the added new axiom is true, and the logic is still truth-preserving). (ii) $U$ is still a properly formalized theory, since adding a specified axiom to $T$ doesn't make it undecidable what is an axiom of the augmented theory. (iii) $U$ can still express claims of basic arithmetic. So G\"odel's First Incompleteness Theorem applies, and we can find a sentence $\mathsf{G}_U$ such that $U \nvdash \mathsf{G}_U$ and $U \nvdash \neg \mathsf{G}_U$. And since $U$ is stronger than $T$, we have a fortiori, $T \nvdash \mathsf{G}_U$ and $T \nvdash \neg \mathsf{G}_U$. In other words, `repairing the gap' in $T$ by adding $\mathsf{G}_T$ as a new axiom leaves some other sentences that are undecidable in $T$ \emph{still} undecidable in the augmented theory.
And so it goes. Keep chucking more and more additional true axioms at $T$ and our theory still remains negation-incomplete, unless it stops being sound or stops being effectively axiomatizable. In a good sense, $T$ is \emph{incompletable}.
\section{How did G\"odel prove the First Theorem (in the semantic version)?}\label{sec:howGprovedFirstTheorem}
Let's take a first pass at outlining how G\"odel proved the semantic version of his incompleteness theorem. Obviously we'll be coming back to this in a lot more detail later, but we can give just a flavour of what's going on. We kick off with two natural definitions.
\begin{defn}\label{def:standardnumerals}
If $L$ contains the language of basic arithmetic, so it contains a term `$\mathsf{0}$' for zero and a function expression `$\mathsf{S}$' for the successor function, then the terms $\mathsf{0}$, $\mathsf{S0}$, $\mathsf{SS0}$, $\mathsf{SSS0}$, \ldots, are $L$'s \emph{standard numerals}, and we'll use `\/${\overline{\mathsf{n}}}$' to abbreviate the standard numeral for $n$.
\end{defn}
\noindent (The overlining convention to indicate standard numerals is a pretty standard one.) Henceforth, we'll assume that the language of any theory we are interested in contains the language of basic arithmetic and hence has standard numerals denoting the numbers.
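In a concrete syntax where numeral-building is just string manipulation, the definition amounts to the following hypothetical sketch (my illustration, treating `S' and `0' as literal characters):

```python
def numeral(n: int) -> str:
    """Return the standard numeral for n: 0, S0, SS0, SSS0, ..."""
    return 'S' * n + '0'

def denotation(term: str) -> int:
    """Read off which number a standard numeral denotes: count the S's."""
    assert term.endswith('0') and set(term[:-1]) <= {'S'}
    return len(term) - 1

print(numeral(3))   # the standard numeral for 3
```

So every natural number gets a canonical name in the language, and passing between a number and its numeral is effective in both directions.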
\begin{defn}\label{defn_expresses}
The formal wff $\varphi(\mathsf{x})$ of the interpreted language $L$ \emph{expresses} the numerical property $P$ iff $\varphi(\overline{\mathsf{n}})$ is true on interpretation just when $n$ has property $P$. Similarly, the formal wff $\psi(\mathsf{x, y})$ {expresses} the numerical relation $R$ iff $\psi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$ is true just when $m$ has relation $R$ to $n$. And the formal wff $\chi(\mathsf{x, y})$ {expresses} the numerical function $f$ iff $\chi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$ is true just when $f(m) = n$.
\end{defn}
\noindent The generalization to many-place relations/many-argument functions is obvious.
Then the proof of Theorem 1 in outline form goes as follows:
\begin{enumerate}
\item \emph{Set up a G\"odel numbering}\quad We are nowadays familiar with the idea that all kinds of data can be coded up using numbers. So suppose we set up a sensible (effective) way of coding wffs and sequences of wffs by natural numbers -- so-called G\"odel-numbering. Then, given a formal axiomatized theory $T$, we can define e.g. the numerical properties $\mathit{Wff}_T$, $\mathit{Sent}_T$, $\mathit{Prf}_T$ and ${Prov}_T$, where
\begin{quote}
$\mathit{Wff}_T(n)$ iff $n$ is the code number of a $T$-wff.\\
$\mathit{Sent}_T(n)$ iff $n$ is the code number of a $T$-sentence.\\
$\mathit{Prf}_T(m,n)$ iff $m$ is the code number of a $T$-proof of the $T$-sentence \mbox{with code number $n$}.\\
${Prov}_T(n)$ iff $n$ is the code number of a $T$-theorem.
\end{quote}
\item \emph{Expressing such properties/relations inside $T$}\quad We next show that such properties/relations can be expressed inside $T$ by wffs of the formal theory belonging to the language of basic arithmetic [takes a bit of work!]. We show in particular how to build -- just out of the materials of the language of basic arithmetic -- an arithmetic formal wff we'll abbreviate $\mathsf{Prov}_T(\mathsf{x})$ that expresses the property ${Prov}_T$, so $\mathsf{Prov}_T(\overline{\mathsf{n}})$ is true exactly when ${Prov}_T(n)$, i.e. when $n$ is the code number of a $T$-theorem.
\item \emph{The construction: building a G\"odel sentence}\quad Next -- the really cunning bit, but surprisingly easy -- we show how to build a `G\"odel' sentence $\mathsf{G}_T$ such that $\mathsf{G}_T$ is in fact equivalent to $\neg\mathsf{Prov}_T(\overline{\mathsf{g}})$, where the standard numeral `$\overline{\mathsf{g}}$' is the numeral denoting the code number for $\mathsf{G}_T$. In other words (think about it!!), $\mathsf{G}_T$ is true if and only if $\mathsf{G}_T$ isn't a theorem.
\item \emph{The argument}\quad Suppose $T \vdash \mathsf{G}_T$. Then $\mathsf{G}_T$ would be a theorem, and hence $\mathsf{G}_T$ would be false, so $T$ would have a false theorem and hence not be sound, contrary to hypothesis. So $T \nvdash \mathsf{G}_T$. So $\mathsf{G}_T$ is true. So $\neg\mathsf{G}_T$ is false and $T$, being sound, can't prove it. Hence we also have $T \nvdash \neg\mathsf{G}_T$.
\end{enumerate}
There are big gaps to fill there, but that's the overall strategy. (The proof of Theorem 2 then shows that we can get the same result, using the same construction of a G\"odel sentence, by dropping the assumption that $T$ is sound, so long as we require a bit more by way of what the theory $T$ can prove, and require $T$ to have that currently mysterious `additional desirable property'. More about this in due course.)
Of course, you might immediately think there is something a bit worrying about our sketch. For basically, I'm saying we can construct an arithmetic sentence in $T$ that, via the G\"odel number coding, says `I am not provable in $T$'. But shouldn't we be suspicious about that? After all, we know we get into paradox if we try to play with sentences that say `I am not true'. So why does the self-reference in the Liar sentence lead to \emph{paradox}, while the self-reference in G\"odel's proof gives us a \emph{theorem}? A very good question. I hope that over the coming episodes, the answer to that good question will become clear!
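If you want to make the coding idea in the first step vivid, here is a minimal sketch in Python of one standard style of G\"odel numbering -- coding a string of symbols by prime factorization. (The toy alphabet and the particular symbol codes are illustrative choices of mine, not G\"odel's own scheme.)

```python
def primes():
    """Generate the primes 2, 3, 5, ... by trial division."""
    n = 2
    while True:
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

SYMBOLS = ['0', 'S', '+', '*', '=', '(', ')']      # a toy arithmetical alphabet
CODE = {s: i + 1 for i, s in enumerate(SYMBOLS)}   # each symbol gets a code >= 1

def godel_number(expr):
    """Code the string with symbol codes c1, c2, ..., ck as 2^c1 * 3^c2 * ... * pk^ck."""
    g = 1
    for sym, p in zip(expr, primes()):
        g *= p ** CODE[sym]
    return g

def decode(g):
    """Recover the string from its number: the coding is effectively reversible."""
    out = []
    for p in primes():
        if g == 1:
            break
        e = 0
        while g % p == 0:
            g, e = g // p, e + 1
        out.append(SYMBOLS[e - 1])
    return ''.join(out)
```

Since encoding and decoding are both entirely mechanical, a property like $\mathit{Wff}_T$ becomes a property of numbers that a machine could in principle check -- which is just what the proof strategy needs.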
\vspace{10pt}\noindent Now read \emph{IGT}, \S\S1.1--3.4. (``But hold on! What's the Second Theorem that you mentioned?'' Good question -- the opening chapter of \emph{IGT} will tell you: but here in these notes we'll maintain the suspense as far as Episode 10!)
\newpage
%\setcounter{section}{0}
\setcounter{page}{1}
%\vskip{60pt}
\begin{center}%
{{\Large \emph{G\"odel Without (Too Many) Tears -- 2}}\\[16pt]{\LARGE Incompleteness and undecidability} \par%
\vskip 1.5em%
{\large
\lineskip .75em%
\begin{tabular}[t]{c}%
Peter Smith
\end{tabular}\par}}%
\vskip 0.75em%
{University of Canterbury, Christchurch, NZ}\\[6pt]
{April 7, 2010}%
\vskip 1.5em%
\end{center}%\par
\noindent\hrulefill
\begin{itemize}\setlength{\itemsep}{0pt}
\item The idea of a decidable theory
\item Any consistent, negation-complete, axiomatized formal theory $T$ is decidable
\item Expressing and capturing properties, relations and functions
\item The idea of a sufficiently strong theory
\item No consistent, sufficiently strong, axiomatized formal theory is decidable
\item A consistent, sufficiently strong, axiomatized formal theory cannot be negation complete
\end{itemize}
\noindent\hrulefill
\vspace{8pt}\noindent In Episode 1, we introduced the very idea of a negation-incomplete formalized theory $T$. We noted that if we aim to construct a theory of basic arithmetic, we'd ideally like the theory to be able to prove \emph{all} the truths expressible in the language of basic arithmetic, and hence to be negation complete. But G\"odel's First Incompleteness Theorem says, very roughly, that {a nice theory $T$ containing enough arithmetic will always be negation incomplete}.
Now, the Theorem comes in two flavours, depending on whether we cash out the idea of being `nice enough' in terms of (i) the semantic idea of $T$'s being a \emph{sound} theory, or (ii) the idea of $T$'s being a \emph{consistent theory which proves enough arithmetic}. And we noted that G\"odel's own proofs, of either flavour, go via the idea of numerically coding up inside arithmetic syntactic facts about what can be proved in $T$, and then constructing an arithmetical sentence that -- via the coding -- in effect `says' \emph{I am not provable in $T$}.
We ended by noting that, at least at the arm-waving level of description of Episode 1, the G\"odelian construction might look a bit worrying. After all, we all know that self-reference is dangerous -- think Liar Paradox! So is G\"odel's construction entirely legitimate?
As I hope will become clear as we go along, it certainly is. But it might go a little way towards calming anxieties that some illegitimate trick is being pulled -- and it is certainly of intrinsic interest -- if we first give a different sort of proof of incompleteness, one that doesn't go via any worryingly self-referential construction. So now read on \ldots
\section{Negation completeness and decidability}
Let's start with another definition (sections, definitions and theorems will be numbered consecutively through these notes, to make cross-reference easier):
\begin{defn}
A theory $T$ is \emph{decidable} iff the property of being a theorem of $T$ is an effectively decidable property -- i.e. iff there is a mechanical procedure for determining, for any given sentence $\varphi$ of $T$'s language, whether $T \vdash \varphi$.
\end{defn}
It's then easy to show:
\begin{theorem}\label{th:negcompletearedecidable}
Any consistent, negation-complete, formal axiomatized theory $T$ is decidable.
\end{theorem}
\noindent\emph{Proof}\quad For convenience, we'll assume $T$'s proof-system is a Frege/Hilbert axiomatic logic, where proofs are just linear sequences of wffs (it will be obvious how to generalize the argument to other kinds of proof systems, e.g. where proof arrays are trees).
Recall, we stipulated (in Defns~\ref{formal_lang},~\ref{formal_theory}) that if $T$ is a properly formalized theory, its formalized language $L$ has a finite number of basic symbols. Now, we can evidently put those basic symbols in some kind of `alphabetical order', and then start mechanically listing off all the possible strings of symbols in some kind of order -- e.g. the one-symbol strings, followed by the finite number of two-symbol strings in `dictionary' order, followed by the finite number of three-symbol strings in `dictionary' order, followed by the four-symbol strings, etc., etc.
Now, as we go along, generating sequences of symbols, it will be a mechanical matter to decide whether a given string is in fact a sequence of wffs. And if it is, it will be a mechanical matter to decide whether the sequence of wffs is a $T$-proof, i.e. to check whether each wff is either an axiom or follows from earlier wffs in the sequence by one of $T$'s rules of inference. (That's all effectively decidable in a properly formalized theory, by Defns~\ref{formal_lang},~\ref{formal_theory}.) If the sequence is a kosher, well-constructed proof, then list its last wff $\varphi$, i.e. the theorem proved.
So we can, in this way, start mechanically generating a list of all the $T$-theorems (any $T$-theorem has a proof, and so by churning through all possible strings of symbols, we churn through all possible proofs).
And that enables us to decide, of an arbitrary sentence $\varphi$ of our consistent, negation-complete $T$, whether it is indeed a $T$-theorem. Just start dumbly listing all the $T$-theorems. Since $T$ is negation complete, eventually either $\varphi$ or $\neg\varphi$ turns up (and then you can stop!). If $\varphi$ turns up, declare it to be a theorem. If $\neg\varphi$ turns up, then since $T$ is consistent, we can declare that $\varphi$ is \emph{not} a theorem.
Hence, there \emph{is} a dumbly mechanical `wait and see' procedure for deciding whether $\varphi$ is a $T$-theorem.\hfill$\Box$
\vspace{8pt}\noindent We are, of course, relying here on a very relaxed notion of effective decidability-in-principle where we aren't working under time constraints (`effective' doesn't mean `practically efficacious' or `efficient'!). We might have to twiddle our thumbs for an immense time before one of $\varphi$ or $\neg \varphi$ turns up. Still, our `wait and see' method is guaranteed in this case to produce a result in finite time, in an entirely mechanical way -- so this counts as an effectively computable procedure in the official generous sense (explained more in \emph{IGT}, \S2.2).
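The `wait and see' method is easy enough to mimic in miniature. Here's an illustrative Python sketch, with a toy negation-complete `theory' standing in for $T$: its theorems are all the true addition equations `$m+n=k$' and the negations of the false ones. (The sentence syntax and the enumeration order are invented for the example.)

```python
from itertools import count

def theorems():
    """Mechanically enumerate the 'theorems' of a toy consistent,
    negation-complete theory: true equations m+n=k, negated false ones."""
    for total in count(0):              # list triples (m, n, k) by size,
        for m in range(total + 1):      # so every sentence eventually
            for n in range(total + 1 - m):  # shows up in the list
                k = total - m - n
                s = f"{m}+{n}={k}"
                yield s if m + n == k else "~" + s

def decide(sentence):
    """The 'wait and see' procedure: list theorems until the given
    sentence or its negation turns up.  (We assume the input really is
    a sentence of the toy language, written in canonical form.)"""
    neg = sentence[1:] if sentence.startswith("~") else "~" + sentence
    for t in theorems():
        if t == sentence:
            return True    # it's a theorem
        if t == neg:       # its negation is a theorem, so by
            return False   # consistency the sentence itself isn't
```

So e.g. decide("2+3=6") patiently churns out theorems until `~2+3=6' appears, and then answers no. Negation completeness guarantees termination; it says nothing about how long the wait is.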
\section{Capturing numerical properties in a theory}
Here's an equivalent way of rewriting the earlier Defn.~\ref{defn_expresses}:
\begin{defn}
A property $P$ is \emph{expressed}\index{express!property} by the open wff $\varphi\mathsf{(x)}$ with one free variable in an arithmetical language $L$ iff, for every $n$,\\
\hspace*{0.7cm} i. if $n$ has the property $P$, then
$\varphi(\overline{\mathsf{n}})$ is true,\\
\hspace*{0.7cm} ii. if $n$ does not have the property $P$, then $\neg\varphi(\overline{\mathsf{n}})$ is true.\\
A two-place relation $R$ is {expressed} by the open wff $\psi\mathsf{(x, y)}$ with two free variables iff, for every $m, n$,\\
\hspace*{0.7cm} i. if $m$ is $R$ to $n$, then
$\psi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$ is true,\\
\hspace*{0.7cm} ii. if $m$ is not $R$ to $n$, then
$\neg\psi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$ is true.\\
A one-place function $f$ is {expressed} by the open wff $\chi\mathsf{(x, y)}$ with two free variables iff, for every $m, n$,\\
\hspace*{0.7cm} i. if $f(m) = n$, then
$\chi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$ is true,\\
\hspace*{0.7cm} ii. if $f(m) \neq n$, then
$\neg\chi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$ is true.
\end{defn}
\noindent (We won't fuss about the obvious extension to many-place relations and functions.) Now we want a new companion definition:
\begin{defn}\label{def:captures}
The theory $T$ \emph{captures} the property $P$ by the open wff $\varphi\mathsf{(x)}$ iff, for any $n$,\\
\hspace*{0.7cm} i. if $n$ has the property $P$, then $T \vdash
\varphi(\overline{\mathsf{{n}}})$,\\
\hspace*{0.7cm} ii. if $n$ does not have the property $P$, then $T \vdash
\neg \varphi(\overline{\mathsf{{n}}}).$\\
The theory $T$ {captures} the two-place relation $R$ by the open wff $\psi\mathsf{(x, y)}$ iff, for any $m, n$,\\
\hspace*{0.7cm} i. if $m$ is $R$ to $n$, then $T \vdash
\psi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$,\\
\hspace*{0.7cm} ii. if $m$ is not $R$ to $n$, then $T \vdash
\neg\psi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$.\\
The theory $T$ {captures} the one-place function $f$ by the open wff $\chi\mathsf{(x, y)}$ iff, for any $m, n$,\\
\hspace*{0.7cm} i. if $f(m) = n$, then $T \vdash
\chi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$,\\
\hspace*{0.7cm} ii. if $f(m) \neq n$, then $T \vdash
\neg\chi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$.
\end{defn}
\noindent So: what a theory can \emph{express} depends on the richness of its language; what a theory can \emph{capture} (mnemonic: \underline{ca}se-by-case \underline{p}rove) depends on the richness of its axioms and rules of inference.
Ideally, of course, we'll want any theory that aims to deal with arithmetic not just to express but to capture lots of arithmetical properties, i.e. to prove which particular numbers have or lack which properties.
But what sort of properties do we want to capture? Well, suppose that $P$ is some effectively decidable property of numbers, i.e. one for which there is a mechanical procedure for deciding, given a natural number $n$, whether $n$ has property $P$ or not (see Defn.~\ref{defn_decidable}). Now, when we construct a formal theory of the arithmetic of the natural numbers, we will surely want deductions inside our theory to be able to track, case by case, any mechanical calculation that we can already perform informally. We don't want going formal to \emph{diminish} our ability to determine whether $n$ has this property $P$. Formalization aims at regimenting what we can already do: it isn't supposed to hobble our efforts. So while we might have some passing interest in more limited theories, we might naturally aim for a formal theory $T$ which at least (a) is able to frame some open wff $\varphi(\mathsf{x})$ which expresses the decidable property $P$, and (b)~is such that if $n$ has property $P$, $T \vdash \varphi(\overline{\mathsf{n}})$, and if $n$ does not have property $P$, $T \vdash \neg\varphi(\overline{\mathsf{n}})$. In short, we want $T$ to capture $P$ in the sense of our definition.
The working suggestion therefore is that, if $P$ is any effectively decidable property of numbers, we ideally want a competent theory of arithmetic $T$ to be able to capture $P$. Which motivates the following definition:
\begin{defn}\label{def:sufficientlystrong}
A formal theory $T$ including some arithmetic is \emph{{sufficiently strong}} iff it captures all decidable numerical properties.
\end{defn}
\noindent (Well, it would be natural to require that the theory also capture all decidable relations and all computable functions -- but for present purposes we don't need to worry about that.) It seems a reasonable and desirable condition on an ideal formal theory of the arithmetic of the natural numbers that it be sufficiently strong: in effect, whenever we can decide whether a number has a certain property, the theory can settle the matter.
\section{Sufficiently strong theories are undecidable}\label{sec:sffstrongundecidable}
We now prove a lovely theorem:
\begin{restatable}{theorem}{suffstrongnotdecidable}\label{th:suffstrongnotdecidable}
No consistent, sufficiently strong, axiomatized formal theory is decidable.
\end{restatable}
\noindent\emph{Proof}\quad We suppose $T$ is a consistent and sufficiently strong axiomatized theory yet also decidable, and derive a contradiction.
If $T$ is sufficiently strong, it must have a supply of open wffs. And by Defn~\ref{formal_lang}, it must in fact be decidable which strings of symbols are open $T$-wffs with the free variable `$\mathsf{x}$'. And we can use the dodge in the proof of Theorem~\ref{th:negcompletearedecidable} to start mechanically listing such wffs:
\begin{quote}
$
\mathsf{\varphi_\mathsf{0}(x), \varphi_\mathsf{1}(x), \varphi_\mathsf{2}(x), \varphi_\mathsf{3}(x),} \ldots.
$
\end{quote}
For we can just churn out all the strings of symbols of $T$'s language, and mechanically select out the wffs with free variable `$\mathsf{x}$'.
Now we can introduce the following definition:
\begin{quote}
$n$ has the property $D$ if and only if $T \vdash \neg \varphi_n (\overline{\mathsf{{n}}})$.
\end{quote}
The supposition that $T$ is a decidable theory entails that $D$ is an effectively decidable property of numbers.
Why? Well, given any number $n$, it will be a mechanical matter to start listing off the open wffs until we get to the $n$-th one, $\varphi_n \mathsf{(x)}$. Then it is a mechanical matter to form the numeral $\overline{\mathsf{{n}}}$, substitute it for the variable and prefix a negation sign. Now we just apply the supposed mechanical procedure for deciding whether a sentence is a $T$-theorem to test whether the wff $\neg\varphi_n{(\overline{\mathsf{{n}}})}$ is a theorem. So, on our current assumptions, there is an algorithm for deciding whether $n$ has the property $D$.
Since, by hypothesis, the theory $T$ is sufficiently strong, it can capture all decidable numerical properties. So it follows, in particular, that $D$ is capturable by some open wff. This wff must of course eventually occur somewhere in our list of the wffs $\varphi_n(\mathsf{x})$. Let's suppose the $d$-th wff does the trick: that is to say, property $D$ is captured by $\varphi_d \mathsf{(x)}$.
It is now entirely routine to get out a contradiction. For, just by definition, to say that $\varphi_d \mathsf{(x)}$ captures $D$ means that for any $n$,
\begin{quote}
if $n$ has the property $D$, $T \vdash \varphi_d (\overline{\mathsf{{n}}})$,\\
if $n$ doesn't have the property $D$, $T \vdash \neg \varphi_d (\overline{\mathsf{{n}}})$.
\end{quote}
So taking in particular the case $n = d$, we have
\begin{enumerate}\renewcommand{\labelenumi}{\roman{enumi}.}
\setlength{\itemsep}{-0.75ex}\setlength{\parsep}{0ex}
\item if $d$ has the property $D$, $T \vdash \varphi_d (\overline{\mathsf{{d}}})$,
\item if $d$ doesn't have the property $D$, $T \vdash\neg \varphi_d (\overline{\mathsf{{d}}})$.
\end{enumerate}
But note that our initial definition of the property $D$ implies for the particular case $n = d$:
\begin{enumerate}\renewcommand{\labelenumi}{\roman{enumi}.}\setcounter{enumi}{2}
\item $d$ has the property $D$ if and only if $T \vdash \neg \varphi_d (\overline{\mathsf{{d}}})$.
\end{enumerate}
\noindent From (ii) and (iii), it follows that whether $d$ has property $D$ or not, the wff $\neg \varphi_d (\overline{\mathsf{{d}}})$ is a theorem either way. So by (iii) again, $d$ does have property $D$, hence by (i) the wff $\varphi_d (\overline{\mathsf{{d}}})$ must be a theorem too. So a wff and its negation are both theorems of $T$. Therefore $T$ is inconsistent, contradicting our initial assumption that $T$ is consistent.
In sum, the supposition that $T$ is a consistent and sufficiently strong axiomatized formal theory of arithmetic \emph{and} decidable leads to contradiction. \hfill$\Box$
\vspace{8pt}\noindent So, if $T$ is properly formalized, consistent and can prove enough arithmetic, then there is no way of mechanically determining what's a $T$-theorem and what isn't. We could, I suppose, call this result a \emph{non-trivialization theorem}. We can't trivialize an interesting area of mathematics which contains enough arithmetic by regimenting it into a theory $T$, and then passing $T$ over to a computer to tell us what's a theorem and what isn't.
It's worth remarking on the key construction here. We take a sequence of wffs $\varphi_n(\mathsf{x})$ (for $n = 0, 1, 2, \ldots$) and then consider the (negations of the) wffs $\varphi_0(\overline{\mathsf{0}})$, $\varphi_1(\overline{\mathsf{1}})$, $\varphi_2(\overline{\mathsf{2}})$, etc.\footnote{Reality check: what is the relation between $\varphi (\overline{\mathsf{0}})$ and $\varphi({\mathsf{0}})$?} This sort of thing is called \emph{diagonalizing}. Why?
Well just imagine the square array you get by writing $\varphi_0(\overline{\mathsf{0}})$, $\varphi_0(\overline{\mathsf{1}})$, $\varphi_0(\overline{\mathsf{2}})$, etc. in the first row, $\varphi_1(\overline{\mathsf{0}})$, $\varphi_1(\overline{\mathsf{1}})$, $\varphi_1(\overline{\mathsf{2}})$, etc. in the next row, $\varphi_2(\overline{\mathsf{0}})$, $\varphi_2(\overline{\mathsf{1}})$, $\varphi_2(\overline{\mathsf{2}})$ etc. in the next row, and so on [go on, draw the diagram!]. Then the wffs of the form $\varphi_n(\overline{\mathsf{n}})$ lie down the diagonal.
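If you'd rather execute the diagonal trick than draw it, here is a toy rendering in Python, with ordinary computable predicates on numbers standing in for the wffs $\varphi_n(\mathsf{x})$. (The three particular predicates are arbitrary illustrations.)

```python
# A toy enumeration of 'wffs' phi_0(x), phi_1(x), phi_2(x), here modelled
# as predicates on numbers:
phis = [
    lambda x: x % 2 == 0,    # phi_0(x): x is even
    lambda x: x > 10,        # phi_1(x): x exceeds ten
    lambda x: x == 2,        # phi_2(x): x equals two
]

# The square array: row n lists the truth values of phi_n(0), phi_n(1), ...
array = [[phi(m) for m in range(len(phis))] for phi in phis]

# The diagonal picks out the values of phi_n(n):
diagonal = [array[n][n] for n in range(len(phis))]

# Negating the diagonal gives a property D which differs from every
# listed phi_n at the argument n -- the engine of the proof above:
D = [not entry for entry in diagonal]
assert all(D[n] != phis[n](n) for n in range(len(phis)))
```

The key point survives the toy setting: however the $\varphi_n$ are listed, the negated diagonal disagrees with the $n$-th entry at argument $n$, and so cannot itself appear anywhere in the list.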
As we'll see, it is diagonalization and not any worrying kind of self-reference that is really at the heart of G\"odel's incompleteness proof.
\section{A corollary about the decidability of logic}
\begin{defn}
A formalized logic is decidable if the property of being a theorem of the logic -- i.e. a sentence deducible from no premisses -- is decidable.
\end{defn}
\noindent It is familiar that standard propositional logic is decidable (doing a truth-table test or a tree test decides what's a tautology, and the theorems are all and only the tautologies). It is familiar too that there's no obvious analogue of the truth-table test for deciding of an arbitrary sentence whether it is a theorem of standard first-order logic (a.k.a. the predicate calculus). But is there some other decision procedure?
Well, Theorem~\ref{th:suffstrongnotdecidable} now has an interesting corollary:
\begin{theorem}
If there is a consistent theory with a first-order logic which is sufficiently strong and has a finite number of axioms, then first-order logic is undecidable.
\end{theorem}
\noindent\emph{Proof}\quad Suppose $Q$ is a consistent finitely axiomatized theory with a first-order logic and which is sufficiently strong. Since it is finitely axiomatized, we can wrap all its axioms together into one long conjunction, $\hat{Q}$. And then, trivially, $Q \vdash \varphi$ if and only if $\vdash \hat{Q} \to \varphi$; i.e. we can prove $\varphi$ inside $Q$ if and only if a certain related conditional is logically provable from no assumptions.
So if the logic were decidable -- i.e. if (1) we could mechanically tell whether the conditional $\hat{Q} \to \varphi$ is a logical theorem -- then (2) we could mechanically decide whether $\varphi$ is a $Q$-theorem. But since $Q$ is a consistent, sufficiently strong formalized theory, (2) is impossible by Theorem~\ref{th:suffstrongnotdecidable}. So (1) is impossible -- the logic must be undecidable.\hfill$\Box$
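The shape of that reduction can even be mirrored in running code. In the following Python sketch, a brute-force propositional tautology test stands in for the (merely hypothetical!) first-order decision procedure, and a couple of propositional axioms stand in for $Q$. The point is purely structural: wrap the finitely many axioms into one conjunction, and deciding theoremhood in the theory reduces to deciding the validity of a conditional.

```python
from itertools import product

def tautology(formula, variables=('p', 'q', 'r')):
    """Brute-force validity test for formulas written as Python boolean
    expressions in p, q, r.  (Propositional logic really is decidable;
    here it stands in for the supposed first-order decision procedure.)"""
    return all(eval(formula, dict(zip(variables, values)))
               for values in product([False, True], repeat=len(variables)))

def theory_decider(axioms):
    """Conjoin the finitely many axioms into one formula A-hat; the
    theory proves phi iff A-hat -> phi is valid, so a decidable logic
    would make the finitely axiomatized theory decidable too."""
    a_hat = ' and '.join(f'({a})' for a in axioms)
    return lambda phi: tautology(f'not ({a_hat}) or ({phi})')

decides = theory_decider(['p', 'not p or q'])    # toy axioms: p, and p -> q
```

Here decides('q') answers yes and decides('r') answers no. Run the same wrapping trick with a first-order validity-decider in place of tautology and you would thereby decide the theory -- which the theorem above tells us is impossible for a consistent, sufficiently strong one.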
\vspace{8pt}\noindent Much later, we'll find that there is indeed a consistent, finitely axiomatized, weak arithmetic with a first-order logic, which is sufficiently strong -- the so-called Robinson Arithmetic $\mathsf{Q}$ fits the bill. So that will settle it: first-order logic really is undecidable.
\section{Incompleteness again}
Theorem~\ref{th:negcompletearedecidable} says: any consistent, negation-complete, axiomatized formal theory is decidable. Theorem~\ref{th:suffstrongnotdecidable} says:
no consistent, sufficiently strong, axiomatized formal theory is decidable. It immediately follows that
\begin{restatable}{theorem}{easygodelthm}\label{easygodel}
A consistent, sufficiently strong, axiomatized formal theory cannot be negation complete.
\end{restatable}
\noindent Wonderful! A seemingly remarkable theorem proved remarkably quickly. But what can we learn from it?
Well, note that -- unlike G\"odel's own result -- Theorem~\ref{easygodel} doesn't actually yield a specific undecidable sentence for a given theory $T$. And more importantly, it doesn't tell us that $T$ must have an undecidable \emph{arithmetical} sentence.
So suppose we start off with a consistent `sufficiently strong' theory $T$ couched in some language which just talks about arithmetic matters: then this theory $T$ is incomplete, and will have arithmetical formally undecidable sentences. But now imagine that we extend $T$'s language (perhaps it now talks about sets of numbers as well as about numbers), and we give it richer axioms, to arrive at an expanded consistent theory $U$. Now, $U$ will still be sufficiently strong if $T$ is, and so Theorem~\ref{easygodel} will still apply. Note, however, that as far as Theorem~\ref{easygodel} is concerned, it could be that $U$ repairs the gaps in $T$ and proves every truth statable in $T$'s language, while the incompleteness has now `moved outwards', so to speak, to claims involving $U$'s new vocabulary. G\"odel's result is a lot stronger: he shows that some incompleteness will always remain \emph{even in the theory's arithmetical core}.
Still, the current theorem is surprising enough. Set down a purely arithmetical theory. Either it won't be sufficiently strong (will fail to prove some things you'd want a formalized arithmetic to prove) or it is incomplete (so still will fail to prove some arithmetic truths) or -- even worse -- it is simply inconsistent.
Finally, though, we should stress that the interest of Theorem~\ref{easygodel} really depends on the notion of a sufficiently strong theory -- defined in terms of the informal notion of a decidable property of numbers -- being in good order. Well, obviously, I wouldn't have written this episode if the notion of sufficient strength were intrinsically problematic. However, making good that claim by giving a sharper account of the notion of decidability takes quite a lot of effort! And it takes notably more effort than we need to prove incompleteness by G\"odel's original method. So over the next episodes, we are going to revert to exploring G\"odel's route to the incompleteness theorems.
\vspace{8pt}\noindent At this point, you can usefully read Chs 4 and 6 of \emph{IGT}. You might also skim Ch.~5 -- but proof details there are perhaps only for real enthusiasts: in fact the arguments are about as tricky as any in the book, so I don't want you to get fazed by them!
\newpage
%\setcounter{section}{0}
\setcounter{page}{1}
\begin{center}%
{{\Large \emph{G\"odel Without (Too Many) Tears -- 3}}\\[16pt]{\LARGE Two weak arithmetics} \par%
\vskip 1.5em%
{\large
\lineskip .75em%
\begin{tabular}[t]{c}%
Peter Smith
\end{tabular}\par}}%
\vskip 0.75em%
{University of Canterbury, Christchurch, NZ}\\[6pt]
{March 2, 2010}%
\vskip 1.5em%
\end{center}%\par
\noindent\hrulefill
\begin{itemize}\setlength{\itemsep}{0pt}
\item Baby Arithmetic is negation complete
\item Robinson Arithmetic, $\mathsf{Q}$
\item A simple proof that Robinson Arithmetic is not complete
\item Adding $\leq$ to Robinson Arithmetic
\item Why Robinson Arithmetic is interesting
\end{itemize}
\noindent\hrulefill
\vspace{8pt}\noindent So far, we've been going at a pretty rapid pace (just to get to exciting stuff fast!). We now need to slow right down.
Our last big theorem -- Theorem~\ref{easygodel} -- tells us that \emph{if} a theory meets certain conditions, then it must be negation incomplete. And we made some initial arm-waving remarks to the effect that it \emph{seems} plausible that we should want theories which meet those conditions. Later, we announced that there actually \emph{is} a consistent (and finitely axiomatized) weak arithmetic with a first-order logic which meets the conditions (in which case, stronger arithmetics will also meet the conditions). But we didn't say anything about what such a weak theory really looks like. In fact, we haven't looked at \emph{any} detailed theory of arithmetic yet! It is high time, then, that we stop operating at the extreme level of abstraction of Episodes 1 and 2, and start getting our hands dirty.
This episode introduces a couple of weak arithmetics. Frankly, this is pretty unexciting stuff -- by all means skip fairly lightly over some of the more boring proof details! But you do need to get a flavour of how these two simple formal theories work, in preparation for the next episode where we tackle the canonical first-order arithmetic $\mathsf{PA}$.
\section{Baby Arithmetic}
We start by looking at an evidently sound theory \textsf{BA} (`Baby Arithmetic'). \emph{This is a negation complete theory of arithmetic.}
Question: How is that possible? -- for recall \Godelsemantic*
\noindent Answer: \textsf{BA}'s very limited language $L_B$ lacks quantifiers, so doesn't contain the language of basic arithmetic. Because its language is so limited, the theory can in fact prove or disprove every sentence constructible in $L_B$.
\subsection{The language $L_B$}
The language $L_B$ has non-logical symbols
\begin{quote}
$\mathsf{0, S, +, \times}$
\end{quote}
The first of these is of course a constant (intended to denote zero). The next symbol is a one-place function symbol (intended to denote the successor function). The last two symbols in the list are two-place function symbols (with the obvious standard interpretations). Note that if we use `$+$' and `$\times$' as `infix' function symbols in the usual way -- i.e. we write $\mathsf{S0 + SS0}$ rather than prefix the function sign as in $\mathsf{+\,S0\,SS0}$ -- then we'll also need brackets for scoping the function signs, to disambiguate $\mathsf{S0 + SS0 \times SSS0}$, e.g. as $\mathsf{(S0 + (SS0 \times SSS0)).}$
From these symbols, we can construct the \emph{terms} of $L_B$. A term is a referring expression built up from occurrences of `$\mathsf{0}$' and applications of the function expressions `$\mathsf{S}$', `$+$', `$\times$'. So, examples are $\mathsf{0}$, $\mathsf{SSS0}$, $\mathsf{(S0 + SS0)}$, $\mathsf{((S0 + SS0) \times SSS0)}$.
The \emph{value} of a term is the number it denotes when standardly interpreted: the values of our example terms are respectively 0, 3, 3 and 9.
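Evaluating a term is itself an entirely mechanical business, as this little Python sketch brings out. (Terms are represented here as nested tuples rather than as symbol strings, purely to save writing a parser -- that representation is our convenience, not part of $L_B$.)

```python
# A term of L_B is '0', ('S', t), ('+', t1, t2) or ('*', t1, t2).
def value(term):
    """Compute the number a closed term denotes on the standard interpretation."""
    if term == '0':
        return 0
    op = term[0]
    if op == 'S':
        return value(term[1]) + 1
    if op == '+':
        return value(term[1]) + value(term[2])
    if op == '*':
        return value(term[1]) * value(term[2])
    raise ValueError(f'not a term: {term!r}')

def numeral(n):
    """Build the standard numeral for n, i.e. SS...S0 with n occurrences of S."""
    term = '0'
    for _ in range(n):
        term = ('S', term)
    return term
```

So the fourth example term $\mathsf{((S0 + SS0) \times SSS0)}$ becomes ('*', ('+', numeral(1), numeral(2)), numeral(3)), and value duly returns 9.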
Recall, we use `${\overline{\mathsf{{n}}}}$' to represent the standard numeral $\mathsf{SS\ldots S0}$ with $n$ occurrences of `$\mathsf{S}$'. Thus `${\overline{\mathsf{{3}}}}$' is short for `${{\mathsf{{SSS0}}}}$'. The value of `${\overline{\mathsf{{n}}}}$' is of course $n$.\footnote{Recall also our convention: \textsf{sans serif} expressions belong to $L_B$, or whatever formal language is under discussion: \emph{italic} symbols are just part of everyday mathematical English.}
The sole built-in predicate of the language $L_B$ is the logical identity sign. Since $L_B$ lacks other non-logical predicates, its only way of forming atomic wffs is therefore by taking two {terms} constructed from the non-logical symbols and putting the identity sign between them. In other words, the atomic wffs of $L_B$ are \emph{equations} relating terms denoting particular numbers. So, for example, $\mathsf{(S0 + SS0) = SSS0}$ is a true atomic wff -- which we can abbreviate, dropping brackets in a natural way, as $\mathsf{\overline{1} + \overline{2} = \overline{3}}$ -- and $\mathsf{(S0 \times SS0) = SSS0}$ is a false one.
We also, however, want to be able to express \emph{inequations}, hence we'll want $L_B$ to have a negation sign. And note, for convenience, we will abbreviate wffs of the form $\neg\,\tau_1 = \tau_2$ by $\tau_1 \neq \tau_2$.
In \emph{IGT}, I go on to round things out so as to give $L_B$ some expressively complete set of propositional connectives, e.g. $\mathsf{\neg, \land, \lor, \to}$. We'll then also of course need brackets again for scoping the two-place connectives if we give them an `infix' syntax in the familiar way.
The syntax for constructing the complete class of wffs of $L_B$ is then exactly as you'd expect, and the semantics is the obvious one. [Exercise: spell out the details carefully!]
\subsection{The axioms and logic of $\mathsf{BA}$}
The theory \textsf{BA} in the language $L_B$ comes equipped with some classical propositional deductive system to deal with the propositional connectives (choose your favourite system!) and the usual identity rules.
Next, we want non-logical axioms governing the successor function. We want to capture the ideas that, if we start from zero and repeatedly apply the successor function, we keep on getting further numbers -- i.e. different numbers have different successors: contraposing, for any $m, n$, if $Sm = Sn$ then $m = n$.
And further, we never cycle back to zero: for any $n$, $0 \neq Sn$.
However, there are no quantifiers in $L_B$. So we can't directly express those general facts about the successor function inside the object language $L_B$. Rather, we have to employ \emph{schemata} (i.e. general templates) and use the generalizing apparatus in our English metalanguage to say: \emph{any sentence that you get from one of the following schemata by substituting standard numerals for the place-holders `$\zeta\!$', `$\xi\!$' is an axiom}.
\begin{schema}
\quad {$\mathsf{0 \not= S\zeta}$}
\end{schema}
\begin{schema}
\quad {$\mathsf{S\zeta = S\xi \:\to\: \zeta = \xi}$}
\end{schema}
Next, we want non-logical axioms for addition. This time we want to capture the idea that adding zero to a number makes no difference: for any $m$, $m + 0 = m$. And adding a larger number $Sn$ to $m$ is governed by the rule: for any $m, n$, $m + Sn = S(m + n)$. Those two principles together tell us how to add zero to a given number $m$; and then adding one is defined as the successor of the result of adding zero; and then adding two is defined as the successor of the result of adding one; and so on up -- thus defining adding $n$ for any particular natural number $n$.
Again, however, because of its lack of quantifiers, we can't express all that directly inside $L_B$. We have to resort to schemata again, and say that \emph{anything you get by substituting standard numerals for placeholders in the following is an axiom}:
\begin{schema}
\quad {$\mathsf{\zeta + 0 = \zeta}$}
\end{schema}
\begin{schema}
\quad {$\mathsf{\zeta + S\xi = S(\zeta + \xi)}$}
\end{schema}
We can similarly pin down the multiplication function by requiring that \emph{every numeral instance of the following is an axiom too}:
\begin{schema}
\quad {$\mathsf{\zeta \times 0 = 0}$}
\end{schema}
\begin{schema}
\quad {$\mathsf{\zeta \times S\xi = (\zeta \times \xi) + \zeta}$}
\end{schema}
\noindent Instances of Schema 5 tell us the result of multiplying by zero. Instances of Schema 6 with `$\xi$' replaced by `$\mathsf{0}$' define how to multiply by one in terms of multiplying by zero and then applying the already-defined addition function. Once we know about multiplying by one, we can use another instance of Schema 6 with `$\xi$' replaced by `$\mathsf{S0}$' to tell us how to multiply by two (multiply by one and do some addition). And so on and so forth, thus defining multiplication for every number.
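The recursions embodied in Schemata 3--6 can be transcribed directly into Python. (This is a sketch of the informal recursive definitions, of course, not of $\mathsf{BA}$ itself: lacking quantifiers, $\mathsf{BA}$ can only prove each numeral instance case by case.)

```python
def add(m, n):
    """Addition by the recursion of Schemata 3 and 4:
    m + 0 = m;  m + Sn = S(m + n)."""
    if n == 0:
        return m
    return add(m, n - 1) + 1         # the successor of m + (n - 1)

def mul(m, n):
    """Multiplication by the recursion of Schemata 5 and 6:
    m * 0 = 0;  m * Sn = (m * n) + m."""
    if n == 0:
        return 0
    return add(mul(m, n - 1), m)     # the already-defined addition does the work
```

Each unfolding of add or mul here corresponds to one appeal to an axiom instance in a formal $\mathsf{BA}$ derivation of the kind illustrated below.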
To summarize, then,
\begin{defn}
$\mathsf{BA}$ is the theory whose language is $L_B$, whose logic is propositional logic plus the standard identity rules, and whose non-logical axioms are every numerical instance of Schemas (1) to (6).
\end{defn}
\subsection{Some proofs inside $\mathsf{BA}$, and three little theorems}
We'll give two little examples of how arithmetic can be done inside $\textsf{BA}$. First, let's show that $\mathsf{BA} \vdash \mathsf{\overline{4} \neq \overline{2}}$, i.e. $\mathsf{BA} \vdash \mathsf{SSSS0 \neq SS0}$:
\begin{tabbing}
\hspace{4em}\= \hspace{1cm} \= \hspace{7.2cm}\= \kill
\>1.\' \>$\mathsf{SSSS0 = SS0}$ \>Supposition\\
\>2.\' \>$\mathsf{SSSS0 = SS0 \to SSS0 = S0}$ \>Axiom, instance of Schema 2\\
\>3.\' \>$\mathsf{SSS0 = S0}$ \>From 1, 2 by MP\\
\>4.\' \>$\mathsf{SSS0 = S0 \to SS0 = 0}$ \>Axiom, instance of Schema 2\\
\>5.\' \>$\mathsf{SS0 = 0}$ \>From 3, 4 by MP\\
\>6.\' \>$\mathsf{0 \neq SS0}$ \>Axiom, instance of Schema 1\\
\>7.\' \>Contradiction! \>From 5, 6 and identity rules\\
\hspace{4em}\= \hspace{0.2cm} \= \hspace{8cm}\= \kill
\>8.\' \>$\mathsf{SSSS0 \neq SS0}$ \>From 1 to 7, by RAA.
\end{tabbing}
\noindent And a little reflection on that illustrative proof should now convince you of this general claim:
\begin{theorem}\label{th:BAdealswithnumerals}
If $\mathit{s}$ and $\mathit{t}$ are distinct numbers, then $\mathsf{BA \vdash \overline{s} \neq \overline{t}}$.
\end{theorem}
\noindent [Exercise for the mathematical: turn that `little reflection' into a proper proof!]
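In fact, that `little reflection' can itself be mechanized. Here is a sketch (the routine and its output format are my own invention, compressing each axiom-plus-modus-ponens pair into a single line) which spits out a $\mathsf{BA}$-proof on the pattern above for any $s > t$:

```python
def numeral(n):
    # the standard numeral for n: n occurrences of 'S' prefixed to '0'
    return 'S' * n + '0'

def proof_of_distinctness(s, t):
    """Lines of a BA-proof that numeral(s) != numeral(t), where s > t >= 0."""
    assert s > t >= 0
    lines = [f'{numeral(s)} = {numeral(t)}   (Supposition)']
    # strip matched successors: t applications of Schema 2 plus MP
    for i in range(1, t + 1):
        lines.append(f'{numeral(s - i)} = {numeral(t - i)}   (Schema 2, MP)')
    # we now have numeral(s - t) = 0 with s - t >= 1, contradicting Schema 1
    lines.append(f'0 != {numeral(s - t)}   (Schema 1)')
    lines.append('Contradiction!   (identity rules)')
    lines.append(f'{numeral(s)} != {numeral(t)}   (RAA)')
    return lines
```

Run on $s = 4$, $t = 2$, this reproduces a compressed version of the eight-line proof displayed above.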
And for our second example, we'll show that $\mathsf{BA} \vdash \mathsf{\overline{2} \times \overline{1} = \overline{2}}$. In unabbreviated form, though dropping outermost brackets, we need to derive $\mathsf{SS0 \times S0 = SS0}$.
\begin{tabbing}
\hspace{4em}\= \hspace{0.2cm} \= \hspace{8cm}\= \kill
\>1.\' \>$\mathsf{SS0 \times 0 = 0}$ \>Axiom, instance of Schema 5\\
\>2.\' \>$\mathsf{SS0 \times S0 = (SS0 \times 0) + SS0}$ \>Axiom, instance of Schema 6\\
\>3.\' \>$\mathsf{SS0 \times S0 = 0 + SS0}$ \>From 1, 2 by LL
\end{tabbing}
(`LL' of course indicates the use of {Leibniz's Law} to intersubstitute identicals.) To proceed, we now need to show that $\mathsf{0 + SS0 = SS0}$. For note, this \emph{isn't} an instance of Schema 3. So we have to do a bit of work to get it:
\begin{tabbing}
\hspace{4em}\= \hspace{0.2cm} \= \hspace{8cm}\= \kill
\>4.\' \>$\mathsf{0 + 0 = 0}$\>Axiom, instance of Schema 3\\
\>5.\' \>$\mathsf{0 + S0 = S(0 + 0)}$\>Axiom, instance of Schema 4\\
\>6.\' \>$\mathsf{0 + S0 = S0}$\>From 4, 5 by LL\\
\>7.\' \>$\mathsf{0 + SS0 = S(0 + S0)}$\>Axiom, instance of Schema 4\\
\>8.\' \>$\mathsf{0 + SS0 = SS0}$\>From 6, 7 by LL
\end{tabbing}
Which gives us what we want:
\begin{tabbing}
\hspace{4em}\= \hspace{0.2cm} \= \hspace{8cm}\= \kill
\>9.\' \>$\mathsf{SS0 \times S0 = SS0}$ \>From 3, 8 by LL
\end{tabbing}
\noindent That's pretty laborious, to be sure, but again it works. And inspection of $\mathsf{BA}$'s axioms and a little reflection on our second illustrative proof should now convince you of a further general claim:
\begin{theorem}\label{BAcandosimpleaddmult}
$\mathsf{BA}$ can prove any true equation of the form $\mathsf{\overline{m} +\overline{n} = \overline{t}}$ or $\mathsf{\overline{m} \times \overline{n} = \overline{t}}$.
\end{theorem}
\noindent In other words, $\mathsf{BA}$ can correctly add or multiply any two numbers. [Exercise for the mathematical: give a proper proof of that!]
We can now generalize further: in fact $\mathsf{BA}$ can correctly evaluate all terms of its language. That is to say,
\begin{theorem}\label{th:BAevaluatesterms}
Suppose $\tau$ is a term of $L_B$ and the value of $\tau$ on the intended interpretation of the symbols is $t$. Then $\mathsf{BA \vdash \tau = \overline{t}}$.
\end{theorem}
\noindent Why so? Well, let's take a very simple example and then draw a general moral. Suppose we want to show e.g. that $\mathsf{(\overline{2} + \overline{3}) \times (\overline{2} \times \overline{2}) = \overline{20}}$. Then evidently we'll proceed as follows.
\begin{tabbing}
\hspace{4em}\= \hspace{0.2cm} \= \hspace{8cm}\= \kill
\>1.\' \> $\mathsf{(\overline{2} + \overline{3}) \times (\overline{2} \times \overline{2}) = (\overline{2} + \overline{3}) \times (\overline{2} \times \overline{2})}$ \> Trivial!\\
\>2.\' \>$\mathsf{\overline{2} + \overline{3} = \overline{5}}$ \>Instance of Thm.~\ref{BAcandosimpleaddmult}\\
\>3.\' \>$\mathsf{(\overline{2} + \overline{3}) \times (\overline{2} \times \overline{2}) = \overline 5 \times (\overline{2} \times \overline{2})}$ \>From 1, 2 using LL\\
\>4.\' \>$\mathsf{\overline{2} \times \overline{2} = \overline{4}}$ \>Instance of Thm.~\ref{BAcandosimpleaddmult}\\
\>5.\' \>$\mathsf{(\overline{2} + \overline{3}) \times (\overline{2} \times \overline{2}) = \overline{5} \times \overline{4}}$ \>From 3, 4 using LL\\
\>6.\' \>$\mathsf{\overline 5 \times \overline{4} = \overline{20}}$ \>Instance of Thm.~\ref{BAcandosimpleaddmult}\\
\>7.\' \>$\mathsf{(\overline{2} + \overline{3}) \times (\overline{2} \times \overline{2}) = \overline{20}}$ \>From 5, 6 using LL
\end{tabbing} \noindent What we do here is evidently `evaluate' the complex term on the left `from the inside out', reducing the complexity of what needs to be evaluated at each stage, eventually equating the complex term with a standard numeral. [Exercise for the mathematical: give a proper argument `by induction on the complexity of the term' that proves Thm~\ref{th:BAevaluatesterms}.]
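That `inside out' strategy is, in effect, a simple recursion. Here's a sketch in program form, with terms represented as nested tuples (a representation I'm just making up for illustration):

```python
def evaluate(term):
    """Evaluate an L_B term 'from the inside out'; numerals are plain ints."""
    if isinstance(term, int):
        return term                          # a numeral: nothing to do
    if term[0] == 'S':
        return evaluate(term[1]) + 1         # successor
    op, left, right = term
    l, r = evaluate(left), evaluate(right)   # evaluate subterms first
    return l + r if op == '+' else l * r

# the worked example: (2 + 3) x (2 x 2)
example = ('*', ('+', 2, 3), ('*', 2, 2))
```

Each recursive call mirrors one appeal to Thm.~\ref{BAcandosimpleaddmult} plus LL in the formal derivation.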
\subsection{$\mathsf{BA}$ is a sound and complete theory of the truths of $L_B$}
Our little theorems now enable us to prove the following:
\begin{theorem}
Suppose $\sigma$ and $\tau$ are terms of $L_B$. Then if $\sigma = \tau$ is true, then $\mathsf{BA} \vdash \sigma = \tau$, and if $\sigma = \tau$ is false, then $\mathsf{BA} \vdash \sigma \neq \tau$.
\end{theorem}
\noindent\emph{Proof}\quad Let $\sigma$ evaluate to $s$ and $\tau$ evaluate to $t$. Then, by Theorem~\ref{th:BAevaluatesterms}, (i) $\mathsf{BA} \vdash \sigma = \overline{\mathsf{s}}$ and (ii) $\mathsf{BA} \vdash \tau = \overline{\mathsf{t}}$.
Now, suppose $\sigma = \tau$ is true. Then $s = t$, and so $\overline{\mathsf{s}}$ must be the very same numeral as $\overline{\mathsf{t}}$. We can therefore immediately conclude from (i) and (ii) that $\mathsf{BA} \vdash \sigma = \tau$ by the logic of identity.
Suppose, on the other hand, that $\sigma = \tau$ is false, so $s \neq t$. Then by Theorem~\ref{th:BAdealswithnumerals}, $\mathsf{BA \vdash \overline{s} \neq \overline{t}}$, and together with (i) and (ii) that implies $\mathsf{BA} \vdash \sigma \neq \tau$, again by the logic of identity.\hfill$\Box$
\vspace{8pt}\noindent And from that last theorem, it more or less immediately follows that
\begin{theorem}
$\mathsf{BA}$ is negation complete.\label{BAcomp}
\end{theorem}
\noindent\emph{Proof} The only atomic claims expressible in $\mathsf{BA}$'s language $L_B$ are equations involving terms; all other sentences are truth-functional combinations of such equations. But we've just seen that we can (1) prove each true equation and (2) prove the negation of each false equation.
But now recall that there's a theorem of propositional logic which tells us that, given some atoms and/or negated atoms, we can prove every complex wff that must be true if those atoms/negated atoms are true, and prove the negation of every complex wff that must be false if those atoms/negated atoms are true. That means, given (1) and (2),
we can derive any true truth-functional combination of the equations/inequations in a complex wff, i.e. prove any true sentence. Likewise, we can also derive the negation of any false truth-functional combination of the equations/inequations in a complex wff, i.e. prove the negation of any false sentence.
Hence, for \emph{any} sentence $\varphi$ of $L_B$, since $\varphi$ is either true or false, either $\mathsf{BA} \vdash \varphi$ or $\mathsf{BA} \vdash \neg\varphi$. Hence $\mathsf{BA}$ is negation complete. \hfill$\Box$
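The propositional step in this proof is itself entirely mechanical. A sketch (the representation is mine: atoms here are already-decided equations, represented simply as truth-values):

```python
def decide(wff):
    """Decide a truth-functional combination of already-decided atoms.

    An atom is a bool: True for a provable equation, False for a
    refutable one. Connectives: ('not', p), ('and', p, q),
    ('or', p, q), ('if', p, q).
    """
    if isinstance(wff, bool):
        return wff
    op, *parts = wff
    vals = [decide(p) for p in parts]
    if op == 'not':
        return not vals[0]
    if op == 'and':
        return vals[0] and vals[1]
    if op == 'or':
        return vals[0] or vals[1]
    if op == 'if':
        return (not vals[0]) or vals[1]
    raise ValueError(f'unknown connective {op}')
```

Combined with a routine for evaluating equations between closed terms, this yields a mechanical decision procedure for the whole of $L_B$ -- which is just what Theorem~\ref{BAcomp} plus soundness leads us to expect.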
\vspace{8pt}\noindent So the situation is this. $\mathsf{BA}$ is obviously a sound theory -- all its axioms are trivial arithmetical truths, and its logic is truth-preserving, so all its theorems are true. $\mathsf{BA}$ is also, as we've just seen, a complete theory in the sense of entailing all the truths expressible in its language $L_B$. However, the language $L_B$ only allows us to express a limited class of facts about adding and multiplying particular numbers (it can't express numerical generalizations). And, prescinding from practical issues about memory or display size, any pocket calculator can in effect tell us about all such facts. So it is no surprise that we can get a formalized theory to do the same!
\section{Robinson Arithmetic}
That's all very straightforward, but also very unexciting. The reason that Baby Arithmetic manages to prove every correct claim \emph{that it can express} -- and is therefore negation complete by our definition -- is that it can't express very much. In particular, as we stressed, it can't express any generalizations at all. $\mathsf{BA}$'s completeness comes at the high price of being expressively extremely impoverished. The obvious way to start beefing up $\mathsf{BA}$ into something more interesting is to restore the familiar apparatus of quantifiers and variables. So that's what we'll start doing.
\subsection{The language $L_A$}\label{subsec:L_A}
We'll keep the same non-logical vocabulary as in $L_B$: so there is still just a single non-logical constant denoting zero, and the three built-in function-symbols, $\mathsf{S}, +, \times$ expressing successor, addition and multiplication. But now we allow ourselves the full linguistic resources of first-order logic, with the usual supply of quantifiers and variables to express generality. We fix the domain of the quantifiers to be the natural numbers. The result is the interpreted language $L_A$.
$L_A$ is the least ambitious language which `contains the language of basic arithmetic' in the sense of Defn.~\ref{def:langcontainsbasicarith}. (For, of course, $L_A$ has the predicate expression `$\mathsf{x=x}$' which has the numbers as its extension, so fits our official definition, if we want to fuss about that.)
\subsection{The axioms and logic of $\mathsf{Q}$}
The theory $\mathsf{Q}$ is built in the formal language $L_A$, and is equipped with a full first-order classical logic. And as for the non-logical axioms, now we have
the quantifiers available to express generality we can replace each of $\mathsf{BA}$'s metalinguistic schemata (specifying an infinite number of formal axioms governing particular numbers) by a single generalized Axiom expressed inside $L_A$ itself. For example, we can replace the first two schemata governing the successor function by
\begin{axiom}
\quad {$\mathsf{\forall x(0 \not= Sx)}$}
\end{axiom}
\begin{axiom}
\quad {$\mathsf{\forall x \forall y(Sx = Sy \lif x = y)}$}
\end{axiom}
\noindent Obviously, each instance of our earlier Schemata 1 and 2 can be deduced from the corresponding Axiom by instantiating the quantifiers.
These Axioms tell us that zero isn't a successor, but they don't explicitly rule out that there are other objects that aren't successors cluttering up the domain of quantification (i.e. there could be `pseudo-zeros'). We didn't need to fuss about this before, because by construction $\mathsf{BA}$ can only talk about the numbers represented by standard numerals in the sequence `$\mathsf{0, S0, SS0}, \ldots$'. But now we have the quantifiers in play. And these quantifiers are intended to run over the natural numbers -- we certainly don't intend them to be running over stray objects that aren't successors. So let's reflect that in our axioms by explicitly ruling out such strays:
\begin{axiom}
\quad {$\mathsf{\forall x(x \not= 0 \:\lif\: \exists y (x = Sy))}$}
\end{axiom}
Next, we can similarly replace our previous schemata for addition and multiplication by universally quantified Axioms in the obvious way:
\begin{axiom}
\quad {$\mathsf{\forall x(x + 0 = x)}$}
\end{axiom}
\begin{axiom}
\quad {$\mathsf{\forall x\forall y(x + Sy = S(x + y))}$}
\end{axiom}
\begin{axiom}
\quad {$\mathsf{\forall x(x \times 0 = 0)}$}
\end{axiom}
\begin{axiom}
\quad {$\mathsf{\forall x\forall y(x \times Sy = (x \times y) + x)}$}
\end{axiom}
\noindent Again, each of these axioms entails all the instances of $\mathsf{BA}$'s corresponding schema.
\begin{defn}
The formal axiomatized theory with language $L_A$, Axioms 1 to 7, plus a classical first-order logic, is standardly called \emph{Robinson Arithmetic}, or simply $\mathsf{Q}$.
\end{defn}
\noindent It is worth noting, for future reference, that this theory was first isolated by R. M. Robinson as a weak system of arithmetic worthy of study in 1952 -- i.e. long after G\"odelian incompleteness was discovered.
\subsection{$\mathsf{Q}$ is not complete}\label{sec:Qnotcomplete}
$\mathsf{Q}$ is assuredly a sound theory. Its axioms are all true; its logic is truth-preserving; so its derivations are proper proofs in the intuitive sense of demonstrations of truth and every theorem of $\mathsf{Q}$ is true. But just which truths of $L_A$ are theorems?
Since any old $\mathsf{BA}$ axiom -- i.e. any instance of one of our previous schemata -- can be derived from one of our new $\mathsf{Q}$ Axioms, every $L_B$-sentence that can be proved in $\mathsf{BA}$ is equally a quantifier-free $L_A$-sentence which can be proved in $\mathsf{Q}$. Hence,
\begin{theorem}\label{th:QcorrectlydecidesQfree}
$\mathsf{Q}$ correctly decides every quantifier-free $L_A$ sentence (i.e. $\mathsf{Q} \vdash \varphi$ if the quantifier-free wff $\varphi$ is true, and $\mathsf{Q} \vdash \neg\varphi$ if the quantifier-free wff $\varphi$ is false).\end{theorem}
So far, so good. However, there are very simple true quantified sentences that $\mathsf{Q}$ can't prove.
For example, $\mathsf{Q}$ can of course prove any particular wff of the form $\mathsf{0 + \overline{n} = \overline{n}}$. \emph{But it {can't} prove the corresponding universal generalization}:% $\chi =_{\mathrm{def}} \mathsf{\forall x (0 + x = x)}$.
\begin{theorem}
$\mathsf{Q} \nvdash \mathsf{\forall x (0 + x = x)}$.
\end{theorem}
\noindent\emph{Proof}\quad Since $\mathsf{Q}$ is a theory with a standard first-order logic, for any $L_A$-sentence $\varphi$, $\mathsf{Q} \vdash \varphi$ if and only if $\mathsf{Q} \vDash \varphi$ (that's just the soundness and completeness of first-order logic). Hence one way of showing that $\mathsf{Q} \nvdash \varphi$ is to show that $\mathsf{Q} \nvDash \varphi$: and we can show \emph{that} by producing a countermodel to the entailment -- i.e. by finding an interpretation (a deviant, unintended, `non-standard' re-interpretation) of $L_A$'s wffs which makes $\mathsf{Q}$'s axioms true-on-that-interpretation but which makes $\varphi$ false.
So here goes: take the domain of our deviant, unintended, re-interpretation to be the set $N^*$ comprising the natural numbers but with two other `rogue' elements $a$ and $b$ added (these could be e.g. Kurt G\"odel and his friend Albert Einstein -- but any other pair of distinct non-numbers will do). Let `$\mathsf{0}$' still refer to zero. And take `$\mathsf{S}$' now to pick out the successor* function $S^*$ which is defined as follows: $S^*n = Sn$ for any natural number in the domain, while for our rogue elements $S^*a = a$ and $S^*b = b$. It is very easy to check that Axioms 1 to 3 are still true on this deviant interpretation. Zero is still not a successor. Different elements have different successors. And every non-zero element is a successor.
We now need to extend this interpretation to re-interpret the function-symbol `$+$'. Suppose we take this to pick out addition*, where $m +^* n = m + n$ for any natural numbers $m$, $n$ in the domain, while $a +^* n = a$ and $b +^* n = b$. Further, for any $x$ (whether number or rogue element), $x +^* a = b$ and $x +^* b = a$. If you prefer that in a matrix (read off \emph{row} $+^*$ \emph{column}):
\begin{center}
\begin{tabular}{|c|c|c|c|}\hline $+^*$ & $n$ & $a$ & $b$ \\\hline $m$ & $m + n$ & $b$ & $a$ \\\hline $a$ & $a$ & $b$ & $a$ \\\hline $b$ & $b$ & $b$ & $a$ \\\hline \end{tabular}
\end{center}
It is again easily checked that interpreting `$+$' as addition* still makes Axioms 4 and 5 true. (In headline terms: For Axiom 4, we note that adding* zero on the right always has no effect. For Axiom 5, just consider cases. (i) $m +^* S^*n = m + Sn = S(m + n) = S^*(m +^* n)$ for `ordinary' numbers $m, n$ in the domain. (ii) $a +^* S^*n = a = S^*a = S^*(a +^* n)$, for `ordinary' $n$. Likewise, (iii) $b +^* S^*n = S^*(b +^* n)$. (iv) $x +^* S^*a = x +^* a = b = S^*b = S^*(x +^* a)$, for any $x$ in the domain. (v) Finally, $x +^* S^*b = S^*(x +^* b)$. Which covers every possibility.)
We are not quite done, however, as we still need to show that we can give a co-ordinate re-interpretation of `$\times$' in $\mathsf{Q}$ by some deviant multiplication* function. We can leave it as an exercise to fill in suitable details. Then, with the details filled in, we will have an overall interpretation which makes the axioms of $\mathsf{Q}$ true and $\mathsf{\forall x (0 + x = x)}$ false. So $\mathsf{Q} \nvdash \mathsf{\forall x (0 + x = x)}$.\hfill$\Box$
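It may help to see the deviant interpretation at work. The following sketch checks the successor and addition axioms by brute force on an initial segment of the domain $N^*$ (this is of course only an illustration, not a proof -- the axioms quantify over the whole infinite domain):

```python
A, B = 'a', 'b'                   # the two rogue elements
DOMAIN = list(range(8)) + [A, B]  # a finite sample of N*

def succ(x):
    # S*: ordinary successor on numbers; S*a = a and S*b = b
    return x + 1 if isinstance(x, int) else x

def add(x, y):
    # +*, as in the matrix above
    if isinstance(y, int):
        return x + y if isinstance(x, int) else x  # rogue + number = rogue
    return B if y == A else A                      # x +* a = b,  x +* b = a

# Axiom 1: zero is not a successor
assert all(succ(x) != 0 for x in DOMAIN)
# Axiom 2: successor* is injective
assert all(succ(x) != succ(y) for x in DOMAIN for y in DOMAIN if x != y)
# Axiom 4: x + 0 = x
assert all(add(x, 0) == x for x in DOMAIN)
# Axiom 5: x + Sy = S(x + y)
assert all(add(x, succ(y)) == succ(add(x, y)) for x in DOMAIN for y in DOMAIN)
# ... and yet 0 + x = x FAILS at the rogue element a, since 0 +* a = b
assert add(0, A) != A
```

(Axiom 3 also holds, since the rogue elements are their own successors*.)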
\begin{restatable}{theorem}{Qisnegationincomplete}
$\mathsf{Q}$ is negation-incomplete.
\end{restatable}
\begin{proof}Put $ \varphi = \mathsf{\forall x (0 + x = x)}$. We've just shown that $\mathsf{Q} \nvdash \varphi$. But obviously, $\mathsf{Q}$ can't prove $\neg\varphi$ either. Just revert to the standard interpretation built into $L_A$. $\mathsf{Q}$ certainly has true axioms on this interpretation. So all theorems are true on that interpretation, but $\neg\varphi$ is false on that interpretation, so it can't be a theorem. Hence $\varphi$ is formally undecidable in $\mathsf{Q}$.
\end{proof}
Of course, we've already announced that G\"odel's incompleteness theorem is going to prove that \emph{no} sound axiomatized theory whose language is at least as rich as $L_A$ can be negation complete -- that was Theorem~\ref{th:soundrichtheoriesincomplete}. But we don't need to invoke anything as elaborate as G\"odel's arguments to see that $\mathsf{Q}$ is incomplete. $\mathsf{Q}$ is, so to speak, \emph{boringly} incomplete.
\subsection{$\mathsf{Q}$ can capture \emph{less-than-or-equals}}\label{subsec:defnorder}
We've just seen something that $\mathsf{Q}$ can't do: now for something it \emph{can} do. We'll establish
\begin{theorem}
In $\mathsf{Q}$, the \emph{less-than-or-equal-to} relation is {captured} by the wff $\mathsf{\exists v(v + x = y)}$.
\end{theorem}
\noindent Given the definition of capturing, Defn~\ref{def:captures}, that means we need to show that, for any particular pair of numbers, $m$, $n$, if $m \leq n$, then $\mathsf{Q}$ $\vdash \mathsf{\exists v(v + \overline{{m}} = \overline{{n}})}$, and otherwise $\mathsf{Q}$ $\vdash \neg\mathsf{\exists v(v + \overline{{m}} = \overline{{n}})}$.
\vspace{8pt}\noindent\emph{Proof}\quad Suppose $m \leq n$, so for some $k \geq 0$, $k + m = n$. $\mathsf{Q}$ can prove everything $\mathsf{BA}$ proves and hence, in particular, can prove every true addition sum. So we have $\mathsf{Q} \vdash \mathsf{\overline{{{k}}} + \overline{{{m}}} = \overline{{{n}}}}$. But logic gives us $\mathsf{\overline{{{k}}} + \overline{{{m}}} = \overline{{{n}}}} \,\vdash\, \mathsf{\exists v(v + \overline{{{m}}} = \overline{{{n}}})}$ by existential quantifier introduction. Therefore $\mathsf{Q} \vdash \mathsf{\exists v(v + \overline{{{m}}} = \overline{{{n}}})}$, as was to be shown.
Suppose alternatively $m > n$. We need to show $\mathsf{Q}$ $\vdash \neg\mathsf{\exists v(v + \overline{{{m}}} = \overline{{{n}}})}$. We'll first demonstrate this in the case where $m = 2$, $n = 1$, using a Fitch-style proof-system. For brevity we will omit statements of $\mathsf{Q}$'s axioms and some other trivial steps, and we drop unnecessary brackets.\begin{tabbing}
\hspace{4em}\= \hspace{0.2em} \= \hspace{1cm} \= \hspace{1cm} \= \hspace{1cm} \= \hspace{1cm} \= \hspace{4.7cm}\= \kill
\>1.\' \>\>$\mathsf{\exists v(v + SS0 = S0)}$ \>\>\>\>Supposition\\
\>2.\' \>\>\>$\mathsf{a + SS0 = S0}$ \>\>\>Supposition\\
%\>$\vdots$\\
\>3.\' \>\>\>$\mathsf{a + SS0 = S(a + S0)}$ \>\>\>From Axiom 5\\
\>4.\' \>\>\>$\mathsf{S(a + S0) = S0}$ \>\>\>From 2, 3 by LL\\
\>5.\' \>\>\>$\mathsf{a + S0 = S(a + 0)}$ \>\>\>From Axiom 5\\
\>6.\' \>\>\>$\mathsf{SS(a + 0) = S0}$ \>\>\>From 4, 5 by LL\\
\>7.\' \>\>\>$\mathsf{a + 0 = a}$ \>\>\>From Axiom 4\\
\>8.\' \>\>\>$\mathsf{SSa = S0}$ \>\>\>From 6, 7 by LL\\
\>9.\' \>\>\>$\mathsf{SSa = S0 \lif Sa = 0}$ \>\>\>From Axiom 2\\
\>10.\'\>\>\>$\mathsf{Sa = 0}$ \>\>\>From 8, 9 by MP\\
\>11.\'\>\>\>$\mathsf{0 = Sa}$ \>\>\>From 10\\
\>12.\' \>\>\>$\mathsf{0 \not= Sa}$\>\>\>From Axiom 1\\
\>13.\' \>\>\>Contradiction!\>\>\>From 11, 12\\
\>14.\' \>\>Contradiction!\>\>\>\>$\exists$E 1, 2--13\\
\>15.\' \>$\neg\mathsf{\exists v(v + SS0 = S0)}$\>\>\>\>\>RAA 1--14\end{tabbing}
The only step to explain is at line (14) where we use a version of the Existential Elimination rule: if the temporary supposition $\varphi(\mathsf{a})$ leads to contradiction, for arbitrary $\mathsf{a}$, then $\exists \mathsf{v}\varphi(\mathsf{v})$ must lead to contradiction.
And having done the proof for the case $m = 2$, $n = 1$, inspection reveals that we can use the same general pattern of argument to show $\mathsf{Q}$ $\vdash \neg\mathsf{\exists v(v + \overline{{{m}}} = \overline{{{n}}})}$ whenever $m > n$. [Exercise: convince yourself that this claim is true!] So we are done.\hfill$\Box$
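The numerical fact underlying both halves of the proof is just that $m \leq n$ if and only if some $k \leq n$ witnesses $k + m = n$. In program form (an illustrative sketch, with a made-up name):

```python
def leq_witness(m, n):
    """Return a k with k + m == n if one exists, else None.

    When a witness k is found, Q proves the existential by proving the
    true sum k + m = n and existentially quantifying; when the bounded
    search fails, m > n and Q refutes the existential by the proof
    pattern displayed above."""
    for k in range(n + 1):
        if k + m == n:
            return k
    return None
```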
\subsection{Adding `$\leq$' to $\mathsf{Q}$}\label{addingLEQto Q}
Given the result we've just proved, we can sensibly add the standard symbol `$\leq$' to $L_A$, the language of $\mathsf{Q}$, defined so that whatever we put for `$\xi$' and `$\zeta$', $\mathsf{\xi \leq \zeta}$ is just short for $\exists \mathsf{v(v + \xi = \zeta)}$, and then $\mathsf{Q}$ will be able to prove at least the expected facts about the less-than-or-equals relations among quantifier-free terms. (Well, we really need to be a bit more careful than that in stating the rule for unpacking the abbreviation, if we are to avoid any possible `clash of variables'. But we're not going to fuss about the details.)
Note, by the way, that some presentations in fact treat `$\leq$' as a primitive symbol built into our formal theories like $\mathsf{Q}$ from the start, governed by its own additional axiom(s). But nothing important hangs on the difference between that approach and our policy of introducing the symbol by definition. (And of course, nothing hangs either on our policy of introducing `$\leq$' as our basic symbol rather than `$<$', which could have been defined by $\mathsf{\xi < \zeta} =_\mathrm{def}$ $\exists \mathsf{v(Sv + \xi = \zeta)}$.)
Since it so greatly helps readability, we'll henceforth make very free use of `$\leq$' as an abbreviatory symbol inside formal arithmetics. We will also adopt a second, closely related, convention. In informal mathematics we often want to say that all/some numbers less than or equal to a given number have some particular property. We can now express such claims in formal arithmetics by wffs of the shape $\forall \xi(\xi \leq \kappa \lif \varphi(\xi))$ and
\mbox{$\exists \xi(\xi \leq \kappa \;\land\; \varphi(\xi))$}, where `$\leq$' is to be unpacked as we've just explained. And it is standard to further abbreviate such wffs by $(\forall\xi \leq \kappa)\varphi(\xi)$ and $(\exists\xi \leq \kappa)\varphi(\xi)$ respectively.
\subsection{Why $\mathsf{Q}$ is interesting}
Given it can't even prove $\mathsf{\forall x (0 + x = x)}$, $\mathsf{Q}$ is evidently a \emph{very} weak theory of arithmetic. Which is probably no surprise as (apart from Axiom 3) we've added little \emph{axiomatic} proof-power to $\mathsf{BA}$ while adding a lot of \emph{expressive power} to its language by adding quantifiers. So it's only to be expected that there will be lots of newly expressible truths that $\mathsf{Q}$ can't prove (and since $\mathsf{Q}$ is sound, it won't be able to disprove these truths either).
Even so, despite its great shortcomings, $\mathsf{Q}$ does have some nice properties. As we saw, it can capture the decidable relation that obtains when one number is at least as big as another. Moreover, we can eventually show that
\begin{restatable}{theorem}{Qissuffstrong}
{$\mathsf{Q}$ can capture \emph{all} decidable numerical properties -- i.e. it is
{sufficiently strong} in the sense of Defn~\ref{def:sufficientlystrong}}.
\end{restatable}
\noindent That might initially seem very surprising indeed, given $\mathsf{Q}$'s weakness. But remember, `sufficient strength' was defined as a matter of being able to \emph{case-by-case} prove enough wffs about decidable properties of individual numbers. It turns out that $\mathsf{Q}$'s hopeless weakness at proving generalizations doesn't stop it doing that.
So that's why $\mathsf{Q}$ is particularly interesting -- it is about the weakest arithmetic which is sufficiently strong (and it was isolated by Robinson for just that reason), and on which G\"odelian proofs of incompleteness can be run. Suppose, then, that a theory is formally axiomatized, consistent, and can prove everything $\mathsf{Q}$ can prove (those do indeed seem very modest requirements). Then, by the result we've just announced, any such theory will be `sufficiently strong'. And therefore e.g. Theorem~\ref{easygodel} will apply -- any such theory will be incomplete.
However, we can only establish that $\mathsf{Q}$ \emph{does} have sufficient strength to capture all decidable properties if and when we have a quite general theory of decidability to hand. And we don't want to get embroiled in that (at least yet). So what we \emph{will} be proving quite soon (in Episode 6) is a rather weaker claim about $\mathsf{Q}$. We'll show that it can capture all so-called `primitive recursive' properties, where these form a very important subclass of the decidable properties. This major theorem will be a crucial load-bearing part of our proofs of various G\"odel-style incompleteness theorems: it means that $\mathsf{Q}$ gives us `the modest amount of arithmetic' needed for Theorem~\ref{thm:Godelsyntactic}.
But before we get round to showing all this, we are first going to take a look at a \emph{much} richer arithmetic than $\mathsf{Q}$, namely $\mathsf{PA}$.
\vspace{8pt}\noindent For parallel reading to this episode, see \emph{IGT}, Ch. 8, and Ch. 9, \S\S9.1--9.4.
\newpage
%\setcounter{section}{0}
\setcounter{page}{1}
\begin{center}%
{{\Large \emph{G\"odel Without (Too Many) Tears -- 4}}\\[16pt]{\LARGE First-order Peano Arithmetic} \par%
\vskip 1.5em%
{\large
\lineskip .75em%
\begin{tabular}[t]{c}%
Peter Smith
\end{tabular}\par}}%
\vskip 0.75em%
{University of Canterbury, Christchurch, NZ}\\[6pt]
{March 30, 2010}%
\vskip 1.5em%
\end{center}%\par
\noindent\hrulefill
\begin{itemize}\setlength{\itemsep}{0pt}
\item The $\omega$-rule
\item Induction: the induction axiom, the induction rule, the induction schema
\item First-order Peano Arithmetic
\item The idea of $\Delta_0$, $\Sigma_1$, and $\Pi_1$ wffs
\item A consistent extension of $\mathsf{Q}$ is sound for $\Pi_1$ wffs
\end{itemize}
\noindent\hrulefill
\vspace{8pt}\noindent This episode, after the preamble, falls into two parts. First I introduce the canonical first-order theory of arithmetic, $\mathsf{PA}$. Then -- tacked on here, because I need to fit it in somewhere and this is as good a place as any -- I introduce some terminology for distinguishing wffs on the basis of their `quantifier complexity'. Do make sure you understand the idea of induction, and how that is handled in $\mathsf{PA}$, before reading the rest of the episode.
Here's the story so far. We noted in Episode 1 that G\"odel showed, more or less, \Godelsemantic*
\noindent Of course, we didn't \emph{prove} that theorem, though we waved an arm airily at the basic trick that G\"odel uses to establish the theorem -- namely we `arithmetize syntax' (i.e. numerically code up facts about provability in formal theories) and then construct a G\"odel sentence that sort-of-says `I am not provable'.
We did note, however, that this theorem invokes the assumption that we are dealing with a \emph{sound} theory, and of course soundness is a \emph{semantic} notion. For various reasons, G\"odel thought it essential to establish that we can get incompleteness making merely syntactic assumptions, thus: \Godelsyntactic*
\noindent Here, `contains enough arithmetic' means proving enough (a syntactically characterizable condition). This theorem with syntactic assumptions is the sort of thing that's usually referred to as The First Incompleteness Theorem, and of course we again didn't prove it. Indeed, we didn't even say what that `modest amount of arithmetic' is (nor did we say anything about that `additional desirable property'). So Episode 1 was little more than a gesture in the right direction.
In Episode 2, we did a bit better, in the sense that we actually gave a \emph{proof} of the following theorem: \easygodelthm*
\noindent The argument was nice, as it shows that we can get incompleteness results without calling on the arithmetization of syntax and the construction of G\"odel sentences. However the argument depended on working with an informal, intuitive, notion of `decidable property'. And, as we noted, the result is weaker than Theorem~\ref{thm:Godelsyntactic} for it doesn't tell us that there will be a formally undecidable \emph{arithmetic} sentence. Moreover, the discussion in Episode 2 doesn't give us any clue what a `sufficiently strong' theory might look like.
Episode 3 took a step towards filling that last gap (and also towards telling us what the `modest amount of arithmetic' mentioned in Theorem~\ref{thm:Godelsyntactic} amounts to).
We first looked at $\mathsf{BA}$, the quantifier-free arithmetic of the addition and multiplication of particular numbers. This is a complete (and hence decidable!) theory -- but of course it is only complete, i.e. able to decide every sentence constructible in its language, because its language is indeed so weak. If we augment the language of $\mathsf{BA}$ by allowing ourselves the usual apparatus of first-order quantification, and replace the schematically presented axioms of $\mathsf{BA}$ with their obvious universally quantified correlates (and add in the axiom that every number bar zero is a successor) we get Robinson Arithmetic $\mathsf{Q}$. Since we've added pretty minimally to what is given in the axioms of $\mathsf{BA}$ while considerably enriching its language, it is no surprise that we have \Qisnegationincomplete* \noindent And we can prove this without any fancy G\"odelian considerations. A familiar and simple kind of model-theoretic argument is enough to do the trick: we find a deviant interpretation of $\mathsf{Q}$'s syntax which is such as to make the axioms all true but on which $\mathsf{\forall x(0 + x = x)}$ is false, thus establishing $\mathsf{Q} \nvdash \mathsf{\forall x(0 + x = x)}$. And since $\mathsf{Q}$ is sound on the built-in interpretation of its language, we also have $\mathsf{Q} \nvdash \neg\mathsf{\forall x(0 + x = x)}$.
$\mathsf{Q}$, then, is a very weak arithmetic. Still, it will turn out to be the `modest amount of arithmetic' needed to get Theorem~\ref{thm:Godelsyntactic} to fly. Also we have \Qissuffstrong*
\noindent so a theory's containing $\mathsf{Q}$ makes it a `sufficiently strong' theory in the sense of Theorem~\ref{easygodel}. Of course, establishing \emph{these} facts is a non-trivial task for later: but they do explain why $\mathsf{Q}$ is so interesting despite its weakness.
Now read on \ldots.
\section{Arithmetical Induction}\label{sec:arithinduct}
For a moment, put $\varphi(\mathsf{x})$ for $(\mathsf{0 + x = x})$. Then, as we noted, for any particular $n$, $\mathsf{Q} \vdash \varphi(\overline{\mathsf{n}})$, for $\mathsf{Q}$ can prove any true unquantified equation. But we showed that $\mathsf{Q} \nvdash \forall \mathsf{x}\varphi(\mathsf{x})$. In other words, $\mathsf{Q}$ can separately prove each instance $\varphi(\overline{\mathsf{n}})$ but can't prove the corresponding simple generalization. So let's consider what proof-principle we might add to $\mathsf{Q}$ to fill this sort of gap.
\subsection{The $\omega$-rule}\label{subsec:theomegarule}
$\mathsf{Q}$, to repeat, proves each of $\varphi(\overline{\mathsf{0}}), \varphi(\overline{\mathsf{1}}), \varphi(\overline{\mathsf{2}}), \varphi(\overline{\mathsf{3}}), \ldots$. Suppose then that we added to $\mathsf{Q}$ the rule that we can infer as follows:%given each of $\varphi(\mathsf{0}), \varphi(\mathsf{1}), \varphi(\mathsf{2}), \varphi(\mathsf{3}), \ldots$ we can infer $\forall \mathsf{x}\varphi(\mathsf{x})$}, i.e.
\begin{prooftree}
\AxiomC{${\begin{array}{c}\vdots \\ \varphi(\overline{\mathsf{0}})\end{array}}\quad\quad {\begin{array}{c}\vdots \\ \varphi(\overline{\mathsf{1}})\end{array}}\quad\quad {\begin{array}{c}\vdots \\ \varphi(\overline{\mathsf{2}})\end{array}}\quad\quad {\begin{array}{c}\vdots \\ \varphi(\overline{\mathsf{3}})\end{array}}\quad\quad\ {\begin{array}{c}\ \\ \vdots\end{array}}$}
\UnaryInfC{$\forall \mathsf{x}\varphi(\mathsf{x})$}
\end{prooftree}
This rule -- or rather the generalized version for any $\varphi$ -- is what's called \emph{the $\omega$-rule}. It is evidently a sound rule: if each $\varphi(\overline{\mathsf{n}})$ is true then indeed all numbers $n$ satisfy $\varphi(\mathsf{x})$. Adding the $\omega$-rule would certainly repair the gap we exposed in $\mathsf{Q}$.
But of course there's a snag: the $\omega$-rule is infinitary. It takes as input an infinite number of premisses. So proofs invoking this $\omega$-rule will be infinite arrays. And being infinite, they cannot be mechanically checked in a finite number of steps to be constructed according to our expanded rules. In sum, then, a theory with a proof-system that includes an infinitary $\omega$-rule can't count as a formal axiomatized theory according to Defn.~\ref{formal_theory}.
There is some technical interest in investigating infinitary logics which allow infinitely long sentences (e.g. infinite conjunctions) and/or infinite-array proofs. But there is a clear sense in which such logics are not of practical use, and cannot be used to regiment how we in fact argue. The finiteness requirement we impose on formalized theories is, for that reason, not arbitrary. And so we'll stick to that requirement, and hence have to ban the $\omega$-rule.
\subsection{Replacing an infinitary rule with a finite one}\label{subsec:inductionrule}
To repeat, as well as proving $\varphi(\mathsf{0})$, $\mathsf{Q}$ also proves $\varphi(\overline{\mathsf{1}}), \varphi(\overline{\mathsf{2}}), \varphi(\overline{\mathsf{3}}), \ldots$. And it isn't, so to speak, a global accident that $\mathsf{Q}$ can prove all those. Rather, $\mathsf{Q}$ proves them in a uniform way.
To bring this out, note that we have the following proof in $\mathsf{Q}$:
\begin{tabbing}
\hspace{4em}\= \hspace{1cm} \= \hspace{7.2cm}\= \kill
\>1.\' \>$\varphi(\mathsf{a})$ \>Supposition\\
\>2.\' \>$\mathsf{0 + a = a}$ \>Unpacking the definition\\
\>3.\' \>$\mathsf{S(0 + a) = Sa}$ \>From 2 by LL\\
\>4.\' \>$\mathsf{(0 + Sa) = S(0 + a)}$ \>Instance of Axiom 5\\
\>5.\' \>$\mathsf{(0 + Sa) = Sa}$ \>From 3, 4\\
\>6.\' \>$\varphi(\mathsf{Sa})$ \>Applying the definition\\
\>7.\'\ ($\varphi({\mathsf{a}}) \to \varphi(\mathsf{Sa}))$ \>\>From 1, 6 by Conditional Proof\\
\>8.\'\ $\forall \mathsf{x}(\varphi(\mathsf{x}) \to \varphi(\mathsf{Sx}))$ \>\>From 7, since ${\mathsf{a}}$ was arbitrary.
\end{tabbing}
Given $\mathsf{Q}$ trivially proves $\varphi(\mathsf{0})$, we can appeal to $\forall \mathsf{x}(\varphi(\mathsf{x}) \to \varphi(\mathsf{Sx}))$ to derive $\varphi(\mathsf{0}) \to \varphi(\overline{\mathsf{1}})$, and so modus ponens gives us $\varphi(\overline{\mathsf{1}})$. The same generalization also gives us $\varphi(\overline{\mathsf{1}}) \to \varphi(\overline{\mathsf{2}})$, so another modus ponens gives us $\varphi(\overline{\mathsf{2}})$. Now we can appeal to our generalization again to get $\varphi(\overline{\mathsf{2}}) \to \varphi(\overline{\mathsf{3}})$, and so can derive $\varphi(\overline{\mathsf{3}})$. Keep on going! In this way, $\varphi(\mathsf{0})$ and $\forall \mathsf{x}(\varphi(\mathsf{x}) \to \varphi(\mathsf{Sx}))$ together prove all of $\varphi(\overline{\mathsf{0}}), \varphi(\overline{\mathsf{1}}), \varphi(\overline{\mathsf{2}}), \varphi(\overline{\mathsf{3}}), \ldots$, which in turn, by the sound $\omega$-rule, entail $\forall \mathsf{x}\varphi(\mathsf{x})$.
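That `keep on going' procedure can be made vivid in a few lines of illustrative code (a sketch of my own, not anything in the text): given the base case and the proved generalization, each particular $\varphi(\overline{\mathsf{n}})$ is reached after just $n$ rounds of instantiation-plus-modus-ponens, so each of \emph{those} proofs is finite; it is only proving the universal claim itself that needed the infinitary rule.

```python
# Sketch: how phi(0) plus the proved generalization
#   forall x (phi(x) -> phi(Sx))
# yields each particular phi(n) by finitely many steps.

def derive(n):
    """Return the derivation steps establishing phi(n): one instantiation
    of the universal premiss plus one modus ponens per successor step."""
    steps = ["phi(0)"]                             # base case, proved outright
    for k in range(n):
        steps.append(f"phi({k}) -> phi({k + 1})")  # instantiate at numeral k
        steps.append(f"phi({k + 1})")              # detach by modus ponens
    return steps

print(derive(3))
# a 7-step derivation ending in 'phi(3)'
```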
But now \emph{we can evidently cut out the infinity of intermediate steps in that last bit of motivating argument}, and that will leave us with a nice \emph{finitary} rule\begin{defn}The Induction Rule:
\begin{prooftree}
\AxiomC{$\varphi(\mathsf{0})$}
\AxiomC{$\forall \mathsf{x}(\varphi(\mathsf{x}) \to \varphi(\mathsf{Sx}))$}
\BinaryInfC{$\forall \mathsf{x}\varphi(\mathsf{x})$}
\end{prooftree}\end{defn}
\noindent This is evidently sound whatever predicate we put for $\varphi(\mathsf{\xi})$ which expresses a genuine numerical property. Add this finitary rule to $\mathsf{Q}$ and we'll evidently at least patch the gap we found.
\subsection{Induction: the basic idea}\label{subsec:arithind}
The basic idea reflected in that formal rule is as follows. \begin{quote}Whatever numerical property we take, if we can show that (i) zero has that property, and also show that (ii) this property is always passed down from a number $n$ to the next $Sn$, then this is enough to show (iii) the property is passed down to \emph{all} numbers.\end{quote}\noindent This is the \emph{principle of arithmetical induction}, and is a standard method of proof for establishing arithmetical generalizations.
For those not so familiar with this standard method, here's a little example of the principle at work in an everyday informal mathematical context. Suppose we want to establish that the sum of the first $n$ numbers is $n(n+1)/2$. Well, define $\psi(n)$ to hold if that claim is correct for $n$. Then (i) trivially, the sum of the first zero numbers is zero, which is indeed $0(0+1)/2$, so $\psi(0)$ is true! And (ii) suppose now the claim holds for a particular $n$. Then the sum of the first $n+1$ numbers is $n(n+1)/2 + (n + 1) = (n + 1)(n+2)/2 = (Sn)(Sn+1)/2$. Which means that $\psi(Sn)$ will be true too. So (i) $\psi(0)$, and (generalizing) (ii) for all numbers $n$, if $\psi(n)$, then $\psi(Sn)$. Therefore, as we want, by induction the claim holds for all numbers.
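No substitute for the inductive proof, of course, which covers all numbers at once; but for the suspicious, here's a quick brute-force sanity check of the formula:

```python
# Check that 1 + 2 + ... + n equals n(n+1)/2 for an initial batch of n.
def psi(n):
    return sum(range(1, n + 1)) == n * (n + 1) // 2

assert all(psi(n) for n in range(1000))
print("psi(n) checks out for n = 0, ..., 999")
```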
Here's another example of the same principle at work, in telegraphic form. Suppose we want to show that all the theorems of a certain Hilbert-style axiomatized propositional calculus are tautologies. Define $\chi(n)$ to be true if the conclusions of proofs up to $n$ steps long are tautologies. Then we show that $\chi(0)$ is true (trivial!), and then argue that if $\chi(n)$ then $\chi(Sn)$ (e.g. we note that the last step of an $(n+1)$-step proof must either be an instance of an axiom, or follow by modus ponens from two earlier conclusions which -- since $\chi(n)$ is true -- must themselves be tautologies, and either way we get another tautology). Then `by induction on the length of proofs' we get the desired result.
\subsection{A word to worried philosophers}
Beginning philosophers, in week one of their first year logic course, have the contrast between deductive and inductive arguments dinned into them. So emphatically is the distinction made, so firmly are they drilled to distinguish conclusive deductive argument from merely probabilistic inductions, that some can't help feeling initially uncomfortable when they first hear of `induction' being used in arithmetic!
So let's be clear. We have a case of empirical, non-conclusive, induction, when we start from facts about a limited sample and infer a claim about the whole population. Number off the swans, for example, and let $\varphi(n)$ say that swan \#$n$ is white. We sample some swans and run (say) $k$ checks showing that $\varphi(0)$, $\varphi(1)$, $\varphi(2)$, \ldots, $\varphi(k)$ are all true. We hope that these are enough to be representative of the whole population of swans, and so -- taking a chance -- infer that for all $n$, $\varphi(n)$, now quantifying over all (numbers for) swans, jumping beyond the sample of size $k$. The gap between the sample and the whole population, between the particular bits of evidence and the universal conclusion, allows space for error. The inference isn't deductively watertight.
By contrast, in the case of arithmetical induction, we start not from a bunch of claims about particular numbers but from an already universally quantified claim about all numbers, i.e. $\forall \mathsf{x}(\varphi(\mathsf{x}) \to \varphi(\mathsf{Sx}))$. We put that universal claim together with the particular claim $\varphi(0)$ to derive another universal claim, $\forall \mathsf{x}\varphi(\mathsf{x})$. This time, then, we are going from universal to universal, and there is no deductive gap.
You might say `Pity, then, that we use the same word in talking of empirical induction and arithmetical induction when they are such different kinds of inference.' True.
\subsection{The induction axiom, the induction rule, the induction schema}
The basic idea, we said in \S\ref{subsec:arithind}, is that for any property of numbers, if zero has it and it is passed from one number to the next, then all numbers have it. This intuitive principle is a generalization over properties of numbers. Hence to frame a corresponding formal version, it seems that we should ideally use a language that enables us to generalize not just over numbers but over properties of numbers. In a phrase, we'd ideally need to be working in a \emph{second-order} theory, which allows second-order quantifiers -- i.e. we have not only first-order quantifiers running over numbers, but also a further sort of quantifier which runs over arbitrary-properties-of-numbers. Then we could state a second-order
\begin{defn}
Induction Axiom:
$$\forall \mathsf{X}([\mathsf{X0} \land \forall \mathsf{x}(\mathsf{Xx} \to \mathsf{XSx})] \to \forall \mathsf{x}\mathsf{Xx})$$
\end{defn}
\noindent (Predicates are conventionally written upper case: so too for variables that are to occupy predicate position.)
Despite that, however, for now we'll concentrate on formal theories whose logical apparatus involves only regular first-order quantification. Note: this isn't due to some perverse desire to fight with one hand tied behind our backs: there are some troublesome issues about second-order logic, though we can't go into them here.
But if we don't have second-order quantifiers available to range over {properties} of numbers, how can we handle induction? Well, one way is to adopt the \emph{induction rule} we encountered in \S\ref{subsec:inductionrule}. So long as $\varphi(\mathsf{x})$ expresses a kosher property -- and we'll say in a moment what that might come to -- we can apply the inference rule
\begin{prooftree}
\AxiomC{$\varphi(\mathsf{0})$}
\AxiomC{$\forall \mathsf{x}(\varphi(\mathsf{x}) \to \varphi(\mathsf{Sx}))$}
\BinaryInfC{$\forall \mathsf{x}\varphi(\mathsf{x})$}
\end{prooftree}
Alternatively, we can set down the first order \begin{defn} Induction Schema:
$$[\varphi(\mathsf{0}) \land \forall \mathsf{x}(\varphi(\mathsf{x}) \to \varphi(\mathsf{Sx}))] \to \forall \mathsf{x}\varphi(\mathsf{x})$$\end{defn}
\noindent and then say that for every kosher $\varphi(\mathsf{x})$, the corresponding instance of the induction schema is to be an axiom. Evidently, having the rule and having all instances of the schema come to just the same.
Techie note. Strictly speaking, we'll also want to allow uses of the inference rule where $\varphi$ has slots for additional variables dangling free. Equivalently, we will take the axioms to be the universal closures of instances of the induction schema with free variables. For more explanation, see \emph{IGT}, \S10.2, and see the idea being put to work in \emph{IGT}, \S10.3. We won't fuss about elaborating this point here.
\section{First-order Peano Arithmetic}\label{sec:firstorderPA}
\subsection{Getting generous with induction}
Suppose then we start again from $\mathsf{Q}$, and aim to build a richer theory in the language $L_A$ (as defined in \S\ref{subsec:L_A}) by adding induction.
Any instance of the induction schema, we said, should be intuitively acceptable as an axiom, so long as we replace $\varphi$ in the schema by a suitable open wff which expresses a genuine property/relation. Well, consider \emph{any} open wff $\varphi$ of $L_A$. This will be built from no more than the constant term `$\mathsf{0}$', the familiar successor, addition and multiplication functions, plus identity and other logical apparatus. Therefore -- you might very well suppose -- it ought also to express a perfectly determinate arithmetical property or relation (even if, in the general case, we can't always decide whether a given number $n$ has the property or not). \emph{So why not be generous and allow {any} open $L_A$ wff at all to be substituted for $\varphi$ in the schema?}
Here's a positive argument for generosity. Remember that instances of the induction schema (for monadic predicates) are \emph{conditionals} which look like this:
$$\mathsf{[\varphi(0) \:\land\: \forall x(\varphi(x) \lif
\varphi(Sx))] \;\lif\; {\forall x\varphi(x)}}$$
So they actually only allow us to derive some $\mathsf{\forall x \varphi(x)}$ when we can \emph{already} prove the corresponding (i) $\mathsf{\varphi(0)}$ and also can prove (ii) $\mathsf{\forall x(\varphi(x) \to \varphi(Sx))}$. But if we can already prove things like (i) and (ii) then aren't we already committed to treating $\varphi$ as a respectable predicate? For given (i) and (ii), we can already prove (iii) each and every one of $\mathsf{\varphi(0)}$, $\mathsf{\varphi(S0)}$, $\mathsf{\varphi(SS0)}$, \ldots. However, there are no `stray' numbers which aren't denoted by some numeral; so that means (iv) that we can prove of each and every number that $\varphi$ is true of it. What more can it possibly take for $\varphi$ to express a genuine property that indeed holds for every number, so that (v) $\mathsf{\forall x \varphi(x)}$ is true? In sum, it seems that we can't overshoot by allowing instances of the induction schema for \emph{any} open wff $\varphi$ of $L_{A}$ with one free variable. The only \emph{usable} instances from our generous range of axioms will be those where we can prove the antecedents (i) and (ii) of the relevant conditionals: and in those cases, we'll have every reason to accept the consequents (v) too.
(Techie note: the argument generalizes in the obvious way to the case where $\varphi(\mathsf{x})$ has other variables dangling free.)
\subsection{Introducing $\mathsf{PA}$}
Suppose then that we accept the conclusion of our last argument, and now take it that \emph{any} open wff of $L_A$ can be used in the induction schema. This means moving on from $\mathsf{Q}$, and jumping right over a range of possible intermediate theories, to adopt the much richer theory of arithmetic that we can briskly define as follows:
\begin{defn}
$\mathsf{PA}$\ -- \emph{First-order Peano Arithmetic}\footnote{The name is conventional. Giuseppe Peano\index{Peano, Giuseppe} did publish a list of axioms for arithmetic in 1889. But they weren't first-order, only explicitly governed the successor relation, and -- as he acknowledged -- had already been stated by Richard Dedekind.} -- is the first-order theory whose language is $L_A$ and whose axioms are those of $\mathsf{Q}$\ plus the [universal closures of] \emph{all} instances of the induction schema that can be constructed from open wffs of $L_A$.
\end{defn}
\noindent Plainly, it is still decidable whether any given wff has the right shape to be one of the new axioms, so this is a legitimate formalized theory.
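To see why axiomhood stays decidable despite the infinity of induction axioms, here's a toy sketch (my own naive string encoding, not anything official): from any given open wff $\varphi(\mathsf{x})$ we can mechanically \emph{construct} the corresponding instance of the schema, so checking whether a candidate wff is an induction axiom is a finite matter of pattern-matching against this shape.

```python
# Naive string surgery, for illustration only: a real implementation
# would substitute on parsed syntax trees, not raw strings.

def induction_instance(phi):
    """Build the induction-schema instance for an open wff phi
    (given as a string with exactly the variable 'x' free)."""
    base = phi.replace("x", "0")                     # phi(0)
    step = f"forall x (({phi}) -> ({phi.replace('x', 'Sx')}))"
    return f"(({base}) & ({step})) -> forall x ({phi})"

print(induction_instance("x != Sx"))
# ((0 != S0) & (forall x ((x != Sx) -> (Sx != SSx)))) -> forall x (x != Sx)
```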
Given its very natural motivation, {$\mathsf{PA}$\ is the benchmark axiomatized first-order theory of {basic arithmetic}}\index{arithmetic!basic}. Just for neatness, then, let's bring together all the elements of its specification in one place.
But first, a quick observation. $\mathsf{PA}$\ allows, in particular, induction for the formula
\begin{quote}
$\mathsf{\varphi(x)} =_{\mathrm{def}} \mathsf{(x \not= 0 \:\lif\: \exists y (x = Sy))}$.
\end{quote} But now note that the corresponding $\mathsf{\varphi(0)}$ is a trivial logical theorem. Likewise, $\mathsf{\forall x\varphi(Sx)}$ is an equally trivial theorem, and that entails $\mathsf{\forall x(\varphi(x)\lif\varphi(Sx))}$. So we can use an instance of the Induction Schema inside $\mathsf{PA}$\ to derive $\mathsf{\forall x\varphi(x)}$. But that's just Axiom 3 of $\mathsf{Q}$. So our initial presentation of $\mathsf{PA}$\ -- as explicitly having all the Axioms of $\mathsf{Q}$\ plus the instances of the Induction Schema -- involves a certain redundancy. Bearing that in mind, here's our \ldots
\subsection{Summary overview of $\mathsf{PA}$}\label{subsec:summaryPA}
First, to repeat, the \emph{language} of $\mathsf{PA}$\ is $L_A$, a first-order language whose non-logical vocabulary comprises just the constant `$\mathsf{0}$', the one-place function symbol `$\mathsf{S}$', and the two-place function symbols `$+$', `$\times$', and whose intended interpretation is the obvious one.
Second, $\mathsf{PA}$'s deductive \emph{proof system} is some standard version of classical first-order logic with identity. The differences between various \mbox{presentations} of first-order logic of course don't make a difference to what \mbox{sentences} can be proved in $\mathsf{PA}$. It's convenient, however, to fix officially on a Hilbert-style axiomatic system for later metalogical work theorizing about the theory.
%: see Sections~\ref{sec:systemsoflogic} and~\ref{Codsequences}).
And third, its non-logical \emph{axioms} -- eliminating the redundancy from our original listing and renumbering -- are the following sentences:
\setcounter{axiom}{0}
\begin{axiom}
\quad {$\mathsf{\forall x(0 \not= Sx)}$}
\end{axiom}
\begin{axiom}
\quad {$\mathsf{\forall x \forall y(Sx = Sy \lif x = y)}$}
\end{axiom}
%\begin{axiom}
%\quad {$\mathsf{\forall x(x \not= 0 \:\lif\: \exists y (x = Sy))}$}
%\end{axiom}
\begin{axiom}
\quad {$\mathsf{\forall x(x + 0 = x)}$}
\end{axiom}
\begin{axiom}
\quad {$\mathsf{\forall x\forall y(x + Sy = S(x + y))}$}
\end{axiom}
\begin{axiom}
\quad {$\mathsf{\forall x(x \times 0 = 0)}$}
\end{axiom}
\begin{axiom}
\quad {$\mathsf{\forall x\forall y(x \times Sy = (x \times y) + x)}$}
\end{axiom}
\noindent plus every instance of the following
\vspace{8pt}
\noindent{\bfseries Induction Schema}\quad {$\mathsf{(\{\varphi(0) \:\land\: \forall x(\varphi(x) \lif
\varphi(Sx))\} \;\to\; {\forall x\varphi(x))}}$}
\emph{where $\varphi(\mathsf{x})$ is an open wff of $L_A$ that has `\/$\mathsf{x}$' free}. (Techie note: if $\varphi(\mathsf{x})$ has other variables free then we'll need to `bind' this instance with universal quantifiers if we want every axiom to be a closed sentence.)
%\footnote{\label{fn:vecnotation}Here's some standard notation. $\vec{\mathsf{y}}$ indicates the $k$ variables $\mathsf{y_{1}, y_{2}, \ldots y_{k}}$ and $\forall\vec{\mathsf{y}}$ is shorthand for the block of quantifiers $\forall\mathsf{y_{1}\forall y_{2} \ldots \forall y_{k}}$ (we allow the null case where $k= 0$). Then we can also give $\mathsf{PA}$'s induction axioms as follows: they are all the wffs of the form
%\begin{quote}
%{${\forall \vec{\mathsf{y}}(\{\varphi(\mathsf{0}, \vec{\mathsf{y}}) \:\land\: \forall \mathsf{x}(\varphi(\mathsf{x}, \vec{\mathsf{y}}) \lif
% \varphi(\mathsf{Sx}, \vec{\mathsf{y}}))\} \:\lif\: \forall \mathsf{x}\varphi(\mathsf{x}, \vec{\mathsf{y}}))}$}
%\end{quote}}
\subsection{What $\mathsf{PA}$ can prove}
$\mathsf{PA}$ proves $\mathsf{\forall x(x \not= Sx)}$. Just take $\mathsf{\varphi(x)}$ to be $\mathsf{x \not= Sx}$. Then $\mathsf{PA}$ trivially proves $\mathsf{\varphi(0)}$ because that's Axiom~1. And $\mathsf{PA}$ also proves $\mathsf{\forall x(\varphi(x) \lif \varphi(Sx))}$ by contraposing Axiom~2. And then an induction axiom tells us that if we have both $\mathsf{\varphi(0)}$ and $\mathsf{\forall x(\varphi(x) \lif \varphi(Sx))}$ we can deduce $\mathsf{\forall x\varphi(x)}$, i.e. no number is a self-successor. It's as simple as that. Yet this trivial little result is worth noting when we recall our deviant interpretation which makes the axioms of $\mathsf{Q}$ true while making $\mathsf{\forall x(0 + x = x)}$ false: that had Kurt G\"odel himself added to the domain as a rogue self-successor. A bit of induction, however, rules out self-successors.
And so it goes: the familiar elementary general truths about the successor function, addition, multiplication and ordering (with the order relation as defined in \S\ref{subsec:defnorder}) are all provable in $\mathsf{PA}$ using induction (which also rules out other simple deviant models). There are more than enough examples worked through in \emph{IGT}, which we won't repeat here! So we might reasonably have hoped -- at least before we'd heard of G\"odel's incompleteness theorems -- that $\mathsf{PA}$\ would turn out to be a complete theory that indeed pins down all the truths of $L_A$.
Here's another fact that might well have encouraged this hope, pre-G\"odel. Suppose we define the language $L_P$ to be $L_A$ without the multiplication sign. Take $\mathsf{P}$ to be the theory couched in the language $L_P$, whose axioms are $\mathsf{Q}$'s now familiar axioms for successor and addition, plus the universal closures of all instances of the induction schema that can be formed in the language $L_P$. In short, $\mathsf{P}$ is $\mathsf{PA}$\ minus multiplication. \emph{Then $\mathsf{P}$ is a negation-complete theory of successor and addition.} We are not going to prove that here -- the argument uses a standard model-theoretic method called `elimination of quantifiers', which isn't hard, and was known in the 1920s, but it would just take too long to explain.
So the situation is as follows, and was known before G\"odel got to work. (i)~There is a complete formal axiomatized theory $\mathsf{BA}$ whose theorems are exactly the quantifier-free truths expressible using successor, addition and multiplication (and the connectives). (ii)~There is another complete formal axiomatized theory (equivalent to $\mathsf{PA}$\ minus multiplication) whose theorems are exactly the first-order truths expressible using just successor and addition. Against this background, G\"odel's result that adding multiplication in order to get full $\mathsf{PA}$\ gives us a theory which is incomplete and incompletable (if consistent) comes as a rather nasty surprise. It certainly wasn't obviously predictable that multiplication would make all the difference. Yet it does. In fact, as we've said before, as soon as we have an arithmetic as strong as $\mathsf{Q}$, we get incompleteness.
And by the way, it isn't that a theory of multiplication must in itself be incomplete. In 1929, Thoralf Skolem showed that there is a complete theory for the truths expressible in a suitable first-order language with multiplication but lacking addition or the successor function. So why then does putting multiplication together with addition and successor produce incompleteness? The answer will emerge shortly enough, but pivots on the fact that an arithmetic with all three functions built in can express/capture \emph{all} `primitive recursive' functions. But we'll have to wait until the next episode to explain what that means.
\section{Burning questions}
\begin{enumerate}
\item \emph{$\mathsf{PA}$\ has an infinite number of axioms: why is having an infinite number of axioms any better than using an infinitary $\omega$-rule?} Sure, there are an unlimited number of instances of the induction schema, each one of which is an axiom of $\mathsf{PA}$. Still, we can mechanically check a given wff to see whether it is or isn't one of the instances. So we can mechanically check a wff to see whether it is a $\mathsf{PA}$ axiom. So we can mechanically (and finitely) check a given finite array of wffs to see whether it is a properly constructed $\mathsf{PA}$ proof. By contrast, we obviously can't finitely check an array to see if it involves a correct application of the infinite-premiss $\omega$-rule. That's why $\mathsf{PA}$ is a kosher formal axiomatized theory in our official sense, and a system with the $\omega$-rule isn't.
\item \emph{But won't some of the instances of the induction schema be ludicrously long, far too long to mechanically check?} Ah, but remember we are talking about checkability in principle, without constraints on time, the amount of ink to be spilt, etc. etc. Effective decidability is not practical decidability.
\item \emph{$\mathsf{PA}$ has an infinite number of axioms: but can we find a finite bunch of axioms with the same consequences?} No. First-order Peano Arithmetic is essentially infinitely axiomatized (not an easy result though!).
\item \emph{We saw that $\mathsf{Q}$ has `a non-standard model', i.e. there is a deviant unintended interpretation that still makes the axioms of $\mathsf{Q}$ true. Does $\mathsf{PA}$ have any non-standard models, i.e. deviant unintended interpretations that still make \emph{its} axioms true?} Yes -- though not the trivial non-standard model we built for $\mathsf{Q}$. So $\mathsf{PA}$ still doesn't pin down uniquely the structure of the natural numbers. More on this anon.
\end{enumerate}
\noindent OK, at this point, go and have a cup of coffee! Are you sure you understand about induction and how this is handled in $\mathsf{PA}$? If not, re-read: otherwise, let's now move on to consider \ldots
\section{Quantifier complexity}\label{sec:qcomplexity}
\noindent $\mathsf{PA}$ is the canonical, most natural, first-order theory of the arithmetic of successor, addition and multiplication. Indeed it is arguable that \emph{any} proposition about successor, addition and multiplication that can be seen to be true just on the basis of our grasp of the structure of the natural numbers can be shown to be true in $\mathsf{PA}$ (for discussion of this, see \emph{IGT}, \S23.3). Still there is some formal interest in exploring weaker systems, sitting between $\mathsf{Q}$ and $\mathsf{PA}$, systems which have \emph{some} induction, but not induction for all open $L_A$ wffs. For example, there is some interest in the theories you get by allowing as axioms only instances of the induction schema for so-called $\Delta_0$ wffs, or so-called $\Sigma_1$ wffs. Now, we are \emph{not} going to explore these weak arithmetics here. But, irrespective of that, it is in fact worth knowing what $\Delta_0$, $\Sigma_1$, and $\Pi_1$ wffs are. So this section briskly explains.
\subsection{Bounded quantification}
As we said before in \S\ref{addingLEQto Q}, we often want to say that all/some numbers less than or equal to a given number have some particular property. We can now express such claims in formal arithmetics like $\mathsf{Q}$ and $\mathsf{PA}$ using wffs of the shape $\forall \xi(\xi \leq \kappa \lif \varphi(\xi))$ and
\mbox{$\exists \xi(\xi \leq \kappa \;\land\; \varphi(\xi))$}, where $\mathsf{\xi \leq \zeta}$ is just short for $\exists \mathsf{v(v + \xi = \zeta)}$. And it is standard to further abbreviate such wffs by $(\forall\xi \leq \kappa)\varphi(\xi)$ and $(\exists\xi \leq \kappa)\varphi(\xi)$ respectively.
For any theory $T$ containing $\mathsf{Q}$ -- and hence for $T = \mathsf{PA}$ in particular -- we have results like these:
\begin{enumerate}%\renewcommand{\theenumi}{O\arabic{enumi}}
%\item $T \vdash \mathsf{\forall x(0 \leq x)}$.
%\item For any $n$, $T \vdash \forall\mathsf{x(Sx + {\overline{n}} = x + S{\overline{n}}})$.
%\item $T \vdash \forall \mathsf{x(x \leq 0 \lif x = 0)}$.
\item For any $n$, $T \vdash\mathsf{\forall x(\{x = \overline{0} \lor x = \overline{{1}} \lor \ldots \lor x = {\overline{n}}\} \lif x \leq {\overline{n}})}$.
\item For any $n$, $T \vdash \mathsf{\forall x(x \leq {\overline{n}} \lif \{x = \overline{0} \lor x = \overline{{1}} \lor \ldots \lor x = {\overline{n}}\})}$.
\item For any $n$, if $T \vdash\varphi(\overline{\mathsf{0}}) \land \varphi(\overline{\mathsf{{1}}}) \land \ldots \land \varphi(\mathsf{{\overline{n}}})$, then $T \vdash(\forall \mathsf{x \leq {\overline{n}}})\varphi(\mathsf{x})$.% New 6!
\item For any $n$, if $T \vdash\varphi(\overline{\mathsf{0}}) \lor \varphi(\overline{\mathsf{{1}}}) \lor \ldots \lor \varphi(\mathsf{{\overline{n}}})$, then $T \vdash(\exists \mathsf{x \leq {\overline{n}}})\varphi(\mathsf{x})$.% New 7!
%\item For any $n$, $T \vdash \mathsf{\forall x(x \leq {\overline{n}} \;\lif\; x \leq S{\overline{n}})}$.
%%\item For any $n >0$, if %$T \mathsf{\vdash \varphi(0) \land \varphi({1}) \land \ldots \land \varphi({n - 1})}$, then\\
%%$T \mathsf{\vdash \varphi(0)}$, $T \mathsf{\vdash \varphi(1)}$, \ldots, $T \mathsf{\vdash \varphi({n - 1})}$, then\\
%%\hspace*{7cm}$T \vdash \mathsf{\forall x(x \leq {\overline{n}} \lif \varphi(x))}$.
%\item For any $n$, $T \vdash \mathsf{\forall x({\overline{n}} \leq x \;\lif\; ({\overline{n}} = x \;\lor\; S{\overline{n}} \leq x ))}$.
%\item For any $n$, $T \vdash \mathsf{\forall x(x \leq {\overline{n}} \;\lor\; {\overline{n}} \leq x )}$.
%\item For any $n >0$, $T \vdash \mathsf{(\forall x \leq \overline{n - 1})\varphi(x) \lif (\forall x \leq \overline{n})(x \neq \overline{n} \lif \varphi(x))}$.
\end{enumerate}
In other words, theories like $\mathsf{Q}$ and $\mathsf{PA}$ `know' that bounded universal quantifications behave like finite conjunctions, and that bounded existential quantifications behave like finite disjunctions. Hold on to that thought!
\subsection{$\Delta_0$ wffs}
Let's informally say that
\begin{defn}An $L_A$ wff is $\Delta_0$ if its only quantifications are bounded ones.\end{defn}
\noindent For a fancied-up definition, see \emph{IGT}, \S9.5. So a $\Delta_0$ wff is one which is built up using the successor, addition, and multiplication functions, identity, the less-than-or-equal-to relation (defined as usual), plus the familiar propositional connectives and/or \emph{bounded} quantifications.
In other words, a $\Delta_0$ wff is exactly like a quantifier-free $L_A$ wff, i.e. like an $L_B$ wff, except that we allow ourselves to wrap up some conjunctions like $\varphi(\overline{\mathsf{0}}) \land \varphi(\overline{\mathsf{{1}}}) \land \ldots \land \varphi(\mathsf{{\overline{n}}})$ into bounded quantifications $(\forall \mathsf{x \leq {\overline{n}}})\varphi(\mathsf{x})$, and similarly wrap up some disjunctions like $\varphi(\overline{\mathsf{0}}) \lor \varphi(\overline{\mathsf{{1}}}) \lor \ldots \lor \varphi(\mathsf{{\overline{n}}})$ into bounded quantifications $(\exists \mathsf{x \leq {\overline{n}}})\varphi(\mathsf{x})$.
Since we can mechanically calculate the truth-value of every quantifier-free $L_A$ sentence, i.e. $L_B$ sentence, and a $\Delta_0$ sentence is exactly like one, we can mechanically determine the truth-value of a $\Delta_0$ sentence. It follows, of course, that we can mechanically determine whether a $\Delta_0$ open wff $\varphi(\mathsf{x})$ is satisfied by a number $n$ by determining whether $\varphi(\overline{\mathsf{n}})$ is true (likewise for open wffs with more than one free variable). So $\Delta_0$ open wffs express decidable properties of numbers.
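For illustration, here is a miniature evaluator for $\Delta_0$ sentences, using an ad hoc tuple encoding of wffs that is mine, not the text's. The point to notice is that bounded quantifiers unwind into finite loops, so the evaluation always terminates -- which is just the mechanical decidability claimed above.

```python
# Terms: an int, a variable name, or ('S', t), ('+', t1, t2), ('*', t1, t2).
# Wffs:  ('=', t1, t2), ('not', w), ('and', w1, w2), ('or', w1, w2),
#        ('all<=', var, bound, w), ('ex<=', var, bound, w).

def true_delta0(w, env=None):
    env = env or {}

    def evt(t):  # evaluate a term under the current variable assignment
        if isinstance(t, int):
            return t
        if isinstance(t, str):
            return env[t]
        op, *args = t
        vals = [evt(a) for a in args]
        return vals[0] + 1 if op == 'S' else \
               vals[0] + vals[1] if op == '+' else vals[0] * vals[1]

    op = w[0]
    if op == '=':
        return evt(w[1]) == evt(w[2])
    if op == 'not':
        return not true_delta0(w[1], env)
    if op == 'and':
        return true_delta0(w[1], env) and true_delta0(w[2], env)
    if op == 'or':
        return true_delta0(w[1], env) or true_delta0(w[2], env)
    # bounded quantifiers: a finite conjunction/disjunction in disguise
    _, var, bound, body = w
    checks = (true_delta0(body, {**env, var: n}) for n in range(evt(bound) + 1))
    return all(checks) if op == 'all<=' else any(checks)

# (exists x <= 10)(x * x = 9) -- true, witnessed by x = 3
print(true_delta0(('ex<=', 'x', 10, ('=', ('*', 'x', 'x'), 9))))
```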
Since we know from Theorem~\ref{th:QcorrectlydecidesQfree} that even $\mathsf{Q}$ can correctly decide all quantifier-free $L_A$ sentences, and $\mathsf{Q}$ knows that \emph{bounded} quantifications behave just like conjunctions/disjunctions, it won't be a surprise to hear that we have
\begin{theorem}\label{QisDeltacomplete}
$\mathsf{Q}$ (and hence $\mathsf{PA}$) can correctly decide all $\Delta_0$ sentences.
\end{theorem}
\subsection{$\Sigma_1$ and $\Pi_1$ wffs}
We now say, again informally, that
\begin{defn}\label{def:SigmaPi}
An $L_A$ wff is $\Sigma_1$ if it is (or is equivalent to) a $\Delta_0$ wff preceded by zero, one, or more unbounded \emph{existential} quantifiers. And a wff is $\Pi_1$ if it is (or is equivalent to) a $\Delta_0$ wff preceded by zero, one, or more unbounded \emph{universal} quantifiers.
\end{defn}
\noindent As a mnemonic, it is worth remarking that the `$\Sigma$' in the standard label `$\Sigma_1$' comes from an old alternative symbol for the existential quantifier, as in $\Sigma xFx$ -- that's a Greek `S' for `(logical) sum'. Likewise the `$\Pi$' in `$\Pi_1$' comes from the corresponding symbol for the universal quantifier, as in $\Pi xFx$ -- that's a Greek `P' for `(logical) product'. And the subscript `1' in `$\Sigma_1$' and `$\Pi_1$' indicates that we are dealing with wffs which start with \emph{one} block of similar quantifiers, respectively existential quantifiers and universal quantifiers.\footnote{By the same token, a $\Pi_2$ wff\index{$\Pi_2$ wff} is one that starts with \emph{two} blocks of quantifiers, a block of universal quantifiers followed by a block of existential quantifiers followed by a $\Delta_0$ kernel. And so it goes.}
So a $\Sigma_1$ wff says that some number (pair of numbers, etc.) satisfies the decidable condition expressed by its $\Delta_0$ core; likewise a $\Pi_1$ wff says that every number (pair of numbers, etc.) satisfies the decidable condition expressed by its $\Delta_0$ core.
To check understanding, show:
\begin{enumerate}
\item The negation of a $\Delta_0$ wff is $\Delta_0$.
\item A $\Delta_0$ wff is also $\Sigma_1$ and $\Pi_1$.
\item The negation of a $\Sigma_1$ wff is $\Pi_1$.
\end{enumerate}
\subsection{Two results}
Here's another pretty trivial result:
\begin{theorem}
$\mathsf{Q}$ (and hence $\mathsf{PA}$) can prove every true $\Sigma_1$ sentence (is `\/$\Sigma_1$-complete').
\end{theorem}
\begin{proof}
Take, for example, a sentence of the type $\mathsf{\exists x \exists y\varphi(x, y)}$, where $\varphi\mathsf{(x, y)}$ is $\Delta_0$. If this sentence is true, then for some pair of numbers $m, n$, the $\Delta_0$ sentence $\varphi(\mathsf{{\overline{m}}, {\overline{n}}})$ must be true. But then by Theorem~\ref{QisDeltacomplete}, $\mathsf{Q}$ proves $\varphi(\mathsf{{\overline{m}}, {\overline{n}}})$ and hence $\mathsf{\exists x \exists y\varphi(x, y)}$, by existential introduction.
Evidently the argument generalizes for any number of initial quantifiers (and the zero-quantifier case is just Theorem~\ref{QisDeltacomplete}), which shows that $\mathsf{Q}$ proves all true sentences which are (or are equivalent to) some $\Delta_0$ wff preceded by zero or more unbounded {existential} quantifiers.
\end{proof}
But if that's trivial, the following consequence is more fun:
\begin{theorem}
If $T$ is a consistent theory which includes $\mathsf{Q}$, then every $\Pi_1$ sentence that it proves is true.
\end{theorem}
\begin{proof}
Suppose $T$ proves a \emph{false} $\Pi_1$ sentence $\varphi$. Then $\neg\varphi$ will be a \emph{true} $\Sigma_1$ sentence. But in that case, since $T$ includes $\mathsf{Q}$ and so is `$\Sigma_1$-complete', $T$ will prove $\neg\varphi$, making $T$ inconsistent. Contraposing, if $T$ is consistent, any $\Pi_1$ sentence it proves is true.
\end{proof}
\noindent This is, in its way, a rather remarkable observation. It means that we don't have to fully \emph{believe} a theory $T$ -- i.e. don't have to accept that \emph{all} its theorems are true on the interpretation built into $T$'s language -- in order to use it to establish that some $\Pi_1$ arithmetic generalization is true. For example, it turns out that, with some trickery, we can state Fermat's Last Theorem as a $\Pi_1$ sentence. And Andrew Wiles showed how to prove Fermat's Last Theorem using some seriously heavy-duty infinitary mathematics. Now we see, intriguingly, that we don't have to believe that infinitary mathematics is true -- whatever that means when things get wildly infinitary! -- but only \emph{consistent}, to take him as establishing that the $\Pi_1$ arithmetical claim which is the Theorem is true.
\vspace{8pt}\noindent Now read \emph{IGT}, chs. 9 and 10.
\newpage
%\setcounter{section}{0}
\setcounter{page}{1}
\begin{center}%
{{\Large \emph{G\"odel Without (Too Many) Tears -- 5}}\\[16pt]{\LARGE Primitive recursive functions} \par%
\vskip 1.5em%
{\large
\lineskip .75em%
\begin{tabular}[t]{c}%
Peter Smith
\end{tabular}\par}}%
\vskip 0.75em%
{University of Canterbury, Christchurch, NZ}\\[6pt]
{April 7, 2010}%
\vskip 1.5em%
\end{center}%\par
\noindent\hrulefill
\begin{itemize}\setlength{\itemsep}{0pt}
\item What's a primitive recursive function?
\item How to prove results about all p.r. functions
\item The p.r. functions are computable \ldots
\item \ldots but not all computable functions are p.r.
\item The idea of a characteristic function, which enables us to define \ldots
\item \ldots the idea of p.r. properties and relations.
\end{itemize}
\noindent\hrulefill
\vspace{8pt}\noindent
In our preamble, it might be helpful this time to give a story about where we are going, rather than (as in previous episodes) review again where we've been. So, at the risk of spoiling the excitement, here's what's going to happen in this and the following three episodes.
\begin{enumerate}%\setcounter{enumi}{4}
\item The formal theories of arithmetic that we've looked at so far have (at most) the successor function, addition and multiplication built in. Why stop there? Even school arithmetic acknowledges many more numerical functions. This episode describes a very wide class of familiar such
functions, the so-called primitive recursive ones. We also define the primitive recursive properties and relations (i.e. those with a p.r. `characteristic function').
\item The next episode shows that $L_A$, the language of basic arithmetic, can express all p.r. functions and relations. Moreover $\mathsf{Q}$ and hence $\mathsf{PA}$ can capture all those functions and relations. So $\mathsf{PA}$, despite having only successor, addition and multiplication `built in', can actually deal with a vast range of functions.
\item Then we look at the `arithmetization of syntax' by G\"odel-numbering. Focus on $\mathsf{PA}$ for the moment: then we can define various properties/relations like this\begin{quote}
$\mathit{Wff}(n)$ iff $n$ is the code number of a $\mathsf{PA}$-wff.\\
$\mathit{Sent}(n)$ iff $n$ is the code number of a $\mathsf{PA}$-sentence.\\
$\mathit{Prf}(m,n)$ iff $m$ is the code number of a $\mathsf{PA}$-proof of the sentence \mbox{with code number $n$}.
\end{quote}
Moreover, these properties/relations are primitive recursive. Similar results obtain for any sensibly axiomatized formal theory.
\item Since $\mathit{Prf}$ is p.r., and $\mathsf{PA}$ can capture all p.r. relations, there is a wff $\mathsf{Prf(x,y)}$ which captures the relation $\mathit{Prf}$. And we'll use this fact -- or a closely related one -- to construct a G\"odel sentence which sort-of-says `I am not provable in $\mathsf{PA}$', and hence prove G\"odel's first incompleteness theorem for $\mathsf{PA}$. Similarly for other sensibly axiomatized arithmetics that include $\mathsf{Q}$.
\end{enumerate}
\noindent\emph{Now read on \ldots}
\section{Introducing the primitive recursive functions} \label{prirec}
We'll start with two more functions that are familiar from elementary arithmetic. Take the factorial function $y!$, where e.g. $4! = 1 \times 2 \times 3 \times 4$. This can be defined by the following two equations:
\begin{quote}
{$0! = S0 = 1$}\\
{$(Sy)! = y! \times Sy$}
\end{quote}
The first clause tells us the value of the function for the argument $y = 0$; the second clause tells us how to work out the value of the function for $Sy$ once we know its value for $y$ (assuming we already know about multiplication). So by applying and reapplying the second clause, we can successively calculate 1!, 2!, 3!, \ldots. Hence our two-clause definition fixes the value of `$y!$' for all numbers $y$.
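In programming terms (an illustrative sketch, not part of the notes), the two-clause definition transcribes directly into a recursive procedure:

```python
# The two equations for the factorial, transcribed clause by clause:
#   0! = 1
#   (Sy)! = y! * Sy
# Illustrative sketch; 'fact' is our name for the function.

def fact(n):
    if n == 0:
        return 1            # first clause: the value at argument 0
    y = n - 1               # write the argument n as Sy
    return fact(y) * n      # second clause: (Sy)! = y! * Sy
```

Calling `fact(4)` applies the second clause four times before bottoming out in the first, just as the hand-calculation of 1!, 2!, 3!, 4! does.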
For our second example -- this time a two-place function -- consider the exponential, standardly written in the form `$x^y$'. This can be defined by a similar pair of equations:
\begin{quote}
{$x^0 = S0$}\\
{$x ^ {Sy} = (x ^ y \times x)$}
\end{quote}
\noindent Again, the first clause gives the function's value for a given value of $x$ and $y = 0$, and -- keeping $x$ fixed -- the second clause gives the function's value for the argument $Sy$ in terms of its value for~$y$.
We've seen this two-clause pattern before, of course, in our formal Axioms in $\mathsf{Q/PA}$ for the addition and multiplication functions. Presented in the style of everyday informal mathematics (leaving quantifiers to be understood) -- and note, everything in this episode \emph{is} just informal mathematics -- we have:
\begin{quote}
{$x + 0 = x$}\\
{$x + Sy = S(x + y)$}\\[6pt]
{$x \times 0 = 0$}\\
{$x \times Sy = (x \times y) + x$}
\end{quote}
Three comments about our examples so far:
\begin{enumerate}\renewcommand{\labelenumi}{\roman{enumi}.}
\item In each definition, the second clause fixes the value of a function for argument $Sn$ by invoking the value of the \emph{same} function for argument $n$. This kind of procedure where we evaluate a function by calling the same function is standardly termed `{recursive}' -- or more precisely, `primitive recursive'. So our two-clause definitions are examples of \emph{{definition by primitive recursion}}.\setcounter{footnote}{0}\footnote{Strictly speaking, we need a proof of the claim that primitive recursive definitions really do well-define functions: such a proof was first given by Richard Dedekind in 1888.}
\item Note, for example, that $(Sn)!$ is defined as $n! \times Sn$, so it is evaluated by evaluating $n!$ and $Sn$ and then feeding the results of these computations into the multiplication function. This involves, in a word, the \emph{composition} of functions\index{composition of functions}, where evaluating a composite function involves taking the output(s) from one or more functions, and treating these as inputs to another function.
\item Our examples so far can be put together to illustrate two short \emph{chains} of definitions by recursion and functional composition. Working from the bottom up, addition is defined in terms of the successor function; multiplication is then defined in terms of successor and addition; then the factorial (or, on the second chain, exponentiation) is defined in terms of multiplication and successor.
\end{enumerate}
Here's another little definitional chain:
\begin{quote}
{$P(0) = 0$}\\
{$P(Sx) = x$}\\[6pt]
{$x \mbox{\ $-${\hskip -5.4pt}\raisebox{2.3pt}{$\cdot$}{\hskip 2pt}\ } 0 = x$}\\
{$x \mbox{\ $-${\hskip -5.4pt}\raisebox{2.3pt}{$\cdot$}{\hskip 2pt}\ } Sy = P(x \mbox{\ $-${\hskip -5.4pt}\raisebox{2.3pt}{$\cdot$}{\hskip 2pt}\ } y)$}\\[6pt]
{$|x - y| = (x \mbox{\ $-${\hskip -5.4pt}\raisebox{2.3pt}{$\cdot$}{\hskip 2pt}\ } y) + (y \mbox{\ $-${\hskip -5.4pt}\raisebox{2.3pt}{$\cdot$}{\hskip 2pt}\ } x)$}
\end{quote}
`$P$' signifies the predecessor function (with zero being treated as its own predecessor); `$\mbox{$-${\hskip -5.5pt}\raisebox{2.3pt}{$\cdot$}{\hskip 2pt}}$' signifies `subtraction with cut-off', i.e. subtraction restricted to the non-negative integers (so $m \mbox{\ $-${\hskip -5.4pt}\raisebox{2.3pt}{$\cdot$}{\hskip 2pt}\ } n$ is zero if $m < n$). And $|m - n|$ is of course the absolute difference between $m$ and $n$. This time, our third definition doesn't involve recursion, only a simple {composition} of functions.
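The same little chain can be transcribed as code (an illustrative sketch; `monus` is our name for subtraction with cut-off):

```python
# The definitional chain: predecessor P, subtraction-with-cut-off,
# and absolute difference. Illustrative sketch only.

def P(x):
    """Predecessor, with P(0) = 0."""
    return 0 if x == 0 else x - 1

def monus(x, y):
    """Cut-off subtraction:  x -. 0 = x;  x -. Sy = P(x -. y)."""
    if y == 0:
        return x
    return P(monus(x, y - 1))

def absdiff(x, y):
    """|x - y| = (x -. y) + (y -. x): pure composition, no recursion."""
    return monus(x, y) + monus(y, x)
```

Note that `absdiff` contains no recursive call of its own, mirroring the point that the third definition uses only composition.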
These examples motivate the following initial gesture towards a definition: \begin{defn}\label{def:primrecursiveROUGH}
Roughly: a \emph{{primitive recursive function}}\index{function!primitive recursive} is one that can be similarly characterized using a chain of definitions by recursion and composition.\index{Kleene@Kleene, Stephen}\footnote{The basic idea is there in Dedekind and highlighted by Skolem in 1923. But the modern terminology `primitive recursion' seems to be due to R\'ozsa P\'eter in 1934; and `primitive recursive function' was first used by Stephen Kleene in 1936.}
\end{defn}
\noindent That is a quick-and-dirty characterization, though it should be enough to get across the basic idea. Still, we really need to pause to do better. In particular, we need to nail down more carefully the `starter pack' of functions that we are allowed to take for granted in building a definitional chain.
\section{Defining the p.r.\ functions more carefully}\label{prcarefully}
On the one hand, I suppose you ought to read this section! On the other hand, \emph{don't} get lost in the techie details. All we are trying to do here is give a careful, explicit, presentation of the ideas we've just been sketching, and flesh out that rough and ready Defn.~\ref{def:primrecursiveROUGH}.
\subsection{Definition by primitive recursion -- one and two place functions}
Consider the recursive definition of the factorial again:
\begin{quote}
{$0! = 1$}\\
{$(Sy)! = y! \times Sy$}
\end{quote}
This is an example of the following general scheme for defining a one-place function $f$:
\begin{quote}
{$f(0) = g$}\\
{$f(Sy) = h(y, f(y))$}
\end{quote}
Here, $g$ is just a number, while $h$ is -- crucially -- a function we are assumed already to know about prior to the definition of $f$. Maybe that's because $h$ is an `initial' function that we are allowed to take for granted like the successor function; or perhaps it's because we've already given recursion clauses to define $h$; or perhaps $h$ is a composite function constructed by plugging one known function into another -- as in the case of the factorial, where $h(y,u) = u \times Sy$. %$h$ may not care about one argument or the other.
Likewise, with a bit of massaging, the recursive definitions of addition, multiplication and the exponential can all be treated as examples of the following general scheme for defining two-place functions:
\begin{quote}
{$f(x,0) = g(x)$}\\
{$f(x,Sy) = h(x, y, f(x, y))$}
\end{quote}
where now $g$ and $h$ are both functions that we already know about. Three points about this:
\begin{enumerate}\renewcommand{\labelenumi}{\roman{enumi}.}
\item To get the definition of addition to fit this pattern, we have to take $g(x)$ to be the trivial identity function $I(x) = x$.
\item To get the definition of multiplication to fit the pattern, $g(x)$ has to be treated as the even more trivial zero function $Z(x) = 0$.
\item Again, to get the definition of addition to fit the pattern, we have to take $h(x, y, u)$ to be the function $Su$. As this illustrates, we must allow $h$ not to care what happens to some of its arguments. One neat way of doing this is to help ourselves to some further trivial identity functions that serve to select out particular arguments. Suppose, for example, we have the three-place function $I_3^3(x, y, u) = u$ to hand. Then, in the definition of addition, we can put $h(x, y, u) = SI_3^3(x, y, u)$, so $h$ is defined by composition from previously available functions.
\end{enumerate}
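The general two-place scheme can itself be sketched as a higher-order procedure which, given $g$ and $h$, returns $f$ (illustrative Python, not from the notes; `primrec2` is our name). Points (i)--(iii) then correspond to particular choices of $g$ and $h$:

```python
# The general scheme: f(x, 0) = g(x);  f(x, Sy) = h(x, y, f(x, y)).
# Illustrative sketch only.

def primrec2(g, h):
    def f(x, y):
        if y == 0:
            return g(x)               # base clause
        return h(x, y - 1, f(x, y - 1))   # recursion clause with y = S(y-1)
    return f

# Addition: g is the identity function, h(x, y, u) = S(u)  (points i and iii).
add = primrec2(lambda x: x, lambda x, y, u: u + 1)

# Multiplication: g is the zero function, h(x, y, u) = u + x  (point ii).
mul = primrec2(lambda x: 0, lambda x, y, u: add(u, x))
```

Notice that `h` for addition ignores its first two arguments, which is exactly the role played by the identity function $I_3^3$ in point (iii).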
\subsection{The initial functions}
With that motivation, we will now officially define the full `starter pack' of functions as follows:
\begin{defn}\label{defn:initialfunctions}
The \emph{{initial function}s} are the successor function $S$, the zero function $Z(x) = 0$ and all the $k$-place identity functions, $I_i^k(x_1, x_2, \ldots, x_k) = x_i$ for each $k$, and for each $i$, $1 \leq i \leq k$.
\end{defn}
\noindent The identity functions are also often called \emph{projection} functions. They `project' the vector with components $x_1, x_2, \ldots, x_k$ onto the $i$-th axis.
\subsection{Definition by primitive recursion -- generalizing}
We next want to generalize the idea of recursion from the case of one-place and two-place functions. There's a standard notational device that helps to put things snappily: we write $\vec{x}$\index{001@$\vec{x}$} as short for the array of $k$ variables $x_1, x_2, \ldots , x_k$. Then we can generalize as follows:
\begin{defn}\label{def:primitiverecursion}
Suppose that the following holds:
\begin{quote}
{$f(\vec{x},0) = g(\vec{x})$}\\
{$f(\vec{x},Sy) = h(\vec{x}, y, f(\vec{x}, y))$}
\end{quote}
Then \emph{$f$ is defined from $g$ and $h$ by primitive recursion}.
\end{defn} \noindent This covers the case of one-place functions $f(y)$ like the factorial if we allow $\vec{x}$ to be empty, in which case $g(\vec{x})$ is a `zero-place function', i.e. a constant.
\subsection{Definition by composition}
We need to tidy up the idea of definition by composition. The basic idea, to repeat, is that we form a composite function $f$ by treating the output value(s) of one or more given functions $g$, $g'$, $g''$, \ldots, as the input argument(s) to another function $h$. For example, we set $f(x) = h(g(x))$. Or, to take a slightly more complex case, we could set $f(x,y,z) = h(g(x, y), g'(y,z))$.
There's a number of equivalent ways of covering the manifold possibilities of compounding multi-place functions. But one standard way is to define what we might call one-at-a-time composition (where we just plug \emph{one} function $g$ into another function $h$), thus:
\begin{defn}\label{defn:composition}
If $g(\vec{y}\,)$ and $h(\vec{x},u, \vec{z}\,)$ are functions -- with $\vec{x}$ and $\vec{z}$ possibly empty -- then \emph{$f$ is defined by composition by substituting $g$ into $h$} just if $f(\vec{x},\vec{y}, \vec{z}\,) = h(\vec{x},g(\vec{y}), \vec{z}\,)$.
\end{defn}
\noindent We can then think of generalized composition -- where we plug more than one function into another function -- as just iterated one-at-a-time composition. For example, we can substitute the function $g(x, y)$ into $h(u,v)$ to define the function $h(g(x, y), v)$ by composition. Then we can substitute $g'(y,z)$ into the defined function $h(g(x, y), v)$ to get the composite function $h(g(x, y), g'(y,z))$.\index{composition of functions}
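The iterated one-at-a-time idea can be sketched in code (illustrative; `g2` here plays the role of $g'$, and the helper names are ours):

```python
# One-at-a-time composition: substitute one function into one argument
# slot of another, then iterate. Illustrative sketch only.

def substitute_first(h, g):
    """From h(u, v) and g(x, y), define f(x, y, v) = h(g(x, y), v)."""
    return lambda x, y, v: h(g(x, y), v)

def substitute_second(f, g2):
    """From f(x, y, v) and g2(y, z), define f2(x, y, z) = f(x, y, g2(y, z))."""
    return lambda x, y, z: f(x, y, g2(y, z))

h = lambda u, v: u + v     # sample outer function
g = lambda x, y: x * y     # sample inner function
g2 = lambda y, z: y + z    # sample second inner function, our g'

# Two one-at-a-time substitutions give h(g(x, y), g2(y, z)).
composite = substitute_second(substitute_first(h, g), g2)
```

So `composite(x, y, z)` computes `x*y + (y + z)`, built by exactly two single substitutions.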
%For example, take the recursive definition of $f(y) = y!$ again. The second clause has the required form
%$f(Sy) = h(y, f(y))$
%where $h(u, v) =_{\mathrm{def}} (v \times Su)$. And here, as we noted before, $h$ is defined by composition from the successor and multiplication functions.
\subsection{Putting everything together}
We informally defined the primitive recursive (henceforth, p.r.) functions as those that can be defined by a chain of definitions by recursion and composition. Working backwards down a definitional chain, it must bottom out with members of an initial `starter pack' of trivially simple functions. At the outset, we highlighted the successor function among the given simple functions. But we've since noted that, to get our examples to fit our official account of definition by primitive recursion, we need to acknowledge some other, even more trivial, initial functions. So putting everything together, let's now offer this more formal characterization:\newlength{\oldsep}
\setlength{\oldsep}{\itemsep}
\begin{defn}\label{def:prfunctions}
The p.r. functions are as follows:\begin{enumerate}
\setlength{\itemsep}{0pt}
\item The initial functions $S, Z$, and $I_i^k$ are p.r.;
\item if $f$ can be defined from the p.r. functions $g$ and $h$ by composition, substituting $g$ into $h$, then $f$ is p.r.;
\item if $f$ can be defined from the p.r. functions $g$ and $h$ by primitive recursion, then $f$ is p.r.;
\item nothing else is a p.r. function.
\end{enumerate}
\end{defn}\index{primitive recursive function}\index{function!primitive recursive}
\setlength{\itemsep}{\oldsep}
%Or to put it another, equivalent, way: the p.r. functions are the smallest class of functions which (a) contains the initial functions and (b) is closed under the operations of composition and recursion.
\noindent(We allow $g$ in clauses (2) and (3) to be zero-place, i.e. be a constant.) Note, by the way, that the initial functions are {total} functions of numbers, defined for every numerical argument; also, primitive recursion and composition both build total functions out of total functions. Which means that all p.r. functions are total functions, defined for all natural number arguments.
So: a p.r. function $f$ is one that \emph{can} be specified by a chain of definitions by recursion and composition, leading back ultimately to initial functions. Let's say:
\begin{defn}\label{def:fulldefinition}
A \emph{full definition} for the p.r. function $f$ is a specification of a sequence of functions $f_0, f_1, f_2, \ldots, f_k$ where each $f_j$ is either an initial function or is defined from previous functions in the sequence by composition or recursion, and $f_k = f$.
\end{defn}
\noindent Then what we've seen is that every p.r. function has a full definition in this defined sense. (That's the sharp version of the informal characterization we gave at the end of \S\ref{prirec}.)
\section{How to prove a result about all p.r. functions}\label{sec:provingabtallprfunctions}
That last point that every p.r. function has a full definition means that there is a simple way of proving that every p.r. function has some property $P$. For suppose that, for some given property $P$, we can show
\begin{enumerate}\setlength{\itemsep}{0pt}\renewcommand{\labelenumi}{P\arabic{enumi}.}
\item The initial functions have property $P$.%\footnote{As we noted in Section~\ref{sec:capfun}, `capturing' and `{capturing as a function}' come to the same in \Q: but to fix ideas, read this the second way.}
\item If the functions $g$ and $h$ have property $P$, and $f$ is defined by composition from $g$ and $h$, then $f$ also has property $P$. \item If the functions $g$ and $h$ have property $P$, and $f$ is defined by primitive recursion from $g$ and $h$, then $f$ also has property $P$.
\end{enumerate}
Then P1, P2, and P3 together suffice to establish that all primitive recursive functions have property $P$.
Why so? Well, as we said, any p.r. function $f$ has a \emph{full} definition, which specifies a sequence of functions $f_0, f_1, f_2, \ldots, f_k$ where each $f_j$ is either an initial function or is defined from previous functions in the sequence by composition or recursion, and $f_k = f$. So as we trek along the $f_j$, we start with initial functions which have property $P$ by P1. By P2 and P3, each successive definitional move takes us from functions which have property $P$ to another function with property $P$. So, every function we define as we go along has property $P$, including the final target function $f$. (This proof is, in effect, a proof by induction on the length of the full definition for $f$: do you see why? See \S\ref{subsec:arithind}.)
In sum, then: to prove that all p.r. functions have some property $P$, it suffices to prove the relevant versions of P1, P2 and P3.
\section{The p.r.\ functions are computable}\label{sec:prfuncsarecomp}
\subsection{The basic argument}\label{sec:strategyforprbeingcomputable}
We want to show that every p.r. function is mechanically computable. Given the general strategy just described, it is enough to show that
\begin{enumerate}\setlength{\itemsep}{0pt}\renewcommand{\labelenumi}{C\arabic{enumi}.}
\item The initial functions are computable.%\footnote{As we noted in Section~\ref{sec:capfun}, `capturing' and `{capturing as a function}' come to the same in \Q: but to fix ideas, read this the second way.}
\item If $f$ is defined by composition from computable functions $g$ and $h$, then $f$ is also computable. \item If $f$ is defined by primitive recursion from the computable functions $g$ and $h$, then $f$ is also computable.
\end{enumerate}
But C1 is trivial: the initial functions $S, Z$, and $I_i^k$ are effectively computable by a simple algorithm. And C2 is almost as easy: the composition of two computable functions $g$ and $h$ is computable (you just feed the output from whatever algorithmic routine evaluates $g$ as input into the routine that evaluates $h$).
To illustrate C3, return once more to our example of the factorial.
Here is its p.r. definition again:
\begin{quote}
{$0! = 1$}\\
{$(Sy)! = y! \times Sy$}
\end{quote}
The first clause gives the value of the function for the argument 0; then -- as we said -- you can repeatedly use the second recursion clause to calculate the function's value for $S0$, then for $SS0$, $SSS0$, etc. So the definition encapsulates an {algorithm} for calculating the function's value for any number, and corresponds exactly to a certain simple kind of computer routine. And obviously the argument generalizes.
\subsection{Computing p.r. functions by `for'-loops}
Compare our p.r. definition of the factorial with the following schematic program:
\begin{tabbing}
\hspace{2.5em}\= \hspace{1.5em} \= \hspace{0.5cm}\= \hspace{5.5cm}\= \kill
\>1. \>$\mathit{fact} \; {:=} \; 1$ \>\>\\
\>2. \>For $y = 0$ to $n - 1$\\
\>3. \>\>$\mathit{fact} \; {:=} \; (\mathit{fact} \times Sy)$\\
\>4. \>Loop
\end{tabbing}
\noindent Here $\mathit{fact}$ is a memory register that we initially prime with the value of 0!. Then the program enters a loop: and the crucial thing about executing a `for' loop is that the total number of iterations to be run through is fixed in advance: we number the loops from 0, and in executing the loop, you increment the counter by one on each cycle. So in this case, on loop number $k$ the program replaces the value in the register with $Sk$ times the previous value (we'll assume the computer already knows how to find the successor of $k$ and do that multiplication). When the program exits the loop after a total of $n$ iterations, the value in the register $\mathit{fact}$ will be $n!$.\index{for@`for' loop}
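Rendered in an actual programming language (an illustrative sketch), the four-line schematic program becomes:

```python
# The schematic 'for'-loop program for the factorial, line by line.
# Illustrative sketch; 'fact_by_loop' is our name.

def fact_by_loop(n):
    fact = 1                    # line 1: prime the register with 0! = 1
    for y in range(n):          # line 2: loop y = 0 to n - 1
        fact = fact * (y + 1)   # line 3: replace register with fact * Sy
    return fact                 # after n iterations the register holds n!
```

The crucial feature is visible in the code: `range(n)` fixes the total number of iterations in advance, before the loop starts running.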
More generally, for any one-place function $f$ defined by recursion in terms of $g$ and the computable function $h$, the same program structure always does the trick for calculating $f(n)$. Thus compare
\begin{quote}
{$f(0) = g$}\\
{$f(Sy) = h(y, f(y))$}
\end{quote}
with the corresponding program
\begin{tabbing}
\hspace{2.5em}\= \hspace{1.5em} \= \hspace{0.5cm}\= \hspace{5.5cm}\= \kill
\>1. \>$\mathit{func} \; {:=} \; g$ \>\>\\
\>2. \>For $y = 0$ to $n - 1$\\
\>3. \>\>$\mathit{func} \; {:=} \; h(y,\mathit{func})$\\
\>4. \>Loop
\end{tabbing}
So long as $h$ is already computable, the value of $f(n)$ will be computable using this `for' loop that terminates with the required value in the register $\mathit{func}$.
Similarly, of course, for many-place functions. For example, the value of the two-place function defined by
\begin{quote}
{$f(x,0) = g(x)$}\\
{$f(x,Sy) = h(x, y, f(x, y))$}
\end{quote}
is calculated by the algorithmic program
\begin{tabbing}
\hspace{2.5em}\= \hspace{1.5em} \= \hspace{0.5cm}\= \hspace{5.5cm}\= \kill
\>1. \>$\mathit{func} \; {:=} \; g(m)$ \>\>\\
\>2. \>For $y = 0$ to $n - 1$\\
\>3. \>\>$\mathit{func} \; {:=} \; h(m, y,\mathit{func})$\\
\>4. \>Loop
\end{tabbing}
which gives the value for $f(m,n)$ so long as $g$ and $h$ are computable. %And in sum there is a two-way link here. The effect of a definition by recursion can computed by a `for' loop; and a `for' loop (operating on known p.r. functions) corresponds to a definition by recursion introducing a new function.
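The general pattern can be sketched as a single procedure taking $g$, $h$ and the arguments $m, n$ (illustrative Python; `loop_eval` is our name):

```python
# The generic 'for'-loop evaluator for a two-place function defined by
# recursion from g and h. Illustrative sketch only.

def loop_eval(g, h, m, n):
    func = g(m)                 # line 1: register primed with f(m, 0)
    for y in range(n):          # line 2: loop y = 0 to n - 1
        func = h(m, y, func)    # line 3: register updated to f(m, Sy)
    return func                 # the value of f(m, n)

# With g the identity and h(x, y, u) = S(u), this computes addition:
add_3_4 = loop_eval(lambda x: x, lambda x, y, u: u + 1, 3, 4)
```

Swapping in the zero function for $g$ and $h(x, y, u) = u + x$ gives multiplication instead, with no change to the loop structure.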
Now, our mini-program for the factorial calls the multiplication function which can itself be computed by a similar `for' loop (invoking addition). And addition can in turn be computed by another `for' loop (invoking the successor). So reflecting the downward chain of recursive definitions \begin{quote}
{factorial $\Rightarrow$ multiplication $\Rightarrow$ addition $\Rightarrow$ successor}
%{successor $\rightarrow$ addition $\rightarrow$ multiplication $\rightarrow$ factorial}
\end{quote}
there's a program for the factorial containing nested `for' loops, which ultimately calls the primitive operation of incrementing the contents of a register by one (or other operations like setting a register to zero, corresponding to the zero function, or copying the contents of a register, corresponding to an identity function).
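That downward chain can be made concrete (an illustrative sketch): each function below is computed by a `for' loop calling only the function one link further down, bottoming out in the primitive increment:

```python
# The chain factorial => multiplication => addition => successor,
# with each link a single 'for' loop. Illustrative sketch only.

def succ(x):
    return x + 1              # the primitive operation: increment a register

def add(m, n):
    r = m
    for _ in range(n):        # addition: n applications of successor
        r = succ(r)
    return r

def mul(m, n):
    r = 0
    for _ in range(n):        # multiplication: n additions
        r = add(r, m)
    return r

def fact(n):
    r = 1
    for y in range(n):        # factorial: n multiplications
        r = mul(r, succ(y))
    return r
```

Running `fact` thus executes nested `for' loops whose only arithmetic, at bottom, is incrementing by one.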
The point obviously generalizes, giving us \begin{theorem}\label{thm:prfunctionsforloopcomputable}Primitive recursive functions are effectively computable by a series of (possibly nested) `for' loops.\end{theorem}
\subsection{If you can compute it using `for'-loops, it is p.r.}
The converse is also true. Take a `for' loop which computes the value of a function $f$ for given arguments, a loop which calls on two prior routines, one which computes a function $g$ (used to set the value of $f$ with some key argument set to zero), the other which computes a function $h$ (which is used on each loop to fix the next value of $f$ as that argument is incremented). This plainly corresponds to a definition by recursion of $f$ in terms of $g$ and $h$. And generalizing,
\begin{theorem}\label{thm:forloopsmeanspr}If a function can be computed by a program using just `for' loops as its main programming structure -- with the program's `built in' functions all being p.r. -- then the newly defined function will also be primitive recursive.
\end{theorem}
\noindent This gives us a quick-and-dirty way of convincing ourselves that a new function is p.r.: {sketch out a routine for computing it and check that it can all be done with a succession of (possibly nested) `for' loops which only invoke already known p.r. functions: then the new function will be primitive recursive.}
%We can put all that a bit more carefully. Imagine a simple programming language \textsc{loop}\index{loop program@\textsc{loop} program}. A particular \textsc{loop} program operates on a finite set of registers. At the most basic level, the language has instructions for setting the contents of a register to zero, copying contents from one register to another, and incrementing the contents of a register by one. And the \emph{only} important programming structure is the `for' loop. Such a loop involves setting a register with some initial contents (at the zero-th stage of the loop) and then iterating a \textsc{loop}-defined process $n$ times (where on each loop, the process is applied to the result of its own previous application), which has just the effect of a definition by recursion.
%Such loops can be nested. And sets of nested \textsc{loop} commands can be concatenated so that e.g. a loop for evaluating a function $g$ is followed by a loop for evaluating $h$: concatenation evidently corresponds to composition of functions. Even without going into any more details, it is very easy to see that every \textsc{loop} program will define a p.r. function, and every p.r. function is defined by a \textsc{loop} program.
\section{Not all computable numerical functions are p.r.}\label{notallcom}
We have seen that any p.r. function is mechanically computable. \emph{But not all effectively computable numerical functions are primitive recursive.} In this section, we first make the claim that there are computable-but-not-p.r. numerical functions look plausible. Then we'll cook up an example.
First, then, some plausibility considerations. We've just seen that the values of a given primitive recursive function can be computed by a program involving `for' loops as its main programming structure. Each loop goes through a specified number of iterations.
However, we do allow computations to involve \emph{open-ended searches}, with no prior bound on the length of search. We made essential use of this permission when we showed that negation-complete theories are decidable -- for we allowed the process `enumerate the theorems and wait to see which of $\varphi$ or $\neg\varphi$ turns up' to count as a computational decision procedure.
Standard computer languages of course have programming structures which implement just this kind of {unbounded search}. For as well as `for' loops, they allow `do until' loops\index{do until@`do until' loop} (or equivalently, `do while' loops). In other words, they allow some process to be iterated until a given condition is satisfied -- \emph{where no prior limit is put on the number of iterations to be executed}.
If we count what are presented as unbounded searches as computations, then it looks very plausible that not everything computable will be primitive recursive.
True, that is as yet only a plausibility consideration. Our remarks so far leave open the possibility that computations can always somehow be turned into procedures using `for' loops with a bounded limit on the number of steps. But in fact we can now show that isn't the case:
\begin{theorem}\label{nonprcomp}
There are effectively computable numerical functions which aren't primitive recursive.
\end{theorem}
\begin{proof}The set of p.r. functions is effectively enumerable. That is to say, there is an effective way of numbering off functions $f_0$, $f_1$, $f_2$, \ldots, such that each of the $f_i$ is p.r., and each p.r. function appears somewhere on the list.
This holds because, by definition, every p.r. function has a `recipe' in which it is defined by recursion or composition from other functions which are defined by recursion or composition from other functions which are defined \ldots ultimately in terms of some primitive starter functions. So choose some standard formal specification language for representing these recipes. \begin{table}[h]
\renewcommand{\arraystretch}{1.5}\renewcommand{\arraycolsep}{2mm}
\vspace*{12pt}
\[\begin{array}{c|cccccc}
& 0 & 1 & 2 & 3 & \ldots\\ \hline
f_0 & \underline{f_0(0)} & f_0(1) & f_0(2) & f_0(3) & \ldots\\
f_1 & f_1(0) & \underline{f_1(1)} & f_1(2) & f_1(3) & \ldots\\
f_2 & f_2(0) & f_2(1) & \underline{f_2(2)} & f_2(3) & \ldots\\
f_3 & f_3(0) & f_3(1) & f_3(2) & \underline{f_3(3)} & \ldots\\
\ldots & \ldots & \ldots & \ldots & \ldots & \searrow
\end{array}\]
\vspace*{4pt}
\end{table}Then we can effectively generate `in alphabetical order' all possible strings of symbols from this language; and as we go along, we select the strings that obey the rules for being a recipe for a p.r. function. That generates a list of recipes which effectively enumerates the p.r. functions, repetitions allowed.
Now consider our table. Down the table we list off the p.r. functions $f_0$, $f_1$, $f_2$, \ldots . An individual row then gives the values of $f_n$ for each argument. Let's define the corresponding \emph{diagonal} function,
by putting $\delta(n) = f_n(n) + 1$. To compute $\delta(n)$, we just run our effective enumeration of the recipes for p.r. functions until we get to the recipe for $f_n$. We follow the instructions in that recipe to evaluate that function for the argument $n$. We then add one. Each step is entirely mechanical. So our {diagonal function}\index{function!diagonal} is effectively computable, using a step-by-step algorithmic procedure.
By construction, however, the function $\delta$ can't be primitive recursive. For suppose otherwise. Then $\delta$ must appear somewhere in the enumeration of p.r. functions, i.e. be the function $f_{{d}}$ for some index number ${d}$. But now ask what the value of $\delta(d)$ is. By hypothesis, the function $\delta$ is none other than the function $f_d$, so $\delta(d)= f_\mathit{d}(d)$. But by the initial definition of the diagonal function, $\delta(d)= f_\mathit{d}(d) + 1$. Contradiction.
So we have `diagonalized out' of the class of p.r. functions to get a new function $\delta$ which is effectively computable but not primitive recursive.\index{diagonalize out} \end{proof}
\noindent `But hold on! \emph{Why} is the diagonal function not a p.r. function? Where are the open-ended searches involved in computing it?' Well, consider evaluating $\delta(n)$ for increasing values of $n$. For each new argument, we will have to look along the sequence of strings of symbols in our computing language until we find the next one that gives us a well-constructed recipe for a p.r. function. That isn't given to us as a bounded search. And we have no reason to expect there will be a nice pattern in the successive computations of all the different functions $f_n$ which enables them to be wrapped up into a single p.r. definition. Our diagonal argument in effect shows that this can't be done.
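The diagonal construction is easy to mimic in miniature. Here is a toy Python sketch (mine, not part of the notes: a finite list of functions stands in for the effective enumeration) showing why $\delta$ cannot appear anywhere on the list it diagonalizes out of:

```python
# A toy (finite) stand-in for an effective enumeration f_0, f_1, f_2, ...
# of total computable functions. The real enumeration is infinite, of course;
# this finite list just illustrates the diagonal construction.
fs = [
    lambda n: 0,          # f_0: the zero function
    lambda n: n + 1,      # f_1: successor
    lambda n: n + n,      # f_2: doubling
    lambda n: n * n,      # f_3: squaring
]

def delta(n):
    """The diagonal function: delta(n) = f_n(n) + 1."""
    return fs[n](n) + 1

# delta differs from each listed function at the diagonal argument,
# so it cannot be any function on the list.
for i in range(len(fs)):
    assert delta(i) != fs[i](i)
```

The same one-line argument works for any effectively listable family of total functions, which is exactly the point of the generalized theorem later in these notes.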
\section{Defining p.r. properties and relations}\label{sec:prproperties}
%The {p.r. functions} are a large and important class of \emph{computable} functions. We now want to extend the idea of primitive recursiveness and introduce the ideas of \emph{p.r. (numerical) properties} and \emph{relations}. These form a large and important class of \emph{decidable} properties and relations.
We have defined the class of p.r. \emph{functions}. Next, we extend the scope of the idea of primitive recursiveness and introduce the ideas of \emph{p.r. decidable (numerical) properties} and \emph{relations}.
Now, quite generally, we can tie talk of functions and talk of properties and relations together by using the notion of a \emph{{characteristic function}}\index{function!characteristic}. Here's a definition.
\begin{defn}\label{defn:characteristicfn}
The \emph{characteristic function} of the numerical property $P$ is the one-place function $c_P$ such that if $m$ is $P$, then $c_P(m) = 0$, and if $m$ isn't $P$, then $c_P(m) = 1$.\\[6pt]
The {characteristic function} of the two-place numerical relation $R$ is the two-place function $c_R$ such that if $m$ is $R$ to $n$, then $c_R(m, n) = 0$, and if $m$ isn't $R$ to $n$, then $c_R(m, n) = 1$.
\end{defn}
\noindent And similarly for many-place relations. The choice of values for the characteristic function is, of course, entirely arbitrary: any pair of distinct numbers would do. Our choice is supposed to be \mbox{reminiscent} of the familiar use of 0 and 1, one way round or the other, to stand in for \emph{true} and \emph{false}. And our (less usual) selection of 0 rather than 1 for \emph{true} is merely for later convenience in \emph{IGT}.
The numerical property $P$ partitions the numbers into two sets, the set of numbers that have the property and the set of numbers that don't. Its \mbox{corresponding} characteristic function $c_P$ also partitions the numbers into two sets, the set of numbers the function maps to the value 0, and the set of numbers the function maps to the value 1. And these are the \emph{same} partition. So in a good sense, $P$ and its characteristic function $c_P$ contain exactly the same information about a partition of the numbers: hence we can move between talk of a property and talk of its characteristic function without loss of information. Similarly, of course, for relations (which partition pairs of numbers, etc.). And in what follows, we'll frequently use this link between properties and relations and their characteristic functions in order to carry over ideas defined for functions and apply them to properties/relations.
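To make the link concrete, here is a small Python sketch (an illustration of mine, using evenness as the sample property) of a characteristic function and the shared partition:

```python
def is_even(m):
    """The sample numerical property P: being even."""
    return m % 2 == 0

def c_even(m):
    """Characteristic function of evenness: value 0 for 'true', 1 for
    'false', following the convention adopted in the text."""
    return 0 if is_even(m) else 1

# The property and its characteristic function induce the very same
# partition of the numbers (checked here on an initial segment):
haves = {m for m in range(20) if is_even(m)}
zeros = {m for m in range(20) if c_even(m) == 0}
assert haves == zeros
```

So nothing is lost in moving from talk of the property to talk of its characteristic function, which is what licenses carrying over function-ideas to properties and relations.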
For example, %\begin{enumerate}
%\item We can officially say that a numerical property is \emph{{effectively decidable}} -- i.e. a suitably programmed computer can decide whether the property obtains -- just if its \emph{characteristic function} is \emph{(total and) effectively computable}. (The characteristic function needs to be total because it needs to deliver a verdict about each number as to whether it has the property in question.)
%\end{enumerate}
we can extend the idea of primitive recursiveness to cover properties and relations:
\begin{defn}%\setcounter{enumi}{1}
A \emph{p.r. decidable property} is a property with a p.r. characteristic function, and likewise a \emph{p.r. decidable relation} is a relation with a p.r. characteristic function.
\end{defn}
\noindent Given that any p.r. function is effectively computable, p.r. decidable properties and relations are among the effectively decidable ones. Hence the appropriateness of the label!
\vspace{8pt}\noindent Now read \emph{IGT}, Ch. 11.
\newpage
%\setcounter{section}{0}
\setcounter{page}{1}
\setcounter{footnote}{0}
\begin{center}%
{{\Large \emph{G\"odel Without (Too Many) Tears -- 6}}\\[16pt]{\LARGE Expressing and capturing the primitive recursive functions} \par%
\vskip 1.5em%
{\large
\lineskip .75em%
\begin{tabular}[t]{c}%
Peter Smith
\end{tabular}\par}}%
\vskip 0.75em%
{University of Canterbury, Christchurch, NZ}\\[6pt]
{April 7, 2010}%
\vskip 1.5em%
\end{center}%\par
\noindent\hrulefill
\begin{itemize}\setlength{\itemsep}{0pt}
\item $L_A$ can express all p.r. functions
\item The role of the $\beta$-function trick in proving that result
\item In fact, $\Sigma_1$ wffs suffice to express all p.r. functions
\item $\mathsf{Q}$ can capture all p.r. functions
\item Expressing and capturing p.r. properties and relations.
\end{itemize}
\noindent\hrulefill
\vspace{8pt}\noindent The last episode wasn't about logic or \emph{formal} theories at all: it was about common-or-garden arithmetic and the informal notion of computability.
We noted that addition can be defined in terms of repeated applications of the successor function. Multiplication can be defined in terms of repeated applications of addition. The exponential and factorial functions can be defined, in different ways, in terms of repeated applications of multiplication. There's already a pattern emerging here!
The main task in the last episode was to get clear about this pattern. So first we said more about the idea of defining one function in terms of repeated applications of another function. Tidied up, that becomes the idea of \emph{defining a function by primitive recursion} (Defn.~\ref{def:primitiverecursion}).
Then we want the idea of a definitional chain where we define a function by primitive recursion from other functions which we define by primitive recursion from other functions, and so on down, until we bottom out with the successor function and other trivia. We also of course allow composition of functions -- i.e. feeding the output of one already-defined function into another already-defined function -- along the way. Tidied up, this gives us the idea of a \emph{primitive recursive function}, i.e. one that can be defined by such a definitional chain (Defn.~\ref {def:prfunctions}).
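A definition by primitive recursion is exactly the shape of computation a single bounded `for' loop performs. The following Python sketch (the scheme and the helper names are mine, not the notes') implements the general recursion scheme and recovers addition, multiplication and the factorial as instances:

```python
def prim_rec(g, h):
    """Return the function f defined by primitive recursion from g and h:
         f(x, 0)  = g(x)
         f(x, Sy) = h(x, y, f(x, y))."""
    def f(x, y):
        acc = g(x)
        for u in range(y):       # a bounded 'for' loop: no open-ended search
            acc = h(x, u, acc)
        return acc
    return f

# Addition: x + 0 = x,  x + Sy = S(x + y)
add = prim_rec(lambda x: x, lambda x, u, acc: acc + 1)
# Multiplication: x * 0 = 0,  x * Sy = (x * y) + x
mult = prim_rec(lambda x: 0, lambda x, u, acc: add(acc, x))
# Factorial (the first argument is a dummy): 0! = 1,  (Sy)! = y! * Sy
fact = prim_rec(lambda x: 1, lambda x, u, acc: mult(acc, u + 1))

assert add(3, 4) == 7 and mult(3, 4) == 12 and fact(0, 5) == 120
```

Note how each later function is defined using only earlier ones plus the loop: that is the definitional chain in miniature.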
We then noted three key facts:
\begin{enumerate}
\item Every p.r. function is intuitively computable -- moreover it is computable without going in for open-ended searches. It can be computed using only `for' loops and not open-ended `do until' loops. That's Theorem~\ref{thm:prfunctionsforloopcomputable}.
\item Conversely, if a numerical function can be computed from a starter pack of trivial functions using only `for' loops, then it is primitive recursive. That's Theorem~\ref{thm:forloopsmeanspr}.
\item But not every intuitively computable numerical function is primitive recursive. That's Theorem~\ref{nonprcomp}.
\end{enumerate}
\noindent Let's just comment on the proof of the third result.
We noted that we can effectively list (the recipes for) p.r. functions, specifying the functions $f_0$, $f_1$, $f_2$, \ldots. We can then define the function $d(n) = f_n(n) + 1$. This is effectively computable, because we just go along the effectively generated list of recipes till the $n$-th one, and use that recipe applied to input $n$ to compute $f_n(n)$ and then add one. But this computable function is distinct from all the $f_j$. So it isn't primitive recursive.
The argument evidently generalizes. Suppose we can effectively list the recipes for some other class $C$ of computable total (i.e. everywhere-defined) functions, specifying the functions $f^C_0$, $f^C_1$, $f^C_2$, \ldots. Then again we can define $d^C(n) = f^C_n(n) + 1$. This function is everywhere defined (since each $f^C_n$ is), and it is computable, but it is not in $C$. In a slogan, we can `diagonalize out' of class $C$. So that gives us a theorem:
\begin{theorem}
No effective listing of algorithms can include algorithms for \emph{all} the intuitively computable total functions.\footnote{Note that the restriction to total functions is doing essential work here. Consider algorithms for \emph{partial} computable functions (the idea is that when the algorithm for the partial function $\varphi_i$ `crashes' on input $n$, $\varphi_i(n)$ is undefined). And consider a listing of algorithms for partial functions. $\delta(n) = \varphi_n(n) + 1$ could then be e.g. $\varphi_d$, if $\varphi_d(d)$ and hence $\varphi_d(d) + 1$ are both undefined.}
\end{theorem}
OK, so the situation is now this. We've been talking about \emph{formal} arithmetics with just three functions -- successor, addition, multiplication -- built in. We've reminded ourselves that ordinary \emph{informal} arithmetic talks about heaps more elementary functions like the exponential, the factorial, the-number-of-prime-divisors-of, and so on and so forth: and we generalized the sort of way these functions can be defined to specify the whole class of primitive recursive functions. A gulf seems to have opened up between the modesty of the resources of our formal theories (including the strongest so far, $\mathsf{PA}$) and the richness of the world of p.r. functions (and we know that those aren't even all the computable arithmetical functions). In this episode, we show the gulf is merely apparent. The language $L_A$ in fact can \emph{express} all p.r. functions; and even the weak theory $\mathsf{Q}$ can \emph{capture} them all too. So, in fact, our formal theories -- despite their modest basic resources -- can deal with a lot more than you might at first sight suppose.
Now recall the idea of a sufficiently strong theory which we introduced in Defn.~\ref{def:sufficientlystrong}. That was the idea of capturing all decidable numerical properties, which is equivalent to the idea of capturing all computable one-place functions (by the link between properties and their characteristic functions). Well, what we are claiming is that we can show at least that $\mathsf{Q}$ and hence $\mathsf{PA}$ can capture all primitive recursive functions. That will be enough for G\"odel's argument for incompleteness to fly.
\section{$L_A$ can express all p.r. functions}\label{sec:expressingprfunctions}
We want to show that if the one-place function $f$ is p.r., then there is a two-place $L_A$ wff $\varphi(\mathsf{x}, \mathsf{y})$, such that $\varphi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$ is true if and only if $f(m) = n$. And similarly, of course, for many-place p.r. functions.
\subsection{Proof strategy}\label{subsec:proofstrategy}
Suppose that the following three propositions are all true:
\begin{enumerate}\setlength{\itemsep}{0pt}\renewcommand{\labelenumi}{E\arabic{enumi}.}
\item $L_A$ can express the initial functions, and addition and multiplication. (See Defn.~\ref{defn:initialfunctions}.)%\footnote{As we noted in Section~\ref{sec:capfun}, `capturing' and `{capturing as a function}' come to the same in \Q: but to fix ideas, read this the second way.}
\item If $L_A$ can express the functions $g$ and $h$, then it can also express a function $f$ defined by composition from $g$ and $h$. (See Defn.~\ref{defn:composition}.)
\item If $L_A$ can express the functions $g$ and $h$, then it can also express a function $f$ defined by primitive recursion from $g$ and $h$. (See Defn.~\ref{def:primitiverecursion}.)
\end{enumerate}
Then by the argument of \S\ref{sec:provingabtallprfunctions}, that establishes
\begin{theorem}\label{thm:LAexpressallprfunctoins}
$L_A$ can express all p.r. functions.
\end{theorem}
But it is trivial to prove E1. Just look at cases. The successor function $Sx = y$ is of course expressed by the open wff $\mathsf{Sx= y}$. The addition function $x + y = z$ is expressed by $\mathsf{x + y = z}$. Similarly for multiplication.
The zero function, $Z(x) = 0$ is expressed by the wff $\mathsf{Z(x,y)} =_{\mathrm{def}} \mathsf{(x = x \land y = 0)}$. %It is trivial to check that (i) for any $m$, \Q~$\vdash \mathsf{\exists!y\,Z({\overline{m}},y)}$, and (ii) for any $m$, then \Q\ $\vdash \mathsf{Z({\overline{m}}, 0)}$.
Finally, the three-place identity function $I_2^3(x, y,z) = y$, to take just one example, is expressed by the wff $\mathsf{I_2^3(x,y,z,u)} =_{\mathrm{def}} \mathsf{(x = x \land y = u \land z = z)}$. Likewise for all the other identity functions. [Check those claims!]
So that just leaves E2 and E3 to prove.
\subsection{Proving E2}\label{sub:provingE2}
This result is pretty trivial too. Suppose $g$ and $h$ are one-place functions, expressed by the wffs $\mathsf{G(x, y)}$ and $\mathsf{H(x, y)}$ respectively. Then, the function $f(x) = h(g(x))$ is evidently expressed by the wff $\mathsf{\exists z (G(x, z)\; \land\; H(z, y))}$.
For suppose $g(m) = k$ and $h(k) = n$, so $f(m) = n$. Then by hypothesis $\mathsf{G(\overline{m}, \overline{k})}$ and $\mathsf{H(\overline{k}, \overline{n})}$ will be true, and hence $\mathsf{\exists z (G(\overline{m}, z)\; \land\; H(z, \overline{n}))}$ is true, as required. Conversely, suppose $\mathsf{\exists z (G(\overline{m}, z)\; \land\; H(z, \overline{n}))}$ is true. Then since the quantifiers run over numbers, $\mathsf{(G(\overline{m}, \overline{k})\; \land\; H(\overline{k}, \overline{n}))}$ must be true for some $k$. So we'll have $g(m) = k$ and $h(k) = n$, and hence $f(m) = h(g(m)) = n$ as required.
Other cases where $g$ and/or $h$ are multi-place functions can be handled similarly.
\subsection{What it takes to define the factorial}
Proving E3 is the tricky case.\footnote{Don't worry if you find the ensuing argument a bit boggling (though really you shouldn't, as the basic proof idea is not hard even if its implementation takes a bit of a trick). As far as understanding G\"odel's theorems are concerned, what you really need to know is that Theorem~\ref{thm:LAexpressallprfunctoins} \emph{can} be proved, and not the details about \emph{how} it is proved.} We'll illustrate the general strategy by first taking a particular case of a definition by primitive recursion, and then we'll generalize. So consider the primitive recursive definition of the factorial function again:
\begin{quote}
{$0! = 1$}\\
{$(Sx)! = x! \times Sx$}
\end{quote}
The multiplication and successor functions here are of course expressible in $L_A$: but how can we express our defined function in $L_A$?
Think about the p.r. definition for the factorial in the following way. It tells us how to construct a sequence of numbers $0!, 1!, 2!, \ldots, x!$, where we move from the $u$-th member of the sequence (counting from zero) to the next by multiplying by $Su$. Putting $y = x!$, the p.r. definition thus says
\begin{enumerate}
\renewcommand{\theenumi}{\Alph{enumi}}
\item There is a sequence of numbers $k_0, k_1, \ldots , k_x$ such that: $k_0 = 1$, and if $u < x$ then $k_{Su} = k_u \times Su$, and $k_x = y$.
\end{enumerate}
So the question of how to reflect the p.r. definition of the factorial inside $L_A$ comes to this: how can we express facts about \emph{finite sequences of numbers} using the limited resources of $L_A$?
What we need to do is to wrap up a finite sequence into a single code number $c$, and then have a decoding function $\mathit{decode}$ such that if you feed $\mathit{decode}$ the code number $c$ and the index $i$ it spits out the $i$-th member of the sequence which $c$ codes! In other words, if $c$ is the code number for the sequence $k_0, k_1, \ldots , k_x$ we want: $\mathit{decode}(c, i) = k_i$.
If we can find a coding scheme and a decoding function that does the trick, then we can rewrite the p.r. definition of the factorial as
\begin{enumerate}
\renewcommand{\theenumi}{\Alph{enumi}}\setcounter{enumi}{1}
\item There is a code number $c$ such that: $\mathit{decode}(c, 0) = 1$, and if $u < x$ then $\mathit{decode}(c, Su) = \mathit{decode}(c, u) \times Su$, and $\mathit{decode}(c, x) = y$.
\end{enumerate}
And if $\mathit{decode}$ can be expressed in $L_A$ then we can define the factorial in $L_A$.
\subsection{Coding sequences if we have a factorizing function to play with}
Let's just note -- before giving a decode function that can be expressed in pure $L_A$ -- that if we were working in a slightly richer language the task would be easy.
Suppose $\pi_0, \pi_1$, $\pi_2$, $\pi_3$, $\ldots$ is the series of prime numbers $2, 3, 5, 7,\; \ldots$\;. Now consider the number
\begin{quote}
$b = \pi_0^{k_0}\cdot\pi_1^{k_1}\cdot\pi_2^{k_2}\cdot\ldots\cdot\pi_n^{k_n}$.
\end{quote}
$b$ can be thought of as encoding the whole sequence $k_0, k_1, k_2$, \ldots, $k_n$. And we can recover the coded sequence from $b$ by using the (primitive recursive) decoding function $\mathit{power}$, where $\mathit{power}(b, i)$ is the power of the $i$-th prime in the prime factorization of $b$. (This is well-defined by the fundamental theorem of arithmetic, which tells us that prime factorizations are unique.)
So \emph{there is nothing at all mysterious about a coding scheme for finite sequences and a decoding function that recovers elements of the sequence from its code}: the decoding function $\mathit{power}$ would do the job. And a language $L^+_A$ with an expression for $\mathit{power}$ built in would be able to define the factorial function in the way explained.
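The prime-power coding scheme is easy to realize concretely. Here is a Python sketch (mine, purely as an illustration of the scheme the notes describe informally):

```python
def nth_prime(i):
    """Return pi_i, the i-th prime (pi_0 = 2, pi_1 = 3, pi_2 = 5, ...)."""
    count, n = -1, 1
    while count < i:
        n += 1
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return n

def encode(seq):
    """Code the sequence k_0, ..., k_n as b = pi_0^k_0 * ... * pi_n^k_n."""
    b = 1
    for i, k in enumerate(seq):
        b *= nth_prime(i) ** k
    return b

def power(b, i):
    """Decoding function: the power of the i-th prime in the prime
    factorization of b."""
    p, e = nth_prime(i), 0
    while b % p == 0:
        b //= p
        e += 1
    return e

b = encode([3, 1, 4])          # 2^3 * 3^1 * 5^4 = 15000
assert [power(b, i) for i in range(3)] == [3, 1, 4]
```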
But of course, $\mathit{power}$ isn't built into $L_A$ or obviously definable in it. The question now is: can we construct a different coding scheme with a decoding function which can be constructed from the successor, addition and multiplication functions which \emph{are} built into $L_A$?
\subsection{G\"odel's $\beta$-function}
It turns out to simplify things if we liberalize our notion of coding/decoding just a little. So we'll now allow \emph{three}-place decoding-functions, which take \emph{two} code numbers $c$ and $d$, as follows:
\begin{quote}
A three-place decoding function is a function of the form $\mathit{decode}(c, d, i)$ such that, for \emph{any} finite sequence of natural numbers $k_0, k_1, k_2, \ldots , k_n$ there is a pair of code numbers ${c, d}$ such that for every $i \leq n$, $\mathit{decode}(c, d, i) = k_i$.
\end{quote}
A three-place decoding-function will do just as well as a two-place function to help us express facts about finite sequences.
Even with this liberalization, though, it still isn't obvious how to define a decoding-function in terms of the functions built into basic arithmetic. But G\"odel neatly solved our problem with the following little trick. Put
\begin{quote}
$\beta(c,d,i) =_{\mathrm{def}}$ the remainder left when $c$ is divided by $d(i + 1) + 1$.
\end{quote}
Then, given any sequence $k_0, k_1, \ldots , k_n$, we can find a suitable pair of numbers $c$, $d$ such that for $i \leq n$, $\beta(c,d,i) = k_i$.
This claim should look intrinsically plausible. As we divide $c$ by $d(i + 1) + 1$ for different values of $i$ ($0 \leq i \leq n$), we'll get a sequence of $n + 1$ remainders. Vary $c$ and $d$, and the sequence of $n + 1$ remainders will vary. The permutations as we vary $c$ and $d$ without limit \emph{appear} to be simply endless. We just need to check, then, that appearances don't deceive, and we \emph{can} always find a (big enough) $c$ and a (smaller) $d$ which make the sequence of remainders match a given $n + 1$-term sequence of numbers (mathmos: see \emph{IGT}, \S13.4, fn. 6 for proof that this works!).
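The claim can also be checked computationally for small cases. In the following Python sketch (mine: the brute-force search is purely illustrative, whereas G\"odel's actual proof constructs suitable $c$ and $d$ outright via the Chinese Remainder Theorem), we find codes for the sequence $1, 1, 2$ of the first three factorial values:

```python
def beta(c, d, i):
    """Goedel's beta-function: the remainder when c is divided by
    d(i + 1) + 1."""
    return c % (d * (i + 1) + 1)

def find_codes(seq, bound=10000):
    """Brute-force search for a pair c, d coding the given sequence.
    (Illustrative only: fine for tiny sequences, hopeless in general.)"""
    for d in range(1, bound):
        for c in range(bound):
            if all(beta(c, d, i) == k for i, k in enumerate(seq)):
                return c, d
    raise ValueError("no codes found below the bound")

c, d = find_codes([1, 1, 2])   # the factorial values 0!, 1!, 2!
assert [beta(c, d, i) for i in range(3)] == [1, 1, 2]
```

(For instance, $c = 16$, $d = 2$ works: dividing 16 by 3, 5 and 7 leaves remainders 1, 1 and 2.)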
But now reflect that the concept of a remainder on division can be elementarily defined in terms of multiplication and addition. Thus consider the following open wff:%\footnote{For readability, we temporarily recruit `$\mathsf{c}$', `$\mathsf{d}$' and `$\mathsf{i}$' as variables. %`$\exists !$' is the familiar uniqueness quantifier again.
%}
\begin{quote}
$\mathsf{B(c,d,i,y) =_{\mathrm{def}} (\exists u \leq c)[c = \{S(d \times Si) \times u\} + y \;\land\; y \leq (d \times Si)]}$.
\end{quote}
This, as we want, expresses our G\"odelian $\beta$-function in $L_A$ (for remember, we can define `$\leq$' in $L_A$).
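We can check directly that the truth-condition of $\mathsf{B}$ picks out exactly the value of the $\beta$-function. Here is a Python rendering of that truth-condition (mine, purely as a sanity check on the wff just displayed):

```python
def beta(c, d, i):
    """The beta-function: the remainder when c is divided by d(i + 1) + 1."""
    return c % (d * (i + 1) + 1)

def B_holds(c, d, i, y):
    """Truth-condition of the wff B: there is some u <= c with
    c = S(d * Si) * u + y and y <= d * Si,
    where S(d * Si) is just d * (i + 1) + 1."""
    m = d * (i + 1) + 1
    return any(c == m * u + y for u in range(c + 1)) and y <= d * (i + 1)

# B(c, d, i, y) is true just in case y = beta(c, d, i):
for c in range(30):
    for d in range(1, 5):
        for i in range(3):
            for y in range(30):
                assert B_holds(c, d, i, y) == (y == beta(c, d, i))
```

The bound $\mathsf{y \leq d \times Si}$ is what forces $y$ to be the \emph{remainder} proper, i.e. strictly less than the divisor.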
\subsection{Defining the factorial in $L_A$}
We've just claimed: given any sequence of numbers $k_0, k_1, \ldots , k_x$, there are code numbers $c, d$ such that for $i \leq x$, $\beta(c, d, i) = k_i$. So we can reformulate
\begin{enumerate}
\renewcommand{\theenumi}{\Alph{enumi}}
\item There is a sequence of numbers $k_0, k_1, \ldots , k_x$ such that: $k_0 = 1$, and if $u < x$ then $k_{Su} = k_u \times Su$, and $k_x = y$,
\end{enumerate}
as follows:
\begin{enumerate}
\setcounter{enumi}{2}\renewcommand{\theenumi}{\Alph{enumi}}
\item There is some pair $c, d$ such that: $\beta(c, d, 0) = 1$, and if $u < x$ then $\beta(c, d, Su) = \beta(c, d, u) \times Su$, and $\beta(c, d, x) = y$.
\end{enumerate}
But we've seen that the $\beta$-function can be expressed in $L_A$ by the open wff we abbreviated $\mathsf{B}$. So we can translate (C) into $L_A$ as follows:\begin{enumerate}\renewcommand{\theenumi}{\Alph{enumi}}
\setcounter{enumi}{3}\renewcommand{\theenumi}{\Alph{enumi}}
\item $\exists \mathsf{c\exists d\{\mathsf{B}(c, d, 0, \overline{1})} \land \\
\hspace*{0.3cm} \mathsf{(\forall u \leq x)[u \neq x \lif \exists v\exists w \{(\mathsf{B}(c, d, u, v) \land \mathsf{B}(c, d, Su, w)) \land
w = v \times Su\}]\; \land}
\\
\hspace*{0.6cm}
\mathsf{B}\mathsf{(c, d, x, y)}\}$.
\end{enumerate}
Abbreviate all that by `$\mathsf{F(x,y)}$', and we've arrived! For this evidently {expresses} the factorial function.
%Let's summarize so far. We first noted that the p.r. definition of the factorial $n!$ tells us that there is a {sequence} of $(n +1)$ numbers satisfying a certain condition. Then we used the elegant $\beta$-function trick to re-write this as the claim that there is a code number for the sequence -- or rather, two code numbers -- satisfying a related condition. Using G\"odel's particular $\beta$-function, we can then render this re-written version into $L_A$ to give us a wff which expresses the recursive definition of the factorial.
\subsection{Generalizing to prove E3}\label{sub:generalizingE3}
Finally, we need to show that we can use the same $\beta$-function trick and prove more generally that, if the function $f$ is defined by recursion from functions $g$ and $h$ which are already expressible in $L_A$, then $f$ is also expressible in $L_{A}$.
So here, just for the record, is the entirely routine generalization we need (there are no new ideas here -- just unavoidable clutter).
We are assuming that
\begin{quote}
{$f(\vec{x},0) = g(\vec{x})$}\\
{$f(\vec{x},Sy) = h(\vec{x}, y, f(\vec{x}, y))$}.
\end{quote}
This definition amounts to fixing the value of $f(\vec{x},y) = z$ thus:
\begin{enumerate}
\setcounter{enumi}{0}\renewcommand{\theenumi}{\Alph{enumi}}
\renewcommand{\labelenumi}{\theenumi *}
\item There is a sequence of numbers $k_0, k_1, \ldots , k_y$ such that: $k_0 = g(\vec{x})$, and if $u < y$ then $k_{u + 1} = h(\vec{x}, u, k_u)$, and $k_y = z$.
\end{enumerate}
So using a three-place $\beta$-function again, that comes to
\begin{enumerate}
\setcounter{enumi}{2}\renewcommand{\theenumi}{\Alph{enumi}}
\renewcommand{\labelenumi}{\theenumi *}
\item There is some $c, d$, such that: $\beta(c, d, 0) = g(\vec{x})$, and if $u < y$ then\\
$\beta(c, d, Su) = h(\vec{x}, u, \beta(c, d, u))$, and $\beta(c, d, y) = z$.
\end{enumerate}
Suppose we can already express the $n$-place function $g$ by an $(n + 1)$-variable expression $\mathsf{G}$, and the $(n + 2)$-place function $h$ by an $(n + 3)$-variable expression $\mathsf{H}$. Then -- using `${\vec{\mathsf{x}}}$' to indicate a suitable sequence of $n$ variables -- (B*) can be rendered into $L_A$ by
\begin{enumerate}
\setcounter{enumi}{3}\renewcommand{\theenumi}{\Alph{enumi}}
\renewcommand{\labelenumi}{\theenumi *}
\item $\mathsf{\exists c\exists d\{\exists k[\mathsf{B}(c, d, 0, k)} \:\land\: \mathsf{G}(\vec{\mathsf{x}}, \mathsf{k})] \;\land \\
\hspace*{0.3cm} \mathsf{(\forall u \leq y)[u \neq y \lif \exists v\exists w
\{(\mathsf{B}(c, d, u, v) \land \mathsf{B}(c,d, Su, w))} \land
\mathsf{H}(\vec{\mathsf{x}}, \mathsf{u, v, w})\}] \land
\\
\hspace*{0.6cm}
\mathsf{B}\mathsf{(c, d, y, z)}\}$.
\end{enumerate}
Abbreviate this wff as $\varphi(\vec{\mathsf{x}}, \mathsf{y, z})$; it is then evident that $\varphi$ will serve to express the p.r. defined function $f$. Which gives us the desired result E3.
So, we've shown how to establish each of the claims E1, E2 and E3 from the start of \S\ref{subsec:proofstrategy}. Hence every p.r. function can be expressed in $L_A$.
Theorem~\ref{thm:LAexpressallprfunctoins} is in the bag!
\section{Primitive recursive functions can be canonically expressed by $\Sigma_1$ wffs}
In this section, we extract more information out of the proof of Theorem~\ref{thm:LAexpressallprfunctoins}. In particular we show that the $L_A$ wff needed to express a p.r. function is logically not very complex -- a $\Sigma_1$ wff (in the sense of Defn.~\ref{def:SigmaPi}) is enough to do the job.
\subsection{The proof of Theorem~\ref{thm:LAexpressallprfunctoins} is constructive}
What we showed, in effect, is how to take a chain of definitions by composition and primitive recursion -- starting with the initial functions and building up to a full definition for $f$ -- and then step-by-step reflect it in building up to a wff that expresses $f$.
Remember: a full definition for $f$ is in effect a recipe for defining a whole sequence of functions $f_0, f_1, f_2, \ldots, f_k$ where each $f_j$ is either an initial function or is constructed out of previous functions by composition or recursion, and $f_k = f$. Corresponding to that sequence of functions we can write down a sequence of $L_A$ wffs which express those functions. In the terms of \S\ref{subsec:proofstrategy}, we write down the E1 expression corresponding to an initial function. If $f_j$ comes from two previous functions by composition, we use the existential construction in E2 to write down a wff built out of the wffs expressing the two previous functions. If $f_j$ comes from two previous functions by recursion, we use the $\beta$-function trick and write down a D*-style expression built out of the wffs expressing the two previous functions.
So that means we've not only proved that for any given p.r. function $f$ \emph{there exists} an $L_A$-wff which expresses it. We've shown how to \emph{construct} such a wff by recapitulating the structure of a definitional `history' for $f$.
The proof is, in a good sense, a constructive one.
For brevity, let's now say that
\begin{defn}\label{def:canonical}
An $L_A$ wff \emph{canonically expresses} the p.r. function $f$ if it recapitulates a full definition for $f$ by being constructed in the manner described in the proof of Theorem~\ref{thm:LAexpressallprfunctoins}.
\end{defn}
\subsection{Canonical wffs for expressing p.r. functions are $\Sigma_1$}
The canonical wff which reflects a full definition of $f$ is built up starting from wffs expressing the initial functions (and addition and multiplication). Those starter wffs are $\Delta_0$ wffs, and hence $\Sigma_1$.
Suppose $g$ and $h$ are one-place functions, expressed by the $\Sigma_1$ wffs $\mathsf{G(x, y)}$ and $\mathsf{H(x, y)}$ respectively. Then, the function $f(x) = h(g(x))$ is expressed by the wff $\mathsf{\exists z (G(x, z)\; \land\; H(z, y))}$ which is $\Sigma_1$ too. For that is equivalent to a wff with the existential quantifiers pulled from the front of the $\Sigma_1$ wffs $\mathsf{G}$ and $\mathsf{H}$ out to the very front of the new wff. Similarly for other cases of composition.
Finally, suppose we can already express the one-place function $g$ by a two-variable $\Sigma_1$ expression $\mathsf{G}$, and the two-place function $h$ by a three-variable $\Sigma_1$ expression $\mathsf{H}$. Then if $f$ is defined from $g$ and $h$ by primitive recursion, $f$ can be expressed by
\begin{enumerate}
\setcounter{enumi}{3}\renewcommand{\theenumi}{\Alph{enumi}}
\renewcommand{\labelenumi}{\theenumi *}
\item $\mathsf{\exists c\exists d\{\exists k[\mathsf{B}(c, d, 0, k)} \:\land\: \mathsf{G}({\mathsf{x}}, \mathsf{k})] \;\land \\
\hspace*{0.3cm} \mathsf{(\forall u \leq y)[u \neq y \lif \exists v\exists w
\{(\mathsf{B}(c, d, u, v) \land \mathsf{B}(c,d, Su, w))} \land
\mathsf{H}({\mathsf{x}}, \mathsf{u, v, w})\}] \land
\\
\hspace*{0.6cm}
\mathsf{B}\mathsf{(c, d, y, z)}\}$.
\end{enumerate}
And this too is $\Sigma_1$. For $\mathsf{B}$ is $\Sigma_1$: and D* is equivalent to what we get when we drag all the existential quantifiers buried at the front of each of $\mathsf{B}$, $\mathsf{G}$ and $\mathsf{H}$ to the very front of the wff. (Yes, dragging existentials past a universal is usually wicked! -- but here the only universal is a bounded universal, which is `really' just a tame conjunction, and simple tricks explained in \emph{IGT} allow us to get the existentials all at the front.) Again this generalizes to other cases of definition by recursion.
So our recipe for building a canonical wff in fact gives us a $\Sigma_1$ wff. Which yields
\begin{theorem}
$L_A$ can express any p.r. function $f$ by a $\Sigma_1$ canonical wff which recapitulates a full definition for $f$.
\end{theorem}
\section{$\mathsf{Q}$ can capture all p.r. functions}\label{sec:QcancapturePRfunctions}
We now want to show that not only can the language of $\mathsf{Q}$ express all p.r. functions, but also:
\begin{theorem}\label{thm:Qpradequate}
The theory $\mathsf{Q}$ can capture any p.r. function by a $\Sigma_1$ wff.
\end{theorem}
\noindent Recall, `capturing' a function here means being able to case-by-case prove formulae that in effect assign the right values to the function (see Defn.~\ref{def:captures}). So the formula $\chi({\mathsf{x}, \mathsf{y}})$ captures the one-place function $f$ in $\mathsf{Q}$ if, when $f(m) = n$, $\mathsf{Q} \vdash \chi({\overline{\mathsf{m}}, \overline{\mathsf{n}}})$, and when $f(m) \neq n$, $\mathsf{Q} \vdash \neg\chi({\overline{\mathsf{m}}, \overline{\mathsf{n}}})$. Similarly for many-place functions.
Now there's more than one route to Theorem~\ref{thm:Qpradequate}. I'll mention \emph{the direct assault} and \emph{the clever trick}. In \emph{IGT}, Ch. 13, I go for the clever trick. Which I now rather regret -- for while clever tricks can give you a theorem, they may not give you real understanding. So what I'll describe here is the direct assault. And this is just to replicate once more the overall strategy for proving results about all p.r. functions which we described in \S\ref{sec:provingabtallprfunctions}, and deployed already in \S\ref{sec:strategyforprbeingcomputable} and \S\ref{subsec:proofstrategy}.
Suppose then that we can prove
\begin{enumerate}\setlength{\itemsep}{0pt}\renewcommand{\labelenumi}{C\arabic{enumi}.}
\item $\mathsf{Q}$ can capture the initial functions.
\item If $\mathsf{Q}$ can capture the functions $g$ and $h$, then it can also capture a function $f$ defined by composition from $g$ and $h$.
\item If $\mathsf{Q}$ can capture the functions $g$ and $h$, then it can also capture a function $f$ defined by primitive recursion from $g$ and $h$.
\end{enumerate}
where in each case the capturing wffs are $\Sigma_1$. Then -- by just the same sort of argument as in \S\ref{subsec:proofstrategy} -- it follows that $\mathsf{Q}$ can capture any p.r. function by a $\Sigma_1$ wff.
So how do we prove C1? We just check that the formulae which we said in \S\ref{subsec:proofstrategy} \emph{express} the initial functions do in fact serve to \emph{capture} them in $\mathsf{Q}$.
How do we prove C2? Again we track the proof in \S\ref{sub:provingE2}. Suppose $g$ and $h$ are one-place functions, captured by the wffs $\mathsf{G(x, y)}$ and $\mathsf{H(x, y)}$ respectively. Then we prove that the function $f(x) = h(g(x))$ is captured by the wff $\mathsf{\exists z (G(x, z)\; \land\; H(z, y))}$ (which is $\Sigma_1$ if $\mathsf{G}$ and $\mathsf{H}$ are).
And how do we prove C3? This is the tedious case that takes hard work! We need to show that the formula $\mathsf{B}$ not only expresses but captures G\"odel's $\beta$-function. And then we use that fact to prove that if the $n$-place function $g$ is captured by an $(n + 1)$-variable expression $\mathsf{G}$, and the $(n + 2)$-place function $h$ by an $(n + 3)$-variable expression $\mathsf{H}$, then the rather horrid wff D* in \S\ref{sub:generalizingE3} captures the function $f$ defined by primitive recursion from $g$ and $h$. (If you want the gory details establishing that, then you can consult e.g. Elliott Mendelson, \emph{Introduction to Mathematical Logic}, 4th edn, Prop. 3.24. And you can check that the result again is $\Sigma_1$ if $\mathsf{G}$ and $\mathsf{H}$ are.)
So the basic story is this. Take a full definition for defining a p.r. function, ultimately out of the initial functions. Follow the step-by-step instructions implicit in \S\ref{sec:expressingprfunctions} about how to build up a canonical wff which in effect recapitulates that recipe. You'll get a wff which expresses the function, and that same wff captures the function in $\mathsf{Q}$ (and in any stronger theory with a language which includes $L_A$). Moreover the wff in question will be $\Sigma_1$.
\section{Expressing/capturing properties and relations}
Just a brief coda, linking what we've done in this episode with the last section of the previous one.
We said in \S\ref{sec:prproperties}, Defn.~\ref{defn:characteristicfn} that the characteristic function $c_P$ of a monadic numerical property $P$ is defined by setting $c_P(m) = 0$ if $m$ is $P$ and $c_P(m) = 1$ otherwise. And a property $P$ is said to be p.r. decidable if its characteristic function is p.r.
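The 0-for-`yes' convention for characteristic functions is easy to misremember; here is a minimal computational sketch (my own illustration, not from \emph{IGT}), using the p.r. decidable property of being even:

```python
# Characteristic function of the property "is even", following the
# convention above: value 0 when the property holds, 1 otherwise.
# (The name c_even is just an illustrative label of mine.)
def c_even(m):
    return 0 if m % 2 == 0 else 1

# m has the property just in case its characteristic function is 0:
# this is the fact exploited by the wff c_P(x, 0) below.
assert [c_even(m) for m in range(5)] == [0, 1, 0, 1, 0]
```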
Now, suppose that $P$ is p.r.; then $c_P$ is a p.r. function. So, by Theorem~\ref{thm:LAexpressallprfunctoins}, $L_A$ can express $c_P$ by a two-place open wff $\mathsf{c}_P(\mathsf{x, y})$. So if $m$ is $P$, then $c_P(m) = 0$, and so $\mathsf{c}_P(\mathsf{\overline{m}, 0})$ is true. And if $m$ is not $P$, then $c_P(m) \neq 0$, and so $\mathsf{c}_P(\mathsf{\overline{m}, 0})$ is not true. So, by the definition of expressing-a-property, the wff $\mathsf{c}_P(\mathsf{x, 0})$ serves to express the p.r. property $P$. The point trivially generalizes from monadic properties to many-place relations. So we have as an easy corollary of Theorem~\ref{thm:LAexpressallprfunctoins} that
\begin{theorem}
$L_A$ can express all p.r. decidable properties and relations.
\end{theorem}
Similarly, suppose again that $P$ is p.r., so $c_P$ is a p.r. function. So, by Theorem~\ref{thm:Qpradequate}, $\mathsf{Q}$ can capture $c_P$ by a two-place open wff $\mathsf{c}_P(\mathsf{x, y})$. So if $m$ is $P$, then $c_P(m) = 0$, and so $\mathsf{Q} \vdash \mathsf{c}_P(\mathsf{\overline{m}, 0})$. And if $m$ is not $P$, then $c_P(m) \neq 0$, and so $\mathsf{Q} \vdash \neg\mathsf{c}_P(\mathsf{\overline{m}, 0})$. So, by the definition of capturing-a-property, the wff $\mathsf{c}_P(\mathsf{x, 0})$ serves to capture the p.r. property $P$ in $\mathsf{Q}$. The point trivially generalizes from monadic properties to many-place relations. So we have as an easy corollary of Theorem~\ref{thm:Qpradequate} that
\begin{theorem}
$\mathsf{Q}$ can capture all p.r. decidable properties and relations.
\end{theorem}
\vspace{8pt}\noindent Now read \emph{IGT}, Ch.~13.
\newpage
%\setcounter{section}{0}
\setcounter{page}{1}
\setcounter{footnote}{0}
\begin{center}%
{{\Large \emph{G\"odel Without (Too Many) Tears -- 7}}\\[16pt]{\LARGE The arithmetization of syntax} \par%
\vskip 1.5em%
{\large
\lineskip .75em%
\begin{tabular}[t]{c}%
Peter Smith
\end{tabular}\par}}%
\vskip 0.75em%
{University of Canterbury, Christchurch, NZ}\\[6pt]
{March 10, 2010}%
\vskip 1.5em%
\end{center}%\par
\noindent\hrulefill
\begin{itemize}\setlength{\itemsep}{0pt}
\item A very little on Hilbert's program
\item G\"odel coding
\item The relation that holds between $m$ and $n$ when $m$ codes for a $\mathsf{PA}$ proof of the wff with code $n$ is p.r.
\item The (standard) notation $\ulcorner{\varphi}\urcorner$ to denote the code number for the wff $\varphi$, and $\overline{\ulcorner{\varphi}\urcorner}$ as shorthand for the standard numeral for $\ulcorner{\varphi}\urcorner$.
\item The idea of diagonalization
\end{itemize}
\noindent\hrulefill
\vspace{8pt}\noindent This episode looks rather more substantial than it really is. The first page-and-a-bit reviews where we've been. And then \S\ref{sec:Formalization} is more scene-setting which (depending on your background) you might well just want to skip through. The real action continues with \S\ref{sec:Arithmetization} on p.~\pageref{sec:Arithmetization}.
Anyway, let's take stock yet again. %For three quarters of the battle is to get a clear sense of the overall route we are taking to proving G\"odel's First Incompleteness Theorem. Get that into focus and you'll understand which are the Big Ideas that are in play and see too which bits of the proof are ``boring details". This should have two pay-offs. You'll understand better how to structure answers to ``bookwork" exam questions which ask you to outline some or all of our proof of incompleteness; and you'll be inoculated against at least some silly philosophical misunderstandings about G\"odel's First Theorem. So here's the story so far!
\begin{enumerate}
\item In Episode 1 we introduced the very idea of a formal axiomatized theory, the notion of negation incompleteness, and we stated (two versions of) G\"odel's First Theorem.
\item Next, we gave a proof of an incompleteness theorem that is weaker than G\"odel's, since it doesn't tell us how to construct a true-but-unprovable sentence. But the proof is suggestive as it starts from the idea of a \emph{sufficiently strong} theory, i.e. one that can `capture' every decidable property $P$ of numbers, in the sense (roughly) of proving case by case that $n$ is $P$ when it is, and proving that it isn't when it isn't.
\item The opening episodes, however, proceeded at a considerable level of abstraction. We talked in the first episode of a theory's `containing a modest amount of arithmetic' without explaining how much that is. Then in the second episode we talked of a theory's being `sufficiently strong', promising that being sufficiently strong indeed only requires a modest amount of arithmetic, but again without saying just how much arithmetic that involves. But in Episode 3 we started looking at some actual theories of arithmetic of various strengths. As a warm-up exercise we first looked at `Baby Arithmetic' $\mathsf{BA}$, a complete quantifier-free arithmetic, which `knows' how to compute the results of additions and multiplications. Then we added quantifiers, replaced $\mathsf{BA}$'s axiom schemata with quantified axioms, and added another axiom that says that every number other than zero is a successor, giving us Robinson Arithmetic $\mathsf{Q}$. This is boringly incomplete -- meaning that it can be shown to be incomplete without any fancy G\"odelian arguments. It can't even prove $\mathsf{\forall x(0 + x = x)}$. [Reality check: how is it that $\mathsf{BA}$ is complete while the stronger theory $\mathsf{Q}$ is incomplete?] But $\mathsf{Q}$ is highly interesting despite its weakness: it will turn out to be `sufficiently strong' in the sense we introduced in the previous episode. But of course, at this stage we can't prove that claim!
\item $\mathsf{Q}$, as we said, is boringly incomplete, even for just the arithmetic of successor, addition and multiplication. How can we beef it up at least so that it can prove elementary quantified truths like $\mathsf{\forall x(0 + x = x)}$? By adding induction. If we add each instance of the so-called Induction Schema as an extra axiom to $\mathsf{Q}$, we get First-Order Peano Arithmetic $\mathsf{PA}$. This rich theory certainly proves all familiar elementary arithmetical claims that can be expressed in the first-order language $L_A$ which has the successor, addition, and multiplication functions built in. In fact, pre-G\"odel, the natural conjecture would be that it is a complete theory for the truths of $L_A$.
\item But of course, elementary arithmetic deals with many more functions than successor, addition, and multiplication, i.e. many more functions than are built in to $L_A$. In Episode 5, we looked at a whole family of functions -- the primitive recursive functions -- which can be defined by recursion, in the sort of way that e.g. addition and multiplication are. These p.r. functions are all computable, but they aren't all of the computable functions.
\end{enumerate}
Episodes 3 to 5 can be thought of as important scene-setting: but our real work proving (versions of) the First Incompleteness Theorem starts now.
\begin{enumerate}\setcounter{enumi}{5}
\item Episode 6 showed that $L_A$ can in fact \emph{express} all the p.r. functions -- i.e. for any one-place p.r. function $f$, we can construct in $L_A$ a wff $\varphi$ which is satisfied by a pair of numbers $m, n$ just when $f(m) = n$ (and similarly for many-place functions). And we gestured towards a proof that even $\mathsf{Q}$ (and hence, a fortiori $\mathsf{PA}$) can capture all the p.r. functions: if we construct the wff $\varphi$ that expresses $f$ in the right way, then if $f(m) = n$, $\mathsf{Q} \vdash \varphi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$, and if $f(m) \neq n$, $\mathsf{Q} \vdash \neg\varphi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$.
\end{enumerate}
Now that last result in Episode 6 doesn't \emph{quite} take us back to the ideas in Episode 2. Earlier, we talked about `sufficiently strong theories', where a theory is sufficiently strong if it captures all computably decidable properties of numbers. Episode 6 shows that $\mathsf{Q}$ and richer theories capture all p.r. decidable properties (meaning properties with p.r. characteristic functions). But since not all decidable properties are p.r. decidable (since not all computable functions are p.r. functions), this doesn't yet give us the result that $\mathsf{Q}$ and richer theories are sufficiently strong (even though they are).
However, it will turn out over this and the next episode that we don't need the stronger claim. $\mathsf{Q}$'s being able to capture all p.r. functions is enough to make it the case that sensibly axiomatized theories that contain $\mathsf{Q}$ are incomplete (so long as they are consistent and satisfy another modest condition).
\section{Formalization and finitary reasoning}\label{sec:Formalization}
\subsection{Formalization and axiomatization again}
Before proceeding, let's give ourselves a few gentle reminders about the business of \emph{formalization} -- something that is now familiar to any logic student.
In elementary logic classes, we are drilled in translating arguments into an appropriate {formal language} and then constructing formal deductions of putative conclusions from given premisses. Why bother with formal languages? Because everyday language -- even mathematical English -- is replete with redundancies and ambiguities. So, in assessing complex arguments, it helps to regiment them into a suitable artificial language which is expressly designed to be free from obscurities, and where surface form reveals logical structure. And why bother with {formal deduction}s? Because informal arguments -- even mathematical arguments -- often involve suppressed premisses (and there is the lurking danger of inferential fallacies). It is only too easy to cheat. Setting out arguments as formal deductions in one style or another enforces honesty: we have to keep a tally of the premisses we invoke, and of exactly what inferential moves we are using. And honesty is the best policy. For suppose things go well with a particular formal deduction. Suppose we get from the given premisses to some target conclusion by small inference steps each one of which is obviously valid (no suppressed premisses are smuggled in, and there are no suspect inferential moves). Our honest toil then buys us the right to confidence that our premisses really do entail the desired conclusion.
Granted, outside the logic classroom we almost never set out deductive arguments in fully formalized versions. No matter. We have glimpsed a first ideal -- arguments presented in an entirely perspicuous language with maximal clarity and with everything entirely open and above board, leaving no room for misunderstanding, and with all the arguments' commitments systematically and frankly acknowledged.
Old-fashioned presentations of Euclidean geometry illustrate the pursuit of a related second ideal -- the (informal) {axiomatized theory}. Like beginning logic students, school students used to be drilled in providing deductions, though the deductions were framed in ordinary geometric language. The game is to establish a whole body of theorems about (say) triangles inscribed in circles, by deriving them from simpler results, which had earlier been derived from still simpler theorems that could ultimately be established by appeal to some small stock of fundamental principles or {axioms}. And the aim of this enterprise? By setting out the derivations of our various theorems in a laborious step-by-step style -- where each small move is warranted by simple inferences from propositions that have already been proved -- we develop a unified body of results that we can be confident must hold if the initial Euclidean axioms are true. On the surface, school geometry perhaps doesn't seem very deep: yet making all its fundamental assumptions fully explicit is surprisingly difficult. And giving a set of axioms invites further enquiry into what might happen if we tinker with these assumptions in various ways -- leading, as is now familiar, to investigations of non-Euclidean geometries.
%These days, quite a few mathematical theories are presented axiomatically in a more formal way from the very outset. %\footnote{For a classic defence, extolling the axiomatic method in mathematics, see \cite{Hil18}.}
% For example, set theories are typically presented by laying down some basic axioms expressed in a logical language and exploring their deductive consequences. We want to discover exactly what is guaranteed by the fundamental principles embodied in the axioms. And we are again interested in exploring what happens if we change the axioms and construct alternative set theories.% -- e.g. what happens if we drop the `axiom of choice' or add `large cardinal' axioms?
Now, even the most tough-minded mathematics texts which explore axiomatized theories are written in an informal mix of ordinary language and mathematical symbolism. Proofs are rarely spelt out in every formal detail, and so their presentation falls short of the logical ideal of full formalization.
But we will hope that nothing stands in the way of our more informally presented mathematical proofs being sharpened up into fully formalized ones -- i.e. we hope that they \emph{could} be set out in a strictly regimented formal language of the kind that logicians describe, with absolutely every inferential move made fully explicit and checked as being in accord with some overtly acknowledged rule of inference, with all the proofs ultimately starting from our explicitly given axioms. True, the extra effort of laying out everything in this kind of detail will almost never be worth the cost in time and ink. In mathematical practice we use enough formalization to convince ourselves that our results don't depend on illicit smuggled premisses or on dubious inference moves, and leave it at that -- our motto is `sufficient unto the day is the rigour thereof'. But still, we want good mathematics to achieve precision and to avoid the use of unexamined inference rules or unacknowledged assumptions -- for it is just such unexamined rules and hidden assumptions that run the danger of leading us into paradox and inconsistency.
So, putting together the logician's aim of perfect clarity and honest inference with the mathematician's project of regimenting a theory into a tidily axiomatized form, we can see the point of the notion of an \emph{axiomatized {formal theory}} as a composite ideal.
Note, we are not saying that mathematicians ought really always to work inside fully formalized axiomatized theories. Mathematics is hard enough even when done using the usual strategy of employing just as much rigour as seems appropriate to the case in hand. And in any case, as mathematicians (and some philosophical commentators) are apt to stress, there is a lot more to mathematical practice than striving towards the logical ideal. For a start, we typically aim for proofs which are not merely correct but \emph{explanatory} -- which not only show that some proposition must be true, but in some sense make it clear \emph{why} it is true. However, such observations don't affect {our} present point, which is that the business of formalization just takes to the limit features that we expect to find in good proofs anyway, i.e. precise clarity and lack of inferential gaps.
\subsection{Responding to paradox}
Think yourself back to the situation in mathematics a century ago. Classical analysis -- the theory of differentiation and integration -- has, supposedly, been put on firm foundations. We've done away with obscure talk about infinitesimals; and we've traded in an intuitive grasp of the continuum of real numbers for the idea of reals defined as `Dedekind cuts' on the rationals or `Cauchy sequences' of rationals. The key idea we've used in our constructions is the idea of a \emph{set} of numbers. And we've been very free and easy with that, allowing ourselves to talk of arbitrary sets of numbers, even when there is no statable rule for collecting the numbers into the set.
This freedom to allow ourselves to talk of arbitrarily constructed sets is just one aspect of a wider freedom that mathematicians have, over the second half of the nineteenth century, allowed themselves. They have loosed themselves from the assumption that mathematics should be tied to the description of nature: as Morris Kline puts it, ``after about 1850, the view that mathematics can introduce and deal with arbitrary concepts and theories that do not have any immediate physical interpretation \ldots gained acceptance''. And Cantor could write ``Mathematics is entirely free in its development and its concepts are restricted only by the necessity of being non-contradictory''.
It is bad news, then, if all this play with freely created concepts, and in particular the fundamental notion of arbitrary sets, in fact gets us embroiled in contradiction -- as seems to be the case as the set-theoretic paradoxes pile up. What to do?
We might distinguish two kinds of response to the paradoxes that threaten Cantor's paradise where mathematicians can play freely -- what we might suggestively call the \emph{foundationalist} and the \emph{mathematical} lines.\\
\noindent\emph{Foundationalist responses to paradox}\quad Consider first the option of seeking ``foundations''. We could, for example, seek to ``re-ground'' mathematics by confining ourselves again to applicable mathematics which has, as we'd anachronistically put it, a model in the natural world and so \emph{must} be consistent. The trouble is we're none too clear what this would involve -- for remember, we are thinking back to the beginning of the twentieth century, as relativity and quantum mechanics are emerging, and any Newtonian confidence that we had about the structure of the natural world is being shaken. So put the option of founding mathematics in the physical world aside. But perhaps (i) we could try to go back to find incontrovertible logical principles and definitions of mathematical notions in logical terms, and try to constrain mathematics to what we can reconstruct on a firm logical footing. Or, for another line, (ii) we could try to ensure that our mathematical constructions are grounded in mental constructions that we can perform and have secure epistemic access to. %Or (iii) we could try to diagnose a theme common to the problem paradoxical cases -- e.g. ``impredicativity'' -- and secure mathematics by banning such constructions, and founding mathematics on constructions that avoid the danger of impredicativity.
Of course, the trouble is that the logicist response (i) is problematic, not least because (remember where we are in time!) logic itself isn't in as good a shape as most of the mathematics we are supposedly going to use it to ground, and what might count as logic is obscure. Indeed, as Peirce saw, we needed to appeal to mathematically developed ideas in order to develop logic itself; and indeed he thought that all formal logic is merely mathematics applied to logic. The intuitionistic line (ii) depends on an even more obscure notion of mental construction, and in any case -- in its most worked out form -- cripples mathematics. %The predicativist option (iii) is perhaps better, but still implies that swathes of seemingly harmless classical mathematics will have to be abandoned.
So what now? Is there some other philosophically well-motivated foundation that will rescue us from the threat of paradox?\\
\noindent\emph{`Mathematical' responses to paradox}\quad Well, perhaps we shouldn't seek to give mathematics a philosophical ``foundation'' at all. After all, the paradoxes arise within mathematics, and to avoid them we just \ldots need to do mathematics more carefully. As Peirce -- for example -- held, mathematics risks being radically distorted if we seek to make it answerable to some outside considerations (from philosophy or logic). And we don't need to look outside for a prior justification that will guarantee consistency. Rather \emph{we need to improve our mathematical practice, in particular by improving the explicitness of our regimentations of mathematical arguments, to reveal the principles we actually use in `ordinary' mathematics, and to see where the fatal mis-steps must be occurring when we over-stretch these principles in ways that lead to paradox}.
How do we improve explicitness, and pursue more careful explorations? A first step will be to aim to regiment the principles that we actually need in mathematics into something approaching the ideal form of an axiomatized formal theory. This is what Zermelo aimed to do in axiomatizing set theory: to locate the principles actually needed for the seemingly `safe' mathematical constructions needed in grounding classical analysis and other familiar mathematical practice. And when the job is done it seems that \emph{these} principles don't in fact allow the familiar reasoning leading to Russell's Paradox or other set-theoretic paradoxes. So perhaps this axiomatized theory \emph{is} consistent and trouble-free?
Well, to explore this question, the thought goes, requires not `looking outside' mathematics for foundations, but engaging in more mathematical enquiry.\\
\noindent So note, both the foundationalist and the `do maths better' camps are keen on battering a mathematical theory $T$ into a nice tidy axiomatized format and sharply defining the rules of the game. But for a foundationalist logicist this is a step on the way to showing $T$ is \emph{true}. The `do maths better' camp just want axiomatization to be a prelude to giving us enough control over the theory $T$ to be able, e.g., to show that it is consistent.
\subsection{Applying finitary mathematics to formal theories}\label{secwhatisHilbertsProgram}
Now note the key observation emphasized by Hilbert. The axiomatic formalization of a theory $T$ about widgets, wombats, or whatever, gives us \emph{new} formal objects that are themselves apt topics for formal mathematical investigation -- namely the $T$-wffs and $T$-proofs that make up the theory! And, crucially, when we go metatheoretical and move from thinking about \emph{sets} (for example) to thinking about the syntactic properties of \emph{formalized-theories-about-sets}, we move from considering suites of \emph{infinite} objects to considering suites of \emph{finite} formal objects (the wffs, and the finite sequences of wffs that form proofs). This means that we might then hope to bring to bear, at the metatheoretical level, entirely `safe', merely \emph{finitary}, reasoning about these suites of finite formal objects in order to prove consistency, etc.
Of course, it is a moot point what exactly constitutes `safe' finitary reasoning. But still, it looks as if we will -- for instance -- need much, much, less than full set theory to reason about formalized set theory. So we might, in particular, hope with Hilbert to be able to use a safe uncontentious fragment of finitary mathematics to prove that our wildly infinitary set theory is at least syntactically consistent (doesn't prove both $\varphi$ and $\neg\varphi$ for some wff $\varphi$). As we'll see later, the hope can't be fulfilled: still, you can see the attractions of Hilbert's hopeful programme here -- the programme of showing various systems of infinitary mathematics are contradiction-free by giving finitary consistency proofs.
But now, enter G\"odel \ldots
\section{Arithmetization}\label{sec:Arithmetization}
We just noted Hilbert's insight that the syntactic objects that comprise formal theories (the wffs, the proofs) are \emph{finite} objects, and so we only need mathematics of finite objects to theorize about the syntactic properties of theories. But now here comes G\"odel's great insight: \emph{when we are dealing with finite objects, we can associate them with numerical codes: then we can use arithmetic to talk about these codes, and to deal with the arithmetical properties that -- via the coding -- `track' syntactic properties of the theories.}
We'll implement this simple but powerful idea in stages.
\subsection{Coding expressions in a formal language}
We'll concentrate on the particular case of coding up expressions of the language $L_A$ (but you'll see that the same basic idea will work for any formal language). There are of course different ways of doing this: we'll follow G\"odel and tradition in style!
So suppose that our version of $L_A$ has the usual logical symbols (connectives, quantifiers, identity, brackets), and symbols for zero and for the successor, addition and multiplication functions: associate all those with odd numbers (different symbol, different number, of course). $L_A$ also has an inexhaustible supply of variables, which we'll associate with even numbers. So, to pin that down, let's fix on this preliminary series of \emph{basic codes}:
\renewcommand{\arraystretch}{1.5}\renewcommand{\arraycolsep}{1mm}
\vspace{-6pt}\[\begin{array}{cccccccccccccccccccc}
\neg & \land & \lor & \lif & \equiv & \forall
& \exists & = & ( & ) & \mathsf{0} & \mathsf{S} & + & \times & \mathsf{x} & \mathsf{y} & \mathsf{z} & \ldots \\ %\hline
1 & 3 & 5 & 7 & 9 & 11 & 13 & 15 & 17 & 19 & 21 & 23 & 25 & 27 & 2 & 4 & 6 & \ldots
\end{array}\]
Our G\"odelian numbering scheme for expressions is now defined in terms of this table of basic codes as follows:
\begin{defn}
Let expression $e$ be the sequence of $k + 1$ symbols and/or variables $s_0, s_1, s_2, \ldots, s_k$. Then $e$'s \emph{G\"odel\ number} (g.n.) is calculated by taking the basic code-number $c_i$ for each $s_i$ in turn, using $c_i$ as an exponent for the $(i + 1)$-th prime number $\pi_i$, and then multiplying the results, to get
$2^{c_0} \cdot 3^{c_1} \cdot 5^{c_2} \cdot \ldots \cdot \pi_k^{c_k}$.
\end{defn}
\noindent For example:
\begin{enumerate}\renewcommand{\labelenumi}{\roman{enumi}.}
\item The single symbol `$\mathsf{S}$' has the g.n.\ $2^{23}$ (the first prime raised to the appropriate power as read off from our correlation table of basic codes).
\item The standard numeral $\mathsf{SS0}$ has the g.n.\ $2^{23}\cdot 3^{23}\cdot 5^{21}$ (the product of the first three primes raised to the appropriate powers).
\item The wff
\begin{quote}
$
\exists \mathsf{y\,(S0 + y) = SS0}
$
\end{quote}
has the g.n.
\begin{quote}
$2^{13} \cdot 3^{4} \cdot 5^{17} \cdot 7^{23} \cdot 11^{21} \cdot 13^{25} \cdot 17^{4} \cdot 19^{19} \cdot 23^{15} \cdot 29^{23} \cdot 31^{23} \cdot 37^{21}$
\end{quote}
\end{enumerate}
That last number is, of course, \emph{enormous}. So when we say that it is elementary to decode the resulting g.n. by taking the exponents of prime factors, we don't mean that the computation is quick. We mean that the computational routine required for the task -- namely, repeatedly extracting prime factors -- involves no more than the mechanical operations of school-room arithmetic.
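If it helps to see the scheme actually running, here is a quick sketch in Python (my own illustration, not from \emph{IGT}; the ASCII letter `E' stands in for the quantifier symbol `$\exists$', and only the basic codes needed for the examples above are included):

```python
# Godel numbering by prime-power coding, as just defined: raise the
# (i+1)-th prime to the basic code of the i-th symbol, and multiply.
# Basic codes from the table above, with ASCII stand-ins for symbols.
BASIC_CODE = {'E': 13, '=': 15, '(': 17, ')': 19,
              '0': 21, 'S': 23, '+': 25, 'y': 4}

def primes():
    """Generate the primes 2, 3, 5, 7, ... by trial division."""
    n = 2
    while True:
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

def godel_number(expr):
    """Multiply successive primes raised to the codes of the symbols."""
    g = 1
    for p, s in zip(primes(), expr):
        g *= p ** BASIC_CODE[s]
    return g

# Example ii above: the numeral SS0 has g.n. 2^23 . 3^23 . 5^21.
assert godel_number("SS0") == 2**23 * 3**23 * 5**21
```

Running `godel_number` on the wff of example iii (written `Ey(S0+y)=SS0`) reproduces the twelve-factor product displayed above.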
\subsection{Coding sequences}\label{Codsequences}
As well as talking about wffs via their code numbers, we'll want to talk about proofs via \emph{their} code numbers. But how \emph{do} we code for proof-arrays?
The details will obviously depend on the kind of proof system we adopt for the theory we are using. Suppose though, for simplicity, we consider theories with a Hilbert-style axiomatic system of logic. And in this rather old-fashioned framework, proof-arrays are simply \emph{linear sequences} of wffs. A nice way of coding these is by what we'll call \emph{super G\"odel\ numbers}.
\begin{defn}Given a sequence of wffs or other expressions
$e_0, e_1, e_2, \ldots, e_n$,
we first code each $e_i$ by a regular g.n.\ $g_i$, to yield a sequence of numbers $g_0, g_1, g_2, \ldots, g_n$.
We then encode this sequence of regular G\"odel\ numbers using a single \emph{super g.n.} by repeating the trick of multiplying powers of primes to get $2^{g_0} \cdot 3^{g_1} \cdot 5^{g_2} \cdot\,\ldots\,\cdot \pi_n^{g_n}$.
\end{defn}
\noindent Decoding a super g.n. therefore involves two steps of taking prime factors: first find the sequence of exponents of the prime factors of the super g.n.; then treat those exponents as themselves regular g.n., and take their prime factors to arrive back at a sequence of expressions.
\subsection{\emph{Term}, \emph{Wff} and \emph{Sent} are p.r. properties}\label{sec:gdlpr}
For this subsection, we'll continue to focus on the language $L_A$. But similar remarks will apply mutatis mutandis to any sensibly built formal language. We begin with some definitions (see \S\ref{sec:howGprovedFirstTheorem}):
\begin{defn}Having fixed on a scheme of G\"odel numbering, define the following numerical properties:
\begin{enumerate} \setlength{\itemsep}{-0.75ex}\setlength{\parsep}{0ex}
\item $\mathit{Term(n)}$ is to hold when $n$ codes for a term of $L_A$.
\item $\mathit{Wff(n)}$ is to hold when $n$ codes for a wff of $L_A$.
\item $\mathit{Sent(n)}$ is to hold when $n$ codes for a closed sentence of $L_A$.
\end{enumerate}
\end{defn}
\noindent Then we have the following key result:
\begin{theorem}\label{th:prefseqispr}
$\mathit{Term(n)}$, $\mathit{Wff}(n)$, and $\mathit{Sent}(n)$ are p.r. decidable properties.\end{theorem}
\noindent And we have a much easier time of it than G\"odel\ did. Writing at the very beginning of the period when concepts of computation were being forged, he couldn't expect his audience to take anything on trust about what was or wasn't `\emph{rekursiv}' or -- as we would now put it -- primitive recursive. He therefore had to do all the hard work of explicitly showing how to define these properties by a long chain of definitions by composition and recursion.
However, assuming only a very modest familiarity with the ideas of computer programs and p.r. functions, we can perhaps short-cut all that effort and be entirely persuaded by the following:
\begin{proof}To determine whether $\mathit{Term}(n)$, proceed as follows. Decode $n$: that's a mechanical exercise. Now ask: is the resulting expression a term? That is to say, is it `$\mathsf{0}$', a variable, or built up from `$\mathsf{0}$' and/or variables using just the successor, addition and multiplication functions? That's algorithmically decidable. The length of the first decoding stage of the computation will be bounded by a simple function of the length of $n$: similarly for the second stage of the computation, deciding whether the decoded expression -- if there is one -- is a term. Neither stage will involve any open-ended search. Of course, these computations involve shuffling strings of symbols; but -- run on a real computer -- those will in effect become computations done on binary numbers. And if the whole computation can therefore be done ultimately without unbounded searches, using only `for' loops operating on numbers, the numerical properties and relations which are decided by the whole procedure must be primitive recursive.
Similarly we can mechanically decide whether $\mathit{Wff}(n)$ or $\mathit{Sent}(n)$ holds. Decode $n$ again. Now ask: is the result a wff or a sentence of $L_A$? In each case, that's algorithmically decidable, without any open-ended searches. And again, what's computably decidable without open-ended searches is primitive-recursively decidable.
\end{proof}
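To see concretely why no open-ended search is needed, here is a toy Python checker for a simplified term syntax (my illustration, not G\"odel's construction: terms are `0', the variables `x', `y', `z', an `S' prefixed to a term, or two terms joined by `+' or `*' inside brackets). Each recursive call consumes at least one symbol, so the whole computation is bounded by the length of the input:

```python
def is_term(s):
    """Decide whether the string s is a term of the toy syntax.
    The check is bounded by len(s): no open-ended search."""
    ok, rest = _term(s)
    return ok and rest == ''

def _term(s):
    """Try to read one term from the front of s; return (success, remainder)."""
    if s.startswith('0'):
        return True, s[1:]
    if s[:1] in ('x', 'y', 'z'):
        return True, s[1:]
    if s.startswith('S'):
        return _term(s[1:])
    if s.startswith('('):
        ok1, rest = _term(s[1:])
        if ok1 and rest[:1] in ('+', '*'):
            ok2, rest2 = _term(rest[1:])
            if ok2 and rest2.startswith(')'):
                return True, rest2[1:]
    return False, s
```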
\subsection{\emph{Prf} is a p.r. relation}\label{sec:gdlpr}
In this subsection, we'll focus on the theory $\mathsf{PA}$. But again, similar remarks will apply mutatis mutandis to any sensibly axiomatized theory.
We introduce another definition:
\begin{defn}
Again having fixed on a scheme of G\"odel numbering, the numerical relation $\mathit{Prf}(m, n)$ is defined to hold when $m$ is the super g.n. of a proof in $\mathsf{PA}$ of the sentence with g.n. $n$.
\end{defn}
\noindent We have, as you might expect, a corresponding theorem:
\begin{theorem}
$\mathit{Prf}(m, n)$ is a p.r. relation.
\end{theorem}
\noindent Again we'll give an informal argument:
\begin{proof}
To determine whether $\mathit{Prf}(m, n)$, proceed as follows. First doubly decode $m$: that's a mechanical exercise. Now ask: is the result a sequence of $\mathsf{PA}$ wffs? That's algorithmically decidable (since it is decidable whether each separate string of symbols is a wff). If it does decode into a sequence of wffs, ask: is this sequence a properly constructed $\mathsf{PA}$ proof? That's decidable too (check whether each wff in the sequence is either an axiom or is an immediate consequence of previous wffs by one of the rules of inference of $\mathsf{PA}$'s Hilbert-style logical system). If the sequence is a proof, ask: does its final wff have the g.n.\ $n$? That's again decidable. Finally ask whether $\mathit{Sent}(n)$ is true. Putting all that together, there is a computational procedure for telling whether $\mathit{Prf}(m, n)$ holds. Moreover, at each and every stage, the computation involved is once more a straightforward, bounded procedure that doesn't involve any open-ended search. \end{proof}
\noindent A similar result, as we said, will hold for the corresponding $\mathit{Prf}_T$ relation for any theory $T$ which is like $\mathsf{PA}$ in this respect: we can mechanically check whether a string of wffs constitutes a $T$-proof without having to engage in any open-ended search. Any normally presented formalized theory will be like this. (In fact, we could reasonably have built that requirement into a sharper version of Defn.~\ref{formal_theory}.)
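The shape of the proof-checking computation can be sketched for a toy Hilbert-style system whose only rule is modus ponens (a hedged illustration of mine: wffs are represented as tuples, with \texttt{('->', p, q)} for a conditional, and \texttt{is\_axiom} is whatever bounded axiom-test the theory supplies). Every loop is bounded by the length of the proof, which is the heart of $\mathit{Prf}$ being p.r.:

```python
def is_proof(lines, is_axiom, target):
    """Check whether a sequence of wffs is a proof of `target`:
    each line must be an axiom or follow from two earlier lines by
    modus ponens, and the last line must be the target sentence."""
    for i, w in enumerate(lines):
        if is_axiom(w):
            continue
        # modus ponens: earlier lines must include phi and ('->', phi, w)
        if not any(lines[j] == ('->', lines[k], w)
                   for j in range(i) for k in range(i)):
            return False
    return bool(lines) and lines[-1] == target
```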
\subsection{Our results are robust!}
We've just shown that some properties and relations like $\mathit{Term}$ and $\mathit{Prf}$ are p.r. decidable. But note, $\mathit{Term}(n)$ -- for example -- is to hold when $n$ is the code number of a term of $L_A$ \emph{according to our G\"odel\ numbering scheme}. However, our numbering scheme was fairly arbitrarily chosen. We could, for example, shuffle around the preliminary assignment of basic codes to get a different numbering scheme; or (more radically) we could use a scheme that isn't based on powers of primes. So could it be that a property like $\mathit{Term}$ is p.r. when defined in terms of our arbitrarily chosen numbering scheme and not p.r. when defined in terms of some alternative but equally sensible scheme?
Well, what counts as `sensible' here? The key feature of our G\"odelian scheme is this: there is a pair of \emph{algorithms}, one of which takes us from an $L_A$ expression to its code number, the other of which takes us back again from the code number to the original expression -- and moreover, in following through these algorithms, the length of the computation is a simple function of the length of the $L_A$ expression to be encoded or the size of the number to be decoded. The algorithms don't involve open-ended computations using unbounded searches: in other words, the computations can be done just using `for' loops.
So let $S$ be any other comparable coding scheme, which similarly involves a pair of algorithmic methods for moving to and fro between $L_A$ expressions and numerical codes (where the methods don't involve open-ended searches). And suppose $S$ assigns code $n_1$ to a certain $L_A$ expression. Consider the process of first decoding $n_1$ to find the original $L_A$ expression and then re-encoding the expression using our G\"odelian scheme to get the code number $n_2$ (strictly, we need to build in a way of handling the `waste' cases where $n_1$ isn't an $S$-code for any wff). By hypothesis, this process will combine two simple computations which just use `for' loops. Hence, there will be a \emph{primitive recursive} function which maps $n_1$ to $n_2$. Similarly, there will be another p.r. function which maps $n_2$ back to $n_1$.
Let's say:
\begin{defn} A coding scheme $S$ for $L_A$ mapping expressions to numbers is \emph{acceptable} iff there is a p.r. function $tr$ which `translates' code numbers according to $S$ into code numbers under our official G\"odelian scheme, and another p.r. function $tr^{-1}$ which converts code numbers under our scheme back into code numbers under scheme $S$.
\end{defn}
\noindent Then we've just argued that being acceptable in this sense is at least a \emph{necessary} condition for being an intuitively `sensible' numbering scheme.
We immediately have
\begin{theorem} A property like $\mathit{Term}$ defined using our official G\"odelian coding scheme is p.r. if and only if the corresponding property $\mathit{Term_S}$ defined using scheme $S$ is p.r., for any acceptable scheme $S$.
\end{theorem}
\begin{proof}
Let the characteristic functions of $\mathit{Term}$ and $\mathit{Term_S}$ be $\mathit{term}$ and $\mathit{term_S}$ respectively. Then $\mathit{term_S}(n) = \mathit{term(tr(n))}$, hence $\mathit{term_S}$ will be p.r. by composition so long as $\mathit{term}$ is p.r.; and similarly $\mathit{term}(n) = \mathit{term_S(tr^{-1}(n))}$, hence $\mathit{term}$ is p.r. if $\mathit{term_S}$ is. So, in sum, $\mathit{Term}$ is p.r. iff $\mathit{Term}_S$ is p.r.: the property's status as p.r. is \emph{not} dependent on any particular choice of coding scheme (so long as it is acceptable).\end{proof}\noindent In sum, our result that $\mathit{Term}$ is p.r. is robust with respect to any sensible choice of coding scheme: similarly with the other results, in particular the result about $\mathit{Prf}$.
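Here is a concrete -- and of course merely illustrative -- instance of such a translation function $tr$: the rival scheme $S$ reads an expression's bytes as a base-256 number, while the `official' scheme multiplies prime powers of basic symbol codes. The particular codes, like the schemes themselves, are just assumptions for the example; the point is that decoding and re-encoding are both bounded computations:

```python
def primes_upto(k):
    """The first k primes, by trial division."""
    ps, n = [], 1
    while len(ps) < k:
        n += 1
        if all(n % p for p in ps):
            ps.append(n)
    return ps

def encode_official(expr, codes):
    """'Official' scheme: product of successive primes raised to basic codes."""
    g = 1
    for p, sym in zip(primes_upto(len(expr)), expr):
        g *= p ** codes[sym]
    return g

def decode_S(n):
    """Rival scheme S: the expression's bytes read as a base-256 number."""
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()

def tr(n, codes):
    """Translate an S-code into an official code: decode, then re-encode.
    Both steps are bounded computations, so tr is primitive recursive."""
    return encode_official(decode_S(n), codes)
```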
\section{Some cute notation}\label{cornernotation}
There's one other main idea we need in the next episode, which we'll introduce in this one, namely the idea of `diagonalization'. But before explaining that, let's pause to introduce a really pretty bit of notation.
Assume we have chosen some system for G\"odel-numbering the expressions of some language $L$. Then
\begin{defn}
If $\varphi$ is an $L$-expression, then we'll use `\/$\ulcorner\varphi\urcorner$' \emph{in our logicians' augmented English} to denote $\varphi$'s G\"odel\ number.
\end{defn}\index{070@$\ulcorner\ \ \urcorner$}
\noindent Borrowing a species of quotation mark is appropriate because the number $\ulcorner \varphi\urcorner$ can be thought of as referring to the expression $\varphi$ via our coding scheme. (Sometimes, we'll write the likes of $\ulcorner\mathsf{U}\urcorner$ where $\mathsf{U}$ abbreviates an $L_A$ wff: we mean here, of course, the G\"odel-number for the unabbreviated original wff that $\mathsf{U}$ stands in for.)
So far so good. But in the book, I use this very same notation also to stand in for standard numerals inside our formal language, so that (in our second usage), in abbreviated $L$-expressions, `$\mathsf{\ulcorner\varphi\urcorner}$' {is shorthand for $L$'s standard numeral for the g.n. of} $\varphi$. In other words, inside formal expressions `$\ulcorner\varphi\urcorner$' stands in for the numeral for the number $\ulcorner\varphi\urcorner$.
A simple example to illustrate:
\begin{enumerate}
\item `$\mathsf{SS0}$' is an $L_A$ expression, the standard numeral for 2.
\item On our numbering scheme $\ulcorner\mathsf{SS0}\urcorner$, the g.n. of `$\mathsf{SS0}$', is $2^{21}\cdot3^{21}\cdot5^{19}$.
\item So, by our further convention in the book, we can also use the expression `$\ulcorner\mathsf{SS0}\urcorner$' inside (a definitional extension of) $L_A$, as an abbreviation for the standard numeral for that g.n., i.e. as an abbreviation for `$\mathsf{SSS\ldots S0}$' with $2^{21}\cdot3^{21}\cdot5^{19}$ occurrences of `$\mathsf{S}$'!
\end{enumerate}
This double usage -- outside a formal language to denote a g.n. of a formal expression {and} inside a formal language to take the place of a standard numeral for that g.n. -- should by this stage cause no confusion at all. Alternatively, I could have used in the book the common practice of always overlining abbreviations for standard numerals: we would then indicate the numeral for the g.n. ${\ulcorner\mathsf{SS0}\urcorner}$ by the slightly messy `$\overline{\ulcorner\mathsf{SS0}\urcorner}$'. Many writers do this. But I thought that aesthetics recommended my fairly common and rather prettier convention.
I still think that was a reasonable decision: but in these notes, in the interests of maximal clarity -- if only as a helpful ladder that you can throw away once climbed! -- I \emph{will} here use the clumsier notation. So, to avoid any possible misunderstandings, we'll adopt:
\begin{defn}
{Used in $L$-expressions}, `\/$\overline{\mathsf{\ulcorner\varphi\urcorner}}$' {is shorthand for $L$'s standard numeral for the g.n. of} $\varphi$.
\end{defn}
\noindent So: naked corner quotes belong to augmented English; overlined corner quotes are an abbreviatory device in the relevant formal language $L$.
\section{The idea of diagonalization}\label{sec:ideaofdiagonal}
In the next Episode, G\"odel\ is going to tell us how to construct a wff $\mathsf{G}$ in $\mathsf{PA}$\ that is true if and only if it is unprovable in $\mathsf{PA}$. We now have an inkling of how he can do that: wffs can contain numerals which refer to numbers which -- via G\"odel\ coding -- are correlated with wffs.
G\"odel's construction will involve taking an open wff that we'll abbreviate $\mathsf{U}$, or by $\mathsf{U(y)}$ when we want to emphasize that it contains just `$\mathsf{y}$' free. This wff has g.n.\ $\ulcorner\mathsf{U}\urcorner$. And then -- the crucial move -- G\"odel\ substitutes \emph{the numeral for $\mathsf{U}$'s g.n.} for the free variable in $\mathsf{U}$. So the key step involves forming the wff $\mathsf{U(\overline{\ulcorner\mathsf{U}\urcorner})}$.
This substitution operation is called \emph{diagonalization}, which at first sight might seem an odd term for it. But in fact, G\"odel's construction involves something quite closely akin to the `diagonal' construction we encountered in \S\ref{sec:sffstrongundecidable}, where we matched the index of a wff $\varphi_n(\mathsf{x})$ (in an enumeration of wffs with one free variable) with the numeral substituted for its free variable, to form $\varphi_n(\mathsf{\overline{n}})$. Here, in our G\"odelian diagonal construction, we match $\mathsf{U}$'s G\"odel\ number -- and we can think of this as indexing the wff in a list of wffs -- with the numeral substituted for its free variable, and this will yield the G\"odel\ sentence $\mathsf{U(\overline{\ulcorner\mathsf{U}\urcorner})}$. %{Hence the standard claim that G\"odel 's construction again involves a type of }{diagonalization}.
Now note the following additional point. Given the wff $\mathsf{U}$, it can't matter much whether we do the G\"odelian construction by forming (i)~$\mathsf{U(\overline{\ulcorner\mathsf{U}\urcorner})}$ (as G\"odel\ himself did in 1931) or alternatively by forming (ii) $\mathsf{\exists y(y = \overline{\ulcorner\mathsf{U}\urcorner} \land U(y))}$. For (i) and (ii) are trivially equivalent. But in fact it makes a few technical details go slightly more smoothly if we do things the second way -- so that motivates our official definition in \emph{IGT}:\begin{restatable}{defn}{defdiagonal}
The \emph{{diagonalization}} of $\varphi$ is $\mathsf{\exists y(y = \overline{\ulcorner\varphi\urcorner} \;\land\; \varphi)}$.
\end{restatable}
\noindent It should go without saying that there is no special significance to using the variable `$\mathsf{y}$' for the relevant variable here! But we'll keep this choice fixed, simply for convenience.
Diagonalization is, evidently, a very simple mechanical operation on expressions.
%So there will be a corresponding simple computable function dealing with \mbox{numerical} codes for expressions which `tracks' the operation.
In fact,
\begin{restatable}{theorem}{tmdiagispr}\label{th:diagispr}
There is a p.r. function $\mathit{diag}(n)$ which, when applied to a number $n$ which is the g.n. of some wff, yields the g.n. of that wff's diagonalization.
\end{restatable}
\begin{proof}Consider this procedure. Decode the g.n.\ $n = \ulcorner\varphi\urcorner$ to get some expression $\varphi$ (assume we have some convention for dealing with `waste' cases where we don't get an expression). Then form $\varphi$'s diagonalization, $\mathsf{\exists y(y = \overline{\ulcorner\varphi\urcorner} \;\land\; \varphi)}$. Then work out the g.n. of this result to compute $\mathit{diag}(n)$. This procedure doesn't involve any unbounded searches. So we will again be able to program the procedure using just `for' loops. Hence $\mathit{diag}$ is a p.r. function.\end{proof}
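The procedure in this proof can be mimicked in a few lines of Python (a toy sketch of mine: the coding here reads a string's bytes as a number, which is `acceptable' in the sense of the previous section, and -- since the official numeral $\mathsf{SS\ldots S0}$ would be absurdly long -- the decimal digits of $n$ stand in for it, with `E' standing in for the existential quantifier and `\&' for conjunction):

```python
def encode(expr):
    """Toy acceptable coding: the string's bytes read as a base-256 number."""
    return int.from_bytes(expr.encode(), 'big')

def decode(n):
    """Invert the toy coding."""
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()

def diag(n):
    """Numerical diag: decode n, form the diagonalization Ey(y = <numeral> & phi),
    then re-encode. No step involves an unbounded search."""
    phi = decode(n)
    numeral = str(n)  # stand-in for the official numeral SS...S0
    return encode('Ey(y=' + numeral + '&' + phi + ')')
```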
\vspace{8pt}\noindent And with those preliminaries in the bag, we can now -- at last -- turn to proving G\"odel's First Incompleteness Theorem. (Meanwhile, read Ch. 15 of \emph{IGT}.)
\newpage
%\setcounter{section}{0}
\setcounter{page}{1}
\setcounter{footnote}{0}
\begin{center}%
{{\Large \emph{G\"odel Without (Too Many) Tears -- 8}}\\[16pt]{\LARGE The First Incompleteness Theorem} \par%
\vskip 1.5em%
{\large
\lineskip .75em%
\begin{tabular}[t]{c}%
Peter Smith
\end{tabular}\par}}%
\vskip 0.75em%
{University of Canterbury, Christchurch, NZ}\\[6pt]
{March 23, 2010}%
\vskip 1.5em%
\end{center}%\par
\noindent\hrulefill
\begin{itemize}\setlength{\itemsep}{0pt}
\item How to construct a `canonical' G\"odel sentence
\item If $\mathsf{PA}$ is sound, it is negation incomplete
\item Generalizing that result to sound p.r. axiomatized theories whose language extends $L_A$
\item $\omega$-incompleteness, $\omega$-inconsistency
\item If $\mathsf{PA}$ is $\omega$-consistent, it is negation incomplete
\item Generalizing that result to $\omega$-consistent p.r. axiomatized theories which extend $\mathsf{Q}$
\item The historical First Theorem
\end{itemize}
\noindent\hrulefill
\vspace{8pt}\noindent The pieces we need to prove the First Theorem are finally all in place. So in this episode we at long last learn how to construct `G\"odel sentences' and use them to prove that $\mathsf{PA}$ is {incomplete}. We also show how to generalize the result to other theories.
Let's quickly review the background that needs to be in place for the arguments to come. You need to understand the following:
\begin{enumerate}\renewcommand{\labelenumi}{\roman{enumi}.}
\item We can fix on some acceptable scheme for coding up wffs of $\mathsf{PA}$'s language $L_A$ by using G\"odel numbers (`g.n.' for short), and coding up $\mathsf{PA}$-proofs -- i.e. sequences or other arrays of wffs -- by super G\"odel numbers. Similarly later for coding up wffs and proofs of other theories. (\S\ref{sec:Arithmetization})
\item Notation: If $\varphi$ is an expression, then we'll denote its G\"odel number in our logician's English by `$\ulcorner{\varphi}\urcorner$'. We use `$\overline{\ulcorner{\varphi}\urcorner}$' as an abbreviation inside $L_A$ for the standard numeral for $\ulcorner{\varphi}\urcorner$. Note that later, when we start generalizing G\"odel's results to other theories, we'll use the same notation for G\"odel numberings of other languages. (\S\ref{cornernotation})
\item The {diagonalization} of $\varphi$ is $\mathsf{\exists y(y =\overline{ \ulcorner\varphi\urcorner} \;\land\; \varphi)}$. The diagonalization of $\varphi(\mathsf{y})$ is thus equivalent to $\varphi(\mathsf{\overline{\ulcorner\varphi\urcorner}})$. (\S\ref{sec:ideaofdiagonal})
\item $\mathit{diag}(n)$ is the p.r. function which, when applied to a number $n$ which is the g.n. of some wff $\varphi$, yields the g.n. of $\varphi$'s diagonalization. (\S\ref{sec:ideaofdiagonal})
\item $\mathit{Prf}(m, n)$ is the relation which holds just if $m$ is the {super} g.n. of a sequence of wffs that is a $\mathsf{PA}$ proof of a sentence with g.n.~$n$ (assume we've fixed on some definite version of $\mathsf{PA}$). This relation is p.r. decidable. (\S\ref{sec:gdlpr})
%\item $\mathit{Gdl}(m, n)$ holds just when $\mathit{Prf}(m, diag(n))$, i.e. just when $m$ is the super g.n. for a $\mathsf{PA}$\ proof of the diagonalization of the wff with g.n.\ $n$: $\mathit{Gdl}$ is also p.r. (Section~\ref{diagispr})
\item Any p.r. function or relation can be \emph{expressed} by a wff of $\mathsf{PA}$'s language $L_{A}$. In particular, we can choose a $\Sigma_1$ wff which `\emph{canonically}' expresses a given p.r. relation by recapitulating its p.r. definition (or more strictly, by recapitulating the definition of the relation's characteristic function). (\S\ref{sec:expressingprfunctions})
\item Any p.r. function or relation can be \emph{captured} in $\mathsf{Q}$ and hence in $\mathsf{PA}$\ (and captured by a $\Sigma_1$ wff which canonically expresses it). (\S\ref{sec:QcancapturePRfunctions})
\end{enumerate}
For what follows, it isn't necessary that you remember the \emph{proofs} of the claims we've just summarized: but do check that you at least fully understand what the various claims \emph{say}.
\section{Constructing a G\"odel sentence}\label{constr}
In this section, we construct a G\"odel sentence for $\mathsf{PA}$ in particular. But the mode of construction will evidently generalize -- a point we return to in the next section. First, another definition:
\begin{defn}
The relation $\mathit{Gdl}(m, n)$ is defined to hold just when $m$ is the super g.n. for a $\mathsf{PA}$ proof of the diagonalization of the wff with g.n.\ $n$.
\end{defn}
\begin{theorem}
$\mathit{Gdl}(m, n)$ is p.r. decidable.
\end{theorem}
\begin{proof}
Either we can informally note that we can mechanically check whether $\mathit{Gdl}(m, n)$ holds without open-ended searches.
Or we can note that $\mathit{Gdl}(m, n)$ holds, by definition, when $\mathit{Prf}(m, diag(n))$. The characteristic function of $\mathit{Gdl}$ is therefore definable by composition from the characteristic function of $\mathit{Prf}$ and the function $\mathit{diag}$, and hence is p.r., given facts (iv) and (v) from the preamble.
\end{proof}
So $\mathit{Gdl}$ can be expressed in $L_A$ by a $\Sigma_1$ wff (by fact vi), which in fact captures $\mathit{Gdl}$ in $\mathsf{PA}$ (by fact vii). Of course there won't be a unique such $\Sigma_1$ wff. For a start, there will be more than one way of constructing a full definition of the characteristic function of the p.r. relation $\mathit{Gdl}$, so more than one way of tracking such a definition. But we'll adopt the following definition:
\begin{defn}\label{def:Gldasawff}
$\mathsf{Gdl}(\mathsf{x,y})$ stands in for some $\Sigma_1$ wff which canonically expresses and captures $\mathit{Gdl}$.
\end{defn}
\noindent And we next follow G\"odel in first constructing the corresponding wff
\begin{defn}
$\mathsf{U(y)} =_{\mathrm{def}}\forall\mathsf{ x \neg\mathsf{Gdl}(x, y)}$.
\end{defn}
\noindent (For reasons that will become clear in just a moment, you can think of that $\mathsf{U}$ as standing for `unprovable'.) And now we diagonalize $\mathsf{U}$, to give
\begin{defn}\label{def:godelsent}
${\mathsf{G}} =_{\mathrm{def}} \mathsf{\exists y(y = \overline{\ulcorner\mathsf{U}\urcorner} \land U(y))}$.
\end{defn}
\noindent Trivially, $\mathsf{G}$ is equivalent to $\mathsf{U(\overline{\ulcorner\mathsf{U}\urcorner})}$. Or unpacking that a bit,
$\mathsf{G}$ is equivalent to $\forall\mathsf{ x \neg\mathsf{Gdl}(x, \overline{\ulcorner\mathsf{U}\urcorner})}$.
$\mathsf{G}$ -- meaning of course the $L_A$ sentence you get when you unpack the abbreviations! -- is our `G\"odel sentence' for $\mathsf{PA}$. We might indeed call it a \emph{canonical} G\"odel sentence for three reasons: (a) it is defined in terms of a wff that we said canonically expresses/captures $\mathit{Gdl}$; (b) it is roughly the sort of sentence that G\"odel himself constructed; so (c) it is the kind of sentence people standardly have in mind when they talk of `\emph{the}' G\"odel sentence for $\mathsf{PA}$.
Note that $\mathsf{G}$ will be horribly long when spelt out in unabbreviated $L_A$. But in another way, it is relatively simple. In the terminology of \S\ref{sec:qcomplexity}, we have the easy result that
\begin{theorem}
{$\mathsf{G}$ is $\Pi_1$}.
\end{theorem}
%\gap \Proof{} $\mathsf{Gdl}\mathsf{(x, y)}$ is $\Sigma_1$, and it expresses a p.r. relation. So $\mathsf{Gdl}\mathsf{(x, \ulcorner U\urcorner)}$ is also $\Sigma_1$, and expresses a p.r. property. So its negation $\neg\mathsf{Gdl}\mathsf{(x, \ulcorner U\urcorner)}$ is $\Pi_1$: and that also expresses a p.r. property, since the negation of a p.r. property is still p.r. (see Section~\ref{sec:prexamples}). Hence $\forall \mathsf{x}\neg\mathsf{Gdl}\mathsf{(x, \ulcorner U\urcorner)}$ is $\Pi_1$ too, and is a universal generalization about a p.r. property. {Its logical equivalent $\mathsf{G}$ is therefore also $\Pi_1$}. \sketch
%\footnote{For a reminder of all the jargon here, consult Section~\ref{sec:definingtheSigmawffs} again.}
\begin{proof}$\mathsf{Gdl}\mathsf{(x, y)}$ is $\Sigma_1$. So $\mathsf{Gdl}\mathsf{(x, \overline{\ulcorner U\urcorner})}$ is $\Sigma_1$. So its negation $\neg\mathsf{Gdl}\mathsf{(x, \overline{\ulcorner U\urcorner})}$ is $\Pi_1$. Hence $\forall \mathsf{x}\neg\mathsf{Gdl}\mathsf{(x, \overline{\ulcorner U\urcorner})}$ is $\Pi_1$ too. {Its logical equivalent $\mathsf{G}$ is therefore also $\Pi_1$}. \end{proof}
And now the key observation:
\begin{theorem}\label{th:Gistrueiffunprovable}
$\mathsf{G}$ is true if and only if it is unprovable in $\mathsf{PA}$.
\end{theorem}
\begin{proof} Consider what it takes for $\mathsf{G}$ to be true (on the interpretation built into $L_A$ of course), given that the formal predicate $\mathsf{Gdl}$ expresses the numerical relation $\mathit{Gdl}$.
$\mathsf{G}$ is true if and only if for all numbers $m$ it isn't the case that $\mathit{Gdl}(m, {\ulcorner\mathsf{U}\urcorner})$. That is to say, given the definition of $\mathit{Gdl}$, $\mathsf{G}$ is true if and only if there is no number $m$ such that $m$ is the code number for a $\mathsf{PA}$ proof of the diagonalization of the wff with g.n.\ $\ulcorner\mathsf{U}\urcorner$. But the wff with g.n.\ $\ulcorner\mathsf{U}\urcorner$ is of course $\mathsf{U}$; and its diagonalization is $\mathsf{G}$.
So, $\mathsf{G}$ is true if and only if there is no number $m$ such that $m$ is the code number for a $\mathsf{PA}$ proof of $\mathsf{G}$. But if $\mathsf{G}$ is provable, some number would be the code number of a proof of it. Hence $\mathsf{G}$ is true if and only if it is unprovable in $\mathsf{PA}$.
\end{proof}
\section{The First Theorem -- the semantic version}
\subsection{If $\mathsf{PA}$ is sound, it is incomplete}
Suppose $\mathsf{PA}$ is a sound theory, i.e. it proves no falsehoods (because its axioms are true and its logic is truth-preserving).
If $\mathsf{G}$ (which is true if and only if it is \emph{not} provable) could be proved in $\mathsf{PA}$, then $\mathsf{PA}$ \emph{would} prove a false theorem, contradicting our supposition. Hence, $\mathsf{G}$ is not provable in $\mathsf{PA}$.
But that shows that $\mathsf{G}$ \emph{is} true. So $\neg \mathsf{G}$ must be false. Hence $\neg \mathsf{G}$ cannot be proved in $\mathsf{PA}$ either, supposing $\mathsf{PA}$ is sound. In G\"odel's words, $\mathsf{G}$ is a `formally undecidable' sentence of $\mathsf{PA}$ (see Defn.~\ref{def:formallyundecidable}).
Which establishes
\begin{theorem}
If $\mathsf{PA}$ is sound, then there is a true $\Pi_1$ sentence $\mathsf{G}$ such that $\mathsf{PA} \nvdash \mathsf{G}$ and $\mathsf{PA} \nvdash \neg \mathsf{G}$, so $\mathsf{PA}$ is negation incomplete.
\end{theorem}
\noindent If we are happy with the semantic assumption that $\mathsf{PA}$'s axioms \emph{are} true on interpretation and so $\mathsf{PA}$ \emph{is} sound, the argument for incompleteness is as simple as that -- or at least, it's that simple once we have constructed $\mathsf{G}$. %And note that, unlike Theorems~\ref{th:firstversionsoundincomp} and~\ref{ssunde}, our new Theorem is proved constructively; in other words, our overall argument doesn't just make the bald existence claim that there is a formally undecidable sentence of \PA, it actually tells us how to construct one (construct a wff $\mathsf{Gdl}$ which expresses $\mathit{Gdl}$, then construct the corresponding $\mathsf{G}$).
%For reasons that will become clearer when we consider Hilbert's programme and related background in a later Interlude, it was very important to \gd\ that incompleteness can \emph{also} be proved \emph{without} supposing that \PA\ is sound: as he puts it, `purely formal and much weaker assumptions' suffice. However, the further argument that shows this is a little trickier.\footnote{Especially when we move on to consider Rosser's enhanced version of \gd's argument in Section~\ref{rosser}, which is needed to get the best non-semantic analogue for Theorem~\ref{thm:semanticgodel}.} \emph{So don't lose sight of \gd's simple `semantic' argument for incompleteness}.
\subsection{Generalizing the proof}
The proof evidently generalizes. Suppose $T$ is any theory at all that is put together so that we can mechanically check whether a purported $T$-proof is indeed a kosher proof without going off on an open-ended search. Then, assuming a sensible scheme for G\"odel-numbering wffs of $T$, the relation $\mathit{Prf}_T(m,n)$ which holds when $m$ numbers a proof of the wff with number $n$ will again be primitive recursive. Let's say that a theory is \emph{p.r. axiomatized} when it is indeed axiomatized so as to make $\mathit{Prf}_T$ primitive recursive: then indeed any normal theory you dream up which is formally axiomatized is p.r. axiomatized.
Suppose now that $T$'s language includes the language of basic arithmetic, $L_A$ (see Defn.~\ref{def:langcontainsbasicarith}), so $T$ can form standard numerals, and we can form the diagonalization of a $T$-wff. Then we can also define the relation $\mathit{Gdl}_T(m,n)$ which holds when $m$ numbers a $T$-proof of the diagonalization of the wff with number $n$. This too will again be primitive recursive.
Continuing to suppose that $T$'s language includes the language of basic arithmetic, $T$ will be able to express the p.r. relation $\mathit{Gdl}_T$ by a $\Sigma_1$ wff $\mathsf{Gdl}_T$. Then, just as we did for $\mathsf{PA}$, we'll be able to construct the corresponding $\Pi_1$ wff $\mathsf{G}_T$. And then exactly the same argument as before will show, more generally,
\begin{theorem}\label{thm:firsttheoremsemantic}
If $T$ is a sound p.r. axiomatized theory whose language contains the language of basic arithmetic, then there will be a true $\Pi_1$ sentence $\mathsf{G}_T$ such that $T \nvdash \mathsf{G}_T$ and $T \nvdash \neg \mathsf{G}_T$, so
$T$ is negation incomplete.
\end{theorem}
\noindent Which is our first, `semantic', version of the general Incompleteness Theorem!
\subsection{Comparisons}\label{subsec:comparisons}
Compare Theorem~\ref{thm:firsttheoremsemantic} with our initially announced
\Godelsemantic*
\noindent Our new theorem is stronger in one respect, weaker in another. But the gain is much more than the loss.
Our new theorem is stronger, because it tells us more about the character of the undecidable G\"odel sentence -- namely it has minimal quantifier complexity. The unprovable sentence $\mathsf{G}_T$ is a $\Pi_1$ sentence of arithmetic, i.e. is the universal quantification of a decidable condition. As far as quantifier complexity is concerned, it is on a par with Goldbach's conjecture that every number is such that, if even and greater than two, it is the sum of two primes (for note it is decidable whether a number is the sum of two primes). Indeed it is sometimes said that a G\"odel sentence like $\mathsf{G}_T$ is \emph{of Goldbach type}.
Our new theorem is weaker, however, as it only applies to p.r. axiomatized theories, not to formalized theories more generally. But that's not much loss. For what would a theory look like that was axiomatized but not p.r. axiomatized? It would be a matter, for example, of only being able to tell what's an axiom on the basis of an open-ended search: but that would require a \emph{very} unnatural way of specifying the theory's axioms in the first place. As I noted before, any normally presented axiomatized theory will be p.r. axiomatized. (Later, in Episode 12, we will say something about how to extend G\"odel's theorem to cover the case of abnormally though still decidably axiomatized theories -- but that really is a minor extension.)
\subsection{Our Incompleteness Theorem is better called an \emph{incompletability} theorem}
Here, we just repeat the argument of \S\ref{sec:bettercalledincompleteability}: but the point is central enough to bear repetition. Suppose $T$ is a sound p.r. axiomatized theory which can express claims of basic arithmetic. Then by Theorem~\ref{thm:firsttheoremsemantic} we can find a true $\mathsf{G}_T$ such that $T \nvdash \mathsf{G}_T$ and $T \nvdash \neg \mathsf{G}_T$. That \emph{doesn't} mean that $\mathsf{G}_T$ is `absolutely unprovable' in any sense: it just means that $\mathsf{G}_T$-is-unprovable-in-$T$.
Now, we might want to `repair the gap' in $T$ by adding $\mathsf{G}_T$ as a new axiom. So consider the theory $T' = T + \mathsf{G}_T$. Then (i) $T'$ is still sound (for the old $T$-axioms are true, the added new axiom is true, and the logic is still truth-preserving). (ii) $T'$ is still a p.r. axiomatized theory, since adding a specified axiom to $T$ doesn't commit us to any open-ended searches to determine what is an axiom of the augmented theory. (iii) We haven't changed the language. So our Incompleteness Theorem applies, and we can find a sentence $\mathsf{G}_{T'}$ such that $T' \nvdash \mathsf{G}_{T'}$ and $T' \nvdash \neg\mathsf{G}_{T'}$. And since $T'$ is stronger than $T$, we have a fortiori, $T \nvdash \mathsf{G}_{T'}$ and $T \nvdash \neg \mathsf{G}_{T'}$. In other words, `repairing the gap' in $T$ by adding $\mathsf{G}_T$ as a new axiom leaves some other sentences that are undecidable in $T$ \emph{still} undecidable in the augmented theory.
And so it goes. Our theorem tells us that if we keep chucking more and more additional true axioms at $T$, our theory will still remain negation-incomplete, unless it either stops being sound or stops being p.r. axiomatized. In a good sense, $T$ is \emph{incompletable}.\\
\noindent Now do pause here: have a think, have a coffee! Are you absolutely clear about how $\mathsf{G}$ is constructed? Are you absolutely clear why it is true if and only if unprovable? Do you understand why it must be formally undecidable assuming $\mathsf{PA}$ is sound? Do you understand how and why the result generalizes?
If you answer `no' to any of those, re-read more carefully! If you answer `yes' to all, excellent: on we go \ldots
\section{$\omega$-completeness, $\omega$-consistency}
Before we turn to the second version of the First Incompleteness Theorem -- the version that downgrades the semantic assumption that we're dealing with a sound theory to the much weaker syntactic assumption that the theory is consistent (and a bit more) -- we need to pause to define two key notions.
Techie note: in this section, take the quantifiers mentioned to be arithmetical ones -- if necessary, therefore, replacing $\forall \mathsf{x}\varphi\mathsf{(x)}$ by $\forall \mathsf{x(Nx \to \varphi(x))}$, where `$\mathsf{N}$' picks out the numbers from the domain of the theory's native quantifiers (see Defn.~\ref{def:langcontainsbasicarith}).
\begin{defn}
A theory $T$ is $\omega$-\emph{incomplete} iff, for some open wff $\varphi\mathsf{(x)}$, $T$ can prove $\varphi\mathsf{(\overline{n})}$ for each natural number $n$, but $T$ can't go on to prove $\forall \mathsf{x}\varphi\mathsf{(x)}$.
\end{defn}
\noindent We saw in \S\ref{sec:Qnotcomplete}
that $\mathsf{Q}$ is $\omega$-{incomplete}: that's because it can prove each instance of $\mathsf{0 + \overline{n} = \overline{n}}$, but can't prove $\mathsf{\forall x(0 + x = x)}$. We could repair $\omega$-incompleteness if we could add the $\omega$-rule (see \S\ref{subsec:theomegarule}), but that's an infinitary rule that is not available in a formalized theory given the usual finitary restrictions on the checkability of proofs. We instead added induction to $\mathsf{Q}$ hoping to repair as much incompleteness as we could: but, as we'll see, $\mathsf{PA}$ remains $\omega$-{incomplete} (assuming it is consistent).
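To see concretely how each instance gets proved while the generalization doesn't, here is a sketch of the $\mathsf{Q}$-proof of the instance $\mathsf{0 + \overline{2} = \overline{2}}$, i.e. $\mathsf{0 + SS0 = SS0}$ (assuming $\mathsf{Q}$'s two addition axioms in their usual form):

```latex
\begin{align*}
\mathsf{0 + SS0} &= \mathsf{S(0 + S0)} && \text{by } \mathsf{\forall x \forall y\,(x + Sy = S(x + y))}\\
                 &= \mathsf{SS(0 + 0)} && \text{by the same axiom again}\\
                 &= \mathsf{SS0}       && \text{by } \mathsf{\forall x\,(x + 0 = x)}.
\end{align*}
```

The proof of $\mathsf{0 + \overline{n} = \overline{n}}$ needs $n$ applications of the second axiom, so each instance has its own finite proof; but no single finite proof of this shape delivers the universal claim $\mathsf{\forall x(0 + x = x)}$.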
\begin{defn}
A theory $T$ is $\omega{}$-\emph{inconsistent} iff, for some open wff $\varphi\mathsf{(x)}$, $T$ can prove each $\varphi\mathsf{(\overline{n})}$ and $T$ can also prove $\neg\forall \mathsf{x}\varphi\mathsf{(x)}$.
\end{defn}
\noindent Or, entirely equivalently, we could of course say that $T$ is $\omega{}$-{inconsistent} if, for some open wff $\varphi'(\mathsf{x})$, $T \vdash \exists \mathsf{x} \varphi'(\mathsf{x})$, yet for each number $n$ we have $T \vdash \neg \varphi'(\mathsf{\overline{n}})$.
Note that $\omega${}-inconsistency, like ordinary inconsistency, is a syntactically defined property: it is characterized in terms of what wffs can be proved, not in terms of what they mean. Note too that, in a classical context, $\omega${}-consistency -- defined of course as not being $\omega${}-inconsistent! -- trivially implies plain consistency. That's because $T$'s being $\omega${}-consistent is a matter of its \emph{not} being able to prove a certain combination of wffs, which entails that $T$ can't be inconsistent and prove \emph{all} wffs.
Now compare and contrast. Suppose $T$ can prove $\varphi\mathsf{(\overline{n})}$ for each $n$. $T$ is $\omega$-{}{incomplete} if it {can't} also prove something we'd {like} it to prove, namely $\forall \mathsf{x}\varphi\mathsf{(x)}$. While $T$ is $\omega$-{}{inconsistent} if it {can} actually prove the \emph{negation} of what we'd like it to prove, i.e. it can prove $\neg\forall \mathsf{x}\varphi\mathsf{(x)}$.
So $\omega${}-incompleteness in a theory of arithmetic is a regrettable weakness; but $\omega${}-inconsistency is a Very Bad Thing (not as bad as outright inconsistency, maybe, but still bad enough). For evidently, a theory that can prove each of $\varphi\mathsf{(\overline{n})}$ and yet also prove $\neg\forall \mathsf{x}\varphi\mathsf{(x)}$ is just not going to be an acceptable candidate for regimenting arithmetic.
That last observation can be made vivid if we temporarily bring semantic ideas back into play. Suppose the theory $T$ is given an \emph{arithmetically standard} interpretation, by which we here mean just an interpretation which takes numerical quantifiers as running over a domain comprising the natural numbers, and on which $T$'s standard numerals denote the intended numbers (with the logical apparatus also being treated as normal, so that inferences in $T$ are truth-preserving). And suppose further that on this interpretation, the axioms of $T$ are all true. Then $T$'s theorems will all be true too. So now imagine that, for some $\varphi(\mathsf{x})$, $T$ does prove each of
$\varphi(\mathsf{0})$, $\varphi(\mathsf{\overline{1}})$, $\varphi(\overline{\mathsf{2}})$, \ldots. By hypothesis, these theorems will then be true on the given standard interpretation; so this means that every natural number must satisfy $\varphi(\mathsf{x})$; so $\forall \mathsf{x}\varphi(\mathsf{x})$ is true since the domain contains only natural numbers. Hence $\neg \forall \mathsf{x}\varphi(\mathsf{x})$ will have to be false on this standard interpretation. Therefore $\neg \forall \mathsf{x}\varphi(\mathsf{x})$ can't be a theorem, and $T$ must be $\omega$-consistent.
Hence, contraposing, we have
\begin{theorem}
If $T$ is $\omega${}-inconsistent then $T$'s axioms can't all be true on an arithmetically standard interpretation.
\end{theorem}
\noindent Given that we want formal arithmetics to have axioms which \emph{are} all true on a standard interpretation, we must therefore want $\omega${}-consistent arithmetics. And given that we think e.g. $\mathsf{PA}$ \emph{is} sound on its standard interpretation, we are committed to thinking that it \emph{is} $\omega${}-consistent.
\section{The First Theorem -- the syntactic version}
\subsection{If $\mathsf{PA}$ is consistent, it can't prove $\mathsf{G}$}
So far, we have actually only made use of the weak result that $\mathsf{PA}$'s language can \emph{express} the relation $\mathit{Gdl}$. But remember Defn.~\ref{def:Gldasawff}: our chosen $\mathsf{Gdl}$ doesn't just express $\mathit{Gdl}$ but \emph{captures} it. Using this fact about $\mathsf{Gdl}$, we can again show that $\mathsf{PA}$ does not prove $\mathsf{G}$, but this time \emph{without} making the semantic assumption that $\mathsf{PA}$\ is sound.
\begin{theorem}
{If $\mathsf{PA}$\ is consistent, $\mathsf{PA}$\ $\nvdash \mathsf{G}$.}
\end{theorem}
\begin{proof}Suppose $\mathsf{G}$ \emph{is} provable in $\mathsf{PA}$. If $\mathsf{G}$ has a proof, then there is some super g.n.\ $m$ that codes its proof. But by definition, $\mathsf{G}$ is the diagonalization of the wff $\mathsf{U}$. Hence, by definition, $\mathit{Gdl}(m, {\mathsf{\ulcorner U\urcorner}})$.
Now we use the fact that $\mathsf{Gdl}$ {captures} the relation $\mathit{Gdl}$. That implies that, since $\mathit{Gdl}(m, \mathsf{\ulcorner U\urcorner})$, we have (i) $\mathsf{PA}$\ $\vdash \mathsf{Gdl}(\mathsf{\overline{m}}, \overline{\mathsf{\ulcorner U\urcorner}})$.
But since $\mathsf{G}$ is logically equivalent to $\forall \mathsf{x} \neg \mathsf{Gdl}(\mathsf{x}, \overline{\mathsf{\ulcorner U\urcorner}})$, the assumption that $\mathsf{G}$ is provable comes to this: $\mathsf{PA}$\ $\vdash \forall \mathsf{x} \neg \mathsf{Gdl}(\mathsf{x}, \overline{\mathsf{\ulcorner U\urcorner}})$. %However, $\forall \mathsf{x} \neg \mathsf{Gdl}(\mathsf{x}, \mathsf{\ulcorner U\urcorner}) \vdash \neg \mathsf{Gdl}(\mathsf{\overline{m}}, \mathsf{\ulcorner U\urcorner})$.
The universal quantification here entails any instance. Hence (ii) $\mathsf{PA}$\ $\vdash \neg \mathsf{Gdl}(\mathsf{\overline{m}},\overline{ \mathsf{\ulcorner U\urcorner}})$.
So, combining (i) and (ii), the assumption that $\mathsf{G}$ is provable entails that $\mathsf{PA}$ is inconsistent. Hence, if $\mathsf{PA}$ is consistent, there can be no $\mathsf{PA}$ proof of $\mathsf{G}$.
\end{proof}
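An aside on machinery: the `super g.n.' coding of proof-sequences that this proof invokes can be made concrete. The following is only a sketch of one standard prime-power scheme -- the function names are mine, and the book's official details differ in inessentials:

```python
from itertools import islice

def primes():
    """Generate 2, 3, 5, 7, ... by trial division (fine for toy examples)."""
    n = 2
    while True:
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

def encode(seq):
    """Code (a_1, ..., a_k) as 2^(a_1+1) * 3^(a_2+1) * ... * p_k^(a_k+1).
    The '+1' in each exponent lets us code items that are themselves 0."""
    code = 1
    for p, a in zip(islice(primes(), len(seq)), seq):
        code *= p ** (a + 1)
    return code

def decode(code):
    """Recover the sequence by reading off the exponents of 2, 3, 5, ..."""
    seq = []
    for p in primes():
        if code == 1:
            return seq
        e = 0
        while code % p == 0:
            code //= p
            e += 1
        if e == 0:          # a skipped prime means this isn't a sequence code
            raise ValueError("not a sequence number")
        seq.append(e - 1)
```

So a sequence of wffs' g.n.s gets coded by a single `super' number, and -- crucially -- both coding and decoding are entirely mechanical, indeed primitive recursive.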
\subsection{If $\mathsf{PA}$ is consistent, it is $\omega$-incomplete}
Here's an immediate corollary of that last theorem:
\begin{theorem}
If $\mathsf{PA}$ is consistent, it is $\omega$-incomplete.
\end{theorem}
\begin{proof}
Assume $\mathsf{PA}$'s consistency. Then we've shown that $\mathsf{PA} \nvdash\mathsf{G}$, i.e.,
\begin{enumerate}
\item $\mathsf{PA} \nvdash\forall\mathsf{ x} \neg\mathsf{Gdl(x, \overline{\mathsf{\ulcorner U\urcorner}}})$.
\end{enumerate}
Since $\mathsf{G}$ is unprovable, that means that no number is the super g.n. of a proof of $\mathsf{G}$. That is to say, no number numbers a proof of the diagonalization of $\mathsf{U}$. That is to say, for any particular $m$, it \emph{isn't} the case that $\mathit{Gdl}(m, \mathsf{\ulcorner U\urcorner})$. Hence, again by the fact that $\mathsf{Gdl}$ {captures} $\mathit{Gdl}$, we have
\begin{enumerate}\setcounter{enumi}{1}
\item For each $m$, $\mathsf{PA}$\ $\vdash \neg\mathsf{Gdl}(\overline{\mathsf{m}}, \overline{\mathsf{\ulcorner U\urcorner}})$.
\end{enumerate}
Putting $\varphi\mathsf{(x) =_{\mathrm{def}} \neg\mathsf{Gdl}(x, \overline{\mathsf{\ulcorner U\urcorner}})}$, the combination of (1) and (2) therefore shows that $\mathsf{PA}$ is $\omega$-{incomplete}.
\end{proof}
\subsection{If $\mathsf{PA}$ is $\omega$-consistent, it can't prove $\neg\mathsf{G}$}
We'll now show that $\mathsf{PA}$ can't prove the negation of $\mathsf{G}$, {without} assuming $\mathsf{PA}$'s soundness: we'll just make the syntactic assumption of $\omega${}-consistency.
\begin{theorem}{If $\mathsf{PA}$\ is $\omega${}-consistent, $\mathsf{PA} \nvdash \neg\mathsf{G}$.} %($\neg \mathsf{G}$ is unprovable in $\mathsf{PA}$).
\end{theorem}
\begin{proof}Suppose $\neg\mathsf{G}$ is provable in $\mathsf{PA}$. That's equivalent to assuming \begin{enumerate}\item $\mathsf{PA} \vdash\exists \mathsf{x}\mathsf{Gdl}(\mathsf{x}, \overline{\ulcorner \mathsf{U}\urcorner})$.\end{enumerate}
Now suppose too that $\mathsf{PA}$ is $\omega${}-consistent. Then, as we remarked before, that implies that $\mathsf{PA}$ is consistent. So if $\neg\mathsf{G}$ is provable, $\mathsf{G}$ is \emph{not} provable. Hence for any $m$, $m$ cannot code for a proof of $\mathsf{G}$. But $\mathsf{G}$ is (again!) the wff you get by diagonalizing $\mathsf{U}$. Therefore, by the definition of $\mathit{Gdl}$, our assumptions imply that $\mathit{Gdl}(m, \ulcorner \mathsf{U}\urcorner)$ is false, for each $m$.
So, by the requirement that $\mathsf{Gdl}$ captures $\mathit{Gdl}$, we have
\begin{enumerate}\setcounter{enumi}{1}
\item $\mathsf{PA}$\ $\vdash \neg\mathsf{Gdl}(\mathsf{\overline{m}}, \overline{\ulcorner \mathsf{U}\urcorner})$ for each $m$.
\end{enumerate}
But (1) and (2) together make $\mathsf{PA}$ $\omega${}-inconsistent after all, contrary to hypothesis. Hence, if $\mathsf{PA}$ is $\omega${}-consistent, $\neg \mathsf{G}$ is unprovable.
\end{proof}
\subsection{Putting together the syntactic Incompleteness Theorem for $\mathsf{PA}$}
Let's put all the ingredients together. Recall that $\mathsf{G}$ is a $\Pi_1$ sentence (i.e. of the same quantifier complexity as e.g. Goldbach's Conjecture). And we know from Theorem~\ref{th:Gistrueiffunprovable} that $\mathsf{G}$ is true if and only if it is unprovable. That observation, put together with what we've shown so far in this section, entails
\begin{theorem}\label{PAincompletesynt}
If $\mathsf{PA}$ is consistent, then there is a $\Pi_1$ sentence $\mathsf{G}$ such that $\mathsf{PA} \nvdash \mathsf{G}$, and if $\mathsf{PA}$ is $\omega$-consistent $\mathsf{PA} \nvdash \neg \mathsf{G}$, so -- assuming $\omega$-consistency and hence consistency -- $\mathsf{PA}$ is negation incomplete.
\end{theorem}
\subsection{Generalizing the proof}
The proof for Theorem~\ref{PAincompletesynt} evidently generalizes. Suppose $T$ is a p.r. axiomatized theory which contains $\mathsf{Q}$ -- so (perhaps after introducing some new vocabulary by definitions) the language of $T$ extends the language of basic arithmetic, and $T$ can prove $\mathsf{Q}$'s axioms. Then, assuming a sensible scheme for G\"odel-numbering wffs of $T$, the relation $\mathit{Gdl}_T(m,n)$ which holds when $m$ numbers a $T$-proof of the diagonalization of the wff with number $n$ will be primitive recursive again.
Since $T$ can prove everything $\mathsf{Q}$ proves, $T$ will be able to capture the p.r. relation $\mathit{Gdl}_T$ by a $\Sigma_1$ wff $\mathsf{Gdl}_T$. Just as we did for $\mathsf{PA}$, we'll be able to construct the corresponding $\Pi_1$ wff $\mathsf{G}_T$. And exactly the same arguments as before will then show, more generally,
\begin{restatable}{theorem}{firsttheoremsyntactic}\label{thm:firsttheoremsyntactic}
If $T$ is a consistent p.r. axiomatized theory which contains $\mathsf{Q}$, then there will be a $\Pi_1$ sentence $\mathsf{G}_T$ such that $T \nvdash \mathsf{G}_T$, and if $T$ is $\omega$-consistent, $T \nvdash \neg \mathsf{G}_T$, so
$T$ is negation incomplete.
\end{restatable}
\noindent When people refer to `The First Incompleteness Theorem' (without qualification), they typically mean something like this second general result, deriving incompleteness from syntactic assumptions.
\subsection{Comparisons}
Compare Theorem~\ref{thm:firsttheoremsyntactic} with our initially announced
\Godelsyntactic*
\noindent Our new theorem fills out the old one in various respects, but it is weaker in another respect. But the gain is much more than the loss.
Our new theorem tells us more about the `modest amount of arithmetic' that $T$ is assumed to contain, and it also spells out the `additional desirable property' which we previously left mysterious (and we now know the condition is only applied in half the theorem). Further, it tells us more about the undecidable G\"odel sentence -- namely, it has minimal quantifier complexity, i.e. it is a $\Pi_1$ sentence of arithmetic. Our new theorem is weaker, however, as it only applies to p.r. axiomatized theories, not to formal axiomatized theories more generally. But we've already noted that that's not much loss. Later we'll make up the shortfall.\\
\section{The historical First Theorem}
Theorem~\ref{thm:firsttheoremsyntactic}, or something like it, is what people usually mean when they speak without qualification of `The First Incompleteness Theorem'. But since the stated theorem refers to Robinson Arithmetic $\mathsf{Q}$ (only developed by Robinson in 1950!), and G\"odel didn't know about that in 1931, our version can't be quite what G\"odel originally proved. It is, however, a near miss.
Looking again at our analysis of the syntactic argument for incompleteness, we see that we are interested in theories which extend \Q\ \emph{because we are interested in theories which can capture p.r. relations like $\mathit{Gdl}$}. It's being able to capture $\mathit{Gdl}$ that is the crucial condition for a theory's being incomplete. So let's say
\begin{defn}
A theory $T$ is \emph{p.r. adequate} if it can capture all primitive recursive functions and relations.
\end{defn}
\noindent Then, instead of mentioning \Q, let's instead explicitly write in the requirement of p.r. adequacy. So, by just the same arguments,
\begin{restatable}{theorem}{ffirstgodel}\label{firstgodel}
If $T$ is a p.r. adequate, p.r. axiomatized theory whose language includes $L_A$, then
there is a $\Pi_1$ sentence $\varphi$ such that, if $T$ is consistent then $T \nvdash \varphi$, and if $T$ is $\omega$-consistent then $T \nvdash \neg\varphi$.
\end{restatable}
\noindent And this is pretty much G\"odel's own general version of the incompleteness result. I suppose that it has as much historical right as any to be called {\emph{\gd 's} First Theorem}.\footnote{`Hold on! If \emph{that's} the First Theorem, we didn't need to do all the hard work showing that \Q\ and \PA\ are p.r. adequate, did we?' Well, yes and no. No, proving \emph{this} original version of the Theorem of course doesn't depend on proving that any particular theory is p.r. adequate. But yes, showing that this Theorem has real bite, showing that it applies to familiar arithmetics, does depend on proving the adequacy theorem.}
For in his 1931 paper, \gd\ first proves his Theorem VI, which with a bit of help from his Theorem VIII shows that the formal system $P$ -- which is his simplified version of the hierarchical type-theory of \emph{Principia Mathematica} -- has a formally undecidable $\Pi_1$ sentence (or sentence `of Goldbach type', see \S\ref{subsec:comparisons}). Then he immediately generalizes:
\begin{quote}
In the proof of Theorem VI no properties of the system $P$ were used besides the following:
\begin{enumerate}
\item The class of axioms and the rules of inference (that is, the relation `immediate consequence') are [primitive] recursively definable (as soon as we replace the primitive signs in some way by the natural numbers).
\item Every [primitive] recursive relation is definable [i.e. is `capturable'] in the system $P$.
\end{enumerate}
Therefore, in every formal system that satisfies the assumptions 1 and 2 and is $\omega$-consistent, there are undecidable propositions of the form $(x)F(x)$ [i.e. $\forall xF(x)$], where $F$ is a [primitive] recursively defined property of natural numbers, and likewise in every extension of such a system by a recursively definable $\omega$-consistent class of axioms.
\end{quote}
Which gives us our Theorem~\ref{firstgodel}.
\vspace{8pt}\noindent At this point, make sure you really understand at least what the core theorems in this episode \emph{mean}. Then read \emph{IGT}, Chs. 16 and 17. And then re-read those chapters! -- for they are at the very heart of the book, and of this course.
Then when you feel reasonably confident of the techie details, have a look at Ch. 18 (perhaps skipping \S18.3).
%%%%%%%%%%
\newpage
%\setcounter{section}{0}
\setcounter{page}{1}
\setcounter{footnote}{0}
\begin{center}%
{{\Large \emph{G\"odel Without (Too Many) Tears -- 9}}\\[16pt]{\LARGE The Diagonalization Lemma, Rosser and Tarski} \par%
\vskip 1.5em%
{\large
\lineskip .75em%
\begin{tabular}[t]{c}%
Peter Smith
\end{tabular}\par}}%
\vskip 0.75em%
{University of Canterbury, Christchurch, NZ}\\[6pt]
{April 7, 2010}%
\vskip 1.5em%
\end{center}%\par
\noindent\hrulefill
\begin{itemize}\setlength{\itemsep}{0pt}
\item Provability predicates
\item The Diagonalization Lemma
\item Incompleteness from the Diagonalization Lemma
\item Tarski's Theorem
\item The Master Argument
\item Rosser's Theorem
%\item Reflecting on the First Theorem
\end{itemize}
\noindent\hrulefill
\vspace{8pt}\noindent We've now proved our key version of the First Theorem, Theorem~\ref{thm:firsttheoremsyntactic}. If $T$ is the right kind of $\omega$-consistent theory including enough arithmetic, then there will be an arithmetic sentence $\mathsf{G}_T$ such that $T \nvdash \mathsf{G}_T$ and $T \nvdash \neg\mathsf{G}_T$. Moreover, $\mathsf{G}_T$ is constructed so that it is true if and only if unprovable-in-$T$ (so it is true). We won't rehearse the construction and the arguments from the last episode again here.
This episode starts by proving the key theorem again by a slightly different route -- via the so-called Diagonalization Lemma. The interest in doing this is that the same Lemma leads to two other important theorems due to Rosser and Tarski.
\section{Provability predicates}
Recall that, for a p.r. axiomatized theory $T$, $\mathit{Prf}_T(m, n)$ is the relation which holds just if $m$ is the {super} g.n. of a sequence of wffs that is a $T$-proof of a sentence with g.n.~$n$. This relation is p.r. decidable (see \S\ref{sec:gdlpr}). Assuming $T$ extends \Q, $T$ can capture any p.r. decidable relation, including $\mathit{Prf}_T$ (\S\ref{sec:QcancapturePRfunctions}). So we can legitimately stipulate
\begin{restatable}{defn}{defofPrfcanonPrf}
$\mathsf{Prf}_T\mathsf{(x, y)}$ stands in for a $T$-wff that canonically captures $\mathit{Prf}_T$,
\end{restatable}
\noindent for there will indeed be such a wff. NB, for some of what follows, \emph{any} wff $\mathsf{Prf}_T$ that captures $\mathit{Prf}_T$ will do (it doesn't have to be `canonical' in the sense of Defn.~\ref{def:canonical}): but it's convenient to fix on some canonical way, hence $\Sigma_1$ way, of capturing $\mathit{Prf}_T$. Next, we say:
\begin{restatable}{defn}{defofProv}\label{th:defofProv}
Put $\mathsf{Prov}_T\mathsf{(y)} =_{\mathrm{def}} \mathsf{\exists v\,Prf}_T\mathsf{(v, y)}$: such an expression is a \emph{provability predicate} for $T$.
\end{restatable}
\noindent $\mathsf{Prov}_T\mathsf{(\overline{n})}$ is true, of course, on the standard arithmetic interpretation of $T$ just if $n$ numbers a $T$-theorem, i.e. a wff for which some number numbers a proof of it. Which means that $\mathsf{Prov}_T\mathsf{(\overline{\ulcorner{\varphi}\urcorner})}$ is true just when $\varphi$ is a theorem. Hence the aptness of the label `provability predicate' for $\mathsf{Prov}_T$. %\footnote{Recall: We assume we've fixed on some acceptable scheme for coding up wffs of a given theory ${T}$'s language by using G\"odel numbers. If $\varphi$ is an expression, then we'll denote its G\"odel number in our logician's English by `$\ulcorner{\varphi}\urcorner$'. We use `$\overline{\ulcorner{\varphi}\urcorner}$' as an abbreviation inside $T$'s language for the standard numeral for $\ulcorner{\varphi}\urcorner$. (\S\ref{cornernotation})}
Note, if $\mathsf{Prov}_T$ is built from a canonical $\mathsf{Prf}_T$ it is also $\Sigma_1$.
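The $\Sigma_1$ shape of $\mathsf{Prov}_T$ -- a single existential quantifier in front of a decidable matrix -- can be illustrated with a toy computational sketch. Here `prf` is a mere placeholder of my own (pretending `$m$ proves $n$' just in case $m \times m = n$), not the genuine, far more complicated, proof relation:

```python
from itertools import count

# Toy stand-in for the p.r. decidable relation Prf(m, n).  The real
# relation ('m is the super g.n. of a T-proof of the wff numbered n')
# is vastly more complex, but shares the key feature: it is decidable
# by a terminating mechanical check.
def prf(m, n):
    return m * m == n

# Prov(n) has the shape 'there exists m with Prf(m, n)': an unbounded
# search over a decidable matrix.  If n is a 'theorem' the search halts
# with a witness; otherwise it runs forever.  That asymmetry is the
# Sigma_1, merely semi-decidable, character of provability.
def prov(n):
    for m in count():
        if prf(m, n):
            return m   # the witness, i.e. the 'proof'
```

So, for instance, `prov(49)` halts and returns the witness `7`, while `prov(2)` would search forever.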
So our observation that $\mathsf{G}_T$ is so constructed that it is true if and only if unprovable-in-$T$ can in fact be \emph{expressed} inside $T$ itself, by the wff $\mathsf{G}_T \equiv \neg\mathsf{Prov}_T\mathsf{(\overline{\ulcorner{\mathsf{G}_T}\urcorner})}$.
And now a key observation: $T$ doesn't just express this fact but can quite easily \emph{prove} it too, i.e. we have $T \vdash \mathsf{G}_T \equiv \neg\mathsf{Prov}_T\mathsf{(\overline{\ulcorner{\mathsf{G}_T}\urcorner})}$. More on this claim below, \S\ref{sec:relatingoldtonew}.
\section{The Diagonalization Lemma}
\subsection{Introducing the Lemma}
Now, when we think through a demonstration that there is some sentence $\delta$ (in fact our old friend $\mathsf{G}_T$) such that $T \vdash \delta \equiv \neg\mathsf{Prov}_T\mathsf{(\overline{\ulcorner{\delta}\urcorner})}$, we notice an interesting generalization. \emph{Take {any} open sentence $\varphi(\mathsf{x})$ at all} -- i.e. not just $\neg\mathsf{Prov}_T\mathsf{(x)}$ -- \emph{then we can \emph{always} find some sentence $\delta$ or other such that $T \vdash \delta \equiv \varphi{(\overline{\ulcorner{\delta}\urcorner})}$.} This generalized result is called The Diagonalization Lemma (and was first explicitly isolated by Carnap as a principle that could be seen as underlying G\"odel's incompleteness proof).
\subsection{Proving the Lemma}
First a reminder of something familiar:
\medskip
\noindent\textbf{Defn.~\ref{def:captures}.}\ \emph{The theory $T$ {captures} the one-place function $f$ by the open wff $\varphi\mathsf{(x, y)}$ iff, for any $m, n$,\\
\hspace*{0.7cm} i. if $f(m) = n$, then $T \vdash
\varphi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$,\\
\hspace*{0.7cm} ii. if $f(m) \neq n$, then $T \vdash
\neg\varphi(\overline{\mathsf{m}}, \overline{\mathsf{n}})$. }
\medskip
\noindent Now let's introduce a variant of the idea of capturing, evidently closely related to it, namely:
\begin{defn}
The theory $T$ {captures\/$^*$} the one-place function $f$ by the open wff $\varphi^*\mathsf{(x, y)}$ iff, for any $m, n$, if $f(m) = n$, then $T \; \vdash \; \forall \mathsf{y} (\varphi^*(\mathsf{{\overline{m}}}, \mathsf{y}) \equiv \mathsf{y}=\mathsf{{\overline{n}}})$.
\end{defn}
\noindent It's trivial that a wff that captures* $f$ will also capture $f$ (why?). And there's nearly a converse:
\begin{theorem}
If $T$ extends \Q, then if $f$ is captured by some wff $\varphi$, then there's also a wff $\varphi^*$ which {captures\/$^*$} $f$.
\end{theorem}
\noindent There's a little trick for defining a capturing$^*$ $\varphi^*$ from a capturing $\varphi$, a trick which is explained in \S12.2 of the book. But I'm not going to explain that here -- it would be seriously boring to delay over the details. (And when I do a second edition of the book, I'm tempted to define capturing$^*$ from the off, and then we wouldn't need the boring details!) So take the mini-theorem on trust, and work out how to prove it from the book if you must!
Now some more reminders: \defdiagonal* \tmdiagispr*
\noindent And following on from those, here's a new definition. If $T$ is a theory that contains \Q, then it can capture (and hence capture$^*$) all p.r. functions; so in particular it can capture$^*$ the function $\mathit{diag}$. Hence we put
\begin{defn}
$\mathsf{Diag}_T\mathsf{(x,y)}$ is a $T$-wff which captures$^*$ $\mathit{diag}$.
\end{defn}
\noindent
And now we can officially state and then prove
\begin{theorem}[Diagonalization Lemma]\label{th:diag}
If $T$ extends \Q, and $\varphi$ is a one-place open sentence of $T$'s language, then there is a sentence $\delta$ such that $T \vdash \delta \leftrightarrow \varphi(\overline{{\ulcorner\delta\urcorner}})$.
\end{theorem}
\noindent Note by the way, $T$ here doesn't have to be a nicely axiomatized theory -- just containing \Q\ is enough.
To avoid unsightly rashes of subscripts, let's henceforth drop subscript $T$s. Then we can argue like this:
\begin{proof} Put $\alpha =_{\mathrm{def}} \forall \mathsf{z(Diag}\mathsf{(y,z) \to \varphi(z))}$, and let $\delta$ be the diagonalization of $\alpha$. Since diagonalizing $\alpha$ yields $\delta$, we have $\mathit{diag}({\ulcorner\alpha\urcorner}) = {\ulcorner\delta\urcorner}$. Hence (a) $T \vdash \mathsf{\forall z(Diag({\overline{{\ulcorner\alpha\urcorner}}},z) \leftrightarrow z = {\overline{{\ulcorner\delta\urcorner}}})}$ since by hypothesis $\mathsf{Diag}$ captures$^*$ $\mathit{diag}$ in $T$. But just from the definition of $\delta$ it is equivalent to $\forall \mathsf{z(Diag}\mathsf{({\overline{{\ulcorner\alpha\urcorner}}},z) \to \varphi(z))}$, and any theory containing a trivial amount of logic can prove that, so in particular (b)
$T \vdash \delta \leftrightarrow \forall \mathsf{z(Diag}\mathsf{({\overline{{\ulcorner\alpha\urcorner}}},z) \to \varphi(z))}$. Hence, substituting the provable equivalents from (a) into (b), we have $T \vdash \delta \leftrightarrow \forall \mathsf{z(z = {\overline{{\ulcorner\delta\urcorner}}} \to \varphi(z))}$, which trivially gives $T \vdash \delta \leftrightarrow \varphi({\overline{{\ulcorner\delta\urcorner}}})$.\end{proof}
\noindent I promised that it was going to be easy!
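Easy, but the self-substitution trick can still feel like sleight of hand. Here is a computational analogue (a sketch in Python with my own variable names; `phi` plays the role of the one-place open wff, and applying a template to its own quotation plays the role of $\mathit{diag}$):

```python
# A computational analogue of the Lemma's fixed-point construction.
# phi plays the role of the open wff: any property of strings will do.
phi = lambda s: len(s) % 2 == 0        # an arbitrary property of 'sentences'

# alpha plays the role of  forall z(Diag(y,z) -> phi(z)):  a template
# which, when applied to the quotation of a string, applies phi to that
# string's diagonalization.  Applying alpha to its *own* quotation
# diagonalizes it.
alpha = 't = %r\nresult = phi(t %% t)'
delta = alpha % alpha                  # delta is the diagonalization of alpha

# Running delta evaluates phi at delta's own text: delta 'says' phi of itself.
ns = {'phi': phi}
exec(delta, ns)
assert ns['result'] == phi(delta)
```

The same shape underlies self-printing programs (quines): `delta` contains a quotation of `alpha` and, when run, reconstructs its own full text to feed to `phi`.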
Finally, a bit of jargon before proceeding. By a certain abuse of mathematical terminology, we say
\begin{defn}
If $\delta$ is such that $T \vdash \delta \leftrightarrow \varphi(\overline{{\ulcorner\delta\urcorner}})$, then it is said to be a \emph{fixed point} for $\varphi$.
\end{defn}
\noindent So the Diagonalization Lemma is often called the Fixed Point Theorem -- every one-place open sentence has a fixed point.
\section{Incompleteness from the Diagonalization Lemma}
You could skip this section at a first reading. What we do is first recover the First Incompleteness Theorem from the Diagonalization Lemma, and then fulfil our promise to show directly that $T \vdash \mathsf{G}_T \equiv \neg\mathsf{Prov}_T\mathsf{(\overline{\ulcorner{\mathsf{G}_T}\urcorner})}$.
\subsection{Recovering the First Theorem}
First we have the following general observation about provability predicates (as a reality check, ask yourself where subscript $T$'s really belong in this statement and its proof):
\begin{theorem}\label{th:fixdptProv}
Suppose $T$ is p.r. axiomatized, contains \Q, and some sentence or other $\mathsf{\gamma}$ is a fixed point for $\neg\mathsf{Prov}$ -- i.e., $T \vdash \mathsf{\gamma} \leftrightarrow \neg \mathsf{Prov}{(\overline{{\ulcorner \mathsf{\gamma}\urcorner}})}$. Then (i) if $T$ is consistent, $T \nvdash \mathsf{\gamma}$. And (ii) if $T$ is $\omega$-consistent, $T \nvdash \neg \mathsf{\gamma}$.
\end{theorem}
\begin{proof}
(i) Suppose $T \vdash \mathsf{\gamma}$. Then $T \vdash \neg \mathsf{Prov}{(\overline{{\ulcorner \mathsf{\gamma}\urcorner}})}$. But if there \emph{is} a proof of $\mathsf{\gamma}$, then for some $m$, $\mathit{Prf}(m, {\ulcorner \mathsf{\gamma}\urcorner})$, so $T \vdash \mathsf{Prf}(\overline{{{\mathsf{m}}}}, \overline{{\ulcorner \mathsf{\gamma}\urcorner}})$, since $T$ captures $\mathit{Prf}$ by $\mathsf{Prf}$. Hence $T \vdash \mathsf{\exists x\,{Prf}}(\mathsf{x}, \overline{{\ulcorner \mathsf{\gamma}\urcorner}})$, i.e. we also have $T \vdash \mathsf{Prov}{(\overline{{\ulcorner \mathsf{\gamma}\urcorner}})}$, making $T$ inconsistent. So if $T$ is consistent, $T \nvdash \mathsf{\gamma}$.
(ii) Suppose $T \vdash \neg \mathsf{\gamma}$. Then $T \vdash \mathsf{Prov}(\overline{{\ulcorner \mathsf{\gamma}\urcorner}})$, i.e. $T \vdash \mathsf{\exists x\,{Prf}}(\mathsf{x}, \overline{{\ulcorner \mathsf{\gamma}\urcorner}})$. But given $T$ is consistent, there is no proof of $\mathsf{\gamma}$, i.e. for every $m$, not-$\mathit{Prf}(m, {\ulcorner \mathsf{\gamma}\urcorner})$, whence for every $m$, $T \vdash \neg \mathsf{Prf}(\overline{{\mathsf{m}}}, \overline{{\ulcorner \mathsf{\gamma}\urcorner}})$. So we have a $\varphi$ such that $T$ proves $\exists \mathsf{x}\varphi(\mathsf{x})$ while it refutes each instance $\varphi(\overline{{\mathsf{m}}})$, which makes $T$ $\omega$-inconsistent. So if $T$ is $\omega$-consistent, $T \nvdash \neg \mathsf{\gamma}$.
\end{proof}
\noindent But now note that the general {Diagonalization Lemma} implies as a special case
\begin{theorem}
There exists a sentence $\mathsf{\gamma}$ such that $T \vdash \mathsf{\gamma} \leftrightarrow \neg \mathsf{Prov}{(\overline{\ulcorner \mathsf{\gamma}\urcorner})}$.
\end{theorem}
\noindent Moreover, since $\mathsf{Prov}$ is $\Sigma_1$, $\neg\mathsf{Prov}$ is $\Pi_1$, and the diagonalization construction produces a $\Pi_1$ fixed point $\gamma$. So putting those last two theorems together, we immediately recover Theorem~\ref{thm:firsttheoremsyntactic}.
\subsection{Relating old and new}
Briefly: how does the specific G\"odel sentence $\mathsf{G}$ as we \emph{originally} constructed it via the definitions in \S\ref{constr} stand to the generic G\"odel sentences $\gamma$ we've just been talking about?
Well, how does our proof of the Diagonalization Lemma tell us to construct a $\gamma$ such that $T \vdash \mathsf{\gamma} \leftrightarrow \neg \mathsf{Prov}{(\overline{\ulcorner \mathsf{\gamma}\urcorner})}$? It says: first form a wff $\alpha = \forall \mathsf{z(Diag}\mathsf{(y,z) \to \neg Prov(z))}$, and then diagonalize $\alpha$ to get $\gamma$. So think more about $\alpha$. Unpacking a bit, $\alpha$ is $\forall \mathsf{z(Diag}\mathsf{(y,z) \to \neg\exists x\,Prf(x, z))}$, which is equivalent to $\forall \mathsf{x}\forall\mathsf{z\neg(Diag}\mathsf{(y,z) \land Prf(x, z))}$, i.e. to $\forall \mathsf{x}\neg\exists\mathsf{z(Diag}\mathsf{(y,z) \land Prf(x, z))}$.
But now note that $\exists\mathsf{z(Diag}\mathsf{(y,z) \land Prf(x, z))}$ captures the $\mathit{Gdl}$ relation. So this makes $\alpha$ pretty much like $\forall \mathsf{x\neg Gdl(x,y)}$ -- and hence $\gamma$ (the diagonalization of $\alpha$) like $\mathsf{G}$ (the diagonalization of $\forall \mathsf{x\neg Gdl(x,y)}$).
\subsection{Proving our old $\mathsf{G}_T$ is in fact a fixed point for $\neg\mathsf{Prov}_T$}\label{sec:relatingoldtonew}
But let's now directly check that our old $\mathsf{G}_T$ is in fact a fixed point for $\neg\mathsf{Prov}_T$, by giving a direct proof of
\begin{restatable}{theorem}{TprovsGiffnotprovable}\label{th:TprovsGiffnotprovable}
If $T$ is p.r. axiomatized and contains \Q, $T \vdash \mathsf{G}_T \equiv \neg\mathsf{Prov}_T\mathsf{(\overline{\ulcorner{\mathsf{G}_T}\urcorner})}$.
\end{restatable}
\noindent But don't get bogged down in this proof -- I repeat it here from the book, just for the record.
\begin{proof}
Recall, dropping subscripts,
\begin{quote}
$\mathsf{G =_{\mathrm{def}} \exists y(y = \overline{\ulcorner U \urcorner} \land U)}$,
\end{quote}
where `$\overline{\ulcorner\mathsf{U}\urcorner}$' stands in for the numeral for $\mathsf{U}$'s g.n.: further recall\begin{quote}
$\mathsf{U =_{\mathrm{def}} \forall x\neg Gdl(x, y)}$
\end{quote}
where $\mathsf{Gdl(x, y)}$ captures our old friend, the relation $\mathit{Gdl}$, where $\mathit{Gdl}(m,n)$ holds when $m$ codes for a proof of the diagonalization of the wff with number $n$ (a proof in $T$, of course!).
Now, by definition then,
\begin{quote}
$\mathit{Gdl}(m,n) =_{\mathrm{def}} \mathit{Prf}(m, \mathit{diag}(n))$.
\end{quote}
But the one-place p.r. function $\mathit{diag}$ is captured* by an open wff $\mathsf{Diag(x, y)}$. We can therefore now retrospectively fix on the following definition:
\begin{quote}
$\mathsf{Gdl(x, y) =_{\mathrm{def}}
\exists z(Prf(x,z) \land Diag(y,z))
}$.
\end{quote}
And now let's do some elementary manipulations:%\renewcommand{\equiv}{\equiv}
\begin{quote}
$\mathsf{G} \equiv \mathsf{\forall x\neg Gdl(x, \overline{\ulcorner{U}\urcorner})}$\\
\hspace*{1em}$\equiv \mathsf{\forall x \neg \exists z(Prf(x,z) \land Diag( \overline{\ulcorner{U}\urcorner},z))}$\hspace{1.035cm}(definition of $\mathsf{Gdl}$)\\
\hspace*{0.9em}$\equiv\hspace{1em}\forall\mathsf{ x \forall z\neg(Prf(x,z) \land Diag( \overline{\ulcorner{U}\urcorner},z))}$\hspace{1.035cm}(pushing in the negation)\\\hspace*{0.9em}$\equiv\hspace{1em}\forall\mathsf{ z \forall x\neg(Prf(x,z) \land Diag( \overline{\ulcorner{U}\urcorner},z))}$\hspace{1.035cm}(swapping quantifiers)\\
\hspace*{0.9em}$\equiv\hspace{1em}\forall\mathsf{ z(Diag( \overline{\ulcorner{U}\urcorner},z) \lif \neg\exists x\,Prf(x,z))}$\hspace{0.825cm}(rearranging after `$\forall\mathsf{z}$')\\\hspace*{0.9em}$\equiv\hspace{1em}\forall\mathsf{ z(Diag( \overline{\ulcorner{U}\urcorner},z) \lif \neg\exists v\,Prf(v,z))}$\hspace{0.82cm}(changing variables)\\
\hspace*{1em}$=_{\mathrm{def}} \forall\mathsf{ z(Diag( \overline{\ulcorner{U}\urcorner},z) \lif \neg Prov(z))}$
\hspace{1.24cm}(definition of $\mathsf{Prov}$)
\end{quote}
Since this is proved by simple logical manipulations, that means we can prove the equivalence inside the formal first-order logic built into \Q\ and hence in $T$. So
\begin{quote}
$T \vdash \mathsf{G} \equiv \mathsf{\forall z(Diag(\overline{\ulcorner\mathsf{{U}}\urcorner},z) \lif \neg Prov(z))}$.
\end{quote}
Now, diagonalizing $\mathsf{U}$ yields $\mathsf{G}$. Hence, just by the definition of $\mathit{diag}$, we have $\mathit{diag}({\ulcorner\mathsf{U}\urcorner}) = {\ulcorner\mathsf{G}\urcorner}$. Since by hypothesis $\mathsf{Diag}$ captures* $\mathit{diag}$ as a function, it follows by definition that \begin{quote}
$ T \vdash \forall \mathsf{z} (\mathsf{Diag}(\overline{{\ulcorner\mathsf{U}\urcorner}}, \mathsf{z}) \equiv \mathsf{z}= \overline{\ulcorner\mathsf{G}\urcorner})$.
\end{quote}
Putting those two results together, we immediately get \begin{quote}
$T\vdash \mathsf{G} \equiv \mathsf{\forall z(\mathsf{z}= \overline{\ulcorner\mathsf{G}\urcorner} \lif \neg Prov(z))}$.
\end{quote}
But the right-hand side of that biconditional is trivially equivalent to $\neg\mathsf{Prov}(\ulcorner\mathsf{G}\urcorner)$.
So we've proved the desired result. \end{proof}
%\noindent To repeat, what this shows is that the informal claim `$\mathsf{G}$ is true if and only if it is unprovable' can itself be formally proved within \PA. Very neat!
\section{Tarski's Theorem}\label{tarskisection}
In a way, Rosser's Theorem -- which, as it were, tidies up the First Theorem by enabling us to get rid of the assumption of $\omega$-consistency -- is the natural next topic to look at. And that's the order I follow in the book. But here let's proceed differently and next visit two other peaks which can be reached via the Diagonalization Lemma: the path is very straightforward, but it leads to a pair of rather spectacular results that are usually packaged together as \emph{Tarski's} \mbox{\emph{Theorem}}.
\subsection{Truth-predicates and truth-definitions}
Recall a familiar thought: `snow is white' is true iff snow \emph{is} white. Likewise for all other sensible replacements for `snow is white'. In sum, every instance of \emph{`$\varphi$' is true iff $\varphi$} is true. And that's because of the meaning of the informal truth-predicate `true'. %Generalizing, let's say that any informal predicate \emph{tr} is a truth-predicate if every instance of \emph{`$\varphi$' is tr iff $\varphi$} is likewise true.
Suppose we have fixed on some scheme for \gd\ numbering wffs of the interpreted arithmetical language $L$. Then we can define a corresponding numerical property $\mathit{True}$ as follows:
\begin{quote}
$\mathit{True}(n)$ is true iff $n$ is the g.n. of a true sentence of $L$.
\end{quote}
Now imagine we have some expression $\mathsf{T(x)}$ which is suitably defined to express this numerical property $\mathit{True}$, and let $L'$ be the result of adding $\mathsf{T(x)}$ to our initial language $L$. (For the moment, we leave it open whether $L'$ is just $L$, which it would be if $\mathsf{T(x)}$ is in fact already definable from $L$'s resources.)
Then, for any $L$-sentence $\varphi$, we have
\begin{quote}
$\varphi$ is true iff $\mathit{True}(\ulcorner\varphi\urcorner)$ iff $\mathsf{T(\overline{\ulcorner\varphi\urcorner})}$ is true.
\end{quote}
Hence, for any $L$-sentence $\varphi$, the corresponding $L'$-sentence
\begin{quote}
$\mathsf{T(\ulcorner\varphi\urcorner)} \equiv \varphi$
\end{quote}
is true. Which motivates our first main definition:
\begin{defn}An open $L'$-wff $\mathsf{T(x)}$ is a \emph{formal truth-predicate for $L$} iff for every $L$-sentence $\varphi$, $\mathsf{T(\ulcorner\varphi\urcorner)} \equiv \varphi$ is true.
\end{defn}
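\noindent For a concrete instance of the definition, take $\varphi$ to be the humble truth $\mathsf{0 = 0}$. Then a formal truth-predicate $\mathsf{T(x)}$ must make
\begin{quote}
$\mathsf{T(\ulcorner 0 = 0\urcorner)} \equiv \mathsf{0 = 0}$
\end{quote}
\noindent come out true; and since the right-hand side is true, $\mathsf{T}$ must indeed apply to the g.n. of $\mathsf{0 = 0}$. Likewise, taking $\varphi$ to be the falsehood $\mathsf{0 = \overline{1}}$, $\mathsf{T}$ must fail to apply to the g.n. of $\mathsf{0 = \overline{1}}$ -- just as we'd want.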
\noindent And here's a companion definition:
\begin{defn}A theory $\Theta$ (with language $L'$ which includes $L$) is a \emph{truth-theory for $L$} iff for some $L'$-wff $\mathsf{T(x)}$, $\Theta \vdash \mathsf{T(\ulcorner\varphi\urcorner)} \equiv\varphi$ for every $L$-sentence $\varphi$.
\end{defn}
\noindent Equally often, a truth-theory for $L$ is called a `definition of truth for $L$'.
So: in sum, a truth-predicate $\mathsf{T}$ is a predicate that applies to (the G\"odel numbers for) true sentences, and so \emph{expresses} truth, and a truth-theory is a theory which \emph{proves} the right $\mathsf{T}$-biconditionals.
%If $T$ is a sound theory with true theorems, then it includes a truth-theory if it has a truth-predicate and can prove that it \emph{is} a truth-predicate.
\subsection{The undefinability of truth} Suppose $T$ is a nice arithmetical theory with language $L$. An obvious question arises: could $T$ be competent to `define' truth \emph{for its own language} (i.e., can $T$ include a truth-theory for $L$)? And the answer is immediate:
\begin{theorem}\label{th:TarskiOne}
No consistent theory $T$ which includes \Q\ can define truth for its own language.
\end{theorem}
\begin{proof}Assume $T$ defines truth for $L$, i.e. there is an $L$-predicate $\mathsf{T(x)}$ such that $T \vdash \mathsf{T(\ulcorner\varphi\urcorner)} \equiv\varphi$ for every $L$-sentence $\varphi$. Since $T$ has the right properties, the Diagonalization Lemma applies; so applying the Lemma to $\neg\mathsf{T(x)}$, we know that there must be some sentence $\mathsf{L}$ -- a Liar sentence\index{Liar paradox}! -- such that
\begin{enumerate}\setcounter{enumi}{0}
\item $T \vdash \mathsf{L} \equiv \neg \mathsf{T(\ulcorner\mathsf{L}\urcorner)}$.
\end{enumerate}
But, by our initial assumption, we also have
\begin{enumerate}\setcounter{enumi}{1}
\item $T \vdash \mathsf{T(\ulcorner\mathsf{L}\urcorner)} \equiv\mathsf{L}$.
\end{enumerate}
Chaining (1) and (2) together gives $T \vdash \mathsf{L} \equiv \neg\mathsf{L}$; and from that biconditional, classical logic derives both $\mathsf{L}$ and $\neg\mathsf{L}$. So $T$ is inconsistent, contrary to hypothesis. Hence our assumption must be wrong: $T$ can't define truth for its own language. \end{proof}
\subsection{The inexpressibility of truth}
That first theorem puts limits on what a nice theory can \emph{prove} about truth. But with very modest extra assumptions, we can put limits on what a theory's language can even \emph{express} about truth.
Consider our old friend $L_A$ for the moment, and suppose that there is an $L_{A}$ truth-predicate $\mathsf{T}_A$ that expresses the corresponding truth property $\mathit{True}_A$. The Diagonalization Lemma applies, in particular to the negation of $\mathsf{T_{\mathit{A}}(x)}$.
So we know that for some $L_A$ sentence $\mathsf{L}$,
\begin{enumerate}
\item \Q\ $\vdash \mathsf{L} \equiv \neg \mathsf{T_{\mathit{A}}(\ulcorner\mathsf{L}\urcorner)}$.
\end{enumerate}
But (and here comes the extra assumption we said we were going to invoke!) everything \Q\ proves is true, since \Q's axioms are of course true and its logic is truth preserving. So
\begin{enumerate}\setcounter{enumi}{1}
\item $\mathsf{L} \equiv \neg \mathsf{T_{\mathit{A}}(\ulcorner\mathsf{L}\urcorner)}$
\end{enumerate}
will also be a true $L_A$ wff. But, by the assumption that $\mathsf{T}_A$ is a truth-predicate for $L_{A}$,
\begin{enumerate}\setcounter{enumi}{2}
\item $\mathsf{T_{\mathit{A}}(\ulcorner\mathsf{L}\urcorner)} \equiv\mathsf{L}$
\end{enumerate}
must be true too. But (2) and (3) together entail that $\mathsf{L} \equiv \neg\mathsf{L}$ is true, which is impossible. Therefore our supposition that $\mathsf{T}_A$ is a truth-predicate has to be rejected. Hence no predicate of $L_A$ can even express the numerical property $\mathit{True}_A$.
The argument evidently generalizes. Take any language $L$ rich enough for us to be able to formulate in $L$ something equivalent to the very elementary arithmetical theory \Q\ (that's so we can prove the Diagonalization Lemma again). Call that an arithmetically adequate language. Then by the same argument, assuming \Q\ is a correct theory,
\begin{theorem}\label{Tarskitheorem}
%\textbf{Theorem~\ref{unprovable}*}\hspace{0.8em}\emph
{No predicate of an arithmetically adequate language $L$ can express the numerical property \emph{True}$_{L}$ (i.e. the property of numbering a truth of $L$).
}
\end{theorem}
%Let's now introduce the numerical property $\mathit{true_{\mathit{T}}}$ which is defined so that $\mathit{true_{\mathit{T}}}(n)$ holds just in case $n$ is the g.n. of a wff of the arithmetic theory $T$ which is true on the standard interpretation. Suppose there were a $T$-predicate
%$\mathsf{T}$ which expresses this property. In other words, suppose (T):\begin{quote}
%%(T)
%For any $T$-wff $\varphi$, $\mathsf{T(\ulcorner\varphi\urcorner)}$ is true if and only if $\varphi$ is true, i.e. $\varphi \equiv \mathsf{T(\ulcorner\varphi\urcorner)}$ is true.
%\end{quote}
%But now apply the Diagonalization Theorem to the predicate $\neg\mathsf{T}$. This tells us that there is a sentence $\mathsf{L}$ (`L' for `Liar'!) such that
%\begin{quote}
%$T \vdash \mathsf{L} \equiv \neg \mathsf{T(\ulcorner\mathsf{L}\urcorner)}$
%\end{quote}
%And trouble is immediate (assuming $T$ is a sound theory). Since $\mathsf{L} \equiv \neg \mathsf{T(\ulcorner\mathsf{L}\urcorner)}$ is a theorem, it is true, given that $T$ is a sound theory and proves only truths. But by (T), $\mathsf{L} \equiv \mathsf{T(\ulcorner\mathsf{L}\urcorner)}$ is also true. So $\mathsf{L} \equiv \neg\mathsf{L}$. Contradiction! So (T) cannot be true.
This tells us that while you can express \emph{syntactic} properties of a sufficiently rich formal theory of arithmetic (like provability) inside the theory itself via \gd\ numbering, you can't express some key \emph{semantic} properties (like arithmetical truth) inside the theory.\index{Tarski,1@Tarski's Theorem|)}
\subsection{A moral}
% -- the proof which, so to speak, belongs in The Book in which God maintains the best proofs for mathematical theorems. But we needn't pause to debate this point of mathematical aesthetics.\footnote{For Paul Erd\H{o}s's conceit of proofs from The Book, see
%\cite{Aig04}.}
Suppose $T$ is a nice theory. Then (1) there are some numerical properties that $T$ can capture (the p.r. ones for a start); (2) there are some properties that $T$ can express but not capture (for example the property of G\"odel-numbering a $T$-theorem -- see the book, \S21.4); and (3) there are some properties that $T$'s language $L$ cannot even express (for example \emph{True}$_{\mathit{L}}$, the numerical property of numbering-a-true-$L$-wff).
It is not, we should hasten to add, that the property \emph{True}$_{\mathit{L}}$ is mysteriously ineffable, and escapes all formal treatment. A richer theory $T'$ with a richer language $L'$ may perfectly well be able to capture \emph{True}$_{\mathit{L}}$. But the point remains that, however rich a given theory of arithmetic is, there will be limitations, not only on what numerical properties it can capture but even on which numerical properties that particular theory's language can express.
\section{The Master Argument}
Our results about the non-expressibility of truth of course point to a particularly illuminating take on the argument for incompleteness.
For example: truth in $L_A$ isn't provability in \PA, because while \PA-provability \emph{is} expressible in $L_A$, truth-in-$L_A$ \emph{isn't}. So assuming that \PA\ is sound and everything provable in it is true, this means that there must be truths of $L_A$ which it can't prove. Similarly, of course, for other nice theories.
And in a way, we might well take this to be \emph{the} Master Argument for incompleteness, revealing the roots of the phenomenon. \gd\ himself wrote (in response to a query)
\begin{quote}
I think the theorem of mine that von Neumann\index{von Neumann, John} refers to is \ldots that a complete epistemological description
of a language A cannot be given in the same language A, because
the concept of truth of sentences in A cannot be defined in A. \emph{It
is this theorem which is the true reason for the existence of
undecidable propositions in the formal systems containing arithmetic.}
I did not, however, formulate it explicitly in my paper of 1931 but
only in my Princeton lectures of 1934. The same theorem was
proved by Tarski in his paper on the concept of truth.\end{quote}
In sum, as we emphasized before, arithmetical truth and provability in this or that formal system must peel apart.
\section{Rosser's Theorem}
\subsection{Rosser's basic trick}
One half of the First Theorem requires the assumption that we are dealing with a theory $T$ which is not only consistent but $\omega$-consistent. But we can improve on this in two different ways:
\begin{enumerate}
\item %In Section~\ref{sec:1consistency},
We can keep the \emph{same} undecidable sentence $\mathsf{G}_T$ while invoking the weaker assumption of so-called `1-consistency' in showing that $T \nvdash \neg\mathsf{G}_T$.
\item Following Barkley Rosser, we can construct a \emph{different} and more complex sentence $\mathsf{R}_T$ such that we only need to assume $T$ is plain consistent in order to show that $\mathsf{R}_T$ is formally undecidable.
\end{enumerate}
Since Rosser's clever construction yields the better result, that's the result we'll talk about here (I say something about 1-consistency in the book).
So how does Rosser construct an undecidable sentence $\mathsf{R}_T$ for $T$? Well, essentially, where \gd\ constructs a sentence $\mathsf{G}_T$ that indirectly says `I am unprovable in $T$', Rosser constructs a `Rosser sentence' $\mathsf{R}_T$ which indirectly says `if I am provable in $T$, then my negation is already provable' (i.e. it says that if there is a proof of $\mathsf{R}_T$ with super g.n.\ $n$, then there is a proof of $\neg\mathsf{R}_T$ with a smaller code number).
\subsection{Implementing the trick}
Consider the relation $\overline{\mathit{Prf}}_T(m, n)$ which holds when $m$ numbers a $T$-proof of the \emph{negation} of the wff with number $n$. This relation is obviously p.r. given that ${\mathit{Prf}}_T$ is; so assuming $T$ has the usual properties, it will be captured by a wff $\overline{\mathsf{Prf}}_T(\mathsf{x, y})$.
So let's consider \emph{the Rosser provability predicate} defined as follows:
\begin{defn}
$\mathsf{RProv_{\mathit{T}}(x) =_{\mathrm{def}} \exists v(Prf_{\mathit{T}}(v,x) \land (\forall w \leq v)\neg\overline{Prf}_{\mathit{T}}(w,x))}$.
\end{defn}
\noindent Then a sentence is Rosser-provable in $T$ -- its g.n. satisfies the Rosser provability predicate -- if it has a proof (in the ordinary sense) and there's no `smaller' proof of its negation.
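\noindent An easy observation is worth the check, for intuition's sake: \emph{assuming $T$ is consistent}, the Rosser provability predicate and the plain provability predicate are true of exactly the same numbers. For if $\varphi$ has a proof with g.n. $m$, consistency guarantees that $\neg\varphi$ has no proof at all, so in particular none with g.n. $\leq m$; hence $\varphi$'s g.n. satisfies $\mathsf{RProv_{\mathit{T}}(x)}$. Conversely, Rosser-provability trivially entails plain provability. The interesting differences between the two predicates only emerge when we ask what $T$ itself can \emph{prove} involving them.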
Now we apply the Diagonalization Lemma, not to the negation of a regular provability predicate (which is what we just did to get \gd's First Theorem again), but to the negation of the Rosser provability predicate. The Lemma then tells us,
\begin{theorem}
If $T$ is p.r. axiomatized and contains \Q, then there is a sentence $\mathsf{R}_T$ such that
$T \vdash \mathsf{R}_T \equiv \neg\mathsf{RProv_{\mathit{T}}}(\ulcorner\mathsf{R}_T\urcorner)$.
\end{theorem}
\noindent We call such a sentence $\mathsf{R}_T$ a Rosser sentence for $T$.
Another semantic incompleteness result is immediate:
\begin{theorem}
If $T$ is a \emph{sound} p.r. axiomatized theory including \Q\ (and because sound therefore consistent), $T \nvdash \mathsf{R}_T$ and $T \nvdash \neg\mathsf{R}_T$, where $\mathsf{R}_T$ is a Rosser sentence for $T$.
\end{theorem}
\begin{proof} Assume $T$ is sound; then its theorems are true, and $\mathsf{R}_T$ is true if and only if it is not Rosser-provable. Suppose $\mathsf{R}_T$ \emph{were} a theorem. Then it would be true, since all theorems are true. So it would not be Rosser-provable, which means that `if $\mathsf{R}_T$ is provable, $\neg \mathsf{R}_T$ is already provable' would be true; and this conditional would also have a true antecedent. We could infer that $\neg \mathsf{R}_T$ is provable. Which would make $T$ inconsistent, contrary to hypothesis. Therefore $\mathsf{R}_T$ is unprovable. Which shows that the material conditional `if $\mathsf{R}_T$ is provable, $\neg \mathsf{R}_T$ is already provable' has a false antecedent, and hence is true. In other words, $\mathsf{R}_T$ is true. Hence its negation $\neg \mathsf{R}_T$ is false, and is therefore unprovable since only truths are provable in a sound theory.
\end{proof}
\noindent As we said, however, in order to show that neither $\mathsf{R}_T$ nor $\neg \mathsf{R}_T$ is provable we do not need the semantic assumption that $T$ is sound. {The syntactic assumption of $T$'s consistency is enough}.
\subsection{Rosser's Theorem}
We can now show that
\begin{theorem}\label{th:diagforrosser}
Let $T$ be a consistent p.r. axiomatized theory including \Q, and let $\rho$ be any fixed point for $\neg\mathsf{RProv_{\mathit{T}}(x)}$.
Then $T \nvdash \rho$ and $T \nvdash \neg\rho$.
\end{theorem}
\noindent And since the Diagonalization Lemma tells us that there is a fixed point, it follows that $T$ has an undecidable sentence $\mathsf{R}_T$, without now requiring $\omega$-consistency. Sadly, however -- and there's no getting away from it -- the proof of Theorem~\ref{th:diagforrosser} is messy and very unpretty. Masochists can check out the proof of Theorem 21.2 in the book (on p. 178). We then have to do more work to beef up that proof idea to show that in fact (as with G\"odel's original proof) we can find a $\Pi_1$ sentence which is undecidable so long as $T$ is consistent (that work is done on p. 179). You do not need to know these proofs! -- just that they exist, so we get Rosser's Theorem (compare Theorem~\ref{firstgodel}):
\begin{restatable}{theorem}{rosserthm}\label{thmrosserthm}
If $T$ is a p.r. adequate, p.r. axiomatized theory whose language includes $L_A$, then
there is a $\Pi_1$ sentence $\varphi$ such that, if $T$ is consistent, then $T \nvdash \varphi$ and $T \nvdash \neg\varphi$.
\end{restatable}
\vspace{12pt}\noindent And that's enough -- at least in these notes -- about the First Incompleteness Theorem. There's quite a bit more in the book, in Chs 19--23, which I'm not going to be covering in lectures. Enthusiasts will want to devour the lot! -- but let me especially highlight the sections which amplify this episode, and then the sections you ought to know about for general logical purposes anyway:
\begin{enumerate}
\item More on the topics in the episode: \S\S19.1--19.3, Ch. 20, \S\S21.1--21.6; browse through Ch. 23.
\item For second-order arithmetics: \S\S22.1--22.6.
\end{enumerate}
%%%%%%%%%%
\newpage
%\setcounter{section}{0}
\setcounter{page}{1}
\setcounter{footnote}{0}
\begin{center}%
{{\Large \emph{G\"odel Without (Too Many) Tears -- 10}}\\[16pt]{\LARGE Introducing the Second Theorem} \par%
\vskip 1.5em%
{\large
\lineskip .75em%
\begin{tabular}[t]{c}%
Peter Smith
\end{tabular}\par}}%
\vskip 0.75em%
{University of Canterbury, Christchurch, NZ}\\[6pt]
{March 30, 2010}%
\vskip 1.5em%
\end{center}%\par
\noindent\hrulefill
\begin{itemize}\setlength{\itemsep}{0pt}
\item $\mathsf{Con}_T$, a canonical consistency sentence for $T$
\item The Formalized First Theorem
\item The Second Theorem
\item Why the Second Theorem matters
\item What it takes to prove it%\item Reflecting on the First Theorem
\end{itemize}
\noindent\hrulefill
\vspace{8pt}\noindent This episode introduces -- at last -- the \emph{Second} Incompleteness Theorem, says just something about why it matters, and about what it takes to prove it.
Just two very quick reminders before we start. We said
\defofPrfcanonPrf*
\defofProv*
\noindent And then recall we proved:
\TprovsGiffnotprovable*
\section{The Second Theorem introduced}
\subsection{Definitional preliminaries}
\noindent We haven't put any requirement on the particular formulation of first-order logic built into \Q\ (and hence any theory which contains it). It may or may not have a built-in absurdity constant. But henceforth, let's use the sign `$\bot$' in the following way:
\begin{defn}
`\,$\bot$\!' is $T$'s built-in absurdity constant if it has one, or else it is an abbreviation for `\,$\mathsf{0 = \overline{1}}\!$'.
\end{defn}
\noindent If $T$ contains \Q, $T$ of course proves $\mathsf{0 \neq \overline{1}}$. So on either reading of `$\bot$', if $T$ proves $\bot$, it is inconsistent. And if $T$'s logic is standard and it is inconsistent, then it will prove $\bot$.
Now consider the wff $\neg\mathsf{Prov}_T(\overline{\ulcorner{\bot}\urcorner})$. That is true if and only if $T$ \emph{doesn't} prove $\bot$, i.e. (given what we've just said) if and only if $T$ is consistent. That evidently motivates the definition
\begin{defn}
$\mathsf{Con}_T$ abbreviates $\neg\mathsf{Prov}_T(\overline{\ulcorner{\bot}\urcorner})$.
\end{defn}
\noindent Note by the way that since $\mathsf{Prov}_T$ is $\Sigma_1$, $\mathsf{Con}_T$ is $\Pi_1$. For obvious reasons, the arithmetic sentence $\mathsf{Con}_T$ is called a canonical consistency sentence for $T$.
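To spell out that classification: unpacking the abbreviations,
\begin{quote}
$\mathsf{Con}_T = \neg\mathsf{Prov}_T(\overline{\ulcorner{\bot}\urcorner}) = \mathsf{\neg\exists v\,Prf_{\mathit{T}}(v, \overline{\ulcorner{\bot}\urcorner})}$,
\end{quote}
\noindent which is logically equivalent to $\mathsf{\forall v\,\neg Prf_{\mathit{T}}(v, \overline{\ulcorner{\bot}\urcorner})}$. Since $\mathsf{Prf}_T$ captures a p.r. relation, this is $\Pi_1$ by just the same reasoning that told us $\neg\mathsf{Prov}_T$ is $\Pi_1$.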
Or at least, $\mathsf{Con}_T$ is the crispest definition of a consistency sentence for $T$. There are alternatives. Here's another pretty natural one. Suppose $\mathsf{Contr(x, y)}$ captures the p.r. relation which holds between two numbers when they code for a contradictory pair of sentences, i.e. one codes for some sentence $\varphi$ and the other for $\neg\varphi$. Then we could put
\begin{defn}\label{seconddefnforCon}
$\mathsf{Con}'_T =_{\mathrm{def}} \neg\mathsf{\exists x\exists y(Prov_{\mathit{T}}(x) \land Prov_{\mathit{T}}(y) \land Contr(x,y))}$.
\end{defn}
\noindent But, on modest assumptions, this sort of definition and its variants are equivalent: so we'll stick to the crisp one.
%From now on, we will again usually drop subscripts if confusion won't ensue.
%A normal theory $T$, recall, is one that includes Robinson Arithmetic, but need be no stronger. But henceforth, we are going to be more interested in theories which are at least a bit stronger. We'll say that a
%\begin{defn}
%A theory $T$ is $\Sigma$-normal if it is normal and has induction for $\Sigma_1$-predicates.
%\end{defn}
%$\mathsf{PA}$, and any stronger theory, is of course $\Sigma$-normal. The interest of such theories is that they can use induction to prove general truths involving $\Sigma_1$-predicates like $\mathsf{Prf}$ and $\mathsf{Diag}$.
%\begin{defn}
%$\mathsf{I\Sigma}_1$ is the weakest $\Sigma$-normal theory, i.e. it is the result of adding all instances of induction for $\Sigma_1$-predicates to $\mathsf{Q}$.
%\end{defn}
\subsection{The Formalized First Theorem}
One half of the First Theorem tells us that, for suitable $T$ (nicely axiomatized, containing \Q),
\begin{enumerate}\renewcommand{\labelenumi}{(\arabic{enumi})}
\item If $T$ is consistent then $\mathsf{G}_T$ is not provable in $T$.
\end{enumerate}
(Remember, we only need the idea of $\omega$-consistency for the other half of the First Theorem). Now, we can faithfully express (1) inside $T$ itself by
\begin{enumerate}\renewcommand{\labelenumi}{(\arabic{enumi})}\setcounter{enumi}{1}
\item $\mathsf{Con}_T \to \neg\mathsf{Prov}_T(\overline{\ulcorner{\mathsf{G}_T}\urcorner})$.
\end{enumerate}
But now reflect that the informal reasoning for the First Theorem is in fact rather elementary (we needed no higher mathematics at all, just simple reasoning about arithmetic matters). So we might well expect that if $T$ contains enough arithmetic, it should itself be able to replicate that elementary reasoning.
In other words, if $T$ is strong enough, then $T$ can not only express (one half of) the First Theorem via the wff abbreviated (2), but should be able to prove it too! -- so we'd hope to have
\begin{restatable}{theorem}{formalizedFirstThm}\label{th:formalizedFirstThm}%\renewcommand{\labelenumi}{(\arabic{enumi})}\setcounter{enumi}{2}
For strong enough $T$, $T \vdash \mathsf{Con}_T \to \neg\mathsf{Prov}_T(\overline{\ulcorner{\mathsf{G}_T}\urcorner})$.
\end{restatable}
\noindent Call such a result the \emph{Formalized First Theorem} for the relevant provability predicate. Suppose for now that we can indeed prove such a result.
\subsection{The unprovability of consistency}
We've just reminded ourselves of Theorem~\ref{th:TprovsGiffnotprovable} which says that $T \vdash \mathsf{G}_T \leftrightarrow \neg\mathsf{Prov}_T(\overline{\ulcorner{\mathsf{G}_T}\urcorner})$. So putting that together with Theorem~\ref{th:formalizedFirstThm}, we get
\begin{enumerate}\renewcommand{\labelenumi}{(\arabic{enumi})}\setcounter{enumi}{2}
\item if $T \vdash \mathsf{Con}_T$, then $T \vdash \mathsf{G}_T$.
\end{enumerate}
But we know from the First Theorem that,
\begin{enumerate}\renewcommand{\labelenumi}{(\arabic{enumi})}\setcounter{enumi}{3}
\item If $T$ is consistent, $T \nvdash \mathsf{G}_T$.
\end{enumerate}
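\noindent In more detail, the little argument runs:
\begin{quote}
Suppose for reductio that $T$ is consistent but $T \vdash \mathsf{Con}_T$. Then by (3), $T \vdash \mathsf{G}_T$. But by (4), since $T$ is consistent, $T \nvdash \mathsf{G}_T$. Contradiction. So no consistent (strong enough) $T$ proves $\mathsf{Con}_T$.
\end{quote}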
So the Formalized First Theorem immediately yields the unprovability of the relevant consistency sentence. %Restoring the subscript we have
\begin{theorem}\label{th:vaguesecondtheorem}
For strong enough $T$, if $T$ is consistent, then $T \nvdash \mathsf{Con}_T$.
\end{theorem}
\noindent Which is a somewhat vague version of the Second Incompleteness Theorem: roughly, for the right kind of theory $T$ and the right kind of consistency sentence, $T$ can't prove its own consistency sentence.
Obviously, we need to say something about what counts as a `strong enough $T$'; but our vague statement will do as a very first introduction. Indeed, this is about as much as G\"odel says in his original 1931 paper where he too didn't spell out the details.
%(Actually, it can be proved by a different route that $\mathsf{Q} \nvdash\mathsf{Con}_{\mathsf{Q}}$: so we can weaken the condition for the Second Theorem to apply. But the point is of limited interest, and we'll say no more about it.)
\section{How interesting is the Second Theorem?}
You might well think: `OK, so we can't derive $\mathsf{Con}_T$ in $T$. But that fact is of course no evidence at all \emph{against} $T$'s consistency, since we already know from the First Theorem that various true claims about unprovability -- like the standardly constructed G\"odel sentence $\mathsf{G}_T$ -- will be underivable in $T$. On the other hand, if -- \emph{per impossibile} -- we could have given a $T$ proof of $\mathsf{Con}_T$, that wouldn't have given us any special evidence \emph{for} $T$'s consistency: we could simply reflect that even if $T$ were inconsistent we'd still be able to derive $\mathsf{Con}_T$, since we can derive \emph{anything} in an inconsistent theory! Hence the derivability or otherwise of a canonical statement of $T$'s consistency {inside $T$ itself} can't show us a great deal.'
But, on reflection, the Theorem \emph{does} yield some plainly important and substantial corollaries, of which the most important is this:
\begin{theorem}
Suppose $S$ is a consistent theory strong enough for the Second Theorem to apply to it, and $W$ is a fragment of $S$. Then $W \nvdash \mathsf{Con}_S$.
\end{theorem}
\noindent That's because, if the whole of $S$ can't prove $\mathsf{Con}_S$, then a fortiori a mere \emph{part} of $S$ can't prove it.
So, for example, we \emph{can't} take some problematic rich theory like set theory which extends arithmetic and show that it is consistent by (i) using arithmetic coding for talking about its proofs and then (ii) using uncontentious reasoning already available in some relatively {weak}, purely arithmetical, theory.
Which means that the Second Theorem -- at least at first blush -- sabotages Hilbert's Programme (see \S\ref{secwhatisHilbertsProgram} in these notes, and the longer discussions in \emph{IGT}).
\section{What does it take to prove the Second Theorem?}
\subsection{Sharpening the Second Theorem}
$\mathsf{Prov}_T(\mathsf{y})$ abbreviates $\mathsf{\exists v\,Prf_{\mathit{T}}(v,y)}$; and $\mathsf{Prf_{\mathit{T}}}$ is a $\Sigma_1$ expression. So arguing about provability inside $T$ will involve establishing some general claims involving $\Sigma_1$ expressions. And how do we prove quantified claims? Using induction is the default method.
It therefore looks quite a good bet that $T$ will itself be able to prove (the relevant half of) the First Theorem for $T$, i.e. we will have $T \vdash \mathsf{Con}_T \to \neg\mathsf{Prov}_T(\overline{\ulcorner{\mathsf{G}_T}\urcorner})$, if $T$ has $\Sigma_1$-induction -- meaning that $T$'s axioms include (the universal closures of) all instances of the first-order Induction Schema where the induction predicate $\varphi$ is no more complex than \mbox{$\Sigma_1$}. So let's define
\begin{defn}
A theory is $\Sigma$-normal if it is p.r. axiomatized, contains \Q, and also includes induction at least for $\Sigma_1$ wffs.
\end{defn}
\noindent Then the following looks like a plausible conjecture:
\begin{theorem}\label{th:ifsigmathenderivability}
If $T$ is $\Sigma$-normal, then $T$ proves the formalized First Theorem, and so if $T$ is consistent, $T \nvdash \mathsf{Con}_T$.
\end{theorem}
\noindent (Warning: `$\Sigma$-normal' is my shorthand: there seems to be no standard term here.) That sharpens our vaguely stated Theorem~\ref{th:vaguesecondtheorem}; and this better version is indeed provable. We won't give a full proof here, however; in the rest of the episode, we'll say something about how the details get filled in.
\subsection{The box notation}
To improve readability, let's introduce some notation. We will henceforth \mbox{abbreviate} $\mathsf{Prov_{\mathit{T}}(\overline{\ulcorner\varphi\urcorner}})$ simply by $\Box_T\varphi$. {If you are familiar with modal logic, then you will immediately recognize the conventional symbol for the necessity operator. And the parallels and differences between `{``$\mathsf{1 + 1 = 2}$''} is provable (in $T$)' and `It is necessarily true that $1 + 1 = 2$' are highly suggestive. These parallels and differences are the topic of `provability logic', the subject of a contemporary classic, Boolos's \emph{The Logic of Provability}.}
So in particular, $\neg\mathsf{Prov}_T(\overline{\ulcorner{\mathsf{G}_T}\urcorner})$ can be abbreviated $\neg\Box_T\mathsf{G}_T$. Thus in our new notation, the Formalized First Theorem is $T \vdash \mathsf{Con}_T \to \neg\Box_T\mathsf{G}_T$. Moreover, $\mathsf{Con}_T$ can now alternatively be abbreviated as $\neg\Box_T\bot$.
However, we will very often drop the explicit subscript on the box symbol (and elsewhere), and let context supply it.
\subsection{The `Derivability Conditions' and a tiny amount of history}
First a standard definition: we will say (dropping subscripts)
\begin{restatable}{defn}{derivabilityconditions}
The derivability conditions hold in $T$ if and only if, for any $T$-sentences $\varphi$, $\psi$,\begin{enumerate}\renewcommand{\theenumi}{C\arabic{enumi}}\setlength{\itemsep}{0pt}
\item If $T \vdash \varphi$, then $T \vdash \Box\varphi$,
\item $T \vdash \Box\mathsf{(\varphi \lif \psi)} \lif (\Box\varphi \lif \Box\psi)$,
\item $T \vdash \Box\varphi \lif \Box\Box\varphi$.
\end{enumerate}
\end{restatable}
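\noindent It is worth noting one immediate consequence of these conditions, a pattern we will in effect use repeatedly below: if $T \vdash \varphi \lif \psi$, then $T \vdash \Box\varphi \lif \Box\psi$. For:
\begin{tabbing}
\hspace{2.5em}\= \hspace{0.2em} \= \hspace{1cm} \= \hspace{1cm}\hspace{5.5cm}\= \kill
\>1.\' \>$T \vdash \varphi \lif \psi$\>\>Given\\
\>2.\' \>$T \vdash \Box(\varphi \lif \psi)$\>\>From 1, given C1\\
\>3.\' \>$T \vdash \Box\varphi \lif \Box\psi$\>\>From 2, given C2
\end{tabbing}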
\noindent We can then prove
\begin{restatable}{theorem}{sigmanormalimpliesderivability}\label{thm:sigmanormalimpliesderivability}
If $T$ is $\Sigma$-normal, then the derivability conditions hold for $T$.
\end{restatable}
\begin{restatable}{theorem}{deriventailfirst}\label{thm:deriventailfirst}
If $T$ is $\Sigma$-normal and the derivability conditions hold for $T$, then $T$ proves the formalized First Theorem.
\end{restatable}
\noindent These two theorems together evidently entail Theorem~\ref{th:ifsigmathenderivability}, so we can concentrate on them. Now, proving Theorem~\ref{thm:sigmanormalimpliesderivability} in detail is a seriously tedious task, and I don't propose to do it here, nor do I do it in the current edition of \emph{IGT} -- though I do outline proof-sketches there in \S\S26.1--26.3. But we \emph{will} here prove Theorem~\ref{thm:deriventailfirst}.
But first, a \emph{very} small amount of history. As we remarked before, \gd\ himself didn't prove the Formalized First Theorem for his particular formal theory $P$. The hard work was first done for a different theory by David Hilbert and Paul Bernays\index{Bernays, Paul} in their \emph{Grundlagen der Mathematik} of 1939: the details of their proof are in fact due to Bernays, who had discussed it with \gd\ during a transatlantic voyage.
Now, Hilbert and Bernays helpfully isolated `derivability conditions' on the predicate $\mathsf{Prov}_T$, conditions whose satisfaction is indeed enough for a theory $T$ to prove the Formalized First Theorem. Later, Martin H. L\"ob gave a rather neater version of these conditions: and it is his version which has become standard and features in our definition above. The derivability conditions are consequently sometimes called the HBL (Hilbert--Bernays--L\"ob) conditions.
\subsection{Deriving the Formalized First Theorem}
As announced, we'll now prove Theorem~\ref{thm:deriventailfirst}. We assume $T$ is $\Sigma$-normal (in fact, all we need is that it is p.r. axiomatized and contains \Q), and that the derivability conditions hold, and aim to show (dropping subscript `$T$'s) $T \vdash \mathsf{Con} \to \neg\Box\mathsf{G}$.
\begin{proof}
First, since $T$ is p.r. axiomatized and contains \Q, Theorem~\ref{th:TprovsGiffnotprovable} holds. So, in our new symbolism, we have $T \vdash \;\mathsf{G} \leftrightarrow \neg \Box\mathsf{G}$.
Second, note that for any theory $T$ containing \Q, $T \vdash \neg\bot$ (either by the built-in logic, or because we've put $\bot =_{\mathrm{def}} \mathsf{0 = 1}$). And simple logic will show that, for any wff $\varphi$, we have \begin{tabbing}
\hspace{2.5em}\= \hspace{0.2em} \= \hspace{1cm} \= \hspace{1cm}\hspace{5.5cm}\= \kill
\>\>$T \vdash\neg\varphi \lif (\varphi \lif \bot)$.
\end{tabbing}
Given the latter and the derivability condition (C1), this means
\begin{tabbing}
\hspace{2.5em}\= \hspace{0.2em} \= \hspace{1cm} \= \hspace{1cm}\hspace{5.5cm}\= \kill
\>\>$T \vdash\Box(\neg\varphi \lif (\varphi \lif \bot))$.
\end{tabbing}
So given the derivability condition (C2) and using modus ponens, it follows that for any $\varphi$\begin{tabbing}
\hspace{2.5em}\= \hspace{0.2em} \= \hspace{1cm} \= \hspace{1cm}\hspace{5.5cm}\= \kill
\>A.\'\>$T \vdash \Box\neg\varphi\,\lif\, \Box(\varphi\lif \bot)$.\>\end{tabbing}
We now argue as follows:
\begin{tabbing}
\hspace{2.5em}\= \hspace{0.2em} \= \hspace{1cm} \= \hspace{1cm}\hspace{5.5cm}\= \kill
\>1.\' \>$T \vdash \mathsf{G \lif \neg\Box{G}}$\>\>Half of Thm~\ref{th:TprovsGiffnotprovable}\\
\>2.\' \>$T \vdash \Box(\mathsf{G \lif \neg\Box{G}})$\>\>From 1, given C1 \\
\>3.\' \>$T \vdash \Box\mathsf{G} \lif \Box\neg\Box\mathsf{G}$\>\>From 2, given C2\\
\>4.\' \>$T \vdash \Box\neg\Box\mathsf{G} \,\lif\, \Box(\Box\mathsf{G} \lif \bot)$\>\>Instance of A\\
\>5.\' \>$T \vdash \Box\mathsf{G}\lif \Box(\Box\mathsf{G} \lif \bot)$\>\>From 3 and 4\\
\>6.\' \>$T \vdash \Box\mathsf{G}\lif (\Box\Box\mathsf{G} \lif \Box\,\bot)$\>\>From 5, given C2\\
\>7.\' \>$T \vdash \Box\mathsf{G} \lif \Box\Box\mathsf{G}$\>\>Instance of C3\\
\>8.\' \>$T \vdash \Box\mathsf{G}\lif \Box\,\bot$\>\>From 6 and 7\\
\>9.\' \>$T \vdash \neg\Box\,\bot \lif \neg\Box\mathsf{G}$ \>\>Contraposing\\
\>10.\' \>$T \vdash \mathsf{Con} \lif \neg\Box\mathsf{G}$ \>\>Definition of $\mathsf{Con}$\hspace{1.2cm} %\\
%\>12. \>$\mathsf{Con} \lif \mathsf{G}$ \>\>From 1 and 11
\end{tabbing}
This gives us the Formalized First Theorem (and hence, as before, the Second Incompleteness Theorem).
\end{proof}
%%%%%%%%%%
\newpage
%\setcounter{section}{0}
\setcounter{page}{1}
\setcounter{footnote}{0}
\begin{center}%
{{\Large \emph{G\"odel Without (Too Many) Tears -- 11}}\\[16pt]{\LARGE L\"ob's Theorem and other excitements} \par%
\vskip 1.5em%
{\large
\lineskip .75em%
\begin{tabular}[t]{c}%
Peter Smith
\end{tabular}\par}}%
\vskip 0.75em%
{University of Canterbury, Christchurch, NZ}\\[6pt]
{March 31, 2010}%
\vskip 1.5em%
\end{center}%\par
\noindent\hrulefill
\begin{itemize}\setlength{\itemsep}{0pt}
\item Curry's Paradox
\item L\"ob's Theorem
\item L\"ob's Theorem implies the Second Incompleteness Theorem again
\item $\mathsf{Con}_T$ as an undecidable sentence
\item Consistent theories that `prove' their own inconsistency
\item What's still to come \ldots
\end{itemize}
\noindent\hrulefill
\vspace{8pt}\noindent Let's just restate a key definition from the last episode, which we'll be repeatedly using again. \derivabilityconditions*
\section{L\"ob's Theorem}
\subsection{Curry's Paradox}
Let's start by visiting an embroidery of an argument known as Curry's Paradox, though in fact it has medieval ancestry.\footnote{For a much simpler version, see \textsf{http://en.wikipedia.org/wiki/Curry's$\_$paradox}: the version I'm giving involves more palaver, in order to give something parallel to the proof of L\"ob's Theorem.}
The following seem compelling principles about truth:
\begin{enumerate}\renewcommand{\theenumi}{T\arabic{enumi}}\setlength{\itemsep}{0pt}\setcounter{enumi}{-1}
\item If $\varphi$ is true, then $\varphi$,
\item If $\varphi$ then $\varphi$ is true,
\item If $\varphi \lif \psi$ is true, then if $\varphi$ is true, $\psi$ is true too,
\item If $\varphi$ is true, then it is true that $\varphi$ is true.
\end{enumerate}
or in symbols, using $\mathsf{T}$ as a truth-predicate,
\begin{enumerate}\renewcommand{\theenumi}{T\arabic{enumi}}\setlength{\itemsep}{0pt}\setcounter{enumi}{-1}
\item If $\mathsf{T}\varphi$, then $\varphi$,
\item If $\varphi$ then $\mathsf{T}\varphi$,
\item If $\mathsf{T}(\varphi \lif \psi)$, then if $\mathsf{T}\varphi$, then $\mathsf{T}\psi$,
\item If $\mathsf{T}\varphi$, then $\mathsf{T}\mathsf{T}\varphi$.
\end{enumerate}
Let $\varphi$ stand in for any sentence you like -- as it might be `The moon is made of green cheese'. Using our principles about truth, we can now show that $\varphi$ is true!
For it seems -- doesn't it? -- that we can construct a sentence that says e.g. `If I am true, then the moon is made of green cheese' (after all, I've just constructed it!). Let's abbreviate that simply $\delta$. Then $\delta$ says that if $\delta$ is true, then $\varphi$. So in symbols we have
\begin{tabbing}
\hspace{3em}\= \hspace{1em} \= \hspace{1cm} \= \hspace{1cm}\hspace{6cm}\= \kill
\>1.\' \>$\mathsf{\delta \equiv (\mathsf{T} \delta \lif \varphi)}$.
\end{tabbing}
And now we can argue as follows:
\begin{tabbing}
\hspace{3em}\= \hspace{1em} \= \hspace{1cm} \= \hspace{1cm}\hspace{6cm}\= \kill
\>2.\' \>$\mathsf{T}\varphi \lif \varphi$\>\>From T0\\
\>3.\' \>$\mathsf{\delta \lif (\mathsf{T} \delta \lif \varphi)}$\>\>From 1\\
\>4.\' \>$\mathsf{T}\mathsf{(\delta \lif (\mathsf{T} \delta \lif \varphi))}$\>\>From 3, by T1\\
\>5.\' \>$\mathsf{T}\mathsf{\delta \lif \mathsf{T}(\mathsf{T} \delta \lif \varphi)}$\>\>From 4, by T2 \\
\>6.\' \>$\mathsf{T}\mathsf{\delta \lif (\mathsf{T}\mathsf{T} \delta \lif \mathsf{T} \varphi)}$\>\>From 5, by T2\\
\>7.\' \>$\mathsf{T}\mathsf{\delta \lif \mathsf{T}\mathsf{T} \delta}$\>\>By T3\\
\>8.\' \> $\mathsf{T}\mathsf{\delta \lif \mathsf{T} \varphi}$\>\>From 6 and 7\\
\>9.\' \> $\mathsf{T}\mathsf{\delta \lif \varphi}$\>\>From 2 and 8\\
\>10.\'\> $\mathsf{\delta}$\>\>From 1 and 9\\
\>11.\'\> $\mathsf{T}\mathsf{\delta}$\>\>From 10, by T1\\
\>12.\'\> $\mathsf{\varphi}$\>\>From 9 and 11\end{tabbing}
Wow!
Well, it seems that the rabbit must have got smuggled into the hat when we accepted that there could be such a sentence as $\delta$ satisfying (1). What this shows is that it isn't just self-reference-combined-with-negation that causes problems (as in the Liar Paradox). In other words, just trying to box clever with ideas about negation, the distinction between being not true and being false, etc., won't resolve all paradoxes in the same general family.
\subsection{Proving L\"ob's Theorem}
What G\"odel saw, in the broadest terms, is that when we move from talking about truth to talking about provability, Liar-style reasoning leads not to paradox but a theorem. Much later, L\"ob spotted that, similarly, when we move from talking about truth to talking about provability, Curry-style reasoning again leads not to paradox but a theorem. Here's how.
Again let $\varphi$ be any sentence, and consider the wff $\mathsf{(Prov(z) \lif \varphi)}$. Assuming the general Diagonalization Lemma (Theorem~\ref{th:diag}) applies to $T$, we'll have the particular case that there is a $\delta$ such that $T$ can prove $\delta \leftrightarrow \mathsf{(Prov(\overline{\ulcorner{\delta}\urcorner}) \lif \varphi)}$.
Hence, in our new notation, we'll have
\begin{tabbing}
\hspace{3em}\= \hspace{1em} \= \hspace{1cm} \= \hspace{1cm}\hspace{6cm}\= \kill
\>1.\' \>$T \vdash \mathsf{\delta \equiv (\Box \delta \lif \varphi)}$.\>\>By Thm~\ref{th:diag}
\end{tabbing}
And now we can argue as follows, exactly parallel to our argument for Curry's Paradox, after the initial supposition:
\begin{tabbing}
\hspace{3em}\= \hspace{1em} \= \hspace{1cm} \= \hspace{1cm}\hspace{6cm}\= \kill
\>2.\' \>$T \vdash \Box\varphi \lif \varphi$ \>\>Supposition\\
\>3.\' \>$T \vdash \mathsf{\delta \lif (\Box \delta \lif \varphi)}$\>\>From 1\\
\>4.\' \>$T \vdash \Box\mathsf{(\delta \lif (\Box \delta \lif \varphi))}$\>\>From 3, by C1\\
\>5.\' \>$T \vdash \Box\mathsf{\delta \lif \Box(\Box \delta \lif \varphi)}$\>\>From 4, by C2 \\
\>6.\' \>$T \vdash \Box\mathsf{\delta \lif (\Box\Box \delta \lif \Box \varphi)}$\>\>From 5, by C2\\
\>7.\' \>$T \vdash \Box\mathsf{\delta \lif \Box\Box \delta}$\>\>By C3\\
\>8.\' \>$T \vdash \Box\mathsf{\delta \lif \Box \varphi}$\>\>From 6 and 7\\
\>9.\' \>$T \vdash \Box\mathsf{\delta \lif \varphi}$\>\>From 2 and 8\\
\>10.\' \>$T \vdash \mathsf{\delta}$ \>\>From 1 and 9\\
\>11.\' \>$T \vdash \Box\mathsf{\delta}$\>\>From 10, by C1\\
\>12.\' \>$T \vdash \mathsf{\varphi}$ \>\>From 9 and 11\end{tabbing}
So, discharging the assumption at line (2) we have
\begin{theorem}\label{Lobstheorem}
If $T$ is p.r. axiomatized, contains \Q, and the derivability conditions hold, then if $T \vdash \:\Box\varphi \lif \varphi$ then $T \vdash \varphi$.
\end{theorem}
\noindent This is L\"ob's Theorem. This is a bit of a surprise: you might have expected that a theory will `think' quite generally that if it can prove $\varphi$ then it's true, i.e. in general $T \vdash \:\Box\varphi \lif \varphi$. But not so. A respectable theory $T$ can only prove this if it in fact can already show that $\varphi$.
\subsection{L\"ob's Theorem answers a question of Henkin's}
By the Diagonalization Lemma applied to the unnegated wff $\mathsf{Prov(z)}$, there is a sentence $\mathsf{H}$ such that $T \vdash \mathsf{H} \equiv \mathsf{Prov}(\overline{\ulcorner \mathsf{H} \urcorner})$ -- i.e., we can use diagonalization again to construct a sentence $\mathsf{H}$ which `says' that it is provable. Henkin asked: \emph{is} $\mathsf{H}$ provable?
It is. For by hypothesis, $T \vdash \mathsf{Prov(\ulcorner H \urcorner) \lif H}$, i.e. $T \vdash \Box\mathsf{H \lif H}$; so $T \vdash \mathsf{H}$ by L\"ob's Theorem.
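For the record, we can set this two-step argument out in our derivation format:
\begin{tabbing}
\hspace{3em}\= \hspace{1em} \= \hspace{1cm} \= \hspace{1cm}\hspace{6cm}\= \kill
\>1.\' \>$T \vdash \mathsf{H} \equiv \Box\mathsf{H}$\>\>By Thm~\ref{th:diag}\\
\>2.\' \>$T \vdash \Box\mathsf{H} \lif \mathsf{H}$\>\>From 1\\
\>3.\' \>$T \vdash \mathsf{H}$\>\>From 2, by L\"ob's Theorem
\end{tabbing}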
\subsection{L\"ob's Theorem implies the Second Incompleteness Theorem}
The argument is swift! Assume the conditions for L\"ob's Theorem apply. Then as a special case we get that if $T \vdash \Box\mathsf{\bot} \lif \mathsf{\bot}$ then $T \vdash \mathsf{\bot}$. Hence if $T \nvdash \mathsf{\bot}$ -- i.e. if $T$ is consistent -- then $T \nvdash \Box\mathsf{\bot} \lif \mathsf{\bot}$, hence $T \nvdash \neg\Box\mathsf{\bot}$, hence $T \nvdash \mathsf{Con}$.
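We can set this out in the style of our earlier derivations, making explicit the small logical step that $\neg\Box\,\bot$ entails $\Box\,\bot \lif \bot$:
\begin{tabbing}
\hspace{3em}\= \hspace{1em} \= \hspace{1cm} \= \hspace{1cm}\hspace{6cm}\= \kill
\>1.\' \>If $T \vdash \Box\,\bot \lif \bot$, then $T \vdash \bot$\>\>L\"ob's Theorem, with $\varphi = \bot$\\
\>2.\' \>$T \nvdash \bot$\>\>Supposition: $T$ is consistent\\
\>3.\' \>$T \nvdash \Box\,\bot \lif \bot$\>\>From 1 and 2\\
\>4.\' \>$T \vdash \neg\Box\,\bot \lif (\Box\,\bot \lif \bot)$\>\>Logic\\
\>5.\' \>$T \nvdash \neg\Box\,\bot$, i.e. $T \nvdash \mathsf{Con}$\>\>From 3 and 4
\end{tabbing}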
\section{Other excitements}
We'll assume throughout this section that $T$ is $\Sigma$-normal and so that the derivability conditions hold.
\subsection{$\mathsf{G}_T$ and $\mathsf{Con}_T$ are provably equivalent in $T$}
We know from Theorem~\ref{thm:deriventailfirst} that $T \vdash \mathsf{Con} \to \neg\Box\mathsf{G}$ (suppressing subscripts!).
Now note the following result which says that $T$ knows that, \emph{if} it can't prove $\varphi$, it must be consistent.
\begin{theorem}\label{th:notprovphientailsCon}
For any sentence $\varphi$, $T \vdash\neg\Box\mathsf{\varphi} \;\lif\; \mathsf{Con}$.
\end{theorem}
\begin{proof}We argue as follows:
\begin{tabbing}
\hspace{2.5em}\= \hspace{0.2em} \= \hspace{1cm} \= \hspace{1cm}\hspace{5.5cm}\= \kill
\>1.\' \>$T \vdash \mathsf{\bot \lif \varphi}$ \>\>Logic!\\
\>2.\' \>$T \vdash\Box(\mathsf{\bot \lif \varphi})$\>\>From 1, given C1\\
\>3.\' \>$T \vdash \Box\,\bot\;\lif\; \Box\mathsf{\varphi}$\>\>From 2, given C2 \\
\>4.\' \>$T \vdash\neg\Box\mathsf{\varphi} \;\lif\; \neg\Box\bot$\>\>Contraposing\\
\>5.\' \>$T \vdash\neg\Box\mathsf{\varphi} \;\lif\; \mathsf{Con}$\>\>Definition of $\mathsf{Con}$\hspace{1.84em}
\end{tabbing}
\end{proof}
\noindent So, since $T$ can't prove $\mathsf{Con}$, $T$ doesn't prove $\neg\Box\mathsf{\varphi}$ for \emph{any} $\varphi$ at all. Hence $T$ never `knows' that it can't prove $\varphi$, even when it can't.
\noindent In sum, suppose that $T$ is the usual sort of theory: by (C1), $T$ knows all about what it \emph{can} prove; but we've now shown that it knows nothing about what it \emph{can't} prove.
Now, as a particular instance of this, we have $T \vdash\neg\Box\mathsf{G} \;\lif\; \mathsf{Con}$. So putting that together with Theorem~\ref{thm:deriventailfirst}, we have $T \vdash \mathsf{Con} \leftrightarrow \neg\Box\mathsf{G}$. And now combine \emph{that} with Theorem~\ref{th:TprovsGiffnotprovable}, which tells us that $T \vdash \mathsf{G} \leftrightarrow \neg\Box\mathsf{G}$, and lo and behold we've shown
\begin{theorem}\label{th:ConequivtoG}
If $T$ is $\Sigma$-normal then $T \vdash \mathsf{Con} \leftrightarrow \mathsf{G}$.
\end{theorem}
Now, not only do we have $T \nvdash \mathsf{Con}$, we also have (assuming $T$ is $\omega$-consistent) $T \nvdash \neg\mathsf{Con}$. In other words, $\mathsf{Con}$ is formally undecidable by $T$.
But $\mathsf{Con}$ is \emph{not} self-referential in any way, however loosely interpreted. That observation should scotch once and for all any lingering suspicion that the incompleteness phenomena are somehow inevitably tainted by self-referential paradox.
\subsection{Theories that `prove' their own inconsistency}\label{sec:theoriesproveowninconsistency}
An $\omega$-consistent $T$ can't prove $\neg\mathsf{Con}_T$, as we've just noted. By contrast, a consistent but $\omega$\emph{-inconsistent} $T$ might well have $\neg\mathsf{Con}_T$ as a theorem!
The proof is pretty trivial, once we note a simple lemma. Suppose $S$ and $R$ are two p.r. axiomatized theories, which share a deductive logic; and suppose every axiom of the simpler theory $S$ is also an axiom of the richer theory $R$. Evidently, if the richer $R$ is consistent, then the simpler $S$ must be consistent too. And the arithmetical claim that encodes this fact can be formally proved. Contraposing,
\begin{theorem} Under the given conditions, $\vdash \neg \mathsf{Con}_S \lif \neg\mathsf{Con}_R$.
\end{theorem}
\begin{proof}Suppose $\neg\mathsf{Con}_S$, i.e. suppose $\exists \mathsf{v}\,\mathsf{Prf}_S(\mathsf{v}, \bot)$. Hence for some $\mathsf{a}$, $\mathsf{Prf}_S(\mathsf{a}, \bot)$. And that implies $\mathsf{Prf}_R(\mathsf{a}, \bot)$. Why? Because the difference between the unpacked definitions of $\mathsf{Prf}_S$ and $\mathsf{Prf}_R$ -- the definitions which formally reflect what counts as (the code for) an $S$ proof and an $R$ proof -- will just be that the latter needs some more disjuncts to allow for the extra axioms that can be invoked in an $R$ proof. So it follows that $\exists \mathsf{v}\,\mathsf{Prf}_R(\mathsf{v}, \bot)$, i.e. $\neg\mathsf{Con}_R$. And the inferences here only use first-order logic.\end{proof}
Now let's put that theorem to use. Take the simpler theory $S$ to be \PA\ (which is $\Sigma$-normal!). And let the richer theory $R$ be \PA\ augmented by the extra axiom $\neg\mathsf{Con_{PA}}$.
By definition, $R \vdash\neg\mathsf{Con_{PA}}$. So using our lemma we can conclude $R \vdash\neg\mathsf{Con}_R$. $R$ is $\omega$-inconsistent (why? because $R \vdash\neg\mathsf{Con}_R$ entails $R \vdash\neg\mathsf{G}_R$, which wouldn't be the case if $R$ were $\omega$-consistent). But it is consistent if \PA\ is (why? because if $R$ proved a contradiction, then by reductio \PA\ $\vdash \mathsf{Con_{PA}}$, and we know from the Second Theorem that then \PA\ would be inconsistent). So, \begin{theorem}
Assuming \PA\ is consistent, the theory $R$ = \PA\ $+\ \neg\mathsf{Con_{PA}}$
is a consistent theory which `proves' its own inconsistency.
\end{theorem}
\noindent And since $R$ proves $\neg\mathsf{Con}_R$,
\begin{theorem}
The \emph{consistent} theory $R$ is such that $R + \mathsf{Con}_R$ is \emph{inconsistent}.
\end{theorem}
What are we to make of these apparent absurdities? Well, giving the language of $R$ its standard arithmetical interpretation, the theory is just wrong in what it says about its inconsistency! But on reflection that shouldn't be much of a surprise. Believing, as we no doubt do, that \PA\ is consistent, we already know that the theory $R$ gets things wrong right at the outset, since it has the false axiom $\neg\mathsf{Con_{PA}}$. So $R$ doesn't really \emph{prove} (establish-as-true) its own inconsistency, since we don't accept the theory as correct on the standard interpretation.
Now, the derivability conditions hold for theories that contain \PA, so they will hold for $R$. Hence by Theorem~\ref{th:ConequivtoG}, $R \vdash \mathsf{Con}_R \equiv \mathsf{G}_R$. So since $R \vdash\neg\mathsf{Con}_R$, $R \vdash\neg\mathsf{G}_R$. Hence the $\omega$-inconsistent $R$ also `disproves' its own true canonical \gd\ sentence. That's why the requirement of $\omega$-consistency -- or at least the cut-down requirement of 1-consistency explained in the book -- \emph{has} to be assumed in the proof that arithmetic is incomplete, if we are to prove it by constructing an original-style \gd\ sentence like $\mathsf{G}_R$.
\vspace{12pt}\noindent There's lots more in the book around and about these issues, in Chaps 24--28. If you are going to read more, however, perhaps the three things to concentrate on are these. First, the discussion of Hilbert's Programme in the first half of Chap. 28 (as the topic is historically important). Second, the quick remarks about minds and machines later in Chap. 28. And then -- for enthusiasts -- there's the more intricate arguments in Chap. 27 leading up to discussion of what G\"odel called the `best and more general version' of the Second Theorem.
\section{What's still to come \ldots} And here, my Christchurch classes have to end. But the book carries on (and some further classes by others will cover some of that material and other related matters). So let me \emph{very} briefly indicate where the book now goes, and how it links in with what we've done so far.
We have now proved \gd's First Incompleteness Theorem and outlined a proof of his Second Theorem.
And it is worth stressing that the ingredients used in our discussions so far have really been \emph{extremely} modest. We introduced the ideas of expressing and capturing properties and functions in a formal theory of arithmetic, the idea of a primitive recursive function, and the idea of coding up claims about relations between wffs into claims about relations between their code-numbers. We showed that some key numerical relations coding proof relations for sensible theories are p.r., and hence can be expressed and indeed captured in any theory that includes \Q. Then, in the last dozen chapters, we have worked \gd ian wonders with these very limited ingredients. But we haven't needed to deploy any of the more sophisticated tools from the \mbox{logician's} bag of tricks.
Note, in particular, that in proving our formal theorems, we \emph{haven't} yet had to call on any \emph{general} theory of computable functions or (equivalently) on a general theory of effectively decidable properties and relations. (Recall: the p.r. functions are not all the computable functions, by Theorem~\ref{nonprcomp}.)
Compare our later official theorems with the more informal early theorems \suffstrongnotdecidable* \easygodelthm* \noindent A `sufficiently strong theory' is, recall, one which can capture at least all effectively decidable numerical properties. So both those informal theorems \emph{do} deploy the informal notion of effective decidability. To prove an analogue of the undecidability theorem, and to get the early incompleteness theorem to fit together nicely with our later official \gd ian proof, we'll therefore need to give a proper formal treatment of decidability.
And that shapes the main tasks in the remaining chapters of \emph{IGT}. In more detail:
\begin{enumerate}
\item We first extend the idea of a primitive recursive function in a natural way, and define a wider class of intuitively computable functions, the \emph{$\mu$-recursive} functions. We give an initial argument for \emph{Church's Thesis} that these $\mu$-recursive functions indeed comprise \emph{all} total numerical functions which are effectively computable. (So the suggestion is that we can trade in the informal notion of effective decidability for the formally defined notion of recursive decidability.)
\item We already know that \Q, and hence \PA, can capture all the p.r. functions: we next show that they can capture all the $\mu$-recursive functions. The fact that \Q\ and \PA\ are in this sense recursively adequate immediately entails that neither theory is decidable -- and it isn't mechanically decidable either what's a theorem of first-order logic. We can also quickly derive the formal counterpart of the informal incompleteness theorem, Theorem~\ref{easygodel}.
\item We then turn to introduce another way of defining a class of intuitively computable functions, the \emph{Turing-computable} functions: \emph{Turing's Thesis} is that these are exactly the effectively computable functions. We go on to outline a proof of the pivotal result that the Turing-computable (total) functions are in fact just the $\mu$-recursive functions again.
\item Next we prove another key limitative result (i.e. a result, like \gd's, about what \emph{can't} be done). There can't be a Turing machine which solves the \emph{halting problem}: there is no general effective way of telling in advance whether an arbitrary machine with program $\Pi$ ever halts when it is run from input $n$. We show that the unsolvability of the halting problem gives us another proof that it isn't mechanically decidable what's a theorem of first-order logic, and it also entails \gd ian incompleteness again.
\item The fact that two independent ways of trying to characterize the class of computable functions coincide supports what we can now call the \emph{Church--Turing Thesis}, which underlies the links we need to make e.g. between formal results about what a Turing machine can decide and results about what is effectively decidable in the intuitive sense. We finish the book by discussing the Church--Turing Thesis further, and consider its status.
\end{enumerate}
\end{document}