Timothy Chow on the NF consistency proof and Lean

Timothy Chow has sent a long comment, adding to the discussion following the previous post NF really is consistent. I think the points he makes are interesting enough to highlight as a standalone guest post. He writes:


First, let me congratulate Randall and Sky on a magnificent achievement!

I’m afraid I have to agree with Randall that the likely response of the mathematical community is a polite yawn. That is unfortunate, because there are several reasons why people should be excited about this work.

The first reason is simply that a difficult, well-known, and longstanding conjecture has finally been proved, via a brilliant argument that uncovers deep structure in an unexpected place. I don’t know what mathematics is, if not the uncovering of deep and unexpected structure. If the subject in question were anything other than mathematical logic, which many mathematicians are unjustly prejudiced against, the champagne corks would already be popping.

Even many set theorists seem to be prejudiced against NF. There are standard complaints that it presents a strange and unmotivated foundation for set theory. But as you can see from older papers on his webpage, Randall has long ago addressed these objections. The time is long overdue for a re-evaluation of the possible role that NF might play in the foundations of mathematics. Note that NF has type-theoretic origins; type theory is experiencing a renaissance of sorts nowadays, so it is especially appropriate for NF to be given a fresh look.

A second, and totally different, reason to be excited about this piece of research is that it convincingly demonstrates the value of proof assistants in the process of producing new mathematical knowledge. Not all readers of this blog may be aware of the history of this proof of the consistency of NF. Randall announced the result over ten years ago, and submitted a paper for publication, but the editor did not want to publish it until a referee could be found who was able to independently vouch for its correctness, and no such referee was forthcoming, because the proof was so complicated. This state of affairs highlights one of the weaknesses of our peer review system; if a proof is very complicated and there is uncertainty about its correctness, then referees are typically reluctant to invest a lot of time checking it, because if the proof turns out to be wrong, then they will likely have wasted a lot of time that they could have spent pursuing more rewarding activities. People don’t want to read the proof until they’re sure it’s correct, but there’s no way for the proof to be recognized as correct until people read it. In the absence of a proof assistant, an author in possession of a correct but complicated proof is caught in a vicious cycle.

This is one major reason we should be grateful for proof assistants; they allow us to break the impasse. Rowsety Moid’s initial comment betrays a lack of understanding of this point. To say that the correctness of a proof rests on “the judgement of competent mathematicians who understand the proof” is partially correct, but fails to acknowledge that sometimes, human mathematicians are unable to arrive at such an understanding and judgment without machine assistance. Randall began with a basically correct proof, but the process of communicating the proof to other humans (which you might think is a purely human activity) turned out to be impossible until a proof assistant entered the picture. Furthermore, Lean did not simply play a purely passive role, dully checking a proof that had been completely spelled out by the humans in advance. As Randall will be quick to acknowledge, the process of coaxing Lean to certify the argument was not a monologue by the humans, but a dialogue that forced the humans to rethink, clarify, and improve the argument. Lean is truly an interactive proof assistant; we are entering a new era in which proof assistants are becoming vital partners in the creation of proofs, not just by performing routine calculations, but by engaging us in interactive dialogue.
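
For readers who have never used a proof assistant, a minimal Lean 4 fragment of my own (purely illustrative, and nothing to do with the NF formalization itself) conveys the flavor of this dialogue:

    -- If a step is missing, Lean does not simply reject the file: it reports
    -- exactly which goal is still owed to the kernel.
    example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
      constructor
      · exact hp
      · exact hq
    -- Replacing the last line with `sorry` makes Lean display the remaining
    -- goal `⊢ q` and flag the proof as incomplete, prompting a response.

The feedback is immediate and precise about exactly what remains to be justified, and that is what makes the kind of back-and-forth described above possible.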

Of course, some of these lessons were already learned with Flyspeck or Gonthier’s proof of the four-color theorem, etc. But a new feature of the proof that NF is consistent is that it demonstrates that what might seem to be a purely sociological issue (the difficulty of persuading a referee to check a proof and thereby allowing the proof to be assimilated into the body of accepted mathematical knowledge) can in fact be addressed with the help of an interactive proof assistant, which has infinite patience, and no pressing need to advance its own career by spending its time on more “important” tasks.

26 thoughts on “Timothy Chow on the NF consistency proof and Lean”

  1. This is one major reason we should be grateful for proof assistants; they allow us to break the impasse. Rowsety Moid’s initial comment betrays a lack of understanding of this point.

    I did not lack understanding of that point. Even in my initial comment, I said “formal verification in Lean can increase our confidence that mistakes haven’t been made in the proof”. And I was aware of the impasse, at least as you described it here:

    There is a little problem, however. Holmes’s proof is so intricate that it has run into the same problem that Hales (and Ferguson) encountered with the (pre-Flyspeck) proof of the Kepler conjecture. Namely, the experts have taken a serious look, and believe that it is probably correct, but have not been able to understand the proof thoroughly enough to vouch for its correctness. Holmes has responded by trying to rewrite the proof in a simpler and more understandable manner, but his multiple rewrites have generated a new problem: Experts don’t want to invest a lot of time studying a proof that is not stable and that might change from under their feet at any time. Even though Holmes announced the proof back in 2010, it has not yet been accepted for publication. I believe that the main obstacle is the difficulty of finding a referee who is willing to sign off on the paper.

    I didn’t go into such things, because I don’t have any objection to the way Randall Holmes and Sky Wilshaw have used Lean (I even said in one of my comments that I liked the way they used it) and didn’t want my comment(s) to be even longer.

    I’m not sure you even disagree with the point I was making here:

    I’m reacting to what seems a suggestion that it’s verification in Lean that tells us the proof really is a proof (and to some other comments I’ll describe below). I think it should be the other way around: while formal verification in Lean can increase our confidence that mistakes haven’t been made in the proof, if we want mathematics to remain a human activity, the crucial test for whether something is a proof should still be the judgement of competent mathematicians who understand the proof.

    If, in some case, “human mathematicians are unable to arrive at such an understanding and judgment without machine assistance”, that’s fine, so long as they do arrive. If instead they never understand the proof and treat it as correct because a proof assistant says so, then we have a problem “if we want mathematics to remain a human activity” (which I hope we do).

    1. Thanks for your clarifications. You are right that we don’t disagree much. Nevertheless, I think I am willing to delegate more to the computer than you are. Setting aside interactive proof assistants for the moment, what about results that have been established only through extremely intensive machine calculations, such as the Boolean Pythagorean triples problem? There is no meaningful sense in which humans “understand” such proofs, beyond understanding what the computer is supposed to be doing, and trusting that it is doing it correctly. We are basically saying the proof is correct because the computer says so.
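
      To make that example concrete (the specifics are my gloss, not something stated above): the question is whether {1, …, n} can be split into two parts, neither of which contains a solution of a^2 + b^2 = c^2. Heule, Kullmann, and Marek showed with a SAT solver that this is possible for n = 7824 but not for n = 7825, and the propositional proof certificate runs to roughly 200 terabytes.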

      In the case at hand, I am suggesting that we are going one step further, and (partially) delegating the refereeing task to the proof assistant. I mean, think about it: if the only thing that mattered were the judgment of a human expert who understood the proof, then why didn’t the story end when Randall announced his proof the first time around, and posted his preprint on his website? Randall is an expert; he understood the proof, and made the judgment that it was correct. Why wasn’t that good enough? Well, as a community, we have decided that some kind of refereeing or peer review process needs to take place. The situation as of this writing is that formal acceptance by a journal hasn’t occurred yet; nevertheless, the certification by Lean has pushed us past a critical hump. We are basically treating Lean’s verdict as a referee’s report.

      Now, you could object that in this particular instance, I’m overstating the case. Lean’s verdict isn’t fully usurping the role of the referee’s report. Importantly, Sky now also understands the proof, and is playing the role of a peer reviewer (as well as of a coauthor). And the story isn’t over just yet; we all still want to see a formal referee’s report and a journal acceptance. I agree with all that. However, I expect that in the future, we’re likely to evolve to a situation where we’re going to be satisfied with the verdict of the proof assistant, and we’ll regard the “understanding” of a coauthor or a human referee to be desirable, but not mandatory, at least as far as the correctness is concerned.

      If this prediction comes true, does that mean that mathematics will cease to be a “human activity”? I don’t think so. Just because machines play key roles in some activity doesn’t stop it from being a human activity. For example, right now, I am having a dialogue with you, despite the fact that I have never met you face to face, and despite the fact that I am typing these sentences at a different time and place from when and where you’re reading them. Machines are playing an indispensable role in this communication; nevertheless, our dialogue is a human activity.

      I don’t believe that “understanding” or “refereeing” are sacred activities that cannot be offloaded to machines without sacrificing the humanness of mathematics.

      Now, there is some point beyond which mathematics will cease to be a human activity. If there ever comes a day when the AIs are autonomously doing all the mathematics and humans are just sitting around with folded hands (that’s an allusion to an excellent 1947 story by Jack Williamson, by the way), then yes, mathematics will no longer be a human activity. But as of now, that day is still far in the future.

    2. Do you feel that most competent mathematicians would stop trying to understand a proof if they believe it is certainly objectively correct (which, according to this post, seems to be the opposite of the actual situation)? If not, why are you worrying that mathematics might cease to be a human activity?

      1. I don’t quite understand your question. What do you think is the actual situation? Which post is “this post”, and what do you think it says the situation is?

        If you want to understand my concerns, you should consider what I said in comments on the earlier blog post, NF really is consistent.

        I don’t think I’ve said anything that says (or assumes, implies, presupposes) that mathematicians who were trying to understand a proof would stop if they believed it was certainly correct.

        Some might stop, of course. It will be interesting to see what happens now that the NF consistency proof has been verified in Lean. Will the impasse that Timothy Chow described be broken, so that some mathematicians are now willing to act as referees and check the proof? Or will the attitude instead be that there’s no point in spending time checking the proof now that Lean has approved it?

        1. I agree that it’s an interesting question whether other mathematicians will now take the time to understand the proof thoroughly. My guess is that there is a small group of people who already have an interest in NF, but who were deterred by the deficiencies of the pre-Lean writeup, and who will take the time to study the proof now that Lean has helped clear away the underbrush. They will invest this effort not so much to “act as referees and check the proof” but because they care about NF and are seeking a deeper understanding of it.

          There may also be some people whose attention is captured by the publicity surrounding the announcement of the proof, and who will study the proof, perhaps just out of curiosity, or because they think the proof opens the door to new mathematics that is waiting to be discovered.

          In both of the above cases, I think that the clearing away of the cloud of uncertainty surrounding the correctness of the proof will increase people’s interest in the proof.

          At the same time, I can easily imagine that the actual referee (assuming the paper is duly submitted for publication in the normal manner) might take the attitude that there’s no need to invest a huge effort to humanly understand every detail of the proof before recommending publication. The role of a referee is to give a stamp of approval. Even if you take the view that for mathematics to remain a human activity, referees must be human, it doesn’t follow that said humans are required to load the entire proof into their brains before issuing a verdict. Indeed, even before computers, referees did not necessarily check every detail of a proof before recommending publication. If they were doing their job properly, they would of course invest a reasonable amount of effort to convince themselves of the overall correctness of the paper, but this process would often fall short of 100% verification. I expect it to be much easier now to find people willing to act as a referee for the consistency of NF, precisely because referees now have more flexibility about how much effort they will put into correctness-checking.

          In summary, if people choose not to understand the proof in detail, it will be because they’re not sufficiently interested in the result itself, or at least they’re not sufficiently interested in the method used to arrive at the result. (In other words, if they’re interested, then they’ll study the proof, even if they are happy to trust Lean 100% when it comes to checking correctness.) And it has never been the case that a referee has an obligation to load the entire proof into human memory and check every step.

        2. Your original comment first raises the concern of “Why should we trust the Lean verification?” (Which I think the other replies have addressed: The probability of a bug in the Lean-verified proof might be less than the probability of a bug in a proof verified by multiple human mathematicians.) Then you revealed that your true concern is whether mathematics would remain a human activity. It seemed to me that you are less worried about the objective correctness itself, and more worried about human mathematicians ceasing to interact with the proof because of the objective correctness.

          I think we can have our cake and eat it too: Instead of viewing correctness as the end, mathematicians can view it as a beginning. The obvious next step is to study the structure of the proof and see whether it might be useful for solving other problems. Maybe some people will study how exactly the axiom of choice is violated in the given model. I believe this is the sort of thing that mathematicians have always really been doing anyway, verification of correctness being more or less incidental.

          On a reread of your previous comment, I realized that you did make the point that the mathematician writing the proof may cease to try to present it in human-readable ways. But this is largely the same issue: Not trying to convince others of its correctness does not necessarily mean not trying to convey the brilliancy of the method. Plus, I tend to believe that, in general, a machine-readable proof cannot be that human-unreadable; at least it won’t leave out details.

          1. I love your comment, “Instead of viewing correctness as the end, mathematicians can view it as a beginning.” Yes, indeed!

            What worries me a bit, though, is that NF’s undeserved reputation for being bizarre and uninteresting will cause people to ignore the consistency proof. And then others might incorrectly blame Lean for the lack of interest.

            As for human readability, I feel confident, based on things Randall has said, that in this particular case, the formalization process has made the proof more understandable to humans. And yes, it sure helps that no details are left out! There’s an interesting MathOverflow question which lists some major mathematical results that are generally accepted as having been proved, even though no proof is available to the public. That is a frustrating state of affairs.

            On the other hand, I’m less optimistic that all machine-readable proofs will be human-readable. I already mentioned one example of a highly computational proof that is simply beyond full human comprehension. But I don’t think that proof assistants will foster an attitude of, “I explained the proof to the machine so now I don’t have to explain the proof to other humans.” People will continue to explain proofs to each other for all the same reasons that they do now: to share their excitement and their new insights, to encourage others to follow up with more research, etc.

          2. I largely agree with your middle paragraph. I think it’s a good point (though I don’t think verification of correctness is quite incidental).

            For the third paragraph, I don’t think I’ve used the term “human-readable”, and I think it’s ambiguous. It can just mean that something is in the form of text, and a human can read the characters; it can instead mean that a human can make sense of it. Would you consider a 200 terabyte propositional proof “human readable”, for example?

            (I did use “readable”, when quoting Randall Holmes. He said “the conclusions of the Lean proof are readable” — the conclusions, interestingly, rather than the whole proof.)

            That leaves your first paragraph. I’m not happy with the wording “then you revealed that your true concern”. That makes it sound like I wasn’t really concerned with anything else and was perhaps even being dishonest in some way. I also don’t think I’ve said I’m worried that human mathematicians would “(cease) to interact with the proof because of the objective correctness”. And while it’s true that other replies have addressed my question, “Why should we trust the Lean verification?”, I don’t find those replies fully convincing. I hope I can find time to say more about that (under the other blog post).

            1. If your suggestion is that we should be skeptical of the Lean verification because Lean might have bugs, then I agree, as long as you agree that we should be skeptical of human verification because humans also have bugs. There’s another interesting MathOverflow question about widely accepted mathematical results that were later shown to be wrong. Bugs in proof assistants at least have the advantage that their nature is well understood, and we know how to minimize their presence. Unlike other pieces of modern software, whose bugginess increases with size, a theorem prover can be designed so that as far as correctness is concerned, we need only worry about bugs in a fixed logical “core” or “kernel” that performs the verification. The HOL Light theorem prover, for example, has a very small logical core which can be humanly checked for bugs.
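
              As a small, concrete illustration of what that trusted-core design buys in practice (my own example, in Lean 4 rather than HOL Light): once the kernel has accepted a proof, you can ask what the proof ultimately rests on.

                  -- After the kernel checks this theorem, `#print axioms` lists every
                  -- axiom the proof depends on; here Lean reports that it depends on
                  -- no axioms at all.
                  theorem two_add_two_eq_four : 2 + 2 = 4 := rfl

                  #print axioms two_add_two_eq_four

              The kernel, together with whatever axioms such a query reports, is then the whole of what has to be trusted for the verdict, which is the sense in which the checking core can be kept small and auditable.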

              Bugs in human verification, however, can be much more intractable. If you have followed the Mochizuki/abc saga, you know that Mochizuki, along with a small coterie around him, has dug in his heels and insisted that his proof is correct, despite devastating objections by Scholze, Stix, and others. A similar irresolvable disagreement happened with Wu-Yi Hsiang’s claimed proof of the Kepler conjecture. Even in less pathological cases, we often find ourselves in the position of trusting what a tiny number of human experts say. The details of the proofs are not always written down fully, or the proofs may have bugs, or be written in a language that is decipherable only if you apprentice yourself to one of the experts and get the unwritten details via oral communication.

              It is a fair question to ask why we should trust Lean. I think it is also a fair question to ask why we should trust human experts. You have emphasized the humanness of the mathematical enterprise. Well, as the old saying goes, to err is human. If preserving the humanness of mathematics means preserving its high error rate, then I’m not sure I want that. Computers don’t give us complete certainty, but if used responsibly, experience has shown that they can give us greater certainty than purely human methods can.

              If we widen the discussion to include other kinds of computer assistance, then the four-color theorem is often cited as a case study. But I think not everyone is fully aware of the facts of the case, and so people don’t always draw justified conclusions. The least reliable part of the original Appel-Haken-Koch proof was not the computer-verified part. There was a 400-page microfiche supplement tucked into the back of the journal issue, with a ton of tiny hand-drawn diagrams that had to be checked manually. As far as I know, nobody has independently checked the entirety of the original proof. Someone went over a large percentage of it, and found lots of little bugs, and one serious bug that required a nontrivial fix. When Robertson et al. took a look, they gave up after a while and decided to generate their own proof. Confidence that the theorem really was a theorem increased after that, not just because there was an independent verification, but because the Robertson et al. proof relied less on human checking than the original proof did.

              The Flyspeck story had a similar arc. The process of machine verification turned up a bunch of minor flaws and one serious gap in the original proof. All the evidence shows that carefully executed machine verification reduces the error probability.

              This is not to say that we should uncritically accept all claims of machine verification. Ed Nelson once famously announced that he had proved that PRA was inconsistent and that he had verified the proof in a proof assistant. Well, of course his proof was wrong, and it turned out that the proof assistant was some home-grown piece of code that had not been robustly tested. It’s important that Lean or Coq or Isabelle or whatever is not just any old piece of software, but a code base that has been robustly tested by a large community of people. If Randall and Sky had used their own private proof assistant, then we would have greater cause for doubts. But they have used a well-known assistant, and their code will soon be made public, after which skeptics can examine the proof themselves and run independent tests.

            2. Let me say a little bit more about what measures have been taken to address the concern about bugs in the proof assistant. I will focus on HOL Light, which was used for Flyspeck. In 2014, Kumar et al. published HOL with Definitions: Semantics, Soundness, and a Verified Implementation. This work proves the formal correctness of the HOL Light kernel. Of course, you could ask why we should trust Kumar et al.’s proof. You could also point out that HOL Light runs on top of OCaml, and OCaml has not been formally verified. People have attempted to address the latter objection (although not in the specific case of OCaml as far as I know): see Myreen and Davis’s paper, The Reflective Milawa Theorem Prover Is Sound (Down to the Machine Code That Runs It). The “turtles all the way down” objection can never be fully rebutted, of course, but the point is that people in the formal theorem proving community are well aware of all the possible sources of bugs, and continue to try to minimize them. For more commentary and references, see the paper Flyspecking Flyspeck by Mark Adams and this MathOverflow question.

              Let me also make a general comment about trusting proof assistants. If I am unfamiliar with proof assistants, then they may seem to be some kind of distant, mysterious authority whom I am expected to worship with blind faith. But I would claim that in reality, the opposite is true. Proof assistants actually help democratize mathematics (for lack of a better word). Non-mathematicians sometimes ask why mathematicians are so obsessed with rigor. As I have explained elsewhere, one of the most valuable features of rigorous mathematical reasoning is that it allows me to determine, in the privacy of my own study and without having to seek approval from human authorities, whether a given mathematical argument is fully correct. This is what I mean by “democratization.” Machine proofs have (or can have) the same character. If I have doubts about correctness, I don’t have to trust the verdict of some famous human expert, to whom I may have no direct access. I can download the machine proof and check the proof myself. This is “nullius in verba” in action!

              But let me end with two caveats. In the case of Flyspeck specifically, what I just said about checking the proof myself is not very realistic. The code is something of a mess, and more importantly, the amount of computer time required for re-verification is huge. (Fortunately, the proof of the consistency of NF does not require vast amounts of computer resources.)

              The second caveat is that a very paranoid skeptic could wonder whether a malicious attacker could convince the mathematical community of the correctness of a formal proof when no correct proof exists. Adversarial bugs, as opposed to unintentional bugs, are of course much more difficult to detect. In an extreme case, some company could obtain a near-monopoly on the hardware, software, and operating systems in common use, and maliciously insert code that consistently flags the bogus proof as correct, no matter how hard you try to write your own code (on the compromised machines) to “independently” verify the correctness of the proof. This type of concern may seem far-fetched today, but it’s something to keep in mind as we move into an era when proof assistants may come to be widely trusted. Part of the reason the Internet is so vulnerable to hackers nowadays is that security was not a prime consideration in the early days. If we want to guarantee the security of machine proofs against malicious adversaries, then the sooner we start thinking about the problem, the better.

  2. Daniel M Gessel

    Certainly mistakes (aka “bugs”) will be discovered but the probability will go down with time – standards for acceptance by the mathematical community will adapt.

    The more difficult concern is the degree of human involvement and understanding. Wouldn’t “Human Readability” be a requirement for “proof” from the philosophical side, where Mathematics is part of the human drive to understand the nature of things? On the other hand, practicing Engineers (perhaps including applied Mathematicians) may be less concerned as experimental validation is often the norm – but isn’t the purpose of more theoretical pursuits for actual human beings to get at the truth of things?

    1. In the mathematical community, human readability is typically not a requirement for a proof. There are increasingly many examples of proofs that are not humanly readable, but that cause the mathematical community to switch from using the term “conjecture” to the term “theorem,” indicating that the proof is indeed considered to be a bona fide proof.

      On the other hand, nobody (or almost nobody) is happy with a correct but unreadable proof if it seems as though there ought to exist a more understandable proof. You mentioned a drive to understand; this drive will motivate people to find alternative, and hopefully more intellectually satisfying, proofs. But that doesn’t mean they’ll declare the unreadable proof to be “not a proof.” Declaring that something is “not a proof” is an assertion about its correctness, not its readability (except insofar as the unreadability might generate doubts about correctness).

      You also mentioned “getting at the truth of things” as if this is the same as “understanding the nature of things,” but mathematicians typically distinguish between judging that a proof is correct and understanding the proof. Every mathematician has had the experience of working through a proof (not necessarily even a particularly long proof) and becoming convinced that the proof is correct, but nevertheless feeling that the proof remains mysterious.

      1. Daniel M Gessel

        I see; my assumptions, that the philosopher is interested in understanding “why” and that “truth” in that context includes an answer, aren’t shared. I have had the experience you describe: following a proof step by step but missing the woods while finding my way through those trees.
        If a valid formal proof is mechanically verifiable (by definition, I would think), computer verification will be the standard as systems become trusted.
        I’m guessing machine learning experiments to guide a proof assistant (e.g. an LLM trained on code libraries and the mathematical literature could play the role of “user”) are underway – in effect, an ATP. Verified by the PA’s kernel, how will those proofs be viewed? Will they be published in journals, or maybe just stored away on an SD card or burnt on a CD along with old photos…?
        Alas, we live in interesting times.

        1. Those are good questions, and we don’t even have to wait for a nontrivial LLM+PA proof to materialize. Unwieldy machine proofs already exist today. Consider, for example, the proof of the q-TSPP conjecture by Kauers, Koutschan, and Zeilberger. This is a stellar piece of work resolving a longstanding conjecture, and I was on the committee that decided to award it the Robbins Prize. But you can certainly raise some legitimate concerns about the accessibility of the proof. The half-gigabyte certificate is only the tip of the iceberg. If you want to verify their proof, you not only have to study their published PNAS paper; you have to slog through their Mathematica notebook, which intersperses detailed comments with computer code. Mathematica is proprietary, so you have to acquire a license somehow, and even if you do so, the code is closed-source, limiting your insight into what it’s really doing. And is anyone archiving a container to allow you to run the code far in the future, after umpteen upgrades in the software and operating system? I doubt it. To top it all off, when I took the first few steps to try to verify the proof, I discovered that at one point I had to email one of the authors for some extra code that was not publicly posted on their website. This is certainly not a satisfactory state of affairs.

          More and more libraries are discarding paper copies of journals in favor of electronic access. Doing so has obvious advantages, but it also comes with risks. It’s easy to imagine some catastrophic bug permanently wiping out our only records of the proofs of many mathematical results. I think we’re probably not doing enough as a community to guard against such an apocalyptic scenario.

  3. I was intrigued by this part of Timothy Chow’s post

    Even many set theorists seem to be prejudiced against NF. There are standard complaints that it presents a strange and unmotivated foundation for set theory. But as you can see from older papers on his webpage, Randall has long ago addressed these objections.

    When I looked at the web page, it wasn’t obvious which papers addressed those objections. “The set-theoretical program of Quine succeeded, but nobody noticed” looked like it might, though, so that’s what I opened.

    It does indeed say it will ‘present an intuitive argument for the “reasonableness” of the stratification criterion for comprehension.’ (p 4)
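
    (A reminder of the criterion itself, in my own summary rather than a quotation from the paper: a formula is stratified if integer “types” can be assigned to its variables so that in every subformula x = y the two variables receive the same type, and in every subformula x ∈ y the type of y is exactly one greater than the type of x. NF asserts the existence of {x : φ} for every stratified formula φ. In particular, “x ∈ x” is not stratified, since it would require the type of x to be one greater than itself, which bears on the passage quoted further down.)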

    When I (eventually) found that argument (p 36), however, I was dismayed. It seemed far from intuitive to me. I struggled to make any sense of it.

    It starts by drawing an analogy with abstract data types.

    Suppose that we have a universe of objects, and we wish to implement classes of objects (pure extensions, not yet identified with any objects) as objects themselves. The abstract structures being represented are very simple: they are pure extensions. But the implementation of each class as a set has an additional feature: it is identified with an object. The implementation of each class is not a pure extension (collection of objects), but an extension with an object label attached. The analogy to abstract data types dictates that any feature of an implemented set we use should be a feature of the pure extension, not of the structure “label + extension” which implements it.

    At first, that says we want to implement extensions “as objects”. Then it instead says the implementation “has an additional feature: it is identified with an object.” And that seems to mean, not that the implementation is the object (with “identified with” meaning “identical to”) but that the object is a label (“identified with” meaning something more like “identified by”).

    Why is a label needed? When would it ever be used? And if it’s part of the internal representation, how could it ever be used (except internally, which should be allowed)?

    Now consider the “property” of sets “x is an element of itself”. This property, unlike its notorious negation, is not paradoxical. But we assert that it is illegitimate by the criterion indicated above: it is a property of the structure “label + extension” for each set, not a property of the extension considered by itself. One way to see this is to consider that a permutation technique could be used to reshuffle the “labels” attached to classes without changing which classes are implemented, and an extension previously labelled by one of its elements could come to be labelled by an object which is not one of its elements. Thus we can reject this property (and its paradoxical complement) on a priori grounds before ever considering the paradoxes.

    Why is it a property of the structure “label + extension”? And how could reshuffling the labels show that or make any difference? So what if “an extension previously labelled by one of its elements could come to be labelled by an object which is not one of its elements”?

    (If the labels are internal (hidden, not made visible by the abstract data type), then the value of x cannot be a label. So when asking whether x is a member of itself, we cannot be asking whether x is a label that matches the label of one of the elements. So why would it matter what any of the labels were?)

    So I don’t find this argument intuitive at all.

    In any case, it would at most provide an ‘argument for the “reasonableness” of the stratification criterion for comprehension’. The standard complaints go beyond that, for example in the conclusion of Luca Incurvati’s chapter on The Stratified Conception in his book, Conceptions of Set and the Foundations of Mathematics:

    The standard view that NF does not embody a conception of set is correct in some respects but not in others. It is correct in that there does not seem to be a conception of set behind NF which gives rise to an intuitive picture of the set-theoretic universe. But it is not correct in that the stratified conception – best thought of as a conception of objectified property – can be taken as backing NF. Whilst not delivering an intuitive picture of the NF universe, this conception motivates the stratification requirement, which has long appeared as a mere technical trick, in a principled manner.

    1. I should correct a slight misstatement that I made. Randall’s view is that NFU is a plausible foundation for mathematics. But he doesn’t think that NF is a plausible foundation for mathematics. I apologize if my previous sloppy statement misled you (or anyone else).

    2. The discussion you allude to above is speculative. It is related to David Lewis’s approach to set and class theory in Parts of Classes.

      In fact, if you look at my web page I direct you to what I regard as my best discussion of the foundational issues:

      https://randall-holmes.github.io/Papers/foundations3.pdf

      It does discuss the same speculative philosophical picture found in the earlier paper, among other things, I hope more helpfully. I do not regard what I say there as at all obvious.

      There is another, much more recent paper, accepted for publication but yet to appear:

      https://randall-holmes.github.io/Papers/symmetryrevisited.pdf

      It might be best for purposes now under consideration to read the set of axioms in the appendix. The burden of this paper is that a predicative theory of sets and classes in which the criterion for sethood is symmetry in a suitable sense turns out to be an extension of New Foundations (I do not know how to show that this extension is consistent!). This is philosophically interesting because it presents hypotheses which lead to the comprehension axioms of NF and which are not syntactical in nature at all. In the main body of the paper I am using lots of indirect notations because I am discussing this theory from the metatheoretical standpoint of the usual set theory: in the appendix the theory being studied is presented in usual set theory notation.

      The Incurvati paragraph seems to be saying that stratification is in itself a reasonable definition of objectified property which successfully backs NF! (that is, it doesn’t seem to say what you think it says) [I think more work is required to make such a claim; I want to point out that Incurvati is NOT accepting an objection to NF here, but accepting a justification, and also that I do not think you can justify stratification as a way to define an objectifiable property without work –which is actually what I am trying to sketch in my informal philosophical presentation]

      1. Thanks for the pointers to the other papers. I will have a look at them.

        The Incurvati paragraph is saying more than one thing. It says “The standard view that NF does not embody a conception of set is correct in some respects but not in others.” This part is saying where the “standard view” is not correct:

        But it is not correct in that the stratified conception – best thought of as a conception of objectified property – can be taken as backing NF.

        I understand that part in much the same way you do, as “saying that stratification is in itself a reasonable definition of objectified property which successfully backs NF.”

        He also pointed to a way in which (he thinks) “the standard view that NF does not embody a conception of set is correct”:

        It is correct in that there does not seem to be a conception of set behind NF which gives rise to an intuitive picture of the set-theoretic universe.

        Incurvati’s view seems to be that NF is best seen as being about objectified properties rather than sets. (I’m not basing that only on the one paragraph I quoted.)

        In any case, I’d be interested in your view of that chapter, “The Stratified Conception”, of Incurvati’s book. I found it the hardest chapter to understand, partly because of an “abstract data type” argument similar to yours (on his page 176).

        1. I’m glad you are reading the paragraph in the same way I am.

          I would regard “objectified property” and “set” as the same thing. Probably very Quinean of me ;-) So I read the second paragraph and think, Incurvati is not expressing an objection here. I think that the identification of “objectified property” and “set” is very sensible for the usual set theory, too, in which sets are playing multiple roles, as objects, as properties, and as the bounding types determining which objects a property applies to.

          But I object to this approach to justifying NF, because stratification is really very syntactical and one cannot declare it a reasonable criterion for objectifiable property just by fiat. Work has to be done, and I have been trying to sketch what such work would be like.

          The second paper I point to makes the point that in fact there is a conception of set which leads to NF (more naturally than to NFU!) — if one characterizes sets as the classes which are symmetric in a suitable sense, one gets an extension of NF. I don’t know what conclusions can be drawn, because I cannot yet build a model of this theory.

          The first paper I point to makes a rather different point. In that paper I sketch a different account of what Quine might be construed as being up to, which is justified basically by the fact that it can be seen to work after the fact. Quine wasn’t starting from naive set theory: he was starting with the typed theory of sets, which is a consistent, principled view of the world. The typed theory of sets has a formal property, a high degree of symmetry, which is expressed in the syntactical fact that whatever you can prove or define in a lower type you can prove or define in each higher type. This creates a problem not of paradox but of ontological clutter. Quine proposed that the clutter could all be removed at one fell swoop by identifying the levels. He also made a mathematical error in proposing to do this with strong extensionality (he really did, I can explain on request). The proposal should have been NFU, and in 1962 Specker justified Quine’s approach to decluttering (by showing that TST(U) with ambiguity is equiconsistent with NF(U)) and in 1969 Jensen showed that the collapse actually works. That in itself can be taken as justifying Quine’s approach and (particularly after Boffa’s simplified description of Jensen’s construction) presents a conception of set which leads to NFU (as the theory of an initial segment of the usual cumulative hierarchy with an external automorphism which moves an ordinal). Not only is there a conception, but it can be viewed as a version of the same one that we take to underly the usual set theory (which is no more originally founded on this conception than NF was: the 1908 axiomatization of Zermelo set theory is NOT based on the cumulative hierarchy conception at all; it is arguably as much a hack as NF is [to some extent a hack, and not so much as people think, in both cases]).
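
          (To spell out the ambiguity scheme just invoked, in my own words rather than Specker’s: for a sentence φ of the typed theory, let φ⁺ be the result of raising every type index by one; the scheme of typical ambiguity asserts φ ↔ φ⁺ for every sentence φ. Specker’s 1962 result is that NF is consistent if and only if TST plus this scheme is consistent, with the corresponding equivalence for NFU and TSTU, which is the sense in which identifying the levels removes the clutter without changing what can be said.)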

          My account of what Quine is doing is borne out by things that he says in various places. The critique I would make is that he was being far too daring, but it works.

          I close by remarking that one should not forget that I think that Zermelo-style foundations are pragmatically better. But I think that it is very useful to know that there are other approaches, and NFU is another plausible approach, with its own creation narrative.

  4. Since replying doesn’t seem to be possible beyond a certain depth, I’ll post this here instead. It’s a reply to this comment from Timothy Chow.

    If I have doubts about correctness, I don’t have to trust the verdict of some famous human expert, to whom I may have no direct access. I can download the machine proof and check the proof myself.

    I don’t think anyone has said we should just “trust the verdict of some famous human expert”. When a proof is published, anyone can read it and form their own opinion, as can other people, including other experts. One person’s verdict isn’t treated as final. I don’t think formal proofs necessarily make it easier for someone to read a proof and form their own opinion; they may even make it more difficult. I think it’s quite wrong to make it sound like the way maths is developed without proof assistants is undemocratic.

    However, I’m not sure what you mean by “check the proof myself”. Run the proof assistant yourself and see what it says? Or go through the details of the formal proof yourself without the assistant’s help? And which version of the proof? The fully “elaborated” one that the assistant checked directly, or the unelaborated one at the higher level where the human mathematician worked?

    (It should not, BTW, be assumed that machine proofs will always be available for download, or that when they are available they’re available at reasonable cost to everyone.)

    If I am unfamiliar with proof assistants, then they may seem to be some kind of distant, mysterious authority whom I am expected to worship with blind faith.

    Setting aside the hyperbolic language, what is the alternative to trusting the proof assistant when you don’t understand the proof and trusting human experts is derided as undemocratic? (One alternative, of course, is not trusting the proof assistant either.)

    … people in the formal theorem proving community are well aware of all the possible sources of bugs, and continue to try to minimize them.

    To some extent:

    Lean is under heavy development, and we are constantly trying new ideas and tweaking the system. It is a research project and not a product. Things change rapidly, and we constantly break backward compatibility. Lean comes “as is”, you should not expect we will fix bugs and/or add new features for your project. We have our own priorities, and will not change them to accommodate your needs.

    1. I agree with your concerns about proof assistants, but my argument is that all the same concerns apply to human experts, in spades.

      You say that when a proof is published, anyone can read it and form their own opinion. In some idealized world, yes. But who is going to read the proof of Fermat’s Last Theorem, or the Poincare Conjecture, or the classification of finite simple groups, and “form their own opinion”? Hardly anyone. You or I or any mere mortal is just taking that claim on faith. The amount of effort required to master the relevant mathematics is just not something that any but an elite few can be expected to invest. Furthermore, in many cases, even if I do decide to sit down and try to invest the effort, my chances of success are pretty slim, unless I have interactive access to an expert. For many proofs of major results, there is just too much implicit knowledge that isn’t on the written page and that is not easily obtainable unless you have social access to elite circles.

      Take Michael Freedman’s Fields-Medal-winning work: he proved the disc embedding theorem, and deduced the 4D Poincare conjecture. His original paper is extremely difficult to read. Freedman himself admits that the paper has some mistakes (albeit correctable, if you are sufficiently proficient in the area), but more importantly, I’ve heard a number of highly competent mathematicians say that they made a concerted attempt to understand the paper but simply had to give up after a while. Fortunately for the mathematical community, there is a recent book on the disc embedding theorem, which is much more readable and has far fewer gaps and errors. But even this book tacitly acknowledges that Freedman’s original argument involving uncountably many Casson handles is, at best, very hard to follow, since the new proof bypasses that argument in favor of a different one.

      As Fang Zhang said, at least with a proof assistant, all the details are there. With a very long and complicated proof, you might not have the energy to go over every detail, but you at least won’t have to deal with errors and gaps that cannot be repaired except with expert knowledge.

      Your point about accessibility of the machine proof is a good one, which I have also made in a different comment. But human proofs are not always accessible either. Journals can be locked behind paywalls. Even worse, people don’t always write down complete proofs or make them publicly available.

      I don’t deny that there are limitations to machine proofs. But I think you are failing to acknowledge similar, or worse, limitations with the current system. I suspect that you have not had personal experience with banging your head against an impenetrable published proof of a major result, and struggling to thread your way through the errors and obscurities without social access to elite circles. If I am wrong and you have had such an experience, don’t you agree that having a fully machine-checked proof would likely have saved you a whole lot of wasted time?

      Your final quote about Lean I find to be something of a non sequitur. The type of bugs that are being referred to in that quote are not major bugs that cause the proof assistant to incorrectly certify statements that should not be certified.

      1. You say that when a proof is published, anyone can read it and form their own opinion. In some idealized world, yes. But who is going to read the proof of Fermat’s Last Theorem, or the Poincare Conjecture, or the classification of finite simple groups, and “form their own opinion”? Hardly anyone.

        Hardly anyone would read a fully machine-checked proof of such results either, especially if it means reading it at the fully elaborated level where it was checked. And they would still have to master the relevant mathematics if they wanted to understand the proof.

        If they also need to master something like Martin-Löf type theory before they can understand the proof, that’s a significant barrier in itself: one that raises the same sorts of issues — such as interactive access to an expert, implicit knowledge, and mastering relevant mathematics — that you raise for proofs published in the usual way.

        “Don’t you agree that having a fully machine-checked proof would likely have saved you a whole lot of wasted time?” Probably not. I don’t think there’s a rule, though. In some cases, it might help to have a fully-checked formal proof; in others it wouldn’t.

        (I’m still not sure what you have in mind when saying “check the proof myself” — see the questions in my earlier comment. Something I neglected to mention is that if I need to run the proof assistant, that may no longer be possible, or it may not be possible to find / run the required version. You mentioned something in that area when talking about the proof of the q-TSPP conjecture.)

        *

        I think that if the Lean 4 FAQ answer I quoted from were only about minor bugs that don’t cause Lean to incorrectly certify statements, it would say so. It doesn’t.

        I don’t want to copy everything I said in the linked comment, piece by piece, from the other Logic Matters post to this one, so that the discussion happens here instead. However, what I quoted from the Lean 3 FAQ does rather suggest that people who are “really concerned about soundness” should not totally trust Lean. (Which is not to say Lean isn’t useful or that people “really concerned about soundness” shouldn’t make any use of it.)

        It mentions “two independent checkers: tc and trepplein” and recommends that “if you are really concerned about soundness, … you often export your project and recheck it using the independent checkers above.”

        Independent checkers — not part of Lean, written in different ways, in different programming languages, by different people — would be a strong reason for trusting proofs developed in Lean. Unfortunately, they don’t seem to exist for Lean 4. (If they do exist, I’d be glad to hear of it.)

        1. Again, I think our main point of disagreement is not actually about proof assistants, but about the limitations of the traditional system, which you don’t really address here. But let me address a couple of subsidiary issues that you raise.

          Mastering type theory is not really a prerequisite for reading most Lean proofs, just as mastering Zermelo-Fraenkel set theory is not really a prerequisite for reading most mathematics books and journal articles.

          The quote from the Lean 3 FAQ is a legal disclaimer. Interpreting it as a scientific statement is what I would call a category error.

  5. There was a human being whose patience was indispensable to the formalization of my construction of a model of tangled type theory. Sky Wilshaw invested a lot of patient effort in the formalization: “an interactive proof assistant, which has infinite patience, and no pressing need to advance its own career by spending its time on more “important” tasks” was definitely involved, but so was human patience.

  6. It’s possible to become so used to talk of software in anthropomorphic terms (and of humans in terms more appropriate to software and machines) that it seems insignificant — as just using convenient metaphors — and that’s how it was for me, at first, when reading comments here.

    (Examples include describing Lean or proof assistants as being coaxed, having “infinite patience”, being “vital partners”, “engaging us”; and of human mathematicians as having “bugs” and not needing to “load” an entire proof.)

    On reflection, though, I think such talk should often be resisted, and I think this part of Timothy Chow’s initial comment ends up being misleading:

    Furthermore, Lean did not simply play a purely passive role, dully checking a proof that had been completely spelled out by the humans in advance. As Randall will be quick to acknowledge, the process of coaxing Lean to certify the argument was not a monologue by the humans, but a dialogue that forced the humans to rethink, clarify, and improve the argument. Lean is truly an interactive proof assistant; we are entering a new era in which proof assistants are becoming vital partners in the creation of proofs, not just by performing routine calculations, but by engaging us in interactive dialogue.

    Of course, it’s true that Lean wasn’t given a proof “that had been completely spelled out by the humans in advance”, that Randall and Sky used Lean to help them “rethink, clarify, and improve the argument”, and that Lean is “interactive”.

    However, it’s less misleading to say Lean was passive than to say it wasn’t. Lean doesn’t take the initiative or decide to engage us in dialogue. It’s interactive in essentially the same way as a programming language’s ‘read-eval-print loop’ or ‘interactive development environment’ (IDE), which is perhaps not surprising since Lean is also a programming language. The human user does something with their keyboard, mouse, etc, sees what comes back (an error message, say, or some information about their program), does something more, and so on. It’s a useful tool, not a partner.

    And when using a proof assistant, the resulting proof will not always be one that is clearer (or otherwise improved) from a human perspective. Using Lean doesn’t force mathematicians to make their proof clearer, as if greater clarity (from a human perspective) must always result.

    The advantages of using proof assistants and related software typically come with downsides that are often ignored. For instance, by making it easier to manage long and complex proofs, increased use of proof assistants could well lead to more proofs that are too long or too complicated for anyone to understand or that can be understood only with the help of particular software that may cease to be available.

    Here’s an example from a different stage: writing up a proof for publication or to tell other mathematicians about it. If you have a formal proof developed in something like Lean, at least part of that could be automated. People are developing software that would turn a formal proof into the mixture of natural language and formulas mathematicians use when writing up a proof. That can make the job easier. However, the process of writing up can help the author understand the proof and can lead to changes that make it easier to write up and hence easier for others to understand. Automation can decrease or even remove those benefits.

    *

    To keep down the number of separate comment posts, I’ll now reply to some related points (from two other comments):

    But I don’t think that proof assistants will foster an attitude of, “I explained the proof to the machine so now I don’t have to explain the proof to other humans.”

    I think there are already signs of that happening. In “What do we want a foundation to do?”, Penelope Maddy quotes Vladimir Voevodsky as saying

    “I now do my mathematics with a proof assistant. I have lots of wishes in terms of getting this proof assistant to work better, but at least I don’t have to go home and worry about having made a mistake in my work. I know that if I did something, I did it, and I don’t have to come back to it nor do I have to worry about my arguments being too complicated or about how to convince others that my arguments are correct. I can just trust the computer.” (Emphasis added.)

    We also know from other areas that many people, once they have their result, find explaining it a tedious chore that they do only because they’re in effect forced to. Consider some programmers and comments / documentation, for example. Some people who write fiction avoid creating an outline, because once they’d worked out the plot and the effects on their characters in an outline, they’d feel like they’d finished and wouldn’t enjoy writing it all out in sentences and paragraphs.

    However, I expect that in the future, we’re likely to evolve to a situation where we’re going to be satisfied with the verdict of the proof assistant, and we’ll regard the “understanding” of a coauthor or a human referee to be desirable, but not mandatory, at least as far as the correctness is concerned.

    That too, if it happens, will help foster the attitude in question.

    I don’t believe that “understanding” or “refereeing” are sacred activities that cannot be offloaded to machines without sacrificing the humanness of mathematics.

    Now, there is some point beyond which mathematics will cease to be a human activity. If there ever comes a day when the AIs are autonomously doing all the mathematics and humans are just sitting around with folded hands …, then yes, mathematics will no longer be a human activity. But as of now, that day is still far in the future.

    That looks like an argument that, so long as it’s still possible to construe mathematics as a “human activity” of some sort, even a change as great as humans giving up on understanding proofs would not be significant, and that any objection could only be religious (hence the word “sacred”).

    1. I’m not going to quibble about anthropomorphic language. If you insist on calling a proof assistant a tool rather than a partner, that’s fine. The point is that it’s a tool that gives you significant feedback, of a kind that is not available from other tools, and that’s what makes it so powerful.

      I do agree that people can abuse tools, but I’m not yet convinced that interactive proof assistants are particularly prone to abuse. It’s important to distinguish between “understanding” in the sense of having a satisfying conceptual grasp of a topic, and “understanding” purely for the purposes of checking correctness. For example, you quote Voevodsky talking about verification of correctness, but then your comments immediately after the quote seem to refer to conceptual explanations, which is a very different kettle of fish. I will agree with you that if proof assistants become increasingly powerful and socially accepted, then we will likely outsource a lot of correctness-checking to them. It’s not clear to me that this will decrease our appetite for conceptual understanding.

      We don’t really have to wait for interactive proof assistants to become more mainstream to explore this question. Computer-assisted proofs have been around for at least half a century. Almost every time that I have had a discussion with a colleague about a highly computational proof (including, by the way, proofs that didn’t involve electronic computers), the conversation has focused on the unsatisfactory nature of a proof that only confirms correctness but that leaves the reader with no sense of what is going on. Offhand, I can’t remember any conversation that had the tenor of, “Well, we know it’s true, so there’s nothing more to see here.” I have not detected any decrease in the appetite for understanding. So I don’t see why interactive proof assistants are going to change that.

      Having said that, I do think that one possible future scenario is that there will be a proliferation of theorems that seem to be provable only by means of heavy computations. Let’s consider an analogy from chess. Chess instructors uniformly praise understanding over memorization of variations. But the advent of powerful computers has threatened this view of the game. Endgame tablebases have revealed some extraordinary complexities that seem to defy human understanding. Even John Nunn, who completed a Herculean task of combing through the five-man databases and writing three books about them, bringing some conceptual understanding to some notoriously difficult endgames such as K+B+B vs. K+N, admitted defeat when it came to K+Q+P vs. K+Q. And what about the opening? Top-level competition seems to require memorizing enormous numbers of computer-generated variations.

      Mathematics is not the same as chess, of course. Nevertheless, I can imagine a similar future in which there are increasingly many theorems which seem to defy conceptual understanding. Such an eventuality is not a pleasant one to contemplate. But still, even if this scenario comes to pass, my inclination is not to “shoot the messenger.” That is, I would still not blame the computers. Rather, it would just be an unpleasant fact of life that much of mathematics is just too difficult for us to understand. (Not that we shouldn’t still try to understand as much as we can.)
