# Another round with ChatGPT

ChatGPT is utterly unreliable when it comes to reproducing even very simple mathematical proofs. It is like a weak C-grade student, producing scripts that look like proofs but mostly are garbled or question-begging at crucial points. Or at least, that’s been my experience when asking for (very elementary) category-theoretic proofs. Not at all surprising, given what we know about its capabilities or lack of them.

But this did surprise me (though maybe it shouldn’t have: I’ve not really been keeping up with discussions of the latest iteration of ChatGPT). I asked, as a genuine question and hoping to save time on a literature search, where in the literature I could find a proof of a certain simple result about pseudo-complements. I wasn’t trying to trick the system: I already knew one messy proof and wanted to know where else a proof could be found, hopefully a nicer one. And this came back:

So I take a look. Every single reference is a total fantasy. None of the chapters/sections have those titles or are about anything even in the right vicinity. They are complete fabrications.

I complained to ChatGPT that it was wrong about Mac Lane and Moerdijk. It replied “I apologize for the confusion earlier. Here are more accurate references to works that cover the concept of complements and pseudo-complements in a topos, along with their proofs.” And then it served up a complete new set of fantasies, including quite different suggestions for the other two books.

Now, it is one thing for ChatGPT to be unreliable about proofs (as I’ve said before, it at least generates reams of homework problems for maths professors of the form “Is this supposed proof by ChatGPT sound? If not, explain where it goes wrong”). But being utterly unreliable about what is to be found in the data it was trained on means that any hopes of its providing even low-grade, reference-chasing research assistance look entirely misplaced too.

Hopefully, this project for a very different kind of literature-search AI resource (though this one aimed at philosophers) will do a great deal better.

### 6 thoughts on “Another round with ChatGPT”

1. > ChatGPT is utterly unreliable when it comes to reproducing even very simple mathematical proofs. It is like a weak C-grade student, producing scripts that look like proofs but mostly are garbled or question-begging at crucial points.

That’s very like what I see when I ask ChatGPT for a program. It returns code that looks superficially like it might do the right thing, but doesn’t. (Often it doesn’t even work on the examples ChatGPT gave when explaining the program.) If I point out a problem, it will say, oh, sorry, here’s a correct one, and it will be just as bad or worse.

It comes down to ChatGPT not actually understanding anything. Some tasks make that evident more quickly than others.

Quite a few people say they can get ChatGPT to write code for them, though, so there may be some programming tasks that it can handle moderately well.

Much the same goes for poetry. For example, when ChatGPT is asked to write a villanelle, the result is a poem with the right number of stanzas, each with the right number of lines, and with the right line repetitions and rhymes. (Other LLM systems I’ve tried have failed at one or more parts of that.) It’s also quite good at haiku and some other poetic forms. However, even in the midst of doing well it will sometimes get a rhyme wrong (“blow” does not rhyme with “old”) or fail to follow other rules when you ask for changes.

2. I don’t find that too surprising. LLMs don’t work by looking things up in their training data; they just imitate the patterns in how words were arranged. It makes sense that ChatGPT can generate something which looks like a list of references, but there’s no reason to expect it to generate anything veridical.

I’m open to the idea that LLMs could be useful here if, instead of using them to generate the results directly, you first find references with a traditional search engine, feed in the content of those webpages, and engineer the LLM to choose the most relevant sources and summarise them for you. This is the approach perplexity.ai is taking. I’m not sure how useful it is, but it’s worth trying out if you’re interested.

3. Just ask it to give you a syllabus with readings on any topic and you will see how good it is at listing articles and books that you would almost believe are real!

1. I was using the latest version of ChatGPT. I had asked it to show that in a topos, if a subobject has a complement, then this complement equals the result of pulling back along . It replied with a non-proof, prefaced by (correctly!) saying “it is indeed a well-documented result in topos theory.” So I asked “Thanks, but do you have page references for an actual published version of this ‘well-documented’ proof?” And it was to this prompt that I got back complete fictions.
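For readers who want the result itself, here is a minimal sketch of one standard order-theoretic argument (not necessarily the proof the post has in mind). It assumes the standard facts that the subobjects of an object $B$ in a topos form a Heyting algebra $\mathrm{Sub}(B)$, and that the pseudo-complement $\neg m$ of a subobject $m$ is the largest subobject whose meet with $m$ is $0$. The names $n$, $0$, $1$ are illustrative notation, not the post's.

```latex
\begin{align*}
&\text{Suppose } n \in \mathrm{Sub}(B) \text{ is a complement of } m:\quad m \wedge n = 0,\quad m \vee n = 1.\\
&\text{Since } n \wedge m = 0 \text{ and } \neg m \text{ is the largest such subobject, } n \le \neg m.\\
&\text{Conversely, by distributivity of the Heyting algebra } \mathrm{Sub}(B):\\
&\neg m = \neg m \wedge 1 = \neg m \wedge (m \vee n) = (\neg m \wedge m) \vee (\neg m \wedge n) = 0 \vee (\neg m \wedge n) = \neg m \wedge n \le n.\\
&\text{Hence } n = \neg m.
\end{align*}
```

The only lattice facts used are distributivity and the defining property of $\neg m$. The topos-specific work lies in establishing that $\mathrm{Sub}(B)$ is a Heyting algebra whose pseudo-complement is computed via the negation map $\neg\colon \Omega \to \Omega$.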
