ChatGPT will lead you astray — but you know that!

There’s no doubt that playing around with ChatGPT 4 can be fun. And the DALL·E image-generating capacities are really rather remarkable. Here I am, sitting in the ruins, as the world seems to be falling apart around me, trying to distract myself with matters categorial.

I’m mighty glad, though, that I am retired from the fray and am not having to cope with students using ChatGPT for good or ill. To be sure, it has its academic uses. For example, a few trials asking it to recommend books on various topics produced quite sensible lists (and it even had the good taste to recommend a certain intro to formal logic … and it doesn’t know who’s asking!). But as for doing your writing for you …

I might be deceiving myself, but in the Cambridge supervision system, where students have to argue about what they have written, week by week, you won’t get away with relying too much on ChatGPT to write your essays. But elsewhere, in places where one-to-one (or one-to-two) teaching time is nowhere near so generous, how will teachers negotiate the new situation? There’s an interesting, and not exactly cheering, discussion thread here on Daily Nous, most relevant to philosophers.

In a different kind of usage, I did try asking ChatGPT some elementary questions in category theory. For example, it is well known that not all Xs are Ys (the details don’t matter): I had a slightly messy example in mind, but I’m sure it is easy to do better. So I asked for the simplest case of an X which isn’t a Y. And got back a very nicely constructed answer, which set things up very well, explained the notions involved, and gave a supposed example that looked superficially plausible. But it was just wrong, though it took me a little while to see it. So I pressed for more detail of why the described X wasn’t a Y. And got back more superficially plausible chat, which I could well imagine taking in a novice student.

The same again, when I asked ChatGPT to fill in some details of a sketched proof of a well-known categorial result (the sort of place where one might arm-wave in a lecture, and say “we can now easily show ….”). Again its supposed completions had just the right look-and-feel. But they were in fact just wrong at key points.

This might be good for teachers — a whole new class of examples to use: “Here is a ChatGPT proof. Is it right? If not, where does it go wrong?”. But not so good, perhaps, for mathematics students: weaker students who lean on ChatGPT and aren’t suitably primed are going to end up, repeatedly, with flatly false beliefs about which alleged proofs really are in good order.

11 thoughts on “ChatGPT will lead you astray — but you know that!”

  1. I like the acknowledgment of ChatGPT as an available resource, followed by an analysis of where it goes wrong! ChatGPT is clearly not such a good accomplice in analyzing its own folly.

    The criticism of ChatGPT here sounds valid. It is seemingly effective even when it does not really understand anything. In my view, that makes it a useful first-line exploratory tool whose responses should then be carefully corroborated.

    Like everyone else, I have had some very good responses on topics about which I (think of me as an enquiring student here) was unclear. On other topics it failed miserably, drowning under its own weight without even knowing that it had failed (think of me here as somewhat more knowledgeable than it about something — perhaps a teacher).

    And still, while wearing these two hats, I have felt like consulting it on some matters. Has ChatGPT become a routine check in my workflow? No, not yet. But given that its exploratory usefulness is only going to get better (OpenAI has a huge vested interest in its perceived usefulness), my prejudice against using it, at least at the beginning of any serious work, is gradually waning.

    1. ChatGPT certainly has its role to play in serious work. I had occasion to ask it the other day “What are simple, natural, examples of X?” (for a category-theoretic value of X!). And I got back a very sensible list. I then asked for more, and got a handful of further, less simple, examples. I already knew all the examples well enough, and in fact I wasn’t really looking for news. But I did get what I wanted, which was a degree of confirmation that I wasn’t having a memory lapse and clean forgetting a central paradigm example which keeps cropping up in the literature (or at least, in the literature which ChatGPT has been trained on). And that confirmation was genuinely useful.

      1. Dumb question from a non-user of ChatGPT: When it comes time to give credit where credit is due, say when you want to use a non-obvious example in a publication, is ChatGPT as helpful in telling you who came up with the example as it was in telling you what the example is?

        1. Interesting question. I just asked ChatGPT for a textbook source for one of the topological examples it had given me of an X. It asserted that the example was covered in Munkres. No, it isn’t! So I asked for a page reference in Munkres; it went off and looked (apparently) at the contents page and index, and came back on that basis to say that the example wasn’t in the book after all. (Given my entirely superficial understanding of how ChatGPT is trained, I’m not surprised it doesn’t “remember” the sources of its training data.)

  2. On the basis of the discussion so far, without having used ChatGPT myself, I have the impression that a major weakness of the system is that it is not honest about admitting its own limitations. When it is unable to answer a precise question properly, it (sometimes, often, always?) fudges with a simulacrum of an answer; when challenged on that, it produces another, until the user tires. Just like a bureaucrat, and some teachers, in the same situation….

    1. I would say its answers are always simulacra: even when it happens to get something right, it’s generating text without understanding anything. And that lack of understanding is often revealed.

      However, it has some surprising abilities. For example, ChatGPT-3.5 can write a villanelle about pretty much anything (while Bard, it seems, cannot). This is interesting, because it has to repeat certain lines exactly, word for word, in a certain pattern, as well as following a rhyme scheme, having the right number of lines per stanza, and so on. It’s at least not obvious how it manages to do that.

  3. How does one get to play with ChatGPT 4? (GPT-4, as opposed to the 3.5, which still seems to be what you normally get as ChatGPT.)

    User: Which version of GPT are you?

    ChatGPT: I am ChatGPT, based on the GPT-3.5 architecture developed by OpenAI. If there have been newer versions released since my last update in January 2022, I may not be aware of them.

    Anyway, using 3.5, my experience with asking it to write programs is like yours with proofs: the result is superficially plausible but doesn’t actually work. If I point out a problem, it apologises and produces a new version which can be even further from working.

    1. There is the paid ChatGPT Plus option (though I think you can get some free access to GPT-4 by other routes). On programs, I’m told by The Daughter — a software engineer — that ChatGPT can be pretty helpful when you are getting on top of coding in a new language, because it can provide very useful commentary in response to questions of the kind “what does this fragment of code do?” (the sketch below illustrates the sort of exchange). I guess you’d expect it to be much better at that sort of thing, given that what’s going on under the bonnet with ChatGPT is a lot of pattern recognition?
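
       To make that concrete, here is a minimal sketch of the sort of exchange meant: a hypothetical Python fragment a newcomer might paste in, together with the kind of line-by-line commentary one might ask ChatGPT to supply. The fragment and the comments are illustrative, not an actual transcript.

       ```python
       # A fragment a newcomer to Python might paste in, asking
       # "what does this fragment of code do?"

       words = ["proof", "lemma", "proof", "theorem", "proof"]

       # counts.get(w, 0) looks up the running total for w,
       # defaulting to 0 the first time w is seen, so the loop
       # builds a frequency table of the words in the list.
       counts = {}
       for w in words:
           counts[w] = counts.get(w, 0) + 1

       print(counts)  # {'proof': 3, 'lemma': 1, 'theorem': 1}
       ```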

  4. AI – starting with these amazing AI-generated images – has me convinced that in 100 years (or less) it will take over all productive human work, with the most menial jobs the last to go, since I think designing functional robots will take some time.
    I don’t think our culture (globally) fundamentally values the human organism; without a major change (or a Luddite revolution, perhaps) our most difficult times are ahead.

    1. It’s possible that the capabilities of machine-learning-based AI of the sort we’re seeing now will continue to significantly increase, which would be worrying. However, it’s also possible that they won’t, and that we’re already near the limit of what such systems can do.
