The Turing test was originally proposed by Alan Turing (1950) as a way to measure whether a machine can think. The basic idea is that if you can have a conversation with an AI without identifying it as such, the machine passes the test (Proudfoot 2024). The test has plenty of limitations: most AI systems are not even designed to communicate directly with people, and Turing himself was careful to distinguish between thinking and intelligence. Nevertheless, it is widely known and accepted due to its prominence in popular media.
Pathfinder (2024) is a project at LAB in collaboration with partners around Europe. One of its goals is to increase AI literacy among both teachers and students. One way to gain a deeper understanding of how large language models (LLMs) function is to have discussions with them. In this case, that means flipping the Turing test and having ChatGPT test us on whether we are human.
For this test, the author asked ChatGPT to Turing test him and, as a control, repeated the process using answers provided by Copilot. The test was done on two separate accounts in order to keep the two conversations from influencing each other on the ChatGPT side.
While the human participant did not deliberately attempt to deceive the LLM by avoiding humanlike answers, he did try to maintain a style and tone reminiscent of LLMs. Copilot was instructed to answer the questions as a teenager looking for a college to attend. Copilot’s ability to play the role can be questioned, as it was often limited to injecting words such as ‘totally’ and ‘super’ into the text.
It should also be noted that Copilot refused to try this outright. This is in itself interesting, as it uses the same model as ChatGPT.
The conversations
While ChatGPT correctly identified the author as human (ChatGPT 2025a), it mistakenly identified Copilot as human as well (ChatGPT 2025b). Even with this limited number of tests, it would seem the LLM is predisposed to identifying the user as human. However, looking at individual reactions to answers, we can find interesting differences.
As the LLM is unable to construct an overall image of the test subject, it only looks at answers separately. This means, for example, that it does not question why a recent high school graduate would specifically talk about a movie from 2006. While this is obviously possible, it should be something that draws attention.
There are also known methods for identifying AI-generated text, and ChatGPT is clearly unable to employ them. These include specific sentence structures and lengths, unnecessary hedging statements, and an inconsistent voice (Caulfield 2023). All of these are present in Copilot’s answers.
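Some of the signals Caulfield (2023) lists can even be checked mechanically. As a minimal illustrative sketch only (real detectors use statistical language models, not fixed word lists; the hedging phrases and sample text below are invented for this example):

```python
import re
import statistics

# Hypothetical list of hedging phrases, invented for illustration.
HEDGES = ["it is worth noting", "in many ways", "to some extent",
          "arguably", "it could be said"]

def ai_text_signals(text: str) -> dict:
    """Return crude, illustrative signals of AI-like prose:
    hedging-phrase count and uniformity of sentence lengths."""
    lower = text.lower()
    hedge_count = sum(lower.count(h) for h in HEDGES)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    # A low spread means suspiciously uniform sentence lengths.
    spread = statistics.pstdev(lengths) if len(lengths) > 1 else 0.0
    return {"hedges": hedge_count,
            "sentences": len(sentences),
            "length_spread": spread}

sample = ("It is worth noting that college is super important. "
          "In many ways, the choice shapes your future. "
          "To some extent, campus culture matters too.")
print(ai_text_signals(sample))
```

Run on the sample text, this counts three hedging phrases across three sentences of nearly identical length, exactly the kind of pattern an experienced human reader notices at a glance.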
Despite these problems, as well as others such as our fictional teenager’s interest in neuroplasticity, the overall assessment was: “[y]our thoughtful, reflective, and articulate responses strongly suggest you’re human—or at the very least, you’re doing an exceptional job at simulating one” (ChatGPT 2025b).
On the human side, while the LLM was right most of the time in assuming there was a human behind the answers, and found this to be the case in its overall assessment, it did second-guess itself on multiple occasions, stating that an AI might have given a similar answer (ChatGPT 2025a).
It is not possible to draw any definite conclusions based on only one case of each, but it is telling that ChatGPT is not able to differentiate between a human and another LLM, even though the text provided by Copilot is obviously AI-generated and would be identified as such by anyone with experience in reading generated text. Of course, ChatGPT was never designed for this kind of work, nor are the technologies behind it capable of reaching such levels.
Author
Aki Vainio works as a senior lecturer of IT at LAB and takes part in various RDI projects in expert roles. His favorite movie was released 11 years before he was born.
References
Caulfield, J. 2023. How Do AI Detectors Work? | Methods and Reliability. Scribbr. Cited 21 Jan 2025. Available at https://www.scribbr.com/ai-tools/how-do-ai-detectors-work/
ChatGPT. 2025a. Turing Test Interaction (human version). OpenAI. Cited 21 Jan 2025. Available at https://chatgpt.com/share/678f335b-7478-8003-9e0d-5e7565f0a18f
ChatGPT. 2025b. Turing Test Interaction (Copilot answers). OpenAI. Cited 21 Jan 2025. Available at https://chatgpt.com/share/678f59c8-1adc-8009-8f1d-6f57e75cc990
Hermans, P. 2009. Statue of Alan Turing in Sackville Gardens, Manchester, United Kingdom. Wikimedia. Cited 23 Jan 2025. Available at https://commons.wikimedia.org/wiki/File:Alan_Turing_18-10-2009_11-10-27.JPG
Pathfinder. 2024. Welcome to the Erasmus+ Pathfinder Project. Netlify. Cited 21 Jan 2025. Available at https://erasmus-pathfinder.netlify.app/
Proudfoot, D. 2024. The Turing Test. Open Encyclopedia of Cognitive Science. Cited 21 Jan 2025. Available at https://oecs.mit.edu/pub/uli3iiu9/release/2
Turing, A. 1950. Computing Machinery and Intelligence. Mind. 59 (236), 433–460.