AI Writing Detection Doesn’t Work.
By Kai Quizon
It took less than two weeks after ChatGPT’s debut for schools across the nation to begin banning its use. To enforce any kind of ban on artificial intelligence, instructors must be able to detect whether content was generated by a language model. To fill this new void, a variety of tools have sprung up online. Even OpenAI, the parent company of ChatGPT, has created a classifier.
This sounds like the next step for the well-known turnitin.com, a critical component of fighting plagiarism in academia and schools. There’s only one problem: it doesn’t work.
Our classifier is not fully reliable. In our evaluations on a “challenge set” of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as “likely AI-written,” while incorrectly labeling human-written text as AI-written 9% of the time (false positives). Our classifier’s reliability typically improves as the length of the input text increases. Compared to our previously released classifier, this new classifier is significantly more reliable on text from more recent AI systems.
OpenAI Blog, AI Classifier
OpenAI is oft heralded as the current leader in large language models, yet even their content classifier correctly flags only 26% of AI-written text. If we graded this classifier the way teachers grade their students, it would fail by no small margin.
Tools designed to detect AI-generated content are still in their early stages and are demonstrably unreliable. While these tools use various methods to identify patterns and characteristics associated with AI-generated content, their accuracy is limited not only by the skill of current detectors but also by the relentless advancement of AI generators. Because models like ChatGPT can be instructed to mimic human writing styles and patterns using simple style guidelines, it is very difficult for detection tools to distinguish this content from human-written content.
How Do They Even Work?
AI content detectors typically work by analyzing how probable each word in a text is, given the words that came before it. This overall predictability is often summarized as the text’s “temperature.” If the temperature is low, the response is predictable and sensible. Let’s take an example to see how this works. We’ll start with the sentence fragment:
I am going to drink __.
The most common word following this fragment is “coffee,” with a 62% chance, followed by “water” at 24%. When ChatGPT or another large language model forms sentences, it typically keeps the temperature low, finishing the sentence with “coffee” or “water.”
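To make “temperature” concrete, here is a minimal sketch of how a language model scales its next-word probabilities. The words and the raw scores (logits) are made up for illustration; real models work over tens of thousands of tokens, but the math is the same softmax-with-temperature.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw model scores into probabilities, scaled by temperature.

    Lower temperature sharpens the distribution (predictable picks like
    "coffee"); higher temperature flattens it (surprising picks get a chance).
    """
    scaled = [score / temperature for score in logits]
    max_s = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for words completing "I am going to drink __."
words = ["coffee", "water", "tea", "lava"]
logits = [4.0, 3.0, 2.0, -2.0]

low = softmax_with_temperature(logits, 0.5)   # sharp: "coffee" dominates
high = softmax_with_temperature(logits, 2.0)  # flat: even "lava" gets weight
```

At temperature 0.5, “coffee” takes the overwhelming share of the probability; at 2.0 the distribution spreads out and unlikely words become plausible picks.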
Low-temperature distributions have very little variability in their responses; they are predictable, represented by a “steep” bell curve, as seen above. High-temperature distributions have a large amount of variability and pick words further from the norm; hence they appear as wide bell curves, as seen above.
A good AI operating without any additional style guidelines will keep the temperature relatively low while staying on topic. This prevents the AI from seeming erratic or nonsensical during conversations, but also makes the writing sound programmatic and predictable. AI content detectors use this predictability and bank on the low temperature tendencies of AI to predict whether or not content is AI generated.
Ultimately, AI content checkers like writer.com look at the temperature of the sample provided. If the temperature is too low, the content is flagged as potentially being AI generated.
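This style of check can be sketched in a few lines. The bigram probabilities below are toy numbers standing in for a real language model, and the flagging threshold is an assumption, not writer.com’s actual implementation; the point is just the shape of the test: score how predictable the text is, and flag it if the score is too high.

```python
import math

# Toy bigram probabilities standing in for a real language model's scores.
# These numbers are invented purely for illustration.
BIGRAM_PROB = {
    ("going", "to"): 0.62, ("to", "drink"): 0.30,
    ("drink", "coffee"): 0.62, ("drink", "water"): 0.24,
    ("drink", "lava"): 0.0001,
}
DEFAULT_PROB = 0.01  # fallback probability for unseen word pairs

def predictability_score(words):
    """Average log-probability per word pair: higher means more predictable."""
    pairs = list(zip(words, words[1:]))
    logps = [math.log(BIGRAM_PROB.get(p, DEFAULT_PROB)) for p in pairs]
    return sum(logps) / len(logps)

def looks_ai_generated(words, threshold=-2.5):
    """Flag text the model finds too unsurprising (hypothetical threshold).

    This mirrors the temperature check described above: highly predictable
    text gets flagged, whether a bot or a very conventional human wrote it.
    """
    return predictability_score(words) > threshold

predictable = ["going", "to", "drink", "coffee"]
surprising = ["going", "to", "drink", "lava"]
```

Note that nothing in this check distinguishes a bot from a human who simply writes predictably, which is exactly the flaw explored next.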
You may have noticed a fundamental flaw in this check: humans often like to sound sensible in their writing as well. This is even more true in technical writing, or when younger children begin writing formal papers for the first time. This introduces a secondary check that many tools use: human error tracking.
Many tools look at consistent use of Oxford commas, spelling mistakes, grammatical inconsistencies, and other idiosyncrasies of human writing as a secondary check on the likelihood of AI generation. This is intended to combat the chance that a human writer has a low temperature in their writing.
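A crude version of this secondary check might count tell-tale human quirks in a sample. The particular signals and the threshold below are assumptions for illustration only, not any real detector’s logic: polished model output tends to lack doubled spaces, common misspellings, and accidentally repeated words.

```python
import re

def idiosyncrasy_count(text):
    """Count rough 'human error' signals in a text sample.

    The signal list is an invented example: real tools track many more
    idiosyncrasies (Oxford comma consistency, grammar slips, etc.).
    """
    signals = 0
    signals += len(re.findall(r"  +", text))                          # doubled spaces
    signals += len(re.findall(r"\b(teh|recieve|seperate)\b", text))   # common typos
    signals += len(re.findall(r"\b(\w+) \1\b", text))                 # repeated words
    return signals

def suspiciously_clean(text, minimum_signals=1):
    """Text with zero human quirks raises suspicion under this heuristic."""
    return idiosyncrasy_count(text) < minimum_signals
```

Of course, careful human writers also produce error-free prose, so this check inherits the same false-positive problem as the temperature check.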
A Test
Let’s put one of the most popular AI content detection sites to the test. For our first test, we will use a piece of writing that is positively not written by an AI: the Constitution of the United States of America:
Hm! It seems nearly 50% of our Constitutional preamble is likely AI generated content! This is incredibly impressive for a document drafted in 1787. This demonstrates the core issue with temperature based detection: predictable sentences occur in both human generated content and AI generated content.
Let’s try the other way. First, let’s give ChatGPT some guidelines to generate a piece of text for us:
Now let’s feed our John Hammond apology to writer.com:
This time, the program has done better! It recognizes that this is likely AI-generated content, but not nearly as well as one might hope. Writer.com believes that approximately 45% of the sample is AI generated, despite it being 100% AI generated. Using style guidelines in ChatGPT is a quick and easy way to subvert AI content detectors.
Forging Ahead
Despite their obvious shortcomings, these content classifiers are being deployed in academia across the nation, often with dire consequences for students whose work is flagged as likely AI generated. The pinnacle of plagiarism prevention, turnitin.com, has dedicated an entire web page (https://www.turnitin.com/solutions/ai-writing) to their efforts in AI and AI content detection. Ranging from ethics to applications, turnitin.com implores students to investigate further before simply turning in ChatGPT’s responses.
Academia and schools find themselves in a self-described “crisis.” Students are now able to rapidly generate believable essays with very little effort. There are still further questions to be answered: Who owns ChatGPT’s responses? Is it plagiarism to copy directly from ChatGPT when your prompt caused the tool to generate a response? What is plagiarism in the age of AI? Many of these questions will need to be answered as AI text generators continue to invade our lives.
For the time being, there are no reliable and accurate methods for detecting AI content, despite many tools rapidly coming to market. We now face an uncertain reality in which artificial intelligence can write in a convincingly human manner, making it near impossible to tell man from machine. We have progressed far beyond the simple days of the Turing Test.