Is GPTZero Accurate? Can It Detect ChatGPT? Here’s What Our Tests Revealed

ChatGPT has taken the world by storm since it made the news in November 2022. People have started using it in their daily routines, as it can help answer questions about almost anything. With its increasing popularity, the tool has inspired more large language models, including ones from Google and Meta, which is as much a cause for concern as it is exciting.

In the months since its launch, there have been several instances of students misusing ChatGPT to write essays and complete assignments, as the tool can generate comprehensive content from a simple prompt. To counter the misuse of AI-generated content, there’s now a new tool, GPTZero, that educators and journalists can use to check whether a piece of writing was created using AI.

In this post, we’ll explain what GPTZero is, how you can use it, and how far you can trust it to reliably detect and distinguish a human article from AI-generated content.

What is GPTZero?

Developed by Princeton University student Edward Tian, GPTZero is software that uses statistical analysis to detect whether a text was written by a human or generated by an AI tool like ChatGPT. The tool has been designed to help people in education, journalism, and other sectors fight AI plagiarism and know when they’re viewing texts generated by large language models (LLMs), one of which is ChatGPT.

With the ever-increasing popularity of tools like ChatGPT, many people have been misusing the written content generated by AI services and passing it off as their own. GPTZero vows to make the use of AI transparent by analyzing the complexity of a text using two major factors: Perplexity and Burstiness.

Perplexity – refers to how random, or hard to predict, the input text is when GPTZero compares it with what a language model would produce. The higher this score, the greater the chance that the text was written by a human and not by a machine.

Burstiness – refers to how much the sentences vary across a text. While text generated by AI tends to have sentences of uniform length throughout, text written by humans may mix long and short sentences. The higher the Burstiness score of a text, the more likely it is that it was written by a human.
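To make these two ideas more concrete, here is a minimal Python sketch. It is not GPTZero's actual code, which this post does not have access to. It assumes the textbook definition of perplexity (the exponential of the average negative log-probability per token) and uses the spread of sentence lengths as a stand-in for Burstiness, following the description above. The token probabilities and sample sentences are made up for illustration.

```python
import math
import re
import statistics


def perplexity(token_probs):
    """Exponential of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def sentence_lengths(text):
    """Split on '.', '!' or '?' and count the words in each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Assumed stand-in: standard deviation of sentence lengths in words."""
    lengths = sentence_lengths(text)
    return statistics.pstdev(lengths) if lengths else 0.0


# Hypothetical probabilities a language model assigned to each token.
predictable = [0.5, 0.5, 0.5, 0.5]   # the model guessed every token easily
surprising = [0.1, 0.05, 0.2, 0.1]   # the model was often surprised

uniform_text = "The cat sat down. The dog ran off. The bird flew away."
varied_text = "Stop. The dog ran off across the muddy field after it. Why?"

print(perplexity(predictable), perplexity(surprising))
print(burstiness(uniform_text), burstiness(varied_text))
```

In this sketch, the text the model finds surprising gets the higher perplexity, and the text that mixes short and long sentences gets the higher burstiness. Both are the direction the definitions above associate with human writing.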

Besides determining whether the text you copied to the tool was written by AI or a human, GPTZero can also detect parts of the text that may have been generated using an LLM. If an article was written using both AI and human work, the tool will highlight the parts that it thinks could have been created using artificial intelligence. In some cases, GPTZero may also determine that the input text is “more likely human written” but includes “sentences with low perplexities” so that you can improve on them.

How can you use GPTZero?

While you may require an account to use ChatGPT, using GPTZero is fairly easy as you don’t require an account or a subscription to check whether a text was written by AI. This means you don’t have to share your personal info, like your email address or phone number, to start using the service. All you need to use GPTZero are:

A device like a computer or a phone that can connect to the internet
An active internet connection
A web browser to launch the GPTZero website

Once you have these requirements sorted, launch GPTZero on a web browser on any of your devices. We’re using it on Firefox on a Mac in this instance but you could use any browser across any computer or phone.

When GPTZero loads, scroll down to the Try it out section. In the text box under it, paste the text that you want to check for AI plagiarism. The text you paste here should be at least 250 characters long for the detector to analyze it.

You can also check a document on your device for AI involvement by clicking on Browse underneath the text box. From there, you can upload a file in one of the supported formats (PDF, DOCX, or TXT) for GPTZero to analyze.
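If you plan to check many excerpts, a small pre-check can save a few failed attempts. The Python sketch below only encodes the two requirements mentioned above: the 250-character minimum and the three upload formats. The function names are our own, and how GPTZero itself counts characters (for example, whether whitespace counts) is an assumption.

```python
from pathlib import Path

MIN_CHARS = 250                                  # minimum length mentioned above
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt"}   # upload formats mentioned above


def long_enough(text):
    """True if the text meets the 250-character minimum (all characters counted)."""
    return len(text) >= MIN_CHARS


def supported_file(filename):
    """True if the file extension is one of the listed upload formats."""
    return Path(filename).suffix.lower() in SUPPORTED_SUFFIXES


print(long_enough("Too short to analyze."))
print(supported_file("essay.docx"))
```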

Note: When pasting texts or uploading documents to GPTZero, you need to keep in mind that the service may access, store or use any information you share with it. So, you need to avoid sharing any sensitive information like contact details or location here to avoid privacy concerns in the future.

Once you’ve entered a text you want to check, click on Get Results.

GPTZero will instantly check the text you shared. The result appears below, letting you know whether the text was written by a human or was AI-generated.

Based on the text you input, you may get any of the following results:

Your text is likely to be written entirely by a human.
Your text is likely to be written entirely by AI.
Your text is most likely human written but there are some sentences with low perplexities.
Your text may include parts written by AI.

You will see more details about the results as you scroll downward. If GPTZero detects any AI involvement in your text, the portion that the tool determined as AI-written will be highlighted in yellow.

When you scroll further, you will see a detailed analysis of the input text with its Perplexity and Burstiness measurements under the “Stats” section. These measurements are shown as numbers, and a bar chart shows how the text fares. The lower a text scores in both Perplexity and Burstiness, the higher the chance that it was written with the help of an AI content generator.

At the end of the Stats section, GPTZero will also show the sentence with the highest perplexity as well as its individual score. This doesn’t necessarily mean that this portion of the text was written by a human, but it indicates that this portion is the least likely to have been written using AI.

Is GPTZero accurate?

TL;DR version: In our limited time testing the software, we found that GPTZero correctly identifies texts generated by ChatGPT almost all the time. When it comes to checking texts written by humans, though, it hits a roadblock.

While GPTZero can easily detect content generated by AI, it also flags content written by humans as “written by AI”. This defeats the purpose of using the tool to check for AI-generated content, since GPTZero produces false positives: it reports AI involvement where there is none.

Full version: To test whether GPTZero is able to determine whether a text is AI-generated or written by humans, we put it to use ourselves. Before we reveal how accurate the tool is, you need to first understand how we tested it so that you get a general idea of how the service works.

How we tested GPTZero

To thoroughly put GPTZero to the test, we used texts from our existing articles on Nerdschalk.com and copied different sections of texts from these articles like the intro and guides. Inside GPTZero, we pasted the copied excerpts from those articles and checked them for AI involvement.

Along with human-written texts (our content), we also wanted to test whether GPTZero detects texts generated through AI. For this, we asked ChatGPT to create intros and guides on the same topics as the Nerdschalk excerpts we copied.

To give you an instance, we asked ChatGPT to create us an intro for this post – How to Unmerge Cells in Google Docs.

When the service generated a response to our query, we copied the AI-written text and pasted it on GPTZero’s text box to check for its legitimacy.

Similarly, we copied the intro from our own post and checked it on GPTZero for AI involvement.

To make sure that we can determine the consistency of GPTZero’s results, we tested this with at least 10 excerpts of texts each from our own posts and the ones we asked ChatGPT to create on the same topic as our posts. This is what we found.

Does GPTZero detect texts written by ChatGPT?

For a tool designed to detect texts written using AI, GPTZero does a really good job at recognizing the texts created using ChatGPT. Every time we copied content we asked ChatGPT to create, GPTZero was able to accurately ascertain that it was likely written with the help of AI.

For text created by ChatGPT, GPTZero would either determine that the entire text was written by AI or that it included parts with AI involvement. To help you understand how it found AI-written texts, GPTZero would show you Perplexity and Burstiness scores at the end of each result.

For AI-generated texts, the software consistently showed low Perplexity values, indicating that they were easier to predict. Human writing is harder to predict because each person’s vocabulary differs, so the text may seem a little more random. The same was true of the Burstiness value: texts generated by ChatGPT scored lower, indicating that their sentences were more uniform in length.

The tool would also isolate the portions of text it thinks have the highest likelihood of being generated through AI. Look at this screenshot, for example:

Although this is still a small sample size, we could conclude that GPTZero fared quite well in flagging ChatGPT-generated content as AI-written.

Does GPTZero detect texts written by humans?

Now, this is where we hit a roadblock. While GPTZero easily determined that ChatGPT texts were AI-written, it reached the same verdict for texts we copied from our original Nerdschalk articles. With excerpts on the same topics we asked ChatGPT to write about, GPTZero correctly identified the text as human-written only twice in ten attempts.

In both of the “successful” instances, we got varied results as to how much of the text GPTZero thinks was written by us. For example, when we checked this excerpt from our original post, the software showed an accurate result saying this text was possibly written entirely by a human.

However, when we scrolled to check its Perplexity and Burstiness scores, the values shown (42.5 and 13.4) were lower than those of the text generated by ChatGPT (46 and 20.8). This means even the parameters used to determine a text’s AI involvement were inconsistent, although the result was accurate in this instance.

Another instance GPTZero got right was when we copied portions of text from this Nerdschalk post. Unlike the previous case, although the tool was able to conclude that it was written by a human, it found sentences within the excerpt that had lower perplexity values. It even highlighted the sentences it thought were written by AI when the whole text was originally written by us.

When we compared this text’s stats with the previous one, GPTZero showed a similar Perplexity score of 40.2 with a slightly higher Burstiness value of 17.9.

As for the other results, the software wrongly flagged 8 out of 10 portions of text we wrote as generated by AI. For instance, an intro from this original post was shown as “likely to be written entirely by AI”…

while another portion of the same post revealed a slightly different result like this –

…which is confusing, as that excerpt scored Perplexity and Burstiness marks of 76.3 and 59.3, higher than any other text we submitted to GPTZero.

This goes to show that GPTZero, being in its early phase, can’t detect texts written by humans as accurately as it detects content generated through AI.
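To put rough numbers on these results, the Python sketch below computes a few standard detection metrics from the counts in our test. It assumes exactly 10 ChatGPT excerpts, all flagged as AI (we tested at least 10), alongside the 10 human-written excerpts, of which 8 were wrongly flagged. With a sample this small, treat the figures as illustrative only.

```python
def summarize(tp, fn, fp, tn):
    """Basic detection metrics from confusion-matrix counts.

    tp: AI texts flagged as AI        fn: AI texts passed as human
    fp: human texts flagged as AI     tn: human texts passed as human
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


# Counts from the test above; 10 ChatGPT excerpts is an assumption.
results = summarize(tp=10, fn=0, fp=8, tn=2)
for name, value in results.items():
    print(f"{name}: {value:.0%}")
```

With these counts, overall accuracy comes to 60%, the false positive rate to 80%, and precision to about 56%: when the tool said “AI” in our test, it was right only a little more than half the time.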

How accurate is GPTZero?

In our testing of the software, we came to the conclusion that results from GPTZero were passable at best, owing to the tool’s inconsistency in detecting texts written by humans. In spite of the fact that it was able to read and detect ChatGPT-generated content as AI-written, the software’s inability to recognize short sentences and texts written by humans makes it an unreliable tool for educators or journalists to check for AI plagiarism.

Since the purpose of such a tool is to help people tackle the unethical use of AI content generation, GPTZero cannot be used with 100% reliability. This isn’t to say that there’s no scope for improvement: detection could become more accurate as the software incorporates more data from other large language models (LLMs). For now, though, you should take GPTZero’s results with a pinch of salt and rely on your own ability to distinguish text written by humans from text written by a machine.
