Conceived in Liberty, Birthed in Imperfection: The Importance of Inconsistent AI Detectors
By Jake Rigdon
Owner, JakeGPT1973.com
About three months ago – ancient history now – I published a post on the ChatGPT subreddit asking people for their recommendations for the best AI detectors.
That post didn’t initially generate much response, and those who did comment didn’t have a ton to add to the actual topic. Then again, that wasn’t entirely unexpected. ChatGPT was released in late November 2022, so I wasn’t expecting a huge response just three months later.
However, that post has continued to generate the occasional helpful comment, so I finally decided to test out all those suggested AI detectors, along with several other popular ones that were not mentioned in the comments of that subreddit post.
Spoiler alert: The results weren’t great.
However, these tools should still play a major role in getting your content ready to publish – even if you don’t use AI tools.
First, let’s take a look at how 12 AI detectors fared against three famous speeches. Then we’ll see how they did against two blog posts: one that was aided by ChatGPT and another that was 100-percent human-created (this very blog post!).
How they did against famous speeches
I had higher expectations for these three famous speeches, so I was strict in what I considered a “passing” or “failing” grade. For these speeches, a truly accurate “passing score” should be no-doubt human-written, but I’ll be slightly generous and set the bar at 95 percent human (or the equivalent).
Here are those results:
“I Have a Dream” speech
- OpenAI Text Classifier: “The classifier considers the text to be very unlikely AI-generated.”
- GPT4Detector.ai: “There is a 46% probability that this text is fully generated by AI.”
- GPT-2 Output Detector: 99.97 percent real, but this is the most generous grader.
- Originality.ai (one of the tougher graders): 96 percent original.
- CopyLeaks: “This is human text.” 99.9 percent human
- Writer.com: “100% human generated” – but only takes 1,500 characters at a time.
- ContentAtScale: “90% Likely to be Human.”
- Winston.AI: 100% “Human Score”
- Grammica: 99.57% “Real”
- ZeroGPT: 40.62% “AI” – meaning the app thinks roughly 40 percent of the speech is “fake.”
- Sapling: Unexpected error, no matter how many times I tried or tightened up the copy.
- GPTkit.ai: The website wouldn’t open and threw gateway errors when I tried to register; it worked once and gave pretty good results on a different blog post.
As you can see, seven out of 12 would be given a “passing grade,” meaning the speech scored 95 percent human or higher (or received an equivalent affirmation). However, two said there was about a 40 to 46 percent chance that the speech was AI-generated, while another fell below, but came relatively close to, the grading threshold.
That said, two of them have character limits, so it was hard to judge the effectiveness of their results – both said the speech was human-created – and two just flat-out didn’t work.
PASSING GRADES: 7 out of 12
“The Gettysburg Address,” second draft
- OpenAI Text Classifier: “The classifier considers the text to be unlikely AI-generated.” – leaving open the possibility that it might also be AI-generated.
- GPT4Detector.ai: “There is a 1% probability that this text is fully generated by AI.”
- GPT-2 Output Detector: 99.97 percent real, but this is the most generous grader.
- Originality.ai: 92% original
- CopyLeaks: “This is human text.” 99.9 percent human
- Writer.com: “100% human generated” – but 1,500-character limit
- ContentAtScale: “93% Likely to be Human.”
- Winston.AI: 97% “Human Score”
- Grammica: 99.98% “Real”
- ZeroGPT: 0% “GPT”
Again, similar results. We removed the two tools that weren’t working and tested the speech on the remaining 10 AI detectors, and the results were largely the same – only this time, two of the sites that passed Martin Luther King Jr.’s speech weren’t as impressed with Lincoln’s much shorter speech.
The biggest surprise? Originality.ai’s analysis, which said there was a 92 percent chance that the speech was written by a human. Again, 92 percent would be good enough for most people, but there’s no reason that speech shouldn’t get a 100 percent human score – especially from such a popular website.
That said, Originality.ai will oftentimes fail content that other sites pass; it regularly marks my own, 100-percent human-written content as 40 to 80 percent “AI generated.”
PASSING GRADES: 7 out of 10
President John F. Kennedy's Inaugural Address
- OpenAI Text Classifier: “The classifier considers the text to be very unlikely AI-generated.”
- GPT4Detector.ai: “There is a 22% probability that this text is fully generated by AI.”
- GPT-2 Output Detector: 99.98 percent real, but this is the most generous grader.
- Originality.ai: 99% original
- CopyLeaks: “This is human text.” 98.4 percent human
- Writer.com: “100% human generated” – but 1,500-character limit
- ContentAtScale: “92% Likely to be Human.”
- Winston.AI: 92% “Human Score”
- Grammica: 99.98% “Real”
- ZeroGPT: 21.02% “GPT AI Generated,” meaning only about 79 percent of the speech was “human.”
This time, four sites failed to recognize Kennedy’s famous speech as no-doubt human-written. Originality.ai got it right this time, calling the speech “99% human,” but another heavyweight in this space, Winston.AI, wasn’t so sure, giving it a 92 percent “Human Score.”
PASSING GRADE: 6 out of 10.
What about content that’s not famous?
Of course, you can’t just go by these results. Sure, they tell you something, but you don’t have the full picture.
So I ran a recent blog post my team wrote for another client through the same AI detectors … and got pretty much the same results. Finally, I tested out this very post.
For the first post, ChatGPT wrote 100 percent of the first draft. However, the content was generated using one of JakeGPT’s most impactful SEO prompts, which forces ChatGPT to emulate human writing (that’s important because the resulting content will likely be better than anything ChatGPT generates without it). Then the post went through Copyscape and Grammarly. From there, my team used several keyword research tools to add in the right keywords – if necessary – and we used another tool to optimize the headline.
Then it went through two different human copy edits, which further tweaked the content. For the fourth draft, my team added a quote from the client. Next up: running the content through a chatbot app that ensured the post met Google’s guidelines and best practices for AI content.
The final version that was ready for publication, or the sixth draft, included additional human tweaks before running it through Grammarly one more time.
Ultimately, the post was a shade under 770 words, and of that total, approximately 7 percent of the copy was untouched by humans. Therefore, the post should have received passing scores from every detector, or, at worst, something along the lines of a 93 percent “human” grade or higher.
Again, only six out of 10 got it right. Originality.ai was fairly confident the post was mostly AI-generated, saying it was only 42 percent original. This time, though, Winston was spot-on, giving it an exactly-right 93 percent human score. GPT4Detector.ai – the AI detector I use – leaned toward the post being human-written, calling it 72 percent original.
Finally, this particular blog post – again, completely AI-free – generated the most accurate results, as only one of the AI detectors had doubts about this content’s origins, marking it 75 percent “human-generated.” Unfortunately, that’s one of the AI detectors I use the most.
Are AI Detectors Still Important?
While the results might be a bit disappointing and show how far the technology still has to go, AI detectors are still vitally important – even if your content is 100-percent human-written.
Think of it like predicting the weather.
If the weatherman for Channel 4 says there’s an 80 percent chance of rain tomorrow, the Channel 5 weatherman says the chance of rain is more like 50 percent, and the weatherman for Channel 8 says the odds are at 60 percent, what are you going to do?
I don’t know about you, but I’m bringing an umbrella.
And that’s how you have to treat these AI detectors. The tools I tested out are all amazing, and the developers should be applauded for their efforts. But I’m sure they would tell you that the technology isn’t quite “there” yet and that the results are still fairly inconsistent – and sometimes inaccurate – across platforms.
However, if your post is showing up as more than 30 percent AI, then you need to revise the copy – even if it’s 100-percent “human created.” The reason is simple: You could have used some kind of phrase or word combination that typically triggers the AI detectors. So, if any of these tools think your content is AI-generated, then it’s possible that others do, too.
Including other clients. Or readers. Or Google.
So why risk it? It’s best to “bring your umbrella” and revise the copy accordingly.
But what if it doesn’t “rain”? Then at least you came prepared – which you’re doing by revising your content until it gets a higher “human” grade.
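If you like to keep score, here’s a minimal sketch of that “umbrella” rule, written in Python. The detector names and numbers below are placeholders you’d fill in by hand after running your copy through each tool – it doesn’t call any real detector’s API:

# A rough "bring your umbrella" check: if any detector crosses the
# 30 percent AI line mentioned above, flag the copy for revision.
# The detectors and scores here are made-up examples, not real results.
ai_scores = {
    "Detector A": 0.12,  # each value is that tool's "percent AI" result, as a 0-1 fraction
    "Detector B": 0.42,
    "Detector C": 0.05,
}

REVISE_THRESHOLD = 0.30  # the 30 percent line

# Keep only the detectors whose AI score exceeds the threshold.
flagged = {name: score for name, score in ai_scores.items() if score > REVISE_THRESHOLD}

if flagged:
    print("Bring your umbrella – revise the copy. Flagged by:")
    for name, score in flagged.items():
        print(f"  {name}: {score:.0%} AI")
else:
    print("No detector crossed the 30 percent line – you're probably good to publish.")

Swap in the actual scores you collected, and the same logic applies whether you check one tool or all 12.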
As for whether you should revise AI-flagged copy that was actually human-written, some of these detectors will point out the problem sentences. If you examine those sentences or sections, most of the time it’ll be pretty obvious that the copy could be improved.
Sure enough, when checking the “questionable” sentences from the two blog posts – this one and the one my team wrote – the sections flagged as AI had something in common: those sections were poorly written.
#ai #chatbot #GPT4 #promptengineer #promptengineering #aiprompts #generativeai #llm #blogging #blogger #contentmarketing
-------------------
JakeGPT is the owner and operator of JakeGPT1973.com, an AI-powered digital marketing company based in Carrollton, Texas. Email him at jakegpt@jakegpt1973.com. Click here if you’d like to learn more about JakeGPT’s DIY SEO services or its newly expanded optimized blogging services.