Conceived in Liberty, Birthed in Imperfection: The Importance of Inconsistent AI Detectors
By Jake Rigdon
Owner, JakeGPT1973.com
About three months ago – ancient history now – I published a post on the ChatGPT subreddit asking people for their recommendations for the best AI detectors.
That post didn’t initially generate much response, and those who did comment didn’t have a ton to add to the actual topic. Then again, that wasn’t entirely unexpected. ChatGPT was released in late November 2022, so I wasn’t expecting a huge response just three months later.
However, that post has continued to generate the occasional helpful comment, so I finally decided to test out all those suggested AI detectors, along with several other popular ones that were not mentioned in the comments of that subreddit post.
Spoiler alert: The results weren’t great.
However, these tools should still play a major role in getting your content ready to publish – even if you don’t use AI tools.
First, let’s take a look at how 12 AI detectors fared against three famous speeches. Then we’ll see how they did against two blog posts: one that was aided by ChatGPT and another that was 100-percent human-created (this very blog post!).
How they did against famous speeches
I had higher expectations for these three famous speeches, so I was strict in what I considered a “passing” or “failing” grade. For these speeches, a truly accurate “passing score” should be no-doubt human-written, but I’ll be slightly generous and set the bar at 95 percent human (or the equivalent).
Here are those results:
“I Have a Dream” speech
- OpenAI Text Classifier: “The classifier considers the text to be very unlikely AI-generated.”
- GPT4Detector.ai: “There is a 46% probability that this text is fully generated by AI.”
- GPT-2 Output Detector: 99.97 percent real, but this is the most generous grader.
- Originality.ai (one of the tougher graders): 96 percent original.
- CopyLeaks: “This is human text.” 99.9 percent human
- Writer.com: “100% human generated” – but only takes 1,500 characters at a time.
- ContentAtScale: “90% Likely to be Human.”
- Winston.AI: 100% “Human Score”
- Grammica: 99.57% “Real”
- ZeroGPT: 40.62% “AI” – meaning the app thinks roughly 40 percent of the speech is “fake.”
- Sapling: Unexpected error, no matter how many times I tried or tightened up the copy.
- GPTkit.ai: The website wouldn’t open and threw gateway errors when I tried to register; it worked once and gave pretty good results on a different blog post.
As you can see, seven out of 12 would be given a “passing grade,” meaning the speech scored 95 percent human or higher (or received an equivalent affirmation). However, two said there was about a 40 to 46 percent chance that the speech was AI-generated, while another fell below, but came relatively close to, the grading threshold.
That said, two of them have character limits, so it was hard to judge the effectiveness of their results – both said the speech was human-created – and two just flat-out didn’t work.
PASSING GRADES: 7 out of 12
“The Gettysburg Address,” second draft
- OpenAI Text Classifier: “The classifier considers the text to be unlikely AI-generated.” – leaving open the possibility that it might also be AI-generated.
- GPT4Detector.ai: “There is a 1% probability that this text is fully generated by AI.”
- GPT-2 Output Detector: 99.97 percent real, but this is the most generous grader.
- Originality.ai: 92% original
- CopyLeaks: “This is human text.” 99.9 percent human
- Writer.com: “100% human generated” – but 1,500-character limit
- ContentAtScale: “93% Likely to be Human.”
- Winston.AI: 97% “Human Score”
- Grammica: 99.98% “Real”
- ZeroGPT: 0% “GPT”
Again, similar results. We removed the two tools that weren’t working and tested the speech on the remaining 10 AI detectors, and the results were largely the same – only this time, two of the sites that passed Martin Luther King Jr.’s speech weren’t as impressed with Lincoln’s much shorter speech.
The biggest surprise? Originality.ai’s analysis, which said there was a 92 percent chance that the speech was written by a human. Again, 92 percent would be good enough for most people, but there’s no reason that speech shouldn’t get a 100 percent human score – especially from such a popular website.
That said, Originality.ai will oftentimes fail content that other sites pass; it regularly marks my own, 100-percent human-written content as 40 to 80 percent “AI generated.”
PASSING GRADES: 7 out of 10
President John F. Kennedy's Inaugural Address
- OpenAI Text Classifier: “The classifier considers the text to be very unlikely AI-generated.”
- GPT4Detector.ai: “There is a 22% probability that this text is fully generated by AI.”
- GPT-2 Output Detector: 99.98 percent real, but this is the most generous grader.
- Originality.ai: 99% original
- CopyLeaks: “This is human text.” 98.4 percent human
- Writer.com: “100% human generated” – but 1,500-character limit
- ContentAtScale: “92% Likely to be Human.”
- Winston.AI: 92% “Human Score”
- Grammica: 99.98% “Real”
- ZeroGPT: 21.02% “GPT AI Generated,” meaning only about 79 percent of the speech was “human.”
This time, four sites failed to recognize Kennedy’s famous speech as no-doubt human-written. Originality.ai got it right this time, calling the speech “99% human,” but another heavyweight in this space, Winston.AI, wasn’t so sure, giving it a 92 percent “Human Score.”
PASSING GRADE: 6 out of 10.
What about content that’s not famous?
Of course, you can’t just go by these results. Sure, they tell you something, but you don’t have the full picture.
So I ran a recent blog post my team wrote for another client through the same AI detectors … and got pretty much the same results. Finally, I tested out this very post.
For the first post, ChatGPT wrote 100 percent of the first draft. However, the content was generated using one of JakeGPT’s most impactful SEO prompts, which forces ChatGPT to emulate human writing (that’s important because the resulting content will likely be better than anything ChatGPT generates without it). Then the post went through Copyscape and Grammarly. From there, my team used several keyword research tools to add in the right keywords – if necessary – and we used another tool to optimize the headline.
Then it went through two different human copy edits, which further tweaked the content. For the fourth draft, my team added a quote from the client. Next up: running the content through a chatbot app that ensured the post met Google’s guidelines and best practices for AI content.
The final version that was ready for publication, or the sixth draft, included additional human tweaks before running it through Grammarly one more time.
Ultimately, the post was a shade under 770 words, and of that total, approximately 7 percent of the copy was untouched by humans. Therefore, the post should have received passing scores from every detector, or, at worst, something along the lines of a 93 percent “human” grade or higher.
Again, only six out of 10 got it right. Originality.ai was fairly confident the post was mostly AI-generated, saying it was only 42 percent original. This time, though, Winston was spot-on, giving it an exactly-right 93 percent human score. GPT4Detector.ai – the AI detector I use – leaned toward the post being human-written, calling it 72 percent original.
Finally, this particular blog post – again, completely AI-free – generated the most accurate results, as only one of the AI detectors had doubts about this content’s origins, marking it 75 percent “human-generated.” Unfortunately, that’s one of the AI detectors I use the most.
Are AI Detectors Still Important?
While the results might be a bit disappointing and show how far the technology still has to go, AI detectors are still vitally important – even if your content is 100-percent human-written.
Think of it like predicting the weather.
If the weatherman for Channel 4 says there’s an 80 percent chance of rain tomorrow, the Channel 5 weatherman says the chance of rain is more like 50 percent, and the weatherman for Channel 8 says the odds are at 60 percent, what are you going to do?
I don’t know about you, but I’m bringing an umbrella.
And that’s how you have to treat these AI detectors. The tools I tested out are all amazing, and the developers should be applauded for their efforts. But I’m sure they would tell you that the technology isn’t quite “there” yet and that the results are still fairly inconsistent – and sometimes inaccurate – across platforms.
However, if your post is showing up as more than 30 percent AI, then you need to revise the copy – even if it’s 100-percent “human created.” The reason is simple: You could have used some kind of phrase or word combination that typically triggers the AI detectors. So, if any of these tools think your content is AI-generated, then it’s possible that others do, too.
Including other clients. Or readers. Or Google.
So why risk it? It’s best to “bring your umbrella” and revise the copy accordingly.
But what if it doesn’t “rain”? Then at least you came prepared – which you’re doing by revising your content until it gets a higher “human” grade.
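If you like to keep score, here’s a minimal sketch of that “umbrella” rule, written in Python. The detector names and numbers below are placeholders you’d fill in by hand after running your copy through each tool – it doesn’t call any real detector’s API:

# A rough "bring your umbrella" check: if any detector crosses the
# 30 percent AI line mentioned above, flag the copy for revision.
# The detectors and scores here are made-up examples, not real results.
ai_scores = {
    "Detector A": 0.12,  # each value is that tool's "percent AI" result, as a 0-1 fraction
    "Detector B": 0.42,
    "Detector C": 0.05,
}

REVISE_THRESHOLD = 0.30  # the 30 percent line

# Keep only the detectors whose AI score exceeds the threshold.
flagged = {name: score for name, score in ai_scores.items() if score > REVISE_THRESHOLD}

if flagged:
    print("Bring your umbrella – revise the copy. Flagged by:")
    for name, score in flagged.items():
        print(f"  {name}: {score:.0%} AI")
else:
    print("No detector crossed the 30 percent line – you're probably good to publish.")

Swap in the actual scores you collected, and the same logic applies whether you check one tool or all 12.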
As for whether you should revise AI-flagged copy that was actually human-written, some of these detectors will point out the problem sentences. If you examine those sentences or sections, most of the time it’ll be pretty obvious that the copy could be improved.
Sure enough, when checking the “questionable” sentences from the two blog posts – this one and the one my team wrote – the sections flagged as AI had something in common: those sections were poorly written.
#ai #chatbot #GPT4 #promptengineer #promptengineering #aiprompts #generativeai #llm #blogging #blogger #contentmarketing
-------------------
JakeGPT is the owner and operator of JakeGPT1973.com, an AI-powered digital marketing company based in Carrollton, Texas. Email him at jakegpt@jakegpt1973.com. Click here if you’d like to learn more about JakeGPT’s DIY SEO services or its newly expanded optimized blogging services.