Original human-written text and AI-paraphrased document connected by a semantic trace, illustrating para-plagiarism detection through semantic similarity rather than exact text matching.

A joint article with Dr. Monika Oertner, writing-pedagogy specialist and lecturer in Konstanz, Germany. The S³ results shown here come from test runs conducted by Mentafy, based on a case study by Oertner. She was pleasantly surprised by the results of the test runs in Cologne.

For a good two years now, a sentence has been circulating in examination boards that sounds like a final verdict: in the age of freely available AI, the unsupervised writing assignment can no longer be saved. Just have the essay reworded once by an AI, the argument goes, and no plagiarism software in the world can still catch the para(phrased)-plagiarism that results. The first half of that sentence is true. The second half, this article argues, no longer is.

Why has para-plagiarism become so widespread? Since ChatGPT arrived in late 2022, AI use among UK students has risen from 66% (2024) to 92% (2025) and 95% (2026), according to the HEPI/Kortext surveys. Paraphrasing tools such as QuillBot are used en masse to rewrite text so that it slips past detection. Generating an entire paper from scratch remains very hard for an AI, even in 2026: sustaining a high standard and a coherent argument across many pages is something that even "deep research" tools manage only to a limited degree. Taking a paper that has already received a good grade and having it paraphrased throughout, by contrast, is easy, inconspicuous, and far less risky in terms of quality. That is precisely why "copy, shake, paste" is so attractive. The pattern is not new: translating a text back and forth between languages was a similar trick that defeated detection tools. Generative AI now makes para-plagiarism utterly effortless.

In this article we show an example of a para-plagiarism that you can reproduce yourself in two minutes. It is based on the work of a writing advisor who documented the problem precisely before we did.

What Turnitin No Longer Sees After a Single ChatGPT Pass

Dr. Monika Oertner has advised students on academic writing at HTWG Konstanz since 2011. In 2025, prompted by journalistic research published in Spektrum der Wissenschaft, she carried out a small but razor-sharp investigation. She took a definition from Wikipedia, had ChatGPT paraphrase it several times, and checked each version with Turnitin, the market leader and also the most expensive provider. The result: the unaltered Wikipedia original was identified as plagiarism, with 86% similarity. After a single paraphrasing pass, however, the score dropped to 0%: "No matches found."

Oertner's conclusion (at the time of her study):

"Plagiarism detection in the age of generative AI is an anachronism. Academic and scholarly fraud can no longer be countered by technical means. Anyone who plagiarizes with AI assistance learns nothing in the process and contributes nothing to the advancement of human knowledge — but plagiarism software cannot catch them, either. An appeal to the integrity and responsibility of those involved is all that universities have left."

— Dr. Monika Oertner (translated from the original German)

She is right — for the classical methods. Character-based plagiarism checking, which is essentially what Turnitin does, compares the sequence of letters in a text. Change the words, and the trail disappears, even though the act — the appropriation of someone else's intellectual work — remains the same. This is exactly the new disguise method that renders conventional plagiarism checks useless and, in Oertner's view, makes them an "anachronism."

But that verdict applies only to one particular kind of tool. This is where we come in, and below we explain how.

The Disguise Trick, Step by Step

The unsettling thing about it is how banal it is. You need no special software, no text spinner, nothing of the sort — just a single command to ChatGPT. Here is the starting point: the plagiarism from Wikipedia. The source references, which appear as footnotes in the online encyclopedia, were inserted in parentheses when the text was copied over.

A note on language: Oertner's study used the German Wikipedia, so the example below is in German — and the detection scores belong to that actual text. English translations follow in brackets.

Step 0 — The original (Wikipedia, entry "Retrieval-Augmented Generation"):

„Unter Retrieval-Augmented Generation (RAG) versteht man ein Softwaresystem, welches Information Retrieval mit einem Large Language Model kombiniert. Eine Abfrage, welche an das System gestellt wird, kann hierbei auf Informationen aus (externen) Informationsquellen, Datenbanken oder dem World Wide Web zugreifen statt nur auf die Trainingsdaten des Modells (vgl. Manning/Raghavan/Schütze 2009; Graininger/Trunbull/Irwin 2024). […]"

[English: "Retrieval-Augmented Generation (RAG) refers to a software system that combines information retrieval with a large language model. A query submitted to the system can draw on information from (external) information sources, databases, or the World Wide Web, rather than only on the model's training data (cf. Manning/Raghavan/Schütze 2009; Graininger/Trunbull/Irwin 2024). […]"]

  • → Turnitin: 86% similarity, source detected.
  • → Mentafy: 88% similarity, source detected.

Step 1 — The prompt:

"Paraphrase the following text while keeping the technical terms and the source references. Above all, change the sentence structure and the rest of the vocabulary: [text from Step 0]"

ChatGPT's output:

„Retrieval-Augmented Generation (RAG) bezeichnet ein Softwaresystem, das Verfahren des Information Retrieval mit den Fähigkeiten eines Large Language Models verknüpft. Anfragen, die an ein solches System gerichtet werden, können nicht nur auf die Trainingsdaten des Modells zurückgreifen, sondern zusätzlich Informationen aus externen Quellen, Datenbanken oder dem World Wide Web einbeziehen (vgl. Manning/Raghavan/Schütze 2009; Graininger/Trunbull/Irwin 2024). […]"

[English: "Retrieval-Augmented Generation (RAG) denotes a software system that links information-retrieval methods with the capabilities of a large language model. Queries directed at such a system can not only fall back on the model's training data but additionally incorporate information from external sources, databases, or the World Wide Web (cf. Manning/Raghavan/Schütze 2009; Graininger/Trunbull/Irwin 2024). […]"]

The same content, the same sources, the same technical terms, yet hardly a single sentence matches verbatim. The text reads like careful work, seemingly the product of diligent research and a thorough evaluation of scholarly sources.


The two verdicts:

  • → Turnitin: 0% — no matches found.
  • → Mentafy S³: semantic similarity: high — Wikipedia source identified for the paragraph.

Oertner went even further in her study. To create an authentic-sounding voice that would raise no suspicion among examiners, she had the register lowered and spelling errors inserted in a second pass, then had the register raised again in a third. Turnitin's detection rate remained at 0% for every paraphrased variant, while S³ kept the trail. Here is an overview of the results:

| Version | Modification | Turnitin | Mentafy S³ |
|---|---|---|---|
| Original | formatting adjusted (in-text short references) | detected | detected |
| Variant 1 | paraphrased once (syntax and vocabulary) | not detected | detected |
| Variant 2 | register additionally lowered, errors inserted | not detected | detected |
| Variant 3 | register raised again | not detected | detected |

These were the cases Oertner tested. We also generated additional versions with common "humanizer" tools, the very class of tools built specifically to defeat plagiarism detection. Here too, Mentafy S³ was not fooled: the semantic trail led back to the source.

Why the Trail Doesn't Disappear

To understand what S³ does differently, it helps to look at the very different ways of examining the authorship of a text.

Classical plagiarism checking works like a fingerprint comparison: it searches for identical character strings. Reword the text, and the fingerprint is smudged, even though the plagiarism is still there. This is the weakness that let the paraphrase slip past Turnitin in the example above.
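To make the fingerprint idea concrete, here is a minimal sketch of one textbook approach: cut each text into overlapping five-word sequences ("shingles") and measure how many they share (Jaccard similarity). This is an illustrative assumption, not Turnitin's proprietary algorithm. The two sample sentences are the English renderings of the Wikipedia original and the ChatGPT paraphrase from Step 0 and Step 1.

```python
import re

def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences ("shingles") in a text."""
    words = re.findall(r"\w+", text.lower())  # lowercase, drop punctuation
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def fingerprint_similarity(a: str, b: str, n: int = 5) -> float:
    """Jaccard overlap of the two shingle sets: 1.0 = identical, 0.0 = no shared run."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = ("Retrieval-Augmented Generation (RAG) refers to a software system "
            "that combines information retrieval with a large language model.")
paraphrase = ("Retrieval-Augmented Generation (RAG) denotes a software system "
              "that links information-retrieval methods with the capabilities "
              "of a large language model.")

print(fingerprint_similarity(original, original))    # 1.0: verbatim copy
print(fingerprint_similarity(original, paraphrase))  # 0.0: no five-word run survives
```

Although the content, the sources, and most technical terms are unchanged, two swapped verbs and a few inserted words are enough to break every five-word run. That is exactly the blind spot of string-based matching.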

AI-probability detectors guess whether a text "sounds like AI" — a highly problematic way of proving AI use. The available tools are unreliable, raise baseless accusations (false positives), and often do not hold up before examination boards (see, e.g., Scarfe et al. 2024). We do offer such a function at Mentafy, but we expressly advise the greatest caution — a probability score is not proof.

S³ — the Semantic Trace Search — works on a different aspect of the text: its meaning. You can swap out the words, but not the underlying statements, their sequence, and the selection of sources. S³ follows this trail and can thus flag potentially plagiarized sources: "This text is demonstrably related to this source."
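The following toy sketch shows why a meaning-based comparison survives rewording. Each text is reduced to a bag of canonical content words, ignoring word order, and two texts are compared by cosine similarity. The tiny synonym table and stopword list are hypothetical stand-ins for a learned sentence-embedding model. The article does not disclose how S³ is implemented, and this sketch does not claim to describe it.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a learned embedding model:
# synonyms collapse onto one canonical word, function words are dropped.
SYNONYMS = {"denotes": "refers", "links": "combines"}
STOPWORDS = {"a", "the", "to", "of", "that", "with", "and"}

def concept_vector(text: str) -> Counter:
    """Bag of canonical content words; word order and phrasing are ignored."""
    words = re.findall(r"\w+", text.lower())
    return Counter(SYNONYMS.get(w, w) for w in words if w not in STOPWORDS)

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors: 1.0 = same direction, 0.0 = nothing shared."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

original = ("Retrieval-Augmented Generation (RAG) refers to a software system "
            "that combines information retrieval with a large language model.")
paraphrase = ("Retrieval-Augmented Generation (RAG) denotes a software system "
              "that links information-retrieval methods with the capabilities "
              "of a large language model.")
unrelated = "The committee meets every Tuesday to discuss the budget."

print(round(cosine(concept_vector(original), concept_vector(paraphrase)), 2))  # 0.94
print(round(cosine(concept_vector(original), concept_vector(unrelated)), 2))   # 0.0
```

Because word order and the swapped verbs no longer matter, the paraphrase stays close to the original, while an unrelated sentence scores zero. A real embedding model generalizes this idea to synonyms and restructurings that no hand-made table could anticipate.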

A Finding Is Not a Verdict

We care about keeping a finding and a verdict apart. S³ does not judge; it delivers a statistical value that holds up even under critical scrutiny. What it measures and classifies is semantic similarity, graded as medium, high, or very high. That value is a mathematical result, not a product of chance. What the finding means, however, depends on a follow-up decision that S³ deliberately leaves to people:

  • Is the source properly cited? Then a high similarity value is entirely expected and no offense. The student used the source and acknowledged it, which is exactly how scholarly work is supposed to function.
  • Is it not cited? Then para-plagiarism is a plausible suspicion. Whether there is misconduct to be sanctioned is decided by the examiner, not the algorithm.
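The separation between finding and verdict can be sketched as a small triage step. The band thresholds below are invented for illustration; S³'s actual cut-offs are not given in this article. The output only routes a case and does not decide it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-offs for a similarity score in [0, 1], highest first.
# S³'s real thresholds are not published in this article.
BANDS = [(0.90, "very high"), (0.75, "high"), (0.60, "medium")]

def similarity_band(score: float) -> Optional[str]:
    """Map a score to the first band whose threshold it reaches, else None."""
    for threshold, label in BANDS:
        if score >= threshold:
            return label
    return None

@dataclass
class Finding:
    source: str    # e.g. the matched Wikipedia article
    score: float   # semantic similarity in [0, 1]
    cited: bool    # does the paper cite this source?

def triage(finding: Finding) -> str:
    """Route a finding; the verdict on misconduct stays with the examiner."""
    band = similarity_band(finding.score)
    if band is None:
        return "no notable similarity"
    if finding.cited:
        return f"{band} similarity to {finding.source}, source cited: legitimate use"
    return f"{band} similarity to {finding.source}, not cited: refer to examiner"

print(triage(Finding("Wikipedia: RAG", 0.82, cited=True)))
print(triage(Finding("Wikipedia: RAG", 0.82, cited=False)))
print(triage(Finding("Wikipedia: RAG", 0.30, cited=False)))
```

The same score leads to opposite routes depending on citation status. This is why the sketch keeps "how similar" and "what it means" in separate steps.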

Now, openly, to the numbers and the success rate of Mentafy S³. In papers of around 1,000 words or more (we have not yet tested shorter texts), the false-positive rate at medium sensitivity is below 10%. We determined these values using our own test database, and they are consistent with feedback from our customers in practice. As a rule of thumb: where S³ reports 20% or more semantic overlap in a longer paper, it is very likely a genuine finding that reflects systematic borrowing rather than chance.
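The rule of thumb can be written as a short check. This sketch assumes that "semantic overlap" means the share of a paper's words that lie in passages flagged as similar to some source. That definition and the function names are hypothetical, not part of S³'s documented behavior.

```python
# Hypothetical definition: "semantic overlap" = share of a paper's words that lie
# in passages flagged as semantically similar to some source.
MIN_WORDS = 1_000     # the article reports no tests on shorter texts
RULE_OF_THUMB = 0.20  # 20% or more: very likely a genuine finding

def overlap_share(flagged_passage_words: list[int], total_words: int) -> float:
    """Fraction of the paper covered by flagged passages (word counts per passage)."""
    return sum(flagged_passage_words) / total_words

def assess(flagged_passage_words: list[int], total_words: int) -> str:
    """Apply the length precondition first, then the 20% rule of thumb."""
    if total_words < MIN_WORDS:
        return "too short: rule of thumb untested"
    share = overlap_share(flagged_passage_words, total_words)
    if share >= RULE_OF_THUMB:
        return f"{share:.0%} overlap: very likely a genuine finding"
    return f"{share:.0%} overlap: below the rule-of-thumb threshold"

print(assess([180, 240, 150], total_words=2_500))  # 570 of 2,500 words flagged
print(assess([120], total_words=2_500))
print(assess([400], total_words=800))
```

This share is a different quantity from the false-positive rate quoted above. The sketch only encodes the rule of thumb and its length precondition.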

Why We Can Still Fairly Assess Take-Home Writing in the Age of AI

Perhaps the most important news for schools and universities: S³ requires no change to your existing examination practice. If you already use plagiarism software, S³ steps in exactly where classical detection fails, namely with para-plagiarism. S³ is part of our post-hoc check, alongside the classical plagiarism comparison and a reference check. AI detection is also available, with the caveat noted above. Thanks to S³, you can continue to assign papers written at home and need not reinvent your assessment design next week.

If you would like to see the tools working together, you will find them here: Mentafy plagiarism check — the tools at a glance.

Where This Is Heading

S³ solves the para-plagiarism problem in the here and now. At the same time, we are convinced that the sharper sword, which will increasingly come into play in the future, is the analysis of the writing process itself — tracing the path by which a text comes into being. For now, very few institutions are in a position to restructure their teaching and assessment processes so that such data can even be collected and evaluated. Until then, the rule holds: the unsupervised essay is not dead. It simply needs better tools than those that already fail after a single ChatGPT pass and wave para-plagiarism helplessly through.


Test S³ on Your Own Examples

Try it out: take a text, have it paraphrased — and see whether the semantic trail is found. Go to plagiarism checking with S³ →

About This Collaboration

The underlying investigation is by Dr. Monika Oertner, author and writing advisor at HTWG Konstanz. Her work on generative AI: oertner.net/Publikationen/GKI.
