People and institutions are grappling with the consequences of text written by AI. Teachers want to know whether students’ work reflects their own understanding; consumers want to know whether an ad was written by a human or a machine.
Writing rules to govern the use of AI-generated content is relatively easy. Applying them depends on something much more difficult: reliably detecting whether a piece of text was generated by artificial intelligence.
The problem with AI text detection
The basic workflow behind AI text detection is easy to describe. Start with a piece of text whose origin you want to determine. Then apply a detection tool, often an AI system itself, which analyzes the text and produces a score, usually expressed as a probability, indicating how likely the text was to have been generated by AI. Use the score to inform downstream decisions, such as whether to impose a penalty for violating a rule.
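To make that workflow concrete, here is a minimal sketch in Python. The run_detector function is a placeholder for whichever detection tool is actually used, and the 0.9 threshold is a hypothetical policy choice, not a recommendation.

```python
# A minimal sketch of the detection workflow described above:
# text in, probability score out, decision out.
def run_detector(text: str) -> float:
    """Placeholder for any real detection tool; returns a probability in [0, 1]."""
    return 0.5  # illustrative fixed score

def review_submission(text: str, threshold: float = 0.9) -> str:
    score = run_detector(text)
    # The score informs a downstream decision; it is evidence, not proof.
    if score >= threshold:
        return f"flag for human review (score {score:.2f})"
    return f"no further action (score {score:.2f})"

print(review_submission("A piece of text whose origin we want to assess."))
```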
This simple description, however, hides great complexity. It glosses over several basic assumptions that must be made explicit. Do you know which AI tools could plausibly have been used to generate the text? What kind of access do you have to those tools? Can you run them yourself or inspect their inner workings? How much text do you have? Is it a single piece, or a body of writing gathered over time? What AI detection tools can or cannot tell you depends largely on the answers to questions like these.
There is one additional detail that is particularly important: did the AI system that generated the text deliberately embed markers to facilitate later detection?
These indicators are called watermarks. Watermarked text looks like regular text, but the markers are embedded in subtle ways that aren’t revealed by casual inspection. A person with the correct key can then check for the presence of these markers and verify that the text came from a watermarked AI system. This approach, however, relies on the cooperation of AI providers and is not always available.
How AI text detection tools work
An obvious approach is to use AI itself to detect text written by AI. The idea is simple. Start by collecting a large corpus, that is, a collection of writing samples labeled as human-written or AI-generated, and then train a model to distinguish between the two. In this framing, AI text detection is treated as a standard classification problem, similar in spirit to spam filtering. Once trained, the detector examines new text and predicts whether it looks more like the AI-generated examples or the human-written ones it has seen before.
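As a rough illustration, the sketch below trains a toy classifier of this kind, assuming the scikit-learn library is available. The handful of made-up examples, the TF-IDF features and the logistic regression model are illustrative assumptions; real detectors are trained on far larger and more varied corpora.

```python
# Toy learned detector: train a text classifier on labeled examples.
# Illustrative only; not the method used by any particular commercial detector.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Labeled training examples: 1 = AI-generated, 0 = human-written (made up here).
texts = [
    "The results demonstrate a significant improvement across all metrics.",
    "honestly i just winged the essay the night before lol",
    "In conclusion, it is evident that multiple factors contribute to this outcome.",
    "my grandma's recipe never measures anything, you just eyeball it",
]
labels = [1, 0, 1, 0]

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

# Score a new passage: probability that it resembles the AI-labeled examples.
new_text = "It is important to note that several considerations apply here."
prob_ai = detector.predict_proba([new_text])[0][1]
print(f"Estimated probability of AI origin: {prob_ai:.2f}")
```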
The learned detector approach can work even if you know little about the AI tools that may have generated the text. The main requirement is that the training corpus be diverse enough to include output from a wide range of AI systems.
But if you have access to the AI tools you’re interested in, a different approach becomes possible. This second strategy does not rely on collecting large sets of labeled data or training a separate detector. Instead, it looks for statistical signals in the text, often tied to how specific AI models generate language, to assess whether the text was likely generated by AI. For example, some methods look at the probability that an AI model assigns to a piece of text. If the model assigns an unusually high probability to the exact sequence of words, that may indicate the text was in fact generated by that model.
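The sketch below illustrates that idea under some assumptions: it uses the open GPT-2 model, via the Hugging Face transformers library, as a stand-in for the model under suspicion, and scores a passage by the average log-probability the model assigns to its tokens. Any threshold for calling the score "unusually high" would be a separate policy choice.

```python
# Probability-based check: how likely does a given model find this exact text?
# Assumes the transformers and torch libraries are installed; GPT-2 is used
# here only as an openly available stand-in.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(text: str) -> float:
    """Average per-token log-probability the model assigns to the text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels supplied, the model returns the mean cross-entropy loss,
        # i.e. the negative average log-likelihood per token.
        outputs = model(**inputs, labels=inputs["input_ids"])
    return -outputs.loss.item()

# Higher (less negative) values mean the model finds the text unusually "easy",
# which is weak evidence that this model generated it.
print(avg_log_likelihood("The quick brown fox jumps over the lazy dog."))
```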
Finally, in the case of text generated by an AI system that embeds a watermark, the problem shifts from detection to verification. Using a secret key provided by the AI provider, a verification tool can assess whether the text is consistent with having been generated by a watermarked system. This approach relies on information held outside the text, namely the key, rather than on inferences drawn from the text alone.
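As one concrete illustration, published research schemes often bias generation toward a pseudo-random "green" half of the vocabulary determined by a secret key; verification then counts how many green tokens appear. The sketch below shows only that counting step, under simplifying assumptions: word-level tokens, a 50/50 split and a made-up key, none of which reflect any particular provider's actual scheme.

```python
# Illustrative verification step for a "green list" style watermark.
# Real providers' watermarks and keys are proprietary; the hashing scheme and
# the 50/50 vocabulary split here are assumptions for demonstration only.
import hashlib
import math

SECRET_KEY = b"shared-secret-from-the-ai-provider"  # hypothetical key

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign the token to the 'green' half of the vocabulary,
    seeded by the secret key and the previous token."""
    digest = hashlib.sha256(SECRET_KEY + prev_token.encode() + token.encode()).digest()
    return digest[0] % 2 == 0

def watermark_z_score(text: str) -> float:
    """How far the observed fraction of green tokens deviates from the 50%
    expected in unwatermarked text, in standard deviations."""
    tokens = text.split()
    if len(tokens) < 2:
        return 0.0
    greens = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    expected, std = 0.5 * n, math.sqrt(0.25 * n)
    return (greens - expected) / std

# A large positive z-score is evidence the text was sampled with the watermark on.
print(watermark_z_score("Some text whose provenance we want to verify."))
```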
Each family of tools comes with its own limits, making it difficult to declare a clear winner. Learning-based detectors, for example, are sensitive to how closely new text resembles the data they were trained on. Their accuracy drops when the text differs significantly from the training corpus, which can quickly become outdated as new AI models are released. Continually gathering new data and retraining detectors is costly, and detectors inevitably lag behind the systems they are meant to identify.
Statistical tests face a different set of constraints. Many rely on assumptions about how specific AI models generate text, or on access to the probability distributions of those models. When models are proprietary, frequently updated, or simply unknown, these assumptions break down. As a result, methods that work well in controlled environments may become unreliable or inapplicable in the real world.
Watermarking shifts the problem from detection to verification, but it introduces its own dependencies. It relies on the cooperation of AI providers and only applies to text generated with the watermark enabled.
More broadly, AI text detection is part of a growing arms race. Detection tools must be publicly available to be useful, but that same transparency allows for evasion. As AI text generators become more capable and evasion techniques become more sophisticated, detectors are unlikely to gain the upper hand in the long term.
Harsh reality
The problem of AI text detection is simple to state but difficult to reliably solve. Institutions with rules governing the use of AI-written texts cannot rely solely on detection tools for their enforcement.
As society adapts to generative AI, we will likely refine standards around the acceptable use of AI-generated text and improve detection techniques. But ultimately we will have to learn to live with the fact that these tools will never be perfect.
This edited article is republished from The Conversation under a Creative Commons license. Read the original article.
