
An LLM predicts the next word from a probability distribution over its vocabulary. Let $P(w_1, w_2, \dots, w_N)$ be the probability of a sequence of words. Perplexity is defined as:

$$PP(W) = P(w_1, w_2, \dots, w_N)^{-\frac{1}{N}}$$

Or, using the chain rule of probability:

$$PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1, \dots, w_{i-1})}}$$

Detectors look for low perplexity (high probability). The prompt instruction "Do not choose the most statistically probable next token" forces the model to select tokens from lower in the probability distribution (e.g., the 3rd or 4th most likely word rather than the 1st), artificially inflating the $PP$ value toward human-like levels.
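As a rough illustration (not tied to any particular detector), perplexity can be computed directly from the per-token conditional probabilities using the log-space form of the formula above. The function and the example probability values below are hypothetical, chosen only to show how picking lower-ranked tokens inflates $PP$:

```python
import math

def perplexity(token_probs):
    """Compute perplexity from the conditional probabilities
    P(w_i | w_1, ..., w_{i-1}) assigned to each token in a sequence.

    Uses exp(-(1/N) * sum(log p_i)), which is numerically safer than
    multiplying N probabilities and taking the N-th root directly.
    """
    n = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / n)

# Hypothetical per-token probabilities:
confident = [0.90, 0.80, 0.85, 0.90]  # always the top-ranked token
hedged    = [0.20, 0.15, 0.10, 0.25]  # 3rd/4th-ranked tokens instead

print(perplexity(confident))  # ~1.2  -> low perplexity, easy to flag
print(perplexity(hedged))     # ~6.0  -> higher, closer to human-level variability
```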