The Intuition and Derivation of Perplexity for LLM Evaluation
The intuition

When deriving the cross-entropy loss, we showed how entropy plays a central role in the optimization of softmax models (i.e., multi-class classification models). All large language models (LLMs) are exactly that: softmax models that, for an input sequence of \(t\) tokens \(x = [x_1, x_2, \ldots, x_t]\), output a conditional probability distribution \(P(w \mid x)\) over the vocabulary \(V\) of all tokens. This distribution gives us the most likely next token(s) to continue the input sequence. ...
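This next-token distribution can be sketched in a few lines: a minimal example, where the tiny vocabulary and the logit values are invented purely for illustration, of applying a softmax to a model's output logits to obtain \(P(w \mid x)\).

```python
import numpy as np

def softmax(logits):
    # Shift by the max logit for numerical stability before exponentiating.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# Hypothetical 5-token vocabulary and the logits a model might
# produce at the next-token position after some input sequence x.
vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([2.0, 0.5, 1.0, -1.0, 0.0])

p_next = softmax(logits)              # P(w | x): one probability per token
best = vocab[int(np.argmax(p_next))]  # most likely continuation of x
```

A real LLM does the same thing, only with a vocabulary of tens of thousands of tokens and logits produced by the network rather than hand-picked values.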