concept
active
concept:digit-token-logit-distribution
Digit-token logit distribution
The full logit distribution over the digit tokens 0-9 at the first generation step; it carries more information than any single sampled token.
Neighborhood — ranked by edge-count
Methods (1)
method
- Logit-based self-report · implements · Primary self-report measure: probability-weighted expected value over all ten digit-token logits, yielding a continuous rating that preserves full distributional signal
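The probability-weighted expected value described above can be sketched as follows. This is a minimal illustration, not the source's implementation: the function name `expected_digit_rating` is hypothetical, and it assumes the ten logits are ordered by digit (index 0 for token "0" through index 9 for token "9") and that the softmax is renormalized over the digit tokens only rather than over the full vocabulary.

```python
import math


def expected_digit_rating(digit_logits):
    """Return (rating, probs) for ten digit-token logits.

    Assumption: digit_logits[d] is the logit of the token for digit d,
    and probabilities are renormalized over these ten tokens only.
    """
    if len(digit_logits) != 10:
        raise ValueError("expected exactly ten logits, one per digit 0-9")

    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(digit_logits)
    weights = [math.exp(x - peak) for x in digit_logits]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Continuous rating: each digit's value weighted by its probability.
    rating = sum(digit * p for digit, p in enumerate(probs))
    return rating, probs
```

With uniform logits the rating sits at the midpoint of the scale (about 4.5), whereas a single sampled token would report one arbitrary digit; a sharply peaked distribution gives a rating close to the peaked digit.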
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Basic unit of LLM input/output: words, parts of words, punctuation marks, emojis
- The methodological confound identified by this paper: injection biases model toward 'YES' for any binary question regardless of content
- Conjugate prior for categorical variables; used for beliefs about likelihood matrix A.
- In active inference, the distribution over goal states; here replaced by the learned self-prior rather than a hand-specified prior
- Computing each feature's linear effect on output token logits via path expansion through MLP output weights and unembedding matrix
- The distribution of latent representations produced by the model under unperturbed inputs
- Probability distribution over discrete states or outcomes.
- Feature that fires on a specific token only within a specific surrounding context (e.g., 'the' in physics vs 'the' in mathematics)