Digit-token logit distribution

Full distribution over the digit tokens 0-9 at the first generation step; it carries more information than any single sampled token.

Neighborhood — ranked by edge-count

Logit-based self-report · method · implements
Primary self-report measure: probability-weighted expected value over all ten digit-token logits, yielding a continuous rating that preserves the full distributional signal
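
The probability-weighted expected value described above can be sketched as follows. This is a minimal illustration, not the source's implementation: it assumes the ten digit-token logits have already been extracted at the first generation step, that index i corresponds to digit i, and that the softmax is renormalized over those ten tokens only. The function names and the example logits are hypothetical.

```python
import math


def digit_distribution(digit_logits):
    """Softmax restricted to the ten digit tokens (index i = digit i).

    Assumption: probability mass on non-digit tokens is ignored, so the
    ten probabilities are renormalized to sum to 1.
    """
    if len(digit_logits) != 10:
        raise ValueError("expected exactly ten logits, one per digit 0-9")
    peak = max(digit_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in digit_logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_rating(digit_logits):
    """Probability-weighted expected digit: sum over d of d * p(d)."""
    probs = digit_distribution(digit_logits)
    return sum(digit * p for digit, p in enumerate(probs))


# Hypothetical logits, symmetric around the midpoint of the 0-9 scale.
example_logits = [-2.0, -1.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0, -1.0, -2.0]
probs = digit_distribution(example_logits)
rating = expected_rating(example_logits)
print(f"rating = {rating:.3f}")
```

Greedy decoding on these logits would have to pick either 4 or 5, whereas the expected value lands between them, which is the sense in which the continuous rating preserves signal that a single sampled token discards.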

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Token · concept · 0.760
Basic unit of LLM input/output: words, parts of words, punctuation marks, emojis
global logit shift · concept · 0.749
The methodological confound identified by this paper: injection biases model toward 'YES' for any binary question regardless of content
Dirichlet Distribution · concept · 0.727
Conjugate prior for categorical variables; used for beliefs about likelihood matrix A.
Preferred Distribution · concept · 0.719
In active inference, the distribution over goal states; here replaced by the learned self-prior rather than a hand-specified prior
Logit Weight Analysis · method · 0.709
Computing each feature's linear effect on output token logits via path expansion through MLP output weights and unembedding matrix
Natural Distribution of Representations · concept · 0.707
The distribution of latent representations produced by the model under unperturbed inputs
Categorical Distribution · concept · 0.704
Probability distribution over discrete states or outcomes.
Token-in-Context Feature · concept · 0.704
Feature that fires on a specific token only within a specific surrounding context (e.g., 'the' in physics vs 'the' in mathematics)