concept
active
concept:anomalous-tokens

Anomalous Tokens

Extremely rare or never-used vocabulary tokens that can distort logit weight analysis and are therefore excluded from feature analysis

Neighborhood — ranked by edge-count

Methods (1)

method
  • Logit Weight Analysis
    associated_with
    Computing each feature's linear effect on output token logits via path expansion through MLP output weights and unembedding matrix
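
A minimal sketch of the computation described above and of how anomalous tokens might be excluded from it. Everything here is illustrative: the function names, the toy matrices, the use of a token-frequency count with a `min_count` threshold as the test for "rare or never-used", and the omission of any final normalization layer are assumptions, not details taken from this entry.

```python
import numpy as np

def feature_logit_weights(feature_dec, W_out, W_U):
    """Linear effect of one feature on every output token logit.

    feature_dec: (d_mlp,)          feature direction in MLP activation space (assumed)
    W_out:       (d_mlp, d_model)  MLP output weights
    W_U:         (d_model, vocab)  unembedding matrix
    """
    # Path expansion: feature -> MLP output weights -> unembedding.
    return feature_dec @ W_out @ W_U

def exclude_anomalous(logit_weights, token_counts, min_count=1):
    """Mask tokens seen fewer than min_count times (hypothetical criterion)."""
    keep = np.asarray(token_counts) >= min_count
    masked = np.where(keep, logit_weights, np.nan)
    return masked, keep

# Toy example: 2 MLP dimensions, 2 model dimensions, 3 vocabulary tokens.
feature_dec = np.array([1.0, 0.0])
W_out = np.eye(2)
W_U = np.array([[1.0, 9.0, 3.0],
                [4.0, 5.0, 6.0]])
token_counts = np.array([10, 0, 5])  # token 1 never occurs in the corpus

weights = feature_logit_weights(feature_dec, W_out, W_U)
masked, keep = exclude_anomalous(weights, token_counts)

raw_top = int(np.argmax(weights))       # dominated by the never-used token
top_token = int(np.nanargmax(masked))   # top token after exclusion
print(raw_top, top_token)
```

In this toy case the never-used token carries the largest raw logit weight, so ranking without the mask would surface it as the feature's top output token; masking it first changes the ranking.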

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Token · concept · 0.751
    Basic unit of LLM input/output: words, parts of words, punctuation marks, emojis
  • Token embeddings · concept · 0.724
    Vector representations of individual tokens from genomic foundation models; the raw inputs to sequence pooling methods.
  • Functional Token · concept · 0.723
    A discrete token in the vocabulary that represents a visual operation (e.g., <|Line|>, <|Shape|>, <|Text|>), generated via next-token prediction within autoregressive sequences.
  • The training objective of LLMs: predicting the most likely next token given context; formally P(w_{n+1}|w_1...w_n)
  • Feature that fires on a specific token only within a specific surrounding context (e.g., 'the' in physics vs 'the' in mathematics)
  • A possible circuit that triggers when activations deviate from expected values, hypothesized to underlie noticing injected thoughts.
  • Baseline steering method that applies intervention at every token generation step, shown to degrade performance at high strengths
  • An attention head that primarily attends to the immediately preceding token; key building block for induction heads via K-composition