concept
active
concept:internal-features

Internal Features

Representations inside LLMs that can be intervened upon.

Neighborhood — ranked by edge-count

Concepts (1)

concept

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Criterion requiring that causal influence of internal state on description be internal, not routed through sampled outputs; rules out pseudo-introspection via self-observation.
  • The latent activations or embeddings inside a neural network.
  • The idea that the geometry and dynamics of genetic material themselves contribute directional ordering to evolution, beyond external Darwinian selective pressure.
  • The model's internal representation of uncertainty, hypothesized to trigger self-reflection.
  • A representation that maintains stable activation across many tokens rather than being locally triggered by specific content.
  • The possibility of a stably encoded, causally active emotional state within LLMs, as distinct from token-by-token semantic content.
  • Features that activate when the model is asked about itself, invoking AI tropes and anthropomorphization.
  • The inferred mechanism underlying ESR whereby the model tracks the coherence of its own outputs.
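
The selection rule stated for this list (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the page: the function names, the toy two-dimensional embeddings, and the entity ids. The sketch also assumes that typed edges are treated as undirected and that candidates are ordered by similarity, neither of which the page confirms.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, similarity) pairs at or above the threshold
    that have no typed edge to the target, most similar first.

    Assumption: typed_edges is a set of frozenset({a, b}) pairs,
    i.e. edges are treated as undirected.
    """
    results = []
    for entity_id, vec in embeddings.items():
        if entity_id == target:
            continue  # an entity is not its own neighbor
        if frozenset((target, entity_id)) in typed_edges:
            continue  # already linked by a typed relation
        sim = cosine(embeddings[target], vec)
        if sim >= threshold:
            results.append((entity_id, sim))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data, for illustration only.
embeddings = {
    "internal-features": [1.0, 0.0],
    "latent-activations": [0.9, 0.1],
    "linked-concept": [1.0, 0.05],
    "unrelated": [0.0, 1.0],
}
typed_edges = {frozenset(("internal-features", "linked-concept"))}

candidates = similar_without_edge("internal-features", embeddings, typed_edges)
for entity_id, sim in candidates:
    print(f"{entity_id}: {sim:.2f}")
```

In this toy setup, "linked-concept" is skipped because it already has a typed edge, and "unrelated" falls below the threshold, so only "latent-activations" remains as a candidate for a new edge or a duplicate check.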