concept
active
concept:internal-model-representations
Internal model representations
The latent activations or embeddings inside a neural network.
Neighborhood (ranked by edge count)
Concepts (1)
concept
- Feedback loops from internal model representations (associated_with): Mechanism of using internal activations or representations to create corrective feedback during generation.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.
- Core question motivating interchange intervention and interpretability research supported by pyvene.
- A representation that maintains stable activation across many tokens rather than being locally triggered by specific content.
- The latent representational state of a model's answer confidence as decoded from activations, distinct from what appears in generated text.
- The central question of whether representational geometry implies corresponding computational structure.
- A representation that captures relevant aspects of a system; according to the theorem, the regulator must embody this.
- Representations inside LLMs that can be intervened upon.
- The paper's deepest interpretive claim, asserting that representation structure and behavioral structure are not coincidentally aligned but deeply connected.
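The "Related by similarity" criterion (cosine ≥ 0.65, no typed edge) can be sketched as a filter over entity embeddings. This is a minimal sketch, not the page's actual implementation. It assumes that each entity has an embedding vector and that typed edges are stored as (source, target) pairs checked in both directions. The function names, the toy vectors, and the descending sort by score are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: entities whose embedding is close to `target`
    (cosine >= threshold) but that share no typed edge with it.

    embeddings:  dict mapping entity name -> vector (assumed precomputed)
    typed_edges: set of (source, target) pairs; direction is ignored here
    Returns (name, score) pairs sorted by descending similarity (an assumption).
    """
    target_vec = embeddings[target]
    candidates = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip anything already linked by a typed relation, in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((name, score))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


if __name__ == "__main__":
    # Toy 2-D embeddings; "d" is similar to "a" but already has a typed edge.
    toy = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0],
           "d": [0.9, 0.2], "e": [1.0, 0.5]}
    edges = {("a", "d")}
    print([name for name, _ in similar_without_edge("a", toy, edges)])  # -> ['b', 'e']
```

In this example "c" falls below the threshold and "d" is excluded despite a high score, because it already has a typed edge. The remaining entities are exactly the "candidates for new edges or unrecognized duplicates" that the section describes.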