Internal model representations

The latent activations or embeddings inside a neural network.
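As a concrete illustration of the definition above, the following minimal sketch records the hidden-layer activations of a toy feed-forward network. The network, its weights, and the helper names (`linear`, `relu`, `forward_with_activations`) are hypothetical and chosen only for illustration; real models expose activations through framework-specific mechanisms that are not shown here.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def linear(weights, bias, x):
    """Compute weights @ x + bias for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward_with_activations(x, layers):
    """Run a toy MLP and return (output, hidden activations per hidden layer)."""
    activations = []
    h = x
    for i, (weights, bias) in enumerate(layers):
        h = linear(weights, bias, h)
        if i < len(layers) - 1:  # every layer except the last is hidden
            h = relu(h)
            activations.append(h)  # this vector is an internal representation
    return h, activations


# Hypothetical 2-2-1 network with hand-picked weights.
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 1.0]], [0.0]),                    # output layer
]
output, hidden = forward_with_activations([2.0, 1.0], toy_layers)
```

Here `hidden` holds the internal representation of the input, while `output` is what the network exposes externally.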

Neighborhood — ranked by edge-count

concept

Feedback loops from internal model representations
associated_with
Mechanism of using internal activations or representations to create corrective feedback during generation.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
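The filter described above can be sketched as follows, assuming each entity has an embedding vector and that typed edges are stored as a mapping from an entity to the set of entities it is linked to. The function names, the data layout, and the toy vectors are assumptions made for illustration, not the system's actual implementation; only the 0.65 threshold comes from this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs with cosine >= threshold and no typed edge
    from the target, sorted by descending score."""
    linked = typed_edges.get(target, set())
    results = []
    for name, vector in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and already-typed relations
        score = cosine(embeddings[target], vector)
        if score >= threshold:
            results.append((name, round(score, 3)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: "D" is similar to "A" but already has a typed edge.
toy_embeddings = {"A": [1.0, 0.0], "B": [1.0, 0.1], "C": [0.0, 1.0], "D": [1.0, 1.0]}
toy_typed_edges = {"A": {"D"}}
candidates = untyped_neighbors("A", toy_embeddings, toy_typed_edges)
```

In this toy example only `"B"` is returned: `"C"` falls below the threshold and `"D"` is excluded because it already has a typed edge.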

LLM Internal Representations · concept · 0.849
High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.
Where and how is information stored in model-internal representations? · question · 0.844
Core question motivating interchange intervention and interpretability research supported by pyvene.
Stateful Internal Representation · concept · 0.815
A representation that maintains stable activation across many tokens rather than being locally triggered by specific content.
Model Internal Belief · concept · 0.780
The latent representational state of a model's answer confidence as decoded from activations, distinct from what appears in generated text.
Structure in representations · concept · 0.780
The central question of whether representational geometry implies corresponding computational structure
model · concept · 0.757
A representation that captures relevant aspects of a system; according to the theorem, the regulator must embody this.
Internal Features · concept · 0.755
Representations inside LLMs that can be intervened upon.
The geometry of internal representations and the geometry of model behavior share a precise correspondence; representation geometry is a window into the inner world of neural networks. · claim · 0.755
The paper's deepest interpretive claim, asserting that representation structure and behavioral structure are not coincidentally aligned but deeply connected.