question
active
question:where-and-how-is-information-stored-in-model-internal-representations
Where and how is information stored in model-internal representations?
Core question motivating interchange interventions and the broader interpretability research that pyvene supports
Source paper
extracted_from (2024) · Zhengxuan Wu · Atticus Geiger · Aryaman Arora · Jing Huang +4
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (1)
finding
- Case Study I demonstrating pyvene can replicate a major interpretability result compactly
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The latent activations or embeddings inside a neural network.
- Mechanism of using internal activations or representations to create corrective feedback during generation.
- High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.
- A representation that maintains stable activation across many tokens rather than being locally triggered by specific content
- Interpretive claim from a probing experiment showing that reflection-direction features outperform the baseline for uncertainty prediction
- The central question of whether representational geometry implies corresponding computational structure
- The paper's deepest interpretive claim, asserting that representation structure and behavioral structure are not coincidentally aligned but deeply connected.
- The causal hypothesis motivating the use of causality (intervention) as the lens connecting representation and behavior geometry.