concept
active
concept:llm-internal-representations
LLM Internal Representations
High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.
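As a rough illustration of the definition above, the representations can be pictured as a nested array indexed by layer, then token, then dimension. The sketch below fills that layout with random numbers instead of real model activations. The helper names (`toy_representations`, `sample_layers`) and the choice to keep the deeper half of the layers are assumptions made for illustration, not the study's actual procedure.

```python
import random

def toy_representations(num_layers, num_tokens, hidden_dim, seed=0):
    """Build a stand-in for reps[layer][token] -> vector of length hidden_dim.

    Real representations would come from a transformer's hidden states;
    random Gaussian values are used here only to show the layout.
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]
         for _ in range(num_tokens)]
        for _ in range(num_layers)
    ]

def sample_layers(reps, start_frac=0.5):
    """Keep layers from start_frac of the depth onward.

    start_frac=0.5 is an assumed cutoff for "intermediate-to-deep" layers.
    """
    first = int(len(reps) * start_frac)
    return {i: reps[i] for i in range(first, len(reps))}

reps = toy_representations(num_layers=12, num_tokens=5, hidden_dim=8)
deep = sample_layers(reps)
print(len(reps), len(reps[0]), len(reps[0][0]))  # layers, tokens, dimensions
print(sorted(deep))                              # indices of the layers kept
```

Each `reps[layer]` is then a short time series over tokens, with one vector per token.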
Neighborhood — ranked by edge-count
Thinkers (2)
thinker
- Charlotte Caucheteux · studies · Author whose work on comparing LLM representations with human brain activity during NLP informed this study's layer sampling strategy.
- Martin Schrimpf · studies · Author whose findings suggest intermediate-to-deep LLM layers best predict human brain activity; guides layer selection in this study.
Frameworks (1)
framework
- Novel construct introduced by this paper: a hypothetical graph embedded in the time series of LLM representations, where each dimension is a node and latent connections are edges.
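One way to picture the construct described above is to treat each representation dimension as a node and its values across tokens as a time series. The card does not say how the latent connections are defined, so the sketch below substitutes a simple stand-in: it adds an edge when the absolute Pearson correlation between two dimensions reaches a threshold. The function names and the 0.8 threshold are assumptions for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def dimension_graph(series, threshold=0.8):
    """series: list of token vectors (time steps x dimensions).

    Returns a set of undirected edges (i, j) with i < j.
    Correlation thresholding is an assumed stand-in for "latent connections".
    """
    dims = len(series[0])
    # Transpose so each column is one dimension's time series.
    columns = [[step[d] for step in series] for d in range(dims)]
    edges = set()
    for i in range(dims):
        for j in range(i + 1, dims):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                edges.add((i, j))
    return edges

# Four tokens, three dimensions; dimension 1 is exactly twice dimension 0.
series = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 2.0],
]
edges = dimension_graph(series)
print(edges)
```

In this toy input only dimensions 0 and 1 move together, so they are the only pair connected.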
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The latent activations or embeddings inside a neural network.
- The finding that interpretable concepts including character traits are encoded as linear directions in transformer residual streams.
- The authors' interpretive assertion based on their steering results.
- Recent work identifying cases where LLM features are not one-dimensionally linear, a caveat to the linearity hypothesis.
- A representation that maintains stable activation across many tokens rather than being locally triggered by specific content.
- The case study target in Section 4: localizing gender information in hidden representations of Pythia-6.9B.
- Core question motivating interchange intervention and interpretability research supported by pyvene.
- Core claim directly challenged by prior work denying introspection; forms foundation for Koan Battery introspection studies.
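The "related by similarity" list above can be read as a filter: keep entities whose embedding has cosine similarity of at least 0.65 with this one and that have no typed edge to it. The sketch below shows that filter on toy two-dimensional vectors. The function names, the entity keys, and the vectors are hypothetical; the embedding model behind the real scores is not stated in the card.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def untyped_neighbors(target, embeddings, typed, threshold=0.65):
    """Return (name, similarity) pairs at or above threshold, most similar first.

    embeddings: dict of entity name -> vector (hypothetical data).
    typed: names that already have a typed edge to the target; these are skipped.
    """
    scored = [
        (name, cosine(target, vec))
        for name, vec in embeddings.items()
        if name not in typed
    ]
    kept = [(name, sim) for name, sim in scored if sim >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

target = [1.0, 0.0]
embeddings = {
    "b": [1.0, 0.5],  # close to the target
    "c": [0.0, 1.0],  # orthogonal to the target
    "d": [1.0, 1.0],  # similar, but already linked by a typed edge
    "e": [1.0, 1.0],  # similar and not yet linked
}
result = untyped_neighbors(target, embeddings, typed={"d"})
print([name for name, _ in result])
```

Entities returned this way are only candidates; whether one deserves a typed edge or is a duplicate still needs review.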