LLM Internal Representations

High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.

Neighborhood — ranked by edge-count

thinker

Charlotte Caucheteux
studies
Author whose work comparing LLM representations with human brain activity during natural language processing (NLP) informed this study's layer sampling strategy.
Martin Schrimpf
studies
Author whose findings suggest intermediate-to-deep LLM layers best predict human brain activity; guides layer selection in this study.

framework

Representation Network (RN)
about
Novel construct introduced by this paper: a hypothetical graph embedded in the time series of LLM representations, where each representation dimension is a node and latent connections between dimensions are edges.
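
To make the node and edge definitions concrete, here is a minimal sketch of how such a graph could be built from one layer's token-by-token representations. The source does not say how latent connections are inferred, so the Pearson-correlation rule, the `threshold` value, and the function names below are illustrative assumptions, not the paper's method.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def representation_network(series, threshold=0.8):
    """Build a hypothetical representation network for one layer.

    series: list of per-token vectors (tokens x dimensions), read as a time series.
    Returns the set of edges (i, j), i < j, between dimensions whose time courses
    have absolute correlation >= threshold (an assumed edge criterion).
    """
    n_dims = len(series[0])
    # One time course per dimension: the value of that dimension at every token.
    columns = [[vec[d] for vec in series] for d in range(n_dims)]
    edges = set()
    for i in range(n_dims):
        for j in range(i + 1, n_dims):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                edges.add((i, j))
    return edges


# Toy data: 4 tokens, 3 dimensions. Dimension 1 is exactly twice dimension 0.
series = [
    [1.0, 2.0, 0.5],
    [2.0, 4.0, 0.1],
    [3.0, 6.0, 0.9],
    [4.0, 8.0, 0.2],
]
edges = representation_network(series)
```

In this toy input only dimensions 0 and 1 co-vary strongly, so they are the only pair joined by an edge. Real representations have thousands of dimensions, where a vectorized correlation matrix would be the practical choice.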

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Internal model representations · concept · 0.849
The latent activations or embeddings inside a neural network.
Linear Representation of Concepts in LLMs · concept · 0.836
The finding that interpretable concepts including character traits are encoded as linear directions in transformer residual streams.
LLMs internalize deeply integrated representations of high-order concepts. · claim · 0.834
The authors' interpretive assertion based on their steering results.
Non-Linear Representations in LLMs · concept · 0.825
Recent work identifying cases where LLM features are not one-dimensionally linear, a caveat to the linearity hypothesis.
Stateful Internal Representation · concept · 0.802
A representation that maintains stable activation across many tokens rather than being locally triggered by specific content.
Gender Representation in LLMs · concept · 0.792
The case study target in Section 4: localizing gender information in hidden representations of Pythia-6.9B.
Where and how is information stored in model-internal representations? · question · 0.779
Core question motivating interchange intervention and interpretability research supported by pyvene.
LLM introspection on internal computations is architecturally permitted; whether models leverage this is an empirical question. · claim · 0.779
Core claim directly challenged by prior work denying introspection; forms foundation for Koan Battery introspection studies.
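
The selection rule behind this list (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter over entity embeddings. This is a minimal illustration only: the embedding vectors, the entity names, and the `untyped_neighbors` helper are hypothetical, and the source does not describe how the graph actually computes its scores.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_neighbors, min_cosine=0.65):
    """List entities similar to `target` that have no typed edge to it.

    embeddings: dict mapping entity name -> embedding vector (hypothetical data).
    typed_neighbors: set of entity names already linked to `target` by a typed edge.
    Returns (name, score) pairs sorted by descending cosine similarity.
    """
    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in typed_neighbors:
            continue  # skip the entity itself and already-typed relations
        score = cosine(embeddings[target], vec)
        if score >= min_cosine:
            candidates.append((name, round(score, 3)))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy 2-D embeddings; "D" is similar to "A" but already has a typed edge.
embeddings = {
    "A": [1.0, 0.0],
    "B": [1.0, 0.5],
    "C": [0.0, 1.0],
    "D": [1.0, 0.1],
}
candidates = untyped_neighbors("A", embeddings, typed_neighbors={"D"})
```

Here only "B" survives: "C" falls below the similarity cutoff and "D" is excluded because it is already typed. Entries returned this way are the candidates for new edges or for duplicate merging.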