concept
active
concept:linear-representation-of-concepts-in-llms
Linear Representation of Concepts in LLMs
The finding that interpretable concepts, including character traits, are encoded as linear directions in transformer residual streams.
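To make "encoded as a linear direction" concrete, here is a minimal sketch of one common way such a direction is estimated: the difference between the mean activation on inputs that express a concept and the mean activation on inputs that do not. It assumes you already have residual-stream activation vectors. No real model is involved here; the activations are synthetic toy vectors generated around a hypothetical ground-truth direction (`true_dir`). All function names are illustrative.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def concept_direction(pos_acts, neg_acts):
    """Unit vector from the mean 'concept absent' activation to the
    mean 'concept present' activation (difference of means)."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return normalize([p - q for p, q in zip(mu_pos, mu_neg)])

# Synthetic stand-in for residual-stream activations (hypothetical, dimension 8).
random.seed(0)
DIM = 8
true_dir = normalize([1.0] * DIM)  # hypothetical direction the concept shifts along

def sample(shift):
    """Gaussian noise plus a shift of `shift` units along true_dir."""
    noise = [random.gauss(0.0, 0.5) for _ in range(DIM)]
    return [n + shift * d for n, d in zip(noise, true_dir)]

pos = [sample(+2.0) for _ in range(200)]  # inputs where the concept is present
neg = [sample(-2.0) for _ in range(200)]  # inputs where it is absent

v = concept_direction(pos, neg)

# Classify by projecting onto v and thresholding at the midpoint of the class means.
threshold = (dot(mean_vector(pos), v) + dot(mean_vector(neg), v)) / 2

def has_concept(act):
    return dot(act, v) > threshold

correct = sum(has_concept(a) for a in pos) + sum(not has_concept(a) for a in neg)
accuracy = correct / (len(pos) + len(neg))
print(f"cosine(v, true_dir) = {dot(v, true_dir):.3f}, accuracy = {accuracy:.2f}")
```

If the linear-representation picture holds for a real model, a single projection like `dot(act, v)` should separate the two groups well; the linked non-linear work is a reminder that this is not guaranteed for every feature.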
Neighborhood (ranked by edge count)
Papers (1)
paper
Concepts (1)
concept
- Non-Linear Representations in LLMs (related_to): Recent work identifying cases where LLM features are not one-dimensionally linear; a caveat to the linearity hypothesis.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The idea that features are encoded as directions in activation space.
- High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.
- Framework from prior work that studies whether LLMs encode world models as linear structures in their representations
- Establishes that the observed linear structure is not merely a representation of text probability
- The central object of study — the idea that a concept like truth is encoded as a direction in the LLM's latent space
- The authors' interpretive assertion based on their steering results.
- Interpretive claim connecting scale to abstraction level in LLM representations
- The case study target in Section 4: localizing gender information in hidden representations of Pythia-6.9B
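Several neighbors above (steering results, localizing gender information) rely on two basic operations on a concept direction. The sketch below shows them under the assumption that `v` is a unit-norm direction already estimated from activations. Steering adds a multiple of `v` to an activation. Directional ablation removes an activation's component along `v`. This is a generic illustration with hypothetical names and toy vectors, not the specific procedure used in any of the linked papers.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def steer(h, v, alpha):
    """Activation steering: add alpha times the unit direction v to h.
    The projection onto v rises by exactly alpha; components orthogonal to v are unchanged."""
    return [x + alpha * d for x, d in zip(h, v)]

def ablate(h, v):
    """Directional ablation: subtract h's component along the unit direction v,
    so the result has zero projection onto v."""
    coef = dot(h, v)
    return [x - coef * d for x, d in zip(h, v)]

# Toy 4-dimensional example (hypothetical values).
h = [0.3, -1.2, 0.5, 2.0]   # an activation vector
v = [0.5, 0.5, 0.5, 0.5]    # unit-norm concept direction
u = [0.5, -0.5, 0.5, -0.5]  # a direction orthogonal to v

print(dot(h, v))                   # original projection onto v
print(dot(steer(h, v, 3.0), v))    # projection after steering with alpha = 3
print(dot(ablate(h, v), v))        # projection after ablation (approximately 0)
print(dot(steer(h, v, 3.0), u) == dot(h, u))  # orthogonal component preserved
```

Because both operations act only along `v`, a change in model behavior after steering or ablating is evidence that the direction carries the concept, which is how such interventions are used to localize information.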