concept
active
concept:non-linear-representations-in-llms
Non-Linear Representations in LLMs
Recent work identifying cases where LLM features are not encoded along a single linear direction, a caveat to the linear representation hypothesis.
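As a rough illustration of what "not one-dimensionally linear" can mean, the sketch below builds a toy feature whose values lie on a circle inside a higher-dimensional activation space. Everything here is an assumption made for illustration: the data is synthetic, and the names `circular_feature`, `n_classes`, and `dim` are hypothetical rather than taken from any paper or model. It shows that no single direction captures such a feature (the variance splits evenly across two directions), while a two-dimensional readout recovers it.

```python
import numpy as np

def circular_feature(n_classes=7, dim=16, seed=0):
    """Place n_classes points on a unit circle inside a random 2-D plane of R^dim."""
    rng = np.random.default_rng(seed)
    # Two orthonormal columns spanning a random plane (hypothetical feature subspace).
    basis, _ = np.linalg.qr(rng.normal(size=(dim, 2)))
    angles = 2 * np.pi * np.arange(n_classes) / n_classes
    coords = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (n_classes, 2)
    return coords @ basis.T, basis, angles

acts, basis, angles = circular_feature()

# PCA via SVD: fraction of variance explained by each principal direction.
centered = acts - acts.mean(axis=0)
sing = np.linalg.svd(centered, compute_uv=False)
ratio = sing**2 / np.sum(sing**2)

# Two-dimensional readout: project onto the plane and decode the angle.
plane = acts @ basis
decoded = np.arctan2(plane[:, 1], plane[:, 0])
# Circular error, wrapped to (-pi, pi], so that 0 and 2*pi count as equal.
err = np.angle(np.exp(1j * (decoded - angles)))

print("variance explained by top two directions:", ratio[:2])
print("max angular decoding error:", np.max(np.abs(err)))
```

In this toy setup the top principal direction explains only about half of the variance, so a one-dimensional linear probe necessarily discards part of the feature. Real model activations are noisier, and whether a given feature behaves this way is an empirical question.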
Neighborhood — ranked by edge-count
Papers (1)
paper
Frameworks (1)
framework
- Linear Representation Hypothesis · contradicts · The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior
Concepts (1)
concept
- The finding that interpretable concepts including character traits are encoded as linear directions in transformer residual streams
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.
- Core contribution: the impasse where lifting linearity in alignment maps makes causal abstraction vacuous, but keeping it may miss non-linearly encoded features
- The case study target in Section 4: localizing gender information in hidden representations of Pythia-6.9B
- Prior work framework studying whether LLMs encode world models as linear structures in their representations
- The idea that features are encoded as directions in activation space.
- Hypothesis that information may be encoded in arbitrary non-linear subspaces of a neural network
- Establishes that the observed linear structure is not merely a representation of text probability
- Motivates the RN hypothesis by pointing to the unknown relational structure within high-dimensional representation vectors.