Non-Linear Representations in LLMs

Recent work identifying cases where LLM features are not encoded as one-dimensional linear directions; a caveat to the linear representation hypothesis.
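As an illustration of what "not one-dimensionally linear" can mean, here is a minimal toy sketch. It assumes a hypothetical cyclic feature (day of week) placed on a circle in a 2-D subspace; the data, variable names, and the choice of a least-squares readout are all assumptions for illustration and are not taken from the work summarized here. A linear readout of the day index fits only part of the variance, while a readout that uses the angle within the 2-D subspace recovers every day.

```python
import numpy as np

# Hypothetical toy activations: seven days of the week on a unit circle.
days = np.arange(7)
theta = 2 * np.pi * days / 7
acts = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # shape (7, 2)

# Linear readout: least-squares fit of day ≈ acts @ w + b.
design = np.hstack([acts, np.ones((7, 1))])
coef, *_ = np.linalg.lstsq(design, days.astype(float), rcond=None)
resid = days - design @ coef
r2_linear = 1.0 - resid.var() / days.var()

# Non-linear readout: recover the angle in the 2-D subspace, then the day.
angle = np.mod(np.arctan2(acts[:, 1], acts[:, 0]), 2 * np.pi)
decoded = np.rint(angle * 7 / (2 * np.pi)).astype(int) % 7

print(f"R^2 of linear readout: {r2_linear:.3f}")
print("angle readout recovers all days:", bool((decoded == days).all()))
```

The feature still lives in a linear subspace, but no single direction (or affine readout of the index) captures it, which is the sense of the caveat above.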

Neighborhood — ranked by edge-count

paper

framework

Linear Representation Hypothesis
contradicts
The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior

concept

Linear Representation of Concepts in LLMs
related_to
The finding that interpretable concepts including character traits are encoded as linear directions in transformer residual streams

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

LLM Internal Representations · concept · 0.825
High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.
Non-Linear Representation Dilemma · concept · 0.816
Core contribution: the impasse where lifting the linearity constraint on alignment maps makes causal abstraction vacuous, but keeping it may miss non-linearly encoded features.
Gender Representation in LLMs · concept · 0.811
The case study target in Section 4: localizing gender information in the hidden representations of Pythia-6.9B.
Linear World Models in LLMs · framework · 0.806
Framework from prior work studying whether LLMs encode world models as linear structures in their representations.
Linear representation · concept · 0.797
The idea that features are encoded as directions in activation space.
Non-Linear Representation Hypothesis · concept · 0.789
Hypothesis that information may be encoded in arbitrary non-linear subspaces of a neural network.
LLMs linearly represent truth-relevant information beyond the plausibility of text, as evidenced by probes trained on the "likely" dataset performing poorly on anti-correlated datasets · claim · 0.788
Establishes that the observed linear structure is not merely a representation of text probability.
What is the relationship between different dimensions or clusters of dimensions in LLM representations? Do they interact with each other, and if so, how? · question · 0.785
Motivates the RN hypothesis by pointing to the unknown relational structure within high-dimensional representation vectors.
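
The "cosine ≥ 0.65 · no typed edge" criterion used for the list above can be sketched as a simple filter. This is a minimal sketch under stated assumptions: the function name `untyped_neighbors`, the toy 2-D embeddings, and the entity names are hypothetical, and the actual embedding model and graph store behind this page are not specified here.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Entities with cosine >= threshold to `target` and no typed edge to it,
    sorted by similarity in descending order."""
    out = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and already-linked entities
        sim = cosine(embeddings[target], vec)
        if sim >= threshold:
            out.append((name, round(sim, 3)))
    return sorted(out, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings (real ones would be high-dimensional).
embeddings = {
    "A": np.array([1.0, 0.0]),
    "B": np.array([0.9, 0.1]),  # close to A, no typed edge
    "C": np.array([0.8, 0.6]),  # similar to A, but already has a typed edge
    "D": np.array([0.0, 1.0]),  # orthogonal to A
}
typed_edges = {"C"}  # entities already linked to "A" by a typed relation

result = untyped_neighbors("A", embeddings, typed_edges)
print(result)
```

Entries that pass the filter are candidates for review rather than confirmed relations: a high cosine score may indicate a missing edge or a duplicate entity.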