Non-Linear Representation Hypothesis

The hypothesis that a neural network may encode information in arbitrary non-linear structures of its representation space, not only along linear directions

Neighborhood — ranked by edge-count

Papers (1)

paper

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
introduces

Thinkers (2)

thinker

Max Tegmark
studies
Tiago Pimentel
studies
Reframed probes as measuring accessibility of information; cited for probe methodology critique

Frameworks (1)

framework

Linear Representation Hypothesis
extends
The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior

Methods (1)

method

Non-Linear Alignment Map (ϕ_nonlin)
implements
Alignment map implemented as a reversible residual network (RevNet); assumes non-linear representation hypothesis
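
To make "reversible residual network" concrete, here is a minimal sketch of the additive-coupling block commonly used in RevNets, stacked into an invertible map. It is an illustration under stated assumptions, not the paper's implementation: the names `make_mlp`, `ReversibleBlock`, `phi_nonlin`, and `phi_nonlin_inv`, the tanh sub-networks, the layer sizes, and the random untrained weights are all hypothetical.

```python
import math
import random


def make_mlp(dim, hidden, rng):
    """Return a small tanh MLP as a closure (random, untrained weights; hypothetical)."""
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(dim)]
    w2 = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(hidden)]

    def mlp(x):
        h = [math.tanh(sum(x[i] * w1[i][j] for i in range(dim))) for j in range(hidden)]
        return [sum(h[j] * w2[j][k] for j in range(hidden)) for k in range(dim)]

    return mlp


class ReversibleBlock:
    """Additive coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)."""

    def __init__(self, dim, hidden, rng):
        assert dim % 2 == 0, "the input is split into two equal halves"
        self.half = dim // 2
        self.f = make_mlp(self.half, hidden, rng)
        self.g = make_mlp(self.half, hidden, rng)

    def forward(self, x):
        x1, x2 = x[:self.half], x[self.half:]
        y1 = [a + b for a, b in zip(x1, self.f(x2))]
        y2 = [a + b for a, b in zip(x2, self.g(y1))]
        return y1 + y2

    def inverse(self, y):
        # Undo the two additions in reverse order; F and G are never inverted.
        y1, y2 = y[:self.half], y[self.half:]
        x2 = [a - b for a, b in zip(y2, self.g(y1))]
        x1 = [a - b for a, b in zip(y1, self.f(x2))]
        return x1 + x2


def phi_nonlin(x, blocks):
    """Apply the stacked blocks: a non-linear but exactly invertible map."""
    for block in blocks:
        x = block.forward(x)
    return x


def phi_nonlin_inv(y, blocks):
    """Invert the stack by inverting each block in reverse order."""
    for block in reversed(blocks):
        y = block.inverse(y)
    return y


rng = random.Random(0)
blocks = [ReversibleBlock(dim=8, hidden=16, rng=rng) for _ in range(3)]
h = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for a hidden state
z = phi_nonlin(h, blocks)                    # aligned representation
h_back = phi_nonlin_inv(z, blocks)           # recovers h up to float rounding
```

The point of the construction is that invertibility comes from the coupling structure alone, so the sub-networks F and G can be arbitrarily expressive. That expressiveness is what lets such a map capture non-linearly encoded information.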

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Non-Linear Representation Dilemma · concept · 0.824
Core contribution: the impasse where lifting linearity in alignment maps makes causal abstraction vacuous, but keeping it may miss non-linearly encoded features
Linear representation · concept · 0.803
The idea that features are encoded as directions in activation space.
Non-Linear Representations in LLMs · concept · 0.789
Recent work identifying cases where LLM features are not one-dimensionally linear, a caveat to the linearity hypothesis.
Linear representation hypothesis: neural networks represent meaningful concepts as directions in their activation spaces. · hypothesis · 0.785
Foundation for interpreting features as linear directions.
Assuming linear representations enables identifying the location of certain variables in a DNN, but many insights fail to generalise when more powerful non-linear maps are used · claim · 0.779
Interpretive claim about what linear DAS results actually tell us
Results collectively provide strong evidence that some version of the superposition hypothesis and linear representation hypothesis is true · claim · 0.765
Authors' overall conclusion from number of interpretable features, activation-level correspondence to intensity, sensible logit weights, and interference weights
Linear Representation of Features · concept · 0.761
The central object of study — the idea that a concept like truth is encoded as a direction in the LLM's latent space
How can non-linear reflection dynamics be formalized using probabilistic modeling and information theory? · question · 0.749
Theoretical open question about the mathematical treatment of reflection mechanisms.