concept
active
concept:non-linear-representation-hypothesis
Non-Linear Representation Hypothesis
Hypothesis that information may be encoded in arbitrary non-linear subspaces of a neural network
Neighborhood — ranked by edge-count
Papers (1)
paper
Thinkers (2)
thinker
- Max Tegmark · studies
- Tiago Pimentel · studies · Reframed probes as measuring accessibility of information; cited for probe methodology critique
Frameworks (1)
framework
- The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior
Methods (1)
method
- Non-Linear Alignment Map (ϕ_nonlin) · implements · Alignment map implemented as a reversible residual network (RevNet); assumes non-linear representation hypothesis
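A reversible residual network suits a non-linear alignment map because each block can be inverted exactly, however non-linear its inner functions are. The following is a minimal sketch under stated assumptions, not the ϕ_nonlin implementation from the source: it assumes RevNet-style additive coupling, uses small random tanh MLPs as stand-ins for learned sub-networks, and the names `ReversibleBlock` and `make_mlp` are hypothetical.

```python
import numpy as np


def make_mlp(rng, dim, hidden=16):
    """Return a small random two-layer tanh MLP mapping R^dim -> R^dim (illustrative stand-in)."""
    w1 = rng.normal(scale=0.5, size=(dim, hidden))
    w2 = rng.normal(scale=0.5, size=(hidden, dim))
    return lambda x: np.tanh(x @ w1) @ w2


class ReversibleBlock:
    """Additive-coupling block (hypothetical name): invertible whatever f and g compute."""

    def __init__(self, f, g):
        self.f, self.g = f, g

    def forward(self, x):
        # Split the activation vector into two halves along the feature axis.
        x1, x2 = np.split(x, 2, axis=-1)
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return np.concatenate([y1, y2], axis=-1)

    def inverse(self, y):
        # Undo the two additive updates in reverse order.
        y1, y2 = np.split(y, 2, axis=-1)
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return np.concatenate([x1, x2], axis=-1)


rng = np.random.default_rng(0)
half = 4  # each half of the (assumed) 8-dimensional activation vector
block = ReversibleBlock(make_mlp(rng, half), make_mlp(rng, half))

h = rng.normal(size=(3, 2 * half))  # a batch of 3 stand-in activation vectors
z = block.forward(h)                # non-linearly re-parameterized activations
h_rec = block.inverse(z)            # recovers h up to floating-point error
```

Because the map is a bijection, no information in the activations is lost. The flip side is that such a map can re-express the activations almost arbitrarily, which may be why unconstrained non-linear alignment maps raise the vacuity concern noted among the related entries.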
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Core contribution: the impasse where lifting linearity in alignment maps makes causal abstraction vacuous, but keeping it may miss non-linearly encoded features
- The idea that features are encoded as directions in activation space.
- Recent work identifying cases where LLM features are not one-dimensionally linear, a caveat to the linearity hypothesis.
- Linear representation hypothesis: neural networks represent meaningful concepts as directions in their activation spaces. · hypothesis · 0.785 · Foundation for interpreting features as linear directions.
- Interpretive claim about what linear DAS results actually tell us
- Authors' overall conclusion from number of interpretable features, activation-level correspondence to intensity, sensible logit weights, and interference weights
- The central object of study — the idea that a concept like truth is encoded as a direction in the LLM's latent space
- How can non-linear reflection dynamics be formalized using probabilistic modeling and information theory? · question · 0.749 · Theoretical open question about the mathematical treatment of reflection mechanisms.