concept
active
concept:non-linear-representation-dilemma
Non-Linear Representation Dilemma
Core contribution: the impasse in which lifting the linearity constraint on alignment maps makes causal abstraction vacuous, while keeping it may miss features that are encoded non-linearly
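The impasse can be illustrated with a toy analogue from the probing setting (not causal abstraction itself, and not the paper's method). In this minimal sketch the "hidden states" are random vectors and the labels are random, so no feature is actually encoded. All names and sizes (`H`, `width`, `fit_accuracy`) are assumptions made for illustration. A linear map cannot fit the labels, while a sufficiently expressive non-linear map fits them perfectly, which is why an unrestricted map's success carries little evidence.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, width = 64, 8, 512              # hypothetical sizes, chosen only for illustration
H = rng.standard_normal((n, d))       # stand-in for hidden states (assumption)
y = rng.choice([-1.0, 1.0], size=n)   # random labels: no real feature is encoded


def fit_accuracy(features, targets):
    """Least-squares fit, then training accuracy of the sign of the prediction."""
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return float(np.mean(np.sign(features @ weights) == targets))


# Linear map: hidden states plus a bias column.
linear_features = np.hstack([H, np.ones((n, 1))])
linear_acc = fit_accuracy(linear_features, y)

# Non-linear map: fixed random tanh features, far more of them than data points,
# so the least-squares solution can interpolate arbitrary labels.
W = rng.standard_normal((d, width)) / np.sqrt(d)
b = rng.standard_normal(width)
nonlinear_features = np.tanh(H @ W + b)
nonlinear_acc = fit_accuracy(nonlinear_features, y)

print(f"linear: {linear_acc:.2f}, non-linear: {nonlinear_acc:.2f}")
```

The linear fit is the restricted case, which may miss a genuinely non-linear encoding; the over-parameterised fit is the unrestricted case, which succeeds even when there is nothing to find.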
Neighborhood — ranked by edge-count
Papers (1)
paper
Questions (2)
question
- Identified as exciting future work direction
- Practical question the paper attempts to answer in its conclusion
Concepts (1)
concept
- Probing Complexity–Accuracy Trade-off · analogous_to · Longstanding debate from probing literature about whether complex probes reveal genuine encodings or just memorise; this paper revives it for causal abstraction
Findings (1)
finding
- Central theoretical result proving unrestricted causal abstraction is trivial
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Hypothesis that information may be encoded in arbitrary non-linear subspaces of a neural network
- Recent work identifying cases where LLM features are not one-dimensionally linear, a caveat to the linearity hypothesis.
- The idea that features are encoded as directions in activation space.
- The distribution of latent representations produced by the model under unperturbed inputs
- The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior
- The finding that interpretable concepts including character traits are encoded as linear directions in transformer residual streams
- Architectural requirement from machine learning.
- Interpretive claim about what linear DAS results actually tell us