concept
active
concept:linear-representation
Linear representation
The idea that features are encoded as directions in activation space.
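As a minimal sketch of what "encoded as a direction" can mean in practice: if a feature corresponds to a direction in activation space, its strength in a given activation vector can be read off as a dot product with the unit-normalized direction. The function name `feature_readout`, the 4-dimensional toy space, and the example vectors are illustrative assumptions, not taken from any particular model or paper.

```python
import numpy as np

def feature_readout(activations: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project each activation vector (one per row) onto a unit-normalized direction."""
    unit = direction / np.linalg.norm(direction)
    return activations @ unit

# Hypothetical feature direction in a toy 4-dimensional activation space.
direction = np.array([3.0, 4.0, 0.0, 0.0])
unit = direction / np.linalg.norm(direction)

# Base activations chosen to be orthogonal to the feature direction,
# so they carry none of the feature on their own.
base = np.array([[4.0, -3.0, 1.0, 2.0],
                 [-8.0, 6.0, 0.0, 5.0]])

# Write the feature into each activation with a chosen strength.
strengths = np.array([3.0, -2.0])
activations = base + strengths[:, None] * unit

# Under the linear picture, the projection recovers the strengths.
scores = feature_readout(activations, direction)
```

In this toy setup the readout returns the strengths that were written in, because everything else in the activations is orthogonal to the direction. In a real model other features are generally not orthogonal, so a readout like this would be an approximation at best.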
Neighborhood — ranked by edge-count
Papers (1)
paper
Thinkers (1)
thinker
- Neel Nanda · studies · External commenter; resolved an apparent counterexample to the linear representation hypothesis
Frameworks (1)
framework
- Superposition Hypothesis · supports · Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition
Methods (1)
method
- Sparse Autoencoders (SAE) · implements · Interpretability method criticized in this paper for shattering manifolds into atomic pieces, obscuring overarching semantic structure.
Concepts (2)
concept
- Linear Representation of Features · related_to · The central object of study: the idea that a concept like truth is encoded as a direction in the LLM's latent space
- Truth Direction · associated_with · A hypothesized direction in LLM activation space that encodes the truth or falsehood of factual statements
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior
- The finding that interpretable concepts including character traits are encoded as linear directions in transformer residual streams
- Core contribution: the impasse where lifting linearity in alignment maps makes causal abstraction vacuous, but keeping it may miss non-linearly encoded features
- The idea that programs can be expressed as logical sentences, enabling direct deductive verification.
- The sequential, continuous order of text, often challenged by diagrammatic branching.
- Hypothesis that information may be encoded in arbitrary non-linear subspaces of a neural network
- Correlative technique measuring the type of information encoded in distributed representations via linear predictability.
- Semantic domain for linear transformations; denotation as actual linear function; Category instance generated from homomorphism principle.
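The selection rule stated for this list (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter over entity embeddings. How the embeddings are produced is not stated on this page, so the vectors, the entity ids, and the function name `similar_untyped` below are hypothetical placeholders.

```python
import numpy as np

def similar_untyped(query, candidates, typed_ids, threshold=0.65):
    """Return (entity_id, cosine) pairs at or above the threshold,
    skipping entities that already have a typed edge, most similar first."""
    q = query / np.linalg.norm(query)
    hits = []
    for ent_id, vec in candidates.items():
        if ent_id in typed_ids:
            continue  # already connected by a typed relation
        cos = float(q @ (vec / np.linalg.norm(vec)))
        if cos >= threshold:
            hits.append((ent_id, cos))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical 2-dimensional embeddings, for illustration only.
query = np.array([1.0, 0.0])
candidates = {
    "a": np.array([1.0, 0.0]),   # same direction as the query
    "b": np.array([1.0, 1.0]),   # 45 degrees away from the query
    "c": np.array([0.0, 1.0]),   # orthogonal to the query
    "d": np.array([2.0, 0.1]),   # similar, but already has a typed edge
}
neighbors = similar_untyped(query, candidates, typed_ids={"d"})
neighbor_ids = [ent_id for ent_id, _ in neighbors]
```

Here `a` and `b` pass the threshold, `c` falls below it, and `d` is excluded despite its high similarity because it already has a typed relation.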