concept
active
concept:linear-representation-of-features
Linear Representation of Features
The central object of study: the idea that a concept such as truth is encoded as a direction in the LLM's latent space.
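To make "encoded as a direction" concrete, here is a minimal sketch using a difference-of-class-means estimate. This is one common way to recover such a direction, not necessarily the method used in the linked paper. Everything in it is an assumption made for illustration: the activations are synthetic Gaussian vectors with a planted direction, not real LLM hidden states, and the helper names (`make_synthetic_activations`, `score`) are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def mean_vector(rows):
    dim = len(rows[0])
    return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]


def make_synthetic_activations(n, direction, sign, rng, noise=0.5, shift=2.0):
    """Hypothetical stand-in for hidden states: isotropic Gaussian noise
    plus a shift of +/- `shift` along the planted `direction`."""
    return [
        [rng.gauss(0.0, noise) + sign * shift * d for d in direction]
        for _ in range(n)
    ]


rng = random.Random(0)
dim = 16

# Planted "truth" direction (an assumption of this toy setup).
planted = unit([rng.gauss(0.0, 1.0) for _ in range(dim)])

true_acts = make_synthetic_activations(200, planted, +1, rng)
false_acts = make_synthetic_activations(200, planted, -1, rng)

# Difference of class means gives an estimate of the direction.
mu_true = mean_vector(true_acts)
mu_false = mean_vector(false_acts)
estimated = unit([a - b for a, b in zip(mu_true, mu_false)])
midpoint = [(a + b) / 2 for a, b in zip(mu_true, mu_false)]


def score(x):
    """Signed projection of x onto the estimated direction,
    measured from the midpoint between the two class means."""
    return dot([xi - mi for xi, mi in zip(x, midpoint)], estimated)


# How well the estimate matches the planted direction.
cosine = dot(estimated, planted)

# Fraction of points whose sign of projection matches their label.
# Measured on the same points used for the estimate, so it is optimistic.
correct = sum(score(x) > 0 for x in true_acts) + sum(score(x) < 0 for x in false_acts)
accuracy = correct / (len(true_acts) + len(false_acts))

print(f"cosine(estimated, planted) = {cosine:.3f}")
print(f"sign-of-projection accuracy = {accuracy:.3f}")
```

If the concept really is encoded linearly, a single projection like `score` should separate the two classes. With real model activations, the direction would need to be validated on held-out statements rather than on the data used to estimate it.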
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (2)
concept
- Linear representation · related_to · The idea that features are encoded as directions in activation space.
- Truth Direction in LLM Latent Space · associated_with · A specific direction in an LLM's residual stream that encodes the truth or falsehood of factual statements.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior.
- The finding that interpretable concepts, including character traits, are encoded as linear directions in transformer residual streams.
- The sequential, continuous order of text, often challenged by diagrammatic branching.
- Semantic domain for linear transformations; denotation as actual linear function; Category instance generated from homomorphism principle.
- Correlative technique measuring the type of information encoded in distributed representations via linear predictability.
- The idea that programs can be expressed as logical sentences, enabling direct deductive verification.
- Research thread within About Blank concerning the structure and relational properties of neural network feature representations; covariance pooling tangentially supports this thread.
- Method of optimizing an input so that a neuron fires maximally, used to characterize what the neuron detects; establishes a causal link.