Linear Representation of Features

The central object of study: the idea that a concept such as truth is encoded as a direction in an LLM's latent space.

Neighborhood — ranked by edge-count

paper

concept

Linear representation
related_to
The idea that features are encoded as directions in activation space.
Truth Direction in LLM Latent Space
associated_with
A specific direction in an LLM's residual stream that encodes the truth or falsehood of factual statements
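
To make "encoded as a direction" concrete, here is a minimal sketch in Python with NumPy. It assumes a difference-of-means estimate, which is one common way to obtain such a direction and is not confirmed by this page as the method used. The activations are synthetic stand-ins for residual-stream vectors, and the names `estimate_direction`, `project`, and `d_model` are hypothetical.

```python
import numpy as np

def estimate_direction(acts_pos, acts_neg):
    """Unit vector pointing from the mean of acts_neg to the mean of acts_pos."""
    diff = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
    return diff / np.linalg.norm(diff)

def project(acts, direction):
    """Scalar coordinate of each activation along the direction."""
    return acts @ direction

# Synthetic data (an assumption for illustration, not real model activations):
# two Gaussian clouds separated along the first coordinate axis.
rng = np.random.default_rng(0)
d_model = 64
planted = np.zeros(d_model)
planted[0] = 1.0
acts_true = rng.normal(size=(200, d_model)) + 3.0 * planted
acts_false = rng.normal(size=(200, d_model)) - 3.0 * planted

direction = estimate_direction(acts_true, acts_false)
scores_true = project(acts_true, direction)
scores_false = project(acts_false, direction)

print("mean projection (true): ", scores_true.mean())
print("mean projection (false):", scores_false.mean())
```

If the feature really is linear, the sign and size of the projection should separate the two classes. With real activations, the separation would need to be checked on held-out statements.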

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Linear Representation Hypothesis · framework · 0.821
The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior.
Linear Representation of Concepts in LLMs · concept · 0.812
The finding that interpretable concepts including character traits are encoded as linear directions in transformer residual streams.
linearity · concept · 0.780
The sequential, continuous order of text, often challenged by diagrammatic branching.
Linear Map (a ⊸ b) · framework · 0.771
Semantic domain for linear transformations; denotation as actual linear function; Category instance generated from homomorphism principle.
Linear Decoding · method · 0.771
Correlative technique measuring the type of information encoded in distributed representations via linear predictability.
logical representation of programs · concept · 0.770
The idea that programs can be expressed as logical sentences, enabling direct deductive verification.
Geometry of features · concept · 0.767
Research thread within About Blank concerning the structure and relational properties of neural network feature representations; covariance pooling tangentially supports this thread.
Feature Visualization · method · 0.766
Method of optimizing input to cause a neuron to fire maximally, used to characterize what a neuron detects; establishes causal link
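
The selection rule for this list (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter over entity embeddings. This is an illustrative sketch only: the page does not say how embeddings are computed or stored, so the toy 2-D vectors, the `untyped_neighbors` helper, and the `typed_edges` set are all hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Entities with cosine >= threshold to `target` and no typed edge to it,
    sorted by descending similarity."""
    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue  # skip the entity itself and already-linked entities
        score = cosine_similarity(embeddings[target], vec)
        if score >= threshold:
            candidates.append((name, score))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)

# Toy embeddings (assumed for illustration, not taken from the page).
embeddings = {
    "A": np.array([1.0, 0.0]),
    "B": np.array([0.9, 0.1]),   # close to A, no typed edge
    "C": np.array([0.0, 1.0]),   # orthogonal to A, below threshold
    "D": np.array([1.0, 0.2]),   # close to A, but already has a typed edge
}
typed_edges = {"D"}

result = untyped_neighbors("A", embeddings, typed_edges)
for name, score in result:
    print(f"{name} · {score:.3f}")
```

A high score here indicates only that two descriptions are embedded close together. As the list shows, that can surface unrelated senses of "linear", so each candidate still needs manual review before an edge is added or a duplicate is merged.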