Orthonormal Basis Vectors

The set of mutually orthogonal unit vectors that span the concept cone, each of which independently causally mediates the target behavior
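As a small illustration of the "mutually orthogonal unit vectors" part of this definition, the sketch below checks orthonormality through the Gram matrix and builds an orthonormal set from arbitrary vectors with a QR decomposition. The function names, the 16-dimensional toy space, and the use of QR are assumptions made for illustration; this is not the discovery procedure itself, and it says nothing about the causal-mediation property.

```python
import numpy as np

def is_orthonormal(basis: np.ndarray, atol: float = 1e-6) -> bool:
    """Rows of `basis` are the candidate vectors; orthonormal iff B @ B.T == I."""
    gram = basis @ basis.T
    return bool(np.allclose(gram, np.eye(basis.shape[0]), atol=atol))

def orthonormalize(vectors: np.ndarray) -> np.ndarray:
    """Return orthonormal rows spanning the same subspace as the rows of `vectors`.

    Assumes the rows are linearly independent (hypothetical helper).
    """
    q, _ = np.linalg.qr(vectors.T)  # reduced QR: columns of q are orthonormal
    return q.T

# Toy data: 3 random directions in a 16-dimensional activation space (made up).
rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 16))
basis = orthonormalize(raw)
```

Here `is_orthonormal(basis)` should hold while `is_orthonormal(raw)` should not, since random Gaussian vectors are neither unit length nor exactly orthogonal.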

Neighborhood — ranked by edge-count

method

Loss-Guided Concept Cone Discovery
introduces
Optimization procedure that learns orthonormal basis vectors satisfying causal truth and retention constraints via composite loss

concept

Truth Subspace
associated_with
The multi-dimensional activation subspace whose directions causally mediate truthful behavior in LLMs

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Random vector baseline · method · 0.740
Baseline method sampling a random vector as feature direction for comparison with learned methods
Sense Vectors · concept · 0.724
Vectors acquired during pretraining in Backpack LMs that have a multiplication effect on model generation
Change-of-Basis for Neural Representations · concept · 0.723
Key insight that rotating a neural representation to a non-standard basis can reveal distributed causal structure invisible in standard neuron-aligned basis.
Persona Vectors (Chen et al.) · framework · 0.712
Prior framework for monitoring and controlling character traits in LLMs via activation directions; this paper extends it to 275 roles
Individual cone basis vectors may correspond to interpretable semantic facets of truth such as temporal facts, geographic facts, or commonsense · hypothesis · 0.708
Future direction hypothesis for giving semantic meaning to individual axes
steering vectors · concept · 0.705
A method for modifying model behavior by adding perturbation vectors to activations, used here to try to reduce eval awareness.
Residual Activation Vectors · concept · 0.703
Layer-40 activations with the component explained by compressed Gemini embeddings subtracted, isolating information not driven by surface text content
What semantic labels correspond to the individual basis vectors of the truth cone? · question · 0.701
Central open question for future work on interpretability of cone axes
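
The cosine ≥ 0.65 cutoff used for the untyped neighbors listed above could be sketched roughly as follows. This is a minimal illustration only: the function name, the dictionary layout, and the two-dimensional toy embeddings are all hypothetical, and nothing here is taken from the system that actually produced the scores.

```python
import numpy as np

def cosine_neighbors(query, candidates, threshold=0.65):
    """Return (name, cosine) pairs at or above `threshold`, highest first.

    `candidates` maps an entity name to its embedding (hypothetical layout).
    """
    names = list(candidates)
    mat = np.stack([np.asarray(candidates[n], dtype=float) for n in names])
    q = np.asarray(query, dtype=float)
    sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q))
    ranked = sorted(zip(names, sims.tolist()), key=lambda p: p[1], reverse=True)
    return [(n, s) for n, s in ranked if s >= threshold]

# Toy embeddings, invented for the example.
toy = {
    "near": [1.0, 0.5],   # cosine with the query is about 0.894
    "far": [0.0, 1.0],    # cosine with the query is 0.0
    "diag": [1.0, 1.0],   # cosine with the query is about 0.707
}
hits = cosine_neighbors([1.0, 0.0], toy)
```

With these toy values only "near" and "diag" clear the threshold, in that order; "far" is dropped.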