concept
active
concept:orthonormal-basis-vectors
Orthonormal Basis Vectors
The set of mutually orthogonal unit vectors that span the concept cone, each independently causally mediating target behavior
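As an illustration of the "mutually orthogonal unit vectors" property only, the sketch below orthonormalizes a few made-up direction vectors with Gram-Schmidt and checks the result. Gram-Schmidt is a stand-in here, not the loss-guided procedure that learns the actual basis. The names `gram_schmidt`, `is_orthonormal`, and `raw_directions` are hypothetical, and the check says nothing about whether a direction causally mediates behavior.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list spanning the same space as `vectors`.

    Illustrative stand-in only: the entry's basis is learned by optimization,
    not produced by Gram-Schmidt.
    """
    basis = []
    for v in vectors:
        w = list(v)
        # Remove the components of w that lie along the vectors already kept.
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        # Skip vectors that are (numerically) linearly dependent on the basis.
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def is_orthonormal(basis, tol=1e-9):
    """True if every pair has inner product 0 and every vector has norm 1."""
    for i, u in enumerate(basis):
        for j, v in enumerate(basis):
            target = 1.0 if i == j else 0.0
            if abs(dot(u, v) - target) > tol:
                return False
    return True


# Made-up 4-dimensional directions, purely for demonstration.
raw_directions = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0],
]
basis = gram_schmidt(raw_directions)
```

With these inputs, `is_orthonormal(basis)` holds while `is_orthonormal(raw_directions)` does not, since the raw vectors are neither unit length nor mutually orthogonal.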
Neighborhood — ranked by edge-count
Methods (1)
method
- Loss-Guided Concept Cone Discovery · introduces · Optimization procedure that learns orthonormal basis vectors satisfying causal truth and retention constraints via composite loss
Concepts (1)
concept
- Truth Subspace · associated_with · The multi-dimensional activation subspace whose directions causally mediate truthful behavior in LLMs
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Baseline method sampling a random vector as feature direction for comparison with learned methods
- Vectors acquired during pretraining in Backpack LMs that have a multiplication effect on model generation
- Key insight that rotating a neural representation to a non-standard basis can reveal distributed causal structure invisible in standard neuron-aligned basis.
- Prior framework for monitoring and controlling character traits in LLMs via activation directions; this paper extends it to 275 roles
- Future direction hypothesis for giving semantic meaning to individual axes
- A method for modifying model behavior by adding perturbation vectors to activations, used here to try to reduce eval awareness.
- Layer-40 activations with the component explained by compressed Gemini embeddings subtracted, isolating information not driven by surface text content
- Central open question for future work on interpretability of cone axes