Concept Cones

The central framework that this paper extends from refusal to propositional truth. It identifies multi-dimensional subspaces that causally mediate target behaviors.

Neighborhood — ranked by edge-count

Papers (1)

paper

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
cites · extends

Methods (4)

method

Loss-Guided Concept Cone Discovery
implements
Optimization procedure that learns orthonormal basis vectors satisfying causal truth and retention constraints via a composite loss
Activation Addition
uses
Intervention method that adds a learned direction vector to residual stream activations to steer model behavior
Directional Ablation
uses
Intervention method that removes a direction from residual stream activations to disrupt corresponding behavior
Monte Carlo Cone Sampling
uses
Procedure for sampling 64 random nonnegative combinations of cone basis vectors to evaluate the full cone distribution
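
The three interventions listed above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the `alpha` scale, the half-normal draw for the nonnegative coefficients, and the unit-normalization of sampled directions are all assumptions made for illustration. Only the sample count of 64 and the nonnegative-combination idea come from the descriptions above.

```python
import numpy as np

def activation_addition(h, direction, alpha=1.0):
    """Add a scaled direction to activations h of shape (..., d).

    `alpha` is a hypothetical steering strength, not a value from the source.
    """
    return h + alpha * direction

def directional_ablation(h, direction):
    """Remove the component of h that lies along `direction`."""
    r = direction / np.linalg.norm(direction)   # unit vector
    return h - (h @ r)[..., None] * r           # project out r

def sample_cone(basis, n_samples=64, seed=None):
    """Draw random nonnegative combinations of the cone basis vectors.

    basis: array of shape (k, d) whose rows are the basis vectors.
    Coefficients are |N(0, 1)| draws (an assumed distribution); each
    sampled direction is rescaled to unit norm (also an assumption).
    """
    rng = np.random.default_rng(seed)
    coeffs = np.abs(rng.standard_normal((n_samples, basis.shape[0])))
    vecs = coeffs @ basis
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
```

Each sampled direction can then be passed to `activation_addition` or `directional_ablation` in turn, so that an evaluation covers the whole cone rather than only its basis vectors.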

Concepts (1)

concept

Truth Subspace
introduces
The multi-dimensional activation subspace whose directions causally mediate truthful behavior in LLMs

Frameworks (1)

framework

Linear Representation Hypothesis
extends
The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

concept geometry · concept · 0.782
The spatial/geometric organization of conceptual structure within neural network representations; central to the paper's thesis.
Concept · concept · 0.759
Central entity of Jackson's framework: a structure invented to give a coherent account of the immediate consequences of actions; the building block of software design.
Concept Steering · method · 0.751
Latent intervention technique that manipulates sparse features to steer model predictions toward desired concepts.
concept vector · concept · 0.744
Computed directional vector in activation space representing a specific concept, used for injection experiments
Concept Lattice · concept · 0.743
Central algebraic structure in FCA that orders formal concepts and preserves information from formal contexts.
concept representation · concept · 0.742
How a neural network encodes a semantic concept internally, argued to be better captured by manifolds than by atomic features.
Concept Algebra · framework · 0.741
Probabilistic framework formalizing concept-specific subspaces for targeted steering in generative models.
Concept-specific period · concept · 0.739
The natural cycle length of a cyclic concept (e.g., 12 for months, 7 for days of the week)
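
The selection rule behind this list (cosine similarity of at least 0.65 and no typed edge to this entity) can be sketched as follows. The embedding vectors, the function name, and the dictionary-based inputs are hypothetical; the page does not say how the embeddings are produced or stored.

```python
import numpy as np

def related_by_similarity(target, candidates, typed_neighbors, threshold=0.65):
    """Return (name, cosine) pairs for untyped neighbors above the threshold.

    target: embedding of this entity, shape (d,).
    candidates: dict mapping entity name -> embedding of shape (d,).
    typed_neighbors: set of names that already have a typed edge.
    """
    t = target / np.linalg.norm(target)
    hits = []
    for name, vec in candidates.items():
        if name in typed_neighbors:
            continue                      # already linked by a typed edge
        cos = float(vec @ t / np.linalg.norm(vec))
        if cos >= threshold:
            hits.append((name, cos))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```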