Concept Cones
ID: framework:concept-cones · Type: framework · Status: active
The central framework that this paper extends from refusal to propositional truth. It identifies multi-dimensional subspaces that causally mediate target behaviors.
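A minimal numpy sketch, not the paper's code: a concept cone can be represented by k orthonormal basis vectors in the d-dimensional residual stream, with the cone being every nonnegative combination of them. This matches the orthonormal basis and the nonnegative-combination sampling listed under Methods. The function names `orthonormal_basis` and `in_cone` and the tolerance value are illustrative assumptions.

```python
import numpy as np

def orthonormal_basis(raw: np.ndarray) -> np.ndarray:
    """Turn k raw directions (rows, shape (k, d)) into k orthonormal rows via QR.

    Hypothetical helper: QR may flip the sign of a direction, so the cone is
    defined by the returned orthonormal rows, not by the raw input rows.
    """
    q, _ = np.linalg.qr(raw.T)  # reduced QR: q has shape (d, k) with orthonormal columns
    return q.T

def in_cone(basis: np.ndarray, v: np.ndarray, tol: float = 1e-6) -> bool:
    """True if v is a nonnegative combination of the orthonormal basis rows."""
    coeffs = basis @ v              # coefficients are unique because rows are orthonormal
    residual = v - coeffs @ basis   # component of v outside the span of the basis
    inside_span = np.linalg.norm(residual) <= tol * max(1.0, np.linalg.norm(v))
    return bool(inside_span and np.all(coeffs >= -tol))
```

For example, with `B = orthonormal_basis(rng.normal(size=(3, 16)))`, the vector `0.5 * B[0] + 2.0 * B[2]` lies in the cone, while `-B[1]` lies in the span but outside the cone because one coefficient is negative.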
Neighborhood (ranked by edge count)
Papers (1)
Methods (4)
- Loss-Guided Concept Cone Discovery (implements): Optimization procedure that learns orthonormal basis vectors satisfying causal truth and retention constraints via a composite loss
- Intervention method that adds a learned direction vector to residual stream activations to steer model behavior
- Intervention method that removes a direction from residual stream activations to disrupt corresponding behavior
- Procedure for sampling 64 random nonnegative combinations of cone basis vectors to evaluate the full cone distribution
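The three interventions listed above can be sketched on plain activation arrays as follows. This is an illustrative sketch, not the paper's implementation: the function names are hypothetical, the activations are a numpy array of shape `(..., d)` rather than tensors captured by model hooks, and the coefficient distribution for cone sampling (Uniform[0, 1), then normalized to unit length) is an assumption, since only the sample count of 64 is given.

```python
import numpy as np

def add_direction(h: np.ndarray, direction: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Activation addition: add alpha * direction to every residual-stream vector in h (..., d)."""
    return h + alpha * direction

def ablate_direction(h: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Directional ablation: project out the component of h along the normalized direction."""
    r = direction / np.linalg.norm(direction)
    proj = np.expand_dims(h @ r, axis=-1)  # scalar projection per vector, kept broadcastable
    return h - proj * r

def sample_cone_directions(basis: np.ndarray, n_samples: int = 64, rng=None) -> np.ndarray:
    """Draw n_samples unit vectors as random nonnegative combinations of basis rows (k, d).

    ASSUMPTION: coefficients ~ Uniform[0, 1); the source specifies only the count (64).
    """
    rng = np.random.default_rng() if rng is None else rng
    coeffs = rng.uniform(0.0, 1.0, size=(n_samples, basis.shape[0]))
    dirs = coeffs @ basis
    return dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
```

To evaluate the whole cone rather than only its basis vectors, one would apply `add_direction` or `ablate_direction` with each sampled direction and aggregate the resulting behavior metric over all 64 samples. Ablation here removes a single direction; removing a whole multi-dimensional subspace would subtract the projection onto every basis vector instead.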
Concepts (1)
- Truth Subspace (introduces): The multi-dimensional activation subspace whose directions causally mediate truthful behavior in LLMs
Frameworks (1)
- The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior
Related by similarity (8)
Cosine ≥ 0.65 · no typed edge. Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The spatial/geometric organization of conceptual structure within neural network representations; central to the paper's thesis.
- Central entity of Jackson's framework: a structure invented to give a coherent account of the immediate consequences of actions; the building block of software design
- Latent intervention technique that manipulates sparse features to steer model predictions toward desired concepts.
- Computed directional vector in activation space representing a specific concept, used for injection experiments
- Central algebraic structure in FCA that orders formal concepts and preserves information from formal contexts.
- How a neural network encodes a semantic concept internally, argued to be better captured by manifolds than by atomic features.
- Probabilistic framework formalizing concept-specific subspaces for targeted steering in generative models.
- The natural cycle length of a cyclic concept (e.g., 12 for months, 7 for days of the week)