framework
active
framework:concept-cones

Concept Cones

The central framework this paper extends from refusal to propositional truth; identifies multi-dimensional subspaces that causally mediate target behaviors

Neighborhood — ranked by edge-count

Methods (4)

method
  • Optimization procedure that learns orthonormal basis vectors satisfying causal truth and retention constraints via composite loss
  • Intervention method that adds a learned direction vector to residual stream activations to steer model behavior
  • Intervention method that removes a direction from residual stream activations to disrupt corresponding behavior
  • Procedure for sampling 64 random nonnegative combinations of cone basis vectors to evaluate the full cone distribution
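The two interventions and the cone-sampling procedure listed above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation. The function names, the uniform coefficient distribution, the unit-length normalization, the scaling factor `alpha`, and the toy QR-derived basis are all assumptions made for the example. Only the count of 64 nonnegative combinations comes from the entry itself.

```python
import numpy as np

def add_direction(acts, direction, alpha=1.0):
    """Activation addition: shift every activation by alpha times the unit direction."""
    unit = direction / np.linalg.norm(direction)
    return acts + alpha * unit

def ablate_direction(acts, direction):
    """Directional ablation: subtract each activation's component along the direction."""
    unit = direction / np.linalg.norm(direction)
    return acts - np.outer(acts @ unit, unit)

def sample_cone(basis, n_samples=64, rng=None):
    """Draw unit vectors as random nonnegative combinations of the basis rows.

    Assumption: coefficients are uniform on [0, 1); the source does not
    state which distribution is used.
    """
    rng = np.random.default_rng() if rng is None else rng
    coeffs = rng.uniform(0.0, 1.0, size=(n_samples, basis.shape[0]))
    samples = coeffs @ basis
    return samples / np.linalg.norm(samples, axis=1, keepdims=True)

# Toy setup (hypothetical sizes): a 3-dimensional cone in a 16-dimensional space.
rng = np.random.default_rng(0)
d_model, k = 16, 3
# Orthonormal stand-in for a learned cone basis, shape (k, d_model).
basis = np.linalg.qr(rng.normal(size=(d_model, k)))[0].T
acts = rng.normal(size=(5, d_model))  # stand-in for residual stream activations

directions = sample_cone(basis, n_samples=64, rng=rng)
ablated = ablate_direction(acts, directions[0])
steered = add_direction(acts, directions[0], alpha=4.0)
```

After ablation, the activations have no remaining component along the chosen direction, while addition shifts that component by exactly `alpha`. Evaluating all 64 sampled directions, rather than only the basis vectors, is what tests the cone as a whole.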

Concepts (1)

concept
  • Truth Subspace
    introduces
    The multi-dimensional activation subspace whose directions causally mediate truthful behavior in LLMs

Frameworks (1)

framework
  • The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • concept geometry · concept · 0.782
    The spatial/geometric organization of conceptual structure within neural network representations; central to the paper's thesis.
  • Concept · concept · 0.759
    Central entity of Jackson's framework: a structure invented to give a coherent account of the immediate consequences of actions; the building block of software design
  • Concept Steering · method · 0.751
    Latent intervention technique that manipulates sparse features to steer model predictions toward desired concepts.
  • concept vector · concept · 0.744
    Computed directional vector in activation space representing a specific concept, used for injection experiments
  • Concept Lattice · concept · 0.743
    Central algebraic structure in FCA that orders formal concepts and preserves information from formal contexts.
  • How a neural network encodes a semantic concept internally, argued to be better captured by manifolds than by atomic features.
  • Concept Algebra · framework · 0.741
    Probabilistic framework formalizing concept-specific subspaces for targeted steering in generative models.
  • The natural cycle length of a cyclic concept (e.g., 12 for months, 7 for days of the week)
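
The selection rule for this list (cosine ≥ 0.65, no typed edge) could be sketched as below. This is a hedged illustration only: the function name, the dictionary of candidate embeddings, and the toy vectors are hypothetical, and the card does not say which embedding model produces the scores.

```python
import numpy as np

def similar_without_edge(query, candidates, typed_edges, threshold=0.65):
    """Return (name, cosine) pairs at or above the threshold that lack a typed edge."""
    q = query / np.linalg.norm(query)
    hits = []
    for name, vec in candidates.items():
        cos = float(q @ (vec / np.linalg.norm(vec)))
        if cos >= threshold and name not in typed_edges:
            hits.append((name, cos))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy embeddings (hypothetical): "c" is similar but already has a typed edge.
query = np.array([1.0, 0.0])
candidates = {
    "a": np.array([1.0, 0.5]),
    "b": np.array([0.0, 1.0]),
    "c": np.array([1.0, 1.0]),
}
hits = similar_without_edge(query, candidates, typed_edges={"c"})
```

Here only "a" survives: "b" falls below the threshold and "c" is excluded because it already has a typed relation.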