method
active
method:loss-guided-concept-cone-discovery
Loss-Guided Concept Cone Discovery
Optimization procedure that learns orthonormal basis vectors satisfying causal truth and retention constraints via a composite loss.
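The orthonormality requirement in the description above can be illustrated with a minimal sketch. It assumes the candidate directions are re-orthonormalized with modified Gram-Schmidt after an update step; the source does not say how orthonormality is enforced, and the names `orthonormalize` and `gram` are hypothetical helpers introduced only for this example.

```python
import math


def orthonormalize(vectors, eps=1e-12):
    """Modified Gram-Schmidt (an assumed choice, not confirmed by the source).

    Returns unit-length, mutually orthogonal vectors spanning the same
    subspace as the inputs. Raises ValueError on linearly dependent input.
    """
    basis = []
    for v in vectors:
        w = list(v)
        # Remove the component of w along every basis vector found so far.
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm < eps:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def gram(basis):
    """Matrix of pairwise dot products; the identity for an orthonormal set."""
    return [[sum(a * b for a, b in zip(u, v)) for v in basis] for u in basis]
```

Calling `gram(orthonormalize(vectors))` should give a matrix close to the identity, which is a quick check that each learned direction is a unit vector orthogonal to the others.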
Neighborhood — ranked by edge-count
Papers (1)
paper
Frameworks (1)
framework
- Concept Cones · implements · The central framework this paper extends from refusal to propositional truth; identifies multi-dimensional subspaces that causally mediate target behaviors
Concepts (3)
concept
- Orthonormal Basis Vectors · introduces · The set of mutually orthogonal unit vectors that span the concept cone, each independently causally mediating target behavior
- Implementation technique zeroing all logits except Yes/No tokens to convert steering into binary cross-entropy
- Regularization component of the composite loss that penalizes deviation from baseline model behavior on Alpaca instructions
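The last two concepts above can be sketched together. This is an illustration under stated assumptions, not the paper's implementation: "zeroing all logits except Yes/No" is read here as restricting the softmax to the two answer tokens, the retention penalty is assumed to be a KL divergence between baseline and steered next-token distributions (the source only says it penalizes deviation from baseline behavior), and the weight `lam` and all function names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def yes_no_bce(logits, yes_id, no_id, target_yes):
    """Binary cross-entropy using only the Yes and No logits.

    A softmax over two tokens equals a sigmoid of their logit difference,
    so every other vocabulary entry drops out of the loss.
    """
    diff = logits[yes_id] - logits[no_id]
    # -log p(Yes) = softplus(-diff); -log p(No) = softplus(diff)
    return softplus(-diff) if target_yes else softplus(diff)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def retention_kl(baseline_logits, steered_logits):
    """KL(baseline || steered) over the full vocabulary (assumed penalty form)."""
    p = softmax(baseline_logits)
    q = softmax(steered_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def composite_loss(truth_loss, retention_loss, lam=1.0):
    """Truth term plus weighted retention term; lam is a hypothetical weight."""
    return truth_loss + lam * retention_loss
```

In this sketch the truth term would be computed on true/false statements under steering, and the retention term on instruction prompts such as the Alpaca set named above, so that steering changes the Yes/No answer without shifting unrelated behavior.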
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Concept cone truth interventions would generalize to larger frontier models and multimodal settings · hypothesis · 0.702 · Key robustness question raised as future work
- Concept defining self by the spatiotemporal scale and nature of goals a system can pursue; limits of concern demarcate identity.
- Concept cone methodology failed to produce a meaningful cone for sentiment on Stanford Sentiment Treebank · finding · 0.692 · Negative result from sentiment extension showing concept cones do not trivially generalize
- The open problem of assigning interpretable semantic meaning to individual cone basis vectors (e.g., temporal vs geographic facts)
- Future direction hypothesis for giving semantic meaning to individual axes
- Extrapolation of scaling predictive models to AGI.
- Central open question for future work on interpretability of cone axes