Loss-Guided Concept Cone Discovery

Optimization procedure that learns orthonormal basis vectors satisfying causal truth and retention constraints via a composite loss

Neighborhood — ranked by edge-count

Papers (1)

paper

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
introduces

Frameworks (1)

framework

Concept Cones
implements
The central framework this paper extends from refusal to propositional truth; identifies multi-dimensional subspaces that causally mediate target behaviors

Concepts (3)

concept

Orthonormal Basis Vectors
introduces
The set of mutually orthogonal unit vectors that span the concept cone, each independently causally mediating target behavior
Binary Generation Constraint
uses
Implementation technique that zeroes all logits except those of the Yes/No tokens, so that the steering objective reduces to binary cross-entropy
L_retain Loss Term
uses
Regularization component of the composite loss that penalizes deviation from baseline model behavior on Alpaca instructions
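
A minimal sketch of the Binary Generation Constraint described above, under stated assumptions: the function name `binary_constrained_loss`, the token ids `yes_id` and `no_id`, and the toy logit vector are hypothetical and not taken from the paper. The sketch restricts the softmax to the Yes and No logits (equivalent to masking every other token out of the distribution), which turns the next-token loss into a two-class cross-entropy.

```python
import math

def binary_constrained_loss(logits, yes_id, no_id, label_is_yes):
    """Cross-entropy computed over the Yes/No logits only.

    Assumption: every other vocabulary entry is excluded from the
    softmax, so the loss depends only on logits[yes_id] and logits[no_id].
    """
    z_yes, z_no = logits[yes_id], logits[no_id]
    # Log-sum-exp over the two kept logits, shifted for numerical stability.
    m = max(z_yes, z_no)
    log_norm = m + math.log(math.exp(z_yes - m) + math.exp(z_no - m))
    target = z_yes if label_is_yes else z_no
    # Negative log-probability of the correct answer token.
    return log_norm - target

# Toy vocabulary of four tokens; ids 1 and 2 stand in for "Yes" and "No".
toy_logits = [5.0, 1.0, 1.0, -3.0]
loss = binary_constrained_loss(toy_logits, yes_id=1, no_id=2, label_is_yes=True)
```

With equal Yes and No logits the loss is log 2 regardless of the other entries, which is the property that lets the steering objective be treated as a binary classification problem.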

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Concept cone truth interventions would generalize to larger frontier models and multimodal settings
hypothesis · 0.702
Key robustness question raised as future work
The cone is the geometrical primitive for care: multidimensional goal space where cognitive and physiological light cones interact through resonance, response, and signal.
claim · 0.700
Cognitive Light Cone
concept · 0.696
Concept defining self by the spatiotemporal scale and nature of goals a system can pursue; limits of concern demarcate identity.
Concept cone methodology failed to produce a meaningful cone for sentiment on Stanford Sentiment Treebank
finding · 0.692
Negative result from sentiment extension showing concept cones do not trivially generalize
Semantic Labeling of Cone Axes
concept · 0.685
The open problem of assigning interpretable semantic meaning to individual cone basis vectors (e.g., temporal vs geographic facts)
Individual cone basis vectors may correspond to interpretable semantic facets of truth such as temporal facts, geographic facts, or commonsense
hypothesis · 0.680
Future direction hypothesis for giving semantic meaning to individual axes
If loss keeps going down on the test set, in the limit the model must be learning to interpret and predict all patterns represented in language, including common-sense reasoning, goal-directed optimization, and deployment of the sum of recorded human knowledge.
hypothesis · 0.677
Extrapolation of scaling predictive models to AGI.
What semantic labels correspond to the individual basis vectors of the truth cone?
question · 0.676
Central open question for future work on interpretability of cone axes