hypothesis
active
hypothesis:individual-cone-basis-vectors-may-correspond-to-interpretable-semantic-facets-of-truth-such-as-temporal-facts-geographic-facts-or-commonsense
Individual cone basis vectors may correspond to interpretable semantic facets of truth such as temporal facts, geographic facts, or commonsense
Future-direction hypothesis about assigning semantic meaning to individual cone axes
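One way this hypothesis could be probed is sketched below, under stated assumptions. It assumes you already have the cone's basis vectors and activation vectors for true statements that have been hand-tagged by facet (for example "temporal", "geographic", "commonsense"). For each axis, it computes the mean projection per facet and reports which facet loads most strongly and by what margin over the runner-up. The function names (`facet_profile`, `best_facet_per_axis`) and the toy data are hypothetical illustrations, not the source paper's method.

```python
import math
from collections import defaultdict

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # Cone basis vectors need not be orthogonal or unit-length; we only
    # normalize each one so projections are comparable across axes.
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def facet_profile(basis, activations, labels):
    """Hypothetical helper: for each cone axis, the mean projection of
    activations grouped by their (assumed) facet label."""
    profile = []
    for axis in (normalize(b) for b in basis):
        sums, counts = defaultdict(float), defaultdict(int)
        for act, lab in zip(activations, labels):
            sums[lab] += dot(axis, act)
            counts[lab] += 1
        profile.append({lab: sums[lab] / counts[lab] for lab in sums})
    return profile

def best_facet_per_axis(profile):
    """Hypothetical helper: the facet with the highest mean projection on
    each axis, plus its margin over the second-highest facet."""
    result = []
    for means in profile:
        ranked = sorted(means.items(), key=lambda kv: kv[1], reverse=True)
        top, top_val = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        result.append((top, top_val - runner_up))
    return result

# Toy, made-up data: a 3-axis "cone" in a 3-d activation space.
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
activations = [
    [0.9, 0.1, 0.0], [1.1, 0.0, 0.1],   # tagged temporal
    [0.1, 1.0, 0.0], [0.0, 0.8, 0.2],   # tagged geographic
    [0.0, 0.1, 1.2], [0.2, 0.0, 0.9],   # tagged commonsense
]
labels = ["temporal"] * 2 + ["geographic"] * 2 + ["commonsense"] * 2

profile = facet_profile(basis, activations, labels)
print(best_facet_per_axis(profile))
```

A large margin on one facet per axis would be consistent with the hypothesis; on real activations one would also want held-out statements and a shuffled-label baseline before reading meaning into any axis.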
Source paper
extracted_from(2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4
Neighborhood (ranked by edge count)
Papers (1)
paper
Questions (1)
question
- Central open question for future work on interpretability of cone axes
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Load-bearing illustration of what a concept cone for truth means operationally
- Concept cone truth interventions would generalize to larger frontier models and multimodal settings (hypothesis, cosine 0.759): key robustness question raised as future work
- Validates that steering vectors capture reflection semantics by finding tokens reported in related work.
- Appendix E replication of the DIM alignment finding in a Qwen model
- Cited as activation-level support for the "performing care" vs. "having care" distinction that the battery detects behaviorally
- Fundamental theoretical claim motivating DAS, attributed to Smolensky/Rumelhart/McClelland.
- Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions (hypothesis, cosine 0.746): explanation for why dictionary learning can recover many more features than dimensions.
- Interpretive synthesis of DIM and cone intervention successes