question
active
question:do-divergent-representations-change-what-an-intervention-can-say-about-an-nn-s-natural-mechanisms
Do divergent representations change what an intervention can say about an NN's natural mechanisms?
Core research question motivating the paper
Source paper
extracted_from · (2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts
Neighborhood — ranked by edge-count
Claims (1)
claim
- Core claim about why pernicious divergence undermines mechanistic conclusions
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core empirical claim of the paper supported by both theoretical proof and empirical demonstration
- Third core research question motivating the CL loss approach in Section 5
- Load-bearing description of the core pernicious divergence mechanism illustrated in Figure 1
- Opening sentence framing the paper's core inquiry.
- Key insight that rotating a neural representation to a non-standard basis can reveal distributed causal structure invisible in standard neuron-aligned basis.
- Neural representation geometry causally shapes behavior; interventions respecting that geometry will yield natural trajectories. · hypothesis · 0.760 · Central hypothesis tested via manifold steering experiments across language models and video world models.
- Core question motivating the shift from linear to geometry-aware steering; answered via manifold alignment analysis.
- The motivating research question of the paper