finding
active
finding:intervention-on-a-balanced-subspace-dimension-while-holding-others-fixed-crosses-the-decision-boundary-using-a-non-native-mechanism
Intervention on a balanced subspace dimension while holding others fixed crosses the decision boundary using a non-native mechanism
Additional synthetic example of pernicious divergence from balanced subspaces
Source paper
extracted_from · (2025) · Satchel Grant · Simon Jerome Han · Alexa R. Tartaglini · Christopher Potts
Neighborhood — ranked by edge-count
Claims (1)
claim
- Core claim about why pernicious divergence undermines mechanistic conclusions
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Intervention targeting specific dimensional subsets of activation vectors rather than full representations
- Claim about broad impact of studying these dynamics
- Load-bearing interpretive claim about the layer-specificity of Burger et al.'s finding.
- Raises spatial division to a foundational creative gesture.
- Distinguishes between control over reading and enabling/disabling interpretative moves.
- Alexander's claim that the limiting factor in creating living structure is not method but the maker's persistence.
- The core testable hypothesis driving the experimental design
- Mechanistic interpretation of how activation steering induces deception through the model's reasoning process