method
active
method:causal-intervention-via-activation-shift

Causal Intervention via Activation Shift

Intervening in the model's forward pass by adding or subtracting the probe direction to or from group (b) hidden states in order to flip truth judgments.
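
As a rough illustration of the idea, the sketch below shifts a toy hidden state along a unit-normalized probe direction and checks whether a downstream readout changes its verdict. Everything here is an assumption made for illustration: the three-dimensional vectors, the names `shift_activation` and `judge_true`, the linear readout standing in for the model's downstream computation, and the scale `alpha` are hypothetical and are not taken from the original method.

```python
import math

def unit(v):
    """Return v scaled to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def shift_activation(hidden, direction, alpha):
    """Add alpha * unit(direction) to a hidden state; a negative alpha subtracts."""
    d = unit(direction)
    return [h + alpha * di for h, di in zip(hidden, d)]

def judge_true(hidden, readout_w):
    """Toy stand-in for the rest of the forward pass: score > 0 means 'true'."""
    return dot(hidden, readout_w) > 0

# Hypothetical values, chosen only so the example is easy to follow.
probe_direction = [0.6, -0.8, 0.0]   # assumed direction found by a linear probe
readout_w = [1.0, -0.5, 0.1]         # assumed downstream readout weights
hidden = [1.0, 0.5, 2.0]             # assumed hidden state for one statement

baseline = judge_true(hidden, readout_w)

# Subtract the probe direction to push the state toward 'false'.
shifted_down = shift_activation(hidden, probe_direction, alpha=-3.0)
flipped = judge_true(shifted_down, readout_w)

# Add the probe direction; the judgment should stay 'true' in this toy case.
shifted_up = shift_activation(hidden, probe_direction, alpha=3.0)
reinforced = judge_true(shifted_up, readout_w)

print(baseline, flipped, reinforced)
```

In a real model the shift would typically be applied to the hidden states at a chosen layer during the forward pass, and the flip would be measured on the model's actual output rather than on a hand-written readout.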

Neighborhood — ranked by edge-count

Methods (1)

method

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
  • Traditional mechanistic accounts (Danto, Chisholm, Goldman) that Juarrero critiques as resting on outdated Newtonian causality.
  • causal bypassing · concept · 0.790
    Confound where naming injected concepts reflects direct logit effects rather than metacognitive awareness, raised by Morris & Plunkett
  • The general experimental approach of intervening along geometrically-defined paths rather than single-point or linear-direction interventions
  • Core question motivating the shift from linear to geometry-aware steering; answered via manifold alignment analysis.
  • Causal Mediation · concept · 0.769
    Whether an internal direction causally controls a target behavior, verified by intervention success
  • Causal Emergence · concept · 0.767
    Core concept: degree to which an agent exerts unique predictive power on its future; key to cognition at all scales.
  • Pearson correlation of feature activations across 40M tokens used to measure feature similarity and universality across models