concept
active
concept:causal-tracing

Causal Tracing

Mechanistic interpretability technique for locating where factual associations are stored in a model; mentioned as a direction for future work.

Neighborhood — ranked by edge-count

Methods (1)

method
  • Standard method in mechanistic interpretability that intervenes on activations; VPD flips this paradigm by patching parameters.
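
A minimal sketch of the activation-intervention idea described above, on a hypothetical toy network rather than a real language model. The weights `W1` and `W2`, the inputs, and the helper names `forward` and `trace` are all assumptions made for illustration; none of them come from the source. The sketch runs the model on a clean and a corrupted input, restores one clean hidden activation at a time in the corrupted run, and reports how much of the clean output each restoration recovers.

```python
# Toy causal tracing by activation patching (illustrative assumptions only).
# Model: h = relu(W1 @ x), y = W2 . h, with hypothetical fixed weights.

W1 = [[1.0, -1.0, 0.0],
      [0.5,  0.5, 0.0],
      [0.0,  0.0, 1.0]]
W2 = [2.0, 1.0, 0.5]


def forward(x, patch=None):
    """Run the toy model; `patch` maps hidden-unit index -> forced activation."""
    h = [max(0.0, sum(w * a for w, a in zip(row, x))) for row in W1]
    if patch:
        for i, value in patch.items():
            h[i] = value  # intervene on the activation, not on the weights
    y = sum(w * a for w, a in zip(W2, h))
    return y, h


def trace(clean_x, corrupt_x):
    """Fraction of the clean-vs-corrupted output gap recovered per hidden unit."""
    y_clean, h_clean = forward(clean_x)
    y_corrupt, _ = forward(corrupt_x)
    gap = y_clean - y_corrupt
    scores = []
    for i in range(len(h_clean)):
        # Corrupted run with a single hidden unit restored to its clean value.
        y_patched, _ = forward(corrupt_x, patch={i: h_clean[i]})
        scores.append((y_patched - y_corrupt) / gap)
    return scores


clean_x = [2.0, 1.0, 1.0]
corrupt_x = [0.0, 1.0, 1.0]  # first input feature corrupted
recovery = trace(clean_x, corrupt_x)
for unit, score in enumerate(recovery):
    print(f"hidden unit {unit}: recovered {score:.2f} of the effect")
```

In this toy setup the unit with the highest recovery score is the one that mediates most of the corrupted feature's effect on the output. A parameter-patching approach such as the one attributed to VPD above would instead swap entries of the weights and leave the activations to be recomputed.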

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The structural-realist grounding for self-evidencing after the bounded self is relinquished.
  • Causal abstraction · concept · 0.812
    A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs.
  • Causal power · concept · 0.804
    The ability of an agent to be a driver of subsequent events; a hallmark of cognition that causal emergence quantifies.
  • Causal Scrubbing · method · 0.804
    Method by Chan et al. 2022 for rigorously testing interpretability hypotheses via interventions.
  • Causal Geometry · framework · 0.803
    Chvykov and Hoel's geometric extension of causal emergence to continuous systems using Fisher information.
  • Causal Mediation · concept · 0.800
    Whether an internal direction causally controls a target behavior, verified by intervention success.
  • Framework informing path-specific objectives by identifying causal chains leading to risky behaviors.
  • Causal Mechanism · concept · 0.794
    Function determining the value of a variable based on its causal parents in an acyclic causal model.