claim
active
claim:adversarial-ablations-enforce-mechanistic-faithfulness

Adversarial ablations enforce mechanistic faithfulness.

Methodological claim: the adversarial ablation approach ensures that decomposed components causally correspond to the model's computation.
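As a rough illustration of why an adversarial check is stricter than ablating everything at once, here is a toy sketch. It is not the paper's procedure, which this entry does not describe. It assumes a linear layer written as a sum of component matrices, a hypothetical per-input flag marking each component as important or not, and a brute-force adversary that searches for the subset of "unimportant" components whose removal changes the output most. All names (`apply_components`, `worst_case_ablation`, `important`) are invented for the example.

```python
from itertools import combinations

def apply_components(components, keep, x):
    """Apply the sum of the kept component matrices to the input vector x."""
    y = [0.0] * len(components[0])
    for k, comp in enumerate(components):
        if not keep[k]:
            continue
        for i, row in enumerate(comp):
            y[i] += sum(w * xj for w, xj in zip(row, x))
    return y

def worst_case_ablation(components, important, x):
    """Brute-force the subset of 'unimportant' components whose ablation
    moves the output furthest (squared error) from the full model's output."""
    n = len(components)
    full = apply_components(components, [True] * n, x)
    removable = [k for k in range(n) if not important[k]]
    worst_err, worst_subset = 0.0, ()
    for r in range(1, len(removable) + 1):
        for subset in combinations(removable, r):
            keep = [k not in subset for k in range(n)]
            y = apply_components(components, keep, x)
            err = sum((a - b) ** 2 for a, b in zip(full, y))
            if err > worst_err:
                worst_err, worst_subset = err, subset
    return worst_err, worst_subset

# Toy decomposition (hypothetical): components 2 and 3 cancel each other.
components = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.5]],
    [[0.0, 0.0], [0.0, -0.5]],
]
important = [True, True, False, False]  # assumed labels under test
x = [1.0, 2.0]

err, subset = worst_case_ablation(components, important, x)
print(err, subset)
```

In this toy case, ablating components 2 and 3 together leaves the output unchanged, so a single all-at-once ablation would accept the "unimportant" labels. The adversary instead removes only one of the pair and exposes a nonzero error, which shows that the labels do not reflect what the computation actually depends on. Brute force is used here only because the example is tiny.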

Source paper

extracted_from
Interpreting Language Model Parameters
(2026) · Bushnaq, Lucius · Braun, Dan · Clive-Griffin, Oliver · Bussmann, Bart +4

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • Idea that decomposed components must causally reflect the model's computation; enforced by adversarial ablation in VPD.

Methods (1)

method
  • Technique used in VPD to enforce mechanistic faithfulness of parameter decompositions.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.