Input-Restricted Intervention

Practical restriction of interventions to those whose intervened values can be produced by running the model on actual inputs; standard in distributed alignment search (DAS) practice
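
The definition above can be illustrated with a minimal sketch, assuming a toy two-layer network. All names here (`W1`, `W2`, `hidden`, `output`, `input_restricted_intervention`) are hypothetical and do not come from the source. The hidden units are overwritten only with values those units actually take on a second, real input, never with arbitrary vectors. The sketch patches axis-aligned units for simplicity; DAS is usually described as patching a learned subspace instead.

```python
import numpy as np

# Hypothetical toy model: 3 inputs -> 4 hidden units -> 2 outputs.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # assumed first-layer weights
W2 = rng.normal(size=(2, 4))  # assumed second-layer weights


def hidden(x):
    """Hidden activations produced by an actual input x."""
    return np.tanh(W1 @ x)


def output(h):
    """Model output computed from hidden activations h."""
    return W2 @ h


def input_restricted_intervention(base, source, dims):
    """Run the model on `base`, but overwrite hidden units `dims` with the
    values those units take when the model is run on `source`.

    The patched values are always realizable: they come from a real input.
    """
    h = hidden(base).copy()
    h[dims] = hidden(source)[dims]
    return output(h)


base = np.array([1.0, 0.0, -1.0])
source = np.array([0.5, 2.0, 0.0])
patched = input_restricted_intervention(base, source, dims=[0, 1])
```

Two sanity checks follow from the construction: patching every hidden unit reproduces the output on `source`, and using `base` as its own source leaves the output unchanged. An unrestricted intervention would instead allow `h[dims]` to be set to any vector, including values no input can produce.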

Neighborhood — ranked by edge-count

concept

Distributed Abstraction
associated_with
Key notion in which the alignment map ϕ maps neurons block-wise to latent variables before constructive abstraction is applied

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Input-Injectivity · concept · 0.763
Assumption that DNN layers preserve input information by being injective; key condition for Theorem 1
linear intervention · concept · 0.754
Manipulation of activations along a straight line; shown to fail when it crosses voids, in contrast to manifold-following interventions.
Subspace Intervention · concept · 0.754
Intervention targeting specific dimensional subsets of activation vectors rather than full representations
Intervention Strength (Alpha) · concept · 0.742
Scalar parameter modulating how strongly a steering vector shifts model activations; set to 15 for Exp1 and ±16 for Exp2
Causal Intervention on Representations · concept · 0.741
The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
steering (intervention on internals) · concept · 0.739
General technique of modifying activations to control model behavior.
Parallel Intervention · concept · 0.738
Intervention mode where multiple interventions are applied simultaneously to the same base computation graph
Causal Intervention via Activation Shifting · method · 0.738
Method of shifting hidden state activations along probe directions to cause the model to treat false statements as true and vice versa; evaluated on OOD inputs