concept
active
concept:input-restricted-intervention
Input-Restricted Intervention
Practical restriction of interventions to those whose injected activation values are producible by running the model on actual inputs; standard in DAS practice
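The restriction described above can be illustrated with a minimal sketch on a toy two-unit ReLU network. Everything here is an assumption made for illustration: the network, the function names (`hidden`, `readout`, `interchange`, `unrestricted`), and the example inputs are hypothetical, and the sketch patches raw hidden coordinates rather than using any learned alignment, so it is not the DAS procedure itself. It only contrasts an intervention whose patched values come from a real source input with one that sets an arbitrary value.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def hidden(x: Sequence[float]) -> Vector:
    """Toy hidden layer (hypothetical): two ReLU units, so outputs are never negative."""
    return [max(0.0, x[0] + x[1]), max(0.0, x[0] - x[1])]


def readout(h: Sequence[float]) -> float:
    """Toy linear readout on the hidden layer (hypothetical weights)."""
    return h[0] - 2.0 * h[1]


def run(x: Sequence[float]) -> float:
    """Ordinary forward pass with no intervention."""
    return readout(hidden(x))


def interchange(base: Sequence[float], source: Sequence[float], dims: Sequence[int]) -> float:
    """Input-restricted intervention: the patched values are exactly the
    activations that the actual input `source` produces at the chosen dims."""
    h = hidden(base)
    h_src = hidden(source)
    for d in dims:
        h[d] = h_src[d]
    return readout(h)


def unrestricted(base: Sequence[float], values: Dict[int, float]) -> float:
    """Unrestricted intervention: hidden units are set to arbitrary values,
    which need not be reachable from any input."""
    h = hidden(base)
    for d, v in values.items():
        h[d] = v
    return readout(h)


if __name__ == "__main__":
    base, source = [1.0, 2.0], [2.0, 0.5]
    print(run(base))                       # no intervention
    print(interchange(base, source, [1]))  # patch unit 1 from a real input
    print(unrestricted(base, {1: -4.0}))   # -4.0 is unreachable after ReLU
```

In this toy setting, `interchange(base, source, [1])` replaces unit 1 with the value 1.5 that `source` actually produces, whereas `unrestricted(base, {1: -4.0})` injects a negative activation that no input can yield after the ReLU. Restricting to the first kind keeps every intervened state within the set of values the network realizes on some input, at least per patched coordinate.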
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Distributed Abstraction · associated_with · Key notion where the alignment map ϕ maps neurons block-wise to latent variables before constructive abstraction
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Assumption that DNN layers preserve input information by being injective; key condition for Theorem 1
- Manipulation of activations along a straight line; shown to fail when it crosses voids, in contrast to manifold-following interventions.
- Intervention targeting specific dimensional subsets of activation vectors rather than full representations
- Scalar parameter modulating how strongly a steering vector shifts model activations; set to 15 for Exp1 and ±16 for Exp2
- The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
- General technique of modifying activations to control model behavior.
- Intervention mode where multiple interventions are applied simultaneously to the same base computation graph
- Method of shifting hidden state activations along probe directions to cause the model to treat false statements as true and vice versa; evaluated on OOD inputs