concept
active
concept:input-restricted-intervention

Input-Restricted Intervention

Restriction of interventions to activation values that some actual input can produce; standard practice in distributed alignment search (DAS)
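
A minimal sketch of the idea, assuming a toy two-layer tanh network. The weights, function names, and inputs below are hypothetical and are not taken from any DAS implementation. The point it illustrates is that the patched hidden values are always read from a forward pass on a real source input, never set to arbitrary numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network; the weights are illustrative only.
W1 = rng.normal(size=(4, 3))  # input (3) -> hidden (4)
W2 = rng.normal(size=(2, 4))  # hidden (4) -> output (2)


def hidden(x):
    """Hidden activations that an actual input x produces."""
    return np.tanh(W1 @ x)


def output_from_hidden(h):
    """Run the rest of the network from a (possibly patched) hidden state."""
    return W2 @ h


def input_restricted_intervention(base, source, dims):
    """Overwrite hidden units `dims` of the base run with the values
    those units take on a real source input (assumed setup)."""
    h = hidden(base).copy()
    h[dims] = hidden(source)[dims]
    return output_from_hidden(h)


base = np.array([1.0, 0.0, -1.0])
source = np.array([0.5, 2.0, 0.0])
dims = [0, 1]

patched = input_restricted_intervention(base, source, dims)
```

By contrast, an unrestricted intervention could set a hidden unit to a value such as 5.0, which no input can produce here because tanh outputs lie strictly between -1 and 1.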

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • Key notion in which the alignment map ϕ maps neurons block-wise to latent variables before constructive abstraction

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Input-Injectivity · concept · 0.763
    Assumption that DNN layers preserve input information by being injective; key condition for Theorem 1
  • Manipulation of activations along a straight line; shown to fail when it crosses voids, in contrast to manifold-following interventions.
  • Intervention targeting specific dimensional subsets of activation vectors rather than full representations
  • Scalar parameter modulating how strongly a steering vector shifts model activations; set to 15 for Exp1 and ±16 for Exp2
  • The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
  • General technique of modifying activations to control model behavior.
  • Intervention mode where multiple interventions are applied simultaneously to the same base computation graph
  • Method of shifting hidden state activations along probe directions to cause the model to treat false statements as true and vice versa; evaluated on OOD inputs