concept
active
concept:intervention-strength-alpha
Intervention Strength (Alpha)
Scalar parameter modulating how strongly a steering vector shifts model activations; set to 15 for Exp1 and ±16 for Exp2
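A minimal sketch of how such a scalar might enter an additive steering intervention, assuming the common form h' = h + alpha * v. The function name `apply_steering`, the unit-normalization of the vector, and the toy shapes are illustrative assumptions, not details taken from the source experiments.

```python
import numpy as np

def apply_steering(hidden, direction, alpha):
    """Shift every row of `hidden` by alpha times the unit-normalized direction.

    Assumption: the steering vector is normalized to unit length first, so
    alpha directly sets the magnitude of the shift in activation space.
    """
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

# Toy activations: 2 token positions, 4 hidden dimensions (hypothetical sizes).
hidden = np.zeros((2, 4))
direction = np.array([3.0, 0.0, 4.0, 0.0])

# A positive and a negative alpha push along opposite directions,
# mirroring a symmetric setting such as +16 / -16.
steered_pos = apply_steering(hidden, direction, alpha=16.0)
steered_neg = apply_steering(hidden, direction, alpha=-16.0)
```

With a unit-normalized vector, the size of the shift at each position equals |alpha|, and the sign of alpha selects the direction of the shift.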
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Number of latent variables assigned per algorithm node in distributed abstraction; affects IIA
- Intervention targeting specific dimensional subsets of activation vectors rather than full representations
- Property that additive modifications to activations affect all downstream computations, enabling tractable behavioral control
- The goal of mechanistically-grounded, reliable control of neural network behavior via activation interventions
- Intervention mode where interventions are applied sequentially, each building on the previous one
- Spearman ρ measuring rank-order agreement between logit-based self-report and probe score; the paper's primary monotonic association metric
- Practical restriction of interventions to those producible by actual inputs; standard in DAS practice