finding
active
finding:steering-at-6-layers-strength-0-6-each-total-3-6-outperforms-single-layer-steering-at-equivalent-total-strength-for-type-hint-suppression

Steering at 6 layers (strength 0.6 each, total 3.6) outperforms single-layer steering at equivalent total strength for type hint suppression

Demonstrates that distributing steering across several layers is more effective, and damages accuracy less, than concentrating the same total strength at a single layer.
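The arithmetic behind "6 layers at 0.6 each" versus "one layer at 3.6" can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: steering is modeled as adding a scaled steering vector to the residual stream at each chosen layer, and every layer of the toy model is an identity block. The helper names `split_strength` and `steer`, the layer indices, the 24-layer depth, and the two-dimensional vector are all hypothetical and are not taken from the source paper.

```python
import math
from typing import Dict, List


def split_strength(layers: List[int], total_strength: float) -> Dict[int, float]:
    """Divide a total steering strength evenly across the chosen layers."""
    if not layers:
        raise ValueError("need at least one steering layer")
    per_layer = total_strength / len(layers)
    return {layer: per_layer for layer in layers}


def steer(hidden: List[float], vector: List[float],
          schedule: Dict[int, float], num_layers: int) -> List[float]:
    """Toy forward pass (assumption): each layer is an identity residual block,
    and at every layer listed in `schedule` we add strength * vector."""
    for layer in range(num_layers):
        strength = schedule.get(layer, 0.0)
        hidden = [h + strength * v for h, v in zip(hidden, vector)]
    return hidden


# Hypothetical layer indices; the source does not say which layers were steered.
distributed = split_strength([10, 12, 14, 16, 18, 20], total_strength=3.6)  # 0.6 per layer
concentrated = split_strength([16], total_strength=3.6)  # all 3.6 at one layer

vector = [1.0, -0.5]  # hypothetical steering direction
end_distributed = steer([0.0, 0.0], vector, distributed, num_layers=24)
end_concentrated = steer([0.0, 0.0], vector, concentrated, num_layers=24)
```

Because the toy layers are identity maps, both schedules end at the same hidden state. In a real transformer the layers between injection points are nonlinear, so the two schedules are not equivalent, which is what makes a comparison at equal total strength meaningful.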

Source paper

extracted_from
Steering Evaluation-Aware Language Models to Act Like They Are Deployed
(2025) · Hua, Tim Tian · Qin, Andrew · Marks, Samuel · Nanda, Neel

Neighborhood (ranked by edge count)

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
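The selection rule for this panel (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal sketch assuming each entity carries a dense embedding vector and typed edges are stored as (source, target) pairs checked in both directions; the function names, entity IDs, and toy vectors are hypothetical, and sorting by similarity is our own choice rather than a documented behavior of this page.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_candidates(
    target: str,
    embeddings: Dict[str, List[float]],
    typed_edges: Set[Tuple[str, str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities with cosine >= threshold to `target` that share no typed edge
    with it (in either direction), sorted from most to least similar."""
    found = []
    for other, vec in embeddings.items():
        if other == target:
            continue
        if (target, other) in typed_edges or (other, target) in typed_edges:
            continue  # already related by a typed edge, so not a candidate
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            found.append((other, score))
    return sorted(found, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy graph: "linked" is similar but already has a typed edge.
embeddings = {
    "this": [1.0, 0.0],
    "near": [0.9, 0.1],
    "linked": [1.0, 0.05],
    "far": [0.0, 1.0],
}
typed_edges = {("this", "linked")}
result = similarity_candidates("this", embeddings, typed_edges)
```

Here only `near` qualifies: `linked` is excluded by its typed edge and `far` falls below the threshold.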