finding
active
finding:concept-steering-with-target-vs-off-target-probe-area-metric-reveals-three-operational-regimes-selectively-steerable-encoded-but-entangled-non-encoded-across-sleepfm-reve-labram
Concept steering with target vs off-target probe area metric reveals three operational regimes (selectively steerable, encoded but entangled, non-encoded) across SleepFM, REVE, LaBraM.
Result categorizing concept steerability into three distinct regimes.
Source paper
extracted_from (2026) · William Lehn-Schiøler · Magnus Ruud Kjær · Rahul Thapa · M. Pedersen +9
Neighborhood — ranked by edge-count
Claims (1)
claim
- Interpretive claim summarizing the spectrum of concept steerability discovered.
Communities (3)
community
- Explores geometry of activation/behavior manifolds to enable selective, non-destructive concept interventions.
- Concepts encoded as curved manifolds and circular structures in LLM activation spaces.
- Investigates inseparability of clinical concepts (age, pathology) in EEG transformers using SAE feature analysis and steering metrics across SleepFM, REVE, LaBraM architectures.
Questions (1)
question
- Core research question driving the mechanistic investigation.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Justification for the novel metric introduced in the paper
- Key methodological insight: introspection enables a new probe validation criterion beyond conventional separation metrics
- Main empirical finding of the concept steering analysis
- Metric introduced to quantify steering selectivity by comparing the area of target and off-target concept probes.
- Supported by the instruction discovery experiments comparing steering vs. embedding baselines.
- Empirical comparison showing advantage of SAE features in low-data regime.
- Quantitative assessment of feature quality using clinical concepts across models.