framework
active
framework:seal-steerable-reasoning-calibration
SEAL (Steerable Reasoning Calibration)
Prior work that uses steering vectors to control reflection, motivated by reducing redundant self-reflection in long chain-of-thought (CoT) reasoning.
Neighborhood — ranked by edge-count
Papers (1)
paper
Thinkers (1)
thinker
- Runjin Chen (SEAL) · introduces · Author of the SEAL paper on steerable reasoning calibration using steering vectors.
Concepts (1)
concept
- The paper's central construct: a vector in LLM activation space encoding the transition between reflection levels.
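As a rough illustration of what a vector "encoding the transition between reflection levels" could look like, the sketch below builds a generic difference-of-means steering vector and adds it to a hidden state. This is an assumption for illustration only: the entry does not say how the vector is actually extracted, and the function names, the toy activations, and the `alpha` scale are all hypothetical.

```python
import numpy as np

def steering_vector(source_acts, target_acts):
    """Difference of mean activations, pointing from the source set toward the target set.

    Assumption: a generic difference-of-means recipe, not necessarily the paper's method.
    """
    return target_acts.mean(axis=0) - source_acts.mean(axis=0)

def steer(hidden, vector, alpha=1.0):
    """Shift one hidden state along the steering vector by a scale alpha."""
    return hidden + alpha * vector

# Toy stand-ins for activations collected at two reflection levels (hypothetical data).
rng = np.random.default_rng(0)
high_reflection = rng.normal(loc=1.0, size=(32, 8))
low_reflection = rng.normal(loc=-1.0, size=(32, 8))

v = steering_vector(high_reflection, low_reflection)
hidden = high_reflection[0]
steered = steer(hidden, v, alpha=0.5)
```

With a positive `alpha`, the projection of the hidden state onto `v` always increases, which is the sense in which the state is pushed toward the lower-reflection set; `alpha=0` leaves it unchanged.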
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A pair of query and key subcomponents distributed across attention heads performs syntax-boundary routing · finding · 0.732 · VPD recovers an attention algorithm for routing across syntactic boundaries, distributed across heads.
- Modifying model behavior by clamping SAE feature activations to specific values during forward pass.
- Shows alignment faking can emerge from training data information without explicit prompting
- Framework of using internal-state representations to control or steer generative models; conceptually parallel to manifold steering in language models.
- Cited regarding possibility of encoding misaligned reasoning in benign chains-of-thought
- Practical guidance for practitioners who lack ground-truth model organisms.
- The adaptive, incremental nature of living process, allowing small steps with continuous evaluation and adjustment.
- Evidence that NLA explanations bear causal relationship to model outputs; demonstrates validity of extracted representations.
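The selection rule stated for this list (cosine ≥ 0.65 and no typed edge) can be sketched as a simple filter over entity embeddings. This is a minimal sketch under stated assumptions: the entry does not describe how embeddings or edges are stored, so the `embeddings` dictionary, the `typed_edges` set, the entity IDs, and the function names are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similar_untyped(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold that have no typed edge to the target."""
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed edge.
        if other_id == target_id or other_id in typed_edges:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, round(score, 3)))
    # Highest similarity first.
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Toy embeddings (hypothetical): one near neighbor, one unrelated entity, one already-linked entity.
embeddings = {
    "seal": np.array([1.0, 0.0, 0.0]),
    "near": np.array([0.9, 0.1, 0.0]),
    "far": np.array([0.0, 1.0, 0.0]),
    "linked": np.array([1.0, 0.0, 0.0]),
}
result = similar_untyped("seal", embeddings, typed_edges={"linked"})
```

In this toy data only `near` is returned: `far` falls below the threshold and `linked` is excluded because it already has a typed edge, even though its similarity is maximal.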