concept
active
concept:llm-self-correction
LLM Self-Correction
Related capability where LLMs correct their own outputs, studied via linear representations.
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The capacity of Kimi K2.5 to evaluate its own internal emotional state when steered, used as a novel interpretability signal
- The core phenomenon studied: the ability of LLMs to evaluate and revise their own reasoning.
- Framework by Lee et al. explaining self-correction via linear latent concept directions, closely related prior work.
- The ability of LLMs to monitor and evaluate their own reasoning, closely related to reflection.
- Technique using internal model representations as feedback loops to steer diffusion-based materials generation toward target properties.
- Alternative data attribution approach using an LLM as a judge; compared against the probe-based method.
- Tendency for models to get lost in roleplay or doom spirals, mitigated by expanded awareness.
- The underlying mechanism of self-reflection in reasoning LLMs is not yet well understood · question · 0.758 · Broad gap motivating the entire paper
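
The "Related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It is not this site's implementation: the function names, the embedding dictionary, and the representation of typed edges as a set of ID pairs are all hypothetical, chosen only to show how such a filter might work.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs that clear the similarity threshold
    but have no typed edge to the target, highest score first.

    embeddings:  dict of entity id -> vector (hypothetical storage format)
    typed_edges: set of (id, id) pairs, treated as undirected (an assumption)
    """
    target = embeddings[target_id]
    hits = []
    for entity_id, vector in embeddings.items():
        if entity_id == target_id:
            continue
        # Skip entities already linked to the target by a typed relation.
        if (target_id, entity_id) in typed_edges or (entity_id, target_id) in typed_edges:
            continue
        score = cosine(target, vector)
        if score >= threshold:
            hits.append((entity_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Entities returned by a filter like this are the ones worth reviewing by hand, either to add a typed edge or to merge as duplicates.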