LLM Self-Correction

A related capability in which LLMs correct their own outputs; it has been studied through linear representations.
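Under the linear-representation view, a concept corresponds to a direction in a model's activation space. Reading the concept is a dot product with that direction, and activation steering adds a scaled copy of the direction to a hidden state. The sketch below is a minimal illustration of that arithmetic only. The toy 4-dimensional vectors and the helper names `unit`, `project` and `steer` are hypothetical; this is not the implementation used by Lee et al. or by any particular paper.

```python
import math

def unit(v):
    """Normalize a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project(h, direction):
    """Scalar coordinate of hidden state h along a concept direction (normalized first)."""
    d = unit(direction)
    return sum(a * b for a, b in zip(h, d))

def steer(h, direction, alpha):
    """Activation steering: add alpha times the unit concept direction to h."""
    d = unit(direction)
    return [a + alpha * b for a, b in zip(h, d)]

# Hypothetical toy hidden state and concept direction (not taken from any real model).
h = [0.5, -1.0, 2.0, 0.0]
v = [1.0, 0.0, 0.0, 1.0]

before = project(h, v)
after = project(steer(h, v, alpha=3.0), v)
print(round(after - before, 6))  # 3.0
```

Because the direction is normalized, steering with strength `alpha` shifts the concept readout by exactly `alpha`. A model that "recovers" from steering would show this readout drifting back toward its unsteered value in later layers or tokens.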

Neighborhood · ranked by edge count

paper

concept

Endogenous Steering Resistance
extends
The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

LLM Introspective Self-Report · concept · 0.795
The capacity of Kimi K2.5 to evaluate its own internal emotional state when steered, used as a novel interpretability signal.
Reflection in LLMs · concept · 0.785
The core phenomenon studied: the ability of LLMs to evaluate and revise their own reasoning.
Intrinsic Self-Correction via Linear Representations · framework · 0.785
Framework by Lee et al. explaining self-correction via linear latent concept directions, closely related prior work.
LLM Meta-Cognition · concept · 0.780
The ability of LLMs to monitor and evaluate their own reasoning, closely related to reflection.
Self-Correcting Search · method · 0.779
Technique using internal model representations as feedback loops to steer diffusion-based materials generation toward target properties.
LLM-Judge Data Attribution · method · 0.766
Alternative data attribution approach using an LLM as a judge; compared against the probe-based method.
LLM psychosis · concept · 0.765
Tendency for models to get lost in roleplay or doom spirals, mitigated by expanded awareness.
The underlying mechanism of self-reflection in reasoning LLMs is not yet well understood · question · 0.758
Broad gap motivating the entire paper.
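The untyped neighborhood above applies a simple rule. It keeps entities whose embedding has cosine similarity of at least 0.65 with this one, excludes any entity that already has a typed edge to it, and lists the rest by descending score. The following is a minimal sketch of that filter. Several details are assumptions, not the graph's actual code: the entity names, the toy 3-dimensional embeddings, the function names, and the choice to treat typed edges as undirected (`frozenset` pairs).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs with cosine >= threshold and no typed edge
    to target, sorted by descending similarity (scores rounded to 3 places)."""
    t = embeddings[target]
    out = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Assumption: a typed edge in either direction excludes the candidate.
        if frozenset((target, name)) in typed_edges:
            continue
        score = cosine(t, vec)
        if score >= threshold:
            out.append((name, round(score, 3)))
    return sorted(out, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings; names and values are illustrative only.
embeddings = {
    "self_correction": [1.0, 0.0, 0.0],
    "entity_a": [0.8, 0.6, 0.0],          # cosine 0.8
    "entity_b": [0.96, 0.0, 0.28],        # cosine 0.96
    "steering_resistance": [1.0, 0.1, 0.0],  # similar, but has a typed edge
    "entity_c": [0.6, 0.8, 0.0],          # cosine 0.6, below threshold
}
typed_edges = {frozenset(("self_correction", "steering_resistance"))}

result = untyped_neighbors("self_correction", embeddings, typed_edges)
print(result)  # [('entity_b', 0.96), ('entity_a', 0.8)]
```

Entities that pass this filter with very high scores are the natural candidates for either a new typed edge or a duplicate merge, which is the purpose the list above describes.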