Reflection in LLMs

The core phenomenon studied: the ability of LLMs to evaluate and revise their own reasoning.

Neighborhood — ranked by edge-count

claim

method

Chain-of-thought prompting
associated_with
Technique by which LLMs generate intermediate reasoning steps before final output; used by ChatGPT o3.

concept

LLM Meta-Cognition
associated_with
The ability of LLMs to monitor and evaluate their own reasoning, closely related to reflection.
Situational Reflection
associated_with
The specific form of reflection studied, where a model reflects on reasoning generated by another source.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
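
The selection rule described above can be sketched in a few lines. This is an illustration only, not the pipeline that produced this page: the function names, the toy 2-D embeddings, and the edge list are all hypothetical, and the only value taken from the page is the 0.65 threshold.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that share no typed edge with target."""
    # Entities already linked to the target by any typed edge are excluded.
    linked = {b if a == target else a for a, b in typed_edges if target in (a, b)}
    scored = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            scored.append((name, score))
    # Highest similarity first, matching the ordering of the list on this page.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real entity embeddings would be high-dimensional.
embeddings = {
    "Reflection in LLMs": [1.0, 0.0],
    "LLM Meta-Cognition": [9.0, 1.0],   # similar, but already has a typed edge
    "LLM Self-Correction": [4.0, 3.0],  # similar, no typed edge
    "Unrelated entity": [0.0, 1.0],     # orthogonal to the target, below threshold
}
typed_edges = [("Reflection in LLMs", "LLM Meta-Cognition")]

candidates = untyped_neighbors("Reflection in LLMs", embeddings, typed_edges)
```

Each surviving candidate is then either linked with a new typed edge or merged as a duplicate; that decision is left to a reviewer and is not modeled here.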

The underlying mechanism of self-reflection in reasoning LLMs is not yet well understood
question · 0.823
Broad gap motivating the entire paper
what is the underlying mechanism of self-reflection in reasoning LLMs?
question · 0.822
Open question motivating the entire paper; identified as not yet well understood
Reasoning LLMs trigger reflection when their internal uncertainty is high
hypothesis · 0.801
Core hypothesis linking internal uncertainty to self-reflection behavior, tested via probing experiments
A linear reflection direction exists in reasoning LLMs' latent representation space that governs self-reflection behavior
claim · 0.794
Core claim of ReflCtrl that a single direction captures and controls reflection
Truth direction in LLMs
concept · 0.789
Linear direction in LLM activations associated with truthfulness, identified by Burns et al. 2022 and Azaria & Mitchell 2023
LLM Self-Correction
concept · 0.785
Related capability where LLMs correct their own outputs, studied via linear representations.
Linear Representation of Concepts in LLMs
concept · 0.784
The finding that interpretable concepts including character traits are encoded as linear directions in transformer residual streams
LLM Internal Representations
concept · 0.776
High-dimensional vectors produced at each transformer layer for each input token; the primary substrate analyzed in this study.
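
Several entries above refer to a linear direction in activation space. The page does not say how such a direction is obtained, so the sketch below substitutes one common estimator, the difference of class means, purely to make the idea concrete. It should not be read as ReflCtrl's method. The function names and the toy 2-D "activations" are hypothetical.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def difference_of_means(positive, negative):
    """Candidate direction: mean of positive examples minus mean of negative examples."""
    mean_pos = mean_vector(positive)
    mean_neg = mean_vector(negative)
    return [p - q for p, q in zip(mean_pos, mean_neg)]


def project(vec, direction):
    """Scalar projection of vec onto the unit-normalized direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(v * d for v, d in zip(vec, direction)) / norm


# Hypothetical toy activations; real residual-stream vectors have thousands of dimensions.
reflecting = [[2.0, 1.0], [4.0, 1.0]]      # steps where the model reflects
non_reflecting = [[0.0, 1.0], [0.0, 1.0]]  # steps where it does not

direction = difference_of_means(reflecting, non_reflecting)
score = project([5.0, 7.0], direction)
```

In this toy setup the second coordinate is identical across both groups, so the estimated direction ignores it and the projection depends only on the first coordinate.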