question
active
What are the specific attention heads or MLP neurons (circuits) responsible for self-reflection in LLMs?
Future research question about pinpointing fine-grained mechanistic components of reflection.
Source paper
extracted_from (2025) · Chang, Fu-Chieh · Lee, Yu-Ting · Wu, Pei-Yuan
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Hypothesis based on observed negative cosine similarity between input and output weights of some neurons
- Core claim of ReflCtrl that a single direction captures and controls reflection
- Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying
- The paper's reformulation of the core open question after establishing systematic self-reports
- Open question motivating the entire paper; identified as not yet well understood
- Structural finding about which attention heads control reflection behavior
- The underlying mechanism of self-reflection in reasoning LLMs is not yet well understood · question · 0.777 · Broad gap motivating the entire paper
- Key decomposition enabling separate analysis of where attention goes and what it does