question

active

question:what-are-the-specific-attention-heads-or-mlp-neurons-circuits-responsible-for-self-reflection-in-llms

What are the specific attention heads or MLP neurons (circuits) responsible for self-reflection in LLMs?

Future research question about pinpointing fine-grained mechanistic components of reflection.

Source paper

extracted_from

Unveiling the Latent Directions of Reflection in Large Language Models

(2025) · Chang, Fu-Chieh · Lee, Yu-Ting · Wu, Pei-Yuan

Neighborhood — ranked by edge-count

Papers (1)

paper

Unveiling the Latent Directions of Reflection in Large Language Models
associated_with

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Some MLP neurons and attention heads perform memory management by reading residual stream information and writing its negative to delete it · claim · 0.817
Hypothesis based on observed negative cosine similarity between input and output weights of some neurons
A linear reflection direction exists in reasoning LLMs' latent representation space that governs self-reflection behavior · claim · 0.803
Core claim of ReflCtrl that a single direction captures and controls reflection
Most attention heads in one-layer models dedicate an enormous fraction of their capacity to copying behavior · claim · 0.786
Empirical observation from examining expanded OV/QK matrices; approximately 10 out of 12 heads show significant copying
When LLMs claim consciousness under self-reference, is this sophisticated simulation or genuine self-representation, and how would we tell the difference? · question · 0.781
The paper's reformulation of the core open question after establishing systematic self-reports
What is the underlying mechanism of self-reflection in reasoning LLMs? · question · 0.780
Open question motivating the entire paper; identified as not yet well understood
Attention heads with a positive projection on the reflection direction are sparse and located mostly in the deeper layers of DeepSeek-R1-Qwen-1.5B · finding · 0.780
Structural finding about which attention heads control reflection behavior
The underlying mechanism of self-reflection in reasoning LLMs is not yet well understood · question · 0.777
Broad gap motivating the entire paper
Each attention head has two largely independent computations: a QK circuit computing the attention pattern and an OV circuit computing the effect if attended to · claim · 0.774
Key decomposition enabling separate analysis of where attention goes and what it does
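
Illustrative sketch (not taken from the source paper): one plausible first step toward the question on this page is to score each attention head by how strongly its output projects onto a candidate reflection direction, then rank the heads. The example below runs on synthetic data only. The names `head_projections` and `top_heads`, the array shapes, and the planted head at layer 3, index 1 are all assumptions made for illustration, and the paper's actual procedure may differ.

```python
import numpy as np

def head_projections(head_outputs, direction):
    """Project each head's residual-stream write onto a unit direction.

    head_outputs: array of shape (n_layers, n_heads, d_model), assumed to hold
                  each head's mean output vector (hypothetical input format).
    direction:    array of shape (d_model,), a candidate reflection direction.
    Returns an (n_layers, n_heads) array of scalar projections.
    """
    unit = direction / np.linalg.norm(direction)
    return head_outputs @ unit

def top_heads(scores, k):
    """Return the k (layer, head) pairs with the largest projection."""
    order = np.argsort(scores, axis=None)[::-1][:k]
    return [tuple(int(i) for i in np.unravel_index(idx, scores.shape))
            for idx in order]

# Synthetic stand-in for real activations: small noise everywhere,
# plus one planted head that writes strongly along the direction.
rng = np.random.default_rng(0)
n_layers, n_heads, d_model = 4, 3, 16
direction = rng.normal(size=d_model)
head_outputs = rng.normal(scale=0.01, size=(n_layers, n_heads, d_model))
head_outputs[3, 1] += 2.0 * direction / np.linalg.norm(direction)

scores = head_projections(head_outputs, direction)
print(top_heads(scores, 1))
```

On real models the per-head outputs would have to be extracted from the network's activations. A large projection is only correlational evidence, so a causal check such as ablating or patching the top-ranked heads would still be needed before calling them a circuit.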