community

active

leiden_hybrid_concepts

label: haiku

community:leiden_hybrid_concepts-run4-c0-c2-c3

Mechanistic introspection in language models

Investigates how different introspective processes activate distinct computational mechanisms at specific model depths, using layer-wise analysis.

2 members.

Drawn from 1 source

The papers/notes whose extracted claims & findings make up this cluster.

Emergent Introspective Awareness in Large Language Models (2 members)

Bridges (4)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Mechanistic interpretability & model evaluation (2 shared)
Mechanistic introspection in language models (2 shared)
LLM functional introspective awareness (1 shared)
LLM introspective awareness of injected concepts (1 shared)

Claims (1)

Different forms of introspection invoke mechanistically different processes. Based on layer-selective perturbation results.

Findings (1)

Introspective awareness peaks at a layer about two-thirds through Opus 4.1 for injected thoughts. The success rate shows a sharp peak at a specific middle layer.
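
As a rough illustration of what "peaks at a layer about two-thirds through" means in a layer sweep, the sketch below finds the layer with the highest success rate and reports its relative depth. The function name `peak_layer` and the numbers in `toy_rates` are hypothetical and chosen only for illustration; they are not results from the source, and this is not the source's analysis code.

```python
def peak_layer(success_by_layer):
    """Return (layer_index, relative_depth) for the highest success rate.

    relative_depth is 0.0 at the first layer and 1.0 at the last.
    """
    if not success_by_layer:
        raise ValueError("need at least one layer")
    n = len(success_by_layer)
    # Index of the maximum success rate (first one wins on ties).
    best = max(range(n), key=lambda i: success_by_layer[i])
    depth = best / (n - 1) if n > 1 else 0.0
    return best, depth


# Toy values for a hypothetical 7-layer sweep, NOT data from the source.
toy_rates = [0.0, 0.02, 0.05, 0.08, 0.20, 0.06, 0.01]
layer, depth = peak_layer(toy_rates)
print(f"peak at layer {layer}, relative depth {depth:.2f}")
```

A sharp peak, as opposed to a broad plateau, would show up as a maximum that is much larger than its neighbouring layers, as in the toy list above.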