community
active
leiden_hybrid_concepts
label: haiku
community:leiden_hybrid_concepts-run4-c0-c2-c6Introspective signal suppression in transformer layers
Study of how self-referential representations emerge in middle layers but are actively dampened by post-training procedures like RLHF.
2 members. Each node is clickable.
Loading graph…
Drawn from 2 sources
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (3)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Findings (2)
- Introspective signals appear in middle layers but are suppressed by later post-training-shaped layers.Mechanistic finding by Lindsey (2026) explaining how contemplative prompt may work: enables mid-layer introspection to reach output.
- Layer-dependent introspective peaksIntrospective awareness in Opus 4.1 peaks at layer ~2/3 through model depth for thought injection and text distinction; prefill detection most sensitive to earlier layer, suggesting mechanistically distinct processes.