finding
active
finding:thought-detection-peaks-at-2-3-depth-in-transformers
Thought detection peaks at ~2/3 depth in transformers
Lindsey (2026) found that thought detection accuracy is highest at about two-thirds of the way through the network's depth.
Source paper
extracted_from
Neighborhood (ranked by edge count)
Claims (1)
claim
- Interpretive claim connecting exponential path combinatorics to Lindsey's layer-dependent findings.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Thought detection peaks at ~2/3 layer depth; intention checking peaks at ~1/2 layer depth (finding · 0.871). Lindsey's (2026) differential layer performance is explained by Janus's path combinatorics: different tasks use different path distributions.
- Lindsey (2026) found that intention checking accuracy peaks around half the network depth.
- Introspective awareness of injected thoughts peaks at a layer about two-thirds of the way through Opus 4.1 (finding · 0.749). The success rate shows a sharp peak at a specific middle layer.
- Opus 4.1 never claims to detect an injected thought when none is applied (0/100 trials); production Claude models maintain an essentially zero false-positive rate.
- Janus's claim linking path redundancy to interferometric phenomenology.
- The optimal layer for the prefill introspection differs from the optimal layer for detecting injected thoughts.
- The task of detecting a model's internal thoughts; Lindsey (2026) found that performance on it peaks at ~2/3 depth in transformers.
- All models performed substantially above chance (10%) on distinguishing injected thoughts from text input (finding · 0.732). All tested models could both identify the injected concept and transcribe the input sentence well above chance.