thought detection peaks at ~2/3 depth in transformers

Lindsey (2026) found that thought detection accuracy peaks about two-thirds of the way through the network's depth.

Source paper

extracted_from

Janus Information Flow Transformers 2025

Neighborhood — ranked by edge-count

Claims (1)

claim

Different introspective tasks may preferentially use different path distributions in the transformer.
supports
Interpretive claim connecting exponential path combinatorics to Lindsey's layer-dependent findings.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Thought detection peaks at ~2/3 layer depth; intention checking peaks at ~1/2 layer depth. · finding · 0.871
Lindsey (2026) differential layer performance explained by Janus's path combinatorics — different tasks use different path distributions.
intention checking peaks at ~1/2 depth in transformers · finding · 0.857
Lindsey (2026) found that intention checking accuracy peaks around half the network depth.
Introspective awareness peaks at a layer about two-thirds through Opus 4.1 for injected thoughts · finding · 0.749
The success rate shows a sharp peak at a specific middle layer.
Production models show zero false positives on thought injection detection · finding · 0.747
Opus 4.1 never claims to detect an injected thought when none is applied (0/100 trials); production Claude models maintain an essentially zero false-positive rate.
Redundant information paths create interference patterns, so transformers likely experience memory and cognition as interferometric and continuous. · claim · 0.742
Janus's claim linking path redundancy to interferometric phenomenology.
Prefill detection effect peaks at an earlier layer (slightly over halfway through) in Opus 4.1, different from the injected-thoughts peak · finding · 0.741
The optimal layer for prefill introspection differs from the optimal layer for detecting injected thoughts.
thought detection · concept · 0.733
Task of detecting a model's internal thoughts; found by Lindsey (2026) to peak at ~2/3 depth in transformers.
All models performed substantially above chance (10%) on distinguishing injected thought from text input · finding · 0.732
All tested models could both identify the injected concept and transcribe the input sentence well above random.