thought detection

The task of detecting a model's internal thoughts. Lindsey (2026) found that performance on this task peaks at roughly 2/3 layer depth in transformers.

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Thinking steps · concept · 0.771
Segments of reasoning separated by \n\n tokens, used as the unit of analysis in ReflCtrl.
"Thoughts are thinkers" · concept · 0.770
William James aphorism cited by Levin to support the idea that thought forms possess minimal agency rather than being purely passive data.
Distinguishing thoughts from text task · method · 0.768
Task where the model must simultaneously identify an injected thought and transcribe a text sentence.
Injected thoughts task · method · 0.767
Experimental paradigm where the model is told about the possibility of thought injection and asked to report detection and identification.
Binary Detection Task · method · 0.765
Task paradigm from prior work that asks 'Did you detect an injected thought?' and scores the answer via YES/NO logit comparison; shown here to be confounded.
Attention probes for belief decoding · concept · 0.761
Consciousness Detection · concept · 0.759
Current research focus in the literature; contrasted with the need for systematic introspective processes.
Thought detection peaks at ~2/3 layer depth; intention checking peaks at ~1/2 layer depth. · finding · 0.749
The differential layer performance reported by Lindsey (2026), explained by Janus's path combinatorics: different tasks use different path distributions.