concept
active
concept:thought-detection
thought detection
Task of detecting a model's internal thoughts; Lindsey (2026) found detection performance to peak at ~2/3 layer depth in transformers.
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Segments of reasoning separated by \n\n tokens, used as the unit of analysis in ReflCtrl.
- William James aphorism cited by Levin to support the idea that thought forms possess minimal agency rather than being purely passive data.
- Task where the model must simultaneously identify an injected thought and transcribe a text sentence.
- Experimental paradigm where the model is told about the possibility of thought injection and asked to report detection and identification.
- Task paradigm from prior work asking 'Did you detect an injected thought?' via YES/NO logit comparison; shown here to be confounded.
- Current research focus in literature; contrasted with the need for systematic introspective processes.
- Thought detection peaks at ~2/3 layer depth; intention checking peaks at ~1/2 layer depth. · finding · 0.749 · Lindsey (2026) differential layer performance explained by Janus's path combinatorics: different tasks use different path distributions.
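
The YES/NO logit comparison named in the list above can be illustrated with a minimal sketch. It assumes the model's next-token logits are already available as a plain list and that the token ids for YES and NO are known; the function name, the toy logits, and the ids below are hypothetical and are not taken from any of the cited work. The sketch shows only the readout itself, which one of the entries above describes as confounded.

```python
import math


def yes_no_logit_comparison(logits, yes_id, no_id):
    """Compare the logits of the YES and NO tokens.

    Returns (margin, p_yes): margin > 0 means YES is preferred, and
    p_yes is the softmax probability restricted to the two tokens.
    """
    margin = logits[yes_id] - logits[no_id]
    # A two-way softmax over (YES, NO) reduces to a sigmoid of the margin.
    p_yes = 1.0 / (1.0 + math.exp(-margin))
    return margin, p_yes


# Hypothetical next-token logits over a toy 4-token vocabulary.
toy_logits = [0.1, 2.0, 1.0, -0.5]
YES_ID, NO_ID = 1, 2  # assumed token ids, for illustration only

margin, p_yes = yes_no_logit_comparison(toy_logits, YES_ID, NO_ID)
detected = margin > 0  # the binary "detected an injected thought" readout
```

Restricting the comparison to two tokens ignores probability mass on every other continuation, so the margin says which of the two answers is preferred but not how likely either answer is overall.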