Distinguishing thoughts from text task

Task where the model must simultaneously identify an injected thought and transcribe a text sentence.

Neighborhood — ranked by edge-count

paper

concept

Concept Injection
implements
Technique of injecting activation patterns associated with specific concepts into a model's internal states to test whether self-reports reflect ground truth.
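A minimal sketch of what such an injection could look like, assuming the concept's activation pattern is estimated as a difference of mean activations and added to one layer's hidden states. This page does not specify how the vector is derived or where it is applied, so the function names, the `strength` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def concept_vector(with_concept, baseline):
    # Assumption: the concept direction is the difference between mean
    # activations on concept prompts and on baseline prompts.
    return with_concept.mean(axis=0) - baseline.mean(axis=0)

def inject(hidden, vector, strength=4.0):
    # Add the scaled concept vector to every token position of one
    # layer's activations (shape: [tokens, d_model]).
    return hidden + strength * vector

# Toy data standing in for real model activations.
rng = np.random.default_rng(0)
d_model = 8
direction = np.zeros(d_model)
direction[0] = 1.0

baseline = rng.normal(size=(16, d_model))
with_concept = baseline + 2.0 * direction  # concept shifts one axis

v = concept_vector(with_concept, baseline)
hidden = rng.normal(size=(5, d_model))
steered = inject(hidden, v, strength=4.0)
```

Because the ground truth (which vector was injected, and how strongly) is known to the experimenter, the model's self-report can be checked against it.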

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

The ability to distinguish injected thoughts from text likely relies on different attention heads invoked by different prompt parts · claim · 0.790
Speculation about the mechanistic basis of the distinguishing thoughts from text experiment.
Injected thoughts task · method · 0.781
Experimental paradigm where the model is told about the possibility of thought injection and asked to report detection and identification.
thought detection · concept · 0.768
Task of detecting a model's internal thoughts; found by Lindsey (2026) to peak at ~2/3 depth in transformers.
Thoughts As Agents · concept · 0.761
Core assertion extending William James: thoughts are not passive but active agents that facilitate their own transformation and remapping in cognitive systems.
"Thoughts are thinkers" · concept · 0.751
William James aphorism cited by Levin to support the idea that thought forms possess minimal agency rather than being purely passive data.
If a text attempts to stand alone, it will almost certainly attract commentary or interference. · hypothesis · 0.750
Predicts the inevitability of dialogic intrusion upon any statement.
When a text purports to be in dialogue with another text, can both be themselves or are they an other entity? · question · 0.738
Thought detection peaks at ~2/3 layer depth; intention checking peaks at ~1/2 layer depth. · finding · 0.731
Lindsey (2026) differential layer performance explained by Janus's path combinatorics — different tasks use different path distributions.
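
A list like the one above could be produced by a filter along these lines: keep entities whose embedding has cosine similarity of at least 0.65 with this entity and that share no typed edge with it. This is only a sketch; the embedding model, the storage layout, and the names `untyped_neighbors`, `embeddings`, and `typed_edges` are assumptions, and the vectors are toy values rather than data from this graph.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two 1-D vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    # Return (name, score) pairs with cosine >= threshold and no typed
    # edge to `target`, sorted by score in descending order.
    results = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Toy 2-D embeddings (hypothetical entities).
embeddings = {
    "target": np.array([1.0, 0.0]),
    "typed_neighbor": np.array([12.0, 5.0]),  # similar, but already linked
    "near_untyped": np.array([4.0, 3.0]),     # cosine 0.8
    "far_untyped": np.array([3.0, 4.0]),      # cosine 0.6, below threshold
}
typed_edges = {"typed_neighbor"}

candidates = untyped_neighbors("target", embeddings, typed_edges)
```

Here only `near_untyped` survives: `typed_neighbor` is excluded by its existing edge and `far_untyped` falls below the threshold.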