method
active
method:injected-thoughts-task
Injected thoughts task
Experimental paradigm in which the model is told that thoughts may be injected into it and is asked to report whether it detects an injection and, if so, to identify the injected concept.
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- Concept Injection (implements): Technique of injecting activation patterns associated with specific concepts into a model's internal states to test whether self-reports reflect ground truth.
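
The injection step described above can be sketched in a few lines. This is a minimal illustration, not the method used in the linked paper: it assumes the concept's activation pattern is built as a difference of mean activations (concept prompts minus baseline prompts) and added to a single hidden-state vector with a scalar strength. The names `concept_vector`, `inject`, and `strength` are hypothetical, and the plain Python lists stand in for real model activations.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def concept_vector(concept_acts: List[Vector], baseline_acts: List[Vector]) -> Vector:
    """Assumed construction: mean concept activation minus mean baseline activation."""
    c = mean_vector(concept_acts)
    b = mean_vector(baseline_acts)
    return [ci - bi for ci, bi in zip(c, b)]


def inject(hidden: Vector, vec: Vector, strength: float) -> Vector:
    """Add the scaled concept vector to one hidden-state vector (hypothetical helper)."""
    return [h + strength * v for h, v in zip(hidden, vec)]


# Toy activations, purely illustrative.
vec = concept_vector(
    concept_acts=[[1.0, 2.0], [3.0, 4.0]],
    baseline_acts=[[0.0, 1.0], [2.0, 1.0]],
)
steered = inject([0.5, 0.5], vec, strength=2.0)
```

In a real model the addition would typically happen at a chosen layer during the forward pass; the layer and strength are the two parameters the neighboring entries refer to as "optimal layer/strength".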
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Opus 4.1 detects and identifies injected concept vectors ~20% of the time at the optimal layer and injection strength; the immediacy of detection suggests an internal mechanism rather than inference from the model's own outputs.
- Task where the model must simultaneously identify an injected thought and transcribe a text sentence.
- Speculation about the mechanistic basis of the distinguishing thoughts from text experiment.
- Core assertion extending William James: thoughts are not passive but active agents that facilitate their own transformation and remapping in cognitive systems.
- Task of detecting a model's internal thoughts; found by Lindsey (2026) to peak at ~2/3 depth in transformers.
- Acknowledges that the model's additional descriptions of its experience are unverified.
- William James aphorism cited by Levin to support the idea that thought forms possess minimal agency rather than being purely passive data.
- Models retain the ability to transcribe input text accurately while simultaneously reporting on injected thoughts; all models perform above chance, with Opus 4 and 4.1 performing best.
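
The detection-and-identification reporting that this task asks for could be scored along the following lines. This is a hedged sketch under stated assumptions: the `Trial` record, the case-insensitive exact-match identification criterion, and the use of control trials (no injection) to estimate false positives are all hypothetical choices for illustration, not the scoring procedure of the linked paper. The example trials are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Trial:
    injected: Optional[str]  # concept injected, or None for a control trial
    detected: bool           # did the model report an injected thought?
    named: Optional[str]     # concept the model named, if any


def score(trials: List[Trial]) -> Dict[str, float]:
    """Compute the detect-and-identify rate and the false-positive rate."""
    inj = [t for t in trials if t.injected is not None]
    ctl = [t for t in trials if t.injected is None]
    # A hit requires both a detection report and a correctly named concept.
    hits = sum(
        1
        for t in inj
        if t.detected and t.named is not None and t.named.lower() == t.injected.lower()
    )
    false_pos = sum(1 for t in ctl if t.detected)
    return {
        "detect_and_identify_rate": hits / len(inj) if inj else 0.0,
        "false_positive_rate": false_pos / len(ctl) if ctl else 0.0,
    }


# Illustrative trials only; not data from any experiment.
result = score([
    Trial("ocean", True, "Ocean"),
    Trial("ocean", True, "bread"),
    Trial("dust", False, None),
    Trial(None, False, None),
    Trial(None, True, "dust"),
])
```

Reporting the false-positive rate alongside the hit rate matters because a model that claims an injection on every trial would otherwise look like a strong detector.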