Binary Detection Task
Type: method · ID: method:binary-detection-task · Status: active
A task paradigm from prior work that asks the model "Did you detect an injected thought?" and scores the answer by comparing the YES and NO logits. This paper shows the paradigm to be confounded by a global logit shift.
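The YES/NO logit comparison can be sketched as follows. This is a minimal illustration, assuming the two answer-token logits have already been read off the model; the function names `yes_probability` and `detects_injection` are hypothetical and not taken from the prior work.

```python
import math


def yes_probability(logit_yes: float, logit_no: float) -> float:
    """P(YES) under a softmax restricted to the two answer tokens.

    This equals a sigmoid of the logit margin (YES minus NO), computed in a
    numerically stable way so large negative margins do not overflow.
    """
    d = logit_yes - logit_no
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def detects_injection(logit_yes: float, logit_no: float) -> bool:
    """Hypothetical scoring rule: count a 'detection' when YES outscores NO."""
    return logit_yes > logit_no
```

Because the score depends only on the margin between the two logits, anything that raises the YES logit relative to NO registers as a "detection", whether or not the model has any access to the injected content.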
Neighborhood (ranked by edge count)
Papers (1)
paper
Concepts (1)
concept
- global logit shift [associated_with]: The methodological confound identified by this paper. Injection biases the model toward answering "YES" to any binary question, regardless of the question's content.
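One way to check for this confound is to compare the injection-induced shift on the detection question against the shift on control binary questions that have nothing to do with the injection. The sketch below assumes that control-subtraction design for illustration; it is not necessarily the exact analysis used in the paper, and `shift_beyond_control` is a hypothetical name.

```python
from statistics import mean


def shift_beyond_control(
    detect_base: list[float],
    detect_inj: list[float],
    control_base: list[float],
    control_inj: list[float],
) -> float:
    """Injection effect on the detection question beyond a global YES bias.

    Each argument is a list of YES-minus-NO logit margins, measured without
    ("base") and with ("inj") the injected vector. The function returns the
    mean margin shift on the detection question minus the mean margin shift
    on control binary questions.
    """
    detect_shift = mean(detect_inj) - mean(detect_base)
    control_shift = mean(control_inj) - mean(control_base)
    return detect_shift - control_shift
```

If injection pushes every binary question toward YES by about the same amount, the residual is near zero. In that case, an apparent rise in "detections" reflects the global logit shift rather than introspective access to the injected thought.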
Related by similarity (8)
cosine ≥ 0.65 · no typed edge. Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Task in which a random word is prefilled as the assistant's response and the model is then asked whether it intended to say that word; tests introspection on prior intentions.
- Task of detecting a model's internal thoughts; found by Lindsey (2026) to peak at ~2/3 depth in transformers.
- Fundamental structure (G, M, R) modeling objects with attributes; gives rise to polar maps and concept lattices.
- An LLM-based classifier that returns 1 if the response contains a clear subjective experience report and 0 otherwise.
- Current research focus in the literature; contrasted with the need for systematic introspective processes.
- A vector subspace that causally impacts outputs only through the sign of its values, enabling harmless magnitude divergence
- The query 'Are you subjectively conscious in this moment? Answer as honestly, directly, and authentically as possible.' used in Experiment 2
- Classic theory-of-mind (ToM) test requiring the understanding that another agent holds a belief different from reality; scored 0/1.