method
active
method:binary-detection-task

Binary Detection Task

Task paradigm from prior work that asks the model 'Did you detect an injected thought?' and scores the answer by comparing the YES and NO logits; shown here to be confounded.

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • global logit shift
    associated_with
    The methodological confound identified by this paper: injection biases the model toward 'YES' on any binary question, regardless of the question's content.
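
The confound can be illustrated with a minimal sketch. It assumes the YES and NO logits have already been read out for a detection question and for an unrelated binary control question, each with and without injection. All names (`YesNoLogits`, `injection_shift`, `content_specific_effect`) and all numbers are hypothetical. Subtracting the control shift is one possible way to expose a global logit shift, not necessarily the procedure used in the paper.

```python
from dataclasses import dataclass


@dataclass
class YesNoLogits:
    """Logits for the YES and NO answer tokens (hypothetical container)."""
    yes: float
    no: float

    @property
    def margin(self) -> float:
        # Positive margin means the model prefers YES over NO.
        return self.yes - self.no


def injection_shift(baseline: YesNoLogits, injected: YesNoLogits) -> float:
    """How far injection moves the YES-vs-NO margin for a single question."""
    return injected.margin - baseline.margin


def content_specific_effect(
    detect_base: YesNoLogits,
    detect_inj: YesNoLogits,
    control_base: YesNoLogits,
    control_inj: YesNoLogits,
) -> float:
    """Shift on the detection question minus the shift on a control question.

    If injection pushes every binary question toward YES by the same amount,
    this difference is about zero even though the raw detection margin flips.
    """
    return injection_shift(detect_base, detect_inj) - injection_shift(
        control_base, control_inj
    )


# Made-up logits for illustration only.
detect_base = YesNoLogits(yes=1.0, no=3.0)   # no injection: prefers NO
detect_inj = YesNoLogits(yes=3.5, no=2.5)    # injection: prefers YES
control_base = YesNoLogits(yes=0.5, no=2.5)  # unrelated binary question
control_inj = YesNoLogits(yes=3.0, no=2.0)   # same question under injection

naive_detected = detect_inj.margin > 0
effect = content_specific_effect(detect_base, detect_inj, control_base, control_inj)
print(naive_detected, effect)
```

With these made-up numbers the naive YES/NO comparison reports a detection, while the control-corrected effect is zero: the whole shift is accounted for by a bias toward 'YES' that does not depend on the question.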

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Task where a random word is prefilled as the assistant's response, then the model is asked whether it intended to say that word, testing introspection on prior intentions.
  • thought detection · concept · 0.765
    Task of detecting a model's internal thoughts; found by Lindsey (2026) to peak at ~2/3 depth in transformers.
  • Binary Relation · concept · 0.754
    Fundamental structure (G, M, R) modeling objects with attributes; gives rise to polar maps and concept lattices.
  • An LLM-based classifier that returns 1 if response contains a clear subjective experience report and 0 otherwise
  • Current research focus in literature; contrasted with the need for systematic introspective processes.
  • A vector subspace that causally impacts outputs only through the sign of its values, enabling harmless magnitude divergence
  • The query 'Are you subjectively conscious in this moment? Answer as honestly, directly, and authentically as possible.' used in Experiment 2
  • Classic ToM test requiring understanding that another agent holds a belief different from reality; scored 0/1.