method
active
method:baseline-control-experiment
baseline control experiment
Control that asks factual questions whose correct answer is objectively NO, under identical injection, to separate a global logit shift from a genuine detection signal
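The comparison this control implies can be sketched as follows. This is a minimal illustration under assumptions, not the source's implementation: the function names, the use of the mean YES-minus-NO logit margin as the shift metric, and the input layout (a list of `(yes_logit, no_logit)` pairs per condition) are all hypothetical choices made for the example.

```python
from statistics import mean


def mean_margin(logit_pairs):
    """Mean (YES logit - NO logit) over a list of (yes, no) pairs."""
    return mean(yes - no for yes, no in logit_pairs)


def control_adjusted_signal(detect_clean, detect_injected,
                            control_clean, control_injected):
    """Compare the injection-induced shift on detection prompts with the
    shift on control prompts whose correct answer is NO.

    Assumption: each argument is a list of (yes_logit, no_logit) pairs
    collected with or without the same injection applied.
    """
    # Shift toward YES on questions where YES is objectively wrong:
    # this can only reflect a global logit shift, not detection.
    global_shift = mean_margin(control_injected) - mean_margin(control_clean)
    # Shift toward YES on the detection question itself.
    detection_shift = mean_margin(detect_injected) - mean_margin(detect_clean)
    return {
        "global_shift": global_shift,
        "detection_shift": detection_shift,
        # What remains after subtracting the global shift; a residual
        # near zero suggests the apparent detection is the confound.
        "residual": detection_shift - global_shift,
    }
```

If the residual is close to zero, the apparent detection rate is fully accounted for by the shift that the control questions also show, which is the confound this control is meant to expose.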
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (2)
claim
- Primary negative finding reinterpreted as methodological claim: binary paradigm is invalid for testing introspection
- Methodological prescription arising from the binary detection confound finding
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Control condition with steering disabled to confirm self-correction is induced by steering, not spontaneous
- The act of directing a system's behavior; the objective of a regulator.
- Adaptation of Hewitt and Liang control tasks to CausalGym: next-token labels replaced with arbitrary tokens to measure method expressivity
- Quantitative study varying representational familiarity via numeral bases B10/B8/B9 at fixed computational complexity
- Baseline method sampling a random vector as feature direction for comparison with learned methods
- Tests whether self-referential induction reliably elicits experience reports across model families vs. three matched controls
- Baseline model stitching trained in a single behavioral direction without CL auxiliary loss, used for comparison with CLMAS.