claim
active
claim:the-detection-of-an-injected-concept-requires-an-extra-step-of-internal-processing-downstream-of-metacognitive-recognition
The detection of an injected concept requires an extra step of internal processing downstream of metacognitive recognition
The model must register an anomaly before reporting it.
Source paper
extracted_from (2026) · Lindsey, Jack
Neighborhood — ranked by edge-count
Communities (4)
community
- Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
- Probing Claude and other models for internal detection of artificially injected thoughts across layers.
- Probing early detection of model confidence during chain-of-thought reasoning to optimize inference efficiency and identify confabulation patterns.
- Studies how models distinguish artificially injected concepts from natural text inputs, examining metacognitive recognition and downstream processing mechanisms.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Acknowledges that the model's additional descriptions of its experience are unverified.
- Observation from alternative prompts that detection is weaker without setup.
- Key discriminating question motivating the baseline control experiment.
- Speculation about the mechanistic basis of the "distinguishing thoughts from text" experiment.
- Models retain the ability to accurately transcribe input text while simultaneously reporting on injected thoughts; all models perform above chance, with Opus 4/4.1 performing best.
- Canonical illustration of the Hard Problem intuition that any functional/mechanical explanation faces an explanatory gap for perception.
- Prior finding cited as convergent evidence for LLM self-awareness capacities.