claim
active
claim:the-ability-to-distinguish-injected-thoughts-from-text-likely-relies-on-different-attention-heads-invoked-by-different-prompt-parts

The ability to distinguish injected thoughts from text likely relies on different attention heads being invoked by different parts of the prompt

Speculation about the mechanistic basis of the experiment on distinguishing injected thoughts from text inputs.

Source paper

extracted_from
Emergent Introspective Awareness in Large Language Models
(2026) · Lindsey, Jack
