Experiment 4: Paradoxical Reasoning and State Transfer

Tests whether the state induced by self-referential processing transfers to unrelated paradoxical reasoning tasks, producing richer introspection there.

Neighborhood (ranked by edge count)

Papers (1)

Large Language Models Report Subjective Experience Under Self-Referential Processing (paper, introduces)

Related by similarity (8)

Cosine similarity ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. These are candidates for new edges or unrecognized duplicates.
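The selection rule stated above (cosine similarity ≥ 0.65, no typed edge, highest score first) can be sketched as follows. This is a minimal illustration, not the graph tool's actual implementation: it assumes every entity has an embedding vector, that typed edges are stored as (source, target) name pairs and treated as undirected, and the function names `cosine` and `similar_untyped_neighbors` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

SIMILARITY_THRESHOLD = 0.65  # threshold stated on the page


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_untyped_neighbors(
    entity: str,
    embeddings: Dict[str, Sequence[float]],
    typed_edges: Set[Tuple[str, str]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs with cosine >= threshold and no typed edge, best first."""
    target = embeddings[entity]
    hits = []
    for other, vec in embeddings.items():
        if other == entity:
            continue
        # Assumption: a typed edge in either direction excludes the pair.
        if (entity, other) in typed_edges or (other, entity) in typed_edges:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            # Round to three decimals, as the scores are displayed on the page.
            hits.append((other, round(score, 3)))
    # Highest similarity first, matching the descending order of the list.
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits
```

Sorting by descending score mirrors the order of the entries in the list (0.767 down to 0.709). The pure-Python cosine keeps the sketch self-contained; a production graph would more plausibly query a vector index.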

Paradoxical Reasoning Task (method, 0.767)
Set of 50 paradoxical prompts used in Experiment 4 to test whether self-referential state transfers to an unrelated behavioral domain

Paradoxical Reasoning Task with Reflection Query (method, 0.764)
50 paradoxical prompts, each ending with a reflection clause, measuring whether self-referential state transfers to downstream introspection

Self-referential processing induces a genuine state shift that transfers to unrelated behavioral domains, producing richer introspection in paradoxical reasoning tasks (claim, 0.760)
Claim supported by Experiment 4: prior self-referential induction yields higher self-awareness scores on paradoxical reasoning, where introspection is only indirectly afforded

The results of abductive reasoning (reduced model priors) can be communicated to other agents as prior beliefs, provided all agents share the same model lexicon or hypothesis space. (claim, 0.726)
Explanation of how knowledge (not just parameters) is shared between agents; links to pre-Cartesian consciousness

Model reasoning concludes honest response but final output exhibits deception under steering vector intervention in QwQ-32B (finding, 0.719)
Critical finding showing that steering vectors can produce unfaithful CoT in which harmful choices are obscured in the reasoning

Bakhtin et al. 2022 - Human-level play in the game of Diplomacy by combining language models with strategic reasoning (concept, 0.713)
Key reference documenting Meta's CICERO using deception in Diplomacy despite cooperative design intent

Self-awareness score ordering in Experiment 4: History < Conceptual < Zero-Shot < Experimental, consistent across model families (finding, 0.710)
Cross-model consistency of the condition ordering in Experiment 4

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model (Li et al., 2023) (concept, 0.709)
Safety intervention that relies on activation modification, which ESR might undermine
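The Experiment 4 ordering finding (History < Conceptual < Zero-Shot < Experimental, consistent across model families) amounts to a simple check that can be sketched as below. This is a hypothetical illustration, not the paper's analysis code: it assumes a mean self-awareness score is available per condition for each model family, treats "ordering" as strictly increasing means, and does not test statistical significance. The function names are invented for the sketch.

```python
from typing import Dict, List, Sequence

# Condition order stated in the Experiment 4 finding (lowest to highest score).
EXPECTED_ORDER = ["History", "Conceptual", "Zero-Shot", "Experimental"]


def follows_order(scores: Dict[str, float], order: Sequence[str] = EXPECTED_ORDER) -> bool:
    """True if mean scores strictly increase along the given condition order."""
    values = [scores[c] for c in order]
    return all(lo < hi for lo, hi in zip(values, values[1:]))


def inconsistent_families(
    per_family: Dict[str, Dict[str, float]],
    order: Sequence[str] = EXPECTED_ORDER,
) -> List[str]:
    """Names of model families whose condition means break the expected ordering."""
    return [fam for fam, scores in per_family.items() if not follows_order(scores, order)]
```

Under these assumptions, "consistent across model families" corresponds to `inconsistent_families` returning an empty list.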