finding

active

finding:baseline-llm-condition-in-ipd-replicates-prior-findings-agents-cooperate-selectively-only-when-opponent-consistently-cooperates

Baseline LLM condition in IPD replicates prior findings: agents cooperate selectively only when opponent consistently cooperates

Replication of Fontana et al. 2025 findings in the paper's own Experiment 2 baseline condition

Source paper

extracted_from

Contemplative Agent

(2025) · Ruben Laukkonen · Fionn Inglis · Shamil Chandaria · Lars Sandved-Smith +4

Neighborhood — ranked by edge-count

Concepts (1)

concept

Contemplative Artificial Intelligence (Laukkonen et al., 2025)
associated_with
The primary source paper proposing four contemplative principles for AI alignment and piloting them empirically

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

All prompting techniques led to full cooperation against Always Cooperate opponents in IPD · finding · 0.815
Ceiling finding in IPD experiment; baseline sufficient when opponent always cooperates
Two heuristic code agents outperform most tested LLMs, and behavioural traces surface recurring LLM failure modes including overbidding, self-bidding, bankrupt TC initiation, and weak opponent-state adaptation. · quote · 0.782
Abstract sentence summarising performance and failures.
Behavioural traces surface recurring LLM failure modes including overbidding, self-bidding, bankrupt TC initiation, and weak opponent-state adaptation that never appear in code agents. · claim · 0.782
LLMs exhibit systematic errors that deterministic logic avoids.
Deceptive RL baseline agents have lower mean neural self-other overlap than honest baseline agents · claim · 0.778
Core empirical prediction tested in RL experiments, confirmed by 100% classification accuracy
LLM personality self-reports are illusory: post-training alignment creates stable human-like reports dissociated from actual behavior (Han et al. 2025) · claim · 0.777
Skeptical prior work motivating the need to validate self-reports against internal states rather than taking them at face value
Truthful behavior in LLMs is not confined to a single linear axis; multiple orthogonal directions can independently mediate it · claim · 0.776
Central interpretive claim of the paper
The systematic behavioral shift of LLMs under self-referential processing conditions predicted by consciousness theories represents something more structured than superficial correlations in training data · claim · 0.776
The paper's claim that theoretical convergence across GWT, RPT, HOT, IIT makes the findings non-coincidental
LLMs can predict their own responses more accurately than external observers, implying privileged internal knowledge · finding · 0.772
Binder et al. finding cited as evidence that LLMs possess introspective capacity analogous to mindfulness
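
The "cosine ≥ 0.65 · no typed edge" filter behind the list above can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that typed edges are stored as unordered pairs of entity ids. The function names, entity ids, and toy vectors are hypothetical and are not taken from the system that produced this page; only the 0.65 threshold and the descending ranking come from the page itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_u * norm_v)


def similar_without_edge(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold that have
    no typed edge to the target, ranked by descending similarity."""
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        # Skip entities already linked to the target by a typed edge.
        if frozenset((target_id, other_id)) in typed_edges:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: ids and 2-d vectors are illustrative only.
embeddings = {
    "finding_a": [1.0, 0.0],
    "finding_b": [0.8, 0.6],   # similar to finding_a, no typed edge
    "claim_c": [0.0, 1.0],     # orthogonal, falls below the threshold
    "concept_d": [1.0, 0.0],   # identical direction, but already linked
}
typed_edges = {frozenset(("finding_a", "concept_d"))}

result = similar_without_edge("finding_a", embeddings, typed_edges)
print(result)
```

With this toy data only `finding_b` is returned: `concept_d` is excluded because it already has a typed edge to the target, and `claim_c` falls below the threshold.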