finding
active
Cross-model pairwise cosine similarity of zero-shot control responses = 0.603 (n=12,720 pairs, t=35.1, p=4.3×10⁻²⁶² vs. experimental)
Experiment 3 comparison: zero-shot control shows lower semantic convergence than experimental condition
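As a rough illustration of the statistic reported above, the sketch below computes cosine similarity for every cross-model pair of response embeddings and compares two conditions with a t statistic. The embedding inputs, the function names, and the choice of Welch's unequal-variance t-test are all assumptions made for illustration; the source does not state which embedding model or which t-test variant was used.

```python
import math
from itertools import combinations
from statistics import mean, variance


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_model_similarities(embeddings_by_model):
    """Cosine similarity for every pair of responses from two different models.

    embeddings_by_model: dict mapping a model name to a list of embedding
    vectors (hypothetical input layout, not taken from the paper).
    """
    sims = []
    for m1, m2 in combinations(sorted(embeddings_by_model), 2):
        for u in embeddings_by_model[m1]:
            for v in embeddings_by_model[m2]:
                sims.append(cosine(u, v))
    return sims


def welch_t(a, b):
    """Welch's t statistic for two samples (assumed test, see lead-in)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Toy data only: two hypothetical models, 2-dimensional embeddings.
control = {"model_a": [[1.0, 0.0], [0.0, 1.0]], "model_b": [[1.0, 1.0]]}
control_sims = cross_model_similarities(control)
t_stat = welch_t([0.8, 0.9, 1.0], [0.5, 0.6, 0.7])
```

With real data, the mean of the control similarities would correspond to the 0.603 figure, and the t statistic would compare the experimental-condition similarities against the control ones.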
Source paper
extracted_from · (2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd
Neighborhood — ranked by edge-count
Claims (1)
claim
- Interpretive claim from Experiment 3; GPT, Claude, Gemini families converge on similar descriptive style despite independent training
Hypotheses (1)
hypothesis
- Hypothesis tested in Experiment 3; independently trained GPT, Claude, Gemini architectures converge on similar descriptive vocabulary
Concepts (1)
concept
- Sycophantic Roleplay · contradicts · The alternative explanation for LLM consciousness claims, which the paper seeks to distinguish its results from
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Core result of Experiment 3: cross-model semantic convergence under self-referential processing
- Controls for probe artifacts; demonstrates self-reports carry information specifically about probe-defined concept directions
- Shows trait space has more cross-model consistency than role space beyond PC1
- Appendix E replication of DIM alignment finding in Qwen model
- Shows persona space axes are inherited from pre-training, not solely created by post-training
- Models produce first-attempt mean scores 87.8-91.8/100 without steering across all model families · finding · 0.758 · Establishes high baseline quality confirming steering-induced degradation is the experimental signal
- Mechanistic evidence that network actively attenuates injected perturbations, explaining late-layer introspection failure
- Validates robustness of alignment metric choice
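The selection rule for the "Related by similarity" list (cosine ≥ 0.65 and no typed edge) can be sketched as a simple filter. The function name, the input layout, and the entity identifiers below are hypothetical and only illustrate the rule as stated on this page, not the actual implementation behind it.

```python
def untyped_neighbors(similarities, typed_edge_ids, threshold=0.65):
    """Return (entity_id, similarity) pairs that pass the similarity
    threshold and have no typed relation to the focal entity, ordered
    from most to least similar.

    similarities: dict mapping entity id -> cosine similarity to the
    focal entity (hypothetical layout).
    typed_edge_ids: set of entity ids that already share a typed edge.
    """
    candidates = [
        (entity_id, score)
        for entity_id, score in similarities.items()
        if score >= threshold and entity_id not in typed_edge_ids
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy example with made-up identifiers.
scores = {"a": 0.758, "b": 0.60, "c": 0.70, "d": 0.90}
related = untyped_neighbors(scores, typed_edge_ids={"d"})
```

Here "d" is excluded despite its high similarity because it already has a typed edge, and "b" is excluded because it falls below the threshold.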