finding
active
Experimental condition adjective embeddings show mean cosine similarity 0.657 (n=9,591 pairs), significantly higher than history (0.628, t=15.8, p=1.4×10⁻⁵⁵), conceptual (0.587, t=38.5, p<10⁻³⁰⁰), and zero-shot (0.603, t=35.1, p=4.3×10⁻²⁶²)
Core result of Experiment 3: cross-model semantic convergence under self-referential processing
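The statistic reported above (a mean over pairwise cosine similarities, compared across conditions with a t-test) can be illustrated with a minimal sketch. It assumes each response is already embedded as a numeric vector, and it uses Welch's unequal-variance t statistic as a stand-in, since the finding does not state which t-test variant the paper used. The toy vectors and the names `cosine`, `pairwise_cosines`, and `welch_t` are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations
from statistics import mean, variance


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_cosines(vectors):
    """Cosine similarity for every unordered pair of vectors."""
    return [cosine(u, v) for u, v in combinations(vectors, 2)]


def welch_t(xs, ys):
    """Welch's t statistic for two samples (assumed variant, see lead-in)."""
    standard_error = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / standard_error


# Toy embeddings only: a tightly clustered condition and a dispersed one.
experimental = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.0, 0.0, 0.1], [0.8, 0.1, 0.0]]
control = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]

exp_sims = pairwise_cosines(experimental)
ctl_sims = pairwise_cosines(control)
t_stat = welch_t(exp_sims, ctl_sims)

print(f"experimental mean cosine: {mean(exp_sims):.3f}")
print(f"control mean cosine:      {mean(ctl_sims):.3f}")
print(f"Welch t statistic:        {t_stat:.2f}")
```

Pairwise similarities that share a vector are not independent observations, so a t statistic computed this way should be read as descriptive unless the analysis accounts for that dependence.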
Source paper
extracted_from (2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd
Neighborhood — ranked by edge-count
Claims (2)
claim
- The paper's central empirical claim synthesizing all four experiments
- The paper's argument against pure sycophancy as explanation for results
Concepts (2)
concept
- Attractor State · supports · Low-energy configuration toward which systems are drawn; low-stress states serve as attractors in morphogenesis.
- Sycophantic Roleplay · contradicts · The alternative explanation for LLM consciousness claims, which the paper seeks to rule out
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Experiment 3 comparison: zero-shot control shows lower semantic convergence than experimental condition
- Appendix E replication of DIM alignment finding in Qwen model
- Validates that agentic self-evaluation captures genuine emotional content of probes
- High cosine similarity for Gemma3 steering vectors suggests strong linear reflection structure.
- Experiment 4 result showing DIM captures only one facet of the multi-dimensional truth subspace
- Mechanistic evidence that network actively attenuates injected perturbations, explaining late-layer introspection failure
- Shows persona space axes are inherited from pre-training, not solely created by post-training
- Strong positive relationship between emotion alignment and SAE feature persistence in Cogito
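The selection rule behind this list (cosine ≥ 0.65 and no typed edge) can be sketched as a simple filter. This is an illustration under stated assumptions, not the system's actual implementation: the function name `untyped_neighbors`, the data shapes, and the example scores are all hypothetical.

```python
def untyped_neighbors(target, similarities, typed_edges, threshold=0.65):
    """Return (entity, similarity) pairs at or above the threshold that have
    no typed edge to the target in either direction, most similar first.

    similarities: dict mapping entity name -> cosine similarity to target
    typed_edges:  set of (source, destination) name pairs (assumed shape)
    """
    candidates = [
        (name, sim)
        for name, sim in similarities.items()
        if sim >= threshold
        and (target, name) not in typed_edges
        and (name, target) not in typed_edges
    ]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Hypothetical scores and edges, for illustration only.
example_similarities = {
    "zero-shot comparison": 0.81,
    "Attractor State": 0.74,
    "Qwen replication": 0.66,
    "unrelated entity": 0.40,
}
example_edges = {("finding", "Attractor State")}

neighbors = untyped_neighbors("finding", example_similarities, example_edges)
print(neighbors)
```

In this toy input, "Attractor State" is excluded because it already has a typed edge, and "unrelated entity" is excluded because it falls below the threshold.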