finding
active
finding:large-language-models-develop-surprisingly-coherent-yet-often-rigid-internal-preferences-as-they-scale

Large language models develop surprisingly coherent yet often rigid internal preferences as they scale

A finding from Mazeika et al. that reinforces the need for emptiness-based, flexible value architectures.

Source paper

extracted_from
Contemplative Agent (2025) · Ruben Laukkonen · Fionn Inglis · Shamil Chandaria · Lars Sandved-Smith +4

Neighborhood — ranked by edge-count

Claims (1)

claim
