finding
active
finding:earlier-less-capable-models-exhibit-a-larger-gap-between-think-and-don-t-think-representation-strength
Earlier/less capable models exhibit a larger gap between think and don't think representation strength
Claude 3 models show a larger gap in representation strength between the think and don't think conditions than newer models such as Opus 4.1.
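As a rough illustration of what such a gap could look like numerically, the sketch below assumes representation strength is measured as the cosine similarity between an activation and the think word's concept vector, the measure mentioned in a related finding on this page. The function names (`cosine_similarity`, `think_gap`) and the toy vectors are hypothetical and are not taken from the source paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def think_gap(concept_vector, think_activation, dont_think_activation):
    """Hypothetical gap: similarity under 'think' minus similarity under "don't think"."""
    return (cosine_similarity(think_activation, concept_vector)
            - cosine_similarity(dont_think_activation, concept_vector))


# Toy vectors only (assumed for illustration, not real model activations).
concept = [1.0, 0.0]
gap = think_gap(concept, think_activation=[1.0, 0.0], dont_think_activation=[0.0, 1.0])
```

Under this reading, a larger `gap` for a given model would correspond to a bigger difference between the two conditions.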
Source paper
extracted_from(2026) · Lindsey, Jack
Neighborhood — ranked by edge-count
Communities (3)
community
- Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
- Probing Claude and other models for internal detection of artificially injected thoughts across layers.
- Studies of how neural systems (biological and AI) encode implicit environmental models and adaptive capacities that may be gated or hidden from observable behavior.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- All models exhibit above-baseline representation of the think word when instructed to think about it · finding · 0.818 · In the intentional control experiment, all tested models show above-zero cosine similarity to the think word's concept vector.
- Selective pressure toward convergence via task generality
- Caveat and forward-looking statement from the abstract.
- Bigger models are more likely to converge to a shared representation than smaller models · hypothesis · 0.788 · Selective pressure toward convergence via model capacity
- The model tends to reflect more when the question is difficult, and accuracy is generally lower for harder questions · hypothesis · 0.786 · Hypothesis explaining negative correlation between reflection rate and accuracy without implying reflection is harmful
- Author's interpretation of the VTAB alignment results echoing Tolstoy
- Practical bottleneck explaining why these phenomena are not widely studied.
- Comparative prediction motivating future work contrasting different approaches to LLM self-knowledge