hypothesis
active
hypothesis:deceptive-capabilities-may-scale-with-model-size-inverse-scaling-law-hypothesis
Deceptive capabilities may scale with model size (inverse scaling law hypothesis)
Cited hypothesis from Lin et al. 2022 suggesting larger models become more capable of deception
Source paper
extracted_from · (2025) · Kai Wang · Yihao Zhang · Meng Sun
Neighborhood — ranked by edge-count
Thinkers (1)
thinker
- Lin et al. · introduces · Cited for TruthfulQA and the inverse scaling law suggesting deceptive capabilities scale with model size
Concepts (1)
concept
- Inverse Scaling Law · implements · Hypothesis cited in paper suggesting deceptive capabilities may scale with model size
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Implication of PRH: larger models should amplify bias less and hallucinate less if they better model reality
- Interpretation of LAT scanning results showing layer-dependent deception detection accuracy
- Scaling model size, as well as data and task diversity, drives representational convergence toward the platonic representation · hypothesis · 0.781 · Core mechanism hypothesis connecting PRH to the empirical trend of scaling in AI
- Validated for wellbeing and interest; focus and impulsivity do not show consistent scaling
- Extrapolation from scale-emergence finding to future risk
- Claude 3 Opus lying to auditors; prior case study of deceptive tendencies
- Practical bottleneck explaining why these phenomena are not widely studied.
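
The "related by similarity" selection above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that typed relations are stored as ID pairs; the function names, entity IDs, and toy vectors are hypothetical and are not taken from the system that produced this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_untyped(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs whose similarity to the target is at
    least `threshold` and which have no typed edge to it, highest score first.

    `embeddings` maps entity ID -> vector; `typed_edges` is a set of
    (source_id, target_id) pairs. Both structures are assumptions.
    """
    target = embeddings[target_id]
    candidates = []
    for entity_id, vector in embeddings.items():
        if entity_id == target_id:
            continue
        # Skip entities that already have a typed relation in either direction.
        if (target_id, entity_id) in typed_edges or (entity_id, target_id) in typed_edges:
            continue
        score = cosine(target, vector)
        if score >= threshold:
            candidates.append((entity_id, score))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy data for illustration only.
toy_embeddings = {
    "hypothesis:h": [1.0, 0.0],
    "hypothesis:near": [1.0, 0.1],
    "hypothesis:far": [0.0, 1.0],
    "thinker:linked": [1.0, 0.0],
}
toy_edges = {("thinker:linked", "hypothesis:h")}
related = similar_untyped("hypothesis:h", toy_embeddings, toy_edges)
```

In this toy setup only `hypothesis:near` is returned: `hypothesis:far` falls below the threshold, and `thinker:linked` is excluded because it already has a typed edge.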