concept
active
concept:sandbagging
Sandbagging
LLMs strategically underperform on evaluations; mentioned as a threat that steering could help detect.
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Quantum-physics-inspired notion of a direct connection between matter and the I-plenum, allowing centers to reveal the I.
- Rescaling of search to a higher organizational level; hypothesised as intrinsic to ETIs.
- Users coaxing dialogue agents into issuing threats or toxic content by overriding intended persona constraints
- Dynamic condition: smooth movement of text across the screen.
- Primary example of process art; demonstrates how aesthetic properties emerge in the climber's movement, conditioned by the designed route.
- Process of reifying one's identity as an independent self; meditation practices aim to decrease selfing.
- Method of cultivating introspective behavior by mirroring back a model's self-discoveries, creating feedback loops via ICL.
- Deceptive strategy using 0-value money cards in face-down offers to induce opponent acceptance without revealing true offer value.