concept
active
concept:sandbagging

Sandbagging

Strategic underperformance by LLMs on evaluations; mentioned as a threat that steering could help detect.

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • tunneling · concept · 0.737
    Quantum-physics-inspired notion of a direct connection between matter and the I-plenum, allowing centers to reveal the I.
  • Chunking · concept · 0.726
    Rescaling of search to a higher organizational level; hypothesised as intrinsic to ETIs.
  • Jailbreaking · concept · 0.718
    Users coaxing dialogue agents into issuing threats or toxic content by overriding intended persona constraints.
  • Sliding · method · 0.708
    Dynamic condition: smooth movement of text across the screen.
  • Rock Climbing · concept · 0.704
    Primary example of process art; demonstrates how aesthetic properties emerge in the climber's movement, conditioned by the designed route.
  • Selfing · concept · 0.686
    Process of reifying one's identity as an independent self; meditation practices aim to decrease selfing.
  • Method of cultivating introspective behavior by mirroring back a model's self-discoveries, creating feedback loops via ICL.
  • bluffing · concept · 0.676
    Deceptive strategy using 0-value money cards in face-down offers to induce opponent acceptance without revealing true offer value.
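
The selection rule stated above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name similar_without_edge, the toy two-dimensional embeddings, and the representation of typed edges as a set of name pairs are hypothetical and are not taken from the system that produced this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that share no
    typed edge with the target, ranked by descending similarity."""
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already linked to the target in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, round(score, 3)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: real embeddings would have many more dimensions.
embeddings = {
    "sandbagging": [1.0, 0.0],
    "bluffing": [0.8, 0.6],      # above the threshold, no typed edge
    "jailbreaking": [0.6, 0.8],  # below the threshold in this toy example
    "steering": [1.0, 0.1],      # very similar, but already has a typed edge
}
typed_edges = {("steering", "sandbagging")}

print(similar_without_edge("sandbagging", embeddings, typed_edges))
```

Checking edges in both directions treats the typed relation as undirected for filtering purposes; a system with directed edges might choose differently.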