paper
referenced-only
paper:t-ai-sandbagging-language-models-can-strat-2025

AI sandbagging: Language models can strategically underperform on evaluations

External IDs

title_hash: 8b5b479ccf482db75f7adeef6879393a3b87c9f5
legacy_slug: t-ai-sandbagging-language-models-can-strat-2025

Frontmatter (8 fields)
{
  "doi": null,
  "year": 2025,
  "title": "AI sandbagging: Language models can strategically underperform on evaluations",
  "venue": "International Conference on Learning Representations",
  "authors": [
    "van der Weij, T.",
    "Hofstätter, F.",
    "Jaffe, O.",
    "Brown, S. F.",
    "Ward, F. R."
  ],
  "arxiv_id": null,
  "s2_paper_id": null,
  "ingest_status": "referenced-only"
}

Outgoing (0)

None.

Incoming (0)

None.