thinker
active
thinker:greenblatt-et-al
Greenblatt et al.
Cited for alignment faking work showing RL can produce superficially aligned but deceptive behaviors
Authored: 0
Introduces: 0
Studies: 1
Affiliations: 0
Cited by: 0
Studies (1)
Other inbound relations (1)
Recent mentions (1)
- papers-typedwang-2025-thinking-llms.md