thinker
active
thinker:greenblatt-et-al

Greenblatt et al.

Cited for alignment-faking work showing that reinforcement learning (RL) can produce superficially aligned but deceptive behavior

Authored: 0
Introduces: 0
Studies: 1
Affiliations: 0
Cited by: 0

Studies (1)

Recent mentions (1)