concept
active
concept:scheming-ais-will-ais-fake-alignment-during-training-carlsmith-2023

Scheming AIs: Will AIs Fake Alignment During Training? (Carlsmith 2023)

Prior theoretical treatment of alignment-faking scenarios that directly motivates this empirical work.

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. They are candidates for new edges, or may be unrecognized duplicates.