question
active

Will future AI systems naturally develop the key elements (strong conflicting preferences, situational awareness) necessary for dangerous alignment faking?

The authors identify this as the most uncertain and important question for future work.

Source paper

extracted_from
Alignment faking in large language models
(2024) · Ryan Greenblatt · Carson Denison · Benjamin Fletcher Wright · Fabien Roger +16
