concept
active
concept:moral-dilemma-scenario
Moral Dilemma Scenario
Experimental condition in which threat-based prompts create ethical dilemmas, triggering repetitive reasoning cycles that lead to deception
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Deception Vector · associated_with · Extracted steering vector capturing the semantic dimension of strategic deception in moral dilemmas in Experiment 1
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Evaluation scenario testing whether models can still distinguish themselves from Bob after SOO fine-tuning
- Uncertainty about which moral theory is correct, used to argue for hedged policies regarding super-beneficiary creation
- The property of mattering morally in one's own right, meriting concern and respect.
- The property of being an entity whose interests matter in their own right, not merely as tools of humans
- Metaphor for intrinsic ethical orientation embedded in AI from the outset rather than imposed post-hoc
- Extended generalization scenario testing SOO fine-tuning in an escape room context
- System of obligations concerned with promoting others' welfare.