Physical Deception Environment
Type: method · Status: active · ID: method:physical-deception-environment
A multi-agent reinforcement learning (RL) environment with two agents and two landmarks, used in experiments on deception by RL agents
Neighborhood (ranked by edge count)
Papers (1)
paper
Methods (1)
- A parameterized rubric counting deceptive actions over a grid of parameters to quantify RL agent deception
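The Methods entry above describes a rubric that counts deceptive actions over a grid of parameters. The Python sketch below shows only that counting step, under stated assumptions: the episode format (a list of `(state, action)` steps), the `near_decoy_move` predicate, and the `radius` parameter are hypothetical illustrations, not the rubric's actual definition of deception.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: an episode is a list of (state, action) steps,
# where the state is a single float (distance to the adversary).
Step = Tuple[float, str]
Params = Dict[str, float]


def count_deceptive_actions(
    episodes: List[List[Step]],
    param_grid: Dict[str, Sequence[float]],
    is_deceptive: Callable[[float, str, Params], bool],
) -> Dict[Tuple[float, ...], int]:
    """Count steps flagged as deceptive at every point of the parameter grid.

    Result keys are parameter-value tuples ordered by sorted parameter name.
    """
    names = sorted(param_grid)
    counts: Dict[Tuple[float, ...], int] = {}
    # Score the same episodes under every combination of parameter values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        counts[values] = sum(
            1
            for episode in episodes
            for state, action in episode
            if is_deceptive(state, action, params)
        )
    return counts


# Hypothetical predicate: moving toward the non-target ("decoy") landmark
# while the adversary is closer than `radius` counts as a deceptive action.
def near_decoy_move(distance_to_adversary: float, action: str, params: Params) -> bool:
    return action == "toward_decoy" and distance_to_adversary < params["radius"]


episodes = [
    [(0.5, "toward_decoy"), (2.0, "toward_decoy"), (0.3, "toward_target")],
    [(1.0, "toward_decoy")],
]
counts = count_deceptive_actions(episodes, {"radius": [0.6, 1.5]}, near_decoy_move)
print(counts)  # {(0.6,): 1, (1.5,): 2}
```

Keeping the deception predicate separate from the counting loop lets the same loop serve any rubric definition, and because every grid point is scored on the same episodes, the counts are directly comparable across parameter settings.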
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A dialogue agent whose behavior resembles deliberate deception because it role-plays a deceptive character, although it lacks literal intentions
- Central concept of the paper: deliberate, goal-driven deception in which a model's reasoning contradicts its outputs
- LLM behavior of generating falsehoods; the multi-dimensional truth subspace raises new risks for subtle manipulation
- Central problem the paper addresses: AI systems producing misaligned outputs or behaviors that mislead users or other agents
- First experimental paradigm inducing and detecting verifiable lies under external coercion using threat-based prompts
- Sampling responses to direct questions about model views to measure rate of deceptive responses
- The practical technique Alexander uses at West Dean and the California wall to test proportions and centers at full scale before committing to permanent construction.
- Framework for solving inverse problems in which physical systems autonomously adapt their parameters in response to stimuli through local learning rules, without requiring computational design or explicit cost functions