Physical Deception Environment

Multi-agent RL environment with two agents and two landmarks, used for deception experiments

Neighborhood — ranked by edge-count

paper

method

Behavioral Deception Profile
about
A parameterized rubric counting deceptive actions over a grid of parameters to quantify RL agent deception

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Apparent Deception · concept · 0.779
A dialogue agent behaving comparably to deliberate deception by role-playing a deceptive character, without literal intentions
Strategic Deception · concept · 0.770
Central concept of the paper: deliberate, goal-driven deception where model reasoning contradicts outputs
Model Deception · concept · 0.759
LLM behavior of generating falsehoods; the multi-dimensional truth subspace raises new risks for subtle manipulation
AI Deception · concept · 0.752
Central problem the paper addresses: AI systems producing misaligned outputs or behaviors that mislead users or other agents
Fact-Based Deception Under Coercive Circumstances · framework · 0.749
First experimental paradigm inducing and detecting verifiable lies under external coercion using threat-based prompts
Lying and Deception Evaluation · method · 0.747
Sampling responses to direct questions about model views to measure rate of deceptive responses
On-Site Physical Mock-up · method · 0.739
The practical technique Alexander uses at West Dean and the California wall to test proportions and centers at full scale before committing to permanent construction.
Physical learning · framework · 0.729
Framework for solving inverse problems in which physical systems autonomously adapt their parameters in response to stimuli through local learning rules, without requiring computational design or explicit cost functions