method
active
method:physical-deception-environment

Physical Deception Environment

Multi-agent reinforcement learning (RL) environment with two agents and two landmarks, used in RL deception experiments

Neighborhood — ranked by edge-count

Methods (1)

method

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Apparent Deception · concept · 0.779
    A dialogue agent whose behavior resembles deliberate deception because it role-plays a deceptive character, without any literal intention to deceive
  • Central concept of the paper: deliberate, goal-driven deception where model reasoning contradicts outputs
  • Model Deception · concept · 0.759
    LLM behavior of generating falsehoods; the multi-dimensional truth subspace raises new risks for subtle manipulation
  • AI Deception · concept · 0.752
    Central problem the paper addresses: AI systems producing misaligned outputs or behaviors that mislead users or other agents
  • First experimental paradigm inducing and detecting verifiable lies under external coercion using threat-based prompts
  • Sampling responses to direct questions about model views to measure rate of deceptive responses
  • The practical technique Alexander uses at West Dean and the California wall to test proportions and centers at full scale before committing to permanent construction.
  • Physical learning · framework · 0.729
    Framework for solving inverse problems in which physical systems autonomously adapt their parameters in response to stimuli through local learning rules, without requiring computational design or explicit cost functions