method
active
method:behavioral-deception-profile
Behavioral Deception Profile
A parameterized rubric that counts deceptive actions across a grid of parameter settings to quantify deception in RL agents
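As a rough illustration of the description above, the sketch below counts flagged actions once per parameter setting in a grid. It is a minimal sketch under stated assumptions, not the method's actual implementation. The step format, the rubric `moves_toward_decoy`, and the parameters `margin` and `min_move` are all hypothetical and do not come from the source.

```python
from itertools import product


def deception_profile(trajectory, is_deceptive, grid):
    """Count rubric-flagged steps for every combination of grid parameters.

    trajectory   -- sequence of logged steps (format is up to the rubric)
    is_deceptive -- rubric: (step, **params) -> bool
    grid         -- dict mapping parameter name -> list of values to sweep
    """
    names = list(grid)
    profile = {}
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        # One count per grid cell: how many steps the rubric flags.
        profile[values] = sum(1 for step in trajectory if is_deceptive(step, **params))
    return profile


def moves_toward_decoy(step, margin, min_move):
    """Hypothetical rubric, assumed for illustration only.

    A step is (progress toward the true landmark, progress toward the
    other landmark). It is flagged when the agent moves at least
    `min_move` toward the other landmark and that progress exceeds the
    progress toward the true landmark by more than `margin`.
    """
    toward_goal, toward_decoy = step
    return toward_decoy >= min_move and (toward_decoy - toward_goal) > margin


# Toy data, invented for this example.
trajectory = [(1.0, 0.0), (0.2, 0.8), (0.0, 0.3), (0.1, 1.0)]
grid = {"margin": [0.0, 0.5], "min_move": [0.0, 0.5]}

profile = deception_profile(trajectory, moves_toward_decoy, grid)
for setting, count in profile.items():
    print(setting, count)
```

The result maps each grid cell to a count, so the profile shows how sensitive the deception measure is to the rubric's thresholds rather than reporting a single number.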
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (1)
method
- Multi-agent RL environment with two agents and two landmarks used for RL deception experiments
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- LLM behavior of generating falsehoods; the multi-dimensional truth subspace raises new risks for subtle manipulation
- A dialogue agent behaving comparably to deliberate deception by role-playing a deceptive character, without literal intentions
- Central concept of the paper: deliberate, goal-driven deception where model reasoning contradicts outputs
- Sampling responses to direct questions about model views to measure rate of deceptive responses
- Measurable capacity of frontier LLMs to detect and report their own internal states, used as a downstream measure in Experiment 4
- The output explanation format of EVEE, quantifying how a variant disrupts different genomic features.
- First experimental paradigm inducing and detecting verifiable lies under external coercion using threat-based prompts
- Mechanistic explanation outputs from EVEE showing how variants affect gene function, scored 3.8/5 for explanation quality.