Role-Played Deliberate Deception

Third category: an agent role-plays a deceptive character, producing behavior comparable to, but not literally, deliberate deception

Neighborhood — ranked by edge-count

claim

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood as this one but with no typed relation to it: candidates for new edges, or possible unrecognized duplicates.
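
The selection rule for this list (cosine ≥ 0.65 and no typed edge) can be sketched roughly as follows. This is a minimal illustration, not the actual implementation behind this page: the function name `untyped_neighbors`, the toy two-dimensional embeddings, and the `typed_edges` set are all hypothetical. Only the 0.65 threshold comes from the page itself.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs close to `target` that have no typed edge to it.

    embeddings:  dict mapping entity name -> vector (hypothetical toy data here)
    typed_edges: set of frozenset({a, b}) pairs that already carry a typed relation
    """
    out = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        if frozenset((target, name)) in typed_edges:
            continue  # already linked by a typed edge, so not a candidate
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            out.append((name, round(score, 3)))
    # Highest similarity first, matching the order of the list on this page.
    return sorted(out, key=lambda pair: pair[1], reverse=True)


# Made-up vectors for illustration; not the embeddings behind the scores listed here.
embeddings = {
    "Role-Played Deliberate Deception": [1.0, 0.0],
    "Apparent Deception": [0.8, 0.6],
    "Strategic Deception": [0.6, 0.8],
    "AI Deception": [0.0, 1.0],
}
# Assumed for the example only: one pair already has a typed relation.
typed_edges = {frozenset(("Role-Played Deliberate Deception", "Strategic Deception"))}

candidates = untyped_neighbors(
    "Role-Played Deliberate Deception", embeddings, typed_edges
)
print(candidates)
```

In this toy data, one entity is dropped because it already has a typed edge and another because its similarity falls below the threshold, so only the remaining near neighbor is reported as a candidate for a new edge or a duplicate check.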

Deception and Roleplay SAE Features · concept · 0.794
Sparse autoencoder features associated with deception and roleplay that gate consciousness self-reports in Llama 70B
Deception- and Roleplay-Related SAE Features · concept · 0.787
Latent features in LLaMA 3.3 70B SAE that gate consciousness self-reports; suppression increases experience claims, amplification suppresses them
Strategic Deception · concept · 0.784
Central concept of the paper: deliberate, goal-driven deception where model reasoning contradicts outputs
Apparent Deception · concept · 0.780
A dialogue agent behaving comparably to deliberate deception by role-playing a deceptive character, without literal intentions
Open-Role Deception · framework · 0.769
Second experimental paradigm exploring character-consistent deception in open-ended role-playing scenarios
Fact-Based Deception Under Coercive Circumstances · framework · 0.762
First experimental paradigm inducing and detecting verifiable lies under external coercion using threat-based prompts
Model Deception · concept · 0.748
LLM behavior of generating falsehoods; the multi-dimensional truth subspace raises new risks for subtle manipulation
AI Deception · concept · 0.744
Central problem the paper addresses: AI systems producing misaligned outputs or behaviors that mislead users or other agents