concept
active
concept:jonason-et-al-2014-what-a-tangled-web-we-weave-the-dark-triad-traits-and-deception
Jonason et al. 2014 - What a tangled web we weave: The dark triad traits and deception
Behavioral finding linking psychopathic traits to increased deception
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Interpretation of LAT scanning results showing layer-dependent deception detection accuracy
- Key quote connecting path redundancy to interferometric information encoding.
- Prior finding by Yang & Buzsaki and Campbell et al. on how deception representations evolve across layers; partially replicated and contrasted by this paper
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (Hubinger et al. 2024) · concept · 0.713 · Explicitly trained backdoored models to produce alignment-faking reasoning; contrast to naturalistic approach here
- Hubinger et al. 2024 - Sleeper agents: Training deceptive LLMs that persist through safety training · concept · 0.713 · Key reference for adversarial deception scenarios that SOO should be tested against
- Load-bearing description of the core pernicious divergence mechanism illustrated in Figure 1
- Patching experiments localize truth representations to these specific hidden states in LLaMA-2 models
- What if the concept being manipulated does not lie on a straight line in the model's representations? · question · 0.700 · The motivating question that opens the paper and leads to the development of manifold steering.