concept
active
concept:three-phase-layer-dynamics-of-instructed-deception

Three-Phase Layer Dynamics of Instructed Deception

Prior finding by Yang & Buzsaki and by Campbell et al. on how deception representations evolve across model layers; this paper partially replicates it and contrasts its own results with it.

Neighborhood (ranked by edge count)

Thinkers (2)

thinker
  • Cited for investigating command-induced lying via linear probing and activation patching in Llama
  • Cited for dissecting the mechanistic underpinnings of instructed deception, including three-phase layer dynamics; its prior findings are partially replicated and contrasted

Findings (1)

finding

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
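The selection rule above (cosine similarity of at least 0.65 and no typed edge) can be sketched as a small filter. This is a minimal illustration, not the graph tool's actual implementation: it assumes each entity has a plain embedding vector and that typed edges are stored as undirected (source, target) pairs. The names `cosine`, `similarity_candidates`, `embeddings`, and `typed_edges` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_candidates(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: entities similar to `target` but not linked to it.

    embeddings  -- dict mapping entity id -> embedding vector (assumed format)
    typed_edges -- iterable of (source, target) pairs, treated as undirected
    Returns (entity, score) pairs sorted by descending similarity.
    """
    # Collect every entity already joined to the target by a typed edge.
    linked = set()
    for src, dst in typed_edges:
        if src == target:
            linked.add(dst)
        elif dst == target:
            linked.add(src)

    target_vec = embeddings[target]
    candidates = []
    for name, vec in embeddings.items():
        # Skip the entity itself and anything that already has a typed relation.
        if name == target or name in linked:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((name, score))

    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates
```

Entities that survive the filter are either candidates for a new typed edge or possible duplicates of the target; the threshold is a tunable assumption, and a higher value trades recall for fewer spurious suggestions.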