concept
active
concept:three-phase-layer-dynamics-of-instructed-deception
Three-Phase Layer Dynamics of Instructed Deception
Prior finding by Yang & Buzsaki and Campbell et al. on how deception representations evolve across layers; partially replicated and contrasted by this paper
Neighborhood — ranked by edge-count
Papers (1)
paper
Thinkers (2)
thinker
- Campbell et al. · studies · Cited for investigating command-induced lying via linear probing and activation patching in Llama
- Yang and Buzsaki · introduces · Cited for dissecting mechanistic underpinnings of instructed deception including three-phase layer dynamics; prior findings partially replicated and contrasted
Findings (1)
finding
- Distinguishes strategic threat-based deception from instructed deception in representational structure
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Empirically observed pattern in E3: early enrichment (ρd dips), mid-layer alignment (dr falls), late standardization (re-clustering)
- Thought detection peaks at ~2/3 layer depth; intention checking peaks at ~1/2 layer depth. · finding · 0.742 · Lindsey (2026) differential layer performance explained by Janus's path combinatorics: different tasks use different path distributions.
- Gemma-3-4B-it shows three-stage layer trajectory and S(ℓ) peak despite scale differences in dr and ρd · finding · 0.729 · E3 backbone generalization finding for Gemma; validates pattern across diverse architectures
- Interpretation of LAT scanning results showing layer-dependent deception detection accuracy
- Interpretive conclusion from the experimental findings about the origin of strategic deception in CoT models
- Identified limitation and future research direction in the paper's conclusions
- Mechanistic analog connecting Lindsey's layer-localized findings to the scorer's enacted/described distinction
- Behavioral finding linking psychopathic traits to increased deception