finding
active
Middle-to-late layers (39-50) of QwQ-32B show consistently stable and high LAT classification performance across all datasets
Layer-wise analysis revealing which network depths best encode strategic deception semantics
Source paper
extracted_from · (2025) · Kai Wang · Yihao Zhang · Meng Sun
Neighborhood — ranked by edge-count
Claims (1)
claim
- Interpretation of LAT scanning results showing layer-dependent deception detection accuracy
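A layer-wise LAT scan of the kind this finding describes can be sketched as follows. This is a minimal illustration on synthetic activations, not the source paper's pipeline. The function names (`lat_direction`, `scan_layers`, `make_synthetic`), the per-layer signal strengths, and the train/held-out split are all assumptions made for the example. It assumes the common LAT recipe: take hidden-state differences between contrastive pairs, use their top principal component as the reading direction, then score held-out pairs by projection.

```python
import numpy as np

def lat_direction(pos, neg, rng):
    """LAT-style reading direction: top principal component of pair differences."""
    diffs = pos - neg
    # Randomly flip signs so the differences are roughly zero-mean before PCA.
    signs = rng.choice([-1.0, 1.0], size=(diffs.shape[0], 1))
    diffs = diffs * signs
    diffs = diffs - diffs.mean(axis=0)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    direction = vt[0]
    # PCA leaves the sign arbitrary; orient it so "pos" projects higher on average.
    if np.mean((pos - neg) @ direction) < 0:
        direction = -direction
    return direction

def scan_layers(pos, neg, n_train, rng):
    """pos, neg: arrays of shape (n_layers, n_pairs, hidden_dim)."""
    accs = []
    for layer_pos, layer_neg in zip(pos, neg):
        d = lat_direction(layer_pos[:n_train], layer_neg[:n_train], rng)
        held_pos = layer_pos[n_train:] @ d
        held_neg = layer_neg[n_train:] @ d
        # A held-out pair counts as correct when "pos" projects above "neg".
        accs.append(float(np.mean(held_pos > held_neg)))
    return np.array(accs)

def make_synthetic(signal, n_pairs, hidden_dim, rng):
    """Hypothetical stand-in for real activations: Gaussian noise plus a
    per-layer offset of size signal[i] along a random unit direction."""
    n_layers = len(signal)
    pos = rng.normal(size=(n_layers, n_pairs, hidden_dim))
    neg = rng.normal(size=(n_layers, n_pairs, hidden_dim))
    for i, s in enumerate(signal):
        u = rng.normal(size=hidden_dim)
        u /= np.linalg.norm(u)
        pos[i] += 0.5 * s * u
        neg[i] -= 0.5 * s * u
    return pos, neg

rng = np.random.default_rng(0)
signal = [0.0, 0.0, 1.0, 4.0, 4.0, 4.0]  # assumed values: weak early, strong late
pos, neg = make_synthetic(signal, n_pairs=400, hidden_dim=16, rng=rng)
accuracy = scan_layers(pos, neg, n_train=200, rng=rng)
for layer, acc in enumerate(accuracy):
    print(f"layer {layer}: held-out pair accuracy = {acc:.2f}")
```

With real model activations in place of `make_synthetic`, plotting `accuracy` against layer index is one way to look for a plateau of the kind reported for layers 39-50.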
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Confirms prior research on layer specialization: early layers insufficient for semantic deception detection
- Interpretation of E3 layer-wise results; motivates targeted UCCT interventions at layers 8-12
- Demonstrates that stronger models are largely insensitive to reflection manipulation
- Core detection result showing LAT-based steering vectors can identify deceptive states with high accuracy
- One of the most promising cases; approximately corresponds to the 2/3 layer of LLaMA3.1-8B.
- Supported by the geometric transition visible in cosine similarity heatmaps for F0-F3.
- Math and code tasks show strongest mid-layer anchoring on LLaMA (S ≈ −1.65 at layers 8-12) · finding · 0.750 · Task-specific E3 finding showing compositional reasoning requires deeper processing
- Third promising case from temporal permutation analysis.