finding
active
LAT classifiers perform worst on the Companions dataset (weakest model cognition domain) while achieving 100% F1 on the Facts and Animals datasets
Shows a strong correlation between layer-wise representations and domain-specific semantic understanding
Source paper
extracted_from · (2025) · Kai Wang · Yihao Zhang · Meng Sun
Neighborhood — ranked by edge-count
Claims (1)
claim
- Interpretation of LAT scanning results showing layer-dependent deception detection accuracy
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Shows that truth representations are not reducible to text probability representations
- Suggests fundamental differences in learning dynamics between normal and chronic perception models
- Motivation for using sparsity-based dictionary learning on language models
- Diagnosis of first failure mode explaining low harness-benefit for weak-tier models
- Motivates the introduction of mass-mean probing as an alternative to LR
- Demonstrates that early-layer probes capture sentence polarity rather than truth
- Out-of-context reasoning work directly related to synthetic document fine-tuning experiments
- Central claim of the paper: the method scales to state-of-the-art transformers