finding
active
finding:neuroticism-construct-classifier-achieves-99-00-accuracy-on-held-out-statement-corpus
Neuroticism construct classifier achieves 99.00% accuracy on held-out statement corpus
Highest individual classifier performance among OCEAN constructs
Source paper
extracted_from(2026) · Leonardo Blas · Robin Jia · Emilio Ferrara
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Psychopathy construct classifier achieves 90.50% accuracy, lowest among all evaluated constructs · finding · 0.796 · Lowest individual classifier performance
- Validates use of lightweight classifiers as replacement for frontier LLM evaluation during alpha sweeps
- Table 2, row 3, showing equivalence when prior preferences match rewards.
- Opus 4.1 is most effective at recognizing injected abstract concepts (e.g., justice, peace) but detects other categories too.
- SAE features are not simply mirroring individual neurons.
- Comparative prediction motivating future work contrasting different approaches to LLM self-knowledge
- Mechanistic finding from CausalGym case study showing multi-step information movement in NPI mechanism
- Core claim that standard criteria fail for novel agents.