finding
active
finding:probes-trained-on-h-b-activations-achieve-perfect-test-accuracy-in-every-case-h-s-probes-achieve-perfect-accuracy-in-only-0-60-of-cases

Probes trained on h_b activations achieve perfect test accuracy in every case; h_s probes achieve perfect accuracy in only 0.60% of cases

This result justifies restricting probe-based vector derivation to h_b activations; the gap is attributed to Yes/No semantics.
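
To make the "fraction of cases with perfect test accuracy" statistic concrete, here is a minimal sketch of how such a comparison could be computed. It is not the paper's procedure: the perceptron-style probe, the synthetic Gaussian "activations", and the names `train_probe`, `accuracy`, `fraction_perfect`, `synthetic_activations`, and the `separation` knob are all assumptions for illustration, and the toy data does not model h_b or h_s.

```python
import random


def score(w, b, x):
    """Linear probe score for one activation vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_probe(X, y, epochs=50, lr=0.1):
    """Fit a perceptron-style linear probe (an assumed stand-in for the paper's probes)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            pred = 1 if score(w, b, x) > 0 else 0
            err = t - pred  # 0 when correct, +1 or -1 when wrong
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b


def accuracy(w, b, X, y):
    """Fraction of examples the probe classifies correctly."""
    hits = sum((1 if score(w, b, x) > 0 else 0) == t for x, t in zip(X, y))
    return hits / len(y)


def fraction_perfect(accuracies):
    """Share of cases whose test accuracy is exactly 1.0."""
    return sum(a == 1.0 for a in accuracies) / len(accuracies)


def synthetic_activations(rng, n, dim, separation):
    """Toy data: the two classes are shifted apart along the first dimension.

    `separation` is a hypothetical knob controlling how linearly separable
    the two classes are; it is not a quantity from the paper.
    """
    X, y = [], []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += separation if label else -separation
        X.append(x)
        y.append(label)
    return X, y


def run_cases(separation, n_cases=5, seed=0):
    """Train one probe per case and return the per-case test accuracies."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_cases):
        X_train, y_train = synthetic_activations(rng, 40, 4, separation)
        X_test, y_test = synthetic_activations(rng, 20, 4, separation)
        w, b = train_probe(X_train, y_train)
        results.append(accuracy(w, b, X_test, y_test))
    return results


if __name__ == "__main__":
    for sep in (5.0, 0.5):
        accs = run_cases(sep)
        print(f"separation={sep}: fraction perfect = {fraction_perfect(accs):.2f}")
```

The point of the sketch is the metric: each case either reaches a test accuracy of exactly 1.0 or does not, and the reported figure is the share of cases that do.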

Source paper

extracted_from
Psychological Steering of Large Language Models
(2026) · Leonardo Blas · Robin Jia · Emilio Ferrara
