claim
active
claim:the-steering-sign-test-functions-as-a-practical-probe-validation-criterion-inverted-report-changes-when-steering-suspect-probe-quality

The steering-sign test functions as a practical probe-validation criterion: inverted report changes when steering suspect probe quality

Methodological contribution: used to exclude focus-1B and impulsivity-8B from scaling analysis

Source paper

extracted_from
Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation
(2026) · Nicolas Martorell · Bianchi, Bruno

Neighborhood — ranked by edge-count

Methods (2)

method
  • Probe construction method: concept vector at each layer is L2-normalized difference between mean positive and mean negative representations from contrastive system prompts
  • Validation filter: same-concept steering must shift self-report in expected direction; used to exclude invalid concept-model pairs

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.