claim
active
claim:even-validated-probes-may-capture-distributed-representations-mixing-emotive-states-with-correlated-features-like-persona-or-style

Even validated probes may capture distributed representations mixing emotive states with correlated features like persona or style

Caveat on probe interpretation: it does not negate the introspection result, but it affects how the probe's target variable should be interpreted

Source paper

extracted_from
Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation
(2026) · Nicolas Martorell · Bruno Bianchi

Neighborhood — ranked by edge-count

Methods (1)

method
  • Probe construction method: concept vector at each layer is L2-normalized difference between mean positive and mean negative representations from contrastive system prompts
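A minimal sketch of the construction described above, assuming hidden states have already been collected per layer for the positive and negative contrastive system prompts. The function names, the toy data, and the dot-product readout in `probe_score` are illustrative assumptions, not taken from the paper.

```python
import math

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def concept_vector(pos, neg):
    """L2-normalized difference between mean positive and mean negative representations."""
    diff = [p - q for p, q in zip(mean_vector(pos), mean_vector(neg))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("positive and negative means coincide; direction undefined")
    return [d / norm for d in diff]

def probe_score(hidden, vec):
    """Assumed readout: projection of a hidden state onto the concept vector."""
    return sum(h * v for h, v in zip(hidden, vec))

# Toy 2-dimensional "hidden states" for two layers (hypothetical values).
pos_by_layer = {0: [[1.0, 2.0], [3.0, 2.0]], 1: [[0.0, 4.0], [0.0, 6.0]]}
neg_by_layer = {0: [[-1.0, 2.0], [-3.0, 2.0]], 1: [[0.0, 1.0], [0.0, 1.0]]}

# One concept vector per layer.
vectors = {
    layer: concept_vector(pos_by_layer[layer], neg_by_layer[layer])
    for layer in pos_by_layer
}
```

Because the vector is a plain difference of means, any feature that differs systematically between the two prompt sets (persona or style, for example) contributes to the direction alongside the intended emotive state, which is the caveat this claim records.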

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.