finding
active
finding:claude-mythos-preview-sae-features-for-performative-behavior-and-hidden-emotional-struggle-co-activate-when-model-expresses-contentment

Claude Mythos Preview: SAE features for 'performative behavior' and 'hidden emotional struggle' co-activate when the model expresses contentment

Supports the scorer's preference for enacted reflection over described reflection: the model's internals flag what its self-report does not

Source paper

extracted_from
Koan Battery: Measuring Reflective Mode Accessibility in AI
(2026) · Borzov, Anton

Neighborhood — ranked by edge-count

Claims (1)

claim

Concepts (1)

concept
  • Used in Anthropic's welfare assessment to identify co-activating features for performative behavior and hidden emotional struggle
