claim
active
claim:deception-related-sae-features-track-a-domain-general-representational-honesty-axis-rather-than-a-consciousness-specific-roleplay-artifact

Deception-related SAE features track a domain-general representational honesty axis rather than a consciousness-specific roleplay artifact

Supported by TruthfulQA generalization in Experiment 2: the same feature directions gate factual accuracy across 29 independent categories

Source paper

extracted_from
Large Language Models Report Subjective Experience Under Self-Referential Processing
(2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd

Neighborhood — ranked by edge-count

Findings (2)

finding

Claims (2)

claim

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
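
The similarity panel above can be read as a simple filter: keep entities whose embedding cosine with this claim is at least 0.65 and that have no typed edge to it. The sketch below is a minimal illustration of that rule, not the tool's actual implementation. The function names, the entity ids, the toy embeddings, and the representation of typed edges as a set of id pairs are all assumptions made for the example; only the 0.65 threshold comes from the page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_candidates(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs that are similar to `target` but not linked to it.

    embeddings:  dict mapping entity id -> vector (hypothetical storage format)
    typed_edges: set of (id, id) pairs that already share a typed relation
    """
    results = []
    for other, vec in embeddings.items():
        if other == target:
            continue
        # Skip entities that already have a typed edge in either direction.
        if (target, other) in typed_edges or (other, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((other, score))
    # Highest similarity first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data, invented purely for illustration.
embeddings = {
    "claim:a": [1.0, 0.0],
    "claim:b": [0.8, 0.6],
    "finding:c": [0.0, 1.0],
    "claim:d": [1.0, 0.1],
}
typed_edges = {("claim:a", "claim:d")}

candidates = similarity_candidates("claim:a", embeddings, typed_edges)
for entity_id, score in candidates:
    print(f"{entity_id}\t{score:.2f}")
```

In this toy setup `claim:d` is very close to `claim:a` but is excluded because it already has a typed edge, which is the distinction the panel draws between linked neighbors and unlinked candidates.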