concept
active
concept:representational-honesty
Representational Honesty
A proposed domain-general property, indexed by deception features, that governs both factual accuracy and experiential self-report.
Neighborhood — ranked by edge-count
Concepts (2)
- Deception and Roleplay SAE Features · associated_with · Sparse autoencoder features associated with deception and roleplay that gate consciousness self-reports in Llama 70B
- Deception- and Roleplay-Related SAE Features · associated_with · Latent features in LLaMA 3.3 70B SAE that gate consciousness self-reports; suppression increases experience claims, amplification suppresses them
Findings (1)
- Out-of-domain generalization showing deception features track general representational honesty
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Property of conscious representations: they do not contain information about the fact that they are representations at the level of the representation itself
- How familiar a model is with a numeral system, manipulated via bases in Experiment 2.
- CIMC's characterization of part of the solution to the Hard Problem: insight into the structural necessities of phenomenal representation
- Measure of similarity between the similarity structures (kernels) induced by two different representations
- A failure mode exposed by the SAE framework where model representations are entangled or collapse under intervention
- Dominant interpretation of generative models as neural structures with representational content; main target of critique
- Accumulation of mismatch in later layers causing S degradation.
- Interpretation of weaker PCA separation and lower ASR in smaller models
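
The "related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched roughly as follows. This is a minimal illustration, not the graph's actual implementation: the function names, the toy 2-d embeddings, and the placeholder entity ids `entity-a` through `entity-c` are all assumptions made for the example; only the 0.65 threshold and the "no typed edge" condition come from the page.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_candidates(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs at or above the threshold that have
    no typed edge to `target`, highest score first."""
    linked = typed_edges.get(target, set())
    out = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and anything already typed-linked
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            out.append((name, score))
    return sorted(out, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical): real embeddings would be high-dimensional.
embeddings = {
    "concept:representational-honesty": [1.0, 0.0],
    "entity-a": [0.9, 0.1],   # similar, but already has a typed edge
    "entity-b": [0.8, 0.6],   # similar, no typed edge -> candidate
    "entity-c": [0.0, 1.0],   # orthogonal -> below threshold
}
typed_edges = {"concept:representational-honesty": {"entity-a"}}

candidates = similarity_candidates(
    "concept:representational-honesty", embeddings, typed_edges
)
print(candidates)
```

Entities that pass this filter are only candidates; whether one deserves a new typed edge or is an unrecognized duplicate still needs review.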