Representational Honesty

A proposed domain-general property, indexed by deception features, that governs both factual accuracy and experiential self-report.

Neighborhood — ranked by edge-count

concept

Deception and Roleplay SAE Features
associated_with
Sparse autoencoder features associated with deception and roleplay that gate consciousness self-reports in Llama 70B
Deception- and Roleplay-Related SAE Features
associated_with
Latent features in a LLaMA 3.3 70B SAE that gate consciousness self-reports; suppressing these features increases experience claims, and amplifying them reduces such claims

finding

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Representational Transparency · concept · 0.838
Property of conscious representations: they do not contain information about the fact that they are representations at the level of the representation itself
Representational familiarity · concept · 0.780
How familiar a model is with a numeral system, manipulated via bases in Experiment 2.
Representational Disentanglement · concept · 0.770
CIMC's characterization of part of the solution to the Hard Problem: insight into the structural necessities of phenomenal representation
Representational Alignment · concept · 0.768
Measure of similarity between the similarity structures (kernels) induced by two different representations
Representational Failure · concept · 0.765
A failure mode exposed by the SAE framework where model representations are entangled or collapse under intervention
Structural Representationalism · framework · 0.764
Dominant interpretation of generative models as neural structures with representational content; main target of critique
representational drift · concept · 0.762
Accumulation of mismatch in later layers causing S degradation.
Representational abstraction of truth may emerge more clearly with model scale · claim · 0.761
Interpretation of weaker PCA separation and lower ASR in smaller models
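
The selection rule for this list (cosine ≥ 0.65, no typed edge) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `cosine_similarity` and `untyped_neighbors`, the dictionary layout for embeddings and typed edges, and the toy vectors are all hypothetical and are not taken from the system that produced this page.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """List entities similar to `target` that have no typed edge to it.

    embeddings:  dict mapping entity name -> vector (assumed layout)
    typed_edges: dict mapping entity name -> set of names it already
                 has a typed relation with (assumed layout)
    Returns (name, score) pairs sorted by descending cosine score.
    """
    linked = typed_edges.get(target, set())
    results = []
    for name, vec in embeddings.items():
        # Skip the entity itself and anything already connected by a typed edge.
        if name == target or name in linked:
            continue
        score = cosine_similarity(embeddings[target], vec)
        if score >= threshold:
            results.append((name, round(score, 3)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data, purely hypothetical.
toy_embeddings = {
    "A": [1.0, 0.0],
    "B": [1.0, 0.1],   # very similar, but already linked by a typed edge
    "C": [0.8, 0.6],   # similar enough, no typed edge
    "D": [0.0, 1.0],   # orthogonal, below the threshold
}
toy_edges = {"A": {"B"}}

candidates = untyped_neighbors("A", toy_embeddings, toy_edges)
print(candidates)
```

Entries returned this way are only candidates: a high score suggests either a missing typed edge or a duplicate entity, and deciding which still requires reading the two descriptions side by side.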