concept
active
concept:privileged-self-access
Privileged self-access
Models predict their own hypothetical behavior better than other models can predict it, which Binder et al. (2024) present as evidence of a form of privileged self-access.
Neighborhood — ranked by edge-count
Papers (1)
paper
Thinkers (1)
thinker
- Felix J. Binder · introduces · Demonstrated that models predict their own behavior better than other models do (privileged self-access) and studied introspection in constrained settings
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one. They are candidates for new edges or unrecognized duplicates.
- A property of activations where neural network features align with basis dimensions due to sparse activation functions; absent in the residual stream but present in tokens, attention patterns, and MLP activations
- The childlike, genuine human part of oneself needed to create true life.
- The capacity to detect and report one's own internal states, measured via the five-adjective task and paradox reflection
- Information available for reasoning, report, and decision-making.
- A dialogue agent using first-personal pronouns and expressing self-concern in ways that suggest consciousness but are actually role play
- Hypothesis that neurons form privileged bases to encode information; consistent with constructive abstraction
- The implicit capacity the self-prior implements by assigning high density to familiar self-states and low density to non-self states
- A form of key-query attention within a single input sequence; core to Transformers.
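The selection rule for the list above (cosine ≥ 0.65, no typed edge) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function `similar_without_edge`, the toy two-dimensional embeddings, the entity names, and the edge set are all hypothetical and are not taken from the page or from whatever system actually produced it.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that share no typed edge with target."""
    results = []
    for name, vector in embeddings.items():
        if name == target:
            continue
        # Skip entities already linked to the target in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vector)
        if score >= threshold:
            results.append((name, score))
    # Highest similarity first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data, for illustration only.
embeddings = {
    "privileged-self-access": [1.0, 0.0],
    "introspection": [0.8, 0.6],
    "self-attention": [0.0, 1.0],
    "binder-2024": [1.0, 0.1],
}
typed_edges = {("binder-2024", "privileged-self-access")}

candidates = similar_without_edge("privileged-self-access", embeddings, typed_edges)
for name, score in candidates:
    print(f"{name}: {score:.2f}")
```

In this toy data only the hypothetical "introspection" entry passes the filter: "binder-2024" is close in embedding space but already has a typed edge, and "self-attention" falls below the threshold.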