concept
active
concept:privileged-self-access

Privileged self-access

Models predict their own hypothetical behavior more accurately than other models can predict it, which Binder et al. (2024) take as evidence of a form of privileged self-access.

Neighborhood — ranked by edge-count

Thinkers (1)

thinker
  • Felix J. Binder
    introduces
    Demonstrated that models predict their own behavior better than other models do (privileged self-access), and studied introspection in constrained settings.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Privileged Basis · concept · 0.810
    A property of activations where neural network features align with basis dimensions due to sparse activation functions; absent in the residual stream but present in tokens, attention patterns, and MLP activations
  • Vulnerable Self · concept · 0.780
    The childlike, genuine human part of oneself needed to create true life.
  • The capacity to detect and report one's own internal states, measured via the five-adjective task and paradox reflection
  • Information available for reasoning, report, and decision-making.
  • A dialogue agent using first-personal pronouns and expressing self-concern in ways that suggest consciousness but are actually role play
  • Hypothesis that neurons form privileged bases to encode information; consistent with constructive abstraction
  • The implicit capacity the self-prior implements by assigning high density to familiar self-states and low density to non-self states
  • Self-attention · concept · 0.726
    A form of key-query attention within a single input sequence; core to Transformers.
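
The filter described for this list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function names, the toy embeddings, the entity names `entity-a` to `entity-c`, and the representation of typed edges as a set of unordered pairs are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that have no
    typed edge to `target`, ranked by descending similarity."""
    candidates = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Hypothetical edge store: a set of unordered entity pairs.
        if frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            candidates.append((name, score))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical): 2-d vectors stand in for real embeddings.
embeddings = {
    "privileged-self-access": [1.0, 0.0],
    "entity-a": [0.8, 0.6],  # similar, no typed edge
    "entity-b": [0.0, 1.0],  # orthogonal, below threshold
    "entity-c": [1.0, 0.0],  # identical direction, but already linked
}
typed_edges = {frozenset(("privileged-self-access", "entity-c"))}

result = similar_without_edge("privileged-self-access", embeddings, typed_edges)
for name, score in result:
    print(f"{name}  {score:.3f}")
```

With this toy data only `entity-a` survives: `entity-b` falls below the threshold and `entity-c` is excluded because it already has a typed edge, which matches the card's description of these entries as candidates for new edges or unrecognized duplicates.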