concept · active · concept:partial-introspection

partial introspection

The authors' characterization of a genuine but limited introspective capability, found only in early-layer injection regimes.

Neighborhood — ranked by edge-count

Communities (1)

community

Concepts (2)

concept
  • Introspection
    related_to
    The ability of a model to observe its own past internal states or computations; claimed to be architecturally permitted by transformers.
  • Ji-An et al.'s characterization of the limited regime in which model self-report succeeds, consistent with this paper's findings

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Pearson-Vogel et al.'s finding that models can detect prior concept injections; introspective signals exist in middle layers but are suppressed by post-training.
  • Tracking of functional/computational cognitive states, distinguished from phenomenal introspection.
  • The capacity of a model to self-report on its internal emotional state when its SAE features are steered, used here as a measurement tool
  • Direct introspection into phenomenal consciousness; its correlation with functional introspection is an open question.
  • AI Introspection · concept · 0.803
    Key gap identified in the literature; systematic self-examination processes for machine consciousness development.
  • Identified gap; methods for enabling machine consciousness development through self-examination.
  • The capacity to detect and report one's own internal states, measured via the five-adjective task and paradox reflection
  • The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.