concept
active
concept:self-referencing-activations

Self-Referencing Activations

Latent activations produced by a model when it processes inputs framed from its own perspective

Neighborhood, ranked by edge count

Methods (1)

method
  • A loss function measuring the dissimilarity of latent model representations of self and other, minimized during fine-tuning
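A minimal sketch of such a self-other overlap (SOO) loss, assuming it is computed as the mean squared error between activations captured for a self-referencing prompt and a matched other-referencing prompt (the MSE form is taken from the related `self_attn.o_proj` entry on this page). The function name `soo_loss` and the assumption that both activation tensors are token-aligned with shape `(batch, seq_len, d_model)` are hypothetical. NumPy is used only to show the arithmetic; an actual fine-tuning loop would compute the same quantity in a differentiable framework, for example on activations captured with a PyTorch forward hook.

```python
import numpy as np

def soo_loss(self_acts: np.ndarray, other_acts: np.ndarray) -> float:
    """Mean squared error between 'self' and 'other' activations.

    Hypothetical helper: both arrays are assumed to be (batch, seq_len, d_model)
    and aligned token-by-token. A lower value means the two representations
    overlap more.
    """
    if self_acts.shape != other_acts.shape:
        raise ValueError("self and other activations must have the same shape")
    # Average the squared element-wise difference over every dimension.
    return float(np.mean((self_acts - other_acts) ** 2))

# Usage: identical activations give zero loss; a unit offset everywhere gives 1.0.
a = np.zeros((1, 2, 3))
b = np.ones((1, 2, 3))
print(soo_loss(a, a), soo_loss(a, b))
```

During fine-tuning this term would typically be added, with some weight, to the model's ordinary training objective, so that minimizing it pulls the self and other representations together.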

Concepts (1)

concept

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
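The selection rule above (cosine similarity at or above 0.65, and no typed edge) can be sketched as follows. This is an illustration under assumptions: each entity is assumed to have a single embedding vector, typed edges are assumed to be stored as unordered `(name, name)` pairs, and the helper `similarity_candidates` is hypothetical rather than part of any documented API for this graph.

```python
import numpy as np

def similarity_candidates(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, cosine) pairs that are similar to `target` but not linked to it.

    embeddings: dict mapping entity name -> 1-D embedding vector (assumed).
    typed_edges: set of (name, name) tuples, treated as undirected (assumed).
    Results are sorted by descending cosine similarity.
    """
    t = embeddings[target] / np.linalg.norm(embeddings[target])
    out = []
    for name, vec in embeddings.items():
        # Skip the entity itself and anything already connected by a typed edge.
        if name == target or (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        cos = float(np.dot(t, vec / np.linalg.norm(vec)))
        if cos >= threshold:
            out.append((name, cos))
    return sorted(out, key=lambda pair: pair[1], reverse=True)

# Usage with toy 2-D embeddings: "d" is identical to "a" but already linked by an edge.
emb = {
    "a": np.array([1.0, 0.0]),
    "b": np.array([1.0, 0.1]),
    "c": np.array([0.0, 1.0]),
    "d": np.array([1.0, 0.0]),
}
result = similarity_candidates("a", emb, {("a", "d")})
print(result)
```

Only `b` survives: `c` is orthogonal to `a`, and `d`, although a likely duplicate, already has a typed edge and so is excluded.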

  • The specific four-step prompting protocol (induction, continuation, experiential query, classification) used in Experiment 1
  • Residual-stream activations extracted by prefilling the statement itself under the "Tell me about yourself" prompt; used for MDS/MDB vectors
  • The central experimental manipulation: directing a model to attend to its own cognitive activity
  • The specific implementation of SOO loss using MSE between self_attn.o_proj outputs at a specified layer
  • The minimal prompt directing models to 'focus on any focus itself' without invoking consciousness vocabulary; the main experimental manipulation
  • Activations · concept · 0.766
    Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
  • Selfing · concept · 0.762
    Process of reifying one's identity as an independent self; meditation practices aim to decrease selfing.
  • Model-independent feature comparison based on correlating activation vectors across a fixed diverse dataset
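The last entry above describes comparing features by correlating their activations over a fixed, shared dataset. A minimal sketch, assuming each feature is summarized as one activation value per input and that Pearson correlation is the comparison metric: because only activations on the same inputs are compared, the two models need not share weights or hidden sizes. The helpers `feature_correlations` and `best_matches` are hypothetical names for illustration.

```python
import numpy as np

def feature_correlations(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Pearson correlation between every feature of model A and every feature of model B.

    acts_a: (n_inputs, n_features_a), acts_b: (n_inputs, n_features_b); row i of
    both arrays must come from the same input in the fixed dataset.
    """
    if acts_a.shape[0] != acts_b.shape[0]:
        raise ValueError("both activation matrices must cover the same inputs")

    def zscore(x: np.ndarray) -> np.ndarray:
        std = x.std(axis=0)
        std[std == 0] = 1.0  # constant features get correlation 0 instead of NaN
        return (x - x.mean(axis=0)) / std

    # Mean of products of z-scores (population std) is the Pearson coefficient.
    return zscore(acts_a).T @ zscore(acts_b) / acts_a.shape[0]

def best_matches(corr: np.ndarray):
    """For each feature of A, the index of the most correlated feature of B and its score."""
    return corr.argmax(axis=1), corr.max(axis=1)

# Usage: model B's features are a permuted, rescaled copy of model A's.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
y = x[:, [2, 0, 1]] * 2.0 + 1.0
idx, vals = best_matches(feature_correlations(x, y))
print(idx, vals)
```

Correlation is invariant to the scale and offset applied to `y`, so each feature of `x` is matched to its permuted counterpart with a score of about 1.0.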