concept
active
Self-Referencing Activations (concept:self-referencing-activations)
Latent model activations when processing inputs framed from the model's own perspective
Neighborhood (ranked by edge count)
Methods (1)
- SOO Loss Function [about]: A loss function measuring the dissimilarity between the model's latent representations of self and other, minimized during fine-tuning
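One way to make "dissimilarity" concrete is mean squared error between paired self- and other-referencing activation vectors. This is the choice made by the MSE-based SOO implementation listed among the related entities below. The sketch that follows uses only plain Python, and its function name `soo_loss` is hypothetical. It assumes the activations have already been extracted (for example, from a hooked layer output such as `self_attn.o_proj`) and paired prompt by prompt. A real fine-tuning setup would compute the same quantity in an autodiff framework so that gradients can flow back into the model.

```python
from typing import Sequence


def soo_loss(
    self_acts: Sequence[Sequence[float]],
    other_acts: Sequence[Sequence[float]],
) -> float:
    """Hypothetical sketch: mean squared error between paired self/other activations.

    self_acts[i] and other_acts[i] are assumed to be activation vectors for the
    same prompt framed from the model's own perspective and from another agent's.
    """
    if len(self_acts) != len(other_acts):
        raise ValueError("each self-referencing vector needs an other-referencing partner")
    total = 0.0
    count = 0
    for s, o in zip(self_acts, other_acts):
        if len(s) != len(o):
            raise ValueError("paired activation vectors must have the same width")
        # Accumulate squared element-wise differences across every dimension.
        for a, b in zip(s, o):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("no activations supplied")
    # Average over all elements; lower values mean more self-other overlap.
    return total / count


print(soo_loss([[1.0, 2.0]], [[1.0, 0.0]]))  # prints 2.0
```

Minimizing this value during fine-tuning pushes the two kinds of representation toward each other. Identical activations give a loss of 0.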
Concepts (1)
- Other-Referencing Activations [associated_with]: Latent model activations when processing inputs framed from another agent's perspective
Related by similarity (8)
cosine ≥ 0.65 · no typed edge: entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The specific four-step prompting protocol (induction, continuation, experiential query, classification) used in Experiment 1
- Residual-stream activations extracted by prefilling the response with the statement itself under the "Tell me about yourself" prompt; used for the MDS/MDB vectors
- The central experimental manipulation: directing a model to attend to its own cognitive activity
- The specific implementation of SOO loss using MSE between self_attn.o_proj outputs at a specified layer
- The minimal prompt directing models to 'focus on any focus itself' without invoking consciousness vocabulary; the main experimental manipulation
- Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
- Process of reifying one's identity as an independent self; meditation practices aim to decrease selfing.
- Model-independent feature comparison based on correlating activation vectors across a fixed diverse dataset
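The last related entity compares features across models by correlating their activation vectors over a fixed, shared dataset. This works across models because the datapoints are common even when the representation spaces differ. The plain-Python sketch below represents each feature as its list of activations on the same N datapoints. It pairs each feature from one model with the feature from another model whose activation profile has the highest Pearson correlation. The function names `pearson` and `best_matches` are hypothetical, and the use of Pearson correlation is an assumption; the entry itself does not name a correlation measure.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two activation profiles over the same datapoints."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must cover the same datapoints (at least two)")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(vx * vy)


def best_matches(
    features_a: dict[str, list[float]],
    features_b: dict[str, list[float]],
) -> dict[str, tuple[str, float]]:
    """For each model-A feature, return the most correlated model-B feature and its r."""
    result: dict[str, tuple[str, float]] = {}
    for name_a, acts_a in features_a.items():
        # Score every candidate in model B on the shared dataset and keep the best.
        result[name_a] = max(
            ((name_b, pearson(acts_a, acts_b)) for name_b, acts_b in features_b.items()),
            key=lambda pair: pair[1],
        )
    return result


matches = best_matches({"f": [1.0, 2.0, 3.0]}, {"g": [3.0, 2.0, 1.0], "h": [1.0, 2.0, 4.0]})
print(matches["f"][0])  # prints h
```

Because only the activation profiles over the shared datapoints are compared, the two models may have different widths or architectures.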