Latent SOO Metric
method:latent-soo-metric · method · active
A metric that measures the mean squared error (MSE) between self-referencing and other-referencing activations, averaged across all hidden MLP and attention layers.
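A minimal sketch of the metric as described, not the authors' implementation. It assumes that activations for a self-referencing prompt and a matched other-referencing prompt have already been captured from each hidden MLP and attention layer and flattened into equal-length vectors, and that every layer is weighted equally. The names `layer_mse` and `latent_soo_metric` are hypothetical. Under this definition, a lower value means the self and other representations are closer, that is, greater overlap.

```python
from typing import Sequence


def layer_mse(self_act: Sequence[float], other_act: Sequence[float]) -> float:
    """MSE between flattened self- and other-referencing activations from one layer."""
    if len(self_act) != len(other_act) or len(self_act) == 0:
        raise ValueError("activation vectors must be non-empty and the same length")
    return sum((s - o) ** 2 for s, o in zip(self_act, other_act)) / len(self_act)


def latent_soo_metric(
    self_layers: Sequence[Sequence[float]],
    other_layers: Sequence[Sequence[float]],
) -> float:
    """Mean of the per-layer MSEs across all captured hidden layers.

    Assumption: one entry per hooked MLP/attention layer, in the same order
    for the self and other passes, with every layer weighted equally.
    """
    if len(self_layers) != len(other_layers) or len(self_layers) == 0:
        raise ValueError("need the same, non-zero number of layers for self and other")
    per_layer = [layer_mse(s, o) for s, o in zip(self_layers, other_layers)]
    return sum(per_layer) / len(per_layer)


if __name__ == "__main__":
    # Toy activations for two layers (hypothetical values, not real model data).
    self_layers = [[0.0, 0.0], [0.0, 0.0]]
    other_layers = [[1.0, 1.0], [0.0, 2.0]]
    print(latent_soo_metric(self_layers, other_layers))
```

In the toy example the per-layer MSEs are 1.0 and 2.0, so the metric is 1.5. Averaging per layer before averaging across layers keeps a layer with a large hidden size from dominating the score.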
Neighborhood (ranked by edge count)
Papers (1)
Concepts (1)
- Self-Other Overlap [implements]: The extent to which a model exhibits similar internal representations when reasoning about itself and others in similar contexts.
Claims (1)
- Neural self-other overlap provides a hard-to-fake metric for classifying deceptive vs. honest agents [supports]: Claim that SOO is particularly useful as a detection metric because it is based on latent representations rather than observable behavior.
Conceptual bridges
2-hop · via this method's ideas
Where ideas in this method connect to the rest of the corpus: the same concept, an analogy, or a restatement elsewhere.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Methods that use latent reasoning but lack task generalization and are difficult to train with autoregressive parallelization.
- Statistical regularities stored in pretrained models.
- Reasoning approach using learnable hidden embeddings.
- Baseline method using a single orthogonal matrix trained to map source latents to target latents via CL auxiliary loss without behavioral objective.
- Interpretable features extracted by sparse autoencoders, used as steering targets in this study.
- Substrate on which causal emergence was computed across agent lifetimes; aligned with learning success.
- Hidden or underdeveloped structures existing 'between the lines' of a configuration that can be enhanced and developed through harmony-seeking computation.
- Output of alignment map ϕ applied to DNN hidden states; basis for distributed causal abstraction.
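The "Related by similarity" rule stated above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a hypothetical illustration, not the site's implementation: the embedding dictionary, entity IDs, and `typed_neighbors` set are assumptions, and sorting results by descending similarity is also an assumption, since the page does not state how this list is ordered.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_by_similarity(
    target: str,
    embeddings: Dict[str, Sequence[float]],  # hypothetical entity-ID -> embedding map
    typed_neighbors: Set[str],               # entities already linked by a typed edge
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities with cosine >= threshold to target and no typed edge, most similar first."""
    target_vec = embeddings[target]
    results = []
    for entity_id, vec in embeddings.items():
        # Skip the entity itself and anything already connected by a typed relation.
        if entity_id == target or entity_id in typed_neighbors:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            results.append((entity_id, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

Entities that pass this filter are exactly the "candidates for new edges or unrecognized duplicates" described above, because any entity with a typed relation is excluded before scoring.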