concept
active
concept:natural-distribution-of-representations

Natural Distribution of Representations

The distribution of latent representations that the model produces on unperturbed inputs.

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • Core phenomenon studied: when causal interventions shift internal representations away from the natural distribution
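
One way to make "away from the natural distribution" concrete is to summarize activations collected on unperturbed inputs and then score how far an intervened activation falls from that summary. The sketch below is illustrative only and is not taken from the source. It assumes a Gaussian summary (mean and covariance) and uses Mahalanobis distance as the score. The random arrays stand in for real model activations, and the names `fit_natural_distribution`, `mahalanobis`, and the `ridge` parameter are hypothetical.

```python
import numpy as np


def fit_natural_distribution(acts, ridge=1e-3):
    """Summarize natural activations by their mean and inverse covariance.

    Assumption: a single Gaussian is an adequate summary. `ridge` is a
    hypothetical regularizer that keeps the covariance invertible.
    """
    mean = acts.mean(axis=0)
    cov = np.cov(acts, rowvar=False) + ridge * np.eye(acts.shape[1])
    return mean, np.linalg.inv(cov)


def mahalanobis(x, mean, cov_inv):
    """Distance of one activation vector from the fitted distribution."""
    diff = x - mean
    return float(np.sqrt(diff @ cov_inv @ diff))


rng = np.random.default_rng(0)

# Stand-in for latent representations recorded on unperturbed inputs
# (2000 samples, 8 hidden dimensions). Real activations would go here.
natural = rng.normal(size=(2000, 8))
mean, cov_inv = fit_natural_distribution(natural)

# A typical natural representation, and a copy with one coordinate
# overwritten by a large offset (a hypothetical intervention).
typical = natural[0]
intervened = typical.copy()
intervened[3] += 10.0

d_nat = mahalanobis(typical, mean, cov_inv)
d_int = mahalanobis(intervened, mean, cov_inv)
print(f"natural: {d_nat:.2f}  intervened: {d_int:.2f}")
```

A markedly larger distance for the intervened vector suggests the intervention has pushed the representation off-distribution. A Gaussian fit is a deliberately simple choice; real activation distributions may be multimodal or heavy-tailed, in which case a density model or nearest-neighbor score may be more appropriate.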

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The central question of whether representational geometry implies corresponding computational structure
  • The idea that information is spread across many neurons; superposition is a subtype.
  • The idea that features are encoded as directions in activation space.
  • Representations of one's own mental states; associated with consciousness in higher-order theories.
  • Representations where individual neurons play multiple conceptual roles; patterns consisting of linear combinations of unit vectors.
  • Probability distribution over discrete states or outcomes.
  • Core contribution: the impasse where lifting the linearity constraint on alignment maps makes causal abstraction vacuous, while keeping it may miss non-linearly encoded features
  • The evolution of an agent's latent representations over the course of training, shown to align with reward improvement when causal emergence is high.