concept
active
concept:latent-variables-in-distributed-abstraction
Latent Variables in Distributed Abstraction
Output of alignment map ϕ applied to DNN hidden states; basis for distributed causal abstraction
Neighborhood — ranked by edge-count
Concepts (1)
- Alignment Map (ϕ) · concept · associated_with · The bijective function mapping DNN inner neurons to latent variables in causal abstraction; its complexity is the central variable studied
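The relation between ϕ and the latent variables can be illustrated with a minimal sketch. It assumes, purely for illustration, that ϕ is a linear orthogonal map (one simple kind of bijection) and that the latent variables are contiguous blocks of the transformed hidden state. The names make_alignment_map, latent_variables, and block_sizes are hypothetical and do not come from the source, which does not specify the form of ϕ.

```python
import numpy as np


def make_alignment_map(dim, seed=0):
    """Build an illustrative bijective alignment map as a random orthogonal matrix.

    Assumption: phi is linear and orthogonal. The source only states that
    phi is bijective; richer (e.g. non-linear) maps are possible.
    """
    rng = np.random.default_rng(seed)
    # The Q factor of a QR decomposition of a square matrix is orthogonal.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q


def latent_variables(hidden, phi, block_sizes):
    """Apply phi to hidden states, then split the result block-wise.

    hidden:      array of shape (batch, dim), the DNN hidden states
    phi:         array of shape (dim, dim), the alignment map
    block_sizes: sizes of the latent-variable blocks; must sum to dim
    """
    if sum(block_sizes) != hidden.shape[-1]:
        raise ValueError("block sizes must sum to the hidden dimension")
    transformed = hidden @ phi
    split_points = np.cumsum(block_sizes)[:-1]
    return np.split(transformed, split_points, axis=-1)


# Toy hidden states: 4 examples, 6 neurons.
hidden = np.random.default_rng(1).standard_normal((4, 6))
phi = make_alignment_map(6)
latents = latent_variables(hidden, phi, block_sizes=[2, 3, 1])

# Because phi is bijective, the hidden states can be recovered exactly
# (up to floating-point error) from the latent variables.
recovered = np.concatenate(latents, axis=-1) @ phi.T
```

Each element of latents plays the role of one latent variable. Invertibility is what allows a value set on a latent variable to be mapped back into the network's hidden space.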
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Key notion where alignment map ϕ maps neurons block-wise to latent variables before constructive abstraction
- Entities that become visible as centers in a configuration (e.g., rectangles of white space around a dot) that were not present before.
- Reasoning approach using learnable hidden embeddings.
- Statistical regularities stored in pretrained models.
- Idea that information is spread across many neurons; superposition is a subtype.
- Methods that use latent reasoning; lack task generalization and are difficult to train with autoregressive parallelization.
- A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
- The categorical representation produced by the VAE encoder; used as input to the self-prior and policy networks