Latent Variables in Distributed Abstraction

Output of alignment map ϕ applied to DNN hidden states; basis for distributed causal abstraction

Neighborhood — ranked by edge-count

concept

Alignment Map (ϕ)
associated_with
The bijective function mapping DNN inner neurons to latent variables in causal abstraction; its complexity is the central variable studied
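A minimal sketch of the idea, under stated assumptions: the entry only says that ϕ is bijective and that neurons are mapped to latent variables. Here ϕ is assumed to be an orthogonal linear map (one low-complexity choice among many), and the block sizes, the helper names `align` and `unalign`, and the toy hidden state are all hypothetical.

```python
import math

def rotation_matrix(n, i, j, theta):
    """n x n Givens rotation in the (i, j) plane; orthogonal, hence bijective."""
    m = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    m[i][i] = math.cos(theta)
    m[j][j] = math.cos(theta)
    m[i][j] = -math.sin(theta)
    m[j][i] = math.sin(theta)
    return m

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def align(phi, hidden, block_sizes):
    """Apply phi to a hidden state, then split the result block-wise
    into latent variables (one block per latent variable)."""
    z = matvec(phi, hidden)
    if sum(block_sizes) != len(z):
        raise ValueError("block sizes must cover the whole hidden state")
    blocks, start = [], 0
    for size in block_sizes:
        blocks.append(z[start:start + size])
        start += size
    return blocks

def unalign(phi, blocks):
    """Invert the map: concatenate the blocks and apply phi^T.
    This is only valid because phi is assumed orthogonal."""
    z = [x for block in blocks for x in block]
    return matvec(transpose(phi), z)

# Assumption: a 4-neuron hidden state split into two 2-dimensional latents.
phi = rotation_matrix(4, 0, 2, math.pi / 6)
hidden = [1.0, 2.0, 3.0, 4.0]
latents = align(phi, hidden, block_sizes=[2, 2])
recovered = unalign(phi, latents)
```

Because the assumed ϕ is invertible, `recovered` matches `hidden` up to floating-point error, which is what bijectivity buys: nothing about the hidden state is lost when it is re-expressed as latent variables. A more complex ϕ (for example a nonlinear invertible network) would slot into the same `align`/`unalign` interface.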

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Distributed Abstraction · concept · 0.843
Key notion where alignment map ϕ maps neurons block-wise to latent variables before constructive abstraction
Latent entities · concept · 0.796
Entities that become visible as centers in a configuration (e.g., rectangles of white space around a dot) that were not present before.
latent reasoning · concept · 0.791
Reasoning approach using learnable hidden embeddings.
latent patterns · concept · 0.767
Statistical regularities stored in pretrained models.
Distributed representation · concept · 0.767
Idea that information is spread across many neurons; superposition is a subtype.
latent methods · concept · 0.766
Methods that use latent reasoning; lack task generalization and are difficult to train with autoregressive parallelization.
Causal abstraction · concept · 0.761
A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
Discrete Latent State · concept · 0.761
The categorical representation produced by the VAE encoder; used as input to the self-prior and policy networks
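A list like the one above could be produced by a filter along the following lines. This is a hedged sketch, not the page's actual implementation: the function name `untyped_neighbors`, the toy 2-dimensional embeddings, and the edge table are hypothetical; only the 0.65 cosine threshold and the "no typed edge" condition come from the page.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs whose cosine similarity to `target`
    is at least `threshold` and that have no typed edge to it,
    sorted by descending score."""
    linked = typed_edges.get(target, set())
    results = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: real entity embeddings would be high-dimensional.
embeddings = {
    "A": [1.0, 0.0],
    "B": [1.0, 0.5],   # similar to A, no typed edge -> candidate
    "C": [0.0, 1.0],   # orthogonal to A -> below threshold
    "D": [1.0, 0.1],   # similar to A, but already linked by a typed edge
}
typed_edges = {"A": {"D"}}
candidates = untyped_neighbors("A", embeddings, typed_edges)
```

With this toy data only "B" survives both conditions: "C" falls below the threshold and "D" is excluded because it already has a typed relation to "A".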