concept
active
concept:latent-variables-in-distributed-abstraction

Latent Variables in Distributed Abstraction

Output of alignment map ϕ applied to DNN hidden states; basis for distributed causal abstraction
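A minimal sketch of "latent variables as the output of ϕ applied to hidden states". The page only says ϕ is bijective. Here ϕ is assumed to be a 4x4 orthogonal matrix, so its inverse is simply its transpose. The helpers `make_phi` and `latent_blocks`, the rotation angle, the toy hidden state, and the split into two 2-dimensional latent variables are all hypothetical illustrations. The rotation makes each latent block depend on all four neurons, which is the sense in which the latent variables are distributed.

```python
import math

def make_phi(theta):
    """Hypothetical orthogonal alignment map: rotates neuron pairs (0, 2) and (1, 3) by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, 0.0, -s, 0.0],
        [0.0, c, 0.0, -s],
        [s, 0.0, c, 0.0],
        [0.0, s, 0.0, c],
    ]

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def latent_blocks(phi, h, block_sizes):
    """Apply phi to hidden state h, then split z = phi(h) into contiguous blocks (one per latent variable)."""
    z = matvec(phi, h)
    blocks, start = [], 0
    for size in block_sizes:
        blocks.append(z[start:start + size])
        start += size
    return blocks

phi = make_phi(math.pi / 6)
h = [1.0, 2.0, 3.0, 4.0]            # toy hidden state of four neurons
Z = latent_blocks(phi, h, [2, 2])   # two latent variables, each 2-dimensional
# Block Z[0] mixes neurons (0, 2) and (1, 3), so it depends on all four neurons.

# phi is orthogonal, so phi^{-1} = phi^T recovers the neurons from the latents.
z = [x for block in Z for x in block]
h_back = matvec(transpose(phi), z)
print(Z)
print(h_back)
```

Orthogonality is only one convenient way to guarantee bijectivity. Any invertible map would do, but the inverse would then have to be computed explicitly.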

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • Alignment Map (ϕ)
    associated_with
    The bijective function mapping DNN inner neurons to latent variables in causal abstraction; its complexity is the central variable studied
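Bijectivity matters because it lets an edit in latent space be mapped back to neuron space. The sketch below uses a distributed interchange intervention, a standard causal-abstraction operation that this page does not itself describe, so its use here is an assumption. One latent block of a base hidden state is replaced with the same block from a source hidden state, and the result is mapped back with ϕ⁻¹. The orthogonal `make_phi`, the `interchange` helper, and the toy states are hypothetical.

```python
import math

def make_phi(theta):
    """Hypothetical orthogonal alignment map: rotates neuron pairs (0, 2) and (1, 3) by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, 0.0, -s, 0.0],
        [0.0, c, 0.0, -s],
        [s, 0.0, c, 0.0],
        [0.0, s, 0.0, c],
    ]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def interchange(phi, base, source, block, block_sizes):
    """Swap latent block `block` of base for the source's, then map back via phi^{-1} = phi^T."""
    z_base = matvec(phi, base)
    z_source = matvec(phi, source)
    start = sum(block_sizes[:block])
    stop = start + block_sizes[block]
    z_new = z_base[:start] + z_source[start:stop] + z_base[stop:]
    return matvec(transpose(phi), z_new)

phi = make_phi(math.pi / 6)
base = [1.0, 2.0, 3.0, 4.0]
source = [-1.0, 0.5, 2.0, 0.0]
patched = interchange(phi, base, source, block=0, block_sizes=[2, 2])
print(patched)
```

With `theta = 0` the map is the identity, and the intervention reduces to copying neurons 0 and 1 from the source. A nonzero angle makes the same intervention touch all four neurons.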

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Key notion in which the alignment map ϕ maps neurons block-wise to latent variables before constructive abstraction
  • Latent entities · concept · 0.796
    Entities that become visible as centers in a configuration (e.g., rectangles of white space around a dot) that were not present before.
  • latent reasoning · concept · 0.791
    Reasoning approach using learnable hidden embeddings.
  • latent patterns · concept · 0.767
    Statistical regularities stored in pretrained models.
  • Idea that information is spread across many neurons; superposition is a subtype.
  • latent methods · concept · 0.766
    Methods that use latent reasoning; they lack task generalization and are difficult to train with autoregressive parallelization.
  • Causal abstraction · concept · 0.761
    A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
  • The categorical representation produced by the VAE encoder; used as input to the self-prior and policy networks
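The "Related by similarity" criterion (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter over embedding vectors. The page does not say which embedding model produces the scores. The vectors, the `similar_untyped` helper, and the candidate names below are hypothetical toy data, not the graph's actual embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_untyped(target, candidates, typed_neighbors, threshold=0.65):
    """Return (name, score) pairs with score >= threshold and no typed edge, highest score first."""
    results = []
    for name, vec in candidates.items():
        if name in typed_neighbors:   # already linked by a typed edge: not a candidate
            continue
        score = cosine(target, vec)
        if score >= threshold:
            results.append((name, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results

target = [1.0, 0.0, 0.0]
candidates = {
    "alignment-map": [0.9, 0.1, 0.0],   # typed neighbor, excluded
    "a": [1.0, 0.2, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [1.0, 1.0, 0.0],
}
ranked = similar_untyped(target, candidates, typed_neighbors={"alignment-map"})
print(ranked)
```

Excluding typed neighbors before scoring keeps this list limited to candidates for new edges or possible duplicates.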