Change-of-Basis for Neural Representations

Key insight: rotating a neural representation into a non-standard basis can reveal distributed causal structure that is invisible in the standard neuron-aligned basis.
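A minimal sketch of this idea, under stated assumptions: a hypothetical 2-neuron hidden state encodes two variables `a` and `b` along directions rotated 45 degrees away from the neuron axes. The names `encode` and `interchange`, the 45-degree angle, and the input values are all illustrative and do not come from the source. Swapping a single neuron between two inputs mixes both variables, while swapping one coordinate in the rotated basis replaces exactly one of them.

```python
import numpy as np

def rotation(theta):
    """2-D rotation matrix; its columns form an orthonormal basis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Assumption: the variables live along axes rotated 45 degrees from the neurons.
R = rotation(np.pi / 4)

def encode(a, b):
    """Hypothetical encoder: hidden state = R @ (a, b)."""
    return R @ np.array([a, b])

def interchange(base, source, basis, dims):
    """Overwrite coordinates `dims` of `base` with those of `source`,
    where coordinates are measured in the orthonormal `basis`."""
    zb = basis.T @ base      # rotate into the chosen basis
    zs = basis.T @ source
    zb[dims] = zs[dims]      # swap the selected coordinates
    return basis @ zb        # rotate back to neuron space

base = encode(1.0, 2.0)
source = encode(5.0, 7.0)

# Intervention in the neuron-aligned (standard) basis: swap neuron 0.
patched_std = interchange(base, source, np.eye(2), [0])
# Intervention in the rotated basis: swap coordinate 0.
patched_rot = interchange(base, source, R, [0])

# Read the variables back out with the true decoder R.T.
decoded_std = R.T @ patched_std   # both a and b are disturbed
decoded_rot = R.T @ patched_rot   # a comes from source, b is untouched
print(decoded_std, decoded_rot)
```

In this toy setting the neuron-aligned swap yields a state that matches neither input's `a` nor the base input's `b`, whereas the rotated-basis swap cleanly transplants `a`.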

Neighborhood — ranked by edge-count

Methods (1)

method

Distributed Alignment Search
implements
The core method introduced in this paper: finds alignments between high-level causal variables and distributed neural representations via gradient descent.
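A toy illustration of searching for such an alignment by gradient descent, not the paper's implementation. Assumptions that are not in the source: the hidden state has 2 dimensions, the candidate basis is a single rotation angle `theta`, the gradient is estimated by finite differences rather than backpropagation, and a fixed linear decoder stands in for the rest of the network. The loss asks that swapping coordinate 0 in the candidate basis reproduce the counterfactual "variable `a` from the source input, variable `b` from the base input".

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(0)
TRUE_THETA = np.pi / 4                  # hypothetical ground truth, unknown to the search
R_true = rotation(TRUE_THETA)

# High-level variable values (a, b) for base and source inputs, one row each.
base_vars = rng.normal(size=(256, 2))
source_vars = rng.normal(size=(256, 2))
base_h = base_vars @ R_true.T           # hidden states: h = R_true @ (a, b)
source_h = source_vars @ R_true.T

# Counterfactual target: a taken from the source, b kept from the base.
target_vars = np.column_stack([source_vars[:, 0], base_vars[:, 1]])

def loss(theta):
    """Mean squared error of the interchange intervention in basis rotation(theta)."""
    Q = rotation(theta)
    zb = base_h @ Q                     # coordinates in the candidate basis
    zs = source_h @ Q
    zb[:, 0] = zs[:, 0]                 # swap candidate coordinate 0
    patched_h = zb @ Q.T                # back to neuron space
    decoded = patched_h @ R_true        # stand-in for the downstream network
    return np.mean((decoded - target_vars) ** 2)

# Plain gradient descent on theta with a central finite-difference gradient.
theta, lr, eps = 0.2, 0.1, 1e-5
for _ in range(300):
    grad = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)
    theta -= lr * grad

print(theta, TRUE_THETA, loss(theta))
```

A realistic version would instead learn an orthogonal matrix over a high-dimensional hidden layer and score interventions by the actual model's outputs; the one-parameter rotation here only keeps the sketch short.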

Concepts (1)

concept

Distributed Neural Representations
associated_with
Representations where individual neurons play multiple conceptual roles; patterns consisting of linear combinations of unit vectors.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Smolensky (1986) proposes that viewing a neural representation under a basis that is not aligned with individual neurons can reveal the interpretable distributed structure of the neural representations. · quote · 0.779
Load-bearing theoretical claim providing the conceptual foundation for DAS.
Neural Representations of Location Composed of Spatially Periodic Bands (Krupic et al., 2012) · concept · 0.773
Discovery of band cells; TEM-t also recapitulates these representations.
Neural representation geometry causally shapes behavior; interventions respecting that geometry will yield natural trajectories. · hypothesis · 0.771
Central hypothesis tested via manifold steering experiments across language models and video world models.
Neural Representation Geometry · concept · 0.770
The broader conceptual framework that neural activations exhibit non-Euclidean geometric structure causally linked to behavior.
Geometric structure of neural representations causally shapes model behavior · claim · 0.769
The paper's core causal assertion: geometry is not incidental but mechanistically linked to behavior.
Neural code · concept · 0.768
The model's parameters considered as the actual 'code' implementing its algorithms, as opposed to human-written code.
Do divergent representations change what an intervention can say about an NN's natural mechanisms? · question · 0.761
Core research question motivating the paper.
neural substrates · concept · 0.760
Brain-based physical implementations of consciousness-related functions, assumed by many ToCs to be exclusive.