Privileged Bases Hypothesis

The hypothesis that neurons form privileged bases for encoding information; consistent with constructive abstraction

Neighborhood — ranked by edge-count

Papers (1)

paper

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
introduces

Frameworks (1)

framework

Linear Representation Hypothesis
extends
The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior

Methods (1)

method

Identity Alignment Map (ϕ_id)
implements
The simplest alignment map, ϕ(h) = h; equivalent to assuming the privileged bases hypothesis
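
A minimal sketch of why the identity map amounts to assuming privileged bases, under stated assumptions: reading a variable through ϕ(h) = h only works if that variable already sits on a neuron axis. The two-dimensional hidden states, the rotation matrix, and the helper names (`phi_id`, `make_phi_linear`, `read_variable`) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def phi_id(h: Sequence[float]) -> Vector:
    """Identity alignment map: phi(h) = h, i.e. no change of basis."""
    return list(h)


def make_phi_linear(matrix: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], Vector]:
    """Hypothetical linear alignment map phi(h) = M h, shown for contrast."""
    def phi(h: Sequence[float]) -> Vector:
        return [sum(m_ij * h_j for m_ij, h_j in zip(row, h)) for row in matrix]
    return phi


def read_variable(h: Sequence[float], phi: Callable[[Sequence[float]], Vector], index: int) -> float:
    """Read a high-level variable from coordinate `index` of phi(h)."""
    return phi(h)[index]


# Toy case 1 (assumed): the feature value 2.0 is stored directly in neuron 0.
h_axis = [2.0, 0.0]

# Toy case 2 (assumed): the same value is stored along the direction (0.6, 0.8).
h_rotated = [2.0 * 0.6, 2.0 * 0.8]

# A rotation that maps the direction (0.6, 0.8) onto axis 0.
phi_linear = make_phi_linear([[0.6, 0.8], [-0.8, 0.6]])

# The identity map recovers the value only in the axis-aligned case.
print(read_variable(h_axis, phi_id, 0))
print(read_variable(h_rotated, phi_id, 0))

# The linear map recovers it (up to floating-point error) in the rotated case.
print(read_variable(h_rotated, phi_linear, 0))
```

In this toy setting, ϕ_id reads the variable correctly only when it is neuron-aligned, whereas a linear map can also recover it from an arbitrary direction; that is the sense in which the identity map presupposes a privileged basis.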

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Privileged Basis · concept · 0.823
A property of activations where neural network features align with basis dimensions due to sparse activation functions; absent in the residual stream but present in tokens, attention patterns, and MLP activations
Early causal abstraction methods (Geiger et al. 2021) implicitly rely on the privileged bases hypothesis, while recent methods (Geiger et al. 2024b) rely on the linear representation hypothesis · claim · 0.766
Historical framing of how representation assumptions have evolved in causal interpretability
Genesis Hypothesis · framework · 0.750
The conjecture that consciousness does not result from the organized mind but creates and maintains complex models of reality; forms at the beginning of mental development
Reward Hypothesis · concept · 0.748
The claim in RL that any goal can be expressed as maximizing the expected cumulative sum of a scalar reward signal.
Capacity Hypothesis · hypothesis · 0.739
Bigger models are more likely to converge to a shared representation than smaller models because they can better approximate the global optimum
Privileged self-access · concept · 0.731
Models predict their own hypothetical behavior better than other models can, demonstrating a form of privileged self-access per Binder et al. 2024
Superposition Hypothesis · framework · 0.731
Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition
No Privileged Material Substrate for Mind · claim · 0.729
Minds are not exclusively neural; basal cognition identifies intelligences in single cells, plants, tissues, and swarms; brains pre-date neurons evolutionarily.