Privileged Bases Hypothesis
ID: concept:privileged-bases-hypothesis · Type: concept · Status: active
The hypothesis that neurons form privileged bases for encoding information, i.e., that features align with individual neuron directions; consistent with constructive abstraction.
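A basis is "privileged" when the network's own operations treat the coordinate axes differently from other directions, as an elementwise activation function does. The minimal sketch below assumes a 2-D toy activation vector and uses ReLU as the elementwise function (both are illustrative choices, not taken from this page). It checks that a layer with no nonlinearity commutes with a rotation while ReLU does not, so only the nonlinear layer singles out the neuron axes.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def rotate(x: Sequence[float], theta: float) -> Vector:
    """Rotate a 2-D vector by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]


def relu(x: Sequence[float]) -> Vector:
    """Elementwise ReLU: acts on each neuron coordinate independently."""
    return [max(0.0, v) for v in x]


def identity(x: Sequence[float]) -> Vector:
    """No nonlinearity: every direction is treated the same way."""
    return list(x)


def commutes_with_rotation(f: Callable[[Sequence[float]], Vector],
                           x: Sequence[float], theta: float,
                           tol: float = 1e-9) -> bool:
    """True if f(R x) == R f(x) up to tol, for this particular x and theta."""
    lhs = f(rotate(x, theta))
    rhs = rotate(f(x), theta)
    return all(abs(a - b) <= tol for a, b in zip(lhs, rhs))


x = [1.0, -1.0]
theta = math.pi / 4
print(commutes_with_rotation(identity, x, theta))  # True
print(commutes_with_rotation(relu, x, theta))      # False
```

A single counterexample is enough to show that ReLU is not rotation-equivariant; the `True` result for `identity` holds for every input, since both sides reduce to the same rotation.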
Neighborhood (ranked by edge count)
Papers (1)
Frameworks (1)
- The hypothesis that models internalize concepts as approximately linear directions in representation space; used to interpret MDS injection behavior
Methods (1)
- Identity Alignment Map (ϕ_id) · implements: the simplest alignment map, ϕ(h) = h, equivalent to assuming the privileged bases hypothesis
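One way to see why ϕ(h) = h amounts to the privileged bases hypothesis is to plug it into an interchange intervention, the operation alignment maps are typically used for in causal-abstraction work. This is a sketch under stated assumptions: `interchange`, `phi_id_inverse`, and the toy vectors are hypothetical, and the intervention procedure is an assumption about how the map is used, not something this page specifies.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def phi_id(h: Sequence[float]) -> Vector:
    """Identity alignment map phi(h) = h: neuron i is read directly as coordinate i."""
    return list(h)


def phi_id_inverse(z: Sequence[float]) -> Vector:
    """The identity map is its own inverse."""
    return list(z)


def interchange(base: Sequence[float], source: Sequence[float],
                dims: Sequence[int],
                phi: Callable[[Sequence[float]], Vector] = phi_id,
                phi_inv: Callable[[Sequence[float]], Vector] = phi_id_inverse) -> Vector:
    """Copy aligned coordinates `dims` from source into base, then map back.

    With phi = identity this swaps raw neuron values, i.e. it assumes each
    high-level variable lives on specific neurons (a privileged basis).
    """
    z_base = phi(base)
    z_source = phi(source)
    for d in dims:
        z_base[d] = z_source[d]
    return phi_inv(z_base)


patched = interchange([0.1, 0.2, 0.3], [9.0, 8.0, 7.0], dims=[1])
print(patched)  # [0.1, 8.0, 0.3]
```

Swapping in a non-identity `phi` (for example, a learned rotation) would let the aligned coordinates be arbitrary directions rather than single neurons, which is exactly the assumption ϕ_id drops.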
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- A property of activations where neural network features align with basis dimensions due to sparse activation functions; absent in the residual stream but present in tokens, attention patterns, and MLP activations
- Historical framing of how representation assumptions have evolved in causal interpretability
- The conjecture that consciousness does not result from the organized mind but creates and maintains complex models of reality; on this view, consciousness forms at the beginning of mental development
- The claim in RL that any goal can be expressed as maximizing the expected cumulative sum of a scalar reward signal.
- Bigger models are more likely to converge to a shared representation than smaller models because they can better approximate the global optimum
- Models predict their own hypothetical behavior better than other models can, demonstrating a form of privileged self-access (Binder et al., 2024)
- Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition
- Minds are not exclusively neural: basal cognition identifies intelligences in single cells, plants, tissues, and swarms; brains predate neurons evolutionarily.
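The "Related by similarity" criterion (cosine ≥ 0.65 with no typed edge) can be sketched as a simple filter. This assumes each entity has an embedding vector and that typed edges are stored as unordered pairs; `similarity_candidates`, the toy embeddings, and the descending-score ordering are hypothetical, not the knowledge graph's actual implementation.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def similarity_candidates(target: str,
                          embeddings: Dict[str, Sequence[float]],
                          typed_edges: Set[FrozenSet[str]],
                          threshold: float = 0.65) -> List[Tuple[str, float]]:
    """Entities with cosine >= threshold to `target` and no typed edge to it."""
    t = embeddings[target]
    hits = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        if frozenset((target, name)) in typed_edges:
            continue  # already linked by a typed relation, so not a candidate
        score = cosine(t, vec)
        if score >= threshold:
            hits.append((name, score))
    hits.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return hits


embeddings = {
    "privileged-bases": [1.0, 0.0],
    "privileged-basis-property": [1.0, 0.1],
    "reward-hypothesis": [0.0, 1.0],
    "identity-alignment-map": [1.0, 0.0],
}
edges = {frozenset(("privileged-bases", "identity-alignment-map"))}
result = similarity_candidates("privileged-bases", embeddings, edges)
print(result)
```

Here `identity-alignment-map` is excluded despite a perfect cosine score because it already has a typed edge, and `reward-hypothesis` falls below the threshold; only the near-duplicate `privileged-basis-property` survives as a candidate.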