claim
active
claim:cross-layer-superposition-is-a-fundamental-challenge-for-dictionary-learning
Cross-layer superposition is a fundamental challenge for dictionary learning.
Features smeared across layers cannot be fully disentangled by a sparse autoencoder (SAE) trained on a single layer's residual stream.
Source paper
extracted_from
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Representation of features spread across multiple layers, complicating dictionary learning.
- Architectural requirement from machine learning.
- Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions. (hypothesis · 0.767) Explanation for why dictionary learning can recover many more features than dimensions.
- Interpretation of the cars-in-superposition circuit finding as an intentional representational strategy.
- Argues against the single-layer analysis approach of prior work.
- Methodological critique of prior work that fixed a single layer for truth probing.
- SAE features can be found without pre-specified concepts, and feature steering often outperforms few-shot probe vectors.
- Single dendritic layer solves XOR-like problems with capacity matching 8-layer deep networks. (finding · 0.755) Evidence from Beniaguev et al. (2021) that individual biological neurons vastly outperform the McCulloch-Pitts model; supports hybrid computation claim.