concept
active
concept:4-layer-67m-parameter-model
4-layer 67M-parameter model
A 4-layer, 67M-parameter transformer language model trained on The Pile; the model on which VPD is demonstrated to scale.
Neighborhood — ranked by edge-count
Findings (1)
finding
- Empirical demonstration of VPD on a mid-scale transformer, establishing feasibility.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Claim that geometry-to-behavior correlates exist
- The layer approximately two-thirds through an LLM's transformer stack, reported to best predict human brain activity; identified as promising for consciousness indicators.
- Layers where anchoring weakens systematically due to representational drift.
- Median layer where S(ℓ) peaks, across seeds.
- A 1:50 scale model used for overall design simulation of the Athens Megaron spaces and floors.
- Validates representational drift theory: later layers specialize for next-token prediction, increasing dr
- Decomposition of all 24 weight matrices in a 67M-parameter LM yields ~10,000 parameter subcomponents · finding · 0.687 · Quantitative result of VPD application; the network's 24 matrices decompose into approximately 10,000 rank-one subcomponents.
- Empirical observation that steering efficiency peaks at middle transformer layers, consistent with emotion representation literature
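The similarity list above is described as entities with cosine ≥ 0.65 and no typed edge to this one. A minimal sketch of how such a filter could work is given below. It is an assumption about the mechanism, not the tool's actual implementation: the function names, the toy two-dimensional embeddings, and the entity IDs are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold that have
    no typed edge to the target, sorted by descending similarity."""
    target_vec = embeddings[target_id]
    results = []
    for entity_id, vec in embeddings.items():
        if entity_id == target_id:
            continue
        # Skip entities already linked to the target by a typed relation.
        if frozenset((target_id, entity_id)) in typed_edges:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            results.append((entity_id, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Hypothetical toy data: IDs and vectors are illustrative only.
embeddings = {
    "model": [1.0, 0.0],
    "decomposition": [0.8, 0.6],
    "scale-model": [0.0, 1.0],
    "finding": [1.0, 0.1],
}
# "finding" already has a typed edge to "model", so it is excluded.
typed_edges = {frozenset(("model", "finding"))}

neighbors = untyped_neighbors("model", embeddings, typed_edges)
```

In this toy setup only "decomposition" is returned. "finding" is similar but already linked by a typed edge, and "scale-model" falls below the threshold. A real graph would use higher-dimensional text embeddings, where lexical overlap on words such as "model" or "layer" can surface unrelated neighbors.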