4-layer 67M-parameter model

The 4-layer, 67M-parameter model, trained on The Pile, to which VPD is shown to scale.

Neighborhood — ranked by edge-count

finding

VPD scales to a 4-layer 67M-parameter model trained on The Pile.
cites
Empirical demonstration of VPD on a mid-scale transformer, establishing feasibility.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Layer-wise geometry summaries (Sbmax, AUSN) predict internal few-shot thresholds θ50 · claim · 0.715
Claim that geometry-to-behavior correlates exist
2/3 Layer of LLM · concept · 0.715
The layer approximately two-thirds through an LLM's transformer stack, reported to best predict human brain activity; identified as promising for consciousness indicators.
deeper layers (16–28) · concept · 0.709
Layers where anchoring weakens systematically due to representational drift.
Peak layer ℓ* median 10, IQR 0.384 · finding · 0.699
Median layer where S(ℓ) peaks, across seeds.
1:50 physical model · method · 0.690
A 1:50 scale model used for overall design simulation of the Athens Megaron spaces and floors.
Systematic layer 20-28 degradation in S(ℓ) to S ≈ −2.40 by layer 27 on LLaMA · finding · 0.689
Validates representational drift theory: later layers specialize for next-token prediction, increasing dr
Decomposition of all 24 weight matrices in a 67M-parameter LM yields ~10,000 parameter subcomponents · finding · 0.687
Quantitative result of VPD application; the network's 24 matrices decompose into approximately 10,000 rank-one subcomponents.
Mid-Layer Emotion Representation Peak · concept · 0.685
Empirical observation that steering efficiency peaks at middle transformer layers, consistent with the emotion representation literature.
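
The "cosine ≥ 0.65 · no typed edge" filter behind this list can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that the set of entities already linked by a typed edge is known. The function name `untyped_neighbors`, the toy vectors, and the entity ids are hypothetical and are not taken from the system that produced this page.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def untyped_neighbors(query_vec, candidates, typed_ids, threshold=0.65):
    """Return (entity_id, cosine) pairs that clear the threshold and have
    no typed edge to the query entity, sorted by descending similarity."""
    hits = []
    for entity_id, vec in candidates.items():
        if entity_id in typed_ids:
            continue  # already connected by a typed relation, so not listed here
        score = cosine(query_vec, vec)
        if score >= threshold:
            hits.append((entity_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical): entity "a" already has a typed edge to the query.
query = [1.0, 0.0]
candidates = {
    "a": [1.0, 0.0],
    "b": [1.0, 1.0],
    "c": [0.0, 1.0],
    "d": [2.0, 0.5],
}
result = untyped_neighbors(query, candidates, typed_ids={"a"})
for entity_id, score in result:
    print(f"{entity_id} {score:.3f}")
```

In this toy run, "a" is skipped despite a perfect match because it already has a typed edge, and "c" falls below the threshold. The remaining entries are the candidates for new edges or unrecognized duplicates.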