community
active
leiden_hybrid_concepts
label: haiku
community:leiden_hybrid_concepts-run4-c11-c1
Few-shot learning phase transitions in neural networks
Empirical characterization of k50 midpoints and transition widths across transformer models, tracking how pretraining density ρd/dr predicts in-context learning thresholds.
9 members.
Drawn from 1 source
The papers/notes whose extracted claims & findings make up this cluster.
Bridges (3)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
Findings (5)
- B10 phase width Δk = 1.21 ± 0.18. Transition width (k90 − k10) for B10.
- B10 shot midpoint k50 = 0.28 ± 0.05 shots with accuracy 94.8 ± 1.2%. Lowest threshold condition in E2; near-zero/one-shot threshold consistent with high pretraining density.
- B9 phase width (k90 − k10) = 3.74 ± 0.31 shots. Widest transition in E2; consistent with lower prior density requiring more shots for reliable threshold crossing.
- k50 for base-10 two-digit addition: 0.28 ± 0.05 shots. Shot midpoint from logistic fit over 10 runs.
- k50 ordering: B10 (0.28) < B8 (1.83) < B9 (2.91) follows pretraining density. Monotone ordering consistent with k50 ∝ dr/ρd.
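The findings above report a shot midpoint k50 "from logistic fit" and a transition width k90 − k10. A minimal sketch of how both quantities can be read off a logistic curve follows. It assumes a two-parameter logistic p(k) = 1 / (1 + exp(−(k − k50) / scale)) with accuracies already rescaled to lie strictly between 0 and 1, and it fits by least squares on the logit, which may differ from the fitting procedure actually used in the source. The function names and the synthetic data are hypothetical.

```python
import math


def logistic(k, k50, scale):
    """Assumed curve: accuracy as a function of shot count k."""
    return 1.0 / (1.0 + math.exp(-(k - k50) / scale))


def fit_logistic(shots, accuracy):
    """Estimate (k50, scale) by ordinary least squares on logit(accuracy).

    logit(p) = (k - k50) / scale is linear in k, so slope = 1 / scale
    and intercept = -k50 / scale. Requires 0 < accuracy < 1.
    """
    logits = [math.log(p / (1.0 - p)) for p in accuracy]
    n = len(shots)
    mean_k = sum(shots) / n
    mean_z = sum(logits) / n
    sxx = sum((k - mean_k) ** 2 for k in shots)
    sxz = sum((k - mean_k) * (z - mean_z) for k, z in zip(shots, logits))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_k
    return -intercept / slope, 1.0 / slope


def transition_width(scale):
    """k90 - k10 for the assumed logistic: 2 * scale * ln(9)."""
    return 2.0 * scale * math.log(9.0)


# Synthetic, noise-free data (not taken from the source) to exercise the fit.
true_k50, true_scale = 2.0, 0.5
shots = [0, 1, 2, 3, 4, 5]
accuracy = [logistic(k, true_k50, true_scale) for k in shots]

k50_hat, scale_hat = fit_logistic(shots, accuracy)
width_hat = transition_width(scale_hat)
```

Under this parameterization the width is a fixed multiple of the logistic scale, so a wider transition corresponds to a shallower slope at the midpoint.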
Claims (4)
- Few-shot thresholds and transition widths track ρd/dr at fixed computational complexity. E2 main interpretive claim.
- Shot midpoint ordering k50(B10) < k50(B8) ≈ k50(B9) tracks pretraining exposure density. Interpretation that pattern density from pretraining determines few-shot requirements.
- The ordering of few-shot thresholds k50 and transition widths aligns with k50 ∝ dr/ρd. Interpretation of E2 results.
- Transition widths Δk increase with mismatch D(P0 ∥ PT), evidenced by wider widths from B10 to B9. Interpretive claim linking phase width in E2 to mismatch term in UCCT.
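The ordering claims above can be checked mechanically against the values listed under Findings. The sketch below uses only those reported means (k50 for B10, B8, B9; widths for B10 and B9, since no B8 width is listed) and ignores the reported uncertainties. The helper names are hypothetical, and the ratio calculation assumes the proportionality k50 ∝ dr/ρd holds exactly, which the source presents as an interpretation rather than an established law.

```python
# Reported means from the Findings list (units: shots).
k50 = {"B10": 0.28, "B8": 1.83, "B9": 2.91}
width = {"B10": 1.21, "B9": 3.74}  # no B8 width is listed in the source


def ranked(values):
    """Condition names sorted from smallest to largest value."""
    return sorted(values, key=values.get)


def relative_to(values, reference):
    """Each value divided by the reference condition's value.

    If k50 is proportional to dr/rho_d, these ratios equal the ratios of
    dr/rho_d between conditions (an assumption, not a measured quantity).
    """
    return {name: v / values[reference] for name, v in values.items()}


k50_order = ranked(k50)
width_order = ranked(width)
implied_ratio = relative_to(k50, "B10")

print("k50 order:", " < ".join(k50_order))
print("width order:", " < ".join(width_order))
for name in k50_order:
    print(f"{name}: k50 / k50(B10) = {implied_ratio[name]:.2f}")
```

This only confirms that midpoints and widths rank the conditions the same way; it does not test the proportionality itself, which would need independent estimates of ρd and dr.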