Few-shot learning phase transitions in neural networks

Empirical characterization of shot midpoints (k50) and transition widths (k90 − k10) across transformer models, tracking how the pretraining density ratio ρd/dr predicts in-context learning thresholds.

9 members.

Drawn from 1 source

The papers/notes whose extracted claims & findings make up this cluster.

The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring (9 members)

Bridges (3)

Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.

Few-shot anchoring & latent structure (9 shared)
Few-shot arithmetic learning thresholds (6 shared)
Anchoring score threshold theory (3 shared)

Findings (5)

B10 phase width Δk = 1.21 ± 0.18 shots. Transition width (k90 − k10) for B10.
B10 shot midpoint k50 = 0.28 ± 0.05 shots with accuracy 94.8 ± 1.2%. Lowest-threshold condition in E2; the near-zero/one-shot threshold is consistent with high pretraining density.
B9 phase width (k90 − k10) = 3.74 ± 0.31 shots. Widest transition in E2; consistent with lower prior density requiring more shots for reliable threshold crossing.
k50 for base-10 two-digit addition: 0.28 ± 0.05 shots. Shot midpoint from a logistic fit over 10 runs.
k50 ordering: B10 (0.28) < B8 (1.83) < B9 (2.91) follows pretraining density. Monotone ordering consistent with k50 ∝ dr/ρd.
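
The findings above report a midpoint (k50) and a width (k90 − k10) from a logistic fit. As a minimal sketch of how the two quantities relate, the snippet below assumes a two-parameter logistic in the shot count k, with accuracy normalized to its plateau; the source does not state the exact parameterization, so the function names and the scale parameter s are illustrative assumptions, not the authors' fitting code. Under that assumption the width is k90 − k10 = 2·s·ln 9.

```python
import math

# Assumed form (not confirmed by the source): normalized accuracy
#   p(k) = 1 / (1 + exp(-(k - k50) / s))
# where k50 is the shot midpoint and s is a hypothetical scale parameter.

def logistic(k, k50, s):
    """Fraction of plateau accuracy reached at k shots."""
    return 1.0 / (1.0 + math.exp(-(k - k50) / s))

def scale_from_width(width):
    """Invert k90 - k10 = 2 * s * ln(9) to recover the scale s."""
    return width / (2.0 * math.log(9.0))

def quantile_shot(q, k50, s):
    """Shot count at which the curve reaches fraction q of the plateau."""
    return k50 + s * math.log(q / (1.0 - q))

def k10_k90(k50, width):
    """Return (k10, k90) implied by a reported midpoint and width."""
    s = scale_from_width(width)
    return quantile_shot(0.1, k50, s), quantile_shot(0.9, k50, s)

# Central values reported in the findings (uncertainties ignored here).
b10_k10, b10_k90 = k10_k90(k50=0.28, width=1.21)
b9_k10, b9_k90 = k10_k90(k50=2.91, width=3.74)
```

Under this symmetric logistic, the B10 half-width (about 0.6 shots) exceeds its midpoint of 0.28, so the implied k10 falls below zero shots; that is an extrapolation of the fitted curve rather than an observable shot count.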

Claims (4)

Few-shot thresholds and transition widths track ρd/dr at fixed computational complexity. E2 main interpretive claim.
Shot midpoint ordering k50(B10) < k50(B8) ≈ k50(B9) tracks pretraining exposure density. Interpretation that pattern density from pretraining determines few-shot requirements.
The ordering of few-shot thresholds k50 and transition widths aligns with k50 ∝ dr/ρd. Interpretation of E2 results.
Transition widths Δk increase with mismatch D(P0 ∥ PT), as evidenced by the widening from B10 to B9. Interpretive claim linking phase width in E2 to the mismatch term in UCCT.