community
active
leiden_hybrid_concepts
label: sonnet
community:leiden_hybrid_concepts-run4-c11
Few-shot anchoring & latent structure
How minimal examples disambiguate and recruit latent arithmetic/reasoning interpretations in LLMs
59 members.
Sub-communities (9)
Finer clusters this community splits into. Each is its own community page.
- Unified Competency Control Theory (UCCT): 9
- Few-shot learning phase transitions in neural networks: 9
- Prompt anchoring and latent structure binding: 8
- Mid-layer representation geometry in neural networks: 8
- Anchoring score S for few-shot learning transitions: 7
- Model base robustness and transfer learning asymmetries: 5
- Neural activation geometry and behavioral prediction: 5
- Mechanistic editing through parameter surgical intervention: 4
- Anchoring bias in commonsense reasoning: 4
Drawn from 6 sources
The papers/notes whose extracted claims & findings make up this cluster.
- The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring (51 members)
- Paper Summary: Interpreting Language Model Parameters (4 members)
- 2026-05-12_room-to-play-in-eval-cohort.md (1 member)
- On biological and artificial consciousness: A case for biological computationalism (1 member)
- SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents (1 member)
- cognitive-glue-and-alexander.md (1 member)
Bridges (20)
Other communities that share members with this one — cross-cutting threads or papers that sit at the seam between two themes.
- Layer-wise geometry predicting few-shot learning (10 shared)
- Anchoring score threshold theory (10 shared)
- Few-shot learning phase transitions in neural networks (9 shared)
- Unified Competency Control Theory (UCCT) (9 shared)
- Mid-layer representation geometry in neural networks (8 shared)
- Prompt anchoring and latent structure binding (8 shared)
- Anchoring score S for few-shot learning transitions (7 shared)
- Few-shot arithmetic learning thresholds (6 shared)
- Neural activation geometry and behavioral prediction (5 shared)
- Unified Contextual Conditioning Theory (UCCT) (5 shared)
- Model base robustness and transfer learning asymmetries (5 shared)
- Anchoring bias in commonsense reasoning (4 shared)
- Mechanistic editing through parameter surgical intervention (4 shared)
- Cross-base fine-tuning transfer asymmetry (3 shared)
- Commonsense reasoning anchoring bias (3 shared)
- Coherent anchor prior rebinding (2 shared)
- Targeted neural network weight surgery (2 shared)
- Anchors as latent structure recruiters (1 shared)
- Prompting as cognitive control operations (1 shared)
- Ambiguous arithmetic anchor interpretations (1 shared)
Claims (32)
- Few-shot thresholds and transition widths track ρd/dr at fixed computational complexity
  E2 main interpretive claim.
- Prompt and context design are cognitive-control operations: they toggle latent competencies rather than teaching the model from scratch.
  Assertion about the nature of prompt engineering.
- UCCT strictly generalizes ICL and reads retrieval-augmented generation and fine-tuning as the same anchoring process acting on one measurable score S
  Authors' central interpretive claim about the scope of their theory
- Anchors recruit and bind latent structure; they do not create new knowledge in the model
  Scope-limiting claim clarifying UCCT's interpretation of what anchoring does
- Cross-domain anchoring demonstrates that UCCT's principles apply beyond text
  Claim of modality generality
- Fine-tuning reduces mismatch dr, retrieval increases effective cohesion ρd, and few-shot adjusts the budget k
  Unified interpretation of different adaptation methods via UCCT terms
- Higher-density priors (B10) are more robust to fine-tuning than lower-density ones (B9).
  Interpretation of cross-base transfer asymmetry.
- Layer-wise anchoring peaks in a 'Goldilocks zone' between early and late layers.
  Qualitative characterization of optimal anchoring depth.
- Layer-wise geometry summaries (Sbmax, AUSN) predict internal few-shot thresholds θ50
  Claim that geometry-to-behavior correlates exist
- Layer-wise trajectories show early enrichment, mid-layer alignment, and late re-clustering.
  Qualitative geometry pattern.
- Peak anchoring Sbmax and normalized area AUSN correlate with per-item success and internal shot midpoints θ50, providing a geometry-to-behavior bridge.
  Main interpretation of E3.
- Rank-one matrix decomposition constraint enforcing mechanistic simplicity
  Core design principle of VPD: each parameter subcomponent is constrained to be a simple rank-one matrix to enable isolated understanding and combination.
- S = ρd − dr − log k is a predictive correlate of when few-shot behavior flips
  Claim that S predicts threshold midpoints across different bases, tasks, and models
- S = ρd − dr − log k predicts shot midpoints across different bases, tasks, and models
  Predictive practical utility claim.
- S = ρd − dr − log k is a predictive correlate of anchoring success across few-shot, SFT, and CoT.
  UCCT's practical utility claim.
- S is a predictive correlate calibrated on dev sets, not an absolute measure
  Clarifies nature of S.
- Shot midpoint ordering k50(B10) < k50(B8) ≈ k50(B9) tracks pretraining exposure density
  Interpretation that pattern density from pretraining determines few-shot requirements
- Small prompt changes can yield threshold-like shifts because S crosses the critical value Sc
  Authors' explanation for abrupt behavioral changes
- Small, coherent anchors can rebind strong priors and exhibit near-threshold sensitivity.
  Conclusion from E1 and central UCCT claim.
- Small, coherent anchors can rebind strong priors without changing model weights
  Cross-domain anchoring claim.
- The additive form S = ρd − dr − log k is parsimonious and aligns with log-odds intuition
  Justification for the linear combination
- The anchoring score S is a predictive correlate of when anchoring succeeds and why small prompt changes yield threshold-like shifts.
  A central claim about the operational value of S.
- The budget term −log k acts as a regularizer to discourage degenerate long prompts.
  Theoretical interpretation.
- The ordering of few-shot thresholds k50 and transition widths aligns with k50 ∝ dr/ρd.
  Interpretation of E2 results.
- The three forces (cohesion, mismatch, budget) summarize anchoring trajectories.
  Summary of the decomposition of S.
- Threshold-like performance flips occur when anchoring strength S crosses a task-dependent critical value Sc.
  Interpretation of abrupt behavior changes.
- Transition widths Δk increase with mismatch D(P0 ∥ PT), evidenced by wider widths from B10 to B9
  Interpretive claim linking phase width in E2 to mismatch term in UCCT
- UCCT fills a gap in explaining when behavior flips for a specific prompt and how much anchor budget is needed
  Authors contrast their work with prior phase/representation studies
- UCCT offers a compact, testable formulation with measurable quantities (ρd, dr, k, S, Sc)
  Falsifiability claim.
- UCCT provides practical diagnostics for prompt design, retrieval, and light fine-tuning via S without additional training infrastructure
  Applied contribution claim: S enables 'add 2 more examples to cross threshold' decisions
- +2 more
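The additive score and the threshold reading in the claims above can be written down directly. The following is a minimal sketch, not the authors' implementation. It assumes ρd, dr, and k are already measured as plain numbers, that log is the natural logarithm, and that "crossing" means S reaching or exceeding Sc. The function names and the example values are hypothetical.

```python
import math


def anchoring_score(rho_d: float, d_r: float, k: float) -> float:
    """Return S = rho_d - d_r - log k (natural log is an assumption)."""
    if k <= 0:
        raise ValueError("k must be positive for log k to be defined")
    return rho_d - d_r - math.log(k)


def crosses_threshold(s: float, s_c: float) -> bool:
    """Assumed reading: anchoring succeeds once S reaches the critical value Sc."""
    return s >= s_c


# Hypothetical numbers, chosen only to show the call pattern.
s = anchoring_score(rho_d=2.0, d_r=0.5, k=2)
print(s, crosses_threshold(s, s_c=0.5))
```

Because the budget term enters as −log k, a longer prompt raises S in this sketch only when the added examples increase ρd or reduce dr by more than the log penalty.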
Findings (27)
- 2-shot reinterpretation of '-' yields 23 for 15-8 on held-out query
  E1 qualitative: two exemplars (2-3=5, 7-4=11) cause LLMs to output 23 for 15-8.
- Adding a single disambiguating example (12−9=21) aligns divergent M1-M4 interpretations under tested seeds
  E1 finding consistent with threshold-crossing: near-threshold state resolved by one additional anchor
- Ambiguous anchors (33-27=60, 11-9=20) yield four distinct arithmetic interpretations across M1-M4
  Models produce different answers (240, 138, -240) from the same ambiguous prompt
- AUSN mean -2.119 ± 0.198
  Normalized area under S(ℓ) averaged over seeds.
- B10 phase width Δk = 1.21 ± 0.18
  Transition width (k90 − k10) for B10.
- B10 shot midpoint k50 = 0.28 ± 0.05 shots with accuracy 94.8 ± 1.2%
  Lowest threshold condition in E2; near-zero/one-shot threshold consistent with high pretraining density
- B9 phase width (k90 − k10) = 3.74 ± 0.31 shots
  Widest transition in E2; consistent with lower prior density requiring more shots for reliable threshold crossing
- Commonsense reasoning S ≈ -2.15 uniform
  Lower, uniform anchoring for pattern-matching tasks.
- Commonsense reasoning shows uniform but weaker anchoring (S ≈ −2.15)
  Task-specific comparison.
- Commonsense reasoning tasks S ≈ -2.15
  Lower, more uniform anchoring for commonsense tasks
- Correlation between layer-wise S scores and task accuracy: ρ = -0.73, p < 0.001
  Shows S predicts anchoring effectiveness.
- Cross-base fine-tuning yields asymmetric transfer: B10 transfers most robustly, B9 least
  In-base gains accompanied by uneven OOD drops; higher-density priors more robust.
- Cross-base transfer: B10 transfers most robustly
  B10 fine-tuning yields smallest OOD drops when transferring to other bases
- Direct model editing via parameter subcomponent modification: emoticon eye recognition altered to predict shocked faces with no retraining
  Demonstrated that VPD-discovered subcomponents encode true computational machinery by enabling targeted, predictable behavior changes without gradient-based training.
- Editing the emoticon eye subcomponent to output the unembedding vector for 'o' causes the model to predict shocked faces for all emoticons
  Direct parameter subcomponent overwrite produces a clean behavioral change without training.
- k50 for base-10 two-digit addition: 0.28 ± 0.05 shots
  Shot midpoint from logistic fit over 10 runs.
- k50 ordering: B10 (0.28) < B8 (1.83) < B9 (2.91) follows pretraining density
  Monotone ordering consistent with k50 ∝ dr/ρd.
- Larger Sbmax associated with smaller θ50 in E3 sweep
  Geometry-to-behavior correlate within E3.
- Length normalization prevents degenerate tool-calling trajectories and repeated tool calls.
  Empirical result: without length normalization, RL training produces rapidly increasing tool usage, repetitive tool calls, and performance collapse.
- LLaMA-3.1-8B: Sbmax = -1.896 ± 0.211, AUSN = -2.119 ± 0.198, peak layer ℓ* = 10 (median)
  Seed-pooled geometry-only statistics (per-dev z units).
- Math and code tasks show strongest mid-layer anchoring on LLaMA (S ≈ −1.65 at layers 8-12)
  Task-specific E3 finding showing compositional reasoning requires deeper processing
- Math/code tasks S ≈ -1.65 at layers 8–12
  Task-specific peak anchoring score for structured reasoning domains.
- Meta-LLaMA-3.1-8B-Instruct shows optimal anchoring at layer 9 (S ≈ −1.90, median peak layer ℓ* = 10 [IQR 0.384])
  E3 result establishing the Goldilocks zone at mid-layers for LLaMA architecture
- Peak layer ℓ* median 10, IQR 0.384
  Median layer where S(ℓ) peaks, across seeds.
- Sbmax mean -1.896 ± 0.211
  Geometry summary peak anchoring score averaged over seeds.
- Single dendritic layer solves XOR-like problems with capacity matching 8-layer deep networks.
  Evidence from Beniaguev et al. (2021) that individual biological neurons vastly outperform McCulloch-Pitts model; supports hybrid computation claim.
- Subcomponent L2.MLP.down:3382 (density 0.00%) predicts emoticon continuations after colon, semicolon, or equals
  Specific discovered subcomponent that activates on punctuation like ' :', ' ;', ' =', ':-' and predicts the rest of emoticons/emojis.
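Several findings above report a shot midpoint k50 from a logistic fit and a transition width k90 − k10. The sources do not spell out the fitting procedure, so the following is only a rough sketch under stated assumptions: accuracy follows a logistic curve running from 0 to 1 in the shot count k, every measured accuracy lies strictly between 0 and 1, and the curve is fitted by ordinary least squares on the log-odds. The function names and the synthetic data are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def fit_logistic(shots, accuracy):
    """Least-squares line through (k, logit(accuracy)); returns (slope, intercept)."""
    n = len(shots)
    ys = [logit(p) for p in accuracy]
    mean_k = sum(shots) / n
    mean_y = sum(ys) / n
    sxx = sum((k - mean_k) ** 2 for k in shots)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in zip(shots, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_k
    return slope, intercept


def midpoint_and_width(slope: float, intercept: float):
    """k50 is where the fitted log-odds is zero; width is k90 - k10."""
    k50 = -intercept / slope
    width = (logit(0.9) - logit(0.1)) / slope
    return k50, width


# Synthetic accuracies from a known curve (midpoint 2.0, slope 1.5), not source data.
shots = [0, 1, 2, 3, 4, 5]
accuracy = [1.0 / (1.0 + math.exp(-1.5 * (k - 2.0))) for k in shots]
k50, width = midpoint_and_width(*fit_logistic(shots, accuracy))
print(k50, width)
```

Under this parameterization the width equals 2 ln 9 divided by the slope, so a shallower transition reads directly as a larger k90 − k10.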