finding

active

finding:ambiguous-2-shot-anchors-yield-four-distinct-interpretations-across-m1-m4-p-abs-mult-p-add-x2-p-signed-mult

Ambiguous 2-shot anchors yield four distinct interpretations across M1-M4 (P_abs-mult, P_add x2, P_signed-mult)

E1 finding showing that, near the threshold, marginal differences between models tilt them toward qualitatively different bindings
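A minimal sketch of why the two anchors underdetermine the binding, using the anchors (33-27=60, 11-9=20), the reported answers (240, 138, -240), and the disambiguating example (12-9=21) recorded in the related findings on this page. Three things here are assumptions and not stated in the source: the arithmetic behind the labels (P_add read as a + b, P_abs-mult as |a - b| × 10, P_signed-mult as (a - b) × 10), the names `HYPOTHESES` and `consistent`, and the query 57-81, which was chosen only because it reproduces the three listed answers.

```python
# Hypothetical reconstructions of the interpretation labels (not from the source).
HYPOTHESES = {
    "P_add": lambda a, b: a + b,                   # '-' rebound to addition
    "P_abs-mult": lambda a, b: abs(a - b) * 10,    # |a - b| times 10
    "P_signed-mult": lambda a, b: (a - b) * 10,    # (a - b) times 10
}


def consistent(anchors):
    """Return the names of hypotheses that reproduce every (a, b, result) anchor."""
    return [
        name
        for name, rule in HYPOTHESES.items()
        if all(rule(a, b) == result for a, b, result in anchors)
    ]


# The two ambiguous anchors from the related finding.
anchors = [(33, 27, 60), (11, 9, 20)]
survivors = consistent(anchors)
print(survivors)

# Assumed held-out query (57 - 81); the source lists only the answers.
query = (57, 81)
answers = {name: HYPOTHESES[name](*query) for name in survivors}
print(answers)

# One extra anchor (12 - 9 = 21) leaves a single consistent reading.
print(consistent(anchors + [(12, 9, 21)]))
```

Under these assumed readings, all three rules fit both anchors yet diverge on the query, and the third anchor leaves only the additive reading, which matches the "P_add x2" count in the title if two of the four models chose addition.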

Source paper

extracted_from

The Guanyin Protocol: A Framework for Immediately Establishing an Understanding of Both Causality and Compassion in LLM Systems Using Semantic Anchoring

(2025) · Edward Yi Chang · Zeyneb N. Kaya · Ethan Chang

Neighborhood — ranked by edge-count

Hypotheses (1)

hypothesis

Hypothesis 1 (Threshold Behavior): There exists a task-dependent threshold S_c such that performance exhibits sharp changes as S crosses S_c, with value and transition width depending on model, layer, and pooling
supports
Core testable hypothesis of UCCT about the nature of performance transitions under anchoring

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Ambiguous anchors (33-27=60, 11-9=20) yield four distinct arithmetic interpretations across M1-M4 · finding · 0.820
Models produce different answers (240, 138, -240) from the same ambiguous prompt
Adding a single disambiguating example (12−9=21) aligns divergent M1-M4 interpretations under tested seeds · finding · 0.740
E1 finding consistent with threshold-crossing: near-threshold state resolved by one additional anchor
Calibrated few-shot prompting was a surprisingly weak baseline for truth classification compared to linear probes · finding · 0.728
Unexpected finding that behavioral baseline underperforms representational probing approaches
Two exemplars (2−3=5, 7−4=11) induce reinterpretation of '−' as addition on held-out queries across mainstream LLMs · finding · 0.726
E1 qualitative finding demonstrating anchor rebinding of strong arithmetic prior
2-shot reinterpretation of '-' yields 23 for 15-8 on held-out query · finding · 0.723
E1 qualitative: two exemplars (2-3=5, 7-4=11) cause LLMs to output 23 for 15-8.
MDS injections align with the Linear Representation Hypothesis: target trait varies near-linearly with alpha in open-ended generation · claim · 0.723
Theoretical alignment claim backed by OLS R² analysis showing 96.15% of trends have R² ≥ 0.75
Shot midpoints follow k_50 ∝ d_r/ρ_d; higher cohesion and lower mismatch yield fewer required examples · claim · 0.719
Core quantitative prediction of UCCT validated by E2 threshold ordering
Probes trained under different explicit instruction prompts (ask-correct, ask-t/f, ask-able, ask-arith) are highly aligned with each other in cosine similarity. · finding · 0.719
Shows the passive vs. active divide is more important than the specific wording of instructions.
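
The 2-shot reinterpretation entries in the list above (2−3=5, 7−4=11, then 15−8 answered as 23) can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function names `standard_minus` and `rebound_minus` are hypothetical, and the code says nothing about how any model arrives at the binding.

```python
def standard_minus(a, b):
    """The prior reading of '-': ordinary subtraction."""
    return a - b


def rebound_minus(a, b):
    """The reading induced by the exemplars: '-' treated as addition."""
    return a + b


# The two exemplars recorded in the related findings, as (a, b, stated result).
exemplars = [(2, 3, 5), (7, 4, 11)]

# Which reading reproduces both exemplars?
fits_standard = all(standard_minus(a, b) == r for a, b, r in exemplars)
fits_rebound = all(rebound_minus(a, b) == r for a, b, r in exemplars)
print(fits_standard, fits_rebound)

# Held-out query 15 - 8 under each reading.
held_out_prior = standard_minus(15, 8)
held_out_rebound = rebound_minus(15, 8)
print(held_out_prior, held_out_rebound)
```

Only the additive reading fits both exemplars, and it gives 23 on the held-out query, the answer recorded in the finding, where ordinary subtraction would give 7.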