Anchor Calibration Responses

Three reference responses at known quality levels shown alongside each target to eliminate score inflation in calibrated rubric scoring

Neighborhood — ranked by edge-count

method

Calibrated Rubric Scoring
uses
Primary scoring method: scorer sees three reference responses at known quality levels alongside each target to eliminate inflation
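As a rough illustration of the idea above, the sketch below assembles a scoring prompt that places three reference responses next to the target. The names `Anchor` and `build_scoring_prompt`, the prompt wording, and the numeric scale are all hypothetical choices for illustration; the source does not specify how the references are presented to the scorer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """A reference response with a known quality level (hypothetical structure)."""
    label: str  # e.g. "low", "mid", "high" (assumed labels)
    score: int  # known score for this reference (assumed numeric scale)
    text: str


def build_scoring_prompt(rubric: str, anchors: list[Anchor], target: str) -> str:
    """Place three anchors, ordered by known score, ahead of the target response."""
    if len(anchors) != 3:
        raise ValueError("expected exactly three anchor responses")
    ordered = sorted(anchors, key=lambda a: a.score)
    parts = [f"Rubric:\n{rubric}", ""]
    for a in ordered:
        parts.append(f"Reference ({a.label}, score {a.score}):\n{a.text}")
        parts.append("")
    parts.append(f"Target response:\n{target}")
    parts.append("")
    parts.append("Score the target on the same scale as the references.")
    return "\n".join(parts)
```

Sorting the anchors by their known score is one possible design choice: it gives the scorer a fixed low-to-high scale to compare the target against, regardless of the order in which the references were supplied.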

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

anchor · concept · 0.740
External structure (in-context examples, retrieval, tuning) that biases latent pattern activation.
How Do We Calibrate Relationships With Those Who · question · 0.720
dev set calibration · method · 0.709
Fixed dev pool of 1000 prompts used for whitening and z-scoring parameters.
The anchoring score S is a predictive correlate of when anchoring succeeds and why small prompt changes yield threshold-like shifts. · claim · 0.703
A central claim about the operational value of S.
(ii) does the anchoring score S = ρd − dr − log k consistently correlate with performance across anchoring methods? · question · 0.694
Second research question in E2
Calibrated Few-Shot Prompting · method · 0.690
Baseline method: sweeps over shot count and resamples prompts; calibrates threshold for P(TRUE)-P(FALSE); performed surprisingly weakly
Semantic-anchoring assumption · concept · 0.690
Assumption that structured inputs act as anchors biasing activation toward target patterns PT.
Normalized Area Under Anchoring Trajectory (AUS_N) · concept · 0.687
Mean of S(ℓ) across layers; weaker geometry correlate with θ50
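
The dev set calibration entry in the list above describes fitting z-scoring parameters on a fixed dev pool. A minimal sketch of that step follows, assuming scalar scores and population standard deviation. The function names are hypothetical, and the whitening step mentioned in the entry is not shown.

```python
import statistics


def fit_zscore_params(dev_scores):
    """Estimate mean and population std once, from the fixed dev pool."""
    mean = statistics.fmean(dev_scores)
    std = statistics.pstdev(dev_scores)
    if std == 0:
        raise ValueError("dev pool has zero variance; z-scoring is undefined")
    return mean, std


def zscore(value, mean, std):
    """Standardize a new score using the frozen dev-pool parameters."""
    return (value - mean) / std
```

Fitting the parameters once and reusing them keeps scores comparable across runs, since the normalization no longer depends on whichever batch is being scored.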