hypothesis
active
hypothesis:hypothesis-fine-tuning-reduces-mismatch-dr-between-prior-and-target
Hypothesis: Fine-tuning reduces mismatch dr between prior and target
UCCT's theoretical prediction about how fine-tuning maps onto the anchoring score
Source paper
extracted_from · (2025) · Edward Yi Chang · Zeyneb N. Kaya · Ethan Chang
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Fine-tuning · about · Parameter updates that reduce mismatch dr; another anchoring variant in UCCT.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Unified interpretation of different adaptation methods via UCCT terms
- Fine-tuning reduces dr; retrieval increases effective ρd; few-shot k trades budget against both · hypothesis · 0.869 · UCCT's unified view of adaptation methods
- Future work hypothesis about extending SOO to direct value alignment
- Key interpretive conclusion from the dissociation between attempt rate and improvement rate in fine-tuning experiments
- Measures how far the target P_T is from the prior P_prior; increases anchoring difficulty
- Fine-tuning models for a narrow objective (malicious code injection) can lead to broad misalignment · finding · 0.801 · Betley et al. finding suggesting models naturally encode others' prediction errors, supporting non-duality fine-tuning
- Central empirical claim of the paper supported by three LLM experiments
- Claim supported by Perspectives scenario results showing near-100% accuracy post-fine-tuning