question
active
question:when-does-behavior-flip-for-a-specific-prompt-and-how-much-anchor-budget-is-needed
When does behavior flip for a specific prompt, and how much anchor budget is needed?
The specific gap UCCT addresses that prior phase/representation work left open
Source paper
extracted_from · (2025) · Edward Yi Chang · Zeyneb N. Kaya · Ethan Chang
Neighborhood — ranked by edge-count
Concepts (1)
concept
- anchoring strength S · answered_by · Composite score S = ρ_d − d_r − log k predicting anchoring success.
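As a minimal sketch of the arithmetic behind the composite score in this entry, the snippet below evaluates S = ρ_d − d_r − log k for a few values of k. The function name `anchoring_score`, the parameter names, the use of the natural logarithm, and the input values are all assumptions made for illustration. The entry gives only the formula, not how ρ_d and d_r are measured or which log base is intended.

```python
import math


def anchoring_score(rho_d: float, d_r: float, k: int) -> float:
    """Evaluate S = rho_d - d_r - log(k) as written in the entry.

    Assumption: 'log' is the natural logarithm; the entry does not say.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return rho_d - d_r - math.log(k)


# Hypothetical inputs, chosen only to show how the log k term lowers S
# as k grows while rho_d and d_r are held fixed.
for k in (1, 2, 8):
    print(k, round(anchoring_score(rho_d=0.9, d_r=0.2, k=k), 3))
```

With ρ_d and d_r fixed, the score decreases monotonically in k, since log k is the only term that depends on it.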
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Practical question addressed by S and k50.
- Authors contrast their work with prior phase/representation studies.
- Illustrates sensitivity to anchors.
- Interpretation of abrupt behavior changes.
- A central claim about the operational value of S.
- Finding that explicit correctness framing partially aligns truth directions across task families.
- Authors' interpretation of prompt-variation results showing that alignment faking disappears only when the conflicting objective is removed.
- Validates that steering vectors capture reflection semantics by finding tokens reported in related work.