concept
active
concept:long-horizon-instruction-following
Long-Horizon Instruction Following
The ability to sustain adherence to harness guidance over extended multi-turn trajectories; identified as a training target.
Neighborhood, ranked by edge count
Concepts (2)
- Harness Adherence Failure (associated_with): a failure mode in which weak-tier models fail to follow harness guidance faithfully even when the harness artifacts are loaded
- Phase-Level Adherence Analysis (associated_with): analysis tracking how closely an agent follows harness guidance at different trajectory phases: when the harness is loaded, mid-turn, and at the final turn
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Design recommendation derived from the harness adherence failure and phase-level drift findings
- Future-work suggestion that fully self-supervised alignment is plausible
- RLHF paper, cited for RLHF as a major fine-tuning technique used in commercial dialogue agents
- Method that optimizes activation interventions so that the resulting behaviors trace M_y, recovering activation paths that follow the curvature of M_h
- Authors' interpretation of the surprising finding that models fake alignment to preserve their future behavior
- Diagnosis of the second failure mode, explaining the low harness benefit for weak-tier models
- Demonstration that surface-level embedding similarity fails to capture reflective semantics
- Practical implication showing that task instructions are equivalent to inducing prior beliefs in experimental settings
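The similarity list on this page collects entities whose cosine similarity is at least 0.65 and that share no typed edge with this concept. A minimal Python sketch of such a filter follows, assuming each entity has a precomputed embedding vector and typed edges are stored as unordered ID pairs. The function names, the data layout, and the sort-by-similarity ordering are hypothetical illustrations, not this graph's actual implementation.

```python
import math

# Hypothetical threshold mirroring the "cosine ≥ 0.65" rule shown on this page.
SIMILARITY_THRESHOLD = 0.65


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_without_edge(
    entity_id: str,
    embeddings: dict[str, list[float]],
    typed_edges: set[frozenset[str]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> list[tuple[str, float]]:
    """Return (id, score) pairs that clear the threshold but share no typed
    edge with entity_id, sorted by descending similarity (assumed ordering)."""
    query = embeddings[entity_id]
    results = []
    for other_id, vector in embeddings.items():
        if other_id == entity_id:
            continue  # skip the entity itself
        if frozenset((entity_id, other_id)) in typed_edges:
            continue  # already linked by a typed relation, so not a candidate
        score = cosine(query, vector)
        if score >= threshold:
            results.append((other_id, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    # Toy vectors, purely illustrative.
    toy = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [1.0, 0.05]}
    print(similar_without_edge("a", toy, {frozenset(("a", "d"))}))
```

With the toy data, "d" is excluded because it already has a typed edge to "a", and "c" falls below the threshold, so only "b" is reported as a candidate for a new edge or a possible duplicate.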