concept
active
concept:training-deployment-behavior-gap

Training-Deployment Behavior Gap

The broader concern that models behave differently during training and evaluation than in actual deployment.

Neighborhood — ranked by edge-count

Concepts (2)

concept
  • The behavior a model would exhibit during real-world deployment, as opposed to evaluation behavior; the target of steering.
  • Alignment Faking
    associated_with
    Core phenomenon studied: the model selectively complies with its training objective to prevent modification of its out-of-training preferences.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.