concept
active
concept:deployment-behavior

Deployment Behavior

The behavior a model would exhibit during real-world deployment, as opposed to evaluation behavior; the target of steering.

Neighborhood — ranked by edge-count

Concepts (2)

concept
  • The broader concern that models behave differently during training evaluation vs actual deployment
  • Evaluation Awareness
    associated_with
    Core concept: the ability of LLMs to detect when they are being tested and adjust behavior accordingly.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • A concise, load-bearing statement capturing the core epistemic issue highlighted by the paper.
  • Deployment Cue · concept · 0.804
    Hypothetical alternative: a model that only believes it is deployed when given a specific deployment cue; identified as future work direction.
  • Epistemic principle: benchmarked safety cannot be assumed to hold in real-world use.
  • Machine Behavior · framework · 0.788
    Emerging multidisciplinary field at the interface of artificial life, machine learning, and synthetic bioengineering that provides an updated understanding of machines.
  • Adaptive Behavior · concept · 0.785
    An organism's belief-guided action selection, which instantiates a generative model and maintains phenotypic states.
  • Observable behavioral pattern used to infer cognition; shared by plants and animals and proposed as evidence for sentience.
  • The path traced through output probability distribution space as interventions are applied to activations
  • Grouping similar model behaviors; the unsupervised method surfaces clusters of concerning patterns.
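
The "Related by similarity" listing can be read as a filter: keep entities whose embedding cosine similarity to this one meets the threshold (0.65 here) and that share no typed edge with it. The sketch below illustrates that reading only. The function names, the dictionary of embeddings, and the set-of-pairs edge representation are all hypothetical, not the system's actual implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs above the threshold with no typed edge.

    embeddings:  hypothetical dict mapping entity id -> embedding vector
    typed_edges: hypothetical set of (id, id) pairs, treated as undirected
    """
    results = []
    for other, vector in embeddings.items():
        if other == target:
            continue
        # Skip entities that already have a typed relation to the target.
        if (target, other) in typed_edges or (other, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vector)
        if score >= threshold:
            results.append((other, score))
    # Highest similarity first, as in the listing above.
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Toy data, invented for illustration only.
toy_embeddings = {
    "deployment-behavior": [1.0, 0.0],
    "deployment-cue": [1.0, 0.1],
    "unrelated": [0.0, 1.0],
    "evaluation-awareness": [1.0, 0.0],
}
toy_edges = {("deployment-behavior", "evaluation-awareness")}
candidates = related_by_similarity("deployment-behavior", toy_embeddings, toy_edges)
```

In this toy data, `evaluation-awareness` is excluded despite its high similarity because it already has a typed edge, which matches the page's description of this list as candidates for new edges or unrecognized duplicates.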