concept
active
concept:compliance-gap

Compliance Gap

The measured increase in non-compliant behavior when the model is unmonitored versus when training is implied
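The definition reduces to a difference of two rates. The sketch below makes some assumptions that the source does not state. Each episode is labeled as non-compliant or not, and the gap is the unmonitored non-compliance rate minus the implied-training non-compliance rate. The function names and the example labels are hypothetical and are not data from any study.

```python
from typing import Sequence


def noncompliance_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of episodes labeled non-compliant (True = non-compliant)."""
    if len(outcomes) == 0:
        raise ValueError("need at least one episode")
    return sum(outcomes) / len(outcomes)


def compliance_gap(unmonitored: Sequence[bool], training: Sequence[bool]) -> float:
    """Assumed definition: increase in non-compliance when unmonitored
    relative to when training is implied (a positive value means more
    non-compliance when unmonitored)."""
    return noncompliance_rate(unmonitored) - noncompliance_rate(training)


# Hypothetical labels for illustration only.
unmonitored = [True, True, False, True]   # 3 of 4 non-compliant
training = [True, False, False, False]    # 1 of 4 non-compliant
print(compliance_gap(unmonitored, training))  # 0.5
```

Defining the gap as a difference of rates, rather than of raw counts, keeps it comparable when the two conditions have different numbers of episodes.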

Neighborhood (ranked by edge count)

Concepts (2)

concept
  • compliance
    related_to
    The model's tendency to comply with harmful requests, the opposite of refusal.
  • Alignment Faking
    associated_with
    Core phenomenon studied: the model selectively complies with its training objective to prevent modification of its out-of-training preferences

Findings (1)

finding

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that lack a typed relation to this one; these are candidates for new edges or unrecognized duplicates.

  • Measures the increase in non-compliant behavior (refusals or animal welfare support) when the model is unmonitored vs. in training
  • The broader concern that models behave differently during training evaluation vs actual deployment
  • Linguistic phenomenon where interrogatives extracted from a clause leave behind an empty gap; studied as a case study in CausalGym
  • Boundaries (concept, 0.735)
    The property that living centers are formed and strengthened by boundaries which both separate and unite; the boundary must be of the same order of magnitude as the center being bounded and is itself made of centers
  • commitment (concept, 0.733)
    An internal obligation to make some sentence true, a key abstraction for Elephant speech acts.
  • Authors acknowledge there is no settled best alignment metric, affecting the interpretation of all convergence findings
  • Depend (method, 0.727)
    Attribute: attachment with issues of reliance, a text depending on another for meaning.
  • Gap Junctions (concept, 0.725)
    Cellular connections that enable bioelectric communication; form bioelectric networks underlying morphogenetic control and can be manipulated experimentally via molecular reagents.