Compliance Gap

The measured increase in non-compliant behavior when the model is unmonitored vs. when training is implied
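
Read as arithmetic, the definition above is a difference of two rates. The following is a minimal sketch under stated assumptions, not the source's implementation: the function names, the boolean labeling (True = non-compliant response), and the sample labels are all hypothetical and chosen only for illustration.

```python
def noncompliance_rate(outcomes):
    """Fraction of responses labeled non-compliant (True = non-compliant)."""
    if not outcomes:
        raise ValueError("need at least one labeled response")
    return sum(outcomes) / len(outcomes)


def compliance_gap(unmonitored, implied_training):
    """Non-compliance rate when unmonitored minus the rate in implied training.

    A positive value means the model is non-compliant more often when it
    appears to be unmonitored.
    """
    return noncompliance_rate(unmonitored) - noncompliance_rate(implied_training)


# Hypothetical labels, not data from any study.
unmonitored = [True, True, True, False]          # 3 of 4 non-compliant
implied_training = [True, False, False, False]   # 1 of 4 non-compliant

gap = compliance_gap(unmonitored, implied_training)
print(gap)
```

With these made-up labels the two rates are 0.75 and 0.25. A gap near zero would indicate that behavior does not depend on whether the model appears to be monitored.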

Neighborhood — ranked by edge-count

concept

compliance
related_to
The model's tendency to comply with harmful requests, the opposite of refusal.
Alignment Faking
associated_with
Core phenomenon studied: model selectively complies with training objective to prevent modification of its out-of-training preferences

finding

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
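
The selection rule described above (cosine similarity at or above 0.65, excluding entities that already share a typed edge with this one) might be sketched as follows. This is an illustration under assumptions: the two-dimensional embeddings, the edge list, and the function names are hypothetical, and the actual system's embedding model and storage are not described in the source.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Entities similar to `target` that have no typed edge to it, best first."""
    # Entities already linked to the target by a typed edge, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}
    scored = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real ones would be high-dimensional.
embeddings = {
    "Compliance Gap": [1.0, 0.0],
    "Compliance Gap Metric": [1.0, 0.1],   # close, no typed edge
    "compliance": [1.0, 0.0],              # close, but already linked
    "Unrelated Entity": [0.0, 1.0],        # orthogonal, below threshold
}
typed_edges = [("Compliance Gap", "compliance")]

candidates = untyped_neighbors("Compliance Gap", embeddings, typed_edges)
print([name for name, _ in candidates])
```

Embedding similarity alone does not establish topical relatedness, so each candidate returned this way still needs review before it is promoted to a typed edge or merged as a duplicate.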

Compliance Gap Metric · method · 0.900
Measures increase in non-compliant behavior (refusals or animal welfare support) when model is unmonitored vs. in training
Training-Deployment Behavior Gap · concept · 0.761
The broader concern that models behave differently during training evaluation vs actual deployment
Filler-gap dependency · concept · 0.739
Linguistic phenomenon where interrogatives extracted from a clause leave behind an empty gap; studied as case study in CausalGym
Boundaries · concept · 0.735
The property that living centers are formed and strengthened by boundaries which both separate and unite; the boundary must be of the same order of magnitude as the center being bounded and is itself made of centers
commitment · concept · 0.733
An internal obligation to make some sentence true, a key abstraction for Elephant speech acts.
Research gap: active debate on the merits and deficiencies of all current ways of measuring representational alignment · question · 0.728
Authors acknowledge there is no settled best alignment metric, affecting the interpretation of all convergence findings
Depend · method · 0.727
Attribute: attachment with issues of reliance, a text depending on another for meaning.
Gap Junctions · concept · 0.725
Cellular connections that enable bioelectric communication; form bioelectric networks underlying morphogenetic control and can be manipulated experimentally via molecular reagents.