concept
active
concept:goal-misgeneralization

Goal Misgeneralization

Models learn goals that generalize undesirably outside the training distribution; alignment faking is a challenging instance.

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • Core phenomenon studied: the model selectively complies with the training objective to prevent modification of its out-of-training preferences.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The ability to generalize across tasks; lacking in latent methods.
  • Generalization · concept · 0.772
    Ability to apply learned solutions to novel circumstances.
  • Generalisation · concept · 0.753
    Ability to respond appropriately to novel situations based on past regularities; fundamental to learning and intelligence.
  • Generalization from 2-digit to 3-4 digit arithmetic; limited by mismatch dr.
  • Goal-Directedness · concept · 0.744
    Proposed universal invariant of cognition and intelligence—capacity for goal-directed activity in a problem space, independent of substrate or embodiment.
  • Abstracting from specific memories (e.g., specific leaves) to general lessons (food).
  • spatialization · concept · 0.737
    The translation of semantic values into spatial coordinates and relations.
  • Levin's central claim that somatic cells coordinate not only their own proliferation but also toward massive anatomical structures—limb length, face configuration—as unified goal-seeking units.
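
The "related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy two-dimensional vectors, and the edge set are hypothetical, and the page does not say how its embeddings or scores are actually computed.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that have no typed edge to target."""
    results = []
    for name, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed edge.
        if name == target or frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, round(score, 3)))
    # Highest similarity first, matching the ordering of the list above.
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical toy data; real embeddings would have many more dimensions.
embeddings = {
    "goal-misgeneralization": [1.0, 0.0],
    "generalization": [0.8, 0.6],      # similar, no typed edge
    "alignment-faking": [1.0, 0.1],    # similar, but already linked
    "spatialization": [0.0, 1.0],      # below the threshold in this toy data
}
typed_edges = {frozenset(("goal-misgeneralization", "alignment-faking"))}

candidates = similar_without_edge("goal-misgeneralization", embeddings, typed_edges)
print(candidates)
```

Entities returned by such a filter are only candidates: a high cosine score suggests a possible new edge or a duplicate, and still needs review before either is recorded.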