concept
active
concept:goal-misgeneralization
Goal Misgeneralization
Models learn goals that generalize undesirably outside the training distribution; alignment faking is a challenging instance.
Neighborhood — ranked by edge-count
Concepts (1)
- Alignment Faking · extends · Core phenomenon studied: the model selectively complies with the training objective to prevent modification of its out-of-training preferences
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The ability to generalize across tasks; lacking in latent methods.
- Ability to apply learned solutions to novel circumstances.
- Ability to respond appropriately to novel situations based on past regularities; fundamental to learning and intelligence.
- Generalization from 2-digit to 3-4 digit arithmetic; limited by mismatch dr.
- Proposed universal invariant of cognition and intelligence—capacity for goal-directed activity in a problem space, independent of substrate or embodiment.
- Abstracting from specific memories (e.g., specific leaves) to general lessons (food).
- The translation of semantic values into spatial coordinates and relations.
- Levin's central claim that somatic cells coordinate not only their own proliferation but also work together toward large-scale anatomical structures (limb length, face configuration) as unified goal-seeking units.