concept
active
concept:capability-term-in-soo-loss
Capability Term in SOO Loss
An additional term in the RL SOO loss that preserves the agent's capabilities, analogous to the KL term in RLHF.
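As a rough illustration of how such a term could enter the objective, the sketch below adds a weighted penalty to a self-other overlap loss. Everything specific here is an assumption rather than the source's formulation: the overlap loss is taken to be a mean squared difference between self and other latent vectors, the capability term is stood in for by a KL divergence between the fine-tuned policy and a frozen reference policy (mirroring the RLHF analogy), and the names `soo_loss_with_capability_term` and `capability_weight` are hypothetical.

```python
import math


def mse(a, b):
    """Mean squared difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same actions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def soo_loss_with_capability_term(
    self_latent,
    other_latent,
    policy_probs,
    reference_probs,
    capability_weight=0.1,  # hypothetical trade-off coefficient
):
    """Overlap loss plus a weighted capability-preservation penalty.

    Assumption: the capability term is a KL divergence to a frozen
    reference policy, by analogy with the KL term in RLHF.
    """
    overlap = mse(self_latent, other_latent)
    capability = kl_divergence(policy_probs, reference_probs)
    return overlap + capability_weight * capability
```

With this shape, driving the self and other latents together lowers the first term, while any drift of the policy away from the reference raises the second, so `capability_weight` sets how much capability drift is tolerated in exchange for more overlap.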
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- A competing alignment approach that fine-tunes models based on human evaluator feedback; discussed as complementary to SOO
Methods (1)
method
- SOO Loss Function · extends · A loss function measuring the dissimilarity of latent model representations of self and other, minimized during fine-tuning
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- LLM SOO fine-tuning lacks a capability preservation term analogous to the KL term in RLHF · concept · 0.732 · Research gap: RL experiments have a capability term, but LLM experiments do not yet incorporate one
- A model's task-solving capability without harness evolution, used as baseline for comparing evolution gains
- The scope of states that an agent can be stressed about defines its degree of cognitive capacity. · claim · 0.703 · Stress expands the spatial, temporal, and complexity scale of goals.
- The central framework proposed in this paper: aligning AI internal representations of self and others to reduce deceptive behavior
- The deadening effect of modern processes that prevent people from acting according to their feeling for the whole, damaging the global whole.
- Formal definition of the paper's central construct
- Levin and authors: self is defined by spatiotemporal scope and nature of goals pursued (cognitive light cone), not by immutable essence.