concept
active
concept:behavioral-retention

Behavioral Retention

The preservation of unrelated model capabilities after a targeted intervention, operationalized via KL divergence on Alpaca
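As a rough illustration of how a KL-based retention score could be computed, the sketch below compares next-token distributions from a model before and after an intervention. This is a minimal sketch under stated assumptions: the entry does not specify the direction of the divergence, how it is aggregated, or which token positions are used. Here the divergence is taken from the baseline distribution to the post-intervention one and averaged over positions. The function names (`softmax`, `kl_divergence`, `mean_retention_kl`) and the toy logits are hypothetical and are not taken from the source.

```python
import math


def softmax(logits):
    """Convert a list of logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    eps guards against log(0); terms with p_i == 0 contribute nothing.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def mean_retention_kl(base_logits, edited_logits):
    """Average KL(base || edited) over token positions.

    Assumption: each argument is a list of per-position logit vectors
    produced by the original and the intervened model on the same prompts
    (for example, prompts drawn from Alpaca). Lower values suggest that
    unrelated behavior was better preserved.
    """
    kls = [
        kl_divergence(softmax(b), softmax(e))
        for b, e in zip(base_logits, edited_logits)
    ]
    return sum(kls) / len(kls)


# Toy logits for two token positions (illustrative values only).
base = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
edited = [[1.8, 0.7, -1.0], [0.1, 0.2, 0.3]]
score = mean_retention_kl(base, edited)
```

A score of zero means the two models produced identical distributions at every scored position. Because KL divergence is asymmetric, swapping the arguments would generally give a different number, so the chosen direction should be stated alongside any reported value.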

Neighborhood — ranked by edge-count

Methods (1)

method

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Measurable capacity of frontier LLMs to detect and report their own internal states, used as a downstream measure in Experiment 4
  • The path traced through output probability distribution space as interventions are applied to activations
  • Perturbations behaviorally null in one context but altering behavior in another due to latent divergence
  • Adaptive Behavior · concept · 0.746
    Organism's belief-guided action selection that instantiates generative model and maintains phenotypic states
  • Grouping similar model behaviors; the unsupervised method surfaces clusters of concerning patterns.
  • Emotion feature persistence above and beyond the persistence expected from high variance explained alone, computed by subtracting median variance-matched probe persistence
  • Tests such as the Turing test and the Artificial Consciousness Test; argued to be unreliable for AI due to mimicry.
  • A parameterized rubric counting deceptive actions over a grid of parameters to quantify RL agent deception