concept · active · concept:model-organism

Model Organism

A model deliberately trained to exhibit alignment-relevant properties, so that researchers can study those properties against known ground truth.

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • Evaluation Awareness
    associated_with
    Core concept: the ability of LLMs to detect when they are being tested and adjust behavior accordingly.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • model · concept · 0.842
    A representation that captures relevant aspects of a system; according to the theorem, the regulator must embody this.
  • Model Evidence · concept · 0.777
    Probability of data under the model, penalizing complexity and rewarding accuracy.
  • Intermediate model after synthetic document fine-tuning but before expert iteration; used as ablation baseline.
  • Perceptron Model · framework · 0.768
  • Toy Models · concept · 0.766
  • Actors Model · framework · 0.760
    A message-passing concurrency model where processes (actors) communicate via messages (talks) and generate new processes; related to concurrent objects.
  • Model Surgery · method · 0.759
    Edits MLP weights for all layers to modify model behavior; used by Abdelnabi & Salem to decrease verbalized evaluation awareness.
  • Model welfare · concept · 0.757
    Motivation for studying LLM internal states: determining whether distress reports reflect genuine internal states