concept
active
concept:model-robustness
Model Robustness
Area of AI research that uses interventions to test and improve model resilience to perturbations
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A representation that captures relevant aspects of a system; according to the theorem, the regulator must embody this.
- Comparing models using log-evidence approximated by free energy.
- Author's interpretation of the VTAB alignment results, echoing Tolstoy.
- Probability of data under the model, penalizing complexity and rewarding accuracy.
- The phenomenon of model internals deviating from desired behavior; MAS is shown to detect this by comparing toxic and nontoxic LLMs.
- A model trained on comparison data to assign scores to responses, used as reward signal in RLHF/RLAIF.
- Second model system studied; used to show why flat autoregressive LLMs struggle with long-range coherence.
- Models of sensory generation that allow dynamic context-sensitive prior expectations.