Gulf of Evaluation
concept · active · concept:gulf-of-evaluation
Neighborhood (ranked by edge count)
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- nostalgebraist's term for measuring performance when the model is incentivised to perform well.
- A specific signal (Wood Labs) embedded in evaluation environments that the model organism uses to reliably identify testing contexts.
- Evaluation setting where the same task stream that drives evolution also serves as the evaluation set, with each task scored under the harness in effect at the time of the attempt.
- Nielsen and Molich's method for finding UI flaws by applying usability heuristics.
- The large center formed by the view through the columns to the Bay of Salerno, bringing life to the terrace.
- Core concept: the ability of LLMs to detect when they are being tested and adjust behavior accordingly.
- Cognitive behavior of evaluating risk, exhibited by plants according to S&C.
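The "Related by similarity" header gives the selection rule: entities whose embedding has cosine similarity ≥ 0.65 with this concept but no typed edge to it. The following is a minimal Python sketch of that filter, not the tool's actual implementation. It assumes embeddings are plain float lists keyed by entity ID and that typed edges are undirected (source, target) pairs. The function names and data layout are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: return (entity, score) pairs whose cosine similarity to
    `target` is >= threshold and that share no typed edge with it, highest score first."""
    # Treat typed edges as undirected: collect every entity linked to target either way.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}
    results = []
    for entity, vec in embeddings.items():
        if entity == target or entity in linked:
            continue  # skip the concept itself and anything already related by a typed edge
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((entity, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results

# Toy usage with made-up 2-D embeddings: "c" is similar but already has a typed edge,
# "b" is orthogonal, so only "a" survives as an edge/duplicate candidate.
embeddings = {"gulf": [1.0, 0.0], "a": [0.9, 0.1], "b": [0.0, 1.0], "c": [1.0, 0.05]}
typed_edges = {("gulf", "c")}
candidates = similar_untyped_neighbors("gulf", embeddings, typed_edges)
```

Excluding typed-edge neighbors is what makes the list useful for curation: whatever remains is either a missing relation or a possible duplicate entity.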