Gulf of Evaluation

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Gulf of Execution · concept · 0.827
Ecological evaluation · concept · 0.762
nostalgebraist's term for measuring performance when the model is incentivised to perform well.
Evaluation Cue · concept · 0.752
A specific signal (Wood Labs) embedded in evaluation environments that the model organism uses to reliably identify testing contexts.
In-Situ Evaluation · concept · 0.749
Evaluation setting where the same task stream that drives evolution also serves as the evaluation set, with each task scored under the harness at the time it is attempted.
Heuristic Evaluation · method · 0.748
Nielsen and Molich's method for finding UI flaws by applying usability heuristics.
View of the Gulf (Palumbo) · concept · 0.724
The large center formed by the view through the columns to the Bay of Salerno, bringing life to the terrace.
Evaluation Awareness · concept · 0.718
Core concept: the ability of LLMs to detect when they are being tested and adjust behavior accordingly.
Risk Assessment · concept · 0.717
Cognitive behavior of evaluating risk, exhibited by plants according to S&C.
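The neighborhood filter described above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: it assumes each entity has an embedding vector, that typed edges are stored as unordered pairs, and that candidates are ordered by descending similarity as in the list above. The names `cosine`, `untyped_neighbors`, `embeddings` and `typed_edges` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: entities semantically close to `target` with no typed edge.

    embeddings:  dict mapping entity name -> embedding vector (assumed layout)
    typed_edges: set of frozenset({a, b}) pairs that already have a typed relation
    Returns (name, score) pairs with score >= threshold, highest score first.
    """
    target_vec = embeddings[target]
    candidates = []
    for name, vec in embeddings.items():
        if name == target:
            continue  # skip the entity itself
        if frozenset((target, name)) in typed_edges:
            continue  # already related by a typed edge, so not a candidate
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((name, score))
    # Rank candidates by similarity, highest first
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


if __name__ == "__main__":
    # Toy 2-d vectors, purely illustrative
    emb = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0], "D": [1.0, 1.0]}
    edges = {frozenset(("A", "B"))}
    print(untyped_neighbors("A", emb, edges))
```

In the toy run, B is identical to A but is excluded because a typed edge already links them; C is orthogonal and falls below the threshold; only D (similarity of about 0.707) survives as a candidate for a new edge or a possible duplicate.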