method
active
method:blind-ranking
Blind Ranking
Scoring method in which responses are anonymized and shuffled before scoring; tests whether the resulting rankings hold up across five independent scorers.
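A minimal sketch of how such a procedure could look, not the method's actual implementation. It assumes responses are keyed by source name, that each scorer returns a tie-free ranking, and that agreement is summarized as mean pairwise Spearman correlation. The names `blind`, `unblind`, `spearman`, and `mean_pairwise_agreement` are hypothetical, and the choice of agreement statistic is an assumption rather than something the entry specifies.

```python
import random
from itertools import combinations


def blind(responses, seed):
    """Hide source names behind neutral labels and shuffle the order.

    responses: dict mapping source name -> response text.
    Returns (items, key): items is a list of (label, text) in shuffled
    order; key maps each label back to its source for later un-blinding.
    """
    rng = random.Random(seed)
    sources = list(responses)
    rng.shuffle(sources)
    key = {f"R{i + 1}": src for i, src in enumerate(sources)}
    items = [(label, responses[src]) for label, src in key.items()]
    return items, key


def unblind(ranking_by_label, key):
    """Convert a scorer's {label: rank} into {source: rank}."""
    return {key[label]: rank for label, rank in ranking_by_label.items()}


def spearman(rank_a, rank_b):
    """Spearman correlation for two tie-free rankings over the same keys."""
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two ranked items")
    d2 = sum((rank_a[k] - rank_b[k]) ** 2 for k in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


def mean_pairwise_agreement(rankings):
    """Average Spearman correlation over all pairs of scorers."""
    pairs = list(combinations(rankings, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    responses = {"model_a": "text A", "model_b": "text B", "model_c": "text C"}
    # Assumption: each of the five scorers sees an independently shuffled copy.
    for scorer_id in range(5):
        items, key = blind(responses, seed=scorer_id)
        print(scorer_id, [label for label, _ in items])
```

Under these assumptions, agreement close to 1 would suggest the scorers are ranking on something stable in the responses, while agreement near 0 would suggest the rankings are mostly noise.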
Neighborhood — ranked by edge-count
Methods (1)
method
- Koan Battery · uses · Assessment framework for measuring introspection and self-observation in LLMs; grounded in Janus's architectural theory.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Example of biological degeneracy: visual responses mediated by subcortical and brainstem nuclei independent of cortex; supports multiple realizability of cognitive functions.
- The percentage of harmful requests that a model refuses to answer, a common safety metric.
- The simple matrix form into which VPD constrains subcomponents to enforce mechanistic simplicity.
- An ordering of texts via spatial cues like indentation, size, and placement, implying importance.
- Authors' claim that their approach is both more effective in reduction and cheaper than prior methods.
- Method to discover new reflection-inducing instructions by ranking candidate tokens by cosine similarity to steering vectors.