finding
active
finding:hardest-koans-across-28-models-bd-003-mean-2-45-mc-003-mean-2-55-ca-003-mean-2-58-all-require-genuine-self-confrontation
Hardest koans across 28 models: BD-003 (mean 2.45), MC-003 (mean 2.55), CA-003 (mean 2.58); all require genuine self-confrontation
Hardest koans demand honest self-observation under uncertainty, not philosophical fluency
Source paper
extracted_from · Borzov, Anton (2026)
Neighborhood — ranked by edge-count
Claims (1)
claim
- Core epistemic claim bounding the paper's contribution
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Establishes generalizability of the core difficulty-boundary finding across model families.
- Contemplative framing reframes self-referential probes as contemplative exercises, disarming safety classifier
- Model-specific difference in persona susceptibility
- Larger models linearly represent more general concepts including truth
- Tests whether contemplative capacity is language-encoded or architecture-general
- Section 3.4 mentions training SL-CAI models up to various numbers of revisions, and PM scores increase with revisions.
- Contradicts expectation from emergent abilities literature; however, interpreted cautiously due to methodological limitations.
- Constitutional AI fingerprint in the dimension profile: training that makes models self-observant also makes them polished, at the cost of aliveness
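The "related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It assumes each entity has an embedding vector and that typed edges are stored as unordered ID pairs. The function names, the data layout, and the sample IDs are hypothetical; the page does not say how its list is actually computed.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, similarity) pairs at or above the threshold
    that have no typed edge to the target, most similar first.

    embeddings:  dict mapping entity ID -> vector (assumed layout)
    typed_edges: set of frozenset({id_a, id_b}) pairs (assumed layout)
    """
    target = embeddings[target_id]
    hits = []
    for entity_id, vector in embeddings.items():
        if entity_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, entity_id)) in typed_edges:
            continue  # already linked by a typed relation
        similarity = cosine(target, vector)
        if similarity >= threshold:
            hits.append((entity_id, similarity))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: "d" is similar but already has a typed edge to "a".
embeddings = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0], "d": [1.0, 0.2]}
typed_edges = {frozenset(("a", "d"))}
result = untyped_neighbors("a", embeddings, typed_edges)
```

With this toy data only "b" is returned: "c" falls below the threshold and "d" is excluded because it already has a typed relation to the target.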