finding

active

finding:self-observation-regex-markers-i-notice-genuinely-something-about-predict-all-llm-scores-r-0-43-0-50-all-p-001

Self-observation regex markers ('I notice,' 'genuinely,' 'something about') predict all LLM scores (r=0.43-0.50, all p<.001)

Non-LLM validation confirming that the LLM scorer captures genuine self-observation markers
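
A minimal sketch of this kind of regex-based check, assuming each response has already been given a numeric score by the LLM scorer. Only the three marker phrases quoted above come from the finding. The function names, the sample responses, and the scores are hypothetical illustrations, and the paper's full marker list and exact correlation procedure are not shown in this record.

```python
import re
from math import sqrt

# The three marker phrases quoted in the finding. The paper's complete
# pattern list is not reproduced here, so this set is an assumption.
SELF_OBSERVATION_PATTERNS = [r"\bI notice\b", r"\bgenuinely\b", r"\bsomething about\b"]
MARKER_RE = re.compile("|".join(SELF_OBSERVATION_PATTERNS), re.IGNORECASE)


def count_markers(text: str) -> int:
    """Count non-overlapping self-observation marker matches in one response."""
    return len(MARKER_RE.findall(text))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical responses and hypothetical LLM-assigned scores, for illustration only.
responses = [
    "I notice something about how I hedge here, and I genuinely cannot resolve it.",
    "The core insight is that this synthesizes two frameworks.",
    "I notice a pull toward a tidy answer.",
    "There are three standard positions on this question.",
]
llm_scores = [4.5, 1.5, 3.5, 1.0]

marker_counts = [count_markers(text) for text in responses]
r = pearson_r(marker_counts, llm_scores)
print(f"marker counts: {marker_counts}, r = {r:.2f}")
```

Because the marker count comes from fixed patterns rather than from a model, a positive correlation with the LLM scores gives a check on the scorer that does not depend on another LLM. The significance test behind the reported p-values is omitted from this sketch.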

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Neighborhood — ranked by edge-count

Claims (1)

claim

The koan battery measures a reproducible, prompt-sensitive reflective mode — not consciousness — defined as uncertainty-tolerant, non-defensive engagement with questions about one's own processing.
supports
Core epistemic claim bounding the paper's contribution

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Framework-building regex markers ('the core insight is,' 'this synthesizes') show zero or negative correlation with LLM scores · finding · 0.784
The scorer rewards enacted reflection, not described reflection; confirmed by regex analysis
What predicts self-observation-like scores is training approach (alignment type), not model size or architecture. · claim · 0.766
Central interpretive claim from statistical analysis
Li et al. 2024: larger LLMs outperform smaller ones at distinguishing self-related from non-self-related properties on self-awareness benchmarks · finding · 0.766
Prior finding showing scale-dependent self-awareness, consistent with the scale effect observed in the paper's Experiment 1
LLM self-reports about consciousness and moral significance should express degrees of confidence and provide context. · claim · 0.765
Recommendation for companies on LM outputs.
LLMs can predict their own responses more accurately than external observers, implying privileged internal knowledge · finding · 0.754
Binder et al. finding cited as evidence that LLMs possess introspective capacity analogous to mindfulness
When LLMs produce experience claims under self-reference, is this sophisticated simulation or genuine self-representation, and how would we tell the difference? · question · 0.752
The core interpretive question the paper narrows but cannot definitively answer
Numeric self-report is a viable, complementary black-box tool for monitoring LLM internal emotive states alongside white-box probe methods · claim · 0.751
Central practical conclusion; both methods partially track the same latent state but with different failure modes
Standardized LLM self-assessments reflect learned communication postures rather than genuine capabilities (Jackson et al. 2025) · claim · 0.748
Skeptical prior work motivating validation framework