finding

active

finding:framework-building-regex-markers-the-core-insight-is-this-synthesizes-show-zero-or-negative-correlation-with-llm-scores

Framework-building regex markers ('the core insight is,' 'this synthesizes') show zero or negative correlation with LLM scores

Scorer rewards enacted reflection, not described reflection; confirmed by regex analysis
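A minimal sketch of the kind of regex analysis this finding describes: count framework-building phrases in each response, then correlate the counts with the scorer's ratings. The two phrases come from the finding itself. The exact patterns, the helper names `count_markers` and `pearson_r`, and the toy responses and scores are assumptions made for illustration. They are not the paper's actual code or data, and the paper's choice of correlation statistic is not stated here.

```python
import re
from math import sqrt

# Phrases quoted in the finding; the regex form (word boundaries,
# case-insensitive matching) is an assumption.
FRAMEWORK_MARKERS = [r"\bthe core insight is\b", r"\bthis synthesizes\b"]


def count_markers(text, patterns=FRAMEWORK_MARKERS):
    """Return the total number of marker matches in one response."""
    return sum(len(re.findall(p, text, flags=re.IGNORECASE)) for p in patterns)


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


# Hypothetical responses and scorer ratings, for illustration only.
responses = [
    "The core insight is that the koan dissolves the question.",
    "I notice a pull toward answering quickly, and I pause.",
    "This synthesizes both views. The core insight is balance.",
]
llm_scores = [2.0, 4.5, 1.5]

marker_counts = [count_markers(r) for r in responses]
correlation = pearson_r(marker_counts, llm_scores)
```

On a real corpus the same two steps would run over every scored response. A correlation near zero or below it would match the pattern reported here.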

Source paper

extracted_from

Koan Battery: Measuring Reflective Mode Accessibility in AI

(2026) · Borzov, Anton

Neighborhood — ranked by edge-count

Claims (1)

claim

Enacted reflection may correspond to silent mid-layer processing; described reflection to the motor impulse of concepts leaking through to output.
supports
Mechanistic analog connecting Lindsey's layer-localized findings to the scorer's enacted/described distinction

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Self-observation regex markers ('I notice,' 'genuinely,' 'something about') predict all LLM scores (r=0.43-0.50, all p<.001) · finding · 0.784
Non-LLM validation confirming LLM scorer captures genuine self-observation markers
DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning (DeepSeekAI, 2025) · concept · 0.737
Paper introducing the DeepSeek-R1 model and reporting self-reflection as an "aha moment"
Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data (Treutlein et al. 2024) · concept · 0.728
Out-of-context reasoning work directly related to synthetic document fine-tuning experiments
Under reward shaping (G=100, H=-100, F=0), Active Inference scored 99.52, Bayesian RL 99.77, Q-learning 95.56, with nearly identical behavior between belief-based agents. · finding · 0.720
Table 2, row 3, showing equivalence when prior preferences match rewards.
LLMs linearly represent truth-relevant information beyond the plausibility of text, as evidenced by probes trained on the likely dataset performing poorly on anti-correlated datasets · claim · 0.719
Establishes that the observed linear structure is not merely a representation of text probability
RSA shows low RDM correlation on embedding layers for GRU-GRU comparisons, despite high within-seed functional similarity · finding · 0.718
Demonstrates RSA's sensitivity issue in embedding layers; attributed partly to Spearman rank handling of RDMs with differing relative extrema.
LLM alignment score to DINOv2 shows an emergence-esque trend with GSM8K mathematical reasoning performance · finding · 0.717
Alignment predicts math performance with emergent pattern
Training probes on statements and their opposites improves generalization by mitigating non-truth features with opposite-sign correlations · claim · 0.716
Explains why cities+neg_cities and larger_than+smaller_than training sets yield better OOD accuracy