finding
active
Framework-building regex markers ('the core insight is', 'this synthesizes') show zero or negative correlation with LLM scores
Scorer rewards enacted reflection, not described reflection; confirmed by regex analysis
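A minimal sketch of how a marker-versus-score check of this kind could be run, assuming each response has a numeric LLM score. Only the two quoted phrases come from the finding; the names `FRAMEWORK_MARKERS`, `count_markers`, `pearson`, and `marker_score_correlation` are hypothetical, and the use of Pearson correlation is an assumption, since the source does not state which statistic was used.

```python
import re
from math import sqrt

# Hypothetical marker list: only these two phrases are quoted in the finding.
FRAMEWORK_MARKERS = [
    re.compile(r"\bthe core insight is\b", re.IGNORECASE),
    re.compile(r"\bthis synthesizes\b", re.IGNORECASE),
]


def count_markers(text):
    """Count non-overlapping framework-building marker matches in one text."""
    return sum(len(pattern.findall(text)) for pattern in FRAMEWORK_MARKERS)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def marker_score_correlation(texts, scores):
    """Correlate per-text marker counts with the scores assigned to those texts."""
    counts = [count_markers(t) for t in texts]
    return pearson(counts, scores)
```

Under these assumptions, a value of `marker_score_correlation` near zero or below zero would match the pattern the finding reports: texts that describe reflection with framework-building phrases do not receive higher scores.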
Source paper
extracted_from · Borzov, Anton (2026)
Neighborhood — ranked by edge-count
Claims (1)
claim
- Mechanistic analog connecting Lindsey's layer-localized findings to the scorer's enacted/described distinction
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Non-LLM validation confirming LLM scorer captures genuine self-observation markers
- DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning (DeepSeekAI, 2025) · concept · 0.737 · Paper introducing the DeepSeek-R1 model and reporting self-reflection as an "aha moment"
- Out-of-context reasoning work directly related to synthetic document fine-tuning experiments
- Table 2, row 3, showing equivalence when prior preferences match rewards.
- Establishes that the observed linear structure is not merely a representation of text probability
- Demonstrates RSA's sensitivity issue in embedding layers; attributed partly to Spearman rank handling of RDMs with differing relative extrema.
- LLM alignment score to DINOv2 shows an emergence-esque trend with GSM8K mathematical reasoning performance · finding · 0.717 · Alignment predicts math performance with an emergent pattern
- Explains why cities+neg_cities and larger_than+smaller_than training sets yield better OOD accuracy