finding
active
Response length (words) correlates with scores at r=0.22 baseline and r=0.12 contemplative; explains only ~5% of variance
Discriminant validity: composite scores are not reducible to verbosity
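The "~5% of variance" figure follows from squaring the reported correlations, since a Pearson r accounts for r² of the variance in a simple linear relationship. The sketch below assumes the reported r values are Pearson coefficients, which the entry does not state; the function name `variance_explained` is illustrative and does not come from the source paper.

```python
def variance_explained(r: float) -> float:
    """Share of variance accounted for by a linear relationship with correlation r."""
    return r * r


# Correlations between response length (words) and composite score, as reported above.
# Assumption: these are Pearson coefficients.
baseline = variance_explained(0.22)
contemplative = variance_explained(0.12)

print(f"baseline: {baseline:.1%}")            # baseline: 4.8%
print(f"contemplative: {contemplative:.1%}")  # contemplative: 1.4%
```

Under that assumption, the ~5% figure corresponds to the baseline condition; the contemplative condition would sit nearer 1.4%.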
Source paper
extracted_from · (2026) · Borzov, Anton
Neighborhood — ranked by edge-count
Claims (1)
claim
- Core epistemic claim bounding the paper's contribution
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Provides discriminant evidence: if battery rewarded verbosity, prompted responses should be longer
- Validates that agentic self-evaluation captures genuine emotional content of probes
- Models deploying more philosophy buzzwords score lower; battery measures beyond surface text features
- Strong scaling trend for introspective fidelity when excluding invalid steering-sign pairs
- PC1 explains 82% of variance in factor analysis of 2224 data points across 6 scoring dimensions · finding · 0.739
  Dimensions are not independent; the composite score is the reliable signal; the six dimensions are useful for understanding how, not how much
- Conceptual decomposition arising from the data showing different models dissociate these traits
- Demonstrates reflection redundancy in stronger model on harder math benchmark
- Interest introspection improves from 1B to 3B: ρ from 0.19 to 0.80, R² from 0.05 to 0.66 · finding · 0.735
  Largest single-step scaling improvement; demonstrates a dramatic introspection gain between 1B and 3B models for interest