finding

active

finding:strength-comparison-pair-3-7-with-4-outperforms-pair-3-5-with-2-indicating-graded-sensitivity-to-perturbation-magnitude

Strength comparison pair (3,7) with |Δα|=4 outperforms pair (3,5) with |Δα|=2, indicating graded sensitivity to perturbation magnitude

Shows that introspective accuracy scales with the difference in injection strength rather than reflecting all-or-nothing detection

Source paper

extracted_from

Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs

(2025) · Ely Hahami · I. N. Sinha · Lavik Jain · Josh Kaplan +1

Neighborhood — ranked by edge-count

Claims (1)

claim

LLMs can compute meaningful functions over perturbations to their internal states, establishing introspection as a real but layer-dependent phenomenon
supports
Primary positive claim of the paper, grounded in strength comparison and localization results

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Strength comparison accuracy reaches 73% at layer 3 for injection pair (2,6) vs. 50% chance · finding · 0.801
Secondary positive result for strength comparison showing graded sensitivity to perturbation magnitude
Strength comparison accuracy averages 47% at layers 15-30, indistinguishable from 50% chance · finding · 0.764
Shows collapse of introspective capability at later layers in the strength comparison task
Experimental condition adjective embeddings show mean cosine similarity 0.657 (n=9,591 pairs), significantly higher than history (0.628, t=15.8, p=1.4×10⁻⁵⁵), conceptual (0.587, t=38.5, p<10⁻³⁰⁰), and zero-shot (0.603, t=35.1, p=4.3×10⁻²⁶²) · finding · 0.762
Core result of Experiment 3: cross-model semantic convergence under self-referential processing
AUS_N is a weaker correlate of θ50 than S_max across E3 backbones · finding · 0.761
E3 finding distinguishing the two geometry summaries; breadth less predictive than peak height
Pairwise similarity of trait PC1 across all three models is >0.81; no pairwise correlation in top 3 trait PCs is below 0.70 · finding · 0.754
Shows trait space has more cross-model consistency than role space beyond PC1
The degree of similarities which exist in a structure must correspond exactly to the degree of similarity of the conditions there, and the degrees of differences which exist in a structure must also correspond to the degrees of difference in the conditions there. · claim · 0.749
The profound principle that underlies all living structure; symmetry as the mathematical trace of necessity.
The results generalise readily to non-equilibrium systems where scaling relationships remain similar (e.g., dynamic or localised scaling). · claim · 0.748
Claim about broader applicability of the scaling argument
Impulsivity→interest: ρ increases from 0.70 (α=-4) to 0.83 (α=+4); R² from 0.46 to 0.69 in LLaMA-3.2-3B · finding · 0.745
Scatter plot visualization showing strengthened probe-report relationship across alpha range
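
The "related by similarity" list above (cosine ≥ 0.65, no typed edge) could be produced by a filter along the following lines. This is only a minimal sketch: the function names, the dictionary-of-vectors layout, and the `typed_neighbors` set are hypothetical and not taken from the tool that generated this page; the 0.65 threshold is the only value drawn from the page itself.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_edge(target_id, embeddings, typed_neighbors, threshold=0.65):
    """Return (entity_id, similarity) pairs, most similar first.

    embeddings: hypothetical dict mapping entity id -> embedding vector.
    typed_neighbors: hypothetical set of ids that already share a typed
    edge with the target; these are excluded from the result.
    """
    target = embeddings[target_id]
    hits = []
    for entity_id, vector in embeddings.items():
        if entity_id == target_id or entity_id in typed_neighbors:
            continue
        score = cosine(target, vector)
        if score >= threshold:
            hits.append((entity_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter are candidates only: a high score suggests a possible new edge or an unrecognized duplicate, and several entries in the list above (for example the claims scoring 0.749 and 0.748) appear topically unrelated despite clearing the threshold.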