OLS Linear Regression Fit to Alpha Trends

An OLS regression is fitted to each mu(alpha) trend to assess how nearly linearly steering scales with the alpha coefficient.
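As a minimal sketch of the kind of fit described here, the snippet below fits a straight line to one mu(alpha) trend by ordinary least squares and reports R². The function names `ols_r2` and `linearity_label` and the sample values are hypothetical and for illustration only; the 0.95 and 0.75 cut-offs are the thresholds quoted in the related findings on this page, and nothing here is taken from the original analysis code.

```python
def ols_r2(alphas, mus):
    """Fit mu = intercept + slope * alpha by OLS; return (slope, intercept, r2)."""
    n = len(alphas)
    mean_a = sum(alphas) / n
    mean_m = sum(mus) / n
    sxx = sum((a - mean_a) ** 2 for a in alphas)
    sxy = sum((a - mean_a) * (m - mean_m) for a, m in zip(alphas, mus))
    slope = sxy / sxx
    intercept = mean_m - slope * mean_a
    ss_res = sum((m - (intercept + slope * a)) ** 2 for a, m in zip(alphas, mus))
    ss_tot = sum((m - mean_m) ** 2 for m in mus)
    # R^2 is undefined for a constant trend (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r2


def linearity_label(r2):
    """Bucket a fit using the thresholds quoted on this page (assumed labels)."""
    if r2 >= 0.95:
        return "near-linear"
    if r2 >= 0.75:
        return "roughly linear"
    return "not linear"


# Hypothetical trend: mean trait score at five steering coefficients.
alphas = [-2.0, -1.0, 0.0, 1.0, 2.0]
mus = [1.0, 2.1, 2.9, 4.2, 5.0]
slope, intercept, r2 = ols_r2(alphas, mus)
print(slope, intercept, r2, linearity_label(r2))
```

Repeating this per trend and counting the labels would give proportions of the kind reported in the findings, assuming each trend is summarized as one mean value per alpha.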

Neighborhood — ranked by edge-count

claim

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

47.69% of 130 injection-manipulated alpha trends have near-linear fits (R² ≥ 0.95); 96.15% have roughly linear fits (R² ≥ 0.75) · finding · 0.762
Demonstrates alignment with the Linear Representation Hypothesis: the target trait steers approximately linearly with alpha
Only 13.27% of 520 non-manipulated alpha trends achieve R² ≥ 0.95, contrasting with 47.69% for manipulated trends · finding · 0.704
Control comparison showing near-linearity is specific to the targeted manipulation direction
Linear mixed-effects models (LMMs) · method · 0.701
Primary statistical model with random intercept by conversation, REML estimation, for pooled conversation-turn observations
Same-concept steering shifts self-report monotonically for all four concepts: LMM alpha slopes 0.067–0.40, all p < 10⁻¹² · finding · 0.691
Causal confirmation that the coupling between self-report and internal state is genuine; steering toward the positive pole increases the self-report
Autoregressive models · framework · 0.689
Second model system studied; used to show why flat autoregressive LLMs struggle with long-range coherence.
Isotonic regression · method · 0.687
Fits a non-decreasing function and computes R² = 1 - SSres/SStot to quantify introspective fidelity without assuming linearity
The additive form S = ρd - dr - log k is parsimonious and aligns with log-odds intuition · claim · 0.679
Justification for the linear combination
flat autoregressive LLMs · concept · 0.674
Large language models without hierarchical structure, challenged by long sequences
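For comparison with the OLS fit, the isotonic-regression entry above can be illustrated with a small sketch: it fits a non-decreasing function using the pool-adjacent-violators algorithm and then computes R² = 1 - SSres/SStot. The function names `isotonic_fit` and `r_squared` and the sample data are hypothetical; this is one standard way to compute such a fit, not necessarily the implementation used in the source work.

```python
def isotonic_fit(xs, ys):
    """Non-decreasing least-squares fit via pool-adjacent-violators (PAVA).

    Returns fitted values in the same order as the inputs. Tied x values
    are treated as separate points in sort order (a simplifying assumption).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    blocks = []  # each block is [sum of y, count]
    for i in order:
        blocks.append([ys[i], 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted_sorted = []
    for s, c in blocks:
        fitted_sorted.extend([s / c] * c)
    fitted = [0.0] * len(xs)
    for rank, i in enumerate(order):
        fitted[i] = fitted_sorted[rank]
    return fitted


def r_squared(ys, fitted):
    """R^2 = 1 - SSres/SStot; NaN when the observations are constant."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")


# Hypothetical data with one monotonicity violation (3.0 followed by 2.0).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 4.0]
fitted = isotonic_fit(xs, ys)
print(fitted, r_squared(ys, fitted))
```

Because the isotonic fit only requires monotonicity, its R² is at least as high as the OLS R² whenever the OLS slope is non-negative, which is why it can quantify fidelity without assuming linearity.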