Monotonic Scaling Property

Property of truth directions: the probability of a truthful response scales monotonically with the activation-addition coefficient
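As a minimal sketch of what this property asserts, the toy example below adds a scaled direction vector to an activation and checks that a stand-in "truthful response" probability does not decrease as the coefficient grows. Everything here is an illustrative assumption: the logistic probe (`probe_w`, `probe_b`) substitutes for a real model's output, the vectors are random, and the helper names are hypothetical. It is not the procedure used in any source behind this entry.

```python
import numpy as np

def truthful_probability(activation, direction, coefficient, probe_w, probe_b):
    """Toy stand-in for a model: steer the activation, then read out a probability."""
    # Activation addition: shift the activation along the candidate truth direction.
    steered = activation + coefficient * direction
    # Assumption: a logistic probe approximates P(truthful response).
    logit = probe_w @ steered + probe_b
    return 1.0 / (1.0 + np.exp(-logit))

def is_monotonic_non_decreasing(values, tol=1e-12):
    """True if each value is at least as large as the previous one (up to tol)."""
    return bool(np.all(np.diff(np.asarray(values, dtype=float)) >= -tol))

rng = np.random.default_rng(0)
dim = 16

# Hypothetical unit-norm truth direction and a random base activation.
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
activation = rng.normal(size=dim)

# Assumption: the probe is aligned with the direction, so the logit grows
# linearly with the coefficient. A real model need not behave this way.
probe_w = direction.copy()
probe_b = 0.0

coefficients = np.linspace(-4.0, 4.0, 9)
probs = np.array([
    truthful_probability(activation, direction, c, probe_w, probe_b)
    for c in coefficients
])

print(is_monotonic_non_decreasing(probs))
```

In practice the probabilities would come from evaluating a steered model on a labeled set of statements; the monotonicity check itself stays the same.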

Neighborhood — ranked by edge-count

concept

Truth Direction
associated_with
A hypothesized direction in LLM activation space that encodes the truth or falsehood of factual statements

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Introspective capacity may follow a simple monotonic scaling law across all concepts and architectures · hypothesis · 0.760
The paper treats this as possible but unconfirmed; current evidence shows concept-specific scaling only
Multidimensional Scaling · method · 0.752
Used in the color cooccurrence experiment to embed colors into 3D space preserving dissimilarity matrix distances
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (Templeton et al., 2024) · concept · 0.747
Key paper on scaling SAE-based interpretability to frontier models, cited as precedent
Power law scaling · concept · 0.745
Observation that SAE loss decreases as a power law with compute budget.
Scaling Of The Self · concept · 0.739
Mechanisms by which smaller competent subunits bind into a higher-level Self with larger goals; key example via gap junction connections.
Monotonicity Natural Language Inference · concept · 0.732
NLI task where premise-hypothesis pairs differ by a single word replaced by hypernym/hyponym, with negation as a variable.
entropy scaling · concept · 0.730
How the entropy gain ΔS scales with perimeter length P
Inverse Scaling Law · concept · 0.726
Hypothesis cited in paper suggesting deceptive capabilities may scale with model size