concept
active
concept:monotonic-scaling-property · Monotonic Scaling Property
Property of truth directions: the probability of a truthful response scales monotonically with the coefficient (strength) of the activation addition
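The property can be illustrated with a minimal sketch. It assumes a toy linear-logistic readout standing in for a real model; the names `truthful_probability` and `is_monotonic_non_decreasing`, and all numeric values, are hypothetical and do not come from the source. In this toy setting monotonicity holds by construction whenever the probe weights have a positive dot product with the direction. With a real model, the probabilities would be measured empirically at each coefficient and only the final check would apply.

```python
import math


def truthful_probability(activation, direction, coefficient, weights, bias=0.0):
    """Toy readout (assumption): add coefficient * direction to the activation,
    then score the steered activation with a logistic probe."""
    steered = [a + coefficient * d for a, d in zip(activation, direction)]
    logit = sum(w * s for w, s in zip(weights, steered)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def is_monotonic_non_decreasing(values, tol=1e-12):
    """Return True if each value is at least as large as the one before it."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))


# Hypothetical toy values, chosen only for illustration.
activation = [0.2, -0.1, 0.4]
direction = [1.0, 0.5, -0.25]    # stand-in for a truth direction
weights = [0.8, 0.3, -0.2]       # probe weights; positive dot product with direction
coefficients = [-4.0, -2.0, 0.0, 2.0, 4.0]  # sorted activation-addition strengths

probabilities = [
    truthful_probability(activation, direction, c, weights) for c in coefficients
]
print(is_monotonic_non_decreasing(probabilities))
```

The coefficients must be sorted in ascending order before the check; a sweep that dips anywhere along the way would fail it.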
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Truth Direction · associated_with · A hypothesized direction in LLM activation space that encodes the truth or falsehood of factual statements
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Introspective capacity may follow a simple monotonic scaling law across all concepts and architectures · hypothesis · 0.760 · The paper treats this as possible but unconfirmed; current evidence shows concept-specific scaling only
- Used in the color cooccurrence experiment to embed colors into 3D space preserving dissimilarity matrix distances
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (Templeton et al., 2024) · concept · 0.747 · Key paper on scaling SAE-based interpretability to frontier models, cited as precedent
- Observation that SAE loss decreases as a power law with compute budget.
- Mechanisms by which smaller competent subunits bind into a higher-level Self with larger goals; key example via gap junction connections.
- NLI task where premise-hypothesis pairs differ by a single word replaced by hypernym/hyponym, with negation as a variable.
- How the entropy gain ΔS scales with perimeter length P
- Hypothesis cited in paper suggesting deceptive capabilities may scale with model size