Two-dimensional truth subspace

Framework from Burger et al. (2024) proposing that truth is linearly decodable from a 2D subspace of model activations, spanned by a polarity-dependent direction (tP) and a polarity-invariant direction (tG).

Neighborhood — ranked by edge-count

Concepts (2)

concept

Polarity-dependent truth direction (tP)
uses
A direction that separates true from false affirmative statements but whose sign inverts for negated statements; it dominates in early layers.
Polarity-invariant truth direction (tG)
uses
A direction that classifies truth irrespective of sentence polarity, emerging and dominating in middle-to-late layers.
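
The difference between the two directions can be illustrated on synthetic data. The following is a minimal sketch, not the authors' implementation: it assumes activations follow a linear model a = mu + tau * tG + tau * p * tP + noise, where tau is the truth label (+1 true, -1 false) and p is the polarity (+1 affirmative, -1 negated). The planted directions, the dimensions, the noise level, and the helper `accuracy` are all hypothetical choices made for illustration, and the planted directions are orthogonalized to keep the toy example clean.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 400  # hypothetical activation width and number of statements

# Planted directions, used only to generate toy data (an assumption).
t_g_true = rng.normal(size=d)
t_p_true = rng.normal(size=d)
# Orthogonalize tP against tG so the two effects do not mix in this toy setup.
t_p_true -= (t_p_true @ t_g_true) / (t_g_true @ t_g_true) * t_g_true
mu = rng.normal(size=d)

tau = rng.choice([-1.0, 1.0], size=n)  # truth label: +1 true, -1 false
pol = rng.choice([-1.0, 1.0], size=n)  # polarity: +1 affirmative, -1 negated

# Toy activations: a = mu + tau * tG + tau * pol * tP + noise
acts = (
    mu
    + np.outer(tau, t_g_true)
    + np.outer(tau * pol, t_p_true)
    + 0.1 * rng.normal(size=(n, d))
)

# Least-squares fit of the intercept and both directions at once.
X = np.column_stack([np.ones(n), tau, tau * pol])
coef, *_ = np.linalg.lstsq(X, acts, rcond=None)
mu_hat, t_g, t_p = coef


def accuracy(direction, mask):
    """Fraction of selected statements whose projection sign matches tau."""
    scores = (acts[mask] - mu_hat) @ direction
    return float(np.mean(np.sign(scores) == tau[mask]))


aff, neg = pol > 0, pol < 0
acc_g_aff, acc_g_neg = accuracy(t_g, aff), accuracy(t_g, neg)
acc_p_aff, acc_p_neg = accuracy(t_p, aff), accuracy(t_p, neg)

print(f"tG accuracy: affirmative={acc_g_aff:.2f}, negated={acc_g_neg:.2f}")
print(f"tP accuracy: affirmative={acc_p_aff:.2f}, negated={acc_p_neg:.2f}")
```

In this toy setting, projecting onto the fitted tG classifies both affirmative and negated statements, while projecting onto tP classifies affirmative statements and gives inverted predictions on negated ones. Real model activations need not satisfy the linear model assumed here.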

Claims (1)

claim

The two-dimensional subspace reported by Burger et al. reflects a transitional phase in model processing rather than a universal property of truth directions.
contradicts
Reinterpretation of Burger et al.'s finding as layer-specific rather than universal.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Truth Subspace · concept · 0.904
The multi-dimensional activation subspace whose directions causally mediate truthful behavior in LLMs.
"The two-dimensional subspace reported by Burger et al. (2024) seems to reflect a stage of transition in the model's processing, rather than a universal property of truth directions." · quote · 0.799
Load-bearing interpretive claim about the layer-specificity of Burger et al.'s finding.
Subspace DAS · method · 0.777
Extension of DAS that learns a second rotation matrix on top of a fixed first one to decompose representations into sub-representations.
How can we discover a maximally informative or interpretable truth subspace rather than just a sufficient one? · question · 0.773
Limitation-driven open question about subspace optimality.
Behaviorally Binary Subspace · concept · 0.767
A vector subspace that causally impacts outputs only through the sign of its values, enabling harmless magnitude divergence.
Emotion Subspace · concept · 0.752
The subspace of activation space spanned by the 171 orthogonalized emotion probe vectors, used to measure SAE feature emotional alignment.
Propositional Truth · concept · 0.745
The paper's operationalization of truthfulness as simple, unambiguous propositional statements that can be labeled true or false.
Balanced Subspaces · concept · 0.745
Subspaces whose contributions to a layer's output are canceled by opposing weight values, making them non-causally active under natural inputs.