concept
active
concept:polarity-dependent-truth-direction-tp

Polarity-dependent truth direction (tP)

A direction in activation space that separates true from false affirmative statements but flips sign for their negated variants, so a probe along it misclassifies negated statements. It dominates in early layers.
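
The sign inversion can be illustrated with a toy model. The sketch below is not taken from the source: it assumes synthetic activations of the form truth * t_g + truth * polarity * t_p + noise, with truth and polarity coded as +1/-1. The names t_g, t_p, synth_activations and accuracy, the dimension, and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hypothetical activation dimension

# Two hypothetical orthonormal directions:
# t_g is polarity-invariant, t_p is polarity-dependent.
q, _ = np.linalg.qr(rng.normal(size=(d, 2)))
t_g, t_p = q[:, 0], q[:, 1]


def synth_activations(truth, polarity, noise=0.1):
    """Toy activations: truth * t_g + truth * polarity * t_p + noise.

    truth and polarity are arrays of +1 / -1
    (+1 = true / affirmative, -1 = false / negated).
    """
    truth = np.asarray(truth, dtype=float)[:, None]
    polarity = np.asarray(polarity, dtype=float)[:, None]
    eps = noise * rng.normal(size=(truth.shape[0], d))
    return truth * t_g + truth * polarity * t_p + eps


def accuracy(acts, truth, direction):
    """Classify by the sign of the projection onto `direction`."""
    pred = np.sign(acts @ direction)
    return float(np.mean(pred == truth))


n = 200
truth = rng.choice([-1, 1], size=n)
aff = synth_activations(truth, np.ones(n))    # affirmative statements
neg = synth_activations(truth, -np.ones(n))   # negated statements

acc_aff_tp = accuracy(aff, truth, t_p)  # t_p on affirmative statements
acc_neg_tp = accuracy(neg, truth, t_p)  # t_p on negated statements
acc_aff_tg = accuracy(aff, truth, t_g)  # t_g on affirmative statements
acc_neg_tg = accuracy(neg, truth, t_g)  # t_g on negated statements

print(f"t_p: affirmative={acc_aff_tp:.2f}, negated={acc_neg_tp:.2f}")
print(f"t_g: affirmative={acc_aff_tg:.2f}, negated={acc_neg_tg:.2f}")
```

Under these assumptions, projecting onto t_p labels affirmative statements well and gets negated ones systematically backwards, while projecting onto t_g behaves the same for both polarities.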

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • Burger et al. (2024) framework proposing that truth is linearly decodable from a 2D subspace that captures both a polarity-dependent and a polarity-invariant direction.
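
One way to picture a 2D truth subspace is as a regression problem. The sketch below is an illustrative assumption, not the framework's documented procedure: it generates toy activations from two hypothetical directions (true_t_g, true_t_p), then regresses them on the truth label and on the truth-times-polarity interaction to recover both directions. All names, sizes, and the noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 16, 400  # hypothetical activation dimension and sample count

# Hypothetical ground-truth directions for the toy data.
q, _ = np.linalg.qr(rng.normal(size=(d, 2)))
true_t_g, true_t_p = q[:, 0], q[:, 1]

# Labels coded as +1 / -1 (true / false, affirmative / negated).
truth = rng.choice([-1.0, 1.0], size=n)
polarity = rng.choice([-1.0, 1.0], size=n)

# Toy activations: a polarity-invariant term plus a polarity-dependent term.
acts = (
    np.outer(truth, true_t_g)
    + np.outer(truth * polarity, true_t_p)
    + 0.1 * rng.normal(size=(n, d))
)

# Least-squares fit of acts ~ X @ W, where the columns of X are
# [truth, truth * polarity]. The rows of W estimate the two directions.
X = np.column_stack([truth, truth * polarity])
W, *_ = np.linalg.lstsq(X, acts, rcond=None)
est_t_g, est_t_p = W[0], W[1]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


cos_g = cosine(est_t_g, true_t_g)
cos_p = cosine(est_t_p, true_t_p)
print(f"cosine(est_t_g, true_t_g) = {cos_g:.3f}")
print(f"cosine(est_t_p, true_t_p) = {cos_p:.3f}")
```

The interaction column truth * polarity is what separates the two directions here: without it, a single truth regressor fitted on mixed-polarity data would only pick up the polarity-invariant component.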

Concepts (2)

concept
  • A direction that classifies truth irrespective of sentence polarity, emerging and dominating in middle-to-late layers.
  • Sentence polarity
    associated_with
    Whether a statement is affirmative or negated; a surface feature that confounds early-layer truth probes.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.