finding
active
finding:in-llama-2-13b-cities-and-neg-cities-show-approximately-orthogonal-axes-of-separation-in-pca-visualizations-at-intermediate-layers
In LLaMA-2-13B, cities and neg_cities show approximately orthogonal axes of separation in PCA visualizations at intermediate layers
A case of misalignment: in smaller models, the truth direction is not always shared between a dataset and its negation
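To make "approximately orthogonal axes of separation" concrete, here is a minimal sketch on synthetic data, not on real LLaMA-2-13B activations. It assumes each dataset separates true from false statements along a single direction, and it estimates that direction with a difference of class means rather than with the PCA projection the finding refers to. The helper names (`separation_axis`, `make_synthetic`), the dimensions, and the noise scale are all hypothetical choices for illustration.

```python
import numpy as np

def separation_axis(acts, labels):
    """Unit vector pointing from the mean false activation to the mean true activation."""
    diff = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
    return diff / np.linalg.norm(diff)

def make_synthetic(direction, n=500, dim=64, seed=0):
    """Synthetic stand-in for activations: Gaussian noise plus a label-dependent shift."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)           # 1 = true statement, 0 = false
    noise = rng.normal(scale=0.5, size=(n, dim))
    signs = 2.0 * labels - 1.0                    # map {0, 1} to {-1, +1}
    acts = noise + 3.0 * signs[:, None] * direction
    return acts, labels

dim = 64
e0 = np.zeros(dim); e0[0] = 1.0   # assumed truth axis for the "cities"-like data
e1 = np.zeros(dim); e1[1] = 1.0   # assumed, orthogonal truth axis for the "neg_cities"-like data

cities_acts, cities_labels = make_synthetic(e0, dim=dim, seed=0)
neg_acts, neg_labels = make_synthetic(e1, dim=dim, seed=1)

axis_cities = separation_axis(cities_acts, cities_labels)
axis_neg = separation_axis(neg_acts, neg_labels)

# Cosine near 0: orthogonal axes. Near +1: shared truth direction. Near -1: antipodal.
cosine = float(axis_cities @ axis_neg)
print(f"cosine between separation axes: {cosine:.3f}")
```

On real activations, the same cosine could be tracked layer by layer and across model sizes to see whether the two axes move toward alignment.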
Source paper
extracted_from · (2023) · Samuel Marks · Max Tegmark
Neighborhood — ranked by edge-count
Claims (2)
claim
- Interpretive claim connecting scale to abstraction level in LLM representations
- Scale-dependent structural finding from PCA visualizations in §4
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Layer-by-layer evolution of truth direction alignment, supporting hierarchical abstraction hypothesis
- Primary visual evidence for linear truth representations in large LLMs
- Scale-dependent alignment result demonstrating how more abstract truth representations emerge with scale
- Hypothesized intermediate feature explaining antipodal alignment between cities and neg_cities in early-middle layers
- Layer-wise PCA analysis shows hierarchical development of truth representations across forward pass
- Layer-wise emergence pattern supporting hierarchical development hypothesis
- Demonstrates strong anti-correlation between text probability and truth in negated datasets
- Key empirical result showing that optimizing for behavioral outputs and fitting representation geometry produce the same path in activation space