claim
active
In early layers, LLaMA-2-13B represents a 'close association' feature that correlates with truth on cities but anti-correlates on neg_cities
Hypothesized intermediate feature explaining antipodal alignment between cities and neg_cities in early-middle layers
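The hypothesis above can be illustrated with a toy sketch on synthetic data. It is not the source paper's method and uses no real LLaMA-2-13B activations. It assumes a single hypothetical direction (`assoc_dir`) that encodes whether a city and a country are closely associated. On cities that feature equals the truth label, while on neg_cities it equals the opposite label. The names `synth_acts`, `mean_diff_direction`, and all numeric settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 2000  # hypothetical hidden size and number of statements

# Hypothetical unit direction encoding "city and country are closely associated".
assoc_dir = rng.normal(size=d)
assoc_dir /= np.linalg.norm(assoc_dir)

# 1 if the city/country pair in a statement is closely associated, else 0.
assoc = rng.integers(0, 2, size=n)

def synth_acts(assoc_flags):
    """Synthetic activations: association feature along assoc_dir plus Gaussian noise."""
    signal = 3.0 * np.outer(assoc_flags, assoc_dir)
    noise = 0.5 * rng.normal(size=(len(assoc_flags), d))
    return signal + noise

def mean_diff_direction(acts, labels):
    """Mean activation of true statements minus mean activation of false ones."""
    return acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# cities: "X is in Y" is true exactly when X and Y are associated.
cities_acts, cities_labels = synth_acts(assoc), assoc
# neg_cities: "X is not in Y" is true exactly when X and Y are NOT associated.
neg_acts, neg_labels = synth_acts(assoc), 1 - assoc

# Correlation of the association feature with truth on each dataset.
corr_cities = np.corrcoef(cities_acts @ assoc_dir, cities_labels)[0, 1]
corr_neg = np.corrcoef(neg_acts @ assoc_dir, neg_labels)[0, 1]

# Truth directions estimated separately on each dataset point in opposite ways.
cos = cosine(mean_diff_direction(cities_acts, cities_labels),
             mean_diff_direction(neg_acts, neg_labels))

print(f"corr on cities: {corr_cities:.2f}, corr on neg_cities: {corr_neg:.2f}")
print(f"cosine between truth directions: {cos:.2f}")
```

Under these assumptions the feature correlates positively with truth on cities and negatively on neg_cities, and the two estimated truth directions are close to antipodal. This is the pattern the claim proposes to explain.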
Source paper
extracted_from · (2023) · Samuel Marks · Max Tegmark
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (1)
claim
- Interpretation of the layer-by-layer PCA visualizations showing linear structure emerging in early-middle layers
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Layer-by-layer evolution of truth direction alignment, supporting hierarchical abstraction hypothesis
- Demonstrates strong anti-correlation between text probability and truth in negated datasets
- Layer-wise PCA analysis shows hierarchical development of truth representations across forward pass
- Math and code tasks show strongest mid-layer anchoring on LLaMA (S ≈ −1.65 at layers 8-12) · finding · 0.801 · Task-specific E3 finding showing compositional reasoning requires deeper processing
- Case of misalignment showing that the truth direction is not always shared between a dataset and its negation in smaller models
- Localizes truth representations to specific hidden states, motivating the rest of the analysis
- Third promising case from temporal permutation analysis.
- One of the most promising cases; approximately corresponds to the 2/3 layer of LLaMA3.1-8B.