finding

active

finding:pca-analysis-shows-token-embeddings-and-unembeddings-are-concentrated-in-a-relatively-small-fraction-of-residual-stream-dimensions-in-large-models

PCA analysis shows token embeddings and unembeddings are concentrated in a relatively small fraction of residual stream dimensions in large models

Supporting evidence for the claim that most residual stream dimensions are free for other layers to use
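
A minimal sketch of how a concentration measurement like this could be made, assuming PCA via SVD on a mean-centered matrix with one row per token. The function name, the 95% variance threshold, and the synthetic low-rank "embedding" are illustrative assumptions, not the source paper's procedure or data.

```python
import numpy as np

def components_for_variance(matrix, threshold=0.95):
    """Count the principal components needed to explain `threshold`
    of the total variance across the rows of `matrix`."""
    # PCA operates on mean-centered data.
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to per-component variance.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    # First index at which the cumulative share reaches the threshold.
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(singular_values))

# Hypothetical sizes; real models are far larger.
rng = np.random.default_rng(0)
n_vocab, d_model, true_rank = 1000, 64, 8

# Synthetic stand-in for an embedding matrix: rows lie near an
# 8-dimensional subspace of the 64-dimensional residual stream.
basis = rng.normal(size=(true_rank, d_model))
coeffs = rng.normal(size=(n_vocab, true_rank))
W_E_rows = coeffs @ basis + 0.01 * rng.normal(size=(n_vocab, d_model))

k_embed = components_for_variance(W_E_rows)
print(f"{k_embed} of {d_model} dimensions explain 95% of the variance")
```

On real weights, the same function would be applied to the embedding and unembedding matrices separately; a small count relative to d_model is what "concentrated in a small fraction of dimensions" would look like under this measure.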

Source paper

extracted_from

A Mathematical Framework for Transformer Circuits (2021)

Neighborhood — ranked by edge-count

Claims (1)

claim

Zero-layer transformers optimally approximate bigram log-likelihood through W_U W_E
supports
First result in the hierarchy: the simplest possible transformer does nothing more than learn which tokens follow which
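
The claim above can be illustrated with a small sketch, assuming the shape convention W_E: (d_model, n_vocab) and W_U: (n_vocab, d_model). The weights here are random placeholders and the sizes are hypothetical; the point is only that, with no attention or MLP layers, the next-token logits are a fixed function of the current token, read from one column of the product W_U W_E.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vocab, d_model = 50, 16  # hypothetical sizes

# Random placeholders standing in for trained weights.
W_E = rng.normal(size=(d_model, n_vocab))  # token id -> residual stream
W_U = rng.normal(size=(n_vocab, d_model))  # residual stream -> logits

def zero_layer_logits(token_id):
    """Embed the current token, then unembed it directly."""
    return W_U @ W_E[:, token_id]

def softmax(x):
    exp = np.exp(x - x.max())  # subtract the max for numerical stability
    return exp / exp.sum()

# The whole model collapses to one (n_vocab, n_vocab) table:
# column t holds the next-token logits that follow token t.
bigram_table = W_U @ W_E

token = 7
probs = softmax(zero_layer_logits(token))
```

Because the table factors through d_model dimensions, its rank is at most d_model, so a trained zero-layer model can only approximate the true bigram statistics rather than store them exactly.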

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

PCA Analysis of Token Embeddings/Unembeddings · method · 0.860
PCA applied to token embedding and unembedding matrices to understand what fraction of residual stream dimensions they occupy and how they relate
In LLaMA-2-7B, PCA of larger_than+smaller_than shows statements clustering by surface-level characteristics (e.g., presence of token 'eighty') rather than truth value · finding · 0.787
Shows absence of abstract truth representations in smallest model, supporting scale-dependent emergence claim
PCA is the appropriate dimensionality reduction technique for constructing the RN because it preserves global structure and provides deterministic, interpretable projections. · claim · 0.765
Justifies PCA choice over UMAP or t-SNE for the node-structured RN model.
PCA visualizations of LLaMA-2-13B and 70B representations of curated datasets show clear linear structure, with true statements separating from false ones in the top two principal components · finding · 0.757
Primary visual evidence for linear truth representations in large LLMs
Principal components analysis (PCA) · method · 0.738
Statistical method used to analyze neural activity data.
The residual stream has a deeply linear structure, enabling virtual weights and path expansion analysis · claim · 0.734
Architectural observation enabling the entire mathematical framework; the residual stream is purely a sum of linear projections
RSA shows low RDM correlation on embedding layers for GRU-GRU comparisons, despite high within-seed functional similarity · finding · 0.734
Demonstrates RSA's sensitivity issue in embedding layers; attributed partly to Spearman rank handling of RDMs with differing relative extrema.
Covariance pooling compresses gigabytes of activations into compact stable embeddings without large labeled datasets · finding · 0.729
Practical finding: the method produces compact fixed-length representations from large volumes of token activations without requiring supervised labels.