Principal components analysis (PCA)

Statistical method used to analyze neural activity data.

Neighborhood — ranked by edge-count

paper

framework

CausalGym
uses
Multi-task benchmark of linguistic behaviours for measuring causal efficacy of interpretability methods, adapted from SyntaxGym
Representation Network (RN)
uses
Novel construct introduced by this paper: a hypothetical graph embedded in the time series of LLM representations, where each dimension is a node and latent connections are edges.

method

Emotion probes (171-emotion residual vector probes)
uses
Linear probes constructed to measure 171 emotion concepts in model activations with surface semantic content removed
Linear Artificial Tomography (LAT)
uses
Method for extracting deception steering vectors via PCA on contrastive activation differences; achieves 89% detection accuracy
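The LAT entry above describes PCA on contrastive activation differences. A minimal sketch of that idea, assuming paired activations are already available as NumPy arrays: the data here is synthetic, and the names `top_principal_direction`, `acts_a`, `acts_b`, and `planted` are hypothetical and not taken from the LAT paper.

```python
import numpy as np


def top_principal_direction(diffs: np.ndarray) -> np.ndarray:
    """Return the unit-norm first principal component of the rows of `diffs`."""
    centered = diffs - diffs.mean(axis=0, keepdims=True)
    # Rows of `vt` are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


# Synthetic stand-in for model activations (assumption: not from any real model).
rng = np.random.default_rng(0)
n_pairs, hidden_dim = 200, 16
planted = np.zeros(hidden_dim)
planted[3] = 1.0  # the direction hidden in the contrastive pairs

base = rng.normal(size=(n_pairs, hidden_dim))
strength = rng.normal(scale=3.0, size=(n_pairs, 1))
acts_a = base + strength * planted + 0.1 * rng.normal(size=(n_pairs, hidden_dim))
acts_b = base + 0.1 * rng.normal(size=(n_pairs, hidden_dim))

# PCA on the pairwise differences: the shared `base` component cancels out.
direction = top_principal_direction(acts_a - acts_b)

# The sign of a principal component is arbitrary, so compare up to sign.
alignment = abs(float(direction @ planted))
print(f"alignment with planted direction: {alignment:.3f}")
```

Projecting new activations onto `direction` gives a scalar score that could then be thresholded or used for steering; how LAT itself does this is not specified in this entry.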

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Principal Component Analysis Visualization · method · 0.834
Used to visualize LLM true/false representations, revealing clear linear structure separating true from false statements
PCA Visualization · method · 0.831
Used to visually inspect separation of truth-related directions in model activation space across layers
PCA is the appropriate dimensionality reduction technique for constructing the RN because it preserves global structure and provides deterministic, interpretable projections. · claim · 0.816
Justifies PCA choice over UMAP or t-SNE for the node-structured RN model.
PCA of Emotion Feature Activations · method · 0.805
PCA on 171 emotion probe activations across all tokens to produce ordered linear combinations and test if lower PCs are more persistent
PCA Analysis of Token Embeddings/Unembeddings · method · 0.767
PCA applied to token embedding and unembedding matrices to understand what fraction of residual stream dimensions they occupy and how they relate
PCA on Persona Space · method · 0.765
Standardized PCA run on role vectors to find main axes of persona variation
Contrastive analysis · method · 0.744
Method comparing brain activity in conscious vs. unconscious conditions.
Formal Concept Analysis (FCA) · framework · 0.743
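
Several PCA entries in this list run PCA over a matrix of vectors, either standardized (as with the role vectors) or to ask what fraction of dimensions the data occupies (as with the embedding matrices). A minimal sketch of both steps on synthetic data follows; the function name `standardized_pca`, the 90% threshold, and the data shapes are assumptions chosen for illustration, not details from the listed papers.

```python
import numpy as np


def standardized_pca(x: np.ndarray):
    """Z-score each column, then return (components, explained_variance_ratio)."""
    std = x.std(axis=0, keepdims=True)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    z = (x - x.mean(axis=0, keepdims=True)) / std
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    variance = s**2  # proportional to the variance captured by each component
    return vt, variance / variance.sum()


# Synthetic vectors: 10 observed dimensions driven by 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
vectors = latent @ mixing + 0.001 * rng.normal(size=(500, 10))

components, ratio = standardized_pca(vectors)

# Smallest number of leading components whose cumulative variance reaches 90%.
n_needed = int(np.searchsorted(np.cumsum(ratio), 0.90) + 1)
print(f"components needed for 90% of variance: {n_needed} of {ratio.size}")
```

Because the result comes from a single SVD with no random initialization, repeated runs on the same matrix give the same components up to sign, which is the determinism the RN claim above contrasts with UMAP and t-SNE.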