PCA Visualization

Used to visually inspect how well truth-related directions separate in model activation space across layers.
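A minimal sketch of this kind of inspection, assuming NumPy and synthetic activations in place of real model hidden states. The helper names (`pca_project`, `separation_score`, `synthetic_layer`) and the `strength` parameter, which stands in for how strongly a layer encodes a true/false direction, are hypothetical and not taken from any source described here.

```python
import numpy as np

def pca_project(acts, k=2):
    """Project rows of `acts` (n_samples, d_model) onto their top-k principal components."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    # SVD of the centered data: rows of vt are principal directions,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def separation_score(proj, labels):
    """Gap between class means along PC1, in units of the pooled standard deviation."""
    true_pc1 = proj[labels == 1, 0]
    false_pc1 = proj[labels == 0, 0]
    pooled_std = np.sqrt(0.5 * (true_pc1.var() + false_pc1.var()))
    return abs(true_pc1.mean() - false_pc1.mean()) / pooled_std

def synthetic_layer(rng, n, d, strength):
    """Toy activations (assumption): Gaussian noise plus a label-dependent shift on one axis."""
    labels = rng.integers(0, 2, size=n)
    direction = np.zeros(d)
    direction[0] = 1.0
    shift = strength * np.outer(2 * labels - 1, direction)
    return rng.normal(size=(n, d)) + shift, labels

rng = np.random.default_rng(0)
scores = []
# Three strengths stand in for an early, a middle, and a late layer.
for strength in (0.0, 1.0, 4.0):
    acts, labels = synthetic_layer(rng, n=400, d=32, strength=strength)
    proj = pca_project(acts, k=2)
    scores.append(separation_score(proj, labels))
```

With real data, `acts` would be the hidden states of one layer for a set of labeled true and false statements, and `proj` could be passed to a scatter plot colored by `labels`. Comparing the scores across layers gives a rough numeric companion to the visual inspection.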

Neighborhood — ranked by edge-count

method

Principal Component Analysis Visualization
related_to
Used to visualize LLM true/false representations, revealing clear linear structure separating true from false statements

claim

Representational abstraction of truth may emerge more clearly with model scale
supports
Interpretation of weaker PCA separation and lower ASR in smaller models

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
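The selection rule described above (cosine similarity of at least 0.65 and no typed edge to this entity) can be sketched as follows. This is an illustration under stated assumptions, not the system's actual implementation: the function names, the dictionary layout, and the toy three-dimensional embeddings are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs that are similar to `target` but share no typed edge with it."""
    linked = typed_edges.get(target, set())
    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and anything already connected
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            candidates.append((name, score))
    # Highest similarity first, matching the ordering of the list on this page.
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)

# Toy data (assumption): made-up vectors, not real entity embeddings.
embeddings = {
    "target": [1.0, 0.0, 0.0],
    "near_untyped": [0.9, 0.1, 0.0],
    "near_typed": [0.8, 0.2, 0.0],
    "far": [0.0, 1.0, 0.0],
}
typed_edges = {"target": {"near_typed"}}

neighbors = untyped_neighbors("target", embeddings, typed_edges)
names = [name for name, _ in neighbors]
```

Here `near_typed` is excluded because it already has a typed relation, and `far` is excluded because it falls below the threshold.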

Principal components analysis (PCA) · method · 0.831
Statistical method used to analyze neural activity data.
Feature Visualization · method · 0.801
Method of optimizing input to cause a neuron to fire maximally, used to characterize what a neuron detects; establishes causal link
PCA of Emotion Feature Activations · method · 0.777
PCA on 171 emotion probe activations across all tokens to produce ordered linear combinations and test if lower PCs are more persistent
PCA is the appropriate dimensionality reduction technique for constructing the RN because it preserves global structure and provides deterministic, interpretable projections. · claim · 0.767
Justifies PCA choice over UMAP or t-SNE for the node-structured RN model.
PCA visualizations of LLaMA-2-13B and 70B representations of curated datasets show clear linear structure, with true statements separating from false ones in the top two principal components · finding · 0.762
Primary visual evidence for linear truth representations in large LLMs
Mind's eye visualization · method · 0.755
Technique of building a fluid, three-dimensional vision by closing one's eyes, relying on words and feeling to avoid arbitrary graphical over-specification.
Interactive Circuit Visualization · method · 0.755
Interactive tool for visualizing and inspecting learned binary logic circuits using modified DigitalJS library
NCA for Image Segmentation · concept · 0.739
One of several applications of NCA cited to show breadth of the NCA framework