Pointwise Mutual Information Kernel

The kernel that contrastive learners converge to; similarity equals PMI between observations
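As a rough illustration of "similarity equals PMI", the sketch below builds a PMI kernel from a matrix of co-occurrence counts. The function name `pmi_kernel`, the symmetric count matrix, and the `eps` smoothing term are assumptions made for this example and are not taken from the source.

```python
import numpy as np


def pmi_kernel(cooccurrence_counts, eps=1e-12):
    """Return K[i, j] = log(P(i, j) / (P(i) * P(j))).

    Hypothetical helper: `cooccurrence_counts[i, j]` is assumed to count how
    often observations i and j occur within the same window (symmetric).
    """
    counts = np.asarray(cooccurrence_counts, dtype=float)
    joint = counts / counts.sum()            # empirical joint P(i, j)
    marginal = joint.sum(axis=1)             # empirical marginal P(i)
    expected = np.outer(marginal, marginal)  # joint under independence
    # eps is an assumed smoothing constant that avoids log(0) for unseen pairs
    return np.log((joint + eps) / (expected + eps))


# Toy counts: each observation co-occurs mostly with itself.
counts = np.array([[4.0, 1.0],
                   [1.0, 4.0]])
K = pmi_kernel(counts)
```

Pairs that co-occur more often than independence predicts receive a positive kernel value, and pairs that co-occur less often receive a negative one.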

Neighborhood — ranked by edge-count

framework

Contrastive learning
implements
Supervised learning framework in which the system learns by observing the contrast between its current response and a nudged, improved response; requires weak additional forces from the supervisor

method

Binary NCE Loss
associated_with · supports
One of two contrastive objectives analyzed; shown to be minimized by PMI kernel representation
InfoNCE Loss
associated_with · supports
One of two contrastive objectives analyzed; shown to be minimized by PMI kernel representation up to scaling

concept

Platonic Representation
analogous_to · associated_with
The hypothesized converged representation that all sufficiently large AI models are approaching — a statistical model of underlying reality
Cooccurrence Probability
associated_with
Core statistical quantity in the paper's formal model: probability of two observations occurring within a time window
Kernel (representational)
extends
A function characterizing how a representation measures distance/similarity between datapoints; used to compare representations

hypothesis

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Mutual Information · concept · 0.737
Expected mutual information between future states and outcomes; equivalent to intrinsic value.
Mutual Information Cap on Alignment · concept · 0.715
The theoretical cap on cross-modal alignment determined by mutual information between input signals and model capacity
The difference-in-means direction is the unique nullity-1 projection kernel that eliminates all linearly-recoverable binary classification information from a dataset · claim · 0.715
Formal consequence of Belrose et al. (2023) Theorem G.1 connecting mass-mean probing to optimal linear concept erasure
Centered Kernel Nearest-Neighbor Alignment · method · 0.710
Modified CKA metric that restricts cross-covariance to nearest neighbors; introduced in this paper's appendix
Mutual k-Nearest Neighbor Alignment Metric · method · 0.702
Primary alignment metric used in experiments; measures mean intersection of k-nearest neighbor sets between two kernels
Kernel Density Estimation (KDE) · method · 0.697
Used in NIS+ to estimate the natural distribution p(y_t) for inverse probability weighting.
Information Gain / Mutual Information · concept · 0.688
The reduction in uncertainty about hidden states afforded by observing outcomes, motivating epistemic exploration.
Different tasks yield distinct distributions of logic gate types between perception kernels and update networks · claim · 0.688
Interpretive claim based on circuit analysis across experiments
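
The Mutual k-Nearest Neighbor Alignment Metric entry above describes a mean intersection of k-nearest neighbor sets between two kernels. A minimal sketch of that idea follows. The function name `mutual_knn_alignment`, the exclusion of each point from its own neighbor set, and the normalization of the overlap by k are assumptions made for this example, not details confirmed by the source.

```python
import numpy as np


def mutual_knn_alignment(kernel_a, kernel_b, k):
    """Mean fraction of shared k-nearest neighbors under two kernels.

    Hypothetical helper: both inputs are assumed to be (n, n) similarity
    matrices over the same n datapoints, where larger means more similar.
    """
    a = np.array(kernel_a, dtype=float)  # np.array copies, so inputs stay intact
    b = np.array(kernel_b, dtype=float)
    # Assumption: a point is never counted as its own neighbor.
    np.fill_diagonal(a, -np.inf)
    np.fill_diagonal(b, -np.inf)
    # Indices of the k most similar points in each row.
    nn_a = np.argsort(-a, axis=1)[:, :k]
    nn_b = np.argsort(-b, axis=1)[:, :k]
    # Assumption: overlap is divided by k so the score lies in [0, 1].
    overlaps = [len(set(row_a) & set(row_b)) / k
                for row_a, row_b in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))


# Toy usage: kernels built from two unrelated random feature sets.
rng = np.random.default_rng(0)
feats_x = rng.normal(size=(20, 5))
feats_y = rng.normal(size=(20, 5))
kernel_x = feats_x @ feats_x.T
kernel_y = feats_y @ feats_y.T
score_same = mutual_knn_alignment(kernel_x, kernel_x, k=3)
score_diff = mutual_knn_alignment(kernel_x, kernel_y, k=3)
```

Comparing a kernel with itself yields the maximum score, while unrelated kernels score only as high as chance overlap allows.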