concept
active
concept:pointwise-mutual-information-kernel
Pointwise Mutual Information Kernel
The kernel that contrastive learners converge to; similarity equals PMI between observations
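As a minimal sketch of the description above, the block below computes a PMI kernel from a table of co-occurrence counts, using the standard definition PMI(a, b) = log(P(a, b) / (P(a) P(b))). The function name `pmi_kernel` and the counts are hypothetical, chosen only for illustration. The table is assumed to be symmetric with strictly positive entries.

```python
import math

def pmi_kernel(counts):
    """Return K[i][j] = log(P(i, j) / (P(i) * P(j))) for a symmetric count table.

    Assumes every count is strictly positive, so the logarithm is defined.
    """
    n = len(counts)
    total = sum(sum(row) for row in counts)
    joint = [[c / total for c in row] for row in counts]
    # Marginals are row sums of the joint table (equal to column sums when symmetric).
    marginal = [sum(row) for row in joint]
    return [
        [math.log(joint[i][j] / (marginal[i] * marginal[j])) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical co-occurrence counts over three observations.
counts = [
    [4, 1, 1],
    [1, 4, 1],
    [1, 1, 4],
]
K = pmi_kernel(counts)
```

In this toy table each observation co-occurs with itself more often than chance would predict, so the diagonal of `K` is positive and the off-diagonal entries are negative.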
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- Contrastive learning · implements · Supervised learning framework in which the system learns by observing the contrast between its current response and a nudged, improved response; requires weak additional forces from the supervisor
Methods (2)
method
- Binary NCE Loss · associated_with, supports · One of two contrastive objectives analyzed; shown to be minimized by PMI kernel representation
- InfoNCE Loss · associated_with, supports · One of two contrastive objectives analyzed; shown to be minimized by PMI kernel representation up to scaling
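For orientation, the two objectives listed above can be sketched for a single anchor with one positive score and a list of negative scores. These are the commonly used textbook forms, not necessarily the exact formulations analyzed in the source; the function names `info_nce` and `binary_nce` are assumptions made for this sketch.

```python
import math

def info_nce(pos_score, neg_scores):
    """Negative log-softmax of the positive score among all candidate scores."""
    scores = [pos_score] + list(neg_scores)
    m = max(scores)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - pos_score

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def binary_nce(pos_score, neg_scores):
    """Logistic loss with the positive pair labelled 1 and each negative labelled 0."""
    return softplus(-pos_score) + sum(softplus(s) for s in neg_scores)

# Hypothetical similarity scores: one positive pair, two negatives.
loss_infonce = info_nce(2.0, [0.0, -1.0])
loss_binary = binary_nce(2.0, [0.0, -1.0])
```

Both losses fall as the positive score rises relative to the negatives, which is the pressure that pushes learned similarity toward co-occurrence statistics.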
Concepts (3)
concept
- Platonic Representation · analogous_to, associated_with · The hypothesized converged representation that all sufficiently large AI models are approaching: a statistical model of underlying reality
- Cooccurrence Probability · associated_with · Core statistical quantity in the paper's formal model: probability of two observations occurring within a time window
- Kernel (representational) · extends · A function characterizing how a representation measures distance/similarity between datapoints; used to compare representations
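The co-occurrence probability described above can be estimated from an observation sequence by counting pairs that fall within a fixed time window. The sketch below is one simple estimator under stated assumptions: the function name `cooccurrence_probs`, the symmetrization step, and the toy sequence are all illustrative choices, not taken from the source.

```python
from collections import Counter

def cooccurrence_probs(sequence, window):
    """Estimate P(a, b) from pairs of observations at most `window` steps apart.

    Each pair is counted in both orders so the estimate is symmetric.
    """
    pairs = Counter()
    for t, a in enumerate(sequence):
        for dt in range(1, window + 1):
            if t + dt < len(sequence):
                b = sequence[t + dt]
                pairs[(a, b)] += 1
                pairs[(b, a)] += 1
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}

# Hypothetical observation stream.
seq = ["a", "b", "a", "b", "c"]
probs = cooccurrence_probs(seq, window=1)
```

Widening `window` counts more distant pairs as co-occurring, which changes the resulting probabilities and therefore any kernel derived from them.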
Hypotheses (1)
hypothesis
- Mathematical formalization of what representation models converge to
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Expected mutual information between future states and outcomes; equivalent to intrinsic value.
- The theoretical cap on cross-modal alignment determined by mutual information between input signals and model capacity
- Formal consequence of Belrose et al. (2023) Theorem G.1 connecting mass-mean probing to optimal linear concept erasure
- Modified CKA metric that restricts cross-covariance to nearest neighbors; introduced in this paper's appendix
- Primary alignment metric used in experiments; measures mean intersection of k-nearest neighbor sets between two kernels
- Used in NIS+ to estimate natural distribution p(yt) for inverse probability weight.
- The reduction in uncertainty about hidden states afforded by observing outcomes, motivating epistemic exploration.
- Interpretive claim based on circuit analysis across experiments
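One entry in the list above describes an alignment metric as the mean intersection of k-nearest neighbor sets between two kernels. A minimal sketch of that idea follows, assuming kernels are given as square similarity matrices and that the intersection size is normalized by k; the names `knn_sets` and `mutual_knn` and the example matrices are hypothetical.

```python
def knn_sets(kernel, k):
    """For each row, return the set of the k most similar other indices."""
    n = len(kernel)
    neighbors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: -kernel[i][j])  # most similar first
        neighbors.append(set(others[:k]))
    return neighbors

def mutual_knn(kernel_a, kernel_b, k):
    """Mean fraction of shared k-nearest neighbors across all datapoints."""
    sets_a = knn_sets(kernel_a, k)
    sets_b = knn_sets(kernel_b, k)
    overlaps = [len(a & b) / k for a, b in zip(sets_a, sets_b)]
    return sum(overlaps) / len(overlaps)

# Two hypothetical kernels over the same three datapoints.
kernel_1 = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
kernel_2 = [[1.0, 0.1, 0.9], [0.1, 1.0, 0.8], [0.9, 0.8, 1.0]]
score_same = mutual_knn(kernel_1, kernel_1, k=1)
score_diff = mutual_knn(kernel_1, kernel_2, k=1)
```

A score of 1 means both kernels agree on every datapoint's nearest neighbors, and a score of 0 means they share none.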