concept
active
concept:token-embeddings

Token embeddings

Vector representations of individual tokens from genomic foundation models; the raw inputs to sequence pooling methods.
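
As a rough illustration of how token embeddings feed a pooling method, the sketch below averages a per-token matrix of shape (L, d) into a single sequence vector. The function name `mean_pool`, the optional padding mask, and the toy values are assumptions made for this example; they are not taken from any particular genomic foundation model.

```python
import numpy as np

def mean_pool(token_embeddings, mask=None):
    """Average token embeddings of shape (L, d) into one vector of shape (d,)."""
    x = np.asarray(token_embeddings, dtype=float)
    if mask is None:
        return x.mean(axis=0)
    # mask has shape (L,): 1 marks a real token, 0 marks padding (assumed convention)
    m = np.asarray(mask, dtype=float)[:, None]
    return (x * m).sum(axis=0) / m.sum()

# Hypothetical toy input: 4 tokens with 3-dimensional embeddings.
tokens = np.array([[1.0, 0.0, 2.0],
                   [3.0, 0.0, 2.0],
                   [1.0, 4.0, 2.0],
                   [3.0, 4.0, 2.0]])

pooled = mean_pool(tokens)                        # one vector per sequence
pooled_masked = mean_pool(tokens, [1, 1, 0, 0])   # ignore the last two tokens
```

Mean pooling keeps only the per-dimension average, so any information about how tokens vary around that average is discarded.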

Neighborhood — ranked by edge-count

Artifacts (1)

artifact
  • Goodfire research post introducing covariance pooling as a replacement for mean pooling in genomic foundation models; shows +52.9% R² lift on genomic track prediction.
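
For orientation, a generic second-order (covariance) pooling step might look like the sketch below. This is an assumption about the general technique, not the formulation used in the Goodfire post: the name `covariance_pool`, the population normalization by L, and the choice to keep only the upper triangle are all illustrative.

```python
import numpy as np

def covariance_pool(token_embeddings):
    """Summarize (L, d) token embeddings by their d x d covariance.

    Returns the upper triangle (including the diagonal) as a flat
    vector of length d * (d + 1) / 2, since the matrix is symmetric.
    """
    x = np.asarray(token_embeddings, dtype=float)
    centered = x - x.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / x.shape[0]   # population covariance (assumed)
    rows, cols = np.triu_indices(cov.shape[0])
    return cov[rows, cols]

# Hypothetical toy input: 4 tokens with 3-dimensional embeddings.
tokens = np.array([[1.0, 0.0, 2.0],
                   [3.0, 0.0, 2.0],
                   [1.0, 4.0, 2.0],
                   [3.0, 4.0, 2.0]])

features = covariance_pool(tokens)   # length 3 * 4 / 2 = 6
```

Unlike a mean, this summary captures how embedding dimensions vary and co-vary across tokens, at the cost of a feature vector that grows quadratically in d.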

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Token · concept · 0.808
    Basic unit of LLM input/output: words, parts of words, punctuation marks, emojis
  • The specific type of representation studied in the paper: function f: X→R^n assigning feature vectors to inputs
  • Embedment · concept · 0.796
    Technique where text is nested hierarchically within another, using indentation and margins to create subordinate orders of detail within an overarching embrace.
  • Lagged time series used to capture dynamical dependencies.
  • Mutual Embedding · concept · 0.763
    A reinforcing interlock between different materials, mentioned alongside Deep Interlock in West Dean construction.
  • Preprocessing step using dev-set covariance to standardize span embeddings before computing S
  • PCA applied to token embedding and unembedding matrices to understand what fraction of residual stream dimensions they occupy and how they relate
  • The component used in latent reasoning to perform internal computation.
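
A similarity list like the one above could be produced by a filter along the following lines. This is a minimal sketch under stated assumptions: the entity vectors, the names in `others`, and the `linked` set of already-connected entities are hypothetical, and only the 0.65 threshold comes from this page.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similar_candidates(query, others, linked=frozenset(), threshold=0.65):
    """Return (name, score) pairs at or above threshold, highest first.

    Entities in `linked` already have a typed edge and are skipped.
    """
    scored = [(name, cosine(query, vec))
              for name, vec in others.items() if name not in linked]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical 2-dimensional entity vectors.
query = [1.0, 0.0]
others = {"a": [1.0, 1.0], "b": [0.0, 1.0], "c": [2.0, 0.0], "d": [3.0, 0.0]}

candidates = similar_candidates(query, others, linked={"d"})
names = [name for name, _ in candidates]
```

Here "b" falls below the threshold and "d" is excluded because it already has a typed edge, which leaves "c" and "a" as candidates for review.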