concept
active
concept:softmax-bottleneck

Softmax Bottleneck

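Failure mode for output surjectivity: an LLM may lack the capacity to predict every token because its softmax output layer is rank-constrained. The logits are the product of d-dimensional hidden states and output embeddings, so the logit matrix has rank at most d.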
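Here is a minimal sketch of the rank constraint in pure Python. It uses toy sizes: N contexts, hidden size d, vocabulary V. All values and helper names are illustrative assumptions, not taken from the source. The logit matrix H·Wᵀ never has rank above d, so a target pattern that needs a higher rank (such as the N×N identity below) cannot be produced by any choice of H and W. Subtracting each row's log-normalizer adds at most one to the rank, so the log-probability matrix has rank at most d + 1.

```python
import random
from fractions import Fraction


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(m):
    """Exact matrix rank via Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in m]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Zero out column c in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


random.seed(0)
N, d, V = 8, 3, 10  # toy sizes: contexts, hidden dimension, vocabulary
H = [[random.randint(-5, 5) for _ in range(d)] for _ in range(N)]  # hidden states
W = [[random.randint(-5, 5) for _ in range(d)] for _ in range(V)]  # output embeddings
logits = matmul(H, transpose(W))  # N x V logit matrix

# A target that needs full rank N > d, so no N x d times d x N factorization reaches it.
identity = [[1 if i == j else 0 for j in range(N)] for i in range(N)]

print("rank of logits:", rank(logits), "(never exceeds d =", d, ")")
print("rank of identity target:", rank(identity))
```

Exact rational arithmetic is used so that the rank is not distorted by floating-point rounding, which can make a low-rank float matrix look full-rank.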

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • Assumption that every output class can be produced by the DNN in each layer; key condition for Theorem 1
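The following toy example shows how a low-dimensional output layer can break this surjectivity assumption. It assumes d = 1 and three tokens with scalar output embeddings -1, 0 and 1. The helper names are hypothetical. For any hidden state h ≠ 0, either token 0 or token 2 scores strictly higher than token 1. At h = 0 all three tie. Token 1 is therefore never the unique argmax. A grid sweep only samples hidden states. It can show which tokens are reachable, but in general it cannot prove that the others are unreachable.

```python
def argmax_unique(scores):
    """Index of the strictly largest score, or None if the maximum is tied."""
    best = max(scores)
    winners = [i for i, s in enumerate(scores) if s == best]
    return winners[0] if len(winners) == 1 else None


def reachable_tokens(W, hidden_states):
    """Tokens that are the unique argmax of the logits for some sampled hidden state."""
    reached = set()
    for h in hidden_states:
        scores = [sum(hi * wi for hi, wi in zip(h, w)) for w in W]  # logits = W h
        tok = argmax_unique(scores)
        if tok is not None:
            reached.add(tok)
    return reached


W = [[-1.0], [0.0], [1.0]]                  # hypothetical output embeddings, d = 1, V = 3
grid = [[x / 10] for x in range(-50, 51)]   # sampled hidden states h in [-5, 5]
print(sorted(reachable_tokens(W, grid)))    # token 1 never appears
```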

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
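The selection rule stated above is cosine similarity ≥ 0.65 with no typed edge. A minimal sketch of such a filter follows. It assumes each entity has an embedding vector and that typed edges are stored as (source, target) pairs. The data layout, entity keys and function names are hypothetical and are not the graph tool's actual implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_candidates(node, embeddings, typed_edges, threshold=0.65):
    """Entities at or above the cosine threshold with no typed edge to `node`, best first."""
    linked = {b for a, b in typed_edges if a == node} | {a for a, b in typed_edges if b == node}
    out = []
    for other, vec in embeddings.items():
        if other == node or other in linked:
            continue  # skip self and entities already connected by a typed edge
        score = cosine(embeddings[node], vec)
        if score >= threshold:
            out.append((other, score))
    return sorted(out, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: "c" is similar but already linked, "b" is dissimilar.
embeddings = {
    "softmax-bottleneck": [1.0, 0.0],
    "a": [0.9, 0.1],
    "b": [0.0, 1.0],
    "c": [1.0, 0.0],
}
typed_edges = [("softmax-bottleneck", "c")]
print(similarity_candidates("softmax-bottleneck", embeddings, typed_edges))
```

Excluding typed neighbours is what makes the surviving entities useful as review candidates: a high score with no edge suggests either a missing relation or a duplicate entity.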

  • Softmax Inc. · institute · 0.794
    Organisation that hosted the Holistic Intelligence unconference where the paper's ideas originated
  • Softmax Function · method · 0.779
    Neuronal dynamics computed from free energy gradients; interpreted as average firing rate of neural populations.
  • Population structure mechanism implementing genetic assortment; cited as example of individuation mechanism in multicellularity.
  • Compression-prediction trade-off; NIS encodes micro-states through an information bottleneck.
  • A lower-dimensional activation that is the only pathway for information between higher-dimensional activations; e.g. the residual stream between MLP layers
  • Policies assigned probability via softmax of expected free energy; enables self-evidencing behavior.
  • Implementation detail weighting softmax by log(n_memories) to prevent down-weighting of attention values as memory set grows.
  • Selecting policies using a softmax (normalized exponential) function of negative expected free energy.