Adaptive Beta Softmax Scaling

An implementation detail that scales the softmax by log(n_memories), so that attention values are not down-weighted as the memory set grows.
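A minimal sketch of the idea follows. It assumes the scaling enters as an inverse temperature, beta = base_beta * log(n_memories), applied to the attention scores before the softmax. The summary above does not give the exact formula, so that form, the function names, and the `base_beta` parameter are illustrative assumptions rather than the original implementation.

```python
import math


def plain_softmax(scores, beta=1.0):
    """Standard softmax with a fixed inverse temperature beta."""
    if not scores:
        raise ValueError("scores must be non-empty")
    scaled = [beta * s for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_beta_softmax(scores, base_beta=1.0):
    """Softmax whose inverse temperature grows with log(n_memories).

    Assumption: beta = base_beta * log(n_memories). With a single memory,
    beta is 0 and the only weight is 1.0.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    beta = base_beta * math.log(len(scores))
    return plain_softmax(scores, beta=beta)


# One matching memory (score 1.0) among n - 1 non-matching ones (score 0.0).
for n in (10, 100, 1000):
    scores = [1.0] + [0.0] * (n - 1)
    fixed = plain_softmax(scores)[0]
    adaptive = adaptive_beta_softmax(scores)[0]
    print(f"n={n}: fixed-beta weight={fixed:.4f}, adaptive weight={adaptive:.4f}")
```

Under this assumption the weight on the matching memory is n / (2n - 1), which stays near one half however large the memory set is. With a fixed beta the same weight shrinks roughly as 1/n.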

Neighborhood — ranked by edge-count

framework

TEM-Transformer (TEM-t)
implements
The transformer version directly analogous to TEM, introduced in this paper, offering dramatic performance improvements.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Softmax Bottleneck · concept · 0.739
Failure mode for output-surjectivity: LLMs may lack capacity to predict all tokens due to rank constraints
Softmax Function · method · 0.738
Neuronal dynamics computed from free energy gradients; interpreted as average firing rate of neural populations.
Maximum gradient norm scaling · concept · 0.727
Scaling aggregated gradient by the maximum gradient norm among tasks.
Softmax Policy Prior · concept · 0.724
Policies assigned probability via softmax of expected free energy; enables self-evidencing behavior.
Winner take-all architectures of decision-making are already commonplace in computational neuroscience, and the softmax function provides a smooth approximation. · claim · 0.723
Neural plausibility argument for softmax policy selection.
per-dev z-scaling · method · 0.721
Standardizing ρd and dr using dev-set means and stds to form dimensionless components of S.
Multitask Scaling Hypothesis · hypothesis · 0.719
Argues that fewer representations are competent for N tasks than for M<N tasks, so more general models have a smaller solution space.
Softmax policy selection · method · 0.718
Selecting policies using a softmax (normalized exponential) function of negative expected free energy.