method
active
method:adaptive-beta-softmax-scaling
Adaptive Beta Softmax Scaling
Implementation detail: the softmax scaling factor (beta) is weighted by log(n_memories) so that attention values are not down-weighted as the memory set grows.
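A minimal sketch of the idea, assuming that beta multiplies the attention scores inside the softmax and that the adaptive form is `base_beta * log(n_memories)`. The function names, the `base_beta` parameter, and the toy scores are hypothetical and chosen for illustration; the source states only that the softmax is weighted by log(n_memories).

```python
import numpy as np


def adaptive_beta_softmax(scores, base_beta=1.0, adaptive=True):
    """Softmax over memory scores with an optional log(n_memories) scaling.

    Assumption: beta = base_beta * log(n_memories). This is an illustrative
    reading of the description, not a confirmed implementation.
    """
    scores = np.asarray(scores, dtype=float)
    n_memories = scores.shape[0]
    if adaptive and n_memories > 1:
        beta = base_beta * np.log(n_memories)
    else:
        # log(1) = 0 would erase all score information, so fall back to base_beta.
        beta = base_beta
    logits = beta * scores
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def one_hot_scores(n_memories):
    """Toy scores (hypothetical): one matching memory scores 1, the rest score 0."""
    scores = np.zeros(n_memories)
    scores[0] = 1.0
    return scores


if __name__ == "__main__":
    for n in (10, 100, 1000):
        fixed = adaptive_beta_softmax(one_hot_scores(n), adaptive=False)[0]
        scaled = adaptive_beta_softmax(one_hot_scores(n), adaptive=True)[0]
        print(f"n={n}: fixed beta -> {fixed:.4f}, adaptive beta -> {scaled:.4f}")
```

In this toy setting the weight on the matching memory under a fixed beta is e / (e + n - 1), which shrinks toward zero as n grows. With the log(n) scaling it becomes n / (2n - 1), which stays close to 0.5 regardless of memory-set size.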
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- TEM-Transformer (TEM-t) · implements · The transformer version directly analogous to TEM, introduced in this paper, offering dramatic performance improvements.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Failure mode for output-surjectivity: LLMs may lack capacity to predict all tokens due to rank constraints.
- Neuronal dynamics computed from free energy gradients; interpreted as average firing rate of neural populations.
- Scaling aggregated gradient by the maximum gradient norm among tasks.
- Policies assigned probability via softmax of expected free energy; enables self-evidencing behavior.
- Neural plausibility argument for softmax policy selection.
- Standardizing ρd and dr using dev-set means and stds to form dimensionless components of S.
- Argues that there are fewer representations competent for N tasks than for M < N tasks, so more general models have a smaller solution space.
- Selecting policies using a softmax (normalized exponential) function of negative expected free energy.