TEM-Transformer (TEM-t)
The transformer variant directly analogous to TEM (the Tolman-Eichenbaum Machine), introduced in this paper; it offers large gains in computational and learning efficiency over the original TEM.
Neighborhood (ranked by edge count)
Papers (1)
Thinkers (1)
- James C.R. Whittington (introduces)
Methods (6)
- Recurrent Position Encodings (implements): the key modification to transformers proposed in this paper, in which position encodings are generated by a recurrent network trained on action sequences.
- Causal Attention Mask (implements): modification to the transformer that restricts keys and values to previous time steps only, mimicking how an agent accumulates experiences.
- Key architectural modification restricting queries and keys to position encodings while values depend only on stimuli; an extreme version of a best-practice insight.
- Method for stabilising drifting recurrent position encodings by querying stored landmark memories to correct path-integrated position.
- Spatial Understanding Task (implements): training paradigm requiring prediction of upcoming sensory observations during spatial navigation across multiple environments that share the same structure.
- Adaptive Beta Softmax Scaling (implements): implementation detail that scales the softmax by log(n_memories) so that attention values are not down-weighted as the memory set grows.
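A minimal sketch of one TEM-t prediction step, combining the methods listed above: recurrent position encodings, the causal mask, queries and keys built from position only, values built from stimuli only, and log(n_memories) softmax scaling. Assumptions: the learned recurrent network is replaced by hand-set, action-dependent cyclic shifts on a hypothetical 4-location ring, and log(n_memories) is applied as an inverse temperature on the attention logits. All names (`ACTIONS`, `path_integrate`, `predict_stimulus`) are hypothetical, not the paper's code.

```python
import math

# Hypothetical action set: each action cyclically shifts a ring of locations.
ACTIONS = {"left": -1, "stay": 0, "right": 1}


def shift(code, k):
    """Action-dependent transition: cyclically shift a ring code by k places."""
    n = len(code)
    return [code[(i - k) % n] for i in range(n)]


def path_integrate(p0, actions):
    """Recurrent position encodings: p_{t+1} = T_{a_t}(p_t), one code per step."""
    codes = [p0]
    for a in actions:
        codes.append(shift(codes[-1], ACTIONS[a]))
    return codes


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict_stimulus(positions, stimuli, t):
    """Predict the stimulus at step t from memories stored at steps 0..t-1."""
    keys = positions[:t]          # causal mask: only earlier steps are visible
    values = stimuli[:t]          # values depend only on stimuli
    query = positions[t]          # queries (and keys) depend only on position
    beta = math.log(len(keys))    # assumed log(n_memories) inverse temperature
    weights = softmax([beta * dot(query, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]


# Walk right around a 4-location ring; step 4 revisits the start location.
positions = path_integrate(one_hot(0, 4), ["right"] * 4)
stimuli = [one_hot(s, 3) for s in [0, 1, 2, 1]]  # stimuli seen at steps 0..3
prediction = predict_stimulus(positions, stimuli, t=4)
print([round(x, 3) for x in prediction])  # [0.571, 0.286, 0.143]
```

Because the step-4 position code equals the step-0 code, the memory formed at step 0 gets the largest weight and the prediction peaks at stimulus 0. With β = log n and a single exact match among n one-hot memories, the matching weight is n/(2n − 1), which never drops below 1/2; with β = 1 it would be e/(e + n − 1), which shrinks toward 0 as n grows. That shrinkage is the down-weighting the log(n_memories) scaling is described as preventing.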
Concepts (1)
- Self-attention (implements): a form of key-query attention within a single input sequence; core to Transformers.
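For reference, standard scaled dot-product self-attention is given on the first line below. The second line is an assumed reading of how the TEM-t modifications listed under Methods specialise it, not an equation quoted from the paper: queries and keys are position encodings p, values are stimuli s, the sum runs only over earlier steps τ < t (the causal mask), and the scale factor is β = log n for n stored memories in place of 1/√d_k.

```latex
\begin{aligned}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V \\
\hat{s}_t &= \sum_{\tau < t} \frac{\exp\left(\beta\, p_t^{\top} p_{\tau}\right)}{\sum_{\tau' < t} \exp\left(\beta\, p_t^{\top} p_{\tau'}\right)}\, s_{\tau},
\qquad \beta = \log n
\end{aligned}
```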
Frameworks (5)
- Neuroscience model of the hippocampal formation that the paper shows to be mathematically equivalent to a transformer with recurrent position encodings.
- Biologically plausible two-pool architecture from Krotov & Hopfield (2020) splitting self-attention into feature and memory neuron populations; used to interpret TEM-t place cells.
- Hippocampal Indexing Theory (implements): theory that the hippocampus provides an index binding together cortical patterns across different brain regions; TEM-t is shown to instantiate it.
- Extension of TEM-t to handle conjunctions of more than two brain regions with linear (not exponential) scaling in hippocampal neuron count.
- Transformer Neural Network (extends): core machine learning architecture analyzed in the paper; shown to be mathematically related to TEM.
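A minimal sketch of the two-pool reading of attention (Krotov & Hopfield, 2020) used as a hippocampal index across several regions. Assumptions: each "memory neuron" stores one pattern per region as its weights, memory-neuron activations are a softmax over cue similarity summed across the cued regions with a fixed, hypothetical β, and the `IndexMemory` class and region names are illustrative, not taken from the paper.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class IndexMemory:
    """Hypothetical two-pool memory: each memory neuron binds one pattern per region."""

    def __init__(self, beta=4.0):
        self.beta = beta
        self.memories = []  # one {region: pattern} dict per memory neuron

    def store(self, patterns):
        self.memories.append(dict(patterns))

    def recall(self, cue, target):
        # Memory-neuron pool: softmax over similarity summed across cued regions.
        scores = [self.beta * sum(dot(cue[r], m[r]) for r in cue) for m in self.memories]
        h = softmax(scores)
        # Feature-neuron readout for the target region: weighted sum of stored patterns.
        dim = len(self.memories[0][target])
        return [sum(w * m[target][d] for w, m in zip(h, self.memories)) for d in range(dim)]


memory = IndexMemory(beta=4.0)
memory.store({"position": [1.0, 0.0], "context": [1.0, 0.0], "object": [1.0, 0.0, 0.0]})
memory.store({"position": [0.0, 1.0], "context": [1.0, 0.0], "object": [0.0, 0.0, 1.0]})

# Cue two regions and complete the third from the best-matching memory neuron.
recalled = memory.recall({"position": [0.0, 1.0], "context": [1.0, 0.0]}, target="object")
print([round(x, 3) for x in recalled])  # [0.018, 0.0, 0.982]
```

In this sketch, adding another region adds one weight vector per memory neuron rather than any new neurons; the number of memory neurons grows only with the number of stored memories.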
Findings (2)
- Empirical computational efficiency result comparing TEM-t to the original TEM implementation.
- Empirical performance comparison showing TEM-t is a more efficient learner than the original TEM.
Conceptual bridges
2-hop, via this framework's ideas: where ideas in this framework connect to the rest of the corpus (the same concept, an analogy, or a restatement elsewhere).
Related by similarity (8)
cosine ≥ 0.65, no typed edge: entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- TEM memory retrieval is mathematically equivalent to transformer self-attention without softmax (claim, similarity 0.762): the central theoretical claim that a single step of TEM attractor dynamics equals a dot-product attention, making TEM a special case of a transformer.
- Core abstraction in Fruit: pure function mapping signals to signals; enables compositional GUI definitions.
- A transformer with no attention layers; shown to model bigram statistics via T = W_U W_E
- The transformer's model of itself as a predictive text engine, developed through in-context learning.
- A model that frames RL as sequence modeling, SOTA from random trajectories.
- Empirical extension showing grid cell learning generalises to non-4-connected spatial environments.
- TEM-t learns band-cell-like position encoding representations resembling Krupic et al. band cells (finding, similarity 0.718): empirical result showing that TEM-t position encodings recapitulate band cells as well as grid cells.
- Foundational mechanistic interpretability paper on transformer circuit analysis
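The claim above that TEM memory retrieval equals self-attention without softmax can be checked numerically in a minimal sketch. Assumptions: the memory is a Hebbian outer-product matrix M = Σ_i v_i k_iᵀ and retrieval is a single linear step M q, ignoring any nonlinearity, decay, or normalisation the actual TEM applies; the vectors below are arbitrary illustrative numbers.

```python
def hebbian_matrix(keys, values):
    """Hebbian memory M = sum_i v_i k_i^T (rows index value dimensions)."""
    dv, dk = len(values[0]), len(keys[0])
    M = [[0.0] * dk for _ in range(dv)]
    for k, v in zip(keys, values):
        for r in range(dv):
            for c in range(dk):
                M[r][c] += v[r] * k[c]
    return M


def matvec(M, q):
    """One linear retrieval step: M q."""
    return [sum(m * x for m, x in zip(row, q)) for row in M]


def linear_attention(q, keys, values):
    """Dot-product attention with the softmax removed: sum_i (k_i . q) v_i."""
    scores = [sum(a * b for a, b in zip(k, q)) for k in keys]
    dv = len(values[0])
    return [sum(s * v[r] for s, v in zip(scores, values)) for r in range(dv)]


keys = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
values = [[3.0, 1.0], [-2.0, 4.0]]
q = [0.2, 0.7, -1.0]

hebbian = matvec(hebbian_matrix(keys, values), q)
attention = linear_attention(q, keys, values)
print([round(x, 6) for x in hebbian], [round(x, 6) for x in attention])  # [-4.2, -4.2] [-4.2, -4.2]
```

The two agree because M q = Σ_i v_i (k_iᵀ q), which is exactly the unnormalised attention sum; inserting a softmax over the scores k_iᵀ q turns the same readout into standard self-attention.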