TEM-Transformer (TEM-t)

The transformer model directly analogous to TEM, introduced in this paper. Compared with the original TEM, it takes less time per gradient step and needs many fewer data samples to reach equivalent performance.

Neighborhood — ranked by edge-count

paper

thinker

method

Recurrent Position Encodings
implements
Key modification to transformers proposed in this paper: position encodings generated by a recurrent network trained on action sequences.
Causal Attention Mask
implements
Modification to transformer restricting keys and values to previous time-steps only, mimicking how an agent accumulates experiences.
Position-Only Keys/Queries, Stimulus-Only Values Factorization
implements
Key architectural modification restricting queries and keys to position encodings while values depend only on stimuli; extreme version of best-practice insight.
Sensory Landmark Position Encoding Stabilization
implements
Method for stabilising drifting recurrent position encodings by querying stored landmark memories to correct path-integrated position.
Spatial Understanding Task
implements
Training paradigm requiring prediction of upcoming sensory observations during spatial navigation across multiple environments sharing the same structure.
Adaptive Beta Softmax Scaling
implements
Implementation detail scaling the softmax by log(n_memories) to prevent down-weighting of attention values as the memory set grows.
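
The methods listed above can be read together as one attention step. The following is a minimal sketch, not the paper's implementation: it assumes a hypothetical 4-state ring world whose action matrices are fixed 90-degree rotations (in the model these would be learned, and a nonlinearity would typically be applied), one-hot stimuli, and a hypothetical `base_beta` constant. It illustrates recurrent position encodings driven by actions, a causal mask over earlier time-steps, queries and keys built from positions only, values built from stimuli only, and a softmax scale proportional to log(n_memories).

```python
import math

# Hypothetical action matrices for a 4-state ring world: each action is an
# exact 90-degree rotation. In the model these would be learned weights.
ACTION_MATRICES = {
    "right": [[0, -1], [1, 0]],
    "left": [[0, 1], [-1, 0]],
}


def matvec(matrix, vector):
    """Multiply a small matrix by a vector using plain lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def recurrent_position_encodings(actions, start=(1, 0)):
    """Path-integrate: each encoding is the previous one transformed by the action."""
    encodings = [list(start)]
    for action in actions:
        encodings.append(matvec(ACTION_MATRICES[action], encodings[-1]))
    return encodings


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_stimulus(positions, stimuli, t, base_beta=5.0):
    """Predict the stimulus at step t using memories from steps < t only."""
    n_memories = t  # causal mask: only earlier time-steps are visible
    if n_memories == 0:
        return [0.0] * len(stimuli[0])  # nothing stored yet
    query = positions[t]      # query depends on position only
    keys = positions[:t]      # keys depend on position only
    values = stimuli[:t]      # values depend on stimuli only
    # Scale grows with log(n_memories) so attention stays sharp as memories accumulate.
    beta = base_beta * math.log(n_memories)
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# Walk once around the ring and one step further, observing one-hot stimuli.
actions = ["right"] * 5
positions = recurrent_position_encodings(actions)
ring = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
stimuli = [ring[i % 4] for i in range(len(positions))]

# At step 4 the agent is back at its starting position.
prediction = predict_stimulus(positions, stimuli, t=4)
```

In this toy setting, the position encoding at step 4 matches the one at step 0, so the attention weights concentrate on the memory stored there and the prediction is dominated by the stimulus first seen at that location. The sensory prediction is driven entirely by where the agent is, which is the point of the position-only keys and queries.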

concept

Self-attention
implements
A form of key-query attention within a single input sequence; core to Transformers.

framework

Tolman-Eichenbaum Machine (TEM)
extends
Neuroscience model of hippocampal formation that the paper shows is mathematically equivalent to a transformer with recurrent position encodings.
Feature Neurons and Memory Neurons Architecture
extends
Biologically plausible two-pool architecture from Krotov & Hopfield (2020) splitting self-attention into feature and memory neuron populations; used to interpret TEM-t place cells.
Hippocampal Indexing Theory
implements
Theory that hippocampus provides an index binding together cortical patterns across different brain regions; TEM-t is shown to instantiate this.
Multiple Cortical Inputs to Hippocampus Extension
extends
Extension of TEM-t to handle conjunctions of more than two brain regions with linear (not exponential) scaling in hippocampal neuron count.
Transformer Neural Network
extends
Core machine learning architecture analyzed in the paper; shown to be mathematically related to TEM.

finding

TEM-t requires less time per gradient step than TEM
supports
Empirical computational efficiency result comparing TEM-t to the original TEM implementation.
TEM-t requires many fewer data samples than TEM to reach equivalent performance (sample efficiency improvement)
supports
Empirical performance comparison showing TEM-t is a more efficient learner than the original TEM.

2-hop · via this framework's ideas

Where ideas in this framework connect to the rest of the corpus — the same concept, an analogy, or a restatement elsewhere.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

TEM memory retrieval is mathematically equivalent to transformer self-attention without softmax · claim · 0.762
Central theoretical claim: a single step of TEM attractor dynamics equals a dot-product attention, making TEM a special case of transformer.
Signal Transformer · concept · 0.750
Core abstraction in Fruit: pure function mapping signals to signals; enables compositional GUI definitions.
Zero-Layer Transformer · concept · 0.730
A transformer with no attention layers; shown to model bigram statistics via T = W_U W_E.
self-model (transformer) · concept · 0.729
The transformer's model of itself as a predictive text engine, developed through in-context learning.
Decision Transformer · method · 0.723
A model that frames RL as sequence modeling, SOTA from random trajectories.
TEM-t learns grid cells in hexagonal 6-connected worlds · finding · 0.723
Empirical extension showing grid cell learning generalises to non-4-connected spatial environments.
TEM-t learns band-cell-like position encoding representations resembling Krupic et al. band cells · finding · 0.718
Empirical result showing TEM-t position encodings also recapitulate band cells, not just grid cells.
A Mathematical Framework for Transformer Circuits (Elhage et al., 2021) · concept · 0.717
Foundational mechanistic interpretability paper on transformer circuit analysis.