claim

active

claim:tem-memory-retrieval-is-mathematically-equivalent-to-transformer-self-attention-without-softmax

TEM memory retrieval is mathematically equivalent to transformer self-attention without softmax

Central theoretical claim: a single step of TEM attractor dynamics equals dot-product attention without the softmax, making TEM a special case of a transformer.
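The identity behind this claim can be checked numerically. The sketch below is illustrative only and is not the paper's implementation: it assumes the memory is a Hebbian sum of outer products and that one retrieval step is a single matrix-vector product, with no nonlinearity or decay term. The function names (`hebbian_store`, `hebbian_retrieve`, `linear_attention`) and the random test vectors are hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def hebbian_store(keys, values):
    """Build a Hebbian memory M = sum_t outer(v_t, k_t)."""
    d_v, d_k = len(values[0]), len(keys[0])
    memory = [[0.0] * d_k for _ in range(d_v)]
    for k, v in zip(keys, values):
        for i in range(d_v):
            for j in range(d_k):
                memory[i][j] += v[i] * k[j]
    return memory


def hebbian_retrieve(memory, query):
    """One retrieval step: M q (assumed linear, no activation)."""
    return [dot(row, query) for row in memory]


def linear_attention(query, keys, values):
    """Dot-product attention without softmax: sum_t (k_t . q) v_t."""
    scores = [dot(k, query) for k in keys]
    d_v = len(values[0])
    return [sum(s * v[i] for s, v in zip(scores, values)) for i in range(d_v)]


# Compare the two computations on random data (sizes chosen arbitrarily).
rng = random.Random(0)
keys = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
values = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(5)]
query = [rng.gauss(0, 1) for _ in range(8)]

from_memory = hebbian_retrieve(hebbian_store(keys, values), query)
from_attention = linear_attention(query, keys, values)
assert all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    for a, b in zip(from_memory, from_attention)
)
```

The two results agree because (Σ_t v_t k_tᵀ) q = Σ_t v_t (k_t · q): the same sum, grouped differently. Applying a softmax to the scores before the weighted sum would give standard self-attention.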

Source paper

extracted_from

Relating transformers to models and neural representations of the hippocampal formation

(2021) · James C. R. Whittington · Joseph W. Warren · Timothy E.J. Behrens

Neighborhood — ranked by edge-count

Papers (1)

paper

Relating transformers to models and neural representations of the hippocampal formation
introduces

Findings (3)

finding

TEM-t with linear activations learns grid-cell-like position encoding representations in 2D spatial environments
supports
Empirical result showing TEM-t recapitulates entorhinal grid cell representations with linear post-transition activation.
TEM-t learns band-cell-like position encoding representations resembling Krupic et al. band cells
supports
Empirical result showing TEM-t position encodings also recapitulate band cells, not just grid cells.
TEM-t learns grid cells in hexagonal 6-connected worlds
supports
Empirical extension showing grid cell learning generalises to non-4-connected spatial environments.

Claims (2)

claim

TEM's path-integration representation g plays the role of position encodings in transformers
extends
Key structural correspondence claim linking the neuroscience model's spatial representation to the ML concept of position encoding.
The relationship between the brain and transformers is close because of a mathematical relationship between models, not merely because of shared neural representations
extends
Methodological clarification distinguishing this paper's contribution from looser representational similarity claims.

Questions (1)

question

Are hippocampal architecture and bespoke neuroscience models capable of the general-purpose computations studied in machine learning?
answered_by
Motivating question from the introduction, which the TEM-transformer equivalence helps answer affirmatively.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

TEM-t instantiates hippocampal indexing theory by using memory neurons to bind cortical representations across brain regions · claim · 0.794
Theoretical claim linking the TEM-t architecture to the Teyler-Rudy hippocampal indexing theory.
TEM-t requires many fewer data samples than TEM to reach equivalent performance (sample efficiency improvement) · finding · 0.784
Empirical performance comparison showing TEM-t is a more efficient learner than the original TEM.
TEM-t requires less time per gradient step than TEM · finding · 0.768
Empirical computational efficiency result comparing TEM-t to the original TEM implementation.
TEM-Transformer (TEM-t) · framework · 0.762
The transformer version directly analogous to TEM, introduced in this paper, offering dramatic performance improvements.
TEM-t memory neurons show spatially-tuned firing resembling hippocampal place cells in each environment · finding · 0.755
Empirical result demonstrating that the sparse softmax activation of memory neurons produces place-cell-like spatial tuning.
Transformers develop self-models through in-context learning, not just training data; even old base models without LLM-related text can bootstrap self-referential reasoning at runtime. · claim · 0.739
Antra's foundational claim about how introspection arises computationally rather than from memorised text.
Tolman-Eichenbaum Machine (TEM) · framework · 0.737
Neuroscience model of the hippocampal formation that the paper shows is mathematically equivalent to a transformer with recurrent position encodings.
The earlier a base model (less exposure to LM-related data), the more it is surprised by its own spontaneous self-referential capabilities. · claim · 0.734
Claim that capability emerges from architecture, not data, and that later models lose the surprise.