finding

active

finding:tem-t-requires-many-fewer-data-samples-than-tem-to-reach-equivalent-performance-sample-efficiency-improvement

TEM-t requires many fewer data samples than TEM to reach equivalent performance (sample efficiency improvement)

Empirical performance comparison showing that TEM-t learns more sample-efficiently than the original TEM.

Source paper

extracted_from

Relating transformers to models and neural representations of the hippocampal formation

(2021) · James C. R. Whittington · Joseph W. Warren · Timothy E.J. Behrens

Neighborhood (ranked by edge count)

Frameworks (1)

framework

TEM-Transformer (TEM-t)
supports
The transformer analogue of TEM introduced in this paper, which offers dramatic performance improvements over TEM.

Related by similarity (8)

cosine similarity ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.

TEM-t requires less time per gradient step than TEM · finding · 0.841
Empirical computational efficiency result comparing TEM-t to the original TEM implementation.
TEM memory retrieval is mathematically equivalent to transformer self-attention without softmax · claim · 0.784
Central theoretical claim: a single step of TEM attractor dynamics equals dot-product attention, making TEM a special case of a transformer.
TEM-t instantiates hippocampal indexing theory by using memory neurons to bind cortical representations across brain regions · claim · 0.751
Theoretical claim linking the TEM-t architecture to the Teyler-Rudy hippocampal indexing theory.
There are fewer representations competent for N tasks than for M < N tasks, so training more general models should yield fewer possible solutions · hypothesis · 0.742
Selective pressure toward convergence via task generality.
The results are more widely applicable; similar results will come from asking people in other cultures to answer analogous questions. · claim · 0.741
Universalist claim predicting cross-cultural generality.
The model tends to reflect more when the question is difficult, and accuracy is generally lower for harder questions · hypothesis · 0.737
Hypothesis explaining the negative correlation between reflection rate and accuracy without implying that reflection is harmful.
Training identical architectures on the same data with different objective functions should produce systematically different internal evaluative representations, detectable through interpretability tools, even when final task performance is matched · hypothesis · 0.736
Second falsifiable prediction linking objective-function structure to valence profile.
Scale is sufficient but not necessarily efficient to reach high levels of intelligence; different methods can scale with different efficiency levels · claim · 0.734
Implication of the PRH for the "scale is all you need" argument.
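The related claim that a TEM memory-retrieval step equals dot-product attention without softmax can be illustrated with a minimal sketch. It assumes a generic outer-product (Hebbian) memory that stores key/value vector pairs; this is a simplification for illustration, not TEM's exact memory or attractor update, and the function names are hypothetical. Reading the memory matrix with a query returns the same vector as summing the stored values weighted by raw key-query dot products, which is attention with the softmax removed.

```python
# Hypothetical sketch: a generic Hebbian outer-product memory, assumed for
# illustration. It is not TEM's exact memory or attractor dynamics.

def outer_product_memory(keys, values):
    """Build the Hebbian memory matrix M = sum_i v_i k_i^T."""
    rows, cols = len(values[0]), len(keys[0])
    M = [[0.0] * cols for _ in range(rows)]
    for k, v in zip(keys, values):
        for r in range(rows):
            for c in range(cols):
                M[r][c] += v[r] * k[c]
    return M


def memory_readout(M, query):
    """One retrieval step: multiply the memory matrix by the query (M @ q)."""
    return [sum(m * q for m, q in zip(row, query)) for row in M]


def attention_without_softmax(keys, values, query):
    """Dot-product attention with the softmax removed: sum_i (k_i . q) v_i."""
    scores = [sum(k_c * q_c for k_c, q_c in zip(k, query)) for k in keys]
    out = [0.0] * len(values[0])
    for s, v in zip(scores, values):
        for r in range(len(out)):
            out[r] += s * v[r]
    return out


if __name__ == "__main__":
    keys = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
    values = [[3.0, 1.0], [-2.0, 4.0]]
    query = [1.0, 2.0, -1.0]
    print(memory_readout(outer_product_memory(keys, values), query))
    print(attention_without_softmax(keys, values, query))
```

Running the script prints `[0.0, -7.0]` twice. Applying a softmax to the scores would normalize and reweight them and break this equality, which is why the correspondence is stated for attention without softmax.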