finding
active

Transformers learn in-context by gradient descent, functioning as mesa-optimizers that learn internal models in real time

Evidence that in-context learning is genuine optimization rather than mere pattern matching, which is relevant to applying the source paper's thesis to inference.
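
The claim can be made concrete in a toy setting. The sketch below is illustrative only and is not taken from the source paper: it assumes linear regression, a zero weight initialization, and attention without softmax, and the function names and data are hypothetical. Under those assumptions, the prediction after one gradient-descent step on the in-context examples equals the output of a single linear attention readout whose keys are the inputs, whose values are the targets, and whose query is the test input.

```python
# Illustrative sketch under stated assumptions (toy linear regression, w starts at 0,
# attention without softmax). Names and data are hypothetical, not from the source paper.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gd_step_prediction(xs, ys, x_query, lr):
    """Predict for x_query after one gradient-descent step starting from w = 0."""
    n = len(xs)
    dim = len(x_query)
    # Loss: (1 / 2n) * sum_i (w . x_i - y_i)^2.
    # At w = 0 its gradient is -(1 / n) * sum_i y_i * x_i.
    grad = [-sum(y * x[j] for x, y in zip(xs, ys)) / n for j in range(dim)]
    w = [-lr * g for g in grad]
    return dot(w, x_query)

def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention readout: keys = x_i, values = y_i, query = x_query."""
    n = len(xs)
    scores = [dot(x, x_query) for x in xs]  # unnormalized key-query similarities
    return (lr / n) * sum(s * y for s, y in zip(scores, ys))

# Three in-context (x, y) examples and one query input.
xs = [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.0]]
ys = [3.0, -1.5, -2.0]
x_query = [1.0, 1.0]

pred_gd = gd_step_prediction(xs, ys, x_query, lr=0.1)
pred_attn = linear_attention_prediction(xs, ys, x_query, lr=0.1)
```

The two predictions agree up to floating-point rounding. This shows only that such an equivalence is expressible in a simplified model; it does not by itself show that trained transformers implement it.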

Source paper

extracted_from
Why Learning Requires Feeling
(2026) · Cameron Berg

Neighborhood — ranked by edge-count

Thinkers (1)

thinker

Related by similarity (8)
