Strict Output-Surjectivity

Assumption that every output class can be produced by the DNN in each layer; the key condition for Theorem 1.

Neighborhood — ranked by edge-count

paper

concept

Softmax Bottleneck
contradicts
Failure mode for output-surjectivity: LLMs may lack capacity to predict all tokens due to rank constraints
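
A toy illustration of how the geometry of a linear output head can break output-surjectivity. This is not the softmax-bottleneck rank argument itself. It assumes a bias-free head whose logits are `W @ h`. Because softmax preserves the argmax of the logits, a class whose row in `W` lies strictly inside the convex hull of the other rows can never be the top prediction. The function name `argmax_reachable` and the random-direction sampling strategy are hypothetical choices made for this sketch. Sampling can only suggest that a class is unreachable; it cannot prove it.

```python
import random


def argmax_reachable(W, samples=20000, seed=0):
    """Approximate which classes can be the argmax of W @ h for some hidden state h.

    W: list of V class-embedding rows, each of length d (a hypothetical bias-free head).
    Random Gaussian hidden states h are sampled; a class that is never the argmax
    is *probably* unreachable (sampling cannot prove unreachability).
    """
    rng = random.Random(seed)
    d = len(W[0])
    reached = set()
    for _ in range(samples):
        h = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Logits are dot products of each class row with the hidden state.
        logits = [sum(w_k * h_k for w_k, h_k in zip(w, h)) for w in W]
        reached.add(max(range(len(W)), key=logits.__getitem__))
    return reached


# Five classes in d = 2: class 4 sits inside the convex hull of classes 0-3.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.1, 0.1]]
print(sorted(argmax_reachable(W)))
```

Here class 4 is never predicted, whatever the hidden state. For any nonzero `h`, its logit `0.1 * (h1 + h2)` is at most `0.2 * max(|h1|, |h2|)`. That is strictly below the best logit among classes 0-3, which equals `max(|h1|, |h2|)`. In this toy head, then, strict output-surjectivity fails.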

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
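
The listing rule above (cosine ≥ 0.65 and no typed edge) can be sketched as a filter over entity embeddings followed by a descending sort on similarity. The function names, the `embeddings` dictionary, and the `(src, relation, dst)` edge triples are hypothetical stand-ins for whatever store backs this graph. Edges are treated as undirected when deciding whether two entities are already linked.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def untyped_neighbours(focus, embeddings, edges, threshold=0.65):
    """Return (name, score) pairs similar to `focus` but with no typed edge to it.

    embeddings: dict mapping entity name -> vector (hypothetical layout).
    edges: set of (src, relation, dst) triples; direction is ignored here.
    """
    linked = {dst for src, _, dst in edges if src == focus}
    linked |= {src for src, _, dst in edges if dst == focus}
    hits = []
    for name, vec in embeddings.items():
        if name == focus or name in linked:
            continue  # skip the entity itself and anything already linked
        score = cosine(embeddings[focus], vec)
        if score >= threshold:
            hits.append((name, score))
    # Highest similarity first, matching the order of the list below.
    return sorted(hits, key=lambda item: item[1], reverse=True)


embeddings = {"A": [1.0, 0.0], "B": [1.0, 0.1], "C": [1.0, 1.0],
              "D": [0.0, 1.0], "E": [1.0, 0.5]}
edges = {("C", "contradicts", "A")}
print(untyped_neighbours("A", embeddings, edges))
```

In this example, `C` is excluded despite its high similarity because it already has a typed edge to `A`. `D` falls below the threshold, so only `B` and `E` remain as candidates.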

Output-truth · concept · 0.742
The correctness of a model's generated outputs, distinct from the correctness of statements provided as input.
Superposition of Sparse Features · concept · 0.718
Mechanistic finding by Bricken et al. (2023) about how LLMs store features; cited as the operational justification for the pattern-repository assumption.
Any system that persists must minimise surprisal. · claim · 0.709
Foundational claim derived from the Free Energy Principle, setting up self-evidencing.
Detecting Unintended Outputs via Introspection · finding · 0.708
Models can distinguish artificially prefilled outputs from intentional responses by referencing prior internal representations; injecting a matching concept vector causes the model to retroactively accept the prefill as intentional.
surprisal minimisation · concept · 0.706
The core imperative under the Free Energy Principle; systems must reduce the difference between predicted and actual sensory states.
Input-Output Specification · concept · 0.706
Specification relating a program's inputs and outputs, analogous to illocutionary correctness.
Input-Output Relations as Ordered Concepts · finding · 0.704
Diagrammatic encoding of program behavior via concept lattices reveals reachability structure and non-determinism without fixed calculational rules.
Each expression denotes something, depending only on denotations of subexpressions. · claim · 0.702
Property of denotative programming from Landin.