Input-truth

Correctness of input statements to an LLM, as opposed to output-truth (correctness of model-generated outputs).

Neighborhood — ranked by edge-count

paper

concept

Output-truth
associated_with
The correctness of a model's generated outputs, distinct from the correctness of statements provided as input.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Input-Injectivity · concept · 0.789
Assumption that DNN layers preserve input information by being injective; key condition for Theorem 1
Truth Direction · concept · 0.780
A hypothesized direction in LLM activation space that encodes the truth or falsehood of factual statements
sensory input · concept · 0.778
Input from environment that the agent models and predicts.
Insight · concept · 0.755
Qualitative transition in generative model structure from Bayesian model reduction; emergence of understanding
Propositional Truth · concept · 0.751
The paper's operationalization of truthfulness as simple, unambiguous propositional statements that can be labeled true or false
Feedback · concept · 0.749
The mechanism by which each step's effect is evaluated against the life of the whole, guiding the unfolding.
Helpful, Honest, Harmless · framework · 0.749
A set of evaluation criteria for AI assistants.
With unrestricted vocabulary, models occasionally respond in non-English Yes/No equivalents (e.g., Sí, Nein) after truth-direction interventions · finding · 0.737
Suggestive evidence for language-independent truth representation in LLMs
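
The selection rule behind the list above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the function names `cosine` and `untyped_neighbors`, the dictionary-of-vectors layout, and the toy embeddings are all assumptions made for the example. Only the 0.65 threshold and the "no typed edge" condition come from the text.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs that are similar to `target` but not linked to it.

    embeddings:  hypothetical mapping of entity name -> embedding vector
    typed_edges: hypothetical iterable of (source, destination) name pairs
    """
    # Entities that already share a typed edge with the target, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            candidates.append((name, score))

    # Highest similarity first, matching the ordering of the list above.
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy data (made up for illustration; not the real embeddings).
toy_embeddings = {
    "Input-truth": [1.0, 0.0],
    "Output-truth": [1.0, 0.1],
    "Truth Direction": [1.0, 0.5],
    "Unrelated": [0.0, 1.0],
}
toy_edges = [("Input-truth", "Output-truth")]  # the associated_with edge
result = untyped_neighbors("Input-truth", toy_embeddings, toy_edges)
```

In this toy run, Output-truth is excluded because it already has a typed edge, and the hypothetical "Unrelated" entity falls below the threshold, leaving only Truth Direction as a candidate.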