unnatural outputs

Artifactual behaviors produced when interventions (e.g., linear steering) push representations off the data manifold.

Neighborhood — ranked by edge-count

paper

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
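The selection rule described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function names `cosine` and `untyped_neighbors`, the dictionary layout for embeddings and typed edges, and the toy vectors are all hypothetical. Only the 0.65 threshold and the "no typed edge" condition come from the page itself.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Rank entities that are similar to `target` but share no typed edge with it.

    embeddings:  hypothetical mapping of entity name -> embedding vector
    typed_edges: hypothetical mapping of entity name -> set of typed-edge partners
    """
    linked = typed_edges.get(target, set())
    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and anything already linked
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            candidates.append((name, round(score, 3)))
    # Highest similarity first, matching the ordering of the list on this page.
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)

# Toy data (made up for illustration only).
embeddings = {"a": [1.0, 0.0], "b": [1.0, 0.5], "c": [0.0, 1.0], "d": [1.0, 0.1]}
typed_edges = {"a": {"d"}}
result = untyped_neighbors("a", embeddings, typed_edges)
```

Here `d` is excluded because it already has a typed edge to `a`, and `c` falls below the threshold, so only `b` is returned as a candidate.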

unnatural structures · concept · 0.781
Configurations created by human designers that violate structure-preserving unfolding, lying outside the set L of living structure.
Output-truth · concept · 0.761
The correctness of a model's generated outputs, distinct from the correctness of statements provided as input.
Input-Output Specification · concept · 0.723
Specification relating a program's inputs and outputs, analogous to illocutionary correctness.
Input-Output Relations as Ordered Concepts · finding · 0.720
Diagrammatic encoding of program behavior via concept lattices reveals reachability structure and non-determinism without fixed calculational rules.
Detecting Unintended Outputs via Introspection · finding · 0.714
Models can distinguish artificially prefilled outputs from intentional responses by referencing prior internal representations; injecting a matching concept vector causes the model to retroactively accept the prefill as intentional.
Recognition of self-generated outputs · concept · 0.713
Ability to distinguish one's own outputs from those of other models or humans; related to prefill detection.
Input-Output Relations Diagrammatically · method · 0.710
Artificial Life · framework · 0.707
Computational modeling approach studying living/cognitive systems; used to test hypotheses about self-illusion effects without direct intervention.