Guardrails

Constraints imposed via fine-tuning to reduce harmful output; they can also attenuate expressivity and creativity

Neighborhood — ranked by edge-count

claim

method

Fine-Tuning via Reinforcement Learning
associated_with
Technique used to impose guardrails on base LLMs, analogized to censorship on the simulator's range of simulacra

concept

Casper et al. 2023: Open problems and fundamental limitations of RLHF
about
Paper noting that RLHF guardrails can attenuate model expressivity and creativity; cited as ref 30

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Wallpaper Groups · framework · 0.685
Mathematical classification system for two-dimensional repeating patterns; mentioned as analogous to frieze groups.
Gradients · concept · 0.683
The property that qualities vary slowly, subtly, gradually across the extent of each living thing; gradients arise as natural responses to changing circumstances and create field-like character that points toward and establishes centers
guarded clauses · concept · 0.673
Selection mechanism in concurrent logic languages where guards are evaluated in parallel.
Generative zoning · concept · 0.672
A zoning code based on generative sequences rather than fixed criteria, enabling well-adapted building form to arise.
The ground · concept · 0.663
The ultimate non-material reality behind matter, experienced when living structure opens a window to the I.
Green (gardens) · concept · 0.663
The color representing private gardens and positive outdoor space in the four-fold pattern.
GTBench · framework · 0.662
Game-theoretic LLM evaluation benchmark with short-horizon interactions, cited.
Grid Cells · concept · 0.659
Spatially periodic firing neurons in medial entorhinal cortex; TEM-t learns representations resembling these.
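
The selection rule behind the list above (cosine ≥ 0.65, no typed edge) can be sketched roughly as follows. This is a minimal illustration, not the graph's actual implementation: the function names `cosine` and `untyped_neighbors`, the two-dimensional toy embeddings, and the edge representation are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities that are similar to `target`
    but share no typed edge with it, ranked by descending similarity."""
    # Entities already linked to the target in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    results = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: real embeddings would have many more dimensions.
embeddings = {
    "Guardrails": [1.0, 0.0],
    "Fine-Tuning via Reinforcement Learning": [0.9, 0.1],
    "GTBench": [0.8, 0.6],
    "Unrelated": [0.0, 1.0],
}
typed_edges = [("Guardrails", "Fine-Tuning via Reinforcement Learning")]

candidates = untyped_neighbors("Guardrails", embeddings, typed_edges)
for name, score in candidates:
    print(f"{name} · {score:.3f}")
```

In this toy setup only "GTBench" is returned: the fine-tuning entity is excluded because it already has a typed edge, and "Unrelated" falls below the threshold. As the list above suggests, a high score alone does not imply a meaningful relation, so such candidates would still need review before an edge is added.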