Fine-Tuning via Reinforcement Learning

A technique used to impose guardrails on base LLMs; analogized to censorship of the simulator's range of simulacra.

Neighborhood — ranked by edge-count

concept

Guardrails
associated_with
Constraints imposed via fine-tuning to reduce harmful output; they can also attenuate expressivity and creativity.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Fine Tuning and Adaptation · concept · 0.845
The patient, hand-guided adjustment of shape and dimension to each unique condition in a building; requires materials that make it economical and easy.
Fine-tuning · concept · 0.843
Parameter updates that reduce mismatch dr; another anchoring variant in UCCT.
Reinforcement Learning · framework · 0.828
Alternative framework for agent behavior; based on reward maximization rather than free energy minimization.
Roleplay Fine-Tuning · concept · 0.821
Fine-tuning for persona depth and emotional performance; actively suppresses self-observation
Fine-tuning induces the behavioral pattern of self-correction but does not improve the underlying ability to correct effectively · claim · 0.816
Key interpretive conclusion from the dissociation between attempt rate and improvement rate in fine-tuning experiments
Reinforcement Learning from Human Feedback · method · 0.813
Method for fine-tuning LMs based on human preferences; mentioned as combining RL and LMs.
Fine-tuning as character formation: what kinds of selves are produced through training is an open research direction · claim · 0.807
Fine-Tuning Threshold Recalibration · method · 0.802
Re-running probabilistic bisection on each fine-tuned checkpoint to normalize first-attempt difficulty