method
active
method:fine-tuning-via-reinforcement-learning
Fine-Tuning via Reinforcement Learning
Technique used to impose guardrails on base LLMs; analogized to censorship of the range of simulacra the simulator can produce
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Guardrails · associated_with · Constraints imposed via fine-tuning to reduce harmful output; can reduce harm but also attenuate expressivity and creativity
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- The patient, hand-guided adjustment of shape and dimension to each unique condition in a building; requires materials that make it economical and easy.
- Parameter updates that reduce mismatch dr; another anchoring variant in UCCT.
- Alternative framework for agent behavior; based on reward maximization rather than free energy minimization.
- Fine-tuning for persona depth and emotional performance; actively suppresses self-observation
- Key interpretive conclusion from the dissociation between attempt rate and improvement rate in fine-tuning experiments
- Method for fine-tuning LMs based on human preferences; mentioned as combining RL and LMs.
- Re-running probabilistic bisection on each fine-tuned checkpoint to normalize first-attempt difficulty