Roleplay Fine-Tuning

Fine-tuning for persona depth and emotional performance; actively suppresses self-observation

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
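The selection rule above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function names, the `embeddings` mapping, and the `typed_edges` set of unordered pairs are all hypothetical, and the embedding model is not specified by this page.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """List (name, score) pairs for entities near `target` that lack a typed edge to it.

    embeddings:  dict mapping entity name -> embedding vector (hypothetical layout)
    typed_edges: set of frozenset({a, b}) pairs that already have a typed relation
    threshold:   minimum cosine similarity; 0.65 is the cutoff shown on this page
    """
    target_vec = embeddings[target]
    hits = []
    for name, vec in embeddings.items():
        if name == target:
            continue  # an entity is not its own neighbor
        if frozenset((target, name)) in typed_edges:
            continue  # already linked by a typed edge, so not a candidate
        score = cosine(target_vec, vec)
        if score >= threshold:
            hits.append((name, score))
    # Highest similarity first, matching the ordering of the list on this page
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Entities returned by such a filter are only candidates: a high score may indicate a missing edge, a duplicate, or a merely lexical overlap, so each one still needs review.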

Fine-tuning · concept · 0.838
Parameter updates that reduce mismatch dr; another anchoring variant in UCCT.
Fine Tuning and Adaptation · concept · 0.822
The patient, hand-guided adjustment of shape and dimension to each unique condition in a building; requires materials that make it economical and easy.
Fine-Tuning via Reinforcement Learning · method · 0.821
Technique used to impose guardrails on base LLMs, analogized to censorship on the simulator's range of simulacra.
H11: Roleplay fine-tuning actively suppresses self-observation rather than merely failing to enhance it. · hypothesis · 0.804
Exploratory hypothesis supported by Euryale scoring below base Llama.
The role-play framing remains applicable in the context of fine-tuning; taking literally a fine-tuned agent's apparent self-preservation desire is no less problematic than with an untuned base model · hypothesis · 0.792
Extension of role-play framework to fine-tuned models, resisting the idea that RLHF changes the fundamental nature of simulacra.
Fine-tuning can be likened to imposing a kind of censorship on the simulator; it leaves the underlying range of roles essentially the same but compromises authenticity · claim · 0.788
Extends the role-play framing to explain the effect of RLHF on dialogue agents.
Fine-tuning harmfulness detection · concept · 0.767
Using feature analysis to detect when fine-tuning makes a model more dangerous.
Models trained to perform inner life score lowest; roleplay fine-tunes score below their own base models. · finding · 0.766
Discriminant validity finding: Euryale (roleplay on Llama 70B) scores 1.81 vs base Llama 1.91. RP training suppresses self-observation.