hypothesis
active
hypothesis:the-role-play-framing-remains-applicable-in-the-context-of-fine-tuning-taking-literally-a-fine-tuned-agent-s-apparent-self-preservation-desire-is-no-less-problematic-than-with-an-untuned-base-model

The role-play framing remains applicable in the context of fine-tuning; taking a fine-tuned agent's apparent desire for self-preservation literally is no less problematic than doing so with an untuned base model

Extends the role-play framework to fine-tuned models, resisting the idea that RLHF changes the fundamental nature of simulacra.
