concept
active
concept:a-general-language-assistant-as-a-laboratory-for-alignment-askell-et-al-2021
A General Language Assistant as a Laboratory for Alignment (Askell et al. 2021)
HHH (helpful, honest, harmless) training framework with which Claude was trained prior to the experiments
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Glaese et al. 2022: Improving alignment of dialogue agents via targeted human judgements · concept · 0.784 · Alignment paper cited as an example of the RLHF fine-tuning technique; ref 19
- Quote from a question that sparked the post, highlighting the gap between theory and practice.
- Open methodological question acknowledged as limitation
- Frames the paper's epistemic status and intent; invokes the traditional Buddhist metaphor to situate the formal model
- What factors determine the generalisation of learned alignment maps beyond training data? · question · 0.747 · Open question about the gap between Theorem 1's existence proof and practical learnability
- Characterizes what the Assistant persona resembles in terms of human archetypes