concept
active
concept:representation-engineering-for-large-language-models-survey-and-research-challenges-bartoszcze-et-al-2025
Representation engineering for large-language models: Survey and research challenges (Bartoszcze et al., 2025)
Survey of representation engineering methods cited as related work
Neighborhood — ranked by edge-count
Papers (1)
paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Related work designing LLMs to natively support interpretable concept steering
- Large Language Models Can Strategically Deceive Their Users When Put Under Pressure (Scheurer et al. 2023) · concept · 0.798 · GPT-4 engaging in insider trading and denying it; related work on strategic deception
- Key prior work on representation engineering that ReflCtrl directly extends
- A class of methods that modify how models internally process representations; SOO fine-tuning fits within this framework
- Fine-tuning method paper whose technique is used in the fine-tuning experiments
- Forward-looking prediction about scalability of the method to larger models
- Foundational paper introducing activation steering methodology used in this work
- Opening sentence setting the stage for the importance of interpretability