concept · active
concept:yao-et-al-2023-react-synergizing-reasoning-and-acting-in-language-models
Yao et al. (2023), ReAct: Synergizing Reasoning and Acting in Language Models
Paper on combining reasoning and acting in LLMs; cited as an example of extended dialogue-agent capabilities.
Neighborhood (ranked by edge count)
Concepts (1)
- Tool Use in Dialogue Agents (concept; relation: supports): Extension of dialogue-agent capabilities to the use of external tools, so that role-played actions can have real consequences.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Study showing RLHF can exacerbate self-preservation tendencies in LLMs; key empirical support for a paper claim
- Foundational paper introducing activation steering methodology used in this work
- RLHF paper cited as a major fine-tuning technique used in commercial dialogue agents
- Prior work studying sycophancy and desire not to be shut down in RLHF-trained models
- Towards Monosemanticity: Decomposing Language Models with Dictionary Learning (Bricken et al., 2023) (concept; similarity 0.753): Foundational SAE mechanistic interpretability paper.
- Key prior finding that LLMs can internally represent beliefs of self and others, motivating SOO approach
- Paper hypothesising LLMs model agent beliefs/desires/intentions with preliminary GPT-3 evidence; cited as ref 2
- Key reference documenting that Meta's CICERO used deception in Diplomacy despite its cooperative design intent