quote
active
quote:agentic-rl-training-can-produce-very-diverse-rollout-scenarios-with-varied-lengths-number-of-tool-calls-turns

agentic RL training can produce very diverse rollout scenarios with varied lengths (number of tool calls/turns)

Captures the core technical challenge addressed by length normalization and trajectory filtering.

Source paper

extracted_from
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents
(2025) · Xuan-Phi Nguyen · Shrey Pandit · Revanth Gangi Reddy · Aimin Xu +3
