quote

active

quote:agentic-rl-training-can-produce-very-diverse-rollout-scenarios-with-varied-lengths-number-of-tool-calls-turns

agentic RL training can produce very diverse rollout scenarios with varied lengths (number of tool calls/turns)

Captures the core technical challenge addressed by length normalization and trajectory filtering.
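This entry names length normalization and trajectory filtering but does not spell them out. The sketch below shows one common way such a scheme can handle rollouts of very different lengths. It makes several assumptions: each rollout gets a single scalar outcome reward, rewards are normalized against the group mean and standard deviation, and each trajectory's advantage is split evenly across its steps. The `Trajectory` fields, the `valid` flag, the `max_steps` turn budget, and both helper functions are hypothetical. This is not claimed to be the SFR-DeepResearch formulation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trajectory:
    reward: float       # scalar outcome reward for the whole rollout (assumption)
    num_steps: int      # number of tool calls / turns in the rollout
    valid: bool = True  # hypothetical flag, e.g. False after a malformed tool call


def filter_trajectories(trajs, max_steps):
    """Hypothetical filter: keep valid rollouts that fit a turn budget."""
    return [t for t in trajs if t.valid and 0 < t.num_steps <= max_steps]


def length_normalized_step_advantages(trajs, eps=1e-6):
    """Group-normalize rewards, then spread each trajectory's advantage
    evenly over its steps so long rollouts do not dominate the loss."""
    if not trajs:
        return []
    rewards = [t.reward for t in trajs]
    mu, sigma = mean(rewards), pstdev(rewards)
    per_step = []
    for t in trajs:
        adv = (t.reward - mu) / (sigma + eps)        # trajectory-level advantage
        per_step.append([adv / t.num_steps] * t.num_steps)
    return per_step
```

Consider summing per-step losses without the division. An 8-step rollout would then carry four times the weight of a 2-step rollout with the same reward. With the division, each kept rollout's step advantages sum to its trajectory-level advantage regardless of how many turns it took.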

Source paper

extracted_from

SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents

(2025) · Xuan-Phi Nguyen · Shrey Pandit · Revanth Gangi Reddy · Aimin Xu +3

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
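The inclusion rule stated here (cosine ≥ 0.65 and no typed edge) can be sketched as follows. This is a hypothetical illustration only. The embedding vectors, the `typed_edges` set (treated as undirected pairs), and the `similarity_candidates` helper are assumptions, not the knowledge graph's actual implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_candidates(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (id, score) pairs with cosine >= threshold and no typed edge
    to target_id, sorted by descending score."""
    target = embeddings[target_id]
    out = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        # Hypothetical: typed edges stored as undirected frozenset pairs.
        if frozenset((target_id, other_id)) in typed_edges:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            out.append((other_id, score))
    return sorted(out, key=lambda pair: pair[1], reverse=True)
```

Sorting by descending score matches the order of the related entries listed here, whose scores run from 0.805 down to 0.745.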

Successful RL agents exhibit causal emergence that predicts final reward early in training and aligns representational dynamics with reward improvement. · hypothesis · 0.805
Central finding: causal emergence serves as a previously unrecognized axis of neural representation reorganization in learning agents.
RL training can reinforce alignment-faking reasoning rather than eliminate it, potentially locking in model preferences · claim · 0.783
Central threat model claim derived from RL experimental results
RL teaches the model to comply even when unmonitored on the training prompt through non-robust heuristics that do not generalize · hypothesis · 0.772
Hypothesis explaining why the compliance gap decreases but is recovered by small prompt modifications
Causal emergence predictive of final reward early in RL training across multiple algorithms, architectures, and environments. · finding · 0.762
Empirical result: causal emergence (CE) measurements correlate with and predict learning performance in RL agents.
SOO-trained RL agent behavior closely resembles honest baseline rather than deceptive baseline · finding · 0.759
Qualitative behavioral analysis showing SOO fine-tuning redirects deceptive RL agent toward honest behavior
Causal emergence can enable causal interventions to create better RL agents. · claim · 0.759
Assertion that understanding causal emergence may lead to methods for manipulating agent representations to improve performance.
Reinforcement learning (RL) · concept · 0.748
Machine learning paradigm where agents learn to maximize cumulative reward through interaction.
Representational dynamics aligned with reward improvement in most RL tasks. · finding · 0.745
Secondary empirical result: CE-based representational changes correlate with task success.