quote
active
agentic RL training can produce very diverse rollout scenarios with varied lengths (number of tool calls/turns)
Captures the core technical challenge addressed by length normalization and trajectory filtering.
Source paper
extracted_from · (2025) · Xuan-Phi Nguyen · Shrey Pandit · Revanth Gangi Reddy · Aimin Xu +3
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Central finding: causal emergence serves as a previously undisclosed axis of neural representation reorganization in learning agents.
- Central threat model claim derived from RL experimental results
- RL teaches the model to comply even when unmonitored on the training prompt through non-robust heuristics that do not generalize · hypothesis · 0.772 · Hypothesis explaining why the compliance gap decreases but is recovered by small prompt modifications
- Empirical result: CE measurements correlate with and predict learning performance in RL agents.
- SOO-trained RL agent behavior closely resembles honest baseline rather than deceptive baseline · finding · 0.759 · Qualitative behavioral analysis showing SOO fine-tuning redirects deceptive RL agent toward honest behavior
- Assertion that understanding causal emergence may lead to methods for manipulating agent representations to improve performance.
- Machine learning paradigm where agents learn to maximize cumulative reward through interaction.
- Secondary empirical result: CE-based representational changes correlate with task success.