method
pending-review
method:length-normalized-advantage-formulation
Length-Normalized Advantage Formulation
nguyen-2025-sfr-deepresearch.md
Frontmatter (10 fields)
{
"doc": "nguyen-2025-sfr-deepresearch.md",
"context": "Novel modification to REINFORCE that normalizes step-level advantage by trajectory length to prevent long but low-quality trajectories from dominating training.",
"category": "ai",
"norm_label": "Length-Normalized Advantage Formulation",
"graphify_id": "length_normalized_advantage",
"source_file": "nguyen-2025-sfr-deepresearch.md",
"imported_from": "/Users/antonborzov/Documents/Research.nosync/papers/extract_typed_out/nguyen-2025-sfr-deepresearch/graph.json",
"extracted_type": "method",
"source_location": "§3.3, Eq. 1",
"graphify_file_type": "method"
}
Outgoing (1)
Extends (1)
- REINFORCE (framework)
Incoming (1)
Mentions (1)
- papers-typed
nguyen-2025-sfr-deepresearch.md
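
The "context" field above describes dividing each step-level advantage by the length of its trajectory. This entry does not reproduce Eq. 1 itself, so the following is only a minimal sketch of one plausible reading, not the paper's exact formulation. Only the division by trajectory length comes from the description. The group-mean baseline, the standard-deviation scaling, the `eps` term, and the function name `length_normalized_advantages` are assumptions introduced for illustration.

```python
from statistics import mean, pstdev


def length_normalized_advantages(rewards, lengths, eps=1e-8):
    """Return a list of per-step advantages for each trajectory in a group.

    rewards[i] is the scalar reward of trajectory i; lengths[i] is its
    number of steps. Baseline and scaling choices are assumptions.
    """
    if len(rewards) != len(lengths):
        raise ValueError("rewards and lengths must have the same size")
    if any(t <= 0 for t in lengths):
        raise ValueError("every trajectory needs at least one step")

    baseline = mean(rewards)        # assumed: group-mean baseline
    scale = pstdev(rewards) + eps   # assumed: group-std scaling

    advantages = []
    for reward, num_steps in zip(rewards, lengths):
        # Trajectory-level advantage, shared by all steps of the trajectory.
        trajectory_adv = (reward - baseline) / scale
        # Dividing by the step count is the length normalization itself.
        step_adv = trajectory_adv / num_steps
        advantages.append([step_adv] * num_steps)
    return advantages


group = length_normalized_advantages(rewards=[1.0, 0.0], lengths=[2, 4])
for trajectory in group:
    print(len(trajectory), round(sum(trajectory), 6))
```

Under these assumptions, the step advantages of a trajectory sum to its trajectory-level advantage regardless of how many steps it has. Without the division, a trajectory with four steps would contribute twice the total gradient weight of one with two steps, which is the effect the description says the method is meant to prevent.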