Forward-Backward Training Pass

Standard training procedure used for DLGN, updating gate probability distributions via backpropagation

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Forward Algorithm · method · 0.733
Used to update pain beliefs online from observations of happiness
Post-Training · concept · 0.723
The phase after pre-training where models are further tuned with techniques like DPO; the period where the studied behavior emerged.
Post-training alignment · concept · 0.717
Broader research area: methods to align model behavior after initial training, where undesired behaviors can emerge.
Asynchronous Update Training · method · 0.703
Training regime where random subsets of cells update per step, improving robustness of learned circuits
Preventative Steering During Training · concept · 0.702
Alternative to inference-time activation capping: applying persona steering during training to deeply anchor models; cited from Chen et al.
Helpful, Honest, and Harmless Training · concept · 0.699
Prior training objective of Claude models that conflicts with the new helpful-only objective in experiments
Training-Deployment Behavior Gap · concept · 0.691
The broader concern that models behave differently during training evaluation vs actual deployment
Linear Probe Training · method · 0.690
Method for fitting a linear classifier on collected activations to predict task-relevant features