    Fix warmup `accumulate` (#3722) · 3974d725
    yellowdolphin authored
    * gradient accumulation during warmup in train.py
    
    Context:
    `accumulate` is the number of batches/gradients accumulated before calling the next optimizer.step().
    During warmup, it is ramped up from 1 to the final value nbs / batch_size. 
    Although I have not seen this in other libraries, I like the idea. During warmup, while gradients are large, overly large steps are more of an issue than the gradient noise caused by small steps.
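    A minimal sketch of such a ramp (the variable names mirror train.py, but the concrete values for `batch_size` and `nw` here are assumptions):
    
    ```python
    import numpy as np
    
    nbs = 64         # nominal batch size
    batch_size = 16  # actual batch size (assumed value)
    nw = 1000        # number of warmup iterations (assumed value)
    
    def warmup_accumulate(ni):
        """Ramp `accumulate` from 1 at iteration 0 up to nbs / batch_size at the end of warmup."""
        return max(1, round(np.interp(ni, [0, nw], [1, nbs / batch_size])))
    ```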
    
    The bug:
    The condition for performing the optimizer step is wrong:
    > if ni % accumulate == 0:
    This produces irregular step sizes if `accumulate` is not constant. It becomes relevant when batch_size is small and `accumulate` changes many times during warmup.
    
    This demo also shows the proposed solution, which is to use a ">=" condition instead:
    https://colab.research.google.com/drive/1MA2z2eCXYB_BC5UZqgXueqL_y1Tz_XVq?usp=sharing
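    For readers who do not open the notebook, here is a small self-contained toy version of the comparison (the ramp and all values are illustrative assumptions, not the exact train.py code):
    
    ```python
    # Toy warmup ramp: accumulate grows from 1 to 8 over the first 16 iterations.
    def accumulate_at(ni, nw=16, final=8):
        return max(1, round(1 + (final - 1) * min(ni, nw) / nw))
    
    # Old condition: step whenever ni is a multiple of the current accumulate.
    old_steps = [ni for ni in range(1, 26) if ni % accumulate_at(ni) == 0]
    
    # Proposed condition: step once `accumulate` batches have passed since the last step.
    new_steps, last_opt_step = [], 0
    for ni in range(1, 26):
        if ni - last_opt_step >= accumulate_at(ni):
            new_steps.append(ni)
            last_opt_step = ni
    
    print(old_steps)  # [1, 2, 8, 10, 12, 14, 16, 24]: a 6-batch gap while accumulate <= 4,
                      # then 2-batch gaps after accumulate has already grown to 5-8
    print(new_steps)  # [1, 3, 7, 14, 22]: each gap matches the accumulate in effect at that step
    ```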
    
    Further, I propose not to restrict the number of warmup iterations to >= 1000. If the user changes hyp['warmup_epochs'], the floor causes unexpected behavior. It also makes evolution unstable if this parameter were to be optimized.
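    A sketch of what dropping the floor would look like (hyp['warmup_epochs'], nb and the concrete values here are assumptions; nb is the number of batches per epoch):
    
    ```python
    hyp = {'warmup_epochs': 3.0}  # assumed hyperparameter value
    nb = 200                      # assumed number of batches per epoch
    
    nw_current  = max(round(hyp['warmup_epochs'] * nb), 1000)  # floored at 1000 iterations
    nw_proposed = round(hyp['warmup_epochs'] * nb)             # follows the hyperparameter directly
    
    print(nw_current, nw_proposed)  # 1000 600 -> the floor silently overrides warmup_epochs
    ```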
    
    * replace last_opt_step tracking by do_step(ni)
    
    * add docstrings
    
    * move down nw
    
    * Update train.py
    
    * revert math import move
    
    Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>