CosineAnnealingMomentum
- class mmengine.optim.CosineAnnealingMomentum(optimizer, *args, **kwargs)[source]
Set the momentum of each parameter group using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial momentum value and \(T_{cur}\) is the number of epochs since the last restart in SGDR:
\[\begin{split}\begin{aligned} \eta_t & = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right), & T_{cur} \neq (2k+1)T_{max}; \\ \eta_{t+1} & = \eta_{t} + \frac{1}{2}(\eta_{max} - \eta_{min}) \left(1 - \cos\left(\frac{1}{T_{max}}\pi\right)\right), & T_{cur} = (2k+1)T_{max}. \end{aligned}\end{split}\]
Because the schedule is defined recursively, the momentum can also be modified outside this scheduler by other operators. If the momentum is set solely by this scheduler, the momentum at each step is:
\[\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)\]
This schedule was proposed in SGDR: Stochastic Gradient Descent with Warm Restarts. Note that this class implements only the cosine annealing part of SGDR, not the restarts.
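As a rough illustration of the closed-form expression above, the following sketch evaluates \(\eta_t\) in plain Python. It does not use mmengine. The helper name `cosine_annealed_momentum` and the sample values (`eta_max=0.95`, `eta_min=0.85`, `t_max=10`) are assumptions chosen for illustration only.

```python
import math


def cosine_annealed_momentum(t_cur, t_max, eta_max, eta_min):
    """Closed-form cosine annealing value at step t_cur.

    Hypothetical helper mirroring the formula in this section; it is not
    part of the mmengine API.
    """
    return eta_min + 0.5 * (eta_max - eta_min) * (
        1 + math.cos(math.pi * t_cur / t_max))


# Illustrative values only: anneal from 0.95 down to 0.85 over 10 steps.
schedule = [cosine_annealed_momentum(t, 10, 0.95, 0.85) for t in range(11)]

for step, value in enumerate(schedule):
    print(f"step {step:2d}: momentum = {value:.4f}")
```

Under these assumptions the value starts at \(\eta_{max}\) when \(T_{cur} = 0\), passes through the midpoint of the two bounds at \(T_{cur} = T_{max}/2\), and reaches \(\eta_{min}\) at \(T_{cur} = T_{max}\).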
- Parameters:
optimizer (Optimizer or OptimWrapper) – Optimizer or wrapped optimizer.
T_max (int) – Maximum number of iterations.
eta_min (float, optional) – Minimum momentum value. Defaults to None.
begin (int) – Step at which to start updating the momentum. Defaults to 0.
end (int) – Step at which to stop updating the momentum. Defaults to INF.
last_step (int) – The index of the last step. Used for resuming without a state dict. Defaults to -1.
by_epoch (bool) – Whether the scheduled momentum is updated by epochs. Defaults to True.
verbose (bool) – Whether to print the momentum for each update. Defaults to False.
eta_min_ratio (float, optional) – The ratio of the minimum parameter value to the base parameter value. Either eta_min or eta_min_ratio should be specified. Defaults to None. New in version 0.3.2.
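The relationship between eta_min and eta_min_ratio described above can be sketched as follows. This is a minimal illustration, not the library's implementation. The helper name `resolve_eta_min` is hypothetical, and the strict "exactly one of the two" check is an assumed reading of "either eta_min or eta_min_ratio should be specified"; this section does not say what happens when neither is given.

```python
def resolve_eta_min(base_value, eta_min=None, eta_min_ratio=None):
    """Return the minimum momentum implied by the two options.

    Hypothetical helper, not part of mmengine. Assumes exactly one of
    eta_min and eta_min_ratio is supplied.
    """
    if (eta_min is None) == (eta_min_ratio is None):
        # Assumption: reject both-given and neither-given in this sketch.
        raise ValueError("specify exactly one of eta_min and eta_min_ratio")
    if eta_min is not None:
        # An absolute minimum is used as-is.
        return eta_min
    # A ratio is interpreted relative to the base (initial) value.
    return base_value * eta_min_ratio


print(resolve_eta_min(0.9, eta_min=0.8))
print(resolve_eta_min(0.9, eta_min_ratio=0.5))
```

With a ratio, the minimum scales with each parameter group's base value, which may be convenient when groups start from different momentum values.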