
# CosineAnnealingMomentum

class mmengine.optim.CosineAnnealingMomentum(optimizer, *args, **kwargs)[source]

Set the momentum of each parameter group using a cosine annealing schedule, where $$\eta_{max}$$ is the initial momentum and $$T_{cur}$$ is the number of epochs since the last restart in SGDR:

$\begin{aligned} \eta_t & = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right), & T_{cur} \neq (2k+1)T_{max}; \\ \eta_{t+1} & = \eta_{t} + \frac{1}{2}(\eta_{max} - \eta_{min}) \left(1 - \cos\left(\frac{1}{T_{max}}\pi\right)\right), & T_{cur} = (2k+1)T_{max}. \end{aligned}$

Because the schedule is defined recursively, other operators can modify the momentum outside this scheduler at the same time. If this scheduler alone sets the momentum, the value at each step is:

$\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)$

This schedule was proposed in "SGDR: Stochastic Gradient Descent with Warm Restarts". Note that this class implements only the cosine annealing part of SGDR, not the restarts.
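The relationship between the recursive definition and the closed form above can be checked numerically. The following is a minimal pure-Python sketch, not mmengine's implementation: the function names `closed_form` and `recursive_step` and the values `T_MAX = 10`, `ETA_MAX = 0.95`, `ETA_MIN = 0.85` are illustrative assumptions, and the ratio-based update is derived algebraically from the closed form rather than taken from the library.

```python
import math


def closed_form(t, t_max, eta_max, eta_min=0.0):
    """Closed-form cosine annealing value at step t."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / t_max))


def recursive_step(prev, t, t_max, eta_max, eta_min=0.0):
    """Compute the value at step t from the value at step t - 1.

    Hypothetical helper for illustration only.
    """
    if (t - 1 - t_max) % (2 * t_max) == 0:
        # The previous step sat at the minimum, T_cur = (2k+1) * T_max,
        # so the additive update from the second case of the formula applies.
        return prev + 0.5 * (eta_max - eta_min) * (1 - math.cos(math.pi / t_max))
    # Otherwise rescale the distance to eta_min by the ratio of cosine terms,
    # which follows from dividing the closed form at t by the one at t - 1.
    ratio = (1 + math.cos(math.pi * t / t_max)) / (
        1 + math.cos(math.pi * (t - 1) / t_max)
    )
    return ratio * (prev - eta_min) + eta_min


# Illustrative values, not library defaults.
T_MAX, ETA_MAX, ETA_MIN = 10, 0.95, 0.85

value = ETA_MAX  # the schedule starts from the initial momentum
values = [value]
for t in range(1, 3 * T_MAX + 1):
    value = recursive_step(value, t, T_MAX, ETA_MAX, ETA_MIN)
    values.append(value)

# Largest disagreement between the recursive and closed-form values.
max_gap = max(
    abs(v - closed_form(t, T_MAX, ETA_MAX, ETA_MIN)) for t, v in enumerate(values)
)
print(f"max gap between recursive and closed form: {max_gap:.2e}")
```

When nothing else touches the momentum, the two forms should agree up to floating-point error. If another component changes the momentum between steps, only the recursive form carries that change forward, which is the point made in the text about outside modification.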

• optimizer (Optimizer or OptimWrapper) – Optimizer or wrapped optimizer.

• T_max (int) – Maximum number of iterations.

• eta_min (float) – Minimum momentum value. Defaults to 0.

• begin (int) – Step at which to start updating the momentum. Defaults to 0.

• end (int) – Step at which to stop updating the momentum. Defaults to INF.

• last_step (int) – Index of the last step. Used for resuming without a state dict. Defaults to -1.

• by_epoch (bool) – Whether the scheduled momentum is updated by epochs. Defaults to True.

• verbose (bool) – Whether to print the momentum for each update. Defaults to False.
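The parameters above might be combined in an mmengine-style config as in the sketch below. It assumes the scheduler is registered under its class name and listed in `param_scheduler`, with the optimizer supplied by the runner. The values `max_epochs = 12` and `eta_min = 0.85` are hypothetical, and the optimizer is assumed to have a momentum setting (for example, SGD with momentum).

```python
# Hypothetical training length; adjust to your own schedule.
max_epochs = 12

param_scheduler = [
    dict(
        type='CosineAnnealingMomentum',
        T_max=max_epochs,  # length of one annealing period
        eta_min=0.85,      # illustrative minimum momentum, not a default
        begin=0,           # start updating at the first epoch
        end=max_epochs,    # stop updating after the last epoch
        by_epoch=True,     # count begin, end and T_max in epochs
    ),
]
```

Setting `T_max` equal to `end - begin` makes the momentum reach `eta_min` exactly when the scheduler stops updating. A larger `T_max` would stop the schedule partway down the cosine curve.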

© Copyright 2022, mmengine contributors. Revision 13484aae.
