ApexOptimWrapper

class mmengine.optim.ApexOptimWrapper(opt_level='O1', loss_scale='dynamic', enabled=True, cast_model_type=None, patch_torch_functions=None, keep_batchnorm_fp32=None, master_weights=None, cast_model_outputs=None, num_losses=1, verbosity=1, min_loss_scale=None, max_loss_scale=16777216.0, **kwargs)

A subclass of OptimWrapper that supports automatic mixed precision training based on apex.amp.

ApexOptimWrapper provides a unified interface with OptimWrapper, so it can be used in the same way as OptimWrapper.
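Because the interface is shared, moving an mmengine-style config from OptimWrapper to ApexOptimWrapper is mostly a matter of changing type and adding the amp options. The snippet below is a minimal sketch under that assumption; the SGD optimizer and its hyperparameters are illustrative choices, and in a real config file the chosen dict would be assigned to the variable optim_wrapper.

```python
# Plain FP32 training with the default wrapper.
# SGD and its hyperparameters are illustrative assumptions.
optim_wrapper_fp32 = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9),
)

# Same optimizer, now wrapped with apex.amp mixed precision.
optim_wrapper_apex = dict(
    type='ApexOptimWrapper',
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9),
    opt_level='O1',          # mixed precision, the class default
    loss_scale='dynamic',    # dynamic loss scaling, the class default
)
```

Only type and the amp-specific keys differ; the optimizer section is unchanged.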

Warning

ApexOptimWrapper requires NVIDIA apex to be installed.

Parameters

opt_level (str) – Pure or mixed precision optimization level. Accepted values are “O0”, “O1”, “O2”, and “O3”. Defaults to “O1”.
loss_scale (float or str, optional) – If passed as a string, must be a string representing a number, e.g., “128.0”, or the string “dynamic”. Defaults to “dynamic”.
enabled (bool) – If False, renders all Amp calls no-ops, so your script should run as if Amp were not present. Defaults to True.
cast_model_type (torch.dtype, optional) – Casts the model’s parameters and buffers to the desired type. Defaults to None.
patch_torch_functions (bool, optional) – Patch all Torch functions and Tensor methods to perform Tensor Core-friendly ops like GEMMs and convolutions in FP16, and any ops that benefit from FP32 precision in FP32. Defaults to None.
keep_batchnorm_fp32 (bool or str, optional) – To enhance precision and enable cudnn batchnorm (which improves performance), it’s often beneficial to keep batchnorm weights in FP32 even if the rest of the model is FP16. If passed as a string, must be the string “True” or “False”. Defaults to None.
master_weights (bool, optional) – Maintain FP32 master weights to accompany any FP16 model weights. FP32 master weights are stepped by the optimizer to enhance precision and capture small gradients. Defaults to None.
cast_model_outputs (torch.dtype, optional) – Option to ensure that the outputs of your model(s) are always cast to a particular type regardless of opt_level. Defaults to None.
num_losses (int) – Option to tell Amp in advance how many losses/backward passes you plan to use. Defaults to 1.
verbosity (int) – Set to 0 to suppress Amp-related output. Defaults to 1.
min_loss_scale (float, optional) – Sets a floor for the loss scale values that can be chosen by dynamic loss scaling. The default value of None means that no floor is imposed. If dynamic loss scaling is not used, min_loss_scale is ignored. Defaults to None.
max_loss_scale (float, optional) – Sets a ceiling for the loss scale values that can be chosen by dynamic loss scaling. If dynamic loss scaling is not used, max_loss_scale is ignored. Defaults to 2.**24.
**kwargs – Keyword arguments passed to OptimWrapper.
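How loss_scale="dynamic", min_loss_scale and max_loss_scale interact can be illustrated with a toy pure-Python scaler. This is a simplified sketch, assuming the common dynamic scheme: halve the scale when gradients overflow (and skip that step), double it after a fixed window of overflow-free steps, and clamp the result to the floor and ceiling. The class name ToyDynamicLossScaler and the init_scale and scale_window defaults are assumptions for illustration; apex’s internals may differ.

```python
class ToyDynamicLossScaler:
    """Hypothetical, simplified dynamic loss scaler (not apex's implementation)."""

    def __init__(self, init_scale=2.0 ** 16, scale_window=2000,
                 min_loss_scale=None, max_loss_scale=2.0 ** 24):
        self.scale = init_scale
        self.scale_window = scale_window      # overflow-free steps before growing
        self.min_loss_scale = min_loss_scale  # None means no floor
        self.max_loss_scale = max_loss_scale
        self._good_steps = 0                  # consecutive steps without overflow

    def update(self, overflow: bool) -> bool:
        """Adjust the scale after one step; return True if the step is skipped."""
        if overflow:
            self.scale /= 2.0
            if self.min_loss_scale is not None:
                # Floor: never shrink below min_loss_scale.
                self.scale = max(self.scale, self.min_loss_scale)
            self._good_steps = 0
            return True  # gradients held inf/NaN, so the optimizer step is skipped
        self._good_steps += 1
        if self._good_steps == self.scale_window:
            # Ceiling: never grow above max_loss_scale.
            self.scale = min(self.scale * 2.0, self.max_loss_scale)
            self._good_steps = 0
        return False
```

With a small window such as scale_window=2, repeated overflows stop shrinking the scale at min_loss_scale, and a long overflow-free run stops growing it at max_loss_scale.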

Note

If you use IterBasedRunner and enable gradient accumulation, the original max_iters should be multiplied by accumulative_counts.
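As a worked example of this note: with gradient accumulation, each optimizer update consumes accumulative_counts iterations, so the iteration budget must grow by the same factor to keep the number of updates unchanged. The numbers below are arbitrary, and accumulative_counts is assumed to be forwarded to OptimWrapper through **kwargs.

```python
# Arbitrary example values.
original_max_iters = 10_000   # iterations planned without accumulation
accumulative_counts = 4       # backward passes per optimizer update

# Each update now spans 4 iterations, so scale max_iters to keep 10k updates.
max_iters = original_max_iters * accumulative_counts

optim_wrapper = dict(
    type='ApexOptimWrapper',
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9),  # illustrative optimizer
    opt_level='O1',
    accumulative_counts=accumulative_counts,  # forwarded to OptimWrapper via **kwargs
)

# Iteration-based loop config in mmengine style (assumed layout).
train_cfg = dict(by_epoch=False, max_iters=max_iters)
```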

Note

New in version 0.6.0.

backward(loss, **kwargs)

Perform gradient back propagation with the loss scaled by apex.amp.

Parameters

loss (torch.Tensor) – The loss of current iteration.
kwargs – Keyword arguments passed to torch.Tensor.backward().

Return type

None
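Why scale the loss at all? Gradients are linear in the loss, so multiplying the loss by a scale S multiplies every gradient by S. Small gradients that would flush to zero in FP16 then stay representable, and they are divided by S again in higher precision before the optimizer step. The pure-Python sketch below emulates FP16 rounding with the standard library’s half-precision struct format ('e'); it illustrates the numerics only and does not call apex. The gradient value and the scale are arbitrary assumptions.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


grad = 1e-8          # a tiny gradient, below FP16's smallest subnormal (about 6e-8)
scale = 2.0 ** 16    # an arbitrary loss scale

unscaled_fp16 = to_fp16(grad)        # flushes to 0.0, so the update would be lost
scaled_fp16 = to_fp16(grad * scale)  # about 6.55e-4, comfortably representable
recovered = scaled_fp16 / scale      # unscale in full precision before stepping

print(unscaled_fp16, recovered)
```

The recovered value matches the original gradient to within FP16’s relative rounding error, whereas the unscaled FP16 value is exactly zero.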

load_state_dict(state_dict)

Load and parse the state dictionary of optimizer and apex_amp.

If state_dict contains the key “apex_amp”, apex_amp loads the corresponding entry. Otherwise, only the optimizer loads the state dictionary.

Note

load_state_dict() should be called after apex_amp.initialize has been called.

Parameters: state_dict (dict) – The state dict of the optimizer and apex_amp.
Return type: None
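The dispatch described above amounts to splitting one dictionary into two parts. The helper below is a hypothetical pure-Python sketch of that split; it does not call apex or mmengine, and the assumption that the optimizer receives everything except the “apex_amp” entry is labeled as such. The amp state values are placeholders, and their shape is an assumption.

```python
from typing import Any, Dict, Optional, Tuple


def split_apex_state(
    state_dict: Dict[str, Any],
) -> Tuple[Optional[Dict[str, Any]], Dict[str, Any]]:
    """Hypothetical helper: separate the 'apex_amp' entry from optimizer state.

    Returns (amp_state, optimizer_state). amp_state is None when the
    checkpoint has no 'apex_amp' key, in which case only the optimizer
    state would be loaded. Assumption: the optimizer gets all other keys.
    """
    optimizer_state = dict(state_dict)  # shallow copy; leave the caller's dict intact
    amp_state = optimizer_state.pop('apex_amp', None)
    return amp_state, optimizer_state


# Placeholder checkpoint contents for illustration only.
checkpoint = {
    'state': {},
    'param_groups': [{'lr': 0.01}],
    'apex_amp': {'loss_scaler0': {'loss_scale': 65536.0, 'unskipped': 0}},
}
amp_state, optimizer_state = split_apex_state(checkpoint)
```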

optim_context(model)

Enables the context for mixed precision training and, during gradient accumulation, disables gradient synchronization.

Parameters: model (nn.Module) – The training model.
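The gradient-synchronization half of this context can be sketched in pure Python. The sketch assumes a simplified accumulation rule: enter the model’s no_sync() context on every iteration except the one that completes an accumulation cycle, and do nothing special if the model has no no_sync method. ToyDDPModel, should_sync and optim_context_sketch are hypothetical names, and the apex.amp mixed-precision part is deliberately omitted.

```python
from contextlib import contextmanager


class ToyDDPModel:
    """Hypothetical stand-in for a DistributedDataParallel-wrapped model."""

    def __init__(self):
        self.grad_sync = True  # whether backward() would all-reduce gradients

    @contextmanager
    def no_sync(self):
        self.grad_sync = False
        try:
            yield
        finally:
            self.grad_sync = True


def should_sync(inner_count: int, accumulative_counts: int) -> bool:
    """Sync only on the iteration that completes an accumulation cycle."""
    return (inner_count + 1) % accumulative_counts == 0


@contextmanager
def optim_context_sketch(model, inner_count: int, accumulative_counts: int):
    # The real wrapper would also activate apex.amp mixed precision here (omitted).
    if not should_sync(inner_count, accumulative_counts) and hasattr(model, 'no_sync'):
        with model.no_sync():
            yield
    else:
        yield


model = ToyDDPModel()
observed = []
for it in range(4):
    with optim_context_sketch(model, inner_count=it, accumulative_counts=4):
        observed.append(model.grad_sync)  # forward and backward would run here
```

Skipping synchronization on the intermediate iterations avoids an all-reduce per backward pass; gradients are only synchronized once per accumulated update.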

state_dict()

Get the state dictionary of optimizer and apex_amp.

The returned dictionary is the optimizer’s state dictionary with an additional key named “apex_amp”.

Returns: The merged state dict of apex_amp and optimizer.
Return type: dict
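The merge described above can be sketched as a plain dictionary update. merge_apex_state is a hypothetical helper, and the amp state shown is a placeholder whose shape is an assumption; the real wrapper obtains the two parts from the optimizer and from apex.amp.

```python
from typing import Any, Dict


def merge_apex_state(
    optimizer_state: Dict[str, Any], amp_state: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical helper: return optimizer state plus an 'apex_amp' entry."""
    merged = dict(optimizer_state)  # shallow copy so the optimizer's dict is not mutated
    merged['apex_amp'] = amp_state
    return merged


optimizer_state = {'state': {}, 'param_groups': [{'lr': 0.01}]}
amp_state = {'loss_scaler0': {'loss_scale': 65536.0, 'unskipped': 0}}  # placeholder values
checkpoint = merge_apex_state(optimizer_state, amp_state)
```

Keeping the amp state under its own key is what allows load_state_dict() to tell the two parts apart when resuming.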