Speed up Training¶
The usage of distributed training has been moved to Distributed Training.
Mixed Precision Training¶
Nvidia introduced the Tensor Core unit into the Volta and Turing architectures to support mixed precision computing with FP32 and FP16, and added BF16 support in the Ampere architecture. With automatic mixed precision training enabled, some operators run in FP16/BF16 while the rest run in FP32, which reduces training time and memory usage without changing the model or degrading its training accuracy, thus supporting training with larger batch sizes, larger models, and larger input sizes.
MMEngine provides AmpOptimWrapper for automatic mixed precision training. Simply set the type of
optim_wrapper to AmpOptimWrapper to enable it; no other code changes are needed.
```python
runner = Runner(
    model=ResNet18(),
    work_dir='./work_dir',
    train_dataloader=train_dataloader_cfg,
    optim_wrapper=dict(
        type='AmpOptimWrapper',
        # If you want to use bfloat16, uncomment the following line
        # dtype='bfloat16',  # valid values: ('float16', 'bfloat16', None)
        optimizer=dict(type='SGD', lr=0.001, momentum=0.9)),
    train_cfg=dict(by_epoch=True, max_epochs=3),
)
runner.train()
```
Up to PyTorch 1.13, torch.bfloat16 performance for convolutions is poor unless the environment variable TORCH_CUDNN_V8_API_ENABLED=1 is set manually. See the related PyTorch issue for more context.
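One way to set the variable, sketched below, is to export it before launching the training script (e.g. `TORCH_CUDNN_V8_API_ENABLED=1 python train.py`) or to set it from Python before torch is imported; the exact point at which PyTorch reads the variable is an assumption here, so setting it as early as possible is the safest option.

```python
import os

# Enable the cuDNN v8 API so that bf16 convolutions take the faster code path.
# Set it before importing torch so the variable is visible when cuDNN is initialized.
os.environ['TORCH_CUDNN_V8_API_ENABLED'] = '1'

import torch  # noqa: E402  (imported after the environment variable is set)
```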
Model Compilation¶
PyTorch introduced torch.compile in its 2.0 release. It compiles your model to speed up training and validation. This feature has been available since MMEngine v0.7.0 and can be enabled by passing an extra
cfg dict with compile=True to the
Runner:
```python
runner = Runner(
    model=ResNet18(),
    ...  # other arguments you want
    cfg=dict(compile=True)
)
```
For advanced usage, you can also change the compile options as illustrated in the torch.compile API Documentation. For example:
```python
compile_options = dict(backend='inductor', mode='max-autotune')
runner = Runner(
    model=ResNet18(),
    ...  # other arguments you want
    cfg=dict(compile=compile_options)
)
```
This feature is only available for PyTorch >= 2.0.0.
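If the same script must also run on environments with an older PyTorch, a minimal sketch of guarding the option is shown below; it assumes mmengine.utils.digit_version for the version comparison and otherwise mirrors the earlier example.

```python
import torch
from mmengine.utils import digit_version

# torch.compile is only usable with PyTorch >= 2.0.0, so enable it conditionally
enable_compile = digit_version(torch.__version__) >= digit_version('2.0.0')

runner = Runner(
    model=ResNet18(),
    work_dir='./work_dir',
    train_dataloader=train_dataloader_cfg,
    optim_wrapper=dict(optimizer=dict(type='SGD', lr=0.001, momentum=0.9)),
    train_cfg=dict(by_epoch=True, max_epochs=3),
    cfg=dict(compile=True) if enable_compile else None,
)
runner.train()
```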
Using faster Optimizers¶
If Ascend devices are used, you can use the Ascend optimizers to shorten the training time of the model. The optimizers supported by Ascend devices are as follows:
The usage is the same as that of native optimizers; you can refer to Using Optimizers for more information.
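As a minimal sketch, assuming an Ascend-fused optimizer such as NpuFusedSGD is registered in your environment, the configuration looks exactly like the native-optimizer examples above:

```python
runner = Runner(
    model=ResNet18(),
    work_dir='./work_dir',
    train_dataloader=train_dataloader_cfg,
    # `NpuFusedSGD` is used here for illustration; pick any optimizer
    # supported by your Ascend environment
    optim_wrapper=dict(
        optimizer=dict(type='NpuFusedSGD', lr=0.001, momentum=0.9)),
    train_cfg=dict(by_epoch=True, max_epochs=3),
)
runner.train()
```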