DeepSpeedStrategy¶
- class mmengine._strategy.DeepSpeedStrategy(*, config=None, zero_optimization=None, gradient_clipping=None, fp16=None, inputs_to_half=None, bf16=None, amp=None, activation_checkpointing=None, aio=None, train_micro_batch_size_per_gpu=None, gradient_accumulation_steps=None, steps_per_print=10000000000000, exclude_frozen_parameters=None, **kwargs)[source]¶
Support training models with DeepSpeed.
Note
The detailed usage of parameters can be found at https://www.deepspeed.ai/docs/config-json/.
- Parameters:
config (str or dict, optional) – If it is a string, it is the path to a DeepSpeed config file to load. Defaults to None.
zero_optimization (dict, optional) – Enabling and configuring ZeRO memory optimizations. Defaults to None.
gradient_clipping (float, optional) – Enable gradient clipping with value. Defaults to None.
fp16 (dict, optional) – Configuration for using mixed precision/FP16 training that leverages NVIDIA’s Apex package. Defaults to None.
inputs_to_half (list[int or str], optional) – Which inputs are to be converted to half precision. Defaults to None. If fp16 is enabled, it also should be set.
bf16 (dict, optional) – Configuration for using bfloat16 floating-point format as an alternative to FP16. Defaults to None.
amp (dict, optional) – Configuration for using automatic mixed precision (AMP) training that leverages NVIDIA’s Apex AMP package. Defaults to None.
activation_checkpointing (dict, optional) – Reduce memory usage by clearing activations of certain layers and recomputing them during a backward pass. Defaults to None.
aio (dict, optional) – Configuring the asynchronous I/O module for offloading parameter and optimizer states to persistent (NVMe) storage. This module uses Linux native asynchronous I/O (libaio). Defaults to None.
train_micro_batch_size_per_gpu (int, optional) – Batch size to be processed by one GPU in one step (without gradient accumulation). Defaults to None.
gradient_accumulation_steps (int, optional) – Number of training steps to accumulate gradients before averaging and applying them. Defaults to None.
exclude_frozen_parameters (bool, optional) – Whether to exclude frozen parameters from the saved checkpoint. Defaults to None.
steps_per_print (int) – Print training progress every N training steps. Defaults to 10000000000000.
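Example
A minimal construction sketch, assuming a typical ZeRO stage-2 + FP16 setup; the numeric values below are illustrative, not recommended defaults:
>>> from mmengine._strategy import DeepSpeedStrategy
>>> strategy = DeepSpeedStrategy(
...     zero_optimization=dict(stage=2),        # ZeRO stage-2 memory optimization
...     fp16=dict(enabled=True, loss_scale=0),  # dynamic loss scaling
...     gradient_clipping=1.0,
...     train_micro_batch_size_per_gpu=8,
...     gradient_accumulation_steps=2)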
- load_checkpoint(filename, *, map_location='cpu', strict=False, revise_keys=[('^module.', '')], callback=None)[source]¶
Load checkpoint from the given filename.
Warning
map_location and callback parameters are not supported yet.
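A minimal usage sketch, assuming a strategy constructed and prepared as in the sketches on this page; the checkpoint path is illustrative (DeepSpeed checkpoints are saved as directories):
>>> checkpoint = strategy.load_checkpoint('work_dirs/iter_1000')  # loads model weights only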
- prepare(model, *, optim_wrapper=None, param_scheduler=None, compile=False, dispatch_kwargs=None)[source]¶
Prepare model and some components.
- Parameters:
model (torch.nn.Module or dict) – The model to be run. It can be a dict used to build the model.
optim_wrapper (BaseOptimWrapper | dict | None) –
param_scheduler (_ParamScheduler | Dict | List | None) –
dispatch_kwargs (dict | None) –
- Keyword Arguments:
optim_wrapper (BaseOptimWrapper or dict, optional) – Computing the gradient of model parameters and updating them. Defaults to None. See build_optim_wrapper() for examples.
param_scheduler (_ParamScheduler or dict or list, optional) – Parameter scheduler for updating optimizer parameters. If specified, optim_wrapper should also be specified. Defaults to None. See build_param_scheduler() for examples.
compile (dict, optional) – Config to compile model. Defaults to False. Requires PyTorch>=2.0.
dispatch_kwargs (dict, optional) – Kwargs to be passed to other methods of Strategy. Defaults to None.
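Example
A hedged usage sketch. The toy model, the DeepSpeedOptimWrapper/AdamW optimizer config, and the dispatch_kwargs keys below are illustrative assumptions, not required values:
>>> import torch.nn as nn
>>> from mmengine._strategy import DeepSpeedStrategy
>>> strategy = DeepSpeedStrategy(
...     fp16=dict(enabled=True),
...     train_micro_batch_size_per_gpu=1)
>>> model = nn.Linear(2, 1)  # stand-in for a real model
>>> prepared = strategy.prepare(
...     model,
...     optim_wrapper=dict(
...         type='DeepSpeedOptimWrapper',
...         optimizer=dict(type='AdamW', lr=1e-3)),
...     dispatch_kwargs=dict(max_iters=1000))  # assumed keys, forwarded to other Strategy methods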
- resume(filename, *, resume_optimizer=True, resume_param_scheduler=True, map_location='default', callback=None)[source]¶
Resume training from the given filename.
Warning
map_location and callback parameters are not supported yet.
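A hedged sketch of resuming training state, assuming a strategy constructed and prepared as in the sketches above; the checkpoint path is illustrative:
>>> checkpoint = strategy.resume(
...     'work_dirs/latest',
...     resume_optimizer=True,
...     resume_param_scheduler=True)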