
DeepSpeedStrategy

class mmengine._strategy.DeepSpeedStrategy(*, config=None, zero_optimization=None, gradient_clipping=1.0, fp16=None, inputs_to_half=None, bf16=None, amp=None, activation_checkpointing=None, aio=None, train_micro_batch_size_per_gpu=None, gradient_accumulation_steps=1, steps_per_print=10000000000000, **kwargs)[source]

Support training models with DeepSpeed.

Note

The detailed usage of parameters can be found at https://www.deepspeed.ai/docs/config-json/.

Parameters
  • config (str or dict, optional) – If it is a string, it is treated as the path of a DeepSpeed config file to load. Defaults to None.

  • zero_optimization (dict, optional) – Enabling and configuring ZeRO memory optimizations. Defaults to None.

  • gradient_clipping (float) – Enable gradient clipping with the given value. Defaults to 1.0.

  • fp16 (dict, optional) – Configuration for using mixed precision/FP16 training that leverages NVIDIA’s Apex package. Defaults to None.

  • inputs_to_half (list[int or str], optional) – Which inputs are to be converted to half precision. Defaults to None. If fp16 is enabled, this should also be set.

  • bf16 (dict, optional) – Configuration for using bfloat16 floating-point format as an alternative to FP16. Defaults to None.

  • amp (dict, optional) – Configuration for using automatic mixed precision (AMP) training that leverages NVIDIA’s Apex AMP package. Defaults to None.

  • activation_checkpointing (dict, optional) – Reduce memory usage by clearing activations of certain layers and recomputing them during a backward pass. Defaults to None.

  • aio (dict, optional) – Configuring the asynchronous I/O module for offloading parameter and optimizer states to persistent (NVMe) storage. This module uses Linux native asynchronous I/O (libaio). Defaults to None.

  • train_micro_batch_size_per_gpu (int, optional) – Batch size to be processed by one GPU in one step, excluding gradient accumulation. Defaults to None.

  • gradient_accumulation_steps (int) – Number of training steps over which gradients are accumulated before averaging and applying them. Defaults to 1.

  • steps_per_print (int) – Print training progress every given number of steps. Defaults to 10000000000000.
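
Example

A minimal construction sketch is shown below. The concrete ZeRO, fp16, and batch-size values are illustrative assumptions, not defaults, and the deepspeed package must be installed.

    from mmengine._strategy import DeepSpeedStrategy

    strategy = DeepSpeedStrategy(
        fp16=dict(enabled=True, initial_scale_power=16),  # DeepSpeed fp16 config block
        inputs_to_half=[0],                  # cast the first positional input to half precision
        zero_optimization=dict(stage=2),     # enable ZeRO stage-2 memory optimization
        gradient_clipping=1.0,
        train_micro_batch_size_per_gpu=8,
        gradient_accumulation_steps=4,
    )

When used through FlexibleRunner, the same keys are typically passed as strategy=dict(type='DeepSpeedStrategy', ...) rather than by instantiating the class directly.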

load_checkpoint(filename, *, map_location='cpu', strict=False, revise_keys=[('^module.', '')], callback=None)[source]

Load checkpoint from given filename.

Warning

map_location and callback parameters are not supported yet.

Parameters
  • filename (str) – Path of the checkpoint file.

Return type

dict
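
Example

A usage sketch, assuming the strategy has already been prepared and that 'work_dirs/epoch_1.pth' is a hypothetical path previously written by save_checkpoint().

    # Returns the non-model states stored in the checkpoint as a dict.
    extra_ckpt = strategy.load_checkpoint('work_dirs/epoch_1.pth', strict=True)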

prepare(model, *, optim_wrapper=None, param_scheduler=None, compile=False, dispatch_kwargs=None)[source]

Prepare model and some components.

Parameters
  • model (torch.nn.Module or dict) – The model to be prepared. If it is a dict, it will be used to build the model.

Keyword Arguments
  • optim_wrapper (BaseOptimWrapper or dict, optional) – Computing the gradient of model parameters and updating them. Defaults to None. See build_optim_wrapper() for examples.

  • param_scheduler (_ParamScheduler or dict or list, optional) – Parameter scheduler for updating optimizer parameters. If specified, optim_wrapper should also be specified. Defaults to None. See build_param_scheduler() for examples.

  • compile (dict or bool, optional) – Config for compiling the model. Defaults to False, which disables compilation. Requires PyTorch >= 2.0.

  • dispatch_kwargs (dict, optional) – Kwargs to be passed to other methods of Strategy. Defaults to None.
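
Example

A sketch of a typical call, assuming the distributed environment has already been initialized and that the DeepSpeedOptimWrapper config style is used; the model, learning rate, and dispatch_kwargs keys are placeholders.

    import torch.nn as nn

    model = nn.Linear(8, 2)
    strategy.prepare(
        model,
        optim_wrapper=dict(
            type='DeepSpeedOptimWrapper',           # assumed optimizer wrapper type
            optimizer=dict(type='AdamW', lr=1e-3),
        ),
        param_scheduler=dict(type='LinearLR'),      # optional; requires optim_wrapper
        dispatch_kwargs=dict(max_iters=1000),       # hypothetical keys forwarded to other methods
    )
    # After prepare(), the wrapped components are available as attributes,
    # e.g. strategy.model and strategy.optim_wrapper.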

resume(filename, *, resume_optimizer=True, resume_param_scheduler=True, map_location='default', callback=None)[source]

Resume training from given filename.

Warning

map_location and callback parameters are not supported yet.

Parameters
  • filename (str) – Path of the checkpoint file.

Keyword Arguments
  • resume_optimizer (bool) – Whether to resume optimizer state. Defaults to True.

  • resume_param_scheduler (bool) – Whether to resume param scheduler state. Defaults to True.

Return type

dict
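
Example

A minimal sketch; the path is a placeholder for a checkpoint written earlier by save_checkpoint().

    # Optimizer and param scheduler states are restored by default.
    extra_ckpt = strategy.resume('work_dirs/epoch_1.pth')
    # Skip restoring the optimizer state if it is not needed:
    # strategy.resume('work_dirs/epoch_1.pth', resume_optimizer=False)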

save_checkpoint(filename, *, save_optimizer=True, save_param_scheduler=True, extra_ckpt=None, callback=None)[source]

Save checkpoint to given filename.

Warning

save_optimizer and callback parameters are not supported yet.

Parameters
  • filename (str) – Path of the checkpoint file.

Keyword Arguments
  • save_param_scheduler (bool) – Whether to save the param_scheduler to the checkpoint. Defaults to True.

  • extra_ckpt (dict, optional) – Extra checkpoint to save. Defaults to None.

Return type

None
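
Example

A minimal sketch, assuming training metadata is passed through extra_ckpt; the path and meta contents are placeholders.

    strategy.save_checkpoint(
        'work_dirs/epoch_1.pth',
        save_param_scheduler=True,
        extra_ckpt=dict(meta=dict(epoch=1, iter=1000)),  # extra states stored alongside the model
    )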
