DeepSpeedStrategy

class mmengine._strategy.DeepSpeedStrategy(*, config=None, zero_optimization=None, gradient_clipping=None, fp16=None, inputs_to_half=None, bf16=None, amp=None, activation_checkpointing=None, aio=None, train_micro_batch_size_per_gpu=None, gradient_accumulation_steps=None, steps_per_print=10000000000000, exclude_frozen_parameters=None, **kwargs)[source]

Support training models with DeepSpeed.

Note

The detailed usage of parameters can be found at https://www.deepspeed.ai/docs/config-json/.

Parameters:
  • config (str or dict, optional) – If it is a string, it is a path to load config for deepspeed. Defaults to None.

  • zero_optimization (dict, optional) – Enabling and configuring ZeRO memory optimizations. Defaults to None.

  • gradient_clipping (float, optional) – Enable gradient clipping with value. Defaults to None.

  • fp16 (dict, optional) – Configuration for using mixed precision/FP16 training that leverages NVIDIA’s Apex package. Defaults to None.

  • inputs_to_half (list[int or str], optional) – Which inputs are to be converted to half precision. If fp16 is enabled, this should also be set. Defaults to None.

  • bf16 (dict, optional) – Configuration for using bfloat16 floating-point format as an alternative to FP16. Defaults to None.

  • amp (dict, optional) – Configuration for using automatic mixed precision (AMP) training that leverages NVIDIA’s Apex AMP package. Defaults to None.

  • activation_checkpointing (dict, optional) – Reduce memory usage by clearing activations of certain layers and recomputing them during a backward pass. Defaults to None.

  • aio (dict, optional) – Configuring the asynchronous I/O module for offloading parameter and optimizer states to persistent (NVMe) storage. This module uses Linux native asynchronous I/O (libaio). Defaults to None.

  • train_micro_batch_size_per_gpu (int, optional) – Batch size to be processed by one GPU in one step (without gradient accumulation). Defaults to None.

  • gradient_accumulation_steps (int, optional) – Number of training steps to accumulate gradients before averaging and applying them. Defaults to None.

  • exclude_frozen_parameters (bool, optional) – Whether to exclude frozen parameters from the saved checkpoint. Defaults to None.

  • steps_per_print (int) – Print a training progress report every N steps. Defaults to 10000000000000, which effectively disables printing.
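Putting the parameters above together, a minimal configuration sketch might look like the following. The dict fields follow the DeepSpeed config-json schema linked in the note above; the particular values are illustrative assumptions, not recommendations.

```python
# Illustrative DeepSpeed settings assembled as plain dicts; each top-level key
# maps to a keyword argument of DeepSpeedStrategy. Values are examples only.
zero_optimization = {
    "stage": 2,                    # ZeRO stage (0-3)
    "allgather_bucket_size": 5e8,  # communication bucket size, in elements
}
fp16 = {
    "enabled": True,
    "loss_scale": 0,               # 0 selects dynamic loss scaling
}
strategy_kwargs = dict(
    zero_optimization=zero_optimization,
    fp16=fp16,
    gradient_clipping=1.0,
    train_micro_batch_size_per_gpu=8,
    gradient_accumulation_steps=4,
)

# With mmengine and deepspeed installed, this would be used as:
# from mmengine._strategy import DeepSpeedStrategy
# strategy = DeepSpeedStrategy(**strategy_kwargs)
```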

load_checkpoint(filename, *, map_location='cpu', strict=False, revise_keys=[('^module.', '')], callback=None)[source]

Load checkpoint from given filename.

Warning

The map_location and callback parameters are not supported yet.

Parameters:
  • filename (str) – Accept local filepath, URL, torchvision://xxx, open-mmlab://xxx.

  • map_location (str | Callable) –

  • strict (bool) – Whether to strictly enforce that the keys in the checkpoint match the keys of the model's state_dict. Defaults to False.

  • revise_keys (list) – A list of (pattern, replacement) regular-expression pairs used to rewrite the keys of the loaded state_dict. Defaults to [('^module.', '')], which strips the prefix added by DistributedDataParallel.

  • callback (Callable | None) –

Return type:

dict
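As a simplified sketch of how revise_keys behaves (the mechanism, not the library's exact implementation): each (pattern, replacement) pair is applied to every key of the loaded state_dict with re.sub. The default pair strips the "module." prefix that DistributedDataParallel adds when wrapping a model.

```python
import re

# Default revise_keys: strip the "module." prefix from every state-dict key.
revise_keys = [(r"^module\.", "")]

# A toy state_dict as saved from a DDP-wrapped model (values elided).
state_dict = {"module.backbone.conv1.weight": "...", "module.head.fc.bias": "..."}

# Apply each (pattern, replacement) pair to every key.
for pattern, repl in revise_keys:
    state_dict = {re.sub(pattern, repl, k): v for k, v in state_dict.items()}

print(sorted(state_dict))  # ['backbone.conv1.weight', 'head.fc.bias']
```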

prepare(model, *, optim_wrapper=None, param_scheduler=None, compile=False, dispatch_kwargs=None)[source]

Prepare the model and, optionally, its optimizer wrapper and parameter schedulers.

Parameters:
  • model (torch.nn.Module or dict) – The model to be prepared. If it is a dict, it will be used to build the model.

Keyword Arguments:
  • optim_wrapper (BaseOptimWrapper or dict, optional) – Computing the gradient of model parameters and updating them. Defaults to None. See build_optim_wrapper() for examples.

  • param_scheduler (_ParamScheduler or dict or list, optional) – Parameter scheduler for updating optimizer parameters. If specified, optim_wrapper should also be specified. Defaults to None. See build_param_scheduler() for examples.

  • compile (dict or bool, optional) – Config for compiling the model with torch.compile. Defaults to False (no compilation). Requires PyTorch>=2.0.

  • dispatch_kwargs (dict, optional) – Kwargs to be passed to other methods of Strategy. Defaults to None.
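The optim_wrapper and param_scheduler arguments accept MMEngine-style config dicts. A hedged sketch of what those dicts might look like follows; the specific optimizer, scheduler, and dispatch keys are illustrative assumptions, not verified defaults.

```python
# Hypothetical configs to pass to prepare(); the dict layout follows MMEngine's
# config style, but the concrete values here are assumptions for illustration.
optim_wrapper = dict(
    type="OptimWrapper",
    optimizer=dict(type="AdamW", lr=1e-4, weight_decay=0.01),
)
param_scheduler = dict(type="LinearLR", start_factor=0.1, by_epoch=False)
dispatch_kwargs = dict(max_iters=10000)  # forwarded to other Strategy methods

# With a model and an initialized strategy, the call would look like:
# strategy.prepare(
#     model,
#     optim_wrapper=optim_wrapper,
#     param_scheduler=param_scheduler,
#     dispatch_kwargs=dispatch_kwargs,
# )
```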

resume(filename, *, resume_optimizer=True, resume_param_scheduler=True, map_location='default', callback=None)[source]

Resume training from given filename.

Warning

map_location and callback parameters are not supported yet.

Parameters:
  • filename (str) – Accept local filepath.

  • map_location (str | Callable) –

  • callback (Callable | None) –

Keyword Arguments:
  • resume_optimizer (bool) – Whether to resume optimizer state. Defaults to True.

  • resume_param_scheduler (bool) – Whether to resume param scheduler state. Defaults to True.

Return type:

dict

save_checkpoint(filename, *, save_optimizer=True, save_param_scheduler=True, extra_ckpt=None, callback=None)[source]

Save checkpoint to given filename.

Warning

callback parameter is not supported yet.

Parameters:
  • filename (str) – Filename to save checkpoint.

  • callback (Callable | None) –

Keyword Arguments:
  • save_optimizer (bool) – Whether to save the optimizer state to the checkpoint. Defaults to True.

  • save_param_scheduler (bool) – Whether to save the param_scheduler to the checkpoint. Defaults to True.

  • extra_ckpt (dict, optional) – Extra checkpoint to save. Defaults to None.

Return type:

None
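A save/resume cycle ties save_checkpoint() and resume() together. The sketch below shows the shape such a cycle might take; the keys inside extra_ckpt and the filename convention are assumptions for illustration, not a prescribed format.

```python
# Illustrative bookkeeping around save_checkpoint() and resume().
# The extra_ckpt keys and the filename below are hypothetical examples.
extra_ckpt = dict(meta=dict(epoch=3, iter=12000, seed=42))
filename = "work_dir/epoch_3.pth"

# With an initialized strategy, a save/resume cycle would look like:
# strategy.save_checkpoint(filename, save_optimizer=True, extra_ckpt=extra_ckpt)
# ...later, in a new run:
# ckpt = strategy.resume(
#     filename, resume_optimizer=True, resume_param_scheduler=True)
```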

Read the Docs v: v0.10.4