DeepSpeedStrategy

class mmengine._strategy.DeepSpeedStrategy(*, config=None, zero_optimization=None, gradient_clipping=None, fp16=None, inputs_to_half=None, bf16=None, amp=None, activation_checkpointing=None, aio=None, train_micro_batch_size_per_gpu=None, gradient_accumulation_steps=None, steps_per_print=10000000000000, exclude_frozen_parameters=None, **kwargs)[source]

Support training models with DeepSpeed.

Note

The detailed usage of parameters can be found at https://www.deepspeed.ai/docs/config-json/.

Parameters:
  • config (str or dict, optional) – If it is a string, it is the path from which to load the DeepSpeed config. Defaults to None.

  • zero_optimization (dict, optional) – Enabling and configuring ZeRO memory optimizations. Defaults to None.

  • gradient_clipping (float, optional) – Enable gradient clipping with the given value. Defaults to None.

  • fp16 (dict, optional) – Configuration for using mixed precision/FP16 training that leverages NVIDIA’s Apex package. Defaults to None.

  • inputs_to_half (list[int or str], optional) – Which inputs are to be converted to half precision. Defaults to None. If fp16 is enabled, this should also be set.

  • bf16 (dict, optional) – Configuration for using bfloat16 floating-point format as an alternative to FP16. Defaults to None.

  • amp (dict, optional) – Configuration for using automatic mixed precision (AMP) training that leverages NVIDIA’s Apex AMP package. Defaults to None.

  • activation_checkpointing (dict, optional) – Reduce memory usage by clearing activations of certain layers and recomputing them during a backward pass. Defaults to None.

  • aio (dict, optional) – Configuring the asynchronous I/O module for offloading parameter and optimizer states to persistent (NVMe) storage. This module uses Linux native asynchronous I/O (libaio). Defaults to None.

  • train_micro_batch_size_per_gpu (int, optional) – Batch size to be processed by one GPU in one step (without gradient accumulation). Defaults to None.

  • gradient_accumulation_steps (int, optional) – Number of training steps to accumulate gradients before averaging and applying them. Defaults to None.

  • exclude_frozen_parameters (bool, optional) – Whether to exclude frozen parameters from the saved checkpoint. Defaults to None.

  • steps_per_print (int) – Print training progress every steps_per_print steps. Defaults to 10000000000000, which effectively disables periodic printing.
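The constructor's keyword arguments map onto top-level fields of a DeepSpeed JSON config. The sketch below assembles such a config as a plain dict; the field names follow the DeepSpeed config reference, while the concrete values are illustrative, not recommendations:

```python
# Illustrative DeepSpeed-style config assembled as a plain dict.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,  # batch size per GPU per step
    "gradient_accumulation_steps": 8,     # accumulate before each update
    "gradient_clipping": 1.0,             # clip gradients to this norm
    "fp16": {"enabled": True},            # FP16 mixed-precision training
    "zero_optimization": {"stage": 2},    # ZeRO stage-2 state partitioning
}

# The effective global batch size is the product of the micro batch size,
# the accumulation steps, and the number of GPUs (world size).
world_size = 2  # assumed number of GPUs, for illustration only
global_batch = (ds_config["train_micro_batch_size_per_gpu"]
                * ds_config["gradient_accumulation_steps"]
                * world_size)
```

The same fields can then be passed as keyword arguments, e.g. `DeepSpeedStrategy(**ds_config)`, or written to a JSON file and referenced via `config`.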

load_checkpoint(filename, *, map_location='cpu', strict=False, revise_keys=[('^module.', '')], callback=None)[source]

Load checkpoint from given filename.

Warning

The map_location and callback parameters are not supported yet.

Parameters:
  • filename (str) – Accepts a local filepath, a URL, torchvision://xxx, or open-mmlab://xxx.

  • map_location (str | Callable) – Same as torch.load(). Defaults to 'cpu'.

  • strict (bool) – Whether to allow different params for the model and checkpoint. Defaults to False.

  • revise_keys (list) – A list of (pattern, replacement) regular-expression pairs used to revise keys in the checkpoint's state_dict. Defaults to [('^module.', '')], which strips the prefix added by DataParallel/DistributedDataParallel wrappers.

  • callback (Callable | None) – Callback function invoked after loading the checkpoint. Defaults to None.

Return type:

dict
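The revise_keys mechanism applies each (pattern, replacement) regex pair to every key of the loaded state dict. A minimal standalone sketch of that behavior (independent of DeepSpeed; the helper name is hypothetical, written here to mirror the documented default):

```python
import re

def revise_state_dict_keys(state_dict, revise_keys=[("^module.", "")]):
    """Apply each (pattern, replacement) regex pair to every key,
    mimicking the documented revise_keys behavior of load_checkpoint.
    The mutable default mirrors the method's own signature."""
    for pattern, replacement in revise_keys:
        state_dict = {re.sub(pattern, replacement, k): v
                      for k, v in state_dict.items()}
    return state_dict

# Keys saved from a DistributedDataParallel-wrapped model carry a
# "module." prefix; the default revise_keys strips it.
ckpt = {"module.backbone.weight": 1, "module.head.bias": 2}
clean = revise_state_dict_keys(ckpt)
```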

prepare(model, *, optim_wrapper=None, param_scheduler=None, compile=False, dispatch_kwargs=None)[source]

Prepare model and some components.

Parameters:
  • model (torch.nn.Module or dict) – The model to be run. It can be a dict used to build the model.

Keyword Arguments:
  • optim_wrapper (BaseOptimWrapper or dict, optional) – Computing the gradient of model parameters and updating them. Defaults to None. See build_optim_wrapper() for examples.

  • param_scheduler (_ParamScheduler or dict or list, optional) – Parameter scheduler for updating optimizer parameters. If specified, optim_wrapper should also be specified. Defaults to None. See build_param_scheduler() for examples.

  • compile (dict, optional) – Config to compile model. Defaults to False. Requires PyTorch>=2.0.

  • dispatch_kwargs (dict, optional) – Kwargs to be passed to other methods of Strategy. Defaults to None.
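A hedged sketch of the dict-style arguments prepare() accepts. The registry type names ('OptimWrapper', 'AdamW', 'LinearLR') are common MMEngine registry keys shown for illustration; running prepare() itself requires DeepSpeed and a launched distributed environment:

```python
# Config-style arguments for prepare(), expressed as plain dicts.
optim_wrapper_cfg = dict(
    type="OptimWrapper",
    optimizer=dict(type="AdamW", lr=1e-3, weight_decay=0.01),
)
param_scheduler_cfg = [
    # Warm up the learning rate over the first 500 iterations.
    dict(type="LinearLR", start_factor=0.1, by_epoch=False, end=500),
]
dispatch_kwargs = dict(max_iters=10_000)  # forwarded to other Strategy methods
```

These would then be passed as, e.g., `strategy.prepare(model, optim_wrapper=optim_wrapper_cfg, param_scheduler=param_scheduler_cfg, dispatch_kwargs=dispatch_kwargs)`.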

resume(filename, *, resume_optimizer=True, resume_param_scheduler=True, map_location='default', callback=None)[source]

Resume training from given filename.

Warning

The map_location and callback parameters are not supported yet.

Parameters:
  • filename (str) – Accepts a local filepath.

Keyword Arguments:
  • resume_optimizer (bool) – Whether to resume optimizer state. Defaults to True.

  • resume_param_scheduler (bool) – Whether to resume param scheduler state. Defaults to True.

  • map_location (str | Callable) – Same as torch.load(). Defaults to 'default'.

  • callback (Callable | None) – Callback function invoked after resuming the checkpoint. Defaults to None.

Return type:

dict

save_checkpoint(filename, *, save_optimizer=True, save_param_scheduler=True, extra_ckpt=None, callback=None)[source]

Save checkpoint to given filename.

Warning

The callback parameter is not supported yet.

Parameters:
  • filename (str) – Filename to save checkpoint.

Keyword Arguments:
  • save_optimizer (bool) – Whether to save the optimizer to the checkpoint. Defaults to True.

  • save_param_scheduler (bool) – Whether to save the param_scheduler to the checkpoint. Defaults to True.

  • extra_ckpt (dict, optional) – Extra checkpoint to save. Defaults to None.

  • callback (Callable | None) – Callback function invoked before saving the checkpoint. Defaults to None.

Return type:

None
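save_checkpoint and resume are typically used as a pair, with extra_ckpt carrying user state that should survive the round trip. The payload layout and filename pattern below are assumptions for this sketch, not MMEngine requirements:

```python
# Illustrative extra_ckpt payload and checkpoint naming. The "meta" key
# and its contents are an assumed layout for this sketch.
iteration = 2000
extra_ckpt = {"meta": {"iter": iteration, "seed": 42}}
filename = f"iter_{iteration}.pth"
```

In a training loop one would then call `strategy.save_checkpoint(filename, extra_ckpt=extra_ckpt)` at a checkpoint interval, and later `strategy.resume(filename)` to restore model, optimizer, and scheduler state and get the remaining checkpoint dict back.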
