
DeepSpeedStrategy

class mmengine._strategy.DeepSpeedStrategy(*, config=None, zero_optimization=None, gradient_clipping=1.0, fp16=None, inputs_to_half=None, bf16=None, amp=None, activation_checkpointing=None, aio=None, train_micro_batch_size_per_gpu=None, gradient_accumulation_steps=1, steps_per_print=10000000000000, **kwargs)[source]

Support training models with DeepSpeed.

Note

The detailed usage of parameters can be found at https://www.deepspeed.ai/docs/config-json/.

Parameters
  • config (str or dict, optional) – If it is a string, it is treated as the path of a DeepSpeed config file to load. Defaults to None.

  • zero_optimization (dict, optional) – Enabling and configuring ZeRO memory optimizations. Defaults to None.

  • gradient_clipping (float) – Enable gradient clipping with the given value. Defaults to 1.0.

  • fp16 (dict, optional) – Configuration for using mixed precision/FP16 training that leverages NVIDIA’s Apex package. Defaults to None.

  • inputs_to_half (list[int or str], optional) – Which inputs are to be converted to half precision. Defaults to None. If fp16 is enabled, this should also be set.

  • bf16 (dict, optional) – Configuration for using bfloat16 floating-point format as an alternative to FP16. Defaults to None.

  • amp (dict, optional) – Configuration for using automatic mixed precision (AMP) training that leverages NVIDIA’s Apex AMP package. Defaults to None.

  • activation_checkpointing (dict, optional) – Reduce memory usage by clearing activations of certain layers and recomputing them during a backward pass. Defaults to None.

  • aio (dict, optional) – Configuring the asynchronous I/O module for offloading parameter and optimizer states to persistent (NVMe) storage. This module uses Linux native asynchronous I/O (libaio). Defaults to None.

  • train_micro_batch_size_per_gpu (int, optional) – Batch size to be processed by one GPU in one step, excluding gradient accumulation. Defaults to None.

  • gradient_accumulation_steps (int) – Number of training steps over which gradients are accumulated before averaging and applying them. Defaults to 1.

  • steps_per_print (int) – Print training progress every given number of steps. Defaults to 10000000000000.
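
Example

A minimal construction sketch is shown below. The concrete ZeRO, fp16, and batch-size values are illustrative assumptions, not defaults, and the deepspeed package must be installed.

    from mmengine._strategy import DeepSpeedStrategy

    strategy = DeepSpeedStrategy(
        fp16=dict(enabled=True, initial_scale_power=16),  # DeepSpeed fp16 config block
        inputs_to_half=[0],                  # cast the first positional input to half precision
        zero_optimization=dict(stage=2),     # enable ZeRO stage-2 memory optimization
        gradient_clipping=1.0,
        train_micro_batch_size_per_gpu=8,
        gradient_accumulation_steps=4,
    )

When used through FlexibleRunner, the same keys are typically passed as strategy=dict(type='DeepSpeedStrategy', ...) rather than by instantiating the class directly.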

load_checkpoint(filename, *, map_location='cpu', strict=False, revise_keys=[('^module.', '')], callback=None)[source]

Load checkpoint from given filename.

Warning

map_location and callback parameters are not supported yet.

Parameters
  • filename (str) – Path of the checkpoint file.

Return type

dict
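
Example

A usage sketch, assuming the strategy has already been prepared and that 'work_dirs/epoch_1.pth' is a hypothetical path previously written by save_checkpoint().

    # Returns the non-model states stored in the checkpoint as a dict.
    extra_ckpt = strategy.load_checkpoint('work_dirs/epoch_1.pth', strict=True)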

prepare(model, *, optim_wrapper=None, param_scheduler=None, compile=False, dispatch_kwargs=None)[source]

Prepare model and some components.

Parameters
  • model (torch.nn.Module or dict) – The model to be prepared. If it is a dict, it will be used to build the model.

Keyword Arguments
  • optim_wrapper (BaseOptimWrapper or dict, optional) – Computing the gradient of model parameters and updating them. Defaults to None. See build_optim_wrapper() for examples.

  • param_scheduler (_ParamScheduler or dict or list, optional) – Parameter scheduler for updating optimizer parameters. If specified, optim_wrapper should also be specified. Defaults to None. See build_param_scheduler() for examples.

  • compile (dict or bool, optional) – Config for compiling the model. Defaults to False, which disables compilation. Requires PyTorch >= 2.0.

  • dispatch_kwargs (dict, optional) – Kwargs to be passed to other methods of Strategy. Defaults to None.
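
Example

A sketch of a typical call, assuming the distributed environment has already been initialized and that the DeepSpeedOptimWrapper config style is used; the model, learning rate, and dispatch_kwargs keys are placeholders.

    import torch.nn as nn

    model = nn.Linear(8, 2)
    strategy.prepare(
        model,
        optim_wrapper=dict(
            type='DeepSpeedOptimWrapper',           # assumed optimizer wrapper type
            optimizer=dict(type='AdamW', lr=1e-3),
        ),
        param_scheduler=dict(type='LinearLR'),      # optional; requires optim_wrapper
        dispatch_kwargs=dict(max_iters=1000),       # hypothetical keys forwarded to other methods
    )
    # After prepare(), the wrapped components are available as attributes,
    # e.g. strategy.model and strategy.optim_wrapper.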

resume(filename, *, resume_optimizer=True, resume_param_scheduler=True, map_location='default', callback=None)[source]

Resume training from given filename.

Warning

map_location and callback parameters are not supported yet.

Parameters
  • filename (str) – Path of the checkpoint file.

Keyword Arguments
  • resume_optimizer (bool) – Whether to resume optimizer state. Defaults to True.

  • resume_param_scheduler (bool) – Whether to resume param scheduler state. Defaults to True.

Return type

dict
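
Example

A minimal sketch; the path is a placeholder for a checkpoint written earlier by save_checkpoint().

    # Optimizer and param scheduler states are restored by default.
    extra_ckpt = strategy.resume('work_dirs/epoch_1.pth')
    # Skip restoring the optimizer state if it is not needed:
    # strategy.resume('work_dirs/epoch_1.pth', resume_optimizer=False)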

save_checkpoint(filename, *, save_optimizer=True, save_param_scheduler=True, extra_ckpt=None, callback=None)[source]

Save checkpoint to given filename.

Warning

save_optimizer and callback parameters are not supported yet.

Parameters
  • filename (str) – Path of the checkpoint file.

Keyword Arguments
  • save_param_scheduler (bool) – Whether to save the param_scheduler to the checkpoint. Defaults to True.

  • extra_ckpt (dict, optional) – Extra checkpoint to save. Defaults to None.

Return type

None
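
Example

A minimal sketch, assuming training metadata is passed through extra_ckpt; the path and meta contents are placeholders.

    strategy.save_checkpoint(
        'work_dirs/epoch_1.pth',
        save_param_scheduler=True,
        extra_ckpt=dict(meta=dict(epoch=1, iter=1000)),  # extra states stored alongside the model
    )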
