DefaultSampler

class mmengine.dataset.DefaultSampler(dataset, shuffle=True, seed=None, round_up=True)

The default data sampler for both distributed and non-distributed environments.

It differs from the PyTorch DistributedSampler in the following ways:

  1. This sampler supports non-distributed environments.

  2. The round-up behavior is slightly different.

    • If round_up=True, this sampler adds extra samples so that the number of samples is evenly divisible by the world size. This is the same behavior as the DistributedSampler with drop_last=False.

    • If round_up=False, this sampler neither removes nor adds any samples, whereas the DistributedSampler with drop_last=True removes the tail samples.
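The difference between the two round-up modes can be illustrated with plain index arithmetic. The following is a minimal sketch, not the library's implementation: the helper names `round_up_shard` and `drop_last_shard` are hypothetical, shuffling is left out, and padding is assumed to repeat indices from the start of the dataset (as the PyTorch DistributedSampler does).

```python
import math


def round_up_shard(n, rank, world_size, round_up=True):
    """Indices one rank would receive for a dataset of n > 0 samples (no shuffling)."""
    indices = list(range(n))
    if round_up:
        # Pad by repeating indices from the start until the total
        # is evenly divisible by world_size (assumed padding scheme).
        total = math.ceil(n / world_size) * world_size
        indices = (indices * (total // n + 1))[:total]
    else:
        # Keep the dataset as is; some ranks may get one sample fewer.
        total = n
    return indices[rank:total:world_size]


def drop_last_shard(n, rank, world_size):
    """Contrast case: drop tail samples so every rank gets the same count."""
    total = (n // world_size) * world_size
    return list(range(total))[rank::world_size]


# 10 samples split across 4 processes.
padded = [round_up_shard(10, r, 4, round_up=True) for r in range(4)]
# [[0, 4, 8], [1, 5, 9], [2, 6, 0], [3, 7, 1]]  (samples 0 and 1 appear twice)
exact = [round_up_shard(10, r, 4, round_up=False) for r in range(4)]
# [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]        (every sample exactly once)
dropped = [drop_last_shard(10, r, 4) for r in range(4)]
# [[0, 4], [1, 5], [2, 6], [3, 7]]              (samples 8 and 9 removed)
```

With round_up=False, ranks can end up with different numbers of samples, which matters for any code that assumes every process runs the same number of iterations.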

Parameters:
  • dataset (Sized) – The dataset.

  • shuffle (bool) – Whether to shuffle the dataset. Defaults to True.

  • seed (int, optional) – Random seed used to shuffle the sampler if shuffle=True. This number should be identical across all processes in the distributed group. Defaults to None.

  • round_up (bool) – Whether to add extra samples to make the number of samples evenly divisible by the world size. Defaults to True.
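The requirement that seed be identical across processes can be made concrete with a small sketch. It assumes each process builds the full permutation from the seed and then takes every world_size-th index starting at its rank; the helper `shuffled_shard` is hypothetical and uses Python's `random` module rather than whatever generator the library uses.

```python
import random


def shuffled_shard(n, rank, world_size, seed):
    """Shuffle range(n) with the given seed, then take this rank's strided slice."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order[rank::world_size]


# Same seed on both ranks: both derive the same permutation, so the
# two shards are disjoint and together cover the whole dataset.
same_seed = [shuffled_shard(8, r, 2, seed=0) for r in range(2)]
covered = sorted(same_seed[0] + same_seed[1])

# Different seeds: each rank slices a different permutation, so samples
# may be duplicated on one rank and missing from all of them.
mixed_seed = [shuffled_shard(8, r, 2, seed=r) for r in range(2)]
```

In the same-seed case `covered` equals `list(range(8))`; in the mixed-seed case no such guarantee holds.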

set_epoch(epoch)

Sets the epoch for this sampler.

When shuffle=True, calling this method before each epoch ensures that all replicas use a different random ordering for each epoch. If it is not called, the next iteration of this sampler yields the same ordering as the previous one.

Parameters:
  • epoch (int) – Epoch number.

Return type:
  None
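The effect of set_epoch can be sketched with a toy sampler. This is an illustration under stated assumptions, not the library's code: `ToyEpochSampler` is a hypothetical class, and it assumes the permutation is seeded with seed + epoch, which is how the PyTorch DistributedSampler derives a new ordering per epoch.

```python
import random


class ToyEpochSampler:
    """Hypothetical single-process sampler that reshuffles per epoch."""

    def __init__(self, n, seed=0, shuffle=True):
        self.n = n
        self.seed = seed
        self.shuffle = shuffle
        self.epoch = 0

    def set_epoch(self, epoch):
        # Only records the epoch; the ordering changes on the next iteration.
        self.epoch = epoch

    def __iter__(self):
        order = list(range(self.n))
        if self.shuffle:
            # Assumed seeding scheme: seed + epoch.
            random.Random(self.seed + self.epoch).shuffle(order)
        return iter(order)


sampler = ToyEpochSampler(100, seed=0)

first = list(sampler)
repeat = list(sampler)      # set_epoch not called: same ordering again

orders = []
for epoch in range(3):
    sampler.set_epoch(epoch)  # call once at the start of every epoch
    orders.append(list(sampler))
```

Because every process derives the ordering from the same seed and epoch, all replicas stay in agreement as long as each one calls set_epoch with the same epoch number.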