ClassBalancedDataset

class mmengine.dataset.ClassBalancedDataset(dataset, oversample_thr, lazy_init=False)[source]

A wrapper of class balanced dataset.

Suitable for training on class-imbalanced datasets like LVIS. Following the sampling strategy in the paper, in each epoch an image may appear multiple times based on its “repeat factor”. The repeat factor for an image is a function of the frequency of the rarest category labeled in that image. The “frequency of category c”, a value in [0, 1], is defined as the fraction of images in the training set (without repeats) in which category c appears. The wrapped dataset needs to implement get_cat_ids() to support ClassBalancedDataset.

The repeat factor is computed as follows.

  1. For each category c, compute the fraction of images that contain it: \(f(c)\)

  2. For each category c, compute the category-level repeat factor: \(r(c) = \max(1, \sqrt{t / f(c)})\), where t is oversample_thr

  3. For each image I, compute the image-level repeat factor: \(r(I) = \max_{c \in I} r(c)\)
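The three steps can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name `compute_repeat_factors` and the toy `cat_ids_per_image` input are made up for this example, and `t` is taken to be `oversample_thr`.

```python
import math
from collections import defaultdict


def compute_repeat_factors(cat_ids_per_image, oversample_thr):
    """Return one image-level repeat factor per image (hypothetical helper)."""
    num_images = len(cat_ids_per_image)

    # Step 1: f(c) = fraction of images that contain category c.
    image_count = defaultdict(int)
    for cat_ids in cat_ids_per_image:
        for cat_id in set(cat_ids):
            image_count[cat_id] += 1
    freq = {c: n / num_images for c, n in image_count.items()}

    # Step 2: r(c) = max(1, sqrt(t / f(c))).
    cat_repeat = {
        c: max(1.0, math.sqrt(oversample_thr / f)) for c, f in freq.items()
    }

    # Step 3: r(I) = max over the categories in image I of r(c).
    # Images without labels get a factor of 1 here (an assumption).
    return [
        max((cat_repeat[c] for c in set(cat_ids)), default=1.0)
        for cat_ids in cat_ids_per_image
    ]


# Toy data: category 0 appears in all 4 images, category 1 in only 1.
cat_ids_per_image = [[0], [0], [0], [0, 1]]
repeat_factors = compute_repeat_factors(cat_ids_per_image, oversample_thr=0.5)
print(repeat_factors)
```

With these toy inputs, only the image containing the rare category 1 (f = 0.25 < 0.5) gets a factor above 1, namely sqrt(0.5 / 0.25) = sqrt(2).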

Note

ClassBalancedDataset should not inherit from BaseDataset, because get_subset and get_subset_ could produce a sub-dataset whose meaning is ambiguous and conflicts with the original dataset. If you want to use a sub-dataset of ClassBalancedDataset, set the indices argument of the wrapped dataset, which inherits from BaseDataset.
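As a sketch of that advice, a config-style definition might pass indices to the wrapped dataset rather than subsetting the wrapper. The dataset type `MyDataset`, the paths, and the numeric values below are placeholders chosen for illustration only.

```python
# Config-style sketch; 'MyDataset', the paths and the numbers are hypothetical.
train_dataset = dict(
    type='ClassBalancedDataset',
    oversample_thr=1e-3,
    dataset=dict(
        # Placeholder: a BaseDataset subclass that implements get_cat_ids().
        type='MyDataset',
        data_root='data/my_dataset/',
        ann_file='annotations/train.json',
        # The subset is selected on the wrapped dataset, so the repeat
        # factors are computed from the subset only.
        indices=1000,
    ),
)
```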

Parameters:
  • dataset (BaseDataset or dict) – The dataset to be repeated.

  • oversample_thr (float) – Frequency threshold below which data is repeated. For categories with f(c) >= oversample_thr, there is no oversampling. For categories with f(c) < oversample_thr, the degree of oversampling follows the square-root inverse frequency heuristic above.

  • lazy_init (bool, optional) – Whether to defer loading annotations until full_init() is called, instead of loading them during instantiation. Defaults to False.

full_init()[source]

Call full_init on the wrapped dataset.

get_cat_ids(idx)[source]

Get category ids of class balanced dataset by index.

Parameters:

idx (int) – Index of data.

Returns:

All categories in the image of specified index.

Return type:

List[int]
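For illustration, a wrapped dataset could expose get_cat_ids() along these lines. `ToyDataset` and its `instances` / `label` fields are hypothetical stand-ins rather than mmengine classes or keys; a real dataset would subclass BaseDataset and read its own annotation format.

```python
from typing import List


class ToyDataset:
    """Hypothetical stand-in for a dataset wrapped by ClassBalancedDataset."""

    def __init__(self, data_list):
        # Each entry is assumed to look like {'instances': [{'label': int}, ...]}.
        self.data_list = data_list

    def get_cat_ids(self, idx: int) -> List[int]:
        """Return the distinct category ids labeled in the idx-th image."""
        instances = self.data_list[idx]['instances']
        return sorted({inst['label'] for inst in instances})


toy = ToyDataset([
    {'instances': [{'label': 3}, {'label': 3}, {'label': 7}]},
    {'instances': [{'label': 1}]},
])
print(toy.get_cat_ids(0))
```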

get_data_info(idx)[source]

Get annotation by index.

Parameters:

idx (int) – Global index of ClassBalancedDataset.

Returns:

The idx-th annotation of the dataset.

Return type:

dict
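One way to picture index lookups on the wrapper: the repeated dataset can be represented as a list of indices into the wrapped dataset. The sketch below assumes each image is repeated ceil(r(I)) times, which this page does not state; the variable names and factor values are illustrative only.

```python
import math

# Hypothetical image-level repeat factors for a 3-image dataset.
repeat_factors = [1.0, 1.0, 2.3]

# Assumption: each image appears ceil(r(I)) times in the repeated dataset.
repeat_indices = []
for original_idx, factor in enumerate(repeat_factors):
    repeat_indices.extend([original_idx] * math.ceil(factor))

# A lookup by wrapper index then resolves to an index in the wrapped dataset.
wrapper_idx = 4
original_idx = repeat_indices[wrapper_idx]
print(len(repeat_indices), original_idx)
```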

get_subset(indices)[source]

Not supported in ClassBalancedDataset because the meaning of a sub-dataset would be ambiguous.

Parameters:

indices (List[int] | int) –

Return type:

BaseDataset

get_subset_(indices)[source]

Not supported in ClassBalancedDataset because the meaning of a sub-dataset would be ambiguous.

Parameters:

indices (List[int] | int) –

Return type:

None

property metainfo: dict

Get the meta information of the repeated dataset.

Returns:

The meta information of repeated dataset.

Return type:

dict