ClassBalancedDataset

class mmengine.dataset.ClassBalancedDataset(dataset, oversample_thr, lazy_init=False)[source]

A wrapper of class balanced dataset.

Suitable for training on class-imbalanced datasets such as LVIS. Following the sampling strategy in the LVIS paper, an image may appear multiple times in each epoch, depending on its “repeat factor”. The repeat factor of an image is a function of the frequency of the rarest category labeled in that image. The “frequency of category c”, a value in [0, 1], is the fraction of images in the training set (without repeats) in which category c appears. The wrapped dataset must implement get_cat_ids() to be usable with ClassBalancedDataset.

The repeat factor is computed as follows.

  1. For each category c, compute the fraction of images that contain it: \(f(c)\)

  2. For each category c, compute the category-level repeat factor, where t is oversample_thr: \(r(c) = \max(1, \sqrt{t / f(c)})\)

  3. For each image I, compute the image-level repeat factor: \(r(I) = \max_{c \in I} r(c)\)
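The three steps above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function names compute_repeat_factors and build_repeat_indices are hypothetical, the input is assumed to be one list of category ids per image (what get_cat_ids() would return), images without labels are assumed to get a repeat factor of 1, and rounding the repeat factor up to a whole number of copies is also an assumption.

```python
import math
from collections import Counter
from typing import List, Sequence


def compute_repeat_factors(cat_ids_per_image: Sequence[Sequence[int]],
                           oversample_thr: float) -> List[float]:
    """Return the image-level repeat factor r(I) for every image."""
    num_images = len(cat_ids_per_image)

    # Step 1: f(c) = fraction of images that contain category c.
    # set() makes sure a category counts once per image.
    image_counts = Counter()
    for cat_ids in cat_ids_per_image:
        image_counts.update(set(cat_ids))
    category_freq = {c: n / num_images for c, n in image_counts.items()}

    # Step 2: r(c) = max(1, sqrt(t / f(c))), with t = oversample_thr.
    category_repeat = {
        c: max(1.0, math.sqrt(oversample_thr / f))
        for c, f in category_freq.items()
    }

    # Step 3: r(I) = max over categories c in image I of r(c).
    # Assumption: an image with no labels is kept once (factor 1.0).
    return [
        max((category_repeat[c] for c in set(cat_ids)), default=1.0)
        for cat_ids in cat_ids_per_image
    ]


def build_repeat_indices(repeat_factors: Sequence[float]) -> List[int]:
    """Expand repeat factors into a list of (possibly repeated) indices.

    Assumption: each factor is rounded up to a whole number of copies.
    """
    indices: List[int] = []
    for idx, factor in enumerate(repeat_factors):
        indices.extend([idx] * math.ceil(factor))
    return indices


# Category 0 appears in every image, category 1 only in the last one.
factors = compute_repeat_factors([[0], [0], [0], [0, 1]], oversample_thr=0.5)
repeated = build_repeat_indices(factors)  # the last image is listed twice
```

In this toy input, f(0) = 1.0 and f(1) = 0.25, so only the image containing category 1 gets a repeat factor above 1 (sqrt(0.5 / 0.25) = sqrt(2)).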

Note

ClassBalancedDataset should not inherit from BaseDataset, since get_subset and get_subset_ could produce a sub-dataset whose meaning is ambiguous and conflicts with the original dataset. If you want to use a sub-dataset of ClassBalancedDataset, set the indices argument of the wrapped dataset, which inherits from BaseDataset.

Parameters
  • dataset (BaseDataset or dict) – The dataset to be repeated.

  • oversample_thr (float) – Frequency threshold below which data is repeated. For categories with f_c >= oversample_thr, there is no oversampling. For categories with f_c < oversample_thr, the degree of oversampling follows the square-root inverse frequency heuristic above.

  • lazy_init (bool, optional) – Whether to defer loading annotations rather than loading them during instantiation. If True, annotations are loaded when full_init() is called. Defaults to False.
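As a rough illustration of how these parameters fit together, the wrapper can be described with a config dict, since dataset accepts either a BaseDataset instance or a dict. The inner dataset type MyLVISDataset and the annotation path are hypothetical placeholders; the sketch assumes that the inner dataset inherits from BaseDataset, implements get_cat_ids(), and is registered under that name.

```python
# A config sketch only; nothing is built or loaded here.
train_dataset = dict(
    type='ClassBalancedDataset',
    # Categories present in fewer than 0.1% of images get oversampled.
    oversample_thr=1e-3,
    lazy_init=False,
    dataset=dict(
        type='MyLVISDataset',               # hypothetical wrapped dataset
        ann_file='annotations/train.json',  # hypothetical path
        # To work on a sub-dataset, restrict the wrapped dataset through
        # its indices argument instead of calling get_subset on the wrapper.
        indices=1000,
    ),
)
```

Restricting the wrapped dataset this way keeps the repeat factors well defined, because they are then computed over exactly the images that remain.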

full_init()[source]

Fully initialize the wrapped dataset by calling its full_init.

get_cat_ids(idx)[source]

Get category ids of class balanced dataset by index.

Parameters

idx (int) – Index of data.

Returns

All categories in the image at the specified index.

Return type

List[int]

get_data_info(idx)[source]

Get annotation by index.

Parameters

idx (int) – Global index of ClassBalancedDataset.

Returns

The idx-th annotation of the dataset.

Return type

dict

get_subset(indices)[source]

Not supported in ClassBalancedDataset because the meaning of a sub-dataset is ambiguous.

Parameters

indices (Union[List[int], int]) –

Return type

mmengine.dataset.base_dataset.BaseDataset

get_subset_(indices)[source]

Not supported in ClassBalancedDataset because the meaning of a sub-dataset is ambiguous.

Parameters

indices (Union[List[int], int]) –

Return type

None

property metainfo: dict

Get the meta information of the repeated dataset.

Returns

The meta information of the repeated dataset.

Return type

dict
