ClassBalancedDataset

class mmengine.dataset.ClassBalancedDataset(dataset, oversample_thr, lazy_init=False)[source]

A wrapper that builds a class-balanced dataset.

Suitable for training on class-imbalanced datasets such as LVIS. Following the sampling strategy in the paper, an image may appear multiple times in each epoch, according to its "repeat factor". The repeat factor of an image is a function of the frequency of the rarest category labeled in that image. The "frequency of category c", a value in [0, 1], is the fraction of images in the training set (without repeats) in which category c appears. The wrapped dataset must implement get_cat_ids() to be used with ClassBalancedDataset.

The repeat factor is computed as follows, where \(t\) denotes oversample_thr.

For each category c, compute the fraction of images that contain it: \(f(c)\)
For each category c, compute the category-level repeat factor: \(r(c) = \max(1, \sqrt{t / f(c)})\)
For each image I, compute the image-level repeat factor: \(r(I) = \max_{c \in I} r(c)\)
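
The three steps above can be sketched in plain Python. This is only an illustration of the formulas, not the library's implementation: the function name `compute_repeat_factors` and its input format (one list of category ids per image) are assumptions made for this example, as is the choice to give an image with no labeled category a repeat factor of 1.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def compute_repeat_factors(
    cat_ids_per_image: Sequence[Sequence[int]],
    oversample_thr: float,
) -> List[float]:
    """Return the image-level repeat factor r(I) for every image.

    Hypothetical helper: `cat_ids_per_image[i]` holds the category ids
    labeled in image i.
    """
    num_images = len(cat_ids_per_image)

    # Step 1: f(c) = fraction of images that contain category c.
    image_count: Dict[int, int] = defaultdict(int)
    for cat_ids in cat_ids_per_image:
        for c in set(cat_ids):  # count each category once per image
            image_count[c] += 1
    freq = {c: n / num_images for c, n in image_count.items()}

    # Step 2: r(c) = max(1, sqrt(t / f(c))).
    cat_repeat = {
        c: max(1.0, math.sqrt(oversample_thr / f)) for c, f in freq.items()
    }

    # Step 3: r(I) = max over the categories in image I of r(c).
    # Assumption of this sketch: an image without categories gets 1.0.
    return [
        max((cat_repeat[c] for c in set(cat_ids)), default=1.0)
        for cat_ids in cat_ids_per_image
    ]


# Category 0 appears in every image (f = 1.0); category 1 appears in one
# image out of four (f = 0.25), so only the last image is oversampled.
factors = compute_repeat_factors([[0], [0], [0], [0, 1]], oversample_thr=0.5)
```

With a threshold of 0.5, category 0 has frequency 1.0 and is not oversampled, while category 1 has frequency 0.25 and a repeat factor of \(\sqrt{0.5 / 0.25} = \sqrt{2}\), which becomes the repeat factor of the last image.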

Note

ClassBalancedDataset should not inherit from BaseDataset, because get_subset and get_subset_ could produce a sub-dataset whose meaning is ambiguous and conflicts with the original dataset. To use a sub-dataset of ClassBalancedDataset, set the indices argument of the wrapped dataset, which inherits from BaseDataset.

Parameters:

dataset (BaseDataset or dict) – The dataset to be repeated.
oversample_thr (float) – Frequency threshold below which data is repeated. For categories with f_c >= oversample_thr, there is no oversampling. For categories with f_c < oversample_thr, the degree of oversampling follows the square-root inverse frequency heuristic above.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. Defaults to False.
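
As a rough illustration of how these parameters fit together, the config-style sketch below wraps a dataset given as a dict. It assumes that ClassBalancedDataset can be built from a config dict through its `type` key. `MyDataset`, the annotation path, the threshold value, and the `indices` value are hypothetical placeholders, not values taken from this page.

```python
# Config fragment (a sketch under the assumptions stated above).
train_dataset = dict(
    type='ClassBalancedDataset',
    # Categories present in fewer than 0.1% of images are oversampled.
    oversample_thr=1e-3,
    dataset=dict(
        type='MyDataset',                    # hypothetical BaseDataset subclass
        ann_file='annotations/train.json',   # hypothetical path
        # To train on a sub-dataset, restrict the wrapped dataset here
        # rather than calling get_subset on the wrapper.
        indices=1000,                        # hypothetical value
    ),
)
```

Placing `indices` on the inner dataset keeps the subset unambiguous: it always refers to original samples, before any repetition is applied.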

full_init()[source]

Loop to full_init each dataset.

get_cat_ids(idx)[source]

Get the category ids of the class-balanced dataset by index.

Parameters: idx (int) – Index of data.
Returns: All categories in the image at the specified index.
Return type: List[int]

get_data_info(idx)[source]

Get annotation by index.

Parameters: idx (int) – Global index of ClassBalancedDataset.
Returns: The idx-th annotation of the dataset.
Return type: dict
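
To make the notion of a global index concrete, the sketch below shows one way repeat factors could be expanded into a flat list that maps each index of the repeated dataset back to an index of the wrapped dataset. The helper name `build_repeat_indices` is hypothetical, and rounding each repeat factor up with `math.ceil` is an assumption of this sketch rather than something stated on this page.

```python
import math
from typing import List, Sequence


def build_repeat_indices(repeat_factors: Sequence[float]) -> List[int]:
    """Map global indices of the repeated dataset to original indices.

    Hypothetical helper: image i is listed ceil(repeat_factors[i]) times.
    """
    repeat_indices: List[int] = []
    for original_idx, factor in enumerate(repeat_factors):
        # Assumption: fractional repeat factors are rounded up.
        repeat_indices.extend([original_idx] * math.ceil(factor))
    return repeat_indices


# Four original images; only the last one has a repeat factor above 1.
repeat_indices = build_repeat_indices([1.0, 1.0, 1.0, 1.5])

# A lookup by global index then resolves to an original index first.
original_idx = repeat_indices[4]
```

Under this scheme the repeated dataset is longer than the wrapped one, and several global indices can resolve to the same underlying sample.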

get_subset(indices)[source]

Not supported in ClassBalancedDataset because the meaning of a sub-dataset is ambiguous.

Parameters: indices (List[int] | int)
Return type: BaseDataset

get_subset_(indices)[source]

Not supported in ClassBalancedDataset because the meaning of a sub-dataset is ambiguous.

Parameters: indices (List[int] | int)
Return type: None

property metainfo: dict

Get the meta information of the repeated dataset.

Returns: The meta information of the repeated dataset.
Return type: dict