BaseDataset

class mmengine.dataset.BaseDataset(ann_file='', metainfo=None, data_root='', data_prefix={'img_path': ''}, filter_cfg=None, indices=None, serialize_data=True, pipeline=[], test_mode=False, lazy_init=False, max_refetch=1000)[source]

BaseDataset for open source projects in OpenMMLab.

The annotation format is shown as follows.

{
    "metainfo":
    {
      "dataset_type": "test_dataset",
      "task_name": "test_task"
    },
    "data_list":
    [
      {
        "img_path": "test_img.jpg",
        "height": 604,
        "width": 640,
        "instances":
        [
          {
            "bbox": [0, 0, 10, 20],
            "bbox_label": 1,
            "mask": [[0,0],[0,10],[10,20],[20,0]],
            "extra_anns": [1,2,3]
          },
          {
            "bbox": [10, 10, 110, 120],
            "bbox_label": 2,
            "mask": [[10,10],[10,110],[110,120],[120,10]],
            "extra_anns": [4,5,6]
          }
        ]
      }
    ]
}
Parameters
  • ann_file (str) – Annotation file path. Defaults to ''.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Defaults to None.

  • data_root (str) – The root directory for data_prefix and ann_file. Defaults to ''.

  • data_prefix (dict) – Prefix for training data. Defaults to dict(img_path='').

  • filter_cfg (dict, optional) – Config for filtering data. Defaults to None.

  • indices (int or Sequence[int], optional) – Use only part of the data in the annotation file, for example the first few items, to facilitate training or testing on a smaller dataset. Defaults to None, which means all data is used.

  • serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.

  • pipeline (list, optional) – Processing pipeline. Defaults to [].

  • test_mode (bool, optional) – Whether the dataset is used in the test phase. Defaults to False.

  • lazy_init (bool, optional) – Whether to skip loading the annotation file during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed and the annotation file does not have to be loaded. BaseDataset can skip loading annotations to save time when lazy_init=True. Defaults to False.

  • max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Defaults to 1000.

Note

BaseDataset collects meta information from the annotation file (lowest priority), BaseDataset.METAINFO (medium priority) and the metainfo parameter passed to the constructor (highest priority). Lower-priority meta information is overwritten by higher-priority meta information.

Note

Dataset wrappers such as ConcatDataset and RepeatDataset should not inherit from BaseDataset, since get_subset and get_subset_ could produce a sub-dataset with an ambiguous meaning that conflicts with the original dataset.

Examples

>>> # Assume the annotation file is given above.
>>> from mmengine.dataset import BaseDataset
>>> class CustomDataset(BaseDataset):
...     METAINFO: dict = dict(task_name='custom_task',
...                           dataset_type='custom_type')
>>> metainfo = dict(task_name='custom_task_name')
>>> custom_dataset = CustomDataset(
...     'path/to/ann_file',
...     metainfo=metainfo)
>>> # Meta information from the annotation file is overwritten by
>>> # `CustomDataset.METAINFO`. The merged meta information is then
>>> # overwritten by the `metainfo` argument.
>>> custom_dataset.metainfo
{'task_name': 'custom_task_name', 'dataset_type': 'custom_type'}
filter_data()[source]

Filter annotations according to filter_cfg. By default, all of data_list is returned.

If some entries of data_list should be filtered out according to specific logic, the subclass should override this method.

Returns

Filtered results.

Return type

list[dict]
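A minimal sketch of the kind of logic a subclass override might apply. The function name filter_by_min_size and the min_size key are hypothetical and are not part of mmengine; the text above does not define which keys filter_cfg accepts.

```python
from typing import List, Optional


def filter_by_min_size(data_list: List[dict],
                       filter_cfg: Optional[dict] = None) -> List[dict]:
    """Keep only entries whose shorter side is at least ``min_size``.

    ``min_size`` is a hypothetical filter_cfg key used for illustration.
    """
    if not filter_cfg:
        # No filter configured: return everything, like the default behavior.
        return data_list
    min_size = filter_cfg.get("min_size", 0)
    return [
        info for info in data_list
        if min(info["width"], info["height"]) >= min_size
    ]


data_list = [
    {"img_path": "small.jpg", "height": 16, "width": 640},
    {"img_path": "large.jpg", "height": 604, "width": 640},
]
kept = filter_by_min_size(data_list, dict(min_size=32))
unfiltered = filter_by_min_size(data_list, None)
```

In a real subclass this logic would live in filter_data and read self.data_list and self.filter_cfg instead of taking arguments.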

full_init()[source]

Load annotation file and set BaseDataset._fully_initialized to True.

If lazy_init=False, full_init is called during instantiation and self._fully_initialized is set to True. If obj._fully_initialized=False, methods decorated with force_full_init call full_init automatically.

Several steps to initialize annotation:

  • load_data_list: Load annotations from annotation file.

  • filter_data: Filter annotations according to filter_cfg.

  • slice_data: Slice the dataset according to self._indices.

  • serialize_data: Serialize self.data_list if self.serialize_data is True.
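The four steps above can be sketched with a toy class. This is an illustration under stated assumptions, not mmengine code: TinyDataset, its min_width filter and the use of pickle for serialization are all stand-ins chosen for the example, and only a non-negative integer is supported for indices.

```python
import pickle


class TinyDataset:
    """Toy stand-in for a lazily initialized dataset (hypothetical)."""

    def __init__(self, records, min_width=0, indices=None,
                 serialize_data=True, lazy_init=False):
        self._records = records
        self._min_width = min_width
        self._indices = indices
        self._serialize = serialize_data
        self._fully_initialized = False
        self.data_list = []
        self.data_bytes = None
        if not lazy_init:
            self.full_init()

    def full_init(self):
        if self._fully_initialized:
            return  # repeated calls do nothing
        # 1. Load: copy the raw records (a real dataset would read a file).
        data = [dict(record) for record in self._records]
        # 2. Filter: drop records that fail the configured check.
        data = [d for d in data if d["width"] >= self._min_width]
        # 3. Slice: keep only the first `indices` records, if requested.
        if self._indices is not None:
            data = data[:self._indices]
        # 4. Serialize: hold the records as bytes instead of Python objects.
        if self._serialize:
            self.data_bytes = pickle.dumps(data)
            data = []
        self.data_list = data
        self._fully_initialized = True

    def __len__(self):
        self.full_init()  # force initialization on first real use
        if self.data_bytes is not None:
            return len(pickle.loads(self.data_bytes))
        return len(self.data_list)


records = [{"img_path": f"{i}.jpg", "width": 100 * i} for i in range(1, 6)]
lazy = TinyDataset(records, min_width=200, indices=3, lazy_init=True)
initialized_before = lazy._fully_initialized
length = len(lazy)
initialized_after = lazy._fully_initialized
```

With lazy_init=True nothing is loaded until len() is called, which mirrors the role the text gives to force_full_init.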

get_cat_ids(idx)[source]

Get category ids by index. A dataset wrapped by ClassBalancedDataset must implement this method.

Parameters

idx (int) – The index of data.

Returns

All categories in the image of specified index.

Return type

list[int]

get_data_info(idx)[source]

Get annotation by index and automatically call full_init if the dataset has not been fully initialized.

Parameters

idx (int) – The index of data.

Returns

The idx-th annotation of the dataset.

Return type

dict

get_subset(indices)[source]

Return a subset of dataset.

This method returns a subset of the original dataset. If indices is an int, get_subset returns a sub-dataset containing the first or last few data items, depending on whether indices is positive or negative. If indices is a sequence of int, the sub-dataset contains the data items at the given indices.

Examples

>>> dataset = BaseDataset('path/to/ann_file')
>>> len(dataset)
100
>>> subdataset = dataset.get_subset(90)
>>> len(subdataset)
90
>>> # If indices is a list, extract the data information at the
>>> # corresponding indices.
>>> subdataset = dataset.get_subset([0, 1, 2, 3, 4, 5, 6, 7,
...                                  8, 9])
>>> len(subdataset)
10
>>> subdataset = dataset.get_subset(-3)
>>> len(subdataset)  # Get the last few data items.
3
Parameters

indices (int or Sequence[int]) – If indices is an int, it selects the first few data items of the dataset when positive, or the last few when negative. If indices is a Sequence, it lists the indices of the data items to select.

Returns

A subset of dataset.

Return type

BaseDataset
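The index semantics described for get_subset can be sketched as a plain function. The name resolve_indices is hypothetical, and the function only models which positions are selected; it does not reproduce how mmengine copies or re-serializes the data.

```python
from typing import List, Sequence, Union


def resolve_indices(total: int,
                    indices: Union[int, Sequence[int]]) -> List[int]:
    """Return the positions selected from a dataset of length ``total``."""
    positions = list(range(total))
    if isinstance(indices, int):
        if indices >= 0:
            return positions[:indices]   # first `indices` items
        return positions[indices:]       # last `-indices` items
    return list(indices)                 # explicit positions, kept in order


first = resolve_indices(100, 90)
picked = resolve_indices(100, [0, 5, 7])
last = resolve_indices(100, -3)
```

Here first holds 90 positions, picked is [0, 5, 7] and last is [97, 98, 99], matching the lengths shown in the example above.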

get_subset_(indices)[source]

The in-place version of get_subset, which converts the dataset into a subset of the original dataset.

If indices is an int, the dataset keeps only the first or last few data items, depending on whether indices is positive or negative. If indices is a sequence of int, the dataset keeps only the data items at the given indices.

Examples

>>> dataset = BaseDataset('path/to/ann_file')
>>> len(dataset)
100
>>> dataset.get_subset_(90)
>>> len(dataset)
90
>>> # if type of indices is sequence, extract the corresponding
>>> # index data information
>>> dataset.get_subset_([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> len(dataset)
10
>>> dataset.get_subset_(-3)
>>> len(dataset)  # Get the last few data items.
3
Parameters

indices (int or Sequence[int]) – If indices is an int, it selects the first few data items of the dataset when positive, or the last few when negative. If indices is a Sequence, it lists the indices of the data items to select.

Return type

None

load_data_list()[source]

Load annotations from the annotation file named by self.ann_file.

If the annotation file does not follow the OpenMMLab 2.0 dataset format, the subclass must override this method to load the annotations. The meta information of the annotation file will be overwritten by METAINFO and by the metainfo argument of the constructor.

Returns

A list of annotation.

Return type

list[dict]
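A minimal sketch of reading the annotation format shown at the top of this page, using only the standard json module. The helper name load_annotations and its error messages are hypothetical; the real method reads self.ann_file and may validate the file differently.

```python
import json

# A trimmed copy of the annotation format shown above, inlined as a string
# so that the example does not depend on a file on disk.
ANNOTATION = """
{
  "metainfo": {"dataset_type": "test_dataset", "task_name": "test_task"},
  "data_list": [
    {
      "img_path": "test_img.jpg",
      "height": 604,
      "width": 640,
      "instances": [{"bbox": [0, 0, 10, 20], "bbox_label": 1}]
    }
  ]
}
"""


def load_annotations(text: str):
    """Split an annotation document into (metainfo, data_list)."""
    annotations = json.loads(text)
    if not isinstance(annotations, dict):
        raise TypeError("the annotation file should contain a JSON object")
    if "metainfo" not in annotations or "data_list" not in annotations:
        raise ValueError("annotation must have 'metainfo' and 'data_list'")
    return annotations["metainfo"], annotations["data_list"]


metainfo, data_list = load_annotations(ANNOTATION)
```

A subclass whose annotation file uses another layout would replace this parsing step and still return a list of dicts.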

property metainfo: dict

Get meta information of dataset.

Returns

Meta information collected from BaseDataset.METAINFO, the annotation file and the metainfo argument during instantiation.

Return type

dict
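The priority order stated in the note near the top of this page (annotation file lowest, METAINFO medium, metainfo argument highest) can be sketched with plain dict updates. The function merge_metainfo is hypothetical and only models the overwrite order.

```python
def merge_metainfo(file_meta, class_meta, arg_meta=None):
    """Merge three meta information sources, lowest priority first."""
    merged = dict(file_meta)       # annotation file: lowest priority
    merged.update(class_meta)      # class-level METAINFO: medium priority
    merged.update(arg_meta or {})  # constructor argument: highest priority
    return merged


merged = merge_metainfo(
    {"dataset_type": "test_dataset", "task_name": "test_task"},
    {"task_name": "custom_task", "dataset_type": "custom_type"},
    {"task_name": "custom_task_name"},
)
```

With these inputs, merged maps task_name to 'custom_task_name' and dataset_type to 'custom_type', the same values as in the CustomDataset example.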

parse_data_info(raw_data_info)[source]

Parse raw annotation to target format.

This method should return a dict or a list of dict. Each dict contains the data information of one training sample. If the protocol of the sample annotations changes, this method can be overridden to update the parsing logic while keeping compatibility.

Parameters

raw_data_info (dict) – Raw data information loaded from ann_file.

Returns

Parsed annotation.

Return type

dict or list[dict]

prepare_data(idx)[source]

Get data processed by self.pipeline.

Parameters

idx (int) – The index of data_info.

Returns

Depends on self.pipeline.

Return type

Any
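A rough sketch of how a pipeline of callables and the max_refetch parameter could interact. It assumes that a transform signals an invalid sample by returning None and that a replacement index is drawn at random; both are assumptions for illustration, and run_pipeline and fetch_with_refetch are hypothetical names.

```python
import random


def run_pipeline(data_info, pipeline):
    """Apply each transform in order; stop early if one returns None."""
    data = dict(data_info)
    for transform in pipeline:
        data = transform(data)
        if data is None:
            return None
    return data


def fetch_with_refetch(data_list, idx, pipeline, max_refetch=1000, rng=None):
    """Try ``idx`` first, then up to ``max_refetch`` other random indices."""
    rng = rng or random.Random(0)
    for _ in range(max_refetch + 1):
        data = run_pipeline(data_list[idx], pipeline)
        if data is not None:
            return data
        idx = rng.randrange(len(data_list))  # assumed resampling strategy
    raise RuntimeError(f"no valid sample found after {max_refetch} refetches")


def drop_empty(data):
    # Treat zero-width images as invalid.
    return None if data["width"] == 0 else data


def mark_loaded(data):
    data["loaded"] = True
    return data


data_list = [
    {"img_path": "a.jpg", "width": 0},
    {"img_path": "b.jpg", "width": 640},
]
sample = fetch_with_refetch(data_list, 0, [drop_empty, mark_loaded])
```

Index 0 is rejected by drop_empty, so the loop keeps drawing until it lands on the only valid entry, b.jpg.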
