BaseDataset
- class mmengine.dataset.BaseDataset(ann_file='', metainfo=None, data_root='', data_prefix={'img_path': ''}, filter_cfg=None, indices=None, serialize_data=True, pipeline=[], test_mode=False, lazy_init=False, max_refetch=1000)[source]
BaseDataset for open source projects in OpenMMLab.
The annotation format is shown as follows.
{
    "metainfo": {
        "dataset_type": "test_dataset",
        "task_name": "test_task"
    },
    "data_list": [
        {
            "img_path": "test_img.jpg",
            "height": 604,
            "width": 640,
            "instances": [
                {
                    "bbox": [0, 0, 10, 20],
                    "bbox_label": 1,
                    "mask": [[0,0],[0,10],[10,20],[20,0]],
                    "extra_anns": [1,2,3]
                },
                {
                    "bbox": [10, 10, 110, 120],
                    "bbox_label": 2,
                    "mask": [[10,10],[10,110],[110,120],[120,10]],
                    "extra_anns": [4,5,6]
                }
            ]
        }
    ]
}
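As a rough illustration of how a file in this format can be read, the sketch below uses only the standard `json` module. It is not mmengine's loader, and the inline string is a trimmed-down stand-in for a real annotation file.

```python
import json

# A trimmed-down annotation in the format shown above (inline for the demo).
raw = """
{
  "metainfo": {"dataset_type": "test_dataset", "task_name": "test_task"},
  "data_list": [
    {
      "img_path": "test_img.jpg",
      "height": 604,
      "width": 640,
      "instances": [
        {"bbox": [0, 0, 10, 20], "bbox_label": 1},
        {"bbox": [10, 10, 110, 120], "bbox_label": 2}
      ]
    }
  ]
}
"""

annotations = json.loads(raw)
metainfo = annotations["metainfo"]    # dataset-level information
data_list = annotations["data_list"]  # one dict per sample

first = data_list[0]
num_instances = len(first["instances"])
print(metainfo["task_name"], first["img_path"], num_instances)
```

The top-level split into `metainfo` and `data_list` is the part that matters here: the first holds information about the dataset as a whole, the second holds one entry per sample.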
- Parameters:
ann_file (str, optional) – Annotation file path. Defaults to ''.
metainfo (Mapping or Config, optional) – Meta information for dataset, such as class information. Defaults to None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Defaults to ''.
data_prefix (dict) – Prefix for training data. Defaults to dict(img_path='').
filter_cfg (dict, optional) – Config for filtering data. Defaults to None.
indices (int or Sequence[int], optional) – Use only the first few samples in the annotation file, to facilitate training or testing on a smaller dataset. Defaults to None.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.
pipeline (list, optional) – Processing pipeline. Defaults to [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Defaults to False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Defaults to False.
max_refetch (int, optional) – If BaseDataset.prepare_data returns a None image, the maximum number of extra attempts made to get a valid image. Defaults to 1000.
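The max_refetch behaviour can be pictured as a bounded retry loop. The sketch below is a plain-Python approximation, not mmengine's code: `fetch_with_retry` and its arguments are hypothetical names, and picking a random replacement index is an assumption about how another sample is chosen.

```python
import random


def fetch_with_retry(prepare_data, num_samples, idx, max_refetch=1000):
    """Return the first non-None sample, retrying at most max_refetch times.

    prepare_data: callable mapping an index to a sample or None (hypothetical).
    """
    # One initial attempt plus up to max_refetch extra attempts.
    for _ in range(max_refetch + 1):
        data = prepare_data(idx)
        if data is not None:
            return data
        # Assumption: fall back to a randomly chosen index.
        idx = random.randrange(num_samples)
    raise RuntimeError(f"no valid sample after {max_refetch} refetches")


samples = [None, "a", "b"]  # index 0 stands for a broken image
result = fetch_with_retry(samples.__getitem__, len(samples), 0)
print(result)
```

Capping the number of attempts keeps a dataset whose samples are all invalid from looping forever.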
Note
BaseDataset collects meta information from the annotation file (lowest priority), BaseDataset.METAINFO (medium) and the metainfo parameter (highest) passed to the constructor. Lower-priority meta information is overwritten by higher-priority meta information.
Note
Dataset wrappers such as ConcatDataset, RepeatDataset, etc. should not inherit from BaseDataset, since get_subset and get_subset_ could produce an ambiguous sub-dataset that conflicts with the original dataset.
Examples
>>> # Assume the annotation file is given above.
>>> class CustomDataset(BaseDataset):
...     METAINFO: dict = dict(task_name='custom_task',
...                           dataset_type='custom_type')
>>> metainfo = dict(task_name='custom_task_name')
>>> custom_dataset = CustomDataset(
...     'path/to/ann_file',
...     metainfo=metainfo)
>>> # Meta information of the annotation file will be overwritten by
>>> # `CustomDataset.METAINFO`. The merged meta information will
>>> # further be overwritten by the `metainfo` argument.
>>> custom_dataset.metainfo
{'task_name': 'custom_task_name', 'dataset_type': 'custom_type'}
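The three-level priority can be mimicked with ordinary dict updates. This is a minimal sketch of the merge order only; `merge_metainfo` is a hypothetical helper, not part of mmengine.

```python
def merge_metainfo(ann_meta, class_meta, arg_meta=None):
    """Merge meta information from lowest to highest priority."""
    merged = dict(ann_meta)         # annotation file: lowest priority
    merged.update(class_meta)       # class-level METAINFO: medium priority
    merged.update(arg_meta or {})   # constructor argument: highest priority
    return merged


merged = merge_metainfo(
    ann_meta={"dataset_type": "test_dataset", "task_name": "test_task"},
    class_meta={"task_name": "custom_task", "dataset_type": "custom_type"},
    arg_meta={"task_name": "custom_task_name"},
)
print(merged)
```

Because later updates win, keys absent from a higher-priority source keep the value supplied by a lower-priority one.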
- filter_data()[source]
Filter annotations according to filter_cfg. By default, all of data_list is returned.
If some entries of data_list should be filtered out according to specific logic, the subclass should override this method.
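What an overriding filter might look like, written as a standalone function so it runs without mmengine. The `min_size` key is purely hypothetical; the keys a real filter_cfg carries are up to the subclass.

```python
def filter_data(data_list, filter_cfg=None):
    """Drop samples smaller than a hypothetical `min_size` threshold."""
    if not filter_cfg:
        # Mirrors the default behaviour: nothing is filtered out.
        return data_list
    min_size = filter_cfg.get("min_size", 0)  # hypothetical config key
    return [
        info for info in data_list
        if min(info["width"], info["height"]) >= min_size
    ]


data_list = [
    {"img_path": "a.jpg", "width": 640, "height": 604},
    {"img_path": "b.jpg", "width": 20, "height": 16},
]
kept = filter_data(data_list, dict(min_size=32))
print([info["img_path"] for info in kept])
```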
- full_init()[source]
Load annotation file and set BaseDataset._fully_initialized to True.
If lazy_init=False, full_init will be called during instantiation and self._fully_initialized will be set to True. If obj._fully_initialized=False, the class methods decorated by force_full_init will call full_init automatically.
Several steps to initialize annotations:
load_data_list: Load annotations from the annotation file.
filter data information: Filter annotations according to filter_cfg.
slice_data: Slice the dataset according to self._indices.
serialize_data: Serialize self.data_list if self.serialize_data is True.
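The lazy-initialization pattern described above can be sketched in plain Python. `TinyDataset` is a hypothetical stand-in class, and the decorator here only imitates the described behaviour of force_full_init; neither is mmengine's implementation.

```python
import functools


def force_full_init(method):
    """Call full_init() before the method if the object is not initialized."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not getattr(self, "_fully_initialized", False):
            self.full_init()
        return method(self, *args, **kwargs)
    return wrapper


class TinyDataset:
    """Hypothetical dataset holding its raw records in memory."""

    def __init__(self, raw_records, lazy_init=False):
        self._raw_records = raw_records
        self.data_list = []
        self._fully_initialized = False
        if not lazy_init:
            self.full_init()

    def full_init(self):
        if self._fully_initialized:
            return
        # Stand-in for the load / filter / slice / serialize steps.
        self.data_list = list(self._raw_records)
        self._fully_initialized = True

    @force_full_init
    def __len__(self):
        return len(self.data_list)


dataset = TinyDataset(["x", "y", "z"], lazy_init=True)
before = dataset._fully_initialized  # nothing loaded yet
size = len(dataset)                  # triggers full_init()
after = dataset._fully_initialized
print(before, size, after)
```

With this arrangement, code that only needs meta information never pays for loading annotations, while any method that touches the data initializes the dataset on first use.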
- get_cat_ids(idx)[source]
Get category ids by index. ClassBalancedDataset requires the dataset it wraps to implement this method.
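One way a subclass could implement this for annotations in the format shown at the top of the page. The sketch assumes that each instance's `bbox_label` is its category id, and it takes `data_list` as an explicit argument so that it runs standalone.

```python
def get_cat_ids(data_list, idx):
    """Return the category ids of all instances in sample `idx`.

    Assumption: `bbox_label` stores the category id of each instance.
    """
    instances = data_list[idx].get("instances", [])
    return [int(inst["bbox_label"]) for inst in instances]


data_list = [
    {
        "img_path": "test_img.jpg",
        "instances": [
            {"bbox": [0, 0, 10, 20], "bbox_label": 1},
            {"bbox": [10, 10, 110, 120], "bbox_label": 2},
        ],
    },
    {"img_path": "empty.jpg", "instances": []},
]
cat_ids = get_cat_ids(data_list, 0)
print(cat_ids)
```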
- get_data_info(idx)[source]
Get annotation by index, automatically calling full_init if the dataset has not been fully initialized.
- get_subset(indices)[source]
Return a subset of dataset.
This method returns a subset of the original dataset. If indices is an int, get_subset returns a sub-dataset containing the first or last few data items, depending on whether indices is positive or negative. If indices is a sequence of int, the sub-dataset contains the data information at the given indices.
Examples
>>> dataset = BaseDataset('path/to/ann_file')
>>> len(dataset)
100
>>> subdataset = dataset.get_subset(90)
>>> len(subdataset)
90
>>> # If indices is a list, extract the data information at the
>>> # corresponding indices.
>>> subdataset = dataset.get_subset([0, 1, 2, 3, 4, 5, 6, 7,
...                                  8, 9])
>>> len(subdataset)
10
>>> subdataset = dataset.get_subset(-3)
>>> len(subdataset)  # Get the last few data items.
3
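The index semantics can be reproduced on a plain list. `select_subset` is a hypothetical helper that imitates the rules described above; it is not how mmengine stores or slices its data internally.

```python
from collections.abc import Sequence


def select_subset(data_list, indices):
    """Pick items from data_list following the get_subset index rules."""
    if isinstance(indices, int):
        if indices >= 0:
            return data_list[:indices]   # first `indices` items
        return data_list[indices:]       # last `-indices` items
    if isinstance(indices, Sequence):
        return [data_list[i] for i in indices]
    raise TypeError("indices should be an int or a sequence of int")


data = list(range(100))
first_90 = select_subset(data, 90)
picked = select_subset(data, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
last_3 = select_subset(data, -3)
print(len(first_90), len(picked), last_3)
```

The function returns a new list and leaves `data` untouched, which corresponds to get_subset; an in-place variant would assign the result back to the dataset's own storage.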
- get_subset_(indices)[source]
The in-place version of get_subset, which converts the dataset to a subset of the original dataset.
This method converts the original dataset into a subset in place. If indices is an int, the dataset keeps only the first or last few data items, depending on whether indices is positive or negative. If indices is a sequence of int, the dataset keeps the data information at the given indices.
Examples
>>> dataset = BaseDataset('path/to/ann_file')
>>> len(dataset)
100
>>> dataset.get_subset_(90)
>>> len(dataset)
90
>>> # If indices is a sequence, extract the data information at the
>>> # corresponding indices.
>>> dataset.get_subset_([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> len(dataset)
10
>>> dataset.get_subset_(-3)
>>> len(dataset)  # Get the last few data items.
3
- load_data_list()[source]
Load annotations from the annotation file named self.ann_file.
If the annotation file does not follow the OpenMMLab 2.0 dataset format, the subclass must override this method to load annotations. The meta information of the annotation file will be overwritten by METAINFO and the metainfo argument of the constructor.
- property metainfo: dict
Get meta information of dataset.
- Returns:
Meta information collected from BaseDataset.METAINFO, the annotation file and the metainfo argument during instantiation.
- Return type:
dict
- parse_data_info(raw_data_info)[source]
Parse raw annotation to target format.
This method should return a dict or a list of dicts. Each dict contains the data information of a training sample. If the protocol of the sample annotations changes, this function can be overridden to update the parsing logic while keeping compatibility.
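To show the "list of dicts" return form, the sketch below splits one raw record into one sample per instance. The per-instance split is a hypothetical parsing rule chosen for illustration, not the default behaviour, and the function is written standalone rather than as a method.

```python
def parse_data_info(raw_data_info):
    """Turn one raw record into a list with one sample dict per instance."""
    samples = []
    for inst in raw_data_info.get("instances", []):
        samples.append({
            "img_path": raw_data_info["img_path"],
            "bbox": inst["bbox"],
            "bbox_label": inst["bbox_label"],
        })
    return samples


raw_data_info = {
    "img_path": "test_img.jpg",
    "height": 604,
    "width": 640,
    "instances": [
        {"bbox": [0, 0, 10, 20], "bbox_label": 1},
        {"bbox": [10, 10, 110, 120], "bbox_label": 2},
    ],
}
parsed = parse_data_info(raw_data_info)
print(len(parsed), parsed[0]["bbox_label"], parsed[1]["bbox_label"])
```

Returning a single dict instead would keep the one-record-per-sample layout of the annotation file.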