Migrate Data Transform to OpenMMLab 2.0
Introduction
According to the data transform interface convention of TorchVision, all data transform classes need to implement the __call__ method. In the convention of OpenMMLab 1.0, we additionally require that both the input and the output of the __call__ method be a dictionary.
In OpenMMLab 2.0, to make the data transform classes more extensible, we implement the data transformation in the transform method instead of the __call__ method, and all data transform classes should inherit from the mmcv.transforms.BaseTransform class. You can still apply these data transform classes by calling their instances directly.
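The calling convention described above can be sketched in a few lines. This is a minimal sketch, not MMCV's source: the BaseTransform below is a simplified stand-in written for illustration so that the example runs without MMCV installed, and AddOffset is a hypothetical transform.

```python
from abc import ABC, abstractmethod


class BaseTransform(ABC):
    """Simplified stand-in for mmcv.transforms.BaseTransform (assumption)."""

    def __call__(self, results):
        # Calling the instance simply delegates to transform().
        return self.transform(results)

    @abstractmethod
    def transform(self, results):
        """Process the results dictionary and return it."""


class AddOffset(BaseTransform):
    """Hypothetical transform: add a constant to every value in results['img']."""

    def __init__(self, offset):
        self.offset = offset

    def transform(self, results):
        # Input and output are both dictionaries, as in OpenMMLab 1.0.
        results["img"] = [value + self.offset for value in results["img"]]
        return results


step = AddOffset(offset=10)
out = step({"img": [1, 2, 3]})  # used by calling, although only transform() is defined
print(out["img"])  # [11, 12, 13]
```

With the real library, the subclass would import BaseTransform from mmcv.transforms instead of defining the stand-in.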
A tutorial on implementing a data transform class can be found in the Data Transform document.
In addition, we have moved some common data transform classes from the individual repositories to MMCV. In this document, we compare the functionality, usage and implementation of the original data transform classes (in MMClassification v0.23.2 and MMDetection v2.25.1) with those of the new data transform classes (in MMCV v2.0.0rc1).
Functionality Differences
| | MMClassification (original) | MMDetection (original) | MMCV (new) |
|---|---|---|---|
| LoadImageFromFile | Join the 'img_prefix' and 'img_info.filename' fields to find the image path, then load the image. | Join the 'img_prefix' and 'img_info.filename' fields to find the image path, then load the image. Support specifying the order of channels. | Load images from 'img_path'. Support ignoring failed loading and specifying the decode backend. |
| LoadAnnotations | Not available. | Load bbox, label, mask (including polygon masks), semantic segmentation. Support converting the bbox coordinate system. | Load bbox, label, mask (not including polygon masks), semantic segmentation. |
| Pad | Pad all images in the "img_fields" field. | Pad all images in the "img_fields" field. Support padding to an integer multiple size. | Pad the image in the "img" field. Support padding to an integer multiple size. |
| CenterCrop | Crop all images in the "img_fields" field. Support EfficientNet-style cropping. | Not available. | Crop the image in the "img" field, the bbox in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field, the keypoints in the "gt_keypoints" field. Support padding the margin of the cropped image. |
| Normalize | Normalize the image. | No differences. | No differences, but we recommend using the data preprocessor to normalize the image. |
| Resize | Resize all images in the "img_fields" field. Support resizing proportionally according to the specified edge. | Use Resize with ratio_range=None, a single scale in img_scale, and multiscale_mode="value". | Resize the image in the "img" field, the bbox in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field, the keypoints in the "gt_keypoints" field. Support specifying the ratio of the new scale to the original scale, and support resizing proportionally. |
| RandomResize | Not available. | Use Resize with ratio_range=None, two scales in img_scale and multiscale_mode="range", or with ratio_range not None. Example: Resize(img_scale=[(640, 480), (960, 720)], multiscale_mode="range") | Has the same resize function as Resize. Support sampling the scale from a scale range or a scale ratio range. Example: RandomResize(scale=[(640, 480), (960, 720)]) |
| RandomChoiceResize | Not available. | Use Resize with ratio_range=None, multiple scales in img_scale, and multiscale_mode="value". Example: Resize(img_scale=[(640, 480), (960, 720)], multiscale_mode="value") | Has the same resize function as Resize. Support randomly choosing the scale from multiple scales or multiple scale ratios. Example: RandomChoiceResize(scales=[(640, 480), (960, 720)]) |
| RandomGrayscale | Randomly grayscale all images in the "img_fields" field. Support keeping channels after grayscale. | Not available. | Randomly grayscale the image in the "img" field. Support specifying the weight of each channel, and support keeping channels after grayscale. |
| RandomFlip | Randomly flip all images in the "img_fields" field. Support flipping horizontally and vertically. | Randomly flip all values in the "img_fields", "bbox_fields", "mask_fields" and "seg_fields". Support flipping horizontally, vertically and diagonally, and support specifying the probability of each kind of flipping. | Randomly flip the values in the "img", "gt_bboxes", "gt_seg_map" and "gt_keypoints" fields. Support flipping horizontally, vertically and diagonally, and support specifying the probability of each kind of flipping. |
| MultiScaleFlipAug | Not available. | Used for test-time augmentation. | Use TestTimeAug. |
| ToTensor | Convert the values in the specified fields to torch.Tensor. | No differences. | No differences. |
| ImageToTensor | Convert the values in the specified fields to torch.Tensor and transpose the channels to CHW. | No differences. | No differences. |
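The Resize, RandomResize and RandomChoiceResize rows above amount to a small set of rules for rewriting an MMDetection-style Resize config. The sketch below encodes only those rules; migrate_resize_cfg is a hypothetical helper (it is not part of MMCV or MMDetection), it assumes the original multiscale_mode defaults to "range", and it deliberately ignores every option other than the scale-related ones, which would still need to be migrated by hand.

```python
def migrate_resize_cfg(old_cfg):
    """Hypothetical helper: rewrite the scale options of an old Resize config.

    Only img_scale, ratio_range and multiscale_mode are handled; any other
    key in old_cfg is dropped and must be reviewed manually.
    """
    img_scale = old_cfg["img_scale"]
    ratio_range = old_cfg.get("ratio_range")
    mode = old_cfg.get("multiscale_mode", "range")  # assumed default

    # Normalise a single (w, h) tuple to a one-element list of scales.
    scales = list(img_scale) if isinstance(img_scale, list) else [img_scale]

    if ratio_range is not None:
        # A scale ratio range maps to RandomResize with ratio_range.
        return dict(type="RandomResize", scale=scales[0], ratio_range=ratio_range)
    if len(scales) == 1:
        # A single fixed scale maps to the plain Resize.
        return dict(type="Resize", scale=scales[0])
    if mode == "range" and len(scales) == 2:
        # Two scales in "range" mode map to RandomResize.
        return dict(type="RandomResize", scale=scales)
    if mode == "value":
        # Several scales in "value" mode map to RandomChoiceResize.
        return dict(type="RandomChoiceResize", scales=scales)
    raise ValueError(f"Combination not covered by the table: {old_cfg!r}")


old = dict(type="Resize", img_scale=[(640, 480), (960, 720)], multiscale_mode="value")
print(migrate_resize_cfg(old))
# {'type': 'RandomChoiceResize', 'scales': [(640, 480), (960, 720)]}
```

Raising on uncovered combinations is a deliberate choice here, so that a config the table does not describe is never rewritten silently.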
Implementation Differences
Take RandomFlip as an example. The new RandomFlip in MMCV inherits from BaseTransform and moves the functionality implementation from the __call__ method to the transform method. In addition, the randomness-related code is placed in separate methods, and these methods need to be wrapped by the cache_randomness decorator.
MMDetection (original version)
    class RandomFlip:
        def __call__(self, results):
            """Randomly flip images."""
            ...
            # Randomly choose the flip direction
            cur_dir = np.random.choice(direction_list, p=flip_ratio_list)
            ...
            return results
MMCV (new version)
    class RandomFlip(BaseTransform):
        def transform(self, results):
            """Randomly flip images"""
            ...
            cur_dir = self._random_direction()
            ...
            return results

        @cache_randomness
        def _random_direction(self):
            """Randomly choose the flip direction"""
            ...
            return np.random.choice(direction_list, p=flip_ratio_list)
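One benefit of isolating the random choice in its own decorated method is that its return value can be cached and reused, for example to apply the same flip direction to several inputs. The following is a toy sketch of that idea only: the decorator body, the _cache_enabled switch and the ToyRandomFlip class are assumptions made for illustration and are not MMCV's implementation.

```python
import functools
import random


def cache_randomness(func):
    """Toy stand-in for a randomness-caching decorator (not MMCV's code).

    While self._cache_enabled is true, the first return value is stored on
    the instance and reused; otherwise the method runs normally every time.
    """
    key = f"_cached_{func.__name__}"

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if not getattr(self, "_cache_enabled", False):
            return func(self, *args, **kwargs)
        if not hasattr(self, key):
            setattr(self, key, func(self, *args, **kwargs))
        return getattr(self, key)

    return wrapper


class ToyRandomFlip:
    """Toy transform: flip a list-of-rows image in a random direction."""

    def __init__(self, directions=("horizontal", "vertical"), seed=None):
        self.directions = list(directions)
        self._rng = random.Random(seed)

    def transform(self, results):
        cur_dir = self._random_direction()
        img = results["img"]
        if cur_dir == "horizontal":
            results["img"] = [row[::-1] for row in img]  # reverse each row
        else:
            results["img"] = img[::-1]  # reverse the row order
        results["flip_direction"] = cur_dir
        return results

    @cache_randomness
    def _random_direction(self):
        """Randomly choose the flip direction."""
        return self._rng.choice(self.directions)


flip = ToyRandomFlip(seed=0)
flip._cache_enabled = True  # toy switch: every call below reuses one direction
frames = [{"img": [[1, 2], [3, 4]]} for _ in range(3)]
directions = [flip.transform(frame)["flip_direction"] for frame in frames]
print(len(set(directions)))  # 1
```

Because transform never draws random numbers itself, the sharing behaviour can be changed entirely inside the decorator without touching the flipping logic.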