Migrate Data Transform to OpenMMLab 2.0

Introduction

According to the data transform interface convention of TorchVision, all data transform classes need to implement the __call__ method. In OpenMMLab 1.0, we additionally require that both the input and the output of the __call__ method be a dictionary.

In OpenMMLab 2.0, to make data transform classes more extensible, we implement data transformation in the transform method instead of the __call__ method, and all data transform classes should inherit from the mmcv.transforms.BaseTransform class. You can still use these data transform classes by calling an instance directly.
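The relationship between calling an instance and the transform method can be illustrated with a small dependency-free sketch. BaseTransformSketch and AddOne are hypothetical names invented for this illustration; the stand-in base class only mimics the pattern described above and is not MMCV's actual implementation.

```python
from abc import ABC, abstractmethod


class BaseTransformSketch(ABC):
    """Simplified stand-in for mmcv.transforms.BaseTransform (assumption, not the real class)."""

    def __call__(self, results):
        # Calling the instance simply forwards to transform().
        return self.transform(results)

    @abstractmethod
    def transform(self, results):
        """Subclasses put their transformation logic here."""


class AddOne(BaseTransformSketch):
    """Toy transform: takes a dict and returns a dict, as the convention requires."""

    def transform(self, results):
        results['value'] = results['value'] + 1
        return results


# Both styles run the same logic.
called = AddOne()({'value': 1})
direct = AddOne().transform({'value': 1})
```

With a real MMCV transform, the subclass would inherit from mmcv.transforms.BaseTransform instead of the stand-in.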

A tutorial on implementing a data transform class can be found in the Data Transform documentation.

In addition, we have moved some common data transform classes from the individual repositories to MMCV. In this document, we compare the functionality, usage and implementation of the original data transform classes (in MMClassification v0.23.2 and MMDetection v2.25.1) with those of the new data transform classes (in MMCV v2.0.0rc1).

Functionality Differences

Each transform below is compared across MMClassification (original), MMDetection (original) and MMCV (new).

LoadImageFromFile
  • MMClassification (original): Join the 'img_prefix' and 'img_info.filename' fields to find the image path, then load the image.
  • MMDetection (original): Join the 'img_prefix' and 'img_info.filename' fields to find the image path, then load the image. Supports specifying the order of channels.
  • MMCV (new): Load the image from 'img_path'. Supports ignoring failed loading and specifying the decode backend.

LoadAnnotations
  • MMClassification (original): Not available.
  • MMDetection (original): Load bboxes, labels, masks (including polygon masks) and semantic segmentation. Supports converting the bbox coordinate system.
  • MMCV (new): Load bboxes, labels, masks (excluding polygon masks) and semantic segmentation.

Pad
  • MMClassification (original): Pad all images in the "img_fields" field.
  • MMDetection (original): Pad all images in the "img_fields" field. Supports padding to an integer multiple of a size.
  • MMCV (new): Pad the image in the "img" field. Supports padding to an integer multiple of a size.

CenterCrop
  • MMClassification (original): Crop all images in the "img_fields" field. Supports EfficientNet-style cropping.
  • MMDetection (original): Not available.
  • MMCV (new): Crop the image in the "img" field, the bboxes in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field and the keypoints in the "gt_keypoints" field. Supports padding the margin of the cropped image.

Normalize
  • MMClassification (original): Normalize the image.
  • MMDetection (original): No differences.
  • MMCV (new): No differences, but we recommend using the data preprocessor to normalize the image.

Resize
  • MMClassification (original): Resize all images in the "img_fields" field. Supports resizing proportionally according to the specified edge.
  • MMDetection (original): Use Resize with ratio_range=None, a single scale in img_scale, and multiscale_mode="value".
  • MMCV (new): Resize the image in the "img" field, the bboxes in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field and the keypoints in the "gt_keypoints" field. Supports specifying the ratio of the new scale to the original scale, and supports resizing proportionally.

RandomResize
  • MMClassification (original): Not available.
  • MMDetection (original): Use Resize with ratio_range=None, two scales in img_scale and multiscale_mode="range", or with ratio_range not None.
      Resize(
          img_scale=[(640, 480), (960, 720)],
          multiscale_mode="range",
      )
  • MMCV (new): Has the same resize functionality as Resize. Supports sampling the scale from a scale range or a scale ratio range.
      RandomResize(scale=[(640, 480), (960, 720)])

RandomChoiceResize
  • MMClassification (original): Not available.
  • MMDetection (original): Use Resize with ratio_range=None, multiple scales in img_scale, and multiscale_mode="value".
      Resize(
          img_scale=[(640, 480), (960, 720)],
          multiscale_mode="value",
      )
  • MMCV (new): Has the same resize functionality as Resize. Supports randomly choosing the scale from multiple scales or multiple scale ratios.
      RandomChoiceResize(scales=[(640, 480), (960, 720)])

RandomGrayscale
  • MMClassification (original): Randomly grayscale all images in the "img_fields" field. Supports keeping the channels after grayscaling.
  • MMDetection (original): Not available.
  • MMCV (new): Randomly grayscale the image in the "img" field. Supports specifying the weight of each channel and keeping the channels after grayscaling.

RandomFlip
  • MMClassification (original): Randomly flip all images in the "img_fields" field. Supports flipping horizontally and vertically.
  • MMDetection (original): Randomly flip all values in the "img_fields", "bbox_fields", "mask_fields" and "seg_fields" fields. Supports flipping horizontally, vertically and diagonally, and supports specifying the probability of each kind of flipping.
  • MMCV (new): Randomly flip the values in the "img", "gt_bboxes", "gt_seg_map" and "gt_keypoints" fields. Supports flipping horizontally, vertically and diagonally, and supports specifying the probability of each kind of flipping.

MultiScaleFlipAug
  • MMClassification (original): Not available.
  • MMDetection (original): Used for test-time augmentation.
  • MMCV (new): Use TestTimeAug.

ToTensor
  • MMClassification (original): Convert the values in the specified fields to torch.Tensor.
  • MMDetection (original): No differences.
  • MMCV (new): No differences.

ImageToTensor
  • MMClassification (original): Convert the values in the specified fields to torch.Tensor and transpose the channels to CHW.
  • MMDetection (original): No differences.
  • MMCV (new): No differences.
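The resize-related rows of the comparison above amount to a mapping from one MMDetection Resize config to three MMCV transforms. The following sketch encodes that mapping on plain config dictionaries. migrate_resize_cfg is a hypothetical helper written for this illustration, not part of MMCV or MMDetection. It assumes img_scale is always given, it treats a missing multiscale_mode as "range", and it ignores other keys such as keep_ratio.

```python
def migrate_resize_cfg(old_cfg):
    """Hypothetical helper: map an old MMDetection ``Resize`` config dict to
    the MMCV transform config suggested by the comparison above."""
    scales = old_cfg['img_scale']
    if isinstance(scales, tuple):
        # A single (w, h) tuple is treated as a one-element list.
        scales = [scales]
    ratio_range = old_cfg.get('ratio_range')
    mode = old_cfg.get('multiscale_mode', 'range')

    if ratio_range is not None:
        # Scale sampled from a ratio range applied to one base scale.
        return dict(type='RandomResize', scale=scales[0], ratio_range=ratio_range)
    if len(scales) == 1:
        # One fixed scale: nothing to sample (assumption: mode is irrelevant here).
        return dict(type='Resize', scale=scales[0])
    if mode == 'range':
        if len(scales) != 2:
            raise ValueError('range mode expects exactly two scales')
        return dict(type='RandomResize', scale=scales)
    if mode == 'value':
        return dict(type='RandomChoiceResize', scales=scales)
    raise ValueError(f'unsupported multiscale_mode: {mode!r}')


old = dict(type='Resize',
           img_scale=[(640, 480), (960, 720)],
           multiscale_mode='range')
new = migrate_resize_cfg(old)
```

Treat the result as a starting point and check the remaining arguments against the MMCV documentation before using it in a real pipeline.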

Implementation Differences

Take RandomFlip as an example: the new RandomFlip in MMCV inherits from BaseTransform and moves the implementation from the __call__ method to the transform method. In addition, the randomness-related code is placed in separate methods, and these methods need to be wrapped by the cache_randomness decorator.

  • MMDetection (original version)

import numpy as np


class RandomFlip:
    def __call__(self, results):
        """Randomly flip images."""
        ...
        # Randomly choose the flip direction
        cur_dir = np.random.choice(direction_list, p=flip_ratio_list)
        ...
        return results
  • MMCV (new version)

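import numpy as np
from mmcv.transforms import BaseTransform
from mmcv.transforms.utils import cache_randomness


class RandomFlip(BaseTransform):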
    def transform(self, results):
        """Randomly flip images"""
        ...
        cur_dir = self._random_direction()
        ...
        return results

    @cache_randomness
    def _random_direction(self):
        """Randomly choose the flip direction"""
        ...
        return np.random.choice(direction_list, p=flip_ratio_list)
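To make the structure of the new version concrete, here is a dependency-free sketch that follows the same layout: transform holds the deterministic logic and a separate method draws the random value. ToyRandomFlip, its prob and rng parameters, and the 'flip_direction' key are assumptions made for this illustration. The BaseTransform parent and the cache_randomness decorator are deliberately left out so the sketch runs without MMCV.

```python
import random


class ToyRandomFlip:
    """Stand-in that mirrors the layout of the new RandomFlip (not MMCV code).

    In MMCV the class would inherit BaseTransform and _random_direction
    would be wrapped by the cache_randomness decorator.
    """

    def __init__(self, prob=0.5, rng=None):
        self.prob = prob
        self.rng = rng or random.Random()

    def _random_direction(self):
        # The only place where randomness is drawn.
        return 'horizontal' if self.rng.random() < self.prob else None

    def transform(self, results):
        direction = self._random_direction()
        if direction == 'horizontal':
            # Reverse every row of a nested-list "image".
            results['img'] = [row[::-1] for row in results['img']]
        results['flip_direction'] = direction
        return results


flipped = ToyRandomFlip(prob=1.0).transform({'img': [[1, 2, 3], [4, 5, 6]]})
kept = ToyRandomFlip(prob=0.0).transform({'img': [[1, 2, 3]]})
```

Keeping the random draw in one method means the rest of transform stays deterministic once the direction is known, which makes the flipping logic easy to test in isolation.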