Migrate Data Transform to OpenMMLab 2.0

Introduction

Following the data transform interface convention of TorchVision, all data transform classes need to implement the __call__ method. In addition, the OpenMMLab 1.0 convention requires that both the input and the output of the __call__ method be a dictionary.

In OpenMMLab 2.0, to make the data transform classes more extensible, we use the transform method instead of the __call__ method to implement data transformation, and all data transform classes should inherit from the mmcv.transforms.BaseTransform class. You can still use these data transform classes by calling their instances directly.
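The relationship between calling a transform and its transform method can be sketched without installing MMCV. In the snippet below, `ToyBaseTransform` is a simplified stand-in written for this example (an assumption about the general shape of `mmcv.transforms.BaseTransform`, not its actual source), and `AddScale` is a hypothetical transform.

```python
from abc import ABC, abstractmethod


class ToyBaseTransform(ABC):
    """Simplified stand-in for mmcv.transforms.BaseTransform (assumption)."""

    def __call__(self, results: dict) -> dict:
        # Calling the instance simply delegates to transform().
        return self.transform(results)

    @abstractmethod
    def transform(self, results: dict) -> dict:
        """Process the results dictionary and return it."""


class AddScale(ToyBaseTransform):
    """Hypothetical transform that records a target scale in the results."""

    def __init__(self, scale):
        self.scale = scale

    def transform(self, results: dict) -> dict:
        results["scale"] = self.scale
        return results


step = AddScale(scale=(640, 480))
# Calling the instance has the same effect as step.transform(...).
output = step({"img_path": "demo.jpg"})
print(output)
```

With the real library, a subclass would inherit from `mmcv.transforms.BaseTransform` instead of the stand-in and only implement `transform`.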

A tutorial on implementing a data transform class can be found in Data Transform.

In addition, we have moved some common data transform classes from the individual repositories to MMCV. In this document, we compare the functionality, usage and implementation of the original data transform classes (in MMClassification v0.23.2 and MMDetection v2.25.1) and the new data transform classes (in MMCV v2.0.0rc1).

Functionality Differences

	MMClassification (original)	MMDetection (original)	MMCV (new)
`LoadImageFromFile`	Join the 'img_prefix' and 'img_info.filename' fields to build the image path, then load the image.	Join the 'img_prefix' and 'img_info.filename' fields to build the image path, then load the image. Support specifying the order of channels.	Load the image from 'img_path'. Support ignoring failed loading and specifying the decode backend.
`LoadAnnotations`	Not available.	Load bbox, label, mask (including polygon masks), semantic segmentation. Support converting the bbox coordinate system.	Load bbox, label, mask (excluding polygon masks), semantic segmentation.
`Pad`	Pad all images in the "img_fields" field.	Pad all images in the "img_fields" field. Support padding to integer multiple size.	Pad the image in the "img" field. Support padding to integer multiple size.
`CenterCrop`	Crop all images in the "img_fields" field. Support cropping as EfficientNet style.	Not available.	Crop the image in the "img" field, the bbox in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field, the keypoints in the "gt_keypoints" field. Support padding the margin of the cropped image.
`Normalize`	Normalize the image.	No differences.	No differences, but we recommend using the data preprocessor to normalize the image.
`Resize`	Resize all images in the "img_fields" field. Support resizing proportionally according to the specified edge.	Use `Resize` with `ratio_range=None`, a single scale in `img_scale`, and `multiscale_mode="value"`.	Resize the image in the "img" field, the bbox in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field, the keypoints in the "gt_keypoints" field. Support specifying the ratio of new scale to original scale and support resizing proportionally.
`RandomResize`	Not available.	Use `Resize` with `ratio_range=None`, two scales in `img_scale` and `multiscale_mode="range"`, or with `ratio_range` not None. Resize(img_scale=[(640, 480), (960, 720)], multiscale_mode="range")	Have the same resize function as `Resize`. Support sampling the scale from a scale range or scale ratio range. RandomResize(scale=[(640, 480), (960, 720)])
`RandomChoiceResize`	Not available.	Use `Resize` with `ratio_range=None`, multiple scales in `img_scale`, and `multiscale_mode="value"`. Resize(img_scale=[(640, 480), (960, 720)], multiscale_mode="value")	Have the same resize function as `Resize`. Support randomly choosing the scale from multiple scales or multiple scale ratios. RandomChoiceResize(scales=[(640, 480), (960, 720)])
`RandomGrayscale`	Randomly grayscale all images in the "img_fields" field. Support keeping channels after grayscale.	Not available	Randomly grayscale the image in the "img" field. Support specifying the weight of each channel, and support keeping channels after grayscale.
`RandomFlip`	Randomly flip all images in the "img_fields" field. Support flipping horizontally and vertically.	Randomly flip all values in the "img_fields", "bbox_fields", "mask_fields" and "seg_fields" fields. Support flipping horizontally, vertically and diagonally, and support specifying the probability of each kind of flipping.	Randomly flip the values in the "img", "gt_bboxes", "gt_seg_map" and "gt_keypoints" fields. Support flipping horizontally, vertically and diagonally, and support specifying the probability of each kind of flipping.
`MultiScaleFlipAug`	Not available	Used for test-time-augmentation.	Use `TestTimeAug`
`ToTensor`	Convert the values in the specified fields to `torch.Tensor`.	No differences	No differences
`ImageToTensor`	Convert the values in the specified fields to `torch.Tensor` and transpose the channels to CHW.	No differences.	No differences.
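The three resize rows above can be read as a small decision rule for migrating an MMDetection 2.x `Resize` config. The sketch below encodes that rule; `migrate_resize` is a hypothetical helper written for this example and is not part of any OpenMMLab package. It only maps the scale-related arguments, so other arguments (such as `keep_ratio`) are not carried over and should be checked by hand.

```python
def migrate_resize(old_cfg: dict) -> dict:
    """Hypothetical helper: map an MMDetection 2.x ``Resize`` config to the
    corresponding MMCV 2.x transform config, following the table above."""
    img_scale = old_cfg.get("img_scale")
    if img_scale is None:
        raise ValueError("this sketch requires 'img_scale' to be set")
    # Normalize a single (w, h) tuple to a one-element list of scales.
    scales = [img_scale] if isinstance(img_scale, tuple) else list(img_scale)
    ratio_range = old_cfg.get("ratio_range")
    mode = old_cfg.get("multiscale_mode", "range")

    if ratio_range is not None:
        # Scale sampled from a ratio range around a single base scale.
        if len(scales) != 1:
            raise ValueError("ratio_range expects a single base scale")
        return dict(type="RandomResize", scale=scales[0],
                    ratio_range=ratio_range)
    if len(scales) == 1:
        # A single fixed scale: plain Resize.
        return dict(type="Resize", scale=scales[0])
    if mode == "range":
        # Scale sampled between two bounds.
        return dict(type="RandomResize", scale=scales)
    if mode == "value":
        # Scale picked from a list of candidates.
        return dict(type="RandomChoiceResize", scales=scales)
    raise ValueError(f"unsupported multiscale_mode: {mode!r}")


old = dict(type="Resize", img_scale=[(640, 480), (960, 720)],
           multiscale_mode="range")
print(migrate_resize(old))
```

The returned dictionary uses the new argument names from the table (`scale` for `Resize` and `RandomResize`, `scales` for `RandomChoiceResize`).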

Implementation Differences

Take RandomFlip as an example. The new RandomFlip in MMCV inherits from BaseTransform and moves the implementation from the __call__ method to the transform method. In addition, the randomness-related code is placed in separate methods, and these methods need to be wrapped by the cache_randomness decorator.

MMDetection (original version)

import numpy as np


class RandomFlip:
    def __call__(self, results):
        """Randomly flip images."""
        ...
        # Randomly choose the flip direction
        cur_dir = np.random.choice(direction_list, p=flip_ratio_list)
        ...
        return results

MMCV (new version)

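import numpy as np
from mmcv.transforms import BaseTransform
from mmcv.transforms.utils import cache_randomness


class RandomFlip(BaseTransform):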
    def transform(self, results):
        """Randomly flip images"""
        ...
        cur_dir = self._random_direction()
        ...
        return results

    @cache_randomness
    def _random_direction(self):
        """Randomly choose the flip direction"""
        ...
        return np.random.choice(direction_list, p=flip_ratio_list)
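One benefit of isolating random sampling in its own method is that a sampled value can be reused across several calls, for example to flip a group of related images in the same direction. The sketch below imitates that idea with toy code: `toy_cache_randomness`, `toy_cache_random_params` and `ToyRandomFlip` are all hypothetical names written for this example, and none of it is MMCV's actual implementation.

```python
import functools
import random
from contextlib import contextmanager


def toy_cache_randomness(method):
    """Toy stand-in (not MMCV's code): reuse the first sampled value while
    the owning object is in caching mode."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        cache = getattr(self, "_random_cache", None)
        if cache is None:
            # Caching disabled: sample a fresh value on every call.
            return method(self, *args, **kwargs)
        if method.__name__ not in cache:
            cache[method.__name__] = method(self, *args, **kwargs)
        return cache[method.__name__]

    return wrapper


@contextmanager
def toy_cache_random_params(transform):
    """Toy context manager that switches caching on for one transform."""
    transform._random_cache = {}
    try:
        yield transform
    finally:
        transform._random_cache = None


class ToyRandomFlip:
    def __init__(self, directions=("horizontal", "vertical", "diagonal")):
        self.directions = directions

    @toy_cache_randomness
    def _random_direction(self):
        """Randomly choose the flip direction."""
        return random.choice(self.directions)

    def transform(self, results):
        results["flip_direction"] = self._random_direction()
        return results


flip = ToyRandomFlip()
with toy_cache_random_params(flip):
    first = flip.transform({})
    second = flip.transform({})  # reuses the direction sampled above
print(first["flip_direction"] == second["flip_direction"])
```

Outside the `with` block each call samples independently again, which is why only the sampling method, and not the whole `transform` method, is decorated.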