File IO¶

MMEngine implements a unified set of file reading and writing interfaces in fileio module. With the fileio module, we can use the same function to handle different file formats, such as json, yaml and pickle. Other file formats can also be easily extended.

The fileio module also supports reading and writing files from a variety of file storage backends, including disk, Petrel (for internal use), Memcached, LMDB, and HTTP.

Load and dump data¶

MMEngine provides a universal API for loading and dumping data, currently supported formats are json, yaml, and pickle.

Load from disk or dump to disk¶

from mmengine import load, dump

# load data from a file
data = load('test.json')
data = load('test.yaml')
data = load('test.pkl')
# load data from a file-like object
with open('test.json', 'r') as f:
    data = load(f, file_format='json')

# dump data to a string
json_str = dump(data, file_format='json')

# dump data to a file with a filename (infer format from file extension)
dump(data, 'out.pkl')

# dump data to a file with a file-like object
with open('test.yaml', 'w') as f:
    data = dump(data, f, file_format='yaml')

Load from other backends or dump to other backends¶

from mmengine import load, dump

# load data from a file
data = load('s3://bucket-name/test.json')
data = load('s3://bucket-name/test.yaml')
data = load('s3://bucket-name/test.pkl')

# dump data to a file with a filename (infer format from file extension)
dump(data, 's3://bucket-name/out.pkl')

It is also very convenient to extend the API to support more file formats. All you need to do is to write a file handler inherited from BaseFileHandler and register it with one or several file formats.

from mmengine import register_handler, BaseFileHandler

# To register multiple file formats, a list can be used as the argument.
# @register_handler(['txt', 'log'])
@register_handler('txt')
class TxtHandler1(BaseFileHandler):

    def load_from_fileobj(self, file):
        return file.read()

    def dump_to_fileobj(self, obj, file):
        file.write(str(obj))

    def dump_to_str(self, obj, **kwargs):
        return str(obj)

Here is an example of PickleHandler:

from mmengine import BaseFileHandler
import pickle

class PickleHandler(BaseFileHandler):

    def load_from_fileobj(self, file, **kwargs):
        return pickle.load(file, **kwargs)

    def load_from_path(self, filepath, **kwargs):
        return super(PickleHandler, self).load_from_path(
            filepath, mode='rb', **kwargs)

    def dump_to_str(self, obj, **kwargs):
        kwargs.setdefault('protocol', 2)
        return pickle.dumps(obj, **kwargs)

    def dump_to_fileobj(self, obj, file, **kwargs):
        kwargs.setdefault('protocol', 2)
        pickle.dump(obj, file, **kwargs)

    def dump_to_path(self, obj, filepath, **kwargs):
        super(PickleHandler, self).dump_to_path(
            obj, filepath, mode='wb', **kwargs)

Load a text file as a list or dict¶

For example a.txt is a text file with 5 lines.

a
b
c
d
e

Load from disk¶

Use list_from_file to load the list from a.txt:

from mmengine import list_from_file

print(list_from_file('a.txt'))
# ['a', 'b', 'c', 'd', 'e']
print(list_from_file('a.txt', offset=2))
# ['c', 'd', 'e']
print(list_from_file('a.txt', max_num=2))
# ['a', 'b']
print(list_from_file('a.txt', prefix='/mnt/'))
# ['/mnt/a', '/mnt/b', '/mnt/c', '/mnt/d', '/mnt/e']

For example b.txt is a text file with 3 lines.

cat
dog cow
panda

Then use dict_from_file to load the dict from b.txt:

from mmengine import dict_from_file

print(dict_from_file('b.txt'))
# {'1': 'cat', '2': ['dog', 'cow'], '3': 'panda'}
print(dict_from_file('b.txt', key_type=int))
# {1: 'cat', 2: ['dog', 'cow'], 3: 'panda'}

Load from other backends¶

Use list_from_file to load the list from s3://bucket-name/a.txt:

from mmengine import list_from_file

print(list_from_file('s3://bucket-name/a.txt'))
# ['a', 'b', 'c', 'd', 'e']
print(list_from_file('s3://bucket-name/a.txt', offset=2))
# ['c', 'd', 'e']
print(list_from_file('s3://bucket-name/a.txt', max_num=2))
# ['a', 'b']
print(list_from_file('s3://bucket-name/a.txt', prefix='/mnt/'))
# ['/mnt/a', '/mnt/b', '/mnt/c', '/mnt/d', '/mnt/e']

Use dict_from_file to load the dict from s3://bucket-name/b.txt.

from mmengine import dict_from_file

print(dict_from_file('s3://bucket-name/b.txt'))
# {'1': 'cat', '2': ['dog', 'cow'], '3': 'panda'}
print(dict_from_file('s3://bucket-name/b.txt', key_type=int))
# {1: 'cat', 2: ['dog', 'cow'], 3: 'panda'}

Load and dump checkpoints¶

We can read the checkpoints from disk or internet in the following way:

import torch

filepath1 = '/path/of/your/checkpoint1.pth'
filepath2 = 'http://path/of/your/checkpoint3.pth'

# read checkpoints from disk
checkpoint = torch.load(filepath1)
# save checkpoints to disk
torch.save(checkpoint, filepath1)

# read checkpoints from internet
checkpoint = torch.utils.model_zoo.load_url(filepath2)

In MMEngine, reading and writing checkpoints in different storage forms can be uniformly implemented with load_checkpoint and save_checkpoint:

from mmengine import load_checkpoint, save_checkpoint

filepath1 = '/path/of/your/checkpoint1.pth'
filepath2 = 's3://bucket-name/path/of/your/checkpoint1.pth'
filepath3 = 'http://path/of/your/checkpoint3.pth'

# read checkpoints from disk
checkpoint = load_checkpoint(filepath1)
# save checkpoints to disk
save_checkpoint(checkpoint, filepath1)

# read checkpoints from s3
checkpoint = load_checkpoint(filepath2)
# save checkpoints to s3
save_checkpoint(checkpoint, filepath2)

# read checkpoints from internet
checkpoint = load_checkpoint(filepath3)