

torchdata is a PyTorch-oriented library focused on data processing and input pipelines in general.

It extends torch.utils.data.Dataset and equips it with functionality known from tensorflow.data, such as map or cache (with some additions unavailable in the latter). All of this requires minimal interference in the original PyTorch datasets: a single call to super().__init__().

Functionalities overview:

map or apply arbitrary functions to a dataset
cache data in memory or on disk (even partially, say the first 20%)
Full torch.utils.data.IterableDataset and torch.utils.data.Dataset support
Easy creation of custom caching methods, rules for choosing which elements to cache, maps and datasets
Concrete and base classes designed for file reading and other general tasks
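
To make the overview concrete, here is a minimal, library-independent sketch of what map and cache mean for a map-style dataset. It is not torchdata's implementation: SketchDataset, _Mapped, _Cached and Squares are hypothetical names invented for this illustration, and the sketch assumes that each operation simply wraps the previous dataset.

```python
class SketchDataset:
    """Hypothetical stand-in for a dataset base class (not torchdata's)."""

    def __init__(self):
        # Subclasses call super().__init__() once, as in the README.
        pass

    def map(self, function):
        # Return a new dataset that applies `function` to every sample.
        return _Mapped(self, function)

    def cache(self):
        # Return a new dataset that remembers samples after the first read.
        return _Cached(self)


class _Mapped(SketchDataset):
    def __init__(self, source, function):
        super().__init__()
        self.source = source
        self.function = function

    def __getitem__(self, index):
        return self.function(self.source[index])

    def __len__(self):
        return len(self.source)


class _Cached(SketchDataset):
    def __init__(self, source):
        super().__init__()
        self.source = source
        self.store = {}  # index -> already computed sample

    def __getitem__(self, index):
        if index not in self.store:
            self.store[index] = self.source[index]
        return self.store[index]

    def __len__(self):
        return len(self.source)


class Squares(SketchDataset):
    """Toy dataset that counts how often the underlying data is read."""

    def __init__(self, length):
        super().__init__()
        self.length = length
        self.reads = 0

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        self.reads += 1
        return index * index

    def __len__(self):
        return self.length


squares = Squares(5)
dataset = squares.map(lambda x: x + 1).cache()
first_pass = [dataset[i] for i in range(len(dataset))]
second_pass = [dataset[i] for i in range(len(dataset))]
```

In this sketch the second pass is served entirely from the cache, so squares.reads stays at 5 after both passes.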

Quick examples

Create an image dataset, convert it to tensors, cache it, and concatenate it with smoothed labels:

import torchdata
import torchvision
from PIL import Image


# Example dataset that returns 1 as the label for every index
class Labels(torchdata.Dataset):
    def __init__(self, length):
        self.length = length
        super().__init__()

    def __getitem__(self, _):
        return 1

    def __len__(self):
        return self.length


# Convenience class based on torchdata.Dataset
class ImageDataset(torchdata.Files):
    def __getitem__(self, index):
        return Image.open(self.files[index])


images = (
    ImageDataset.from_folder("./data").map(torchvision.transforms.ToTensor()).cache()
)

smoothed_labels = Labels(len(images)).map(lambda label: label - 0.1)

# That's how you concatenate sample-wise
for image, label in images | smoothed_labels:
    pass
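
The sample-wise concatenation with | used above can be pictured as zipping two datasets index by index. The following is only a sketch under that assumption, written in plain Python: SketchDataset and Zipped are hypothetical names, and truncating to the shorter dataset is a choice made for this illustration, not a documented torchdata behavior.

```python
class SketchDataset:
    """Hypothetical list-backed dataset used only for this illustration."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __getitem__(self, index):
        return self.samples[index]

    def __len__(self):
        return len(self.samples)

    def __or__(self, other):
        # `a | b` builds a dataset whose samples are (a[i], b[i]) tuples.
        return Zipped(self, other)


class Zipped(SketchDataset):
    def __init__(self, *datasets):
        self.datasets = datasets

    def __getitem__(self, index):
        # Raising IndexError past the end lets a plain `for` loop terminate.
        if not 0 <= index < len(self):
            raise IndexError(index)
        return tuple(dataset[index] for dataset in self.datasets)

    def __len__(self):
        # Assumption for this sketch: stop at the shortest dataset.
        return min(len(dataset) for dataset in self.datasets)


images = SketchDataset(["a", "b", "c"])
labels = SketchDataset([1, 1, 1])

pairs = []
for image, label in images | labels:
    pairs.append((image, label))
```

Because Zipped only defines __getitem__ and __len__, the for loop relies on Python's fallback iteration protocol, which calls __getitem__ with 0, 1, 2, ... until IndexError is raised.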

Cache the first 1000 samples in memory and save the rest on disk in the ./cache folder:

images = (
    ImageDataset.from_folder("./data").map(torchvision.transforms.ToTensor())
    # First 1000 samples in memory
    .cache(torchdata.modifiers.UpToIndex(torchdata.cachers.Memory(), 1000))
    # Samples from index 1000 to the end are saved on disk with Pickle
    .cache(torchdata.modifiers.FromIndex(torchdata.cachers.Pickle("./cache"), 1000))
    # You can define your own cachers and modifiers, see docs
)
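
The split between an in-memory cache and an on-disk cache can be sketched without the library as follows. This is an illustration of the idea only: MemoryCacher, PickleCacher, IndexRange and cached_get are hypothetical names that loosely mirror the roles of the cachers and index modifiers used above, and they do not reproduce torchdata's actual classes or signatures.

```python
import pickle
import tempfile
from pathlib import Path


class MemoryCacher:
    """Hypothetical cacher keeping samples in a dict."""

    def __init__(self):
        self.store = {}

    def __contains__(self, index):
        return index in self.store

    def __getitem__(self, index):
        return self.store[index]

    def __setitem__(self, index, sample):
        self.store[index] = sample


class PickleCacher:
    """Hypothetical cacher writing one pickle file per sample."""

    def __init__(self, folder):
        self.folder = Path(folder)
        self.folder.mkdir(parents=True, exist_ok=True)

    def _path(self, index):
        return self.folder / f"{index}.pkl"

    def __contains__(self, index):
        return self._path(index).exists()

    def __getitem__(self, index):
        with open(self._path(index), "rb") as file:
            return pickle.load(file)

    def __setitem__(self, index, sample):
        with open(self._path(index), "wb") as file:
            pickle.dump(sample, file)


class IndexRange:
    """Hypothetical modifier: use `cacher` only for start <= index < stop."""

    def __init__(self, cacher, start, stop):
        self.cacher = cacher
        self.start = start
        self.stop = stop

    def accepts(self, index):
        return self.start <= index < self.stop


def cached_get(load, index, modifiers):
    # Use the first modifier that accepts the index; otherwise load directly.
    for modifier in modifiers:
        if modifier.accepts(index):
            if index not in modifier.cacher:
                modifier.cacher[index] = load(index)
            return modifier.cacher[index]
    return load(index)


with tempfile.TemporaryDirectory() as folder:
    memory = MemoryCacher()
    disk = PickleCacher(folder)
    # First 3 samples in memory, the rest pickled to disk.
    modifiers = [IndexRange(memory, 0, 3), IndexRange(disk, 3, float("inf"))]

    loads = []  # records every call to the expensive loader

    def load(index):
        loads.append(index)
        return index * 10

    values = [cached_get(load, i, modifiers) for i in range(5)]
    values_again = [cached_get(load, i, modifiers) for i in range(5)]
    in_memory = sorted(memory.store)
    on_disk = sorted(path.name for path in Path(folder).iterdir())
```

Each index is loaded only once; later reads come either from the dict or from the pickle files, depending on which range the index falls into.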

To see what else you can do, please check the torchdata documentation.

Installation

pip

Latest release:

pip install --user torchdata

Nightly:

pip install --user torchdata-nightly

Docker

CPU standalone and various versions of GPU-enabled images are available on Docker Hub.

For CPU quickstart, issue:

docker pull szymonmaszke/torchdata:18.04

Nightly builds are also available; just prefix the tag with nightly_. If you want a GPU image, make sure you have nvidia/docker installed and its runtime set.

Contributing

If you find any issue, or you think some functionality may be useful to others and fits this library, please open a new issue or create a pull request.

For an overview of what can be done to help this project, see the Roadmap.
