Loading unoptimized parquet dataset throws an error · Issue #591 · Lightning-AI/litData · GitHub
Loading unoptimized parquet dataset throws an error #591
Closed
@karinazad

Description

Hi, I'm trying to use StreamingDataset directly with a parquet dataset, and it raises an error related to caching. Providing a custom cache dir doesn't help. Any pointers on how I can run this?

import litdata as ld

uri = "s3://my-bucket/my-data"

ld.index_parquet_dataset(uri, "index")

ds = ld.StreamingDataset(uri)

    317     if self._item_loader.__class__.__name__ != self._config["item_loader"]:
    318         item_loader = self._config["item_loader"]
--> 319         raise ValueError(f"Please, use Cache(..., item_loader={item_loader}(...))")
    320 else:
    321     if (
    322         len(self._config["data_format"]) == 1
    323         and self._config["data_format"][0].startswith("no_header_tensor")
    324         and not isinstance(self._item_loader, TokensLoader)
    325     ):

ValueError: Please, use Cache(..., item_loader=ParquetLoader(...))
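
The error message points at a possible workaround: pass a ParquetLoader explicitly so the item loader matches the one recorded in the index. Below is a minimal sketch, not confirmed in this thread. It assumes the installed litdata version exposes `ParquetLoader` in `litdata.streaming.item_loader`, that `StreamingDataset` accepts an `item_loader` argument, and that the index has already been written by `ld.index_parquet_dataset`. The helper name `open_parquet_stream` is hypothetical.

```python
def open_parquet_stream(uri: str):
    """Open an already-indexed parquet dataset with an explicit ParquetLoader."""
    # Imported lazily so the helper can be defined without litdata installed.
    import litdata as ld
    from litdata.streaming.item_loader import ParquetLoader  # assumed import path

    # Assumption: ld.index_parquet_dataset(...) was run beforehand for `uri`,
    # so the stored config names ParquetLoader as the expected item loader.
    return ld.StreamingDataset(uri, item_loader=ParquetLoader())


# Example call (requires litdata and access to the bucket):
# ds = open_parquet_stream("s3://my-bucket/my-data")
```

If the ValueError persists with an explicit loader, the mismatch is likely elsewhere, for example a stale index or cache directory.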

Labels: bug, help wanted