Closed
Description
Hi, I'm trying to use StreamingDataset directly with a parquet dataset, and it's giving an error related to caching. Providing a custom cache dir doesn't help. Any pointers on how I can run this?
import litdata as ld
uri = "s3://my-bucket/my-data"
ld.index_parquet_dataset(uri, "index")
ds = ld.StreamingDataset(uri)
317 if self._item_loader.__class__.__name__ != self._config["item_loader"]:
318 item_loader = self._config["item_loader"]
--> 319 raise ValueError(f"Please, use Cache(..., item_loader={item_loader}(...))")
320 else:
321 if (
322 len(self._config["data_format"]) == 1
323 and self._config["data_format"][0].startswith("no_header_tensor")
324 and not isinstance(self._item_loader, TokensLoader)
325 ):
ValueError: Please, use Cache(..., item_loader=ParquetLoader(...))
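The error message appears to say that the index was written for `ParquetLoader`, while the dataset was opened with the default item loader. A minimal sketch of what passing the loader explicitly might look like is below. It assumes `ParquetLoader` can be imported from `litdata.streaming.item_loader` and that `StreamingDataset` accepts an `item_loader` argument; both may differ between litdata versions. The helper name `open_parquet_stream` is hypothetical.

```python
def open_parquet_stream(uri: str, index_dir: str = "index"):
    """Index a parquet dataset and open it with an explicit ParquetLoader.

    Hypothetical helper: the import path and the `item_loader` keyword are
    assumptions inferred from the error message, not confirmed API.
    """
    # Imported inside the function so the sketch can be loaded
    # even where litdata is not installed.
    import litdata as ld
    from litdata.streaming.item_loader import ParquetLoader

    # Same indexing call as in the snippet above.
    ld.index_parquet_dataset(uri, index_dir)

    # Pass the loader named in the ValueError instead of relying on the default.
    return ld.StreamingDataset(uri, item_loader=ParquetLoader())
```

Usage would be `ds = open_parquet_stream("s3://my-bucket/my-data")`, with the same bucket URI as in the original snippet.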