SEN2VENµS, a dataset for the training of Sentinel-2 super-resolution algorithms
Affiliation: CESBIO, Université de Toulouse, CNES, CNRS, INRAE, IRD, UT3
1 Description
SEN2VENµS is an open dataset for the super-resolution of Sentinel-2 images by leveraging simultaneous acquisitions with the VENµS satellite. The dataset is composed of 10m and 20m cloud-free surface reflectance patches from Sentinel-2, with their reference spatially-registered surface reflectance patches at 5 meters resolution acquired on the same day by the VENµS satellite. This dataset covers 29 locations with a total of 132 955 patches of 256x256 pixels at 5 meters resolution, and can be used for the training of super-resolution algorithms to bring spatial resolution of 8 of the Sentinel-2 bands down to 5 meters.
2 Files organization
The dataset is composed of separate sub-datasets, one for each site, as described in table 1.
Site | Number of patches | Number of pairs | VENµS Zenith Angle |
---|---|---|---|
FR-LQ1 | 4888 | 18 | 1.795402 |
NARYN | 3814 | 25 | 5.010906 |
FGMANAUS | 129 | 4 | 7.232127 |
MAD-AMBO | 1443 | 19 | 14.788115 |
ARM | 15859 | 39 | 15.160683 |
BAMBENW2 | 9018 | 34 | 17.766533 |
ES-IC3XG | 8823 | 35 | 18.807686 |
ANJI | 2314 | 16 | 19.310494 |
ATTO | 2258 | 9 | 22.048651 |
ESGISB-3 | 6057 | 19 | 23.683871 |
ESGISB-1 | 2892 | 13 | 24.561609 |
FR-BIL | 7105 | 30 | 24.802892 |
K34-AMAZ | 1385 | 21 | 24.982675 |
ESGISB-2 | 3067 | 13 | 26.209776 |
ALSACE | 2654 | 17 | 26.877071 |
LERIDA-1 | 2281 | 6 | 28.524780 |
ESTUAMAR | 912 | 13 | 28.871947 |
SUDOUE-5 | 2176 | 20 | 29.170244 |
KUDALIAR | 7269 | 20 | 29.180855 |
SUDOUE-6 | 2435 | 14 | 29.192055 |
SUDOUE-4 | 935 | 7 | 29.516127 |
SUDOUE-3 | 5363 | 14 | 29.998115 |
SO1 | 12018 | 36 | 30.255978 |
SUDOUE-2 | 9700 | 27 | 31.295256 |
ES-LTERA | 1701 | 19 | 31.971764 |
FR-LAM | 7299 | 22 | 32.054056 |
SO2 | 738 | 22 | 32.218481 |
BENGA | 5858 | 29 | 32.587334 |
JAM2018 | 2564 | 18 | 33.718953 |
For each site, the sub-dataset folder contains a set of files for each date, named after the pair id, which follows the convention {site_name}_{mgrs_tile}_{acquisition_date}. For each pair, 5 files are available, as shown in table 2. Patches are stored as ready-to-use tensors, serialized with the well-known PyTorch library, so they can be loaded with a simple call to the torch.load() function. Bands are separated into two groups (10m and 20m Sentinel-2 bands), which leads to four separate tensor files (2 groups of bands \(\times\) source and target resolution). Tensor shape is [n,c,w,h], where \(n\) is the number of patches, \(c=4\) is the number of bands, \(w\) is the patch width and \(h\) is the patch height. To save storage space, values are encoded as 16-bit signed integers and should be converted back to floating-point surface reflectance by dividing every value by 10 000 upon reading.
File | Content |
---|---|
{id}_05m_b2b3b4b8.pt | 5m patches (\(256\times256\) pix.) for S2 B2, B3, B4 and B8 (from VENµS) |
{id}_10m_b2b3b4b8.pt | 10m patches (\(128\times128\) pix.) for S2 B2, B3, B4 and B8 (from Sentinel-2) |
{id}_05m_b5b6b7b8a.pt | 5m patches (\(256\times256\) pix.) for S2 B5, B6, B7 and B8A (from VENµS) |
{id}_20m_b5b6b7b8a.pt | 20m patches (\(64\times64\) pix.) for S2 B5, B6, B7 and B8A (from Sentinel-2) |
{id}_patches.gpkg | GIS file with footprint of each patch |
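The loading and rescaling steps described above can be sketched as follows. This is a minimal illustration, not code shipped with the dataset: the dictionary keys, the function names and the example pair id are hypothetical, while the file suffixes and the 10 000 scale factor come from the text and table 2. PyTorch is imported lazily so that the path helper works without it.

```python
from pathlib import Path

# File suffixes from table 2; the dictionary keys are hypothetical labels.
TENSOR_SUFFIXES = {
    "venus_05m_b2b3b4b8": "_05m_b2b3b4b8.pt",
    "s2_10m_b2b3b4b8": "_10m_b2b3b4b8.pt",
    "venus_05m_b5b6b7b8a": "_05m_b5b6b7b8a.pt",
    "s2_20m_b5b6b7b8a": "_20m_b5b6b7b8a.pt",
}

# Stored int16 values are reflectance multiplied by 10 000.
REFLECTANCE_SCALE = 10_000.0


def pair_tensor_paths(site_dir, pair_id):
    """Return the four tensor file paths of one pair inside a site folder."""
    site_dir = Path(site_dir)
    return {key: site_dir / f"{pair_id}{suffix}"
            for key, suffix in TENSOR_SUFFIXES.items()}


def load_reflectance(path):
    """Load one tensor file and convert it to floating-point reflectance."""
    import torch  # requires PyTorch to be installed

    patches = torch.load(path)  # int16 tensor of shape [n, 4, w, h]
    return patches.to(torch.float32) / REFLECTANCE_SCALE
```

For example, `pair_tensor_paths("FR-LQ1", pair_id)["s2_10m_b2b3b4b8"]` gives the path of the 10m Sentinel-2 source tensor for a given pair id, which can then be passed to `load_reflectance`.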
Each site folder comes with a master index.csv file, a tab-separated text file with one row for each pair sampled in the given site and columns as described in table 3.
Column | Description |
---|---|
venus_product_id | ID of the sampled VENµS L2A product |
sentinel2_product_id | ID of the sampled Sentinel-2 L2A product |
tensor_05m_b2b3b4b8 | Name of the 5m tensor file for S2 B2, B3, B4 and B8 (from VENµS) |
tensor_10m_b2b3b4b8 | Name of the 10m tensor file for S2 B2, B3, B4 and B8 (from Sentinel-2) |
tensor_05m_b5b6b7b8a | Name of the 5m tensor file for S2 B5, B6, B7 and B8A (from VENµS) |
tensor_20m_b5b6b7b8a | Name of the 20m tensor file for S2 B5, B6, B7 and B8A (from Sentinel-2) |
s2_tile | Sentinel-2 MGRS tile |
vns_site | Name of VENµS site |
date | Acquisition date as YYYY-MM-DD |
venus_zenith_angle | VENµS zenith viewing angle in degrees |
patches_gpkg | Name of the GIS file with footprint for each patch |
nb_patches | Number of patches for this pair |
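Reading the index could look like the sketch below, which uses only the Python standard library. It assumes that index.csv starts with a header row carrying the column names of table 3, which the description does not state explicitly; the function names are hypothetical.

```python
import csv


def parse_index(lines):
    """Parse tab-separated index lines into one dict per pair.

    Assumes the first line is a header row with the table 3 column names.
    """
    return list(csv.DictReader(lines, delimiter="\t"))


def total_patches(rows):
    """Sum the nb_patches column over all pairs of a site."""
    return sum(int(row["nb_patches"]) for row in rows)
```

A file can be passed directly, for instance `with open("index.csv", newline="", encoding="utf-8") as f: rows = parse_index(f)`. If the header assumption holds, the total returned for a site should match the number of patches listed for it in table 1.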
Each site folder is compressed to a different 7z file.
3 Licensing
3.1 Sentinel-2 patches
3.1.1 Copyright
Value-added data processed by CNES for the Theia data centre www.theia-land.fr using Copernicus products. The processing uses algorithms developed by Theia's Scientific Expertise Centres. Note: Copernicus Sentinel-2 Level 1C data is subject to this license: https://theia.cnes.fr/atdistrib/documents/TC_Sentinel_Data_31072014.pdf
3.1.2 Licence
Files {id}_10m_b2b3b4b8.pt and {id}_20m_b5b6b7b8a.pt are distributed under the original licence of the Sentinel-2 Theia L2A products, which is the Etalab Open Licence Version 2.0 (see Footnotes).
3.2 VENµS patches
3.2.1 Copyright
Value-added data processed by CNES for the Theia data centre www.theia-land.fr using VENµS satellite imagery from CNES and Israeli Space Agency. The processing uses algorithms developed by Theia's Scientific Expertise Centres.
3.2.2 Licence
Files {id}_05m_b2b3b4b8.pt and {id}_05m_b5b6b7b8a.pt are distributed under the original licence of the VENµS products, which is Creative Commons BY-NC 4.0.
3.3 Remaining files
All remaining files are distributed under the Creative Commons BY 4.0 licence.
4 Note to users
Although the SEN2VENµS dataset is sorted by sites and by pairs, we strongly encourage users to apply the full set of machine learning best practices when using it: keeping separate pairs (or even sites) for testing purposes, and randomizing patches across sites and pairs in the training and validation sets.
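One possible way to follow this advice is sketched below with the Python standard library. It is only an illustration under stated assumptions: the function names, the 20% test fraction and the fixed seed are arbitrary choices, not recommendations from the dataset authors. Whole sites are held out for testing, and patch references are shuffled across pairs for training.

```python
import random


def split_sites(sites, test_fraction=0.2, seed=0):
    """Hold out whole sites for testing; returns (train_sites, test_sites)."""
    shuffled = sorted(sites)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def shuffled_patch_refs(nb_patches_by_pair, seed=0):
    """List every (pair_id, patch_index) once, in random order across pairs."""
    refs = [(pair_id, index)
            for pair_id, nb_patches in sorted(nb_patches_by_pair.items())
            for index in range(nb_patches)]
    random.Random(seed).shuffle(refs)
    return refs
```

The `nb_patches_by_pair` mapping can be built from the nb_patches column of each site's index.csv, restricted to the training sites returned by `split_sites`.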
5 Citing
Please cite the following data paper (published in MDPI Data) and Zenodo link when publishing work derived from this dataset:
Michel, J.; Vinasco-Salinas, J.; Inglada, J.; Hagolle, O. SEN2VENµS, a Dataset for the Training of Sentinel-2 Super-Resolution Algorithms. Data 2022, 7, 96. https://doi.org/10.3390/data7070096
https://zenodo.org/deposit/6514159
Footnotes:
https://theia.cnes.fr/atdistrib/documents/Licence-Theia-CNES-Sentinel-ETALAB-v2.0-en.pdf
Files
(85.0 GB)
Additional details
Related works
- Is documented by
- Journal article: 10.3390/data7070096 (DOI)