- Free software: BSD license
- Documentation: https://WenyuOuyang.github.io/hydrodatasource
📜 Chinese documentation (中文文档)
Although numerous public watershed hydrological datasets are available, there are still challenges in this field:
- Many datasets are not updated, or are left out of later versions, after they are first compiled.
- Some datasets are not covered by existing collections.
- Non-public datasets cannot be directly shared.
To address these issues, hydrodatasource provides a framework to organize and manage these datasets, making them more efficient for use in watershed-based research and production scenarios.
This repository works in conjunction with hydrodataset, which focuses on public datasets for hydrological modeling. In contrast, hydrodatasource integrates a broader range of data resources, including non-public and custom datasets.
hydrodatasource processes data that primarily falls into three categories:
Category A, public datasets: typically publicly available hydrological datasets from academic papers, currently including:
- GAGES dataset
- GRDC dataset
- CRD and other reservoir datasets
Category B, non-public data: often proprietary or confidential data that require specific tools for formatting and integration, including:
- Custom station data: user-prepared station data formatted according to standard specifications and converted to NetCDF format.
Category C, custom datasets: built on the two categories above, these are custom hydrological datasets constructed for specific research needs in agreed standard formats.
hydrodatasource provides standardized methods for:
- Structuring datasets according to predefined conventions.
- Integrating various data sources into a unified framework.
- Supporting data access and processing for hydrological modeling.
- Public Data: Supports data format conversion and local file operations.
- Non-Public Data: Provides tools to format and integrate user-prepared data.
The repository structure supports diverse workflows, including:
- Category A Datasets: Tools to organize and access public hydrological datasets.
- Category B Data: Custom tools to clean and process station, reservoir, and basin time-series data.
- Category C Custom Datasets: Support for reading data in defined standard dataset formats.
hydrodatasource interacts with the following components:
- hydrodataset: Gives hydrodatasource access to public watershed datasets for hydrological modeling.
- HydroDataCompiler: Supports semi-automated processing of non-public and custom data (currently not public).
Install the package via pip:
pip install hydrodatasource
Note: The project is still in the early stages of development, so installing it in development (editable) mode is recommended.
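After installing, it can help to confirm which version is actually present in the environment. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of hydrodatasource.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "hydrodatasource") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; not provided by hydrodatasource.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "hydrodatasource is not installed")
```

Because the lookup reads package metadata rather than importing the package, it also works when the package is installed in development mode.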
The repository adopts the following directory structure for organizing data:
├── ClassA
│   ├── 1st_origin
│   └── 2nd_process
├── ClassB
│   ├── 1st_origin
│   └── 2nd_process
└── ClassC
- 1st_origin: Raw data, often from proprietary sources, in unified formats.
- 2nd_process: Intermediate results after initial processing and data ready for analysis or modeling.
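The layout in the tree above can be created with a few lines of standard-library Python. This is only a sketch: the function name `create_layout` and the choice of root directory are assumptions for illustration, and hydrodatasource itself may create or expect these folders differently.

```python
from pathlib import Path
from typing import List

# Folder names taken from the directory tree in this README.
# ClassC is listed there without subfolders, so none are created for it.
LAYOUT = {
    "ClassA": ["1st_origin", "2nd_process"],
    "ClassB": ["1st_origin", "2nd_process"],
    "ClassC": [],
}


def create_layout(root: Path) -> List[Path]:
    """Create the class and stage directories under `root`; return them sorted.

    Hypothetical helper for illustration; not part of hydrodatasource.
    """
    created = []
    for class_name, stages in LAYOUT.items():
        class_dir = root / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        created.append(class_dir)
        for stage in stages:
            stage_dir = class_dir / stage
            stage_dir.mkdir(exist_ok=True)
            created.append(stage_dir)
    return sorted(created)


if __name__ == "__main__":
    import tempfile

    # Build the layout in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_layout(Path(tmp)):
            print(path.relative_to(tmp))
```

Calling the function a second time on the same root is harmless, since existing directories are left in place.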
The data reading code is mainly located in the reader folder. Currently, the main interface functions provided are:
- Reading GRDC, GAGES, CRD and other datasets
- Reading custom station data
- Reading custom datasets for hydrological modeling
We will provide more detailed documentation in the future.