This repository includes Python code related to the Silent Cities Dataset. There are two parts:
- A Dataset Downloader script
- Code used for data preparation prior to publication. It is provided here for transparency and documentation. You do NOT need to run this code if you want to use the dataset.
The script download_osf.py will download the whole dataset. Beware that the full dataset is close to 5 terabytes.
    usage: download_osf.py [-h] [--dest DEST [DEST ...]]

    options:
      -h, --help            show this help message and exit
      --dest DEST [DEST ...]
                            Add as many paths as you want to download the data
                            to. If one path does not have enough space, the
                            script will switch to the next one.

Example usage:

    python download_osf.py --dest /media/disk2/ /media/disk1/

This will first download to /media/disk1/, then when it is full it will download to /media/disk2/.
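The switch-to-the-next-disk behaviour described above can be sketched as follows. `pick_destination` is a hypothetical helper written for illustration, not code taken from download_osf.py:

```python
import shutil

def pick_destination(paths, required_bytes):
    """Return the first path with at least required_bytes of free space.

    Mimics the fallback between destinations described above.
    Hypothetical helper for illustration only.
    """
    for path in paths:
        # shutil.disk_usage reports total/used/free bytes for the filesystem
        if shutil.disk_usage(path).free >= required_bytes:
            return path
    raise RuntimeError("no destination has enough free space")
```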
Requirements for the data downloader
numpy==1.23.5
pandas==2.0.0
Requests==2.31.0
wget==3.2
The overall sequence of processing is as follows.
Data preparation:
- Generation of a metadata file for each recording site, gathering all available audio for that site, using metadata_file.py
- Generation of a metadata file across all sites using metadata_site.py
Data analysis (Script audio_processing.py):
- Calculation of ecoacoustic indices every 10 seconds
- Application of a pretrained deep learning model for audio tagging (ResNet22 from PANNs, pretrained on AudioSet)
Final step:
- Encoding of all raw audio files to FLAC format and final export, in script convert.py
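The 10-second analysis windows used in the data analysis step can be sketched as below. RMS energy is used here only as a stand-in for the actual ecoacoustic indices; this is an illustration, not code from audio_processing.py:

```python
import numpy as np

def per_segment_rms(x, sr, seg_seconds=10):
    """Split a 1-D signal into non-overlapping windows of seg_seconds
    (dropping any trailing remainder) and compute RMS energy per window.
    Illustrative stand-in for the real per-segment index computation.
    """
    seg_len = int(sr * seg_seconds)
    n_segs = len(x) // seg_len
    # reshape into (n_segs, seg_len) so each row is one analysis window
    segs = x[: n_segs * seg_len].reshape(n_segs, seg_len)
    return np.sqrt(np.mean(segs ** 2, axis=1))
```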
A requirements file is provided here for reference; exact versions might not be needed. Please adapt according to your setup.
File by file (individual files)

    python metadata_file.py [-h] [--folder FOLDER] [--save_path SAVE_PATH]
                            [--verbose] [--all]

    Silent City metadata generator, file by file

    optional arguments:
      -h, --help            show this help message and exit
      --folder FOLDER       Path to folder with wavefiles; will walk through
                            subfolders
      --save_path SAVE_PATH
                            Path to save metadata
      --verbose             Verbose (default False = nothing printed)
      --all                 Process all sites. For multiple HDDs (a list of
                            paths must be given in the HDD variable in the
                            script)

Generates one CSV per site.
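The walk-through-subfolders step can be illustrated with a small sketch that gathers basic WAV properties into a CSV. This is an assumption-laden approximation of what metadata_file.py produces, not the script itself; the real columns differ:

```python
import csv
import os
import wave

def collect_wav_metadata(folder, save_path):
    """Walk folder and its subfolders, record basic properties of every
    WAV file, and write them to a CSV. Illustrative sketch only; the
    actual metadata_file.py output differs.
    """
    rows = []
    for root, _, files in os.walk(folder):
        for name in sorted(files):
            if not name.lower().endswith(".wav"):
                continue
            path = os.path.join(root, name)
            with wave.open(path, "rb") as w:
                rows.append({
                    "file": path,
                    "sample_rate": w.getframerate(),
                    "channels": w.getnchannels(),
                    "duration_s": w.getnframes() / w.getframerate(),
                })
    # one CSV per walked folder, mirroring the one-CSV-per-site output
    with open(save_path, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["file", "sample_rate", "channels", "duration_s"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```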
All sites

    python metadata_site.py [-h] [--folder FOLDER]
                            [--save_path SAVE_PATH] [--database DATABASE]
                            [--verbose]

    Silent City metadata generator, site by site

    optional arguments:
      -h, --help            show this help message and exit
      --folder FOLDER       Path to folder with metadata
      --save_path SAVE_PATH
                            Path to save metadata by site
      --database DATABASE   database (csv)
      --verbose             Verbose (default False = nothing printed)

Generates metadata from all sites (one CSV).
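The merge-across-sites step amounts to concatenating the per-site CSVs into one database. The sketch below shows the idea with pandas; the site-name-from-file-name convention and the columns are assumptions, not the actual behaviour of metadata_site.py:

```python
import glob
import os
import pandas as pd

def merge_site_metadata(meta_folder, save_path):
    """Concatenate all per-site CSVs in meta_folder into a single
    database CSV, tagging each row with a site name derived from the
    file name. Illustrative sketch of the role of metadata_site.py.
    """
    frames = []
    for csv_path in sorted(glob.glob(os.path.join(meta_folder, "*.csv"))):
        df = pd.read_csv(csv_path)
        # assumed convention: one CSV per site, named after the site
        df["site"] = os.path.splitext(os.path.basename(csv_path))[0]
        frames.append(df)
    merged = pd.concat(frames, ignore_index=True)
    merged.to_csv(save_path, index=False)
    return merged
```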
First, download the pretrained ResNet22 model using the instructions here.
Processing one site

    python audio_processing.py [-h] [--length LENGTH] [--batch_size BATCH_SIZE]
                               [--metadata_folder METADATA_FOLDER] [--site SITE]
                               [--folder FOLDER] [--database DATABASE] [--nocuda]

    Silent City audio tagging with pretrained ResNet22 on AudioSet
    (from https://github.com/qiuqiangkong/audioset_tagging_cnn)

    optional arguments:
      -h, --help            show this help message and exit
      --length LENGTH       Segment length
      --batch_size BATCH_SIZE
                            Batch size
      --metadata_folder METADATA_FOLDER
                            Folder with all metadata
      --site SITE           Site to process
      --folder FOLDER       Path to folder with wavefiles; will walk through
                            subfolders
      --database DATABASE   Path to metadata (given by metadata_site.py)
      --nocuda              Do not use the GPU for acceleration
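The --length and --batch_size options suggest processing along these lines: cut each recording into fixed-length segments and feed them to the tagging model in batches. The numpy sketch below illustrates that loop; it is not the script's actual code:

```python
import numpy as np

def iter_batches(x, sr, seg_seconds, batch_size):
    """Cut a waveform into fixed-length segments and yield them in
    batches of shape (<= batch_size, seg_len), the way an inference
    loop would consume them. Illustration only.
    """
    seg_len = int(sr * seg_seconds)
    n_segs = len(x) // seg_len
    segs = x[: n_segs * seg_len].reshape(n_segs, seg_len)
    for start in range(0, n_segs, batch_size):
        # the last batch may be smaller than batch_size
        yield segs[start:start + batch_size]
```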
Nicolas Farrugia, Nicolas Pajusco, IMT Atlantique, 2020.
Code for AudioSet Tagging CNN from Qiuqiang Kong.
Code for eco-acoustic indices from Patrice Guyot.