Silent Cities Data Download and Data Preparation reference

This repository includes Python code related to the Silent Cities Dataset. There are two parts:

  1. A Dataset Downloader script
  2. Code used for data preparation prior to publication. It is provided here for transparency and documentation. You do NOT need to run this code if you want to use the dataset.

Dataset Downloader

The script download_osf.py will download the whole dataset. Beware that the full dataset is close to 5 terabytes.

usage: download_osf.py [-h] [--dest DEST [DEST ...]]

options:
  -h, --help            show this help message and exit
  --dest DEST [DEST ...]
                        One or more paths to download the data to. If one path
                        does not have enough space, the download switches to
                        the next one

Example usage

python download_osf.py --dest /media/disk2/ /media/disk1/

This will first download to /media/disk2/; when it is full, the download will continue on /media/disk1/.
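
For reference, the fallback between destinations could look something like the sketch below. This is not the actual logic of download_osf.py; the function name and the required_bytes value are illustrative only.

import shutil

def pick_destination(dest_paths, required_bytes):
    # Return the first path with enough free space, or None if all are full
    for path in dest_paths:
        if shutil.disk_usage(path).free >= required_bytes:
            return path
    return None

# Example: prefer /media/disk2/, fall back to /media/disk1/
target = pick_destination(["/media/disk2/", "/media/disk1/"], required_bytes=10**9)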

Requirements for the data downloader

numpy==1.23.5
pandas==2.0.0
Requests==2.31.0
wget==3.2
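
To install these pinned dependencies, a standard pip command such as the following should work (adapt if the repository provides its own requirements file for the downloader):

pip install numpy==1.23.5 pandas==2.0.0 Requests==2.31.0 wget==3.2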

Description of Data preparation

The overall sequence of processing is as follows.

Data preparation:

  • Generation of a metadata file for each recording site, gathering all available audio for this site using metadata_file.py
  • Generation of a metadata file across all sites using metadata_site.py

Data analysis (Script audio_processing.py):

  • Calculation of ecoacoustic indices every 10 seconds (a toy segmentation sketch is shown after this list)
  • Application of a pretrained deep learning model for audio tagging (ResNet22 from PANNs, pretrained on Audioset)
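
For illustration only, here is a minimal sketch of slicing a recording into 10-second segments and computing a simple per-segment statistic (plain RMS energy, standing in for the actual ecoacoustic indices computed by audio_processing.py). The function name is hypothetical; numpy and soundfile are assumed to be installed.

import numpy as np
import soundfile as sf

def segment_stats(wav_path, segment_s=10):
    # Load the recording and mix down to mono if needed
    audio, sr = sf.read(wav_path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    hop = int(segment_s * sr)
    stats = []
    for start in range(0, len(audio), hop):
        seg = audio[start:start + hop]
        # RMS energy of the segment, as a placeholder for the real indices
        stats.append({"start_s": start / sr, "rms": float(np.sqrt(np.mean(seg ** 2)))})
    return stats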

Final step:

  • Encoding of all raw audio files to FLAC format and final export in script convert.py
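
A minimal sketch of the FLAC re-encoding step, assuming soundfile is installed; file names are placeholders and the actual convert.py may handle metadata and batching differently:

import soundfile as sf

def wav_to_flac(wav_path, flac_path):
    # Read the raw WAV audio and write it back as lossless FLAC
    audio, sr = sf.read(wav_path)
    sf.write(flac_path, audio, sr, format="FLAC")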

Requirements

A requirements file is provided here for reference; the exact versions may not be needed. Please adapt it to your setup.

Usage

Preprocessing:

File by file (individual files)

python metadata_file.py [-h] [--folder FOLDER]
                            [--save_path SAVE_PATH]
                            [--verbose] [--all]

Silent City Meta data generator file by file

optional arguments:
-h, --help            show this help message and exit
--folder FOLDER       Path to folder with wavefiles, will walk through subfolders
--save_path SAVE_PATH
                        Path to save meta data
--verbose             Verbose (default False = nothing printed)
--all                 Process all sites; for multiple HDDs (the list of paths must be given in the HDD variable in the script)

Generate one CSV per site
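
Example usage (paths are placeholders; adapt to where your WAV files and metadata live):

python metadata_file.py --folder /path/to/site_recordings --save_path /path/to/metadata --verbose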

All sites

python metadata_site.py [-h] [--folder FOLDER] 
                        [--save_path SAVE_PATH] [--database DATABASE] 
                        [--verbose]

Silent City Meta data generator site by site

optional arguments:
-h, --help            show this help message and exit
--folder FOLDER       Path to folder with meta data
--save_path SAVE_PATH Path to save meta data by site
--database DATABASE   database (csv)
--verbose             Verbose (default False = nothing printed)

Generate metadata from all sites (one CSV)
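
Example usage (placeholder paths):

python metadata_site.py --folder /path/to/metadata --save_path /path/to/site_metadata --database /path/to/database.csv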

Audio processing

First download the pretrained ResNet22 model using the instructions in the PANNs repository (https://github.com/qiuqiangkong/audioset_tagging_cnn).

Processing one site

python audio_processing.py [-h] [--length LENGTH] [--batch_size BATCH_SIZE] 
                            [--metadata_folder METADATA_FOLDER] [--site SITE] 
                            [--folder FOLDER] [--database DATABASE] [--nocuda]

Silent City Audio Tagging with pretrained ResNet22 on Audioset (from https://github.com/qiuqiangkong/audioset_tagging_cnn)

optional arguments:
-h, --help            show this help message and exit
--length LENGTH       Segment length
--batch_size BATCH_SIZE
                        batch size
--metadata_folder METADATA_FOLDER
                        folder with all metadata
--site SITE           site to process
--folder FOLDER       Path to folder with wavefiles, will walk through subfolders
--database DATABASE   Path to metadata (given by metadata_site.py)
--nocuda              Do not use the GPU for acceleration
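
Example usage (placeholder values; drop --nocuda to run on a GPU):

python audio_processing.py --site SITE_ID --metadata_folder /path/to/metadata --folder /path/to/site_recordings --database /path/to/site_metadata.csv --nocuda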

Credits

Nicolas Farrugia, Nicolas Pajusco, IMT Atlantique, 2020.

Code for Audioset Tagging CNN from Qiuqiang Kong

Code for Eco-acoustic indices from Patrice Guyot
