Fetch metadata information from the following databases:
- GEO: Gene Expression Omnibus,
- SRA: Sequence Read Archive,
- EMBL-EBI: European Molecular Biology Laboratory’s European Bioinformatics Institute,
- DDBJ: DNA Data Bank of Japan,
- NIH Biosample: Biological source materials used in experimental assays,
- ENCODE: The Encyclopedia of DNA Elements.
ffq receives an accession and returns the metadata for that accession as well as the metadata for all downstream accessions following the connections between GEO, SRA, EMBL-EBI, DDBJ, and Biosample.
If you use ffq in a publication, please cite:
Gálvez-Merchán, Á., et al. (2022). Metadata retrieval from sequence databases with ffq. bioRxiv 2022.05.18.492548.
The manuscript is available here: https://doi.org/10.1101/2022.05.18.492548.
By default, ffq returns all downstream metadata down to the level of the SRR record. However, the desired level of resolution can be specified.
ffq can also skip returning the metadata, and instead return the raw data download links from any available host (FTP, AWS, GCP or NCBI) for GEO and SRA ids.
The latest release can be installed with
pip install ffq
The development version can be installed with
pip install git+https://github.com/pachterlab/ffq
ffq [accession]
where [accession] is either:
- an SRA/EBI/DDBJ accession
  - (SRR, SRX, SRS or SRP)
  - (ERR, ERX, ERS or ERP)
  - (DRR, DRS, DRX or DRP)
- a GEO accession (GSE or GSM)
- an ENCODE accession (ENCSR, ENCSB or ENCSD)
- a Bioproject accession (CXR)
- a Biosample accession (SAMN)
- a DOI
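The prefix rules listed above can be captured in a small lookup table. The following is a minimal sketch and not part of ffq: `PREFIXES` and `classify_accession` are hypothetical names, the mapping covers only the prefixes named above, and treating anything that starts with `10.` and contains a `/` as a DOI is an assumption made for illustration.

```python
# Hypothetical helper, not part of ffq: maps an accession to the database
# family implied by its prefix, using only the prefixes listed above.
PREFIXES = {
    "SRA": ("SRR", "SRX", "SRS", "SRP"),
    "EBI": ("ERR", "ERX", "ERS", "ERP"),
    "DDBJ": ("DRR", "DRS", "DRX", "DRP"),
    "GEO": ("GSE", "GSM"),
    "ENCODE": ("ENCSR", "ENCSB", "ENCSD"),
    "Bioproject": ("CXR",),
    "Biosample": ("SAMN",),
}


def classify_accession(accession: str) -> str:
    """Return the database family for an accession, or 'unknown'."""
    value = accession.strip()
    # Assumption: a DOI starts with "10." and contains a "/" separator.
    if value.startswith("10.") and "/" in value:
        return "DOI"
    upper = value.upper()
    for family, prefixes in PREFIXES.items():
        # str.startswith accepts a tuple and matches any of its entries.
        if upper.startswith(prefixes):
            return family
    return "unknown"


if __name__ == "__main__":
    for example in ("SRR9990627", "GSE129845", "DRP004583", "ENCSR998WNE"):
        print(example, classify_accession(example))
```

A check like this can be useful for validating user input before invoking ffq, but ffq itself remains the authority on which accessions it accepts.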
$ ffq SRR9990627
#=> Returns metadata for the SRR9990627 run.
$ ffq SRX7347523
#=> Returns metadata for the experiment SRX7347523 and for its associated SRR run.
$ ffq GSE129845
#=> Returns metadata for GSE129845 and for its 5 associated GSM, SRS, SRX and SRR ids.
$ ffq DRP004583
#=> Returns metadata for the study DRP004583 and its 104 associated DRS, DRX and DRR ids.
$ ffq ENCSR998WNE
#=> Returns metadata for the ENCODE experiment ENCSR998WNE.
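The commands above can also be driven from a script. The sketch below is one possible wrapper, not an API provided by ffq: `fetch_metadata` is a hypothetical name, and it assumes that `ffq` is installed and on the `PATH` and that it prints its result as JSON on standard output. The shape of the returned object is not described here, so the function returns whatever was parsed without inspecting it.

```python
import json
import subprocess


def fetch_metadata(accession, run=subprocess.run):
    """Run `ffq <accession>` and parse its standard output as JSON.

    Assumptions: ffq is on PATH and writes JSON to stdout.
    `run` is injectable so the function can be exercised without
    network access or an ffq installation.
    """
    result = run(
        ["ffq", accession],
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode the output as str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return json.loads(result.stdout)
```

Calling `fetch_metadata("SRR9990627")` would then correspond to the first command above; a failed lookup surfaces as a `subprocess.CalledProcessError` rather than an empty result.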