GitHub - robinschmid/microbe_masst: Using MASST or fastMASST, adding metadata onto a tree ontology for microbes

robinschmid/microbe_masst



Welcome to domainMASSTs

This repository contains code and data for the different domain-specific MASSTs currently under development in the Dorrestein Lab at UC San Diego. This includes microbeMASST, plantMASST, foodMASST, and tissueMASST. Aggregated outputs of the domain MASSTs can be generated using metadataMASST.

The code for the standalone web applications, which search one spectrum at a time, can be found in GNPS_MASST.

Web apps:

  1. microbeMASST
  2. plantMASST
  3. foodMASST
  4. tissueMASST
  5. metadataMASST

Publications associated with the different domainMASSTs:

  1. microbeMASST - Nature Microbiology
  2. plantMASST - bioRxiv
  3. foodMASST - npj Science of Food

Fast Search via microbeMASST enables batch search of multiple spectra against multiple domain-specific MASSTs at once

Running jobs.py allows you to leverage the Fast Search API and execute a batch search of multiple MS/MS spectra against the current indexed data in GNPS/MassIVE (November 2023) and generate multiple outputs for all the listed domain-specific MASSTs simultaneously.

  1. A series of interactive HTML tree files will be generated, one for each domain-specific MASST, ending with _domain.html (e.g., _microbe.html)
  2. A series of JSON files of the tree will be generated (e.g., _microbe.json)
  3. A _matches.tsv file will be generated, containing all the scans in the indexed data found to match your spectrum of interest. This also includes samples that are not part of the listed domain-specific MASSTs.
  4. A _library.tsv file will be generated, containing a list of spectra from the GNPS libraries found to match your spectrum of interest. This enables level 2 annotation according to the Metabolomics Standards Initiative.
  5. A _datasets.tsv file will be generated, containing the number of samples matching your spectrum for each dataset included in the current index.
  6. A series of _count_domain.tsv files will be generated, containing information on matches found for each specific domain MASST.
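The output naming described above can be sketched as a small helper that lists the files to expect for one output prefix. This is a minimal sketch, not part of the repository: the function names are hypothetical, the domain labels other than `microbe` are assumed from the MASST names, and it assumes each suffix is appended directly to the output prefix.

```python
from pathlib import Path

# Assumed domain labels; only "microbe" appears in the README examples.
DOMAINS = ("microbe", "plant", "food", "tissue")


def expected_outputs(output_prefix: str, domains=DOMAINS) -> list[str]:
    """List the output paths described above for one output prefix (hypothetical helper)."""
    # Outputs that are generated once per searched spectrum
    suffixes = ["_matches.tsv", "_library.tsv", "_datasets.tsv"]
    # Outputs that are generated once per domain-specific MASST
    for domain in domains:
        suffixes += [f"_{domain}.html", f"_{domain}.json", f"_count_{domain}.tsv"]
    return [output_prefix + suffix for suffix in suffixes]


def missing_outputs(output_prefix: str, domains=DOMAINS) -> list[str]:
    """Return the expected output paths that do not exist on disk yet."""
    return [p for p in expected_outputs(output_prefix, domains) if not Path(p).exists()]
```

A missing file does not necessarily indicate a failure: a spectrum without matches in a given domain may simply produce no output for that domain.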

Execute batch run

  1. Open jobs.py and add entries to the files list as ("input_directory/input_file", "output_directory/output_prefix")
  2. Check the search parameters and adjust them to your research question, such as the minimum cosine score, the m/z tolerance, and the minimum number of matching peaks.
  3. Run jobs.py
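Before starting a long batch run, it can help to sanity-check the entries added to the files list. The sketch below is not part of jobs.py: `validate_entries` and the example paths are hypothetical, and the set of accepted extensions is assumed from the input types this README mentions (.mgf, .csv, .tsv).

```python
from pathlib import Path

# Input types mentioned in this README; treated here as the only valid ones (assumption).
SUPPORTED_EXTENSIONS = {".mgf", ".csv", ".tsv"}


def validate_entries(files) -> list[str]:
    """Return human-readable problems found in a list of (input_file, output_prefix) entries."""
    problems = []
    for index, entry in enumerate(files):
        is_pair = isinstance(entry, tuple) and len(entry) == 2
        if not is_pair or not all(isinstance(item, str) for item in entry):
            problems.append(f"entry {index}: expected a (input_file, output_prefix) pair of strings")
            continue
        input_file, _output_prefix = entry
        extension = Path(input_file).suffix.lower()
        if extension not in SUPPORTED_EXTENSIONS:
            problems.append(f"entry {index}: unsupported input extension '{extension}'")
    return problems


# Hypothetical example entries in the format described in step 1
files = [
    ("examples/spectra.mgf", "output/spectra"),
    ("examples/usi_list.tsv", "output/usi_list"),
]
for problem in validate_entries(files):
    print(problem)
```

The check only inspects the shape of each entry and the file extension; it does not read the files or verify that the USIs inside a table are well formed.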

Note:

  1. The input can be either a single .mgf file (generated via MZmine or by the molecular networking workflow in GNPS) or a list of USIs provided as a .csv or .tsv file.
  2. Run jobs.py several times with the option skip_existing=True until no new output is generated. Some entries will fail on any given run because of the Fast Search API, but subsequent re-runs should catch all possible matches.
  3. Please make sure to use Python 3.10
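The re-run advice and the Python version requirement can be sketched as a small wrapper. This is an illustration under assumptions, not code from the repository: `rerun_until_stable` and `check_python` are hypothetical helpers, `run_once` stands for whatever starts one pass of jobs.py, and "no new output" is approximated by comparing the file names in the output directory before and after each pass.

```python
import sys
from pathlib import Path
from typing import Callable


def check_python(required=(3, 10), version=None) -> bool:
    """Return True if the interpreter's major.minor version equals the required one."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) == tuple(required)


def list_outputs(output_dir: str) -> set[str]:
    """Names of the files currently present in the output directory."""
    directory = Path(output_dir)
    return {p.name for p in directory.iterdir()} if directory.exists() else set()


def rerun_until_stable(run_once: Callable[[], None], output_dir: str, max_runs: int = 5) -> int:
    """Call run_once until a pass adds no new files; return the number of passes executed."""
    seen = list_outputs(output_dir)
    for run in range(1, max_runs + 1):
        run_once()
        current = list_outputs(output_dir)
        if current == seen:
            return run  # this pass produced nothing new
        seen = current
    return max_runs  # stopped at the cap, output may still be incomplete
```

One possible `run_once` is `lambda: subprocess.run([sys.executable, "jobs.py"], check=True)`, assuming jobs.py is in the working directory and already has skip_existing=True set. The `max_runs` cap is there so that a persistently failing entry cannot loop forever.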

How to cite?

Please cite the following paper: microbeMASST: a taxonomically informed mass spectrometry search tool for microbial metabolomics data
