ylab-hi/DeepChopper: Language models identify chimeric artificial reads in Nanopore direct-RNA sequencing data.

🧬 DeepChopper leverages a language model to accurately detect and chop artificial sequences that may cause chimeric reads, ensuring higher quality and more reliable sequencing results. By integrating seamlessly with existing workflows, DeepChopper provides a robust solution for researchers and bioinformaticians working with Nanopore direct-RNA sequencing data.

📘 FEATURED: We provide a comprehensive tutorial that includes an example dataset in our full documentation.

🚀 Quick Start: Try DeepChopper Online

Experience DeepChopper instantly through our user-friendly web interface. No installation required! Simply click the button below to launch the web application and start exploring DeepChopper's capabilities:

Open in Hugging Face Spaces

What you can do online:

  • 📤 Upload your sequencing data
  • 🔬 Run DeepChopper's analysis
  • 📊 Visualize results
  • 🎛️ Experiment with different parameters

Perfect for quick tests or demonstrations! However, for extensive analyses or custom workflows, we recommend installing DeepChopper locally.

⚠️ Note: The online version is limited to one FASTQ record at a time and may not be suitable for large-scale projects.
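Because the online version accepts a single FASTQ record, it can help to cut one record out of a larger run before uploading. The sketch below is a hypothetical helper (`extract_first_record` is not part of DeepChopper) and assumes standard 4-line FASTQ records (header, sequence, `+`, quality), optionally gzip-compressed.

```python
import gzip
from itertools import islice
from pathlib import Path


def extract_first_record(src, dst):
    """Write the first FASTQ record of `src` to `dst` and return its read ID.

    Hypothetical helper. Assumes 4-line FASTQ records; files whose name ends
    in '.gz' are decompressed on the fly.
    """
    src = Path(src)
    opener = gzip.open if src.suffix == ".gz" else open
    with opener(src, "rt") as fin:
        record = list(islice(fin, 4))  # read only the first four lines
    if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
        raise ValueError(f"{src} does not start with a complete 4-line FASTQ record")
    Path(dst).write_text("".join(record))
    # The read ID is the first whitespace-separated token after '@'.
    return (record[0][1:].split() or [""])[0]
```

For example, `extract_first_record("reads.fq.gz", "one_read.fq")` writes a single-record file suitable for the web interface. Wrapped sequences (multi-line FASTQ) are not handled by this sketch.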

📦 Installation

DeepChopper can be installed using pip, the Python package installer. Follow these steps to install:

  1. Ensure you have Python 3.10 or later installed on your system.

  2. Create a virtual environment (recommended):

    python -m venv deepchopper_env
    source deepchopper_env/bin/activate  # On Windows use `deepchopper_env\Scripts\activate`
  3. Install DeepChopper:

    pip install deepchopper
  4. Verify the installation:

    deepchopper --help
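The checks from steps 1 and 4 can also be scripted, for example in a CI job. This is a minimal standard-library sketch; `python_ok` and `cli_ok` are hypothetical helper names, and the only DeepChopper behavior it relies on is the `deepchopper --help` command shown above.

```python
import shutil
import subprocess
import sys


def python_ok(minimum=(3, 10)):
    """True if the running interpreter meets the minimum version (3.10 per step 1)."""
    return sys.version_info[:2] >= minimum


def cli_ok(executable="deepchopper"):
    """True if `executable` is on PATH and `executable --help` exits with status 0."""
    path = shutil.which(executable)
    if path is None:
        return False
    result = subprocess.run([path, "--help"], capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    print(f"Python >= 3.10: {python_ok()}")
    print(f"deepchopper --help works: {cli_ok()}")
```

Run it inside the activated `deepchopper_env` so that the virtual environment's `bin` (or `Scripts` on Windows) directory is on PATH.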

Compatibility and Support

DeepChopper is designed to work across various platforms and Python versions. Below are the compatibility matrices for PyPI installations:

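| Python Version | Linux x86_64 | macOS Intel | macOS Apple Silicon | Windows x86_64 |
|----------------|--------------|-------------|---------------------|----------------|
| 3.10           | ✅           | ✅          | ✅                  | ✅             |
| 3.11           | ✅           | ✅          | ✅                  | ✅             |
| 3.12           | ✅           | ✅          | ✅                  | ✅             |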
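To check the current machine against the compatibility matrix, the table can be encoded as data. This is a sketch: the mapping from the table's column labels to `platform.system()` / `platform.machine()` values (for example `"AMD64"` on 64-bit Windows, `"arm64"` on Apple Silicon) is an assumption, and a `None` result only means the combination is not listed in the matrix.

```python
import platform
import sys

# Transcribed from the PyPI compatibility matrix above.
SUPPORTED_PYTHON = {(3, 10), (3, 11), (3, 12)}
# Assumed (platform.system(), platform.machine()) values for each column.
SUPPORTED_PLATFORMS = {
    ("Linux", "x86_64"): "Linux x86_64",
    ("Darwin", "x86_64"): "macOS Intel",
    ("Darwin", "arm64"): "macOS Apple Silicon",
    ("Windows", "AMD64"): "Windows x86_64",
}


def supported_platform(system=None, machine=None, version=None):
    """Return the matrix column label if the combination is listed, else None."""
    system = system or platform.system()
    machine = machine or platform.machine()
    version = tuple(version or sys.version_info[:2])
    label = SUPPORTED_PLATFORMS.get((system, machine))
    if label is None or version not in SUPPORTED_PYTHON:
        return None
    return label


if __name__ == "__main__":
    print(supported_platform() or "combination not listed in the matrix")
```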

πŸ†˜ Trouble installing? Check our Troubleshooting Guide or open an issue.

🛠️ Usage

For a comprehensive guide, check out our full tutorial. Here's a quick overview:

Command-Line Interface

DeepChopper offers three main commands: encode, predict, and chop.

  1. Encode your input data:

    deepchopper encode <input.fq>
  2. Predict chimera artifacts:

    deepchopper predict <input.parquet> --output predictions

    Using GPUs? Add the --gpus flag:

    deepchopper predict <input.parquet> --output predictions --gpus 2
  3. Chop chimera artifacts:

    deepchopper chop <predictions> raw.fq
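The three steps can be chained from Python with `subprocess`. This sketch uses only the commands and flags shown above; `build_commands` and `run_pipeline` are hypothetical names. It assumes that the FASTQ passed to `encode` is the same `raw.fq` given to `chop`, that the `<predictions>` argument of `chop` is the directory written by `predict --output`, and that you supply the Parquet path produced by `encode` yourself (its exact name is not derived here; check the encode output or the tutorial).

```python
import subprocess


def build_commands(fastq, parquet, predictions="predictions", gpus=None):
    """Return the argv lists for encode -> predict -> chop.

    `parquet` must be the file that `deepchopper encode` produced for `fastq`.
    """
    predict = ["deepchopper", "predict", parquet, "--output", predictions]
    if gpus:
        predict += ["--gpus", str(gpus)]  # optional GPU flag from step 2
    return [
        ["deepchopper", "encode", fastq],
        predict,
        ["deepchopper", "chop", predictions, fastq],
    ]


def run_pipeline(commands):
    """Run each step in order; check=True stops at the first failing step."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to print or log the exact commands before running them, or to hand them to a workflow manager instead.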

Want a GUI? Launch the web interface (note: limited to one FASTQ record at a time):

deepchopper web

Python Library

Integrate DeepChopper into your Python scripts:

import deepchopper

model = deepchopper.DeepChopper.from_pretrained("yangliz5/deepchopper")
# Your analysis code here

📚 Cite

If DeepChopper aids your research, please cite our paper:

@article {Li2024.10.23.619929,
        author = {Li, Yangyang and Wang, Ting-You and Guo, Qingxiang and Ren, Yanan and Lu, Xiaotong and Cao, Qi and Yang, Rendong},
        title = {A Genomic Language Model for Chimera Artifact Detection in Nanopore Direct RNA Sequencing},
        elocation-id = {2024.10.23.619929},
        year = {2024},
        doi = {10.1101/2024.10.23.619929},
        publisher = {Cold Spring Harbor Laboratory},
        abstract = {Chimera artifacts in nanopore direct RNA sequencing (dRNA-seq) data can confound transcriptome analyses, yet no existing tools are capable of detecting and removing them due to limitations in basecalling models. We present DeepChopper, a genomic language model that accurately identifies and eliminates adapter sequences within base-called dRNA-seq reads, effectively removing chimeric read artifacts. DeepChopper significantly improves critical downstream analyses, including transcript annotation and gene fusion detection, enhancing the reliability and utility of nanopore dRNA-seq for transcriptomics research. Competing Interests: The authors have declared no competing interests.},
        URL = {https://www.biorxiv.org/content/early/2024/10/25/2024.10.23.619929},
        eprint = {https://www.biorxiv.org/content/early/2024/10/25/2024.10.23.619929.full.pdf},
        journal = {bioRxiv}
}

🤝 Contribution

We welcome contributions! Here's how to set up your development environment:

Build Environment

git clone https://github.com/ylab-hi/DeepChopper.git
cd DeepChopper
conda env create -n deepchopper -f environment.yaml
conda activate deepchopper

Install Dependencies

pip install pipx
pipx install --suffix @master git+https://github.com/python-poetry/poetry.git@master
poetry@master install

🎉 Ready to contribute? Check out our Contribution Guidelines to get started!

📬 Support

Need help? Have questions?


DeepChopper is developed with ❤️ by the YLab team. Happy sequencing! 🧬🔬
