How to deal with libraries that do not have UMIs? · Issue #75 · timoast/sinto · GitHub
Open
@gabrielnegreira

Hi,

I am trying to run sinto barcode on a FASTQ file from a custom single-cell DNA (not RNA) library. This library does not have UMIs. The cell barcode occupies the first 45 nt of read 2 and is followed by the genomic insert. The structure is:

NNNNNNNNAGGANNNNNNNNACTCNNNNNNNNAAGGNNNNNNNNT-Genomic Insert
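
To make the layout above concrete, here is a minimal Python sketch that splits the start of a read 2 sequence into its four variable blocks. It assumes the linkers are exactly AGGA, ACTC, and AAGG as drawn, followed by a single T; the names `BARCODE_PATTERN` and `split_barcode` are hypothetical and are not part of sinto.

```python
import re

# Layout from the post: four 8-nt variable blocks separated by the fixed
# linkers AGGA, ACTC and AAGG, closed by a single T (8+4+8+4+8+4+8+1 = 45 nt).
BARCODE_LENGTH = 8 + 4 + 8 + 4 + 8 + 4 + 8 + 1
BARCODE_PATTERN = re.compile(
    r"([ACGTN]{8})AGGA([ACGTN]{8})ACTC([ACGTN]{8})AAGG([ACGTN]{8})T"
)


def split_barcode(read2_sequence):
    """Return the four variable blocks, or None if the linkers do not match."""
    # re.match anchors at the start of the read, where the barcode sits.
    match = BARCODE_PATTERN.match(read2_sequence)
    return match.groups() if match else None
```

A read whose first 45 nt do not follow this layout (for example, one that was trimmed or shifted by an indel) returns `None`, which can help separate malformed reads from well-formed ones before barcoding.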

I am using sinto barcode simply to add the cell barcode to the read identifiers, with the following command:

sinto barcode --barcode_fastq "$r2" \
--read1 "$r1" \
--read2 "$r2" \
--bases 45 \
--whitelist "$WHITELIST" \
--suffix "$LIB_PREFIX"

Here, $r2 points to the read 2 FASTQ file, $r1 points to the read 1 FASTQ file, and $WHITELIST points to a text file with the whitelist of known barcodes.

Unfortunately, after running for some time, I get the following error:

Function run_barcode called with the following arguments:

barcode_fastq   /scratch/antwerpen/205/vsc20542/atrandi_scDNA/input_fastq/fastp/scDNA_AT_01_R2_clean.fastq
read1   /scratch/antwerpen/205/vsc20542/atrandi_scDNA/input_fastq/fastp/scDNA_AT_01_R1_clean.fastq
read2   /scratch/antwerpen/205/vsc20542/atrandi_scDNA/input_fastq/fastp/scDNA_AT_01_R2_clean.fastq
bases   45
prefix
suffix  scDNA_AT_01
whitelist       /scratch/antwerpen/205/vsc20542/atrandi_scDNA/whitelist.tsv
func    <function run_barcode at 0x154c53258360>
Traceback (most recent call last):
  File "/data/antwerpen/205/vsc20542/python_lib/bin/sinto", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/antwerpen/205/vsc20542/python_lib/lib/python3.12/site-packages/sinto/arguments.py", line 555, in main
    options.func(options)
  File "/data/antwerpen/205/vsc20542/python_lib/lib/python3.12/site-packages/sinto/utils.py", line 24, in wrapper
    func(args)
  File "/data/antwerpen/205/vsc20542/python_lib/lib/python3.12/site-packages/sinto/cli.py", line 105, in run_barcode
    addbarcodes.addbarcodes(
  File "/data/antwerpen/205/vsc20542/python_lib/lib/python3.12/site-packages/sinto/addbarcodes.py", line 101, in addbarcodes
    barcodes = correct_barcodes(barcodes, whitelist)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/antwerpen/205/vsc20542/python_lib/lib/python3.12/site-packages/sinto/addbarcodes.py", line 49, in correct_barcodes
    for entry in clusterer(counts, threshold=1):
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/antwerpen/205/vsc20542/python_lib/lib/python3.12/site-packages/umi_tools/network.py", line 368, in __call__
    assert max(len_umis) == min(len_umis), (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: not all umis are the same length(!):  43 - 56

Is it due to the lack of UMIs after the barcode sequence? If so, is there a way of making Sinto bypass the UMI detection?
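
The traceback shows the assertion being raised inside the umi_tools clusterer, which sinto's `correct_barcodes` calls on the barcode counts, and the message reports lengths ranging from 43 to 56. One way to narrow this down is to tally how long the extracted barcode would be for each read. The sketch below is only a diagnostic under stated assumptions: it assumes plain four-line FASTQ records and that the barcode is simply the first `bases` characters of the read 2 sequence. The function name `barcode_length_counts` is hypothetical, and this is not how sinto itself extracts barcodes.

```python
from collections import Counter
from itertools import islice


def barcode_length_counts(fastq_lines, bases=45):
    """Count the length of the leading `bases`-nt slice of every read.

    Assumes four-line FASTQ records, so the sequence is line 2 of each record.
    """
    sequence_lines = islice(fastq_lines, 1, None, 4)
    return Counter(len(line.strip()[:bases]) for line in sequence_lines)
```

For example, `with open(path_to_read2) as handle: print(barcode_length_counts(handle))` prints the tally for one file. Any count under a length other than 45 would point to reads shorter than the barcode, such as reads shortened by upstream trimming.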

Not sure if this is helpful, but here is the head of my whitelist.tsv file:

$ head whitelist.tsv
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGGTAACCGAT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGTCCTCAACT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGTGGTCTCAT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGGTCCGATTT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGTTGACCACT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGTCCAGGATT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGGACAGCATT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGGATGGTCTT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGCATACCGTT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGCGGTTGATT
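
The first ten entries are all 45 nt, but `head` only shows the top of the file. A minimal sketch for scanning the whole whitelist for entries that are not exactly 45 A/C/G/T characters (extra tab-separated columns, trailing whitespace, carriage returns, or truncated lines) could look like this. The name `irregular_entries` is hypothetical, and the check assumes one barcode per line with nothing else on it.

```python
import re

# Assumption: a valid whitelist line is exactly 45 unambiguous bases.
VALID_ENTRY = re.compile(r"[ACGT]{45}")


def irregular_entries(lines):
    """Return (line_number, text) for every line that is not a clean 45-nt barcode."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")  # keep "\r", tabs and spaces so they get flagged
        if not VALID_ENTRY.fullmatch(text):
            problems.append((number, text))
    return problems
```

Running it as `with open("whitelist.tsv") as handle: print(irregular_entries(handle))` returns an empty list when every line is a clean 45-nt barcode.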
