Please provide a better worked example #1

jkbonfield · 2024-12-12T09:29:41Z

I'm trying to use syng and the documentation appears to be out of date.

The example usage is syng -o fAulStu2 -k -b -S fAulStu2-reads.1seq. However, the actual program has no -b or -S flag, and -k takes a parameter.

Also I can't figure out how to get it to work. I tried extracting a chunk of Revio HG002 data (the chr1 centromere) as fastq and it crashes.

./syng -o test in.fq
k, w, seed are 16 1023 7
sequence file 1 in.fq type fastq: 
had 33385 sequences (0 filtered) 515435467 bp, yielding 1422788 syncs with 418056 extra syncmers
user	10.287280	system	0.548212	elapsed 5.931787	alloc_max 1581	max_RSS	724928
Total for this run 33385 sequences, total length 515435467
Overall total 1422788 instances of 418056 syncmers, average 3.40 coverage
Segmentation fault

If I build with debugging enabled it doesn't segfault, so I cannot get a stack trace. However, it also produces no output and no files named test*.

If I try converting in.fq to in.1seq and running with that, as per the worked example, it bails out instantly.

./syng -o test in.1seq
k, w, seed are 16 1023 7
free(): invalid pointer
Aborted

It appears to be a hard failure because my input does not match the expected byte stream, but I don't know anything about 1seq, nor why my 1seq file differs from the expected format.

Breakpoint 1, seqIOopenRead (filename=0x7fffffffd87e "in.1seq", convert=0x55555577e020 <dna2indexConv>, 
    isQual=false) at seqio.c:33
33	  SeqIO *si = new0 (1, SeqIO) ;
(gdb) n
34	  if (!strcmp (filename, "-")) si->gzf = gzdopen (fileno (stdin), "r") ;
(gdb) 
35	  else si->gzf = gzopen (filename, "r") ;
(gdb) 
36	  if (!si->gzf) { free(si) ; return 0 ; }
(gdb) p si->gzf
$1 = (gzFile) 0x5555558abc50
(gdb) n
37	  si->bufSize = 1<<24 ; // 16 MB
(gdb) 
38	  si->b = si->buf = new (si->bufSize, char) ;
(gdb) 
39	  si->convert = convert ;
(gdb) 
40	  si->isQual = isQual ;
(gdb) 
41	  si->nb = gzread (si->gzf, si->buf, si->bufSize) ;
(gdb) 
42	  if (!si->nb)
(gdb) 
47	  si->line = 1 ;
(gdb) 
48	  if (*si->buf == '>')
(gdb) 
52	  else if (*si->buf == '@')
(gdb) 
75	  else if (*si->buf == 'b')
(gdb) 
103	  else if (*si->buf == '1')
(gdb) 
104	    { gzclose (si->gzf) ; si->gzf = 0 ;
(gdb) 
free(): invalid pointer


jkbonfield · 2024-12-12T09:35:32Z

I get further if I bgzip the in.1seq file, as the gzclose was failing due to the input not being gzipped. However, it then gets a SEGV soon after because it cannot open the 1seq file.

I thought that perhaps it needs a custom gzip, like bgzip but perhaps different. I noticed gaffer/seqconvert has the statement ".gz ending outfile name implies gzip compression", but it doesn't work:

@ seq4d[quantum/syng]; ./gaffer/seqconvert -t -o in.1seq.gz in.fq
reading from file type fastq
Segmentation fault

It works if I don't specify the .gz there.
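
For anyone who wants to check what their own input looks like before handing it to syng, here is a rough Python sketch. It is not part of syng or gaffer. It only reports whether a file starts with the standard gzip magic bytes (which bgzip output also does) and which of the first-byte checks visible in the gdb trace above the first data byte would match. The helper name sniff and the format names in the table are my own guesses, not taken from the syng source.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # standard gzip header; BGZF (bgzip) files start the same way

# First-byte checks seen in the gdb trace of seqIOopenRead.
# The format names are assumptions on my part, not confirmed by the source.
BRANCHES = {
    b">": "'>' branch (presumably FASTA)",
    b"@": "'@' branch (presumably FASTQ)",
    b"b": "'b' branch (format unknown to me)",
    b"1": "'1' branch (presumably 1seq)",
}


def sniff(path):
    """Return (is_gzipped, first_data_byte, matching_branch) for a file."""
    with open(path, "rb") as fh:
        is_gzipped = fh.read(2) == GZIP_MAGIC
    # Read the first byte of the (decompressed, if needed) data stream.
    opener = gzip.open if is_gzipped else open
    with opener(path, "rb") as fh:
        first = fh.read(1)
    return is_gzipped, first, BRANCHES.get(first, "no branch in the trace matches")


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, *sniff(name))
```

Running it on in.fq, in.1seq and the bgzipped in.1seq should at least show whether the difference between the working and failing cases is the compression or the first data byte.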
