Please provide a better worked example · Issue #1 · richarddurbin/syng

Open
jkbonfield opened this issue Dec 12, 2024 · 1 comment

@jkbonfield

I'm trying to use syng and the documentation appears to be out of date.

The example usage shows syng -o fAulStu2 -k -b -S fAulStu2-reads.1seq. However, the actual program has no -b or -S flag, and -k takes a parameter.

Also, I can't figure out how to get it to work. I tried extracting a chunk of Revio HG002 data (the chr1 centromere) as fastq, and it crashes.

./syng -o test in.fq
k, w, seed are 16 1023 7
sequence file 1 in.fq type fastq: 
had 33385 sequences (0 filtered) 515435467 bp, yielding 1422788 syncs with 418056 extra syncmers
user	10.287280	system	0.548212	elapsed 5.931787	alloc_max 1581	max_RSS	724928
Total for this run 33385 sequences, total length 515435467
Overall total 1422788 instances of 418056 syncmers, average 3.40 coverage
Segmentation fault

If I build with debugging enabled it doesn't segfault, so I cannot get a stack trace. However, it also produces no output and no files named test*.

If I try converting in.fq to in.1seq and running with that, as per the worked example, it bails out instantly.

./syng -o test in.1seq
k, w, seed are 16 1023 7
free(): invalid pointer
Aborted

It appears to be a hard failure caused by my input not matching the expected byte stream, but I don't know anything about 1seq, nor why my 1seq file differs from the expected format.

Breakpoint 1, seqIOopenRead (filename=0x7fffffffd87e "in.1seq", convert=0x55555577e020 <dna2indexConv>, 
    isQual=false) at seqio.c:33
33	  SeqIO *si = new0 (1, SeqIO) ;
(gdb) n
34	  if (!strcmp (filename, "-")) si->gzf = gzdopen (fileno (stdin), "r") ;
(gdb) 
35	  else si->gzf = gzopen (filename, "r") ;
(gdb) 
36	  if (!si->gzf) { free(si) ; return 0 ; }
(gdb) p si->gzf
$1 = (gzFile) 0x5555558abc50
(gdb) n
37	  si->bufSize = 1<<24 ; // 16 MB
(gdb) 
38	  si->b = si->buf = new (si->bufSize, char) ;
(gdb) 
39	  si->convert = convert ;
(gdb) 
40	  si->isQual = isQual ;
(gdb) 
41	  si->nb = gzread (si->gzf, si->buf, si->bufSize) ;
(gdb) 
42	  if (!si->nb)
(gdb) 
47	  si->line = 1 ;
(gdb) 
48	  if (*si->buf == '>')
(gdb) 
52	  else if (*si->buf == '@')
(gdb) 
75	  else if (*si->buf == 'b')
(gdb) 
103	  else if (*si->buf == '1')
(gdb) 
104	    { gzclose (si->gzf) ; si->gzf = 0 ;
(gdb) 
free(): invalid pointer
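
For readers following the trace: seqIOopenRead appears to choose a parser by testing the first byte of the buffer against '>', '@', 'b' and '1', in that order, after an empty-read check. The Python sketch below mirrors that dispatch so a file's leading byte can be checked by hand. It is not the seqio.c implementation. '>' and '@' are the usual FASTA and FASTQ markers, but the labels for 'b' and '1' are assumptions, as are the names `FIRST_BYTE_FORMATS` and `guess_format` and the sample buffers.

```python
# Sketch of first-byte format detection, modelled on the branch order
# visible in the gdb trace of seqIOopenRead. Not the actual seqio.c code.

# Labels for 'b' and '1' are assumptions: the trace shows only which byte
# each branch tests, not what the format is called.
FIRST_BYTE_FORMATS = {
    ord(">"): "fasta",
    ord("@"): "fastq",
    ord("b"): "binary (assumed label)",
    ord("1"): "1seq (assumed label)",
}


def guess_format(buf: bytes) -> str:
    """Guess a sequence file format from the first byte of its contents."""
    if not buf:
        # Corresponds to the empty-read check before the byte tests.
        return "empty"
    return FIRST_BYTE_FORMATS.get(buf[0], "unknown")


if __name__ == "__main__":
    # Hypothetical sample buffers, for illustration only.
    samples = (b">read1\nACGT\n", b"@read1\nACGT\n+\nIIII\n", b"1 example\n", b"")
    for sample in samples:
        print(repr(sample[:8]), "->", guess_format(sample))
```

Passing the first few bytes of in.1seq (for example, read with `open(path, "rb").read(16)`) shows which branch the file would be expected to take.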
@jkbonfield

I get further if I bgzip the in.1seq file, as the gzclose was failing because the input was not gzipped. However, it then hits a SEGV soon after, as it cannot open the 1seq file.
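
A quick way to confirm whether a given input is gzip-compressed is to look at its first two bytes: every gzip stream, including bgzip (BGZF) output, starts with 0x1f 0x8b. The sketch below is a minimal check along those lines. The helper name `is_gzipped` and the temporary file names are hypothetical, and it says nothing about what syng itself requires.

```python
import gzip
import os
import tempfile

# First two bytes of every gzip member (RFC 1952); BGZF files share them.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path: str) -> bool:
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    record = b"@read1\nACGT\n+\nIIII\n"  # hypothetical FASTQ record
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "plain.fq")
        packed = os.path.join(tmp, "packed.fq.gz")
        with open(plain, "wb") as fh:
            fh.write(record)
        with gzip.open(packed, "wb") as fh:
            fh.write(record)
        print("plain :", is_gzipped(plain))
        print("packed:", is_gzipped(packed))
```

Running the same check on in.1seq before and after bgzip would show which form is actually being handed to the program.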

I thought that perhaps it needs a custom gzip variant, similar to bgzip but different. I noticed gaffer/seqconvert has the statement ".gz ending outfile name implies gzip compression", but it doesn't work:

@ seq4d[quantum/syng]; ./gaffer/seqconvert -t -o in.1seq.gz in.fq
reading from file type fastq
Segmentation fault

It works if I don't specify the .gz there.
