-
Notifications
You must be signed in to change notification settings - Fork 22
Adding an 'N' to no coverage positions of the consensus #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
It looks like a custom config file would need to be installed for IRMA to set the parameters It looks like I'll have to update the IRMA Singularity image adding config files with the desired parameters set. In that case, I might as well update IRMA too. |
Hi @mnebroski I've just created a version 2.0.0 release for the workflow, which updates IRMA to version 1.0.2, adds "deletion by ambiguation" ( Please note that a samplesheet is now required as input rather than a quoted path to some reads. More info in the usage docs and a script to make it easier to create a samplesheet from a FASTQ directory. If you get a chance, let me know if this new release resolves this issue or if further changes are needed. |
Hi @peterk87 I reran the last couple of runs with the updated workflow, and on the one run it worked great! It seemed to solve the issue where the NA subtype prediction was called differently than the most abundant NA read count we saw in the one sample. `Oops... Pipeline execution stopped with the following message: WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. whoami: unknown uid 1000 The sample was a blank sample so there weren't any assembled segments expected, but it seems that the rest of the workflow wouldn't continue with this error. |
Thanks for letting me know about that issue. I'll fix it in a patch release ASAP |
Hi @mnebroski The issue you encountered with the blank sample should be resolved with release 2.0.1 (fixed in #6). I've added blank/empty FASTQ files to the workflow test dataset to make sure there's no more regressions in the future. |
Hi @peterk87 I pulled the latest release and reran the samples, which did resolve the blank sample issue! However the samples that we were expecting to see N's in the consensus were still shorter than the expected segment length and did not contain the N's. It looks like the default for keeping the reference deletions is set to true, but is there another parameter that I'm missing? The command I ran was:
|
Hi @mnebroski I think I'd need more information and some example data/results to see what you mean. Are you talking about internal gaps being removed? Or are you talking about the ends which don't have any reads mapping? FYI IRMA uses its own set of reference sequences for the various segments and subtypes for performing assembly ( |
Hi @peterk87 It looks like internal gaps are being removed. When we Blast the consensus sequence the results are broken up into several smaller alignments instead of one continuous alignment. The expected length for the segment is also shorter than expected (for example segment 1 should be ~2341nt and the consensus sequence generated is ~1500nt). I can provide you with the specific samples that we're seeing this in if that will help. Oh okay that's interesting and good to know! Is there a way for us to edit this file to provide our own sequences or could that potentially cause problems with the way the IRMA analysis is performed? |
Yes, if you could link/pass on the specific reads and IRMA output directories where that is happening, then I could do some troubleshooting and see if it's just a matter of adjusting some parameters.
We could try creating a new module using the FLU module as a template, however, the process seems quite involved: https://wonder.cdc.gov/amd/flu/irma/modules.html There's also a FLU_AD module which may be worth trying out before trying to create a new IRMA module. |
It's been noted that some segments that are not complete are having the gaps removed from the consensus to create a continuous sequence. From the IRMA paper it looks like by default there is deletion editing occurring to remove gaps from the sequence:
Would it be possible to adjust the workflow to by default include an 'N' in the consensus at positions of no coverage?
The text was updated successfully, but these errors were encountered: