8000 Command that merging outcomes of multiple SV callers · Issue #18 · starskyzheng/panpop · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Command that merging outcomes of multiple SV callers #18

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lovelycatZ opened this issue Feb 20, 2024 · 25 comments
Closed

Command that merging outcomes of multiple SV callers #18

lovelycatZ opened this issue Feb 20, 2024 · 25 comments

Comments

@lovelycatZ
Copy link

Hello Zheng,

Thank you for providing a very nice tool. Currently, I am curious about how to merge outcomes of multiple SV callers. How to provide some vcf files to the standalone PART script? I noticed that the --in_vcf option can support only one file.

Thank you for your soon reply!

@starskyzheng
Copy link
Owner

Hi! You can follow this issue (#15). Bcftools need to be used.

@lovelycatZ
Copy link
Author

Hello Zheng,

Thank you for your soon reply! It works well for after using BCFTOOLS to merge outcomes of multiple SV callers. But the final result file named 3.final.vcf.gz had no records, and there was only a header section. While I used the SURVIVOR to merge and I got some records. Why?

@starskyzheng
Copy link
Owner

Can you provid vcf files?

@lovelycatZ
Copy link
Author

vcfs.zip
There are 4 vcf files from 4 CNV callers.

@starskyzheng
Copy link
Owner

Your VCF file is not in common format.
GT should be 0/0, 0/1, 1/1, 0/2, et al. However, GT in 21B01474858_cn.MOPS.vcf.gz was one number, which cannot be recognized by PanPop or SURVIVOR.
GT even not exists in file 21B01474858_ExomeDepth.vcf.gz, 21B01474858_GATK_gCNV.vcf.gz, and 21B01474858_Manta.vcf.gz.
You may manully add GT to those VCF files and rerun PanPop or SURVIVOR.

@lovelycatZ
Copy link
Author

Yes, you are right. Actually, the output of some callers is not in VCF format (such as TSV format), so manual conversion is taken. But what if I can't infer whether a CNV is homozygous or heterozygous? Does the GT tag have to exist in the VCF files to run the PART standalone script?

@starskyzheng
Copy link
Owner

GT must exists to use PART script. But I dont know how to deal with CNVs, unless you can transformat it into normal vcf file. Perhaps the CNV=0 means a deletion of GT=1/1, CNV=1 means a deletion of GT=0/1, CNV=3 means a insertion of GT=0/1?

@lovelycatZ
Copy link
Author

OK, I'll try to provide GT tag, thanks for your convenient time!

@lovelycatZ
Copy link
Author

This time there are a lot of records in the final VCF file, but it seems like all of them are not merged. Can you give me some advice according to the data?
data.zip

@starskyzheng
Copy link
Owner

There were no sequence info in VCF files. REF and ALT (colunm 4 and 5) need to be completed. Many be you can refer to https://github.com/starskyzheng/panpop/blob/main/scripts/long_caller_parser.pl.

@lovelycatZ
Copy link
Author

Do you mean I have to have the details for column 4 and 5? But if a SV is DEL or DUP, it make no sense to present sequence in REF or
ALT. And what does this script do?

@starskyzheng
Copy link
Owner

For DUP or DEL, the information of REF and ALT can be determed by other columns. But in common VCF format, REF and ALT must be specific.
Some of sv-callers also generate VCFs without REF and ALT info, so we write this script as transformation. But you may do some modification to apply to your dataset.
DUP is a specific type of insertion.

@lovelycatZ
Copy link
Author

Thank you for your time! I noticed that this script was customized for the internal callers and it's a little hard for me to fully understand script/long_caller_parser.pl as I am not familiar with PERL.
This error that all the records were not merged was due to no sequences were provided for REF and ALT, right? What if I add the sequences? Is this step should be performed before or after the bcftools merge process? Would you please provide a VCF file that was processed by script/long_caller_parser.pl so I could make the correct format?

@starskyzheng
Copy link
Owner

yes, this process must be done before bcftools merge.
VCF file could like this: https://samtools.github.io/hts-specs/VCFv4.2.pdf.

@lovelycatZ
Copy link
Author

In some cases, a DEL or DUP has a length of more than 1000bp, should we also add detailed sequences for REF and ALT?

@starskyzheng
Copy link
Owner

yes.

@lovelycatZ
Copy link
Author

Hello Zheng,

I'm sorry to have bothered you so long.

After I added the detailed sequences for REF, I got the results that seemed to be a little different than before. But there were still the same warnings as before in the process:

Done reading ref fasta read 195 contig / total 3099922541 bp
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! no alt[0] from aln software(HAlignC)! redo! No. 1
Warn! aln software(famsaP) exit status(1)! redo! No. 1
Warn! aln software(mafft) exit status(1)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
0 not in alts: $VAR1 = [];
 at /data/xiangxd/software/panpop/scripts/../lib/realign_alts.pm line 333.
	realign_alts::alt_alts_to_muts(undef, 3, ARRAY(0x2b76d14944a0)) called at /data/xiangxd/software/panpop/scripts/../lib/realign_alts.pm line 311
	realign_alts::process_alts(ARRAY(0x2b77a5d9f070), 3, 12452, undef, ARRAY(0x2b76d14944a0)) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 495
	main::process_line_new(ARRAY(0x2b77a5d9f070), HASH(0x2b77a59f9890), "chr16", 152760) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 446
	main::process_lines_new(ARRAY(0x2b77a531e2e8), "chr16", 152760, 165211) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 254
	main::mce_run(MCE=HASH(0x2b76d1074b70)) called at /data/xiangxd/perl5/lib/perl5/MCE/Core/Worker.pm line 489
	MCE::_worker_do() called at /data/xiangxd/perl5/lib/perl5/MCE/Core/Worker.pm line 593
	MCE::_worker_loop() called at /data/xiangxd/perl5/lib/perl5/MCE/Core/Worker.pm line 714
	MCE::_worker_main() called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 2057
	MCE::_dispatch() called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 2101
	MCE::_dispatch_child() called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 692
	MCE::spawn(MCE=HASH(0x2b76d1074b70)) called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 991
	MCE::run() called at /data/xiangxd/perl5/lib/perl5/MCE/Flow.pm line 429
	MCE::Flow::run(CODE(0x2b76d147bb78), CODE(0x2b76d1467330)) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 193
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
Warn! aln software(muscle) exit status(2)! redo! No. 1
All Done

This time, I provided 3 VCF files which were from 3 different callers. But when I added more VCF files to the PART standalone script, this warning came out and the running program just got stuck:

0 not in alts: $VAR1 = [];
 at /data/xiangxd/software/panpop/scripts/../lib/realign_alts.pm line 333.
	realign_alts::alt_alts_to_muts(undef, 3, ARRAY(0x2b76d14944a0)) called at /data/xiangxd/software/panpop/scripts/../lib/realign_alts.pm line 311
	realign_alts::process_alts(ARRAY(0x2b77a5d9f070), 3, 12452, undef, ARRAY(0x2b76d14944a0)) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 495
	main::process_line_new(ARRAY(0x2b77a5d9f070), HASH(0x2b77a59f9890), "chr16", 152760) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 446
	main::process_lines_new(ARRAY(0x2b77a531e2e8), "chr16", 152760, 165211) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 254
	main::mce_run(MCE=HASH(0x2b76d1074b70)) called at /data/xiangxd/perl5/lib/perl5/MCE/Core/Worker.pm line 489
	MCE::_worker_do() called at /data/xiangxd/perl5/lib/perl5/MCE/Core/Worker.pm line 593
	MCE::_worker_loop() called at /data/xiangxd/perl5/lib/perl5/MCE/Core/Worker.pm line 714
	MCE::_worker_main() called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 2057
	MCE::_dispatch() called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 2101
	MCE::_dispatch_child() called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 692
	MCE::spawn(MCE=HASH(0x2b76d1074b70)) called at /data/xiangxd/perl5/lib/perl5/MCE.pm line 991
	MCE::run() called at /data/xiangxd/perl5/lib/perl5/MCE/Flow.pm line 429
	MCE::Flow::run(CODE(0x2b76d147bb78), CODE(0x2b76d1467330)) called at /data/xiangxd/software/panpop/bin/../scripts/realign.pl line 193

In addition to that, there were too many ./. GT tag in 3.final.vcf, is that correct? Here's attached a subsample and reduced version.

Many thanks!
3.final.zip

@starskyzheng
Copy link
Owner

Your input VCF is in wrong format.
For deletion, REF is the SV sequence, ALT is * or the first base of REF
For DUP or insertion, REF is one base of reference genome of this position of mutation. ALT is the SV sequence.

@lovelycatZ
Copy link
Author

Great thanks for your help!
I've successfully run both two steps of the PART standalone script and got the result without any errors and warnings. I noticed that the INFO tag ZORIPOS=XXX;ZORIEND=XXX;ZLEN=XXX was within some records but not all. And some GT tag was "./." How can we interpret the result?
part.zip

@starskyzheng
Copy link
Owner

As for SVs you can use bin/Fill_DP.pl.
However, your VCF is for CNVs, I don't know how to deal with it.

@lovelycatZ
Copy link
Author

I have built the conda enviroment according to the guide, but it get wrong for perl bin/Fill_DP.pl:
Can't locate zzIO.pm in @INC (you may need to install the zzIO module) (@INC contains: /data/xiangxd/perl5/lib/perl5/x86_64-linux-thread-multi /data/xiangxd/perl5/lib/perl5 /data/xiangxd/software/miniconda3/envs/panpop/lib/site_perl/5.26.2/x86_64-linux-thread-multi /data/xiangxd/software/miniconda3/envs/panpop/lib/site_perl/5.26.2 /data/xiangxd/software/miniconda3/envs/panpop/lib/5.26.2/x86_64-linux-thread-multi /data/xiangxd/software/miniconda3/envs/panpop/lib/5.26.2 .) at bin/Fill_DP.pl line 26. BEGIN failed--compilation aborted at bin/Fill_DP.pl line 26.
I can't even find the zzIO module on the PREL website.

@starskyzheng
Copy link
Owner

Fixed. Please try again. (bffe3f0)
However, I don't think this script can handle your VCF properly.

@lovelycatZ
Copy link
Author

In fact, I just want to know which records in the VCF file were merged by PART and which are not.

@starskyzheng
Copy link
Owner

Sorry, PanPop do not support this function.

@starskyzheng
Copy link
Owner

If no more activity, this issue will be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants
0